Re: Arrow Classes
Ashley Yakeley wrote: Wolfgang Jeltsch wrote: This brings me to another point. One year ago we had a discussion on The Haskell Mailing List concerning arrows. (The subject of the mails was just arrows.) The point was that it seemed strange to me that first and second are included in the basic arrow class Arrow while left and right have their extra class ArrowChoice. Not only that it seemed strange to me but it made impossible to make Parser baseMonad an instance of Arrow. Parser baseMonad has nice implementations of pure and () but none of first or second. I agree. My own Arrow module hierarchy looks more or less like this: class Compositor comp where [...] class (Compositor arrow) = Arrow arrow where [...] class (Arrow arrow) = ProductArrow arrow where [...] class (Arrow arrow) = CoproductArrow arrow where [...] class (ProductArrow arrow,CoproductArrow arrow) = FullArrow arrow instance (ProductArrow arrow,CoproductArrow arrow) = FullArrow arrow class (Arrow arrow) = ArrowFix arrow where [...] class (FullArrow arrow) = ApplyArrow arrow where [...] On that topic, see below for what mine looks like (from HXML, URL: http://www.flightlab.com/~joe/hxml/ ). I started off with Hughes' conventions, but for some reason could never remember the difference between and ***, or between ||| and +++. I found , , |||, | to have better mnemonic value. This also frees up +++ for ArrowPlus, which -- in HXML applications -- is frequently used and should thus be easy to type. When using the ArrowChoice operators, I kept tripping over all the 'Either' coproduct types, so added some syntactic sugar (borrowed from HaXML): data Choice a = a : a class (Arrow a) = ArrowChoice a where [ ... ] ( ?) :: (b - Bool) - Choice (a b c) - a b c (?) :: a b Bool- Choice (a b c) - a b c I found p ? f : g much more pleasant to use. (I also like the idea of splitting the product operators out of the base Arrow class -- will consider doing that in my library). -- infixr 5 +++ infixr 3 , infixr 2 |, |||, ?, ?, : infixl 1 class Arrow a where arr :: (b - c) - a b c () :: a b c - a c d - a b d apfst :: a b c - a (b,x) (c,x) apsnd :: a b c - a (x,b) (x,c) () :: a b c - a d e - a (b,d) (c,e) () :: a b c - a b d - a b (c,d) liftA2 :: (b - c - d) - a e b - a e c - a e d aConst :: c - a b c idArrow :: a b b -- Minimal implementation: arr, , apfst or data Choice a = a : a class (Arrow a) = ArrowChoice a where apl :: a b c - a (Either b d) (Either c d) apr :: a b c - a (Either d b) (Either d c) (|) :: a b c - a d e - a (Either b d) (Either c e) (|||) :: a b c - a d c - a (Either b d) c ( ?) :: (b - Bool) - Choice (a b c) - a b c (?) :: a b Bool- Choice (a b c) - a b c -- Minimal implementation: | or apl class (Arrow a) = ArrowApply a where app :: a (a b c,b) c class (Arrow a) = ArrowZero a where aZero :: a b c aMaybe :: a (Maybe c) c aGuard :: (b - Bool) - a b b class (Arrow a) = ArrowPlus a where (+++) :: a b c - a b c - a b c --Joe English [EMAIL PROTECTED] ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: Q. about XML support (WAS: Re: Interesting Read)
Graham Klyne wrote: Which leads me to a question: starting from the haskell.org web page, I have identified three XML parsers in Haskell (HaXml, hXML, Haskell XML Toolbox), none of which seem to support XML namespaces and only one of which claims to be tested on both HUGS and GHC. Can anyone offer any recommendations, or maybe pointers to other work? What are you looking for in an XML toolkit? As far as HXML goes, I have a rough sketch of an implementation of XML namespace support, not yet finished or released. (This is a somewhat thorny problem; implementing XMLNS is not hard, but implementing it in a sane way requires some ingenuity.) --Joe English [EMAIL PROTECTED] ___ Haskell mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell
Re: Enum on Float/Double
George Russell wrote: Indeed I think the Haskell Library Report contains quite a few examples of floating point code which a numerical analyst would have written rather better. Like the Float and Double implementations enumFrom{Then}To. Heck, even I can tell that these are Just Plain Wrong -- they're subject to horribly bad roundoff errors -- and I'm not even close to being a numerical analyst. --Joe English [EMAIL PROTECTED] ___ Haskell mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell
Re: UTF-8 library
Ashley Yakeley wrote: One of the things that really bothers me about C is the way its unspecifiedness about types can infect other languages. For instance, what exactly is a Haskell Int? Java, at least, stands firm, but then platform-independence was one of Java's explicit design priorities. Platform-independence is *also* one of Standard C's explicit design goals, it just approaches it in a different way. Standard C attempts platform independence by specifying the existence of a certain number of built-in numeric types, and certain guarantees about each of them. It requires that programmers know what is and is not guaranteed, however, and write code accordingly. It's possible to write portable code in C, but you must abandon the assumption that (for instance) an 'int' is exactly 32 bits, since that's not true on all platforms. The slogan is All the world is not a VAX. Java attempts platform independence by declaring that all the world *is*, in fact, a VAX [*]. [*] More precisely, a 32-bit platform with IEEE 754 floating point. --Joe English [EMAIL PROTECTED] ___ Haskell mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell
Re: UTF-8 library
anatoli wrote: I'd still rather associate locale with a handle. This way, all Char and String IO functions that exist, and those that are not written yet, can work with any encoding without relying on the abomination that is setlocale(). Seconded; this is the best approach. The libc locale could be consulted to determine the initial or default encoding, or it could just be ignored (I'd vote to ignore it; setlocale() *is* an abomination.) BTW, this is how Tcl does it -- each file handle has an associated encoding (which may be changed on the fly) -- and it's very convenient. --Joe English [EMAIL PROTECTED] ___ Haskell mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell
[ADMINISTRIVIA]: Change list submission policy please?
The haskell mailing list is getting an increasing amount of spam, viruses, and virus warnings. Would it be possible to change the list policy to only allow submissions from subscribed members? Please? --Joe English [EMAIL PROTECTED] ___ Haskell mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell
Re: Functional design patterns (was: How to get functional software engineering experience?)
Ralf Laemmel wrote: Joost Visser and I, we worked out a few maybe not so obvious functional programming pattern [...] http://www.cs.vu.nl/Strafunski/dp-sf/ Neat! I have the feeling that the FP community has a hard time getting started with design patterns. I believe quite the opposite: there are plenty of FP design patterns in common use, it's just that FP'ers don't usually use the term design patterns to describe them. I'm thinking of things like catamorphisms, anamorphisms (aka folds/unfolds), monads and functors, the zipper, and of course the various catalogues of polytypic routines. Part of the problem is that design patterns are inherently vague and this is maybe something the authors in our field do not like to consider. That's the main difference between FP patterns and OO patterns. OO patterns tend to be informal, intuitive guidelines; and though some FP patterns are like this (e.g., combinator library, embedded domain-specific language), the majority can be described rigorously. This gives them an added usefulness -- you can actually calculate with them. --Joe English [EMAIL PROTECTED] ___ Haskell mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell
Is there a name for this structure?
Not really a Haskell question, but someone here might know the answer... Suppose you have two morphisms f : A - B and g : B - A such that neither (f . g) nor (g . f) is the identity, but satisfying (f . g . f) = f. Is there a conventional name for this? Alternately, same question, but f and g are functors and A and B categories. In some cases (g . f . g) is also equal to g; is there a name for this as well? I find myself running into pairs of functions with this property over and over again, and am looking for a short way to describe the property... Thanks, --Joe English [EMAIL PROTECTED] ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
ANNOUNCE: HXML 0.2, XML parser for Haskell
Announcing HXML version 0.2, a non-validating XML parser written in Haskell. It is designed for space-efficiency, taking advantage of lazy evaluation to reduce memory requirements. HXML is available at: URL: http://www.flightlab.com/~joe/hxml The current version is 0.2, and is pre-beta quality. HXML has been tested with GHC 5.02, NHC 1.10, and various recent versions of Hugs 98. Changes in version 0.2: + New Arrow-based combinator library + Added support for CDATA sections + New function parseDocument recognizes (and ignores) the document prolog (XML and DOCTYPE declarations) + Several data structures and public functions have been renamed + Space fault in comment parsing fixed Please contact Joe English [EMAIL PROTECTED] with any questions, comments, or bug reports. --Joe English [EMAIL PROTECTED] ___ Haskell mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell
ANNOUNCE: HXML 0.2, XML parser for Haskell
Announcing HXML version 0.2, a non-validating XML parser written in Haskell. It is designed for space-efficiency, taking advantage of lazy evaluation to reduce memory requirements. HXML is available at: URL: http://www.flightlab.com/~joe/hxml The current version is 0.2, and is pre-beta quality. HXML has been tested with GHC 5.02, NHC 1.10, and various recent versions of Hugs 98. Changes in version 0.2: + New Arrow-based combinator library + Added support for CDATA sections + New function parseDocument recognizes (and ignores) the document prolog (XML and DOCTYPE declarations) + Several data structures and public functions have been renamed + Space fault in comment parsing fixed Please contact Joe English [EMAIL PROTECTED] with any questions, comments, or bug reports. --Joe English [EMAIL PROTECTED] ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: Lazy Parsing
Brandon Michael Moore wrote: I'm wondering if there are any libraries out there for creating parsers that lazily build up their result. I know I could thread the remaining input through a parser by hand, but it seems like someone should have already done it. This turns out to be rather difficult to do in the general case (but see below -- XML is a special case). If you have type Parser sym result = [sym] - Maybe (result, [sym]) a Parser can't decide whether to return 'Just (result,rest)' or 'Nothing' until it has successfully parsed the complete result. So pattern matching on the parser's return value will force the entire production. Variations on the theme -- Either instead of Maybe, list-of-successes, continuation-passing combinators, etc -- all face a similar problem. However, if your top-level grammar is of the form: things :: empty | thing things {- == thing* -} then instead of: case runParser (pMany pThing) input of Just (result,[]) - ... you can use something like unfoldr (runParser pThing) input to build the result list incrementally. This will be less eager; instead of parsing and returning an entire list of Things, it parses one Thing at a time. Another thing to watch out for is heap drag. The list-of-successes approach tends to retain the entire input, just in case the parser needs to backtrack. Parsec [1] and UU_Parsing [?] solve this by severely restricting the amount of required lookahead. I'd like to be able to turn a stream of XML into a lazy tree of tags (probably Maybe tags, or Either errors tags), but I don't think HaXml and the like do that sort of thing. That's exactly how HXML [2] works. The parser returns a lazy list of tokens (analogous to SAX events), which are folded up into a tree by a separate function. In addition it uses a CPS parser library so (as with Parsec), there is minimal heap drag. [1] Parsec: URL: http://www.cs.ruu.nl/~daan/parsec.html [1] HXML: URL: http://www.flightlab.com/~joe/hxml (Note: HXML release 0.2 will be ready Real Soon Now, and there have been many incompatible changes since 0.1. The main thing left to be finished is the documentation, if you can live without that let me know and I'll put a snapshot up.) --Joe English [EMAIL PROTECTED] ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: heap and walking through a very long list
Simon Peyton-Jones wrote: There should really be a strict accumArray, just as there should be a strict foldl. Yes, please! Is there a way to write a strict version of accumArray in Haskell 98, or does this need to be done by the implementation? --Joe English [EMAIL PROTECTED] ___ Haskell mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell
Space faults in HaXml
Last week, Dmitry Astapov posted a message to haskell-cafe describing a HaXml program that was running out of memory on large inputs. I've done some investigation on this; here's what I've discovered so far: + The HaXml parser is overly eager; it doesn't produce a value until the entire input has been read. The heap profile for Dmitry's program shows heap usage climbing steadily up to a large peak as the document is being parsed, then a sharp drop as output begins, then climbs again as the output is produced. I suspect that this behaviour is typical of most HaXml programs. I replaced the parser with HXML (recently announced here); this flattens out the first peak, but the second one persists. (HXML uses an incremental parser -- instead of parsing a Document, it parses individual Events (start-tag, character data, end-tag, et cetera) and assembles them into a tree with a separate function.) + nhc98's heap profiler is fantastic. + Under nhc98 (v1.10), 'putStr' doesn't seem to like overly-long strings. This leads to trouble since the HaXml main driver uses a single call to 'hPutStrLn' to write the entire output document. Replacing this with 'mapM_ putChar' (or using GHC 5.02), fixes this problem, but the program still leaks. + The nhc98 heap profiler is amazingly useful. + HaXml uses the Hughes Peyton Jones Pretty printer to format the output. This appears to have a slow leak. (Later tests show that the peak consists of void heap cells from the Pretty module, mostly suspended calls to Prelude.Int- and Prelude.Int+.) Figuring that it's overkill anyway I replaced it with a simpler serializer. + After this change there's *still* a space fault, which with the help of nhc's profiler (which, BTW, is great), I tracked down to the HaXml 'cat' combinator: cat :: [a - [b]] - (a - [b]) cat fs = \e - concat [f e | f - fs] Earlier I had reported that rewriting Dmitry's test case to use the HXML internal representation directly instead of converting to HaXml's representation fixed the space leak. As it turns out, I didn't implement the 'cat' combinator, but instead used a binary concatenation operator: (+++) :: (a - [b]) - (a - [b]) - (a - [b]) f +++ g = \x - f x ++ g x and my implementation of mkElem had the signature mkElem :: String - CFilter - CFilter instead of HaXml's mkElem :: String - [CFilter] - CFilter. which uses 'cat' to process the second argument. Backporting this to HaXml: mkElem' name cf = \x - [CElem (Elem h [] (cf x))] and modifying Dmitry's test case to use mkElem' and +++ instead of mkElem and cat finally fixes the problem. I'm still not sure *why* this does the trick -- hopefully somebody smarter can explain that -- but the modified program runs in bounded space, no matter how large the input file is. (It even works in Hugs, which I found surprising, since the HXML tree builder has a known problem when run with Hugs' garbage collector.) --Joe English [EMAIL PROTECTED] ___ Haskell mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell
ANNOUNCE: HXML 0.1, an(other) XML parser for Haskell
2 Nov 2001 Announcing HXML, an(other) XML parser for Haskell. This implementation should have better space behaviour than HaXml's parser, and may be used as a drop-in replacement in existing HaXml programs. HXML is available at: URL: http://www.flightlab.com/~joe/hxml/ The current version is 0.1, and is slightly post-alpha quality. Tested with GHC 5.02, NHC98 1.10, and various recent versions of Hugs. Please contact Joe English [EMAIL PROTECTED] with any questions, comments, or bug reports. --Joe English [EMAIL PROTECTED] ___ Haskell mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell
Re: HaXml, memory usage and segmentation fault
An update on Dmitry's problems with HaXml memory usage: + Compiling HaXml and the driver program with ghc -O helps a *lot*. + Using the version of HaXml that comes preinstalled with GHC (-package text) helps even more. There is a slight difference in the 'Pretty' module (which is used to print the output) between the two versions. + I wrote an adapter that converts my parser's XML representation into HaXml's, so you can use it as a drop-in replacement. This helps some, but not enough. The heap profile using HaXml 1.02 has two large humps: the first from parsing the input, and the second from pretty-printing the output. (With the GHC version of HaXml the second hump is about half as tall as with the official HaXml version). With the new parser, only the smaller hump remains. + Figuring that using a pretty-printer is overkill, I replaced it with a quick hack that converts the HaXml representation _back_ into my representation and feeds it to a serializer that I had previously written. This improves things some more: the identity transformation 'processXmlWith keep' now has a flat heap profile. + Unfortunately, Dmitry's original program still has a space leak. I suspect that the HaXml combinators (or, more likely, the HaXml internal representation) are not as space-efficient as I had originally thought, since when I rewrote Dmitry's test case to use the new parser's internal representation directly I again got a flat heap profile -- there doesn't seem to be anything wrong with the structure of the original program. The code will be ready to release Real Soon Now; I'll keep you posted. --Joe English [EMAIL PROTECTED] ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: HaXml, memory usage and segmentation fault
Dmitry Astapov wrote: JE and the Hugs interpreter just isn't designed to work with large JE inputs. Try compiling the program instead. well, ghc-5.02 seems to dislike something inside XmlLib.hs - it could not find interface defs file for modules IOExts .. I plan to look more deeply into it though. I got it to compile with ghc 5.02 using ghc --make -package lang translate.hs The compiled version succeeds, but on a large document it uses a *lot* of memory and starts paging pretty badly. JE Try the identity transform 'main = processXmlWith keep' on your sample JE document and see if that runs out of heap too. If so, there's not JE much you can do short of replacing the HaXml parser. I tried this as well, modifying your program to use an XML parser I wrote a while ago that has better laziness properties than the HaXML one. Alas, my parser also suffers from a space leak under Hugs, so this only deferred the problem. Under ghc/ghci, though, it has modest memory requirements and runs without paging. --Joe English [EMAIL PROTECTED] ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: HaXml, memory usage and segmentation fault
Dmitry Astapov wrote: JE I tried this as well, modifying your program to use an XML parser I JE wrote a while ago that has better laziness properties than the HaXML JE one. Alas, my parser also suffers from a space leak under Hugs, so JE this only deferred the problem. Under ghc/ghci, though, it has modest JE memory requirements and runs without paging. Is it's distribution restricted? Is it possible to get it somwhere, use it, patch it, etc? If you don't mind a complete lack of documentation, sure :-) The code is alpha quality; there are a few missing features and a couple of things that it just gets wrong, but it's basically working. I'll package it up and put it on the Web when I get a chance. This may take a day or two... --Joe English [EMAIL PROTECTED] ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: HaXml, memory usage and segmentation fault
Dmitry Astapov wrote: I have Hugs version February 2001, HaXml version 1.02 and this program: [...] This program can process following file: ?xml version='1.0'? invoice [... one customer containing two contracts ... ] /invoice Now increase amount of customers to 10, and amount of contracts within each customer to 999. After that, runhugs -h600 translate.hs invoice.xml invoice.html dumps core :( What's the reason: bug in hugs, bug in HaXml, or my own bad programming techniques? More an inappropriate use of Hugs -- 10 customers with 999 contracts each is a moderately large input file, and the Hugs interpreter just isn't designed to work with large inputs. Try compiling the program instead. The other issue is that HaXml's XML parser is insufficiently lazy (although the rest of HaXml has very nice strictness properties). For instance, there's no reason why your program shouldn't run in near-constant space, but due to the way the parser is structured it won't begin producing any output until the entire input document has been read. Try the identity transform 'main = processXmlWith keep' on your sample document and see if that runs out of heap too. If so, there's not much you can do short of replacing the HaXml parser. --Joe English [EMAIL PROTECTED] ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
HaXml article on IBM developerWorks
There's a nice article on HaXml (Runciman and Wallace's excellent Haskell toolkit for XML) in the XML Matters section in IBM developerWorks. From the abstract: | XML Matters : Transcending the limits of DOM, SAX, and XSLT | --- | Consider Haskell in lieu of DOM, SAX, or XSLT for processing XML data. The | library HaXml creates representations of XML documents as native recursive | data structures in the functional language Haskell. HaXml brings with it a | set of powerful higher order functions for operating on these datafied | XML documents. Many of the HaXml techniques are far more elegant, compact, | and powerful than the ones found in familiar techniques like DOM, SAX, or | XSLT. Code samples demonstrate the techniques. http://www-106.ibm.com/developerworks/xml/library/x-matters14.html --Joe English [EMAIL PROTECTED] ___ Haskell mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell
Re: Multithreaded stateful software
Mark Carroll wrote: One of the projects I have coming up is a multi-threaded server that manages many clients in performing a distributed computation using a number of computers. [...] (a) This really isn't what Haskell was designed for, and if I try to write this in Haskell I'll never want to touch it again. (b) This project is quite feasible in Haskell but when it's done I'll feel I should have just used Java or something. (c) Haskell's monads, concurrency stuff and TCP/IP libraries are really quite powerful and useful, and I'll be happy I picked Haskell for the task. There's also: (d) You end up learning all sorts of new things about distributed processing (as well as Haskell) and, armed with the new knowledge, future problems of the same nature will be easier to solve no matter what language you use. That's what usually happens to me. (Personally, if I had this project coming up, I'd use it as an excuse to finally learn Erlang...) --Joe English [EMAIL PROTECTED] ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: Monads
Ashley Yakeley wrote: At 2001-05-17 02:03, Jerzy Karczmarczuk wrote: Monads are *much* more universal than that. [...] [...] Imperative programming is just one facet of the true story. Perhaps, but mostly monads are used to model imperative actions. And their use in imperative programming is the obvious starting point to learning about them. I don't know about that; I use monads most often when dealing with container classes (sets, bags, lists). They also provide a useful way to reason about parts of XPath and XSLT. As far as learning about them goes, I don't think I really got monads until reading Wadler's aptly-titled Comprehending Monads, which approaches them from this perspective. --Joe English [EMAIL PROTECTED] ___ Haskell mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell
Re: A sample revised prelude for numeric classes
Dylan Thurston wrote: I've started writing up a more concrete proposal for what I'd like the Prelude to look like in terms of numeric classes. I like this proposal a lot. The organization is closer to traditional mathematical structures than the current Prelude, but not as intimidating as Mechveliani's Basic Algebra Proposal. A very nice balance, IMO. A couple of requests: module Lattice where class Lattice a where meet, join :: a - a - a Could this be split into class SemiLattice a where join :: a - a - a and class (SemiLattice a) = Lattice a where meet :: a - a - a I run across a lot of structures which could usefully be modeled as semilattices, but lack a 'meet' operation. It would be reasonable to make Ord a subclass of this, but it would probably complicate the class heirarchy too much for the gain. In a similar vein, I'd really like to see the Ord class split up: class PartialOrder a where (), () :: a - a - Bool class (Eq a, PartialOrder a) = Ord a where compare:: a - a - Ordering (=), (=) :: a - a - Bool max, min :: a - a - a Perhaps it would make sense for PartialOrder to be a superclass of Lattice? --Joe English [EMAIL PROTECTED] ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: Extending the do-notation
Sebastien Carlier wrote: Sometimes I need to write code which looks like this: do x - m1 let y = unzip x ... -- never using x anymore I thinks the following extension to do-notation would be useful: pat - exp1 # exp2 ; exp3 would be rewritten as exp2 = ((\pat - exp3) . exp1) so that the above example could be rewritten more compactly: do y - unzip # m1 This can be done in Haskell without any changes to the 'do' notation at all: just define | f # m = m = (return . f) and add an appropriate fixity declaration for '#'. --Joe English [EMAIL PROTECTED] ___ Haskell mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell
Re: Finding primes using a primes map with Haskell and Hugs98
Shlomi Fish wrote: As some of you may know, a Haskell program that prints all the primes can be as short as the following: primes = sieve [2.. ] where sieve (p:x) = p : sieve [ n | n - x, n `mod` p 0 ] Now, this program roughly corresponds to the following perl program: [ ~20 line Perl program snipped ] The program can be more optimized for both speed and code size, but I wanted to make it as verbose as possible. There is a different algorithm which keeps a boolean map [...] The algorithm iterates over all the numbers from 2 to the square root of the desired bound, and if it encounters a prime number it marks all the numbers p*p, p*p+p, p*p+2*p, p*p+3*p, etc. as not prime. [~40 line Perl implementation snipped] Now, I tried writing an equivalent Haskell program and the best I could do was the following: [ ~45 line Haskell implementation snipped ] Another way to do this is to compute the final array directly, instead of computing successive versions of the array: import Array primes n = [ i | i - [2 ..n], not (primesMap ! i)] where primesMap = accumArray (||) False (2,n) multList multList= [(m,True) | j - [2 .. n `div` 2], m - multiples j] multiples j = takeWhile (n=) [k*j | k - [2..]] Now this version does a lot more work than the algorithm described above -- it computes multiples of *all* the integers less than n/2, not just the primes less than sqrt(n) -- but it has the virtue of being short enough to reason about effectively and is probably a better starting point for further optimization. The problem is that when running it on hugs98 on a Windows98 computer with 64MB of RAM, I cannot seem to scale beyond 30,000 or so, as my boundary. When entering how_much as 50,000 I get the following message: ERROR: Garbage collection fails to reclaim sufficient space My implementation fares even worse under Hugs -- it runs out of space around n = 4500 (Linux box, 64M RAM). With GHC it has no problem for n = 100,000, although the space usage is still extremely poor. It grows to consume all available RAM at around n = 200,000. (On the other hand, it's considerably faster than the traditional 2-liner listed above, up to the point where it starts paging). I suspect the poor memory usage is due to the way accumArray works -- it's building up a huge array of suspensions of the form (False (False ( ... True))) that aren't reduced until an array element is requested. (A strict version of accumArray, analogous to "foldl_strict" defined below, would solve this problem, but I don't see any way to implement it in Standard Haskell). In perl I can scale beyond 100,000, and if I modify the code to use a bit vector (using vec) to much more. So my question is what am I or hugs are doing wrong and how I can write better code that implements this specific algorithm. From what I saw I used tail recursion, (and hugs98 has proper tail recursion right?), and there's only one primes_map present at each iteration (and thus, at all), so it shouldn't be too problematic. Actually no; this is a common misconception. In a strict language like Scheme, tail call optimization works because a tail call is the last thing a function does. In Haskell though the tail call is the *first* thing that gets evaluated (more or less), leaving all the "earlier" work as an unevaluated suspension. Code that is space-efficient in a strict language frequently suffers from awful space leaks in a lazy language. For example: sum_first_n_integers n = f n 0 where f 0 a = a f n a = f (n-1) (n+a) quickly leads to a "Control Stack Overflow" error in Hugs. BTW, the trick to fix it is to change the last line to: f n acc = f (n-1) $! (n+acc) or to replace the whole thing with: foldl_strict (+) 0 [1..n] where foldl_strict f a [] = a foldl_strict f a (x:xs) = (foldl_strict f $! f a x) xs Does it have to do with the way hugs98 implements and Int to Bool array? Most likely yes. Hugs is optimized for interactive use and quick compilation, not for space usage. Try it with GHC or HBC and see how it does. --Joe English [EMAIL PROTECTED] ___ Haskell mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell
Re: mapM/concatMapM
[EMAIL PROTECTED] (Sengan Baring-Gould) wrote: mapM seems to be a memory hog (and thus also concatMapM). In the following eg: main = mapM print ([1..102400] :: [Integer]) memory usage climbs to 1.6M with ghc and needs -K20M As a guess: since 'mapM print ([1..102400] :: [Integer])' has type 'IO [()]', perhaps the result of the IO operation -- a list of 100K empty tuples -- is the culprit, even though the result is never used. Does 'mapM_ print ... ' (:: IO ()) perform any better? --Joe English [EMAIL PROTECTED] ___ Haskell mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell
Re: SAX, XML, Haskell
Chris Angus wrote: I looked at HaXml a while ago and it seemed to offer a very Dom-like interface. You could say that it's DOM-like in that it deals with trees instead of an event stream, but it's actually somewhere in between SAX and DOM. The main difference between HaXML and DOM is that in the DOM, nodes have object identity and it's possible to access parent and sibling nodes (which implies access to the entire tree tree starting from any node). With HaXML, you can only get to the subtree rooted at a particular node. [HaXML also encapsulates a lot of the plumbing and is relatively cruft-free, which makes it considerably more pleasant to use than the DOM in my opinion, but that's a secondary issue.] I was wondering if anyone had thought of making a Sax-like interface based on lazy evaluation. where tokens are processed and taken from a (potentially) infinite stream The HaXML combinators actually have very nice laziness properties. HaXML as a whole (at least the last time I looked at it) is stricter than it needs to be -- because the XML parser processes the entire input document before returning the root node -- but a combination of the HaXML combinators and Ketil Malde's parser would give the best of both worlds. --Joe English [EMAIL PROTECTED]
Re: simple binary IO proposition.
Marcin 'Qrczak' Kowalczyk wrote: Joe English pisze: According to the ISO C standard, the meaning of wchar_t is implementation-defined. I know. How to convert between the default multibyte locale and Unicode on such systems? As far as I can tell, there's no way to do so in Standard C without investigating the details of each particular implementation. Even then it might not be possible -- I *still* can't figure out what encodings are supported on IRIX. It seems to me that the Standard C library routines are only useful for programs that wish to remain completely isolated from the details of localization. If there's any requirement at all for a specific encoding or character set (such as UTF-8 or UTF-16), they seem to be pretty much worthless as too much information is hidden. --Joe English [EMAIL PROTECTED]
Re: Test case Re: Is there a space leak here?
Simon Peyton-Jones [EMAIL PROTECTED] wrote: Consider the Data a branch. It'll be compiled to let t = build es siblings = fst t rest = snd t in ... In many implementations, until you evaluate 'rest', you'll keep the whole of 't'; which in turn contains 'siblings'. So there's a danger of a leak here. That's more or less what I suspected... GHC does a standard trick, which is to perform the 'snd' in the garbage collector, so the original form shouldn't leak. I don't think Hugs does. (But it will when we release STG Hugs.) That must be it! I just tried the developer snapshot of STG Hugs, and (since I couldn't get the profiler to work) ran it with a heap size just big enough to hold the Prelude and my test case, plus ~10K words to spare. After fixing the *other* space leak that Malcolm Wallace noted, (too much laziness in 'length' and 'sum') all the tests ran without a problem. That was using breadth=12 and depth=6, which makes Hugs98 run out of room even with the default heap size. Problem solved! Thanks! --Joe English [EMAIL PROTECTED]
Re: Test case Re: Is there a space leak here?
I wrote: [...] I just tried the developer snapshot of STG Hugs, and [...] After fixing the *other* space leak that Malcolm Wallace noted, all the tests ran without a problem. [...] Problem solved! Thanks! Forgot to mention: my "real" program -- the XML parser -- also runs without a space problem under STG Hugs. --Joe English [EMAIL PROTECTED]
Re: Is there a space leak here?
"Mark P Jones" [EMAIL PROTECTED] wrote: Joe: As you've observed, the space behavior of Haskell programs is often very subtle, and hard to understand. I glanced quickly over your program but didn't see any immediate signs of problems. My first suggestion would be that you try using the rudimentary heap profiler that Hugs provides to see if this gives some insight into the source of the leak. Ah! I didn't even know Hugs had this. Very useful! This has turned up some interesting results... here's what I've found so far: Running the parser by itself (parseInstance :: String - [XMLEvent]) yields a nice flat space profile (actually it's rather spiky, but it definitely runs in bounded space). So 'parseInstance' by itself doesn't seem to have a space leak. But when I feed its output to 'preorderTree . treeBuild' (where treeBuild :: [XMLEvent] - Tree XMLNode and preorderTree :: Tree a - [a]), the space usage grows linearly, with a sharp dropoff very near the end. Further investigation shows that the problem is *definitely* in the tree builder. It's not specific to Hugs either, GHC behaves the same way. Failing that, it might be worth trying to put together a complete example (program and data) that demonstrates the problem. I find it rather hard to think about examples like this in the abstract. Having code that I can actually run, can make a big difference in situations like this. I've boiled it down to a short test case; will post that here presently. Some more background on what I'm working on... There are two traditional approaches to processing SGML and XML: the event-driven approach, where the application responds to start-tag, end-tag, and data events; and the tree-based approach, where the application has access to the entire document tree. The tree-based approach tends to be easier to use and more flexible, but common wisdom has it that the event-driven approach is more space-efficient. I thought: wouldn't it be neat if you could write programs in a tree-based style, and automatically get good space behaviour through the magic of lazy evaluation? There are a lot of common XML processing tasks that are naturally expressed as a catamorphism, tree homomorphism, or downwards accumulation (validation, namespace processing, many simple translations to other document types, etc.), all of which should run in space bounded by the depth of the tree. --Joe English [EMAIL PROTECTED]
Test case Re: Is there a space leak here?
Here's the test case... The space profile for tests 2 and 3 look interesting; there are n triangular "spikes" (where 'n' is the breadth of the tree) that drop off sharply. My hypothesis is that 'deserialize' (the problematic function, called 'buildTree' in my earlier message) is building up a long chain of suspensions of the form snd . snd . snd . ... build that aren't getting reduced... not sure about this though. It takes a really large tree before Hugs runs out of space with this test case (breadth=11, depth=6 or so). In my real program though there's much more data stored at each node and it fails on modestly-sized inputs. Thanks in advance for any advice... --Joe English [EMAIL PROTECTED] -- module SpaceLeak where data Tree a = Tree a [Tree a] deriving (Show, Eq) -- -- a few of the usual polytypic functions... -- mapTree :: (a - b) - Tree a - Tree b mapTree f (Tree a c)= Tree (f a) (map (mapTree f) c) type TreeF a b = (a, [b]) cataTree:: (TreeF a b - b) - Tree a - b anaTree :: (b - TreeF a b) - b - Tree a cataTree f (Tree a c) = f (a,map (cataTree f) c) anaTree g b = let (a,bs) = g b in Tree a (map (anaTree g) bs) -- -- and a few useful utilies... -- preorderTree :: Tree a - [a] preorderTree t = traverse t [] where traverse (Tree a c) k = a : travlist c k travlist (c:cs) k = traverse c (travlist cs k) travlist [] k = k sizeTree :: Tree a - Integer sizeTree = cataTree (\(_,l)- 1 + sum l) treeChildren (Tree n c) = c -- -- A datatype for tree serialization: -- This is similar to the tokens returned by an XML parser. -- data Event a = StartTag a | Data a | EndTag deriving (Show,Eq) -- -- serialize turns a tree into a list of start/data/end events. -- serialize :: Tree a - [Event a] serialize t = stree t [] where stree (Tree x []) k = Data x : k stree (Tree x l) k = StartTag x : slist l (EndTag : k) slist [] k = k slist (t:ts) k = stree t (slist ts k) -- -- deserialize builds a tree from a list of events; -- (deserialize . serialize) = id -- -- This is the problematic function. -- deserialize :: [Event a] - Tree a deserialize events = head (fst (build events)) where build :: [Event a] - ([Tree a], [Event a]) build [] = ([],[]) build (e:es) = case e of Data a - let (siblings, rest) = build es in (Tree a [] : siblings, rest) StartTag a - let (children,es') = build es (siblings,rest) = build es' in (Tree a children : siblings, rest) EndTag - ([],es) -- -- 'sampleTree breadth depth' generates a tree of the specified depth, -- where each non-leaf node node has 'breadth' children. -- testTree breadth depth = anaTree testCOAlg depth where testCOAlg n = (n, if n 1 then take breadth $ repeat (n-1) else []) -- -- Quick sanity check to make sure 'deserialize' works as specified: -- test0 n m = testTree n m == (deserialize . serialize) (testTree n m) -- True -- -- The following all run in bounded space: -- try with ':set -d1000' in Hugs, n=4, m=6 or so... -- In particular, serialize $ testTree n m behaves nicely. -- test1a n m = sizeTree $ testTree n m test1b n m = length $ preorderTree $ testTree n m test1c n m = length $ serialize$ testTree n m -- -- These seem to leak space: -- test2a n m = sizeTree $ deserialize $ serialize $ testTree n m test2b n m = length $ preorderTree $ deserialize $ serialize $ testTree n m test3a n m = deserialize $ serialize $ testTree n m test3b n m = preorderTree $ deserialize $ serialize $ testTree n m -- This does not: test4a n m = length $ treeChildren $ deserialize $ serialize $ testTree n m -- But this does: test4b n m = map (length . treeChildren) $ treeChildren $ deserialize $serialize $ testTree n m -- *EOF* --
Re: Reverse composition
[EMAIL PROTECTED] wrote: Some time ago there was a discussion about what to call reverse composition (I can't find it in the archive - needs a search option?) Just now I thought of .~ from . for composition and ~ (tilde, but commonly called twiddle) for twiddling the order about. Maybe we could adopt that as normal usage? I've also seen .| and |. used for this purpose (by analogy with Unix pipes.) John Hughes' Arrow library spells it "", but generalized to arbitrary arrows. At the (-) instance it's the same as "flip (.)". Along the same lines, are there accepted conventional infix operators for the functions with types: (a0 - b0) - (a1 - b1) - (a0,a1) - (b0,b1) (a - b0) - (a - b1) - a - (b0,b1)) (a0 - b0) - (a1 - b1) - Either a0 a1 - Either b0 b1 (a0 - b) - (a1 - b) - Either a0 a1 - b (the last one is called "either" in the standard Prelude). I personally like: (f * g) (x,y) = (f x, g y) (f g) x = (f x, g x) (f + g) (Left x) = Left (f x) (f + g) (Right y) = Right (g y) (f | g) (Left x) = f x (f | g) (Right y) = g y Hughes spells these ***, , +++, and ||| (again generalized to arbitrary arrows), but those don't look as nice typeset IMHO. I also like: apfst :: (a - c) - (a,b) - (c,b) apsnd :: (b - c) - (a,b) - (a,c) apl :: (a - c) - Either a b - Either c b apr :: (b - c) - Either a b - Either a c These are called "first", "second", "left", and "right" in the Arrow library. --Joe English [EMAIL PROTECTED]
Re: To all those who don't like ad-hoc overloading
Kevin Atkinson wrote: "Generic comparison function" is not really what I mean here. What I mean is a single generic union which will have its comparison function default to (==) if one is not specified. It COULD be written something like union (cmp = (==)) l1 l2 ... where union l1 l2 means union (==) l1 l2 I don't quite see what algorithm you're using to decide how many arguments are passed to the function. What would you get if you typed: foo = foldr union [] for example? --Joe English [EMAIL PROTECTED]
Re: Cryptarithm solver - comparing oranges and oranges
Mark Engelberg [EMAIL PROTECTED] wrote: A few observations: [...] 4. Andy Gill's general solution is most impressive, and I hope that I eventually get to the point where his program makes sense to me. I don't quite understand Monads and some of the built-in functions enough to follow it. (Monads are Haskell's way of making imperative and mutative programming harder to do so that programmers are less likely to do it, right? :)) Actually, quite the opposite... you should see how difficult it was to do I/O in Haskell *before* Monads were introduced! --Joe English [EMAIL PROTECTED]
Re: Guidance on strictness
Juan Jose Garcia Ripoll [EMAIL PROTECTED] wrote: can anybody point me to tutorials, papers, etc, on how to properly annotate strictness in Haskell code? I am concerned with the following stupid piece of code that eats a lot of memory and takes an incredible amount of time to produce some output. I hope somebody will help me in finding what I am doing wrong. I don't know of any general techniques, but there are a few specific rules of thumb that apply here: In strict languages like ML and Scheme, "foldl f" (or the equivalent thereof) is preferable to "foldr f" (when given the choice, i.e., if "f : a-a-a" is associative), since "foldl" is tail-recursive. In lazy languages like Haskell, "foldr" is (usually) preferable, since it's lazier -- if "f" is lazy, then "foldr f z l" only needs to evaluate the first term of "l", whereas "foldl f z l" needs to force all of "l" before it can proceed. If "f" is strict, however, both "foldr f z l" and "foldl f z l" will take up heap space proportional to the length of "l", but "foldl'" -- the strict version of "foldl" -- only takes constant space. So the rule of thumb is: if "f" is associative and lazy, use "foldr f". If "f" is associative and strict, use "foldl' f". In general, try to avoid "foldl". produce :: Int - Double - Array Int Double produce n x = array (1,n) [(i,x) | i - [1..n]] scprod :: Array Int Double - Array Int Double - Double scprod a b = case (bounds a, bounds b) of ((1,i), (1,j)) - foldl (+) start [a!(x) * b!(x) | x - [2..i]] where start = a!(1) * b!(1) main = print (show (scprod a a)) where a = produce 100 1.0 Replacing "foldl" with "foldl'" in your program doesn't fix the whole problem though. I suspect that the use of Arrays is the culprit. Just trying to evaluate (produce 100 1.0) ! 1 causes Hugs to run out of heap almost immediately on my machine, and (produce 1 1.0) ! 1 succeeds, but runs for several seconds and 8 garbage collections before returning an answer. This is because "array bds l" has to fully evaluate the spine of "l", plus the first member of each element of "l", before it can be subscripted. Moreover, it leaves the second member of each element of "l" unevaluated until it's requested, so the space behaviour of "array" is particularly bad. The second rule of thumb, then, is "avoid Haskell Arrays unless you really, really need constant-time random access." In this particular problem, 'scprod' consumes elements in sequential order, so it may be better to use lists instead of arrays. (In fact "scprod a a where a = produce n 1.0" has a closed-form, O(1) solution, but I assume that's not the problem you're really trying to solve :-) Hope this helps, --Joe English [EMAIL PROTECTED]
Re: Modifying the monomorphism restriction
John Hughes [EMAIL PROTECTED] wrote: Everybody agrees the monomorphism restriction is a pain: [...] So, let's make it visible, in the simplest possible way. Let there be TWO forms of binding: x = e, and x := e (say). A binding of the form `x = e' is interpreted using call-by-name, and may of course be overloaded: it makes `x' and `e' exactly equivalent. A binding of the form `x := e' is interpreted using call-by-need, and is monomorphic; `x' behaves as though it were lambda-bound. This is a good idea, except for the use of ':='. I'd hate to lose that symbol from the programmers' namespace just to solve the MR problem. (Am I the only one who's never been bitten by the MR restriction?) How about leaving the 'a = b' binding form as it is, (monomorphism restriction and all) and using 'a = ~ b' to indicate call-by-name. '~' is already a reserved symbol, and it can't currently appear on the RHS of a binding, so this new syntax would't break any existing programs. This way call-by-need remains the default, and compilers will still signal an error if the programmer accidentally bumps into the MR. If this happens you only need to twiddle the code to fix it. For people reading the code, a ~ on the RHS of a binding is a signal that something out-of-the-ordinary is going on operationally, the same as when it appears on the LHS. --Joe English [EMAIL PROTECTED]
Re: Modifying the monomorphism restriction
I wrote: Operationally I expect that in "let x = f y in ... x ... x", 'f y' is only evaluated once, no matter what type it is. Which, of course, is not how Haskell actually works, if x :: (SomeClass a) = SomeType a. DOH! Please disregard my earlier remarks... --Joe English [EMAIL PROTECTED]
Re: Modifying the monomorphism restriction
Alex Ferguson [EMAIL PROTECTED] wrote: Joe English: How about leaving the 'a = b' binding form as it is, (monomorphism restriction and all) and using 'a = ~ b' to indicate call-by-name. [...] I like that much less [...] because I consider it (still) to be the wrong 'default'. For people reading the code, a ~ on the RHS of a binding is a signal that something out-of-the-ordinary is going on operationally, the same as when it appears on the LHS. I dispute that there's anything 'operationally out-of-the-ordinary' about an overloaded function definition, which is almost invariably what the MR is (silently) complaining at me for doing when I fall over it. It's only out-of-the-ordinary if you were depending on it being operationally a CAF, so having some sort of special CAF syntax seems reasonable to me. I was thinking of the example from the Haskell Report: let { len = genericLength xs } in (len, len) which, without the MR, computes 'len' twice. Operationally I expect that in "let x = f y in ... x ... x", 'f y' is only evaluated once, no matter what type it is. Also John Hughes' warning that: * When converting Haskell 1.x to Haskell 2, many := would need to be inserted. Failure to do so could make programs much less efficient. That's why I'd prefer to keep call-by-need the default and use new syntax for call-by-name/overloading. This way bizarre type errors remain the default, and compilers will signal an error _somewhere else_ in the program when you bump into the MR, if my experience is anything to go by. Could you provide an example? Every instance of the MR I've been able to come up with winds up giving an error message right at the definition that would need to have a ~ or an explicit type signature added. --Joe English [EMAIL PROTECTED]