Re: How to force evaluation entirely?
Michael Marte wrote: > (First I defined > > class Eager a where > eager :: a -> a > > and some instances: > > instance Eager Int where > eager i = i `seq` i For Int (and other base types), forcing to WHNF is as far as you can go, so I think you can simplify the above to > instance Eager Int where > eager i = i > instance Eager a => Eager [a] where > eager [] = [] > eager (x : xs) = eager x `seq` eager xs `seq` (x : xs) If you want eager to preserve identity (i.e. eager x `unsafePtrEq` x), as well as to avoid duplicating data, you might prefer: > instance Eager a => Eager [a] where > eager xs@[] = xs > eager xs@(x : xs') = eager x `seq` eager xs' `seq` xs > instance Eager a => Eager (a, a) where > eager (x, y) = eager x `seq` eager y `seq` (x, y) I think the following is more general: > instance (Eager a, Eager b) => Eager (a, b) where > eager xy@(x, y) = eager x `seq` eager y `seq` xy Ultimately, generic Haskell may come handy. -- Zhanyong Wan Dept of Computer Science, Yale University
Re: How to force evaluation entirely?
Hello, thank you for the discussion on eager evaluation. For solving my memory problem I finally used the x == x `seq` return x approach because it is easier to derive Eq classes automatically than to write some sequentialisation functions. (First I defined class Eager a where eager :: a -> a and some instances: instance Eager Int where eager i = i `seq` i instance Eager Char where eager c = c `seq` c instance Eager a => Eager [a] where eager [] = [] eager (x : xs) = eager x `seq` eager xs `seq` (x : xs) instance Eager a => Eager (a, a) where eager (x, y) = eager x `seq` eager y `seq` (x, y) (Is this correct?) But I stopped here because its quite a lot of work to give all the instances for the types involved. Furthermore, I was not sure how to cope with record types.) Michael
Re: How to force evaluation entirely?
John Hughes <[EMAIL PROTECTED]> writes: > As far as the power of the optimizer is concerned, my guess is programmers > very rarely write x==x (unless they MEAN to force x!), so the loss of > optimization doesn't matter. Of course, in principle, an optimizer *could* > replace x==x by x`seq`True (if x is known to be of base type), and the x`seq` > might well be removed by later transformations (if x can be shown to be > defined, something compilers do analyses to discover). Who knows, maybe this > happens in the innards of ghc... Or the compiler could internally create its own HyperStrict class and replace x==x by x`hyperSeq`True, if all the Eq instances involved in the type of x are known to be reflexive (which is the case if they were all automatically derived). :-) Carl Witty
Re: How to force evaluation entirely?
>As far as the power of the optimizer is concerned, my guess is programmers >very rarely write x==x (unless they MEAN to force x!), so the loss of >optimization doesn't matter. I imagine expressions like x==x could appear as the result of transformations in program specialisation, like specialising f x x where f x y = if x==y then Replacing x==x with True is of course still unsafe in Haskell, for the reasons already pointed out. Björn Lisper
Re: combinator parsers and XSLT
Wed, 27 Sep 2000 00:22:05 +1100, Manuel M. T. Chakravarty <[EMAIL PROTECTED]> pisze: > Hmm, this seems like a shortcoming in the Haskell spec. We have all > these isAlpha, isDigit, etc functions, but I can't get at a list of, > say, all characters for which isAlpha is true. You can: filter isAlpha ['\0'..'\x'] (don't use maxBound here because it's too large and we know that currently there are no isAlpha characters outside this range). Working on large explicit lists is inefficient. 45443 characters are isAlpha. A lexer should be designed to avoid using a full list. -- __("< Marcin Kowalczyk * [EMAIL PROTECTED] http://qrczak.ids.net.pl/ \__/ ^^ SYGNATURA ZASTÊPCZA QRCZAK
RE: combinator parsers and XSLT
I think unicode is very important for xml document processing, which is my interest. > -Original Message- > From: Manuel M. T. Chakravarty [mailto:[EMAIL PROTECTED]] > Sent: Monday, September 25, 2000 9:11 PM > To: [EMAIL PROTECTED] > Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED] > Subject: RE: combinator parsers and XSLT > > > Doug Ransom <[EMAIL PROTECTED]> wrote, > > > > There is no need for "." or [^abc] as Haskell list operators > > > can be used to "simulate" them. The following is from the C > > > lexer and matches all visible characters and all characters > > > except newline, respectively: > > > > > > visible = alt [' '..'\127'] > > > anyButNL = alt (['\0'..'\255'] \\ ['\n']) > > > > > > That is true, but how about dealing with unicode characters? > > > > anyButNl = anyButNL = alt (['\0'..'\65536'] \\ ['\n']) > > > > The space required becomes excessive. > > True, but the current implementation would be hopeless for > unicode anyway, as it builds a table representing a > deterministic finite state automaton (DFA), where the worst > case size of the table is > >* > > In all practical cases, the required space is much smaller > as states with less than 20 characters having a non-error > transition store the state transitions in a list. > Furthermore, even in states with more than 20 characters > with a non-error transition, the size of the table is only > that of > > ord - ord + 1 > > (these are characters with non-error transitions). > > For 16bit character ranges, it would be necessary to > directly store negated character sets (such as [^abc]). > From what he told me, Doitse Swierstra is working on a lexer > that is using explicit ranges, but I am not sure whether he > also has negated ranges. > > Currently, most Haskell systems don't support unicode anyway > (I think, hbc is the only exception), so I guess this is not > a pressing issue. As soon as, we have unicode support and > there is a need for lexers handling unicode input, I am > willing to extend the lexer library to gracefully handle the > cases that you outlined. > > Cheers, > Manuel >
Re: How to force evaluation entirely?
"Ch. A. Herrmann" wrote: > Hi, > > John> There's an easier way to force structures hyperstrictly. To > John> force x to be evaluated to normal form before computing y, > John> write (x==x) `seq` y > > I'm heavily confused here. > > What happens, if > >(a) an optimizer replaces (x==x) by True? If an optimizer did that it would be severly broken in several ways. First, there is absolutelty no guarantee that the (==) operator defines anything that is a reflexive relation. E.g., I can (for a particular type) define it to always return False if I like. Second, even if (==) was defined to be reflexive it's highly likely that `x==x' would behave differently than `True'. The former probably diverges if x is bottom, whereas the latter doesn't. > >If the optimizer is not permitted to do that, >its power appears to be limited severely. Who said Haskell was easy to optimize? -- Lennart
Re: combinator parsers and XSLT
Lars Henrik Mathiesen <[EMAIL PROTECTED]> wrote, > > From: "Manuel M. T. Chakravarty" <[EMAIL PROTECTED]> > > Date: Tue, 26 Sep 2000 15:11:23 +1100 > > > For 16bit character ranges, it would be necessary to > > directly store negated character sets (such as [^abc]). > > >From what he told me, Doitse Swierstra is working on a lexer > > that is using explicit ranges, but I am not sure whether he > > also has negated ranges. > > People with experience from other Unicode-enabled environments will > expect support for character classes like letter or digit --- which in > Unicode are not simple single ranges, but widely scattered over the > map. (Just look at Latin-1, where you have to use [A-Za-zÀ-ÖØ-öø-ÿ] > because two arithmetic operators snuck into the accented character > range. (Blame the French)). > > Such support will also allow your parser to work with the next, bigger > version of Unicode, since the parser library should just inherit the > character class support from the Haskell runtime, which should in turn > get it from the OS. The OS people are already doing the work to get > the necessary tables and routines compressed into a few kilobytes. Hmm, this seems like a shortcoming in the Haskell spec. We have all these isAlpha, isDigit, etc functions, but I can't get at a list of, say, all characters for which isAlpha is true. > Also, Unicode isn't 16-bit any more, it's more like 20.1 bits --- the > range is hex 0 to 1f. Although the official character assignments > will stay below hex 2 or so, your code may have to work on systems > with private character assignments in the hex 10+ range. Ok, I didn't really mean that the mentioned extension will rely on Unicode being 16 bits. This is only a size, where you don't really want to build an exhaustive transition table anymore. Manuel
Re: How to force evaluation entirely?
Tue, 26 Sep 2000 13:29:39 +0200 (MET DST), John Hughes <[EMAIL PROTECTED]> pisze: > Of course, in principle, an optimizer *could* replace x==x by > x`seq`True (if x is known to be of base type), and the x`seq` might > well be removed by later transformations (if x can be shown to be > defined, something compilers do analyses to discover). Who knows, > maybe this happens in the innards of ghc... It gets optimized for Bool. It does not get optimized for Int, because (==) on Int is defined in terms of a primitive operation (==#) and the compiler does not know that x ==# x is always True. But the programmer can teach it :-) {-# RULES "x == x" forall (x :: Int). (==) x x = x `seq` True #-} Well, it's probably better to tell this on a lower level, because some other operations might use (==#) directly instead of through (==): import PrelGHC {-# RULES "x ==# x" forall (x :: Int#). (==#) x x = True #-} Here it is not necessary to use seq because Int# values are always evaluated. Now a function like f:: Int -> Int f x = if x == x then x else 10 compiles to exactly the same code as id. -- __("< Marcin Kowalczyk * [EMAIL PROTECTED] http://qrczak.ids.net.pl/ \__/ ^^ SYGNATURA ZASTÊPCZA QRCZAK
Re: combinator parsers and XSLT
Lennart Augustsson <[EMAIL PROTECTED]> wrote, > "Manuel M. T. Chakravarty" wrote: > > > Currently, most Haskell systems don't support unicode anyway > > (I think, hbc is the only exception), so I guess this is not > > a pressing issue. As soon as, we have unicode support and > > there is a need for lexers handling unicode input, I am > > willing to extend the lexer library to gracefully handle the > > cases that you outlined. > I'm sorry, but I much object (strongly) towards this attitude. > It's this kind of reasoning that stops Unicode from becoming > widespread. I am tempted to agree with you. I am just a lazy bastard, that's the problem. > Soon the GHC people (or whoever :) will say "Well, why should we > support Unicode, there's all this software out there that breaks down > with it." and we're in a viscious circle. Hmmm, in this particular case nothing breaks down. The lexer combinators themselves never internally use the assumption that a char is 8bit (I may be lazy, but I still prefer clean code). Only when you explicily use them to build a scanner that does scan unicode files (and is aware of it), you might run into space efficiency problems. > Strong hint to various people: > Haskell has had Unicode for a long time now. I think that before > you start implementing various extensions to Haskell, perhaps you > should implement what the standard says should be there. > Implementing Unicode isn't that hard, just a few days work. You might be pleased to hear that - if I am not mistaken - Qrczak is working at Unicode support for ghc. > Strongly opposing Anglosaxan language imperialism :-) Manuel
Re: °ü°èÇü °Ë»ö¿£Áø Ŭ¸¯¾ÆÀÌÄÜÀÌ ÇùÁ¶¸¦ ºÎŹÇÕ´Ï´Ù.
Yesterday Lennart decided to fight against the Anglo-Saxon imperialism in the language. Today we got from Mr Click: > ¾È³çÇϽʴϱî? > Ŭ¸¯¾ÆÀÌÄÜ ¸¶ÄÉÀÌÆÃÆÀÀÇ Á¤ Çõ¿ì ÀÔ´Ï´Ù. > ±ÍÇÏ¿¡°Ô À̸ÞÀÏÀÌ ½ºÆÔÀÌ¾Æ´Ñ Á¤º¸ÀÇ Á¦°ø°ú °øÀ¯·Î½á ÀÎ½Ä etc. That's good. Muy dobrze. Jerzy Karczmarczuk Caen, Normandy (The place of departure of one William the Conqueror thanks to whom we speak English [[a particularly distorted version of French]], and not a specific Danish dialect popular in Britain before Edward the Confessor. Don't forget BTW that both Saxons and Anglons (Angles?) spoke some distorted German.)
Re: combinator parsers and XSLT
Tue, 26 Sep 2000 00:42:15 -0400, Lennart Augustsson <[EMAIL PROTECTED]> pisze: > Implementing Unicode isn't that hard, just a few days work. GHC in CVS already has Unicode Chars. Unfortunately designing libraries such that file contents are correctly translated between Unicode and the local charset by default, but with the ability to use other encodings when needed, is not that trivial... -- __("< Marcin Kowalczyk * [EMAIL PROTECTED] http://qrczak.ids.net.pl/ \__/ ^^ SYGNATURA ZASTÊPCZA QRCZAK
RE: How to force evaluation entirely?
John> There's an easier way to force structures hyperstrictly. To John> force x to be evaluated to normal form before computing y, John> write (x==x) `seq` y I'm heavily confused here. What happens, if (a) an optimizer replaces (x==x) by True? If the optimizer is not permitted to do that, its power appears to be limited severely. (b) a run-time system thinks pointer equality implies value equality? Neither of these is permitted by Haskell's semantics. Firstly, == is just an overloaded operator and the programmer is free to define it however he likes (even stupidly). Secondly, even if the type of x is known -- say Int -- so that the code for == is known, the optimizer may not replace code by a more or less lazy `equivalent'. `Optimizing' x==x to True would change the meaning of the program when x is undefined; comparing pointers to x instead of evaluating x would equally be wrong. As far as the power of the optimizer is concerned, my guess is programmers very rarely write x==x (unless they MEAN to force x!), so the loss of optimization doesn't matter. Of course, in principle, an optimizer *could* replace x==x by x`seq`True (if x is known to be of base type), and the x`seq` might well be removed by later transformations (if x can be shown to be defined, something compilers do analyses to discover). Who knows, maybe this happens in the innards of ghc... John Hughes
RE: How to force evaluation entirely?
On Tue, 26 Sep 2000, Ch. A. Herrmann wrote: > John> write (x==x) `seq` y > I'm heavily confused here. > What happens, if > >(a) an optimizer replaces (x==x) by True? >If the optimizer is not permitted to do that, >its power appears to be limited severely. Remember that (==) in Haskell is just a method of the Eq class - it is not semantic equality. Nothing prevents users from defining their own instances for (==) and these instances need not satisfy any reasonable equality laws. Anyway, using (x==x) as a hyperstrict seq is a hack, a useful and mostly working hack, but not guaranteed to work in all cases. A more reliable hyperseq can be implemented as a class method for a new class with instances for all types you want to use it on. I believe this method and this class were the reasons for implementing DrIFT (search for the class NFData). http://www.dcs.gla.ac.uk/~nww/Derive/derivehome.html /Patrik Jansson
RE: How to force evaluation entirely?
Hi, John> There's an easier way to force structures hyperstrictly. To John> force x to be evaluated to normal form before computing y, John> write (x==x) `seq` y I'm heavily confused here. What happens, if (a) an optimizer replaces (x==x) by True? If the optimizer is not permitted to do that, its power appears to be limited severely. (b) a run-time system thinks pointer equality implies value equality? Are these issues officially documented for Haskell98, beyond the fact that "seq _|_ x = _|_" ? -- Christoph Herrmann E-mail: [EMAIL PROTECTED] WWW: http://brahms.fmi.uni-passau.de/cl/staff/herrmann.html
RE: How to force evaluation entirely?
Simon PJ says: Did you try "seq"? x `seq` y should evalute x to WHNF before returning y. If x is a pair you may need to say seqPair x `seq` y where seqPair (a,b) = a `seq` b in order to force the components. Simon There's an easier way to force structures hyperstrictly. To force x to be evaluated to normal form before computing y, write (x==x) `seq` y This depends on all the types occurring in x being Eq types, and also on the implementation of == being hyperstrict when its result is true. This holds for all derived instances, and for most programmer defined ones too. After all, if x==x holds for any non-total x, then x==y must hold for some pair of different values x and y, which we normally try to avoid! I sometimes write if x==x then y else error "I am the pope!" but the seq form is nicer! John Hughes | -Original Message- | From: Michael Marte [mailto:[EMAIL PROTECTED]] | | | | I am trying to process a huge bunch of large XML files in order | to extract some data. For each XML file, a small summary (6 integers) | is created which is kept until writing a HTML page displaying the | results. | | The ghc-compiled program behaves as expected: It opens one | XML file after | the other but does not read a lot. After some 50 files, it | bails out due | to lack of heap storage. | | To overcome the problem, I tried to force the program to | compute summaries | immediately after reading the corresponding XML file. I tried | some eager | application ($!), some irrefutable pattern, and some | strictness flags, but | I did not succeed.
Re: combinator parsers and XSLT
> From: "Manuel M. T. Chakravarty" <[EMAIL PROTECTED]> > Date: Tue, 26 Sep 2000 15:11:23 +1100 > For 16bit character ranges, it would be necessary to > directly store negated character sets (such as [^abc]). > >From what he told me, Doitse Swierstra is working on a lexer > that is using explicit ranges, but I am not sure whether he > also has negated ranges. People with experience from other Unicode-enabled environments will expect support for character classes like letter or digit --- which in Unicode are not simple single ranges, but widely scattered over the map. (Just look at Latin-1, where you have to use [A-Za-zÀ-ÖØ-öø-ÿ] because two arithmetic operators snuck into the accented character range. (Blame the French)). Such support will also allow your parser to work with the next, bigger version of Unicode, since the parser library should just inherit the character class support from the Haskell runtime, which should in turn get it from the OS. The OS people are already doing the work to get the necessary tables and routines compressed into a few kilobytes. Also, Unicode isn't 16-bit any more, it's more like 20.1 bits --- the range is hex 0 to 1f. Although the official character assignments will stay below hex 2 or so, your code may have to work on systems with private character assignments in the hex 10+ range. Lars Mathiesen (U of Copenhagen CS Dep) <[EMAIL PROTECTED]> (Humour NOT marked)
RE: How to force evaluation entirely?
Did you try "seq"? x `seq` y should evalute x to WHNF before returning y. If x is a pair you may need to say seqPair x `seq` y where seqPair (a,b) = a `seq` b in order to force the components. Simon | -Original Message- | From: Michael Marte [mailto:[EMAIL PROTECTED]] | Sent: 25 September 2000 19:16 | To: [EMAIL PROTECTED] | Subject: How to force evaluation entirely? | | | Hello, | | I am trying to process a huge bunch of large XML files in order | to extract some data. For each XML file, a small summary (6 integers) | is created which is kept until writing a HTML page displaying the | results. | | The ghc-compiled program behaves as expected: It opens one | XML file after | the other but does not read a lot. After some 50 files, it | bails out due | to lack of heap storage. | | To overcome the problem, I tried to force the program to | compute summaries | immediately after reading the corresponding XML file. I tried | some eager | application ($!), some irrefutable pattern, and some | strictness flags, but | I did not succeed. It seems to me that as long as something | is not really | used, the implementation cannot be forced to evaluate it completely. I | resorted to printing the summary to /dev/null right after | reading the XML | file. This works fine. Is there a more elegant solution? | | Thank you. | | Michael Marte | | | | | |