Re: How to force evaluation entirely?

2000-09-26 Thread Zhanyong Wan

Michael Marte wrote:

> (First I defined
> 
> class Eager a where
> eager :: a -> a
> 
> and some instances:
> 
> instance Eager Int where
> eager i = i `seq` i

For Int (and other base types), forcing to WHNF is as far as you can go,
so I think you can simplify the above to

  > instance Eager Int where
  > eager i = i

> instance Eager a => Eager [a] where
> eager [] = []
> eager (x : xs) = eager x `seq` eager xs `seq` (x : xs)

If you want eager to preserve identity (i.e. eager x `unsafePtrEq` x),
as well as to avoid duplicating data, you might prefer: 

  > instance Eager a => Eager [a] where
  > eager xs@[] = xs
  > eager xs@(x : xs') = eager x `seq` eager xs' `seq` xs


> instance Eager a => Eager (a, a) where
> eager (x, y) = eager x `seq` eager y `seq` (x, y)

I think the following is more general:

  > instance (Eager a, Eager b) => Eager (a, b) where
  > eager xy@(x, y) = eager x `seq` eager y `seq` xy

Ultimately, generic Haskell may come handy.

-- Zhanyong Wan
Dept of Computer Science, Yale University




Re: How to force evaluation entirely?

2000-09-26 Thread Michael Marte

Hello,

thank you for the discussion on eager evaluation.

For solving my memory problem I finally used the 

x == x `seq` return x

approach because it is easier to derive Eq classes automatically than to
write some sequentialisation functions.

(First I defined

class Eager a where
eager :: a -> a

and some instances:

instance Eager Int where
eager i = i `seq` i

instance Eager Char where
eager c = c `seq` c

instance Eager a => Eager [a] where
eager [] = []
eager (x : xs) = eager x `seq` eager xs `seq` (x : xs)

instance Eager a => Eager (a, a) where
eager (x, y) = eager x `seq` eager y `seq` (x, y)

(Is this correct?)

But I stopped here because its quite a lot of work to give all the
instances for the types involved. Furthermore, I was not sure how to cope
with record types.)

Michael










Re: How to force evaluation entirely?

2000-09-26 Thread Carl R. Witty

John Hughes <[EMAIL PROTECTED]> writes:

> As far as the power of the optimizer is concerned, my guess is programmers
> very rarely write x==x (unless they MEAN to force x!), so the loss of
> optimization doesn't matter. Of course, in principle, an optimizer *could*
> replace x==x by x`seq`True (if x is known to be of base type), and the x`seq`
> might well be removed by later transformations (if x can be shown to be
> defined, something compilers do analyses to discover). Who knows, maybe this
> happens in the innards of ghc...

Or the compiler could internally create its own HyperStrict class and
replace x==x by x`hyperSeq`True, if all the Eq instances involved in
the type of x are known to be reflexive (which is the case if they
were all automatically derived). :-)

Carl Witty




Re: How to force evaluation entirely?

2000-09-26 Thread Bjorn Lisper

>As far as the power of the optimizer is concerned, my guess is programmers
>very rarely write x==x (unless they MEAN to force x!), so the loss of
>optimization doesn't matter.

I imagine expressions like x==x could appear as the result of
transformations in program specialisation, like specialising

f x x

where

f x y = if x==y then 

Replacing x==x with True is of course still unsafe in Haskell, for the
reasons already pointed out.

Björn Lisper




Re: combinator parsers and XSLT

2000-09-26 Thread Marcin 'Qrczak' Kowalczyk

Wed, 27 Sep 2000 00:22:05 +1100, Manuel M. T. Chakravarty <[EMAIL PROTECTED]> pisze:

> Hmm, this seems like a shortcoming in the Haskell spec.  We have all
> these isAlpha, isDigit, etc functions, but I can't get at a list of,
> say, all characters for which isAlpha is true.

You can: filter isAlpha ['\0'..'\x']
(don't use maxBound here because it's too large and we know that
currently there are no isAlpha characters outside this range).

Working on large explicit lists is inefficient. 45443 characters
are isAlpha. A lexer should be designed to avoid using a full list.

-- 
 __("<  Marcin Kowalczyk * [EMAIL PROTECTED] http://qrczak.ids.net.pl/
 \__/
  ^^  SYGNATURA ZASTÊPCZA
QRCZAK





RE: combinator parsers and XSLT

2000-09-26 Thread Doug Ransom

I think unicode is very important for xml document processing, which  is my
interest.


> -Original Message-
> From: Manuel M. T. Chakravarty [mailto:[EMAIL PROTECTED]]
> Sent: Monday, September 25, 2000 9:11 PM
> To: [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
> Subject: RE: combinator parsers and XSLT 
> 
> 
> Doug Ransom <[EMAIL PROTECTED]> wrote,
> 
> > > There is no need for "." or [^abc] as Haskell list operators
> > > can be used to "simulate" them.  The following is from the C
> > > lexer and matches all visible characters and all characters
> > > except newline, respectively:
> > > 
> > >   visible  = alt [' '..'\127']
> > >   anyButNL = alt (['\0'..'\255'] \\ ['\n'])
> > 
> > 
> > That is true, but how about dealing with unicode characters?
> > 
> > anyButNl = anyButNL = alt (['\0'..'\65536'] \\ ['\n'])
> > 
> > The space required becomes excessive.
> 
> True, but the current implementation would be hopeless for
> unicode anyway, as it builds a table representing a
> deterministic finite state automaton (DFA), where the worst
> case size of the table is
> 
>* 
> 
> In all practical cases, the required space is much smaller
> as states with less than 20 characters having a non-error
> transition store the state transitions in a list.
> Furthermore, even in states with more than 20 characters
> with a non-error transition, the size of the table is only
> that of
> 
>   ord  - ord  + 1
> 
> (these are characters with non-error transitions).
> 
> For 16bit character ranges, it would be necessary to
> directly store negated character sets (such as [^abc]).
> From what he told me, Doitse Swierstra is working on a lexer
> that is using explicit ranges, but I am not sure whether he
> also has negated ranges.
> 
> Currently, most Haskell systems don't support unicode anyway
> (I think, hbc is the only exception), so I guess this is not
> a pressing issue.  As soon as, we have unicode support and
> there is a need for lexers handling unicode input, I am
> willing to extend the lexer library to gracefully handle the
> cases that you outlined.
> 
> Cheers,
> Manuel
> 




Re: How to force evaluation entirely?

2000-09-26 Thread Lennart Augustsson

"Ch. A. Herrmann" wrote:

> Hi,
>
> John> There's an easier way to force structures hyperstrictly. To
> John> force x to be evaluated to normal form before computing y,
> John> write (x==x) `seq` y
>
> I'm heavily confused here.
>
> What happens, if
>
>(a) an optimizer replaces (x==x) by True?

If an optimizer did that it would be severly broken in several ways.
First, there is absolutelty no guarantee that the (==) operator defines
anything that is a reflexive relation.  E.g., I can (for a particular type) define
it to always return False if I like.
Second, even if (==) was defined to be reflexive it's highly likely that
`x==x' would behave differently than `True'.  The former probably
diverges if x is bottom, whereas the latter doesn't.


>
>If the optimizer is not permitted to do that,
>its power appears to be limited severely.

Who said Haskell was easy to optimize?

-- Lennart






Re: combinator parsers and XSLT

2000-09-26 Thread Manuel M. T. Chakravarty

Lars Henrik Mathiesen <[EMAIL PROTECTED]> wrote,

> > From: "Manuel M. T. Chakravarty" <[EMAIL PROTECTED]>
> > Date: Tue, 26 Sep 2000 15:11:23 +1100
> 
> > For 16bit character ranges, it would be necessary to
> > directly store negated character sets (such as [^abc]).
> > >From what he told me, Doitse Swierstra is working on a lexer
> > that is using explicit ranges, but I am not sure whether he
> > also has negated ranges.
> 
> People with experience from other Unicode-enabled environments will
> expect support for character classes like letter or digit --- which in
> Unicode are not simple single ranges, but widely scattered over the
> map. (Just look at Latin-1, where you have to use [A-Za-zÀ-ÖØ-öø-ÿ]
> because two arithmetic operators snuck into the accented character
> range. (Blame the French)).
> 
> Such support will also allow your parser to work with the next, bigger
> version of Unicode, since the parser library should just inherit the
> character class support from the Haskell runtime, which should in turn
> get it from the OS. The OS people are already doing the work to get
> the necessary tables and routines compressed into a few kilobytes.

Hmm, this seems like a shortcoming in the Haskell spec.  We
have all these isAlpha, isDigit, etc functions, but I can't
get at a list of, say, all characters for which isAlpha is
true. 

> Also, Unicode isn't 16-bit any more, it's more like 20.1 bits --- the
> range is hex 0 to 1f. Although the official character assignments
> will stay below hex 2 or so, your code may have to work on systems
> with private character assignments in the hex 10+ range.

Ok, I didn't really mean that the mentioned extension will
rely on Unicode being 16 bits.  This is only a size, where
you don't really want to build an exhaustive transition
table anymore.

Manuel




Re: How to force evaluation entirely?

2000-09-26 Thread Marcin 'Qrczak' Kowalczyk

Tue, 26 Sep 2000 13:29:39 +0200 (MET DST), John Hughes <[EMAIL PROTECTED]> pisze:

> Of course, in principle, an optimizer *could* replace x==x by
> x`seq`True (if x is known to be of base type), and the x`seq` might
> well be removed by later transformations (if x can be shown to be
> defined, something compilers do analyses to discover). Who knows,
> maybe this happens in the innards of ghc...

It gets optimized for Bool. It does not get optimized for Int, because
(==) on Int is defined in terms of a primitive operation (==#) and
the compiler does not know that x ==# x is always True.

But the programmer can teach it :-)

{-# RULES
"x == x" forall (x :: Int). (==) x x = x `seq` True
#-}

Well, it's probably better to tell this on a lower level, because
some other operations might use (==#) directly instead of through (==):

import PrelGHC
{-# RULES
"x ==# x" forall (x :: Int#). (==#) x x = True
#-}

Here it is not necessary to use seq because Int# values are always
evaluated.

Now a function like

f:: Int -> Int
f x = if x == x then x else 10

compiles to exactly the same code as id.

-- 
 __("<  Marcin Kowalczyk * [EMAIL PROTECTED] http://qrczak.ids.net.pl/
 \__/
  ^^  SYGNATURA ZASTÊPCZA
QRCZAK





Re: combinator parsers and XSLT

2000-09-26 Thread Manuel M. T. Chakravarty

Lennart Augustsson <[EMAIL PROTECTED]> wrote,

> "Manuel M. T. Chakravarty" wrote:
> 
> > Currently, most Haskell systems don't support unicode anyway
> > (I think, hbc is the only exception), so I guess this is not
> > a pressing issue.  As soon as, we have unicode support and
> > there is a need for lexers handling unicode input, I am
> > willing to extend the lexer library to gracefully handle the
> > cases that you outlined.
> I'm sorry, but I much object (strongly) towards this attitude.
> It's this kind of reasoning that stops Unicode from becoming
> widespread.

I am tempted to agree with you.  I am just a lazy bastard,
that's the problem.

> Soon the GHC people (or whoever :) will say "Well, why should we
> support Unicode, there's all this software out there that breaks down
> with it." and we're in a viscious circle.

Hmmm, in this particular case nothing breaks down.  The
lexer combinators themselves never internally use the
assumption that a char is 8bit (I may be lazy, but I still
prefer clean code).  Only when you explicily use them to
build a scanner that does scan unicode files (and is aware
of it), you might run into space efficiency problems.

> Strong hint to various people:
> Haskell has had Unicode for a long time now.  I think that before
> you start implementing various extensions to Haskell, perhaps you
> should implement what the standard says should be there.
> Implementing Unicode isn't that hard, just a few days work.

You might be pleased to hear that - if I am not mistaken -
Qrczak is working at Unicode support for ghc.

> Strongly opposing Anglosaxan language imperialism

:-)

Manuel




Re: °ü°èÇü °Ë»ö¿£Áø Ŭ¸¯¾ÆÀÌÄÜÀÌ ÇùÁ¶¸¦ ºÎŹÇÕ´Ï´Ù.

2000-09-26 Thread Jerzy Karczmarczuk

Yesterday Lennart decided to fight against the
Anglo-Saxon imperialism in the language. Today
we got from Mr Click:

>   ¾È³çÇϽʴϱî?
>   Ŭ¸¯¾ÆÀÌÄÜ ¸¶ÄÉÀÌÆÃÆÀÀÇ Á¤ Çõ¿ì ÀÔ´Ï´Ù.
>   ±ÍÇÏ¿¡°Ô À̸ÞÀÏÀÌ ½ºÆÔÀÌ¾Æ´Ñ Á¤º¸ÀÇ Á¦°ø°ú °øÀ¯·Î½á ÀνÄ

etc.

That's good. Muy dobrze.

Jerzy Karczmarczuk
Caen, Normandy

(The place of departure of one William the Conqueror thanks to
whom we speak English [[a particularly distorted version of
French]], and not a specific Danish dialect popular in Britain
before Edward the Confessor.
Don't forget BTW that both Saxons and Anglons (Angles?)
spoke some distorted German.)




Re: combinator parsers and XSLT

2000-09-26 Thread Marcin 'Qrczak' Kowalczyk

Tue, 26 Sep 2000 00:42:15 -0400, Lennart Augustsson <[EMAIL PROTECTED]> 
pisze:

> Implementing Unicode isn't that hard, just a few days work.

GHC in CVS already has Unicode Chars. Unfortunately designing
libraries such that file contents are correctly translated between
Unicode and the local charset by default, but with the ability to
use other encodings when needed, is not that trivial...

-- 
 __("<  Marcin Kowalczyk * [EMAIL PROTECTED] http://qrczak.ids.net.pl/
 \__/
  ^^  SYGNATURA ZASTÊPCZA
QRCZAK





RE: How to force evaluation entirely?

2000-09-26 Thread John Hughes


John> There's an easier way to force structures hyperstrictly. To
John> force x to be evaluated to normal form before computing y,
John> write (x==x) `seq` y 

I'm heavily confused here.

What happens, if 

   (a) an optimizer replaces (x==x) by True?
   If the optimizer is not permitted to do that,
   its power appears to be limited severely.

   (b) a run-time system thinks pointer equality implies value equality?

Neither of these is permitted by Haskell's semantics. Firstly, == is just an
overloaded operator and the programmer is free to define it however he likes
(even stupidly). Secondly, even if the type of x is known -- say Int -- so
that the code for == is known, the optimizer may not replace code by a more or
less lazy `equivalent'. `Optimizing' x==x to True would change the meaning of
the program when x is undefined; comparing pointers to x instead of evaluating
x would equally be wrong.

As far as the power of the optimizer is concerned, my guess is programmers
very rarely write x==x (unless they MEAN to force x!), so the loss of
optimization doesn't matter. Of course, in principle, an optimizer *could*
replace x==x by x`seq`True (if x is known to be of base type), and the x`seq`
might well be removed by later transformations (if x can be shown to be
defined, something compilers do analyses to discover). Who knows, maybe this
happens in the innards of ghc...

John Hughes




RE: How to force evaluation entirely?

2000-09-26 Thread Patrik Jansson

On Tue, 26 Sep 2000, Ch. A. Herrmann wrote:
> John> write (x==x) `seq` y 
> I'm heavily confused here.
> What happens, if 
> 
>(a) an optimizer replaces (x==x) by True?
>If the optimizer is not permitted to do that,
>its power appears to be limited severely.

Remember that (==) in Haskell is just a method of the Eq class - it is not
semantic equality. Nothing prevents users from defining their own
instances for (==) and these instances need not satisfy any reasonable
equality laws. 

Anyway, using (x==x) as a hyperstrict seq is a hack, a useful and mostly
working hack, but not guaranteed to work in all cases. A more reliable
hyperseq can be implemented as a class method for a new class with
instances for all types you want to use it on. I believe this method and
this class were the reasons for implementing DrIFT (search for the class
NFData).

  http://www.dcs.gla.ac.uk/~nww/Derive/derivehome.html

/Patrik Jansson






RE: How to force evaluation entirely?

2000-09-26 Thread Ch. A. Herrmann

Hi,

John> There's an easier way to force structures hyperstrictly. To
John> force x to be evaluated to normal form before computing y,
John> write (x==x) `seq` y 

I'm heavily confused here.

What happens, if 

   (a) an optimizer replaces (x==x) by True?
   If the optimizer is not permitted to do that,
   its power appears to be limited severely.

   (b) a run-time system thinks pointer equality implies value equality?

Are these issues officially documented for Haskell98,
beyond the fact that "seq _|_ x = _|_" ?
-- 
 Christoph Herrmann
 E-mail:  [EMAIL PROTECTED]
 WWW: http://brahms.fmi.uni-passau.de/cl/staff/herrmann.html




RE: How to force evaluation entirely?

2000-09-26 Thread John Hughes


Simon PJ says:

Did you try "seq"?  
x `seq` y
should evalute x to WHNF before returning y.  If x is a pair
you may need to say

seqPair x `seq` y

where
seqPair (a,b) = a `seq` b

in order to force the components.

Simon

There's an easier way to force structures hyperstrictly. To force x to be
evaluated to normal form before computing y, write
(x==x) `seq` y
This depends on all the types occurring in x being Eq types, and also on 
the implementation of == being hyperstrict when its result is true. This holds
for all derived instances, and for most programmer defined ones too. After
all, if x==x holds for any non-total x, then x==y must hold for some pair of
different values x and y, which we normally try to avoid!

I sometimes write
if x==x then y else error "I am the pope!"
but the seq form is nicer!

John Hughes

| -Original Message-
| From: Michael Marte [mailto:[EMAIL PROTECTED]]
| 
| 
| 
| I am trying to process a huge bunch of large XML files in order
| to extract some data. For each XML file, a small summary (6 integers)
| is created which is kept until writing a HTML page displaying the
| results.
| 
| The ghc-compiled program behaves as expected: It opens one 
| XML file after
| the other but does not read a lot. After some 50 files, it 
| bails out due
| to lack of heap storage.
| 
| To overcome the problem, I tried to force the program to 
| compute summaries
| immediately after reading the corresponding XML file. I tried 
| some eager
| application ($!), some irrefutable pattern, and some 
| strictness flags, but
| I did not succeed. 




Re: combinator parsers and XSLT

2000-09-26 Thread Lars Henrik Mathiesen

> From: "Manuel M. T. Chakravarty" <[EMAIL PROTECTED]>
> Date: Tue, 26 Sep 2000 15:11:23 +1100

> For 16bit character ranges, it would be necessary to
> directly store negated character sets (such as [^abc]).
> >From what he told me, Doitse Swierstra is working on a lexer
> that is using explicit ranges, but I am not sure whether he
> also has negated ranges.

People with experience from other Unicode-enabled environments will
expect support for character classes like letter or digit --- which in
Unicode are not simple single ranges, but widely scattered over the
map. (Just look at Latin-1, where you have to use [A-Za-zÀ-ÖØ-öø-ÿ]
because two arithmetic operators snuck into the accented character
range. (Blame the French)).

Such support will also allow your parser to work with the next, bigger
version of Unicode, since the parser library should just inherit the
character class support from the Haskell runtime, which should in turn
get it from the OS. The OS people are already doing the work to get
the necessary tables and routines compressed into a few kilobytes.

Also, Unicode isn't 16-bit any more, it's more like 20.1 bits --- the
range is hex 0 to 1f. Although the official character assignments
will stay below hex 2 or so, your code may have to work on systems
with private character assignments in the hex 10+ range.

Lars Mathiesen (U of Copenhagen CS Dep) <[EMAIL PROTECTED]> (Humour NOT marked)




RE: How to force evaluation entirely?

2000-09-26 Thread Simon Peyton-Jones

Did you try "seq"?  
x `seq` y
should evalute x to WHNF before returning y.  If x is a pair
you may need to say

seqPair x `seq` y

where
seqPair (a,b) = a `seq` b

in order to force the components.

Simon

| -Original Message-
| From: Michael Marte [mailto:[EMAIL PROTECTED]]
| Sent: 25 September 2000 19:16
| To: [EMAIL PROTECTED]
| Subject: How to force evaluation entirely?
| 
| 
| Hello,
| 
| I am trying to process a huge bunch of large XML files in order
| to extract some data. For each XML file, a small summary (6 integers)
| is created which is kept until writing a HTML page displaying the
| results.
| 
| The ghc-compiled program behaves as expected: It opens one 
| XML file after
| the other but does not read a lot. After some 50 files, it 
| bails out due
| to lack of heap storage.
| 
| To overcome the problem, I tried to force the program to 
| compute summaries
| immediately after reading the corresponding XML file. I tried 
| some eager
| application ($!), some irrefutable pattern, and some 
| strictness flags, but
| I did not succeed. It seems to me that as long as something 
| is not really
| used, the implementation cannot be forced to evaluate it completely. I
| resorted to printing the summary to /dev/null right after 
| reading the XML
| file. This works fine. Is there a more elegant solution?
| 
| Thank you.
| 
| Michael Marte
| 
| 
| 
| 
| 
|