Re: laziness in `length'

2010-06-15 Thread Denys Rtveliashvili
Hi Daniel,

Thank you very much for the explanation of this issue.

While I understand the parts about rewrite rules and the big thunk, it
is still not clear why it is the way it is.

Please could you explain which Nums are not strict? The ones I am aware
about are all strict.

Also, why doesn't it require building the full thunk for non-strict
Nums? Even if they are not strict, an addition requires both parts to be
evaluated. This means the thunk will have to be pre-built, doesn't it?

With kind regards,
Denys


 On Monday 14 June 2010 16:25:06, Serge D. Mechveliani wrote:
  Dear people and GHC team,
 
  I have a naive question about the compiler and library of  ghc-6.12.3.
  Consider the program
 
import List (genericLength)
main = putStr $ shows (genericLength [1 .. n]) \n
   where
   n = -- 10^6, 10^7, 10^8 ...
 
  (1) When it is compiled under  -O,  it runs in a small constant space
  in  n  and in a time approximately proportional to  n.
  (2) When it is compiled without -O,  it takes at the run-time the
  stack proportional to  n,  and it takes enormousely large time
  for  n = 10^7.
  (3) In the interpreter mode  ghci,   `genericLength [1 .. n]'
  takes as much resource as (2).
 
  Are the points (2) and (3) natural for an Haskell implementation?
 
  Independently on whether  lng  is inlined or not, its lazy evaluation
  is, probably, like this:
   lng [1 .. n] =
   lng (1 : (list 2 n)) =  1 + (lng $ list 2 n) =
   1 + (lng (2: (list 3 n))) = 1 + 1 + (lng $ list 3 n) =
   2 + (lng (3: (list 4 n)))   -- because this + is of Integer
   = 2 + 1 + (lng $ list 4 n) =
   3 + (lng $ list 4 n)
   ...
  And this takes a small constant space.
 
 Unfortunately, it would be
 
 lng [1 .. n]
 ~ 1 + (lng [2 .. n])
 ~ 1 + (1 + (lng [3 .. n]))
 ~ 1 + (1 + (1 + (lng [4 .. n])))
 ~
 
 and that builds a thunk of size O(n).
 
 The thing is, genericLength is written so that for lazy number types, the 
 construction of the result can begin before the entire list has been 
 traversed. This means however, that for strict number types, like Int or 
 Integer, it is woefully inefficient.
 
 In the code above, the result type of generic length (and the type of list 
 elements) is defaulted to Integer.
 When you compile with optimisations, a rewrite-rule fires:
 
 -- | The 'genericLength' function is an overloaded version of 'length'.  In
 -- particular, instead of returning an 'Int', it returns any type which is
 -- an instance of 'Num'.  It is, however, less efficient than 'length'.
 genericLength   :: (Num i) = [b] - i
 genericLength []=  0
 genericLength (_:l) =  1 + genericLength l
 
 {-# RULES
   genericLengthInt genericLength = (strictGenericLength :: [a] - 
 Int);
   genericLengthInteger genericLength = (strictGenericLength :: [a] - 
 Integer);
  #-}
 
 strictGenericLength :: (Num i) = [b] - i
 strictGenericLength l   =  gl l 0
   where
 gl [] a = a
 gl (_:xs) a = let a' = a + 1 in a' `seq` gl xs a'
 
 which gives a reasonabley efficient constant space calculation.
 
 Without optimisations and in ghci, you get the generic code, which is slow 
 and thakes O(n) space.
 
  Thank you in advance for your explanation,
 
  -
  Serge Mechveliani
  mech...@botik.ru
 
 ___
 Glasgow-haskell-users mailing list
 Glasgow-haskell-users@haskell.org
 http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: laziness in `length'

2010-06-15 Thread Daniel Fischer
On Tuesday 15 June 2010 16:52:04, Denys Rtveliashvili wrote:
 Hi Daniel,

 Thank you very much for the explanation of this issue.

 While I understand the parts about rewrite rules and the big thunk, it
 is still not clear why it is the way it is.

 Please could you explain which Nums are not strict? The ones I am aware
 about are all strict.

There are several implementations of lazy (to different degrees) Peano 
numbers on hackage.
The point is that it's possible to have lazy Num types, and the decision 
was apparently to write genericLength so that lazy Num types may profit 
from it.
Arguably, one should have lazyGenericLength for lazy number types and 
strictGenericLength for strict number types (Integer, Int64, Word, Word64, 
...).
On the other hand, fromIntegral . length works fine in practice (calling 
length on a list exceeding the Int range would be doubtful on 32-bit 
systems and plain madness on 64-bit systems).


 Also, why doesn't it require building the full thunk for non-strict
 Nums? Even if they are not strict, an addition requires both parts to be
 evaluated.

Not necessarily for lazy numbers.

 This means the thunk will have to be pre-built, doesn't it?

For illustration, the very simple-minded lazy Peano numbers:

data Peano
= Zero
| Succ Peano
  deriving (Show, Eq)

instance Ord Peano where
compare Zero Zero = EQ
compare Zero _= LT
compare _Zero = GT
compare (Succ m) (Succ n) = compare m n
min Zero _ = Zero
min _ Zero = Zero
min (Succ m) (Succ n) = Succ (min m n)
max Zero n = n
max m Zero = m
max (Succ m) (Succ n) = Succ (max m n)

instance Num Peano where
Zero + n = n
(Succ m) + n = Succ (m + n)
-- omitted other methods due to laziness (mine, not Haskell's)
fromInteger n
| n  0 = error Peano.fromInteger: negative argument
| n == 0 = Zero
| otherwise = Succ (fromInteger (n-1))

one, two, three, four :: Peano
one = Succ Zero
two = Succ one
three = Succ two
four = Succ three

min two (genericLength [1 .. ])
~ min (Succ one) (genericLength [1 .. ])
~ min (Succ one) (1 + (genericLength [2 .. ]))
~ min (Succ one) ((Succ Zero) + (genericLength [2 .. ]))
~ min (Succ one) (Succ (Zero + (genericLength [2 .. ])))
~ Succ (min one (Zero + (genericLength [2 .. ])))
~ Succ (min (Succ Zero) (Zero + (genericLength [2 .. ])))
~ Succ (min (Succ Zero) (genericLength [2 .. ]))
~ Succ (min (Succ Zero) (1 + (genericLength [3 .. ])))
~ Succ (min (Succ Zero) ((Succ Zero) + (genericLength [3 ..])))
~ Succ (min (Succ Zero) (Succ (Zero + (genericLength [3 .. ]
~ Succ (Succ (min Zero (Zero + (genericLength [3 .. ]
~ Succ (Succ Zero)


 With kind regards,
 Denys
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: laziness in `length'

2010-06-15 Thread Roman Beslik

On 14.06.10 17:25, Serge D. Mechveliani wrote:

  lng [1 .. n] =
  lng (1 : (list 2 n)) =  1 + (lng $ list 2 n) =
  1 + (lng (2: (list 3 n))) = 1 + 1 + (lng $ list 3 n) = {- !!! -}
  2 + (lng (3: (list 4 n)))   -- because this + is of Integer
  = 2 + 1 + (lng $ list 4 n) = {- !!! -}
  3 + (lng $ list 4 n)
   
Actually matters are more complicated. In the highlighted steps you 
implicitly used associativity of (+). Of course, Haskell can not do 
this. Also 'lng' and 'genericLength' *are not tail recursive*. This 
explains stack overflow. If you compute length with 'foldl' 
(tail-recursively) and without -O flag, than you will see excessive 
heap usage. Also, GHC's 'length' and 'foldl'' are tail recursive and 
eagerly computes length/accumulator, so they are effective without -O 
flag. See for explanation

http://www.haskell.org/haskellwiki/Stack_overflow

--
Best regards,
  Roman Beslik.

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


laziness in `length'

2010-06-14 Thread Serge D. Mechveliani
Dear people and GHC team,

I have a naive question about the compiler and library of  ghc-6.12.3.
Consider the program

  import List (genericLength)
  main = putStr $ shows (genericLength [1 .. n]) \n
 where
 n = -- 10^6, 10^7, 10^8 ... 
  
(1) When it is compiled under  -O,  it runs in a small constant space
in  n  and in a time approximately proportional to  n.
(2) When it is compiled without -O,  it takes at the run-time the 
stack proportional to  n,  and it takes enormousely large time 
for  n = 10^7.
(3) In the interpreter mode  ghci,   `genericLength [1 .. n]'
takes as much resource as (2). 

Are the points (2) and (3) natural for an Haskell implementation?

Independently on whether  lng  is inlined or not, its lazy evaluation
is, probably, like this:
 lng [1 .. n] = 
 lng (1 : (list 2 n)) =  1 + (lng $ list 2 n) = 
 1 + (lng (2: (list 3 n))) = 1 + 1 + (lng $ list 3 n) =
 2 + (lng (3: (list 4 n)))   -- because this + is of Integer
 = 2 + 1 + (lng $ list 4 n) =
 3 + (lng $ list 4 n)
 ...
And this takes a small constant space.
Thank you in advance for your explanation,

-
Serge Mechveliani
mech...@botik.ru
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: laziness in `length'

2010-06-14 Thread Daniel Fischer
On Monday 14 June 2010 16:25:06, Serge D. Mechveliani wrote:
 Dear people and GHC team,

 I have a naive question about the compiler and library of  ghc-6.12.3.
 Consider the program

   import List (genericLength)
   main = putStr $ shows (genericLength [1 .. n]) \n
  where
  n = -- 10^6, 10^7, 10^8 ...

 (1) When it is compiled under  -O,  it runs in a small constant space
 in  n  and in a time approximately proportional to  n.
 (2) When it is compiled without -O,  it takes at the run-time the
 stack proportional to  n,  and it takes enormousely large time
 for  n = 10^7.
 (3) In the interpreter mode  ghci,   `genericLength [1 .. n]'
 takes as much resource as (2).

 Are the points (2) and (3) natural for an Haskell implementation?

 Independently on whether  lng  is inlined or not, its lazy evaluation
 is, probably, like this:
  lng [1 .. n] =
  lng (1 : (list 2 n)) =  1 + (lng $ list 2 n) =
  1 + (lng (2: (list 3 n))) = 1 + 1 + (lng $ list 3 n) =
  2 + (lng (3: (list 4 n)))   -- because this + is of Integer
  = 2 + 1 + (lng $ list 4 n) =
  3 + (lng $ list 4 n)
  ...
 And this takes a small constant space.

Unfortunately, it would be

lng [1 .. n]
~ 1 + (lng [2 .. n])
~ 1 + (1 + (lng [3 .. n]))
~ 1 + (1 + (1 + (lng [4 .. n])))
~

and that builds a thunk of size O(n).

The thing is, genericLength is written so that for lazy number types, the 
construction of the result can begin before the entire list has been 
traversed. This means however, that for strict number types, like Int or 
Integer, it is woefully inefficient.

In the code above, the result type of generic length (and the type of list 
elements) is defaulted to Integer.
When you compile with optimisations, a rewrite-rule fires:

-- | The 'genericLength' function is an overloaded version of 'length'.  In
-- particular, instead of returning an 'Int', it returns any type which is
-- an instance of 'Num'.  It is, however, less efficient than 'length'.
genericLength   :: (Num i) = [b] - i
genericLength []=  0
genericLength (_:l) =  1 + genericLength l

{-# RULES
  genericLengthInt genericLength = (strictGenericLength :: [a] - 
Int);
  genericLengthInteger genericLength = (strictGenericLength :: [a] - 
Integer);
 #-}

strictGenericLength :: (Num i) = [b] - i
strictGenericLength l   =  gl l 0
  where
gl [] a = a
gl (_:xs) a = let a' = a + 1 in a' `seq` gl xs a'

which gives a reasonabley efficient constant space calculation.

Without optimisations and in ghci, you get the generic code, which is slow 
and thakes O(n) space.

 Thank you in advance for your explanation,

 -
 Serge Mechveliani
 mech...@botik.ru

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users