[Haskell-cafe] Lazy vs correct IO [Was: A round of golf]

2008-09-19 Thread oleg

Lennart Augustsson wrote

 main = do
name:_ - getArgs
file - readFile name
print $ length $ lines file

Given the stance against top-level mutable variables, I have not
expected to see this Lazy IO code. After all, what could be more against
the spirit of Haskell than a `pure' function with observable side
effects. With Lazy IO, one indeed has to choose between correctness
and performance. The appearance of such code is especially strange
after the evidence of deadlocks with Lazy IO, presented on this list
less than a month ago. Let alone unpredictable resource usage and
reliance on finalizers to close files (forgetting that GHC does not
guarantee that finalizers will be run at all).

Is there an alternative?

-- Counting the lines in a file
import IterateeM

count_nl = liftI $ IE_cont (step 0)
 where
 step acc (Chunk str)  = liftI $ IE_cont (step $! acc + count str)
 step acc stream   = liftI $ IE_done acc stream
 count [] = 0
 count ('\n':str) = succ $! count str
 count (_:str) = count str

main = do
   name:_ - getArgs
   IE_done counter _ - unIM $ enum_file name . enum_eof == count_nl
   print counter


The function count_nl could have been in the library, but I'm a
minimalist. It is written in a declarative rather than imperative
style, and one easily sees what it does. The above code as well as the
IterateeM library is Haskell98. It does not use any unsafe Haskell
functions whatsoever.

time wc -l /usr/share/dict/words 
  235882 /usr/share/dict/words

real0m0.024s
user0m0.022s
sys 0m0.000s

time ~/Docs/papers/DEFUN08/Wc /usr/share/dict/words 
235882

real0m0.141s
user0m0.126s
sys 0m0.008s

To compare with lazy IO, the code using readFile gives

time ~/Docs/papers/DEFUN08/Wc /usr/share/dict/words 
235882

real0m0.297s
user0m0.262s
sys 0m0.023s

So, choosing correctness does not mean losing in performance; in fact,
one may even gain.

Can enumerators compose? Well, we already seen the example above
(enum_file name . enum_eof)
where the operation (.) 
e1 . e2 = (==) e2 . e1
is a flipped composition if monadic bind were considered a flipped
application.


Here is a more interesting example: count words in all the files whose
names are given on the command line. There may be many files given,
thousands of them.

-- Count the stream. Again, could have been in the library
stream_count :: Monad m = IterateeGM el m Int
stream_count = liftI $ IE_cont (step 0)
 where
 step acc (Chunk [])  = liftI $ IE_cont (step acc)
 step acc (Chunk [_]) = liftI $ IE_cont (step $! succ acc)
 step acc (Chunk ls)  = liftI $ IE_cont (step $! acc + length ls)
 step acc stream  = liftI $ IE_done acc stream


main = do
   names - getArgs
   let enumerators = foldr (\name - (enum_file name .)) enum_eof names
   IE_done (IE_done counter _) _ - unIM $ enumerators ==
 enum_words stream_count
   print counter

We notice that the composition of enumerators corresponds to the
`concatenation' of their sources. Declaratively, the meaning of the
above code is:
-- all the given files are concatenated
-- the resulting stream of characters is converted to a stream
of words
-- the stream of words is counted.

Operationally, the code does not open more than one file at a
time. More importantly, the code *never* reads more than 4096
characters at a time. A block of the file is read, split into words,
counted, and only then another chunk is read. After one file is done,
it is closed, and another file is processed. One can see that only one
file is being opened at a time by enabling traces. The processing is
fully incremental.


/usr/local/share/doc/ghc6 find . -name \*.html -print | time xargs 
~/Docs/papers/DEFUN08/Wc
3043421
   16.99 real15.83 user 0.71 sys

BTW, the program has counted words in 1169 files.

It is interesting to compare the above main function with the
corresponding lazy IO:

main'' = do
   names - getArgs
   files - mapM readFile names
   print $ length $ words (concat files)

The number of lines is comparable. The execution is not. If we try to
run the lazy IO code, we get:

/usr/local/share/doc/ghc6 find . -name \*.html -print | time xargs 
~/Docs/papers/DEFUN08/Wc
Wc: ./libraries/Win32/Graphics-Win32-GDI-Path.html: 
   openFile: resource exhausted (Too many open files)

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Lazy vs correct IO [Was: A round of golf]

2008-09-19 Thread Don Stewart
oleg:
 Given the stance against top-level mutable variables, I have not
 expected to see this Lazy IO code. After all, what could be more against
 the spirit of Haskell than a `pure' function with observable side
 effects. With Lazy IO, one indeed has to choose between correctness
 and performance. The appearance of such code is especially strange
 after the evidence of deadlocks with Lazy IO, presented on this list
 less than a month ago. Let alone unpredictable resource usage and
 reliance on finalizers to close files (forgetting that GHC does not
 guarantee that finalizers will be run at all).
 
 Is there an alternative?

Hi Oleg!

I'm glad you joined the thread at this point.

Some background: our best solutions for this problem using lazy IO, are
based on chunk-wise lazy data structures, typically lazy bytestrings.
Often we'll write programs like:

import qualified Data.ByteString.Lazy.Char8 as B
import System.Environment

main = do
[f] - getArgs
s   - B.readFile f
print (B.count '\n' s)

Which are nicely efficient

$ ghc -O2 A.hs --make
$ du -hs data
100M data

$ time ./A data 
11078540
./A data  0.17s user 0.04s system 100% cpu 0.210 total

And we know from elsewhere the performance is highlycompetitive:

http://shootout.alioth.debian.org/gp4/benchmark.php?test=sumcollang=all

Now, enumerators are very promising, and there's a lot of interest at
the moment, (e.g. just this week, Johan Tibell gave an inspiring talk at
Galois about this approach to IO, 
http://www.galois.com/blog/2008/09/12/left-fold-enumerators-a-safe-expressive-and-efficient-io-interface-for-haskell/
and we spent the day sketching out an enumerator bytestring design, 

But there are some open questions. Perhaps you have some answers?

* Can we write a Data.ByteString.Enumerator that has matching or
  better performance than its dual, the existing chunk-wise lazy
  stream type?

* Is there a translation from 

data ByteString = Empty | Chunk {-# UNPACK #-} !S.ByteString ByteString

  and functions on this type,

foldlChunks :: (a - S.ByteString - a) - a - ByteString - a
foldlChunks f z = go z
  where
go !a Empty= a
go !a (Chunk c cs) = go (f a c) cs

  to an enumerator implementation?

* Can we compose enumerators as we can stream functions?

* Can we do fusion on enumerators? Does that make composition easier?
(Indeed, is there an encoding of enumerators analogous to stream
fusion control?)

Any thoughts?

-- Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Library design question

2008-09-19 Thread apfelmus
Andre Nathan wrote:
 I'm trying to write a simple graph library for a project of mine

There's also Martin Erwig's functional graph library in  Data.Graph.Inductive  (
fgl  on hackage).


Regards,
apfelmus

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Library design question

2008-09-19 Thread Christian Maeder
Duncan Coutts wrote:
 On Thu, 2008-09-18 at 15:43 -0300, Andre Nathan wrote:
 
 My Graph type is the following.

   data Graph a b = Graph
 { adjacencies :: Map Int (a, (Map Int b))
 , numVertices :: Int
 , numEdges:: Int
 }
 
   addVertex :: Int - a - State (Graph a b) ()
   addVertex vertex label = do
 g - get
 let adj = Map.insert vertex (label, Map.empty) (adjacencies g)
 put $ g { adjacencies = adj, numVertices = numVertices g + 1 }
 
 So I'm confused right now about how I should proceed. Any hints
 regarding that?
 
 To be honest I would not bother with the state monad and just make them
 pure operations returning new graphs:
 
 addVertex :: Int - a - Graph a b - Graph a b
 
 It's very common to have this style, ie returning a new/updated
 structure explicitly rather than implicitly in a state monad. Just look
 at the Data.Map api for example. If you later want to stick it in a
 state monad then that would be straightforward but also easy to use
 directly.

I agree. Duncan's version also looks more atomic to me, because Andre's
version (that's renamed to addVertexM below) could be easily derived by:

  addVertexM v l = modify (addVertex v l)

The opposite derivation is also possible but does additional wrapping
into and out of a state:

  addVertex v l = execState (addVertexM v l)

(Furthermore the module Control.Monad.State is non-portable, because
get, put, modify and gets come in via the MonadState class, but separate
not overloaded versions for these function would make sense in the same
way we have map despite fmap.)

Cheers Christian

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] A round of golf

2008-09-19 Thread Ketil Malde
Don Stewart [EMAIL PROTECTED] writes:

 If I want to make my own efficient bytestring consumer, is that
 what I need to use in order to preserve the inherent laziness of
 the datastructure? 

 you can get foldChunks from Data.ByteString.Lazy.Internal,
 or write your own chunk folder.

IME you can also get nicely by using the standard list-alikes:
uncons, head, tail, take, drop... 

-k
-- 
If I haven't seen further, it is by standing in the footprints of giants
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] A round of golf

2008-09-19 Thread Ketil Malde
Creighton Hogg [EMAIL PROTECTED] writes:

 To ask an overly general question, if lazy bytestring makes a nice
 provider for incremental processing are there reasons to _not_ reach
 for that as my default when processing large files?

I think it is a nice default.  

I'd reach for strict bytestrings if I know the file will be processed
in a strict manner (not single-pass stream-through), and I just have
to have the last few percent speedup.  I'll use [String] only for
small examples, where the extra imports cost more than the performance
loss. 

-k
-- 
If I haven't seen further, it is by standing in the footprints of giants
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Updated formlets sample?

2008-09-19 Thread Martin Huschenbett

Hi all,

I found a blog post concerning formlets [1] in the web. Since looks very 
interesting I tried to compile the sample code with recent versions of 
HAppS and formlets from hackage. But this didn't work as the API of 
formlets has changed since this post.


I tried to adopt the code to the new API but I was unable to finish this 
since there is a new monadic context I don't know to handle in the right 
way.


So my question is, is there an updated version of this sample code in 
the web or has anybody tried to adopt it and can send me the results?


Thanks in advance,

Martin.

[1] http://blog.tupil.com/formlets-in-haskell/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Lazy vs correct IO

2008-09-19 Thread Ketil Malde
[EMAIL PROTECTED] writes:

 It is interesting to compare the above main function with the
 corresponding lazy IO:

Minor point I know, but aren't you really comparing it with the
corresponding *strict* IO?

 main'' = do
names - getArgs
files - mapM readFile names
  

print $ length $ words (concat files)

This works nicely if you replace the middle line with a lazy version, e.g.:

   files - mapM (unsafeInterleaveIO . B.readFile) names

-k
-- 
If I haven't seen further, it is by standing in the footprints of giants
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] control-timeout with gtk

2008-09-19 Thread Marco Túlio Gontijo e Silva
Em Qui, 2008-09-18 às 16:14 -0300, Marco Túlio Gontijo e Silva escreveu:
 Em Qui, 2008-09-18 às 11:51 -0700, Adam Langley escreveu:
  Do you want control-timeout?
 
 I think control-timeout is very useful.  I'll try to fix it, and if I
 could, I'll upload it to hackage then.

I couldn't, and I found a solution to what I want in
System.Glib.MainLoop.  So, if someone else is interested in maintaining
this package, feel free to do it.

Greetings.

-- 
marcot
Página: http://marcotmarcot.iaaeee.org/
Blog: http://marcotmarcot.blogspot.com/
Correio: [EMAIL PROTECTED]
XMPP: [EMAIL PROTECTED]
IRC: [EMAIL PROTECTED]
Telefone: 25151920
Celular: 98116720
Endereço:
  Rua Turfa, 639/701
  Prado 30410-370
  Belo Horizonte/MG Brasil


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] control-timeout with gtk

2008-09-19 Thread Marco Túlio Gontijo e Silva
Hello,

Em Qui, 2008-09-18 às 10:05 -0700, Judah Jacobson escreveu:
 Just a guess, but this might be a problem with control-timeout's use
 of the unsafePerformIO global variables hack.  It's missing the
 standard NOINLINE annotations which prevent multiple copies of the
 global variable from being created.

I added the NOINLINE annotations and even tried building with -fno-cse,
but the result was the same.  Do you have any other suggestions?

Greetings.

-- 
marcot
Página: http://marcotmarcot.iaaeee.org/
Blog: http://marcotmarcot.blogspot.com/
Correio: [EMAIL PROTECTED]
XMPP: [EMAIL PROTECTED]
IRC: [EMAIL PROTECTED]
Telefone: 25151920
Celular: 98116720
Endereço:
  Rua Turfa, 639/701
  Prado 30410-370
  Belo Horizonte/MG Brasil


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Lazy vs correct IO [Was: A round of golf]

2008-09-19 Thread Kim-Ee Yeoh


oleg-30 wrote:
 
 I have not expected to see this Lazy IO code. After all, what could be
 more against
 the spirit of Haskell than a `pure' function with observable side effects.
 

What could even be more against the spirit of Haskell than 
the original theme of this thread, i.e. code that makes us cry?

Lennart's piece fudges purity, agreed, but it reads nicely as
idiomatic Haskell, swift on the eyes if not on the machine.

Consider if readFile's semantics were modified, i.e. not lazy,
at least not always.

In the ideal world, a smart enough compiler would just do 
the right thing, i.e. the IO String returned would be strict, or better 
yet, it would automatically chunkify the read to obtain constant 
space usage.

Lazy IO is indeed a nasty can of worms, not unrelated to the issue
of monadic IO as a gigantic sin bin. We could avoid it entirely, or we 
could sort out and algebraize the different interactions into a happier 
marriage of the pair.

-- 
View this message in context: 
http://www.nabble.com/Lazy-vs-correct-IO--Was%3A-A-round-of-golf--tp19567128p19573538.html
Sent from the Haskell - Haskell-Cafe mailing list archive at Nabble.com.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Lazy vs correct IO [Was: A round of golf]

2008-09-19 Thread Lennart Augustsson
I agree that lazy IO is a can with some worms in it.  But it's not that strange.
The readFile operation is in the IO monad, so it has an effect on the world.
This effect is not finished when readFile returns, and from the world
point of view
it's not entirely deterministic.

On Fri, Sep 19, 2008 at 7:51 AM,  [EMAIL PROTECTED] wrote:

 Lennart Augustsson wrote

 main = do
name:_ - getArgs
file - readFile name
print $ length $ lines file

 Given the stance against top-level mutable variables, I have not
 expected to see this Lazy IO code. After all, what could be more against
 the spirit of Haskell than a `pure' function with observable side
 effects. With Lazy IO, one indeed has to choose between correctness
 and performance. The appearance of such code is especially strange
 after the evidence of deadlocks with Lazy IO, presented on this list
 less than a month ago. Let alone unpredictable resource usage and
 reliance on finalizers to close files (forgetting that GHC does not
 guarantee that finalizers will be run at all).

 Is there an alternative?

 -- Counting the lines in a file
 import IterateeM

 count_nl = liftI $ IE_cont (step 0)
  where
  step acc (Chunk str)  = liftI $ IE_cont (step $! acc + count str)
  step acc stream   = liftI $ IE_done acc stream
  count [] = 0
  count ('\n':str) = succ $! count str
  count (_:str) = count str

 main = do
   name:_ - getArgs
   IE_done counter _ - unIM $ enum_file name . enum_eof == count_nl
   print counter


 The function count_nl could have been in the library, but I'm a
 minimalist. It is written in a declarative rather than imperative
 style, and one easily sees what it does. The above code as well as the
 IterateeM library is Haskell98. It does not use any unsafe Haskell
 functions whatsoever.

 time wc -l /usr/share/dict/words
  235882 /usr/share/dict/words

 real0m0.024s
 user0m0.022s
 sys 0m0.000s

 time ~/Docs/papers/DEFUN08/Wc /usr/share/dict/words
 235882

 real0m0.141s
 user0m0.126s
 sys 0m0.008s

 To compare with lazy IO, the code using readFile gives

 time ~/Docs/papers/DEFUN08/Wc /usr/share/dict/words
 235882

 real0m0.297s
 user0m0.262s
 sys 0m0.023s

 So, choosing correctness does not mean losing in performance; in fact,
 one may even gain.

 Can enumerators compose? Well, we already seen the example above
(enum_file name . enum_eof)
 where the operation (.)
e1 . e2 = (==) e2 . e1
 is a flipped composition if monadic bind were considered a flipped
 application.


 Here is a more interesting example: count words in all the files whose
 names are given on the command line. There may be many files given,
 thousands of them.

 -- Count the stream. Again, could have been in the library
 stream_count :: Monad m = IterateeGM el m Int
 stream_count = liftI $ IE_cont (step 0)
  where
  step acc (Chunk [])  = liftI $ IE_cont (step acc)
  step acc (Chunk [_]) = liftI $ IE_cont (step $! succ acc)
  step acc (Chunk ls)  = liftI $ IE_cont (step $! acc + length ls)
  step acc stream  = liftI $ IE_done acc stream


 main = do
   names - getArgs
   let enumerators = foldr (\name - (enum_file name .)) enum_eof names
   IE_done (IE_done counter _) _ - unIM $ enumerators ==
 enum_words stream_count
   print counter

 We notice that the composition of enumerators corresponds to the
 `concatenation' of their sources. Declaratively, the meaning of the
 above code is:
-- all the given files are concatenated
-- the resulting stream of characters is converted to a stream
 of words
-- the stream of words is counted.

 Operationally, the code does not open more than one file at a
 time. More importantly, the code *never* reads more than 4096
 characters at a time. A block of the file is read, split into words,
 counted, and only then another chunk is read. After one file is done,
 it is closed, and another file is processed. One can see that only one
 file is being opened at a time by enabling traces. The processing is
 fully incremental.


 /usr/local/share/doc/ghc6 find . -name \*.html -print | time xargs 
 ~/Docs/papers/DEFUN08/Wc
 3043421
   16.99 real15.83 user 0.71 sys

 BTW, the program has counted words in 1169 files.

 It is interesting to compare the above main function with the
 corresponding lazy IO:

 main'' = do
   names - getArgs
   files - mapM readFile names
   print $ length $ words (concat files)

 The number of lines is comparable. The execution is not. If we try to
 run the lazy IO code, we get:

 /usr/local/share/doc/ghc6 find . -name \*.html -print | time xargs 
 ~/Docs/papers/DEFUN08/Wc
 Wc: ./libraries/Win32/Graphics-Win32-GDI-Path.html:
   openFile: resource exhausted (Too many open files)

 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] lazy strings and parallel read

2008-09-19 Thread Manlio Perillo

Hi.

After having read
http://book.realworldhaskell.org/read/concurrent-and-multicore-programming.html#id676390

I have a doubt about Data.ByteString.Lazy.

Why getContents function don't use pread (or an emulation, if not 
available)?



Thanks  Manlio Perillo
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] ANNOUNCE: Extensible and Modular Generics for the Masses (EMGM) 0.1

2008-09-19 Thread Sean Leather
Extensible and Modular Generics for the Masses
==

Extensible and Modular Generics for the Masses (EMGM) is a library for
generic programming in Haskell using type classes.

This is the initial release of a maintained library for EMGM. Other versions
have previously existed in various states from various sources. We plan to
continue updating and maintaining this version.

Visit the home page:

  http://www.cs.uu.nl/wiki/GenericProgramming/EMGM

Features


The primary features of EMGM include:

*  Datatype-generic programming using sum-of-product views
*  Large collection of ready-to-use generic functions
*  Included support for standard datatypes: lists, Maybe, tuples
*  Easy to add support for new datatypes
*  Type classes make writing new functions straightforward in a structurally
inductive style
*  Generic functions are extensible with ad-hoc cases for arbitrary
datatypes
*  Good performance of generic functions

The features of this distribution include:

*  The API is thoroughly documented with Haddock
*  Fully tested with QuickCheck and HUnit
*  Program coverage ensures that all useful code has been touched by tests
*  Tested on both Mac and Windows systems

Requirements


EMGM has the following requirements:

*  GHC 6.8.1 - It has been tested with versions 6.8.3 and 6.9.20080916.
*  Cabal library 1.2.1 - It has been tested with versions 1.2.3 and 1.4.0.1.

Download  Source
-

Use caball-install:

  cabal install emgm

Get the package:

  http://hackage.haskell.org/cgi-bin/hackage-scripts/package/emgm

Get the source:

  svn checkout https://svn.cs.uu.nl:12443/repos/dgp-haskell/EMGM

Examples


Check out the examples:

  https://svn.cs.uu.nl:12443/viewvc/dgp-haskell/EMGM/examples/

Bugs  Support
--

Report issues or request features:

  http://code.google.com/p/emgm/

Discuss EMGM with the authors, maintainers, and other interested persons:

  http://www.haskell.org/mailman/listinfo/generics

Credits
---

The research for EMGM originated with Ralf Hinze. It was extended with work
by Bruno Oliveira and Andres Löh. More details of the library functionality
were explored by Alexey Rodriguez. We are very grateful to all of these
people for the foundation on which this library was built.

The current authors and maintainers of EMGM are:

*  Sean Leather
*  José Pedro Magalhães
*  Alexey Rodriguez
*  Andres Löh
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] lazy strings and parallel read

2008-09-19 Thread Don Stewart
manlio_perillo:
 Hi.
 
 After having read
 http://book.realworldhaskell.org/read/concurrent-and-multicore-programming.html#id676390
 
 I have a doubt about Data.ByteString.Lazy.
 
 Why getContents function don't use pread (or an emulation, if not 
 available)?

Why would it?

-- Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] performance of map reduce

2008-09-19 Thread Manlio Perillo

Hi again.

In 
http://book.realworldhaskell.org/read/concurrent-and-multicore-programming.html#id676390

there is a map reduce based log parser.

I have written an alternative version:
http://paste.pocoo.org/show/85699/

but, with a file of 315 MB, I have [1]:

1) map reduce implementation, non parallel
real0m6.643s
user0m6.252s
sys 0m0.212s

2) map reduce implementation, parallel with 2 cores
real0m3.840s
user0m6.384s
sys 0m0.652s

3) my implementation
real0m8.121s
user0m7.804s
sys 0m0.216s



What is the reason of the map reduce implementation being faster, even 
if not parallelized?


It is possible to implement a map reduce version that can handle gzipped 
log files?



[1] These tests does not consider the first run.
For the first run (no data in OS cache), I have (not verified):

1) map reduce implementation, parallel with 2 cores
real0m3.735s
user0m6.328s
sys 0m0.604s

2) my implementation
real0m13.659s
user0m7.712s
sys 0m0.360s




Thanks   Manlio Perillo
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] lazy strings and parallel read

2008-09-19 Thread Manlio Perillo

Don Stewart ha scritto:

manlio_perillo:

Hi.

After having read
http://book.realworldhaskell.org/read/concurrent-and-multicore-programming.html#id676390

I have a doubt about Data.ByteString.Lazy.

Why getContents function don't use pread (or an emulation, if not 
available)?


Why would it?



So that you don't need to open the same file multiple time:
http://book.realworldhaskell.org/read/concurrent-and-multicore-programming.html#id677193




-- Don





Thanks   Manlio Perillo
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] lazy strings and parallel read

2008-09-19 Thread Manlio Perillo

Manlio Perillo ha scritto:

Hi.

After having read
http://book.realworldhaskell.org/read/concurrent-and-multicore-programming.html#id676390 



I have a doubt about Data.ByteString.Lazy.

Why getContents function don't use pread (or an emulation, if not 
available)?




A correction.
getContents should not use pread, so the question is:
why there is not a
  hParGetContents Handle - Integer - IO ByteString
function, where Integer is the absolute file offset where start to read?


Thanks  Manlio Perillo
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] lazy strings and parallel read

2008-09-19 Thread Duncan Coutts
On Fri, 2008-09-19 at 18:46 +0200, Manlio Perillo wrote:
 Don Stewart ha scritto:
  manlio_perillo:
  Hi.
 
  After having read
  http://book.realworldhaskell.org/read/concurrent-and-multicore-programming.html#id676390
 
  I have a doubt about Data.ByteString.Lazy.
 
  Why getContents function don't use pread (or an emulation, if not 
  available)?
  
  Why would it?
  
 
 So that you don't need to open the same file multiple time:
 http://book.realworldhaskell.org/read/concurrent-and-multicore-programming.html#id677193

We are constrained here by the semantics of Handle which requires
that getContents semi-close the Handle (which in turn is to make it
harder to shoot yourself in the foot while doing lazy IO).

Duncan

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] performance of map reduce

2008-09-19 Thread Don Stewart
manlio_perillo:
 Hi again.
 
 In 
 http://book.realworldhaskell.org/read/concurrent-and-multicore-programming.html#id676390
 there is a map reduce based log parser.
 
 I have written an alternative version:
 http://paste.pocoo.org/show/85699/
 
 but, with a file of 315 MB, I have [1]:
 
 1) map reduce implementation, non parallel
 real  0m6.643s
 user  0m6.252s
 sys   0m0.212s
 
 2) map reduce implementation, parallel with 2 cores
 real  0m3.840s
 user  0m6.384s
 sys   0m0.652s
 
 3) my implementation
 real  0m8.121s
 user  0m7.804s
 sys   0m0.216s
 
 
 
 What is the reason of the map reduce implementation being faster, even 
 if not parallelized?

Changes in how GC is utilised, or how optimisation works?
  
 It is possible to implement a map reduce version that can handle gzipped 
 log files?

Using the zlib binding on hackage.haskell.org, you can stream multiple
zlib decompression threads with lazy bytestrings, and combine the
results.

-- Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] lazy strings and parallel read

2008-09-19 Thread Manlio Perillo

Duncan Coutts ha scritto:

On Fri, 2008-09-19 at 18:46 +0200, Manlio Perillo wrote:

Don Stewart ha scritto:

manlio_perillo:

Hi.

After having read
http://book.realworldhaskell.org/read/concurrent-and-multicore-programming.html#id676390

I have a doubt about Data.ByteString.Lazy.

Why getContents function don't use pread (or an emulation, if not 
available)?

Why would it?


So that you don't need to open the same file multiple time:
http://book.realworldhaskell.org/read/concurrent-and-multicore-programming.html#id677193


We are constrained here by the semantics of Handle which requires
that getContents semi-close the Handle (which in turn is to make it
harder to shoot yourself in the foot while doing lazy IO).



But if every function that reads the data uses pread, then this should 
no more be a problem.


Or I'm missing some other thing?



Duncan




Manlio Perillo
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Lazy vs correct IO [Was: A round of golf]

2008-09-19 Thread Jonathan Cast
On Fri, 2008-09-19 at 16:30 +0100, Lennart Augustsson wrote:
 I agree that lazy IO is a can with some worms in it.  But it's not that 
 strange.
 The readFile operation is in the IO monad, so it has an effect on the world.
 This effect is not finished when readFile returns, and from the world
 point of view
 it's not entirely deterministic.

On operating systems that fail to maintain file system consistency.

Blaming an effect of an *operating system* misfeature on a *language*
feature is somewhat perverse.

jcc


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] lazy strings and parallel read

2008-09-19 Thread Duncan Coutts
On Fri, 2008-09-19 at 19:50 +0200, Manlio Perillo wrote:

 But if every function that reads the data uses pread, then this should 
 no more be a problem.
 
 Or I'm missing some other thing?

If you used something like pread instead of hGetContents then yes that
would not involve semi-closing a handle or doing lazy IO. Of course
pread only works for seekable Handles like files, not terminals, pipes,
sockets etc etc.

Duncan

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re[2]: [Haskell-cafe] performance of map reduce

2008-09-19 Thread Bulat Ziganshin
Hello Don,

Friday, September 19, 2008, 9:12:43 PM, you wrote:

 It is possible to implement a map reduce version that can handle gzipped
 log files?

 Using the zlib binding on hackage.haskell.org, you can stream multiple
 zlib decompression threads with lazy bytestrings, and combine the
 results.

for gzip decompression you need to decompress all the previous data,
so decompression of single file cannot be multithreaded


-- 
Best regards,
 Bulatmailto:[EMAIL PROTECTED]

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Lazy vs correct IO [Was: A round of golf]

2008-09-19 Thread David Menendez
On Fri, Sep 19, 2008 at 2:51 AM,  [EMAIL PROTECTED] wrote:

 Lennart Augustsson wrote

 main = do
name:_ - getArgs
file - readFile name
print $ length $ lines file

 Given the stance against top-level mutable variables, I have not
 expected to see this Lazy IO code. After all, what could be more against
 the spirit of Haskell than a `pure' function with observable side
 effects. With Lazy IO, one indeed has to choose between correctness
 and performance. The appearance of such code is especially strange
 after the evidence of deadlocks with Lazy IO, presented on this list
 less than a month ago. Let alone unpredictable resource usage and
 reliance on finalizers to close files (forgetting that GHC does not
 guarantee that finalizers will be run at all).

 Is there an alternative?

 -- Counting the lines in a file
 import IterateeM

 count_nl = liftI $ IE_cont (step 0)
  where
  step acc (Chunk str)  = liftI $ IE_cont (step $! acc + count str)
  step acc stream   = liftI $ IE_done acc stream
  count [] = 0
  count ('\n':str) = succ $! count str
  count (_:str) = count str

 main = do
   name:_ - getArgs
   IE_done counter _ - unIM $ enum_file name . enum_eof == count_nl
   print counter


 The function count_nl could have been in the library, but I'm a
 minimalist. It is written in a declarative rather than imperative
 style, and one easily sees what it does. The above code as well as the
 IterateeM library is Haskell98. It does not use any unsafe Haskell
 functions whatsoever.

Is the IterateeM library available on-line anywhere? I'm familiar
enough with your earlier work on enumerators that I can guess what
most of what that code is doing, but I'd like a better idea of what
== does.

-- 
Dave Menendez [EMAIL PROTECTED]
http://www.eyrie.org/~zednenem/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Hackage and HaXml situation

2008-09-19 Thread John Goerzen
Duncan Coutts wrote:
 On Thu, 2008-09-18 at 15:32 -0500, John Goerzen wrote:
 Basically for each package we have an optional suggested version
 constraint. This would be used in the hackage website to direct people

That would solve the problem nicely, I think.  Do you have an ETA for
this feature?


 to the 'current' version but most importantly it'd be used by
 cabal-install and other cabal - native package conversion tools. It
 wold be editable on the hackage website by the package author/maintainer
 and probably other people with the role of managing the hackage
 collection.
 
 Cases like HaXml-1.13 - 1.19 (or a future stable version) or old-time
 - time are also things that the platform might be able to help with in
 future by managing the transition in a more coherent way rather than
 what we have now where the transition is rather hap-hazard with half
 using one and half the other.
 
 Duncan
 
 

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Library design question

2008-09-19 Thread Andre Nathan
On Fri, 2008-09-19 at 10:35 +0200, Christian Maeder wrote:
 I agree. Duncan's version also looks more atomic to me,
[...]

OK, so what I have now is

  addVertex :: Int - a - Graph a b - Graph a b
  addVertex v l g = Graph adj (numVertices g + 1) (numEdges g)
where adj = Map.insert v (l, Map.empty) (adjacencies g)

  addEdge :: Int - Int - b - Graph a b - Graph a b
  addEdge v w l g = Graph adj (numVertices g) (numEdges g + 1)
where adj = Map.insert v (vl, ns') (adjacencies g)
  ns' = Map.insert w l ns
  (vl, ns) = fromJust $ Map.lookup v (adjacencies g)

Creating a random graph [G(n,p) model] the naive way:

  type RandomGraph a b = StateT (Graph a b) IO ()

  randomGraph :: Int - Double - IO (Graph Int Int)
  randomGraph n p = execStateT create Graph.empty
where create = mapM_ (uncurry $ createVertex p) vls
  vls= zip [1..n] (repeat 1)

  createVertex :: Double - Int - a - RandomGraph a Int
  createVertex p v l = do
modify (Graph.addVertex v l)
createEdges v p

  createEdges :: Int - Double - RandomGraph a Int
  createEdges n p = mapM_ (maybeAddEdges n) [1..n-1]
where maybeAddEdges v w = do
maybeAddEdge v w
maybeAddEdge w v
  maybeAddEdge v w = do
r - lift randomDouble
when (r  p) $ modify (addEdge v w 1)

  randomDouble :: IO Double
  randomDouble = randomIO

So, to reference another thread, does this make anyone cry? :)

Thanks a lot for the suggestions,
Andre

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Library design question

2008-09-19 Thread Andre Nathan
On Fri, 2008-09-19 at 09:51 +0200, apfelmus wrote:
 There's also Martin Erwig's functional graph library in  Data.Graph.Inductive 
  (
 fgl  on hackage).

I tried it some time ago, but for large graphs it has a very high memory
usage.

Best,
Andre

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] control-timeout with gtk

2008-09-19 Thread Thomas M. DuBuisson
On Fri, 2008-09-19 at 09:09 -0300, Marco Túlio Gontijo e Silva wrote:
 I added the NOINLINE annotations and even tried building with -fno-cse,
 but the result was the same.  Do you have any other suggestions?

A while ago I made a shim using control-event to provide the
control-timeout api in Control.Event.Timeout.  It performs worse than
control-timeout because it computes absolute times on each addTimeout
call, but might be fine for your needs.

If this is a problem with unsafePeformIO it won't be fixed by this
change - control-event uses the same hack to provide the control-timeout
API.  Alternatively, you could pass the timeout data structure as an
argument as expected by the Control.Event module.

http://hackage.haskell.org/cgi-bin/hackage-scripts/package/control-event

Tom

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Python's big challenges, Haskell's big advantages?

2008-09-19 Thread Manlio Perillo

Brandon S. Allbery KF8NH ha scritto:

On Sep 18, 2008, at 15:10 , Manlio Perillo wrote:
Allocation areas are per-CPU, not per-thread.  A Concurrent Haskell 
thread consists of a TSO (thread state object, currently 11 machine 
words), and a stack, which we currently start with 1KB and grow on 
demand.


How is this implemented?

I have seen some coroutine implementations in C, using functions from 
ucontext.h (or direct asm code), but all have the problem that the 
allocated stack is fixed.



That's because it's much easier to use a fixed stack.

There are two ways to handle a growable stack; both start with 
allocating each stack in a separate part of the address space with room 
to grow it downward.  The simpler way uses stack probes on function 
entry to detect impending stack overflow.  The harder (and less 
portable) one involves trapping page faults (segmentation violation on 
POSIX), enlarging the stack, and restarting the instruction that caused 
the trap; this requires fairly detailed knowledge of the CPU and the way 
signals or page faults are handled by the OS.  (There's also a hybrid 
which many POSIXish systems use, trapping the page fault specifically 
when running the stack probe; the probe is designed to be safe to either 
restart or ignore, so it can be handled more portably.)




What implementation is used in GHC?

Is this more easy to implement with a pure functional language like 
Haskell, or the same implementation can be used with a procedural 
language like C?




Thanks   Manlio
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Library design question

2008-09-19 Thread Daniel Fischer
Am Freitag, 19. September 2008 22:55 schrieb Andre Nathan:
 On Fri, 2008-09-19 at 10:35 +0200, Christian Maeder wrote:
  I agree. Duncan's version also looks more atomic to me,

 [...]

 OK, so what I have now is

   addVertex :: Int - a - Graph a b - Graph a b
   addVertex v l g = Graph adj (numVertices g + 1) (numEdges g)
 where adj = Map.insert v (l, Map.empty) (adjacencies g)

   addEdge :: Int - Int - b - Graph a b - Graph a b
   addEdge v w l g = Graph adj (numVertices g) (numEdges g + 1)
 where adj = Map.insert v (vl, ns') (adjacencies g)
   ns' = Map.insert w l ns
   (vl, ns) = fromJust $ Map.lookup v (adjacencies g)

 Creating a random graph [G(n,p) model] the naive way:

   type RandomGraph a b = StateT (Graph a b) IO ()

   randomGraph :: Int - Double - IO (Graph Int Int)
   randomGraph n p = execStateT create Graph.empty
 where create = mapM_ (uncurry $ createVertex p) vls
   vls= zip [1..n] (repeat 1)

   createVertex :: Double - Int - a - RandomGraph a Int
   createVertex p v l = do
 modify (Graph.addVertex v l)
 createEdges v p

   createEdges :: Int - Double - RandomGraph a Int
   createEdges n p = mapM_ (maybeAddEdges n) [1..n-1]
 where maybeAddEdges v w = do
 maybeAddEdge v w
 maybeAddEdge w v
   maybeAddEdge v w = do
 r - lift randomDouble
 when (r  p) $ modify (addEdge v w 1)

   randomDouble :: IO Double
   randomDouble = randomIO

 So, to reference another thread, does this make anyone cry? :)

Yes. What's IO gotta do with it?
It's much cleaner to pass the PRNG as an explicit argument (or what about
StateT (Graph a b) (State StdGen) ?).

And in addVertex/addEdge, it might be good to check whether the vertex/edge is 
already present.

 Thanks a lot for the suggestions,
 Andre


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] performance of map reduce

2008-09-19 Thread Manlio Perillo

Don Stewart ha scritto:

manlio_perillo:
[...]
It is possible to implement a map reduce version that can handle gzipped 
log files?


Using the zlib binding on hackage.haskell.org, you can stream multiple
zlib decompression threads with lazy bytestrings, and combine the
results.



This is a bit hard.
A deflate encoded stream contains multiple blocks, so you need to find 
the offset of each block and decompress it in parallel.

But then you need also to make sure each final block terminates with a '\n'.

And the zlib Haskell binding does not support this usage (I'm not even 
sure zlib support this).




By the way, this phrase:
We allow multiple threads to read different chunks at once by supplying 
each one with a distinct file handle, all reading the same file

here:
http://book.realworldhaskell.org/read/concurrent-and-multicore-programming.html#id677193

IMHO is not correct, or at least misleading.
Each block is read in the main thread, or at least myThreadId return 
always the same value.


This is also the reason why I don't understand why my version is slower 
then the book version.
The only difference is that the book version reads 4 chunks and my 
version only 1 big chunk.




-- Don




Thanks   Manlio
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] performance of map reduce

2008-09-19 Thread Bryan O'Sullivan
On Fri, Sep 19, 2008 at 2:31 PM, Manlio Perillo [EMAIL PROTECTED]wrote:


 By the way, this phrase:
 We allow multiple threads to read different chunks at once by supplying
 each one with a distinct file handle, all reading the same file
 here:

 http://book.realworldhaskell.org/read/concurrent-and-multicore-programming.html#id677193

 IMHO is not correct, or at least misleading.


It's both correct and, er, leading. The files are opened in a single thread,
and then the file handles are read by multiple threads.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Python's big challenges, Haskell's big advantages?

2008-09-19 Thread Brandon S. Allbery KF8NH

On 2008 Sep 19, at 17:14, Manlio Perillo wrote:

Brandon S. Allbery KF8NH ha scritto:
There are two ways to handle a growable stack; both start with  
allocating each stack in a separate part of the address space with  
room to grow it downward.  The simpler way uses stack probes on  
function entry to detect impending stack overflow.  The harder (and  
less portable) one involves trapping page faults (segmentation  
violation on POSIX), enlarging the stack, and restarting the  
instruction that caused the trap; this requires fairly detailed  
knowledge of the CPU and the way signals or page faults are handled  
by the OS.  (There's also a hybrid which many POSIXish systems use,  
trapping the page fault specifically when running the stack probe;  
the probe is designed to be safe to either restart or ignore, so it  
can be handled more portably.)


What implementation is used in GHC?


I haven't looked at the GHC implementation.

Is this more easy to implement with a pure functional language like  
Haskell, or the same implementation can be used with a procedural  
language like C?



You can use it with pretty much any language, as long as you can limit  
the size of stack frames.  (If a stack frame is larger than the stack  
probe distance you might just get an unhandled page fault.)  The main  
question is whether you ant to absorb the additional complexity; it's  
a bit harder to debug memory issues when you have to deal with page  
faults yourself.  (A *real* segmentation violation might be hidden by  
the stack grow code.)


--
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] [EMAIL PROTECTED]
system administrator [openafs,heimdal,too many hats] [EMAIL PROTECTED]
electrical and computer engineering, carnegie mellon universityKF8NH


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] performance of map reduce

2008-09-19 Thread Manlio Perillo

Bryan O'Sullivan ha scritto:
On Fri, Sep 19, 2008 at 2:31 PM, Manlio Perillo 
[EMAIL PROTECTED] mailto:[EMAIL PROTECTED] wrote:



By the way, this phrase:
We allow multiple threads to read different chunks at once by
supplying each one with a distinct file handle, all reading the same
file
here:

http://book.realworldhaskell.org/read/concurrent-and-multicore-programming.html#id677193

IMHO is not correct, or at least misleading.


It's both correct and, er, leading. The files are opened in a single 
thread, and then the file handles are read by multiple threads.


Ah, right, thanks.
They are read with ByteString.Lazy.



Manlio Perillo
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Parsing arguments and reading configuration

2008-09-19 Thread Magnus Therning
Hi all,

I'm looking for some inspiration for an elegant solution to a silly
little problem I have.  This might have a general well-known solution,
or maybe there's something particularly elegant possible in Haskell.  I
just thought I'd ask.

When writing a command line tool I want to use a configuration file and
then have the ability to override the configuration using command line
arguments.  When I've worked with command line arguments before I've
used the trick of folding (=) over a list of functions that modify the
members of a type, using the default values as the starting point.  I
like that, it's cute.

First I thought I'd treat the configuration in a similar way, but then I
noticed a slight ordering problem.  The command line arguments should
take priority over the contents of the configuration file, but the
location of the configuration can be given as an argument.  I could read
the arguments twice, first to get the correct location of the config
file, then load the config, and then read the arguments again to make
sure they take priority.  But that feels a little silly.  Are there any
more elegant solutions people are using?

/M

-- 
Magnus Therning (OpenPGP: 0xAB4DFBA4)
magnus@therning.org Jabber: magnus@therning.org
http://therning.org/magnus

Haskell is an even 'redder' pill than Lisp or Scheme.
 -- PaulPotts




signature.asc
Description: OpenPGP digital signature
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANNOUNCE: Extensible and Modular Generics for the Masses (EMGM) 0.1

2008-09-19 Thread Don Stewart
leather:
Extensible and Modular Generics for the Masses
==
 
Extensible and Modular Generics for the Masses (EMGM) is a library for
generic programming in Haskell using type classes.
 
This is the initial release of a maintained library for EMGM. Other
versions have previously existed in various states from various sources.
We plan to continue updating and maintaining this version.
 
Visit the home page:
 
  [1]http://www.cs.uu.nl/wiki/GenericProgramming/EMGM
 

Now in Arch Linux,

http://aur.archlinux.org/packages.php?ID=20070

Come on Debian!

-- Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Library design question

2008-09-19 Thread Andre Nathan
On Fri, 2008-09-19 at 23:16 +0200, Daniel Fischer wrote:
 Yes. What's IO gotta do with it?

I did it because of randomIO :(

 (or what about StateT (Graph a b) (State StdGen) ?).

Now there's something I wouldn't have thought of... I changed the
RandomGraph type to

  type RandomGraph a b = StateT (Graph a b) (State StdGen) ()

and randomFloat to

  randomDouble :: State StdGen Double
  randomDouble = State random

and randomGraph to

  randomGraph :: StdGen - Int - Double - Graph Int Int
  randomGraph gen n p = evalState (execStateT create Graph.empty) gen
where create = mapM_ (uncurry $ createVertex p) vls
  vls= zip [1..n] (repeat 42)

However, when I try to create a graph with 1000 vertices I get a stack
overflow, which didn't happen in the IO version. Any idea why that happens?

Thanks,
Andre

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Parsing arguments and reading configuration

2008-09-19 Thread Derek Elkins
On Fri, 2008-09-19 at 23:24 +0100, Magnus Therning wrote:
 Hi all,
 
 I'm looking for some inspiration for an elegant solution to a silly
 little problem I have.  This might have a general well-known solution,
 or maybe there's something particularly elegant possible in Haskell.  I
 just thought I'd ask.
 
 When writing a command line tool I want to use a configuration file and
 then have the ability to override the configuration using command line
 arguments.  When I've worked with command line arguments before I've
 used the trick of folding (=) over a list of functions that modify the
 members of a type, using the default values as the starting point.  I
 like that, it's cute.
 
 First I thought I'd treat the configuration in a similar way, but then I
 noticed a slight ordering problem.  The command line arguments should
 take priority over the contents of the configuration file, but the
 location of the configuration can be given as an argument.  I could read
 the arguments twice, first to get the correct location of the config
 file, then load the config, and then read the arguments again to make
 sure they take priority.  But that feels a little silly.  Are there any
 more elegant solutions people are using?

You could build a monoid,
data Option a = Unspecified | Default a | Config a | CommandLine a

With Unspecified being the identity and the multiplication being
Default a * Config b = Config b
Default a * CommandLine b = CommandLine b
Config a * CommandLine b = CommandLine b
and symmetrically and break ties to the right e.g.
CommandLine a * CommandLine b = CommandLine b


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Parsing arguments and reading configuration

2008-09-19 Thread Antoine Latter
2008/9/19 Magnus Therning [EMAIL PROTECTED]:

 First I thought I'd treat the configuration in a similar way, but then I
 noticed a slight ordering problem.  The command line arguments should
 take priority over the contents of the configuration file, but the
 location of the configuration can be given as an argument.  I could read
 the arguments twice, first to get the correct location of the config
 file, then load the config, and then read the arguments again to make
 sure they take priority.  But that feels a little silly.  Are there any
 more elegant solutions people are using?

I'm not sure how well it would hold up under maintenance, but you coud
have a config sum-type which is itself a monoid, and then create two
of them:

 data UserConfig = UserConfig
   { item1 :: Maybe Type1
   , item2 :: Maybe Type2
   , configFileLocation :: Maybe FilePath
   }

 instance Monoid UserConfig where
  {- not shown -}

 buildConfig :: IO UserConfig
 buildConfig = do
   cmdLineCfg - buildConfigFromCmdLine
   fileCfg - maybe (return mempty) buildConfigFromFile (configFileLocation 
 cmdLineCfg)

   return $ fileCfg `mappend` cmdLineCfg

 -- mappend is assumed to be left-biased

 buildConfigFromCmdLine :: IO UserConfig
 buildConfigFromFile :: FilePath - IO UserConfig

Does that make sense?  or is it too complicated?

-Antoine
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Parsing arguments and reading configuration

2008-09-19 Thread Antoine Latter
On Fri, Sep 19, 2008 at 7:35 PM, Antoine Latter [EMAIL PROTECTED] wrote:
 I'm not sure how well it would hold up under maintenance, but you coud
 have a config sum-type which is itself a monoid, and then create two
 of them:


And by sum-type I mean product type.  Sheesh.

Although having your config options in a sum-type packed into a Set,
which is itself a Monoid is another option.  Then you get 'mempty' and
'mappend' for free.

I think I saw a blog-post or something detailing this, but I don't
have a book-mark.

-Antoine
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Ropes

2008-09-19 Thread Rafael Gustavo da Cunha Pereira Pinto
Hi all,

Is there any implementation of the rope data structure in Haskell?

I couldn't find any on Hackage, and I am intending to implement it.

Regards,

Rafael Gustavo da Cunha Pereira Pinto
Electronic Engineer, MSc.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANNOUNCE: citeproc-hs, a Haskell implementation of the Citation Style Language designed for Pandoc

2008-09-19 Thread Gwern Branwen
On Sat, Sep 13, 2008 at 1:33 PM, Andrea Rossato
[EMAIL PROTECTED] wrote:
 Hello,

 I'm happy to announce the first release of citeproc-hs, a Haskell
 implementation of the Citation Style Language.

 citeproc-hs adds to Pandoc, the famous Haskell text processing tool, a
 Bibtex like citation and bibliographic formatting and generation
 facility.
...
 John MacFarlane, the author of Pandoc, has been very supportive of the
 project and provided a lot of useful feed back, comments and
 suggestions.

 Hope you'll enjoy,
 Andrea Rossato

Hi Andrea. So I was looking at the README. Does citeproc-hs only
support adding refs from a .xml file when one's written in Pandoc
markdown? That is, I don't see how I could take a .lhs file and a .bib
file and produce one of the Pandoc-supported outputs. In particular,
I'd really like to be able to go .lhs - .wiki, with refs; this would
let me convert The Monad Reader articles for haskell.org.

--
gwern
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: XML (HXML) parsing :: GHC 6.8.3 space leak from 2000

2008-09-19 Thread Lev Walkin

Lev Walkin wrote:

Simon Marlow wrote:

Lev Walkin wrote:


I wondered why would a contemporary GHC 6.8.3 exhibit such a leak?
After all, the technique was known in 2000 (and afir by Wadler in '87)
and one would assume Joe English's reference to most other Haskell
systems ought to mean GHC.


Thanks for this nice example - Don Stewart pointed me to it, and  
Simon PJ and I just spent some time this morning diagnosing it.


Incedentally, with GHC 6.8 you can just run the program with +RTS 
-hT to get a basic space profile, there's no need to compile it for 
profiling - this is tremendously useful for quick profiling jobs.  And 
in this case we see the the heap is filling up with (:) and Tree 
constructors, no thunks.


Here's the short story:  GHC does have the space leak optimisation you 
refer to, and it is working correctly, but it doesn't cover all the 
cases you might want it to cover.  In particular, optimisations 
sometimes interact badly with the space leak avoidance, and that's 
what is happening here.  We've known about the problem for some time, 
but this is the first time I've seen a nice small example that 
demonstrates it.



-- Lazily build a tree out of a sequence of tree-building events
build :: [TreeEvent] - ([UnconsumedEvent], [Tree String])
build (Start str : es) =
let (es', subnodes) = build es
(spill, siblings) = build es'
in (spill, (Tree str subnodes : siblings))
build (Leaf str : es) =
let (spill, siblings) = build es
in (spill, Tree str [] : siblings)
build (Stop : es) = (es, [])
build [] = ([], [])


[skip]

We don't know of a good way to fix this problem.  I'm going to record 
this example in a ticket for future reference, though.


Simon,

is there a way, perhaps, to rewrite this expression to avoid leaks?
An ad-hoc will do, perhaps split in two modules to avoid intramodular
optimizations?


Tried to avoid this misoptimization by using explicit fst, and
it worked on my synthesized input (probably benefiting of CSE):

build :: [TreeEvent] - ([UnconsumedEvent], [Tree String])
build (Start str : es) =
let (_, subnodes) = build es
(spill, siblings) = build . fst . build $ es
in (spill, (Tree str subnodes : siblings))
build (Leaf str : es) =
let (spill, siblings) = build es
in (spill, Tree str [] : siblings)
build (Stop : es) = (es, [])
build [] = ([], [])

However, while this solution works on a synthesized input (cycle [...]),
it still has memory leak when taken into HXML environment which
operates on files (why?).

Only when I also added Ketil Malde's `par` based hack I finally
was able to parse the big XML file without a space leak. Here's
the diff to HXML 0.2:

==
--- TreeBuild.hs.old2008-09-19 17:01:30.0 -0700
+++ TreeBuild.hs2008-09-19 17:04:15.0 -0700
@@ -20,6 +20,7 @@
 import XMLParse
 import XML
 import Tree
+import Control.Parallel

 --
 -- TODO: add basic error-checks: matching end-tags, ensure input exhausted
@@ -43,8 +44,9 @@
addTree t es= let (s,es') = build es in pair (cons t s) es'
build []= pair nil []
build (e:es) = case e of
-   StartEvent gi atts  - let (c,es') = build es
-  in addNode (ELNode gi atts) c es'
+   StartEvent gi atts  - let (c, es') = build es
+  sbl = build . snd . build $ es
+  in sbl `par` (cons (tree (ELNode gi atts) c) 
(fst sbl), snd sbl)
EndEvent _  - pair nil es
EmptyEvent gi atts  - addLeaf (ELNode gi atts) es
TextEvent s - addLeaf (TXNode s) es
===

With that, a 45 mb XML is parsed in constant space in

G4 1.5GHz: 1 minute 48 seconds, taking 16 mb RAM
Pentium D 2x3.0GHz: 12 seconds, taking 9 mb RAM

Compared to 0.2s `wc -l`.

If you
  * remove `par` from there or
  * replace (build . snd . build $ es) with just (es') or
  * forget to specify -threaded (-smp) during ghc compilation
then the space leak will exhibit itself again.

However, removing -threaded will still make this code run without leak
on synthesized input (StartEvent  [] : cycle [TextEvent ]).

I believe there's a way to get rid of `par`, perhaps by wrapping
this tree building thing into a optimization-unfriendly monad?
But I don't know how to approach this. Any help?

--
Lev Walkin
[EMAIL PROTECTED]
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Ropes

2008-09-19 Thread Ryan Ingram
I think Data.Sequence uses fingertrees which are pretty fast.

I used a handgrown rope-like structure for ICFPC07 but I wish I had
known about Sequence; it would have likely just been better.

  -- ryan

2008/9/19 Rafael Gustavo da Cunha Pereira Pinto [EMAIL PROTECTED]:
 Hi all,

 Is there any implementation of the rope data structure in Haskell?

 I couldn't find any on Hackage, and I am intending to implement it.

 Regards,

 Rafael Gustavo da Cunha Pereira Pinto
 Electronic Engineer, MSc.

 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Ropes

2008-09-19 Thread Don Stewart
RafaelGCPP.Linux:
Hi all,
 
Is there any implementation of the rope data structure in Haskell?
 
I couldn't find any on Hackage, and I am intending to implement it.

There's no mature rope implementation. Can you write a bytestring-rope
that outperforms lazy bytestrings please :)
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe