Re: [Haskell-cafe] strict version of Haskell - does it exist?

2012-02-02 Thread Albert Y. C. Lai

http://www.vex.net/~trebla/haskell/lazy.xhtml

It is half done.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] strict version of Haskell - does it exist?

2012-02-01 Thread Ryan Newton


 Even though they are advertised as parallel programming tools, parMap and
 other functions that work in parallel over *sequential*-access data
 structures (i.e. linked lists) are a poor fit. We want flat, strict,
 unpacked data structures to get good performance out of parallel
 algorithms. DPH, repa, and even vector show the way.


You would think that tree data structures would be good here as well.  For
example, monad-par includes a definition of an append-based AList (like
Guy Steele argues for).

But alas, that turns out to be much harder to get working well.  For most
algorithms, Vectors end up better.

   -Ryan


Re: [Haskell-cafe] strict version of Haskell - does it exist?

2012-01-31 Thread Marc Weber

Using insertWith' gets the time down to 30-40 secs (thus only being 3-4
times slower than PHP).
PHP still is at 13 secs, does not require installing libraries, does
not require compilation, and is trivial to write.

A trivial C++ application takes 11-12 secs and, even with some googling,
was trivial to write.

Excerpts from Felipe Almeida Lessa's message of Mon Jan 30 17:36:46 +0100 2012:
 Then please take a deeper look into my code.  What you said that
 you've tried is something else.
I didn't say that I tried your code. I gave the enumerator package a try
counting lines, which I expected to behave similarly to conduit
because both serve a similar purpose.
Then I hit the "sourceFile returns chunked lines" issue (reported
it, got fixed).

Anyway: my log files contain one JSON dictionary per line:

  { "id" : "foo", ... }
  { "id" : "bar", ... }

Now how do I use the conduit package to split a chunked file into lines?
Or should I create a new parser, many (json <* newline)?

Except that I think my processJson for this test should look like this,
because I want to count how often the clients queried the server.
Probably I should also be using CL.fold as shown in the test cases of
conduit. If you tell me how you'd cope with the "one json dict on each
line" issue I'll try to benchmark this solution as well.


-- probably existing library functions can be used here ..
processJson :: M.Map T.Text Int -> Value -> M.Map T.Text Int
processJson m value = case value of
  Ae.Object hash_map ->
    case HMS.lookup (T.pack "id") hash_map of
      Just id_o ->
        case id_o of
          Ae.String id -> M.insertWith' (+) id 1 m
          _ -> m
      _ -> m
  _ -> m

Marc Weber



Re: [Haskell-cafe] strict version of Haskell - does it exist?

2012-01-31 Thread Felipe Almeida Lessa
On Tue, Jan 31, 2012 at 6:05 AM, Marc Weber marco-owe...@gmx.de wrote:
 I didn't say that I tried your code. I gave the enumerator package a try
 counting lines, which I expected to behave similarly to conduit
 because both serve a similar purpose.
 Then I hit the "sourceFile returns chunked lines" issue (reported
 it, got fixed).

 Anyway: my log files contain one JSON dictionary per line:

  { "id" : "foo", ... }
  { "id" : "bar", ... }

 Now how do I use the conduit package to split a chunked file into lines?
 Or should I create a new parser, many (json <* newline)?

Currently there are two solutions.  The first one is what I wrote
earlier on this thread:

 jsonLines :: C.Resource m => C.Conduit B.ByteString m Value
 jsonLines = C.sequenceSink () $ do
   val <- CA.sinkParser json'
   CB.dropWhile isSpace_w8
   return $ C.Emit () [val]

This conduit will run the json' parser (from aeson) and then drop any
whitespace after that.  Note that it will correctly parse all of your
files but will also parse some files that don't conform to your
specification.  I assume that's fine.



The other solution is going to be released with conduit 0.2, probably
today.  There's a lines conduit that splits the file into lines, so
you could write jsonLines above as:

 mapJson :: C.Resource m => C.Conduit B.ByteString m Value
 mapJson = C.sequenceSink () $ do
   val <- CA.sinkParser json'
   return $ C.Emit () [val]

which doesn't need to care about newlines, and then change main to

 main = do
   ...
   ret <- forM fileList $ \fp ->
     C.runResourceT $
       CB.sourceFile fp C.$=
       CB.lines C.$=  -- new line is here
       mapJson C.$=
       CL.mapM processJson C.$$
       CL.consume
   print ret


I don't know which solution would be faster.  Either way, both
solutions will probably be faster with the new conduit 0.2.


 Except that I think my processJson for this test should look like this,
 because I want to count how often the clients queried the server.
 Probably I should also be using CL.fold as shown in the test cases of
 conduit. If you tell me how you'd cope with the "one json dict on each
 line" issue I'll try to benchmark this solution as well.

This issue was already dealt with in my previous e-mail =).

 -- probably existing library functions can be used here ..
 processJson :: M.Map T.Text Int -> Value -> M.Map T.Text Int
 processJson m value = case value of
                          Ae.Object hash_map ->
                            case HMS.lookup (T.pack "id") hash_map of
                              Just id_o ->
                                case id_o of
                                  Ae.String id -> M.insertWith' (+) id 1 m
                                  _ -> m
                              _ -> m
                          _ -> m

Looks like the perfect job for CL.fold.  Just change those last three
lines in main from

  ... C.$=
  CL.mapM processJson C.$$
  CL.consume

into

  ... C.$$
  CL.fold processJson

and you should be ready to go.

Cheers!

-- 
Felipe.



Re: [Haskell-cafe] strict version of Haskell - does it exist?

2012-01-31 Thread Marc Weber
  jsonLines :: C.Resource m => C.Conduit B.ByteString m Value
  jsonLines = C.sequenceSink () $ do
    val <- CA.sinkParser json'
    CB.dropWhile isSpace_w8
    return $ C.Emit () [val]

Adding a \state -> (the way Felipe Lessa told me) makes it work, and
it runs in about 20 secs, even though some conduit overhead is likely
to take place.

Omitting my custom data type and operating on Aeson's Value with
bytestrings reduces the running time to 16 secs.

PHP/C++ still wins: less than 12secs.

Now I can imagine again that on a desktop multi-core system this could be
faster than a single-threaded C application.

Thanks for your help. Maybe I can set up profiling again to understand
why it's still taking a little bit more time.

Marc Weber



Re: [Haskell-cafe] strict version of Haskell - does it exist?

2012-01-31 Thread Felipe Almeida Lessa
On Tue, Jan 31, 2012 at 1:36 PM, Marc Weber marco-owe...@gmx.de wrote:
 Adding a \state -> (the way Felipe Lessa told me) makes it work, and
 it runs in about 20 secs, even though some conduit overhead is likely
 to take place.

Just out of curiosity: did you use conduit 0.1 or 0.2?

Cheers! =)

-- 
Felipe.



Re: [Haskell-cafe] strict version of Haskell - does it exist?

2012-01-31 Thread Marc Weber
Excerpts from Felipe Almeida Lessa's message of Tue Jan 31 16:49:52 +0100 2012:
 Just out of curiosity: did you use conduit 0.1 or 0.2?
I updated to 0.2 today because I was looking for a monad instance for
SequenceSink - but didn't find it because I tried using it the wrong way
(\state -> , see last mail).

I also tried json' vs json (strict and non strict versions) - didn't
seem to make a big difference.

Marc Weber



Re: [Haskell-cafe] strict version of Haskell - does it exist?

2012-01-31 Thread Steve Severance
Hi Everyone,
  I had a similar experience with a similar type of problem. The
application was analyzing web pages that our web crawler had collected
(not the pages themselves, but metadata about when each page was
collected).

The basic query was:

SELECT
  Domain, Date, COUNT(*)
FROM
  Pages
GROUP BY
  Domain, Date
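For readers following along in Haskell, the aggregation step of that query can be sketched as a strict left fold building a count per (domain, date) key; the sample rows below are made up for illustration.

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as M

-- GROUP BY (Domain, Date) with COUNT(*): one strict map insert per row.
groupCount :: [(String, String)] -> M.Map (String, String) Int
groupCount = foldl' (\m k -> M.insertWith (+) k 1 m) M.empty

main :: IO ()
main = print (M.toList (groupCount
  [ ("a.com", "2012-01-30")
  , ("a.com", "2012-01-30")
  , ("b.org", "2012-01-31")
  ]))
```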

The webpage data was split across tens of thousands of compressed binary
files. I used enumerator to load these files and select the appropriate
columns. This step was performed in parallel using parMap and worked fine
once I figured out how to add the appropriate !s.

The second step was the group by. I built some tools on top of monad-par that
had the normal higher-level operators like map, groupBy, filter, etc. The
typical pattern I followed was the map-reduce style pattern used in
monad-par. I was hoping to someday share this work, although I have since
abandoned it.

It took me a couple of weeks to get the strictness mostly right. I say
mostly because it still randomly blows up: if I feed in a single
40kb file, maybe 1 time in 10 it consumes all the memory on the machine in a
few seconds. There is obviously a laziness bug in there somewhere. After
working on it for a few days and failing to come up with a solid
repro case, I eventually rebuilt all the web page analysis tools in Scala, in
large part because I did not see a way forward and needed to tie off that
work and move on.

My observations:
Combining laziness and parallelism made it very difficult to reason about
what was going on. Test cases became non-deterministic, not in terms of
output in the success case, but in whether they ran at all.

The tooling around laziness does not give enough information for
debugging complex problems. Because of this, when people ask "Is Haskell
good for parallel development?" I tell them the answer is complicated.
Haskell has excellent primitives for parallel development, like STM,
which I love, but it lacks a PLINQ-like toolkit that is fully built out to
enable flexible parallel data processing.

The other thing is that deepseq is very important. IMHO this needs to be a
first-class language feature, with all major libraries shipping with deepseq
instances. There seems to have been some movement on this front, but you
can't do serious parallel development without it.
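As a small illustration of the deepseq point, here is a hedged sketch using force from the deepseq package: fully evaluate a structure up front so no hidden thunks survive into a later parallel phase. The data is made up; only the pattern matters.

```haskell
import Control.DeepSeq (force)
import Control.Exception (evaluate)

-- force walks the whole structure; evaluate pins that down to "now",
-- so the list handed onward contains no unevaluated (* 2) thunks.
main :: IO ()
main = do
  xs <- evaluate (force (map (* 2) [1 .. 5 :: Int]))
  print xs  -- [2,4,6,8,10]
```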

Some ideas for things that might help: a plugin for vim that showed
the level of strictness of operations and data. I am going to take another
crack at a PLINQ-like library with GHC 7.4.1 in the next couple of months,
using the debug symbols that Peter has been working on.

Conclusion:
Haskell was the wrong platform for doing webpage analysis anyhow, not
because anything is wrong with the language but simply because it does not
have the tooling that the JVM does. I moved all my work into Hadoop to take
advantage of multi-machine parallelism and higher-level tools like Hive.
There might be a future in building Haskell code that could be translated
into a Hive query.

With better tools I think that Haskell can become the go-to language for
developing highly parallel software. We just need the tools to help
developers better understand the laziness of their software. There also
seems to be a documentation gap on developing data analysis or data
transformation pipelines in Haskell.

Sorry for the length. I hope my experience is useful to someone.

Steve

On Tue, Jan 31, 2012 at 7:57 AM, Marc Weber marco-owe...@gmx.de wrote:

 Excerpts from Felipe Almeida Lessa's message of Tue Jan 31 16:49:52 +0100
 2012:
  Just out of curiosity: did you use conduit 0.1 or 0.2?
 I updated to 0.2 today because I was looking for a monad instance for
 SequenceSink - but didn't find it because I tried using it the wrong way
 (\state -> , see last mail).

 I also tried json' vs json (strict and non strict versions) - didn't
 seem to make a big difference.

 Marc Weber

 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe



Re: [Haskell-cafe] strict version of Haskell - does it exist?

2012-01-31 Thread Gregory Collins
On Tue, Jan 31, 2012 at 9:19 PM, Steve Severance
ssevera...@alphaheavy.com wrote:

 The other thing is that deepseq is very important . IMHO this needs to be
 a first class language feature with all major libraries shipping with
 deepseq instances. There seems to have been some movement on this front but
 you can't do serious parallel development without it.


I completely agree on the first part, but deepseq is not a panacea either.
It's a big hammer and overuse can sometimes cause wasteful O(n) no-op
traversals of already-forced data structures. I also definitely wouldn't go
so far as to say that you can't do serious parallel development without it!

The only real solution to problems like these is a thorough understanding
of Haskell's evaluation order, and how and why call-by-need is different
than call-by-value. This is both a pedagogical problem and genuinely hard
-- even Haskell experts like the guys at GHC HQ sometimes spend a lot of
time chasing down space leaks. Haskell makes a trade-off here; reasoning
about denotational semantics is much easier than in most other languages
because of purity, but non-strict evaluation makes reasoning about
operational semantics a little bit harder.

In domains where you care a lot about operational semantics (like parallel
and concurrent programming, where it's absolutely critical), programmers
necessarily require a lot more experience and knowledge in order to be
effective in Haskell.

G
-- 
Gregory Collins g...@gregorycollins.net


Re: [Haskell-cafe] strict version of Haskell - does it exist?

2012-01-31 Thread Johan Tibell
On Tue, Jan 31, 2012 at 1:22 PM, Gregory Collins
g...@gregorycollins.net wrote:
 I completely agree on the first part, but deepseq is not a panacea either.
 It's a big hammer and overuse can sometimes cause wasteful O(n) no-op
 traversals of already-forced data structures. I also definitely wouldn't go
 so far as to say that you can't do serious parallel development without it!

I agree. The only time I ever use deepseq is in Criterion benchmarks,
as it's a convenient way to make sure that the input data is evaluated
before the benchmark starts. If you want a data structure to be fully
evaluated, evaluate it as it's created, not after the fact.
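Johan's "evaluate it as it's created" advice can be illustrated with strict record fields, which force values at construction time so the finished structure never hides thunks. Stats and addSample are hypothetical names for illustration.

```haskell
import Data.List (foldl')

-- The (!) annotations force both counters whenever a Stats is built,
-- so the result of the fold is already fully evaluated: no deepseq needed.
data Stats = Stats { count :: !Int, total :: !Int } deriving Show

addSample :: Stats -> Int -> Stats
addSample (Stats c t) x = Stats (c + 1) (t + x)

main :: IO ()
main = print (foldl' addSample (Stats 0 0) [1 .. 1000])
```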

 The only real solution to problems like these is a thorough understanding of
 Haskell's evaluation order, and how and why call-by-need is different than
 call-by-value. This is both a pedagogical problem and genuinely hard -- even
 Haskell experts like the guys at GHC HQ sometimes spend a lot of time
 chasing down space leaks. Haskell makes a trade-off here; reasoning about
 denotational semantics is much easier than in most other languages because
 of purity, but non-strict evaluation makes reasoning about operational
 semantics a little bit harder.

+1

We can do a much better job at teaching how to reason about
performance. A few rules of thumb get you a long way. I'm (slowly)
working on improving the state of affairs here.

-- Johan



Re: [Haskell-cafe] strict version of Haskell - does it exist?

2012-01-31 Thread Johan Tibell
On Tue, Jan 31, 2012 at 12:19 PM, Steve Severance
ssevera...@alphaheavy.com wrote:
 The webpage data was split across tens of thousands of compressed binary
 files. I used enumerator to load these files and select the appropriate
 columns. This step was performed in parallel using parMap and worked fine
 once I figured out how to add the appropriate !s.

Even though they are advertised as parallel programming tools, parMap and
other functions that work in parallel over *sequential*-access data
structures (i.e. linked lists) are a poor fit. We want flat, strict,
unpacked data structures to get good performance out of parallel
algorithms. DPH, repa, and even vector show the way.
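For illustration, a minimal parMap over a list can be sketched with par and pseq from base's GHC.Conc (the real parMap lives in the parallel package). Note how a spark is created per element while still walking the sequential spine one cons cell at a time, which is exactly the limitation described above.

```haskell
import GHC.Conc (par, pseq)

-- Spark f x in parallel with the recursion down the rest of the list.
-- The list spine itself is still traversed sequentially.
parMap' :: (a -> b) -> [a] -> [b]
parMap' _ []       = []
parMap' f (x : xs) =
  let y  = f x
      ys = parMap' f xs
  in y `par` (ys `pseq` (y : ys))

main :: IO ()
main = print (parMap' (* 2) [1 .. 5 :: Int])  -- [2,4,6,8,10]
```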

-- Johan



Re: [Haskell-cafe] strict version of Haskell - does it exist?

2012-01-30 Thread Herbert Valerio Riedel
On Sun, 2012-01-29 at 23:47 +0100, Marc Weber wrote:
 So maybe also the JSON parsing library kept too
 many unevaluated things in memory. So I could start either writing my
 own JSON parsing library (being more strict)

Jfyi, aeson gained strict parser variants json' and value' [1]
some time ago, so you shouldn't have to write your own stricter JSON
parsing library...

 [1]: 
http://hackage.haskell.org/packages/archive/aeson/0.6.0.0/doc/html/Data-Aeson-Parser.html#g:2




Re: [Haskell-cafe] strict version of Haskell - does it exist?

2012-01-30 Thread Felipe Almeida Lessa
On Mon, Jan 30, 2012 at 6:21 AM, Herbert Valerio Riedel h...@gnu.org wrote:
 On Sun, 2012-01-29 at 23:47 +0100, Marc Weber wrote:
 So maybe also the JSON parsing library kept too
 many unevaluated things in memory. So I could start either writing my
 own JSON parsing library (being more strict)

 Jfyi, aeson gained strict parser variants json' and value'
 some time ago, so you shouldn't have to write your own stricter JSON
 parsing library...

Also, besides using those variants, you may also use the
attoparsec-conduit library [1].  If you have

  processJson :: Value -> IO X

then you'd need just something like

  import Data.Aeson (Value, json')
  import Data.Attoparsec.Char8 (isSpace_w8)
  import qualified Data.ByteString as B
  import qualified Data.Conduit as C
  import qualified Data.Conduit.Attoparsec as CA
  import qualified Data.Conduit.Binary as CB
  import qualified Data.Conduit.List as CL

  main = do
    ...
    ret <- forM fileList $ \fp ->
      C.runResourceT $
        CB.sourceFile fp C.$=
        jsonLines C.$=
        CL.mapM processJson C.$$
        CL.consume
    print ret

  jsonLines :: C.Resource m => C.Conduit B.ByteString m Value
  jsonLines = C.sequenceSink () $ do
    val <- CA.sinkParser json'
    CB.dropWhile isSpace_w8
    return $ C.Emit () [val]

This code is extremely resource-friendly, since (a) you can't leak
file descriptors and (b) you just have to make sure that the processJson
function isn't too lazy.  It should be quite fast as well.

Cheers! =)

[1] http://hackage.haskell.org/package/attoparsec-conduit

-- 
Felipe.



Re: [Haskell-cafe] strict version of Haskell - does it exist?

2012-01-30 Thread Alexander Bernauer
On Sun, Jan 29, 2012 at 11:25:09PM +0100, Ertugrul Söylemez wrote:
 First of all, /learning/ to optimize Haskell can be difficult.  The
 optimizing itself is actually fairly easy in my experience, once you
 understand how the language works.

Given the fact that you have obviously mastered the learning part of
this, do you have any recommendations on what to read or on how to
proceed in order to learn how to optimize Haskell code?

I can imagine, it's not only about understanding the language itself,
but also about understanding how your compiler and its switches work,
being able to find the hot spots in your code, being able to measure the
effects of your changes, developing a good sense for the tradeoffs, etc.

So far, I have only stumbled upon chapter 25 of Real World Haskell.
Anything else you might recommend?

regards

Alex




Re: [Haskell-cafe] strict version of Haskell - does it exist?

2012-01-30 Thread Malcolm Wallace

On 29 Jan 2012, at 22:25, Ertugrul Söylemez wrote:
 A strict-by-default Haskell comes with the
 implication that you can throw away most of the libraries, including the
 base library.  So yes, a strict-by-default Haskell is very well
 possible, but the question is whether you actually want that.  I
 wouldn't, because a lot of my code relies on the standard semantics.

At work, we have a strict version of Haskell, and indeed we do not use the 
standard libraries, but have built up our own versions of the ones we use.  
However, our compiler is smart enough to transform and optimise the source code 
*as if* it were non-strict: it is only at runtime that things are evaluated 
strictly.  This means that, in general, you can rely on the standard semantics 
to a surprisingly large extent.  For instance,

maybe (error "foo") lines (Just "hello\nworld")

will succeed without calling error, just like in Haskell.  Even if the final 
argument is supplied only at runtime, not statically, it will still do the 
right thing.

However, the downside of being strict at runtime is frequently poorer 
performance.  Stack overflows are common if you use explicit recursion:  it is 
better to use higher-order functions (map, fold, until) that are implemented at 
a lower level in C (i.e. not directly using explicit recursion themselves).  
This is a good thing of course - thinking of data structures in the aggregate, 
rather than piecemeal.
However, bulk operations do transform the entire data structure, not merely the 
fragments that are needed for the onward computation, so it can often be a net 
performance loss.  The standard lazy computational paradigm of 
generate-and-test is therefore hideously expensive, on occasion.
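The generate-and-test pattern mentioned above, cheap under laziness but hideous under strictness, fits in a one-liner: under strict evaluation the infinite generator below would never terminate, while under call-by-need only the demanded prefix is ever evaluated.

```haskell
-- An infinite generator plus a test plus a bounded demand: laziness
-- evaluates just enough of the list to produce five even numbers.
main :: IO ()
main = print (take 5 (filter even [1 ..]))  -- [2,4,6,8,10]
```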

Regards,
Malcolm


Re: [Haskell-cafe] strict version of Haskell - does it exist?

2012-01-30 Thread Jan-Willem Maessen
On Mon, Jan 30, 2012 at 6:24 AM, Malcolm Wallace malcolm.wall...@me.com wrote:


 On 29 Jan 2012, at 22:25, Ertugrul Söylemez wrote:
  A strict-by-default Haskell comes with the
  implication that you can throw away most of the libraries, including the
  base library.  So yes, a strict-by-default Haskell is very well
  possible, but the question is whether you actually want that.  I
  wouldn't, because a lot of my code relies on the standard semantics.

 At work, we have a strict version of Haskell, and indeed we do not use the
 standard libraries, but have built up our own versions of the ones we use.
  However, our compiler is smart enough to transform and optimise the source
 code *as if* it were non-strict: it is only at runtime that things are
 evaluated strictly.  This means that, in general, you can rely on the
 standard semantics to a surprisingly large extent.


I wanted to emphasize Malcolm's point here.  Optimizing using the original
Haskell semantics turned out to be pretty important back when I was working
on Eager Haskell.  For example, a lot of Haskell functions are written in
the following style:

f a b c
  | guard1 d = g h i
  | guard2 e = h
  | guard3 f = i
  | otherwise = j
  where d = ...expensive...
e = ...expensive...
f = ...expensive...
g = ...expensive...
h = ...expensive...
i = ...expensive...
j = ... expensive...

In an ordinary procedural language, where function calls in g, h, i, and j
might have side effects, we can't sink bindings down to the point of use.
Even in the absence of side effects, we have to account for the fact that
some of these computations are used in some -- but not all -- right-hand
sides, and that often we need to do some inlining to discover that a value
isn't going to be used.  It turns out Haskell code relies on this sort of
thing all over the place, and simply coding around it leads to
deeply-nested let bindings that walk off the right-hand edge of the screen.

It's not difficult to rewrite most of the prelude functions in this style,
but it's no longer pretty, and it's recognizably not idiomatic Haskell.

However, bulk operations do transform the entire data structure, not merely
 the fragments that are needed for the onward computation, so it can often
 be a net performance loss.  The standard lazy computational paradigm of
 generate-and-test is therefore hideously expensive, on occasion.


This was a huge issue in Eager Haskell.  By far our worst performance was
on stream-like programs that generated infinite lists of results, and then
sliced away the useless tail.  With completely strict evaluation, this of
course doesn't work at all, but it can be tricky to bound the required
sizes of inputs even if you know how much of the output you want (imagine a
call to takeWhile or filter on an infinite list).

-Jan-Willem Maessen



 Regards,
Malcolm
 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe



Re: [Haskell-cafe] strict version of Haskell - does it exist?

2012-01-30 Thread Ertugrul Söylemez
Alexander Bernauer alex-hask...@copton.net wrote:

 On Sun, Jan 29, 2012 at 11:25:09PM +0100, Ertugrul Söylemez wrote:
  First of all, /learning/ to optimize Haskell can be difficult.  The
  optimizing itself is actually fairly easy in my experience, once you
  understand how the language works.

 Given the fact that you have obviously mastered the learning part of
 this, do you have any recommendations on what to read or on how to
 proceed in order to learn how to optimize Haskell code?

 I can imagine, it's not only about understanding the language itself,
 but also about understanding how your compiler and its switches work,
 being able to find the hot spots in your code, being able to measure
 the effects of your changes, developing a good sense for the
 tradeoffs, etc.

 So far, I have only stumbled upon chapter 25 of Real World Haskell.
 Anything else you might recommend?

That's the tricky part about this.  Unfortunately the monad tutorial
fallacy [1] applies here.  At some point it clicks and suddenly you
see all the data dependencies.  If you were schizophrenic, you would
probably see little arrows all over your code.

What helped me a bit was to read the Wikibooks page about Haskell's
denotational semantics [2], but other than that I pretty much learned it
by myself.

In any case it's helpful to use 'seq' explicitly instead of bang
patterns.  Personally I never use bang patterns, but always use seq,
strict patterns or guards with strict functions.  Example:

add :: Integer -> Integer -> Integer
add x 0 = x
add x y = (add $! succ x) (pred y)

Here the pattern forces the second argument, so there is no need to
force it yourself.  However, nothing forces the first argument, so it
needs to be forced manually, in this case by using ($!).
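The same manual forcing that ($!) performs in add is what Data.List.foldl' does for its accumulator at every step, which is why it folds in constant space where lazy foldl builds a thunk chain. A small base-only sketch:

```haskell
import Data.List (foldl')

-- foldl' forces the running sum before each step, so no (+) thunks
-- accumulate; with plain foldl the same fold can blow the stack.
sumTo :: Integer -> Integer
sumTo n = foldl' (+) 0 [1 .. n]

main :: IO ()
main = print (sumTo 100000)  -- 5000050000
```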

[1]: http://byorgey.wordpress.com/2009/01/12/
  abstraction-intuition-and-the-monad-tutorial-fallacy/

[2]: http://en.wikibooks.org/wiki/Haskell/Denotational_semantics


Greets,
Ertugrul

-- 
nightmare = unsafePerformIO (getWrongWife = sex)
http://ertes.de/




Re: [Haskell-cafe] strict version of Haskell - does it exist?

2012-01-30 Thread Marc Weber
Replying to all replies at once:

 Malcolm Wallace
  At work, we have a strict version of Haskell
:-) which proves that it is worth thinking about it.

 Ertugrul
 If you want to save the time to learn how to write efficient Haskell
 programs, you may want to have a look into the Disciple language. 
Yes - I will. It's been on my TODO list for at least 12 months :(
Not sure whether there are parser combinator libraries yet.

@  Herbert Valerio Riedel  (suggesting aeson)
I gave it yet another try - see below.
It still fails.

@ Felipe Almeida Lessa (suggesting conduits and attoparsec)
I mentioned that I already tried it. Counting lines only was a lot slower than
counting lines and parsing JSON using PHP.

@ Chris Wong
 flag is that strictness isn't just something you turn on and off willy-nilly
You're right. But those features are not required for all tasks :)
E.g. have a look at Data.Map. Would a strict implementation look that different?




I came up with this now, trying strict bytestrings and Aeson.
Note the LB.fromChunks usage below to turn a strict into a lazy bytestring.

Result: the PHP script doing the same runs in 33 secs (using the
AFindCheckoutsAndContacts branch). The Haskell program below - I stopped it
after 8 min. (At least it no longer causes my system to swap ..

You're right: I could start profiling. I could learn how to optimize it.)
But why? The conclusion is that if I ever use yesod - and if I ever want to
analyse logs - I should call PHP from yesod as an external process !? :-(

Even if I had a core i7 (8 threads in parallel) I still would not be sure
whether Haskell would be the right choice. I agree that I assume that all data
fits into memory so that piecewise evaluation doesn't matter too much.

Thanks for all your help - it proves once again that the Haskell
community is the most helpful I know of. Yet I think, with me being the
programmer, Haskell is the wrong choice for this task.

Thanks Marc Weber

my new attempt - now I should start profiling.. Looks like I haven't
built all libs with profiling support ..



  import Data.Aeson.Types
  import Data.Aeson
  import Data.List
  import Control.Applicative
  import Debug.Trace
  import qualified Data.Map as M
  import Action
  import Data.Aeson.Parser as AP

  import qualified Data.ByteString.Lazy as LB

  import qualified Data.ByteString as BS
  import qualified Data.ByteString.Char8 as BSC8

  data Action = ACountLine | AFindCheckoutsAndContacts

  -- lines look like this:
  -- {"id":"4ee535f01550c","r":"","ua":"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)","t":1323644400,"k":"XXX","v":"YY"}

  data Item = Item {
    id :: BS.ByteString,
    ua :: BS.ByteString,
    t  :: Int,
    k  :: BS.ByteString,
    v  :: BS.ByteString
  }

  instance FromJSON Item where
    parseJSON (Object v) = Item <$>
      v .: "id" <*>
      v .: "ua" <*>
      v .: "t"  <*>
      v .: "k"  <*>
      v .: "v"
    parseJSON _ = empty


  run :: Action -> [FilePath] -> IO ()

  run AFindCheckoutsAndContacts files = do
    -- find all ids querying the server more than 100 times.
    -- do so by building a map counting queries

    (lines :: [BS.ByteString]) <- fmap (concat . map BSC8.lines) (mapM BS.readFile files)
    let processLine :: M.Map BS.ByteString Int -> BS.ByteString -> M.Map BS.ByteString Int
        processLine st line = case decode' (LB.fromChunks [line]) of
          Nothing -> st -- traceShow ("bad line " ++ BSC8.unpack line) st
          Just (Item id ua t k v) -> M.insertWith (+) k 1 st
    let count_by_id = foldl' processLine M.empty lines
    mapM_ print $ M.toList $ M.filter (> 100) count_by_id



Re: [Haskell-cafe] strict version of Haskell - does it exist?

2012-01-30 Thread Felipe Almeida Lessa
On Mon, Jan 30, 2012 at 2:12 PM, Marc Weber marco-owe...@gmx.de wrote:
 @ Felipe Almeida Lessa  (suggesting conduits and atto parsec)
 I mentioned that I already tried it. Counting lines only was a lot slower than
 counting lines and parsing JSON using PHP.

Then please take a deeper look into my code.  What you said that
you've tried is something else.

Cheers,

-- 
Felipe.



Re: [Haskell-cafe] strict version of Haskell - does it exist?

2012-01-30 Thread Roman Leshchinskiy
Marc Weber wrote:
 Replying to all replies at once:

 Malcolm Wallace
  At work, we have a strict version of Haskell
 :-) which proves that it is worth thinking about it.

But doesn't necessarily prove that it's a good idea.

   Just (Item id ua t k v) -> M.insertWith
 (+) k 1 st

Does replacing this by insertWith' help?

Roman




___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] strict version of Haskell - does it exist?

2012-01-30 Thread Joachim Breitner
Hi,

Am Montag, den 30.01.2012, 10:52 +0100 schrieb Alexander Bernauer:
 On Sun, Jan 29, 2012 at 11:25:09PM +0100, Ertugrul Söylemez wrote:
  First of all, /learning/ to optimize Haskell can be difficult.  The
  optimizing itself is actually fairly easy in my experience, once you
  understand how the language works.
 
 Given the fact that you have obviously mastered the learning part of
 this, do you have any recommendations on what to read or on how to
 proceed in order to learn how to optimize Haskell code?
 
 I can imagine, it's not only about understanding the language itself,
 but also about understanding how your compiler and its switches work,
 being able to find the hot spots in your code, being able to measure the
 effects of your changes, developing a good sense for the tradeoffs, etc.
 
 So far, I have only stumpled upon chapter 25 of Real World Haskell.
 Anything else you might recommend?

Although I would not claim that I have mastered this, I did recently
hold a talk that touched some of these issues, and also exhibits a case
where you want something more fine-grained than just strictness or
lazyness. From your name I think it is safe to point you to a German
document:
http://www.joachim-breitner.de/blog/archives/539-Guest-lecture-on-Haskell-performance.html

Greetings,
Joachim

-- 
Joachim nomeata Breitner
  m...@joachim-breitner.de  |  nome...@debian.org  |  GPG: 0x4743206C
  xmpp: nome...@joachim-breitner.de | http://www.joachim-breitner.de/



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] strict version of Haskell - does it exist?

2012-01-29 Thread Don Stewart
Generally strict Haskell means using strict data types - vectors, arrays,
bytestrings, intmaps where required.

However, you usually don't want all code and data strict, all the time,
since laziness/on-demand eval is critical for deferring non-essential work.

Summary; -fstrict wouldn't magically make your code good. Using the right
balance of strict and lazy code, via the right choice of strict and lazy
types, however, often does.

I'd be interested to know which choices in your log file case led
you into problems -- using something excessively lazy (like lazy lists) or
something excessively strict (like strict bytestrings) would both be
suboptimal for log analysis. A hybrid type like a lazy bytestring would be
more appropriate.
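A sketch of the hybrid Don describes: a lazy ByteString reads a file in chunks on demand, each chunk itself a strict buffer, so line counting over a large log runs in constant memory. The sample data stands in for a real log file.

```haskell
import qualified Data.ByteString.Lazy.Char8 as L

-- Count newlines without holding the whole file in memory:
-- L.readFile would yield strict ~64k chunks on demand.
countLines :: L.ByteString -> Int
countLines = fromIntegral . L.count '\n'

main :: IO ()
main = print (countLines (L.pack "GET /a\nGET /b\nGET /c\n"))
-- in the real program: countLines <$> L.readFile "server.log"
-- (file name hypothetical)
```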

On Sunday, January 29, 2012, Marc Weber marco-owe...@gmx.de wrote:
 A lot of work has been gone into GHC and its libraries.
 However for some use cases C is still preferred, for obvious speed
 reasons - because optimizing an Haskell application can take much time.

 Is there any document describing why there is no ghc --strict flag
 making all code strict by default?
 Wouldn't this make it easier to apply Haskell to some additional fields
 such as video processing etc?

 Wouldn't such a '--strict' flag turn Haskell/GHC into a better C/gcc
 compiler?

 Projects like this: https://github.com/thoughtpolice/strict-ghc-plugin
 show that the idea is not new.

 Eg some time ago I had to do some logfile analysis. I ended doing it in
 PHP because optimizing the Haskell code took too much time.

 Marc Weber

 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] strict version of Haskell - does it exist?

2012-01-29 Thread Chris Wong
On Mon, Jan 30, 2012 at 10:13 AM, Marc Weber marco-owe...@gmx.de wrote:
 A lot of work has been gone into GHC and its libraries.
 However for some use cases C is still preferred, for obvious speed
 reasons - because optimizing an Haskell application can take much time.

As much as any other high-level language, I guess. Don't compare
apples to oranges and complain oranges aren't crunchy enough ;)

 Is there any document describing why there is no ghc --strict flag
 making all code strict by default?

Yes -- it's called the Haskell Report :)

GHC does a lot of optimization already. If making something strict
won't change how a program behaves, GHC will often make it strict
anyway, using a process called strictness analysis.

The reason why there is no --strict flag is that strictness isn't just
something you turn on and off willy-nilly: it changes how the whole
language works. Structures such as infinite lists and Don Stewart's
lazy bytestrings *depend* on laziness for their performance.
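Chris's point about laziness-dependent structures fits in two lines, assuming nothing beyond the Prelude:

```haskell
-- An infinite list: only representable because the tail stays
-- unevaluated until demanded. A strict-by-default compiler would
-- loop forever constructing it before 'take' ever ran.
ones :: [Int]
ones = 1 : ones

main :: IO ()
main = print (take 5 ones)
```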

 Wouldn't this make it easier to apply Haskell to some additional fields
 such as video processing etc?

 Wouldn't such a '--strict' flag turn Haskell/GHC into a better C/gcc
 compiler?

See above.

 Projects like this: https://github.com/thoughtpolice/strict-ghc-plugin
 show that the idea is not new.

Not sure what that does, but I'll have a look at it.

 Eg some time ago I had to do some logfile analysis. I ended doing it in
 PHP because optimizing the Haskell code took too much time.

That probably because you're using linked lists for strings. For
intensive text processing, it's better to use the text package instead
[1]

Chris

[1] http://hackage.haskell.org/package/text

 Marc Weber

 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] strict version of Haskell - does it exist?

2012-01-29 Thread Ertugrul Söylemez
Marc Weber marco-owe...@gmx.de wrote:

 A lot of work has been gone into GHC and its libraries.
 However for some use cases C is still preferred, for obvious speed
 reasons - because optimizing an Haskell application can take much
 time.

 Is there any document describing why there is no ghc --strict flag
 making all code strict by default?
 Wouldn't this make it easier to apply Haskell to some additional
 fields such as video processing etc?

 Wouldn't such a '--strict' flag turn Haskell/GHC into a better C/gcc
 compiler?

 Projects like this: https://github.com/thoughtpolice/strict-ghc-plugin
 show that the idea is not new.

 Eg some time ago I had to do some logfile analysis. I ended doing it
 in PHP because optimizing the Haskell code took too much time.

First of all, /learning/ to optimize Haskell can be difficult.  The
optimizing itself is actually fairly easy in my experience, once you
understand how the language works.

Usually the nonstrictness is no bottleneck.  However, you have to know
that you are in a nonstrict language.  In fact, I find myself having
difficulties writing efficient code in a strict language.

Now to answer your question:  A strict-by-default Haskell comes with the
implication that you can throw away most of the libraries, including the
base library.  So yes, a strict-by-default Haskell is very well
possible, but the question is whether you actually want that.  I
wouldn't, because a lot of my code relies on the standard semantics.  I
would also expect problems with the way Haskell performs I/O, because it
would mean that

forever (putStrLn "Hello world")

would cause a heap overflow, if Haskell were strict.  Note that we don't
have control structures.  We have combinators, and their nonstrictness
is essential.  The flag you are proposing would turn Haskell into a
language that is different enough that you couldn't do many useful
things with it.
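Ertugrul's "combinators, not control structures" point, as a small sketch: this ordinary function only behaves like built-in if-then-else because its arguments arrive unevaluated.

```haskell
-- A user-defined conditional. Under strict semantics both branches
-- would be evaluated before the function body runs, so passing
-- 'undefined' for the unused branch would crash instead of being
-- ignored.
if' :: Bool -> a -> a -> a
if' True  t _ = t
if' False _ e = e

main :: IO ()
main = print (if' True 42 (undefined :: Int))
```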

If you want to save the time to learn how to write efficient Haskell
programs, you may want to have a look into the Disciple language.  You
will find that it has a different type system, which captures side
effects explicitly to make a pure strict language even possible.


Greets,
Ertugrul

-- 
nightmare = unsafePerformIO (getWrongWife >>= sex)
http://ertes.de/


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] strict version of Haskell - does it exist?

2012-01-29 Thread Austin Seipp
The strict-ghc-plugin (under my maintenance) is just a continuation of
one of the original demos Max had for plugin support in the compiler.
The idea is fairly simple: 'let' and 'case' are the forms for creating
lazy/strict bindings in Core. It just systematically replaces all
occurrences of 'let' in Core with 'case'. So 'let b = e1 in e2'
becomes 'case e1 of { b -> e2 }', making 'b' strict. It also replaces
all applications of the form 'App e1 e2' (which is also lazy, as e2 is
evaluated on demand) with an equivalent binding like 'case e2 of { x
-> App e1 x }'. Pretty simple, and results in a totally strict
program.
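The let-to-case rewrite Austin describes, approximated at the source level (in source Haskell a plain variable pattern in a case does not force its scrutinee, so a bang pattern stands in for Core's strict case):

```haskell
{-# LANGUAGE BangPatterns #-}

-- Lazy: 'b' is bound to an unevaluated thunk for (x * 2).
lazyVersion :: Int -> Int
lazyVersion x = let b = x * 2 in b + 1

-- Strict: the bang forces (x * 2) before the body runs, mirroring
-- what Core's 'case' does in the strictified program.
strictVersion :: Int -> Int
strictVersion x = case x * 2 of !b -> b + 1

main :: IO ()
main = print (lazyVersion 3, strictVersion 3)
```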

The idea is just a proof of concept; in particular, I (and likely Max
although I cannot speak for him) am not using it as a position to say
that sometimes you want everything strict. You don't; at some point,
you're not even using Haskell anymore I suppose (remember: non-strict
semantics.) I can't think of any instance in which I would need or
want to use this plugin, honestly. But maybe someone else would - I
did refactor it to where you can strictify individual functions, as
opposed to full-blown modules, via annotations. So you could
selectively strictify things if you found it beneficial on certain
identifiers. But then there's the question of what effect that has on
the rest of GHC's optimizers, which I can't answer: the strictifier
modifies the pipeline to be the *first* pass, and the remaining ones
run afterwards. Compilers are built on heuristics and built for
'average' code. Sometimes these heuristics interact in odd ways,
especially with code that may deviate from 'the norm.' Once you're
fighting the optimizer, it can become a very difficult battle to win.
Careful analysis and selective optimization is probably going to take
you farther than hitting it with a giant hammer.

Having lazy and strict data structures and knowing when/where to use
them is crucial for good performance, and both have their merits (same
with every other thing under the sun, like by-ref/by-val semantics in
`data` types, which you can control with UNPACK etc.) I think we could
most certainly use better tools for analyzing low-level performance
details and the tradeoffs between strictness and laziness (especially
in large codebases), but I don't think systematically making
everything strict is going to be the right idea in a vast majority of
situations.
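The per-field control Austin mentions (strict fields, UNPACK) looks like this; the record type is hypothetical:

```haskell
-- '!' makes a field strict: it is forced when the constructor is
-- built. UNPACK additionally stores the Int unboxed inside the
-- constructor, avoiding a pointer indirection.
data Entry = Entry
  { entryKey   :: !String
  , entryCount :: {-# UNPACK #-} !Int
  }

bump :: Entry -> Entry
bump e = e { entryCount = entryCount e + 1 }

main :: IO ()
main = print (entryCount (bump (Entry "k" 0)))
```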

On Sun, Jan 29, 2012 at 4:12 PM, Chris Wong
chrisyco+haskell-c...@gmail.com wrote:
 On Mon, Jan 30, 2012 at 10:13 AM, Marc Weber marco-owe...@gmx.de wrote:
 A lot of work has been gone into GHC and its libraries.
 However for some use cases C is still preferred, for obvious speed
 reasons - because optimizing an Haskell application can take much time.

 As much as any other high-level language, I guess. Don't compare
 apples to oranges and complain oranges aren't crunchy enough ;)

 Is there any document describing why there is no ghc --strict flag
 making all code strict by default?

 Yes -- it's called the Haskell Report :)

 GHC does a lot of optimization already. If making something strict
 won't change how it behaves, it will, using a process called
 strictness analysis.

 The reason why there is no --strict flag is that strictness isn't just
 something you turn on and off willy-nilly: it changes how the whole
 language works. Structures such as infinite lists and Don Stewart's
 lazy bytestrings *depend* on laziness for their performance.

 Wouldn't this make it easier to apply Haskell to some additional fields
 such as video processing etc?

 Wouldn't such a '--strict' flag turn Haskell/GHC into a better C/gcc
 compiler?

 See above.

 Projects like this: https://github.com/thoughtpolice/strict-ghc-plugin
 show that the idea is not new.

 Not sure what that does, but I'll have a look at it.

 Eg some time ago I had to do some logfile analysis. I ended doing it in
 PHP because optimizing the Haskell code took too much time.

 That probably because you're using linked lists for strings. For
 intensive text processing, it's better to use the text package instead
 [1]

 Chris

 [1] http://hackage.haskell.org/package/text

 Marc Weber

 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe

 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe



-- 
Regards,
Austin

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] strict version of Haskell - does it exist?

2012-01-29 Thread Marc Weber
Excerpts from Don Stewart's message of Sun Jan 29 22:55:08 +0100 2012:
 Summary; -fstrict wouldn't magically make your code good.
No, you're right. I don't expect that to happen. I agree that it is
always the programmer's fault for using the wrong tools or not knowing
the tools well enough to get a job done.

The PHP code looks like this:

foreach(glob('*.txt') as $file){
  foreach(split("\n", file_get_contents($file)) as $line){
$parsed_line = json_decode($line);
// get some unix timestamps, keep some hashes of seen clients
// (cookie ids) and such
// check how many minutes before a checkout the customer visited
// the site - and whether he did so for a couple of days.
  }
}

// print result

The files are about 300 MB in size. However memory usage grew and grew
and grew - I had to kill it or limit amount of files.
The PHP code runs in a couple of seconds (parsing json and loading
files).. the Haskell app took much longer. That PHP is fast is no
surprise: I expect json_decode and split to be implemented in C.

So yes - I used lazy lists. However 8GB of RAM should have been enough
to keep things in RAM. So maybe also the JSON parsing library kept too
many unevaluated things in memory. So I could start either writing my
own JSON parsing library (being more strict) or profile the application
many times - but I don't want to.
Ignoring the json parsing I also gave conduits a try - only counting
lines. I know it's experimental - but from its description I concluded
it would prevent me, a stupid Haskell programmer, from taking too much
memory even using bad Haskell code.
However, just splitting the input into lines and counting them
(recognizing utf-8 chars) looked like it took a lot longer than
splitting and parsing the json takes in PHP.
Then the conduit implementation looked funny: hGetLine is used to feed
a line each time ... (luckily - because the utf8-libraries don't provide
nice ways to parse incomplete chunks of utf-8 bytes such as returning
Either IncompleteMissingByte UTF8Chunk)..

Probably the split on "\n" in PHP doesn't parse utf-8 chars - and luckily
it doesn't seem to matter because \n only uses one byte.

I know that I'm not an Haskell expert. I still got the impression that
getting nice performance would be a small challenge and require much
more time than I spend on the PHP version.

That's why I'm wondering why there is no -fstrict option for such simple
use cases because Haskell has so many optimizations other languages
dream about.

I mean lot's of non lazy language still get their jobs done. So it
always depends on the use case.

Isn't it easy to add a compiler flag to GHC adding those ! strictness 
annotations
everywhere possible? Then simple use cases like this would not be a huge
challenge.

Maybe you're right: I should just prepare some dummy files and ask the
community to help. However optimizing the JSON parser and my code just
seemed to be too much effort ..

Marc Weber

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe