Re: [Haskell-cafe] Streaming bytes and performance

2013-03-20 Thread Konstantin Litvinenko

On 03/20/2013 12:47 AM, Branimir Maksimovic wrote:

Your problem is that main_6 thunks 'i' and 'a' .
If you write (S6 !i !a) - get
than there is no problem any more...



Nope :( Unfortunately that doesn't change anything. Still allocating...


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Streaming bytes and performance

2013-03-20 Thread Konstantin Litvinenko

On 03/20/2013 11:17 AM, Branimir Maksimovic wrote:

Are you sure? I use ghc 7.6.2


Huh, I use 7.4.2, and if 7.6.2 can handle this I will try to switch. Not 
sure how to do that on ubuntu 12.10...





___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Streaming bytes and performance

2013-03-20 Thread Konstantin Litvinenko

On 03/20/2013 11:17 AM, Branimir Maksimovic wrote:


Are you sure? I use ghc 7.6.2 (compiled with -O2) and without bang patterns
for 1million iterations it blows stack space.
With bang patterns it runs in constant space , same as
other version?


Okay, I have found the root of allocation problem. It is not because of 
7.4.2. If I use -auto-all it somehow change code generation and start 
allocating. If I remove -auto-all from command line than no allocation 
occurs. That really weird because now I don't know how to profile and 
get meaningful results :(




___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Need some advice around lazy IO

2013-03-19 Thread Konstantin Litvinenko

On 03/19/2013 07:12 AM, Edward Kmett wrote:

Konstantin,

Please allow me to elaborate on Dan's point -- or at least the point
that I believe that Dan is making.

Using,

let bug = Control.DeepSeq.rnf str `seq` fileContents2Bug str


or ($!!)will create a value that *when forced* cause the rnfto occur.

As you don't look at buguntil much later this causes the same problem as
before!



Yes. You (and Dan) are totally right. 'Let' just bind expression, not 
evaluating it. Dan's evaluate trick force rnf to run before hClose. As I 
said - it's tricky part especially for newbie like me :)




___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Streaming bytes and performance

2013-03-19 Thread Konstantin Litvinenko

On 03/18/2013 02:14 PM, Gregory Collins wrote:

Put a bang pattern on your accumulator in go. Since the value is not
demanded until the end of the program, you're actually just building up
a huge space leak there.


Fixed that


Secondly, unconsing from the lazy bytestring will cause a lot of
allocation churn in the garbage collector -- each byte read in the input
forces the creation of a new L.ByteString, which is many times larger.


Nope. L.ByteString is created along with strict ByteString but content 
not copied. And, in fact, that not a problem. The problem is that GHC 
unable to optimize constantly changing state in State monad. I don't 
know is it posible or not and if it is than what should I do to allow 
such optimization.


import Control.Monad.State.Strict

data S6 = S6 Int Int

main_6 = do
let r = evalState go (S6 1 0)
print r
  where
go = do
(S6 i a) - get
if (i == 0) then return a else put (S6 (i - 1) (a + i))  go

main_7 = do
let r = go (S6 1 0)
print r
  where
go (S6 i a)
| i == 0 = a
| otherwise = go $ S6 (i - 1) (a + i)

main = main_7

If I run main_6 I get constant allocations. If I run main_7 I get no 
allocations.


Does anybody know how to overcome this inefficiency?


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Streaming bytes and performance

2013-03-19 Thread Konstantin Litvinenko

On 03/19/2013 10:32 PM, Don Stewart wrote:

Oh, I forgot the technique of inlining the lazy bytestring chunks, and
processing each chunk seperately.

$ time ./fast
4166680
./fast  1.25s user 0.07s system 99% cpu 1.325 total

Essentially inline Lazy.foldlChunks and specializes is (the inliner
should really get that).
And now we have a nice unboxed inner loop, which llvm might spot:

$ ghc -O2 -funbox-strict-fields fast.hs  --make -fllvm
$ time ./fast
4166680
./fast  1.07s user 0.06s system 98% cpu *1.146 total*

So about 8x faster. Waiting for some non-lazy bytestring benchmarks... :)


Thanks Don, but after some investigation I came to conclusion that 
problem is in State monad


{-# LANGUAGE BangPatterns #-}

import Control.Monad.State.Strict

data S6 = S6 !Int !Int

main_6 = do
let r = evalState go (S6 1 0)
print r
  where
go = do
(S6 i a) - get
if (i == 0) then return a else (put (S6 (i - 1) (a + i)))  go

main_7 = do
let r = go (S6 1 0)
print r
  where
go (S6 i a)
| i == 0 = a
| otherwise = go $ S6 (i - 1) (a + i)

main = main_6

main_6 doing constant allocations while main_7 run in constant space. 
Can you suggest something that improve situation? I don't want to 
manually unfold all my code that I want to be fast :(.


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Streaming bytes and performance

2013-03-19 Thread Konstantin Litvinenko

On 03/19/2013 10:53 PM, Nicolas Trangez wrote:

On Tue, 2013-03-19 at 20:32 +, Don Stewart wrote:

So about 8x faster. Waiting for some non-lazy bytestring benchmarks... :)


You could try something like this using Conduit:

{-# LANGUAGE BangPatterns #-}
module Main (main) where

import Data.Conduit
import qualified Data.Conduit.List as L
import qualified Data.Conduit.Binary as B
import qualified Data.ByteString.Char8 as BS8

main :: IO ()
main = print = runResourceT (
 B.sourceFile filename $$ L.fold (\(!a) (!b) - a + BS8.count ' ' b)
(0 :: Int))
   where
 filename = ...


Please stops counting spaces! :) It was a MODEL that demonstrates 
constant allocation of state when I used State monad. That's the 
*problem*. I mention in my first email that I do know how to count 
spaces using one-line L.foldl with no allocations at all :).



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Streaming bytes and performance

2013-03-19 Thread Konstantin Litvinenko

On 03/19/2013 10:49 PM, Konstantin Litvinenko wrote:

{-# LANGUAGE BangPatterns #-}

import Control.Monad.State.Strict

data S6 = S6 !Int !Int

main_6 = do
 let r = evalState go (S6 1 0)
 print r
   where
 go = do
 (S6 i a) - get
 if (i == 0) then return a else (put (S6 (i - 1) (a + i)))  go

main_7 = do
 let r = go (S6 1 0)
 print r
   where
 go (S6 i a)
 | i == 0 = a
 | otherwise = go $ S6 (i - 1) (a + i)

main = main_6

main_6 doing constant allocations while main_7 run in constant space.
Can you suggest something that improve situation? I don't want to
manually unfold all my code that I want to be fast :(.


Correction - they both run in constant space, that's not a problem. The 
problem is main_6 doing constant allocation/destroying and main_7 doesn't.



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Streaming bytes and performance

2013-03-18 Thread Konstantin Litvinenko

Hi All!

I tune my toy project for performance and hit the wall on simple, in 
imperative world, task. Here is the code that model what I'm trying to 
achieve


import qualified Data.ByteString.Lazy as L
import Data.Word8(isSpace)
import Data.Word
import Control.Monad.State

type Stream = State L.ByteString

get_byte :: Stream (Maybe Word8)
get_byte = do
s - get
case L.uncons s of
Nothing - return Nothing
Just (x, xs) - put xs  return (Just x)

main = do
f - L.readFile test.txt
let r = evalState count_spaces f
print r
  where
count_spaces = go 0
  where
go a = do
x - get_byte
case x of
Just x' -  if isSpace x' then go (a + 1) else go a
Nothing - return a

It takes the file and count spaces, in imperative way, consuming bytes 
one by one. The problem is: How to rewrite this to get rid of constant 
allocation of state but still working with stream of bytes? I can 
rewrite this as one-liner L.foldl, but that doesn't help me in any way 
to optimize my toy project where all algorithms build upon consuming 
stream of bytes.


PS. My main lang is C++ over 10 years and I only learn Haskell :)


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Need some advice around lazy IO

2013-03-18 Thread Konstantin Litvinenko

On 03/17/2013 07:08 AM, C K Kashyap wrote:

I am working on an automation that periodically fetches bug data from
our bug tracking system and creates static HTML reports. Things worked
fine when the bugs were in the order of 200 or so. Now I am trying to
run it against 3000 bugs and suddenly I see things like - too  many open
handles, out of memory etc ...

Here's the code snippet - http://hpaste.org/84197

It's a small snippet and I've put in the comments stating how I run into
out of file handles or simply file not getting read due to lazy IO.

I realize that putting ($!) using a trial/error approach is going to be
futile. I'd appreciate some pointers into the tools I could use to get
some idea of which expressions are building up huge thunks.


You problem is in

let bug = ($!) fileContents2Bug str

($!) evaluate only WHNF and you need NF. Above just evaluate to first 
char in a file, not to all content. To fully evaluate 'str' you need 
something like


let bug = Control.DeepSeq.rnf str `seq` fileContents2Bug str





___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Need some advice around lazy IO

2013-03-18 Thread Konstantin Litvinenko

On 03/18/2013 06:06 PM, Dan Doel wrote:

Do note that deepSeq alone won't (I think) change anything in your
current code. bug will deepSeq the file contents.


rfn fully evaluate 'bug' by reading all file content. Later hClose will 
close it and we done. Not reading all content will lead to semi closed 
handle, leaked in that case. Handle will be opened until hGetContents 
lazy list hit the end.


 And the cons will

seq bug. But nothing is evaluating the cons. And further, the cons
isn't seqing the tail, so none of that will collapse, either. So the
file descriptors will still all be opened at once.

Probably the best solution if you choose to go this way is:

 bug - evaluate (fileContents2Bug $!! str)

which ties the evaluation of the file contents into the IO execution.
At that point, deepSeqing the file is probably unnecessary, though,
because evaluating the bug will likely allow the file contents to be
collected.


evaluate do the same as $! - evaluate args to WHNF. That won't help in 
any way. Executing in IO monad doesn't imply strictness Thats why mixing 
lazy hGetContent with strict hOpen/hClose is so tricky.




___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] regex-pcre is not working with UTF-8

2012-08-21 Thread Konstantin Litvinenko

On 08/18/2012 06:16 PM, José Romildo Malaquias wrote:

Hello.

It seems that the regex-pcre has a bug dealing with utf-8:

I hope this bug can be fixed soon.

Is there a bug tracker to report the bug? If so, what is it?


You need something like that

let pat = makeRegexOpts (compUTF8 .|. defaultCompOpt) defaultExecOpt 
(@'(.+?)'@ :: B.ByteString)


and than pat will match correctly.


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Organizing big repository

2011-10-27 Thread Konstantin Litvinenko
I am trying to understand how to organize my code and edit-compile-run cycles. I can't figure out 
how to setup environment in such why that when I build some program using cabal, cabal will rebuild 
program dependencies if some was changed. I don't want to configure/build/install manually.
Having program 'foo' depends on lib 'bar' I want to edit some files in 'bar' than build 'foo' and 
get 'bar' rebuilt and 'foo' rebuilt/relink.

How can I do this?


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe