Re: [Haskell-cafe] Data.Text performance problem

2010-09-13 Thread Petr Prokhorenkov
I really didn't expect mapAccumL to have quadratic complexity. Thank you a
lot for the fix!


On Mon, Sep 13, 2010 at 5:06 AM, Bryan O'Sullivan b...@serpentine.comwrote:

 On Sun, Sep 12, 2010 at 12:23 PM, Petr Prokhorenkov 
 prokhoren...@gmail.com wrote:

 I experienced a following problem while dealing with some text processing.


 Thanks for the report and the test case. There's nothing wrong with your
 code - read on for details.

 You ran into one of the few functions in Data.Text that I copied straight
 over from the list implementation due to it not being used often.
 Unfortunately, that implementation constructs a new Text value (using cons)
 on every iteration through the loop, and as you might expect that's very
 slow even on tiny inputs, as it has quadratic behaviour.

 I've rewritten both strict and lazy mapAccumL and mapAccumR to use as much
 of the stream fusion machinery as possible. (By the way, there's an
 interesting fact behind why those functions started out life as they did:
 you can't write mapAccum functions using only stream machinery, due to their
 types, and the strict code is more work to write if you can't use the stream
 machinery. In the early days it just wasn't worth writing the more complex
 variants of them, as I had more pressing performance concerns at the time.)

 Where the old version of mapAccumL caused your test case to take 5 seconds
 to process an 11KB file (ouch!), with the rewritten version, your code can
 process an 81MB file in the same amount of time, without any changes to your
 code, and that O(n^2) behaviour is gone :-)

 text 0.8.1.0 is now up on hackage, with the fix included. Enjoy!

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Data.Text performance problem

2010-09-13 Thread Bryan O'Sullivan
On Mon, Sep 13, 2010 at 3:26 AM, Petr Prokhorenkov
prokhoren...@gmail.comwrote:

 I really didn't expect mapAccumL to have quadratic complexity. Thank you a
 lot for the fix!


No problem. By the way, in my benchmarks, mapAccumL on Text is now faster
than on ByteString :-)
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Data.Text performance problem

2010-09-12 Thread Petr Prokhorenkov
I experienced a following problem while dealing with some text processing.

I have a text and want to get the same text with parts enclosed into {} or
[] stripped away. Substituting them with a ' ' would also work.

Here is the code I wrote (T is Data.Text):

stripBrackets :: T.Text - T.Text
stripBrackets text = snd $ T.mapAccumL f 0 text where
f depth c = let
   depth' = depth + d' c
c' | depth  0 || depth'  0 = ' '
  | otherwise = c
in
   (depth', c')

   d' '{' = 1
   d' '[' = 1
d' '}' = -1
   d' ']' = -1
d' _   = 0

The only problem is that it takes about a minute to complete on 3GHz+
processor when text is a 30k chars long.

Any ideas how to improve this code?

--
Regards, Petr
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Data.Text performance problem

2010-09-12 Thread Daniel Fischer
On Sunday 12 September 2010 21:23:50, Petr Prokhorenkov wrote:
 I experienced a following problem while dealing with some text
 processing.

 I have a text and want to get the same text with parts enclosed into {}
 or [] stripped away. Substituting them with a ' ' would also work.

 Here is the code I wrote (T is Data.Text):

 stripBrackets :: T.Text - T.Text
 stripBrackets text = snd $ T.mapAccumL f 0 text where
 f depth c = let
depth' = depth + d' c
 c' | depth  0 || depth'  0 = ' '

   | otherwise = c

 in
(depth', c')

d' '{' = 1
d' '[' = 1
 d' '}' = -1
d' ']' = -1
 d' _   = 0

 The only problem is that it takes about a minute to complete on 3GHz+
 processor when text is a 30k chars long.

 Any ideas how to improve this code?

First, is it intentional that

stripBrackets a]b[c] = a]b[c] and not a]b?

Also that it doesn't distinguish between the types of brackets, e.g.

stripBrackets {a] = ?

Of course, if you know that the text is properly bracketed, that's not 
important.

Concerning the performance, it seems that mapAccumL actually has quadratic 
complexity instead of linear because it doesn't get fused.
Ouch.

I think

stripBrackets :: T.Text - T.Text
stripBrackets text = T.concat $ go 0 text
  where
go depth inp = case T.breakBy (`elem` {[]}) inp of
(pre,post) -
  case T.uncons post of
Nothing - if depth  0
then [T.pack $ replicate (T.length pre) 
' ']
else [pre]
Just (c,tl) -
let depth'
  | c `elem` {[ = depth + 1
  | otherwise = depth - 1
in (if depth  0
 then T.pack (replicate (T.length pre) ' ')
 else pre) :
T.pack (if depth  0 || depth'  0
then  
else [c])
: go depth' tl

has the same semantics as your function. It's not blazingly fast either 
(using String for that purpose is much faster), but it's orders of 
magnitude faster than T.mapAccumL.

Note that it's easy to remove the parts enclosed in brackets instead of 
replacing it by ' ' here.


 --
 Regards, Petr

Cheers,
Daniel
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Data.Text performance problem

2010-09-12 Thread Bryan O'Sullivan
On Sun, Sep 12, 2010 at 12:23 PM, Petr Prokhorenkov
prokhoren...@gmail.comwrote:

 I experienced a following problem while dealing with some text processing.


Thanks for the report and the test case. There's nothing wrong with your
code - read on for details.

You ran into one of the few functions in Data.Text that I copied straight
over from the list implementation due to it not being used often.
Unfortunately, that implementation constructs a new Text value (using cons)
on every iteration through the loop, and as you might expect that's very
slow even on tiny inputs, as it has quadratic behaviour.

I've rewritten both strict and lazy mapAccumL and mapAccumR to use as much
of the stream fusion machinery as possible. (By the way, there's an
interesting fact behind why those functions started out life as they did:
you can't write mapAccum functions using only stream machinery, due to their
types, and the strict code is more work to write if you can't use the stream
machinery. In the early days it just wasn't worth writing the more complex
variants of them, as I had more pressing performance concerns at the time.)

Where the old version of mapAccumL caused your test case to take 5 seconds
to process an 11KB file (ouch!), with the rewritten version, your code can
process an 81MB file in the same amount of time, without any changes to your
code, and that O(n^2) behaviour is gone :-)

text 0.8.1.0 is now up on hackage, with the fix included. Enjoy!
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Data.Text performance problem

2010-09-12 Thread Felipe Lessa
On Sun, Sep 12, 2010 at 10:06 PM, Bryan O'Sullivan b...@serpentine.com wrote:
 text 0.8.1.0 is now up on hackage, with the fix included. Enjoy!

Wow! That was fast! =)

-- 
Felipe.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe