Re[2]: [Haskell-cafe] Substring replacements

2005-12-23 Thread Bulat Ziganshin
Hello Branimir,

Wednesday, December 21, 2005, 10:18:43 AM, you wrote:

try to add

{-# NOINLINE replace #-}

to both programs and repeat comparision

BM These are tests:
BM No optimisations (no -O):

NOINLINE just prevents RunTimeCompilation (see wiki page for details),
so this way you will test speed of replace on previously unknown
string. disabling optimization says nothing about real speed of
optimized program, which searches for the many different strings

-- 
Best regards,
 Bulatmailto:[EMAIL PROTECTED]



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


RE: Re[2]: [Haskell-cafe] Substring replacements

2005-12-23 Thread Branimir Maksimovic





From: Bulat Ziganshin [EMAIL PROTECTED]
Reply-To: Bulat Ziganshin [EMAIL PROTECTED]
To: Branimir Maksimovic [EMAIL PROTECTED]
CC: haskell-cafe@haskell.org
Subject: Re[2]: [Haskell-cafe] Substring replacements
Date: Fri, 23 Dec 2005 11:32:01 +0300

Hello Branimir,

Wednesday, December 21, 2005, 10:18:43 AM, you wrote:

try to add

{-# NOINLINE replace #-}

to both programs and repeat comparision

BM These are tests:
BM No optimisations (no -O):

NOINLINE just prevents RunTimeCompilation (see wiki page for details),
so this way you will test speed of replace on previously unknown
string. disabling optimization says nothing about real speed of
optimized program, which searches for the many different strings



I got it. These tests were with NOINLINE in both cases but I didn;t
saw any speed difference in results as actually replace (straight)
and searchReplace (KMP) is just called for two differnet strings.
Perhaps if I call that for long list of short patterns patterns on short 
string,

test would display different results (INLINE wouldn't help).
I'll try that next.

Greetings, Bane.

_
Don't just search. Find. Check out the new MSN Search! 
http://search.msn.click-url.com/go/onm00200636ave/direct/01/


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Substring replacements

2005-12-23 Thread Daniel Fischer
Hello Bulat,

I'm not sure what your point is, let's try to enlighten me.

Am Mittwoch, 21. Dezember 2005 16:30 schrieben Sie:
 Hello Daniel,

 Wednesday, December 21, 2005, 5:20:18 PM, you wrote:

 DF ordinarily, on my computer, your version of straightforward is 10-15%
 faster DF than KMP (search-patterns are now supplied on the command line
 -- which has DF no big impact;

 of course. the search pattern can be compiled into the search
 algorithm just at the time of program execution. just for example, if
 you have code

 main = do [x] - getArgs
   y - return $ map (\n - fac $ read x) [1..10^6]

 then `fac $ read x` will be computed just one time. in this time, if
 your perform many searches with the same pattern - compiler must
 initialize array just one time and use it for all searches

Errrh, what here? If I want to replace each occurence of a pattern within a 
String, of course I want the arrays built only once, destroying and 
rebuilding them would be deliberate stupidity, that can't be your point, 
certainly. And also if we interactively search and replace, as in an editor, 
we'd hold on to the arrays until we get the message 'no more of that, next 
pattern'.


 DF searched string is  able sea...; all compiled with -O2;
 DF NOINLINE makes no difference -- at least with optimisations on --

 why? i think that is the right way to check speed of optimized program
 without pre-compiling search pattern to search/replace algorithm at
 runtime
? What has NOINLINE to do with precompiling the search pattern?
BTW, I think, the KMP-algorithm is too large to be inlined anyway (maybe, if 
we explicitly asked for it), so I wouldn't expect any NOINLINE-effect in the 
first place.

And about RunTimeCompilation, I don't quite understand the Wiki-page, what I 
imagined might be the point there is that, if I know the search pattern 
before, I might import the general algorithm, write a function
searchReplacePattern = searchReplace pattern
and the arrays might then be built at compile-time and included in the object 
code -- but I've no idea whether compilers are clever enough to do that (I 
believe it should be possible to write compilers which would do that, but is 
it worth the trouble -- or is that sort of optimisation easy and routinely 
done?) -- but why that would be called RunTimeCompilation, I cannot imagine.
So then, the thing that might make a difference, would be not to give the 
search pattern at compile-time, which I did. Of course, since we search a 
long String, the time needed to build the arrays is minute in relation to the 
time needed to traverse the String, but in the more realistic situation of a 
shorter String (50 kB instead of 48 MB), it would be significant.

 DF ; without optimisations, KMP suffers far worse than straightforward).

 this test is meaningless

Well, it's not really a test for the algorithm, but for ghc's optimiser and 
I've forgotten the exact numbers, but KMP compiled without optimisation is 
much much, much slower than with -O2, so the optimiser does a really great 
job.

  KMP is O(m) while straightforward is O(m*n).

 DF Where m is the length of the input and n is the length of the
 searched-for DF pattern, I think?
 DF But these are worst-case complexities, I believe, ordinarily,
 straightforward DF will be O(m), too.

 and have longer init time (and take more space), so on typical
 searches it will be worser than trivial algorithm

I believe you've misread here. I'm saying that while the worst case complexity 
for the straighforward (or trivial) algorithm is O(n*m) -- as witnessed by my 
test, which was deliberately designed to be bad for that --, usually, in 
most real situations, that will have O(m) complexity, too.
And I -- like you -- believe that for typical searches the straightforward 
algorithm will be faster (at least with a clever implementation, like yours).

Cheers,
Daniel

P.S.: I'd really appreciate another attempt at explaining RunTimeCompilation 
to me (might well be an URL of a paper or something).

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Substring replacements

2005-12-22 Thread Daniel Fischer
Am Mittwoch, 21. Dezember 2005 15:20 schrieb Daniel Fischer:
  i'm 90% sure that straightforward method must be faster for one-time
  searches.

 I'm far less sure of that. If you have a really short search-pattern, I
 think that probably straightforward search is indeed faster (at least, if
 the search-pattern isn't supplied at compile-time). But if you have a
 pattern of length 100, say, and lots of relatively long partial matches,
 chances are that KMP will move on something like 50 chars at once, which
 would have to be re-checked by straightforward. I've no idea, when that
 would outweigh the extra time of building the KMP-arrays, though. In my
 test -- extremely unrealistic, but provides a more or less worst case
 example --
 straightforward must make roughly half a million character comparisons to
 pass each 'c', while KMP does 2000 (+/-2) (one way through the array and
 back), so there are situations where KMP definitely is faster. But
 ordinarily, on my computer, your version of straightforward is 10-15%
 faster than KMP (search-patterns are now supplied on the command line --
 which has no big impact; searched string is  able sea...; all compiled
 with -O2; NOINLINE makes no difference -- at least with optimisations on
 --; without optimisations, KMP suffers far worse than straightforward).


I've now tested with
main = do args - getArgs
  n - case args of
 (ar:_) - readIO ar `catch` (\_ - return 400)
 _  - return 500
  let src = replicate n 'r'
  dst =  # 
  str = replicate (n-1) 'r' ++ 'c': replicate n 'r'
  k   = if n  200 then ceiling (5e6 / fromIntegral n)
   else ceiling (1e7 / fromIntegral n)
  out = searchReplace src dst $ concat $ replicate k str
  putStr Working with 
  print n
  print $ length out
  putStrLn Done

and KMP wins for n == 1 and n = 12, also for  able seasea..., KMP wins for 
search-patterns of length 1 and patterns with no partial matches (and some 
others), but generally -- on my thing -- Bulat's version wins.

Cheers,
Daniel

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Substring replacements

2005-12-21 Thread Daniel Fischer
Am Mittwoch, 21. Dezember 2005 08:18 schrieb Branimir Maksimovic:
 From: Bulat Ziganshin [EMAIL PROTECTED]

 Reply-To: Bulat Ziganshin [EMAIL PROTECTED]
 To: Branimir Maksimovic [EMAIL PROTECTED]
 CC: haskell-cafe@haskell.org
 Subject: Re: [Haskell-cafe] Substring replacements
 Date: Tue, 20 Dec 2005 23:55:22 +0300
 
 Hello Branimir,
 
 Tuesday, December 20, 2005, 9:48:48 PM, you wrote:
 
 BM I've finally performed test on amd64 and result is a same as on intel.
 BM KMP always wins. So KMP is best suited for non indexed strings
 BM and I guess should be used in library as prefered search/replace
 method.
 BM This test favors straightforward search.
 
 i'm 90% sure that straightforward method must be faster for one-time
 searches.

I'm far less sure of that. If you have a really short search-pattern, I think 
that probably straightforward search is indeed faster (at least, if the 
search-pattern isn't supplied at compile-time). But if you have a pattern of 
length 100, say, and lots of relatively long partial matches, chances are 
that KMP will move on something like 50 chars at once, which would have to be 
re-checked by straightforward. I've no idea, when that would outweigh the 
extra time of building the KMP-arrays, though. In my test -- extremely 
unrealistic, but provides a more or less worst case example -- 
straightforward must make roughly half a million character comparisons to 
pass each 'c', while KMP does 2000 (+/-2) (one way through the array and 
back), so there are situations where KMP definitely is faster. But 
ordinarily, on my computer, your version of straightforward is 10-15% faster 
than KMP (search-patterns are now supplied on the command line -- which has 
no big impact; searched string is  able sea...; all compiled with -O2; 
NOINLINE makes no difference -- at least with optimisations on --; without 
optimisations, KMP suffers far worse than straightforward).


 KMP is O(m) while straightforward is O(m*n).

Where m is the length of the input and n is the length of the searched-for 
pattern, I think?
But these are worst-case complexities, I believe, ordinarily, straightforward 
will be O(m), too.


 your test may give better results with KMP algorithm just

 because you repeat the same search many times and it was automatically
 run-time compiled as described on the wiki page about KMP

My results seem to witness against that.


 My test favors straightforward, in any other case KMP wins by order of
 magnitude.
Can you give example tests?

 I think that straightfoirward is better then KMP with optimisations
 off is due more complex program.

 try to add
 
 {-# NOINLINE replace #-}
 
 to both programs and repeat comparision

 Greetings, Bane.

Cheers,
Daniel

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re[2]: [Haskell-cafe] Substring replacements

2005-12-21 Thread Bulat Ziganshin
Hello Daniel,

Wednesday, December 21, 2005, 5:20:18 PM, you wrote:

 BM I've finally performed test on amd64 and result is a same as on intel.
 BM KMP always wins. So KMP is best suited for non indexed strings
 BM and I guess should be used in library as prefered search/replace
 method.
 BM This test favors straightforward search.
 
 i'm 90% sure that straightforward method must be faster for one-time
 searches.

DF I'm far less sure of that. If you have a really short search-pattern, I 
think 
DF that probably straightforward search is indeed faster

sorry, i mean exactly that, answering proposition to add this to
library as a PREFFERED method. imho, typical usage of these routines
is for short enough patterns and searched strings



-- 
Best regards,
 Bulatmailto:[EMAIL PROTECTED]



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Substring replacements

2005-12-21 Thread Branimir Maksimovic





From: Daniel Fischer [EMAIL PROTECTED]
To: Branimir Maksimovic [EMAIL PROTECTED]
CC: Bulat Ziganshin [EMAIL PROTECTED], Haskell-Cafe@haskell.org
 KMP is O(m) while straightforward is O(m*n).

Where m is the length of the input and n is the length of the searched-for
pattern, I think?

Yes.
But these are worst-case complexities, I believe, ordinarily, 
straightforward

will be O(m), too.


Yes,  those are worst cases for both algorithms. O(m) for KMP,
O(m*n) for straightforward.



 My test favors straightforward, in any other case KMP wins by order of
 magnitude.
Can you give example tests?


Any example that has long search pattern say (many a's followed by b )
and searched string has many partial matches (many a's).
Particularly, any example which exhibits O(m*n) or close to, case for 
straightforward

search.

Greetings, Bane.

_
Express yourself instantly with MSN Messenger! Download today it's FREE! 
http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Substring replacements

2005-12-20 Thread Branimir Maksimovic

I've finally performed test on amd64 and result is a same as on intel.
KMP always wins. So KMP is best suited for non indexed strings
and I guess should be used in library as prefered search/replace method.
This test favors straightforward search.

[EMAIL PROTECTED] myhaskell]$ time ./KMP
Working:seasearch replace  able seaseasearch baker seasearch charlie
True
Done


real0m10.783s
user0m10.769s
sys 0m0.016s
[EMAIL PROTECTED] myhaskell]$ time ./straightforward
Working:seasearch replace  able seaseasearch baker seasearch charlie
True
Done


real0m11.769s
user0m11.741s
sys 0m0.028s
[EMAIL PROTECTED] myhaskell]$ uname -a
Linux devel64.office.kom 2.6.14-skas3-v8.2 #2 Fri Nov 11 21:19:36 CET 2005 
x86_64 x86_64 x86_64 GNU/Linux


Greetings, Bane.

_
Express yourself instantly with MSN Messenger! Download today it's FREE! 
http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Substring replacements

2005-12-20 Thread Branimir Maksimovic





From: Bulat Ziganshin [EMAIL PROTECTED]
Reply-To: Bulat Ziganshin [EMAIL PROTECTED]
To: Branimir Maksimovic [EMAIL PROTECTED]
CC: haskell-cafe@haskell.org
Subject: Re: [Haskell-cafe] Substring replacements
Date: Tue, 20 Dec 2005 23:55:22 +0300

Hello Branimir,

Tuesday, December 20, 2005, 9:48:48 PM, you wrote:

BM I've finally performed test on amd64 and result is a same as on intel.
BM KMP always wins. So KMP is best suited for non indexed strings
BM and I guess should be used in library as prefered search/replace 
method.

BM This test favors straightforward search.

i'm 90% sure that straightforward method must be faster for one-time
searches.


KMP is O(m) while straightforward is O(m*n).

your test may give better results with KMP algorithm just

because you repeat the same search many times and it was automatically
run-time compiled as described on the wiki page about KMP


My test favors straightforward, in any other case KMP wins by order of
magnitude.
I think that straightfoirward is better then KMP with optimisations
off is due more complex program.


try to add

{-# NOINLINE replace #-}

to both programs and repeat comparision


These are tests:
No optimisations (no -O):
Intel hyperthreaded,windows
$ time ./KMP
Working:seasearch replace  able seaseasearch baker seasearch charlie
True
Done


real0m34.766s
user0m0.015s
sys 0m0.000s

[EMAIL PROTECTED] ~/tutorial
$ time ./straightforward
Working:seasearch replace  able seaseasearch baker seasearch charlie
True
Done


real0m14.719s
user0m0.031s
sys 0m0.000s

AMD 64 bit:
[EMAIL PROTECTED] myhaskell]$ time ./KMP
Working:seasearch replace  able seaseasearch baker seasearch charlie
True
Done


real1m58.066s
user1m57.939s
sys 0m0.128s
[EMAIL PROTECTED] myhaskell]$ time ./straightforward
Working:seasearch replace  able seaseasearch baker seasearch charlie
True
Done


real0m41.565s
user0m41.527s
sys 0m0.040s

with optimisations (-O):

Intel hyperthreaded,windows
$ time ./KMP
Working:seasearch replace  able seaseasearch baker seasearch charlie
True
Done


real0m8.625s
user0m0.015s
sys 0m0.015s

[EMAIL PROTECTED] ~/tutorial
$ time ./straightforward
Working:seasearch replace  able seaseasearch baker seasearch charlie
True
Done


real0m11.735s
user0m0.015s
sys 0m0.000s

AMD 64 bit, linux:
[EMAIL PROTECTED] myhaskell]$ time ./KMP
Working:seasearch replace  able seaseasearch baker seasearch charlie
True
Done


real0m10.546s
user0m10.529s
sys 0m0.016s
[EMAIL PROTECTED] myhaskell]$ time ./straightforward
Working:seasearch replace  able seaseasearch baker seasearch charlie
True
Done


real0m11.796s
user0m11.785s
sys 0m0.012s

Greetings, Bane.

_
FREE pop-up blocking with the new MSN Toolbar - get it now! 
http://toolbar.msn.click-url.com/go/onm00200415ave/direct/01/


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


RE: Re[2]: [Haskell-cafe] Substring replacements

2005-12-18 Thread Branimir Maksimovic





From: Bulat Ziganshin [EMAIL PROTECTED]
Reply-To: Bulat Ziganshin [EMAIL PROTECTED]
To: Branimir Maksimovic [EMAIL PROTECTED]
CC: [EMAIL PROTECTED], Haskell-Cafe@haskell.org
Subject: Re[2]: [Haskell-cafe] Substring replacements
Date: Fri, 16 Dec 2005 16:51:32 +0300

Hello Branimir,

Friday, December 16, 2005, 5:36:47 AM, you wrote:
BM I've also performed tests on dual Xeon linux box and results are

just to let you know - GHC don't uses pentium4 hyperthreading,
multiple cpus or multiple cores in these tests

only way to make ghc using multiple processors is to use 6.5 beta
version, compile with -smp and explicitly fork several threads



You are right. I've double checked on linux there is just one thread
executing and there is not such a big difference between KMP and
straightforward search.
Just about 10% KMP is faster with my test, but still faster.
I've checked both SMP and non SMP linux (Intel).
Hyperthreading effect is on windows only, I guess, as there
are visible three threads per test process.
I have one amd64 near (I'll check that one too, as soon as admin
sets up account for me on that machine).

Greetings, Bane.

_
Don't just search. Find. Check out the new MSN Search! 
http://search.msn.click-url.com/go/onm00200636ave/direct/01/


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


RE: Re[2]: [Haskell-cafe] Substring replacements

2005-12-17 Thread Branimir Maksimovic




From: Bulat Ziganshin [EMAIL PROTECTED]
Reply-To: Bulat Ziganshin [EMAIL PROTECTED]
To: Branimir Maksimovic [EMAIL PROTECTED]
CC: [EMAIL PROTECTED], Haskell-Cafe@haskell.org
Subject: Re[2]: [Haskell-cafe] Substring replacements
Date: Fri, 16 Dec 2005 16:51:32 +0300

Hello Branimir,

Friday, December 16, 2005, 5:36:47 AM, you wrote:
BM I've also performed tests on dual Xeon linux box and results are

just to let you know - GHC don't uses pentium4 hyperthreading,
multiple cpus or multiple cores in these tests
Oh yes it does. I clearly see multiple threads that take unequal percentage 
on

each virtual CPU. I guess that other thread is garbage collector
thread. In case of hyperthreading, speed is gained by reduced memory
latency by 30-60 %



only way to make ghc using multiple processors is to use 6.5 beta
version, compile with -smp and explicitly fork several threads


This is not the case as I see. On windows search replace test programs
spawn 3 threads and on linux I'm not sure, but I've checked program that 
calls Haskell

from C++ and GHC spawns additional thread, which is not my thread, that
also performs something constantly, and I didn't spawn any thread
from Haskell.
So hyperthreading helps as it helps to optimise when several threads
accesses memory as is in this test case.
I can't see any other explanation why KMP search is
slower on AMD 20% , but faster on Intel 30%, then straightforward search
with my test.

Greetings, Bane.

_
Express yourself instantly with MSN Messenger! Download today it's FREE! 
http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Substring replacements

2005-12-16 Thread Daniel Fischer
Am Freitag, 16. Dezember 2005 03:36 schrieben Sie:
 From: Daniel Fischer [EMAIL PROTECTED]
 Any improvements are welcome, certainly some of you can do much better.

 It is fast on my machine except that you are using Map to lookup
 for badChar which is O(log n).
 I;ve placed this instead:
   badChar :: UArray Int Int
   badChar  = array (0,255) ([(i,-1) | i - [0..255]] ++ proc src 0)
   proc [] _ = []
   proc (s:st) i = (ord s,i):proc  st (i+1)
   getBc c = badChar ! ord c

 which gaved it significant boost, O(1) lookup.

Yes, but Char has 1114112 values, and I'm not sure whether such a large array 
would still be better, especially since, presumably, the Map will usually not 
be deeper than five layers, say. But if we restrict ourselves to extended 
ASCII Strings, an array certainly is better.
And maybe, instead of using two arrays, bmGs0 and bmGs, a mutable array (those 
are DiffArrays, I think -- I'll check that out) would also improve it.

 Now it's faster then brute force method but 10% slower then KMP
 with my test.
 I've also performed tests on dual Xeon linux box and results are
 proportionally
 the same as on my intel windows box.
 KMP wins again 10% better then BM and 20-30% better then straightforward
 search,
 which means that KMP is well suited for non indexed strings.

 Cheers,
 Daniel
 
 P.S. As an algorithm, I prefer KMP, it's quite elegant whereas BM is
 somewhat
 fussy.

 Yes, BM is for indexed structures.

 Greetings, Bane.

Cheers,
Daniel
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re[2]: [Haskell-cafe] Substring replacements

2005-12-16 Thread Bulat Ziganshin
Hello Branimir,

Friday, December 16, 2005, 5:36:47 AM, you wrote:
BM I've also performed tests on dual Xeon linux box and results are

just to let you know - GHC don't uses pentium4 hyperthreading,
multiple cpus or multiple cores in these tests

only way to make ghc using multiple processors is to use 6.5 beta
version, compile with -smp and explicitly fork several threads


-- 
Best regards,
 Bulatmailto:[EMAIL PROTECTED]



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Substring replacements

2005-12-15 Thread Branimir Maksimovic


This is what I got for BM. Performance dissapoints as BM is really
suited for indexed strings like arrays.It mainly operates on indexes.
This is simple BM, as I didn't want to go
for more complex variant,becauses takes and drops and recalculation of next 
position

is too pricey for non indexed structure. So, clear winner is KMP for
non indexed strings. There is also finite automaton algorithm
but this works well if search strings are precompiled, so I'll
implement it only for education purposes. I hope my Haskell improves
as I've learned how to reduce number of paramaters.

searchReplaceBM :: String - String - String - String
searchReplaceBM  _ str = str
searchReplaceBM sr rp str = searchReplace str
where
   table :: UArray Int Int
   table  = array (0,255) ([(i,0) | i - [0..255]] ++ proc sr 1)
   proc [] _ = []
   proc (s:st) i = (ord s,i):proc  st (i+1)
   len = length sr
   rsrch = reverse sr
   searchReplace str
| null remaining = if found then rp
else passed
|found = rp ++ searchReplace remaining
| otherwise = passed ++ searchReplace remaining
where
   (passed,remaining,found) = searchReplace' str
   searchReplace' str
   = if j == 0
then (,drop len str,True)
else failed
   where failed = case drop (j-1) str of
  [] - (str,,False)
  (c:_) - (take sk str, drop sk str, False)
   where md = j - table ! ord c
 sk = if md  0
  then md
  else 1

 j = srch rsrch (reverse $ take len str) len
   where srch   _ = 0
 srch _  l = l
 srch (s:str) (s':str') l
   | s == s' = srch str str' (l-1)
   | otherwise = l


Greetings, Bane.


From: Branimir Maksimovic [EMAIL PROTECTED]
To: [EMAIL PROTECTED], [EMAIL PROTECTED]
CC: Haskell-Cafe@haskell.org
Subject: Re: [Haskell-cafe] Substring replacements
Date: Thu, 15 Dec 2005 01:39:57 +





From: Branimir Maksimovic [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
CC: Haskell-Cafe@haskell.org
Subject: Re: [Haskell-cafe] Substring replacements
Date: Thu, 15 Dec 2005 00:55:02 +





From: Daniel Fischer [EMAIL PROTECTED]
To: Branimir Maksimovic [EMAIL PROTECTED]
CC: Haskell-Cafe@haskell.org
Subject: Re: [Haskell-cafe] Substring replacements
Date: Wed, 14 Dec 2005 20:40:06 +0100

Hi Bane,

nice algorithm. Since comparing chars _is_ cheap, it is to be expected 
that

all the hash-rotating is far more costly for short search patterns. The
longer the pattern, the better this gets, I think -- though nowhere near 
KMP

(or would it?).


Yes,KMP is superior in single pattern search. We didn't tried Boyer-Moore
algorithm yet, though. But I think it would be
difficult to implement it in Haskell efficiently as it searches backwards
and jumps around, and we want memory savings.
Though, I even didn't tried yet, but it is certainly very interesting.



Forget what I've said.
Boyer-Moore *can* be implemented efficiently, it is similar to KMP it goes
forward, but when it finds last character in pattern, than starts to search 
backwards.

This can be implemented easilly as Haskell lists naturaly reverse order
when putting from one list to other.
Heh, never say never :)
As I see from documents Boyer-Moore has best performance on average
and should be better than KMP.

Greetings,Bane.

_
FREE pop-up blocking with the new MSN Toolbar - get it now! 
http://toolbar.msn.click-url.com/go/onm00200415ave/direct/01/




_
Don't just search. Find. Check out the new MSN Search! 
http://search.msn.click-url.com/go/onm00200636ave/direct/01/


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Substring replacements

2005-12-15 Thread Daniel Fischer
Am Donnerstag, 15. Dezember 2005 02:39 schrieben Sie:
 From: Branimir Maksimovic [EMAIL PROTECTED]

 To: [EMAIL PROTECTED]
 CC: Haskell-Cafe@haskell.org
 Subject: Re: [Haskell-cafe] Substring replacements
 Date: Thu, 15 Dec 2005 00:55:02 +
 
 From: Daniel Fischer [EMAIL PROTECTED]
 To: Branimir Maksimovic [EMAIL PROTECTED]
 CC: Haskell-Cafe@haskell.org
 Subject: Re: [Haskell-cafe] Substring replacements
 Date: Wed, 14 Dec 2005 20:40:06 +0100
 
 Hi Bane,
 
 nice algorithm. Since comparing chars _is_ cheap, it is to be expected
 that
 all the hash-rotating is far more costly for short search patterns. The
 longer the pattern, the better this gets, I think -- though nowhere near
 KMP
 (or would it?).
 
 Yes,KMP is superior in single pattern search. We didn't tried Boyer-Moore
 algorithm yet, though. But I think it would be
 difficult to implement it in Haskell efficiently as it searches backwards
 and jumps around, and we want memory savings.
 Though, I even didn't tried yet, but it is certainly very interesting.

 Forget what I've said.
 Boyer-Moore *can* be implemented efficiently, it is similar to KMP it goes
 forward, but when it finds last character in pattern, than starts to search
 backwards.
 This can be implemented easilly as Haskell lists naturaly reverse order
 when putting from one list to other.
 Heh, never say never :)
 As I see from documents Boyer-Moore has best performance on average
 and should be better than KMP.

 Greetings,Bane.

Well, I also thought that all the jumping around in Boyer-Moore wasn't too 
good (after each shift we must bite off a chunk from the remaining input, 
pushing that onto the stack, which costs something). But I gave it a try 
today and here's what I came up with:

import Data.List (tails)
import Data.Map (Map)
import qualified Data.Map as Map
import Data.Array.Unboxed

searchRep :: String - String - String - String
searchRep src rp str = run (reverse $ take len1 str) $ drop len1 str
where
  len = length src
  len1 = len-1
  pat :: UArray Int Char
  pat = listArray (0,len1) src
  ch = pat!len1
  badChar :: Map Char Int
  badChar = Map.fromList $ zip src [0 .. ]
  getBc c = case Map.lookup c badChar of
   Just n  - n
   Nothing - -1
  suffs :: UArray Int Int
  suffs = listArray (0,len1) $! init $! map (pr 0 crs) $! tails crs
  where
crs = reverse src
pr n (x:xs) (y:ys) | x == y = pr (n+1) xs ys
pr n _ _ = n
  bmGs0 :: UArray Int Int
  bmGs0 = array (0,len1) [(j,k) | (k,k') - zip (tail $! help) help, j - 
[k' .. k-1]]
  help = [k | k - [0 .. len], k == len || suffs!k == len-k]
  bmGs :: UArray Int Int
  bmGs = bmGs0 // [(len1-suffs!k,k) | k - [len1,len-2 .. 1]]
  run by  = reverse by
  run by (c:cs)
| c == ch   = process (c:by) cs
| otherwise = run (c:by) cs
  roll n xs ys | n = 0 = (xs, ys)
  roll n xs (y:ys) = roll (n-1) (y:xs) ys
  roll _ xs  = (xs, )
  walk n  = (n,)
  walk n st@(c:cs)
| n  0  = (n,st)
| c == pat!n = walk (n-1) cs
| otherwise  = (n,st)
  process con left
| i  0 = reverse pass ++ rp ++ run  left
| otherwise = {- bye ++ -} run ncon nleft
  where
 (i,pass) = walk len1 con
 d = if null pass then i+1 else max (bmGs!i) (i - getBc (head pass))
 -- bye = reverse $! drop (len-d) con
 (ncon,nleft) = roll (d-1) {- (take (len-d) con) -} con left

it's not as fast as KMP for the tests, but not too bad.
Commenting out 'bye' gives a bit of extra speed, but if it's _long_ before a 
match (if any), we'd be better off relieving our memory with 'bye', I think.

Any improvements are welcome, certainly some of you can do much better.

Cheers,
Daniel

P.S. As an algorithm, I prefer KMP, it's quite elegant whereas BM is somewhat 
fussy.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Substring replacements

2005-12-15 Thread Branimir Maksimovic





From: Daniel Fischer [EMAIL PROTECTED]
To: Branimir Maksimovic [EMAIL PROTECTED]
CC: Haskell-Cafe@haskell.org
Subject: Re: [Haskell-cafe] Substring replacements
Date: Thu, 15 Dec 2005 21:07:11 +0100

Am Donnerstag, 15. Dezember 2005 02:39 schrieben Sie:
 From: Branimir Maksimovic [EMAIL PROTECTED]

 To: [EMAIL PROTECTED]
 CC: Haskell-Cafe@haskell.org
 Subject: Re: [Haskell-cafe] Substring replacements
 Date: Thu, 15 Dec 2005 00:55:02 +
 
 From: Daniel Fischer [EMAIL PROTECTED]
 To: Branimir Maksimovic [EMAIL PROTECTED]
 CC: Haskell-Cafe@haskell.org
 Subject: Re: [Haskell-cafe] Substring replacements
 Date: Wed, 14 Dec 2005 20:40:06 +0100
 
 Hi Bane,
 
 nice algorithm. Since comparing chars _is_ cheap, it is to be expected
 that
 all the hash-rotating is far more costly for short search patterns. 
The
 longer the pattern, the better this gets, I think -- though nowhere 
near

 KMP
 (or would it?).
 
 Yes,KMP is superior in single pattern search. We didn't tried 
Boyer-Moore

 algorithm yet, though. But I think it would be
 difficult to implement it in Haskell efficiently as it searches 
backwards

 and jumps around, and we want memory savings.
 Though, I even didn't tried yet, but it is certainly very interesting.

 Forget what I've said.
 Boyer-Moore *can* be implemented efficiently, it is similar to KMP it 
goes
 forward, but when it finds last character in pattern, than starts to 
search

 backwards.
 This can be implemented easilly as Haskell lists naturaly reverse order
 when putting from one list to other.
 Heh, never say never :)
 As I see from documents Boyer-Moore has best performance on average
 and should be better than KMP.

 Greetings,Bane.

Well, I also thought that all the jumping around in Boyer-Moore wasn't too
good (after each shift we must bite off a chunk from the remaining input,
pushing that onto the stack, which costs something). But I gave it a try
today and here's what I came up with:

import Data.List (tails)
import Data.Map (Map)
import qualified Data.Map as Map
import Data.Array.Unboxed

searchRep :: String - String - String - String
searchRep src rp str = run (reverse $ take len1 str) $ drop len1 str
where
  len = length src
  len1 = len-1
  pat :: UArray Int Char
  pat = listArray (0,len1) src
  ch = pat!len1
  badChar :: Map Char Int
  badChar = Map.fromList $ zip src [0 .. ]
  getBc c = case Map.lookup c badChar of
   Just n  - n
   Nothing - -1
  suffs :: UArray Int Int
  suffs = listArray (0,len1) $! init $! map (pr 0 crs) $! tails crs
  where
crs = reverse src
pr n (x:xs) (y:ys) | x == y = pr (n+1) xs ys
pr n _ _ = n
  bmGs0 :: UArray Int Int
  bmGs0 = array (0,len1) [(j,k) | (k,k') - zip (tail $! help) help, j 
-

[k' .. k-1]]
  help = [k | k - [0 .. len], k == len || suffs!k == len-k]
  bmGs :: UArray Int Int
  bmGs = bmGs0 // [(len1-suffs!k,k) | k - [len1,len-2 .. 1]]
  run by  = reverse by
  run by (c:cs)
| c == ch   = process (c:by) cs
| otherwise = run (c:by) cs
  roll n xs ys | n = 0 = (xs, ys)
  roll n xs (y:ys) = roll (n-1) (y:xs) ys
  roll _ xs  = (xs, )
  walk n  = (n,)
  walk n st@(c:cs)
| n  0  = (n,st)
| c == pat!n = walk (n-1) cs
| otherwise  = (n,st)
  process con left
| i  0 = reverse pass ++ rp ++ run  left
| otherwise = {- bye ++ -} run ncon nleft
  where
 (i,pass) = walk len1 con
 d = if null pass then i+1 else max (bmGs!i) (i - getBc (head pass))
 -- bye = reverse $! drop (len-d) con
 (ncon,nleft) = roll (d-1) {- (take (len-d) con) -} con left

it's not as fast as KMP for the tests, but not too bad.
Commenting out 'bye' gives a bit of extra speed, but if it's _long_ before 
a
match (if any), we'd be better off relieving our memory with 'bye', I 
think.


Any improvements are welcome, certainly some of you can do much better.


It is fast on my machine except that you are using Map to lookup
for badChar which is O(log n).
I;ve placed this instead:
 badChar :: UArray Int Int
 badChar  = array (0,255) ([(i,-1) | i - [0..255]] ++ proc src 0)
 proc [] _ = []
 proc (s:st) i = (ord s,i):proc  st (i+1)
 getBc c = badChar ! ord c

which gaved it significant boost, O(1) lookup.
Now it's faster then brute force method but 10% slower then KMP
with my test.
I've also performed tests on dual Xeon linux box and results are 
proportionally

the same as on my intel windows box.
KMP wins again 10% better then BM and 20-30% better then straightforward 
search,

which means that KMP is well suited for non indexed strings.




Cheers,
Daniel

P.S. As an algorithm, I prefer KMP, it's quite elegant whereas BM is 
somewhat

fussy.


Yes, BM is for indexed structures.

Greetings, Bane

Re: [Haskell-cafe] Substring replacements

2005-12-14 Thread Branimir Maksimovic





From: Daniel Fischer [EMAIL PROTECTED]
To: Branimir Maksimovic [EMAIL PROTECTED]
CC: Haskell-Cafe@haskell.org
Subject: Re: [Haskell-cafe] Substring replacements
Date: Tue, 13 Dec 2005 11:23:29 +0100



After seeing that your program is fastest (I've also tried one from
http://haskell.org/hawiki/RunTimeCompilation but perhaps I'm not
that good in converting to search replace?) I've decided to
try with Rabin-Karp algorithm.
This algorithm performs same operation as straightforward search,
but compares hashes instead of chars.
With ability to rotate hash (remove first, add next) characters
there is also optimisation, that hash is calculated only for single
next character rather again for whole substring.
Unfortunatelly on my machine it is very cheap to compare
characters so with my test hashing overweights character compare,
except in your test when hash searching is faster then straightforward
search.

This is best I can write in terms of performance and readability.
I've tried with getFst that returns Maybe but it was slower so I decided
to return '\0' in case that argument is empty list, which renders '\0'
unusable, but then I really doubt that 0 will be used in strings.

-- Rabin-Karp string search algorithm, it is very effective in searching of 
set

-- of patterns of length n on same string
-- this program is for single pattern search, but can be crafted
-- for multiple patterns of length m

hSearchReplace :: String - String - String - String
hSearchReplace sr rp xs
   | not (null remaining) = found ++ rp
++ hSearchReplace sr rp (drop (length sr) 
remaining)

   | otherwise = found
   where
   (found,remaining) = hSearch sr xs

hSearch :: String - String - (String,String)
hSearch sr xs = hSearch' sr xs hcmp 
   where
   hsrch = hash sr
   hcmp = hash $ take ls xs
   cmp = take ls xs
   ls = length sr

   hSearch' [] xs _ _= (xs,[])
   hSearch' sr [] _ fndFail = (reverse fndFail,[])
   hSearch' srch xxs@(x:xs) hcmps fndFail
   = if hsrch == hcmps
then if isPrefixOf srch xxs
then (reverse fndFail,xxs)
else searchAgain
else searchAgain
   where
   searchAgain
= hSearch' srch xs
   (hashRotate (getFst xxs) (getFst nextxxs) (ls-1) 
hcmps)

   (x:fndFail)
   nextxxs = drop ls xxs

getFst :: String - Char
getFst [] = '\0'
getFst (a:as) = a

hash :: String - Int
hash str
   =  hash' str (length str - 1)
   where
   hash' :: String - Int - Int
   hash' [] _ = 0
   hash' (s:str) pow  = (101 ^ pow) *(fromEnum s)
+ hash' str (pow-1)

hashRotate :: Char - Char - Int - Int - Int
hashRotate cout cin pow hsh
   = (hsh - ((101 ^ pow) * (fromEnum cout)))*101
 + (fromEnum cin)



Greetings, Bane.

_
Express yourself instantly with MSN Messenger! Download today it's FREE! 
http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Substring replacements

2005-12-14 Thread Daniel Fischer
Hi, Bane and all,

Am Dienstag, 13. Dezember 2005 14:22 schrieben Sie:
   In real world situation your KMP will always be fastest on average.
   I like that we are not using C arrays as then we have advantage
   of lazyness and save on memory usage. C++ program will be faster
   on shorter strings but on this large strings will loose due memory
   latency. and with your test, both programs are very fast.

Yesterday, I did the unspeakable -- I wrote a C-version. Smashes 
Haskell-performance for short enough Strings (factor 10 for my test, factor 
2.2 for Bane's), but once it starts swapping, we catch up, and for really 
large Strings I dare say we'd win far out.

I also managed to get my KMP still faster, using take and drop instead of 
splitAt helps a lot (Bane, the use of 'break' in my transcript of yours was 
what slowed it down, I reintroduced searchr''' and both are equal).
I'm not quite sure, whether that indeed helps, but I also chose to use 
listArray for the boxed array of borders.

Now it's
searchReplace :: String - String - String - String
searchReplace  _ str = str
searchReplace src@(c:cs) dst str = process 0 str 
where
  len = length src
  pat :: UArray Int Char
  pat = listArray (0,len-1) src
  bord = A.listArray (0,len) $ (-1):0:
   [getBord (pat!i) i + 1 | i - [1 .. len-1]]
  getBord s n
 | m  0  = m
 | s == pat!m = m
 | otherwise  = getBord s m
   where
 m = bord A.! n
  bor :: UArray Int Int
  bor = listArray (0,len) $ A.elems bord
  getBor s n
 | m  0 || s == pat!m = m
 | otherwise = getBor s m
   where
 m = bor!n
  process n str _ | n = len = dst ++ process 0 str 
  process _  mat = reverse mat
  process 0 (s:st) _
 | s == c= process 1 st [s]
 | otherwise = s:process 0 st 
  process n str@(s:st) mat
 | s == pat!n = process (n+1) st (s:mat)
 | otherwise  =
case getBor s n of
  -1 - reverse mat ++ process 0 str 
  0  - reverse mat ++ process 1 st [s]
  j  - reverse (drop j mat) ++ process (j+1) st (s:take j mat)


gives a speedup of roughly 10% on my box versus yesterday's version.
  
   Greetings, Bane.
 
 On my 256MB RAM AMD Duron 1200 MHz, Bulat's version is consistently about
 20%
 faster than my KMP on your test -- btw, I unboxed the pat array, which
  gave a
 bit of extra speed, but not much.

 I think that's because on your machine Bulat's version have better
 perfromance
 with CPU cache.
 I don;t know but now your version is 25% faster with my test on P4
 hyperthreaded.

E, what's 'hyperthreaded' ? Unfortunately, I'm completely useless with 
computers.


 your new version:
 $ time srchrep.exe
 Working:seasearch replace  able seaseasearch baker seasearch charlie
 True
 Done


 real0m8.734s
 user0m0.015s
 sys 0m0.000s

 Bulat's version:

 [EMAIL PROTECTED] ~/tutorial
 $ time replace1.exe
 Working:seasearch replace  able seaseasearch baker seasearch charlie
 True
 Done


 real0m11.734s
 user0m0.015s
 sys 0m0.015s

 3 secs difference now.

 And apologies to Sebastian Sylvan, I also included an unboxed version of
 bord,
 built from the boxed version, and that sped things further up -- not much,
 again, but there it is.

 On my machine you got another 10-15% of boost with unboxed arrays.

 I wonder about this difference, -10% on one system and +20% on another
 system,
 ist that normal?

 Different caching schemes on CPU's perhaps? different memory latencies?
 hyperthreading helps your version? more code and data, perhaps because
 of that it pays the price on your machine?

 Greetings, Bane.

Well, whatever. Upto now, on my box, Bulat's is still the fastest for your 
test -- though I've narrowed the gap quite a bit.

Cheers,
Daniel
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Substring replacements

2005-12-14 Thread Daniel Fischer
Hi Bane,

nice algorithm. Since comparing chars _is_ cheap, it is to be expected that 
all the hash-rotating is far more costly for short search patterns. The 
longer the pattern, the better this gets, I think -- though nowhere near KMP 
(or would it?). However, I don't see how to (efficiently) do a multiple 
pattern search with KMP, so there -- if all patterns have the same length, 
otherwise I don't see -- Rabin-Karp would probably be the method of choice.

Am Mittwoch, 14. Dezember 2005 10:16 schrieben Sie:
 From: Daniel Fischer [EMAIL PROTECTED]

 To: Branimir Maksimovic [EMAIL PROTECTED]
 CC: Haskell-Cafe@haskell.org
 Subject: Re: [Haskell-cafe] Substring replacements
 Date: Tue, 13 Dec 2005 11:23:29 +0100

 After seeing that your program is fastest (I've also tried one from
 http://haskell.org/hawiki/RunTimeCompilation but perhaps I'm not
 that good in converting to search replace?) I've decided to
 try with Rabin-Karp algorithm.
 This algorithm performs same operation as straightforward search,
 but compares hashes instead of chars.
 With ability to rotate hash (remove first, add next) characters
 there is also optimisation, that hash is calculated only for single
 next character rather again for whole substring.
 Unfortunatelly on my machine it is very cheap to compare
 characters so with my test hashing overweights character compare,
 except in your test when hash searching is faster then straightforward
 search.

 This is best I can write in terms of performance and readability.
 I've tried with getFst that returns Maybe but it was slower so I decided
 to return '\0' in case that argument is empty list, which renders '\0'
 unusable, but then I really doubt that 0 will be used in strings.

 -- Rabin-Karp string search algorithm, it is very effective in searching of
 set
 -- of patterns of length n on same string
 -- this program is for single pattern search, but can be crafted
 -- for multiple patterns of length m


I tuned it up somewhat:
import Data.List (isPrefixOf)
import Data.Char (ord)  -- using ord instead of fromEnum oddly makes it
-- faster for my test, but slower for yours, but only a whiff.

searchrep :: String - String - String - String
searchrep  _ str = str-- or cycle rp, or error?
searchrep sr rp xs = hSearchRep xs  -- don't carry more around than necessary
where
   len = length sr   -- we don't want that to be recomputed 
   hsrch = hash sr -- neither that
   hSearchRep   = 
   hSearchRep xs
   | null remaining = passed
   | otherwise  = passed ++ rp ++ hSearchRep (drop len remaining)
 where
   (passed,remaining) = hSearch xs  -- ' xs (hash $ take len xs) 
   hSearch xs = hSearch' xs hcmp  -- since hSearch will be optimised
   where -- away anyway, we might
  hcmp = hash $ take len xs   -- as well eliminate it ourselves
   hSearch'  _ got = (reverse got, )
   hSearch' xxs@(x:xs) hcd got
   | hcd == hsrch  (sr `isPrefixOf` xxs) = (reverse got, xxs)
   | otherwise = searchAgain -- one test 
less
 where
   searchAgain = case drop len xxs of
[] - (reverse got ++ xxs, )   -- then we know we're done
(y:_)  - hSearch' xs (hashRotate x y hcd) (x:got)
-- no need for fancy getFst anymore
-- making hashRotate local eliminates one argument, makes it faster
   hashRotate :: Char - Char - Int - Int
   hashRotate cout cin hsh = 101*(hsh - 101^(len-1)*ord cout) + ord cin
-- using foldl for hash is an enormous boost
   hash :: String - Int
   hash = foldl ((. ord) . (+) . (*101)) 0
-- hash str = foldl (\n c - 101*n+ord c) 0 str
-- this is equally fast as the point-free version, easier to read, probably,
-- but I like an occasional pointless pointfreeness.

Now this beats everything but KMP on my test very clearly.
[EMAIL PROTECTED]:~/Documents/haskell/Allotria/Search time myhash; time myhash2
Working: seasearch replace  able seaseaseasearch baker ssseasearch charlie
True
Done


real0m50.401s
user0m49.990s
sys 0m0.060s
Working very long
True
Done

real0m15.747s
user0m15.630s
sys 0m0.020s

Still poor on your test, though.

Cheers,
Daniel

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Substring replacements

2005-12-14 Thread Branimir Maksimovic





From: Daniel Fischer [EMAIL PROTECTED]
To: Branimir Maksimovic [EMAIL PROTECTED]
CC: Haskell-Cafe@haskell.org
Subject: Re: [Haskell-cafe] Substring replacements
Date: Wed, 14 Dec 2005 17:10:20 +0100





 I think that's because on your machine Bulat's version have better
 perfromance
 with CPU cache.
 I don;t know but now your version is 25% faster with my test on P4
 hyperthreaded.

E, what's 'hyperthreaded' ? Unfortunately, I'm completely useless with
computers.


I think that i've figure it now.
Hyperthreading is hardware CPU feature that single CPU core
can speed up execution of two running threads.
For example if one thread uses integer unit and other FP unit
CPU executes that in parallel. But that's not important
or significant. What is interestenting is memory latency.
If one thread peeks and pokes around memory for , say 1
unit of time, with usual CPU two thread will execute
2 units of time. Hyperthreaded (I'm talking about intel implementation)
CPU will execute that in 1.4 points of time giving 60% boost
in terms of speed. I've tested some assembler and C program
that launches two threads each roaming over memory
to anulate impact of cache.
What is noticable is that two threads have 60% less memory
latency constantly then single thread. That means if
single thread for each out of cache memory access waits
300-400 CPU cycles, two threads wait 60% less.
Now what has that to do with our programs as they are single
threaded? I think it's garbage collection.
Our programs run with garbage collector in background and
you feel that burden by 20% as your program probably pushes
garbage collector to work more than Bulat's version.
On hyperthreaded CPU impact of garbage collection is reduced
by a factor of 30-60 % resulting in your program being 30% faster
on my machine.

Greetings, Bane.

_
Express yourself instantly with MSN Messenger! Download today it's FREE! 
http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Substring replacements

2005-12-14 Thread Branimir Maksimovic





From: Daniel Fischer [EMAIL PROTECTED]
To: Branimir Maksimovic [EMAIL PROTECTED]
CC: Haskell-Cafe@haskell.org
Subject: Re: [Haskell-cafe] Substring replacements
Date: Wed, 14 Dec 2005 20:40:06 +0100

Hi Bane,

nice algorithm. Since comparing chars _is_ cheap, it is to be expected that
all the hash-rotating is far more costly for short search patterns. The
longer the pattern, the better this gets, I think -- though nowhere near 
KMP

(or would it?).


Yes,KMP is superior in single pattern search. We didn't tried Boyer-Moore
algorithm yet, though. But I think it would be
difficult to implement it in Haskell efficiently as it searches backwards
and jumps around, and we want memory savings.
Though, I even didn't tried yet, but it is certainly very interesting.

However, I don't see how to (efficiently) do a multiple

pattern search with KMP, so there -- if all patterns have the same length,
otherwise I don't see -- Rabin-Karp would probably be the method of choice.


Yes, this algorithm can search in parallel patterns of same length.
Different search patterns have to be searched same way as with KMP.



I tuned it up somewhat:
import Data.List (isPrefixOf)
import Data.Char (ord)  -- using ord instead of fromEnum oddly makes it
-- faster for my test, but slower for yours, but only a whiff.


Wow, on my machine your version of Rabin-Karp gives 30% boost to my test.
This helps me learn Haskell , too .

Greetings, Bane.

_
Express yourself instantly with MSN Messenger! Download today it's FREE! 
http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Substring replacements

2005-12-14 Thread Branimir Maksimovic





From: Branimir Maksimovic [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
CC: Haskell-Cafe@haskell.org
Subject: Re: [Haskell-cafe] Substring replacements
Date: Thu, 15 Dec 2005 00:55:02 +





From: Daniel Fischer [EMAIL PROTECTED]
To: Branimir Maksimovic [EMAIL PROTECTED]
CC: Haskell-Cafe@haskell.org
Subject: Re: [Haskell-cafe] Substring replacements
Date: Wed, 14 Dec 2005 20:40:06 +0100

Hi Bane,

nice algorithm. Since comparing chars _is_ cheap, it is to be expected 
that

all the hash-rotating is far more costly for short search patterns. The
longer the pattern, the better this gets, I think -- though nowhere near 
KMP

(or would it?).


Yes,KMP is superior in single pattern search. We didn't tried Boyer-Moore
algorithm yet, though. But I think it would be
difficult to implement it in Haskell efficiently as it searches backwards
and jumps around, and we want memory savings.
Though, I even didn't tried yet, but it is certainly very interesting.



Forget what I've said.
Boyer-Moore *can* be implemented efficiently, it is similar to KMP it goes
forward, but when it finds last character in pattern, than starts to search 
backwards.

This can be implemented easilly as Haskell lists naturaly reverse order
when putting from one list to other.
Heh, never say never :)
As I see from documents Boyer-Moore has best performance on average
and should be better than KMP.

Greetings,Bane.

_
FREE pop-up blocking with the new MSN Toolbar - get it now! 
http://toolbar.msn.click-url.com/go/onm00200415ave/direct/01/


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Substring replacements

2005-12-14 Thread ajb
G'day all.

Quoting Branimir Maksimovic [EMAIL PROTECTED]:

 After seeing that your program is fastest (I've also tried one from
 http://haskell.org/hawiki/RunTimeCompilation but perhaps I'm not
 that good in converting to search replace?)

You probably did it right, but you could post your version to the
list if you want me to take a look.

When I wrote the RunTimeCompilation code, it wasn't intended to be a
shining example of efficiency, merely an illustration.  Remember
that it's doing TWO things: compiling the pattern to code, and then
performing the search.  The compilation phase is likely to be much
slower than the search, so the speedup (if any!) would only be realised
the SECOND time that you searched a string using the same pattern.
(Assuming you re-used the compiled match code, of course!)

Cheers,
Andrew Bromage
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Substring replacements

2005-12-14 Thread Branimir Maksimovic





From: [EMAIL PROTECTED]
To: haskell-cafe@haskell.org
Subject: Re: [Haskell-cafe] Substring replacements
Date: Thu, 15 Dec 2005 00:25:19 -0500

G'day all.

Quoting Branimir Maksimovic [EMAIL PROTECTED]:

 After seeing that your program is fastest (I've also tried one from
 http://haskell.org/hawiki/RunTimeCompilation but perhaps I'm not
 that good in converting to search replace?)

You probably did it right, but you could post your version to the
list if you want me to take a look.


Oh, here it is but just don;t laugh :)
I've hacked with unsafePerformIO as I din't know
how to remove IO from match any other way.

searchReplaceKMP :: String-String-String - String
searchReplaceKMP sr rp s
   | not (null remaining) = found++rp
++ searchReplaceKMP sr rp remaining
   | otherwise = found
   where
   (found,remaining) = unsafePerformIO $ matchKMP sr s

matchKMP :: (Monad m, Eq a) = [a] - ([a] - m ([a],[a]))
matchKMP []
   = error Can't match empty list
matchKMP xs
   = matchfunc []
 where
   matchfunc = makeMatchFunc [dofail] (zip xs (overlap xs))
   dofail = \ps xs - case xs of
   [] - fail can't match
   (y:ys) - matchfunc (y:ps) ys

type PartialMatchFunc m a = [a] - [a] - m ([a], [a])

makeMatchFunc :: (Monad m, Eq a) = [PartialMatchFunc m a] - [(a, Int)]
   - PartialMatchFunc m a
makeMatchFunc prev []
   = \ps xs - return (reverse (drop ((length prev)-1) ps), xs)
makeMatchFunc prev ((x,failstate):ms)
   = thisf
 where
   mf = makeMatchFunc (thisf:prev) ms
   failcont = prev !! (length prev - failstate - 1)
   thisf = \ps xs - case xs of
   [] - fail can't match
   (y:ys) - if (x == y) then mf (y:ps) ys
   else failcont ps xs

overlap :: (Eq a) = [a] - [Int]
overlap str
   = overlap' [0] str
 where
   overlap' prev []
 = reverse prev
   overlap' prev (x:xs)
 = let get_o o
| o = 1 || str !! (o-2) == x = o
| otherwise = get_o (1 + prev !! (length prev - o + 1))
   in overlap' (get_o (head prev + 1):prev) xs

--
These are timings (it's performance is about the same as Rabin-Karp):

$ time searchr.exe
Working:seasearch replace  able seaseasearch baker seasearch charlie
searchr.exe: user error (can't match)


real0m22.187s
user0m0.015s
sys 0m0.015s

[EMAIL PROTECTED] ~/tutorial
$ ghc -fglasgow-exts  -O2 searchr.hs --make  -o searchr.exe
Chasing modules from: searchr.hs
Compiling Main ( searchr.hs, searchr.o )
Linking ...

[EMAIL PROTECTED] ~/tutorial
$ time searchr.exe
Working very long
True
Done

real0m8.110s
user0m0.031s
sys 0m0.016s



When I wrote the RunTimeCompilation code, it wasn't intended to be a
shining example of efficiency, merely an illustration.  Remember
that it's doing TWO things: compiling the pattern to code, and then
performing the search.  The compilation phase is likely to be much
slower than the search, so the speedup (if any!) would only be realised
the SECOND time that you searched a string using the same pattern.
(Assuming you re-used the compiled match code, of course!)


Oh, that explaines it. Actually this has to be converted to searchReplace
in order to be fast, but I don;t know how (yet) as your program
is pretty complicated to my humble Haskell skills.
I think that your technique can be usefull with Aho-Corasick algorithm
as it first constructs finite automaton from tree, then performs search.
So, I'll guess I'll try first Boyer-Moore, then Aho-Corasick, eventually run
time compilation, but this is too advanced for me for now.

Greetings, Bane.

_
Express yourself instantly with MSN Messenger! Download today it's FREE! 
http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Substring replacements

2005-12-13 Thread Daniel Fischer
Am Montag, 12. Dezember 2005 16:28 schrieben Sie:
 From: Daniel Fischer [EMAIL PROTECTED]

 To: Branimir Maksimovic [EMAIL PROTECTED]
 CC: Haskell-Cafe@haskell.org
 Subject: Re: [Haskell-cafe] Substring replacements
 Date: Mon, 12 Dec 2005 16:15:46 +0100
 
 Earlier today:
   Sorry, but
   Prelude SearchRep searchReplace abaaba ## abababaaba
   abababaaba
  
   I haven't analyzed the algorithm, so I don't know why exactly this
 
 fails.
 
   I'll take a look sometime soon.
 
 I found the problem (one at least).
 Say the pattern to be replaced begins with 'a' and we have a sufficiently
 long
 match with the pattern starting at the first 'a' in the String. Upon
 encountering the second 'a', while the first pattern still matches, you
 start
 pushing onto the rollback-stack. But that isn't inspected anymore, so if
 the
 actual occurence of the pattern starts at the third (or fourth, n-th)
 occurence of 'a' and that is already pushed onto the rollback, you miss
  it.

 I've corrected this with adjusting rollback position. if rollBack is null
 then
 search for rollback starts at second character if not starts at same as
 searhed
 character because I skip what was searched. That's all.
 Though I'm not so sure now when I read this.

Still not working:

*New searchReplace abababc # ababababababc
ababababababc
*New searchReplace1 abababc # ababababababc
ababababababc


 So the question is, can we find a cheap test to decide whether to use KMP
 or
 Bulat's version?

 In real world situation your KMP will always be fastest on average.
 I like that we are not using C arrays as then we have advantage
 of lazyness and save on memory usage. C++ program will be faster
 on shorter strings but on this large strings will loose due memory
 latency. and with your test, both programs are very fast.

 Greetings, Bane.


On my 256MB RAM AMD Duron 1200 MHz, Bulat's version is consistently about 20% 
faster than my KMP on your test -- btw, I unboxed the pat array, which gave a 
bit of extra speed, but not much. 
And apologies to Sebastian Sylvan, I also included an unboxed version of bord, 
built from the boxed version, and that sped things further up -- not much, 
again, but there it is.
I wonder about this difference, -10% on one system and +20% on another system, 
ist that normal?

Cheers, Daniel
--
Up-To-Date version of KMP:

import Data.Array.Unboxed (UArray, listArray, (!))
import qualified Data.Array as A (array, (!), elems)

searchReplace :: String - String - String - String
searchReplace  _ str = str
searchReplace src@(c:cs) dst str = process 0 str 
where
  len = {-# scc len #-} length src
  pat :: UArray Int Char
  pat = {-# scc pat #-} listArray (0,len-1) src
  bord ={-# scc bord #-} A.array (0,len) $ (0,-1):(1,0):
 [(i+1,getBord (pat!i) i + 1) | i - [1 .. len-1]]
  getBord s n
 | m  0  = m
 | s == pat!m = m
 | otherwise  = getBord s m
   where
 m = bord A.! n
  bor :: UArray Int Int
  bor = listArray (0,len) $ A.elems bord
  getBor s n
 | m  0 || s == pat!m = m
 | otherwise = getBor s m
   where
 m = bor!n
  process n str _ | n = len = {-# scc process #-} dst ++ process 0 str 

  process _  mat = {-# scc process #-} reverse mat
  process 0 (s:st) _
 | s == c= {-# scc process #-} process 1 st [s]
 | otherwise = {-# scc process #-} s:process 0 st 
  process n str@(s:st) mat
 | s == pat!n = {-# scc process #-} process (n+1) st (s:mat)
 | otherwise  = {-# scc process #-}
let j = getBor s n
(ret,skip) = splitAt j mat
in if j  0 then reverse mat ++ process 0 str 
   else reverse skip ++ process (j+1) st (s:ret)

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Substring replacements

2005-12-13 Thread Branimir Maksimovic





From: Daniel Fischer [EMAIL PROTECTED]
To: Branimir Maksimovic [EMAIL PROTECTED]
CC: Haskell-Cafe@haskell.org
Subject: Re: [Haskell-cafe] Substring replacements
Date: Tue, 13 Dec 2005 11:23:29 +0100

Am Montag, 12. Dezember 2005 16:28 schrieben Sie:
 From: Daniel Fischer [EMAIL PROTECTED]

 To: Branimir Maksimovic [EMAIL PROTECTED]
 CC: Haskell-Cafe@haskell.org
 Subject: Re: [Haskell-cafe] Substring replacements
 Date: Mon, 12 Dec 2005 16:15:46 +0100
 
 Earlier today:
   Sorry, but
   Prelude SearchRep searchReplace abaaba ## abababaaba
   abababaaba
  
   I haven't analyzed the algorithm, so I don't know why exactly this
 
 fails.
 
   I'll take a look sometime soon.
 
 I found the problem (one at least).
 Say the pattern to be replaced begins with 'a' and we have a 
sufficiently

 long
 match with the pattern starting at the first 'a' in the String. Upon
 encountering the second 'a', while the first pattern still matches, you
 start
 pushing onto the rollback-stack. But that isn't inspected anymore, so 
if

 the
 actual occurence of the pattern starts at the third (or fourth, n-th)
 occurence of 'a' and that is already pushed onto the rollback, you miss
  it.

 I've corrected this with adjusting rollback position. if rollBack is 
null

 then
 search for rollback starts at second character if not starts at same as
 searhed
 character because I skip what was searched. That's all.
 Though I'm not so sure now when I read this.

Still not working:

*New searchReplace abababc # ababababababc
ababababababc
*New searchReplace1 abababc # ababababababc
ababababababc



Yes, perhaps you've missed another post of mine. I've noticed
that problem when pattern repeats more then 2 times and gave up
because now whatever I do, your version is always fastest.



 So the question is, can we find a cheap test to decide whether to use 
KMP

 or
 Bulat's version?


Just interleave string with  search hits with one with no seacrh (that means 
partial too)

hits, and your version will gain in speed.
More partial matches and full search matches Bulat's version will gain in
speed.
Longer search strings, your version will have gains.



 In real world situation your KMP will always be fastest on average.
 I like that we are not using C arrays as then we have advantage
 of lazyness and save on memory usage. C++ program will be faster
 on shorter strings but on this large strings will loose due memory
 latency. and with your test, both programs are very fast.

 Greetings, Bane.


On my 256MB RAM AMD Duron 1200 MHz, Bulat's version is consistently about 
20%
faster than my KMP on your test -- btw, I unboxed the pat array, which gave 
a

bit of extra speed, but not much.


I think that's because on your machine Bulat's version have better 
perfromance

with CPU cache.
I don;t know but now your version is 25% faster with my test on P4
hyperthreaded.

your new version:
$ time srchrep.exe
Working:seasearch replace  able seaseasearch baker seasearch charlie
True
Done


real0m8.734s
user0m0.015s
sys 0m0.000s

Bulat's version:

[EMAIL PROTECTED] ~/tutorial
$ time replace1.exe
Working:seasearch replace  able seaseasearch baker seasearch charlie
True
Done


real0m11.734s
user0m0.015s
sys 0m0.015s

3 secs difference now.

And apologies to Sebastian Sylvan, I also included an unboxed version of 
bord,

built from the boxed version, and that sped things further up -- not much,
again, but there it is.


On my machine you got another 10-15% of boost with unboxed arrays.

I wonder about this difference, -10% on one system and +20% on another 
system,

ist that normal?


Different caching schemes on CPU's perhaps? different memory latencies?
hyperthreading helps your version? more code and data, perhaps because
of that it pays the price on your machine?

Greetings, Bane.

_
Don't just search. Find. Check out the new MSN Search! 
http://search.msn.click-url.com/go/onm00200636ave/direct/01/


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Substring replacements

2005-12-12 Thread Daniel Fischer
Am Montag, 12. Dezember 2005 01:34 schrieben Sie:
 On 12/12/05, Daniel Fischer [EMAIL PROTECTED] wrote:
  Okay, I have looked up KMP and implemented it.
  Seems to work -- my first use of QuickCheck, too.
  It's slower than Bulat's and Tomasz' for Branimir's test :-(,
  but really fast for my test.
  Undoubtedly, one can still tune it.

 Perhaps by using unboxed arrays...

 /S


 --
 Sebastian Sylvan
 +46(0)736-818655
 UIN: 44640862

I'm afraid, unboxed arrays are out of the question, because bord is 
incrementally produced :-(


Working very long
test2: loop


Cheers,
Daniel
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Substring replacements

2005-12-12 Thread Branimir Maksimovic





From: Daniel Fischer [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
CC: Haskell-Cafe@haskell.org
Subject: Re: [Haskell-cafe] Substring replacements
Date: Mon, 12 Dec 2005 10:31:49 +0100

Am Montag, 12. Dezember 2005 01:34 schrieben Sie:
 On 12/12/05, Daniel Fischer [EMAIL PROTECTED] wrote:
  Okay, I have looked up KMP and implemented it.
  Seems to work -- my first use of QuickCheck, too.
  It's slower than Bulat's and Tomasz' for Branimir's test :-(,
  but really fast for my test.
  Undoubtedly, one can still tune it.

 Perhaps by using unboxed arrays...

 /S


 --
 Sebastian Sylvan
 +46(0)736-818655
 UIN: 44640862

I'm afraid, unboxed arrays are out of the question, because bord is
incrementally produced :-(


Working very long
test2: loop



No worrie your test is now fastest with both your and mine test.
I;ve forgot to change working function in your test:0)

mine test: your program is  is srchrep.exe
[EMAIL PROTECTED] ~/tutorial
$ time searchr.exe
Working:seasearch replace  able seaseasearch baker seasearch charlie
True
Done


real0m14.344s
user0m0.015s
sys 0m0.015s

[EMAIL PROTECTED] ~/tutorial
$ time srchrep.exe
Working:seasearch replace  able seaseasearch baker seasearch charlie
True
Done


real0m10.672s  your program is almost 1.5 secs faster then Bulat's
user0m0.015s
sys 0m0.000s

[EMAIL PROTECTED] ~/tutorial
$ time replace1.exe
Working:seasearch replace  able seaseasearch baker seasearch charlie
True
Done


real0m12.016s
user0m0.015s
sys 0m0.015s


now your test:

[EMAIL PROTECTED] ~/tutorial
$ time searchr.exe
Working very long
True
Done

real0m0.312s
user0m0.015s
sys 0m0.015s

[EMAIL PROTECTED] ~/tutorial
$ time replace1.exe
Working very long
False
Done

real0m12.516s
user0m0.015s
sys 0m0.015s

[EMAIL PROTECTED] ~/tutorial
$ time srchrep.exe
Working very long
True
Done

real0m0.375s  yours is less then second as mine but is fastest in both 
tests

user0m0.015s
sys 0m0.015s

I don;t know how you get lesser numbers with mine test, but on
this machine your KMP algorithm performs best.


Greetings ,Bane.

_
Don't just search. Find. Check out the new MSN Search! 
http://search.msn.click-url.com/go/onm00200636ave/direct/01/


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Substring replacements

2005-12-12 Thread Daniel Fischer
Earlier today:
 Sorry, but
 Prelude SearchRep searchReplace abaaba ## abababaaba
 abababaaba

 I haven't analyzed the algorithm, so I don't know why exactly this fails.
 I'll take a look sometime soon.


I found the problem (one at least).
Say the pattern to be replaced begins with 'a' and we have a sufficiently long 
match with the pattern starting at the first 'a' in the String. Upon 
encountering the second 'a', while the first pattern still matches, you start 
pushing onto the rollback-stack. But that isn't inspected anymore, so if the 
actual occurence of the pattern starts at the third (or fourth, n-th) 
occurence of 'a' and that is already pushed onto the rollback, you miss it.

let src = concat (replicate n abc) ++ d
let str = concat (replicate (n+k) abc) ++ d
then
searchReplace src Success! str
will work correctly iff k is congruent to 0 or 1 modulo (n+1).

Now to fix it, I see two possibilities
1. re-inspect the rollback, which brings you basically to my srchrep
2. introduce a hierarchy of rollback-stacks - but that would be rather 
horrible, I think, because you must keep count on which stack you have to 
push, how many matching patterns you currently have (and which ones they are) 
...

So the question is, can we find a cheap test to decide whether to use KMP or 
Bulat's version?

Cheers,
Daniel
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Substring replacements

2005-12-12 Thread Branimir Maksimovic





From: Daniel Fischer [EMAIL PROTECTED]
To: Branimir Maksimovic [EMAIL PROTECTED]
CC: Haskell-Cafe@haskell.org
Subject: Re: [Haskell-cafe] Substring replacements
Date: Mon, 12 Dec 2005 13:07:29 +0100

Sorry, but
Prelude SearchRep searchReplace abaaba ## abababaaba
abababaaba

I haven't analyzed the algorithm, so I don't know why exactly this fails.
I'll take a look sometime soon.


It failed because I didn;t adjusted search string for rollBack when previous 
rollBack is not null.

this is corrected version: (with your changes it looks much better)
---
searchReplace :: String-String-String - String
searchReplace  _ xs  = xs
searchReplace sr rp xs = searchr sr rp xs  
  where
searchr :: String-String-String-String-String - String
searchr _ _  _ _ = 
searchr sr rp xs retB rollB
   | found = rp ++ searchr sr rp rema ret roll
| otherwise = reverse (proc ++ rollB) ++
  searchr sr rp rema ret roll
   where
 (found, proc, rema, ret, roll)
= searchr' sr sr (reverse retB ++ xs)  rollB

searchr' src@(s:sr) src'@(s':sr') xs soFar rollB
   = searchr'' (drop (length rollB) src) src' xs soFar (not (null 
rollB),,) s


searchr''  _ xs fnd _ _ = (True,fnd,xs,,)
searchr'' _ _  fnd (_,ret,roll) _ = (False,ret++roll++fnd,,,)
searchr'' src@(s:sr) src'@(s':sr') xxs@(x:xs) soFar (cnt,ret,roll) c
   | s == x = if s' == x  null ret  cnt
  then searchr'' sr sr' xs soFar (True, , x:roll) c
  else
if null ret  null roll
   then searchr'' sr src' xs (x:soFar) (True, , ) c
   else searchr'' sr src' xs soFar (True, x:roll++ret, 
) c
| otherwise = if null roll  null ret
 then
if c == x
  then (False, soFar, xxs, , )
  else let (from, pre) = break (==c) xs
   in (False, reverse from ++ x:soFar, pre, , 
)
  else
if s'/=x
 then if null ret
  then (False, (x:roll) ++ soFar, xs,,)
  else (False, soFar, xxs,ret,)
 else if null ret
   then (False, soFar, xs, , x:roll)
   else (False, soFar, xxs, ret, )


However it is significantly slower then previous ugly version:

searchReplace :: String-String-String - String
searchReplace sr rp xs = searchr sr rp xs  
  where
   searchr :: String-String-String-String-String - String
   searchr [] _ xs _ _ =  xs
   searchr _ _ [] _ _  = []
   searchr sr rp xs retBack rollBack
| isFound $ fnd rollBack = rp
 ++ searchr sr rp (remaining $ fnd 
rollBack )
  ( getRetBack $ fnd 
rollBack)
  ( getRollBack $ fnd 
rollBack)
| otherwise = reverse ((processed $ fnd rollBack) ++ 
rollBack)

  ++ searchr sr rp (remaining $ fnd rollBack)
   ( getRetBack $ fnd rollBack)
   ( getRollBack $ fnd 
rollBack)

   where fnd  = searchr' sr sr (reverse retBack ++ xs) 

   isFound = fst . fst
   remaining = snd . snd . fst
   getRollBack = snd . snd
   getRetBack = fst . snd
   processed = fst . snd . fst

   searchr' :: String-String-String-String-String
   - ((Bool,(String,String)),(String,String))
   searchr' srch@(sr:srs) srch'@(sr':srs') xs fndSoFar rollBack =
  searchr'' (drop (length rollBack) srch) srch' xs 
fndSoFar

(not (isEmpty rollBack),,) sr

   searchr'' :: String-String-String-String-(Bool,String,String)-Char
- ((Bool,(String,String)),(String,String))
   searchr'' [] _ xs fnd _ _  = ((True,(fnd,xs)),(,))
   searchr'' _ _ [] fnd (_,retBack,rollBack) _ = ((False,(retBack ++ 
rollBack ++ fnd,[])),(,))
   searchr'' srch@(sr:srs) srch'@(sr':srs') xxs@(x:xs) fndSoFar 
(cnt,retBack,rollBack) s

 | sr == x = if cnt  sr' == x  isEmpty retBack
then searchr'' srs srs' xs fndSoFar 
(True,,x:rollBack) s
else if not (isEmpty retBack)  || not (isEmpty 
rollBack)

then searchr'' srs srch' xs fndSoFar
(True,(x:rollBack) ++ 
retBack,) s
else searchr'' srs srch' xs (x:fndSoFar) 
(True,,) s

 | otherwise = if isEmpty rollBack  isEmpty retBack
  then if s == x

Re: [Haskell-cafe] Substring replacements

2005-12-12 Thread Branimir Maksimovic





From: Daniel Fischer [EMAIL PROTECTED]
To: Branimir Maksimovic [EMAIL PROTECTED]
CC: Haskell-Cafe@haskell.org
Subject: Re: [Haskell-cafe] Substring replacements
Date: Mon, 12 Dec 2005 16:15:46 +0100

Earlier today:
 Sorry, but
 Prelude SearchRep searchReplace abaaba ## abababaaba
 abababaaba

 I haven't analyzed the algorithm, so I don't know why exactly this 
fails.

 I'll take a look sometime soon.


I found the problem (one at least).
Say the pattern to be replaced begins with 'a' and we have a sufficiently 
long

match with the pattern starting at the first 'a' in the String. Upon
encountering the second 'a', while the first pattern still matches, you 
start
pushing onto the rollback-stack. But that isn't inspected anymore, so if 
the

actual occurence of the pattern starts at the third (or fourth, n-th)
occurence of 'a' and that is already pushed onto the rollback, you miss it.


I've corrected this with adjusting rollback position. if rollBack is null 
then
search for rollback starts at second character if not starts at same as 
searhed

character because I skip what was searched. That's all.
Though I'm not so sure now when I read this.



So the question is, can we find a cheap test to decide whether to use KMP 
or

Bulat's version?


In real world situation your KMP will always be fastest on average.
I like that we are not using C arrays as then we have advantage
of lazyness and save on memory usage. C++ program will be faster
on shorter strings but on this large strings will loose due memory
latency. and with your test, both programs are very fast.

Greetings, Bane.

_
Express yourself instantly with MSN Messenger! Download today it's FREE! 
http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Substring replacements

2005-12-12 Thread Branimir Maksimovic





From: Daniel Fischer [EMAIL PROTECTED]
To: Branimir Maksimovic [EMAIL PROTECTED]
CC: Haskell-Cafe@haskell.org
Subject: Re: [Haskell-cafe] Substring replacements
Date: Mon, 12 Dec 2005 16:15:46 +0100

Earlier today:
 Sorry, but
 Prelude SearchRep searchReplace abaaba ## abababaaba
 abababaaba

 I haven't analyzed the algorithm, so I don't know why exactly this 
fails.

 I'll take a look sometime soon.


I found the problem (one at least).
Say the pattern to be replaced begins with 'a' and we have a sufficiently 
long

match with the pattern starting at the first 'a' in the String. Upon
encountering the second 'a', while the first pattern still matches, you 
start
pushing onto the rollback-stack. But that isn't inspected anymore, so if 
the

actual occurence of the pattern starts at the third (or fourth, n-th)
occurence of 'a' and that is already pushed onto the rollback, you miss it.

let src = concat (replicate n abc) ++ d
let str = concat (replicate (n+k) abc) ++ d
then
searchReplace src Success! str
will work correctly iff k is congruent to 0 or 1 modulo (n+1).



Oh, yes this seems the problem for searchr :(
I have to look for efficient way in order to circumvent repeated searches.
But since your KMP is fastest of all now, I am considering if there
is any point now to correct this.

And searchr ugly version that I've posted has a bug (not present in MyBane 
pretty

version) .
should be :
else if sr'/=x

Greetings, Bane.

_
FREE pop-up blocking with the new MSN Toolbar - get it now! 
http://toolbar.msn.click-url.com/go/onm00200415ave/direct/01/


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Substring replacements

2005-12-11 Thread Sebastian Sylvan
On 12/12/05, Daniel Fischer [EMAIL PROTECTED] wrote:
 Okay, I have looked up KMP and implemented it.
 Seems to work -- my first use of QuickCheck, too.
 It's slower than Bulat's and Tomasz' for Branimir's test :-(,
 but really fast for my test.
 Undoubtedly, one can still tune it.

Perhaps by using unboxed arrays...

/S


--
Sebastian Sylvan
+46(0)736-818655
UIN: 44640862
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


RE: [Haskell-cafe] Substring replacements

2005-12-11 Thread Branimir Maksimovic





From: Daniel Fischer [EMAIL PROTECTED]
To: Haskell-Cafe@haskell.org
Subject: [Haskell-cafe] Substring replacements
Date: Mon, 12 Dec 2005 01:14:37 +0100

Okay, I have looked up KMP and implemented it.
Seems to work -- my first use of QuickCheck, too.
It's slower than Bulat's and Tomasz' for Branimir's test :-(,
but really fast for my test.


Strange I got completelly different results:

[EMAIL PROTECTED] ~/tutorial
$ time srchrep.exe
Working very long
True
Done

real0m16.407s
user0m0.015s
sys 0m0.015s

[EMAIL PROTECTED] ~/tutorial
$ ghc -fglasgow-exts  -O2 srchrep.hs --make -o srchrep.exe
Chasing modules from: srchrep.hs
Compiling Main ( srchrep.hs, srchrep.o )
Linking ...

[EMAIL PROTECTED] ~/tutorial
$ time srchrep.exe
Working:seasearch replace  able seaseasearch baker seasearch charlie
True
Done


real0m10.156s
user0m0.015s
sys 0m0.015s

[EMAIL PROTECTED] ~/tutorial
$ time replace1.exe
Working:seasearch replace  able seaseasearch baker seasearch charlie
True
Done


real0m11.672s
user0m0.015s
sys 0m0.015s

Now your version is fastest according to my machine, but it is not faster
with your test it's slower in compariton to replace1.

I've corrected my code so it is fastest with your test,still less then a 
second,

but slowest with mine.
Checked with your fail tests and compared results of these 2 tests.
Now should be ok.
I maintan now two lists one for successes and other for failures.
I also prettified code a bit .

searchReplace :: String-String-String - String
searchReplace sr rp xs = searchr sr rp xs  
  where
   searchr :: String-String-String-String-String - String
   searchr [] _ xs _ _ =  xs
   searchr _ _ [] _ _  = []
   searchr sr rp xs retBack rollBack
| isFound $ fnd rollBack = rp
 ++ searchr sr rp (remaining $ fnd 
rollBack )
  ( getRetBack $ fnd 
rollBack)
  ( getRollBack $ fnd 
rollBack)
| otherwise = reverse ((processed $ fnd rollBack) ++ 
rollBack)

  ++ searchr sr rp (remaining $ fnd rollBack)
   ( getRetBack $ fnd rollBack)
   ( getRollBack $ fnd 
rollBack)

   where fnd  = searchr' sr sr (reverse retBack ++ xs) 

   isFound = fst . fst
   remaining = snd . snd . fst
   getRollBack = snd . snd
   getRetBack = fst . snd
   processed = fst . snd . fst

   searchr' :: String-String-String-String-String
   - ((Bool,(String,String)),(String,String))
   searchr' srch@(sr:srs) srch'@(sr':srs') xs fndSoFar rollBack =
  searchr'' (drop (length rollBack) srch) srch' xs 
fndSoFar

(False,,) sr

   searchr'' :: String-String-String-String-(Bool,String,String)-Char
- ((Bool,(String,String)),(String,String))
   searchr'' [] _ xs fnd _ _  = ((True,(fnd,xs)),(,))
   searchr'' _ _ [] fnd (_,retBack,rollBack) _ = ((False,(retBack ++ 
rollBack ++ fnd,[])),(,))
   searchr'' srch@(sr:srs) srch'@(sr':srs') xxs@(x:xs) fndSoFar 
(cnt,retBack,rollBack) s

 | sr == x = if cnt  sr' == x  isEmpty retBack
then searchr'' srs srs' xs fndSoFar 
(True,,x:rollBack) s
else if not (isEmpty retBack)  || not (isEmpty 
rollBack)

then searchr'' srs srch' xs fndSoFar
(True,(x:rollBack) ++ 
retBack,) s
else searchr'' srs srch' xs (x:fndSoFar) 
(True,,) s

 | otherwise = if isEmpty rollBack  isEmpty retBack
  then if s == x
  then ((False,(fndSoFar,xxs)),(,))
  else ((False,searchr''' s xs 
(x:fndSoFar)),(,))

  else if sr' == x  isEmpty retBack
  then ((False,(fndSoFar, xs)), 
(retBack,x:rollBack))
  else ((False,(fndSoFar, xxs)), 
(retBack,rollBack))



   searchr''' :: Char-String-String - (String,String)
   searchr''' sr [] fndSoFar = (fndSoFar,[])
   searchr''' sr xxs@(x:xs) fndSoFar | sr/=x = searchr''' sr xs 
(x:fndSoFar)

| otherwise = (fndSoFar,xxs)
   isEmpty [] = True
   isEmpty (a:as) = False
---



Greetings, Bane.

_
Express yourself instantly with MSN Messenger! Download today it's FREE! 
http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe