[Haskell-cafe] Searching of several substrings (with Data.Text ?)

2011-07-05 Thread Tillmann Vogt

Hi,

For my font library I need A function that can handle ligatures. It can 
be explained best with an example:


f  [Th, ff, fi, fl, ffi]  The fluffiest bunny

should be evaluated to

[Th, e,  , fl, u, ffi, e, s, t,  , b, u, n, 
n, y ]


I looked at Data.Text 
http://hackage.haskell.org/packages/archive/text/0.5/doc/html/Data-Text.html
and 
http://hackage.haskell.org/packages/archive/stringsearch/0.3.3/doc/html/Data-ByteString-Search.html


but they don't have a function that can search several substrings in one 
run.
I guess that searching a text again and again for every substring is not 
very efficient and it can be done in one run.
Although I may figure this out myself I think such a function could be 
so common that someone has done it or can give me some tips.


Thanks

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Searching of several substrings (with Data.Text ?)

2011-07-05 Thread Johannes Waldmann
 ... a function that can search several substrings in one run.

use regular expressions? (the regexp can be compiled
into a finite automaton that scans the string just once.)




___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Searching of several substrings (with Data.Text ?)

2011-07-05 Thread Daniel Fischer
On Tuesday 05 July 2011, 20:01:26, Tillmann Vogt wrote:
 Hi,
 
 For my font library I need A function that can handle ligatures. It can
 be explained best with an example:
 
 f  [Th, ff, fi, fl, ffi]  The fluffiest bunny
 
 should be evaluated to
 
 [Th, e,  , fl, u, ffi, e, s, t,  , b, u, n,
 n, y ]
 
 I looked at Data.Text
 http://hackage.haskell.org/packages/archive/text/0.5/doc/html/Data-Text.
 html and
 http://hackage.haskell.org/packages/archive/stringsearch/0.3.3/doc/html/
 Data-ByteString-Search.html
 
 but they don't have a function that can search several substrings in one
 run.

Well, Data.ByteString[.Lazy].KarpRabin does provide simultaneous search for 
multiple substrings - it does not, however, provide direct splitting.

But in my tests, unless the number of substrings was large, multiple 
separate (Boyer-Moore) passes with manual merging of the offset lists were 
much faster [there's a possible space leak for lazy ByteStrings if any 
pattern does not appear in a long substring of the source, so that would be 
a point for Data.ByteString.Lazy.KarpRabin], and I didn't know of any real 
use-case, so I did not pursue it further and considered it just an 
interesting curiosity.

I suppose using a regex package as Johannes Waldmann suggested would be the 
easiest way (probably also faster).

If you submit a feature request, however, I would look into expanding the 
offered functionality (but I'll be on vacation soon, so performance would 
have to wait; I suppose something could be gained there).

 I guess that searching a text again and again for every substring is not
 very efficient and it can be done in one run.

Well, it is surprisingly efficient, compared to (my implementation of) the 
Karp-Rabin algorithm at least.

 Although I may figure this out myself I think such a function could be
 so common that someone has done it or can give me some tips.
 
 Thanks

Cheers,
Daniel

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Searching of several substrings (with Data.Text ?)

2011-07-05 Thread Bryan O'Sullivan
On Tue, Jul 5, 2011 at 11:01 AM, Tillmann Vogt tillmann.v...@rwth-aachen.de
 wrote:

 I looked at Data.Text http://hackage.haskell.org/**
 packages/archive/text/0.5/doc/**html/Data-Text.htmlhttp://hackage.haskell.org/packages/archive/text/0.5/doc/html/Data-Text.html
 and http://hackage.haskell.org/**packages/archive/stringsearch/**
 0.3.3/doc/html/Data-**ByteString-Search.htmlhttp://hackage.haskell.org/packages/archive/stringsearch/0.3.3/doc/html/Data-ByteString-Search.html

 but they don't have a function that can search several substrings in one
 run.


Here's what you want:
http://hackage.haskell.org/packages/archive/text-icu/0.6.3.4/doc/html/Data-Text-ICU-Regex.html
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Searching of several substrings (with Data.Text ?)

2011-07-05 Thread Eric Rasmussen
I've been looking into building parsers at runtime (from a config
file), and in my case it's beneficial to fit them into the context of
a larger parser with Attoparsec.Text. This code is untested for
practical use so I doubt you'll see comparable performance to the
aforementioned regex packages, but it could be worth exploring if you
need to mix and match parsers or if the definitions can change
arbitrarily at runtime.

import qualified Data.Text as T
import Data.Attoparsec.Text
import Control.Applicative ((|))

parseLigature x = string (T.pack x)

charToText = do c - anyChar
return (T.singleton c)

buildChain [x]= parseLigature x
buildChain (x:xs) = try (parseLigature x) | buildChain xs

-- ordering matters here, so ffi comes before ff or fi
ligatures = buildChain [ffi, th, ff, fi, fl]

myParser = many (try ligatures | charToText)

-- at ghci prompt: parseOnly myParser (T.pack the fluffiest bunny)
-- Right [th,e, ,fl,u,ffi,e,s,t, ,b,u,n,n,y]




On Tue, Jul 5, 2011 at 12:09 PM, Bryan O'Sullivan b...@serpentine.com wrote:
 On Tue, Jul 5, 2011 at 11:01 AM, Tillmann Vogt
 tillmann.v...@rwth-aachen.de wrote:

 I looked at Data.Text
 http://hackage.haskell.org/packages/archive/text/0.5/doc/html/Data-Text.html
 and
 http://hackage.haskell.org/packages/archive/stringsearch/0.3.3/doc/html/Data-ByteString-Search.html

 but they don't have a function that can search several substrings in one
 run.

 Here's what you want:
 http://hackage.haskell.org/packages/archive/text-icu/0.6.3.4/doc/html/Data-Text-ICU-Regex.html
 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Searching of several substrings (with Data.Text ?)

2011-07-05 Thread Tillmann Vogt

Am 05.07.2011 21:29, schrieb Eric Rasmussen:

I've been looking into building parsers at runtime (from a config
file), and in my case it's beneficial to fit them into the context of
a larger parser with Attoparsec.Text. This code is untested for
practical use so I doubt you'll see comparable performance to the
aforementioned regex packages, but it could be worth exploring if you
need to mix and match parsers or if the definitions can change
arbitrarily at runtime.

import qualified Data.Text as T
import Data.Attoparsec.Text
import Control.Applicative ((|))

parseLigature x = string (T.pack x)

charToText = do c- anyChar
 return (T.singleton c)

buildChain [x]= parseLigature x
buildChain (x:xs) = try (parseLigature x)|  buildChain xs

-- ordering matters here, so ffi comes before ff or fi
ligatures = buildChain [ffi, th, ff, fi, fl]

myParser = many (try ligatures|  charToText)

-- at ghci prompt: parseOnly myParser (T.pack the fluffiest bunny)
-- Right [th,e, ,fl,u,ffi,e,s,t, ,b,u,n,n,y]


Of course parsec! I should have thought of this.
icu seems to be the best solution (I already considered it for parsing 
character references), but it is not so easy to install on windows. So I 
wait until cabal does this or it is integrated into the haskell platform.


Thank you all for your help (especially the attoparsec example)





On Tue, Jul 5, 2011 at 12:09 PM, Bryan O'Sullivanb...@serpentine.com  wrote:

On Tue, Jul 5, 2011 at 11:01 AM, Tillmann Vogt
tillmann.v...@rwth-aachen.de  wrote:

I looked at Data.Text
http://hackage.haskell.org/packages/archive/text/0.5/doc/html/Data-Text.html
and
http://hackage.haskell.org/packages/archive/stringsearch/0.3.3/doc/html/Data-ByteString-Search.html

but they don't have a function that can search several substrings in one
run.

Here's what you want:
http://hackage.haskell.org/packages/archive/text-icu/0.6.3.4/doc/html/Data-Text-ICU-Regex.html
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe





___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe