[Haskell-cafe] Searching of several substrings (with Data.Text ?)
Hi, For my font library I need A function that can handle ligatures. It can be explained best with an example: f [Th, ff, fi, fl, ffi] The fluffiest bunny should be evaluated to [Th, e, , fl, u, ffi, e, s, t, , b, u, n, n, y ] I looked at Data.Text http://hackage.haskell.org/packages/archive/text/0.5/doc/html/Data-Text.html and http://hackage.haskell.org/packages/archive/stringsearch/0.3.3/doc/html/Data-ByteString-Search.html but they don't have a function that can search several substrings in one run. I guess that searching a text again and again for every substring is not very efficient and it can be done in one run. Although I may figure this out myself I think such a function could be so common that someone has done it or can give me some tips. Thanks ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Searching of several substrings (with Data.Text ?)
... a function that can search several substrings in one run. use regular expressions? (the regexp can be compiled into a finite automaton that scans the string just once.) ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Searching of several substrings (with Data.Text ?)
On Tuesday 05 July 2011, 20:01:26, Tillmann Vogt wrote: Hi, For my font library I need A function that can handle ligatures. It can be explained best with an example: f [Th, ff, fi, fl, ffi] The fluffiest bunny should be evaluated to [Th, e, , fl, u, ffi, e, s, t, , b, u, n, n, y ] I looked at Data.Text http://hackage.haskell.org/packages/archive/text/0.5/doc/html/Data-Text. html and http://hackage.haskell.org/packages/archive/stringsearch/0.3.3/doc/html/ Data-ByteString-Search.html but they don't have a function that can search several substrings in one run. Well, Data.ByteString[.Lazy].KarpRabin does provide simultaneous search for multiple substrings - it does not, however, provide direct splitting. But in my tests, unless the number of substrings was large, multiple separate (Boyer-Moore) passes with manual merging of the offset lists were much faster [there's a possible space leak for lazy ByteStrings if any pattern does not appear in a long substring of the source, so that would be a point for Data.ByteString.Lazy.KarpRabin], and I didn't know of any real use-case, so I did not pursue it further and considered it just an interesting curiosity. I suppose using a regex package as Johannes Waldmann suggested would be the easiest way (probably also faster). If you submit a feature request, however, I would look into expanding the offered functionality (but I'll be on vacation soon, so performance would have to wait; I suppose something could be gained there). I guess that searching a text again and again for every substring is not very efficient and it can be done in one run. Well, it is surprisingly efficient, compared to (my implementation of) the Karp-Rabin algorithm at least. Although I may figure this out myself I think such a function could be so common that someone has done it or can give me some tips. Thanks Cheers, Daniel ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Searching of several substrings (with Data.Text ?)
On Tue, Jul 5, 2011 at 11:01 AM, Tillmann Vogt tillmann.v...@rwth-aachen.de wrote: I looked at Data.Text http://hackage.haskell.org/** packages/archive/text/0.5/doc/**html/Data-Text.htmlhttp://hackage.haskell.org/packages/archive/text/0.5/doc/html/Data-Text.html and http://hackage.haskell.org/**packages/archive/stringsearch/** 0.3.3/doc/html/Data-**ByteString-Search.htmlhttp://hackage.haskell.org/packages/archive/stringsearch/0.3.3/doc/html/Data-ByteString-Search.html but they don't have a function that can search several substrings in one run. Here's what you want: http://hackage.haskell.org/packages/archive/text-icu/0.6.3.4/doc/html/Data-Text-ICU-Regex.html ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Searching of several substrings (with Data.Text ?)
I've been looking into building parsers at runtime (from a config file), and in my case it's beneficial to fit them into the context of a larger parser with Attoparsec.Text. This code is untested for practical use so I doubt you'll see comparable performance to the aforementioned regex packages, but it could be worth exploring if you need to mix and match parsers or if the definitions can change arbitrarily at runtime. import qualified Data.Text as T import Data.Attoparsec.Text import Control.Applicative ((|)) parseLigature x = string (T.pack x) charToText = do c - anyChar return (T.singleton c) buildChain [x]= parseLigature x buildChain (x:xs) = try (parseLigature x) | buildChain xs -- ordering matters here, so ffi comes before ff or fi ligatures = buildChain [ffi, th, ff, fi, fl] myParser = many (try ligatures | charToText) -- at ghci prompt: parseOnly myParser (T.pack the fluffiest bunny) -- Right [th,e, ,fl,u,ffi,e,s,t, ,b,u,n,n,y] On Tue, Jul 5, 2011 at 12:09 PM, Bryan O'Sullivan b...@serpentine.com wrote: On Tue, Jul 5, 2011 at 11:01 AM, Tillmann Vogt tillmann.v...@rwth-aachen.de wrote: I looked at Data.Text http://hackage.haskell.org/packages/archive/text/0.5/doc/html/Data-Text.html and http://hackage.haskell.org/packages/archive/stringsearch/0.3.3/doc/html/Data-ByteString-Search.html but they don't have a function that can search several substrings in one run. Here's what you want: http://hackage.haskell.org/packages/archive/text-icu/0.6.3.4/doc/html/Data-Text-ICU-Regex.html ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Searching of several substrings (with Data.Text ?)
Am 05.07.2011 21:29, schrieb Eric Rasmussen: I've been looking into building parsers at runtime (from a config file), and in my case it's beneficial to fit them into the context of a larger parser with Attoparsec.Text. This code is untested for practical use so I doubt you'll see comparable performance to the aforementioned regex packages, but it could be worth exploring if you need to mix and match parsers or if the definitions can change arbitrarily at runtime. import qualified Data.Text as T import Data.Attoparsec.Text import Control.Applicative ((|)) parseLigature x = string (T.pack x) charToText = do c- anyChar return (T.singleton c) buildChain [x]= parseLigature x buildChain (x:xs) = try (parseLigature x)| buildChain xs -- ordering matters here, so ffi comes before ff or fi ligatures = buildChain [ffi, th, ff, fi, fl] myParser = many (try ligatures| charToText) -- at ghci prompt: parseOnly myParser (T.pack the fluffiest bunny) -- Right [th,e, ,fl,u,ffi,e,s,t, ,b,u,n,n,y] Of course parsec! I should have thought of this. icu seems to be the best solution (I already considered it for parsing character references), but it is not so easy to install on windows. So I wait until cabal does this or it is integrated into the haskell platform. Thank you all for your help (especially the attoparsec example) On Tue, Jul 5, 2011 at 12:09 PM, Bryan O'Sullivanb...@serpentine.com wrote: On Tue, Jul 5, 2011 at 11:01 AM, Tillmann Vogt tillmann.v...@rwth-aachen.de wrote: I looked at Data.Text http://hackage.haskell.org/packages/archive/text/0.5/doc/html/Data-Text.html and http://hackage.haskell.org/packages/archive/stringsearch/0.3.3/doc/html/Data-ByteString-Search.html but they don't have a function that can search several substrings in one run. Here's what you want: http://hackage.haskell.org/packages/archive/text-icu/0.6.3.4/doc/html/Data-Text-ICU-Regex.html ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe