With [Char] and (Seq Char) the text is full Unicode. With ByteString and ByteString.Lazy you are really using ByteString.Char8 and ByteString.Lazy.Char8, so each byte is treated as a single Char.
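To illustrate the Char8 point, here is a minimal sketch of one possible workaround: decode UTF-8 bytes to a String before matching. It assumes the text and bytestring packages are available and that the source file is saved as UTF-8. The names utf8Bytes and decoded are only illustrative, and the symbols are the same ones used in the test in this message.

```haskell
import qualified Data.ByteString.Char8 as BC
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Text.Regex.TDFA ((=~))

-- Illustrative input: the symbols encoded as UTF-8 bytes,
-- as they might arrive from a file or a socket.
utf8Bytes :: BC.ByteString
utf8Bytes = TE.encodeUtf8 (T.pack "☮☯♲☢☣☠☃")

-- Decode to String so the regex engine sees whole code points.
-- Note: decodeUtf8 throws an exception on invalid UTF-8 input.
decoded :: String
decoded = T.unpack (TE.decodeUtf8 utf8Bytes)

regex :: String
regex = "(☢|☣)"

main :: IO ()
main = do
  -- Through the Char8 interface every byte counts as one Char:
  -- 7 symbols at 3 bytes each.
  print (BC.length utf8Bytes)   -- 21
  -- After decoding there is one Char per symbol.
  print (length decoded)        -- 7
  -- Matching against the decoded String works on code points.
  print (decoded =~ regex :: [[String]])
```

Matching directly against utf8Bytes would instead compare byte by byte, which is why a pattern built from a String of Unicode characters may not find anything there.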
Here is a test (I saved the source file in UTF-8):

import Text.Regex.TDFA

text = "☮☯♲☢☣☠☃"

regex = "(☢|☣)"

search :: [[String]]
search = text =~ regex

main = do
  print text
  print regex
  print search

In ghci this prints:

*Main> main
"\9774\9775\9842\9762\9763\9760\9731"
"(\9762|\9763)"
[["\9762","\9762"],["\9763","\9763"]]

So this works. Are you using ByteStrings to hold Unicode as UTF-8 or UTF-16?

On Mar 20, 10:17 am, Jean-Philippe Bernardy <[email protected]> wrote:
> Am I right that this library does not support unicode in regexes?
> Searching for unicode strings in Yi does not work, but by cursory
> browsing of the code, I cannot find the reason why.
>
> Thanks,
> JP.
>
> On Wed, Mar 18, 2009 at 1:23 PM, ChrisK <[email protected]> wrote:
> > I have just uploaded the new regex-tdfa-1.1.0 to hackage. This version is a
> > small performance update to the old regex-tdfa-1.0.0 version.
> >
> > Previously all text (e.g. ByteString) being searched was converted to String
> > and sent through a single engine.
> >
> > The new version uses a type class and SPECIALIZE pragmas to avoid converting
> > to String. This should make adding support for searching other Char
> > containers easy to do.
> >
> > The new version includes six specialized engine loops to take advantage of
> > obvious optimizations of the traversal. The previous version had only a
> > couple of such engines. The new code paths have been tested for correctness
> > and no performance degradations have shown up.
> >
> > --
> > Chris
> > _______________________________________________
> > Libraries mailing list
> > [email protected]
> > http://www.haskell.org/mailman/listinfo/libraries

--~--~---------~--~----~------------~-------~--~----~
Yi development mailing list
[email protected]
http://groups.google.com/group/yi-devel
-~----------~----~----~----~------~----~------~--~---
