Re: [Haskell-cafe] Bytestrings and [Char]
Hi Alberto, To some extent this already exists, it's just that nobody uses it. I believe it's the approach taken by the Edison libraries. Also the ListLike package provides the type classes ListLike, StringLike, and a few others. Neither seems to have become very popular despite having well-respected authors (Okasaki and Goerzon, respectively). Some container functions are already provided by other classes, namely Foldable, Traversable, and Monoid. The first bit, creating a tree of type classes suitable for all containers, is probably a few hours work. An automated system to determine the best implementation is significantly more difficult; I can't say if the scope would be appropriate for SoC. Best, John From: Alberto G. Corona agocor...@gmail.com Just a dream: -separate interface and implementation for all containers, via type classes -develop, by genetic programming techniques + quickcheck, a system that find the best container implementation for a particular program. Is that suitable for a Google Summer of Code project? 2010/3/23 Alberto G. Corona agocor...@gmail.com The question can be generalized via type classes: Is there any common set of primitives encapsulated into a single type class that has instances for Strings (Data.List) ByteStrings, Data.Text, Lazy bytestrings, Arrays, vectors and wathever container that can store an boxed, unboxed, packed unpacked sequence of wathever including chars? All of them have folds, heads, tails and a lot of common functions with the same name, but since there is not a single type class, the library programmer can not abstract his code when it is possible, so the library user can chose the particular instance for his particular problem. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Bytestrings and [Char]
Once we have a tree of type classes suitable for all containers, as you said, theoretically it shouldn't very difficult to incorporate the testing of different instances for each class used in a program, besides testing different compilation flags in a genetic algoritm. This latter has already been done. http://donsbot.wordpress.com/2010/03/01/evolving-faster-haskell-programs-now-with-llvm/ http://donsbot.wordpress.com/2010/03/01/evolving-faster-haskell-programs-now-with-http://donsbot.wordpress.com/2010/03/01/evolving-faster-haskell-programs-now-with-llvm/ To find automatically the best combination of class implementations and compilation flags in a single process would be very useful and would save a lot of manual testing. 2010/3/24 John Lato jwl...@gmail.com Hi Alberto, To some extent this already exists, it's just that nobody uses it. I believe it's the approach taken by the Edison libraries. Also the ListLike package provides the type classes ListLike, StringLike, and a few others. Neither seems to have become very popular despite having well-respected authors (Okasaki and Goerzon, respectively). Some container functions are already provided by other classes, namely Foldable, Traversable, and Monoid. The first bit, creating a tree of type classes suitable for all containers, is probably a few hours work. An automated system to determine the best implementation is significantly more difficult; I can't say if the scope would be appropriate for SoC. Best, John From: Alberto G. Corona agocor...@gmail.com Just a dream: -separate interface and implementation for all containers, via type classes -develop, by genetic programming techniques + quickcheck, a system that find the best container implementation for a particular program. Is that suitable for a Google Summer of Code project? 2010/3/23 Alberto G. Corona agocor...@gmail.com The question can be generalized via type classes: Is there any common set of primitives encapsulated into a single type class that has instances for Strings (Data.List) ByteStrings, Data.Text, Lazy bytestrings, Arrays, vectors and wathever container that can store an boxed, unboxed, packed unpacked sequence of wathever including chars? All of them have folds, heads, tails and a lot of common functions with the same name, but since there is not a single type class, the library programmer can not abstract his code when it is possible, so the library user can chose the particular instance for his particular problem. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Bytestrings and [Char]
Hi Alberto I rather doubt a valuable set of type classes that is suitable for all containers exists, I'm afraid. If you consider containers as the containers package, the data structures are all (?) functorial - but they have different shapes, so e.g. a cons operation makes sense on the linear ones (Data.Sequence, Data.List) but not on Data.Map, Data.Tree. 'cons' is analogous to 'insert' but 'insert' exactly describes the operation on maps whereas 'cons' doesn't, similarly 'insert' doesn't describe the 'cons' operation on lists exactly as it doesn't indicate the position of where it adds to the list. Now, if you partition the type classes into small groups you get over the fact that some operations make sense on certain 'shapes' of data structure, but there are still subtle type differences that aren't conducive to type classes - e.g. ByteString and Data.Text aren't functorial so map is type restricted: map :: (Word8 - Word8) - ByteString - ByteString Also, while ListLike does provide default definitions for many ops I'm guessing its more important for performance to redefine them, so the defaults definitions aren't adding much. There might be a couple of useful new type classes waiting in the wings, e.g a destroy one as per view in Data.Sequence, but I doubt that a proliferation of classes will generally make programs clearer or more efficient and it would introduce more instances of problems that we already have with Num type class. class Destroy f where destroy :: f a - (a, Maybe (f a)) Best wishes Stephen ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Bytestrings and [Char]
I still think that getting other authors to use it would be the biggest difficulty. Another concern of mine is that RULEs-based fusion can be fragile; if the type classes prevent fusion from occurring you'll never approach the performance of monomorphic code. That said, I think this is worth pursuing further because of the benefits you describe. I have spent a great deal of time on exactly this issue with the iteratee package, although my needs there are relatively simple. As a general question to the Haskell community: have you ever attempted to write container-polymorphic code? I'd like to hear about either successes or stumbling blocks. On Wed, Mar 24, 2010 at 12:02 PM, Alberto G. Corona agocor...@gmail.com wrote: Once we have a tree of type classes suitable for all containers, as you said, theoretically it shouldn't very difficult to incorporate the testing of different instances for each class used in a program, besides testing different compilation flags in a genetic algoritm. This latter has already been done. http://donsbot.wordpress.com/2010/03/01/evolving-faster-haskell-programs-now-with- To find automatically the best combination of class implementations and compilation flags in a single process would be very useful and would save a lot of manual testing. 2010/3/24 John Lato jwl...@gmail.com Hi Alberto, To some extent this already exists, it's just that nobody uses it. I believe it's the approach taken by the Edison libraries. Also the ListLike package provides the type classes ListLike, StringLike, and a few others. Neither seems to have become very popular despite having well-respected authors (Okasaki and Goerzon, respectively). Some container functions are already provided by other classes, namely Foldable, Traversable, and Monoid. The first bit, creating a tree of type classes suitable for all containers, is probably a few hours work. An automated system to determine the best implementation is significantly more difficult; I can't say if the scope would be appropriate for SoC. Best, John From: Alberto G. Corona agocor...@gmail.com Just a dream: -separate interface and implementation for all containers, via type classes -develop, by genetic programming techniques + quickcheck, a system that find the best container implementation for a particular program. Is that suitable for a Google Summer of Code project? 2010/3/23 Alberto G. Corona agocor...@gmail.com The question can be generalized via type classes: Is there any common set of primitives encapsulated into a single type class that has instances for Strings (Data.List) ByteStrings, Data.Text, Lazy bytestrings, Arrays, vectors and wathever container that can store an boxed, unboxed, packed unpacked sequence of wathever including chars? All of them have folds, heads, tails and a lot of common functions with the same name, but since there is not a single type class, the library programmer can not abstract his code when it is possible, so the library user can chose the particular instance for his particular problem. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Bytestrings and [Char]
I have a very specific StringLike typeclass in the web-encodings package so that I can- for example- to HTML entity encoding on String, (lazy) bytestrings and (lazy) text. Of course, I need to make assumptions about character encoding for the bytestring version. Michael On Wed, Mar 24, 2010 at 8:50 AM, John Lato jwl...@gmail.com wrote: I still think that getting other authors to use it would be the biggest difficulty. Another concern of mine is that RULEs-based fusion can be fragile; if the type classes prevent fusion from occurring you'll never approach the performance of monomorphic code. That said, I think this is worth pursuing further because of the benefits you describe. I have spent a great deal of time on exactly this issue with the iteratee package, although my needs there are relatively simple. As a general question to the Haskell community: have you ever attempted to write container-polymorphic code? I'd like to hear about either successes or stumbling blocks. On Wed, Mar 24, 2010 at 12:02 PM, Alberto G. Corona agocor...@gmail.com wrote: Once we have a tree of type classes suitable for all containers, as you said, theoretically it shouldn't very difficult to incorporate the testing of different instances for each class used in a program, besides testing different compilation flags in a genetic algoritm. This latter has already been done. http://donsbot.wordpress.com/2010/03/01/evolving-faster-haskell-programs-now-with- To find automatically the best combination of class implementations and compilation flags in a single process would be very useful and would save a lot of manual testing. 2010/3/24 John Lato jwl...@gmail.com Hi Alberto, To some extent this already exists, it's just that nobody uses it. I believe it's the approach taken by the Edison libraries. Also the ListLike package provides the type classes ListLike, StringLike, and a few others. Neither seems to have become very popular despite having well-respected authors (Okasaki and Goerzon, respectively). Some container functions are already provided by other classes, namely Foldable, Traversable, and Monoid. The first bit, creating a tree of type classes suitable for all containers, is probably a few hours work. An automated system to determine the best implementation is significantly more difficult; I can't say if the scope would be appropriate for SoC. Best, John From: Alberto G. Corona agocor...@gmail.com Just a dream: -separate interface and implementation for all containers, via type classes -develop, by genetic programming techniques + quickcheck, a system that find the best container implementation for a particular program. Is that suitable for a Google Summer of Code project? 2010/3/23 Alberto G. Corona agocor...@gmail.com The question can be generalized via type classes: Is there any common set of primitives encapsulated into a single type class that has instances for Strings (Data.List) ByteStrings, Data.Text, Lazy bytestrings, Arrays, vectors and wathever container that can store an boxed, unboxed, packed unpacked sequence of wathever including chars? All of them have folds, heads, tails and a lot of common functions with the same name, but since there is not a single type class, the library programmer can not abstract his code when it is possible, so the library user can chose the particular instance for his particular problem. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Bytestrings and [Char]
On Wed, Mar 24, 2010 at 5:48 AM, Stephen Tetley stephen.tet...@gmail.comwrote: I rather doubt a valuable set of type classes that is suitable for all containers exists, I'm afraid. I don't think it's so clear cut. Stepanov's Elements of Programming lays out a pretty clear (and familiar to many Haskellers) path through the algebraic thicket of types, classes, and their properties, albeit in the much clunkier setting of C++ and traits. The disadvantage to this approach is substantial: just as with the from-principles approaches to redoing Haskell's numeric hierarchy, you end up with a fearsome and complex set of typeclasses that are difficult to learn and follow. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Fwd: [Haskell-cafe] Bytestrings and [Char]
-- Forwarded message -- From: Alberto G. Corona agocor...@gmail.com Date: 2010/3/24 Subject: Re: [Haskell-cafe] Bytestrings and [Char] To: Stephen Tetley stephen.tet...@gmail.com 2010/3/24 Stephen Tetley stephen.tet...@gmail.com Hi Alberto I rather doubt a valuable set of type classes that is suitable for all containers exists, I'm afraid. If you consider containers as the containers package, the data structures are all (?) functorial - but they have different shapes, so e.g. a cons operation makes sense on the linear ones (Data.Sequence, Data.List) but not on Data.Map, Data.Tree. 'cons' is analogous to 'insert' but 'insert' exactly describes the operation on maps whereas 'cons' doesn't, similarly 'insert' doesn't describe the 'cons' operation on lists exactly as it doesn't indicate the position of where it adds to the list. Some operations like insert in list could be inefficient, some operations like cons in maps could be an inefficient hack, but at this does not means that a list can do create, insert, map , lookup faster that a Map in the case of sort lists, for example. A generalized cons applied to a Map can make less sense, but a program made for lists with occasional cons's can happen to perform better with Maps and so on. So a library with generality in mind can perform optimally in two or more different scenarios with different instances/implementations. . A genetic algoritm can have the opportunity to detect this. I think also that a complex list of class hierarchies is neither necessary nor recommended. We are not talking about mathemathical concepts. This is a performance issue. Even if a definitive class hierarchy is a problem, hopefully it may be worth to define tentative and temporal classes for performance testing purposes, that warps the common primitives with the sole purpose of executing the testing algorithm for different instance implementations, compilation flags and fusion rules. Now, if you partition the type classes into small groups you get over the fact that some operations make sense on certain 'shapes' of data structure, but there are still subtle type differences that aren't conducive to type classes - e.g. ByteString and Data.Text aren't functorial so map is type restricted: map :: (Word8 - Word8) - ByteString - ByteString For this purpose, specific testing classes for Word8 data -with no claim to be official can be defined by the programmer, leaving to the library user to run the testing process for the different instances in his particular program. I know that this add extra work and that not many would do this extra level of abstraction, but perhaps this is less work than spending even more time exchanging emails, discussing low level issues, performing tests, trying to figure out what will be the most common sceniarios for wich the library could be used, testing manually what is the container that has the most performance here and now, not tomorrow for these scenarios, to worry about the obsolescence of the library for reasons not related with the library algorithm etc Also, while ListLike does provide default definitions for many ops I'm guessing its more important for performance to redefine them, so the defaults definitions aren't adding much. There might be a couple of useful new type classes waiting in the wings, e.g a destroy one as per view in Data.Sequence, but I doubt that a proliferation of classes will generally make programs clearer or more efficient and it would introduce more instances of problems that we already have with Num type class. class Destroy f where destroy :: f a - (a, Maybe (f a)) Best wishes Stephen ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Fwd: [Haskell-cafe] Bytestrings and [Char]
2010/3/24 Stephen Tetley stephen.tet...@gmail.com If you consider containers as the containers package, the data structures are all (?) functorial - but they have different shapes, so e.g. a cons operation makes sense on the linear ones (Data.Sequence, Data.List) but not on Data.Map, Data.Tree. 'cons' is analogous to 'insert' but 'insert' exactly describes the operation on maps whereas 'cons' doesn't, similarly 'insert' doesn't describe the 'cons' operation on lists exactly as it doesn't indicate the position of where it adds to the list. Some operations like insert in list could be inefficient, some operations like cons in maps could be an inefficient hack, but at this does not means that a list can do create, insert, map , lookup faster that a Map in the case of sort lists, for example. A generalized cons applied to a Map can make less sense, but a program made for lists with occasional cons's can happen to perform better with Maps and so on. So a library with generality in mind can perform optimally in two or more different scenarios with different instances/implementations. . A genetic algoritm can have the opportunity to detect this. I think also that a complex list of class hierarchies is neither necessary nor recommended. We are not talking about mathemathical concepts. This is a performance issue. Even if a definitive class hierarchy is a problem, hopefully it may be worth to define tentative and temporal classes for performance testing purposes, that warps the common primitives with the sole purpose of executing the testing algorithm for different instance implementations, compilation flags and fusion rules. Now, if you partition the type classes into small groups you get over the fact that some operations make sense on certain 'shapes' of data structure, but there are still subtle type differences that aren't conducive to type classes - e.g. ByteString and Data.Text aren't functorial so map is type restricted: map :: (Word8 - Word8) - ByteString - ByteString For this purpose, specific testing classes for Word8 data -with no claim to be official can be defined by the programmer, leaving to the library user to run the testing process for the different instances in his particular program. I know that this add extra work and that not many would do this extra level of abstraction, but perhaps this is less work than spending even more time exchanging emails, discussing low level issues, performing tests, trying to figure out what will be the most common sceniarios for wich the library could be used, testing manually what is the container that has the most performance here and now, not tomorrow for these scenarios, to worry about the obsolescence of the library for reasons not related with the library algorithm etc Regards ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Bytestrings and [Char]
On 22.03.2010 14:10, Johan Tibell wrote: On Mon, Mar 22, 2010 at 1:16 PM, Johann Höchtljohann.hoec...@gmail.com wrote: My question or discussion point: Why not depreciate [Char] altogether and favour of lazy Bytestrings? A sequence of bytes is not the same thing as a sequence of Unicode code points. If you want to replace String by something more efficient have a look at Data.Text. How are ByteStrings (Lazy, UTF8) and Data.Text meant to co-exist? When I read bytestrings over a socket which happens to be UTF16-LE encoded and identify a fitting function in Data.Text, I guess I have to transcode them with Data.Text.Encoding to make the type System happy? -- Johan ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Bytestrings and [Char]
On 22.03.2010 14:15, Ivan Miljenovic wrote: On 23 March 2010 00:10, Johan Tibelljohan.tib...@gmail.com wrote: A sequence of bytes is not the same thing as a sequence of Unicode code points. If you want to replace String by something more efficient have a look at Data.Text. Though Data.Text still has the disadvantage of not being as nice to deal with as String, since you can't pattern match on it, etc. Whilst it may degrade performance, treating String as a list of characters rather than an array provides you with greater flexibility of how to deal with it. But doesn't this basically mean sacrifice performance / applicability for algorithmic beauty? Sure, pattern match comes in handy, but in the case of strings at a high price ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Bytestrings and [Char]
On Tue, Mar 23, 2010 at 00:27, Johann Höchtl johann.hoec...@gmail.com wrote: How are ByteStrings (Lazy, UTF8) and Data.Text meant to co-exist? When I read bytestrings over a socket which happens to be UTF16-LE encoded and identify a fitting function in Data.Text, I guess I have to transcode them with Data.Text.Encoding to make the type System happy? There's no such thing as a UTF8 or UTF16 bytestring -- a bytestring is just a more efficient encoding of [Word8], just as Text is a more efficient encoding of [Char]. If the file format you're parsing specifies that some series of bytes is text encoded as UTF16-LE, then you can use the Text decoders to convert to Text. Poor separation between bytes and characters has caused problems in many major languages (C, C++, PHP, Ruby, Python) -- lets not abandon the advantages of correctness to chase a few percentage points of performance. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Bytestrings and [Char]
Hi, Am Dienstag, den 23.03.2010, 08:51 -0700 schrieb John Millikin: On Tue, Mar 23, 2010 at 00:27, Johann Höchtl johann.hoec...@gmail.com wrote: How are ByteStrings (Lazy, UTF8) and Data.Text meant to co-exist? When I read bytestrings over a socket which happens to be UTF16-LE encoded and identify a fitting function in Data.Text, I guess I have to transcode them with Data.Text.Encoding to make the type System happy? There's no such thing as a UTF8 or UTF16 bytestring -- a bytestring is just a more efficient encoding of [Word8], just as Text is a more efficient encoding of [Char]. If the file format you're parsing specifies that some series of bytes is text encoded as UTF16-LE, then you can use the Text decoders to convert to Text. It wold still be useful to have an alternative to Data.Text that internally stores strings as UTF8 encoded bytestrings. I tried to switch from String to Data.Text in arbtt (which mostly calls pcre-light, which expects and returns UTF8-encoded C-strings), and it became slower! No surprise, considering that the program has to re-encode the strings all the time. Using a newtype Text = Text { ByteString } with an interface akin to Data.Text, but using UTF8-encoded ByteStrings internally gave the same performance as String, at half the memory footprint. This is in an internal module¹ but I would find it handy to have this available as a common type in a well-supported library. Greetings, Joachim ¹ http://darcs.nomeata.de/arbtt/src/Data/MyText.hs -- Joachim nomeata Breitner mail: m...@joachim-breitner.de | ICQ# 74513189 | GPG-Key: 4743206C JID: nome...@joachim-breitner.de | http://www.joachim-breitner.de/ Debian Developer: nome...@debian.org signature.asc Description: This is a digitally signed message part ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Bytestrings and [Char]
On Tue, Mar 23, 2010 at 08:51:16AM -0700, John Millikin wrote: On Tue, Mar 23, 2010 at 00:27, Johann Höchtl johann.hoec...@gmail.com wrote: How are ByteStrings (Lazy, UTF8) and Data.Text meant to co-exist? When I read bytestrings over a socket which happens to be UTF16-LE encoded and identify a fitting function in Data.Text, I guess I have to transcode them with Data.Text.Encoding to make the type System happy? There's no such thing as a UTF8 or UTF16 bytestring -- a bytestring is just a more efficient encoding of [Word8], just as Text is a more efficient encoding of [Char]. If the file format you're parsing specifies that some series of bytes is text encoded as UTF16-LE, then you can use the Text decoders to convert to Text. Poor separation between bytes and characters has caused problems in many major languages (C, C++, PHP, Ruby, Python) -- lets not abandon the advantages of correctness to chase a few percentage points of performance. I agree with the principle of correctness, but let's be honest - it's (many) orders of magnitude between ByteString and String and Text, not just a few percentage points… I've been struggling with this problem too and it's not nice. Every time one uses the system readFile friends (anything that doesn't read via ByteStrings), it hell slow. Test: read a file and compute its size in chars. Input text file is ~40MB in size, has one non-ASCII char. The test might seem stupid but it is a simple one. ghc 6.12.1. Data.ByteString.Lazy (bytestring readFile + length) - 10 miliseconds, incorrect length (as expected). Data.ByteString.Lazy.UTF8 (system readFile + fromString + length) - 11 seconds, correct length. Data.Text.Lazy (system readFile + pack + length) - 26s, correct length. String (system readfile + length) - ~1 second, correct length. For the record: python2.6 (str type) - ~60ms, incorrect length. python3.1 (unicode) - ~310ms, correct length. If anyone has a solution on how to work on fast text (unicode) transformations (but not a 1:1 pipeline where fusion can work nicely), I'd be glad to hear. iustin ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Bytestrings and [Char]
On 18:11 Tue 23 Mar , Iustin Pop wrote: I agree with the principle of correctness, but let's be honest - it's (many) orders of magnitude between ByteString and String and Text, not just a few percentage points… I've been struggling with this problem too and it's not nice. Every time one uses the system readFile friends (anything that doesn't read via ByteStrings), it hell slow. Test: read a file and compute its size in chars. Input text file is ~40MB in size, has one non-ASCII char. The test might seem stupid but it is a simple one. ghc 6.12.1. Data.ByteString.Lazy (bytestring readFile + length) - 10 miliseconds, incorrect length (as expected). Data.ByteString.Lazy.UTF8 (system readFile + fromString + length) - 11 seconds, correct length. Data.Text.Lazy (system readFile + pack + length) - 26s, correct length. String (system readfile + length) - ~1 second, correct length. Is this a mistake? Your own report shows String readFile being an order of magnitude faster than everything else, contrary to your earlier claim. -- Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/) ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Bytestrings and [Char]
On Tue, Mar 23, 2010 at 01:21:49PM -0400, Nick Bowler wrote: On 18:11 Tue 23 Mar , Iustin Pop wrote: I agree with the principle of correctness, but let's be honest - it's (many) orders of magnitude between ByteString and String and Text, not just a few percentage points… I've been struggling with this problem too and it's not nice. Every time one uses the system readFile friends (anything that doesn't read via ByteStrings), it hell slow. Test: read a file and compute its size in chars. Input text file is ~40MB in size, has one non-ASCII char. The test might seem stupid but it is a simple one. ghc 6.12.1. Data.ByteString.Lazy (bytestring readFile + length) - 10 miliseconds, incorrect length (as expected). Data.ByteString.Lazy.UTF8 (system readFile + fromString + length) - 11 seconds, correct length. Data.Text.Lazy (system readFile + pack + length) - 26s, correct length. String (system readfile + length) - ~1 second, correct length. Is this a mistake? Your own report shows String readFile being an order of magnitude faster than everything else, contrary to your earlier claim. No, it's not a mistake. String is faster than pack to Text and length, but it's 100 times slower than ByteString. My whole point is that difference between byte processing and char processing in Haskell is not a few percentage points, but order of magnitude. I would really like to have only the 6x penalty that Python shows, for example. regards, iustin ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Bytestrings and [Char]
On Tue, Mar 23, 2010 at 05:53:00PM +, Vincent Hanquez wrote: On 23/03/10 17:11, Iustin Pop wrote: Data.ByteString.Lazy (bytestring readFile + length) - 10 miliseconds, incorrect length (as expected). Data.ByteString.Lazy.UTF8 (system readFile + fromString + length) - 11 seconds, correct length. Why would you use system readFile and not bytestring readFile in the second case ? using the later i got a ~48x slowdown, not a 1100x slowdown, and the length is correct too. Hmm… because I didn't think of that :) iustin ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Bytestrings and [Char]
2010/3/23 Iustin Pop iu...@k1024.org I agree with the principle of correctness, but let's be honest - it's (many) orders of magnitude between ByteString and String and Text, not just a few percentage points… Well, your benchmarks are highly suspect. See below. Data.ByteString.Lazy.UTF8 (system readFile + fromString + length) - 11 seconds, correct length. You should be using bytestring I/O for this. Data.Text.Lazy (system readFile + pack + length) - 26s, correct length. You should be using text I/O for this. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Bytestrings and [Char]
BOS: Well, your benchmarks are highly suspect. Attached is another benchmark with similar results. This is no criterion benchmark but I did try to adjust a wee bit for cache issues. Suffice to say I am not yet impressed with Data.Text performance wise. In the broader scope I feel there is a deeper point here: Data.ByteString is how most data is available in a program (be it from a file, network, or other library/device). With all this data we really should have some sort of zero-copy safeCoerce that can expose the bytestring as an O(1) unboxed array of some safe length. I know this would have been nice for pureMD5, which did use an ugly unsafePerformIO hack just to get Word16s but even those got boxed up at horrible cost (it now uses 'cereal' to get the Word16 - even worse performance but it lets me ignore endianess of the architecture). OT Beating the dead horse: I once wrote and deleted a blog post ranting about how to get Word16 in C vs in Haskell: word = ((uint16_t *)p)[index]; or word = unsafePerformIO $ withForeignPtr ptr $ \ptr' - let p = castPtr (plusPtr ptr' off) in peekElemOff p index That Haskell snippet takes significantly longer to both write and run. END OT Code and benchmark numbers included below, all complaints about the accuracy of the benchmark are welcome. NOTE: tLog is a daily #haskell log file repeated many times ~ 59MB. --- [to...@mavlo Test]$ ./read tLog Normal String + System.IO 61443120: 1.467924s Data.ByteString.Lazy 61450365: 0.023285s Data.ByteString.Lazy.UTF8 61443120: 3.305154s Data.Text.Lazy 61443120: 3.99178s - CODE - import Data.Time import qualified Data.ByteString.Lazy as L import qualified Data.ByteString.Lazy.UTF8 as U import qualified Data.Text.Lazy as T import qualified Data.Text.Lazy.IO as TO import System.IO import System.Environment (getArgs) import Control.Monad (sequence_, when) main = do [file] - getArgs let time (f, desc) = do s - getCurrentTime r - f let !r' = r t - getCurrentTime let d = diffUTCTime t s when (length desc 0) (putStrLn $ desc ++ ++ show r' ++ : ++ show d) ops = [ (readFile file = return . show . length, ) , (readFile file = return . show . length, Normal String + System.IO) , (L.readFile file = return . show . L.length, ) , (L.readFile file = return . show . L.length, Data.ByteString.Lazy) , (L.readFile file = return . show . U.length, ) , (L.readFile file = return . show . U.length, Data.ByteString.Lazy.UTF8) , (TO.readFile file = return . show . T.length, ) , (TO.readFile file = return . show . T.length, Data.Text.Lazy) ] mapM_ time ops ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Bytestrings and [Char]
On 18:25 Tue 23 Mar , Iustin Pop wrote: On Tue, Mar 23, 2010 at 01:21:49PM -0400, Nick Bowler wrote: On 18:11 Tue 23 Mar , Iustin Pop wrote: I agree with the principle of correctness, but let's be honest - it's (many) orders of magnitude between ByteString and String and Text, not just a few percentage points… I've been struggling with this problem too and it's not nice. Every time one uses the system readFile friends (anything that doesn't read via ByteStrings), it hell slow. Test: read a file and compute its size in chars. Input text file is ~40MB in size, has one non-ASCII char. The test might seem stupid but it is a simple one. ghc 6.12.1. Data.ByteString.Lazy (bytestring readFile + length) - 10 miliseconds, incorrect length (as expected). Data.ByteString.Lazy.UTF8 (system readFile + fromString + length) - 11 seconds, correct length. Data.Text.Lazy (system readFile + pack + length) - 26s, correct length. String (system readfile + length) - ~1 second, correct length. Is this a mistake? Your own report shows String readFile being an order of magnitude faster than everything else, contrary to your earlier claim. No, it's not a mistake. String is faster than pack to Text and length, but it's 100 times slower than ByteString. Only if you don't care about obtaining the correct answer, in which case you may as well just say const 42 or somesuch, which is even faster. My whole point is that difference between byte processing and char processing in Haskell is not a few percentage points, but order of magnitude. I would really like to have only the 6x penalty that Python shows, for example. Hang on a second... less than 10 milliseconds to read 40 megabytes from disk? Something's fishy here. I ran my own tests with a 400M file (419430400 bytes) consisting almost exclusively of the letter 'a' with two Japanese characters placed at every multiple of 40 megabytes (UTF-8 encoded). With Prelude.readFile/length and 5 runs, I see 10145ms, 10087ms, 10223ms, 10321ms, 10216ms. with approximately 10% of that time spent performing GC each run. With Data.Bytestring.Lazy.readFile/length and 5 runs, I see 8223ms, 8192ms, 8077ms, 8091ms, 8174ms. with approximately 20% of that time spent performing GC each run. Maybe there's some magic command line options to tune the GC for our purposes, but I only managed to make things slower. Thus, I'll handwave a bit and just shave off the GC time from each result. Prelude: 9178ms mean with a standard deviation of 159ms. Data.ByteString.Lazy: 6521ms mean with a standard deviation of 103ms. Therefore, we managed a throughput of 43 MB/s with the Prelude (and got the right answer), while we managed 61 MB/s with lazy ByteStrings (and got the wrong answer). My disk won't go much, if at all, faster than the second result, so that's good. So that's a 30% reduction in throughput. I'd say that's a lot worse than a few percentage points, but certainly not orders of magnitude. On the other hand, using Data.ByteString.Lazy.readFile and Data.ByteString.Lazy.UTF8.length, we get results of around 12000ms with approximately 5% of that time spent in GC, which is rather worse than the Prelude. Data.Text.Lazy.IO.readFile and Data.Text.Lazy.length are even worse, with results of around 25 *seconds* (!!) and 2% of that time spent in GC. GNU wc computes the correct answer as quickly as lazy bytestrings compute the wrong answer. With perl 5.8, slurping the entire file as UTF-8 computes the correct answer just as slowly as Prelude. In my first ever Python program (with python 2.6), I tried to read the entire file as a unicode string and it quickly crashes due to running out of memory (yikes!), so it earns a DNF. So, for computing the right answer with this simple test, it looks like the Prelude is the best option. We tie with Perl and lose only to GNU wc (which is written in C). Really, though, it would be nice to close that gap. -- Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/) ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Bytestrings and [Char]
On Tue, Mar 23, 2010 at 03:31:33PM -0400, Nick Bowler wrote: On 18:25 Tue 23 Mar , Iustin Pop wrote: On Tue, Mar 23, 2010 at 01:21:49PM -0400, Nick Bowler wrote: On 18:11 Tue 23 Mar , Iustin Pop wrote: I agree with the principle of correctness, but let's be honest - it's (many) orders of magnitude between ByteString and String and Text, not just a few percentage points… I've been struggling with this problem too and it's not nice. Every time one uses the system readFile friends (anything that doesn't read via ByteStrings), it hell slow. Test: read a file and compute its size in chars. Input text file is ~40MB in size, has one non-ASCII char. The test might seem stupid but it is a simple one. ghc 6.12.1. Data.ByteString.Lazy (bytestring readFile + length) - 10 miliseconds, incorrect length (as expected). Data.ByteString.Lazy.UTF8 (system readFile + fromString + length) - 11 seconds, correct length. Data.Text.Lazy (system readFile + pack + length) - 26s, correct length. String (system readfile + length) - ~1 second, correct length. Is this a mistake? Your own report shows String readFile being an order of magnitude faster than everything else, contrary to your earlier claim. No, it's not a mistake. String is faster than pack to Text and length, but it's 100 times slower than ByteString. Only if you don't care about obtaining the correct answer, in which case you may as well just say const 42 or somesuch, which is even faster. My whole point is that difference between byte processing and char processing in Haskell is not a few percentage points, but order of magnitude. I would really like to have only the 6x penalty that Python shows, for example. Hang on a second... less than 10 milliseconds to read 40 megabytes from disk? Something's fishy here. Of course I don't want to benchmark the disk, and therefore the source file is on tmpfs. I ran my own tests with a 400M file (419430400 bytes) consisting almost exclusively of the letter 'a' with two Japanese characters placed at every multiple of 40 megabytes (UTF-8 encoded). With Prelude.readFile/length and 5 runs, I see 10145ms, 10087ms, 10223ms, 10321ms, 10216ms. with approximately 10% of that time spent performing GC each run. With Data.Bytestring.Lazy.readFile/length and 5 runs, I see 8223ms, 8192ms, 8077ms, 8091ms, 8174ms. with approximately 20% of that time spent performing GC each run. Maybe there's some magic command line options to tune the GC for our purposes, but I only managed to make things slower. Thus, I'll handwave a bit and just shave off the GC time from each result. Prelude: 9178ms mean with a standard deviation of 159ms. Data.ByteString.Lazy: 6521ms mean with a standard deviation of 103ms. Therefore, we managed a throughput of 43 MB/s with the Prelude (and got the right answer), while we managed 61 MB/s with lazy ByteStrings (and got the wrong answer). My disk won't go much, if at all, faster than the second result, so that's good. I'll bet that for a 400MB file, if you have more than two 2GB of ram, most of it will be cached. If you want to check Haskell performance, just copy it to a tmpfs filesytem so that the disk is out of the equation. So that's a 30% reduction in throughput. I'd say that's a lot worse than a few percentage points, but certainly not orders of magnitude. Because you're possibly benchmarking the disk also. With a 400MB file on tmpfs, lazy bytestring readfile + length takes on my machine ~150ms, which is way faster than 8 seconds… On the other hand, using Data.ByteString.Lazy.readFile and Data.ByteString.Lazy.UTF8.length, we get results of around 12000ms with approximately 5% of that time spent in GC, which is rather worse than the Prelude. Data.Text.Lazy.IO.readFile and Data.Text.Lazy.length are even worse, with results of around 25 *seconds* (!!) and 2% of that time spent in GC. GNU wc computes the correct answer as quickly as lazy bytestrings compute the wrong answer. With perl 5.8, slurping the entire file as UTF-8 computes the correct answer just as slowly as Prelude. In my first ever Python program (with python 2.6), I tried to read the entire file as a unicode string and it quickly crashes due to running out of memory (yikes!), so it earns a DNF. So, for computing the right answer with this simple test, it looks like the Prelude is the best option. We tie with Perl and lose only to GNU wc (which is written in C). Really, though, it would be nice to close that gap. Totally agreed :) iustin ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Bytestrings and [Char]
On Tue, Mar 23, 2010 at 11:22:23AM -0700, Bryan O'Sullivan wrote: 2010/3/23 Iustin Pop iu...@k1024.org I agree with the principle of correctness, but let's be honest - it's (many) orders of magnitude between ByteString and String and Text, not just a few percentage points… Well, your benchmarks are highly suspect. See below. Data.ByteString.Lazy.UTF8 (system readFile + fromString + length) - 11 seconds, correct length. You should be using bytestring I/O for this. As I was told, indeed :) Data.Text.Lazy (system readFile + pack + length) - 26s, correct length. You should be using text I/O for this. I didn't realize Data.Text.IO exists. Thanks for the hint! iustin ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Bytestrings and [Char]
Bryan O'Sullivan wrote: 2010/3/23 Iustin Pop iu...@k1024.org mailto:iu...@k1024.org I agree with the principle of correctness, but let's be honest - it's (many) orders of magnitude between ByteString and String and Text, not just a few percentage points… Well, your benchmarks are highly suspect. See below. Data.ByteString.Lazy.UTF8 (system readFile + fromString + length) - 11 seconds, correct length. You should be using bytestring I/O for this. Data.Text.Lazy (system readFile + pack + length) - 26s, correct length. You should be using text I/O for this. This is certainly true but adds some unfortunate bloat and has the taste of Java libraries, which define a plethora of similar, but not interchangeable methods due to legacy. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Fwd: [Haskell-cafe] Bytestrings and [Char]
The question can be generalized via type classes: Is there any common set of primitives encapsulated into a single type class that has instances for Strings (Data.List) ByteStrings, Data.Text, Lazy bytestrings, Arrays, vectors and wathever container that can store an boxed, unboxed, packed unpacked sequence of wathever including chars? All of them have folds, heads, tails and a lot of common functions with the same name, but since there is not a single type class, the library programmer can not abstract his code when it is possible, so the library user can chose the particular instance for his particular problem. 2010/3/23 Johann Höchtl johann.hoec...@gmail.com On 22.03.2010 14:15, Ivan Miljenovic wrote: On 23 March 2010 00:10, Johan Tibelljohan.tib...@gmail.com wrote: A sequence of bytes is not the same thing as a sequence of Unicode code points. If you want to replace String by something more efficient have a look at Data.Text. Though Data.Text still has the disadvantage of not being as nice to deal with as String, since you can't pattern match on it, etc. Whilst it may degrade performance, treating String as a list of characters rather than an array provides you with greater flexibility of how to deal with it. But doesn't this basically mean sacrifice performance / applicability for algorithmic beauty? Sure, pattern match comes in handy, but in the case of strings at a high price ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Fwd: [Haskell-cafe] Bytestrings and [Char]
Just a dream: -separate interface and implementation for all containers, via type classes -develop, by genetic programming techniques + quickcheck, a system that find the best container implementation for a particular program. Is that suitable for a Google Summer of Code project? 2010/3/23 Alberto G. Corona agocor...@gmail.com The question can be generalized via type classes: Is there any common set of primitives encapsulated into a single type class that has instances for Strings (Data.List) ByteStrings, Data.Text, Lazy bytestrings, Arrays, vectors and wathever container that can store an boxed, unboxed, packed unpacked sequence of wathever including chars? All of them have folds, heads, tails and a lot of common functions with the same name, but since there is not a single type class, the library programmer can not abstract his code when it is possible, so the library user can chose the particular instance for his particular problem. 2010/3/23 Johann Höchtl johann.hoec...@gmail.com On 22.03.2010 14:15, Ivan Miljenovic wrote: On 23 March 2010 00:10, Johan Tibelljohan.tib...@gmail.com wrote: A sequence of bytes is not the same thing as a sequence of Unicode code points. If you want to replace String by something more efficient have a look at Data.Text. Though Data.Text still has the disadvantage of not being as nice to deal with as String, since you can't pattern match on it, etc. Whilst it may degrade performance, treating String as a list of characters rather than an array provides you with greater flexibility of how to deal with it. But doesn't this basically mean sacrifice performance / applicability for algorithmic beauty? Sure, pattern match comes in handy, but in the case of strings at a high price ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Bytestrings and [Char]
Hello, I was recentyl playing with Haskell (GHC that is) IO and text processing. Bytestrings and Lazy Bytestrings allow for fast and memory eficient string (well, bytestring) handling, yet a lot of libraries do not support them (yet) Given the incredibly inneficient memory representation of [Char] (16 bytes? per cell) I wonder wheather String should default to lazy batestring altogether instead of [Char]. The levenshtein distance as is on hackage uses e.g. String ([Char]) and as such is unneccessarily slow. My question or discussion point: Why not depreciate [Char] altogether and favour of lazy Bytestrings? Regards, Johann ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Bytestrings and [Char]
Johann Höchtl johann.hoec...@gmail.com writes: Bytestrings and Lazy Bytestrings allow for fast and memory eficient string (well, bytestring) handling, yet a lot of libraries do not support them (yet) WHat do you mean? A lot of libraries need to use String because it's easier to deal with and doesn't rely on knowing which encoding is being used. Bytestring is often nicer for IO, but for internal library stuff (e.g. parsing) it may still be easier to use String. Given the incredibly inneficient memory representation of [Char] (16 bytes? per cell) I wonder wheather String should default to lazy batestring altogether instead of [Char]. Well, then all pattern matching, etc. fails. String and ByteStrings have their own separate places and uses. The levenshtein distance as is on hackage uses e.g. String ([Char]) and as such is unneccessarily slow. Have you ported a version to ByteString and demonstrated that it is noticeably faster? Are you sure that it isn't the fault of the algorithm being used? Do you know that people only care about the levenshtein distance of ByteStrings and not of normal Strings? (Disclaimer: I've never even looked at the package.) Providing a ByteString version as well might be a viable option; replacing the current one is quite likely not. My question or discussion point: Why not depreciate [Char] altogether and favour of lazy Bytestrings? I believe I've answered this above. -- Ivan Lazar Miljenovic ivan.miljeno...@gmail.com IvanMiljenovic.wordpress.com ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Bytestrings and [Char]
On Mon, Mar 22, 2010 at 1:16 PM, Johann Höchtl johann.hoec...@gmail.com wrote: My question or discussion point: Why not depreciate [Char] altogether and favour of lazy Bytestrings? A sequence of bytes is not the same thing as a sequence of Unicode code points. If you want to replace String by something more efficient have a look at Data.Text. -- Johan ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Bytestrings and [Char]
On 23 March 2010 00:10, Johan Tibell johan.tib...@gmail.com wrote: A sequence of bytes is not the same thing as a sequence of Unicode code points. If you want to replace String by something more efficient have a look at Data.Text. Though Data.Text still has the disadvantage of not being as nice to deal with as String, since you can't pattern match on it, etc. Whilst it may degrade performance, treating String as a list of characters rather than an array provides you with greater flexibility of how to deal with it. -- Ivan Lazar Miljenovic ivan.miljeno...@gmail.com IvanMiljenovic.wordpress.com ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Bytestrings and [Char]
On Mon, Mar 22, 2010 at 6:10 AM, Johan Tibell johan.tib...@gmail.comwrote: On Mon, Mar 22, 2010 at 1:16 PM, Johann Höchtl johann.hoec...@gmail.com wrote: My question or discussion point: Why not depreciate [Char] altogether and favour of lazy Bytestrings? A sequence of bytes is not the same thing as a sequence of Unicode code points. If you want to replace String by something more efficient have a look at Data.Text. Slight correction. A sequence of bytes is exactly the same thing as a sequence of Unicode bytes when you use UTF8. -- Johan ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Bytestrings and [Char]
David Leimbach wrote: On Mon, Mar 22, 2010 at 6:10 AM, Johan Tibell johan.tib...@gmail.com mailto:johan.tib...@gmail.com wrote: On Mon, Mar 22, 2010 at 1:16 PM, Johann Höchtl johann.hoec...@gmail.com mailto:johann.hoec...@gmail.com wrote: My question or discussion point: Why not depreciate [Char] altogether and favour of lazy Bytestrings? A sequence of bytes is not the same thing as a sequence of Unicode code points. If you want to replace String by something more efficient have a look at Data.Text. Slight correction. A sequence of bytes is exactly the same thing as a sequence of Unicode bytes when you use UTF8. What is a Unicode byte? Cheers, Jochem -- Jochem Berndsen | joc...@functor.nl ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Bytestrings and [Char]
Turn on OverloadedStrings and you can pattern match on any type you like that is in the IsString class. Which means that Data.Text can use string literals just like regular strings (but you can't use Char literals in the match). On Mon, Mar 22, 2010 at 1:15 PM, Ivan Miljenovic ivan.miljeno...@gmail.com wrote: On 23 March 2010 00:10, Johan Tibell johan.tib...@gmail.com wrote: A sequence of bytes is not the same thing as a sequence of Unicode code points. If you want to replace String by something more efficient have a look at Data.Text. Though Data.Text still has the disadvantage of not being as nice to deal with as String, since you can't pattern match on it, etc. Whilst it may degrade performance, treating String as a list of characters rather than an array provides you with greater flexibility of how to deal with it. -- Ivan Lazar Miljenovic ivan.miljeno...@gmail.com IvanMiljenovic.wordpress.com ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Bytestrings and [Char]
Hi David Leimbach wrote: On Mon, Mar 22, 2010 at 6:10 AM, Johan Tibell johan.tib...@gmail.com wrote: On Mon, Mar 22, 2010 at 1:16 PM, Johann Höchtl johann.hoec...@gmail.com wrote: My question or discussion point: Why not depreciate [Char] altogether and favour of lazy Bytestrings? A sequence of bytes is not the same thing as a sequence of Unicode code points. If you want to replace String by something more efficient have a look at Data.Text. Slight correction. A sequence of bytes is exactly the same thing as a sequence of Unicode bytes when you use UTF8. And a sequence of bytes is exactly the same thing as a UTF-32 string, when the sequence is encoded as UTF-32. But that is not the point. The point is that when some function is handed a ByteString, the ByteString can be encoded in many ways. It do not even have to be text. So you need a UTF-8 String type. /Mads -- Johan ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe signature.asc Description: This is a digitally signed message part ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe