subject:"\[Haskell\-cafe\] Bytestrings and \[Char\]"

Re: [Haskell-cafe] Bytestrings and [Char]

2010-03-24 Thread John Lato

Hi Alberto,

To some extent this already exists, it's just that nobody uses it.  I
believe it's the approach taken by the Edison libraries.  Also the
ListLike package provides the type classes ListLike, StringLike, and a
few others.  Neither seems to have become very popular despite having
well-respected authors (Okasaki and Goerzon, respectively).

Some container functions are already provided by other classes, namely
Foldable, Traversable, and Monoid.

The first bit, creating a tree of type classes suitable for all
containers, is probably a few hours work.

An automated system to determine the best implementation is
significantly more difficult; I can't say if the scope would be
appropriate for SoC.

Best,
John

 From: Alberto G. Corona  agocor...@gmail.com

 Just a dream:
 -separate interface and implementation for all containers, via type classes
 -develop, by genetic programming techniques + quickcheck, a system that find
 the best container implementation for a particular program.

 Is that suitable for a Google Summer of Code project?

 2010/3/23 Alberto G. Corona agocor...@gmail.com

 The question can be generalized via type classes: Is there any common set of
 primitives encapsulated into a single type class that has instances for
 Strings (Data.List) ByteStrings, Data.Text, Lazy bytestrings, Arrays,
 vectors and wathever container that can store an boxed, unboxed, packed
 unpacked sequence of wathever including chars? All of them have folds,
 heads, tails and a lot of common functions with the same name, but since
 there is not a single type class, the library programmer can not abstract
 his code when it is possible, so the library user can chose the particular
 instance for his particular problem.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Bytestrings and [Char]

2010-03-24 Thread Alberto G. Corona

Once we have a  tree of type classes suitable for all containers, as you
said, theoretically it  shouldn't very difficult to incorporate the testing
of different instances for each class used in a program, besides  testing
different compilation flags in a genetic algoritm. This latter has already
been done.
http://donsbot.wordpress.com/2010/03/01/evolving-faster-haskell-programs-now-with-llvm/

http://donsbot.wordpress.com/2010/03/01/evolving-faster-haskell-programs-now-with-http://donsbot.wordpress.com/2010/03/01/evolving-faster-haskell-programs-now-with-llvm/

To find automatically the best combination of class implementations and
compilation flags in a single process would be very useful and would save a
lot of manual testing.


2010/3/24 John Lato jwl...@gmail.com

 Hi Alberto,

 To some extent this already exists, it's just that nobody uses it.  I
 believe it's the approach taken by the Edison libraries.  Also the
 ListLike package provides the type classes ListLike, StringLike, and a
 few others.  Neither seems to have become very popular despite having
 well-respected authors (Okasaki and Goerzon, respectively).

 Some container functions are already provided by other classes, namely
 Foldable, Traversable, and Monoid.

 The first bit, creating a tree of type classes suitable for all
 containers, is probably a few hours work.

 An automated system to determine the best implementation is
 significantly more difficult; I can't say if the scope would be
 appropriate for SoC.

 Best,
 John

  From: Alberto G. Corona  agocor...@gmail.com
 
  Just a dream:
  -separate interface and implementation for all containers, via type
 classes
  -develop, by genetic programming techniques + quickcheck, a system that
 find
  the best container implementation for a particular program.
 
  Is that suitable for a Google Summer of Code project?
 
  2010/3/23 Alberto G. Corona agocor...@gmail.com
 
  The question can be generalized via type classes: Is there any common set
 of
  primitives encapsulated into a single type class that has instances for
  Strings (Data.List) ByteStrings, Data.Text, Lazy bytestrings, Arrays,
  vectors and wathever container that can store an boxed, unboxed, packed
  unpacked sequence of wathever including chars? All of them have folds,
  heads, tails and a lot of common functions with the same name, but since
  there is not a single type class, the library programmer can not
 abstract
  his code when it is possible, so the library user can chose the
 particular
  instance for his particular problem.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Bytestrings and [Char]

2010-03-24 Thread Stephen Tetley

Hi Alberto

I rather doubt a valuable set of type classes that is suitable for all
containers exists, I'm afraid.

If you consider containers as the containers package, the data
structures are all (?) functorial  - but they have different shapes,
so e.g. a cons operation makes sense on the linear ones
(Data.Sequence, Data.List) but not on Data.Map, Data.Tree. 'cons' is
analogous to 'insert' but 'insert' exactly describes the operation on
maps whereas 'cons' doesn't, similarly 'insert' doesn't describe the
'cons' operation on lists exactly as it doesn't indicate the position
of where it adds to the list.

Now, if you partition the type classes into small groups you get over
the fact that some operations make sense on certain 'shapes' of data
structure, but there are still subtle type differences that aren't
conducive to type classes - e.g. ByteString and Data.Text aren't
functorial so map is type restricted:

map  :: (Word8  - Word8) - ByteString  - ByteString

Also, while ListLike does provide default definitions for many ops I'm
guessing its more important for performance to redefine them, so the
defaults definitions aren't adding much.

There might be a couple of useful new type classes waiting in the
wings, e.g a destroy one as per view in Data.Sequence, but I doubt
that a proliferation of classes will generally make programs clearer
or more efficient and it would introduce more instances of problems
that we already have with Num type class.

class Destroy f where
  destroy :: f a - (a, Maybe (f a))

Best wishes

Stephen
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Bytestrings and [Char]

2010-03-24 Thread John Lato

I still think that getting other authors to use it would be the
biggest difficulty.  Another concern of mine is that RULEs-based
fusion can be fragile; if the type classes prevent fusion from
occurring you'll never approach the performance of monomorphic code.

That said, I think this is worth pursuing further because of the
benefits you describe.  I have spent a great deal of time on exactly
this issue with the iteratee package, although my needs there are
relatively simple.

As a general question to the Haskell community: have you ever
attempted to write container-polymorphic code?  I'd like to hear about
either successes or stumbling blocks.

On Wed, Mar 24, 2010 at 12:02 PM, Alberto G. Corona agocor...@gmail.com wrote:
 Once we have a  tree of type classes suitable for all containers, as you
 said, theoretically it  shouldn't very difficult to incorporate the testing
 of different instances for each class used in a program, besides  testing
 different compilation flags in a genetic algoritm. This latter has already
 been done.
 http://donsbot.wordpress.com/2010/03/01/evolving-faster-haskell-programs-now-with-
 To find automatically the best combination of class implementations and
 compilation flags in a single process would be very useful and would save a
 lot of manual testing.

 2010/3/24 John Lato jwl...@gmail.com

 Hi Alberto,

 To some extent this already exists, it's just that nobody uses it.  I
 believe it's the approach taken by the Edison libraries.  Also the
 ListLike package provides the type classes ListLike, StringLike, and a
 few others.  Neither seems to have become very popular despite having
 well-respected authors (Okasaki and Goerzon, respectively).

 Some container functions are already provided by other classes, namely
 Foldable, Traversable, and Monoid.

 The first bit, creating a tree of type classes suitable for all
 containers, is probably a few hours work.

 An automated system to determine the best implementation is
 significantly more difficult; I can't say if the scope would be
 appropriate for SoC.

 Best,
 John

  From: Alberto G. Corona  agocor...@gmail.com
 
  Just a dream:
  -separate interface and implementation for all containers, via type
  classes
  -develop, by genetic programming techniques + quickcheck, a system that
  find
  the best container implementation for a particular program.
 
  Is that suitable for a Google Summer of Code project?
 
  2010/3/23 Alberto G. Corona agocor...@gmail.com
 
  The question can be generalized via type classes: Is there any common
  set of
  primitives encapsulated into a single type class that has instances for
  Strings (Data.List) ByteStrings, Data.Text, Lazy bytestrings, Arrays,
  vectors and wathever container that can store an boxed, unboxed, packed
  unpacked sequence of wathever including chars? All of them have folds,
  heads, tails and a lot of common functions with the same name, but
  since
  there is not a single type class, the library programmer can not
  abstract
  his code when it is possible, so the library user can chose the
  particular
  instance for his particular problem.


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Bytestrings and [Char]

2010-03-24 Thread Michael Snoyman

I have a very specific StringLike typeclass in the web-encodings package so
that I can- for example- to HTML entity encoding on String, (lazy)
bytestrings and (lazy) text. Of course, I need to make assumptions about
character encoding for the bytestring version.

Michael

On Wed, Mar 24, 2010 at 8:50 AM, John Lato jwl...@gmail.com wrote:

 I still think that getting other authors to use it would be the
 biggest difficulty.  Another concern of mine is that RULEs-based
 fusion can be fragile; if the type classes prevent fusion from
 occurring you'll never approach the performance of monomorphic code.

 That said, I think this is worth pursuing further because of the
 benefits you describe.  I have spent a great deal of time on exactly
 this issue with the iteratee package, although my needs there are
 relatively simple.

 As a general question to the Haskell community: have you ever
 attempted to write container-polymorphic code?  I'd like to hear about
 either successes or stumbling blocks.

 On Wed, Mar 24, 2010 at 12:02 PM, Alberto G. Corona agocor...@gmail.com
 wrote:
  Once we have a  tree of type classes suitable for all containers, as you
  said, theoretically it  shouldn't very difficult to incorporate the
 testing
  of different instances for each class used in a program, besides  testing
  different compilation flags in a genetic algoritm. This latter has
 already
  been done.
 
 http://donsbot.wordpress.com/2010/03/01/evolving-faster-haskell-programs-now-with-
  To find automatically the best combination of class implementations and
  compilation flags in a single process would be very useful and would save
 a
  lot of manual testing.
 
  2010/3/24 John Lato jwl...@gmail.com
 
  Hi Alberto,
 
  To some extent this already exists, it's just that nobody uses it.  I
  believe it's the approach taken by the Edison libraries.  Also the
  ListLike package provides the type classes ListLike, StringLike, and a
  few others.  Neither seems to have become very popular despite having
  well-respected authors (Okasaki and Goerzon, respectively).
 
  Some container functions are already provided by other classes, namely
  Foldable, Traversable, and Monoid.
 
  The first bit, creating a tree of type classes suitable for all
  containers, is probably a few hours work.
 
  An automated system to determine the best implementation is
  significantly more difficult; I can't say if the scope would be
  appropriate for SoC.
 
  Best,
  John
 
   From: Alberto G. Corona  agocor...@gmail.com
  
   Just a dream:
   -separate interface and implementation for all containers, via type
   classes
   -develop, by genetic programming techniques + quickcheck, a system
 that
   find
   the best container implementation for a particular program.
  
   Is that suitable for a Google Summer of Code project?
  
   2010/3/23 Alberto G. Corona agocor...@gmail.com
  
   The question can be generalized via type classes: Is there any common
   set of
   primitives encapsulated into a single type class that has instances
 for
   Strings (Data.List) ByteStrings, Data.Text, Lazy bytestrings, Arrays,
   vectors and wathever container that can store an boxed, unboxed,
 packed
   unpacked sequence of wathever including chars? All of them have
 folds,
   heads, tails and a lot of common functions with the same name, but
   since
   there is not a single type class, the library programmer can not
   abstract
   his code when it is possible, so the library user can chose the
   particular
   instance for his particular problem.
 
 
 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Bytestrings and [Char]

2010-03-24 Thread Bryan O'Sullivan

On Wed, Mar 24, 2010 at 5:48 AM, Stephen Tetley stephen.tet...@gmail.comwrote:


 I rather doubt a valuable set of type classes that is suitable for all
 containers exists, I'm afraid.


I don't think it's so clear cut. Stepanov's Elements of Programming lays
out a pretty clear (and familiar to many Haskellers) path through the
algebraic thicket of types, classes, and their properties, albeit in the
much clunkier setting of C++ and traits. The disadvantage to this approach
is substantial: just as with the from-principles approaches to redoing
Haskell's numeric hierarchy, you end up with a fearsome and complex set of
typeclasses that are difficult to learn and follow.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Fwd: [Haskell-cafe] Bytestrings and [Char]

2010-03-24 Thread Alberto G. Corona

-- Forwarded message --
From: Alberto G. Corona agocor...@gmail.com
Date: 2010/3/24
Subject: Re: [Haskell-cafe] Bytestrings and [Char]
To: Stephen Tetley stephen.tet...@gmail.com

2010/3/24 Stephen Tetley stephen.tet...@gmail.com

Hi Alberto

 I rather doubt a valuable set of type classes that is suitable for all
 containers exists, I'm afraid.

 If you consider containers as the containers package, the data
 structures are all (?) functorial  - but they have different shapes,
 so e.g. a cons operation makes sense on the linear ones
 (Data.Sequence, Data.List) but not on Data.Map, Data.Tree. 'cons' is
 analogous to 'insert' but 'insert' exactly describes the operation on
 maps whereas 'cons' doesn't, similarly 'insert' doesn't describe the
 'cons' operation on lists exactly as it doesn't indicate the position
 of where it adds to the list.

Some operations like insert in list could be inefficient, some operations
like cons in maps could be an inefficient hack, but at this does not means
that a list can do  create, insert,  map , lookup faster that a Map in the
case of sort lists, for example. A generalized cons applied to a Map can
make less sense, but a program made for lists with occasional cons's can
happen to perform better with Maps and so on.  So a library with generality
in mind can perform optimally in two or more different scenarios with
different instances/implementations.  . A genetic algoritm can have the
opportunity to detect this.

 I think also that a complex list of class hierarchies is neither necessary
nor recommended. We are not talking about mathemathical concepts. This is a
performance issue.

Even if a definitive class hierarchy is a problem, hopefully it may be worth
to define tentative and temporal classes for performance testing purposes,
that warps the common primitives with the sole purpose of executing the
testing algorithm for different instance implementations, compilation flags
and fusion rules.

 Now, if you partition the type classes into small groups you get over
 the fact that some operations make sense on certain 'shapes' of data
 structure, but there are still subtle type differences that aren't
 conducive to type classes - e.g. ByteString and Data.Text aren't
 functorial so map is type restricted:

 map  :: (Word8  - Word8) - ByteString  - ByteString

For this purpose,  specific testing classes for Word8 data -with no claim to
be official can be defined by the programmer, leaving to the library user
to run the testing process for the different instances in his particular
program.

I know that this add extra work and that not many would do this extra level
of abstraction, but perhaps this is less work than spending even more time
exchanging emails, discussing low level issues, performing tests, trying to
figure out what will be the most common sceniarios for wich the library
could be used, testing manually what is the container that has the most
performance here and now, not tomorrow for these scenarios, to worry about
the obsolescence of the library for reasons not related with the library
algorithm  etc

Also, while ListLike does provide default definitions for many ops I'm
 guessing its more important for performance to redefine them, so the
 defaults definitions aren't adding much.

 There might be a couple of useful new type classes waiting in the
 wings, e.g a destroy one as per view in Data.Sequence, but I doubt
 that a proliferation of classes will generally make programs clearer
 or more efficient and it would introduce more instances of problems
 that we already have with Num type class.

 class Destroy f where
  destroy :: f a - (a, Maybe (f a))

 Best wishes

 Stephen
 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Fwd: [Haskell-cafe] Bytestrings and [Char]

2010-03-24 Thread Alberto G. Corona

2010/3/24 Stephen Tetley stephen.tet...@gmail.com


 If you consider containers as the containers package, the data
 structures are all (?) functorial  - but they have different shapes,
 so e.g. a cons operation makes sense on the linear ones
 (Data.Sequence, Data.List) but not on Data.Map, Data.Tree. 'cons' is
 analogous to 'insert' but 'insert' exactly describes the operation on
 maps whereas 'cons' doesn't, similarly 'insert' doesn't describe the
 'cons' operation on lists exactly as it doesn't indicate the position
 of where it adds to the list.


Some operations like insert in list could be inefficient, some operations
like cons in maps could be an inefficient hack, but at this does not means
that a list can do  create, insert,  map , lookup faster that a Map in the
case of sort lists, for example. A generalized cons applied to a Map can
make less sense, but a program made for lists with occasional cons's can
happen to perform better with Maps and so on.  So a library with generality
in mind can perform optimally in two or more different scenarios with
different instances/implementations.  . A genetic algoritm can have the
opportunity to detect this.

 I think also that a complex list of class hierarchies is neither necessary
nor recommended. We are not talking about mathemathical concepts. This is a
performance issue.

Even if a definitive class hierarchy is a problem, hopefully it may be worth
to define tentative and temporal classes for performance testing purposes,
that warps the common primitives with the sole purpose of executing the
testing algorithm for different instance implementations, compilation flags
and fusion rules.


 Now, if you partition the type classes into small groups you get over
 the fact that some operations make sense on certain 'shapes' of data
 structure, but there are still subtle type differences that aren't
 conducive to type classes - e.g. ByteString and Data.Text aren't
 functorial so map is type restricted:

 map  :: (Word8  - Word8) - ByteString  - ByteString


For this purpose,  specific testing classes for Word8 data -with no claim to
be official can be defined by the programmer, leaving to the library user
to run the testing process for the different instances in his particular
program.

I know that this add extra work and that not many would do this extra level
of abstraction, but perhaps this is less work than spending even more time
exchanging emails, discussing low level issues, performing tests, trying to
figure out what will be the most common sceniarios for wich the library
could be used, testing manually what is the container that has the most
performance here and now, not tomorrow for these scenarios, to worry about
the obsolescence of the library for reasons not related with the library
algorithm  etc

Regards
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Bytestrings and [Char]

2010-03-23 Thread Johann Höchtl


On 22.03.2010 14:10, Johan Tibell wrote:

On Mon, Mar 22, 2010 at 1:16 PM, Johann Höchtljohann.hoec...@gmail.com  wrote:


My question or discussion point: Why not depreciate [Char] altogether
and favour of lazy Bytestrings?


A sequence of bytes is not the same thing as a sequence of Unicode
code points. If you want to replace String by something more efficient
have a look at Data.Text.



How are ByteStrings (Lazy, UTF8) and Data.Text meant to co-exist? When I read 
bytestrings over a socket which happens to be UTF16-LE encoded and identify a 
fitting function in Data.Text, I guess I have to transcode them with 
Data.Text.Encoding to make the type System happy?

-- Johan



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Bytestrings and [Char]

2010-03-23 Thread Johann Höchtl


On 22.03.2010 14:15, Ivan Miljenovic wrote:

On 23 March 2010 00:10, Johan Tibelljohan.tib...@gmail.com  wrote:
   

A sequence of bytes is not the same thing as a sequence of Unicode
code points. If you want to replace String by something more efficient
have a look at Data.Text.
 

Though Data.Text still has the disadvantage of not being as nice to
deal with as String, since you can't pattern match on it, etc.

Whilst it may degrade performance, treating String as a list of
characters rather than an array provides you with greater flexibility
of how to deal with it.

   
But doesn't this basically mean sacrifice performance / applicability 
for algorithmic beauty? Sure, pattern match comes in handy, but in the 
case of strings at a high price


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Bytestrings and [Char]

2010-03-23 Thread John Millikin

On Tue, Mar 23, 2010 at 00:27, Johann Höchtl johann.hoec...@gmail.com wrote:
 How are ByteStrings (Lazy, UTF8) and Data.Text meant to co-exist? When I
 read bytestrings over a socket which happens to be UTF16-LE encoded and
 identify a fitting function in Data.Text, I guess I have to transcode them
 with Data.Text.Encoding to make the type System happy?

There's no such thing as a UTF8 or UTF16 bytestring -- a bytestring is
just a more efficient encoding of [Word8], just as Text is a more
efficient encoding of [Char]. If the file format you're parsing
specifies that some series of bytes is text encoded as UTF16-LE, then
you can use the Text decoders to convert to Text.

Poor separation between bytes and characters has caused problems in
many major languages (C, C++, PHP, Ruby, Python) -- lets not abandon
the advantages of correctness to chase a few percentage points of
performance.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Bytestrings and [Char]

2010-03-23 Thread Joachim Breitner

Hi,

Am Dienstag, den 23.03.2010, 08:51 -0700 schrieb John Millikin:
 On Tue, Mar 23, 2010 at 00:27, Johann Höchtl johann.hoec...@gmail.com wrote:
  How are ByteStrings (Lazy, UTF8) and Data.Text meant to co-exist? When I
  read bytestrings over a socket which happens to be UTF16-LE encoded and
  identify a fitting function in Data.Text, I guess I have to transcode them
  with Data.Text.Encoding to make the type System happy?
 
 There's no such thing as a UTF8 or UTF16 bytestring -- a bytestring is
 just a more efficient encoding of [Word8], just as Text is a more
 efficient encoding of [Char]. If the file format you're parsing
 specifies that some series of bytes is text encoded as UTF16-LE, then
 you can use the Text decoders to convert to Text.

It wold still be useful to have an alternative to Data.Text that
internally stores strings as UTF8 encoded bytestrings. I tried to switch
from String to Data.Text in arbtt (which mostly calls pcre-light, which
expects and returns UTF8-encoded C-strings), and it became slower! No
surprise, considering that the program has to re-encode the strings all
the time.

Using a 
 newtype Text = Text { ByteString }
with an interface akin to Data.Text, but using UTF8-encoded ByteStrings
internally gave the same performance as String, at half the memory
footprint. This is in an internal module¹ but I would find it handy to
have this available as a common type in a well-supported library.

Greetings,
Joachim

¹ http://darcs.nomeata.de/arbtt/src/Data/MyText.hs


-- 
Joachim nomeata Breitner
  mail: m...@joachim-breitner.de | ICQ# 74513189 | GPG-Key: 4743206C
  JID: nome...@joachim-breitner.de | http://www.joachim-breitner.de/
  Debian Developer: nome...@debian.org


signature.asc
Description: This is a digitally signed message part
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Bytestrings and [Char]

2010-03-23 Thread Iustin Pop

On Tue, Mar 23, 2010 at 08:51:16AM -0700, John Millikin wrote:
 On Tue, Mar 23, 2010 at 00:27, Johann Höchtl johann.hoec...@gmail.com wrote:
  How are ByteStrings (Lazy, UTF8) and Data.Text meant to co-exist? When I
  read bytestrings over a socket which happens to be UTF16-LE encoded and
  identify a fitting function in Data.Text, I guess I have to transcode them
  with Data.Text.Encoding to make the type System happy?
 
 There's no such thing as a UTF8 or UTF16 bytestring -- a bytestring is
 just a more efficient encoding of [Word8], just as Text is a more
 efficient encoding of [Char]. If the file format you're parsing
 specifies that some series of bytes is text encoded as UTF16-LE, then
 you can use the Text decoders to convert to Text.
 
 Poor separation between bytes and characters has caused problems in
 many major languages (C, C++, PHP, Ruby, Python) -- lets not abandon
 the advantages of correctness to chase a few percentage points of
 performance.

I agree with the principle of correctness, but let's be honest - it's
(many) orders of magnitude between ByteString and String and Text, not
just a few percentage points…

I've been struggling with this problem too and it's not nice. Every time
one uses the system readFile  friends (anything that doesn't read via
ByteStrings), it hell slow.

Test: read a file and compute its size in chars. Input text file is
~40MB in size, has one non-ASCII char. The test might seem stupid but it
is a simple one. ghc 6.12.1.

Data.ByteString.Lazy (bytestring readFile + length) -  10 miliseconds,
incorrect length (as expected).

Data.ByteString.Lazy.UTF8 (system readFile + fromString + length) - 11
seconds, correct length.

Data.Text.Lazy (system readFile + pack + length) - 26s, correct length.

String (system readfile + length) - ~1 second, correct length.

For the record:

python2.6 (str type) -  ~60ms, incorrect length.
python3.1 (unicode)  - ~310ms, correct length.

If anyone has a solution on how to work on fast text (unicode)
transformations (but not a 1:1 pipeline where fusion can work nicely),
I'd be glad to hear.

iustin
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Bytestrings and [Char]

2010-03-23 Thread Nick Bowler

On 18:11 Tue 23 Mar , Iustin Pop wrote:
 I agree with the principle of correctness, but let's be honest - it's
 (many) orders of magnitude between ByteString and String and Text, not
 just a few percentage points…
 
 I've been struggling with this problem too and it's not nice. Every time
 one uses the system readFile  friends (anything that doesn't read via
 ByteStrings), it hell slow.
 
 Test: read a file and compute its size in chars. Input text file is
 ~40MB in size, has one non-ASCII char. The test might seem stupid but it
 is a simple one. ghc 6.12.1.
 
 Data.ByteString.Lazy (bytestring readFile + length) -  10 miliseconds,
 incorrect length (as expected).
 
 Data.ByteString.Lazy.UTF8 (system readFile + fromString + length) - 11
 seconds, correct length.
 
 Data.Text.Lazy (system readFile + pack + length) - 26s, correct length.
 
 String (system readfile + length) - ~1 second, correct length.

Is this a mistake?  Your own report shows String  readFile being an
order of magnitude faster than everything else, contrary to your earlier
claim.

-- 
Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/)
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Bytestrings and [Char]

2010-03-23 Thread Iustin Pop

On Tue, Mar 23, 2010 at 01:21:49PM -0400, Nick Bowler wrote:
 On 18:11 Tue 23 Mar , Iustin Pop wrote:
  I agree with the principle of correctness, but let's be honest - it's
  (many) orders of magnitude between ByteString and String and Text, not
  just a few percentage points…
  
  I've been struggling with this problem too and it's not nice. Every time
  one uses the system readFile  friends (anything that doesn't read via
  ByteStrings), it hell slow.
  
  Test: read a file and compute its size in chars. Input text file is
  ~40MB in size, has one non-ASCII char. The test might seem stupid but it
  is a simple one. ghc 6.12.1.
  
  Data.ByteString.Lazy (bytestring readFile + length) -  10 miliseconds,
  incorrect length (as expected).
  
  Data.ByteString.Lazy.UTF8 (system readFile + fromString + length) - 11
  seconds, correct length.
  
  Data.Text.Lazy (system readFile + pack + length) - 26s, correct length.
  
  String (system readfile + length) - ~1 second, correct length.
 
 Is this a mistake?  Your own report shows String  readFile being an
 order of magnitude faster than everything else, contrary to your earlier
 claim.

No, it's not a mistake. String is faster than pack to Text and length, but it's
100 times slower than ByteString.

My whole point is that difference between byte processing and char processing
in Haskell is not a few percentage points, but order of magnitude. I would
really like to have only the 6x penalty that Python shows, for example.

regards,
iustin
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Bytestrings and [Char]

2010-03-23 Thread Iustin Pop

On Tue, Mar 23, 2010 at 05:53:00PM +, Vincent Hanquez wrote:
 On 23/03/10 17:11, Iustin Pop wrote:
 Data.ByteString.Lazy (bytestring readFile + length) -  10 miliseconds,
 incorrect length (as expected).
 
 Data.ByteString.Lazy.UTF8 (system readFile + fromString + length) - 11
 seconds, correct length.
 
 Why would you use system readFile and not bytestring readFile in the
 second case ?
 
 using the later i got a ~48x slowdown, not a 1100x slowdown, and
 the length is correct too.

Hmm… because I didn't think of that :)

iustin
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Bytestrings and [Char]

2010-03-23 Thread Bryan O'Sullivan

2010/3/23 Iustin Pop iu...@k1024.org

 I agree with the principle of correctness, but let's be honest - it's
 (many) orders of magnitude between ByteString and String and Text, not
 just a few percentage points…


Well, your benchmarks are highly suspect. See below.


 Data.ByteString.Lazy.UTF8 (system readFile + fromString + length) - 11
 seconds, correct length.


You should be using bytestring I/O for this.


 Data.Text.Lazy (system readFile + pack + length) - 26s, correct length.


You should be using text I/O for this.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Bytestrings and [Char]

2010-03-23 Thread Thomas DuBuisson

BOS:
 Well, your benchmarks are highly suspect.

Attached is another benchmark with similar results.  This is no
criterion benchmark but I did try to adjust a wee bit for cache
issues.  Suffice to say I am not yet impressed with Data.Text
performance wise.

In the broader scope I feel there is a deeper point here:
Data.ByteString is how most data is available in a program (be it from
a file, network, or other library/device).  With all this data we
really should have some sort of zero-copy safeCoerce that can expose
the bytestring as an O(1) unboxed array of some safe length.  I know
this would have been nice for pureMD5, which did use an ugly
unsafePerformIO hack just to get Word16s but even those got boxed up
at horrible cost (it now uses 'cereal' to get the Word16 - even worse
performance but it lets me ignore endianess of the architecture).


 OT 
Beating the dead horse: I once wrote and deleted a blog post ranting
about how to get Word16 in C vs in Haskell:
   word = ((uint16_t *)p)[index];
or
  word = unsafePerformIO $ withForeignPtr ptr $ \ptr' - let p =
castPtr (plusPtr ptr' off) in peekElemOff p index

That Haskell snippet takes significantly longer to both write and run.
 END OT 


Code and benchmark numbers included below, all complaints about the
accuracy of the benchmark are welcome.

NOTE: tLog is a daily #haskell log file repeated many times ~ 59MB.
---
[to...@mavlo Test]$ ./read tLog
Normal String + System.IO 61443120: 1.467924s
Data.ByteString.Lazy 61450365: 0.023285s
Data.ByteString.Lazy.UTF8 61443120: 3.305154s
Data.Text.Lazy 61443120: 3.99178s

- CODE -
import Data.Time
import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString.Lazy.UTF8 as U
import qualified Data.Text.Lazy as T
import qualified Data.Text.Lazy.IO as TO
import System.IO
import System.Environment (getArgs)
import Control.Monad (sequence_, when)

main = do
[file] - getArgs
let time (f, desc)  = do
s - getCurrentTime
r - f
let !r' = r
t - getCurrentTime
let d = diffUTCTime t s
when (length desc  0)
(putStrLn $ desc ++   ++ show r' ++ :  ++ show d)
ops = [
(readFile file = return . show . length, )
  , (readFile file = return . show . length, Normal
String + System.IO)
  , (L.readFile file = return . show . L.length, )
  , (L.readFile file = return . show . L.length,
Data.ByteString.Lazy)
  , (L.readFile file = return . show . U.length, )
  , (L.readFile file = return . show . U.length,
Data.ByteString.Lazy.UTF8)
  , (TO.readFile file = return . show . T.length, )
  , (TO.readFile file = return . show . T.length,
Data.Text.Lazy)
  ]
mapM_ time ops
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Bytestrings and [Char]

2010-03-23 Thread Nick Bowler

On 18:25 Tue 23 Mar , Iustin Pop wrote:
 On Tue, Mar 23, 2010 at 01:21:49PM -0400, Nick Bowler wrote:
  On 18:11 Tue 23 Mar , Iustin Pop wrote:
   I agree with the principle of correctness, but let's be honest - it's
   (many) orders of magnitude between ByteString and String and Text, not
   just a few percentage points…
   
   I've been struggling with this problem too and it's not nice. Every time
   one uses the system readFile  friends (anything that doesn't read via
   ByteStrings), it hell slow.
   
   Test: read a file and compute its size in chars. Input text file is
   ~40MB in size, has one non-ASCII char. The test might seem stupid but it
   is a simple one. ghc 6.12.1.
   
   Data.ByteString.Lazy (bytestring readFile + length) -  10 miliseconds,
   incorrect length (as expected).
   
   Data.ByteString.Lazy.UTF8 (system readFile + fromString + length) - 11
   seconds, correct length.
   
   Data.Text.Lazy (system readFile + pack + length) - 26s, correct length.
   
   String (system readfile + length) - ~1 second, correct length.
  
  Is this a mistake?  Your own report shows String  readFile being an
  order of magnitude faster than everything else, contrary to your earlier
  claim.
 
 No, it's not a mistake. String is faster than pack to Text and length, but 
 it's
 100 times slower than ByteString.

Only if you don't care about obtaining the correct answer, in which case
you may as well just say const 42 or somesuch, which is even faster.

 My whole point is that difference between byte processing and char processing
 in Haskell is not a few percentage points, but order of magnitude. I would
 really like to have only the 6x penalty that Python shows, for example.

Hang on a second... less than 10 milliseconds to read 40 megabytes from
disk?  Something's fishy here.

I ran my own tests with a 400M file (419430400 bytes) consisting almost
exclusively of the letter 'a' with two Japanese characters placed at
every multiple of 40 megabytes (UTF-8 encoded).

With Prelude.readFile/length and 5 runs, I see

  10145ms, 10087ms, 10223ms, 10321ms, 10216ms.

with approximately 10% of that time spent performing GC each run.

With Data.Bytestring.Lazy.readFile/length and 5 runs, I see

  8223ms, 8192ms, 8077ms, 8091ms, 8174ms.

with approximately 20% of that time spent performing GC each run.
Maybe there's some magic command line options to tune the GC for our
purposes, but I only managed to make things slower.  Thus, I'll handwave
a bit and just shave off the GC time from each result.

Prelude: 9178ms mean with a standard deviation of 159ms.
Data.ByteString.Lazy: 6521ms mean with a standard deviation of 103ms.

Therefore, we managed a throughput of 43 MB/s with the Prelude (and got
the right answer), while we managed 61 MB/s with lazy ByteStrings (and
got the wrong answer).  My disk won't go much, if at all, faster than
the second result, so that's good.

So that's a 30% reduction in throughput.  I'd say that's a lot worse
than a few percentage points, but certainly not orders of magnitude.

On the other hand, using Data.ByteString.Lazy.readFile and
Data.ByteString.Lazy.UTF8.length, we get results of around 12000ms with
approximately 5% of that time spent in GC, which is rather worse than
the Prelude.  Data.Text.Lazy.IO.readFile and Data.Text.Lazy.length are
even worse, with results of around 25 *seconds* (!!) and 2% of that time
spent in GC.

GNU wc computes the correct answer as quickly as lazy bytestrings
compute the wrong answer.  With perl 5.8, slurping the entire file as
UTF-8 computes the correct answer just as slowly as Prelude.  In my
first ever Python program (with python 2.6), I tried to read the entire
file as a unicode string and it quickly crashes due to running out of
memory (yikes!), so it earns a DNF.

So, for computing the right answer with this simple test, it looks like
the Prelude is the best option.  We tie with Perl and lose only to GNU
wc (which is written in C).  Really, though, it would be nice to close
that gap.

-- 
Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/)
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Bytestrings and [Char]

2010-03-23 Thread Iustin Pop

On Tue, Mar 23, 2010 at 03:31:33PM -0400, Nick Bowler wrote:
 On 18:25 Tue 23 Mar , Iustin Pop wrote:
  On Tue, Mar 23, 2010 at 01:21:49PM -0400, Nick Bowler wrote:
   On 18:11 Tue 23 Mar , Iustin Pop wrote:
I agree with the principle of correctness, but let's be honest - it's
(many) orders of magnitude between ByteString and String and Text, not
just a few percentage points…

I've been struggling with this problem too and it's not nice. Every time
one uses the system readFile  friends (anything that doesn't read via
ByteStrings), it hell slow.

Test: read a file and compute its size in chars. Input text file is
~40MB in size, has one non-ASCII char. The test might seem stupid but it
is a simple one. ghc 6.12.1.

Data.ByteString.Lazy (bytestring readFile + length) -  10 miliseconds,
incorrect length (as expected).

Data.ByteString.Lazy.UTF8 (system readFile + fromString + length) - 11
seconds, correct length.

Data.Text.Lazy (system readFile + pack + length) - 26s, correct length.

String (system readfile + length) - ~1 second, correct length.
   
   Is this a mistake?  Your own report shows String  readFile being an
   order of magnitude faster than everything else, contrary to your earlier
   claim.
  
  No, it's not a mistake. String is faster than pack to Text and length, but 
  it's
  100 times slower than ByteString.
 
 Only if you don't care about obtaining the correct answer, in which case
 you may as well just say const 42 or somesuch, which is even faster.
 
  My whole point is that difference between byte processing and char 
  processing
  in Haskell is not a few percentage points, but order of magnitude. I would
  really like to have only the 6x penalty that Python shows, for example.
 
 Hang on a second... less than 10 milliseconds to read 40 megabytes from
 disk?  Something's fishy here.

Of course I don't want to benchmark the disk, and therefore the source file is
on tmpfs.

 I ran my own tests with a 400M file (419430400 bytes) consisting almost
 exclusively of the letter 'a' with two Japanese characters placed at
 every multiple of 40 megabytes (UTF-8 encoded).
 
 With Prelude.readFile/length and 5 runs, I see
 
   10145ms, 10087ms, 10223ms, 10321ms, 10216ms.
 
 with approximately 10% of that time spent performing GC each run.
 
 With Data.Bytestring.Lazy.readFile/length and 5 runs, I see
 
   8223ms, 8192ms, 8077ms, 8091ms, 8174ms.
 
 with approximately 20% of that time spent performing GC each run.
 Maybe there's some magic command line options to tune the GC for our
 purposes, but I only managed to make things slower.  Thus, I'll handwave
 a bit and just shave off the GC time from each result.
 
 Prelude: 9178ms mean with a standard deviation of 159ms.
 Data.ByteString.Lazy: 6521ms mean with a standard deviation of 103ms.
 
 Therefore, we managed a throughput of 43 MB/s with the Prelude (and got
 the right answer), while we managed 61 MB/s with lazy ByteStrings (and
 got the wrong answer).  My disk won't go much, if at all, faster than
 the second result, so that's good.

I'll bet that for a 400MB file, if you have more than two 2GB of ram, most of
it will be cached. If you want to check Haskell performance, just copy it to a
tmpfs filesytem so that the disk is out of the equation.

 So that's a 30% reduction in throughput.  I'd say that's a lot worse
 than a few percentage points, but certainly not orders of magnitude.

Because you're possibly benchmarking the disk also. With a 400MB file on tmpfs,
lazy bytestring readfile + length takes on my machine ~150ms, which is way
faster than 8 seconds…

 On the other hand, using Data.ByteString.Lazy.readFile and
 Data.ByteString.Lazy.UTF8.length, we get results of around 12000ms with
 approximately 5% of that time spent in GC, which is rather worse than
 the Prelude.  Data.Text.Lazy.IO.readFile and Data.Text.Lazy.length are
 even worse, with results of around 25 *seconds* (!!) and 2% of that time
 spent in GC.
 
 GNU wc computes the correct answer as quickly as lazy bytestrings
 compute the wrong answer.  With perl 5.8, slurping the entire file as
 UTF-8 computes the correct answer just as slowly as Prelude.  In my
 first ever Python program (with python 2.6), I tried to read the entire
 file as a unicode string and it quickly crashes due to running out of
 memory (yikes!), so it earns a DNF.
 
 So, for computing the right answer with this simple test, it looks like
 the Prelude is the best option.  We tie with Perl and lose only to GNU
 wc (which is written in C).  Really, though, it would be nice to close
 that gap.

Totally agreed :)

iustin
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Bytestrings and [Char]

2010-03-23 Thread Iustin Pop

On Tue, Mar 23, 2010 at 11:22:23AM -0700, Bryan O'Sullivan wrote:
 2010/3/23 Iustin Pop iu...@k1024.org
 
  I agree with the principle of correctness, but let's be honest - it's
  (many) orders of magnitude between ByteString and String and Text, not
  just a few percentage points…
 
 
 Well, your benchmarks are highly suspect. See below.
 
 
  Data.ByteString.Lazy.UTF8 (system readFile + fromString + length) - 11
  seconds, correct length.
 
 
 You should be using bytestring I/O for this.

As I was told, indeed :)

  Data.Text.Lazy (system readFile + pack + length) - 26s, correct length.
 
 
 You should be using text I/O for this.

I didn't realize Data.Text.IO exists. Thanks for the hint!

iustin
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Bytestrings and [Char]

2010-03-23 Thread Johann Höchtl




Bryan O'Sullivan wrote:

2010/3/23 Iustin Pop iu...@k1024.org mailto:iu...@k1024.org

I agree with the principle of correctness, but let's be honest - it's
(many) orders of magnitude between ByteString and String and Text, not
just a few percentage points…


Well, your benchmarks are highly suspect. See below.
 


Data.ByteString.Lazy.UTF8 (system readFile + fromString + length) - 11
seconds, correct length.


You should be using bytestring I/O for this.
 


Data.Text.Lazy (system readFile + pack + length) - 26s, correct
length.


You should be using text I/O for this.
This is certainly true but adds some unfortunate bloat and has the taste 
of Java libraries, which define a plethora of similar, but not 
interchangeable methods due to legacy.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Fwd: [Haskell-cafe] Bytestrings and [Char]

2010-03-23 Thread Alberto G. Corona

The question can be generalized via type classes: Is there any common set of
primitives encapsulated into a single type class that has instances for
Strings (Data.List) ByteStrings, Data.Text, Lazy bytestrings, Arrays,
vectors and wathever container that can store an boxed, unboxed, packed
unpacked sequence of wathever including chars? All of them have folds,
heads, tails and a lot of common functions with the same name, but since
there is not a single type class, the library programmer can not abstract
his code when it is possible, so the library user can chose the particular
instance for his particular problem.

2010/3/23 Johann Höchtl johann.hoec...@gmail.com

On 22.03.2010 14:15, Ivan Miljenovic wrote:

 On 23 March 2010 00:10, Johan Tibelljohan.tib...@gmail.com  wrote:


 A sequence of bytes is not the same thing as a sequence of Unicode
 code points. If you want to replace String by something more efficient
 have a look at Data.Text.


 Though Data.Text still has the disadvantage of not being as nice to
 deal with as String, since you can't pattern match on it, etc.

 Whilst it may degrade performance, treating String as a list of
 characters rather than an array provides you with greater flexibility
 of how to deal with it.



 But doesn't this basically mean sacrifice performance / applicability for
 algorithmic beauty? Sure, pattern match comes in handy, but in the case of
 strings at a high price


 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Fwd: [Haskell-cafe] Bytestrings and [Char]

2010-03-23 Thread Alberto G. Corona

Just a dream:
-separate interface and implementation for all containers, via type classes
-develop, by genetic programming techniques + quickcheck, a system that find
the best container implementation for a particular program.

Is that suitable for a Google Summer of Code project?

2010/3/23 Alberto G. Corona agocor...@gmail.com

The question can be generalized via type classes: Is there any common set of
 primitives encapsulated into a single type class that has instances for
 Strings (Data.List) ByteStrings, Data.Text, Lazy bytestrings, Arrays,
 vectors and wathever container that can store an boxed, unboxed, packed
 unpacked sequence of wathever including chars? All of them have folds,
 heads, tails and a lot of common functions with the same name, but since
 there is not a single type class, the library programmer can not abstract
 his code when it is possible, so the library user can chose the particular
 instance for his particular problem.

 2010/3/23 Johann Höchtl johann.hoec...@gmail.com

 On 22.03.2010 14:15, Ivan Miljenovic wrote:

 On 23 March 2010 00:10, Johan Tibelljohan.tib...@gmail.com  wrote:


 A sequence of bytes is not the same thing as a sequence of Unicode
 code points. If you want to replace String by something more efficient
 have a look at Data.Text.


 Though Data.Text still has the disadvantage of not being as nice to
 deal with as String, since you can't pattern match on it, etc.

 Whilst it may degrade performance, treating String as a list of
 characters rather than an array provides you with greater flexibility
 of how to deal with it.



 But doesn't this basically mean sacrifice performance / applicability for
 algorithmic beauty? Sure, pattern match comes in handy, but in the case of
 strings at a high price


 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

[Haskell-cafe] Bytestrings and [Char]

2010-03-22 Thread Johann Höchtl

Hello, I was recentyl playing with Haskell (GHC that is) IO and text
processing.

Bytestrings and Lazy Bytestrings allow for fast and memory eficient
string (well, bytestring) handling, yet a lot of libraries do not
support them (yet)

Given the incredibly inneficient memory representation of [Char] (16
bytes? per cell) I wonder wheather String should default to lazy
batestring altogether instead of [Char].

The levenshtein distance as is on hackage uses e.g. String ([Char])
and as such is unneccessarily slow.

My question or discussion point: Why not depreciate [Char] altogether
and favour of lazy Bytestrings?

Regards,

   Johann
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Bytestrings and [Char]

2010-03-22 Thread Ivan Lazar Miljenovic

Johann Höchtl johann.hoec...@gmail.com writes:
 Bytestrings and Lazy Bytestrings allow for fast and memory eficient
 string (well, bytestring) handling, yet a lot of libraries do not
 support them (yet)

WHat do you mean?  A lot of libraries need to use String because it's
easier to deal with and doesn't rely on knowing which encoding is being
used.  Bytestring is often nicer for IO, but for internal library stuff
(e.g. parsing) it may still be easier to use String.

 Given the incredibly inneficient memory representation of [Char] (16
 bytes? per cell) I wonder wheather String should default to lazy
 batestring altogether instead of [Char].

Well, then all pattern matching, etc. fails.  String and ByteStrings
have their own separate places and uses.

 The levenshtein distance as is on hackage uses e.g. String ([Char])
 and as such is unneccessarily slow.

Have you ported a version to ByteString and demonstrated that it is
noticeably faster?  Are you sure that it isn't the fault of the
algorithm being used?  Do you know that people only care about the
levenshtein distance of ByteStrings and not of normal Strings?
(Disclaimer: I've never even looked at the package.)  Providing a
ByteString version as well might be a viable option; replacing the
current one is quite likely not.

 My question or discussion point: Why not depreciate [Char] altogether
 and favour of lazy Bytestrings?

I believe I've answered this above.

-- 
Ivan Lazar Miljenovic
ivan.miljeno...@gmail.com
IvanMiljenovic.wordpress.com
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Bytestrings and [Char]

2010-03-22 Thread Johan Tibell

On Mon, Mar 22, 2010 at 1:16 PM, Johann Höchtl johann.hoec...@gmail.com wrote:
 My question or discussion point: Why not depreciate [Char] altogether
 and favour of lazy Bytestrings?

A sequence of bytes is not the same thing as a sequence of Unicode
code points. If you want to replace String by something more efficient
have a look at Data.Text.

-- Johan
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Bytestrings and [Char]

2010-03-22 Thread Ivan Miljenovic

On 23 March 2010 00:10, Johan Tibell johan.tib...@gmail.com wrote:
 A sequence of bytes is not the same thing as a sequence of Unicode
 code points. If you want to replace String by something more efficient
 have a look at Data.Text.

Though Data.Text still has the disadvantage of not being as nice to
deal with as String, since you can't pattern match on it, etc.

Whilst it may degrade performance, treating String as a list of
characters rather than an array provides you with greater flexibility
of how to deal with it.

-- 
Ivan Lazar Miljenovic
ivan.miljeno...@gmail.com
IvanMiljenovic.wordpress.com
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Bytestrings and [Char]

2010-03-22 Thread David Leimbach

On Mon, Mar 22, 2010 at 6:10 AM, Johan Tibell johan.tib...@gmail.comwrote:

 On Mon, Mar 22, 2010 at 1:16 PM, Johann Höchtl johann.hoec...@gmail.com
 wrote:
  My question or discussion point: Why not depreciate [Char] altogether
  and favour of lazy Bytestrings?

 A sequence of bytes is not the same thing as a sequence of Unicode
 code points. If you want to replace String by something more efficient
 have a look at Data.Text.


Slight correction.

A sequence of bytes is exactly the same thing as a sequence of Unicode bytes
when you use UTF8.




 -- Johan
 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Bytestrings and [Char]

2010-03-22 Thread Jochem Berndsen


David Leimbach wrote:



On Mon, Mar 22, 2010 at 6:10 AM, Johan Tibell johan.tib...@gmail.com 
mailto:johan.tib...@gmail.com wrote:


On Mon, Mar 22, 2010 at 1:16 PM, Johann Höchtl
johann.hoec...@gmail.com mailto:johann.hoec...@gmail.com wrote:
  My question or discussion point: Why not depreciate [Char] altogether
  and favour of lazy Bytestrings?

A sequence of bytes is not the same thing as a sequence of Unicode
code points. If you want to replace String by something more efficient
have a look at Data.Text.


Slight correction.

A sequence of bytes is exactly the same thing as a sequence of Unicode 
bytes when you use UTF8.  



What is a Unicode byte?

Cheers, Jochem

--
Jochem Berndsen | joc...@functor.nl
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Bytestrings and [Char]

2010-03-22 Thread Lennart Augustsson

Turn on OverloadedStrings and you can pattern match on any type you
like that is in the IsString class.
Which means that Data.Text can use string literals just like regular
strings (but you can't use Char literals in the match).

On Mon, Mar 22, 2010 at 1:15 PM, Ivan Miljenovic
ivan.miljeno...@gmail.com wrote:
 On 23 March 2010 00:10, Johan Tibell johan.tib...@gmail.com wrote:
 A sequence of bytes is not the same thing as a sequence of Unicode
 code points. If you want to replace String by something more efficient
 have a look at Data.Text.

 Though Data.Text still has the disadvantage of not being as nice to
 deal with as String, since you can't pattern match on it, etc.

 Whilst it may degrade performance, treating String as a list of
 characters rather than an array provides you with greater flexibility
 of how to deal with it.

 --
 Ivan Lazar Miljenovic
 ivan.miljeno...@gmail.com
 IvanMiljenovic.wordpress.com
 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Bytestrings and [Char]

2010-03-22 Thread Mads Lindstrøm

Hi

David Leimbach wrote:
 
 
 On Mon, Mar 22, 2010 at 6:10 AM, Johan Tibell johan.tib...@gmail.com
 wrote:
 On Mon, Mar 22, 2010 at 1:16 PM, Johann Höchtl
 johann.hoec...@gmail.com wrote:
  My question or discussion point: Why not depreciate [Char]
 altogether
  and favour of lazy Bytestrings?
 
 
 A sequence of bytes is not the same thing as a sequence of
 Unicode
 code points. If you want to replace String by something more
 efficient
 have a look at Data.Text.
 
 
 Slight correction.
 
 
 A sequence of bytes is exactly the same thing as a sequence of Unicode
 bytes when you use UTF8.  

And a sequence of bytes is exactly the same thing as a UTF-32 string,
when the sequence is encoded as UTF-32.

But that is not the point. The point is that when some function is
handed a ByteString, the ByteString can be encoded in many ways. It do
not even have to be text.

So you need a UTF-8 String type.

/Mads

 
 
  
 
 -- Johan
 
 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe
 
 
 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe


signature.asc
Description: This is a digitally signed message part
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Bytestrings and [Char]

Re: [Haskell-cafe] Bytestrings and [Char]

Re: [Haskell-cafe] Bytestrings and [Char]

Re: [Haskell-cafe] Bytestrings and [Char]

Re: [Haskell-cafe] Bytestrings and [Char]

Re: [Haskell-cafe] Bytestrings and [Char]

Fwd: [Haskell-cafe] Bytestrings and [Char]

Fwd: [Haskell-cafe] Bytestrings and [Char]

Re: [Haskell-cafe] Bytestrings and [Char]

Re: [Haskell-cafe] Bytestrings and [Char]

Re: [Haskell-cafe] Bytestrings and [Char]

Re: [Haskell-cafe] Bytestrings and [Char]

Re: [Haskell-cafe] Bytestrings and [Char]

Re: [Haskell-cafe] Bytestrings and [Char]

Re: [Haskell-cafe] Bytestrings and [Char]

Re: [Haskell-cafe] Bytestrings and [Char]

Re: [Haskell-cafe] Bytestrings and [Char]

Re: [Haskell-cafe] Bytestrings and [Char]

Re: [Haskell-cafe] Bytestrings and [Char]

Re: [Haskell-cafe] Bytestrings and [Char]

Re: [Haskell-cafe] Bytestrings and [Char]

Re: [Haskell-cafe] Bytestrings and [Char]

Fwd: [Haskell-cafe] Bytestrings and [Char]

Fwd: [Haskell-cafe] Bytestrings and [Char]

[Haskell-cafe] Bytestrings and [Char]

Re: [Haskell-cafe] Bytestrings and [Char]

Re: [Haskell-cafe] Bytestrings and [Char]

Re: [Haskell-cafe] Bytestrings and [Char]

Re: [Haskell-cafe] Bytestrings and [Char]

Re: [Haskell-cafe] Bytestrings and [Char]

Re: [Haskell-cafe] Bytestrings and [Char]

Re: [Haskell-cafe] Bytestrings and [Char]

32 matches

Site Navigation

Mail list logo

Footer information