Re: [Haskell-cafe] Motion to unify all the string data types

2012-11-12 Thread Francesco Mazzoli
At Mon, 12 Nov 2012 11:21:42 +0800,
John Lato wrote:
 Speaking as the ListLike maintainer, I'd like this too.  But it's difficult to
 do so without sacrificing performance.  In some cases, sacrificing *a lot* of
 performance.  So they have to be class members.
 
 However, there's no reason ListLike has to remain a single monolithic class.
 I'd prefer an API that's split up into several classes, as was done in Edison.
 Then 'ListLike' itself would just be a type synonym, or possibly a small type
 class with the appropriate superclasses.

Interesting.  Are we sure that we can't convince GHC to inline the functions
with enough pragmas?

 However this seems like a lot of work for relatively little payoff, which
 makes it a low priority for me.

Fair enough.

 The community's view on newtypes is funny.  On the one hand, I see all the
 time the claim "Just use a newtype wrapper to write instances for ..."
 (e.g. the recent suggestion of 'instance Num a => Num (a,a)').  On the other,
 nobody actually seems to want to use these newtype wrappers.  Maybe it
 clutters the code?  I don't know.
 
 I couldn't think of a better way to implement this functionality; patches
 would be gratefully accepted.  Anyway, you really shouldn't use these wrappers
 unless you're using a ByteString to represent ASCII text.  Which you shouldn't
 be doing anyway.  If you're using a ByteString to represent a sequence of
 bytes, you needn't ever encounter CharString.

Well, newtypes are good; the problem is that either you use well-accepted ones
(e.g. the `Sum' and `Product' in base) or otherwise it's not worth it, because
people are going to unpack them and use their own.  What I would do is simply
define those instances in separate modules.

 Given that text and vector are both in the Haskell Platform, I wouldn't object
 to these instances being rolled into the main ListLike package.  Any comments
 on this?

I think it's much better, especially for Text, since if you use ListLike you are
probably using it with Text (at least in my experience).  Not a big deal anyway.

Francesco.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Motion to unify all the string data types

2012-11-12 Thread Francesco Mazzoli
At Mon, 12 Nov 2012 10:26:01 +,
Francesco Mazzoli wrote:
 Interesting.  Are we sure that we can't convince GHC to inline the functions
 with enough pragmas?

INLINE and SPECIALIZE :).
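
As a rough illustration of what "enough pragmas" could look like, here is a
minimal sketch.  The class `CharSeq' and the function `countIf' are made up
for this example and are not part of ListLike; whether this approach recovers
the performance of hand-written class methods is exactly the open question
above.

```haskell
import qualified Data.Text as T
import qualified Data.Text.Lazy as TL

-- Hypothetical, deliberately tiny class: one primitive per container.
class CharSeq s where
  unconsChar :: s -> Maybe (Char, s)

instance CharSeq T.Text where
  unconsChar = T.uncons

instance CharSeq TL.Text where
  unconsChar = TL.uncons

-- A utility kept outside the class, written once against the primitive.
countIf :: CharSeq s => (Char -> Bool) -> s -> Int
countIf p = go 0
  where
    go n s = case unconsChar s of
      Nothing        -> n
      Just (c, rest) ->
        let n' = if p c then n + 1 else n
        in  n' `seq` go n' rest
-- Expose the unfolding so call sites in other modules can specialise it.
{-# INLINABLE countIf #-}
-- Ask GHC for a dictionary-free copy at strict Text.
{-# SPECIALIZE countIf :: (Char -> Bool) -> T.Text -> Int #-}
```

The pragmas only remove the dictionary passing; a container with a faster
native implementation of the same operation would still need a class method
or a rewrite rule to use it.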

Francesco.



Re: [Haskell-cafe] Motion to unify all the string data types

2012-11-11 Thread John Lato

 From: Francesco Mazzoli f...@mazzo.li

 At Sat, 10 Nov 2012 15:16:30 +0100,
 Alberto G. Corona  wrote:
  There is a ListLike package, which provides this nice abstraction, but I
  don't know if it is ready and/or complete enough for serious usage.  I'm
  thinking of using it for the same reasons.
 
  Does anyone have experience with it to share?

 I've used it in the past and it's solid; it's been around for a while and
 the original author knows his Haskell.

 Things I don't like:

 * The classes are huge:
   http://hackage.haskell.org/packages/archive/ListLike/3.1.6/doc/html/Data-ListLike.html#t:ListLike
   I'd much rather have all those utility functions outside the type class,
   for no particular reason other than the ugliness of the type class.


Speaking as the ListLike maintainer, I'd like this too.  But it's difficult
to do so without sacrificing performance.  In some cases, sacrificing *a
lot* of performance.  So they have to be class members.

However, there's no reason ListLike has to remain a single monolithic
class.  I'd prefer an API that's split up into several classes, as was done
in Edison.  Then 'ListLike' itself would just be a type synonym, or
possibly a small type class with the appropriate superclasses.
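
A minimal sketch of what such a split could look like.  Every name here
(`Item', `SeqCons', `SeqView', `ListLikeish', `mapItems') is hypothetical and
is not taken from ListLike or Edison; it only shows the shape of "several
small classes plus an umbrella class with superclasses".

```haskell
{-# LANGUAGE TypeFamilies #-}

import qualified Data.Text as T

-- Element type of a sequence (hypothetical open type family).
type family Item s
type instance Item [a]    = a
type instance Item T.Text = Char

-- Construction primitives.
class SeqCons s where
  empty :: s
  cons  :: Item s -> s -> s

-- Deconstruction primitive.
class SeqView s where
  uncons :: s -> Maybe (Item s, s)

-- The umbrella: no methods of its own, only superclasses.
-- With ConstraintKinds this could instead be a constraint synonym.
class (SeqCons s, SeqView s) => ListLikeish s

instance SeqCons [a] where
  empty = []
  cons  = (:)

instance SeqView [a] where
  uncons []       = Nothing
  uncons (x : xs) = Just (x, xs)

instance ListLikeish [a]

instance SeqCons T.Text where
  empty = T.empty
  cons  = T.cons

instance SeqView T.Text where
  uncons = T.uncons

instance ListLikeish T.Text

-- A derived operation written once against the umbrella class.
mapItems :: ListLikeish s => (Item s -> Item s) -> s -> s
mapItems f s = case uncons s of
  Nothing        -> empty
  Just (x, rest) -> cons (f x) (mapItems f rest)
```

A generic `mapItems' like this is the kind of function that is likely to be
slower than a container's native map, which is the performance concern raised
above.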

However this seems like a lot of work for relatively little payoff, which
makes it a low priority for me.

 * It defines its own wrappers for `ByteString':
   http://hackage.haskell.org/packages/archive/ListLike/3.1.6/doc/html/Data-ListLike.html#t:CharString


The community's view on newtypes is funny.  On the one hand, I see all the
time the claim "Just use a newtype wrapper to write instances for ..."
(e.g. the recent suggestion of 'instance Num a => Num (a,a)').  On the
other, nobody actually seems to want to use these newtype wrappers.  Maybe
it clutters the code?  I don't know.
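
For concreteness, the newtype version of that pair instance might look like
the sketch below.  The names `Pair' and `unPair' are invented for this
example, and the component-wise semantics is just one possible choice.

```haskell
-- Hypothetical wrapper so the instance does not have to be written
-- for the bare tuple type (a, a).
newtype Pair a = Pair (a, a)
  deriving (Eq, Show)

-- Component-wise arithmetic on the wrapped pair.
instance Num a => Num (Pair a) where
  Pair (a, b) + Pair (c, d) = Pair (a + c, b + d)
  Pair (a, b) * Pair (c, d) = Pair (a * c, b * d)
  negate (Pair (a, b))      = Pair (negate a, negate b)
  abs    (Pair (a, b))      = Pair (abs a, abs b)
  signum (Pair (a, b))      = Pair (signum a, signum b)
  fromInteger n             = Pair (fromInteger n, fromInteger n)

-- Unwrapping is where the clutter shows up at use sites.
unPair :: Pair a -> (a, a)
unPair (Pair p) = p
```

Every use site has to wrap with `Pair' and unwrap with `unPair', which may be
the clutter that puts people off.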

I couldn't think of a better way to implement this functionality; patches
would be gratefully accepted.  Anyway, you really shouldn't use these
wrappers unless you're using a ByteString to represent ASCII text.  Which
you shouldn't be doing anyway.  If you're using a ByteString to represent a
sequence of bytes, you needn't ever encounter CharString.


 * It doesn't have instances for `Text', you have to resort to the
   `listlike-instances' package.


Given that text and vector are both in the Haskell Platform, I wouldn't
object to these instances being rolled into the main ListLike package.  Any
comments on this?

John L.


Re: [Haskell-cafe] Motion to unify all the string data types

2012-11-11 Thread Bas van Dijk
On 10 November 2012 17:57, Johan Tibell johan.tib...@gmail.com wrote:
 It better communicates intent. A lazy byte string, for example, can be used
 for two separate things:

  * to model a stream of bytes, or
  * to avoid costs due to concatenating strings.

 By using a strict byte string you make it clear that you're not trying to do
 the former (at some potential cost due to the latter). When you want to do
 the former it should be clear to the consumer that they had better consume
 the string in an incremental manner, so as to preserve laziness and avoid
 space leaks (caused by forcing the whole string).

Good advice.

And when you want to do the latter you should use a Builder[1] (or [2]
if you're working with text).
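
A small sketch of the Builder approach for bytes, using the module linked as
[1].  The functions `row' and `render' are made up for this example; the
point is only that pieces are appended as Builders and the output is
materialised once at the end.

```haskell
import qualified Data.ByteString.Builder as BB
import qualified Data.ByteString.Lazy as BL
import Data.List (intersperse)

-- One comma-separated line of decimal numbers, ending in a newline.
row :: [Int] -> BB.Builder
row xs =
  mconcat (intersperse (BB.char7 ',') (map BB.intDec xs))
    `mappend` BB.char7 '\n'

-- Concatenate all rows as Builders, then run the Builder once.
render :: [[Int]] -> BL.ByteString
render = BB.toLazyByteString . mconcat . map row
```

Appending Builders does not copy the bytes accumulated so far, which is what
makes repeated concatenation cheap compared to appending strict ByteStrings.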

Bas

[1] 
http://hackage.haskell.org/packages/archive/bytestring/0.10.2.0/doc/html/Data-ByteString-Builder.html
[2] 
http://hackage.haskell.org/packages/archive/text/0.11.2.3/doc/html/Data-Text-Lazy-Builder.html



Re: [Haskell-cafe] Motion to unify all the string data types

2012-11-10 Thread Gábor Lehel
On Sat, Nov 10, 2012 at 4:00 AM, Johan Tibell johan.tib...@gmail.com wrote:
 As for type classes, I don't think we use them enough. Perhaps because
 Haskell wasn't developed as an engineering language, some good software
 engineering principles (code against an interface, not a concrete
 implementation) aren't used in our base libraries. One specific example is
 the lack of a sequence abstraction/type class, that all the string, list,
 and vector types could implement. Right now all these types try to implement
 a compatible interface (i.e. the traditional list interface), without a
 language mechanism to express that this is what they do.

I think the challenge is designing an abstraction that everyone is
comfortable with. If you just make everything a class method
(ListLike), it's ugly. If you don't, how do you figure out what goes
in the class and what gets implemented on top of it? Is there any
principled reason for it, or is it just ad hoc? How do you make sure
that none of the implementations suffers a performance decrease? What
about sequential vs. random access (list vs. array) issues? Should an
interface be implemented if it's semantically reasonable, but slow? If
you treat everything as a uniform sequence, doesn't that bring back
the Unicode issues again? (And can you make it work for all of
Text/ByteString (kind *), boxed Vectors and lists (* -> *), and
unboxed vectors (* -> * with a constraint)? What about operations that
change the element type? Surely it's possible with TypeFamilies,
ConstraintKinds, and PolyKinds all available, but I'm not sure if it's
obvious. Can it go into the Prelude if it uses extensions? Should it
also support other containers, like Maps? And so on.)

So my impression is that the reason the problem hasn't been solved yet
is that it's hard. We do have some useful things: Functor, Foldable,
Traversable, and the classes in Data.Key[1], but for starters none of
them can be implemented by Text and ByteString, so that brings us back
to square one.

But a constructive idea: what if strict Text and ByteString were both
synonyms for unboxed Vectors (already available in ByteString's
case[2])? What if, for lazy Text and ByteString, we either had lazy
Vectors to make them synonyms of, or a 'data Lazy v' which made a lazy
chunked sequence out of any underlying strict Vector-ish type? That
would cut down on the number of types, which is a good thing in
itself, and it would suggest an obvious way to abstract over them: the
existing Functor/Foldable/Traversable/Data.Key classes extended with
an associated constraint. I'm not sure how many of the use cases that
would cover, but certainly a lot more than we have now. It wouldn't
solve every one of the questions above, but it answers many of them,
and it seems like a good compromise. The big drawbacks I can see are
that (a) it would be a *lot* of work, especially if we want to be
completely uncompromising on performance, and (b) I'm not sure how
pinned arrays and interoperation with C would be handled without
making it complicated again. (Though I suppose we could just punt and
have ByteString be a synonym for Vector.Storable (pinned) and Text for
Vector.Unboxed (not pinned) to mirror the current situation. Or maybe
we could have a pinArray# primop?)
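
To make the 'data Lazy v' idea a little more concrete, here is a minimal
sketch.  The constructor and function names are invented for this example,
and strict Text merely stands in for "any underlying strict Vector-ish type";
nothing here is an existing library API.

```haskell
import qualified Data.Text as T

-- A lazy chunked sequence over any strict chunk type v:
-- each chunk is strict, the tail of chunks is lazy.
data Lazy v = Empty | Chunk !v (Lazy v)

-- Build a chunked sequence from a (possibly lazily produced) list of chunks.
fromChunks :: [v] -> Lazy v
fromChunks = foldr Chunk Empty

-- View the chunks again, one at a time.
toChunks :: Lazy v -> [v]
toChunks Empty          = []
toChunks (Chunk c rest) = c : toChunks rest

-- Collapse to a single strict value; this forces the whole sequence.
toStrict :: Monoid v => Lazy v -> v
toStrict = mconcat . toChunks

-- A lazy text type would then just be an instance of the general scheme.
type LazyText = Lazy T.Text
```

This leaves open all the performance questions raised above, such as chunk
sizes and fusion; it only shows that the lazy variants need not be separate
hand-written types.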

Anyway, if I'm blue-sky dreaming, that's what looks appealing to me.

[1] 
http://hackage.haskell.org/packages/archive/keys/3.0.1/doc/html/Data-Key.html
[2] http://hackage.haskell.org/package/vector-bytestring

-- 
Your ship was destroyed in a monadic eruption.



Re: [Haskell-cafe] Motion to unify all the string data types

2012-11-10 Thread Alberto G. Corona
Andrew:

There is a ListLike package, which provides this nice abstraction, but
I don't know if it is ready and/or complete enough for serious usage.
I'm thinking of using it for the same reasons.

Does anyone have experience with it to share?


2012/11/10 Andrew Pennebaker andrew.penneba...@gmail.com

 Frequently when I'm coding in Haskell, the crux of my problem is
 converting between all the stupid string formats.

 You've got String, ByteString, Lazy ByteString, Text, [Word], and on and
 on... I have to constantly look up how to convert between them, and the
 overloaded strings GHC directive doesn't work, and sometimes
 ByteString.unpack doesn't work, because it expects [Word8], not [Char].
 AAAH!!!

 Haskell is a wonderful playground for experimentation. I've started to
 notice that many Hackage libraries are simply instances of typeclasses
 designed a while ago, and their underlying implementations are free to play
 around with various optimizations... But they ideally all expose the same
 interface through typeclasses.

 Can we do the same with String? Can we pick a good compromise of lazy vs
 strict, flexible vs fast, and all use the same data structure? My vote is
 for type String = [Char], but I'm willing to switch to another data
 structure, just as long as it's consistently used.

 --
 Cheers,

 Andrew Pennebaker
 www.yellosoft.us





-- 
Alberto.


Re: [Haskell-cafe] Motion to unify all the string data types

2012-11-10 Thread Francesco Mazzoli
At Sat, 10 Nov 2012 15:16:30 +0100,
Alberto G. Corona  wrote:
 There is a ListLike package, which provides this nice abstraction, but I don't
 know if it is ready and/or complete enough for serious usage.  I'm thinking of
 using it for the same reasons.
 
 Does anyone have experience with it to share?

I've used it in the past and it's solid, it's been around for a while and the
original author knows his Haskell.

Things I don't like:

* The classes are huge:
  http://hackage.haskell.org/packages/archive/ListLike/3.1.6/doc/html/Data-ListLike.html#t:ListLike
  I'd much rather have all those utility functions outside the type class,
  for no particular reason other than the ugliness of the type class.

* It defines its own wrappers for `ByteString':
  http://hackage.haskell.org/packages/archive/ListLike/3.1.6/doc/html/Data-ListLike.html#t:CharString

* It doesn't have instances for `Text', you have to resort to the
  `listlike-instances' package.

In any case I think it's on the right track, I'd really like something like
that, but much simpler, to be in `base'.

Francesco



Re: [Haskell-cafe] Motion to unify all the string data types

2012-11-10 Thread Tobias Brandt
On 10 November 2012 04:00, Johan Tibell johan.tib...@gmail.com wrote:

 As for type classes, I don't think we use them enough. Perhaps because
 Haskell wasn't developed as an engineering language, some good software
 engineering principles (code against an interface, not a concrete
 implementation) aren't used in our base libraries. One specific example is
 the lack of a sequence abstraction/type class, that all the string, list,
 and vector types could implement. Right now all these types try to
 implement a compatible interface (i.e. the traditional list interface),
 without a language mechanism to express that this is what they do.


Data.Collections [1] has (among others) a Sequence type class and provides
instances for the base data types in a separate package.
However, it appears that not many people are using it.

[1] http://hackage.haskell.org/packages/archive/collections-api/1.0.0.0/doc/html/Data-Collections.html#t:Sequence


Re: [Haskell-cafe] Motion to unify all the string data types

2012-11-10 Thread Johan Tibell
On Fri, Nov 9, 2012 at 10:22 PM, Roman Cheplyaka r...@ro-che.info wrote:

 * Johan Tibell johan.tib...@gmail.com [2012-11-09 19:00:04-0800]
  As a community we should primarily use strict ByteStrings and Texts. There
  are uses for the lazy variants (i.e. they are sometimes more efficient),
  but in general the strict versions should be preferred.

 I'm fairly surprised by this advice.

 I think that lazy BS/Text are a much safer default.

 If there's not much text it wouldn't matter anyway, but for large
 amounts using strict BS/Text would disable incremental
 producing/consuming (except when you're using some kind of an iteratee
 library).

 Can you explain your reasoning?


It better communicates intent. A lazy byte string, for example, can be used
for two separate things:

 * to model a stream of bytes, or
 * to avoid costs due to concatenating strings.

By using a strict byte string you make it clear that you're not trying to
do the former (at some potential cost due to the latter). When you want to
do the former it should be clear to the consumer that they had better
consume the string in an incremental manner, so as to preserve laziness and
avoid space leaks (caused by forcing the whole string).
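
As a sketch of how the type signals intent, the two hypothetical functions
below count newline bytes.  The strict version promises nothing about
streaming; the lazy version walks the input one chunk at a time, so a
consumer written this way does not need the whole input at once.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as BL

-- Strict input: the whole byte string is already in memory.
newlinesStrict :: B.ByteString -> Int
newlinesStrict = B.count 10            -- 10 is the byte for '\n'

-- Lazy input used as a stream: fold over the strict chunks in order.
newlinesStreaming :: BL.ByteString -> Int
newlinesStreaming =
  BL.foldlChunks (\n chunk -> n + B.count 10 chunk) 0
```

Calling something like BL.toStrict on the stream before counting would give
the same answer but would force the whole string, which is the space leak
described above.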

-- Johan


[Haskell-cafe] Motion to unify all the string data types

2012-11-09 Thread Andrew Pennebaker
Frequently when I'm coding in Haskell, the crux of my problem is converting
between all the stupid string formats.

You've got String, ByteString, Lazy ByteString, Text, [Word], and on and
on... I have to constantly look up how to convert between them, and the
overloaded strings GHC directive doesn't work, and sometimes
ByteString.unpack doesn't work, because it expects [Word8], not [Char].
AAAH!!!
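
For reference, a minimal sketch of the conversions in question, using only
functions from the text and bytestring packages.  The wrapper names
(`stringToText' and so on) are made up for this example; the literal
`greeting' shows the OverloadedStrings extension picking the Text type from
the signature.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as BL
import qualified Data.Text as T
import qualified Data.Text.Lazy as TL
import Data.Word (Word8)

-- With OverloadedStrings the literal is typed by its signature.
greeting :: T.Text
greeting = "hello"

-- String <-> strict Text.
stringToText :: String -> T.Text
stringToText = T.pack

textToString :: T.Text -> String
textToString = T.unpack

-- Strict <-> lazy Text.
toLazyText :: T.Text -> TL.Text
toLazyText = TL.fromStrict

toStrictText :: TL.Text -> T.Text
toStrictText = TL.toStrict

-- Strict <-> lazy ByteString.
toLazyBytes :: B.ByteString -> BL.ByteString
toLazyBytes = BL.fromStrict

toStrictBytes :: BL.ByteString -> B.ByteString
toStrictBytes = BL.toStrict

-- Data.ByteString works on bytes (Word8), not on Char.
bytesToWords :: B.ByteString -> [Word8]
bytesToWords = B.unpack
```

Going between Text and ByteString is deliberately absent here: that step
needs an explicit character encoding rather than pack/unpack.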

Haskell is a wonderful playground for experimentation. I've started to
notice that many Hackage libraries are simply instances of typeclasses
designed a while ago, and their underlying implementations are free to play
around with various optimizations... But they ideally all expose the same
interface through typeclasses.

Can we do the same with String? Can we pick a good compromise of lazy vs
strict, flexible vs fast, and all use the same data structure? My vote is
for type String = [Char], but I'm willing to switch to another data
structure, just as long as it's consistently used.

-- 
Cheers,

Andrew Pennebaker
www.yellosoft.us


Re: [Haskell-cafe] Motion to unify all the string data types

2012-11-09 Thread Johan Tibell
Hi Andrew,

On Fri, Nov 9, 2012 at 6:15 PM, Andrew Pennebaker 
andrew.penneba...@gmail.com wrote:

 Frequently when I'm coding in Haskell, the crux of my problem is
 converting between all the stupid string formats.

 You've got String, ByteString, Lazy ByteString, Text, [Word], and on and
 on... I have to constantly look up how to convert between them, and the
 overloaded strings GHC directive doesn't work, and sometimes
 ByteString.unpack doesn't work, because it expects [Word8], not [Char].
 AAAH!!!

 Haskell is a wonderful playground for experimentation. I've started to
 notice that many Hackage libraries are simply instances of typeclasses
 designed a while ago, and their underlying implementations are free to play
 around with various optimizations... But they ideally all expose the same
 interface through typeclasses.

 Can we do the same with String? Can we pick a good compromise of lazy vs
 strict, flexible vs fast, and all use the same data structure? My vote is
 for type String = [Char], but I'm willing to switch to another data
 structure, just as long as it's consistently used.


tl;dr: Use strict Text and ByteStrings.

We need at least two string types, one for byte strings and one for Unicode
strings, as these are two semantically different concepts. You see that
most modern languages use two types (e.g. str and unicode in Python). For
Unicode strings, String is not a good candidate; it's slow, uses a lot of
memory, doesn't hide its representation [1], and finally, it encourages
people to do the wrong thing from a Unicode perspective [2].

As a community we should primarily use strict ByteStrings and Texts. There
are uses for the lazy variants (i.e. they are sometimes more efficient),
but in general the strict versions should be preferred. Choosing to use
these two types can sometimes be a bit frustrating, as lots of code (e.g.
the base package) uses Strings. But if we don't start using them the pain
will never end. One of the main pain points is that the I/O layer uses
Strings, which is both inconvenient and wrong (e.g. a socket returns bytes,
not Unicode code points, yet the recv function returns a String). We really
need to create a more sane I/O layer.

If you use ByteString and Text, you shouldn't see calls to pack/unpack in
your code (except if you want to interact with legacy code), as the correct
way to go between the two is via the encode and decode functions in the
text package.
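
A minimal sketch of that round trip, assuming UTF-8 as the encoding (the
message above does not name one).  `toBytes' and `fromBytes' are hypothetical
wrapper names; the functions they call live in Data.Text.Encoding.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Text.Encoding.Error (UnicodeException)

-- Text -> bytes: encoding to UTF-8 cannot fail.
toBytes :: T.Text -> B.ByteString
toBytes = TE.encodeUtf8

-- bytes -> Text: arbitrary bytes may not be valid UTF-8,
-- so the failure case is made explicit in the type.
fromBytes :: B.ByteString -> Either UnicodeException T.Text
fromBytes = TE.decodeUtf8'
```

Using the Either-returning decodeUtf8' rather than decodeUtf8 is a choice
made for this sketch, so that invalid input shows up as a value instead of an
exception.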

As for type classes, I don't think we use them enough. Perhaps because
Haskell wasn't developed as an engineering language, some good software
engineering principles (code against an interface, not a concrete
implementation) aren't used in our base libraries. One specific example is
the lack of a sequence abstraction/type class, that all the string, list,
and vector types could implement. Right now all these types try to
implement a compatible interface (i.e. the traditional list interface),
without a language mechanism to express that this is what they do.

1. If String had been designed as an abstract type, we could simply have
switched its implementation for a more efficient one and we would not have
had to create a new Text type.

2. By having the primary interface of a Unicode data type be a sequence, we
encourage users to work on strings element-wise, which can lead to errors
as Unicode code points don't correspond well to the human concept of a
character (for example, the Swedish ä character can be represented using
either one or two code points). A sequence view is sometimes useful, if
you're implementing more high-level transformations, but often you should
use functions that operate on the whole string, such as toUpper :: Text ->
Text.
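
The point in note 2 can be seen in a small sketch.  The value and function
names are invented for this example; the eszett case is an extra illustration
that is not mentioned in the note itself.

```haskell
import Data.Char (toUpper)
import qualified Data.Text as T

-- The same user-visible character, spelled two ways.
precomposed, decomposed :: T.Text
precomposed = T.pack "\228"     -- U+00E4, a single code point
decomposed  = T.pack "a\776"    -- U+0061 followed by combining U+0308

-- German sharp s, whose upper case is the two letters "SS".
eszett :: T.Text
eszett = T.pack "\223"          -- U+00DF

-- Element-wise upper-casing: one Char in, one Char out.
elementWise :: T.Text -> T.Text
elementWise = T.map toUpper

-- Whole-string upper-casing: the result may change length.
wholeString :: T.Text -> T.Text
wholeString = T.toUpper
```

Here T.length reports 1 for `precomposed' and 2 for `decomposed', and the two
values are not equal unless they are normalised first; `wholeString eszett'
yields "SS" while `elementWise eszett' leaves the character unchanged.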

Cheers,
  Johan


Re: [Haskell-cafe] Motion to unify all the string data types

2012-11-09 Thread Roman Cheplyaka
* Johan Tibell johan.tib...@gmail.com [2012-11-09 19:00:04-0800]
 As a community we should primarily use strict ByteStrings and Texts. There
 are uses for the lazy variants (i.e. they are sometimes more efficient),
 but in general the strict versions should be preferred.

I'm fairly surprised by this advice.

I think that lazy BS/Text are a much safer default.

If there's not much text it wouldn't matter anyway, but for large
amounts using strict BS/Text would disable incremental
producing/consuming (except when you're using some kind of an iteratee
library).

Can you explain your reasoning?

Roman
