RE: [Haskell-cafe] Can we do better than duplicate APIs? [was: Data.CompactString 0.3]

2007-03-26 Thread Simon Peyton-Jones
[Probably [EMAIL PROTECTED] is the right list for this message, so I'm fwding 
your message below, and will reply there.]

| -Original Message-
| From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Benjamin
| Franksen
| Sent: 23 March 2007 22:56
| To: haskell-cafe@haskell.org
| Cc: haskell@haskell.org
| Subject: [Haskell-cafe] Can we do better than duplicate APIs? [was: 
Data.CompactString 0.3]
|
| [sorry for the somewhat longer rant, you may want to skip to the more
| technical questions at the end of the post]
|
| Twan van Laarhoven wrote:
|  I would like to announce version 0.3 of my Data.CompactString library.
|  Data.CompactString is a wrapper around Data.ByteString that represents a
|  Unicode string. This new version supports different encodings, as can be
|  seen from the data type:
| 
|  [...]
| 
|  Homepage:  http://twan.home.fmf.nl/compact-string/
|  Haddock:   http://twan.home.fmf.nl/compact-string/doc/html/
|  Source:darcs get http://twan.home.fmf.nl/repos/compact-string
|
| After taking a look at the Haddock docs, I was impressed by the amount of
| repetition in the APIs. Not ony does Data.CompactString duplicate the whole
| Data.ByteString interface (~100 functions, adding some more for encoding
| and decoding), the whole interface is again repeated another four times,
| once for each supported encoding.
|
| Now, this is /not/ meant as a criticism of the compact-string package in
| particular. To the contrary, duplicating a fat interface for almost
| identical functionality is apparently state-of-the-art in Haskell library
| design, viz. the celebrated Data.Bytesting, whose API is similarly
| repetitive (see Data.List, Data.ByteString.Lazy, etc...), as well as
| Map/IntMap/SetIntSet etc. I greatly appreciate the effort that went into
| these libraries, and admire the elegance of the implementation as well as
| the stunning results wrt. efficiency gains etc.. However I fear that
| duplicating interfaces in this way will prove to be problematic in the long
| run.
|
| The problems I (for-)see are for maintenance and usability, both of which
| are of course two sides of the same coin. For the library implementer,
| maintenance will become more difficult, as ever more of such 'almost equal'
| interfaces must be maintained and kept in sync. One could use code
| generation or macro expansion to alleviate this, but IMO the necessity to
| use extra-language pre-processors points to a weakness in the language; it
| be much less complicated and more satisfying to use a language feature that
| avoids the repetition instead of generating code to facilitate it. On the
| other side of teh coin, usability suffers as one has to lookup the (almost)
| same function in more and more different (but 'almost equal') module
| interfaces, depending on whether the string in question is Char vs. Byte,
| strict vs. lazy, packed vs. unpacked, encoded in X or Y or Z..., especially
| since there is no guarantee that the function is /really/ spelled the same
| everywhere and also really does what the user expects.(*)
|
| I am certain that most, if not all, people involved with these new libraries
| are well aware of these infelicities. Of course, type classes come to mind
| as a possible solution. However, something seems to prevent developers from
| using them to capture e.g. a common String or ListLike interface. Whatever
| this 'something' is, I think it should be discussed and addressed, before
| the number of 'almost equal' APIs becomes unmanageable for users and
| maintainers.
|
| Here are some raw ideas:
|
| One reason why I think type classes have not (yet) been used to reduce the
| amount of API repetition is that Haskell doesn't (directly) support
| abstraction over type constraints nor over the number of type parameters
| (polykinded types?). Often such 'almost equal' module APIs differ in
| exactly these aspects, i.e. one has an additional type parameter, while yet
| another one needs slightly different or additional constraints on certain
| types. Oleg K. has shown that some if these limitations can be overcome w/o
| changing or adding features to the language, however these tricks are not
| easy to learn and apply.
|
| Another problem is the engineering question of how much to put into the
| class proper: there is a tension between keeping the class as simple as
| possible (few methods, many parametric functions) for maximum usability vs.
| making it large (many methods, less parametric functions) for maximum
| efficiency via specialized implementations. It is often hard to decide this
| question up front, i.e. before enough instances are available. (This has
| been stated as a cause for defering the decision for a common interface to
| list-like values or strings). Since the type of a function doesn't reveal
| whether it is a normal function with a class constraint or a real class
| method, I imagine a language feature that (somehow) enables me to
| specialize such a function

Re: [Haskell-cafe] Can we do better than duplicate APIs? [was: Data.CompactString 0.3]

2007-03-24 Thread Robert Dockins
On Friday 23 March 2007 18:55, Benjamin Franksen wrote:
 [sorry for the somewhat longer rant, you may want to skip to the more
 technical questions at the end of the post]

 Twan van Laarhoven wrote:
  I would like to announce version 0.3 of my Data.CompactString library.
  Data.CompactString is a wrapper around Data.ByteString that represents a
  Unicode string. This new version supports different encodings, as can be
  seen from the data type:
 
  [...]
 
  Homepage:  http://twan.home.fmf.nl/compact-string/
  Haddock:   http://twan.home.fmf.nl/compact-string/doc/html/
  Source:darcs get http://twan.home.fmf.nl/repos/compact-string

 After taking a look at the Haddock docs, I was impressed by the amount of
 repetition in the APIs. Not ony does Data.CompactString duplicate the whole
 Data.ByteString interface (~100 functions, adding some more for encoding
 and decoding), the whole interface is again repeated another four times,
 once for each supported encoding.

I'd like to mention that as maintainer of Edison, I face similar difficulties.  
The data structure interfaces have scores of functions and there are about 20 
different concrete implementations of various sorts.  Even minor interface 
changes require a lot of tedious editing to make sure that everything stays 
in sync.

[snip]

 The problems I (for-)see are for maintenance and usability, both of which
 are of course two sides of the same coin. For the library implementer,
 maintenance will become more difficult, as ever more of such 'almost equal'
 interfaces must be maintained and kept in sync.

This is true. For the specific case of Edison, this is not too bad because all 
the implementations are currently in one package and under the control of one 
person (namely, me).  That might, however, change in the future.  Obviously, 
it is a problem _now_ for the ByteString and friends.

 One could use code 
 generation or macro expansion to alleviate this, but IMO the necessity to
 use extra-language pre-processors points to a weakness in the language; it
 be much less complicated and more satisfying to use a language feature that
 avoids the repetition instead of generating code to facilitate it.

I've considered something like this for Edison.  Actually, I've considered 
going even further and building the Edison concrete implementations in a 
theorem prover to prove correctness and then extracting the Haskell source.  
Some sort of in-langauge or extra-language support for mechanicly producing 
the source files for the full API from the optimized core API would be 
quite welcome.  Handling export lists, haddock comments, typeclass instances, 
etc, are quite tedious.

I have to admit, I'm not sure what an in-language mechanism for doing 
something like this would look like.  Template Haskell is an option, I 
suppose, but its pretty hard to work with and highly non-portable.  It also 
wouldn't produce Haddock-consumable source files.  ML-style first class 
modules might fit the bill, but I'm not sure anyone is seriously interested 
in bolting that onto Haskell.

[snip]

 Here are some raw ideas:

 One reason why I think type classes have not (yet) been used to reduce the
 amount of API repetition is that Haskell doesn't (directly) support
 abstraction over type constraints nor over the number of type parameters
 (polykinded types?). Often such 'almost equal' module APIs differ in
 exactly these aspects, i.e. one has an additional type parameter, while yet
 another one needs slightly different or additional constraints on certain
 types. Oleg K. has shown that some if these limitations can be overcome w/o
 changing or adding features to the language, however these tricks are not
 easy to learn and apply.

I mostly put these kinds of type system tricks in the same category as TH: 
hard to use and non-portable.

[snip]

 Or maybe we have come to the point where Haskell's lack of a 'real' module
 system, like e.g. in SML, actually starts to hurt? Can associated types
 come to the rescue?

I'm not familiar enough with associated types to know if they would help with 
these sorts of problems.  Can someone more knowledgable comment?

 Cheers
 Ben


Rob Dockins
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Can we do better than duplicate APIs? [was: Data.CompactString 0.3]

2007-03-23 Thread Benjamin Franksen
[sorry for the somewhat longer rant, you may want to skip to the more
technical questions at the end of the post]

Twan van Laarhoven wrote:
 I would like to announce version 0.3 of my Data.CompactString library. 
 Data.CompactString is a wrapper around Data.ByteString that represents a 
 Unicode string. This new version supports different encodings, as can be 
 seen from the data type:
 
 [...]
 
 Homepage:  http://twan.home.fmf.nl/compact-string/
 Haddock:   http://twan.home.fmf.nl/compact-string/doc/html/
 Source:darcs get http://twan.home.fmf.nl/repos/compact-string

After taking a look at the Haddock docs, I was impressed by the amount of
repetition in the APIs. Not ony does Data.CompactString duplicate the whole
Data.ByteString interface (~100 functions, adding some more for encoding
and decoding), the whole interface is again repeated another four times,
once for each supported encoding.

Now, this is /not/ meant as a criticism of the compact-string package in
particular. To the contrary, duplicating a fat interface for almost
identical functionality is apparently state-of-the-art in Haskell library
design, viz. the celebrated Data.Bytesting, whose API is similarly
repetitive (see Data.List, Data.ByteString.Lazy, etc...), as well as
Map/IntMap/SetIntSet etc. I greatly appreciate the effort that went into
these libraries, and admire the elegance of the implementation as well as
the stunning results wrt. efficiency gains etc.. However I fear that
duplicating interfaces in this way will prove to be problematic in the long
run.

The problems I (for-)see are for maintenance and usability, both of which
are of course two sides of the same coin. For the library implementer,
maintenance will become more difficult, as ever more of such 'almost equal'
interfaces must be maintained and kept in sync. One could use code
generation or macro expansion to alleviate this, but IMO the necessity to
use extra-language pre-processors points to a weakness in the language; it
be much less complicated and more satisfying to use a language feature that
avoids the repetition instead of generating code to facilitate it. On the
other side of teh coin, usability suffers as one has to lookup the (almost)
same function in more and more different (but 'almost equal') module
interfaces, depending on whether the string in question is Char vs. Byte,
strict vs. lazy, packed vs. unpacked, encoded in X or Y or Z..., especially
since there is no guarantee that the function is /really/ spelled the same
everywhere and also really does what the user expects.(*)

I am certain that most, if not all, people involved with these new libraries
are well aware of these infelicities. Of course, type classes come to mind
as a possible solution. However, something seems to prevent developers from
using them to capture e.g. a common String or ListLike interface. Whatever
this 'something' is, I think it should be discussed and addressed, before
the number of 'almost equal' APIs becomes unmanageable for users and
maintainers.

Here are some raw ideas:

One reason why I think type classes have not (yet) been used to reduce the
amount of API repetition is that Haskell doesn't (directly) support
abstraction over type constraints nor over the number of type parameters
(polykinded types?). Often such 'almost equal' module APIs differ in
exactly these aspects, i.e. one has an additional type parameter, while yet
another one needs slightly different or additional constraints on certain
types. Oleg K. has shown that some if these limitations can be overcome w/o
changing or adding features to the language, however these tricks are not
easy to learn and apply.

Another problem is the engineering question of how much to put into the
class proper: there is a tension between keeping the class as simple as
possible (few methods, many parametric functions) for maximum usability vs.
making it large (many methods, less parametric functions) for maximum
efficiency via specialized implementations. It is often hard to decide this
question up front, i.e. before enough instances are available. (This has
been stated as a cause for defering the decision for a common interface to
list-like values or strings). Since the type of a function doesn't reveal
whether it is a normal function with a class constraint or a real class
method, I imagine a language feature that (somehow) enables me to
specialize such a function for a particular instance even if it is not a
proper class member.

Or maybe we have come to the point where Haskell's lack of a 'real' module
system, like e.g. in SML, actually starts to hurt? Can associated types
come to the rescue?

Cheers
Ben
--
(*) I know that strictly speaking a class doesn't guarantee any semantic
conformance either, but at least there is a common place to document the
expected laws that all implementations should obey. With duplicated module
APIs there is no such single place.

___
Haskell-Cafe