Re: Library conventions

2000-06-24 Thread Chris Okasaki

 Edison uses
   update :: Seq s = Int - a - s a - s a
   adjust :: Seq s = (a - a) - Int - s a - s a
 for what my guidelines could give
   setElem:: Seq s = s a - Int - a - s a
   updateElem :: Seq s = s a - Int - (a - a) - s a
 
 Not that I don't like partial application, but Edison's order is
 usually not used elsewhere, e.g. in PosixTTY (using "with" prefix
 instead of my "set"), Bits (using separate setBit and clearBit instead
 of setBit with Bool parameter, and complementBit instead of updateBit -
 but the latter is OK), FiniteMap.
 
 Maybe we should promote Edison's order, which is consistent with
 module List, but what about above libraries?

Simon PJ sold me on this order, which supports multiple updates like

  update 1 'a' $ update 2 'b' $ update 3 'c' s

instead of

  update (update (update 3 'c' s) 2 'b) 1 'a'

Chris





Re: Library conventions

2000-06-24 Thread Chris Okasaki

 Seems that it would get simpler if association maps were expressed as
 collections of key:=value pairs (with Eq,Ord instances ignoring the
 value component). Association maps would have extra functions, but
 they could be always treated as appropriate collections of such pairs.
 
 Is this idea fundamentally broken for some reason?

I considered this for Edison but rejected in for two reasons.
First, it constrains the implementation, adding an extra level
of indirection to every access.  Second, and more seriously, it moves 
the unconstrained type variable into the class, which leads to several 
potential problems such as more ambiguity messages.  Your idea below 
would take care of at least some of these problems.

 Seems that an extended form of context could be useful. E.g.
 (forall a. Coll c a) = ...
 means that c is constrained to types for which there exists an
 appropriate instance that works for all types a.

Chris





Re: Library conventions

2000-06-23 Thread Chris Okasaki

  These suffixes are doing namespace management, avoiding name clashes
  between different things that you want to call empty.  But Haskell
  already has a perfectly good language mechanism for doing this --
  the module system!  Why is emptyX preferable to X.empty?  The latter
  convention is much more flexible.
 
 I prefer recognizing common interfaces and making appropriate classes.
 It is more convenient and more flexible than qualified imports:
 
 [...reasons deleted...]

But classes and qualified imports are not in competition!  Suppose you
have two *classes* that both need/want the same name.  For example,
you may have a class of sequences and a class of finite maps that
both want to use an empty method.  Do you call one emptyS and the
other emptyFM?  Or do you disambiguate using qualified names?

You might say, "Neither!  Make a common superclass containing empty
and anything else common to both sequences and finite maps."  Unfortunately,
this doesn't work because type constructors for sequences and finite maps
both have different kinds and different constraints on their elements.


 I don't like modules that have to be imported qualified.

Can you be more specific?  I understand that you don't want to use
qualified names in those situations where a class would be a better
solution, but what about when classes alone don't solve the problem?

Chris





Re: doubly linked lists

2000-04-28 Thread Chris Okasaki

The implementation that uses laziness to get
true backpointers seems to have caught everybody's
imagination.  Several people have hinted at
the big weakness of this implementation, but lest
any beginners reading this thread be misled, let me 
just state that weakness explicitly -- it takes O(n) 
time to make even the simplest change to such a list.*

What it boils down to is that this implementation
is only useful when the list is mostly static
(that is, not updated very often).  And, in
many situations where the list *is* mostly static,
an array might be a better choice.

Chris


* Laziness can sometimes save you from paying this
  entire O(n) cost up front, but if you are going
  to end up eventually looking at the entire list,
  you'll eventually pay the entire cost.  Furthermore,
  this O(n) cost cannot be amortized across multiple
  updates -- every update pays an additional O(n).




Re: doubly linked list

2000-04-27 Thread Chris Okasaki

 I wonder if it is possible to simulate a doubly linked list in
 Haskell.

Depends on what you mean.  

  - Using mutable state in a monad you can implement a doubly 
linked list directly.
  - If you store all the nodes of the doubly linked list in
an array and simulate the pointers with indices into the
array, then you can easily implement this in Haskell using
some kind of extensible persistent array (probably some flavor 
of binary tree).  [Here you get a logarithmic slowdown
compared to ordinary doubly linked lists.]
  - If you want to be able to add/remove things from the front/back
plus be able to splice two lists together, see my implementation
of catenable deques (ICFP'97 or in my book).
  - If you also want to be able to have a "cursor" into the middle
of the list where you can make changes, you can implement this
as a pair of catenable deques, where the first deque represents
the part before the cursor and the second deque represents the
part after the cursor.
  - If you want to allow an arbitrary number of cursors, then
the simulation using an extensible persistent array is probably
your best bet.

Chris




coercing newtypes

2000-04-17 Thread Chris Okasaki

Many of you have run across the problem with
newtypes that, although it is very cheap to
coerce between the newtype and the base type, it
can be very expensive to coerce between, say,
a list of the newtype and a list of the base type.
Stephanie Weirich and I are working on a proposal
for the Haskell Workshop addressing this problem,
and we would welcome any feedback from the community.

The current draft can be found at
  http://www.cs.columbia.edu/~cdo/coerce.ps.gz

--Chris




Re: RFC: Overloaded arrays

2000-03-29 Thread Chris Okasaki

Simon Marlow wrote:
 class HasBounds a = IArray a e where
 (!) :: Ix ix = a ix e - ix - e
 array   :: Ix ix = (ix,ix) - [(ix,e)] - a ix e
 
 class (Monad m, HasBounds a) = MArray a e m where
 read:: Ix ix = a ix e - ix - m e
 write   :: Ix ix = a ix e - ix - e - m ()
 marray  :: Ix ix = (ix,ix) - m (a ix e)

My main comment is please don't ignore a simple update operation
on immutable arrays, with a type something like
  update :: Ix ix = a ix e - ix - e - a ix e
I don't care about the name but I do care about the functionality.
I'm perfectly happy with the naive, dirt simple, O(n) implementation 
that copies the whole array and makes the update in the copy.  Yes,
there is the // operation, but 95% of the time I just want to
update a single element.

Most functional languages leave this operation out, I presume because
the feeling is that it is so expensive that no one would ever
want to call it.  But I've found myself wanting it lots of times,
usually with very short arrays (say, length 4 or 8).

Chris




Re: RFC: Overloaded arrays

2000-03-29 Thread Chris Okasaki

Simon Marlow wrote:
 Actually, I'm slightly concerned about your use of small arrays: the static
 (one-off) cost of allocating an array is quite high compared to eg. tuples
 or records.  Are arrays the only solution here?

You're right of course that arrays are quite expensive, but 
it is not clear to me whether this is an inherent property of
arrays or an artifact of the current implementation.

At least part of it is inherent, because of the extremely
general nature of Haskell's arrays (use of Ix, arbitrary
bounds).  I've never understood the advantages of these
arrays over a more primitive mechanism (indexed by integers
starting at 0), with the fancier arrays built on top of
the primitive arrays in a library.  But this is not a battle
I'm prepared to fight right now!

As to whether arrays are the only solution, well, no.
Tuples are out because the size is not necessarily known
in advance.  Or even if the size is known, you may
expect it to change several times during development.
Lists are a posibility, but, when I say "short,
that might be as high as maybe 256.  Some tree-like
implementation of arrays, such as Braun trees would 
not be unreasonable.  But arrays seem like the most
natural choice.  It would be a shame it steer programmers
away from arrays just because they are disproportionately
expensive.

Chris




newtypes

2000-03-16 Thread Chris Okasaki

The Haskell report says that in

  newtype T = C t

T uses the same representation as t, and so coercions
between the two can be implemented without execution
time overhead.  Furthermore, the report says that
"unlike type synonyms, newtype may be used to define 
recursive types."

How are these two statements reconciled for recursive
types such as

  newtype Foo = F Foo

or

  newtype A = MkA B
  newtype B = MkB A  ?

Admittedly, you could never construct a value of these
types, but, even so, what do these types mean?

Chris



Re: drop take [was: fixing typos in Haskell-98]

2000-01-25 Thread Chris Okasaki

 I'm with Jon Fairbairn on this.  Negative arguments are an error
 because the domain of take and drop is the naturals.  The problem
 is that we use Int to represent naturals.  -- P

Yep, this is exactly the same argument we had about this
a year or two ago, Phil.  My attitude about the "implicit Nat"
argument is the same as it was then -- if you want the functions
to be over Nat's then *say* Nat in the type, not Int.  This
could be done in at least two relatively lightweight ways.

  -- provide a Nat type that maps to unsigned integers
  -- provide a simple type synonym
 type Nat = Int
 along with a prominent comment that any function that
 *says* Nat in its type should raise an error on a negative argument

I would have no arguments with either approach, or with any other
approach that makes Nat explicit in the type.

But if the type *says* Int, then it should have reasonable behavior
for ints.  I look at the negative case as being equivalent to
standard mathematical treatment of ranges such as i..j, where the
range is considered to be empty if j  i.  Allowing take/drop to
handle negative arguments should be useful to exactly the same
extent as that mathematical convention.

Chris






Re: Bug in Edison Makefile

1999-12-21 Thread Chris Okasaki

Simon Marlow wrote:
  When you call `make all' from the edison subdir, the compiler and the
  flags change magically and dependencies seem to be broken:
 
 You shouldn't try building in the edison subdir; the Makefile in
 fptools/hslibs/data is designed to reach into edison and build the required
 bits.  That way we can keep a full edison distribution in the tree but still
 have an fptools-style Makefile.

I didn't know that!  In that case, the fix is to adjust the Makefile in 
fptools/hslibs/data to add the following compilation flags
  -fallow-undecidable-instances
and
  -funbox-strict-fields
The latter is not absolutely necessary just to get it to compile, 
but Edison is intended to be used with that flag on.

I'll let you know if and when other compilation flags change...

Chris



bug or feature?

1999-12-16 Thread Chris Okasaki

(I'm using GHC 4.04 patchlevel 1...)

Suppose I have a type involving higher kinds such as

  data H f a = H (f a)

and now suppose I want to define equality on this type.
I *cannot* say

  instance Eq (f a) = Eq (H f a) where
H x == H y = x == y

because I get an error message

  Illegal constaint `Eq (f a)' in instance context
(Instance contexts must constrain only type variables)

(BTW, typo in "constaint"...)

However, I *can* say

  data H f a = H (f a)  deriving (Eq)

which presumably generates code internally that is *identical*
to the instance declaration I tried to write.

Is this inconsistency between what can be written manually
and what can be derived deliberate?

-Chris



Second CFP: Special Issue of JFP on Algorithmic Aspects of Functional Programming Languages

1999-12-15 Thread Chris Okasaki

The Journal of Functional Programming will host 
a special issue devoted to the design, analysis,
evaluation, and/or synthesis of algorithms and
data structures in functional programming languages.
For full details, see
  http://www.dcs.gla.ac.uk/jfp/editorialAugust99.html
The deadline for submissions is February 16, 2000.

Chris Okasaki
[EMAIL PROTECTED]



CFP: Special Issue of JFP on Algorithmic Aspects of Functional Programming Languages

1999-10-07 Thread Chris Okasaki

The Journal of Functional Programming will host 
a special issue devoted to the design, analysis,
evaluation, and/or synthesis of algorithms and
data structures in functional programming languages.
For full details, see
  http://www.dcs.gla.ac.uk/jfp/editorialAugust99.html
The deadline for submissions is February 16, 2000.

Chris Okasaki
[EMAIL PROTECTED]






WAAAPL proceedings available

1999-09-16 Thread Chris Okasaki

The proceedings of WAAAPL (Workshop on Algorithmic
Aspects of Advaced Programming Languages) is now
available electronically.  WAAAPL will be held
in Paris on September 30--between ICFP and
the Haskell Workshop.

The entire proceedings is at
  http://www.cs.columbia.edu/~cdo/waaapl99.pdf

The program and links to invidual papers are at
  http://www.cs.columbia.edu/~cdo/waaapl-prog.html

The proceedings will also be available shortly as
a Columbia University technical report.

Enjoy,
Chris Okasaki





Re: Kind Question

1999-05-26 Thread Chris Okasaki

Lennart Augustsson wrote:

  But what can such a type be used for?

 This particular example is not very useful, but there are examples where
 higher kinds are used.  Chris Okasaki have some for representing square
 matrices.

Here's a simpler example.  Consider the type of non-empty, multiway trees
(often called "rose trees").

  data Tree a = Node a [Tree a]

Now, let's generalize this to replace the list type constructor with a type
constructor for some arbitrary kind of collection.

  data GTree coll a = Node a (coll (GTree coll a))

This type is useful for extending certain properties of the underlying
collection type.  For example, if H is a type constructor for priority queues

that supports insert in O(1) time, then Maybe (GTree H a) gives priority
queues that support both insert and merge in O(1) time.  Similarly,
if Q is a type constructor for FIFO queues that support "snoc" in
O(1) time, then Maybe (GTree Q a) gives FIFO queues that support
cons, snoc, and append in O(1) time.

These are both described in Section 10.2 of my book...

Chris







Edison

1999-05-21 Thread Chris Okasaki

I am pleased to make the first public release of Edison, a library of
data structures for Haskell.  See
http://www.cs.columbia.edu/~cdo/edison/
for details.

Many thanks to Ralf Hinze for solving the makefile problems that
had been plaguing me for many months!

Chris Okasaki







Re: STL Like Library For Haskell

1999-04-28 Thread Chris Okasaki

Simon Peyton-Jones wrote:

 Chris Okasaki is working on just such a thing.
 He'll be ready soon...

Lest this be taken too literally however, let me clarify.
I am working on the "Containers" part of the "Containers
and Algorithms" that Kevin asked about.  I am *not*
doing anything like the "Algorithms" part of STL.  That
depends heavily on iterators, which I am not supplying
in an STL-like form.  Rather, I am supplying higher
level iterators such as fold/map/etc.

Chris






First CFP: WAAAPL'99 (Workshop on Algorithmic Aspects of Advanced Programming Languages)

1998-11-10 Thread Chris Okasaki


ANNOUNCEMENT AND CALL FOR PAPERS

WAAAPL'99
  The First Workshop on Algorithmic Aspects of Advanced Programming Languages
   http://www.cs.columbia.edu/~cdo/waaapl.html

  Paris, France
  September 29-30, 1999

The First Workshop on Algorithmic Aspects of Advanced Programming
Languages will take place in Paris, France as part of PLI'99.
Co-located conferences include ICFP, PPDP (previously known as
PLILP/ALP), and HOOTS.

WAAAPL (pronounced "wapple") seeks papers on all aspects of the
design, analysis, evaluation, or synthesis of algorithms or data
structures in the context of advanced programming languages, such as
functional or logic languages, where traditional algorithms or data
structures may be awkward or impossible to apply. Possible topics
include (but are not limited to)

  - new algorithms or data structures
  - empirical studies of existing algorithms or data structures
  - new techniques or frameworks for the design, analysis, evaluation,
or synthesis of algorithms or data structures
  - applications or case studies
  - pedagogical issues (language aspects of teaching algorithms or
algorithmic aspects of teaching languages)

Important dates:

   Submission deadline: June 16, 1999
   Notification of acceptance or rejection: July 28, 1999
   Final papers due: August 25, 1999
   Workshop: the afternoon of September 29 and morning of September 30,
1999

Submission details:

Prospective authors should submit papers of up to 12 pages to Chris
Okasaki ([EMAIL PROTECTED]) on or before June 16, 1999.  Papers
should be formatted in Postcript for USLetter paper.  Accepted papers
will be published in an electronic proceedings, with a hardcopy
distributed at the workshop.

Program committee:

   Gerth Stoslash;lting Brodal (BRICS, University of Aarhus, Denmark)
   Adam Buchsbaum (ATT Labs, USA)
   Iliano Cervesato (Stanford University, USA)
   Ralf Hinze (Rheinische Friedrich-Wilhelms-Universitauml;t Bonn,
Germany)
   John O'Donnell (University of Glasgow, Scotland)
   Chris Okasaki (Columbia University, USA) (chair)
   Ricardo Pentilde;a (Universidad Complutense de Madrid, Spain)

Send questions and comments to [EMAIL PROTECTED]






Re: heap sort or the wonder of abstraction

1997-10-08 Thread Chris Okasaki

--167E2781446B
Content-Type: text/plain; charset="us-ascii"

Ralf Hinze wrote:
 Practitioners are probably surprised to learn that `pairingSort' is the
 algorithm of choice for sorting. Any objections to this recommendation?
 I was surprised to see that it performs so well: sorting 50.000 Int's
 in roughly three seconds and 100.000 Int's in roughly nine seconds is
 quite acceptable.

I ran some similar experiments in Standard ML a few years ago.  In those
experiments pairingSort also performed extremely well.  The only 
algorithm that performed better, and even then only by a small amount,
was splaySort, based on splay trees[1].  However, my experiment
only considered algorithms that were good choices as heaps -- I
did not consider any of the mergesort variations.  Ralf, could I
ask you to run my code below through your experiments (I don't have
easy access to anything but hugs at the moment)?

According to Ralf's criteria, splaySort is
  A. asymptotically optimal
  B. stable
  C. smooth  (In fact, it has been conjectured that splaySort is
  optimal with respect to any reasonable notion of
  "presortedness".[2])
However, I believe--although I'm positive--that splaySort is
  D. not lazy
Ralf considered the situation where the creation phase takes O(n) time
and the selection phase takes O(n log n) time, but for splaySort these
are reversed.

Chris

--

[1] Sleator and Tarjan
"Self-adjusting binary search trees"
Journal of the ACM 32(3):652-686 (July '85)

[2] Moffat, Eddy, and Petersson
"Splaysort: Fast, Versatile, Practical"
Software PE 26(7):781-797 (July '96)

-

--167E2781446B
Content-Disposition: inline; filename="Splay.lhs"
Content-Type: text/plain; charset="us-ascii"; name="Splay.lhs"

 data Splay a = SEmpty | SNode (Splay a) a (Splay a)

 instance PriorityQueue Splay where
  empty = SEmpty
  single x = SNode SEmpty x SEmpty

  fromList xs = foldr insert empty xs
  
  toOrderedList t = tol t []
where tol SEmpty rest = rest
  tol (SNode a x b) rest = tol a (x : tol b rest)

  insert k t = SNode a k b
where 
  (a, b) = partition t  -- elements of a = k, elements of b  k

  partition SEmpty = (SEmpty,SEmpty)
  partition t@(SNode tl x tr)
| x  k =
case tr of
  SEmpty - (t,SEmpty)
  SNode trl y trr
| y  k -
let tl' = SNode tl x trl
(lt,ge) = partition trr
in (SNode tl' y lt,ge)
| otherwise -
let (lt,ge) = partition trl
in (SNode tl x lt,SNode ge y trr)
| otherwise =
case tl of
  SEmpty - (SEmpty,t)
  SNode tll y tlr
| y  k -
let (lt,ge) = partition tlr
in (SNode tll y lt,SNode ge x tr)
| otherwise -
let tr' = SNode tlr x tr
(lt,ge) = partition tll
in (lt,SNode ge y tr')

 splaySort :: (Ord a) = [a] - [a]
 splaySort =  toOrderedList
   .  (fromList :: (Ord a) = [a] - Splay a)

--167E2781446B--





Re: Heap Sort

1997-10-04 Thread Chris Okasaki

[EMAIL PROTECTED] wrote:
 Here is my version:
   [...]

 On 21 Sep , Chris Dornan wrote:
  When would a heap sort be preferable to a merge sort?
 
 a) When you want to explain the imperative heapsort

But the heapsort you give is nothing like the standard imperative
heapsort!  Yes, it uses a heap, but not the binary heap used by
standard heapsort.  Instead, it uses the functional equivalent 
of multi-pass pairing heaps[1].  Larry Paulson's "ML for the
Working Programmer" includes a functional heapsort that is
much closer in spirit to the standard imperative version, and
so is probably superior for pedagogical purposes.  (Although I expect
that your version will be much faster in practice.)

Chris

[1] Fredman, Sedgewick, Sleator, and Tarjan.
"The pairing heap: A new form of self-adjusting heap"
Algorithmica 1(1):111-129, 1986.





Preliminary Design of Edison: A Library of Efficient Data Structures

1997-07-20 Thread Chris Okasaki


I am in the process of designing and implementing a library of
efficient data structures tentatively named Edison.  However,
before I get too deep in the implementation I wanted to solicit
feedback from the Haskell community on the preliminary design of the 
library.  I have sketched the general design conventions I have followed
at
  http://foxnet.cs.cmu.edu/~cokasaki/edison/
along with a few sample implementations.  I would appreciate any and
all comments.

Thanks,
Chris