Re: *safe* coerce: four methods compared

2003-08-05 Thread oleg

This message illustrates how safe casting with multiple universes can
be extended to new user-defined, polymorphic datatypes. We show a
_portable_ mapping of polymorphic types to integers. Different
instances of a polymorphic type map to different integers. Phantom
types can be either disregarded or accounted for -- as the user
wishes. Furthermore, if two different applications running on two
different machines agree on the same type heap, then they agree on the
type encoding. An application can use multiple typeheaps at the same
time. It is easy therefore to dedicate one particular typeheap for
portable encoding of types across several computers.

Incidentally, our encoding of types may take into account _values_ of
some of the components of a polymorphic type. For example, given a
value of a type 'Foo Int a', the encoding may use the _value_ of the
first component and the _type_ of the second component. When we do the
cast, we can check not only for the desired type but also for the
desired values. We can thus approach dependent types.

This message hopefully replies to Ralf Laemmel's comment:
] You should make very clear that there is
] not just a single safeCoerce, but the TI/init_typeseq argument has to
] be constructed and supplied by the programmer in a way that (s)he
] decides what array of types can be handled.  So if you wanted to use
] your approach to scrap boilerplate [1], say deal with many datatypes,
] this becomes quite a burden.  Think of actually building initial type
] sequences. Think of how combinators need to be parameterised to take
] type sequences.

with which I agree. Below I try to make it very implicit what depends
on TI/init_typeseq and what doesn't, and how much work is involved in
adding new datatypes and extending type heaps. I don't know if the
proposed approach is better than the others in many
circumstances. It's certainly different.

Ralf Laemmel also wrote:
] Software-engineering-wise your approach suffers from an important
] weakness: a closed world assumption.  The programmer has to maintain
] your TI and pass it on in all kinds of contexts for the array of
] types to be handled.

I thought it is a feature. I thought a programmer can import some type
heap, partially apply the needed function to it, and re-export the
latter. The example below demonstrates what is involved when a new
datatype is added. It seems not that much. 


] I didn't need undecidable not even overlapping instances.

I don't actually need overlapping-instances extensions. An earlier
message on the subject of MRefs used similar type heaps without any
need for -fallow-overlapping-instances. However I had to use numeral
types such as Succ (Succ ... Zero) rather than Int for type
indices. The current approach looks more elegant. Besides, you seem to
in favor of giving overlapping instances more acceptance, support and
legitimacy.

] Is it obvious to see that fetching stuff from the type sequences would
] be indeed efficient for long PLists?

The sequence of 'cdr' operations needed for fetching stuff is known
statically. A compiler might therefore do something intelligent.


This whole message is self-contained, and can be loaded as it is in
GHCi, given the flags
  -fglasgow-exts -fallow-undecidable-instances -fallow-overlapping-instances

We start with the boilerplate, which has changed a little (for
example, the class PLists now has a member function pllen).

 data Nil t r  = Nil
 data Cons t r = Cons t r

 class PList ntype vtype cdrtype where
 cdr::   ntype vtype cdrtype - cdrtype
 empty:: ntype vtype cdrtype - Bool
 value:: ntype vtype cdrtype - vtype
 pllen:: ntype vtype cdrtype - Int
   
 instance PList Nil vtype cdrtype where
 empty = const True
 pllen = const 0
   
 instance (PList n v r) = PList Cons v' (n v r) where
  empty = const False
  value (Cons v r) = v
  cdr   (Cons v r) = r
  pllen (Cons v r) = 1 + pllen r
 
 class TypeSeq t s where
 type_index:: t - s - Int
 fetch:: t - s - t
 alter:: t - s - s
   
 instance (PList Cons t r) = TypeSeq t (Cons t r) where
 type_index _ _ = 0
 fetch _ (Cons v _) = v
 alter newv (Cons v r)  = Cons newv r

 instance (PList Cons t' r', TypeSeq t r') = TypeSeq t (Cons t' r') where
 type_index v s = 1 + (type_index v $ cdr s)
 fetch v s = fetch v $ cdr s
 alter newv (Cons v' r') = Cons v' $ alter newv r'

The initial typesequence:

 init_typeseq = Cons (undefined::Char) $
Cons (undefined::Int) $
  Cons (undefined::Bool) $
  Cons (undefined::String) $
  (Nil::Nil () ())

and its type. See the previous message for more discussion of the
latter.

 type TI 
   = (Cons
 Char
 (Cons Int (Cons Bool (Cons String (Nil () ())

Because we will be dealing with existential types, we need to extend
the compile-time indexing into run-time:

 class (TypeSeq t u) = TypeRep t u where
 tr_index:: t - u - Int
 tr_index = 

Re: *safe* coerce: four methods compared

2003-08-02 Thread Ralf Laemmel
[EMAIL PROTECTED] wrote:
 
 This is a Related Work section of the previous message.

 ... again cunning stuff omitted ...


I buy most of this
but IMHO you should make very clear 
that there is not just a single safeCoerce, but the TI/init_typeseq
argument has to be constructed and supplied by the programmer
in a way that (s)he decides what array of types can be handled.
So if you wanted to use your approach to scrap boilerplate [1],
say deal with many datatypes, this becomes quite a burden.
Think of actually building initial type sequences. Think of
how combinators need to be parameterised to take type sequences.
(That's what I called a CWA yesterday.)

On the other hand, you mention this duality between type classes
vs. type heaps. Yes, I would say that type classes and type case
are somewhat dual. You provide a type case. What I like about your
type case vs. the approach taken in [1] is that your type case will
be very precise. That is, you don't say one can just try anything
what is Typeable but you rather restrict questions to the types 
in the supplied initial type sequence. This is certainly beneficial
for applications other than scraping boilerplate.

Ralf

[1}  Scrap your boilerplate: a practical design pattern for generic
programming 
 by Ralf Lämmel and Simon Peyton-Jones, 
 appeared in Proceedings of TLDI 2003, ACM Press 
 http://www.cs.vu.nl/boilerplate/#paper

-- 
Ralf Laemmel
VU  CWI, Amsterdam, The Netherlands
http://www.cs.vu.nl/~ralf/
http://www.cwi.nl/~ralf/
___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


*safe* coerce: four methods compared

2003-08-01 Thread oleg

This is a Related Work section of the previous message. We compare
three main methods of achieving safe casts. It seems that the method
proposed in the earlier message is quite different -- especially in
terms of extensibility. In this message, we compare the extensibility
of four techniques. Stephanie Weirich ICFP'00 paper points out another
solution, which relies on mutable IORefs. Since that technique can
only be used with IO monad, we do not consider it here.

Some of the methods below require type classes and algebraic datatype
declarations. Some require only an algebraic datatype, or only a
typeclass. In either case, we run into an extensibility problem: to
add support for a new datatype, we must either add an instance
declaration, or a new alternative to the datatype declaration. These
are non-trivial, non-modular extensions. For example, when we add a
new alternative to a datatype declaration, we must physically update
the corresponding file. We must then re-compile all dependent modules.

Surprisingly, the solution in the earlier message is free from these
drawbacks. We can extend the type heap in a modular fashion. We do not
need to alter type or data declarations. It seems our type heaps are
sort of reified lists of instances. There appear to be a duality
between typeclasses and our type heaps. Only our type heaps are
first-class.


The idea behind all type-safe casts is simple: to cast a value of a type
'a' into a target type 'b', we inject the value into some universe and
then project it to the target type 'b'.

To illustrate the differences in implementations of that idea, we will
be using James Cheney and Ralf Hinze's example: a generic comparison
function. To avoid unnecessary complications, we limit ourselves to
built-in and scalar types. Extensions to products, exponential,
recursive and polymorphic types are possible, but too messy. We also
will not consider existential types, so we avoid introducing type
classes if they are not needed in the static case.

This whole message is self-contained, and can be loaded as it is in
GHCi, given the flags
  -fglasgow-exts -fallow-undecidable-instances -fallow-overlapping-instances


Approach 1: Tcl approach

The universal type is a string. The values to cast must belong to the
class Show and the class Read. The injection and projection functions
are trivial:

 sh_inj x = show x
 sh_prj x = read x

Generic equality and the cast functions are trivial as well:

 sh_gequal x y = sh_inj x == sh_inj y

 sh_cast x = sh_prj $ sh_inj x

Here's the test:

 sh_test1 = [sh_gequal 1 2, sh_gequal True True, sh_gequal 'a' 'b']

To add support for a new datatype, we have to place that datatype in
the class Show and the class Read. That is, we have to add the
corresponding instance declarations _and_ we have to implement methods
'show' and 'read'. Functions sh_gequal and sh_cast do not have to be
modified. The Tcl approach is also the most generous with respect to
type equivalence: for example, Int and Integer are considered
equivalent, and sh_cast may cast between them.

Incidentally, when GHC can derive Binary, this approach becomes far
more appealing.


Approach 2: The universe is the tagged union

 data TU = TChar Char | TBool Bool | TInt Int

 class TURepr t where
   tu_inj:: t - TU
   tu_prj:: TU - t
 
 instance TURepr Char where
   tu_inj = TChar
   tu_prj (TChar x) = x
  
 instance TURepr Bool where
   tu_inj = TBool
   tu_prj (TBool x) = x
 
 instance TURepr Int where
   tu_inj = TInt
   tu_prj (TInt x) = x
 
 tu_gequal x y = cmp (tu_inj x) (tu_inj y)
   where
 cmp (TChar x) (TChar y) = x == y   
 cmp (TBool x) (TBool y) = x == y   
 cmp (TInt x)  (TInt y)  = x == y   

 tu_cast x = tu_prj $ tu_inj x

 tu_test1 = [tu_gequal (1::Int) (2::Int), tu_gequal True True, 
 tu_gequal 'a' 'b']


To add support for a new datatype, we have to add a new alternative to
the declaration of the datatype TU and we have to add a new instance
for the class TURepr (with the implementation of the tu_inj and
tu_proj methods). We also have to add another clause to the tu_gequal
function. Clearly this is the least extensible approach.


Approach 3: by Cheney and Ralf Hinze's
The universe is a set of inject/project pairs

 data IPP a =  IPPInt  (a-Int)  (Int-a) 
 | IPPChar (a-Char) (Char-a)
 | IPPBool (a-Bool) (Bool-a)
   
 ipp_gequal (IPPInt  prj inj) x y = prj x == prj y
 ipp_gequal (IPPChar prj inj) x y = prj x == prj y
 ipp_gequal (IPPBool prj inj) x y = prj x == prj y

 ipp_cast (IPPInt xprj xinj) x (IPPInt yprj yinj) = yinj $ xprj x
 -- more should follow...

 ipp_test1 = [ipp_gequal (IPPInt  id id) (1::Int) (2::Int),
  ipp_gequal (IPPBool id id) True True,
  ipp_gequal (IPPChar id id) 'a' 'b']

To add a new primitive datatype, we should modify the declaration of
the datatype IPP and add a new alternative. We also need to add
clauses to ipp_gequal and ipp_cast.

Incidentally, (IPPInt