Re: using the newer collection types

Sam Vilain Thu, 04 May 2006 15:51:41 -0700

Darren Duncan wrote:

>Speaking a little more technically, a Relation has 2 main components, 
>its heading and its body.  The heading is a set of 0..N keys (called 
>"attributes" in relation-land), and the body is a set of 0..N 
>Mappings (called "tuples" in relation-land), where they set of keys 
>of each Mapping is identical to the Relation's heading.  Its very 
>likely that a language-embedded Relation implementation would 
>actually not repeat the keys for each member Mapping, but we can 
>conceptualize as if they were present for simplicity.
>  
>


I don't think this terminology or these restrictions are particularly
useful.

I do think that a Pair should be a sub-type of a more general "Tuple"
type, with the 'where' clause being { .items == 2 } or something like that.

I think that the most flexible arrangement is to define;

- a Collection as a Bag of Tuples
- a Relation as a Collection where the tuples have a shape and no
duplicate tuples are allowed (but Relation does not need to be a core type)

Then, Mappings, Sequences, etc, become sub-types of one of the above two
types. For instance, a sequence is a Collection of (Int, Any) where the
first Int is unique across the collection. Similarly a Mapping is a
Collection of (Any, Any) where Unique(0).

something like

role Tuple { has @.items };
role Collection { has Tuple @.tuples };
subset Pair of Tuple where { .items.items == 2 };
subset Bag of Collection where { ! .tuples.grep:{.items > 1 } }
subset Set of Bag where {
all( .tuples.map:{ .items } ) == one( .tuples.map:{ .items } )
}
subset Mapping of Collection where { ! .tuples.grep:{ .items != 2 } }
subset Array of Mapping where { .tuples.grep:{ .items[0].isa(Int) } }
subset Hash of Mapping where { .tuples.grep:{ .items[0].does(Str) } }

The above should probably all be written in terms of parametric roles
(see S12), but for now the above run-time checking versions should
hopefully express the relationships between the core collection-like
types as I see them.

That sounds like it might bite, but you wouldn't normally access an
Array as a Collection of (Int, Any), you'd access it as an Array, so you
get the nice .post_circumfix:<[ ]> method that makes array access easy.
You don't care that it has this higher order type as a parent class, and
you certainly wouldn't care for the 'bare' Collection interface (as for
one, you don't want to have to deal with the Integer keys). And it is
probably all backed by native methods.

I'm prototyping much of this using Moose in Perl 5, however Hubris is
delaying its release :-)

>Moreover, the Relation type has these 
>operators that the Set type doesn't have: rename(), project(), 
>restrict(), extend(), join(), divide(), summarize(), group(), 
>ungroup(), wrap(), unwrap(), matching(), etc.
>

Is there a reference for the meaning of these methods?

>1.  Are Sets or Junctions allowed to contain undefined elements?  Can 
>undef be a key of a Mapping or Hash?
>  
>

undef.isa(Object), so you should be able to use it as you would any
other object. I would definitely not think of it as the absence of a
value in this context.

>2.  What actually is the practical distinction between a Set and a 
>Junction?  Why would someone use one over the other?  I recognize 
>that the use of Junctions is supposed to make parallelism easier, as 
>iterating through one is known to be order independent.  But, 
>conceptually a Set and a Relation are exactly the same;  you could 
>process their members in any order and/or in parallel as well.  So is 
>the use of a Junction effectively like a compiler flag to make 
>certain kinds of Set ops faster at the expense of others?
>  
>

Well one side effect at the moment is that Junctions are immutable,
whilst Sets are mutable. This is perhaps a deficiency in my original
Set.pm design; all of the mutating functions should be in a seperate
role, really (or just not be mutators).

>6.  Can I declare with named Set (or Junction) and Mapping typed 
>variables and/or parameters that their members are restricted to 
>particular types, such as Str, as I can with Arrays and Hashes, so 
>that Perl itself will catch violations?  Eg, can I say as a parameter 
>"Set of Str :$heading?" or "Set of Mapping(Str) of Any :$body?" so 
>Perl will check that arguments are suchwise correct?
>  
>

These are variously called "Generics" (ada I think, Java 1.5+),
"Parametric Types", "Higher Order Types" (Pierce et al), "Generic
Algebraic Data Types" (Haskell)

In Perl 6 they are parametric roles (as in S12 mentioned above)

>7.  Can we add some operators to Mapping that are like the Relation 
>ones, so that implementing a Relation over Mappings is easier (or, 
>see the end of #8)?  Eg, these would be useful: rename(), project(), 
>extend(), join().  In particular, implementing join() into Mapping 
>would help save CPU cycles:
>  
>

Again, a reference to a prototype of the behaviour would be useful.

Sam.

Re: using the newer collection types

Reply via email to