Re: using the newer collection types

2006-05-07 Thread Sam Vilain
Darren Duncan wrote:

Also, I don't agree with the notion of a header of each relation. It
has a type for each tuple item, sure, but header just sounds like the
sort of thing you want in a ResultSet, not a Relation.
Sam.


A relation's heading is essentially the definition of the relation's 
structure, and is not redundant if the relation has no tuples in it, 
which is valid.
  


The Relation has a type, which is a Relation of Tuples of something. The
Header you refer to is the higher order part of the Tuple, the
something.

Sam.


Re: using the newer collection types - Interval

2006-05-06 Thread mAsterdam

Prompted by Darren Duncan's proposal on Relation type objects
I looked at http://dev.perl.org/perl6/doc/design/syn/S06.html
and wondered how Interval type objects would fit in.

I couldn't imagine how. Now that isn't a surprise
(not for lack of imagination but for lack of perl6 knowledge).
I tried to find anything relevant but didn't succeed.

Could somebody provide some clues, please?


Re: using the newer collection types - Interval

2006-05-06 Thread Darren Duncan

At 3:06 PM +0200 5/6/06, mAsterdam wrote:

Prompted by Darren Duncan's proposal on Relation type objects
I looked at http://dev.perl.org/perl6/doc/design/syn/S06.html
and wondered how Interval type objects would fit in.

I couldn't imagine how. Now that isn't a surprise
(not for lack of imagination but for lack of perl6 knowledge).
I tried to find anything relevant but didn't succeed.

Could somebody provide some clues, please?


Can we first clarify that this describes what you are referring to:

http://en.wikipedia.org/wiki/Interval_(mathematics)

From what I've read there, I don't see an existing equivalent Perl 6 type.

Some people may confuse it with a Range, but I don't think so since a 
Range progresses in discrete increments, while an Interval would be 
continuous.


Moreover, the existing Set wouldn't overlap it because Set deals with 
a finite collection of discrete members.  (Similarly, a Relation is 
explicitly discrete and finite by definition.)


I assume you aren't referring to the temporal specific thing called 
an interval.


-- Darren Duncan


Re: using the newer collection types - Interval

2006-05-06 Thread Larry Wall
On Sat, May 06, 2006 at 01:41:41PM -0700, Darren Duncan wrote:
: Some people may confuse it with a Range, but I don't think so since a 
: Range progresses in discrete increments, while an Interval would be 
: continuous.

No, Range objects in Perl 6 are defined to be intervals unless used
in a context that implies discrete increments, such as counting in
list context.  But if you say

$x ~~ 1.2 ..^ 3.4

it is exactly equivalent to

1.2 = $x  3.4

The main point of context is to avoid an explosion of types.

Larry


Re: using the newer collection types - Interval

2006-05-06 Thread James Mastros
On Sat, May 06, 2006 at 01:41:41PM -0700, Darren Duncan wrote:
 Some people may confuse it with a Range, but I don't think so since a 
 Range progresses in discrete increments, while an Interval would be 
 continuous.
A range listifies to a (potentially) finite list of discrete elements, but
it compares as a range.  1.1 should ~~ 1..2; pugs thinking that's false is a
bug, not a feature.

Of course, that doesn't mean implementing range in a subset of perl6 without
it isn't interesting, and possibly useful for bootstrapping.

   -=- James Mastros


Re: using the newer collection types - Interval

2006-05-06 Thread Darren Duncan

At 2:03 PM -0700 5/6/06, Larry Wall wrote (in reply):

No, Range objects in Perl 6 are defined to be intervals unless used
in a context that implies discrete increments, such as counting in
list context.  But if you say

$x ~~ 1.2 ..^ 3.4

it is exactly equivalent to

1.2 = $x  3.4

The main point of context is to avoid an explosion of types.


At 10:19 PM +0100 5/6/06, James Mastros wrote (also in reply):

A range listifies to a (potentially) finite list of discrete elements, but
it compares as a range.  1.1 should ~~ 1..2; pugs thinking that's false is a
bug, not a feature.


Okay, thank you both for clarifying this.

Conceptually in my mind, a Range is entirely appropriate to represent 
a mathematical interval, but I was mistaken about Range being more 
constrained than it actually is.


So, there you go mAsterdam; Range is indeed the interval you are looking for.

-- Darren Duncan


Re: using the newer collection types - Interval

2006-05-06 Thread mAsterdam

Darren Duncan wrote:

mAsterdam wrote:


Prompted by Darren Duncan's proposal on Relation type objects
I looked at http://dev.perl.org/perl6/doc/design/syn/S06.html
and wondered how Interval type objects would fit in.

I couldn't imagine how. Now that isn't a surprise
(not for lack of imagination but for lack of perl6 knowledge).
I tried to find anything relevant but didn't succeed.

Could somebody provide some clues, please?


Can we first clarify that this describes what you are referring to:

http://en.wikipedia.org/wiki/Interval_(mathematics)


Mathematical concepts never seem to completely
survive implementation in computing, but basically yes,
that was what I was thinking about.


 From what I've read there, I don't see an existing equivalent Perl 6 type.

Some people may confuse it with a Range, but I don't think so since a 
Range progresses in discrete increments, while an Interval would be 
continuous.


Moreover, the existing Set wouldn't overlap it because Set deals with a 
finite collection of discrete members.  (Similarly, a Relation is 
explicitly discrete and finite by definition.)


I assume you aren't referring to the temporal specific thing called an 
interval.


Date, Darwen and Lorentzos (Temporal data and the relational
model, 2003) build their temporal reasoning on top of
point types and interval types - they require them
allright - but I don't see how interval types (and point
types) require temporal stuff, i.o.w. I don't really
see how they are temporal specific (unless you are
thinking of SQL intervals).


Re: using the newer collection types - Interval

2006-05-06 Thread mAsterdam

Darren Duncan wrote:

At 2:03 PM -0700 5/6/06, Larry Wall wrote (in reply):


No, Range objects in Perl 6 are defined to be intervals unless used
in a context that implies discrete increments, such as counting in
list context.  But if you say

$x ~~ 1.2 ..^ 3.4

it is exactly equivalent to

1.2 = $x  3.4

The main point of context is to avoid an explosion of types.



At 10:19 PM +0100 5/6/06, James Mastros wrote (also in reply):

A range listifies to a (potentially) finite list of discrete elements, 
but
it compares as a range.  1.1 should ~~ 1..2; pugs thinking that's 
false is a

bug, not a feature.



Okay, thank you both for clarifying this.

Conceptually in my mind, a Range is entirely appropriate to represent a 
mathematical interval, but I was mistaken about Range being more 
constrained than it actually is.


So, there you go mAsterdam; Range is indeed the interval you are looking 
for.


I hope it is also the appropriate type for implementing
relations with temporal attributes.

Thank you all for your prompt discussion.


Re: using the newer collection types - Interval

2006-05-06 Thread Darren Duncan

At 12:45 AM +0200 5/7/06, mAsterdam wrote:

Okay, thank you both for clarifying this.

Conceptually in my mind, a Range is entirely appropriate to 
represent a mathematical interval, but I was mistaken about Range 
being more constrained than it actually is.


So, there you go mAsterdam; Range is indeed the interval you are looking for.


I hope it is also the appropriate type for implementing
relations with temporal attributes.

Thank you all for your prompt discussion.


It should work just fine.

Keep in mind that your concern about relations is orthogonal to the 
concern about intervals or temporal data.


An attribute of a relation can be any arbitrary data type at all 
(including another relation).


So the only real concern here is whether there is a data type that 
can represent a single piece of temporal data.  But one can easily be 
defined using Perl's standard class definition abilities if it isn't 
pre-defined.


Note that I am of the opinion that Perl probably should not have 
built-in data types that are specific to temporal or spacial data; it 
is better for these to be extensions (like DateTime etc) defined 
over built-ins like numbers or ranges or collections.  Temporal or 
spacial data in common use today is just too complicated and 
non-generic, I think.


(Whereas, the existing built-ins, and relations, are very generic and simple.)

-- Darren Duncan


Re: using the newer collection types - Interval

2006-05-06 Thread mAsterdam

Darren Duncan wrote:

At 12:45 AM +0200 5/7/06, mAsterdam wrote:


Okay, thank you both for clarifying this.

Conceptually in my mind, a Range is entirely appropriate to represent 
a mathematical interval, but I was mistaken about Range being more 
constrained than it actually is.


So, there you go mAsterdam; Range is indeed the interval you are 
looking for.


I hope it is also the appropriate type for implementing
relations with temporal attributes.

Thank you all for your prompt discussion.


It should work just fine.

Keep in mind that your concern about relations is orthogonal to the 
concern about intervals or temporal data.


I hope (and think) you are right about that regarding
implementing relations. Using them correctly is another
story though. I don't think Date, Darwen  Lorentzos
lightly took the step of introducing 6NF in 2003.

An attribute of a relation can be any arbitrary data type at all 
(including another relation).


Aside, about RVA (relation valued attibutes): I read at 
comp.database.theory that Hugh Darwen has introduced

gu(group/ungroup)NF a month ago.

So the only real concern here is whether there is a data type that can 
represent a single piece of temporal data.  But one can easily be 
defined using Perl's standard class definition abilities if it isn't 
pre-defined.


I really don't yet see how to define point types and interval
(range) types based on those. I think you (we) /will/ need them
(*not* neccesarily Perl 6 built-in) if ...

Note that I am of the opinion that Perl probably should not have 
built-in data types that are specific to temporal or spacial data; it is 
better for these to be extensions (like DateTime etc) defined over 
built-ins like numbers or ranges or collections.  Temporal or spacial 
data in common use today is just too complicated and non-generic, I think.


(Whereas, the existing built-ins, and relations, are very generic and 
simple.)


... you, like I, want temporal and spacial data as simple
and generic as possible.


Re: using the newer collection types - Interval

2006-05-06 Thread Darren Duncan

At 2:17 AM +0200 5/7/06, mAsterdam wrote:

I hope (and think) you are right about that regarding
implementing relations. Using them correctly is another
story though. I don't think Date, Darwen  Lorentzos
lightly took the step of introducing 6NF in 2003.

Aside, about RVA (relation valued attibutes): I read at 
comp.database.theory that Hugh Darwen has introduced

gu(group/ungroup)NF a month ago.


mAsterdam, I think we should end this sub-thread as I see these 
points you are now bringing up are well beyond the scope of what the 
Perl 6 language designers need to know or care about so I don't see a 
need to continue it on list.


Even for me personally, I don't think this is anything to worry about.

But perhaps to explain why I think this ...

Ignoring 1NF, which relations are always in by definition (they 
contain no duplicate tuples/rows), but things like SQL tables or 
non-relation collections could possibly not be, the 2NF+ have nothing 
to do with the actual definitions of relations themselves or the 
ability to perform relational algebra, which is all that the Perl 
language and/or extension classes to it need to know about.


The 2nd and higher normal forms are just formal labels applied to 
certain best practices that one can follow when designing a 
relational database.  They are efforts to further reduce redundancy 
in the collection of relations making up a relational database.  Best 
left to the users to make decisions about rather than the language 
designers.


So the only real concern here is whether there is a data type that 
can represent a single piece of temporal data.  But one can easily 
be defined using Perl's standard class definition abilities if it 
isn't pre-defined.


I really don't yet see how to define point types and interval
(range) types based on those. I think you (we) /will/ need them
(*not* neccesarily Perl 6 built-in) if ...

... you, like I, want temporal and spacial data as simple
and generic as possible.


You can do it simply, kind of like this:

class Point { has Real $x; has Real $y; };

subset Interval of Range where { all( .items ).does(Real) };

-- Darren Duncan


Re: using the newer collection types - Interval

2006-05-06 Thread Darren Duncan

At 6:06 PM -0700 5/6/06, Darren Duncan wrote:

You can do it simply, kind of like this:

class Point { has Real $x; has Real $y; };

subset Interval of Range where { all( .items ).does(Real) };


Er, you should read 'Real' as 'Num' (I originally meant Rational, 
which no longer exists in the newest S06); I meant to say:


class Point { has Num $x; has Num $y; };

subset Interval of Range where } all( .items ).does(Num) };

-- Darren Duncan


Re: using the newer collection types - Interval

2006-05-06 Thread Larry Wall
On Sat, May 06, 2006 at 06:15:34PM -0700, Darren Duncan wrote:
: Er, you should read 'Real' as 'Num' (I originally meant Rational, 
: which no longer exists in the newest S06);

Rational still exists in S02--we just don't automatically promote
anything to it currently.  (A pragma could change that default in
a particular lexical scope, of course.)  The main problem with
Rationals is that they tend to waste a lot of bits maintaining
precision that is of little or no use in the real world, and they
can't easily be sized in advance of the calculation if you're planning
to store them in a particular spot.  Floaters win on both counts.

Larry


Re: using the newer collection types - Interval

2006-05-06 Thread mAsterdam

Darren Duncan wrote:

At 2:17 AM +0200 5/7/06, mAsterdam wrote:


I hope (and think) you are right about that regarding
implementing relations. Using them correctly is another
story though. I don't think Date, Darwen  Lorentzos
lightly took the step of introducing 6NF in 2003.

Aside, about RVA (relation valued attibutes): I read at 
comp.database.theory that Hugh Darwen has introduced

gu(group/ungroup)NF a month ago.



mAsterdam, I think we should end this sub-thread as I see these points 
you are now bringing up are well beyond the scope of what the Perl 6 
language designers need to know or care about so I don't see a need to 
continue it on list.


Even for me personally, I don't think this is anything to worry about.


Ok.


But perhaps to explain why I think this ...

Ignoring 1NF, which relations are always in by definition (they contain 
no duplicate tuples/rows), but things like SQL tables or non-relation 
collections could possibly not be, the 2NF+ have nothing to do with the 
actual definitions of relations themselves or the ability to perform 
relational algebra, which is all that the Perl language and/or extension 
classes to it need to know about.


The 2nd and higher normal forms are just formal labels applied to 
certain best practices that one can follow when designing a relational 
database.  They are efforts to further reduce redundancy in the 


No. Common misconception - but indeed not relevant to p.p6.l

collection of relations making up a relational database.  Best left to 
the users to make decisions about rather than the language designers.


So the only real concern here is whether there is a data type that 
can represent a single piece of temporal data.  But one can easily be 
defined using Perl's standard class definition abilities if it isn't 
pre-defined.



I really don't yet see how to define point types and interval
(range) types based on those. I think you (we) /will/ need them
(*not* neccesarily Perl 6 built-in) if ...

... you, like I, want temporal and spacial data as simple
and generic as possible.



You can do it simply, kind of like this:

class Point { has Real $x; has Real $y; };

subset Interval of Range where { all( .items ).does(Real) };


How to go about about having cyclic intervals over
(weekday) points e.g. Fri..Mon ?

If you want to discuss this off-list - the email-address
simply works :-) (huge amounts of spam though :-( )


Re: using the newer collection types

2006-05-05 Thread Sam Vilain
Darren Duncan wrote:

Is there a reference for the meaning of these methods?
  

There are many written references to these methods; just type 
relational algebra into Google.



I will add that the first hit on such a search, the Wikipedia page on 
relational algebra ( http://en.wikipedia.org/wiki/Relational_algebra 
), is a perfectly good primer on what relational algebra is and what 
its importance is.
  


Thanks for the pointer.

While this may not actually change anything, I should point out that 
every collection type can also be expressed in terms of a Relation 
definition and/or they can all be implemented over a Relation (whose 
members are actually always unique).  For example:

1. A Set of Any is a Relation with one Any attribute.
2. A Bag of N Any attributes is a Relation of N+1 attributes, where 
the extra attribute is an Int (constrained = 1) that counts 
occurrances of the distinct other attributes.
3. A Mapping can be a Relation of 2 Any attributes.
4. A Hash is a Relation of 2 attributes, Str (key) and Any (value), 
where the key has a unique constraint.
5. A Seq is a Relation of 2 attributes, typed Int (= 0) and Any, 
where the first shows their ordinal position and the second is the 
actual value; the first has a unique constraint.
6. An Array is the same, assuming it is a sparse; if it is not 
sparse, there is an additional constraint that the greatest Int value 
is the same / one less than the count of Relation members.
  


I don't know if anyone will care, but you can't emulate the raw
Collection type with this fixed Relation type. That is, a collection
of tuples, each of which may be of differing length and type.

This is what leads me to think that Collection is the more generic role.
I'm not saying Relations are not useful, perhaps they are more useful
than Collections in the practical case, but they are a sub-type.

Also, I don't agree with the notion of a header of each relation. It
has a type for each tuple item, sure, but header just sounds like the
sort of thing you want in a ResultSet, not a Relation.

Sam.


Re: using the newer collection types

2006-05-05 Thread Darren Duncan

At 8:01 PM +1200 5/5/06, Sam Vilain wrote:

Also, I don't agree with the notion of a header of each relation. It
has a type for each tuple item, sure, but header just sounds like the
sort of thing you want in a ResultSet, not a Relation.
Sam.


A relation's heading is essentially the definition of the relation's 
structure, and is not redundant if the relation has no tuples in it, 
which is valid.


A comment I back-logged on #perl6 yesterday suggested that what I was 
proposing with relations is sort of a retread of what classes are, 
where each is defined in terms of zero or more named and typed 
attributes.


I just want to weigh in that I see that as partly true (the same 
could also be said that it retreads what C structs are), and 
therefore perhaps some language constructs related to class 
definitions could be reused with relation definitions.


Perhaps Relation in Perl 6 would best be a role and/or meta-class 
with factory methods that produce other classes with common interface 
elements, such as all the relational algebra methods, and/or objects 
each of which has its own class.  Under this system, each time you 
perform an operation like a join, you are potentially making a new 
class, because its attributes are likely different than its 
predecessors.


Frankly, with Perl 6 being what it is, there are probably numerous 
ways to implement what I'm looking for, and I don't exactly know what 
would be best.


For now I'll just follow the simple path with a single Relation class 
in Pugs' ext/Relation/ directory for demonstration purposes and 
evaluate changes later.


-- Darren Duncan


Re: using the newer collection types

2006-05-04 Thread Sam Vilain
Darren Duncan wrote:

Speaking a little more technically, a Relation has 2 main components, 
its heading and its body.  The heading is a set of 0..N keys (called 
attributes in relation-land), and the body is a set of 0..N 
Mappings (called tuples in relation-land), where they set of keys 
of each Mapping is identical to the Relation's heading.  Its very 
likely that a language-embedded Relation implementation would 
actually not repeat the keys for each member Mapping, but we can 
conceptualize as if they were present for simplicity.
  


I don't think this terminology or these restrictions are particularly
useful.

I do think that a Pair should be a sub-type of a more general Tuple
type, with the 'where' clause being { .items == 2 } or something like that.

I think that the most flexible arrangement is to define;

- a Collection as a Bag of Tuples
- a Relation as a Collection where the tuples have a shape and no
duplicate tuples are allowed (but Relation does not need to be a core type)

Then, Mappings, Sequences, etc, become sub-types of one of the above two
types. For instance, a sequence is a Collection of (Int, Any) where the
first Int is unique across the collection. Similarly a Mapping is a
Collection of (Any, Any) where Unique(0).

something like

role Tuple { has @.items };
role Collection { has Tuple @.tuples };
subset Pair of Tuple where { .items.items == 2 };
subset Bag of Collection where { ! .tuples.grep:{.items  1 } }
subset Set of Bag where {
all( .tuples.map:{ .items } ) == one( .tuples.map:{ .items } )
}
subset Mapping of Collection where { ! .tuples.grep:{ .items != 2 } }
subset Array of Mapping where { .tuples.grep:{ .items[0].isa(Int) } }
subset Hash of Mapping where { .tuples.grep:{ .items[0].does(Str) } }

The above should probably all be written in terms of parametric roles
(see S12), but for now the above run-time checking versions should
hopefully express the relationships between the core collection-like
types as I see them.

That sounds like it might bite, but you wouldn't normally access an
Array as a Collection of (Int, Any), you'd access it as an Array, so you
get the nice .post_circumfix:[ ] method that makes array access easy.
You don't care that it has this higher order type as a parent class, and
you certainly wouldn't care for the 'bare' Collection interface (as for
one, you don't want to have to deal with the Integer keys). And it is
probably all backed by native methods.

I'm prototyping much of this using Moose in Perl 5, however Hubris is
delaying its release :-)

Moreover, the Relation type has these 
operators that the Set type doesn't have: rename(), project(), 
restrict(), extend(), join(), divide(), summarize(), group(), 
ungroup(), wrap(), unwrap(), matching(), etc.


Is there a reference for the meaning of these methods?

1.  Are Sets or Junctions allowed to contain undefined elements?  Can 
undef be a key of a Mapping or Hash?
  


undef.isa(Object), so you should be able to use it as you would any
other object. I would definitely not think of it as the absence of a
value in this context.

2.  What actually is the practical distinction between a Set and a 
Junction?  Why would someone use one over the other?  I recognize 
that the use of Junctions is supposed to make parallelism easier, as 
iterating through one is known to be order independent.  But, 
conceptually a Set and a Relation are exactly the same;  you could 
process their members in any order and/or in parallel as well.  So is 
the use of a Junction effectively like a compiler flag to make 
certain kinds of Set ops faster at the expense of others?
  


Well one side effect at the moment is that Junctions are immutable,
whilst Sets are mutable. This is perhaps a deficiency in my original
Set.pm design; all of the mutating functions should be in a seperate
role, really (or just not be mutators).

6.  Can I declare with named Set (or Junction) and Mapping typed 
variables and/or parameters that their members are restricted to 
particular types, such as Str, as I can with Arrays and Hashes, so 
that Perl itself will catch violations?  Eg, can I say as a parameter 
Set of Str :$heading? or Set of Mapping(Str) of Any :$body? so 
Perl will check that arguments are suchwise correct?
  


These are variously called Generics (ada I think, Java 1.5+),
Parametric Types, Higher Order Types (Pierce et al), Generic
Algebraic Data Types (Haskell)

In Perl 6 they are parametric roles (as in S12 mentioned above)

7.  Can we add some operators to Mapping that are like the Relation 
ones, so that implementing a Relation over Mappings is easier (or, 
see the end of #8)?  Eg, these would be useful: rename(), project(), 
extend(), join().  In particular, implementing join() into Mapping 
would help save CPU cycles:
  


Again, a reference to a prototype of the behaviour would be useful.

Sam.


Re: using the newer collection types

2006-05-04 Thread Darren Duncan

At 10:51 AM +1200 5/5/06, Sam Vilain wrote:

 Moreover, the Relation type has these

operators that the Set type doesn't have: rename(), project(),
restrict(), extend(), join(), divide(), summarize(), group(),

 ungroup(), wrap(), unwrap(), matching(), etc.

Is there a reference for the meaning of these methods?

 7.  Can we add some operators to Mapping that are like the Relation

ones, so that implementing a Relation over Mappings is easier (or,
see the end of #8)?  Eg, these would be useful: rename(), project(),
extend(), join().  In particular, implementing join() into Mapping
would help save CPU cycles:


Again, a reference to a prototype of the behaviour would be useful.


There are many written references to these methods; just type 
relational algebra into Google.


That said, some of those search results may not explain things in the 
same way, so I specifically prefer the definitions in Date and 
Darwen's book Databases, Types, and The Relational Model; 
equivalent definitions are probably on the 'net' but I'm not sure 
where.


I also defined part of the join() operator in my last email.  How 
that definition, for Tuples, extends to a Relation is that you pair 
every tuple in each relation being joined to every tuple in each of 
the others, and apply my earlier definition with each pairing; the 
output Relation contains a tuple where the earlier definition 
returned a tuple, and no tuple where it returned undef.


Alternately, if you want to wait a week, I will be coding up 
documented implementations as soon as possible within Pugs' 
ext/Relation/ dir.


But in order for me to do this, I was needing some answers about the 
nature of existing types like Set and Mapping and Junction etc, which 
I asked in this email.


Regarding your other comments:

I don't think this terminology or these restrictions are particularly
useful.

I do think that a Pair should be a sub-type of a more general Tuple
type, with the 'where' clause being { .items == 2 } or something like that.

I think that the most flexible arrangement is to define;

- a Collection as a Bag of Tuples
- a Relation as a Collection where the tuples have a shape and no
duplicate tuples are allowed (but Relation does not need to be a core type)

Then, Mappings, Sequences, etc, become sub-types of one of the above two
types. For instance, a sequence is a Collection of (Int, Any) where the
first Int is unique across the collection. Similarly a Mapping is a
Collection of (Any, Any) where Unique(0).


You may be on to something here, but I'll withold comments for now.

My main concerns with this whole Relation thing are that I want an 
efficient way in Perl 6 to represent what relation-land's concept of 
tuples and relations are, as well as an efficient implementations of 
relational algebra that are just as easy to use and fast in Perl 6 as 
are the language's other collection types such as Sets.


If Perl 6 is to be the choice language of the singularity, it needs 
to be easy and fast to do any common type of work with it, and 
relational algebra is an extremely common kind of work being done.


-- Darren Duncan


Re: using the newer collection types

2006-05-04 Thread Darren Duncan

Actually, I'll add a few more things to my reply, which should be helpful ...

At 5:11 PM -0700 5/4/06, Darren Duncan wrote:

At 10:51 AM +1200 5/5/06, Sam Vilain wrote:

 Moreover, the Relation type has these

operators that the Set type doesn't have: rename(), project(),
restrict(), extend(), join(), divide(), summarize(), group(),

 ungroup(), wrap(), unwrap(), matching(), etc.

Is there a reference for the meaning of these methods?


There are many written references to these methods; just type 
relational algebra into Google.


I will add that the first hit on such a search, the Wikipedia page on 
relational algebra ( http://en.wikipedia.org/wiki/Relational_algebra 
), is a perfectly good primer on what relational algebra is and what 
its importance is.


The article's introduction says:

	Relational algebra, an offshoot of first-order logic, is a 
set of relations closed under operators. Operators operate on one or 
more relations to yield a relation. Relational algebra is a part of 
computer science.


	Relation algebra in pure mathematics is an algebraic 
structure, relevant to mathematical logic and set theory.


That article also explains many of the most important relational operators.

Note that there is a related set of operators comprising relational 
calculus, and you can do everything in one that you can in the other, 
though less or more verbosely as the case may be.


At 10:51 AM +1200 5/5/06, Sam Vilain wrote:

I do think that a Pair should be a sub-type of a more general Tuple
type, with the 'where' clause being { .items == 2 } or something like that.

I think that the most flexible arrangement is to define;

- a Collection as a Bag of Tuples
- a Relation as a Collection where the tuples have a shape and no
duplicate tuples are allowed (but Relation does not need to be a core type)

Then, Mappings, Sequences, etc, become sub-types of one of the above two
types. For instance, a sequence is a Collection of (Int, Any) where the
first Int is unique across the collection. Similarly a Mapping is a
Collection of (Any, Any) where Unique(0).

something like

role Tuple { has @.items };
role Collection { has Tuple @.tuples };
subset Pair of Tuple where { .items.items == 2 };
subset Bag of Collection where { ! .tuples.grep:{.items  1 } }
subset Set of Bag where {
all( .tuples.map:{ .items } ) == one( .tuples.map:{ .items } )
}
subset Mapping of Collection where { ! .tuples.grep:{ .items != 2 } }
subset Array of Mapping where { .tuples.grep:{ .items[0].isa(Int) } }
subset Hash of Mapping where { .tuples.grep:{ .items[0].does(Str) } }


While this may not actually change anything, I should point out that 
every collection type can also be expressed in terms of a Relation 
definition and/or they can all be implemented over a Relation (whose 
members are actually always unique).  For example:


1. A Set of Any is a Relation with one Any attribute.
2. A Bag of N Any attributes is a Relation of N+1 attributes, where 
the extra attribute is an Int (constrained = 1) that counts 
occurrances of the distinct other attributes.

3. A Mapping can be a Relation of 2 Any attributes.
4. A Hash is a Relation of 2 attributes, Str (key) and Any (value), 
where the key has a unique constraint.
5. A Seq is a Relation of 2 attributes, typed Int (= 0) and Any, 
where the first shows their ordinal position and the second is the 
actual value; the first has a unique constraint.
6. An Array is the same, assuming it is a sparse; if it is not 
sparse, there is an additional constraint that the greatest Int value 
is the same / one less than the count of Relation members.


Suffice it to say that I'm sure you would implement a bag using some 
other type, whether a relation or a hash or an array, where the 
member is stored once with an occurrance count.


-- Darren Duncan