subject:"using the newer collection types"

Re: using the newer collection types

2006-05-07 Thread Sam Vilain

Darren Duncan wrote:

>>Also, I don't agree with the notion of a "header" of each relation. It
>>has a type for each tuple item, sure, but "header" just sounds like the
>>sort of thing you want in a ResultSet, not a Relation.
>>Sam.
>>
>>
>A relation's heading is essentially the definition of the relation's 
>structure, and is not redundant if the relation has no tuples in it, 
>which is valid.
>  
>

The Relation has a type, which is a Relation of Tuples of something. The
"Header" you refer to is the higher order part of the Tuple, the
"something".

Sam.

Re: using the newer collection types - Interval

2006-05-06 Thread mAsterdam


Darren Duncan wrote:

At 2:17 AM +0200 5/7/06, mAsterdam wrote:


I hope (and think) you are right about that regarding
implementing relations. Using them correctly is another
story though. I don't think Date, Darwen & Lorentzos
lightly took the step of introducing 6NF in 2003.

Aside, about RVA (relation valued attibutes): I read at 
comp.database.theory that Hugh Darwen has introduced

gu(group/ungroup)NF a month ago.



mAsterdam, I think we should end this sub-thread as I see these points 
you are now bringing up are well beyond the scope of what the Perl 6 
language designers need to know or care about so I don't see a need to 
continue it on list.


Even for me personally, I don't think this is anything to worry about.


Ok.


But perhaps to explain why I think this ...

Ignoring 1NF, which relations are always in by definition (they contain 
no duplicate tuples/rows), but things like SQL tables or non-relation 
collections could possibly not be, the 2NF+ have nothing to do with the 
actual definitions of relations themselves or the ability to perform 
relational algebra, which is all that the Perl language and/or extension 
classes to it need to know about.


The 2nd and higher "normal forms" are just formal labels applied to 
certain best practices that one can follow when designing a relational 
database.  They are efforts to further reduce redundancy in the 


No. Common misconception - but indeed not relevant to p.p6.l

collection of relations making up a relational database.  Best left to 
the users to make decisions about rather than the language designers.


So the only real concern here is whether there is a data type that 
can represent a single piece of temporal data.  But one can easily be 
defined using Perl's standard class definition abilities if it isn't 
pre-defined.



I really don't yet see how to define point types and interval
(range) types based on those. I think you (we) /will/ need them
(*not* neccesarily Perl 6 built-in) if ...

... you, like I, want temporal and spacial data as simple
and generic as possible.



You can do it simply, kind of like this:

class Point { has Real $x; has Real $y; };

subset Interval of Range where { all( .items ).does(Real) };


How to go about about having cyclic intervals over
(weekday) points e.g. Fri..Mon ?

If you want to discuss this off-list - the email-address
simply works :-) (huge amounts of spam though :-( )

Re: using the newer collection types - Interval

2006-05-06 Thread Larry Wall

On Sat, May 06, 2006 at 06:15:34PM -0700, Darren Duncan wrote:
: Er, you should read 'Real' as 'Num' (I originally meant Rational, 
: which no longer exists in the newest S06);

Rational still exists in S02--we just don't automatically promote
anything to it currently.  (A pragma could change that default in
a particular lexical scope, of course.)  The main problem with
Rationals is that they tend to waste a lot of bits maintaining
precision that is of little or no use in the real world, and they
can't easily be sized in advance of the calculation if you're planning
to store them in a particular spot.  Floaters win on both counts.

Larry

Re: using the newer collection types - Interval

2006-05-06 Thread Darren Duncan


At 6:06 PM -0700 5/6/06, Darren Duncan wrote:

You can do it simply, kind of like this:

class Point { has Real $x; has Real $y; };

subset Interval of Range where { all( .items ).does(Real) };


Er, you should read 'Real' as 'Num' (I originally meant Rational, 
which no longer exists in the newest S06); I meant to say:


class Point { has Num $x; has Num $y; };

subset Interval of Range where } all( .items ).does(Num) };

-- Darren Duncan

Re: using the newer collection types - Interval

2006-05-06 Thread Darren Duncan


At 2:17 AM +0200 5/7/06, mAsterdam wrote:

I hope (and think) you are right about that regarding
implementing relations. Using them correctly is another
story though. I don't think Date, Darwen & Lorentzos
lightly took the step of introducing 6NF in 2003.

Aside, about RVA (relation valued attibutes): I read at 
comp.database.theory that Hugh Darwen has introduced

gu(group/ungroup)NF a month ago.


mAsterdam, I think we should end this sub-thread as I see these 
points you are now bringing up are well beyond the scope of what the 
Perl 6 language designers need to know or care about so I don't see a 
need to continue it on list.


Even for me personally, I don't think this is anything to worry about.

But perhaps to explain why I think this ...

Ignoring 1NF, which relations are always in by definition (they 
contain no duplicate tuples/rows), but things like SQL tables or 
non-relation collections could possibly not be, the 2NF+ have nothing 
to do with the actual definitions of relations themselves or the 
ability to perform relational algebra, which is all that the Perl 
language and/or extension classes to it need to know about.


The 2nd and higher "normal forms" are just formal labels applied to 
certain best practices that one can follow when designing a 
relational database.  They are efforts to further reduce redundancy 
in the collection of relations making up a relational database.  Best 
left to the users to make decisions about rather than the language 
designers.


So the only real concern here is whether there is a data type that 
can represent a single piece of temporal data.  But one can easily 
be defined using Perl's standard class definition abilities if it 
isn't pre-defined.


I really don't yet see how to define point types and interval
(range) types based on those. I think you (we) /will/ need them
(*not* neccesarily Perl 6 built-in) if ...

... you, like I, want temporal and spacial data as simple
and generic as possible.


You can do it simply, kind of like this:

class Point { has Real $x; has Real $y; };

subset Interval of Range where { all( .items ).does(Real) };

-- Darren Duncan

Re: using the newer collection types - Interval

2006-05-06 Thread mAsterdam


Darren Duncan wrote:

At 12:45 AM +0200 5/7/06, mAsterdam wrote:


Okay, thank you both for clarifying this.

Conceptually in my mind, a Range is entirely appropriate to represent 
a mathematical interval, but I was mistaken about Range being more 
constrained than it actually is.


So, there you go mAsterdam; Range is indeed the interval you are 
looking for.


I hope it is also the appropriate type for implementing
relations with temporal attributes.

Thank you all for your prompt discussion.


It should work just fine.

Keep in mind that your concern about relations is orthogonal to the 
concern about intervals or temporal data.


I hope (and think) you are right about that regarding
implementing relations. Using them correctly is another
story though. I don't think Date, Darwen & Lorentzos
lightly took the step of introducing 6NF in 2003.

An attribute of a relation can be any arbitrary data type at all 
(including another relation).


Aside, about RVA (relation valued attibutes): I read at 
comp.database.theory that Hugh Darwen has introduced

gu(group/ungroup)NF a month ago.

So the only real concern here is whether there is a data type that can 
represent a single piece of temporal data.  But one can easily be 
defined using Perl's standard class definition abilities if it isn't 
pre-defined.


I really don't yet see how to define point types and interval
(range) types based on those. I think you (we) /will/ need them
(*not* neccesarily Perl 6 built-in) if ...

Note that I am of the opinion that Perl probably should not have 
built-in data types that are specific to temporal or spacial data; it is 
better for these to be extensions (like "DateTime" etc) defined over 
built-ins like numbers or ranges or collections.  Temporal or spacial 
data in common use today is just too complicated and non-generic, I think.


(Whereas, the existing built-ins, and relations, are very generic and 
simple.)


... you, like I, want temporal and spacial data as simple
and generic as possible.

Re: using the newer collection types - Interval

2006-05-06 Thread Darren Duncan


At 12:45 AM +0200 5/7/06, mAsterdam wrote:

Okay, thank you both for clarifying this.

Conceptually in my mind, a Range is entirely appropriate to 
represent a mathematical interval, but I was mistaken about Range 
being more constrained than it actually is.


So, there you go mAsterdam; Range is indeed the interval you are looking for.


I hope it is also the appropriate type for implementing
relations with temporal attributes.

Thank you all for your prompt discussion.


It should work just fine.

Keep in mind that your concern about relations is orthogonal to the 
concern about intervals or temporal data.


An attribute of a relation can be any arbitrary data type at all 
(including another relation).


So the only real concern here is whether there is a data type that 
can represent a single piece of temporal data.  But one can easily be 
defined using Perl's standard class definition abilities if it isn't 
pre-defined.


Note that I am of the opinion that Perl probably should not have 
built-in data types that are specific to temporal or spacial data; it 
is better for these to be extensions (like "DateTime" etc) defined 
over built-ins like numbers or ranges or collections.  Temporal or 
spacial data in common use today is just too complicated and 
non-generic, I think.


(Whereas, the existing built-ins, and relations, are very generic and simple.)

-- Darren Duncan

Re: using the newer collection types - Interval

2006-05-06 Thread mAsterdam


Darren Duncan wrote:

At 2:03 PM -0700 5/6/06, Larry Wall wrote (in reply):


No, Range objects in Perl 6 are defined to be intervals unless used
in a context that implies discrete increments, such as counting in
list context.  But if you say

$x ~~ 1.2 ..^ 3.4

it is exactly equivalent to

1.2 <= $x < 3.4

The main point of context is to avoid an explosion of types.



At 10:19 PM +0100 5/6/06, James Mastros wrote (also in reply):

A range listifies to a (potentially) finite list of discrete elements, 
but
it compares as a range.  1.1 should ~~ 1..2; pugs thinking that's 
false is a

bug, not a feature.



Okay, thank you both for clarifying this.

Conceptually in my mind, a Range is entirely appropriate to represent a 
mathematical interval, but I was mistaken about Range being more 
constrained than it actually is.


So, there you go mAsterdam; Range is indeed the interval you are looking 
for.


I hope it is also the appropriate type for implementing
relations with temporal attributes.

Thank you all for your prompt discussion.

Re: using the newer collection types - Interval

2006-05-06 Thread mAsterdam


Darren Duncan wrote:

mAsterdam wrote:


Prompted by Darren Duncan's proposal on Relation type objects
I looked at http://dev.perl.org/perl6/doc/design/syn/S06.html
and wondered how Interval type objects would fit in.

I couldn't imagine how. Now that isn't a surprise
(not for lack of imagination but for lack of perl6 knowledge).
I tried to find anything relevant but didn't succeed.

Could somebody provide some clues, please?


Can we first clarify that this describes what you are referring to:

http://en.wikipedia.org/wiki/Interval_(mathematics)


Mathematical concepts never seem to completely
survive implementation in computing, but basically yes,
that was what I was thinking about.


 From what I've read there, I don't see an existing equivalent Perl 6 type.

Some people may confuse it with a Range, but I don't think so since a 
Range progresses in discrete increments, while an Interval would be 
continuous.


Moreover, the existing Set wouldn't overlap it because Set deals with a 
finite collection of discrete members.  (Similarly, a Relation is 
explicitly discrete and finite by definition.)


I assume you aren't referring to the temporal specific thing called an 
interval.


Date, Darwen and Lorentzos (Temporal data and the relational
model, 2003) build their temporal reasoning on top of
point types and interval types - they require them
allright - but I don't see how interval types (and point
types) require temporal stuff, i.o.w. I don't really
see how they are temporal specific (unless you are
thinking of SQL intervals).

Re: using the newer collection types - Interval

2006-05-06 Thread Darren Duncan


At 2:03 PM -0700 5/6/06, Larry Wall wrote (in reply):

No, Range objects in Perl 6 are defined to be intervals unless used
in a context that implies discrete increments, such as counting in
list context.  But if you say

$x ~~ 1.2 ..^ 3.4

it is exactly equivalent to

1.2 <= $x < 3.4

The main point of context is to avoid an explosion of types.


At 10:19 PM +0100 5/6/06, James Mastros wrote (also in reply):

A range listifies to a (potentially) finite list of discrete elements, but
it compares as a range.  1.1 should ~~ 1..2; pugs thinking that's false is a
bug, not a feature.


Okay, thank you both for clarifying this.

Conceptually in my mind, a Range is entirely appropriate to represent 
a mathematical interval, but I was mistaken about Range being more 
constrained than it actually is.


So, there you go mAsterdam; Range is indeed the interval you are looking for.

-- Darren Duncan

Re: using the newer collection types - Interval

2006-05-06 Thread James Mastros

On Sat, May 06, 2006 at 01:41:41PM -0700, Darren Duncan wrote:
> Some people may confuse it with a Range, but I don't think so since a 
> Range progresses in discrete increments, while an Interval would be 
> continuous.
A range listifies to a (potentially) finite list of discrete elements, but
it compares as a range.  1.1 should ~~ 1..2; pugs thinking that's false is a
bug, not a feature.

Of course, that doesn't mean implementing range in a subset of perl6 without
it isn't interesting, and possibly useful for bootstrapping.

   -=- James Mastros

Re: using the newer collection types - Interval

2006-05-06 Thread Larry Wall

On Sat, May 06, 2006 at 01:41:41PM -0700, Darren Duncan wrote:
: Some people may confuse it with a Range, but I don't think so since a 
: Range progresses in discrete increments, while an Interval would be 
: continuous.

No, Range objects in Perl 6 are defined to be intervals unless used
in a context that implies discrete increments, such as counting in
list context.  But if you say

$x ~~ 1.2 ..^ 3.4

it is exactly equivalent to

1.2 <= $x < 3.4

The main point of context is to avoid an explosion of types.

Larry

Re: using the newer collection types - Interval

2006-05-06 Thread Darren Duncan


At 3:06 PM +0200 5/6/06, mAsterdam wrote:

Prompted by Darren Duncan's proposal on Relation type objects
I looked at http://dev.perl.org/perl6/doc/design/syn/S06.html
and wondered how Interval type objects would fit in.

I couldn't imagine how. Now that isn't a surprise
(not for lack of imagination but for lack of perl6 knowledge).
I tried to find anything relevant but didn't succeed.

Could somebody provide some clues, please?


Can we first clarify that this describes what you are referring to:

http://en.wikipedia.org/wiki/Interval_(mathematics)

From what I've read there, I don't see an existing equivalent Perl 6 type.

Some people may confuse it with a Range, but I don't think so since a 
Range progresses in discrete increments, while an Interval would be 
continuous.


Moreover, the existing Set wouldn't overlap it because Set deals with 
a finite collection of discrete members.  (Similarly, a Relation is 
explicitly discrete and finite by definition.)


I assume you aren't referring to the temporal specific thing called 
an interval.


-- Darren Duncan

Re: using the newer collection types - Interval

2006-05-06 Thread mAsterdam


Prompted by Darren Duncan's proposal on Relation type objects
I looked at http://dev.perl.org/perl6/doc/design/syn/S06.html
and wondered how Interval type objects would fit in.

I couldn't imagine how. Now that isn't a surprise
(not for lack of imagination but for lack of perl6 knowledge).
I tried to find anything relevant but didn't succeed.

Could somebody provide some clues, please?

relations as roles (was: using the newer collection types)

2006-05-05 Thread Darren Duncan


First of all, Sam Vilain, thank you for your responses.

Giving these issues more thought, I'm am now leaning towards the idea 
that the best way to provide relational algebra in Perl 6 is that the 
relation-land Tuple and Relation each be a Role which various other 
classes can provide to their users.


For one thing, this lets people use more types of data collections in 
the forms where they currently sit, rather than requiring them to 
first convert the data into separate Tuple or Relation objects, in 
order to apply relational algebra to them.  And I suppose that's "the 
Perl way".  Having things just work the way people intuit that they 
ought to, given conceptually similar examples.


Moreover, some implementations of the Tuple and Relation roles can go 
beyond the basics and provide features often associated with those 
roles but that aren't part of the core algebra (such as various kinds 
of type based or whole-set constraints, or various tied means of 
persistence), while other implementations of those roles don't have 
to.


Moreover, one key distinction that some implementations can have from 
others is to whether they are mutable.  Some implementations could be 
mutable and others not.  All of the generic relational algebra 
operators are not mutators, so they will work equally well with 
immutable or mutable implementations.  That said, for this to work 
properly in all cases, Perl 6 will need to provide a way (and it 
probably does) for a routine to examine the signature of an anonymous 
routine passed in as an argument, and make sure that its parameters 
are all read-only; some generic relational operators, such as 
restrict() and extend(), would take routines as arguments, like 
grep() and map() do, and we don't want them to try mutating their 
arguments.


As an aside, you'll recall that I said any other type could be 
implemented over relations, such as sets or arrays etc.  However, 
every attribute ("column") of a relation has a name, but things like 
sets or arrays conceptually don't need them and so having sets or 
arrays or hashes etc implement the Relation role could be difficult, 
one might think.  But I say there is no problem here.  We can simply 
reuse the names of "methods" we already have on those types.  For 
example, a Hash object seen through a Relation role would have 2 
attributes named "keys" and "values".  With sets, its just "members" 
or "values" or whatever.  Though I suppose the names may need more 
creativity with a shaped hash or array, where each dimension should 
be referrable to by a distinct name (a dimension index number if 
necessary), unless we cop out and make "keys" for a shaped hash or 
array to be of type Seq, where the one value encompasses all 
dimensions; such an approach is perfectly valid for a Relation role, 
but it may or may not be less useful in practice.


Getting back to the main topic, if we want to implement the Tuple or 
Relation role in a way that each attribute ("column") of such is 
specifically typed (because that tends to be the most common or 
useful practice), it probably would work best to employ a class 
factory (implementing said roles) where each distinct heading of a 
Tuple or Relation corresponds to a distinct actual class definition, 
where the relation-land attributes correspond 1:1 with actual class 
attributes, and all the features for addressing, manipulating, and 
defining said attributes with types can be reused.  In this 
situation, nearly every generic relational algebra operator would 
necessarily be a factory for new classes; and there needs to be a way 
to examine existing classes in enough detail to know what attributes 
they have and/or clone parts of them into new ones (as I assume 
Perl's various .meta provide).  This approach is probably the most 
powerful of all as an implementation.


This email is by no means exhaustive, but feedback is appreciated.

And keep in mind that the most important thing I'm suggesting here is 
both that Tuple and Relation are roles, and that as many built-ins as 
reasonable "do" (appropriately restricted versions of) them; if 
people have disagreements, its mainly with that sub-suggestion that I 
want to know about them.


Thank you. -- Darren Duncan

Re: using the newer collection types

2006-05-05 Thread Darren Duncan


At 8:01 PM +1200 5/5/06, Sam Vilain wrote:

Also, I don't agree with the notion of a "header" of each relation. It
has a type for each tuple item, sure, but "header" just sounds like the
sort of thing you want in a ResultSet, not a Relation.
Sam.


A relation's heading is essentially the definition of the relation's 
structure, and is not redundant if the relation has no tuples in it, 
which is valid.


A comment I back-logged on #perl6 yesterday suggested that what I was 
proposing with relations is sort of a retread of what classes are, 
where each is defined in terms of zero or more named and typed 
attributes.


I just want to weigh in that I see that as partly true (the same 
could also be said that it retreads what C structs are), and 
therefore perhaps some language constructs related to class 
definitions could be reused with relation definitions.


Perhaps "Relation" in Perl 6 would best be a role and/or meta-class 
with factory methods that produce other classes with common interface 
elements, such as all the relational algebra methods, and/or objects 
each of which has its own class.  Under this system, each time you 
perform an operation like a join, you are potentially making a new 
class, because its attributes are likely different than its 
predecessors.


Frankly, with Perl 6 being what it is, there are probably numerous 
ways to implement what I'm looking for, and I don't exactly know what 
would be best.


For now I'll just follow the simple path with a single Relation class 
in Pugs' ext/Relation/ directory for demonstration purposes and 
evaluate changes later.


-- Darren Duncan

Re: using the newer collection types

2006-05-05 Thread Sam Vilain

Darren Duncan wrote:

>>>Is there a reference for the meaning of these methods?
>>>  
>>>
>>There are many written references to these methods; just type 
>>"relational algebra" into Google.
>>
>>
>
>I will add that the first hit on such a search, the Wikipedia page on 
>relational algebra ( http://en.wikipedia.org/wiki/Relational_algebra 
>), is a perfectly good primer on what relational algebra is and what 
>its importance is.
>  
>

Thanks for the pointer.

>While this may not actually change anything, I should point out that 
>every collection type can also be expressed in terms of a Relation 
>definition and/or they can all be implemented over a Relation (whose 
>members are actually always unique).  For example:
>
>1. A Set of Any is a Relation with one Any attribute.
>2. A Bag of N Any attributes is a Relation of N+1 attributes, where 
>the extra attribute is an Int (constrained >= 1) that counts 
>occurrances of the distinct other attributes.
>3. A Mapping can be a Relation of 2 Any attributes.
>4. A Hash is a Relation of 2 attributes, Str (key) and Any (value), 
>where the key has a unique constraint.
>5. A Seq is a Relation of 2 attributes, typed Int (>= 0) and Any, 
>where the first shows their ordinal position and the second is the 
>actual value; the first has a unique constraint.
>6. An Array is the same, assuming it is a sparse; if it is not 
>sparse, there is an additional constraint that the greatest Int value 
>is the same / one less than the count of Relation members.
>  
>

I don't know if anyone will care, but you can't emulate the raw
"Collection" type with this fixed Relation type. That is, a collection
of tuples, each of which may be of differing length and type.

This is what leads me to think that Collection is the more generic role.
I'm not saying Relations are not useful, perhaps they are more useful
than Collections in the practical case, but they are a sub-type.

Also, I don't agree with the notion of a "header" of each relation. It
has a type for each tuple item, sure, but "header" just sounds like the
sort of thing you want in a ResultSet, not a Relation.

Sam.

Re: using the newer collection types

2006-05-04 Thread Darren Duncan

Actually, I'll add a few more things to my reply, which should be helpful ...

At 5:11 PM -0700 5/4/06, Darren Duncan wrote:

At 10:51 AM +1200 5/5/06, Sam Vilain wrote:

 >Moreover, the Relation type has these

operators that the Set type doesn't have: rename(), project(),
restrict(), extend(), join(), divide(), summarize(), group(),

 >ungroup(), wrap(), unwrap(), matching(), etc.

Is there a reference for the meaning of these methods?

There are many written references to these methods; just type 
"relational algebra" into Google.

I will add that the first hit on such a search, the Wikipedia page on 
relational algebra ( http://en.wikipedia.org/wiki/Relational_algebra 
), is a perfectly good primer on what relational algebra is and what 
its importance is.

The article's introduction says:

	Relational algebra, an offshoot of first-order logic, is a 
set of relations closed under operators. Operators operate on one or 
more relations to yield a relation. Relational algebra is a part of 
computer science.

	Relation algebra in pure mathematics is an algebraic 
structure, relevant to mathematical logic and set theory.

That article also explains many of the most important relational operators.

Note that there is a related set of operators comprising relational 
calculus, and you can do everything in one that you can in the other, 
though less or more verbosely as the case may be.

At 10:51 AM +1200 5/5/06, Sam Vilain wrote:

I do think that a Pair should be a sub-type of a more general "Tuple"
type, with the 'where' clause being { .items == 2 } or something like that.

I think that the most flexible arrangement is to define;

- a Collection as a Bag of Tuples
- a Relation as a Collection where the tuples have a shape and no
duplicate tuples are allowed (but Relation does not need to be a core type)

Then, Mappings, Sequences, etc, become sub-types of one of the above two
types. For instance, a sequence is a Collection of (Int, Any) where the
first Int is unique across the collection. Similarly a Mapping is a
Collection of (Any, Any) where Unique(0).

something like

role Tuple { has @.items };
role Collection { has Tuple @.tuples };
subset Pair of Tuple where { .items.items == 2 };
subset Bag of Collection where { ! .tuples.grep:{.items > 1 } }
subset Set of Bag where {
all( .tuples.map:{ .items } ) == one( .tuples.map:{ .items } )
}
subset Mapping of Collection where { ! .tuples.grep:{ .items != 2 } }
subset Array of Mapping where { .tuples.grep:{ .items[0].isa(Int) } }
subset Hash of Mapping where { .tuples.grep:{ .items[0].does(Str) } }

While this may not actually change anything, I should point out that 
every collection type can also be expressed in terms of a Relation 
definition and/or they can all be implemented over a Relation (whose 
members are actually always unique).  For example:

1. A Set of Any is a Relation with one Any attribute.
2. A Bag of N Any attributes is a Relation of N+1 attributes, where 
the extra attribute is an Int (constrained >= 1) that counts 
occurrances of the distinct other attributes.

3. A Mapping can be a Relation of 2 Any attributes.
4. A Hash is a Relation of 2 attributes, Str (key) and Any (value), 
where the key has a unique constraint.
5. A Seq is a Relation of 2 attributes, typed Int (>= 0) and Any, 
where the first shows their ordinal position and the second is the 
actual value; the first has a unique constraint.
6. An Array is the same, assuming it is a sparse; if it is not 
sparse, there is an additional constraint that the greatest Int value 
is the same / one less than the count of Relation members.

Suffice it to say that I'm sure you would implement a bag using some 
other type, whether a relation or a hash or an array, where the 
member is stored once with an occurrance count.

-- Darren Duncan

Re: using the newer collection types

2006-05-04 Thread Darren Duncan

At 10:51 AM +1200 5/5/06, Sam Vilain wrote:

 >Moreover, the Relation type has these

operators that the Set type doesn't have: rename(), project(),
restrict(), extend(), join(), divide(), summarize(), group(),

 >ungroup(), wrap(), unwrap(), matching(), etc.

Is there a reference for the meaning of these methods?

 >7.  Can we add some operators to Mapping that are like the Relation

ones, so that implementing a Relation over Mappings is easier (or,
see the end of #8)?  Eg, these would be useful: rename(), project(),
extend(), join().  In particular, implementing join() into Mapping
would help save CPU cycles:

Again, a reference to a prototype of the behaviour would be useful.

There are many written references to these methods; just type 
"relational algebra" into Google.

That said, some of those search results may not explain things in the 
same way, so I specifically prefer the definitions in Date and 
Darwen's book "Databases, Types, and The Relational Model"; 
equivalent definitions are probably on the 'net' but I'm not sure 
where.

I also defined part of the join() operator in my last email.  How 
that definition, for Tuples, extends to a Relation is that you pair 
every tuple in each relation being joined to every tuple in each of 
the others, and apply my earlier definition with each pairing; the 
output Relation contains a tuple where the earlier definition 
returned a tuple, and no tuple where it returned undef.

Alternately, if you want to wait a week, I will be coding up 
documented implementations as soon as possible within Pugs' 
ext/Relation/ dir.

But in order for me to do this, I was needing some answers about the 
nature of existing types like Set and Mapping and Junction etc, which 
I asked in this email.

Regarding your other comments:

I don't think this terminology or these restrictions are particularly
useful.

I do think that a Pair should be a sub-type of a more general "Tuple"
type, with the 'where' clause being { .items == 2 } or something like that.

I think that the most flexible arrangement is to define;

- a Collection as a Bag of Tuples
- a Relation as a Collection where the tuples have a shape and no
duplicate tuples are allowed (but Relation does not need to be a core type)

Then, Mappings, Sequences, etc, become sub-types of one of the above two
types. For instance, a sequence is a Collection of (Int, Any) where the
first Int is unique across the collection. Similarly a Mapping is a
Collection of (Any, Any) where Unique(0).

You may be on to something here, but I'll withold comments for now.

My main concerns with this whole Relation thing are that I want an 
efficient way in Perl 6 to represent what relation-land's concept of 
tuples and relations are, as well as an efficient implementations of 
relational algebra that are just as easy to use and fast in Perl 6 as 
are the language's other collection types such as Sets.

If Perl 6 is to be the choice language of the singularity, it needs 
to be easy and fast to do any common type of work with it, and 
relational algebra is an extremely common kind of work being done.

-- Darren Duncan

Re: using the newer collection types

2006-05-04 Thread Sam Vilain

Darren Duncan wrote:

>Speaking a little more technically, a Relation has 2 main components, 
>its heading and its body.  The heading is a set of 0..N keys (called 
>"attributes" in relation-land), and the body is a set of 0..N 
>Mappings (called "tuples" in relation-land), where they set of keys 
>of each Mapping is identical to the Relation's heading.  Its very 
>likely that a language-embedded Relation implementation would 
>actually not repeat the keys for each member Mapping, but we can 
>conceptualize as if they were present for simplicity.
>  
>

I don't think this terminology or these restrictions are particularly
useful.

I do think that a Pair should be a sub-type of a more general "Tuple"
type, with the 'where' clause being { .items == 2 } or something like that.

I think that the most flexible arrangement is to define;

- a Collection as a Bag of Tuples
- a Relation as a Collection where the tuples have a shape and no
duplicate tuples are allowed (but Relation does not need to be a core type)

Then, Mappings, Sequences, etc, become sub-types of one of the above two
types. For instance, a sequence is a Collection of (Int, Any) where the
first Int is unique across the collection. Similarly a Mapping is a
Collection of (Any, Any) where Unique(0).

something like

role Tuple { has @.items };
role Collection { has Tuple @.tuples };
subset Pair of Tuple where { .items.items == 2 };
subset Bag of Collection where { ! .tuples.grep:{.items > 1 } }
subset Set of Bag where {
all( .tuples.map:{ .items } ) == one( .tuples.map:{ .items } )
}
subset Mapping of Collection where { ! .tuples.grep:{ .items != 2 } }
subset Array of Mapping where { .tuples.grep:{ .items[0].isa(Int) } }
subset Hash of Mapping where { .tuples.grep:{ .items[0].does(Str) } }

The above should probably all be written in terms of parametric roles
(see S12), but for now the above run-time checking versions should
hopefully express the relationships between the core collection-like
types as I see them.

That sounds like it might bite, but you wouldn't normally access an
Array as a Collection of (Int, Any), you'd access it as an Array, so you
get the nice .post_circumfix:<[ ]> method that makes array access easy.
You don't care that it has this higher order type as a parent class, and
you certainly wouldn't care for the 'bare' Collection interface (as for
one, you don't want to have to deal with the Integer keys). And it is
probably all backed by native methods.

I'm prototyping much of this using Moose in Perl 5, however Hubris is
delaying its release :-)

>Moreover, the Relation type has these 
>operators that the Set type doesn't have: rename(), project(), 
>restrict(), extend(), join(), divide(), summarize(), group(), 
>ungroup(), wrap(), unwrap(), matching(), etc.
>

Is there a reference for the meaning of these methods?

>1.  Are Sets or Junctions allowed to contain undefined elements?  Can 
>undef be a key of a Mapping or Hash?
>  
>

undef.isa(Object), so you should be able to use it as you would any
other object. I would definitely not think of it as the absence of a
value in this context.

>2.  What actually is the practical distinction between a Set and a 
>Junction?  Why would someone use one over the other?  I recognize 
>that the use of Junctions is supposed to make parallelism easier, as 
>iterating through one is known to be order independent.  But, 
>conceptually a Set and a Relation are exactly the same;  you could 
>process their members in any order and/or in parallel as well.  So is 
>the use of a Junction effectively like a compiler flag to make 
>certain kinds of Set ops faster at the expense of others?
>  
>

Well one side effect at the moment is that Junctions are immutable,
whilst Sets are mutable. This is perhaps a deficiency in my original
Set.pm design; all of the mutating functions should be in a seperate
role, really (or just not be mutators).

>6.  Can I declare with named Set (or Junction) and Mapping typed 
>variables and/or parameters that their members are restricted to 
>particular types, such as Str, as I can with Arrays and Hashes, so 
>that Perl itself will catch violations?  Eg, can I say as a parameter 
>"Set of Str :$heading?" or "Set of Mapping(Str) of Any :$body?" so 
>Perl will check that arguments are suchwise correct?
>  
>

These are variously called "Generics" (ada I think, Java 1.5+),
"Parametric Types", "Higher Order Types" (Pierce et al), "Generic
Algebraic Data Types" (Haskell)

In Perl 6 they are parametric roles (as in S12 mentioned above)

>7.  Can we add some operators to Mapping that are like the Relation 
>ones, so that implementing a Relation over Mappings is easier (or, 
>see the end of #8)?  Eg, these would be useful: rename(), project(), 
>extend(), join().  In particular, implementing join() into Mapping 
>would help save CPU cycles:
>  
>

Again, a reference to a prototype of the behaviour would be useful.

Sam.

using the newer collection types

2006-05-04 Thread Darren Duncan

As I carry on in my spare time to implement a Relation type for Perl 
6, I would like to use some of the simpler types that were added to 
the Synopsis recently and seem to lack a lot of explanatory details 
that older types have, and moreover they don't seem to be implemented 
yet in Pugs.


So I have a few questions whose answers should clarify the intended 
meaning and features of these newer types, as well as the syntax for 
declaring them.


Some relevant example types from Synopsis 6:

  Immutable types

  Objects with these types behave like values, i.e. C<$x === $y> is true
  if and only if their types and contents are identical.

ListLazy Perl list (composed of Seq and Range parts)
Seq Completely evaluated (hence immutable) sequence
Range   Incrementally generated (hence lazy) sequence
Set Unordered Seqs that allow no duplicates
JunctionSets with additional behaviours
PairSeq of two elements that serves as an one-element Mapping
Mapping Pairs with no duplicate keys
Signature   Function parameters (left-hand side of a binding)
Capture Function call arguments (right-hand side of a binding)

  Mutable types

  Objects with these types have distinct C<.id> values.

Array   Perl array
HashPerl hash

The intended new Relation type could be described like this, if I 
correctly understand the meaning of the existing types:


  Immutable types

RelationSet of Mappings where all Mappings have the same keys

Speaking a little more technically, a Relation has 2 main components, 
its heading and its body.  The heading is a set of 0..N keys (called 
"attributes" in relation-land), and the body is a set of 0..N 
Mappings (called "tuples" in relation-land), where they set of keys 
of each Mapping is identical to the Relation's heading.  Its very 
likely that a language-embedded Relation implementation would 
actually not repeat the keys for each member Mapping, but we can 
conceptualize as if they were present for simplicity.


The operations that you can do with a Relation are a proper super-set 
of those you can do with a Set.  So, the Relation type supports all 
the same Set operators, with the same meanings, such as: equal(), 
subset(), superset(), union(), intersection(), difference(), 
symmetric_difference(), none(), any(), all(), member_exists(), 
members(), member_count().  Moreover, the Relation type has these 
operators that the Set type doesn't have: rename(), project(), 
restrict(), extend(), join(), divide(), summarize(), group(), 
ungroup(), wrap(), unwrap(), matching(), etc.  Moreover, there would 
probably be convenience wrapper functions over combinations of the 
above operators such as insert(), update(), delete(), etc, though 
they aren't essential (those examples are not mutators, despite their 
name-sakes).  Some extra operators like sort() would also be 
provided, which convert Relations to Seqs or Arrays.


Now, some of the questions:

1.  Are Sets or Junctions allowed to contain undefined elements?  Can 
undef be a key of a Mapping or Hash?


2.  What actually is the practical distinction between a Set and a 
Junction?  Why would someone use one over the other?  I recognize 
that the use of Junctions is supposed to make parallelism easier, as 
iterating through one is known to be order independent.  But, 
conceptually a Set and a Relation are exactly the same;  you could 
process their members in any order and/or in parallel as well.  So is 
the use of a Junction effectively like a compiler flag to make 
certain kinds of Set ops faster at the expense of others?


3.  Is a Signature like the keys of a Mapping but that it has extra 
stuff like associated types and such?  Can one declare and use a 
Signature separately from declaring a function?


4.  What is the syntax for declaring anonymous Sets and Mappings?  I 
am already aware of these syntax for other types (correct me if I'm 
wrong):


  $a = [1,2,3]; # Array
  $b = {'x'=>2,'y'=>4}; # Hash
  $c = (1=>2);  # Pair
  $d = (1,2,3); # Seq
  $e = 1..5;# Range
  $f = all(1,2,3);  # Junction

If this hasn't yet been decided, might I suggest the following?:

  $g = set(1,2,3);  # Set
  $h = ('x'=>2,'y'=>4); # Mapping

If that works, then perhaps an anonymous Relation declartion could look like:

  $r = relation( set( 'x', 'y' ): ('x'=>2,'y'=>4), ('x'=>5,'y'=>6) );

I'm not particular with the exact syntax; it just needs to be something good.

Note that a terse form of this could leave out the heading 
declaration if at least one Mapping/tuple is provided, since that 
contains the same key list.


  $r = relation( ('x'=>2,'y'=>4), ('x'=>5,'y'=>6) );

Then the heading declaration is only needed if the Relation has no 
Mappings/tuples.


  $r = relation( set( 'x', 'y' ): );

5.  What is the syntax for subscripting or extracting Mapping 
components?  Eg, can we use the same .keys, .values, .pai

Re: using the newer collection types

Re: using the newer collection types - Interval

Re: using the newer collection types - Interval

Re: using the newer collection types - Interval

Re: using the newer collection types - Interval

Re: using the newer collection types - Interval

Re: using the newer collection types - Interval

Re: using the newer collection types - Interval

Re: using the newer collection types - Interval

Re: using the newer collection types - Interval

Re: using the newer collection types - Interval

Re: using the newer collection types - Interval

Re: using the newer collection types - Interval

Re: using the newer collection types - Interval

relations as roles (was: using the newer collection types)

Re: using the newer collection types

Re: using the newer collection types

Re: using the newer collection types

Re: using the newer collection types

Re: using the newer collection types

using the newer collection types

21 matches

Site Navigation

Mail list logo

Footer information