Re: using the newer collection types
Darren Duncan wrote: >>Also, I don't agree with the notion of a "header" of each relation. It >>has a type for each tuple item, sure, but "header" just sounds like the >>sort of thing you want in a ResultSet, not a Relation. >>Sam. >> >> >A relation's heading is essentially the definition of the relation's >structure, and is not redundant if the relation has no tuples in it, >which is valid. > > The Relation has a type, which is a Relation of Tuples of something. The "Header" you refer to is the higher order part of the Tuple, the "something". Sam.
Re: using the newer collection types - Interval
Darren Duncan wrote: At 2:17 AM +0200 5/7/06, mAsterdam wrote: I hope (and think) you are right about that regarding implementing relations. Using them correctly is another story though. I don't think Date, Darwen & Lorentzos lightly took the step of introducing 6NF in 2003. Aside, about RVA (relation valued attibutes): I read at comp.database.theory that Hugh Darwen has introduced gu(group/ungroup)NF a month ago. mAsterdam, I think we should end this sub-thread as I see these points you are now bringing up are well beyond the scope of what the Perl 6 language designers need to know or care about so I don't see a need to continue it on list. Even for me personally, I don't think this is anything to worry about. Ok. But perhaps to explain why I think this ... Ignoring 1NF, which relations are always in by definition (they contain no duplicate tuples/rows), but things like SQL tables or non-relation collections could possibly not be, the 2NF+ have nothing to do with the actual definitions of relations themselves or the ability to perform relational algebra, which is all that the Perl language and/or extension classes to it need to know about. The 2nd and higher "normal forms" are just formal labels applied to certain best practices that one can follow when designing a relational database. They are efforts to further reduce redundancy in the No. Common misconception - but indeed not relevant to p.p6.l collection of relations making up a relational database. Best left to the users to make decisions about rather than the language designers. So the only real concern here is whether there is a data type that can represent a single piece of temporal data. But one can easily be defined using Perl's standard class definition abilities if it isn't pre-defined. I really don't yet see how to define point types and interval (range) types based on those. I think you (we) /will/ need them (*not* neccesarily Perl 6 built-in) if ... ... you, like I, want temporal and spacial data as simple and generic as possible. You can do it simply, kind of like this: class Point { has Real $x; has Real $y; }; subset Interval of Range where { all( .items ).does(Real) }; How to go about about having cyclic intervals over (weekday) points e.g. Fri..Mon ? If you want to discuss this off-list - the email-address simply works :-) (huge amounts of spam though :-( )
Re: using the newer collection types - Interval
On Sat, May 06, 2006 at 06:15:34PM -0700, Darren Duncan wrote: : Er, you should read 'Real' as 'Num' (I originally meant Rational, : which no longer exists in the newest S06); Rational still exists in S02--we just don't automatically promote anything to it currently. (A pragma could change that default in a particular lexical scope, of course.) The main problem with Rationals is that they tend to waste a lot of bits maintaining precision that is of little or no use in the real world, and they can't easily be sized in advance of the calculation if you're planning to store them in a particular spot. Floaters win on both counts. Larry
Re: using the newer collection types - Interval
At 6:06 PM -0700 5/6/06, Darren Duncan wrote: You can do it simply, kind of like this: class Point { has Real $x; has Real $y; }; subset Interval of Range where { all( .items ).does(Real) }; Er, you should read 'Real' as 'Num' (I originally meant Rational, which no longer exists in the newest S06); I meant to say: class Point { has Num $x; has Num $y; }; subset Interval of Range where } all( .items ).does(Num) }; -- Darren Duncan
Re: using the newer collection types - Interval
At 2:17 AM +0200 5/7/06, mAsterdam wrote: I hope (and think) you are right about that regarding implementing relations. Using them correctly is another story though. I don't think Date, Darwen & Lorentzos lightly took the step of introducing 6NF in 2003. Aside, about RVA (relation valued attibutes): I read at comp.database.theory that Hugh Darwen has introduced gu(group/ungroup)NF a month ago. mAsterdam, I think we should end this sub-thread as I see these points you are now bringing up are well beyond the scope of what the Perl 6 language designers need to know or care about so I don't see a need to continue it on list. Even for me personally, I don't think this is anything to worry about. But perhaps to explain why I think this ... Ignoring 1NF, which relations are always in by definition (they contain no duplicate tuples/rows), but things like SQL tables or non-relation collections could possibly not be, the 2NF+ have nothing to do with the actual definitions of relations themselves or the ability to perform relational algebra, which is all that the Perl language and/or extension classes to it need to know about. The 2nd and higher "normal forms" are just formal labels applied to certain best practices that one can follow when designing a relational database. They are efforts to further reduce redundancy in the collection of relations making up a relational database. Best left to the users to make decisions about rather than the language designers. So the only real concern here is whether there is a data type that can represent a single piece of temporal data. But one can easily be defined using Perl's standard class definition abilities if it isn't pre-defined. I really don't yet see how to define point types and interval (range) types based on those. I think you (we) /will/ need them (*not* neccesarily Perl 6 built-in) if ... ... you, like I, want temporal and spacial data as simple and generic as possible. You can do it simply, kind of like this: class Point { has Real $x; has Real $y; }; subset Interval of Range where { all( .items ).does(Real) }; -- Darren Duncan
Re: using the newer collection types - Interval
Darren Duncan wrote: At 12:45 AM +0200 5/7/06, mAsterdam wrote: Okay, thank you both for clarifying this. Conceptually in my mind, a Range is entirely appropriate to represent a mathematical interval, but I was mistaken about Range being more constrained than it actually is. So, there you go mAsterdam; Range is indeed the interval you are looking for. I hope it is also the appropriate type for implementing relations with temporal attributes. Thank you all for your prompt discussion. It should work just fine. Keep in mind that your concern about relations is orthogonal to the concern about intervals or temporal data. I hope (and think) you are right about that regarding implementing relations. Using them correctly is another story though. I don't think Date, Darwen & Lorentzos lightly took the step of introducing 6NF in 2003. An attribute of a relation can be any arbitrary data type at all (including another relation). Aside, about RVA (relation valued attibutes): I read at comp.database.theory that Hugh Darwen has introduced gu(group/ungroup)NF a month ago. So the only real concern here is whether there is a data type that can represent a single piece of temporal data. But one can easily be defined using Perl's standard class definition abilities if it isn't pre-defined. I really don't yet see how to define point types and interval (range) types based on those. I think you (we) /will/ need them (*not* neccesarily Perl 6 built-in) if ... Note that I am of the opinion that Perl probably should not have built-in data types that are specific to temporal or spacial data; it is better for these to be extensions (like "DateTime" etc) defined over built-ins like numbers or ranges or collections. Temporal or spacial data in common use today is just too complicated and non-generic, I think. (Whereas, the existing built-ins, and relations, are very generic and simple.) ... you, like I, want temporal and spacial data as simple and generic as possible.
Re: using the newer collection types - Interval
At 12:45 AM +0200 5/7/06, mAsterdam wrote: Okay, thank you both for clarifying this. Conceptually in my mind, a Range is entirely appropriate to represent a mathematical interval, but I was mistaken about Range being more constrained than it actually is. So, there you go mAsterdam; Range is indeed the interval you are looking for. I hope it is also the appropriate type for implementing relations with temporal attributes. Thank you all for your prompt discussion. It should work just fine. Keep in mind that your concern about relations is orthogonal to the concern about intervals or temporal data. An attribute of a relation can be any arbitrary data type at all (including another relation). So the only real concern here is whether there is a data type that can represent a single piece of temporal data. But one can easily be defined using Perl's standard class definition abilities if it isn't pre-defined. Note that I am of the opinion that Perl probably should not have built-in data types that are specific to temporal or spacial data; it is better for these to be extensions (like "DateTime" etc) defined over built-ins like numbers or ranges or collections. Temporal or spacial data in common use today is just too complicated and non-generic, I think. (Whereas, the existing built-ins, and relations, are very generic and simple.) -- Darren Duncan
Re: using the newer collection types - Interval
Darren Duncan wrote: At 2:03 PM -0700 5/6/06, Larry Wall wrote (in reply): No, Range objects in Perl 6 are defined to be intervals unless used in a context that implies discrete increments, such as counting in list context. But if you say $x ~~ 1.2 ..^ 3.4 it is exactly equivalent to 1.2 <= $x < 3.4 The main point of context is to avoid an explosion of types. At 10:19 PM +0100 5/6/06, James Mastros wrote (also in reply): A range listifies to a (potentially) finite list of discrete elements, but it compares as a range. 1.1 should ~~ 1..2; pugs thinking that's false is a bug, not a feature. Okay, thank you both for clarifying this. Conceptually in my mind, a Range is entirely appropriate to represent a mathematical interval, but I was mistaken about Range being more constrained than it actually is. So, there you go mAsterdam; Range is indeed the interval you are looking for. I hope it is also the appropriate type for implementing relations with temporal attributes. Thank you all for your prompt discussion.
Re: using the newer collection types - Interval
Darren Duncan wrote: mAsterdam wrote: Prompted by Darren Duncan's proposal on Relation type objects I looked at http://dev.perl.org/perl6/doc/design/syn/S06.html and wondered how Interval type objects would fit in. I couldn't imagine how. Now that isn't a surprise (not for lack of imagination but for lack of perl6 knowledge). I tried to find anything relevant but didn't succeed. Could somebody provide some clues, please? Can we first clarify that this describes what you are referring to: http://en.wikipedia.org/wiki/Interval_(mathematics) Mathematical concepts never seem to completely survive implementation in computing, but basically yes, that was what I was thinking about. From what I've read there, I don't see an existing equivalent Perl 6 type. Some people may confuse it with a Range, but I don't think so since a Range progresses in discrete increments, while an Interval would be continuous. Moreover, the existing Set wouldn't overlap it because Set deals with a finite collection of discrete members. (Similarly, a Relation is explicitly discrete and finite by definition.) I assume you aren't referring to the temporal specific thing called an interval. Date, Darwen and Lorentzos (Temporal data and the relational model, 2003) build their temporal reasoning on top of point types and interval types - they require them allright - but I don't see how interval types (and point types) require temporal stuff, i.o.w. I don't really see how they are temporal specific (unless you are thinking of SQL intervals).
Re: using the newer collection types - Interval
At 2:03 PM -0700 5/6/06, Larry Wall wrote (in reply): No, Range objects in Perl 6 are defined to be intervals unless used in a context that implies discrete increments, such as counting in list context. But if you say $x ~~ 1.2 ..^ 3.4 it is exactly equivalent to 1.2 <= $x < 3.4 The main point of context is to avoid an explosion of types. At 10:19 PM +0100 5/6/06, James Mastros wrote (also in reply): A range listifies to a (potentially) finite list of discrete elements, but it compares as a range. 1.1 should ~~ 1..2; pugs thinking that's false is a bug, not a feature. Okay, thank you both for clarifying this. Conceptually in my mind, a Range is entirely appropriate to represent a mathematical interval, but I was mistaken about Range being more constrained than it actually is. So, there you go mAsterdam; Range is indeed the interval you are looking for. -- Darren Duncan
Re: using the newer collection types - Interval
On Sat, May 06, 2006 at 01:41:41PM -0700, Darren Duncan wrote: > Some people may confuse it with a Range, but I don't think so since a > Range progresses in discrete increments, while an Interval would be > continuous. A range listifies to a (potentially) finite list of discrete elements, but it compares as a range. 1.1 should ~~ 1..2; pugs thinking that's false is a bug, not a feature. Of course, that doesn't mean implementing range in a subset of perl6 without it isn't interesting, and possibly useful for bootstrapping. -=- James Mastros
Re: using the newer collection types - Interval
On Sat, May 06, 2006 at 01:41:41PM -0700, Darren Duncan wrote: : Some people may confuse it with a Range, but I don't think so since a : Range progresses in discrete increments, while an Interval would be : continuous. No, Range objects in Perl 6 are defined to be intervals unless used in a context that implies discrete increments, such as counting in list context. But if you say $x ~~ 1.2 ..^ 3.4 it is exactly equivalent to 1.2 <= $x < 3.4 The main point of context is to avoid an explosion of types. Larry
Re: using the newer collection types - Interval
At 3:06 PM +0200 5/6/06, mAsterdam wrote: Prompted by Darren Duncan's proposal on Relation type objects I looked at http://dev.perl.org/perl6/doc/design/syn/S06.html and wondered how Interval type objects would fit in. I couldn't imagine how. Now that isn't a surprise (not for lack of imagination but for lack of perl6 knowledge). I tried to find anything relevant but didn't succeed. Could somebody provide some clues, please? Can we first clarify that this describes what you are referring to: http://en.wikipedia.org/wiki/Interval_(mathematics) From what I've read there, I don't see an existing equivalent Perl 6 type. Some people may confuse it with a Range, but I don't think so since a Range progresses in discrete increments, while an Interval would be continuous. Moreover, the existing Set wouldn't overlap it because Set deals with a finite collection of discrete members. (Similarly, a Relation is explicitly discrete and finite by definition.) I assume you aren't referring to the temporal specific thing called an interval. -- Darren Duncan
Re: using the newer collection types - Interval
Prompted by Darren Duncan's proposal on Relation type objects I looked at http://dev.perl.org/perl6/doc/design/syn/S06.html and wondered how Interval type objects would fit in. I couldn't imagine how. Now that isn't a surprise (not for lack of imagination but for lack of perl6 knowledge). I tried to find anything relevant but didn't succeed. Could somebody provide some clues, please?
relations as roles (was: using the newer collection types)
First of all, Sam Vilain, thank you for your responses. Giving these issues more thought, I'm am now leaning towards the idea that the best way to provide relational algebra in Perl 6 is that the relation-land Tuple and Relation each be a Role which various other classes can provide to their users. For one thing, this lets people use more types of data collections in the forms where they currently sit, rather than requiring them to first convert the data into separate Tuple or Relation objects, in order to apply relational algebra to them. And I suppose that's "the Perl way". Having things just work the way people intuit that they ought to, given conceptually similar examples. Moreover, some implementations of the Tuple and Relation roles can go beyond the basics and provide features often associated with those roles but that aren't part of the core algebra (such as various kinds of type based or whole-set constraints, or various tied means of persistence), while other implementations of those roles don't have to. Moreover, one key distinction that some implementations can have from others is to whether they are mutable. Some implementations could be mutable and others not. All of the generic relational algebra operators are not mutators, so they will work equally well with immutable or mutable implementations. That said, for this to work properly in all cases, Perl 6 will need to provide a way (and it probably does) for a routine to examine the signature of an anonymous routine passed in as an argument, and make sure that its parameters are all read-only; some generic relational operators, such as restrict() and extend(), would take routines as arguments, like grep() and map() do, and we don't want them to try mutating their arguments. As an aside, you'll recall that I said any other type could be implemented over relations, such as sets or arrays etc. However, every attribute ("column") of a relation has a name, but things like sets or arrays conceptually don't need them and so having sets or arrays or hashes etc implement the Relation role could be difficult, one might think. But I say there is no problem here. We can simply reuse the names of "methods" we already have on those types. For example, a Hash object seen through a Relation role would have 2 attributes named "keys" and "values". With sets, its just "members" or "values" or whatever. Though I suppose the names may need more creativity with a shaped hash or array, where each dimension should be referrable to by a distinct name (a dimension index number if necessary), unless we cop out and make "keys" for a shaped hash or array to be of type Seq, where the one value encompasses all dimensions; such an approach is perfectly valid for a Relation role, but it may or may not be less useful in practice. Getting back to the main topic, if we want to implement the Tuple or Relation role in a way that each attribute ("column") of such is specifically typed (because that tends to be the most common or useful practice), it probably would work best to employ a class factory (implementing said roles) where each distinct heading of a Tuple or Relation corresponds to a distinct actual class definition, where the relation-land attributes correspond 1:1 with actual class attributes, and all the features for addressing, manipulating, and defining said attributes with types can be reused. In this situation, nearly every generic relational algebra operator would necessarily be a factory for new classes; and there needs to be a way to examine existing classes in enough detail to know what attributes they have and/or clone parts of them into new ones (as I assume Perl's various .meta provide). This approach is probably the most powerful of all as an implementation. This email is by no means exhaustive, but feedback is appreciated. And keep in mind that the most important thing I'm suggesting here is both that Tuple and Relation are roles, and that as many built-ins as reasonable "do" (appropriately restricted versions of) them; if people have disagreements, its mainly with that sub-suggestion that I want to know about them. Thank you. -- Darren Duncan
Re: using the newer collection types
At 8:01 PM +1200 5/5/06, Sam Vilain wrote: Also, I don't agree with the notion of a "header" of each relation. It has a type for each tuple item, sure, but "header" just sounds like the sort of thing you want in a ResultSet, not a Relation. Sam. A relation's heading is essentially the definition of the relation's structure, and is not redundant if the relation has no tuples in it, which is valid. A comment I back-logged on #perl6 yesterday suggested that what I was proposing with relations is sort of a retread of what classes are, where each is defined in terms of zero or more named and typed attributes. I just want to weigh in that I see that as partly true (the same could also be said that it retreads what C structs are), and therefore perhaps some language constructs related to class definitions could be reused with relation definitions. Perhaps "Relation" in Perl 6 would best be a role and/or meta-class with factory methods that produce other classes with common interface elements, such as all the relational algebra methods, and/or objects each of which has its own class. Under this system, each time you perform an operation like a join, you are potentially making a new class, because its attributes are likely different than its predecessors. Frankly, with Perl 6 being what it is, there are probably numerous ways to implement what I'm looking for, and I don't exactly know what would be best. For now I'll just follow the simple path with a single Relation class in Pugs' ext/Relation/ directory for demonstration purposes and evaluate changes later. -- Darren Duncan
Re: using the newer collection types
Darren Duncan wrote: >>>Is there a reference for the meaning of these methods? >>> >>> >>There are many written references to these methods; just type >>"relational algebra" into Google. >> >> > >I will add that the first hit on such a search, the Wikipedia page on >relational algebra ( http://en.wikipedia.org/wiki/Relational_algebra >), is a perfectly good primer on what relational algebra is and what >its importance is. > > Thanks for the pointer. >While this may not actually change anything, I should point out that >every collection type can also be expressed in terms of a Relation >definition and/or they can all be implemented over a Relation (whose >members are actually always unique). For example: > >1. A Set of Any is a Relation with one Any attribute. >2. A Bag of N Any attributes is a Relation of N+1 attributes, where >the extra attribute is an Int (constrained >= 1) that counts >occurrances of the distinct other attributes. >3. A Mapping can be a Relation of 2 Any attributes. >4. A Hash is a Relation of 2 attributes, Str (key) and Any (value), >where the key has a unique constraint. >5. A Seq is a Relation of 2 attributes, typed Int (>= 0) and Any, >where the first shows their ordinal position and the second is the >actual value; the first has a unique constraint. >6. An Array is the same, assuming it is a sparse; if it is not >sparse, there is an additional constraint that the greatest Int value >is the same / one less than the count of Relation members. > > I don't know if anyone will care, but you can't emulate the raw "Collection" type with this fixed Relation type. That is, a collection of tuples, each of which may be of differing length and type. This is what leads me to think that Collection is the more generic role. I'm not saying Relations are not useful, perhaps they are more useful than Collections in the practical case, but they are a sub-type. Also, I don't agree with the notion of a "header" of each relation. It has a type for each tuple item, sure, but "header" just sounds like the sort of thing you want in a ResultSet, not a Relation. Sam.
Re: using the newer collection types
Actually, I'll add a few more things to my reply, which should be helpful ... At 5:11 PM -0700 5/4/06, Darren Duncan wrote: At 10:51 AM +1200 5/5/06, Sam Vilain wrote: >Moreover, the Relation type has these operators that the Set type doesn't have: rename(), project(), restrict(), extend(), join(), divide(), summarize(), group(), >ungroup(), wrap(), unwrap(), matching(), etc. Is there a reference for the meaning of these methods? There are many written references to these methods; just type "relational algebra" into Google. I will add that the first hit on such a search, the Wikipedia page on relational algebra ( http://en.wikipedia.org/wiki/Relational_algebra ), is a perfectly good primer on what relational algebra is and what its importance is. The article's introduction says: Relational algebra, an offshoot of first-order logic, is a set of relations closed under operators. Operators operate on one or more relations to yield a relation. Relational algebra is a part of computer science. Relation algebra in pure mathematics is an algebraic structure, relevant to mathematical logic and set theory. That article also explains many of the most important relational operators. Note that there is a related set of operators comprising relational calculus, and you can do everything in one that you can in the other, though less or more verbosely as the case may be. At 10:51 AM +1200 5/5/06, Sam Vilain wrote: I do think that a Pair should be a sub-type of a more general "Tuple" type, with the 'where' clause being { .items == 2 } or something like that. I think that the most flexible arrangement is to define; - a Collection as a Bag of Tuples - a Relation as a Collection where the tuples have a shape and no duplicate tuples are allowed (but Relation does not need to be a core type) Then, Mappings, Sequences, etc, become sub-types of one of the above two types. For instance, a sequence is a Collection of (Int, Any) where the first Int is unique across the collection. Similarly a Mapping is a Collection of (Any, Any) where Unique(0). something like role Tuple { has @.items }; role Collection { has Tuple @.tuples }; subset Pair of Tuple where { .items.items == 2 }; subset Bag of Collection where { ! .tuples.grep:{.items > 1 } } subset Set of Bag where { all( .tuples.map:{ .items } ) == one( .tuples.map:{ .items } ) } subset Mapping of Collection where { ! .tuples.grep:{ .items != 2 } } subset Array of Mapping where { .tuples.grep:{ .items[0].isa(Int) } } subset Hash of Mapping where { .tuples.grep:{ .items[0].does(Str) } } While this may not actually change anything, I should point out that every collection type can also be expressed in terms of a Relation definition and/or they can all be implemented over a Relation (whose members are actually always unique). For example: 1. A Set of Any is a Relation with one Any attribute. 2. A Bag of N Any attributes is a Relation of N+1 attributes, where the extra attribute is an Int (constrained >= 1) that counts occurrances of the distinct other attributes. 3. A Mapping can be a Relation of 2 Any attributes. 4. A Hash is a Relation of 2 attributes, Str (key) and Any (value), where the key has a unique constraint. 5. A Seq is a Relation of 2 attributes, typed Int (>= 0) and Any, where the first shows their ordinal position and the second is the actual value; the first has a unique constraint. 6. An Array is the same, assuming it is a sparse; if it is not sparse, there is an additional constraint that the greatest Int value is the same / one less than the count of Relation members. Suffice it to say that I'm sure you would implement a bag using some other type, whether a relation or a hash or an array, where the member is stored once with an occurrance count. -- Darren Duncan
Re: using the newer collection types
At 10:51 AM +1200 5/5/06, Sam Vilain wrote: >Moreover, the Relation type has these operators that the Set type doesn't have: rename(), project(), restrict(), extend(), join(), divide(), summarize(), group(), >ungroup(), wrap(), unwrap(), matching(), etc. Is there a reference for the meaning of these methods? >7. Can we add some operators to Mapping that are like the Relation ones, so that implementing a Relation over Mappings is easier (or, see the end of #8)? Eg, these would be useful: rename(), project(), extend(), join(). In particular, implementing join() into Mapping would help save CPU cycles: Again, a reference to a prototype of the behaviour would be useful. There are many written references to these methods; just type "relational algebra" into Google. That said, some of those search results may not explain things in the same way, so I specifically prefer the definitions in Date and Darwen's book "Databases, Types, and The Relational Model"; equivalent definitions are probably on the 'net' but I'm not sure where. I also defined part of the join() operator in my last email. How that definition, for Tuples, extends to a Relation is that you pair every tuple in each relation being joined to every tuple in each of the others, and apply my earlier definition with each pairing; the output Relation contains a tuple where the earlier definition returned a tuple, and no tuple where it returned undef. Alternately, if you want to wait a week, I will be coding up documented implementations as soon as possible within Pugs' ext/Relation/ dir. But in order for me to do this, I was needing some answers about the nature of existing types like Set and Mapping and Junction etc, which I asked in this email. Regarding your other comments: I don't think this terminology or these restrictions are particularly useful. I do think that a Pair should be a sub-type of a more general "Tuple" type, with the 'where' clause being { .items == 2 } or something like that. I think that the most flexible arrangement is to define; - a Collection as a Bag of Tuples - a Relation as a Collection where the tuples have a shape and no duplicate tuples are allowed (but Relation does not need to be a core type) Then, Mappings, Sequences, etc, become sub-types of one of the above two types. For instance, a sequence is a Collection of (Int, Any) where the first Int is unique across the collection. Similarly a Mapping is a Collection of (Any, Any) where Unique(0). You may be on to something here, but I'll withold comments for now. My main concerns with this whole Relation thing are that I want an efficient way in Perl 6 to represent what relation-land's concept of tuples and relations are, as well as an efficient implementations of relational algebra that are just as easy to use and fast in Perl 6 as are the language's other collection types such as Sets. If Perl 6 is to be the choice language of the singularity, it needs to be easy and fast to do any common type of work with it, and relational algebra is an extremely common kind of work being done. -- Darren Duncan
Re: using the newer collection types
Darren Duncan wrote: >Speaking a little more technically, a Relation has 2 main components, >its heading and its body. The heading is a set of 0..N keys (called >"attributes" in relation-land), and the body is a set of 0..N >Mappings (called "tuples" in relation-land), where they set of keys >of each Mapping is identical to the Relation's heading. Its very >likely that a language-embedded Relation implementation would >actually not repeat the keys for each member Mapping, but we can >conceptualize as if they were present for simplicity. > > I don't think this terminology or these restrictions are particularly useful. I do think that a Pair should be a sub-type of a more general "Tuple" type, with the 'where' clause being { .items == 2 } or something like that. I think that the most flexible arrangement is to define; - a Collection as a Bag of Tuples - a Relation as a Collection where the tuples have a shape and no duplicate tuples are allowed (but Relation does not need to be a core type) Then, Mappings, Sequences, etc, become sub-types of one of the above two types. For instance, a sequence is a Collection of (Int, Any) where the first Int is unique across the collection. Similarly a Mapping is a Collection of (Any, Any) where Unique(0). something like role Tuple { has @.items }; role Collection { has Tuple @.tuples }; subset Pair of Tuple where { .items.items == 2 }; subset Bag of Collection where { ! .tuples.grep:{.items > 1 } } subset Set of Bag where { all( .tuples.map:{ .items } ) == one( .tuples.map:{ .items } ) } subset Mapping of Collection where { ! .tuples.grep:{ .items != 2 } } subset Array of Mapping where { .tuples.grep:{ .items[0].isa(Int) } } subset Hash of Mapping where { .tuples.grep:{ .items[0].does(Str) } } The above should probably all be written in terms of parametric roles (see S12), but for now the above run-time checking versions should hopefully express the relationships between the core collection-like types as I see them. That sounds like it might bite, but you wouldn't normally access an Array as a Collection of (Int, Any), you'd access it as an Array, so you get the nice .post_circumfix:<[ ]> method that makes array access easy. You don't care that it has this higher order type as a parent class, and you certainly wouldn't care for the 'bare' Collection interface (as for one, you don't want to have to deal with the Integer keys). And it is probably all backed by native methods. I'm prototyping much of this using Moose in Perl 5, however Hubris is delaying its release :-) >Moreover, the Relation type has these >operators that the Set type doesn't have: rename(), project(), >restrict(), extend(), join(), divide(), summarize(), group(), >ungroup(), wrap(), unwrap(), matching(), etc. > Is there a reference for the meaning of these methods? >1. Are Sets or Junctions allowed to contain undefined elements? Can >undef be a key of a Mapping or Hash? > > undef.isa(Object), so you should be able to use it as you would any other object. I would definitely not think of it as the absence of a value in this context. >2. What actually is the practical distinction between a Set and a >Junction? Why would someone use one over the other? I recognize >that the use of Junctions is supposed to make parallelism easier, as >iterating through one is known to be order independent. But, >conceptually a Set and a Relation are exactly the same; you could >process their members in any order and/or in parallel as well. So is >the use of a Junction effectively like a compiler flag to make >certain kinds of Set ops faster at the expense of others? > > Well one side effect at the moment is that Junctions are immutable, whilst Sets are mutable. This is perhaps a deficiency in my original Set.pm design; all of the mutating functions should be in a seperate role, really (or just not be mutators). >6. Can I declare with named Set (or Junction) and Mapping typed >variables and/or parameters that their members are restricted to >particular types, such as Str, as I can with Arrays and Hashes, so >that Perl itself will catch violations? Eg, can I say as a parameter >"Set of Str :$heading?" or "Set of Mapping(Str) of Any :$body?" so >Perl will check that arguments are suchwise correct? > > These are variously called "Generics" (ada I think, Java 1.5+), "Parametric Types", "Higher Order Types" (Pierce et al), "Generic Algebraic Data Types" (Haskell) In Perl 6 they are parametric roles (as in S12 mentioned above) >7. Can we add some operators to Mapping that are like the Relation >ones, so that implementing a Relation over Mappings is easier (or, >see the end of #8)? Eg, these would be useful: rename(), project(), >extend(), join(). In particular, implementing join() into Mapping >would help save CPU cycles: > > Again, a reference to a prototype of the behaviour would be useful. Sam.
using the newer collection types
As I carry on in my spare time to implement a Relation type for Perl 6, I would like to use some of the simpler types that were added to the Synopsis recently and seem to lack a lot of explanatory details that older types have, and moreover they don't seem to be implemented yet in Pugs. So I have a few questions whose answers should clarify the intended meaning and features of these newer types, as well as the syntax for declaring them. Some relevant example types from Synopsis 6: Immutable types Objects with these types behave like values, i.e. C<$x === $y> is true if and only if their types and contents are identical. ListLazy Perl list (composed of Seq and Range parts) Seq Completely evaluated (hence immutable) sequence Range Incrementally generated (hence lazy) sequence Set Unordered Seqs that allow no duplicates JunctionSets with additional behaviours PairSeq of two elements that serves as an one-element Mapping Mapping Pairs with no duplicate keys Signature Function parameters (left-hand side of a binding) Capture Function call arguments (right-hand side of a binding) Mutable types Objects with these types have distinct C<.id> values. Array Perl array HashPerl hash The intended new Relation type could be described like this, if I correctly understand the meaning of the existing types: Immutable types RelationSet of Mappings where all Mappings have the same keys Speaking a little more technically, a Relation has 2 main components, its heading and its body. The heading is a set of 0..N keys (called "attributes" in relation-land), and the body is a set of 0..N Mappings (called "tuples" in relation-land), where they set of keys of each Mapping is identical to the Relation's heading. Its very likely that a language-embedded Relation implementation would actually not repeat the keys for each member Mapping, but we can conceptualize as if they were present for simplicity. The operations that you can do with a Relation are a proper super-set of those you can do with a Set. So, the Relation type supports all the same Set operators, with the same meanings, such as: equal(), subset(), superset(), union(), intersection(), difference(), symmetric_difference(), none(), any(), all(), member_exists(), members(), member_count(). Moreover, the Relation type has these operators that the Set type doesn't have: rename(), project(), restrict(), extend(), join(), divide(), summarize(), group(), ungroup(), wrap(), unwrap(), matching(), etc. Moreover, there would probably be convenience wrapper functions over combinations of the above operators such as insert(), update(), delete(), etc, though they aren't essential (those examples are not mutators, despite their name-sakes). Some extra operators like sort() would also be provided, which convert Relations to Seqs or Arrays. Now, some of the questions: 1. Are Sets or Junctions allowed to contain undefined elements? Can undef be a key of a Mapping or Hash? 2. What actually is the practical distinction between a Set and a Junction? Why would someone use one over the other? I recognize that the use of Junctions is supposed to make parallelism easier, as iterating through one is known to be order independent. But, conceptually a Set and a Relation are exactly the same; you could process their members in any order and/or in parallel as well. So is the use of a Junction effectively like a compiler flag to make certain kinds of Set ops faster at the expense of others? 3. Is a Signature like the keys of a Mapping but that it has extra stuff like associated types and such? Can one declare and use a Signature separately from declaring a function? 4. What is the syntax for declaring anonymous Sets and Mappings? I am already aware of these syntax for other types (correct me if I'm wrong): $a = [1,2,3]; # Array $b = {'x'=>2,'y'=>4}; # Hash $c = (1=>2); # Pair $d = (1,2,3); # Seq $e = 1..5;# Range $f = all(1,2,3); # Junction If this hasn't yet been decided, might I suggest the following?: $g = set(1,2,3); # Set $h = ('x'=>2,'y'=>4); # Mapping If that works, then perhaps an anonymous Relation declartion could look like: $r = relation( set( 'x', 'y' ): ('x'=>2,'y'=>4), ('x'=>5,'y'=>6) ); I'm not particular with the exact syntax; it just needs to be something good. Note that a terse form of this could leave out the heading declaration if at least one Mapping/tuple is provided, since that contains the same key list. $r = relation( ('x'=>2,'y'=>4), ('x'=>5,'y'=>6) ); Then the heading declaration is only needed if the Relation has no Mappings/tuples. $r = relation( set( 'x', 'y' ): ); 5. What is the syntax for subscripting or extracting Mapping components? Eg, can we use the same .keys, .values, .pai