Re: A Type System Core to mm-ADT

2019-05-26 Thread Marko Rodriguez
Hey,


> OK. I see the "referent" concept is broader than I had thought. They are
> not just pointers, but (paraphrasing) expressions awaiting evaluation. The
> "referent pattern" is more or less the type of the expression, i.e. the
> type of whatever the expression evaluates to.

Yes. As I see it.



Reference = Referent Pattern + Instruction Patterns + Referents
* Referent Pattern = The most specific description possible of 
the structure of the referents of the reference.
* Instruction Patterns = The result of streaming the referents 
through the instruction.
* Referents = The objects pointed to be the reference. 
Accessing them is a dereference (a manifestation).

Why are references more than just pointers to referents? Kuppitz and I are 
realizing that ?all? database optimization can be understood as techniques to 
avoid dereferencing. If the reference can tell the VM what to expect from a 
dereference, then certain instructions can avoid dereferencing the reference. 

Page 17 of the mm-ADT spec has a summary table of the subsequent subsections 
that show how foreign keys, schemas, indices, denormalizations, views, 
pipelines, and aggregations are all just variations on this theme.

Now, what I realized today what that there are two types of “referent patterns."
* The reference’s referent pattern describes what to expect (locally). 
// read-oriented
* The database’s type pattern describes what is legal (globally).   
 // write-oriented
I didn’t have the second and I need it. Enter a type system. We need a way to 
extend the mm-ADT’s type system. Enter you.


> For example, in Haskell
> notation:
> 
>  sum [1,2,3] :: Int
> 
> Here, "sum [1,2,3]" is the reference. The referent is something which has
> yet to be determined (the number 6). We know that the referent's type is
> Int, and we can type-check the expression to be verify that it will produce
> an Int. Another example:
> 
>  fmap (\n -> "number " ++ show n) $ filter (> 1) [1,2,3] :: [String]
> 
> Here, "fmap ... [1,2,3]" is the reference, and the referent is a list of
> strings: ["number 2","number 3”].

Bingo. So there are three types of “right hand sides” to an instruction pattern.

[instruction pattern]->result pattern (a description of what too expect)
[instruction pattern]->result object  (the actual result to expect — 
memoization-style)
[instruction pattern]->bytecode   (a different computation that should be 
executed)

…where the first is the second given a specific enough pattern ;). That is, so 
the pattern  is so specific that its an explicit object. And the last is 
something new that I realized this weekend with respects to writing, that might 
not be necessary….


> Instruction patterns seem like additional "referents" to me, with the
> difference that they are applied to objects, and that they are composed of
> concrete instructions.

Yes. Though not “concrete” instructions, but instruction patterns. For 
instance, what is a database hash-index over the name-attribute?

*person{name:@string, age:@int /  // referent pattern
  [has,name,eq,$x.@string]->*person{name:x, age:@int} }   // instruction pattern

The concrete instruction [has,name,eq,marko] will match the above instruction 
pattern and will return a reference to person objects whose name is now known 
and whose age is still just some integer.

> Instruction patterns seem like additional "referents" to me


I would argue that referent patterns are actually just instruction patterns as:

[value,name]->@string  // instruction pattern

is the same thing as

{name:@string} // referent pattern

…I don’t have an ultra-confident reason why references aren’t just instruction 
patterns, but its related to behavioral differences in reading/writing. More 
exploration is required.


> If a referent is nullary (has some type "a"), an
> instruction pattern seems unary (has some type "a->b", consuming an "a" and
> producing a "b"). But I need to grok more.

Sorta. The semantics of X->Y are:

“Given a matching X-instruction executed on the reference’s referents, 
the result is Y."

If Y is sufficient to solve the computation, the VM avoided dereferencing… and 
this is database optimization in a nutshell.

Marko.

http://rredux.com




> 
> 
> On Sun, May 26, 2019 at 6:51 PM Marko Rodriguez 
> wrote:
> 
>> Hello,
>> 
 [db][get,’people’] // *{name:@string, age:!gt(20)&!lt(33)}
 
 We have lost information about the “schema.” This is not good as
 compile-time write validation is not possible.
 
>>> 
>>> So far, I am thinking: yeah, dealing with schema changes can be tricky.
>> 
>> This is not a “schema change” but a greater specification of what the
>> referents are. Again, a reference is defined by its referent pattern (and
>> instruction patterns). The referent pattern is a description of the current
>> instances (referents) while the “schema” is a description 

Re: A Type System Core to mm-ADT

2019-05-26 Thread Joshua Shinavier
OK. I see the "referent" concept is broader than I had thought. They are
not just pointers, but (paraphrasing) expressions awaiting evaluation. The
"referent pattern" is more or less the type of the expression, i.e. the
type of whatever the expression evaluates to. For example, in Haskell
notation:

  sum [1,2,3] :: Int

Here, "sum [1,2,3]" is the reference. The referent is something which has
yet to be determined (the number 6). We know that the referent's type is
Int, and we can type-check the expression to be verify that it will produce
an Int. Another example:

  fmap (\n -> "number " ++ show n) $ filter (> 1) [1,2,3] :: [String]

Here, "fmap ... [1,2,3]" is the reference, and the referent is a list of
strings: ["number 2","number 3"].

Instruction patterns seem like additional "referents" to me, with the
difference that they are applied to objects, and that they are composed of
concrete instructions. If a referent is nullary (has some type "a"), an
instruction pattern seems unary (has some type "a->b", consuming an "a" and
producing a "b"). But I need to grok more.

Josh






On Sun, May 26, 2019 at 6:51 PM Marko Rodriguez 
wrote:

> Hello,
>
> >> [db][get,’people’] // *{name:@string, age:!gt(20)&!lt(33)}
> >>
> >> We have lost information about the “schema.” This is not good as
> >> compile-time write validation is not possible.
> >>
> >
> > So far, I am thinking: yeah, dealing with schema changes can be tricky.
>
> This is not a “schema change” but a greater specification of what the
> referents are. Again, a reference is defined by its referent pattern (and
> instruction patterns). The referent pattern is a description of the current
> instances (referents) while the “schema” is a description of what is legal
> for all instances (referents). Without “schema,” I lose compile-time
> validation.
>
> > I then create a people-key on the db map that maintains a
> person-reference.
> >>
> >
> > OK. I think by people-key you mean the primary key for the person type,
> > i.e. the vertex id. Correct me if I am wrong.
>
> No. db.get(‘people’) is the "people table.” RDBMSs are modeled as a map
> with the keys being the table names and the values being *{:} references to
> maps (i.e. rows).
>
> > I see this as a type plus a constraint. And... you won't be surprised to
> > hear me say this... you express it with a select statement:
> >
> >youngishPeople := σ_{age <= 20 ∧ age >= 33}(people)
>
> Well, type definitions like this won’t happen at runtime. The VM will just
> be able to tell you if the range has been restricted. It won’t create new
> types. But yea, referent patterns are (now) a type plus a constraint.
> Before, they were just constraints and that is why I lost schema
> information at runtime.
>
> Take care,
> Marko.
>
> http://rredux.com
>
>


Re: A Type System Core to mm-ADT

2019-05-26 Thread Marko Rodriguez
Hello,

>> [db][get,’people’] // *{name:@string, age:!gt(20)&!lt(33)}
>> 
>> We have lost information about the “schema.” This is not good as
>> compile-time write validation is not possible.
>> 
> 
> So far, I am thinking: yeah, dealing with schema changes can be tricky.

This is not a “schema change” but a greater specification of what the referents 
are. Again, a reference is defined by its referent pattern (and instruction 
patterns). The referent pattern is a description of the current instances 
(referents) while the “schema” is a description of what is legal for all 
instances (referents). Without “schema,” I lose compile-time validation.

> I then create a people-key on the db map that maintains a person-reference.
>> 
> 
> OK. I think by people-key you mean the primary key for the person type,
> i.e. the vertex id. Correct me if I am wrong.

No. db.get(‘people’) is the "people table.” RDBMSs are modeled as a map with 
the keys being the table names and the values being *{:} references to maps 
(i.e. rows).

> I see this as a type plus a constraint. And... you won't be surprised to
> hear me say this... you express it with a select statement:
> 
>youngishPeople := σ_{age <= 20 ∧ age >= 33}(people)

Well, type definitions like this won’t happen at runtime. The VM will just be 
able to tell you if the range has been restricted. It won’t create new types. 
But yea, referent patterns are (now) a type plus a constraint. Before, they 
were just constraints and that is why I lost schema information at runtime.

Take care,
Marko.

http://rredux.com 



Re: A Type System Core to mm-ADT

2019-05-26 Thread Marko Rodriguez
Hello,

>> [db][get,’people’] // *{name:@string, age:!gt(20)&!lt(33)}
>> 
>> We have lost information about the “schema.” This is not good as
>> compile-time write validation is not possible.
>> 
> 
> So far, I am thinking: yeah, dealing with schema changes can be tricky.

This is not a “schema change” but a greater specification of what the referents 
are. Again, a reference is defined by its referent pattern (and instruction 
patterns). The referent pattern is a description of the current instances 
(referents) while the “schema” is a description of what is legal for all 
instances (referents). Without “schema,” I lose compile-time validation.

> I then create a people-key on the db map that maintains a person-reference.
>> 
> 
> OK. I think by people-key you mean the primary key for the person type,
> i.e. the vertex id. Correct me if I am wrong.

No. db.get(‘people’) is the "people table.” RDBMSs are modeled as a map with 
the keys being the table names and the values being *{:} references to maps 
(i.e. rows).

> I see this as a type plus a constraint. And... you won't be surprised to
> hear me say this... you express it with a select statement:
> 
>youngishPeople := σ_{age <= 20 ∧ age >= 33}(people)

Well, type definitions like this won’t happen at runtime. The VM will just be 
able to tell you if the range has been restricted. It won’t create new types. 
But yea, referent patterns are (now) a type plus a constraint. Before, they 
were just constraints and that is why I lost schema information at runtime.

Take care,
Marko.

http://rredux.com



Re: A Type System Core to mm-ADT

2019-05-26 Thread Joshua Shinavier
Hi Marko,

Ryan Wisnesky, Jan Hidders, and I have been jamming on the Algebraic
Property Graphs paper, which formalizes that type system. It's at 17 pages;
I hope to have it in a sharable state tomorrow after I add an intro
section. Looking forward to your feedback. We have delved into the TP4
virtual machine specifically, as it is in flux until the spec comes out,
but Ryan has laid out a strategy for bridging the gap between the type
system and programming environments like Java, Python, CQL. I believe the
same approach can be used for mm-ADT.

Responses inline.



On Sun, May 26, 2019 at 4:34 PM Marko Rodriguez 
wrote:

> Hi,
>
> *** This email is primarily for Josh (and Kuppitz).
>
> I think we need Josh's type system core to mm-ADT. I’m facing the problem
> of having to model both a “range” (legal schema) and a “codomain” (current
> schema) of the referents of a reference. Let me explain with an example.
>
> Suppose that there is an SQL table called ‘people’ and the table is empty.
> When I mm-ADT serves up a this table, it looks like this in mm-ADT:
>
>
> [db][get,’people’] // *{name:@string, age:@int}
>
> This says that ‘people’ is a pointer to maps containing a name-key with a
> string value and an age-key with an integer value.
>
> Now lets say I insert some rows into this table. Now, according to the
> mm-ADT spec, every reference must have as much information as possible
> about the referents. Thus, the people-reference pattern can change. Lets
> assume it does and it now is:
>
> [db][get,’people’] // *{name:@string, age:!gt(20)&!lt(33)}
>
> We have lost information about the “schema.” This is not good as
> compile-time write validation is not possible.
>

So far, I am thinking: yeah, dealing with schema changes can be tricky.


Thus, I want to make a distinction between “range” and “codomain”. Here is
> some bytecode:
>
> [db][define,’person’,{name:@string,age:@int}]
> [db][create,’people’,*person]
>
> I define a type called person, where all such instances must match the
> respective map-pattern.

I then create a people-key on the db map that maintains a person-reference.
>

OK. I think by people-key you mean the primary key for the person type,
i.e. the vertex id. Correct me if I am wrong.


Now:
>
> [db][add,’people’,{name:marko,age:29}]
> [db][add,’people’,{name:josh,age:32}]
> ...
> [db][get,’people']// *person{name:@string, age:!gt(20)&!lt(33)}
> [db][type,’person']   // {name:@string, age:@int}
>
> Thus, when I get the reference at people, I see the “codomain” of current
> person referents, but when I get the person-type, I get the “range” of
> legal person referents.
>

I see this as a type plus a constraint. And... you won't be surprised to
hear me say this... you express it with a select statement:

youngishPeople := σ_{age <= 20 ∧ age >= 33}(people)



> In this way, “types” become central to mm-ADT, where schema is crucial in
> specifying a referent range.
>

As you know, I would like to see types become central. With types come the
possibility of type inference, as well as easier integration with the type
systems of other frameworks. Anything which conforms to the simply typed
lambda calculus, including extensions to product and coproduct types, will
be in the native language of the TP4 VM.



> —I have more to say on the necessity of multi-types (union of types) and
> their role in pattern definitions.
>

I think you agree that with both products and coproducts in the type
system, we can support functional pattern matching in the style of Haskell
or Scala, which is a powerful feature that we just couldn't take advantage
of in  TP1..3, for lack of a strong type system.

Josh



>
> Thoughts?,
> Marko.
>
> http://rredux.com <http://rredux.com/>
>
>
>
>
>


A Type System Core to mm-ADT

2019-05-26 Thread Marko Rodriguez
Hi,

*** This email is primarily for Josh (and Kuppitz).

I think we need Josh's type system core to mm-ADT. I’m facing the problem of 
having to model both a “range” (legal schema) and a “codomain” (current schema) 
of the referents of a reference. Let me explain with an example.

Suppose that there is an SQL table called ‘people’ and the table is empty. When 
I mm-ADT serves up a this table, it looks like this in mm-ADT:


[db][get,’people’] // *{name:@string, age:@int}

This says that ‘people’ is a pointer to maps containing a name-key with a 
string value and an age-key with an integer value.

Now lets say I insert some rows into this table. Now, according to the mm-ADT 
spec, every reference must have as much information as possible about the 
referents. Thus, the people-reference pattern can change. Lets assume it does 
and it now is:

[db][get,’people’] // *{name:@string, age:!gt(20)&!lt(33)}

We have lost information about the “schema.” This is not good as compile-time 
write validation is not possible.

Thus, I want to make a distinction between “range” and “codomain”. Here is some 
bytecode:

[db][define,’person’,{name:@string,age:@int}]
[db][create,’people’,*person]

I define a type called person, where all such instances must match the 
respective map-pattern.
I then create a people-key on the db map that maintains a person-reference.

Now:

[db][add,’people’,{name:marko,age:29}]
[db][add,’people’,{name:josh,age:32}]
...
[db][get,’people']// *person{name:@string, age:!gt(20)&!lt(33)}
[db][type,’person']   // {name:@string, age:@int}

Thus, when I get the reference at people, I see the “codomain” of current 
person referents, but when I get the person-type, I get the “range” of legal 
person referents.

In this way, “types” become central to mm-ADT, where schema is crucial in 
specifying a referent range.

—I have more to say on the necessity of multi-types (union of types) and their 
role in pattern definitions.

Thoughts?,
Marko.

http://rredux.com <http://rredux.com/>