Re: [DISCUSS] The Two Protocols of TP4

2019-04-29 Thread Marko Rodriguez
Hi,

> Currently users can send either bytecode or groovy scripts to be executed
> on the server. I'm saying we replace "groovy scripts evaluation" with
> "gremlin groovy traversal execution".

I concur. But why even send Gremlin-Groovy traversals? Just send bytecode.
- assuming we can get rid of lambdas

> In TP3, it's possible for the user to submit to the script engine something
> like "Thread.sleep(4000)" that will be executed inside a sandboxed vm.
> I'm proposing we get rid of this approach in TP4 and, as gremlin groovy
> script are still useful (for example, you can store a bunch of traversals
> to execute in a text file), we replace it with a language recognition
> engine that will parse what is sent and evaluate it, using a restricted
> grammar set. The variant for gremlin strings would still be groovy/java but
> the user won't be able to submit arbitrary groovy instructions.

Understood. Again, I would make this super simple by just sending bytecode.

One thing I’m pushing for is a “reference implementation server.” No more 
monolithic GremlinServer. The reference server has the following features:

- Sits on a socket waiting for bytecode.
- Executes bytecode and returns traversers.
- For distributed processors, can send traversers back to client from 
any machine in the cluster.

From this reference server, providers can extend it as they see fit. Perhaps 
someone wants to execute Groovy scripts!

- ScriptEngineStrategy
- ScriptEngineFlatMap
- [ex:script,groovy,Thread.sleep(1000)]

In other words, our reference implementation server is bare bones, rock solid, 
speedy, and safe. How the pieces are reassembled by the provider is up to them.
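
A minimal sketch of the bare-bones idea above, not the TP4 server: read one line of "bytecode", execute it, and write the resulting traversers back. Everything named here is an assumption made up for illustration: the class name, the comma-separated wire format, the two instructions (incr, isEven), and the seed traversers 1, 2, 3.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ReferenceServerSketch {

    // Hypothetical instruction set: "incr" adds one, "isEven" filters.
    static List<Long> execute(List<String> bytecode, List<Long> traversers) {
        List<Long> current = new ArrayList<>(traversers);
        for (String instruction : bytecode) {
            List<Long> next = new ArrayList<>();
            for (Long t : current) {
                switch (instruction) {
                    case "incr":
                        next.add(t + 1);
                        break;
                    case "isEven":
                        if (t % 2 == 0) next.add(t);
                        break;
                    default:
                        throw new IllegalArgumentException("Unknown instruction: " + instruction);
                }
            }
            current = next;
        }
        return current;
    }

    // Handles one request: a single line such as "incr,isEven".
    // The seed traversers 1, 2, 3 are hard-coded for the sake of the example.
    static void handle(Reader in, Writer out) {
        try {
            String line = new BufferedReader(in).readLine();
            List<String> bytecode = (line == null || line.trim().isEmpty())
                    ? Collections.<String>emptyList()
                    : Arrays.asList(line.trim().split(","));
            PrintWriter writer = new PrintWriter(out);
            for (Long t : execute(bytecode, Arrays.asList(1L, 2L, 3L))) {
                writer.println(t);
            }
            writer.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Sits on a socket waiting for bytecode; one request per connection.
    static void serve(int port) throws IOException {
        try (ServerSocket server = new ServerSocket(port)) {
            while (true) {
                try (Socket client = server.accept()) {
                    handle(new InputStreamReader(client.getInputStream()),
                           new OutputStreamWriter(client.getOutputStream()));
                }
            }
        }
    }

    // Runs one request through in-memory streams instead of a socket.
    public static void main(String[] args) {
        StringWriter out = new StringWriter();
        handle(new StringReader("incr,isEven\n"), out);
        System.out.print(out);
    }
}
```

Keeping handle() independent of the socket is the point of the sketch: a provider could wrap the same request handling in whatever transport they like.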

Thoughts?,
Marko.

http://rredux.com 



Re: The Fundamental Structure Instructions Already Exist! (w/ RDBMS Example)

2019-04-29 Thread Marko Rodriguez
Hey,

Check this out:


Machine machine = LocalMachine.open();
TraversalSource jdbc =
    Gremlin.traversal(machine).
        withProcessor(PipesProcessor.class).
        withStructure(JDBCStructure.class,
            Map.of(JDBCStructure.JDBC_CONNECTION, "jdbc:h2:/tmp/test"));

// Join people (x) to addresses (y) on name and print the resulting paths.
System.out.println(jdbc.db().values("people").as("x").
    db().values("addresses").as("y").has("name", __.path("x").by("name")).
    path("x", "y").toList());
System.out.println("\n\n");
// Same traversal, but print how each strategy rewrites the bytecode.
System.out.println(jdbc.db().values("people").as("x").
    db().values("addresses").as("y").has("name", __.path("x").by("name")).
    path("x", "y").explain().toList());


[[{NAME=marko, AGE=29}, {CITY=santa fe, NAME=marko}], [{NAME=josh, AGE=32}, 
{CITY=san jose, NAME=josh}]]


[Original                        [db, values(people)@x, db, values(addresses)@y, hasKeyValue(name,[path(x,[value(name)])]), path(x,y,|)]
JDBCStrategy                    [db(), values(people)@x, db(), values(addresses)@y, hasKeyValue(name,[path(x,[value(name)])]), path(x,y,|)]
JDBCQueryStrategy               [jdbc:sql(conn9: url=jdbc:h2:/tmp/test user=,x,y,SELECT x.*, y.* FROM people AS x, addresses AS y WHERE x.name=y.name)]
PipesStrategy                   [jdbc:sql(conn9: url=jdbc:h2:/tmp/test user=,x,y,SELECT x.*, y.* FROM people AS x, addresses AS y WHERE x.name=y.name)]
CoefficientStrategy             [jdbc:sql(conn9: url=jdbc:h2:/tmp/test user=,x,y,SELECT x.*, y.* FROM people AS x, addresses AS y WHERE x.name=y.name)]
CoefficientVerificationStrategy [jdbc:sql(conn9: url=jdbc:h2:/tmp/test user=,x,y,SELECT x.*, y.* FROM people AS x, addresses AS y WHERE x.name=y.name)]
---
Compilation                     [FlatMapInitial]
Execution Plan [PipesProcessor] [InitialStep[FlatMapInitial]]]





I basically look for a db.values.db.values.has-pattern in the bytecode and if I 
find it, I try and roll it into a single provider-specific instruction that 
does a SELECT query.
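
For readers who don't want to click through, here is a rough sketch of that kind of pattern match. It is not the linked JDBCQueryStrategy. The Instruction class, the "jdbc:sql" operator name, and the simplification that has() carries a single column name compared across both tables are all assumptions for illustration. Unlike the explain() output above, it leaves any trailing instructions (such as path) untouched.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class JdbcQueryStrategySketch {

    // Hypothetical, simplified instruction: operator, one argument, optional as()-label.
    static final class Instruction {
        final String op;
        final String arg;
        final String label;

        Instruction(String op, String arg, String label) {
            this.op = op;
            this.arg = arg;
            this.label = label;
        }

        @Override
        public String toString() {
            return op + "(" + (arg == null ? "" : arg) + ")" + (label == null ? "" : "@" + label);
        }
    }

    // If the bytecode starts with db, values(t1)@x, db, values(t2)@y, has(column),
    // replace those five instructions with a single SQL instruction.
    static List<Instruction> apply(List<Instruction> bytecode) {
        if (bytecode.size() < 5) return bytecode;
        Instruction db1 = bytecode.get(0);
        Instruction t1 = bytecode.get(1);
        Instruction db2 = bytecode.get(2);
        Instruction t2 = bytecode.get(3);
        Instruction has = bytecode.get(4);
        boolean matches = db1.op.equals("db") && db2.op.equals("db")
                && t1.op.equals("values") && t1.arg != null && t1.label != null
                && t2.op.equals("values") && t2.arg != null && t2.label != null
                && has.op.equals("has") && has.arg != null;
        if (!matches) return bytecode;

        // String concatenation is fine for a sketch; real code must not build SQL this way.
        String sql = "SELECT " + t1.label + ".*, " + t2.label + ".*"
                + " FROM " + t1.arg + " AS " + t1.label + ", " + t2.arg + " AS " + t2.label
                + " WHERE " + t1.label + "." + has.arg + "=" + t2.label + "." + has.arg;

        List<Instruction> rewritten = new ArrayList<>();
        rewritten.add(new Instruction("jdbc:sql", sql, null));
        rewritten.addAll(bytecode.subList(5, bytecode.size()));
        return rewritten;
    }

    public static void main(String[] args) {
        List<Instruction> bytecode = Arrays.asList(
                new Instruction("db", null, null),
                new Instruction("values", "people", "x"),
                new Instruction("db", null, null),
                new Instruction("values", "addresses", "y"),
                new Instruction("has", "name", null),
                new Instruction("path", "x,y", null));
        System.out.println(apply(bytecode));
    }
}
```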

Here is JDBCQueryStrategy (it's ghetto and error prone, but I just wanted to get 
the basic concept working):

https://github.com/apache/tinkerpop/blob/7142dc16d8fc81ad8bd4090096b42e5b9b1744f4/java/machine/structure/jdbc/src/main/java/org/apache/tinkerpop/machine/structure/jdbc/strategy/JDBCQueryStrategy.java
 

Here is SqlFlatMap (hyper-ghetto… but whateva’):

https://github.com/apache/tinkerpop/blob/7142dc16d8fc81ad8bd4090096b42e5b9b1744f4/java/machine/structure/jdbc/src/main/java/org/apache/tinkerpop/machine/structure/jdbc/function/flatmap/SqlFlatMap.java
 


Na na!,
Marko.

http://rredux.com 





Re: The Fundamental Structure Instructions Already Exist! (w/ RDBMS Example)

2019-04-29 Thread Marko Rodriguez
Hello Kuppitz,

> I don't think it's a good idea to keep this mindset for TP4; NULLs are too
> important in RDBMS. I don't know, maybe you can convince SQL people that
> dropping a value is the same as setting its value to NULL. It would work
> for you and me and everybody else who's familiar with Gremlin, but SQL
> people really love their NULLs….

Hmm……. I don’t like nulls. Perhaps with time a clever solution will emerge. 

> I'd prefer to just have special accessors for these. E.g. g.V().meta("id").
> At least valueMaps would then only have String-keys.
> I see the issue with that (naming collisions), but it's still better than
> the enums in my opinion (which became a pain when we started to implement
> GLVs).

So, TSymbols are not Java enums. They are simply a “primitive”-type that will 
have a serialization like:

symbol[id]

Meaning, that people can make up Symbols all day long without having to update 
serializers. How I see them working is that they are Strings prefixed with #.

g.V().outE()             <=>   g.V().values("#outE")
g.V().id()               <=>   g.V().value("#id")
g.V().hasLabel("person") <=>   g.V().has("#label","person")

Now that I type this out, perhaps we don’t even have a TSymbol-class. Instead, 
any String that starts with # is considered a symbol. Now watch this:

g.V().label()  <=>   g.V().value("#label")
g.V().labels() <=>   g.V().values("#label")

In this way, we can support Neo4j multi-labels as a Neo4jVertex’s #label-Key 
references a TSequence.

g.V(1).label() => TSequence
g.V(1).labels() => String, String, String, …
g.V(1).label().add("programmer")
g.V(1).label().drop("person")

So we could do “meta()”, but then you need respective “hasXXX”-meta() methods. 
I think #symbol is easiest .. ?
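
To make the "#-prefixed String" idea concrete, here is a small sketch over plain Java maps. It is not TP4 code: the class name, the helper methods, and the use of java.util.List as a stand-in for TSequence are all assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SymbolSketch {

    // Any key that starts with '#' is treated as a symbol (metadata), not data.
    static boolean isSymbol(String key) {
        return key.startsWith("#");
    }

    // value(key): whatever is stored under the key, possibly a whole sequence.
    static Object value(Map<String, Object> tuple, String key) {
        return tuple.get(key);
    }

    // values(key): unrolls a sequence into its elements; a single value becomes a one-element list.
    static List<Object> values(Map<String, Object> tuple, String key) {
        Object stored = tuple.get(key);
        if (stored == null) return Collections.emptyList();
        if (stored instanceof List) return new ArrayList<Object>((List<?>) stored);
        return Collections.singletonList(stored);
    }

    // Only the non-symbol keys, e.g. for a valueMap with plain String keys.
    static Set<String> dataKeys(Map<String, Object> tuple) {
        Set<String> keys = new LinkedHashSet<>();
        for (String key : tuple.keySet()) {
            if (!isSymbol(key)) keys.add(key);
        }
        return keys;
    }

    public static void main(String[] args) {
        Map<String, Object> v1 = new LinkedHashMap<>();
        v1.put("#id", 1);
        v1.put("#label", Arrays.asList("person", "programmer")); // multi-label
        v1.put("name", "marko");

        System.out.println(value(v1, "#label"));   // label(): the sequence itself
        System.out.println(values(v1, "#label"));  // labels(): its elements
        System.out.println(dataKeys(v1));          // symbols filtered out
    }
}
```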

> Also, what I'm wondering about now: Have you thought about Stored
> Procedures and Views in RDBMS? Views can be treated as tables, easy, but
> what about stored procedures? SPs can be found in many more DBMS, would be
> bad to not support them (or hack something ugly together later in the
> development process).

I’m not super versed in RDBMS technology. Can you please explain to me how to 
create a StoredProcedure and the range of outputs a StoredProcedure produces? 
From there, I can try and “Bytecode-ize” it.

Thanks Kuppitz,
Marko.

http://rredux.com 





Re: The Fundamental Structure Instructions Already Exist! (w/ RDBMS Example)

2019-04-29 Thread Marko Rodriguez
Hello Josh,

> A has("age",29), for example, operates at a different level of abstraction 
> than a
> has("city","Santa Fe") if "city" is a column in an "addresses" table.

So hasXXX() operators work on TTuples. Thus:

g.V().hasLabel('person').has('age',29)
g.V().hasLabel('address').has('city','Santa Fe')

…both work as a person-vertex and an address-vertex are TTuples. If these were 
tables, then:

jdbc.db().values('people').has('age',29)
jdbc.db().values('addresses').has('city','Santa Fe')

…also works as both people and addresses are TTables which extend 
TTuple.

In summary, if it's a TTuple, then hasXXX() is good to go.

// IGNORE UNTIL AFTER READING NEXT SECTION //
*** SIDENOTE: A TTable (which is a TSequence) could have Symbol-based metadata. 
Thus TTable.value(#label) -> “people.” If so, then
jdbc.db().hasLabel(“people”).has(“age”,29)

> At least, they
> are different if the data model allows for multi-properties,
> meta-properties, and hyper-edges. A property is something that can either
> be there, attached to an element, or not be there. There may also be more
> than one such property, and it may have other properties attached to it. A
> column of a table, on the other hand, is always there (even if its value is
> allowed to be null), always has a single value, and cannot have further
> properties attached.

1. Multi-properties.

Multi-properties work because if name references a TSequence, then it's the 
sequence that you analyze with has(). This is another reason why TSequence is 
important. It's a reference to a “stream” so there isn’t another layer of 
tuple-nesting.

// assume v[1] has name={marko,mrodriguez,markor}
g.V(1).value('name') => TSequence
g.V(1).values('name') => marko, mrodriguez, markor
g.V(1).has('name','marko') => v[1]

2. Meta-properties

// assume v[1] has name=[value:marko,creator:josh,timestamp:12303]
// i.e. a tuple value
g.V(1).value('name') => TTuple // doh!
g.V(1).value('name').value('value') => marko
g.V(1).value('name').value('creator') => josh

So things get screwy. However, it only gets screwy when you mix your 
“metadata” key/values with your “data” key/values. This is why I think TSymbols 
are important. Imagine the following meta-property tuple for v[1]:

[#value:marko,creator:josh,timestamp:12303]

If you do g.V(1).value('name'), we could look to the value indexed by the 
symbol #value, thus => “marko”.
If you do g.V(1).values('name'), you would get back a TSequence with a single 
TTuple being the meta property.
If you do g.V(1).values('name').value(), we could get the value indexed by the 
symbol #value.
If you do g.V(1).values('name').value('creator'), it will return the primitive 
string “josh”.

I believe that the following symbols should be recommended for use across all 
data structures.
#id, #label, #key, #value
…where id(), label(), key(), value() are tuple.get(Symbol). Other symbols for 
use with propertygraph/ include:
#outE, #inV, #inE, #outV, #bothE, #bothV
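
A small sketch of the #value lookup described above, using plain Java maps in place of TTuple. The class and method names are hypothetical, and the rule "if the stored value is a tuple with a #value key, value() returns that entry" is my reading of the proposal, not settled behavior.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MetaPropertySketch {

    // value(key): if the stored value is a tuple carrying a "#value" entry,
    // return that entry; otherwise return the stored value as-is.
    static Object value(Map<String, Object> tuple, String key) {
        Object stored = tuple.get(key);
        if (stored instanceof Map && ((Map<?, ?>) stored).containsKey("#value")) {
            return ((Map<?, ?>) stored).get("#value");
        }
        return stored;
    }

    // values(key): returns the meta-property tuple itself, unresolved,
    // so that its other keys (creator, timestamp) stay reachable.
    static List<Object> values(Map<String, Object> tuple, String key) {
        Object stored = tuple.get(key);
        if (stored == null) return Collections.emptyList();
        return Collections.singletonList(stored);
    }

    public static void main(String[] args) {
        Map<String, Object> name = new LinkedHashMap<>();
        name.put("#value", "marko");
        name.put("creator", "josh");
        name.put("timestamp", 12303);

        Map<String, Object> v1 = new LinkedHashMap<>();
        v1.put("name", name);

        System.out.println(value(v1, "name"));            // resolved through #value
        Map<?, ?> meta = (Map<?, ?>) values(v1, "name").get(0);
        System.out.println(meta.get("creator"));          // metadata stays reachable
    }
}
```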

> In order to simplify user queries, you can let has() and values() do double
> duty, but I still feel that there are lower-level operations at play, at a
> logical level even if not at a bytecode level. However, expressing a
> traversal in terms of its lowest-level relational operations may also be
> useful for query optimization.

One thing that I’m doing, that perhaps you haven’t caught onto yet, is that I’m 
not modeling everything in terms of “tables.” Each data structure is trying to 
stay as pure to its conceptual model as possible. Thus, there are no “joins” in 
property graphs as outE() references a TSequence<TEdge>, where TEdge is an 
interface that extends TTuple. You can just walk without doing any type of 
INNER JOIN. Now, if you model a property graph in a relational database, you 
will have to strategize the bytecode accordingly! Just a heads up in case you 
haven’t noticed that.

Thanks for your input,
Marko.

http://rredux.com 




Re: The Fundamental Structure Instructions Already Exist! (w/ RDBMS Example)

2019-04-29 Thread Daniel Kuppitz
>
> we don’t support ‘null' in TP


I don't think it's a good idea to keep this mindset for TP4; NULLs are too
important in RDBMS. I don't know, maybe you can convince SQL people that
dropping a value is the same as setting its value to NULL. It would work
for you and me and everybody else who's familiar with Gremlin, but SQL
people really love their NULLs

> TSymbol: like Ruby, I think we need “enum-like” symbols (e.g., #id, #label).


I'd prefer to just have special accessors for these. E.g. g.V().meta("id").
At least valueMaps would then only have String-keys.
I see the issue with that (naming collisions), but it's still better than
the enums in my opinion (which became a pain when we started to implement
GLVs).

Also, what I'm wondering about now: Have you thought about Stored
Procedures and Views in RDBMS? Views can be treated as tables, easy, but
what about stored procedures? SPs can be found in many more DBMS, would be
bad to not support them (or hack something ugly together later in the
development process).

Cheers,
Daniel



Re: The Fundamental Structure Instructions Already Exist! (w/ RDBMS Example)

2019-04-29 Thread Joshua Shinavier
Hi Marko,

I will respond in more detail tomorrow (I'm a late-night-thinking,
early-morning-writing kind of guy) but yes I think this is cool, so long as
we are not overloading the steps with different levels of abstraction.
A has("age", 29), for example, operates at a different level of abstraction
than a has("city", "Santa Fe") if "city" is a column in an "addresses" table.
At least, they are different if the data model allows for multi-properties,
meta-properties, and hyper-edges. A property is something that can either
be there, attached to an element, or not be there. There may also be more
than one such property, and it may have other properties attached to it. A
column of a table, on the other hand, is always there (even if its value is
allowed to be null), always has a single value, and cannot have further
properties attached. The same goes for values().

In order to simplify user queries, you can let has() and values() do double
duty, but I still feel that there are lower-level operations at play, at a
logical level even if not at a bytecode level. However, expressing a
traversal in terms of its lowest-level relational operations may also be
useful for query optimization.

Josh




The Fundamental Structure Instructions Already Exist! (w/ RDBMS Example)

2019-04-29 Thread Marko Rodriguez
Hi,

*** This email is primarily for Josh (and Kuppitz). However, if others are 
interested… ***

So I did a lot of thinking this weekend about structure/ and this morning, I 
prototyped both graph/ and rdbms/.

This is the way I’m currently thinking of things:

1. There are 4 base types in structure/.
        - Primitive: string, long, float, int, … (will constrain these
          at some point).
        - TTuple<K,V>: key/value map.
        - TSequence<V>: an iterable of V objects.
        - TSymbol: like Ruby, I think we need “enum-like” symbols
          (e.g., #id, #label).

2. Every structure has a “root.”
        - for graph it's TGraph implements TSequence
        - for rdbms it's a TDatabase implements TTuple

3. Roots implement Structure and thus, are what is generated by
   StructureFactory.mint().
        - defined using withStructure().
        - For graph, it's accessible via V().
        - For rdbms, it's accessible via db().

4. There is a list of core instructions for dealing with these base
   objects.
        - value(K key): gets the TTuple value for the provided key.
        - values(K key): gets an iterator of the value for the provided
          key.
        - entries(): gets an iterator of T2Tuple objects for the
          incoming TTuple.
        - hasXXX(A,B): various has()-based filters for looking into a
          TTuple and a TSequence
        - db()/V()/etc.: jump to the “root” of the withStructure()
          structure.
        - drop()/add(): behave as one would expect.



For RDBMS, we have three interfaces in rdbms/. 
(machine/machine-core/structure/rdbms)

1. TDatabase implements TTuple // the root structure 
that indexes the tables.
2. TTable implements TSequence> // a table is a sequence of rows
3. TRow implements TTuple> // a row has string column names

I then created a new project at (machine/structure/jdbc). The classes in here 
implement the above rdbms/ interfaces.

Here is an RDBMS session:

final Machine machine = LocalMachine.open();
final TraversalSource jdbc =
Gremlin.traversal(machine).
withProcessor(PipesProcessor.class).
withStructure(JDBCStructure.class, 
Map.of(JDBCStructure.JDBC_CONNECTION, "jdbc:h2:/tmp/test"));

System.out.println(jdbc.db().toList());
System.out.println(jdbc.db().entries().toList());
System.out.println(jdbc.db().value("people").toList());
System.out.println(jdbc.db().values("people").toList());
System.out.println(jdbc.db().values("people").value("name").toList());
System.out.println(jdbc.db().values("people").entries().toList());

This yields:

[]
[PEOPLE:]
[]
[, ]
[marko, josh]
[NAME:marko, AGE:29, NAME:josh, AGE:32]

The bytecode of the last query is:

[db(), values(people), entries]
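
To show how little machinery those three instructions need, here is an in-memory sketch of the same session using plain Java collections. It does not touch JDBC or the TP4 classes. The class name, the helper methods, and the uppercase lookup (H2 stores unquoted identifiers in upper case, which is why values("people") finds PEOPLE) are assumptions of the sketch.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

public class RdbmsSessionSketch {

    // Builds a row (a tuple with String column names) from key/value pairs.
    static Map<String, Object> row(Object... kv) {
        Map<String, Object> r = new LinkedHashMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) {
            r.put((String) kv[i], kv[i + 1]);
        }
        return r;
    }

    // db(): the root tuple, mapping table names to tables (sequences of rows).
    static Map<String, List<Map<String, Object>>> db() {
        Map<String, List<Map<String, Object>>> root = new LinkedHashMap<>();
        root.put("PEOPLE", Arrays.asList(
                row("NAME", "marko", "AGE", 29),
                row("NAME", "josh", "AGE", 32)));
        return root;
    }

    // values(table): the rows of the named table.
    static List<Map<String, Object>> values(Map<String, List<Map<String, Object>>> root, String table) {
        return root.getOrDefault(table.toUpperCase(Locale.ROOT), Collections.emptyList());
    }

    // value(column): one column value per incoming row.
    static List<Object> value(List<Map<String, Object>> rows, String column) {
        String key = column.toUpperCase(Locale.ROOT);
        return rows.stream().map(r -> r.get(key)).collect(Collectors.toList());
    }

    // entries(): every key/value pair of every incoming row, rendered as KEY:value.
    static List<String> entries(List<Map<String, Object>> rows) {
        return rows.stream()
                .flatMap(r -> r.entrySet().stream())
                .map(e -> e.getKey() + ":" + e.getValue())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(value(values(db(), "people"), "name"));
        System.out.println(entries(values(db(), "people")));
    }
}
```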

JDBCDatabase implements TDatabase, Structure. 
*** JDBCDatabase is the root structure and is referenced by db() *** 
(CRUCIAL POINT)

Assume another table called ADDRESSES with two columns: name and city.

jdbc.db().values("people").as("x").db().values("addresses").has("name",eq(path("x").by("name"))).value("city")

The above is equivalent to:

SELECT city FROM people,addresses WHERE people.name=addresses.name

If you want to do an inner join (a product), you do this:


jdbc.db().values("people").as("x").db().values("addresses").has("name",eq(path("x").by("name"))).as("y").path("x","y")

The above is equivalent to:

SELECT * FROM addresses INNER JOIN people ON people.name=addresses.name
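
Read step by step, the traversal above behaves like a nested-loop join. The sketch below spells that out over in-memory rows. It illustrates the semantics only, not how PipesProcessor or the JDBC strategy executes it. The class name and helpers are hypothetical, and the sample rows are borrowed from the follow-up explain() example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class JoinSketch {

    // Builds a row from key/value pairs.
    static Map<String, Object> row(Object... kv) {
        Map<String, Object> r = new LinkedHashMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) {
            r.put((String) kv[i], kv[i + 1]);
        }
        return r;
    }

    // Each result is a path [x, y] where x and y agree on the given column.
    static List<List<Map<String, Object>>> join(List<Map<String, Object>> people,
                                                List<Map<String, Object>> addresses,
                                                String column) {
        List<List<Map<String, Object>>> paths = new ArrayList<>();
        for (Map<String, Object> x : people) {             // values("people").as("x")
            for (Map<String, Object> y : addresses) {      // db().values("addresses").as("y")
                Object left = x.get(column);
                // has("name", eq(path("x").by("name"))); a missing value never matches
                if (left != null && left.equals(y.get(column))) {
                    paths.add(Arrays.asList(x, y));         // path("x","y")
                }
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> people = Arrays.asList(
                row("NAME", "marko", "AGE", 29), row("NAME", "josh", "AGE", 32));
        List<Map<String, Object>> addresses = Arrays.asList(
                row("CITY", "santa fe", "NAME", "marko"), row("CITY", "san jose", "NAME", "josh"));
        System.out.println(join(people, addresses, "NAME"));
    }
}
```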

NOTES:
        1. Instead of select(), we simply jump to the root via db() (or V()
           for graph).
        2. Instead of project(), we simply use value() or values().
        3. Instead of select() being overloaded with by() join syntax, we
           use has() and path().
                - like TP3 we will be smart about dropping path() data once
                  it's no longer referenced.
        4. We can also do LEFT and RIGHT JOINs (haven’t thought through FULL
           OUTER JOIN yet).
                - however, we don’t support ‘null’ in TP so I don’t know if
                  we want to support these null-producing joins?

LEFT JOIN:
* If an address doesn’t exist for the person, emit a “null”-filled path.

jdbc.db().values("people").as("x").
  db().values("addresses").as("y").
    choose(has("name",eq(path("x").by("name"))),
      identity(),
      path("y").by(null).as("y")).
  path("x","y")

SELECT * FROM addresses LEFT JOIN people ON people.name=addresses.name
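
For comparison, here is the one-line rule above ("if an address doesn’t exist for the person, emit a null-filled path") written as a plain loop. This is only a sketch of the intended result, under the assumption that the person side (x) is the one preserved. It is not a translation of the choose() bytecode, and the class and helper names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LeftJoinSketch {

    // Builds a row from key/value pairs.
    static Map<String, Object> row(Object... kv) {
        Map<String, Object> r = new LinkedHashMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) {
            r.put((String) kv[i], kv[i + 1]);
        }
        return r;
    }

    // Every person appears at least once; y is null when no address matches.
    static List<List<Map<String, Object>>> leftJoin(List<Map<String, Object>> people,
                                                    List<Map<String, Object>> addresses,
                                                    String column) {
        List<List<Map<String, Object>>> paths = new ArrayList<>();
        for (Map<String, Object> x : people) {
            boolean matched = false;
            for (Map<String, Object> y : addresses) {
                Object left = x.get(column);
                if (left != null && left.equals(y.get(column))) {
                    paths.add(Arrays.asList(x, y));
                    matched = true;
                }
            }
            if (!matched) {
                // the "null"-filled path for a person without an address
                paths.add(Arrays.<Map<String, Object>>asList(x, null));
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> people = Arrays.asList(
                row("NAME", "marko"), row("NAME", "josh"), row("NAME", "peter"));
        List<Map<String, Object>> addresses = Arrays.asList(
                row("CITY", "santa fe", "NAME", "marko"), row("CITY", "san jose", "NAME", "josh"));
        System.out.println(leftJoin(people, addresses, "NAME"));
    }
}
```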

RIGHT JOIN:

jdbc.db().values("people").as("x").
  db().values("addresses").as("y").
    choose(has("name",eq(path("x").by("name"))),
      identity(),
      path("x").by(null).as("x")).
  path("x","y")


SUMMARY:

There are no “low level” instructions. Everything is based on the standard 
instructions that we know and love. Finally, if not apparent, the above 
bytecode chunks would ultimately get strategized