More emails from Marko. Yes!

2019-04-23 Thread Marko Rodriguez
Hi,

The parallel Josh/Marko/Pieter thread got me thinking… So, given

ComplexType
Iterator siblings(String label)
Iterator children(String label)

…let’s see how both structure and processor providers can influence each other 
within the TP4 VM.

Let’s take JanusGraph as the example structure.

JanusVertex implements ComplexType

Let’s take Akka as the example processor. AkkaProcessor can document:

“If you want query routing functionality for your ComplexTypes, provide 
an akka:location child reference.”

The JanusGraph team plans to provide AkkaProcessor support so they do as asked.

janusVertex.children(“akka:location”) => 127.0.2.2

This is the physical location of the vertex in JanusGraph’s underlying 
Cassandra/HBase/etc. cluster. Now, an AkkaProviderStrategy can do the following:

g.V().has(’name’,’marko’).out(‘knows’).asMap()
==strategizesTo==>
g.V().has(’name’,’marko’).out(‘knows’).akka:route().asMap()

akka:route() is a provider-specific instruction that looks at the incoming 
object and checks whether it has an akka:location child reference. If it does, 
it teleports the traverser to that machine for the final asMap() execution 
(i.e. data-local query routing). Why pull a bunch of map data over the wire 
when you can send the traverser to the hosting machine and populate the map 
there?
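
The routing decision described above can be sketched in a few lines of Java. 
This is only an illustration under assumptions: the email leaves the Iterator 
element type open (Object is used here), SketchVertex and RouteSketch are 
hypothetical names, and the actual "teleport" over Akka is not shown, only the 
check for an akka:location child reference.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// The two-method interface from the email; element types are assumed to be Object.
interface ComplexType {
    Iterator<Object> siblings(String label);
    Iterator<Object> children(String label);
}

// Hypothetical stand-in for a provider vertex that keeps its child references in a map.
class SketchVertex implements ComplexType {
    private final Map<String, List<Object>> childRefs;

    SketchVertex(Map<String, List<Object>> childRefs) {
        this.childRefs = childRefs;
    }

    @Override
    public Iterator<Object> siblings(String label) {
        return Collections.emptyIterator();
    }

    @Override
    public Iterator<Object> children(String label) {
        return childRefs.getOrDefault(label, Collections.emptyList()).iterator();
    }
}

class RouteSketch {
    // Returns the host the traverser should move to, or empty when the incoming
    // object has no akka:location child reference (i.e. keep executing locally).
    static Optional<String> routeTarget(Object incoming) {
        if (!(incoming instanceof ComplexType)) {
            return Optional.empty();
        }
        Iterator<Object> locations = ((ComplexType) incoming).children("akka:location");
        return locations.hasNext()
                ? Optional.of(String.valueOf(locations.next()))
                : Optional.empty();
    }
}
```

The processor never needs to know that the object is a JanusGraph vertex; it 
only asks for a labeled child reference and falls back to local execution when 
there is none.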

——

We have always talked about providers being able to have custom instructions 
(inserted via provider-specific strategies). What we haven’t discussed, and what 
I bring up here, is the idea that processor providers can require/recommend/etc. 
that structure providers expose certain reference types that they can 
capitalize on.

Thus, providers interact with other providers within the TP4 VM via:

1. Custom bytecode instructions (process interaction).
2. Custom type references (structure interaction).

Bye,
Marko.

http://rredux.com 






Re: What makes 'graph traversals' and 'relational joins' the same?

2019-04-23 Thread Marko Rodriguez
Hi,

I think we are very close to something usable for TP4 structure/. Solving this 
problem elegantly will open the floodgates on tp4/ development.

——

I still don’t grok your comeFrom().goto() stuff. I don’t get the benefit of 
having two instructions for “pointer chasing” instead of one.

Let’s put that aside for now and turn to modeling a Vertex. Go back to my 
original representation:

vertex.goto(‘label’)
vertex.goto(‘id’)
vertex.goto(‘outE’)
vertex.goto(‘inE’)
vertex.goto(‘properties’)

Any object can be converted into a Map. In TinkerPop3 we convert vertices into 
maps via:

g.V().has(‘name’,’marko’).valueMap() => {name:marko,age:29}
g.V().has(‘name’,’marko’).valueMap(true) => 
{id:1,label:person,name:marko,age:29}

In the spirit of instruction reuse, we should have an asMap() instruction that 
works for ANY object. (As an aside: this gets back to ONLY sending primitives 
over the wire, no Vertex/Edge/Document/Table/Row/XML/ColumnFamily/etc.) Thus, 
the above is:

g.V().has(‘name’,’marko’).properties().asMap() => {name:marko,age:29}
g.V().has(‘name’,’marko’).asMap() => 
{id:1,label:person,properties:{name:marko,age:29}}

You might ask, why didn’t it go to outE and inE and map-ify that data? Because 
those are “sibling” references, not “children” references. 

goto(‘outE’) is a “sibling” reference. (a vertex does not contain an edge)
goto(‘id’) is a “child” reference. (a vertex contains the id)

Where do we find sibling references?
Graphs: vertices don’t contain each other.
OO heaps: many objects don’t contain each other.
RDBMS: rows are linked by joins, but don’t contain each other.

So, the way in which we structure our references (pointers) determines the 
shape of the data and ultimately how different instructions will behave. We 
can’t assume that asMap() knows anything about 
vertices/edges/documents/rows/tables/etc. It will simply walk all 
child-references and create a map.
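
To make the child/sibling distinction concrete, here is a minimal Java sketch 
of an asMap() that only walks child references. It rests on assumptions: the 
email does not say how labels are enumerated, so a hypothetical childLabels() 
method is added, and Node is an invented in-memory ComplexType used purely for 
illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

interface ComplexType {
    Iterator<Object> children(String label);
    Iterator<Object> siblings(String label);
    // Assumption: a way to enumerate child labels; not part of the email's interface.
    Set<String> childLabels();
}

// Hypothetical in-memory ComplexType used only for this illustration.
class Node implements ComplexType {
    private final Map<String, List<Object>> childRefs = new LinkedHashMap<>();
    private final Map<String, List<Object>> siblingRefs = new LinkedHashMap<>();

    Node child(String label, Object value) {
        childRefs.computeIfAbsent(label, k -> new ArrayList<>()).add(value);
        return this;
    }

    Node sibling(String label, Object value) {
        siblingRefs.computeIfAbsent(label, k -> new ArrayList<>()).add(value);
        return this;
    }

    @Override
    public Iterator<Object> children(String label) {
        return childRefs.getOrDefault(label, Collections.emptyList()).iterator();
    }

    @Override
    public Iterator<Object> siblings(String label) {
        return siblingRefs.getOrDefault(label, Collections.emptyList()).iterator();
    }

    @Override
    public Set<String> childLabels() {
        return childRefs.keySet();
    }
}

class AsMapSketch {
    // Walks child references recursively; sibling references are never followed.
    static Map<String, Object> asMap(ComplexType object) {
        Map<String, Object> map = new LinkedHashMap<>();
        for (String label : object.childLabels()) {
            List<Object> values = new ArrayList<>();
            Iterator<Object> it = object.children(label);
            while (it.hasNext()) {
                Object child = it.next();
                values.add(child instanceof ComplexType ? asMap((ComplexType) child) : child);
            }
            // A single reference is unwrapped; several references stay a list.
            map.put(label, values.size() == 1 ? values.get(0) : values);
        }
        return map;
    }
}
```

A vertex built with id, label, and properties as children and outE as a 
sibling maps to {id=1, label=person, properties={name=marko, age=29}}; the 
edge never shows up because asMap() has no notion of vertices or edges, only 
of child references.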

We don’t want TP to get involved in “complex data types.” We don’t care. You 
can propagate MyDatabaseObject through the TP4 VM pipeline and load your object 
up with methods for optimizations with your DB and all that, but for TP4, your 
object just needs to implement:

ComplexType
- Iterator children(String label)
- Iterator siblings(String label)
- default Iterator references(String label) { return IteratorUtils.concat(children(label), siblings(label)); }
- String toString()

When a ComplexType goes over the wire to the user, it is just represented as a 
ComplexTypeProxy with a toString() like v[1], 
tinkergraph[vertices:10,edges:34], etc. All references are disconnected. Yes, 
even children references. We do not want language drivers having to know about 
random object types and having to deal with implementing serializers and all 
that nonsense. The TP4 serialization protocol is primitives, maps, lists, 
bytecode, and traversers. That’s it!

*** Only Maps and Lists (that don’t contain complex data types) maintain their 
child references “over the wire.”

——

I don’t get your hypergraph example, so let me try another example:

tp ==member==> marko, josh

TP is a vertex and there is a directed hyperedge with label “member” connecting 
to marko and josh vertices.

tp.goto(“outE”).filter(goto(“label”).is(“member”)).goto(“inV”)

Looks exactly like a property graph query? However, it’s not, because 
goto(“inV”) returns 2 vertices, not 1. EdgeVertexFlatmapFunction works for 
property graphs and hypergraphs. It doesn’t care; it just follows goto() 
pointers! That is, it follows the ComplexType.references(“inV”). 
Multi-properties are the same as well. Likewise for meta-properties. These data 
model variations are not “special” to the TP4 VM. It just walks references 
whether there are 0, 1, 2, or N of them.

Thus, what is crucial to all this is the “shape of the data”: using your 
pointers wisely so that instructions produce useful results.

Does any of what I wrote update your comeFrom().goto() stuff? If not, can you 
please explain to me why comeFrom() is cool? Sorry for being dense (aka “being 
Kuppitz”; that’s right, I said it. Boom!).

Thanks,
Marko.

http://rredux.com 





[Article] Pull vs. Push-Based Loop Fusion in Query Engines

2019-04-23 Thread Marko Rodriguez
Hello,

I just read this article:

Push vs. Pull-Based Loop Fusion in Query Engines
https://arxiv.org/abs/1610.09166 

It is a really good read if you are interested in TP4. Here are some notes I 
jotted down:

1. Pull-based engines are inefficient when there are lots of filter() steps.
    - They require a while(!predicate.test(next())) loop, which introduces 
      branch flow control and subsequent JVM performance issues.
    - Push-based engines simply don’t emit() if predicate.test() is false. 
      Thus, there is no loop to branch on.
2. Pull-based engines are better at limit()-based queries.
    - They only process what is necessary to satisfy the limit.
    - Push-based engines will produce more results than needed given their 
      eager evaluation strategy (backpressure comes into play).
3. We should introduce a "collection()" operator in TP4 for better 
   expressivity with list and map manipulation, so we don’t have to use 
   unfold()…fold().
    - [9,11,13].collection(incr().is(gt(10))) => [12,14]
    - The ability to chain functions in a collection manipulation sequence 
      is crucial for performance, as you don’t create intermediate 
      collections.
4. Given that some bytecode runs best on a push-based engine and some on a 
   pull-based one, we can strategize for this accordingly.
    - We have Pipes for pull-based.
    - We have RxJava for push-based.
    - We can even isolate sub-sections of a flow. For instance:
        g.V().has(‘age’,gt(10)).out(‘knows’).limit(10)
            ==>becomes
        g.V().has(‘age’,gt(10)).local(out(‘knows’).limit(10))
      where the local(bytecode) (TP3-style) is executed by Pipes and the 
      root bytecode by RxJava.
5. They have lots of good tips for writing performant JVM 
   operators/steps/functions.
    - All their work is done in Scala.
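
The contrast in note 1 can be sketched in plain Java. This is not code from 
the article or from TP4; FilterSketch and its method names are hypothetical, 
and the sketch only shows where the loop lives in each style, without making 
any performance claim.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Consumer;
import java.util.function.Predicate;

class FilterSketch {
    // Pull-based: the consumer asks for the next element, so the filter has to
    // loop over its upstream until an element passes the predicate.
    static <T> Iterator<T> pullFilter(Iterator<T> upstream, Predicate<T> predicate) {
        return new Iterator<T>() {
            private T buffered;
            private boolean ready;

            @Override
            public boolean hasNext() {
                while (!ready && upstream.hasNext()) {
                    T candidate = upstream.next();
                    if (predicate.test(candidate)) {
                        buffered = candidate;
                        ready = true;
                    }
                }
                return ready;
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                ready = false;
                return buffered;
            }
        };
    }

    // Push-based: the producer hands each element downstream; a failed predicate
    // simply means nothing is emitted for that element, so no loop is needed here.
    static <T> Consumer<T> pushFilter(Predicate<T> predicate, Consumer<T> downstream) {
        return element -> {
            if (predicate.test(element)) {
                downstream.accept(element);
            }
        };
    }
}
```

In the pull version the loop and the buffered element are part of the filter 
itself; in the push version the only loop is the one in the producer that 
drives the whole chain.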

Enjoy!,
Marko.

http://rredux.com 






Re: What makes 'graph traversals' and 'relational joins' the same?

2019-04-23 Thread Joshua Shinavier
On Tue, Apr 23, 2019 at 5:14 AM Marko Rodriguez 
wrote:

> Hey Josh,
>
> This gets to the notion I presented in “The Fabled GMachine.”
> http://rredux.com/the-fabled-gmachine.html (first paragraph of
> “Structures, Processes, and Languages” section)
>
>  All that exists are memory addresses that contain either:
>
> 1. A primitive
> 2. A set of labeled references to other references or primitives.
>
> Using your work and the above, here is a super low-level ‘bytecode' for
> property graphs.
>
> v.goto("id") => 1
>

LGTM. An id is special because it is uniquely identifying / is a primary
key for the element. However, it is also just a field of the element, like
"in"/"inV" and "out"/"outV" are fields of an edge. As an aside, an id would
only really need to be unique among other elements of the same type. To the
above, I would add:

v.type() => Person

...a special operation which takes you from an element to its type. This is
important if unions are supported; e.g. "name" in my example can apply
either to a Person or a Project.


> v.goto("label") => person
>

Or that. Like "id", "type"/"label" is special. You can think of it as a
field; it's just a different sort of field which will have the same value
for all elements of any given type.



> v.goto("properties").goto("name") => "marko"
>

OK, properties. Are properties built-in as a separate kind of thing from
edges, or can we treat them the same as vertices and edges here? I think we
can treat them the same. A property, in the algebraic model I described
above, is just an element with two fields, the second of which is a
primitive value. As I said, I think we need two distinct traversal
operations -- projection and selection -- and here is where we can use the
latter. Here, I will call it "comeFrom".

v.comeFrom("name", "out").goto("in") => {"marko"}

You can think of this comeFrom as a special case of a select() function
which takes a type -- "name" -- and a set of key/value pairs {("out", v)}.
It returns all matching elements of the given type. You then project to the
"in" value using your goto. I wrote {"marko"} as a set, because comeFrom
can give you multiple properties, depending on whether multi-properties are
supported.

Note how similar this is to an edge traversal:

v.comeFrom("knows", "out").goto("in") => {v[2], v[4]}

Of course, you could define "properties" in such a way that a
goto("properties") does exactly this under the hood, but in terms of low
level instructions, you need something like comeFrom.
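
A small Java sketch may help show why comeFrom is a selection and goto is a 
projection. Everything here is an assumption made for illustration: Element, 
the flat list used as a store, and the method names are hypothetical, and goTo 
is spelled that way because goto is a reserved word in Java.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class ComeFromSketch {
    // An element is a typed record of named fields (hypothetical representation).
    static final class Element {
        final String type;
        final Map<String, Object> fields;

        Element(String type, Map<String, Object> fields) {
            this.type = type;
            this.fields = fields;
        }
    }

    // Selection: all elements of the given type whose named field points at `from`.
    static List<Element> comeFrom(List<Element> store, Object from, String type, String field) {
        return store.stream()
                .filter(e -> e.type.equals(type) && from.equals(e.fields.get(field)))
                .collect(Collectors.toList());
    }

    // Projection: follow one named field of each element.
    static List<Object> goTo(List<Element> elements, String field) {
        return elements.stream()
                .map(e -> e.fields.get(field))
                .collect(Collectors.toList());
    }
}
```

With a "name" element and two "knows" elements whose "out" field is the same 
vertex, the property lookup and the edge traversal are the same two calls with 
a different type argument, which is the similarity pointed out above.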


> v.goto("properties").goto("name").goto(0) => "m"
>

This is where the notion of optionals becomes handy. You can make
array/list indices into fields like this, but IMO you should also make them
safe. E.g. borrowing Haskell syntax for a moment:

v.goto("properties").goto("name").goto(0) => Just 'm'

v.goto("properties").goto("name").goto(5) => Nothing
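
In Java, the Just/Nothing distinction above maps naturally onto 
java.util.Optional. The helper below is a hypothetical illustration of a safe 
index lookup on a string value, not a proposed TP4 API.

```java
import java.util.Optional;

class SafeGotoSketch {
    // Index-as-field lookup that cannot fail: out-of-range indices yield empty.
    static Optional<Character> gotoIndex(String value, int index) {
        return (index >= 0 && index < value.length())
                ? Optional.of(value.charAt(index))
                : Optional.empty();
    }
}
```

Here gotoIndex("marko", 0) plays the role of Just 'm', and gotoIndex("marko", 5) 
plays the role of Nothing.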


> v.goto("outE").goto("inV") => v[2], v[4]
>

I am not a big fan of untyped "outE", but you can think of this as a union
of all v.comeFrom(x, "out").goto("in"), where x is any edge type. Only
"knows" and "created" are edge types which are applicable to "Person", so
you will only get {v[2], v[4]}. If you want to get really crazy, you can
allow x to be any type. Then you get {v[2], v[4], 29, "marko"}.



> g.goto("V").goto(1) => v[1]
>

That, or you give every element a virtual field called "graph". So:

v.goto("graph") => g

g.comeFrom("Person", "graph") => {v[1], v[2], v[4], v[6]}

g.comeFrom("Person", "graph").restrict("id", 1)

...where restrict() is the relational "sigma" operation as above, not to be
confused with TinkerPop's select(), filter(), or has() steps. Again, I
prefer to specify a type in comeFrom (i.e. we're looking specifically for a
Person with id of 1), but you could also do a comprehension g.comeFrom(x,
"graph"), letting x range over all types.



> The goto() instruction moves the “memory reference” (traverser) from the
> current “memory address” to the “memory address” referenced by the goto()
> argument.
>

Agreed, if we also think of primitive values as memory references.



> The Gremlin expression:
>
> g.V().has(‘name’,’marko’).out(‘knows’).drop()
>
> ..would compile to:
>
>
> g.goto(“V”).filter(goto(“properties”).goto(“name”).is(“marko”)).goto(“outE”).filter(goto(“label”).is(“knows”)).goto(“inV”).free()
>


In the alternate universe:

g.comeFrom("Person", "graph").comeFrom("name", "out").restrict("in",
"marko").goto("out").comeFrom("knows", "out").goto("in").free()

I have wimped out on free() and just left it as you had it, but I think it
would be worthwhile to explore a monadic syntax for traversals with
side-effects. Different topic.

Now, all of this "out", "in" business is getting pretty repetitive, right?
Well, the field names become more diverse if we allow hyper-edges and
generalized ADTs. E.g. in my Trip example, say I want to know all drop-off
locations for a given rider:


TP4 Processors now support both push- and pull-based semantics.

2019-04-23 Thread Marko Rodriguez
Hi,

Stephen and Bryn were looking over my RxJava implementation the other day and 
Bryn, with his British accent, was like [I paraphrase]:

“Whoa dawg! Bro should like totally not be blocking to fill an 
iterator. Gnar gnar for surezies.”

Until now, Processor implemented Iterator. For RxJava, this meant that when you 
called next()/hasNext() and there were no results in the queue while the 
flowable was still running, the iterator while()-blocked waiting for a result 
or for the flowable to terminate.

This morning I decided to redo the Processor interface (and respective 
implementations) and it is much nicer now. We have two “execute” methods:

Iterator Processor.iterator(Iterator starts)
void Processor.subscribe(Iterator starts, Consumer consumer)

A processor can only be executed using one of the methods above. Thus, 
depending on context and the underlying processor, the VM determines whether to 
use pull-based or push-based semantics. Pretty neat, eh?


https://github.com/apache/tinkerpop/blob/tp4/java/machine/machine-core/src/main/java/org/apache/tinkerpop/machine/processor/Processor.java
 


Check out how I do Pipes:


https://github.com/apache/tinkerpop/blob/tp4/java/machine/processor/pipes/src/main/java/org/apache/tinkerpop/machine/processor/pipes/Pipes.java#L113-L126
 


Pipes is inherently pull-based. However, to simulate push-based semantics, I 
Thread().start() the iterator.hasNext()/next() loop and just consumer.accept() 
the results. Thus, as desired, subscribe() returns immediately.
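
The pull-to-push trick can be sketched as follows. This is not the linked 
Pipes code; PullToPushSketch is a hypothetical name, and returning the Thread 
(instead of void as in the Processor signature above) is an assumption made so 
that callers and tests can join() on it.

```java
import java.util.Iterator;
import java.util.function.Consumer;

class PullToPushSketch {
    // Drains a pull-based iterator on a separate thread and pushes each result
    // to the consumer. The call itself returns as soon as the thread is started.
    static <E> Thread subscribe(Iterator<E> results, Consumer<E> consumer) {
        Thread worker = new Thread(() -> {
            while (results.hasNext()) {
                consumer.accept(results.next());
            }
        });
        worker.start();
        return worker;
    }
}
```

The consumer is invoked on the worker thread, so anything it touches has to be 
safe to use from there.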

Next, here is my RxJava implementation.


https://github.com/apache/tinkerpop/blob/tp4/java/machine/processor/rxjava/src/main/java/org/apache/tinkerpop/machine/processor/rxjava/SerialRxJava.java#L59-L65
 


https://github.com/apache/tinkerpop/blob/tp4/java/machine/processor/rxjava/src/main/java/org/apache/tinkerpop/machine/processor/rxjava/AbstractRxJava.java#L66-L86
 


You can see how I turn a push-based subscription into a pull-based iteration 
using the good ol’ while()-block :).
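
The opposite direction, push to pull, can be sketched like this. Note the 
swap: instead of the while()-block in the linked code, this sketch parks the 
caller on a java.util.concurrent.BlockingQueue. PushSource is a hypothetical 
stand-in for a flowable, and null elements are not supported.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

class PushToPullSketch {
    // Hypothetical push source: calls onNext per element, then onComplete once.
    interface PushSource<E> {
        void subscribe(Consumer<E> onNext, Runnable onComplete);
    }

    // Sentinel placed on the queue when the source completes.
    private static final Object DONE = new Object();

    static <E> Iterator<E> iterator(PushSource<E> source) {
        BlockingQueue<Object> queue = new LinkedBlockingQueue<>();
        source.subscribe(queue::add, () -> queue.add(DONE));
        return new Iterator<E>() {
            private Object head;

            @Override
            public boolean hasNext() {
                if (head == null) {
                    try {
                        head = queue.take(); // blocks until a result or completion arrives
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        head = DONE;
                    }
                }
                return head != DONE;
            }

            @Override
            @SuppressWarnings("unchecked")
            public E next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                E result = (E) head;
                head = null;
                return result;
            }
        };
    }
}
```

Either way the pull side has to wait for the push side; the queue just lets 
the waiting thread sleep instead of spinning.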


https://github.com/apache/tinkerpop/blob/tp4/java/machine/processor/rxjava/src/main/java/org/apache/tinkerpop/machine/processor/rxjava/AbstractRxJava.java#L98-L102
 


——

What I need to do next is to redo the RxJava execution planner such that nested 
traversals (e.g. map(out())) are subscription-based with the parent flowable. 
I don’t quite know how I will do it, but I believe I will have to write custom 
Publisher/Subscriber objects for use with Flowable.compose() such that onNext() 
and onComplete() will be called accordingly within the consumer.accept(). It 
will be tricky as I’m not too good with low-level RxJava, but them’s the breaks.

Please note that my push-based conceptual skills are not the sharpest so if 
anyone has any recommendations, please advise.

Take care,
Marko.

http://rredux.com 






Re: What makes 'graph traversals' and 'relational joins' the same?

2019-04-23 Thread Marko Rodriguez
Hey Josh,

This gets to the notion I presented in “The Fabled GMachine.”
http://rredux.com/the-fabled-gmachine.html (first paragraph of “Structures, 
Processes, and Languages” section)

 All that exists are memory addresses that contain either:

1. A primitive
2. A set of labeled references to other references or primitives.

Using your work and the above, here is a super low-level ‘bytecode' for 
property graphs.

v.goto("id") => 1
v.goto("label") => person
v.goto("properties").goto("name") => "marko"
v.goto("properties").goto("name").goto(0) => "m"
v.goto("outE").goto("inV") => v[2], v[4]
g.goto("V").goto(1) => v[1]

The goto() instruction moves the “memory reference” (traverser) from the 
current “memory address” to the “memory address” referenced by the goto() 
argument.
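
As a rough illustration of this model, the Java sketch below treats a memory 
address as either a primitive or a set of labeled references, and implements 
goto() as a step that moves every traverser along the references with a given 
label. Refs and GotoSketch are hypothetical names, and the method is spelled 
goTo because goto is a reserved word in Java.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GotoSketch {
    // A memory address holding labeled references; any other object is a primitive.
    static final class Refs {
        final Map<Object, List<Object>> refs = new LinkedHashMap<>();

        Refs put(Object label, Object... targets) {
            Collections.addAll(refs.computeIfAbsent(label, k -> new ArrayList<>()), targets);
            return this;
        }
    }

    // Moves each traverser from its current address to every address referenced
    // by `label`. Primitives and addresses without that label yield nothing.
    static List<Object> goTo(List<Object> current, Object label) {
        List<Object> next = new ArrayList<>();
        for (Object address : current) {
            if (address instanceof Refs) {
                next.addAll(((Refs) address).refs.getOrDefault(label, List.of()));
            }
        }
        return next;
    }
}
```

Chaining goTo(goTo(start, "outE"), "inV") over a vertex with two outgoing 
edges yields both adjacent vertices; nothing in goTo knows what a vertex or an 
edge is.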

The Gremlin expression:

g.V().has(‘name’,’marko’).out(‘knows’).drop()

..would compile to:


g.goto(“V”).filter(goto(“properties”).goto(“name”).is(“marko”)).goto(“outE”).filter(goto(“label”).is(“knows”)).goto(“inV”).free()

…where free() is the opposite of malloc().

If we can get things that “low-level” and still efficient to compile, then we 
can model every data structure. All you are doing is pointer chasing through a 
withStructure() data structure.

No one would ever want to write strategies for goto()-based Bytecode. Thus, 
perhaps there could be a PropertyGraphDecorationStrategy that does:

g = Gremlin.traversal(machine).withStructure(JanusGraph.class)  // this will register the strategy
g.V().has(‘name’,’marko’).out(‘knows’).drop() // this generates goto()-based bytecode underneath the covers
==submit==>
[goto,V][filter,[goto…]][goto][goto][free] // Bytecode from the “fundamental instruction set”
[V][has,name,marko][out,knows][drop] // PropertyGraphDecorationStrategy converts goto() instructions into a property graph-specific instruction set.
[V-idx,name,marko][out,knows][drop] // JanusGraphProviderStrategy converts V().has() into an index lookup instruction.

[I AM NOW GOING OFF THE RAILS]

Like fluent-style Gremlin, we could have an AssemblyLanguage that only has 
goto(), free(), malloc(), filter(), map(), reduce(), flatmap(), barrier(), 
branch(), repeat(), sideEffect() instructions. For instance, if you wanted to 
create an array list (not a linked list! :):

[“marko”,29,true]

you would do:

malloc(childrefs(0,1,2)).sideEffect(goto(0).malloc(“marko”)).sideEffect(goto(1).malloc(29)).sideEffect(goto(2).malloc(true))

This tells the underlying data structure (e.g. database) that you want to 
create a set of “children references" labeled 0, 1, and 2. And then you goto() 
each reference and add primitives. Now, if JanusGraph got this batch of 
instructions, it would do the following:

Vertex refs = graph.addVertex()
refs.addEdge(“childref", graph.addVertex(“value”,”marko”)).property(“ref”,0)
refs.addEdge(“childref", graph.addVertex(“value”,29)).property(“ref”,1)
refs.addEdge(“childref", graph.addVertex(“value”,true)).property(“ref”,2)

The reason for childref is that if you delete the list, you should delete all 
the children-referenced data! In other words, the refs vertex has cascading 
deletes.

list.drop()
==>
list.sideEffect(goto(0,1,2).free()).free()

JanusGraph would then do:

refs.out(“childref").drop()
refs.drop()

Or, more generally:

refs.emit().repeat(out(“childref”)).drop()

Trippy.

[I AM NOW BACK ON THE RAILS]

It’s as if “properties”, “outE”, “label”, “inV”, etc. references mean something 
to property graph providers and they can do more intelligent stuff than what 
MongoDB would do with such information. However, someone, of course, can create 
a MongoDBPropertyGraphStrategy that would make documents look like vertices and 
edges and then use O(log(n)) lookups on ids to walk the graph. However, if that 
didn’t exist, it would still do something that works, even if it’s horribly 
inefficient, as every database can make primitives with references between them!

Anywho @Josh, I believe goto() is what you are doing with multi-references off 
an object. How do we make it all clean, easy, and universal?

Marko.

http://rredux.com 




> On Apr 22, 2019, at 6:42 PM, Joshua Shinavier  wrote:
> 
> Ah, glad you asked. It's all in the pictures. I have nowhere to put them 
> online at the moment... maybe this attachment will go through to the list?
> 
> Btw. David Spivak gave his talk today at Uber; it was great. Juan Sequeda 
> (relational <--> RDF mapping guy) was also here, and Ryan joined remotely. 
> Really interesting discussion about databases vs. graphs, and what category 
> theory brings to the table.
> 
> 
> On Mon, Apr 22, 2019 at 1:45 PM Marko Rodriguez wrote:
> Hey Josh,
> 
> I’m digging what you are saying, but the pictures didn’t come through for me… 
> Can you provide them again (or if dev@ is filtering them, can you give me 
> URLs to them)?
> 
> Thanks,

Re: [DISCUSS] The Two Protocols of TP4

2019-04-23 Thread Marko Rodriguez
Whoa! — are you saying that we should write an ANTLR parser that compiles 
Gremlin-XXX into Bytecode directly?

Thus, for every Gremlin language variant, we will have an ANTLR parser.

Marko.

http://rredux.com 







Re: [DISCUSS] The Two Protocols of TP4

2019-04-23 Thread Jorge Bay Gondra
Hi,
Language recognition engines will give us a set of tokens, usually in some
sort of tree, but the result can be thought of as nested collections, for
example:

The following string "g.V().values('name')" could be parsed into something
like [["g"], ["V"], ["values", "name"]].

Then, we would have to create some sort of "evaluator" that translates
these string tokens into a traversal, similar to bytecode parsing and
execution. This evaluator can use static evaluation of the tokens (like, do
the tokens evaluate into something meaningful?), can be optimized with
caching techniques (like preparing traversals) and, more importantly, will
only execute class methods that are whitelisted, i.e., users can't use it
to execute arbitrary Groovy code.
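
To illustrate the idea (and only the idea), here is a small Java sketch that
turns a Gremlin string into nested token lists and then checks every step
against a whitelist before anything would be evaluated. It uses a simple
regular expression rather than ANTLR, it does not handle nested traversals or
commas inside string arguments, and the class name, method names, and
whitelist contents are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenSketch {
    // One step: a name, optionally followed by a flat (non-nested) argument list.
    private static final Pattern STEP = Pattern.compile("(\\w+)(?:\\(([^()]*)\\))?");
    // Hypothetical whitelist; a real one would be derived from the supported steps.
    private static final Set<String> ALLOWED = Set.of("V", "out", "has", "values");

    // "g.V().values('name')" becomes [[g], [V], [values, name]].
    static List<List<String>> parse(String script) {
        List<List<String>> steps = new ArrayList<>();
        Matcher m = STEP.matcher(script);
        while (m.find()) {
            List<String> step = new ArrayList<>();
            step.add(m.group(1));
            String args = m.group(2);
            if (args != null && !args.trim().isEmpty()) {
                for (String arg : args.split(",")) {
                    // Strip one leading and one trailing quote character, if present.
                    step.add(arg.trim().replaceAll("^['\"]|['\"]$", ""));
                }
            }
            steps.add(step);
        }
        return steps;
    }

    // Rejects anything that is not the source g followed by whitelisted steps.
    static void checkWhitelisted(List<List<String>> steps) {
        if (steps.isEmpty() || !steps.get(0).equals(List.of("g"))) {
            throw new IllegalArgumentException("script must start with the traversal source g");
        }
        for (List<String> step : steps.subList(1, steps.size())) {
            if (!ALLOWED.contains(step.get(0))) {
                throw new IllegalArgumentException("step not whitelisted: " + step.get(0));
            }
        }
    }
}
```

Because the input is only ever treated as tokens, a string that smuggles in
arbitrary code is rejected at the whitelist check instead of being executed.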

Best,
Jorge


On Tue, Apr 23, 2019 at 12:36 PM Marko Rodriguez 
wrote:

> Hi Jorge,
>
> > Instead of supporting a ScriptEngine or enabling providers to implement
> > one, TP4 could be a good opportunity to ditch script engines while
> > continuing to support gremlin-groovy string literals using language
> > recognition engines like ANTLR.
>
> Huh…….. Can you explain how you think of using ANTLR vs
> ScriptEngine.submit(String)?
>
> > Language recognition and parsing engines have several benefits over the
> > current approach, most notably that it's safe to parse text using
> > language recognition as it results in string tokens, as opposed to
> > letting users run code in a sandboxed vm.
>
> How would the ANTLR-parsed text ultimately be executed?
>
> Thanks,
> Marko.
>
> http://rredux.com 
>
>
>


Re: [DISCUSS] The Two Protocols of TP4

2019-04-23 Thread Marko Rodriguez
Hi Jorge,

> Instead of supporting a ScriptEngine or enabling providers to implement one,
> TP4 could be a good opportunity to ditch script engines while continuing to
> support gremlin-groovy string literals using language recognition
> engines like ANTLR.

Huh…….. Can you explain how you think of using ANTLR vs 
ScriptEngine.submit(String)?

> Language recognition and parsing engines have several benefits over the
> current approach, most notably that it's safe to parse text using language
> recognition as it results in string tokens, as opposed to letting users run
> code in a sandboxed vm.

How would the ANTLR-parsed text ultimately be executed?

Thanks,
Marko.

http://rredux.com 




Re: [DISCUSS] The Two Protocols of TP4

2019-04-23 Thread Jorge Bay Gondra
Hi,
I'm still trying to catch up with TP4 topics.

I agree that we can reuse bytecode to submit gremlin string literals,
like [[submit, [ex:script, gremlin-groovy, g.V.out.name]]]

Instead of supporting a ScriptEngine or enabling providers to implement one,
TP4 could be a good opportunity to ditch script engines while continuing to
support gremlin-groovy string literals using language recognition
engines like ANTLR.

Language recognition and parsing engines have several benefits over the
current approach, most notably that it's safe to parse text using language
recognition as it results in string tokens, as opposed to letting users run
code in a sandboxed vm.

Jorge



On Tue, Apr 16, 2019 at 8:43 PM Marko Rodriguez 
wrote:

> Hi,
>
>
> > hmm - it sounds like supporting the vm protocol requires a session. like
> > each "g" from a client needs to hold state on the server between
> requests.
> > or am i thinking about it too concretely and this protocol is more of an
> > abstraction of what's happening?
>
> No, you are right. It’s pretty analogous to TP3. The server holds a bunch
> of “g” instances. “g” instances are thread-safe and immutable. Submitted
> bytecode can have a source instruction that references a cached “g” on the
> server (e.g. via a UUID, though this is up to the Machine implementation).
> If it does, then that cached “g” is used to spawn the traversal via the
> operation instructions. Also, this is not just for “over the wire”
> communication. It’s not specific to server behavior. The Machine interface
> can be a LocalMachine and still you have this notion of pre-compiled source
> instructions that were machine.registered().
>
>
> https://github.com/apache/tinkerpop/blob/tp4/java/machine/machine-core/src/main/java/org/apache/tinkerpop/machine/species/LocalMachine.java#L41
>
> Finally, if you want to build a Machine that doesn’t pre-compile the
> source instructions, well, this is what your Machine implementation looks
> like:
>
>
> https://github.com/apache/tinkerpop/blob/tp4/java/machine/machine-core/src/main/java/org/apache/tinkerpop/machine/species/BasicMachine.java
>
> Marko.
>
> >
> >
> > On Tue, Apr 16, 2019 at 1:58 PM Marko Rodriguez wrote:
> >
> >> Hi,
> >>
> >>> i get the "submit" part but could you explain the "register" and
> >>> "unregister" parts (referenced in another post somewhere perhaps)?
> >>
> >> These three methods are from the Machine API.
> >>
> >>
> >>
> >> https://github.com/apache/tinkerpop/blob/tp4/java/machine/machine-core/src/main/java/org/apache/tinkerpop/machine/Machine.java
> >>
> >> Bytecode is composed of two sets of instructions.
> >>- source instructions
> >>- operation instructions
> >>
> >> source instructions are withProcessor(), withStructure(), withStrategy(),
> >> etc.
> >> operation instructions are out(), in(), count(), where(), etc.
> >>
> >> The source instructions are expensive to execute. Why? When you evaluate
> >> a withStructure(), you are creating a connection to the database. When
> >> you evaluate a withStrategy(), you are sorting strategies. It is for this
> >> reason that we have the concept of a TraversalSource in TP3 that does all
> >> that “setup stuff” once and only once for each g. That is the reason we
> >> tell people not to do graph.traversal().V(), but instead
> >> g = graph.traversal(). Once you have ‘g’, you can then spawn as many
> >> traversals as you want off of it without incurring the cost of
> >> re-processing the source instructions.
> >>
> >> In TP4, there is no state in Gremlin’s TraversalSource. Gremlin doesn’t
> >> know about databases, processors, strategy compilation, etc. Thus, when
> >> you Machine.register(Bytecode) you are sending over the source
> >> instructions, having them processed at the TP4 VM, and then all
> >> subsequent submit() calls with the same source instruction header will
> >> use the “pre-compiled” source bytecode cached in the TP4 VM. g.close()
> >> basically does Machine.unregister().
> >>
> >>
> >>
> https://github.com/apache/tinkerpop/blob/tp4/java/language/gremlin/src/main/java/org/apache/tinkerpop/language/gremlin/TraversalSource.java#L107-L112
> >> <
> >>
>