Re: A collection of examples that map a query language query to provider bytecode.

2019-05-12 Thread Marko Rodriguez
Hi,

> Machine machine = RemoteMachine
>     .withStructure(NeptuneStructure.class, config1)
>     .withProcessor(AkkaProcessor.class, config2)
>     .withCompiler(CypherCompiler.class, config3)
>     .open(config0);


Yea, I think something like this would work well. 

I like it because it exposes the three main components that TinkerPop is gluing 
together:

Language
Structure
Process

Thus, I would have it:

withStructure()
withProcessor()
withLanguage()
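A builder along those three axes might read like the following. This is a hypothetical sketch only; `MachineBuilder` and the string-typed arguments are stand-ins for illustration, not the tp4/ API:

```java
// Hypothetical sketch of the three-axis builder: language, structure,
// processor. All names are illustrative, not real TinkerPop classes.
public class MachineBuilder {
    private String language, structure, processor;

    public MachineBuilder withLanguage(String l)  { this.language = l;  return this; }
    public MachineBuilder withStructure(String s) { this.structure = s; return this; }
    public MachineBuilder withProcessor(String p) { this.processor = p; return this; }

    // Stand-in for open(): just describes the configured machine.
    public String open() {
        return language + " over " + structure + " using " + processor;
    }

    public static void main(String[] args) {
        System.out.println(new MachineBuilder()
            .withLanguage("Cypher")
            .withStructure("Neptune")
            .withProcessor("Akka")
            .open()); // Cypher over Neptune using Akka
    }
}
```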

Marko.

http://rredux.com 


Re: A collection of examples that map a query language query to provider bytecode.

2019-05-10 Thread Dmitry Novikov
Stephen, Remote Compiler - very interesting idea to explore. Just for 
brainstorming, let me imagine how this might look:

Machine machine = RemoteMachine
    .withStructure(NeptuneStructure.class, config1)
    .withProcessor(AkkaProcessor.class, config2)
    .withCompiler(CypherCompiler.class, config3)
    .open(config0);

1. If the client supports compilation - compile on the client side
2. If the remote supports compilation - compile on the server side
3. If neither the client nor the remote supports compilation, `config3` could 
contain the path to a microservice. The microservice does the compilation and 
either returns bytecode to the client, or sends the bytecode to the remote and 
proxies the response back to the client. The microservice could be deployed on 
the remote as well.

`config3` might look like, respectively:

1. `{compilation: 'embedded'}`
2. `{compilation: 'remote'}`
3. `{compilation: 'external', uri: 'localhost:3000/cypher'}`
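The dispatch on `config3` could be as simple as a switch over that key. A minimal sketch, assuming the three modes above; the class and method names here are invented for illustration:

```java
import java.util.Map;

// Hypothetical sketch: decide where query-to-bytecode compilation happens
// based on the 'compilation' key of config3. All names are illustrative.
public class CompilationDispatch {

    public static String resolve(Map<String, String> config3) {
        String mode = config3.getOrDefault("compilation", "embedded");
        switch (mode) {
            case "embedded":
                return "compile on the client";           // case 1
            case "remote":
                return "send query string to the server"; // case 2
            case "external":
                // case 3: a standalone microservice does the compilation
                return "POST query to " + config3.get("uri");
            default:
                throw new IllegalArgumentException("unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve(Map.of("compilation", "external",
                                          "uri", "localhost:3000/cypher")));
    }
}
```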


Re: A collection of examples that map a query language query to provider bytecode.

2019-05-10 Thread Stephen Mallette
>  If VM, server or compiler is implemented in another language, there is
always a possibility to use something like gRPC or even REST to call
microservice that will do query→Universal Bytecode conversion.

That's an interesting way to handle it, especially if it could be done in a
completely transparent way - a Remote Compiler of some sort. If we had such
a thing, then the compilation could conceivably happen anywhere, client or
server, for the host programming language.


Re: A collection of examples that map a query language query to provider bytecode.

2019-05-10 Thread Dmitry Novikov
Hello,

Marko, thank you for the clear explanation.

> I don’t like that you would have to create a CypherCompiler class (even if 
> it’s just a wrapper) for all popular programming languages. :(

Fully agree about this. For declarative languages like SQL, Cypher and SPARQL, 
complex compilation will be needed, most probably requiring an AST walk. Writing 
compilers for all popular languages could be possible in theory, but it multiplies 
the amount of work by n (where n is the number of client languages) and complicates 
testing. Also, libraries necessary for the task might not be available in all languages.

In my opinion, to avoid a situation where the number of supported query 
languages differs depending on the client programming language, it is preferable to 
introduce a plugin system. The server might have multiple endpoints: one for 
Bytecode, one for SQL, one for Cypher, etc.

If the VM, server or compiler is implemented in another language, there is always 
the possibility of using something like gRPC or even REST to call a microservice 
that will do the query→Universal Bytecode conversion.

Regards,
Dmitry


Re: A collection of examples that map a query language query to provider bytecode.

2019-05-10 Thread Stephen Mallette
>  I don’t like that you would have to create a CypherCompiler class (even
if it’s just a wrapper) for all popular programming languages. :(

Yeah, this is the trouble I saw with sparql-gremlin and how to make it so
that GLVs can support the g.sparql() step properly. It seems like no matter
what you do, you end up in a situation where the language designer has to
do something in each programming language they want to support. The bulk of
the work seems to be in the "compiler", so if that were moved to the server
(what we did in TP3), then the language designer would only have to write
that once per VM they wanted to support and then provide a more lightweight
library for each programming language they supported on the client side. A
programming language that had the full compiler implementation would have
the advantage that it could compile client-side or rely on the server. I
suppose that a lightweight library would then become the basis for a future
full-blown compiler in that language. Hard one.




Re: A collection of examples that map a query language query to provider bytecode.

2019-05-09 Thread Marko Rodriguez
Hello Dmitry,

> In TP3 compilation to Bytecode can happen on Gremlin Client side or Gremlin 
> Server side:
> 
> 1. If compilation is simple, it is possible to implement it for all Gremlin 
> Clients: Java, Python, JavaScript, .NET...
> 2. If compilation is complex, it is possible to create a plugin for Gremlin 
> Server. Clients send query string, and server does the compilation.

Yes, but not for the reasons you state. Every TP3-compliant language must be 
able to compile to TP3 bytecode. That bytecode is then submitted, evaluated by 
the TP3 VM, and a traverser iterator is returned.

However, TP3’s GremlinServer also supports the JSR223 ScriptEngine, which can 
compile query language Strings server side and then return a traverser 
iterator. This exists so people can submit complex Groovy/Python/JS scripts to 
GremlinServer. The problem with this access point is that arbitrary code can be 
submitted, and thus while(true) { } can hang the system! dar.

> For example, in Cypher for Gremlin it is possible to use compilation to 
> Bytecode in JVM client, or on the server when using [other language 
> clients][1].

I’m not too familiar with the GremlinServer plugin stuff, so I don’t know. I would 
say that all TP3-compliant query languages must be able to compile to TP3 
bytecode.

> My current understanding is that TP4 Server would serve only for I/O purposes.

This is still up in the air, but I believe that we should:

1. Only support one data access point: TP4 bytecode in and traversers out.
2. The TP4 server should have two components:
        (1) One (or many) bytecode input locations (IP/port) that pass 
the bytecode to the TP4 VM.
        (2) Multiple traverser output locations where distributed 
processors can directly send halted traversers back to the client.

For me, that’s it. However, I’m not a network server-guy, so I don’t have a clear 
understanding of what is absolutely necessary.

> Where do you see "Query language -> Universal Bytecode" part in TP4 
> architecture? Will it be in the VM? Or in middleware? How will clients look 
> like in TP4?

TP4 will publish a binary serialization specification.
It will be dead simple compared to TP3’s binary specification.
The only types of objects are: Bytecode, Instruction, Traverser, Tuple, and 
Primitive.
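A toy model of that five-kind type universe might look like the following. These record definitions are my own illustration of the idea, not the classes in the tp4/ branch:

```java
import java.util.List;

// Toy model of the five TP4 serialization kinds named above: Bytecode,
// Instruction, Traverser, Tuple, Primitive. Purely illustrative.
public class Tp4Types {
    public interface Obj {}                                            // anything serializable
    public record Primitive(Object value) implements Obj {}            // int, string, ...
    public record Tuple(List<Obj> fields) implements Obj {}            // n-tuple of objects
    public record Instruction(String op, List<Obj> args) implements Obj {}
    public record Bytecode(List<Instruction> instructions) implements Obj {}
    public record Traverser(Obj object, long bulk) implements Obj {}   // result + bulk count

    // A two-instruction program, roughly db().has('name','marko').
    public static Bytecode example() {
        return new Bytecode(List.of(
            new Instruction("db", List.of()),
            new Instruction("has",
                List.of(new Primitive("name"), new Primitive("marko")))));
    }

    public static void main(String[] args) {
        System.out.println(example().instructions().size()); // 2
    }
}
```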

Every query language designer that wants to have their query language execute 
on the TP4 VM (and thus, against all supporting processing engines and data 
storage systems) will need to have a compiler from their language to TP4 
bytecode.

We will provide 2 tools in all the popular programming languages (Java, Python, 
JS, …).
1. A TP4 serializer and deserializer.
2. A lightweight network client to submit serialized bytecode and 
deserialize Iterator into objects in that language. 

Thus, if the Cypher-TP4 compiler is written in Scala, you would:
	1. Build up a org.apache.tinkerpop.machine.bytecode.Bytecode object 
during your compilation process.
	2. Use our org.apache.tinkerpop.machine.io.RemoteMachine object to send the 
Bytecode and get back Iterator objects.
		- RemoteMachine does the serialization and deserialization for 
you.

I originally wrote out how it currently looks in the tp4/ branch, but realized 
that it asks you to write one too many classes. Thus, I think we will probably 
go with something like this:

Machine machine = RemoteMachine.
    withStructure(NeptuneStructure.class, config1).
    withProcessor(AkkaProcessor.class, config2).
    open(config0);

Iterator results = machine.submit(CypherCompiler.compile("MATCH (x)-[knows]->(y)"));

Thus, you would only have to provide a single CypherCompiler class.

If you have any better ideas, please say so. I don’t like that you would have 
to create a CypherCompiler class (even if it’s just a wrapper) for all popular 
programming languages. :(

Perhaps TP4 has a Compiler interface and compilation happens server side….? But 
then that requires language designers to write their compiler in Java … hmm…..
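That server-side Compiler interface could be as small as the sketch below. Everything here (the registry, the SPI shape, the lambda compiler) is my own hypothetical illustration of the idea, not anything in the tp4/ branch:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a pluggable server-side Compiler SPI: one
// implementation per query language, registered under the language name.
public class CompilerRegistry {

    /** A compiler turns a query string into (here, stringly-typed) bytecode. */
    public interface Compiler {
        String compile(String query);
    }

    private final Map<String, Compiler> byLanguage = new HashMap<>();

    public void register(String language, Compiler c) {
        byLanguage.put(language, c);
    }

    public String compile(String language, String query) {
        Compiler c = byLanguage.get(language);
        if (c == null) throw new IllegalArgumentException("no compiler for " + language);
        return c.compile(query);
    }

    public static void main(String[] args) {
        CompilerRegistry registry = new CompilerRegistry();
        // A stand-in "Cypher" compiler that just tags the query string.
        registry.register("cypher", q -> "[bytecode of: " + q + "]");
        System.out.println(registry.compile("cypher", "MATCH (x)-[:knows]->(y)"));
    }
}
```

With this shape, language designers would still have to write the compiler in whatever language the server runs (the drawback noted above); the registry only makes the plugin point explicit.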

Hope I’m clear,
Marko.

http://rredux.com 







Re: A collection of examples that map a query language query to provider bytecode.

2019-05-09 Thread Dmitry Novikov
Hello,

Please clarify one point regarding Universal Bytecode and Gremlin Clients.

In TP3 compilation to Bytecode can happen on Gremlin Client side or Gremlin 
Server side:

1. If compilation is simple, it is possible to implement it for all Gremlin 
Clients: Java, Python, JavaScript, .NET...
2. If compilation is complex, it is possible to create a plugin for Gremlin 
Server. Clients send query string, and server does the compilation.

For example, in Cypher for Gremlin it is possible to use compilation to 
Bytecode in JVM client, or on the server when using [other language clients][1].

My current understanding is that TP4 Server would serve only for I/O purposes.

Where do you see the "Query language -> Universal Bytecode" part in the TP4 
architecture? Will it be in the VM? Or in middleware? What will clients look 
like in TP4?

[1]: 
https://github.com/opencypher/cypher-for-gremlin/tree/master/tinkerpop/cypher-gremlin-server-plugin#usage

On 2019/05/06 15:58:09, Marko Rodriguez  wrote: 
> Hello,
> 
> I’m experimenting with moving between X query language and Y bytecode via 
> Universal Bytecode.
> 
> The general (and very difficult) goal of TP4 is to be able to execute queries 
> (from any known query language) against any database (regardless of 
> underlying data model) using any processing engine.
>   - e.g. Gremlin over MySQL (as Relational) using RxJava.
>   - e.g. SQL over Cassandra (as WideColumn) using Flink.
>   - e.g. SPARQL over MongoDB (as Document) using Akka.
>   - e.g. Cypher over Neptune (as Graph) using Pipes.
>   - e.g. ...
> 
> ——
> 
> NOTES:
>   1. Realize that databases are both processors and structures.
>   - MySQL has its own SQL engine.
>   - Cassandra has its own CQL engine.
>   - MongoDB has its own DocumentQuery engine.
>   - …
>   2. What can be processed by the database’s engine should be evaluated 
> by the database (typically).
>   3. What can not be processed by the database’s engine should be 
> evaluated by the processor.
>   4. DATABASE_ENGINE->PROCESSOR->DATABASE_ENGINE->PROCESSOR->etc.
>   - data may move from database to processor back to database 
> back to processor, etc. to yield the final query result.
> 
> The universal bytecode chunks in the examples to come assume the following 
> n-tuple structure accessible via db():
> 
> [0][id:1, label:person, name:marko, outE:*1]
> [1][0:*2, 1:*3]
> [2][id:7, label:knows, outV:*0, inV:*4]
> [3][id:8, label:knows, outV:*0, inV:*5]
> [4][id:2, label:person, name:vadas]
> [5][id:4, label:person, name:josh]
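To make the *-pointer notation above concrete, here is one toy interpretation of that tuple table, where `*n` means "dereference tuple n". The encoding (maps plus a `Ptr` marker) is my own stand-in, assuming the six tuples shown:

```java
import java.util.List;
import java.util.Map;

// Toy n-tuple store mirroring tuples [0]..[5] above: field values wrapped
// in Ptr act as *-pointers to other tuples; everything else is plain data.
public class TupleStore {
    public record Ptr(int target) {}              // the *n pointer
    public static Ptr ptr(int n) { return new Ptr(n); }

    public static final List<Map<String, Object>> TUPLES = List.of(
        Map.of("id", 1, "label", "person", "name", "marko", "outE", ptr(1)),
        Map.of("0", ptr(2), "1", ptr(3)),
        Map.of("id", 7, "label", "knows", "outV", ptr(0), "inV", ptr(4)),
        Map.of("id", 8, "label", "knows", "outV", ptr(0), "inV", ptr(5)),
        Map.of("id", 2, "label", "person", "name", "vadas"),
        Map.of("id", 4, "label", "person", "name", "josh"));

    /** Follow a pointer-valued field to the tuple it references. */
    public static Map<String, Object> deref(Object field) {
        return TUPLES.get(((Ptr) field).target());
    }

    public static void main(String[] args) {
        Map<String, Object> marko = TUPLES.get(0);
        Map<String, Object> outE  = deref(marko.get("outE"));     // tuple [1]
        Map<String, Object> knows = deref(outE.get("0"));         // tuple [2]
        System.out.println(deref(knows.get("inV")).get("name"));  // vadas
    }
}
```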
> 
> - All tuples have an id that is outside the id-space of the underlying data 
> model.
> - Field values can have pointers to other tuples via *-prefix notation.
> 
> Every compilation goes:
> 
>   Query language -> Universal Bytecode -> Data Model Bytecode -> Provider 
> Bytecode
> 
> ——
> 
> How do you compile Gremlin to universal bytecode for evaluation over MySQL?
> 
> —— GREMLIN QUERY ——
> 
> g.V().has(“name”,”marko”).out(“knows”).values(“name”)
> 
> —— UNIVERSAL BYTECODE ——
> 
>==> (using tuple pointers)
> 
> db().has(‘label’, within(‘person’,’project’)).
>  has(‘name’,’marko’)
>   values(‘outE’).has(‘label’,’knows’).
>   values(‘inV’).values(‘name’)
> 
> —— RELATIONAL BYTECODE ——
> 
>==>
> 
> R(“people”,”projects").has(“name”,”marko”).
>  join(R(“knows”)).by(“id”,eq(“outV”)).
>  join(R(“people”)).by(“inV”,eq(“id”)).
>   values(“name”)
> 
> —— JDBC BYTECODE ——
> 
>==>
> 
> union(
>   sql(‘SELECT name FROM people as p1, knows, people as p2 WHERE 
> p1.id=knows.outV AND knows.inV=p2.id’),
>   sql(‘SELECT name FROM projects as p1, knows, projects as p2 WHERE 
> p1.id=knows.outV AND knows.inV=p2.id’))
>   
> The assumed SQL tables are:
> 
> CREATE TABLE people (
> id int,
> label string, // person
> name string,
> PRIMARY KEY id
> );
> 
> CREATE TABLE knows (
> id int,
> label string, // knows
> name string,
> outV int,
> inV int,
> PRIMARY KEY id,
> FOREIGN KEY (outV) REFERENCES people(id),
> FOREIGN KEY (inV) REFERENCES people(id)
> );
> 
> There needs to be two mapping specifications (Graph->Universal & 
> Universal->Relational)
>   - V() -> vertex tables are people+projects.
>   - label() -> plural is table name
>   - outE -> person.outE.knows is resolved via knows.outV (foreign key to 
> person table by id)
>   - inV -> knows.values(‘inV’) is resolved to person (foreign key to 
> person table by id)
> 
> Next, we are assuming that a property graph is encoded in MySQL as we have 
> outE, inV, etc. column names. If we want to interpret any relational data as 
> graph, then it is important to denote which tables are “join tables” and what 
> the column-names are for joining. What about when a “join table” references 
> more than 2 other rows? (i.e. n-ary relation — hypergraph) ? Is there a 
> general solution to looking at any 

A collection of examples that map a query language query to provider bytecode.

2019-05-06 Thread Marko Rodriguez
Hello,

I’m experimenting with moving between X query language and Y bytecode via 
Universal Bytecode.

The general (and very difficult) goal of TP4 is to be able to execute queries 
(from any known query language) against any database (regardless of underlying 
data model) using any processing engine.
- e.g. Gremlin over MySQL (as Relational) using RxJava.
- e.g. SQL over Cassandra (as WideColumn) using Flink.
- e.g. SPARQL over MongoDB (as Document) using Akka.
- e.g. Cypher over Neptune (as Graph) using Pipes.
- e.g. ...

——

NOTES:
  1. Realize that databases are both processors and structures.
     - MySQL has its own SQL engine.
     - Cassandra has its own CQL engine.
     - MongoDB has its own DocumentQuery engine.
     - …
  2. What can be processed by the database's engine should be evaluated
     by the database (typically).
  3. What cannot be processed by the database's engine should be
     evaluated by the processor.
  4. DATABASE_ENGINE->PROCESSOR->DATABASE_ENGINE->PROCESSOR->etc.
     - Data may move from database to processor, back to database,
       back to processor, etc. to yield the final query result.
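Note 4's alternation can be sketched as a planner that splits a linear list of bytecode steps into maximal runs that the database engine can evaluate versus runs that must fall back to the processor. This is a minimal sketch; the step names and the capability set are hypothetical, not TP4 API:

```python
# Hypothetical set of bytecode ops the underlying database engine supports.
SUPPORTED_BY_DB = {"has", "values"}

def split_chunks(steps):
    """Group consecutive steps by whether the database engine can run them.

    Returns a list of (runs_on_db, [steps...]) pairs, alternating between
    database-evaluated and processor-evaluated chunks.
    """
    chunks = []
    for op, *args in steps:
        is_db = op in SUPPORTED_BY_DB
        if chunks and chunks[-1][0] == is_db:
            chunks[-1][1].append((op, *args))
        else:
            chunks.append((is_db, [(op, *args)]))
    return chunks

chunks = split_chunks([("has", "name", "marko"),
                       ("path",),                 # not supported by the db engine
                       ("values", "name")])
# chunks alternates: db chunk, processor chunk, db chunk
```

Each `True` chunk would be compiled down to provider bytecode (SQL, CQL, ...) while each `False` chunk stays in the processor.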

The universal bytecode chunks in the examples to come assume the following 
n-tuple structure accessible via db():

[0][id:1, label:person, name:marko, outE:*1]
[1][0:*2, 1:*3]
[2][id:7, label:knows, outV:*0, inV:*4]
[3][id:8, label:knows, outV:*0, inV:*5]
[4][id:2, label:person, name:vadas]
[5][id:4, label:person, name:josh]

- All tuples have an id that is outside the id-space of the underlying data 
model.
- Field values can have pointers to other tuples via *-prefix notation.
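The n-tuple structure and its *-prefixed pointers can be modeled directly; here is a minimal sketch (the dict encoding is my own, not a TP4 structure API) of the six tuples above with a pointer-dereference helper:

```python
# The n-tuple structure above, keyed by the *-pointer index.
db = {
    0: {"id": 1, "label": "person", "name": "marko", "outE": "*1"},
    1: {0: "*2", 1: "*3"},                          # container tuple of edges
    2: {"id": 7, "label": "knows", "outV": "*0", "inV": "*4"},
    3: {"id": 8, "label": "knows", "outV": "*0", "inV": "*5"},
    4: {"id": 2, "label": "person", "name": "vadas"},
    5: {"id": 4, "label": "person", "name": "josh"},
}

def deref(value):
    """Follow a '*'-prefixed pointer to its target tuple; pass through anything else."""
    if isinstance(value, str) and value.startswith("*"):
        return db[int(value[1:])]
    return value

# marko's outE pointer resolves to the container tuple, whose entries
# point at the two knows-edges.
edges = deref(db[0]["outE"])
```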

Every compilation goes:

Query language -> Universal Bytecode -> Data Model Bytecode -> Provider 
Bytecode
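Since each arrow in that pipeline is just a rewrite of one representation into the next, a compilation can be sketched as ordinary function composition (the stage names here are toy placeholders that tag the intermediate representation rather than really compiling):

```python
def compile_query(query, stages):
    """Run a query through an ordered list of compilation stages."""
    for stage in stages:
        query = stage(query)
    return query

# Toy stages: each wraps the previous representation with its own tag.
to_universal  = lambda q:  ("universal", q)
to_relational = lambda ir: ("relational", ir)
to_jdbc       = lambda ir: ("jdbc", ir)

ir = compile_query("g.V().has('name','marko')",
                   [to_universal, to_relational, to_jdbc])
# ir == ("jdbc", ("relational", ("universal", "g.V().has('name','marko')")))
```

Swapping the stage list is what changes the target: `[to_universal, to_widecolumn, to_cql]` would express the Cassandra example below under the same scheme.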

——

How do you compile Gremlin to universal bytecode for evaluation over MySQL?

—— GREMLIN QUERY ——

g.V().has("name","marko").out("knows").values("name")

—— UNIVERSAL BYTECODE ——

   ==> (using tuple pointers)

db().has('label', within('person','project')).
  has('name','marko').
  values('outE').has('label','knows').
  values('inV').values('name')
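To make the pointer-chasing semantics of that universal bytecode concrete, here is a toy Python interpreter over the n-tuple structure from above (all function names are hypothetical; `within()` is just a list, and a pointed-to container tuple, one with no `id` field like tuple 1, is streamed element by element):

```python
# The six tuples from above, keyed by the *-pointer index.
tuples = {
    0: {"id": 1, "label": "person", "name": "marko", "outE": "*1"},
    1: {0: "*2", 1: "*3"},
    2: {"id": 7, "label": "knows", "outV": "*0", "inV": "*4"},
    3: {"id": 8, "label": "knows", "outV": "*0", "inV": "*5"},
    4: {"id": 2, "label": "person", "name": "vadas"},
    5: {"id": 4, "label": "person", "name": "josh"},
}

def deref(value):
    """Follow a '*'-prefixed pointer; return anything else unchanged."""
    if isinstance(value, str) and value.startswith("*"):
        return tuples[int(value[1:])]
    return value

def has(traversers, key, allowed):
    """Keep only tuples whose field value is in the allowed set."""
    allowed = allowed if isinstance(allowed, (list, set, tuple)) else {allowed}
    return [t for t in traversers if t.get(key) in allowed]

def values(traversers, key):
    """Project a field, following pointers; container tuples are streamed."""
    out = []
    for t in traversers:
        if key not in t:
            continue
        v = deref(t[key])
        if isinstance(v, dict) and "id" not in v:      # container tuple
            out.extend(deref(x) for x in v.values())
        else:
            out.append(v)
    return out

# db().has('label', within('person','project')).has('name','marko').
#   values('outE').has('label','knows').values('inV').values('name')
ts = list(tuples.values())
ts = has(ts, "label", ["person", "project"])
ts = has(ts, "name", "marko")
ts = values(ts, "outE")
ts = has(ts, "label", "knows")
ts = values(ts, "inV")
ts = values(ts, "name")
# ts == ["vadas", "josh"]
```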

—— RELATIONAL BYTECODE ——

   ==>

R("people","projects").has("name","marko").
  join(R("knows")).by("id",eq("outV")).
  join(R("people")).by("inV",eq("id")).
  values("name")

—— JDBC BYTECODE ——

   ==>

union(
  sql("SELECT p2.name FROM people AS p1, knows, people AS p2 WHERE
       p1.name='marko' AND p1.id=knows.outV AND knows.inV=p2.id"),
  sql("SELECT p2.name FROM projects AS p1, knows, projects AS p2 WHERE
       p1.name='marko' AND p1.id=knows.outV AND knows.inV=p2.id"))
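To sanity-check that SQL of this shape really answers the original traversal, here is the people half of the union run against sqlite3 standing in for MySQL (the table contents mirror the n-tuples above, and the `has('name','marko')` predicate from the Gremlin query is included in the WHERE clause):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, label TEXT, name TEXT);
    CREATE TABLE knows  (id INTEGER PRIMARY KEY, label TEXT,
                         outV INTEGER REFERENCES people(id),
                         inV  INTEGER REFERENCES people(id));
    INSERT INTO people VALUES (1,'person','marko'),
                              (2,'person','vadas'),
                              (4,'person','josh');
    INSERT INTO knows  VALUES (7,'knows',1,2),
                              (8,'knows',1,4);
""")
rows = conn.execute(
    "SELECT p2.name FROM people AS p1, knows, people AS p2 "
    "WHERE p1.name='marko' AND p1.id=knows.outV AND knows.inV=p2.id"
).fetchall()
names = sorted(name for (name,) in rows)
# names == ['josh', 'vadas']
```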

The assumed SQL tables are:

CREATE TABLE people (
  id int,
  label string, -- person
  name string,
  PRIMARY KEY (id)
);

CREATE TABLE knows (
  id int,
  label string, -- knows
  name string,
  outV int,
  inV int,
  PRIMARY KEY (id),
  FOREIGN KEY (outV) REFERENCES people(id),
  FOREIGN KEY (inV) REFERENCES people(id)
);

There need to be two mapping specifications (Graph->Universal &
Universal->Relational):
  - V() -> the vertex tables are people+projects.
  - label() -> the plural of the label is the table name.
  - outE -> person.outE.knows is resolved via knows.outV (foreign key to the
    person table by id).
  - inV -> knows.values('inV') is resolved to a person (foreign key to the
    person table by id).
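Such a mapping specification could be encoded as plain data that the Universal->Relational compiler consults when rewriting steps; a minimal sketch (the dict layout and helper are hypothetical, not a TP4 API):

```python
# Hypothetical Universal->Relational mapping specification as plain data.
GRAPH_TO_RELATIONAL = {
    "vertex_tables": ["people", "projects"],
    "edges": {
        "knows": {
            "table": "knows",
            "outV": ("outV", "people", "id"),  # knows.outV -> people.id
            "inV":  ("inV", "people", "id"),   # knows.inV  -> people.id
        }
    },
}

def join_clause(spec, edge_label):
    """Render the SQL join predicates for one edge label from the mapping."""
    edge = spec["edges"][edge_label]
    out_col, _, out_key = edge["outV"]
    in_col, _, in_key = edge["inV"]
    return (f"p1.{out_key}={edge['table']}.{out_col} AND "
            f"{edge['table']}.{in_col}=p2.{in_key}")

clause = join_clause(GRAPH_TO_RELATIONAL, "knows")
# clause == "p1.id=knows.outV AND knows.inV=p2.id"
```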

Note that we are assuming a property graph is already encoded in MySQL, since
we have outE, inV, etc. column names. If we want to interpret arbitrary
relational data as a graph, then it is important to denote which tables are
"join tables" and which columns are used for joining. What about when a "join
table" references more than 2 other rows (i.e. an n-ary relation, a
hypergraph)? Is there a general solution for viewing any relational schema in
terms of a binary property graph?

——

How do you compile SQL to universal bytecode for evaluation over Cassandra?

—— SQL QUERY ——

SELECT p2.name FROM people AS p1, knows, people AS p2
  WHERE p1.name='marko' AND p1.id=knows.outV AND knows.inV=p2.id

—— UNIVERSAL BYTECODE ——

   ==> (using tuple pointers)

db().has('label','person').has('name','marko').
  values('outE').has('label','knows').
  values('inV').values('name')

—— WIDE-COLUMN BYTECODE ——

   ==>

R('people').has('name','marko').
  values('outE').has('label','knows').values('inV').as('$inV')
R('people').has('id',eq(path('$inV'))).values('name')

—— CASSANDRA BYTECODE ——

   ==> 

cql('SELECT outE:knows FROM people WHERE name=marko').
  cql('SELECT name FROM people WHERE id=$inV').by('inV')
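The chaining semantics (the second statement executed once per binding produced by the first) can be sketched as follows; the row shapes and the fake executor are stand-ins for real Cassandra round trips:

```python
# Stand-in result of the first CQL statement: marko's knows-edges
# (row and edge shapes are hypothetical).
first_result = [{"outE:knows": [{"inV": 2}, {"inV": 4}]}]
people_by_id = {2: "vadas", 4: "josh"}

def execute_cql(statement):
    """Fake CQL executor: only answers 'SELECT name ... WHERE id=<n>'."""
    person_id = int(statement.rsplit("=", 1)[1])
    return [people_by_id[person_id]]

# Chain: for every inV binding from statement one, run statement two.
names = []
for row in first_result:
    for edge in row["outE:knows"]:
        names.extend(execute_cql(
            f"SELECT name FROM people WHERE id={edge['inV']}"))
# names == ["vadas", "josh"]
```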

There needs to be a mapping specification from SQL->Universal:
  - knows.outV is a foreign key to a person row.
  - person.outE.knows is referenced by knows.outV.

The people table is defined as below, where the outgoing edges are stored in a column.

CREATE TABLE people (
  id int,
  name string,
  age int,
  outE list<...>,
  PRIMARY KEY (id,name)
);

The last bytecode is chained CQL where each result from