Re: [TinkerPop] What is the fundamental bytecode for TP4?

2019-03-30 Thread Marko Rodriguez
Hello,

> (As in SQL to "guide" (force?) you, then PL/SQL or TSQL or UDFs, etc.)  The 
> core should be simple, but not too simple, and avoid redundancy.


If you look at how I currently have it set up, we have a “core instruction set” 
and a “common instruction set.” Common is your standard count, group, sum, 
repeat, etc. (~20 instructions). Core is only 6 instructions: branch, initial, 
map, flatmap, filter, and reduce. Every time an instruction is added to common, 
its equivalent in core instructions is added as well. The test suite for Pipes 
uses common and the test suite for Beam uses core. By riding out these two 
instruction-set branches I hope to see a pattern emerge and perhaps converge on 
a single “common/core instruction set.”
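
To make the common/core relationship concrete, here is a purely illustrative 
sketch of how a couple of common instructions might expand into core 
instructions (these expansions are my shorthand, not the actual decompositions 
on the tp4/ branch):

  [sum]    =>  [reduce(sum,0)]
  [count]  =>  [map(constant,1), reduce(sum,0)]

The [sum] case matches the reduce(sum,0) that shows up in the example bytecode 
further down; the [count] case is just a guess at one possible decomposition.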

And yes, redundancy is a big flaw in the TP3 instruction set. This popped up on 
the radar early due to the Stream Ring Theory article and initial stabs at TP4 
development. I suspect that the common instruction set will have about one 
third as many instructions as TP3.

> I kinda wonder if it just shoves the complexity down into analyzing the 
> arguments of the instructions themselves or other contexts associated with 
> the instructions... maybe too early to tell. I'm just really hoping that 
> TP4 can offer what TP3 didn't, which was an easy way to reason about complex 
> query patterns. we promised that with "tools" in TP3 but those never really 
> materialized (attempts were made, but nothing seemed to stick really).

The problem with TP3 reasoning is that you are reasoning at the “step” level, 
not at the “instruction” level. In TP3, after bytecode, the compilation goes to 
Pipes. This was a wrong move. It meant that we had to embed one execution 
engine (Pipes) into another (e.g., Spark). In TP4, we compile from bytecode to 
CFunctions (coefficient functions). CFunctions do not assume an execution 
engine. They are simply Map, FlatMap, Reduce, Initial, Branch, and Filter 
functions (stateless functions). It is then up to the execution engine to 
coordinate these functions accordingly. Thus, strategy reasoning in TP3 was 
awkward because you had to manipulate methods/fields on Pipe steps (i.e., 
object reasoning). In TP4, you manipulate [op,arg*]-instructions (i.e., 
primitive array reasoning).
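
To illustrate what that [op,arg*] reasoning could look like, here is a minimal 
Java sketch. It is only a sketch under my own assumptions: bytecode is modeled 
as a List of Object[] instructions, and the fused "has" instruction and the 
argument positions are hypothetical, not the tp4/ branch API.

import java.util.ArrayList;
import java.util.List;

// Illustrative strategy: fuse a [map, get, key] instruction followed by a
// [filter, eq, value] instruction into a single hypothetical [has, key, value]
// instruction. Bytecode is modeled as a list of [op, arg*] object arrays.
final class InlineHasStrategy {

    static List<Object[]> apply(final List<Object[]> bytecode) {
        final List<Object[]> rewritten = new ArrayList<>();
        for (int i = 0; i < bytecode.size(); i++) {
            final Object[] inst = bytecode.get(i);
            final Object[] next = (i + 1 < bytecode.size()) ? bytecode.get(i + 1) : null;
            if (null != next && inst.length == 3 && next.length == 3
                    && "map".equals(inst[0]) && "filter".equals(next[0])) {
                // rewrite the adjacent pair as one fused native instruction
                rewritten.add(new Object[]{"has", inst[2], next[2]});
                i++; // the filter instruction has been consumed
            } else {
                rewritten.add(inst);
            }
        }
        return rewritten;
    }
}

Because the strategy only inspects and rebuilds primitive arrays, it never has 
to know anything about Pipes, Beam, or whatever execution engine ultimately 
coordinates the CFunctions.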

I have not fleshed out strategies to any great extent in TP4, but I believe 
they will be easier to write than in TP3. However, I sorta don’t think 
strategies are going to go in the same direction as they did in TP3. I’m having 
some inklings that we are not thinking about bytecode optimization in the most 
elegant way… 

Take care,
Marko.

http://rredux.com 




> On Mar 30, 2019, at 10:00 AM, Ben Krug wrote:
> 
> As an outsider of sorts, this was my thought, too.  Supposedly 'mov' is 
> Turing-complete, but I wouldn't want to program with just that.
> (https://www.cl.cam.ac.uk/~sd601/papers/mov.pdf)
> 
> Ideally, you have a core language that guides you in how to think, model, and 
> approach, then probably extensions for greater flexibility.
> 
> Hopefully that's the goal.
> 
> On Sat, Mar 30, 2019 at 6:15 AM Stephen Mallette wrote:
> Do you/kuppitz think that the reduced/core instruction set means that complex 
> strategy development is simplified? on the surface, less instructions sounds 
> like it will be easier to reason about patterns when providers go to build 
> strategies, but I'm not sure. I kinda wonder if it just shoves the complexity 
> down into analyzing the arguments of the instructions themselves or other 
> contexts associated with the instructions... maybe too early to tell. 
> I'm just really hoping that TP4 can offer what TP3 didn't, which was an easy 
> way to reason about complex query patterns. we promised that with "tools" in 
> TP3 but those never really materialized (attempts were made, but nothing 
> seemed to stick really). 
> 
> On Sat, Mar 23, 2019 at 12:25 PM Marko Rodriguez wrote:
> Hello,
> 
> As you know, one of the major objectives of TP4 is to generalize the virtual 
> machine in order to support any data structure (not just graph).
> 
> Here is an idea that Kuppitz and I batted around yesterday and I spent this 
> morning implementing on the tp4/ branch. 
> 
> From the Stream Ring Theory paper [https://zenodo.org/record/2565243],
>  we know that universal computation is possible with branch, initial, map, 
> flatmap, filter, reduce stream-based functions. If this is the case, why not 
> make those instructions the TP4 VM instruction set. 
> 
> If 
> 
> arg = constant | bytecode | method call, 
> 
> then the general pattern for each inst

Re: [TinkerPop] What is the fundamental bytecode for TP4?

2019-03-30 Thread Ben Krug
As an outsider of sorts, this was my thought, too.  Supposedly 'mov' is
Turing-complete, but I wouldn't want to program with just that.
(https://www.cl.cam.ac.uk/~sd601/papers/mov.pdf)

Ideally, you have a core language that guides you in how to think, model,
and approach, then probably extensions for greater flexibility.
(As in SQL to "guide" (force?) you, then PL/SQL or TSQL or UDFs, etc.)  The
core should be simple, but not too simple, and avoid redundancy.
Hopefully that's the goal.

On Sat, Mar 30, 2019 at 6:15 AM Stephen Mallette wrote:

> Do you/kuppitz think that the reduced/core instruction set means that
> complex strategy development is simplified? on the surface, less
> instructions sounds like it will be easier to reason about patterns when
> providers go to build strategies, but I'm not sure. I kinda wonder if it
> just shoves the complexity down into analyzing the arguments of the
> instructions themselves or other contexts associated with the
> instructions... maybe too early to tell. I'm just really hoping that
> TP4 can offer what TP3 didn't, which was an easy way to reason about
> complex query patterns. we promised that with "tools" in TP3 but those
> never really materialized (attempts were made, but nothing seemed to stick
> really).
>
> On Sat, Mar 23, 2019 at 12:25 PM Marko Rodriguez wrote:
>
>> Hello,
>>
>> As you know, one of the major objectives of TP4 is to generalize the
>> virtual machine in order to support any data structure (not just graph).
>>
>> Here is an idea that Kuppitz and I batted around yesterday and I spent
>> this morning implementing on the tp4/ branch.
>>
>> From the Stream Ring Theory paper [https://zenodo.org/record/2565243],
>> we know that universal computation is possible with branch, initial, map,
>> flatmap, filter, reduce stream-based functions. If this is the case, why
>> not make those instructions the TP4 VM instruction set.
>>
>> If
>>
>> arg = constant | bytecode | method call,
>>
>> then the general pattern for each instruction type is:
>>
>> [branch, (arg, bytecode)*]
>> [initial, arg]
>> [map, arg]
>> [flatmap, arg]
>> [filter, ?predicate, arg]
>> [reduce, operator, arg]
>>
>> Let this be called the “core instruction set."
>>
>> Now check this out:
>>
>> g.inject(7L).choose(is(7L), incr()).sum()
>> [initial(7), branch([filter(eq,7)],[map(number::add,1)]), reduce(sum,0)]
>>
>>
>> g.inject(Map.of("name", "marko", "age",
>> 29)).hasKey(regex("[a].*[e]")).has("name", "marko").value("age");
>> [initial({age=29, name=marko}), filter([flatmap(map::keys),
>> filter(regex,[a].*[e])]), filter([map(map::get,name), filter(eq,marko)]),
>> map(map::get,age)]
>>
>>
>> These core bytecode chunks currently execute on Pipes and Beam processors
>> as expected.
>>
>> Pretty trippy eh?
>>
>> Now the beautiful thing about this is:
>>
>> 1. Implementing a TP4 VM is trivial. All you have to do is support 6
>> instruction types.
>> - You could rip out a TP4 VM implementation in 1-2 days' time.
>> - We can create a foundational C#, Python, C/C++, etc. TP4 VM
>> implementation.
>> - this foundation can then be evolved over time at our leisure. (see next
>> point)
>> 2. More advanced TP4 VMs will compile the core bytecode to a TP4
>> VM-native bytecode.
>> - This is just like Java’s JIT compiler. For example, the core
>> instruction:
>>   filter([map(dictionary::get,name), filter(eq,marko)])
>> is compiled to the TP4-Java instruction:
>>   has(name,marko)
>> - Every processor must be able to work with core bytecode, but can
>> support VM native instructions such as has(), is(), path(), loops(),
>> groupCount(), etc.
>> - These instructions automatically work for all integrating processors
>> (e.g. Pipes, Beam, Akka — on the TP4-Java VM).
>> - these higher-level instructions don’t require any updates to the
>> processors as these are still (abstractly) filter, flatmap, reduce, etc.
>> functions.
>> 3. Core bytecode is as data agnostic as you can possibly get.
>> - Data structures are accessed via method call references — e.g.
>> map::keys, list::get, vertex::outEdges, etc.
>> - Adding new data structures is simply a matter of adding new datatypes.
>> - The TP4 VM can be used as a general purpose, universal stream-based VM.
>>
>> Here is the conceptual mapping between Java and TP4 terminology:
>>
>> Java sourcecode <=> Gremlin traversal
>> Java bytecode <=> Core bytecode
>> JIT trees <=> TP4-Java-native bytecode
>> Machine code <=> Processor execution plan
>>
>>
>> It’s a pretty intense move and all the kinks haven’t been fully worked
>> out, but it’s definitely something to consider.
>>
>> Your questions and comments are welcome.
>>
>> Take care,
>> Marko.
>>
>> http://rredux.com
>> 

Re: [TinkerPop] What is the fundamental bytecode for TP4?

2019-03-30 Thread Stephen Mallette
Do you/kuppitz think that the reduced/core instruction set means that
complex strategy development is simplified? on the surface, less
instructions sounds like it will be easier to reason about patterns when
providers go to build strategies, but I'm not sure. I kinda wonder if it
just shoves the complexity down into analyzing the arguments of the
instructions themselves or other contexts associated with the
instructions... maybe too early to tell. I'm just really hoping that
TP4 can offer what TP3 didn't, which was an easy way to reason about
complex query patterns. we promised that with "tools" in TP3 but those
never really materialized (attempts were made, but nothing seemed to stick
really).

On Sat, Mar 23, 2019 at 12:25 PM Marko Rodriguez wrote:

> Hello,
>
> As you know, one of the major objectives of TP4 is to generalize the
> virtual machine in order to support any data structure (not just graph).
>
> Here is an idea that Kuppitz and I batted around yesterday and I spent
> this morning implementing on the tp4/ branch.
>
> From the Stream Ring Theory paper [https://zenodo.org/record/2565243], we
> know that universal computation is possible with branch, initial, map,
> flatmap, filter, reduce stream-based functions. If this is the case, why
> not make those instructions the TP4 VM instruction set.
>
> If
>
> arg = constant | bytecode | method call,
>
> then the general pattern for each instruction type is:
>
> [branch, (arg, bytecode)*]
> [initial, arg]
> [map, arg]
> [flatmap, arg]
> [filter, ?predicate, arg]
> [reduce, operator, arg]
>
> Let this be called the “core instruction set."
>
> Now check this out:
>
> g.inject(7L).choose(is(7L), incr()).sum()
> [initial(7), branch([filter(eq,7)],[map(number::add,1)]), reduce(sum,0)]
>
>
> g.inject(Map.of("name", "marko", "age",
> 29)).hasKey(regex("[a].*[e]")).has("name", "marko").value("age");
> [initial({age=29, name=marko}), filter([flatmap(map::keys),
> filter(regex,[a].*[e])]), filter([map(map::get,name), filter(eq,marko)]),
> map(map::get,age)]
>
>
> These core bytecode chunks currently execute on Pipes and Beam processors
> as expected.
>
> Pretty trippy eh?
>
> Now the beautiful thing about this is:
>
> 1. Implementing a TP4 VM is trivial. All you have to do is support 6
> instruction types.
> - You could rip out a TP4 VM implementation in 1-2 days' time.
> - We can create a foundational C#, Python, C/C++, etc. TP4 VM
> implementation.
> - this foundation can then be evolved over time at our leisure. (see next
> point)
> 2. More advanced TP4 VMs will compile the core bytecode to a TP4
> VM-native bytecode.
> - This is just like Java’s JIT compiler. For example, the core instruction:
>   filter([map(dictionary::get,name), filter(eq,marko)])
> is compiled to the TP4-Java instruction:
>   has(name,marko)
> - Every processor must be able to work with core bytecode, but can support
> VM native instructions such as has(), is(), path(), loops(), groupCount(),
> etc.
> - These instructions automatically work for all integrating processors
> (e.g. Pipes, Beam, Akka — on the TP4-Java VM).
> - these higher-level instructions don’t require any updates to the
> processors as these are still (abstractly) filter, flatmap, reduce, etc.
> functions.
> 3. Core bytecode is as data agnostic as you can possibly get.
> - Data structures are accessed via method call references — e.g.
> map::keys, list::get, vertex::outEdges, etc.
> - Adding new data structures is simply a matter of adding new datatypes.
> - The TP4 VM can be used as a general purpose, universal stream-based VM.
>
> Here is the conceptual mapping between Java and TP4 terminology:
>
> Java sourcecode <=> Gremlin traversal
> Java bytecode <=> Core bytecode
> JIT trees <=> TP4-Java-native bytecode
> Machine code <=> Processor execution plan
>
>
> It’s a pretty intense move and all the kinks haven’t been fully worked out,
> but it’s definitely something to consider.
>
> Your questions and comments are welcome.
>
> Take care,
> Marko.
>
> http://rredux.com
>
>
>
>