Re: code generation and RDF support in TinkerPop 4

2021-06-03 Thread Joshua Shinavier
Hi Pieter,

You give some good motivation for a formal schema language. My proposal for
an abstract data model for TinkerPop was, and is, Algebraic Property Graphs
(paper), of which Dragon's data model is an extension. APG is broader than
typical property graphs (e.g. allowing hyperelements, nested data, and other
features which are uncommon or unknown in connection with TinkerPop), so the
best answer to your question is probably "a variant of APG with restrictions".

Given a formal specification of TinkerPop's data model, we can be very
flexible with respect to concrete syntaxes. Dragon has its YAML syntax, and
the new framework will probably support a slightly different YAML syntax,
but you can specify graph schemas in a variety of languages (the current
tooling will read schemas expressed in YAML, JSON, Thrift, or Protobuf),
and you can express graph data in a variety of languages. What the formal
specification of the data model, and the mappings, give you is the ability
to map schemas and data transparently between the formats, so you can use
whatever is most appropriate to your application.
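
To make the idea concrete, here is a minimal Java sketch of one abstract
description of a vertex type being rendered into two concrete syntaxes. It is
not Dragon's tooling or the planned framework: the class `SchemaSyntaxSketch`,
the `VertexType` type, and both output layouts are assumptions invented for
illustration, and the writers skip string escaping.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: names and output layouts are illustrative assumptions,
// not Dragon's YAML syntax or any TinkerPop API.
final class SchemaSyntaxSketch {

    /** Abstract, syntax-independent description of a vertex type. */
    static final class VertexType {
        final String label;
        final Map<String, String> properties; // property name -> type name

        VertexType(String label, Map<String, String> properties) {
            this.label = label;
            // Defensive copy that preserves declaration order.
            this.properties = Collections.unmodifiableMap(new LinkedHashMap<>(properties));
        }
    }

    /** First concrete syntax: a YAML-like rendering (no escaping). */
    static String toYaml(VertexType type) {
        StringBuilder sb = new StringBuilder();
        sb.append("label: ").append(type.label).append("\n");
        sb.append("properties:\n");
        for (Map.Entry<String, String> e : type.properties.entrySet()) {
            sb.append("  ").append(e.getKey()).append(": ").append(e.getValue()).append("\n");
        }
        return sb.toString();
    }

    /** Second concrete syntax: a compact JSON rendering (no escaping). */
    static String toJson(VertexType type) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"label\":\"").append(type.label).append("\",\"properties\":{");
        boolean first = true;
        for (Map.Entry<String, String> e : type.properties.entrySet()) {
            if (!first) {
                sb.append(",");
            }
            sb.append("\"").append(e.getKey()).append("\":\"").append(e.getValue()).append("\"");
            first = false;
        }
        sb.append("}}");
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("name", "string");
        props.put("age", "integer");
        VertexType person = new VertexType("Person", props);
        System.out.print(toYaml(person));
        System.out.println(toJson(person));
    }
}
```

Both renderings are derived from the same `VertexType` value, so neither file
format is the source of truth; that role belongs to the abstract model.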

Btw., at some point you'll see a schema for property graph features appear
in the branch -- a kind of TP4 successor to Graph.Features.
This will be a small language for declaring the specific refinement of APG
/ the TinkerPop data model which is supported by a given property graph
implementation. That will help you understand not only a single graph, but
also the characteristic class of graphs for a given vendor, adapter, etc.

Josh




On Thu, Jun 3, 2021 at 11:43 AM pieter gmail wrote:

> Hi,
>
> I kinda lost track of what we discussed previously.
> Did we come to a decision regarding what language we are going to use to
> describe the structure of the graph?
>
> YAML, XSD, UML, YANG or some category-theory-based language?
>
> From my understanding this would be the biggest change in TP4. A TinkerPop
> graph will no longer be a tangle of endless vertices and edges but instead
> can, optionally, be well defined and constrained. This way an engineer can,
> long after the original creators of a graph have left, immediately
> understand the graph, without needing to write a single query.
>
> Thanks
> Pieter

Re: code generation and RDF support in TinkerPop 4

2021-06-03 Thread pieter gmail
Hi,

I kinda lost track of what we discussed previously.
Did we come to a decision regarding what language we are going to use
to describe the structure of the graph?

YAML, XSD, UML, YANG or some category-theory-based language?

From my understanding this would be the biggest change in TP4. A
TinkerPop graph will no longer be a tangle of endless vertices and
edges but instead can, optionally, be well defined and constrained.
This way an engineer can, long after the original creators of a graph
have left, immediately understand the graph, without needing to write a
single query.

Thanks
Pieter




On Thu, 2021-06-03 at 09:59 -0700, Joshua Shinavier wrote:
> Hi Pieter,
> 
> 
> On Thu, Jun 3, 2021 at 9:40 AM pieter gmail wrote:
> > Hi,
> >
> > Just to understand a bit better what's going on.
> >
> > Did you hand-write the Dragon YAML with the ANTLR grammar as input?
>
> Yes, the YAML was written by hand, and based pretty closely on
> Gremlin.g4. You can see Stephen's ANTLR definitions inline with the
> YAML as comments. I also took some direction from the Java API.
>
> > Did you generate the Java classes from the YAML using Dragon or
> > something else?
>
> Yes, the Java classes are currently generated using Dragon. I'm
> limiting the generated code to Java for now (other possible targets
> being Scala and Haskell) just to keep diffs to a reasonable size, and
> because a new, open-source solution is needed to replace Dragon. My
> current thinking is that the new transformation framework will be
> separate from TinkerPop, as it will serve non-graph as well as graph
> use cases. For now, you can think of the code generation as a
> bootstrapping strategy.
> 
> Josh
> 
> 
>  
> > 
> > Thanks
> > Pieter
> > 

Re: code generation and RDF support in TinkerPop 4

2021-06-03 Thread Joshua Shinavier
Hi Pieter,


On Thu, Jun 3, 2021 at 9:40 AM pieter gmail wrote:

> Hi,
>
> Just to understand a bit better what's going on.
>
> Did you hand-write the Dragon YAML with the ANTLR grammar as input?
>


Yes, the YAML was written by hand, and based pretty closely on Gremlin.g4.
You can see Stephen's ANTLR definitions inline with the YAML as comments. I
also took some direction from the Java API.




> Did you generate the Java classes from the YAML using Dragon or
> something else?
>


Yes, the Java classes are currently generated using Dragon. I'm limiting
the generated code to Java for now (other possible targets being Scala and
Haskell) just to keep diffs to a reasonable size, and because a new,
open-source solution is needed to replace Dragon. My current thinking is
that the new transformation framework will be separate from TinkerPop, as
it will serve non-graph as well as graph use cases. For now, you can think
of the code generation as a bootstrapping strategy.
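
As a rough illustration of the style of class involved (the announcement
describes the generated classes as immutable DTOs with equality, pattern
matching over alternative constructors, and modification by copying), here is
a hand-written Java sketch. It is not output from Dragon, and `DtoSketch`,
`Step`, `Visitor`, `HasStep`, `OutStep`, and `render` are hypothetical names
chosen for this example, not classes from the branch.

```java
import java.util.Objects;

// Hypothetical sketch of an immutable DTO family; not generated code and
// not part of the TinkerPop code base.
final class DtoSketch {

    /** A step is one of a fixed set of alternatives. */
    interface Step {
        <R> R accept(Visitor<R> visitor);
    }

    /** Visitor standing in for pattern matching over the alternatives. */
    interface Visitor<R> {
        R visit(HasStep step);
        R visit(OutStep step);
    }

    static final class HasStep implements Step {
        final String key;
        final String value;

        HasStep(String key, String value) {
            this.key = Objects.requireNonNull(key);
            this.value = Objects.requireNonNull(value);
        }

        /** Modification by copying: the receiver is left unchanged. */
        HasStep withValue(String newValue) {
            return new HasStep(key, newValue);
        }

        @Override public <R> R accept(Visitor<R> visitor) {
            return visitor.visit(this);
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof HasStep)) {
                return false;
            }
            HasStep other = (HasStep) o;
            return key.equals(other.key) && value.equals(other.value);
        }

        @Override public int hashCode() {
            return Objects.hash(key, value);
        }
    }

    static final class OutStep implements Step {
        final String edgeLabel;

        OutStep(String edgeLabel) {
            this.edgeLabel = Objects.requireNonNull(edgeLabel);
        }

        @Override public <R> R accept(Visitor<R> visitor) {
            return visitor.visit(this);
        }

        @Override public boolean equals(Object o) {
            return o instanceof OutStep && edgeLabel.equals(((OutStep) o).edgeLabel);
        }

        @Override public int hashCode() {
            return edgeLabel.hashCode();
        }
    }

    /** Logic that consumes the DTOs lives outside them, here as a renderer. */
    static String render(Step step) {
        return step.accept(new Visitor<String>() {
            @Override public String visit(HasStep s) {
                return "has(" + s.key + ", " + s.value + ")";
            }
            @Override public String visit(OutStep s) {
                return "out(" + s.edgeLabel + ")";
            }
        });
    }

    public static void main(String[] args) {
        HasStep original = new HasStep("name", "marko");
        HasStep copy = original.withValue("josh");
        System.out.println(render(original));
        System.out.println(render(copy));
        System.out.println(original.equals(new HasStep("name", "marko")));
    }
}
```

Adding a new alternative means adding a `visit` method, so every consumer
fails to compile until it handles the new case.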

Josh




>
> Thanks
> Pieter
>
> On Thu, 2021-06-03 at 07:48 -0700, Joshua Shinavier wrote:
> > Hello all,
> >
> > I would like to take some concrete steps toward the TinkerPop 4
> > interoperability goals I've stated a few times (e.g. see TinkerPop 2020
> > from last year). At a meetup a couple of months ago, I demonstrated an
> > approach for generating TinkerPop APIs consistently into different
> > languages. I have started to check in some of that generated code in a
> > branch (see my commits here
> > <https://github.com/apache/tinkerpop/commits/TINKERPOP-2563-language/gremlin-language>)
> > and add bits and pieces for RDF support, as well.
> >
> > The Apache Software Foundation asks us to discuss any significant changes
> > to the code base on the dev list. Since these steps toward TP4 will be
> > major changes if and when they are merged into the master branch, I will
> > start discussing them here. Expect occasional emails from me about the
> > various things I will be doing in the branch. I absolutely invite comments,
> > feedback, and actual discussion on these design proposals, but even if it's
> > just me issuing self-affirming statements into the void like the King of
> > Pointland, I will just carry on, because that's how this process works.
> >
> > A brief summary of the changes so far:
> >
> >
> >    - *Abstract specification of Gremlin traversals*. I have turned
> >    Stephen's Gremlin.g4
> >    <https://github.com/apache/tinkerpop/blob/TINKERPOP-2563-language/gremlin-language/src/main/antlr4/Gremlin.g4>
> >    ANTLR grammar into an abstract specification of Gremlin traversal syntax
> >    using the Dragon (YAML-based) format. Unfortunately, it is looking very
> >    unlikely that Dragon will become available as open-source software, so you
> >    can expect this YAML format to change just slightly once we have a new
> >    Dragon-like tool for schema and data transformations. More on that later.
> >    Right now, the syntax specification can be found here
> >    <https://github.com/apache/tinkerpop/tree/TINKERPOP-2563-language/gremlin-language/src/main/yaml/org/apache/tinkerpop/gremlin/language/model>,
> >    although the file path might change in the future.
> >
> >
> >    - *Traversal DTOs*. Based on the abstract specification, I have
> >    generated Java classes for building and working with traversals. The
> >    generated files can currently be found here
> >    <https://github.com/apache/tinkerpop/tree/TINKERPOP-2563-language/gremlin-language/src/gen/java/org/apache/tinkerpop/gremlin/language/model>.
> >    These are essentially POJOs or DTO classes, with special boilerplate
> >    methods for equality, pattern matching over alternative constructors, and
> >    modification by copying (since the instances are immutable). These classes
> >    allow you to build traversals in a declarative way, while all of the logic
> >    for evaluating traversals goes elsewhere. Support for serialization and
> >    deserialization for traversals is to be added in the future -- and the same
> >    goes for all other classes generated in this way.
> >
> >
> >    - *RDF 1.1 concepts model*. RDF support was part of TinkerPop from the
> >    beginning, but it was de-emphasized for TinkerPop 3 due to other priorities
> >    such as OLAP. For years, developers have been asking us for better
> >    interoperability with RDF. While we do have some query-level support for
> >    RDF these days in sparql-gremlin, we no longer have any data-level support,
> >    e.g. supporting loading RDF data into a property graph and getting it back
> >    out, evaluating Gremlin traversals over RDF datasets, etc. These things are
> >    not especially hard to do, in certain limited ways, but

Re: code generation and RDF support in TinkerPop 4

2021-06-03 Thread pieter gmail
Hi,

Just to understand a bit better what's going on.

Did you hand-write the Dragon YAML with the ANTLR grammar as input?
Did you generate the Java classes from the YAML using Dragon or
something else?

Thanks
Pieter


code generation and RDF support in TinkerPop 4

2021-06-03 Thread Joshua Shinavier
Hello all,

I would like to take some concrete steps toward the TinkerPop 4
interoperability goals I've stated a few times (e.g. see TinkerPop 2020
from last year). At a meetup a couple of months ago, I demonstrated an
approach for generating TinkerPop APIs consistently into different
languages. I have started to check in some of that generated code in a
branch (see my commits here
<https://github.com/apache/tinkerpop/commits/TINKERPOP-2563-language/gremlin-language>)
and add bits and pieces for RDF support, as well.

The Apache Software Foundation asks us to discuss any significant changes
to the code base on the dev list. Since these steps toward TP4 will be
major changes if and when they are merged into the master branch, I will
start discussing them here. Expect occasional emails from me about the
various things I will be doing in the branch. I absolutely invite comments,
feedback, and actual discussion on these design proposals, but even if it's
just me issuing self-affirming statements into the void like the King of
Pointland, I will just carry on, because that's how this process works.

A brief summary of the changes so far:


   - *Abstract specification of Gremlin traversals*. I have turned
   Stephen's Gremlin.g4
   <https://github.com/apache/tinkerpop/blob/TINKERPOP-2563-language/gremlin-language/src/main/antlr4/Gremlin.g4>
   ANTLR grammar into an abstract specification of Gremlin traversal syntax
   using the Dragon (YAML-based) format. Unfortunately, it is looking very
   unlikely that Dragon will become available as open-source software, so you
   can expect this YAML format to change just slightly once we have a new
   Dragon-like tool for schema and data transformations. More on that later.
   Right now, the syntax specification can be found here
   <https://github.com/apache/tinkerpop/tree/TINKERPOP-2563-language/gremlin-language/src/main/yaml/org/apache/tinkerpop/gremlin/language/model>,
   although the file path might change in the future.


   - *Traversal DTOs*. Based on the abstract specification, I have
   generated Java classes for building and working with traversals. The
   generated files can currently be found here
   <https://github.com/apache/tinkerpop/tree/TINKERPOP-2563-language/gremlin-language/src/gen/java/org/apache/tinkerpop/gremlin/language/model>.
   These are essentially POJOs or DTO classes, with special boilerplate
   methods for equality, pattern matching over alternative constructors, and
   modification by copying (since the instances are immutable). These classes
   allow you to build traversals in a declarative way, while all of the logic
   for evaluating traversals goes elsewhere. Support for serialization and
   deserialization for traversals is to be added in the future -- and the same
   goes for all other classes generated in this way.


   - *RDF 1.1 concepts model*. RDF support was part of TinkerPop from the
   beginning, but it was de-emphasized for TinkerPop 3 due to other priorities
   such as OLAP. For years, developers have been asking us for better
   interoperability with RDF. While we do have some query-level support for
   RDF these days in sparql-gremlin, we no longer have any data-level support,
   e.g. loading RDF data into a property graph and getting it back out,
   evaluating Gremlin traversals over RDF datasets, etc. These things are
   not especially hard to do, in certain limited ways, but our old approach of
   writing adapters like GraphSail, SailGraph, and PropertyGraphSail in Java,
   with no support for other languages, does not seem appropriate for
   TinkerPop 4. Also, those early mappings were extremely underspecified in a
   formal sense -- good enough for some practical applications, but not good
   enough for anything requiring inference, optimization, or composition with
   other mappings. To that end, I am starting to add abstract specifications
   for RDF along the lines of the Gremlin specifications I described above.
   The first of these, a specification of RDF 1.1 Concepts, can currently be
   found here, with generated Java classes here.
   This gives us a way of working with RDF data in a language-neutral and
   framework-neutral way (whereas we were previously tied to Java and to the
   RDF4J, née Sesame, API). Mappings into and out of RDF will be defined with
   respect to these