Hi,
I have done some work on defining a meta model for Gremlin's property
graph. I am following the approach used in the modelling world, in
particular the one the OMG uses when defining its various meta
models and specifications.
However, where the OMG uses a subset of UML to define its meta models,
I suggest we use Gremlin. After all, Gremlin is the language we use to
describe the world, and the property graph meta model can also be
described in Gremlin.
I propose that we have 3 levels of modelling, each of which can itself
be specified in Gremlin.
1: The property graph meta model.
2: The model.
3: The graph representing the actual data.
1) The property graph meta model describes the nature of the property
graph itself, i.e. that property graphs have vertices, edges and
properties.
2) The model is an instance of the meta model. It describes the schema
of a particular graph, e.g. for TinkerPop's modern graph this would be
the labels 'person', 'software', 'created' and 'knows' and the
properties 'weight', 'age', 'name' and 'lang'.
3) The final level is an instance of the model. It is the actual graph
itself, e.g. for TinkerPop's modern graph it is 'Marko', 'Josh', 'java'
...
1: Property Graph Meta Model
import org.apache.tinkerpop.gremlin.structure.Graph;
import org.apache.tinkerpop.gremlin.structure.T;
import org.apache.tinkerpop.gremlin.structure.Vertex;
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph;

public static Graph gremlinMetaModel() {
    // Local enums need Java 16+; on older versions declare this at class level.
    enum GremlinDataType {
        STRING,
        INTEGER,
        DOUBLE,
        DATE,
        TIME
        //...
    }
    TinkerGraph propertyGraphMetaModel = TinkerGraph.open();
    // One meta vertex per property graph concept.
    Vertex graph = propertyGraphMetaModel.addVertex(T.label, "Graph",
            "name", "GremlinDataType::STRING");
    Vertex vertex = propertyGraphMetaModel.addVertex(T.label,
            "VertexLabel", "label", "GremlinDataType::STRING");
    Vertex edge = propertyGraphMetaModel.addVertex(T.label, "EdgeLabel",
            "label", "GremlinDataType::STRING");
    Vertex vertexProperty = propertyGraphMetaModel.addVertex(T.label,
            "VertexProperty", "name", "GremlinDataType::STRING", "type", "GremlinDataType");
    Vertex edgeProperty = propertyGraphMetaModel.addVertex(T.label,
            "EdgeProperty", "name", "GremlinDataType::STRING", "type", "GremlinDataType");
    graph.addEdge("vertices", vertex);
    graph.addEdge("edges", edge);
    vertex.addEdge("properties", vertexProperty);
    // Edge properties belong to the 'EdgeLabel', not the 'VertexLabel'.
    edge.addEdge("properties", edgeProperty);
    // Named 'outEdge'/'inEdge' to match the constraint queries below.
    vertex.addEdge("outEdge", edge);
    vertex.addEdge("inEdge", edge);
    return propertyGraphMetaModel;
}
This can be visualized as,
Notes:
1) GremlinDataType is an enumeration of the named data types that Gremlin
supports. All Gremlin data types are assumed to be atomic, with their
life cycle fully owned by the containing parent. How they are persisted
on disk or transported over the wire is not a concern of the meta model.
2) Gremlin's semantics are too weak to fully specify a valid meta model.
Accompanying the meta model we need a list of constraints, specified as
Gremlin queries, to augment the semantics of the meta model. These
constraints/queries will be able to validate any Gremlin-specified
model for correctness.
3) It is trivial to extend the meta model, e.g. to specify something
like index support just add an 'Index' vertex and an edge from
'VertexLabel' to it.
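As a minimal sketch of note 3: the 'Index' label comes from the note
above, but its 'name' property, the 'indexes' edge label and the class
and method names are hypothetical, picked only for illustration. It
assumes the graph holds a single 'VertexLabel' meta vertex, as in
gremlinMetaModel().

```java
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.structure.Graph;
import org.apache.tinkerpop.gremlin.structure.T;
import org.apache.tinkerpop.gremlin.structure.Vertex;
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph;

public class MetaModelIndexExtension {

    // Adds a hypothetical 'Index' meta vertex and links it from 'VertexLabel'.
    // Assumes exactly one vertex labelled 'VertexLabel' exists in the graph.
    public static Vertex addIndexSupport(Graph metaModel) {
        GraphTraversalSource g = metaModel.traversal();
        Vertex vertexLabel = g.V().hasLabel("VertexLabel").next();
        Vertex index = metaModel.addVertex(T.label, "Index",
                "name", "GremlinDataType::STRING");
        // 'indexes' is an assumed edge label, not part of the proposal.
        vertexLabel.addEdge("indexes", index);
        return index;
    }

    public static void main(String[] args) {
        // Stand-in for the relevant part of gremlinMetaModel().
        Graph metaModel = TinkerGraph.open();
        metaModel.addVertex(T.label, "VertexLabel",
                "label", "GremlinDataType::STRING");
        addIndexSupport(metaModel);
        System.out.println(metaModel.traversal().V()
                .hasLabel("VertexLabel").out("indexes").label().toList());
    }
}
```

An extension like this would presumably also need its own constraints,
in the same style as the ones listed below.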
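Property graph meta model constraints. Each query returns the ids of the
vertices that violate the constraint, so an empty result means the
constraint holds: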
1) Every 'VertexLabel' must have a 'label'.
g.V().hasLabel("VertexLabel").or(__.hasNot("label"), __.has("label",
P.eq(""))).id()
2) Every 'EdgeLabel' must have a 'label'.
g.V().hasLabel("EdgeLabel").or(__.hasNot("label"), __.has("label",
P.eq(""))).id()
3) Every 'EdgeLabel' must have at least one 'outEdge' 'VertexLabel'.
g.V().hasLabel("EdgeLabel").where(__.not(__.in("outEdge"))).id()
4) Every 'EdgeLabel' must have at least one 'inEdge' 'VertexLabel'.
g.V().hasLabel("EdgeLabel").where(__.not(__.in("inEdge"))).id()
5) Every 'VertexProperty' must have a 'name'.
g.V().hasLabel("VertexProperty").or(__.hasNot("name"), __.has("name",
P.eq(""))).id()
6) Every 'VertexProperty' must have a 'type'
g.V().hasLabel("VertexProperty").or(__.hasNot("type"), __.has("type",
P.eq(""))).id()
7) Every 'EdgeProperty' must have a 'name'.
g.V().hasLabel("EdgeProperty").or(__.hasNot("name"), __.has("name",
P.eq(""))).id()
8) Every 'EdgeProperty' must have a 'type'
g.V().hasLabel("EdgeProperty").or(__.hasNot("type"), __.has("type",
P.eq(""))).id()
9) Every 'VertexProperty' must have an incoming 'properties' edge.
g.V().hasLabel("VertexProperty").where(__.not(__.in("properties"))).id()
10) Every 'EdgeProperty' must have an incoming 'properties' edge.
g.V().hasLabel("EdgeProperty").where(__.not(__.in("properties"))).id()
...
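The constraints above could be run as a batch against a model graph.
What follows is only a sketch of one way to do that: the class
MetaModelValidator and its methods are hypothetical names, and only the
first four constraints are wired in. It assumes the model uses the
'VertexLabel'/'EdgeLabel' vertex labels and the 'outEdge'/'inEdge' edge
labels that the queries above refer to.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

import org.apache.tinkerpop.gremlin.process.traversal.P;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversal;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__;
import org.apache.tinkerpop.gremlin.structure.Graph;
import org.apache.tinkerpop.gremlin.structure.T;
import org.apache.tinkerpop.gremlin.structure.Vertex;
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph;

public class MetaModelValidator {

    // Maps a constraint description to a query returning the ids of violating vertices.
    static Map<String, Function<GraphTraversalSource, GraphTraversal<Vertex, Object>>> constraints() {
        Map<String, Function<GraphTraversalSource, GraphTraversal<Vertex, Object>>> c =
                new LinkedHashMap<>();
        c.put("Every 'VertexLabel' must have a 'label'",
                g -> g.V().hasLabel("VertexLabel")
                        .or(__.hasNot("label"), __.has("label", P.eq(""))).id());
        c.put("Every 'EdgeLabel' must have a 'label'",
                g -> g.V().hasLabel("EdgeLabel")
                        .or(__.hasNot("label"), __.has("label", P.eq(""))).id());
        c.put("Every 'EdgeLabel' must have at least one 'outEdge' 'VertexLabel'",
                g -> g.V().hasLabel("EdgeLabel").where(__.not(__.in("outEdge"))).id());
        c.put("Every 'EdgeLabel' must have at least one 'inEdge' 'VertexLabel'",
                g -> g.V().hasLabel("EdgeLabel").where(__.not(__.in("inEdge"))).id());
        return c;
    }

    // Returns only the violated constraints, each with the ids of the offending vertices.
    public static Map<String, List<Object>> validate(Graph model) {
        GraphTraversalSource g = model.traversal();
        Map<String, List<Object>> violations = new LinkedHashMap<>();
        constraints().forEach((description, query) -> {
            List<Object> ids = query.apply(g).toList();
            if (!ids.isEmpty()) {
                violations.put(description, ids);
            }
        });
        return violations;
    }

    public static void main(String[] args) {
        // A tiny model fragment: 'person' -knows-> 'person'.
        Graph model = TinkerGraph.open();
        Vertex person = model.addVertex(T.label, "VertexLabel", "label", "person");
        Vertex knows = model.addVertex(T.label, "EdgeLabel", "label", "knows");
        person.addEdge("outEdge", knows);
        person.addEdge("inEdge", knows);
        System.out.println(validate(model));
    }
}
```

An empty map from validate means none of the wired-in constraints was
violated. The remaining constraints can be added to the map in the same
way.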
This can be visualized as,
2: The model
What follows is an example of TinkerPop's 'modern' graph specified as
an instance of the above property graph meta model.
public static Graph modernModel() {
//import this from a base package
enum GremlinDataType {
STRING,
INTEGER,
DOUBLE,
DATE,
TIME
//...
}