Re: [HACKERS] [RFC] What would be difficult to make data models pluggable for making PostgreSQL a multi-model database?

2017-09-13 Thread Henry
I was just reading the Postgresql 11 roadmap and it mentions native graph
support. I would be interested in following the design work for this.

Would this require the new pluggable storage layer which is currently in
development, or would the existing storage engine be sufficient? I am just
wondering if there are any rough designs/plans for this...

https://wiki.postgresql.org/wiki/Fujitsu_roadmap#Multi-model_database

   - *graph: Natively support graph data model. Implement Cypher and/or
   Gremlin as the query language through UDFs.*


Thank you,
Henry


On Sun, Sep 3, 2017 at 1:14 PM MauMau  wrote:

> From: Henry M
> > This may be interesting... they implement Cypher (unfortunately they
> > had to fork in order to have Cypher be a first-class query language
> > with SQL).
> >
> > https://github.com/bitnine-oss/agensgraph
>
> I'm sorry for my very late reply.
>
> Thanks for the information.  AgensGraph is certainly interesting, but
> the problem is that it's a fork of PostgreSQL as you mentioned.  I
> wish the data models, including query languages, to be pluggable
> extensions, so that various people (especially database researchers?)
> can develop them flexibly.  Of course, I want various data models to
> be incorporated in the core as early as possible, but I'm afraid it's
> not easy.  If new data models can be added as extensions, they can be
> developed outside the PostgreSQL community process, get popular and
> mature, and then be embraced in core, as GiST/SP-GiST indexes and
> full-text search were.
>
>
> Regards
> MauMau
>
>


Re: [HACKERS] [RFC] What would be difficult to make data models pluggable for making PostgreSQL a multi-model database?

2017-09-03 Thread MauMau
From: Henry M
> This may be interesting... they implement Cypher (unfortunately they
> had to fork in order to have Cypher be a first-class query language
> with SQL).
>
> https://github.com/bitnine-oss/agensgraph

I'm sorry for my very late reply.

Thanks for the information.  AgensGraph is certainly interesting, but
the problem is that it's a fork of PostgreSQL as you mentioned.  I
wish the data models, including query languages, to be pluggable
extensions, so that various people (especially database researchers?)
can develop them flexibly.  Of course, I want various data models to
be incorporated in the core as early as possible, but I'm afraid it's
not easy.  If new data models can be added as extensions, they can be
developed outside the PostgreSQL community process, get popular and
mature, and then be embraced in core, as GiST/SP-GiST indexes and
full-text search were.


Regards
MauMau





Re: [HACKERS] [RFC] What would be difficult to make data models pluggable for making PostgreSQL a multi-model database?

2017-08-25 Thread Henry M
This may be interesting... they implement Cypher (unfortunately they had to
fork in order to have Cypher be a first-class query language with SQL).

https://github.com/bitnine-oss/agensgraph



On Mon, Aug 21, 2017 at 12:44 AM Chris Travers wrote:

> On Sun, Aug 20, 2017 at 4:10 AM, MauMau  wrote:
>
>> From: Chris Travers
>> > Why can't you do all this in a language handler and treat it as a user
>> > defined function?
>> > ...
>> > If you have a language handler for Cypher, why do you need in_region
>> > or cast_region?  Why not just have a graph_search() function which
>> > takes in a Cypher query and returns a set of records?
>>
>> The language handler is for *stored* functions.  The user-defined
>> function (UDF) doesn't participate in the planning of the outer
>> (top-level) query.  And they both assume that they are executed in SQL
>> commands.
>>
>
> Sure but stored functions can take arguments, such as a query string which
> gets handled by the language handler.  There's absolutely no reason you
> cannot declare a function in C that takes in a Cypher query and returns a
> set of tuples.   And you can do a whole lot with preloaded shared libraries
> if you need to.
>
> The planning bit is more difficult, but see below as to where I see major
> limits here.
>
>>
>> I want the data models to meet these:
>>
>> 1) The query language can be used as a top-level session language.
>> For example, if an app specifies "region=cypher_graph" at database
>> connection, it can use the database as a graph database and submit
>> Cypher queries without embedding them in SQL.
>>
>
> That sounds like a foot gun.  I would probably think of those cases as
> being ideal for a custom background worker, similar to Mongress.
> Expecting to be able to switch query languages on the fly strikes me as
> adding totally needless complexity everywhere, to be honest.  Having
> different listeners on different ports simplifies this a lot, and
> additionally offering query languages for ad-hoc mixing via language
> handlers might already get you most of what you want.
>
>>
>> 2) When a query contains multiple query fragments of different data
>> models, all those fragments are parsed and planned before execution.
>> The planner comes up with the best plan, crossing the data model
>> boundary.  Take the query example in my first mail, which joins a
>> relational table and the result of a graph query: the relational
>> planner considers how to scan the table, the graph planner considers
>> how to search the graph, and the relational planner considers how to
>> join the two fragments.
>>
>
> It seems like all you really need is a planner hook for user-defined
> languages (i.e. "how many rows does this function return with these
> parameters", right?).  Right now we allow hints, but they are static.  I
> wonder how hard this would be using preloaded shared libraries.
>
>
>>
>> So in_region() and cast_region() are not functions to be executed
>> during execution phase, but are syntax constructs that are converted,
>> during analysis phase, into calls to another region's parser/analyzer
>> and an inter-model cast routine.
>>
>
> So basically they work like immutable functions except that you cannot
> index the output?
>
>>
>> 1. The relational parser finds in_region('cypher_graph', 'graph
>> query') and produces a parse node InRegion(region_name, query) in the
>> parse tree.
>>
>> 2. The relational analyzer looks up the system catalog to check if
>> the specified region exists, then calls its parser/analyzer to produce
>> the query tree for the graph query fragment.  The relational analyzer
>> attaches the graph query tree to the InRegion node.
>>
>> 3. When the relational planner finds the graph query tree, it passes
>> the graph query tree to the graph planner to produce the graph
>> execution plan.
>>
>> 4. The relational planner produces a join plan node, based on the
>> costs/statistics of the relational table scan and graph query.  The
>> graph execution plan is attached to the join plan node.
>>
>> The parse/query/plan nodes have a label to denote a region, so that
>> appropriate region's routines can be called.
>>
>
> It would be interesting to see how much of what you want you can get with
> what we currently have and what pieces are really missing.
>
> Am I right that if you wrote a function in C to take a Cypher query plan,
> and analyse it, and execute it, the only thing really missing would be
> feedback to the PostgreSQL planner regarding number of rows expected?
>
>>
>> Regards
>> MauMau
>>
>>
>
>
> --
> Best Regards,
> Chris Travers
> Database Administrator
>
> Tel: +49 162 9037 210 | Skype: einhverfr |
> www.adjust.com
> Saarbrücker Straße 37a, 10405 Berlin
>
>


Re: [HACKERS] [RFC] What would be difficult to make data models pluggable for making PostgreSQL a multi-model database?

2017-08-21 Thread Chris Travers
On Sun, Aug 20, 2017 at 4:10 AM, MauMau  wrote:

> From: Chris Travers
> > Why can't you do all this in a language handler and treat it as a user
> > defined function?
> > ...
> > If you have a language handler for Cypher, why do you need in_region
> > or cast_region?  Why not just have a graph_search() function which
> > takes in a Cypher query and returns a set of records?
>
> The language handler is for *stored* functions.  The user-defined
> function (UDF) doesn't participate in the planning of the outer
> (top-level) query.  And they both assume that they are executed in SQL
> commands.
>

Sure but stored functions can take arguments, such as a query string which
gets handled by the language handler.  There's absolutely no reason you
cannot declare a function in C that takes in a Cypher query and returns a
set of tuples.   And you can do a whole lot with preloaded shared libraries
if you need to.
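
For concreteness, a minimal sketch of such a declaration (graph_search, the
library path, and the C symbol are hypothetical, not an existing extension):

  -- Hypothetical set-returning function implemented in C; the C code would
  -- parse and execute the Cypher query and return its rows as tuples.
  CREATE FUNCTION graph_search(cypher_query text)
  RETURNS SETOF record
  AS '$libdir/graph_handler', 'graph_search'  -- hypothetical shared library
  LANGUAGE c STRICT;

  -- Callers supply a column definition list, as with any function that
  -- returns an untyped record set:
  SELECT *
  FROM graph_search('MATCH (p:Person) RETURN p.name, count(*)')
       AS g(name text, cnt bigint);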

The planning bit is more difficult, but see below as to where I see major
limits here.

>
> I want the data models to meet these:
>
> 1) The query language can be used as a top-level session language.
> For example, if an app specifies "region=cypher_graph" at database
> connection, it can use the database as a graph database and submit
> Cypher queries without embedding them in SQL.
>

That sounds like a foot gun.  I would probably think of those cases as
being ideal for a custom background worker, similar to Mongress.
Expecting to be able to switch query languages on the fly strikes me as
adding totally needless complexity everywhere, to be honest.  Having
different listeners on different ports simplifies this a lot, and
additionally offering query languages for ad-hoc mixing via language
handlers might already get you most of what you want.

>
> 2) When a query contains multiple query fragments of different data
> models, all those fragments are parsed and planned before execution.
> The planner comes up with the best plan, crossing the data model
> boundary.  Take the query example in my first mail, which joins a
> relational table and the result of a graph query: the relational
> planner considers how to scan the table, the graph planner considers
> how to search the graph, and the relational planner considers how to
> join the two fragments.
>

It seems like all you really need is a planner hook for user-defined
languages (i.e. "how many rows does this function return with these
parameters", right?).  Right now we allow hints, but they are static.  I
wonder how hard this would be using preloaded shared libraries.
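
The static hints in question are the COST/ROWS attributes on the function
itself, fixed regardless of the arguments; a minimal sketch, reusing the
hypothetical graph_search() declared above:

  -- The planner will assume roughly 1000 result rows for every call,
  -- no matter which Cypher text is actually passed in.
  ALTER FUNCTION graph_search(text)
    COST 10000
    ROWS 1000;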


>
> So in_region() and cast_region() are not functions to be executed
> during execution phase, but are syntax constructs that are converted,
> during analysis phase, into calls to another region's parser/analyzer
> and an inter-model cast routine.
>

So basically they work like immutable functions except that you cannot
index the output?

>
> 1. The relational parser finds in_region('cypher_graph', 'graph
> query') and produces a parse node InRegion(region_name, query) in the
> parse tree.
>
> 2. The relational analyzer looks up the system catalog to check if
> the specified region exists, then calls its parser/analyzer to produce
> the query tree for the graph query fragment.  The relational analyzer
> attaches the graph query tree to the InRegion node.
>
> 3. When the relational planner finds the graph query tree, it passes
> the graph query tree to the graph planner to produce the graph
> execution plan.
>
> 4. The relational planner produces a join plan node, based on the
> costs/statistics of the relational table scan and graph query.  The
> graph execution plan is attached to the join plan node.
>
> The parse/query/plan nodes have a label to denote a region, so that
> appropriate region's routines can be called.
>

It would be interesting to see how much of what you want you can get with
what we currently have and what pieces are really missing.

Am I right that if you wrote a function in C to take a Cypher query plan,
and analyse it, and execute it, the only thing really missing would be
feedback to the PostgreSQL planner regarding number of rows expected?

>
> Regards
> MauMau
>
>


-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: [HACKERS] [RFC] What would be difficult to make data models pluggable for making PostgreSQL a multi-model database?

2017-08-20 Thread Craig Ringer
On 20 August 2017 at 10:10, MauMau  wrote:

> From: Chris Travers
> > Why can't you do all this in a language handler and treat it as a user
> > defined function?
> > ...
> > If you have a language handler for Cypher, why do you need in_region
> > or cast_region?  Why not just have a graph_search() function which
> > takes in a Cypher query and returns a set of records?
>
> The language handler is for *stored* functions.  The user-defined
> function (UDF) doesn't participate in the planning of the outer
> (top-level) query.  And they both assume that they are executed in SQL
> commands.
>

While I generally agree with Tom on this, I think there are some useful
ideas to examine.

Allow a UDF to emit multiple result sets that can then be incorporated into
an outer query. IMO it'd be fine to support this by returning a wide row of
REFCURSORs and then allowing FETCH to be used in a subquery.

The UDF would need to be invoked before the rest of the query was planned,
so the planner could learn the structure of the cursor's result sets.
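
A sketch of both halves of that idea; the table names and cursor-filling
queries are placeholders, and the FETCH-in-a-subquery form at the end is the
proposed extension, not valid SQL today:

  -- Works today: a UDF returning a wide row of refcursors, consumed with
  -- separate FETCH statements inside the same transaction.
  CREATE FUNCTION graph_results(OUT people refcursor, OUT likes refcursor)
  AS $$
  BEGIN
    people := 'people_cur';
    likes  := 'likes_cur';
    OPEN people FOR SELECT p.name FROM person p;               -- placeholder
    OPEN likes  FOR SELECT l.name, l.num_likers FROM liking l; -- queries; a graph
  END;                                                         -- engine would fill these
  $$ LANGUAGE plpgsql;

  BEGIN;
  SELECT * FROM graph_results();   -- returns (people_cur, likes_cur)
  FETCH ALL FROM likes_cur;
  COMMIT;

  -- Proposed (not supported today): FETCH as a subquery with an explicit
  -- output structure, so the outer query can be planned around it:
  --   SELECT r.name, g.num_likers
  --   FROM restaurant r
  --   JOIN (FETCH ALL FROM likes_cur) AS g(name text, num_likers int)
  --     ON r.name = g.name;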

Or some higher level concept could be introduced, like it was for
aggregates and window functions, where one call can be made to get the
output structure and some stats estimates, and another call (or series) to
get the rows.

I guess you're going two steps further than that, seeking a more integrated
model where the plugin can generate paths and participate more actively in
planning, and where you can optionally make it the default so you don't
need a SQL function call to access it.

If you want to pursue that, I suggest you start small and go step-by-step.
Things like:

* Allow FETCH ...  to be used in subqueries with explicitly
listed output relation structure, like calling a function that returns
record

* Allow pre-execution of parts of a query that produce refcursors used in
subqueries, then finish planning the outer query once the cursor output
types are known

* A construct that can inject arbitrary virtual relations into the
namespace at parse-time, so you don't have to do the dance with refcursors.
(Like WITH).

* Construct that can supply stats estimates for the virtual relations

So try to build it in stages.

You could also potentially use the FDW interface.
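
A sketch of how the FDW route might look; graph_fdw and its cypher option are
hypothetical, but the planner feedback would come through the wrapper's real
GetForeignRelSize callback:

  -- Hypothetical foreign data wrapper exposing a graph query as a foreign
  -- table; row estimates would be supplied by the FDW's GetForeignRelSize.
  CREATE SERVER graph_srv FOREIGN DATA WRAPPER graph_fdw;  -- hypothetical FDW
  CREATE FOREIGN TABLE restaurant_likes (name text, num_likers int)
    SERVER graph_srv
    OPTIONS (cypher 'MATCH (:Person {name:"John"})-[:IS_FRIEND_OF*1..2]-(friend),
                           (friend)-[:LIKES]->(r:Restaurant)
                     RETURN r.name, count(*)');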


> I want the data models to meet these:
>
> 1) The query language can be used as a top-level session language.
> For example, if an app specifies "region=cypher_graph" at database
> connection, it can use the database as a graph database and submit
> Cypher queries without embedding them in SQL.
>

Why? What does this offer over the app or client tool wrapping its queries
in "SELECT cypher_graph('')" ?


> 2) When a query contains multiple query fragments of different data
> models, all those fragments are parsed and planned before execution.
> The planner comes up with the best plan, crossing the data model
> boundary.  Take the query example in my first mail, which joins a
> relational table and the result of a graph query: the relational
> planner considers how to scan the table, the graph planner considers
> how to search the graph, and the relational planner considers how to
> join the two fragments.
>

Here, what you need is a way to define a set of virtual relations on a
per-query basis, where you can get stats estimates for the relations during
planning.

I guess what you're imagining is something more sophisticated where you're
generating some kind of sub-plan candidates, like the path model. With some
kind of interaction so the sub-planner for the other model could know to
generate a different sub-plan based on the context of the outer plan. I
have no idea how that could work. But I think you have about zero chance of
achieving what you want by going straight there. Focus on small incremental
steps, preferably ones you can find other uses for too.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] [RFC] What would be difficult to make data models pluggable for making PostgreSQL a multi-model database?

2017-08-19 Thread MauMau
From: Chris Travers
> Why can't you do all this in a language handler and treat it as a user
> defined function?
> ...
> If you have a language handler for Cypher, why do you need in_region
> or cast_region?  Why not just have a graph_search() function which
> takes in a Cypher query and returns a set of records?

The language handler is for *stored* functions.  The user-defined
function (UDF) doesn't participate in the planning of the outer
(top-level) query.  And they both assume that they are executed in SQL
commands.

I want the data models to meet these:

1) The query language can be used as a top-level session language.
For example, if an app specifies "region=cypher_graph" at database
connection, it can use the database as a graph database and submit
Cypher queries without embedding them in SQL.

2) When a query contains multiple query fragments of different data
models, all those fragments are parsed and planned before execution.
The planner comes up with the best plan, crossing the data model
boundary.  Take the query example in my first mail, which joins a
relational table and the result of a graph query: the relational
planner considers how to scan the table, the graph planner considers
how to search the graph, and the relational planner considers how to
join the two fragments.

So in_region() and cast_region() are not functions to be executed
during execution phase, but are syntax constructs that are converted,
during analysis phase, into calls to another region's parser/analyzer
and an inter-model cast routine.

1. The relational parser finds in_region('cypher_graph', 'graph
query') and produces a parse node InRegion(region_name, query) in the
parse tree.

2. The relational analyzer looks up the system catalog to check if
the specified region exists, then calls its parser/analyzer to produce
the query tree for the graph query fragment.  The relational analyzer
attaches the graph query tree to the InRegion node.

3. When the relational planner finds the graph query tree, it passes
the graph query tree to the graph planner to produce the graph
execution plan.

4. The relational planner produces a join plan node, based on the
costs/statistics of the relational table scan and graph query.  The
graph execution plan is attached to the join plan node.

The parse/query/plan nodes have a label to denote a region, so that
appropriate region's routines can be called.

Regards
MauMau





Re: [HACKERS] [RFC] What would be difficult to make data models pluggable for making PostgreSQL a multi-model database?

2017-08-19 Thread Tom Lane
"MauMau"  writes:
> I'm thinking of making PostgreSQL a multi-model database by supporting
> data models other than the current relational model.  A data model
> consists of a query language (e.g. SQL for relational model, Cypher
> for graph model), a parser and analyzer to transform a query into a
> query tree, a planner to transform the query tree into an execution
> plan, an executor, and a storage engine.

It sounds like what you want is to replace all of Postgres except
the name.  I'm not clear on the point.

regards, tom lane




Re: [HACKERS] [RFC] What would be difficult to make data models pluggable for making PostgreSQL a multi-model database?

2017-08-19 Thread Chris Travers
On Sat, Aug 19, 2017 at 4:29 PM, MauMau  wrote:

> Hello,
>
> Please forgive me for asking such a stupid and rough question.
>
> I'm thinking of making PostgreSQL a multi-model database by supporting
> data models other than the current relational model.  A data model
> consists of a query language (e.g. SQL for relational model, Cypher
> for graph model), a parser and analyzer to transform a query into a
> query tree, a planner to transform the query tree into an execution
> plan, an executor, and a storage engine.
>
> To promote the data model development, I want to make data models
> pluggable.  The rough sketch is:
>
> 1) A data model developer implements the parser, analyzer,
> transformer, planner, executor, and storage engine functions in a
> shared library.
>
> 2) The DBA registers the data model.
>
>   CREATE QUERY LANGUAGE Cypher (
> PARSER = 
>   );
>
>   CREATE DATA MODEL graph (
> QUERY LANGUAGE = Cypher,
> ANALYZER = ,
> TRANSFORMER = ,
> PLANNER = ,
> EXECUTOR = ,
> STORAGE ENGINE = ,
>   );
>
>   CREATE REGION cypher_graph (
> QUERY LANGUAGE = Cypher,
> DATA MODEL = graph
>   );
>

Why can't you do all this in a language handler and treat it as a user
defined function?
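
For example, something along these lines (a sketch only; plcypher, its handler
library, and the parameter-passing convention are all hypothetical):

  -- The C call handler for the hypothetical language, then the language
  -- itself, registered the same way PL/pgSQL or PL/Python are.
  CREATE FUNCTION plcypher_call_handler()
  RETURNS language_handler
  AS '$libdir/plcypher', 'plcypher_call_handler'
  LANGUAGE c;

  CREATE LANGUAGE plcypher HANDLER plcypher_call_handler;

  -- A user-defined function whose body is Cypher rather than SQL; how the
  -- $person parameter is passed through is up to the handler.
  CREATE FUNCTION popular_restaurants(person text)
  RETURNS TABLE (name text, num_likers int)
  AS $$
    MATCH (:Person {name: $person})-[:IS_FRIEND_OF*1..2]-(friend),
          (friend)-[:LIKES]->(restaurant:Restaurant)
    RETURN restaurant.name, count(*)
  $$ LANGUAGE plcypher;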

>
> The region is just a combination of a query language and a data model,
> much like a locale is a combination of a language and a country.  This
> is because there may be multiple popular query languages for a data
> model.
>
> 3) The application connects to the database, specifying a desired
> region.  The specified region's query language becomes the default
> query language for the session.
>
>
> The application can use the data of multiple data models in one query
> by specifying another region and its query via in_region().  For
> example, the following query combines the relational restaurant table
> and a social graph to get the five chinese restaurants in Tokyo that
> are most popular among friends of John and their friends.
>
>   SELECT r.name, g.num_likers
>   FROM restaurant r,
>     cast_region(
>       in_region('cypher_graph',
>         'MATCH (:Person {name:"John"})-[:IS_FRIEND_OF*1..2]-(friend),
>            (friend)-[:LIKES]->(restaurant:Restaurant)
>          RETURN restaurant.name, count(*)'),
>       'relational', 'g', '(name text, num_likers int)')
>   WHERE r.name = g.name AND
>     r.city = 'Tokyo' AND r.cuisine = 'chinese'
>   ORDER BY g.num_likers DESC LIMIT 5;


If you have a language handler for Cypher, why do you need in_region or
cast_region?  Why not just have a graph_search() function which takes in a
Cypher query and returns a set of records?
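
As an illustration, the example above could then be written against a single
hypothetical set-returning graph_search() function, which the planner would
treat like any other function scan:

  SELECT r.name, g.num_likers
  FROM restaurant r,
       graph_search(
         'MATCH (:Person {name:"John"})-[:IS_FRIEND_OF*1..2]-(friend),
                (friend)-[:LIKES]->(restaurant:Restaurant)
          RETURN restaurant.name, count(*)'
       ) AS g(name text, num_likers int)
  WHERE r.name = g.name AND
        r.city = 'Tokyo' AND r.cuisine = 'chinese'
  ORDER BY g.num_likers DESC LIMIT 5;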

>
>
> What do you think would be difficult to make data models pluggable,
> especially related to plugging the parser, planner, executor, etc?
> One possible concern is that various PostgreSQL components might be
> too dependent on the data model being relational, and it would be
> difficult to separate tight coupling.
>

I guess I am missing why the current language handler structure is not
enough.  Maybe I am missing something?


>
> Regards
> MauMau
>
>
>
>



-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin