Re: [HACKERS] [RFC] What would be difficult to make data models pluggable for making PostgreSQL a multi-model database?

Henry M Fri, 25 Aug 2017 10:06:41 -0700

This may be interesting: AgensGraph implements Cypher (unfortunately, they
had to fork PostgreSQL in order to make Cypher a first-class query language
alongside SQL).


https://github.com/bitnine-oss/agensgraph



On Mon, Aug 21, 2017 at 12:44 AM Chris Travers <chris.trav...@adjust.com>
wrote:

> On Sun, Aug 20, 2017 at 4:10 AM, MauMau <maumau...@gmail.com> wrote:
>
>> From: Chris Travers
>> > Why cannot you do all this in a language handler and treat as a user
>> > defined function?
>> > ...
>> > If you have a language handler for cypher, why do you need in_region
>> > or cast_region?  Why not just have a graph_search() function which
>> > takes in a cypher query and returns a set of records?
>>
>> The language handler is for *stored* functions.  The user-defined
>> function (UDF) doesn't participate in the planning of the outer
>> (top-level) query.  And they both assume that they are executed in SQL
>> commands.
>>
>
> Sure, but stored functions can take arguments, such as a query string that
> gets handled by the language handler.  There's absolutely no reason you
> cannot declare a function in C that takes in a Cypher query and returns a
> set of tuples.  And you can do a whole lot with preloaded shared libraries
> if you need to.
>
> The planning bit is more difficult, but see below as to where I see major
> limits here.
>
>>
>> I want the data models to meet these:
>>
>> 1) The query language can be used as a top-level session language.
>> For example, if an app specifies "region=cypher_graph" at database
>> connection, it can use the database as a graph database and submit
>> Cypher queries without embedding them in SQL.
>>
>
> That sounds like a foot gun.  I would probably think of those cases as
> being ideal for a custom background worker, similar to Mongress.
> Expecting to be able to switch query languages on the fly strikes me as
> adding totally needless complexity everywhere, to be honest.  Having
> different listeners on different ports simplifies this a lot, and
> additionally having query languages for ad-hoc mixing via language
> handlers might already get you most of what you want.
>
>>
>> 2) When a query contains multiple query fragments of different data
>> models, all those fragments are parsed and planned before execution.
>> The planner comes up with the best plan, crossing the data model
>> boundary.  Take the query example in my first mail, which joins a
>> relational table and the result of a graph query: the relational
>> planner considers how to scan the table, the graph planner considers
>> how to search the graph, and the relational planner considers how to
>> join the two fragments.
>>
>
> It seems like all you really need is a planner hook for user-defined
> languages (i.e. "how many rows does this function return with these
> parameters"), right?  Right now we allow hints, but they are static.  I
> wonder how hard this would be using preloaded shared libraries.
>
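As a rough illustration of the row-count feedback discussed above, here is a minimal Python sketch contrasting a static hint with a per-call estimate. It is a toy model, not PostgreSQL code: `STATIC_ROWS`, `estimate_rows`, and `cypher_estimator` are hypothetical names, and the LIMIT heuristic is only an assumption about what a language-specific estimator might look at. The value 1000 mirrors the default row estimate PostgreSQL uses for set-returning functions when no ROWS value is declared.

```python
from typing import Callable, Optional

# Static hint: one fixed number per function, regardless of arguments.
STATIC_ROWS = 1000


def estimate_rows(query: str,
                  estimator: Optional[Callable[[str], int]] = None) -> int:
    """Return the planner's row estimate for a function call.

    Without an estimator the static hint is used; with one, the estimate
    depends on the actual query string passed as the argument.
    """
    if estimator is None:
        return STATIC_ROWS
    return max(1, estimator(query))


def cypher_estimator(query: str) -> int:
    """Hypothetical heuristic: a trailing 'LIMIT n' caps the row count."""
    tokens = query.upper().split()
    if "LIMIT" in tokens:
        idx = tokens.index("LIMIT")
        if idx + 1 < len(tokens) and tokens[idx + 1].isdigit():
            return int(tokens[idx + 1])
    return STATIC_ROWS  # fall back to the static hint


if __name__ == "__main__":
    q = "MATCH (n) RETURN n LIMIT 5"
    print(estimate_rows(q))                    # static hint
    print(estimate_rows(q, cypher_estimator))  # argument-aware estimate
```

The point of the sketch is only that the estimate becomes a function of the call's arguments; how such a callback would be registered from a preloaded library is left open, as in the message above.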
>
>>
>> So in_region() and cast_region() are not functions to be executed
>> during execution phase, but are syntax constructs that are converted,
>> during analysis phase, into calls to another region's parser/analyzer
>> and an inter-model cast routine.
>>
>
> So basically they work like immutable functions except that you cannot
> index the output?
>
>>
>> 1. The relational parser finds in_region('cypher_graph', 'graph
>> query') and produces a parse node InRegion(region_name, query) in the
>> parse tree.
>>
>> 2. The relational analyzer looks up the system catalog to check if
>> the specified region exists, then calls its parser/analyzer to produce
>> the query tree for the graph query fragment.  The relational analyzer
>> attaches the graph query tree to the InRegion node.
>>
>> 3. When the relational planner finds the graph query tree, it passes
>> the graph query tree to the graph planner to produce the graph
>> execution plan.
>>
>> 4. The relational planner produces a join plan node, based on the
>> costs/statistics of the relational table scan and graph query.  The
>> graph execution plan is attached to the join plan node.
>>
>> The parse/query/plan nodes have a label to denote a region, so that
>> appropriate region's routines can be called.
>>
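The four steps above can be sketched as a toy pipeline in Python. This is only a model of the proposed control flow under stated assumptions, not PostgreSQL internals: the `Region` class, the `REGIONS` dictionary standing in for the system catalog, the dictionary-shaped query trees and plans, and the fixed estimate of 50 rows are all hypothetical. Only the `InRegion(region_name, query)` node and the region name `cypher_graph` come from the description itself.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class InRegion:
    """Parse node produced for in_region('<region>', '<query>') (step 1)."""
    region_name: str
    query: str
    query_tree: Optional[dict] = None  # attached by the analyzer (step 2)
    plan: Optional[dict] = None        # attached by the planner (step 3)


@dataclass
class Region:
    """Hypothetical catalog entry: one analyzer and one planner per region."""
    analyze: Callable[[str], dict]
    plan: Callable[[dict], dict]


# Stand-in for the system catalog consulted in step 2.
REGIONS: Dict[str, Region] = {
    "cypher_graph": Region(
        analyze=lambda q: {"region": "cypher_graph", "text": q},
        plan=lambda tree: {"region": tree["region"],
                           "node": "GraphSearch",
                           "rows": 50},  # made-up estimate
    ),
}


def analyze(node: InRegion) -> InRegion:
    """Step 2: check the region exists, then call its parser/analyzer."""
    region = REGIONS.get(node.region_name)
    if region is None:
        raise LookupError(f"region {node.region_name!r} does not exist")
    node.query_tree = region.analyze(node.query)
    return node


def plan_join(table_rows: int, node: InRegion) -> dict:
    """Steps 3 and 4: delegate to the region's planner, then plan the join."""
    node.plan = REGIONS[node.region_name].plan(node.query_tree)
    # Toy cost rule: build the hash table on the smaller input.
    hash_side = "graph" if node.plan["rows"] <= table_rows else "table"
    return {"node": "HashJoin", "hash_side": hash_side,
            "graph_plan": node.plan}


if __name__ == "__main__":
    parsed = InRegion("cypher_graph", "MATCH (n) RETURN n")  # step 1
    print(plan_join(10_000, analyze(parsed)))
```

Each node carries its region label, so the relational side only dispatches by name and never needs to understand the graph query tree or plan it attaches.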
>
> It would be interesting to see how much of what you want you can get with
> what we currently have and what pieces are really missing.
>
> Am I right that if you wrote a function in C to take a Cypher query,
> plan and analyse it, and execute it, the only thing really missing would
> be feedback to the PostgreSQL planner regarding the number of rows expected?
>
>>
>> Regards
>> MauMau
>>
>>
>
>
> --
> Best Regards,
> Chris Travers
> Database Administrator
>
> Tel: +49 162 9037 210 | Skype: einhverfr |
> www.adjust.com
> Saarbrücker Straße 37a, 10405 Berlin
>
>
