Re: SQL Property Graph Queries (SQL/PGQ)

Henson Choi Fri, 02 Jan 2026 05:24:16 -0800

Hi Ashutosh,

Thank you for the detailed responses and for addressing the test coverage
items so quickly.


2026년 1월 2일 (금) PM 5:48, Ashutosh Bapat <[email protected]>님이 작성:

> Hi Henson,
>
> Thanks for your systematic review.
>
> On Wed, Dec 31, 2025 at 11:53 AM Henson Choi <[email protected]> wrote:
> >
> > Overall assessment: GOOD
>
> Glad to know that.
>
> > - Test Coverage:    Good (90.5% line coverage, ~180 test cases)
>
> Thanks for the coverage analysis. Looks good right now. But see below.
>
> > 2.1 Critical / Major
> > (None)
>
> Great.
>
> >
> > 2.2 Minor Issues
> >
> > #1 [Code] src/backend/commands/propgraphcmds.c:1632
> >    FIXME: Missing index for plppropid column causes sequential scan.
> >    Decision needed: (a) add index, or (c) allow seq scan for rare path.
>
> The path is rare enough that I think we can allow the seq scan. Given
> that Peter has marked it as FIXME, it seems we will keep it as is for
> now, but will revisit if performance becomes an issue. Peter, please
> correct me if I'm wrong.
>
> >
> > #2 [Doc] doc/src/sgml/ref/alter_property_graph.sgml:286
> >    Compatibility section incorrectly states "CREATE PROPERTY GRAPH"
> >    Should be: "ALTER PROPERTY GRAPH"
> >
>
> That's right. Fixed.
>
> > #3 [Doc] doc/src/sgml/ref/alter_property_graph.sgml:65
> >    Synopsis shows VERTEX/NODE and EDGE/RELATIONSHIP as separate clauses,
> >    but Description merges them as {VERTEX|NODE|EDGE|RELATIONSHIP}.
>
> Their syntax is slightly different hence they are separate in
> Synopsis. If we separate them in the Description, we might have to
> repeat the same text twice. Having them merged in the Description
> makes it more concise without losing any clarity or meaning. I think
> we can keep it as is. What do you think?
>
>
I understand the rationale for avoiding repetition. That makes sense.


> >
> > #4 [Doc] doc/src/sgml/ref/alter_property_graph.sgml:80
> >    Grammar error: "the graph removed" should be "the graph is removed"
>
> Right. Fixed.
>
> >
> > #5 [Doc] doc/src/sgml/queries.sgml:2929,2931
> >    TODO comments for unimplemented features without clear limitation
> notes.
>
> Those TODOs are not visible to users. I think they serve as
> placeholders to add the documentation when the features are
> implemented. We may want to remove them, but I see no harm in keeping
> them for now. Peter, what do you think?
>
> >
> > #6 [Doc] doc/src/sgml/ddl.sgml:5717
> >    Typo: "infrastucture" should be "infrastructure"
> >
>
> Fixed. Thanks for catching it.
>
> > 2.3 Alternative Approaches for Discussion
> >
> > #1 Support CREATE PROPERTY GRAPH IF NOT EXISTS
> >    Rationale: PostgreSQL-style extension, consistent with other DDL.
> >
>
> I think this will be good-to-have for consistency across DDLs.
> However, I think it is better to discuss and implement it separately
> since lack of it does not block the main functionality of property
> graphs. Meantime users can DROP and CREATE. There are other DDLs which
> do not have this support. What do you think?
>
>
I agree. We can consider this separately when appropriate timing comes.


> > #2 Return 0 rows for non-existent labels instead of error
> >    Current: SELECT * FROM GRAPH_TABLE(g MATCH (a:NonExistent) ...)
> raises error
> >    Alternative: Return empty result set instead
> >    Rationale: Better for exploratory queries, similar to SQL WHERE on
> >    non-existent values returning 0 rows rather than error.
>
> I think this makes sense in a proper graph database where the labels
> are not predefined. However, in property graphs the labels, the
> properties associated with them and data types of those properties are
> predefined in the graph schema. If the specified label does not exist
> in the graph schema, we can not determine what properties are
> associated with it and their data types. In turn we can not determine
> the shape of the result set, which is essential for reporting even 0
> rows. Hence the error instead of 0 rows. Oracle has similar behaviour.
>
>
I understand your reasoning about the structured vs semi-structured
difference.
However, I think it's worth extra consideration when behavior differs from
Cypher -
the similar syntax might confuse users despite the fundamental
architectural differences.


> >
> > #3 Return 0 rows when same variable has different labels
> >    Current: SELECT * FROM GRAPH_TABLE(g MATCH (a:Person)-[e]-(a:Company)
> ...)
> >    raises error because variable 'a' has conflicting labels.
> >    Alternative: Return empty result set instead
> >    Rationale: Logically, a node cannot be both Person AND Company,
> >    so 0 rows is the correct result. Consistent with standard SQL
> >    WHERE semantics (impossible condition = 0 rows, not error).
> >
>
> In such a case 0 rows isn't a correct result. According to the
> standard, variable a is bound to all the nodes that have both the
> labels. This is label conjunction, a feature we don't support right
> now. Hence the "unsupported feature" error. I expect that some day we
> will implement label conjunction, and then the same query will return
> non-zero rows if there are nodes with both labels.
>
>
Thank you for clarifying the label conjunction feature. I missed the error
code details in my review. Please always view my tests critically, as I
will your
code - that's how we build better software together. I respect your
attention to detail.


>
> >
> -------------------------------------------------------------------------------
> > TEMP graph           CREATE TEMP PROPERTY GRAPH                   Medium
>
> Done
>
> > TEMP graph           TEMP graph with permanent table reference    Medium
>
> Done
>
> > TEMP graph           ALTER ADD permanent table to TEMP graph      Medium
>
> Done
>
> > CASCADE              DROP ... CASCADE (dependent view cascade)    Low
>
> Done
>
> > RESTRICT             DROP ... RESTRICT (dependent object error)   Low
>
> Done, but without explicit RESTRICT keyword since it's default behavior.
>
> - Tests specifying NODE/RELATIONSHIP instead of VERTEX/EDGE (Low priority)
>
> Done
>
> > propgraphcmds.c:136,179,254-262    TEMP table in graph creation   Auto
> TEMP conversion
>
> Done
>
> > propgraphcmds.c:1323,1364          ALTER ADD TEMP to perm graph   Error
> branch
>
> Done
>
> > propgraphcmds.c:724-734            CREATE without LABEL clause
> Default label else
>
> This should be covered. There are several CREATE PROPERTY GRAPH
> statements without LABEL clauses.
>
> > propgraphcmds.c:1300,1303          ALTER IF EXISTS non-existent
>  missing_ok branch
>
> This is unreachable since the syntax is not supported. The only IF
> EXISTS variant supported is ALTER PROPERTY GRAPH IF EXISTS ... SET
> SCHEMA .... Added a test for the same.
>
> > propgraphcmds.c:116                CREATE UNLOGGED attempt
> UNLOGGED error
>
> Done
>
> > execMain.c:1162-1163               DML on graph
>  RELKIND_PROPGRAPH
>
> This is unreachable since a property graph reference is only allowed
> in GRAPH_TABLE().
>
> > rewriteGraphTable.c:120            Multiple path patterns         Length
> check
>
> Done
>
> > rewriteGraphTable.c:205            Quantifier usage
>  Quantifier error
>
> Done
>
> > ruleutils.c:7939-7963              Complex label VIEW deparsing
>  T_BoolExpr branch
> >
>
> Done
>
> Can you please check whether the uncovered lines are now covered?
>
>
I'll run the coverage analysis again with the updated patches and report
back.


> > - Tests executed: run_pgq_tests.sql
>
> What's this test? It's not in the patchset. Also you might want to try
> 002_pg_upgrade.
>
>
Apologies for the confusion - run_pgq_tests.sql was my local test runner
script that executes only the PGQ-related tests.

>
> >   +----------------------------------------------------------+
> >   | SQL/PGQ Implementation: NO MEMORY LEAKS DETECTED         |
> >   +----------------------------------------------------------+
>
> Great. Thanks for confirming.
>
> >>
> >> On Fri, Dec 26, 2025 at 6:03 PM Henson Choi <[email protected]> wrote:
> >> >
> >> > 1. LABELS() function
> >> > - Returns text[] of element labels
> >> > - Fixed privilege checking from previous version
> >> > - Enables optimizer pushdown for branch pruning
> >> >
> >> > 2. PROPERTY_NAMES() function
> >> > - Returns text[] of property names
> >> > - Similar approach to LABELS()
> >> >
> >>
> >> I could not find specification of these functions in SQL/PGQ standard.
> >> It's a large document and I might be missing something. Can you please
> >> point me to the relevant sections?
> >>
> >
> > You're correct - LABELS() and PROPERTY_NAMES() are not in the SQL/PGQ
> > standard. I was inspired by similar functions in Cypher (labels(),
> keys())
> > and Oracle's PGQL (which has LABELS(), though Oracle is moving toward
> > SQL/PGQ compliance as well).
> >
> > The attached code demonstrates that through query rewrite, these
> functions
> > can enable planner-level table pruning optimizations. This becomes
> > particularly valuable for client applications - GUI tools could use label
> > names as host variables for flexible label-based filtering, and
> > PROPERTY_NAMES() (similar to Cypher's keys()) would enable dynamic
> property
> > inspection for display or selective querying of elements with specific
> > properties.
> >
> > While there's a distinction between structured (SQL/PGQ) and
> semi-structured
> > (Cypher) query models, I don't think we need to strictly exclude
> > semi-structured patterns that work well within the structured framework.
> > Whether this should be proposed as a future SQL/PGQ standard extension or
> > remain a PostgreSQL-specific extension is worth discussion. Given the
> > practical utility demonstrated in graph query deployments, there might be
> > value in standardizing such introspection functions.
>
> While these functions are useful with semistructured frameworks like
> Cypher, I am not sure how useful they are in structured framework like
> SQL/PGQ. In SQL/PGQ, the graph schema is predefined and known to the
> users. Hence users know what labels and properties exist in the graph.
> However, if they become part of the standard, their chances of getting
> accepted in the PostgreSQL core will increase.
>
> Even if we keep them out of core, I think we should be able to
> implement them as stable functions in an extension and still benefit
> from planner optimizations you mentioned.
>

I understand your point about the structured vs semi-structured difference.
I raised this as something worth considering - if someone can participate
in the next
standard revision, I believe these functions are worth discussing there.


>
> >
> > Similarly, I'd suggest considering Cypher's SHORTEST PATH for future
> > SQL/PGQ standards. If we treat SHORTEST PATH as a specialized join type
> > in SQL and handle it through query rewrite, PostgreSQL's existing planner
> > infrastructure could support it efficiently. This approach has been
> proven
> > in practice and could be a valuable addition to the next standard
> revision.
>
> IIRC there is SQL/PGQ standard specification for shortest path. We
> should implement that when we get to it.
>
>
I wasn't aware that shortest path is already in the SQL/PGQ standard - I
don't
have access to the standard document. Could you share the relevant excerpt
from the
specification?

Regarding implementation: this would require adding two non-standard
Executor nodes as
a SQL JOIN type before it can be rewritten. This is a significant
undertaking - likely
several times larger than the current PGQ code modifications. Fortunately,
I have prior
experience implementing shortest path in PostgreSQL, so I understand the
complexity
involved. Since the SQL layer work needs to be done first, I think we
should start
discussing this two-track approach now, even though the actual
implementation will take
considerable time.


> >
> > That was my finding as well - the architecture is well-designed for
> > extensibility.
>
> Great. Thanks for the confirmation.
>
> >
> > Having more people who can maintain and extend SQL/PGQ reduces the
> > risk for both the community and PostgreSQL itself. When I have a revised
> > version ready, would you be willing to review it again?
> >
>
> We will need to prioritize the features to be implemented next. This
> looks like a useful pattern to support. I believe this will appear
> higher in the priority list, thus worth reviewing.
>
>
Thank you. Please feel free to reach out when you're ready to discuss
prioritization or need my input.


> The patches are thus
> 0001 - same as previous except a. addition of new lines in
> create_property_graph.sql for consistency with other sections in that
> file, b. some minor corrections suggested by Hensen.
> 0002 - fixes : references and improves documentation. I think we can
> merge this into 0001 after a quick review from Peter
> 0003 - ECPG test. This might need a bit of review. The patch is large
> because of the .c and .err files.
> 0004 - This reverts back ECPG and PSQL lexer changes introduced in
> 0001. Does not show any failure even in the test case added by 0003.
> Those changes seem unnecessary. Peter, can you please confirm.
> 0005 - Additional test cases for code coverage as pointed out by
> Hensen's report.
>
> --
> Best Wishes,
> Ashutosh Bapat
>

When I raise issues about differences from Cypher, even when I sense my
suggestions
might be overreaching, I do so for two reasons: (a) I don't have access to
the
SQL/PGQ standard document, and (b) my implementation-level experience is
primarily
with Cypher. However, if I feel this dissonance, users coming from Cypher
will
likely experience the same confusion. I hope you'll consider this user
perspective
valuable, even when the technical decisions differ.

Best regards,
Henson

Re: SQL Property Graph Queries (SQL/PGQ)

Reply via email to