Hi Alexander,

I've discovered an assertion failure produced with the following script:
> CREATE TABLE v1 (id int primary key, t text);
> CREATE TABLE v2 (id int primary key, t text);
> CREATE TABLE e1_2 (id_1 int, id_2 int, t text);
> CREATE PROPERTY GRAPH g1
>      VERTEX TABLES (v1, v2)
>      EDGE TABLES (
>          e1_2 key (id_1, id_2)
>              SOURCE KEY (id_1) REFERENCES v1 (id)
>              DESTINATION KEY (id_2) REFERENCES v2 (id)
>      );
>
> SELECT count(*) FROM GRAPH_TABLE (g1 MATCH ()-> COLUMNS (1 AS one));


I looked into this. The pattern `()->` causes an assertion failure
at rewriteGraphTable.c:783 because the edge's dest_pf is NULL -- there
is no destination vertex following the edge pattern.

Per SQL/PGQ standard (ISO/IEC 9075-16:2023), Section 10.6 Syntax
Rules 13c and 13d, implicit empty vertex patterns should be inserted
when an edge pattern is not preceded or followed by a vertex pattern
at the same depth of graph pattern matching. Specifically:

  SR 13c: If an edge pattern EP contained in a <path term> PST at
          the same depth of graph pattern matching is not preceded
          by a <vertex pattern> contained in PST at the same depth
          of graph pattern matching, then an implicit empty
          <vertex pattern> VP is inserted in PST immediately
          prior to EP.

  SR 13d: If an edge pattern EP contained in a <path term> PST at
          the same depth of graph pattern matching is not followed
          by a <vertex pattern> contained in PST at the same depth
          of graph pattern matching, then an implicit empty
          <vertex pattern> VP is inserted in PST immediately
          after EP.

So `()->` should be treated as `()->()`, and `->` as `()->()`.
This also applies to left-pointing edges (`<-`, `()<-`) and
any-direction edges (`-[]-`).

The crash affects all patterns where a vertex is missing adjacent
to an edge:

  -- trailing edge (SR 13d)
  MATCH ()->          MATCH ()<-          MATCH ()-[]-

  -- leading edge (SR 13c)
  MATCH ->()          MATCH <-()

  -- edge only (SR 13c + 13d)
  MATCH ->            MATCH <-            MATCH -[]-

  -- consecutive edges (SR 13b, with separator to avoid SR 3)
  MATCH ()-> ->()

In a non-assert build, this results in a segfault (NULL pointer
dereference at pf->dest_pf->factorpos).

I have a fix that inserts implicit empty vertex patterns in the
rewrite stage (not the parser), so that ruleutils can still
reconstruct the original user-written pattern from the parse tree.
The fix handles all three cases: SR 13b (consecutive edges),
SR 13c (leading edge), and SR 13d (trailing edge).

The change is in generate_queries_for_path_pattern(), before the
path factor linking loop:

  /*
   * Per SQL standard 10.6 SR 13b/c/d, insert implicit empty vertex
   * patterns where needed:
   *   SR 13b: between two consecutive edge patterns
   *   SR 13c: before a leading edge pattern
   *   SR 13d: after a trailing edge pattern
   *
   * We do this in the rewrite stage (not the parser) so that ruleutils
   * can reconstruct the original user-written pattern from the parse tree.
   */
  {
      List       *expanded = NIL;
      GraphElementPattern *prev_gep = NULL;

      foreach_node(GraphElementPattern, gep, path_pattern)
      {
          /* SR 13c: leading edge needs a vertex before it */
          /* SR 13b: consecutive edges need a vertex between them */
          if (IS_EDGE_PATTERN(gep->kind) &&
              (prev_gep == NULL || IS_EDGE_PATTERN(prev_gep->kind)))
          {
              GraphElementPattern *implicit_vp =
makeNode(GraphElementPattern);

              implicit_vp->kind = VERTEX_PATTERN;
              expanded = lappend(expanded, implicit_vp);
          }

          expanded = lappend(expanded, gep);
          prev_gep = gep;
      }

      /* SR 13d: trailing edge needs a vertex after it */
      if (prev_gep && IS_EDGE_PATTERN(prev_gep->kind))
      {
          GraphElementPattern *implicit_vp = makeNode(GraphElementPattern);

          implicit_vp->kind = VERTEX_PATTERN;
          expanded = lappend(expanded, implicit_vp);
      }

      path_pattern = expanded;
  }

Separately, while testing various edge patterns I noticed that
`<-[]->` (full edge left or right) produces a syntax error:

  SELECT count(*) FROM GRAPH_TABLE (g1 MATCH ()<-[]->() COLUMNS (1 AS one));
  ERROR:  syntax error at or near "->"

Per the standard BNF, this should be valid:

  <full edge left or right> ::=
      <left arrow bracket> <element pattern filler> <bracket right arrow>

i.e., `<-[]->`  =  `<-[` + `]->`. This is a separate parser issue,
not related to the implicit vertex insertion above.

All existing regression tests (graph_table, graph_table_rls, and
the full pg-check suite) pass with this change.

Ashutosh, since you designed the path factor linking logic in
generate_queries_for_path_pattern(), you would know best whether
this insertion point has any side effects on cyclic patterns or
the edge-vertex qual construction downstream. Could you take
this snippet and fold it into a proper patch if it looks right?

Regards,
Henson

Reply via email to