Dear Tom,

> No, just redefine "yyerror" as a macro that passes additional parameters.

Ok!  That's just the kind of the hack-hook I was looking for!!


> > The terminals are not available, they must be kept separatly if you
> > want them. This can be done in yylex().
>
> We don't need them. Any already-shifted tokens are to the left of where
> the error is, no?

Yes and no. I agree that you don't need them for computing the follow set.
However it would help a lot to generate nicer hints. I'm looking for help
the user, and the follow set is just part of the help.

Think of typical "SELECT foo, bla, FROM xxx" (it looks stupid on a one
line query, but not so on a very large query) which is quite easier to
detect and hint about because of FROM is just after a comma.

The rule part is also a real issue. There is no easy way to translate
states into rules, to know whether we're somewhere in a "ShowStmt" or
"AlterTableStmt"...

If you're after a comma in state 1232, should you just say IDENT... I'd
rather say "user name" or "table name" if I can... Otherwise the hint
stays at a low parser level. Good hints requires to know about the
current context, and it is not easy to get that deep in the automaton.


> Yeah, I had come to the same conclusion --- state moves made without
> consuming input would need to be backed out if we wanted to report the
> complete follow set.  I haven't yet looked to see how to do that.

Well, following you're previous suggestion, one can redefine the YYLEX
macro to save the current state each time a token is required.


> > As you noted, for things like "SHOW 123", the follow set basically
> > includes all keywords although you can have SHOW ALL or SHOW name.
> > So, as you suggested, you can either say "ident" as a simplification, but
>
> You're ignoring the distinction between classes of keywords.  I would
> not recommend treating reserved_keywords as a subclass of ident.

Ok, I agree that it can help in this very case a little bit here because
ALL is reserved. I did not make this distinction when I played with bison.


> > (5) anything that can be done would be hardwired to one version of bison.
>
> I think this argument is completely without merit.

Possible;-)

However I don't like writing hacks that depends on other people future
behavior.


> > (b) write a new "recursive descendant" parser, and drop "gram.y"
>
> Been there, done that, not impressed with the idea.  There's a reason
> why people invented parser generators...

Sure. It was just a list of options;-)


> > As a side effect of my inspection is that the automaton generated by bison
> > could be simplified if a different tradeoff between the lexer, the parser
> > and the post-processing would be chosen. Namelly, all tokens that are
> > just identifiers should be dropped and processed differently.
>
> We are not going to reserve the keywords that are presently unreserved.

Reserving keywords would help, but I would not think it could be accepted,
because it is not SQL philosophy. SQL is about 30 years also, as old
as yacc ideas... but they did not care at the time;-) When you look at
the syntax, it was designed with a recursive parser in mind.

Part of my comment was to explain "why" the generated automaton is large.
SQL is a small language, but it has a big automaton in postgresql, and
this is IMHO the explanation.


> If you can think of a reasonable way to stop treating them as separate
> tokens inside the grammar without altering the user-visible behavior,
> I'm certainly interested.

I was thinking about type names especially, and maybe others.

I join a small proof-of-concept patch to drop some tokens out of the
parser. As a results, 6 tokens, 6 rules and 9 states are removed,
and the automaton table is reduced by 438 entries (1.5%). Not too bad;-)
I think others could be dealt with similarily, or with more work.


Thanks for the discussion,

-- 
Fabien.
*** ./src/backend/parser/gram.y.orig    Mon Apr  5 12:06:42 2004
--- ./src/backend/parser/gram.y Mon Apr  5 18:21:21 2004
***************
*** 336,343 ****
        AGGREGATE ALL ALSO ALTER ANALYSE ANALYZE AND ANY ARRAY AS ASC
        ASSERTION ASSIGNMENT AT AUTHORIZATION
  
!       BACKWARD BEFORE BEGIN_P BETWEEN BIGINT BINARY BIT
!       BOOLEAN_P BOTH BY
  
        CACHE CALLED CASCADE CASE CAST CHAIN CHAR_P
        CHARACTER CHARACTERISTICS CHECK CHECKPOINT CLASS CLOSE
--- 336,343 ----
        AGGREGATE ALL ALSO ALTER ANALYSE ANALYZE AND ANY ARRAY AS ASC
        ASSERTION ASSIGNMENT AT AUTHORIZATION
  
!       BACKWARD BEFORE BEGIN_P BETWEEN BINARY BIT
!       BOTH BY
  
        CACHE CALLED CASCADE CASE CAST CHAIN CHAR_P
        CHARACTER CHARACTERISTICS CHECK CHECKPOINT CLASS CLOSE
***************
*** 362,368 ****
  
        ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IN_P INCLUDING INCREMENT
        INDEX INHERITS INITIALLY INNER_P INOUT INPUT_P
!       INSENSITIVE INSERT INSTEAD INT_P INTEGER INTERSECT
        INTERVAL INTO INVOKER IS ISNULL ISOLATION
  
        JOIN
--- 362,368 ----
  
        ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IN_P INCLUDING INCREMENT
        INDEX INHERITS INITIALLY INNER_P INOUT INPUT_P
!       INSENSITIVE INSERT INSTEAD INTERSECT
        INTERVAL INTO INVOKER IS ISNULL ISOLATION
  
        JOIN
***************
*** 386,398 ****
        PRECISION PRESERVE PREPARE PRIMARY 
        PRIOR PRIVILEGES PROCEDURAL PROCEDURE
  
!       READ REAL RECHECK REFERENCES REINDEX RELATIVE_P RENAME REPEATABLE REPLACE
        RESET RESTART RESTRICT RETURNS REVOKE RIGHT ROLLBACK ROW ROWS
        RULE
  
        SCHEMA SCROLL SECOND_P SECURITY SELECT SEQUENCE
        SERIALIZABLE SESSION SESSION_USER SET SETOF SHARE
!       SHOW SIMILAR SIMPLE SMALLINT SOME STABLE START STATEMENT
        STATISTICS STDIN STDOUT STORAGE STRICT_P SUBSTRING SYSID
  
        TABLE TEMP TEMPLATE TEMPORARY THEN TIME TIMESTAMP
--- 386,398 ----
        PRECISION PRESERVE PREPARE PRIMARY 
        PRIOR PRIVILEGES PROCEDURAL PROCEDURE
  
!       READ RECHECK REFERENCES REINDEX RELATIVE_P RENAME REPEATABLE REPLACE
        RESET RESTART RESTRICT RETURNS REVOKE RIGHT ROLLBACK ROW ROWS
        RULE
  
        SCHEMA SCROLL SECOND_P SECURITY SELECT SEQUENCE
        SERIALIZABLE SESSION SESSION_USER SET SETOF SHARE
!       SHOW SIMILAR SIMPLE SOME STABLE START STATEMENT
        STATISTICS STDIN STDOUT STORAGE STRICT_P SUBSTRING SYSID
  
        TABLE TEMP TEMPLATE TEMPORARY THEN TIME TIMESTAMP
***************
*** 5216,5222 ****
  GenericType:
                        type_name
                                {
!                                       $$ = makeTypeName($1);
                                }
                ;
  
--- 5216,5234 ----
  GenericType:
                        type_name
                                {
!                                       if (strcasecmp($1,"boolean")==0)
!                                               $$ = SystemTypeName("bool");
!                                       else if (strcasecmp($1, "bigint")==0)
!                                               $$ = SystemTypeName("int8");
!                                       else if (strcasecmp($1, "integer")==0 ||
!                                                        strcasecmp($1, "int")==0)
!                                               $$ = SystemTypeName("int4");
!                                       else if (strcasecmp($1, "smallint")==0)
!                                               $$ = SystemTypeName("int2");
!                                       else if (strcasecmp($1, "real")==0)
!                                               $$ = SystemTypeName("float4");
!                                       else
!                                               $$ = makeTypeName($1);
                                }
                ;
  
***************
*** 5225,5251 ****
   * - thomas 1997-09-18
   * Provide real DECIMAL() and NUMERIC() implementations now - Jan 1998-12-30
   */
! Numeric:      INT_P
!                               {
!                                       $$ = SystemTypeName("int4");
!                               }
!                       | INTEGER
!                               {
!                                       $$ = SystemTypeName("int4");
!                               }
!                       | SMALLINT
!                               {
!                                       $$ = SystemTypeName("int2");
!                               }
!                       | BIGINT
!                               {
!                                       $$ = SystemTypeName("int8");
!                               }
!                       | REAL
!                               {
!                                       $$ = SystemTypeName("float4");
!                               }
!                       | FLOAT_P opt_float
                                {
                                        $$ = $2;
                                }
--- 5237,5244 ----
   * - thomas 1997-09-18
   * Provide real DECIMAL() and NUMERIC() implementations now - Jan 1998-12-30
   */
! Numeric:
!                        FLOAT_P opt_float
                                {
                                        $$ = $2;
                                }
***************
*** 5268,5277 ****
                                        $$ = SystemTypeName("numeric");
                                        $$->typmod = $2;
                                }
-                       | BOOLEAN_P
-                               {
-                                       $$ = SystemTypeName("bool");
-                               }
                ;
  
  opt_float:    '(' Iconst ')'
--- 5261,5266 ----
***************
*** 7585,7593 ****
   * looks too much like a function call for an LR(1) parser.
   */
  col_name_keyword:
!                         BIGINT
!                       | BIT
!                       | BOOLEAN_P
                        | CHAR_P
                        | CHARACTER
                        | COALESCE
--- 7574,7580 ----
   * looks too much like a function call for an LR(1) parser.
   */
  col_name_keyword:
!                         BIT
                        | CHAR_P
                        | CHARACTER
                        | COALESCE
***************
*** 7598,7605 ****
                        | EXTRACT
                        | FLOAT_P
                        | INOUT
-                       | INT_P
-                       | INTEGER
                        | INTERVAL
                        | NATIONAL
                        | NCHAR
--- 7585,7590 ----
***************
*** 7610,7619 ****
                        | OVERLAY
                        | POSITION
                        | PRECISION
-                       | REAL
                        | ROW
                        | SETOF
-                       | SMALLINT
                        | SUBSTRING
                        | TIME
                        | TIMESTAMP
--- 7595,7602 ----
*** ./src/backend/parser/keywords.c.orig        Thu Mar 11 15:32:54 2004
--- ./src/backend/parser/keywords.c     Mon Apr  5 18:20:32 2004
***************
*** 55,64 ****
        {"before", BEFORE},
        {"begin", BEGIN_P},
        {"between", BETWEEN},
-       {"bigint", BIGINT},
        {"binary", BINARY},
        {"bit", BIT},
-       {"boolean", BOOLEAN_P},
        {"both", BOTH},
        {"by", BY},
        {"cache", CACHE},
--- 55,62 ----
***************
*** 165,172 ****
        {"insensitive", INSENSITIVE},
        {"insert", INSERT},
        {"instead", INSTEAD},
-       {"int", INT_P},
-       {"integer", INTEGER},
        {"intersect", INTERSECT},
        {"interval", INTERVAL},
        {"into", INTO},
--- 163,168 ----
***************
*** 249,255 ****
        {"procedural", PROCEDURAL},
        {"procedure", PROCEDURE},
        {"read", READ},
-       {"real", REAL},
        {"recheck", RECHECK},
        {"references", REFERENCES},
        {"reindex", REINDEX},
--- 245,250 ----
***************
*** 282,288 ****
        {"show", SHOW},
        {"similar", SIMILAR},
        {"simple", SIMPLE},
-       {"smallint", SMALLINT},
        {"some", SOME},
        {"stable", STABLE},
        {"start", START},
--- 277,282 ----
---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
      joining column's datatypes do not match

Reply via email to