Re: [PATCHES] hint infrastructure setup (v3)

2004-04-06 Thread Tom Lane
Fabien COELHO [EMAIL PROTECTED] writes:
 Another is that depending on where you put the renaming that this patch
 removes without replacing :-(,

 I do not understand your point. It seems to me that the renaming is
 performed when a type name is expected? The boolean  keyword (not token)
 is translated to system bool type in the GenericType rule?? ???

I mean that you removed functionality without putting it back; the
modified parser will fail to recognize BOOLEAN as a type name at all,
because it doesn't match bool which is in the catalogs.  (And changing
the entry to boolean is not a solution, it just moves the problem.)

I assume you intended to handle this by doing the substitutions in type
name lookup elsewhere in the parser, but I don't think that is a valid
solution, because there is no longer enough information.  In particular
you can't any longer tell the difference between BOOLEAN and boolean
(with quotes), which are not the same thing --- a quoted string is never
a keyword, per spec.

Possibly a better example than boolean is the REAL = pg_catalog.float4
transformation.  If a user has defined his own type named foo.real,
he ought to be able to refer to it as real (with quotes) and not get
messed up by the keyword transformation.  I think our original
motivation for converting all these things to keywords was the
realization that pg_dump would in fact screw up and fail to dump such
a type definition correctly if real wasn't recognized as conflicting
with a keyword (which is what prompts pg_dump to stick quotes on).

The basic point here is that eliminating tokens as you propose will
result in small changes in behavior, none of which are good or per spec.
Making the parser automaton smaller would be nice, but not at that
price.

 My point is that you can have the very same *semantical* result with a
 smaller automaton if you chose a different trade-off within the
 lexer/parser/post filtering. I don't want to change the language.

You have not proven that you can have the same result.

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] hint infrastructure setup (v3)

2004-04-06 Thread Fabien COELHO

Dear Tom,

  I join a small proof-of-concept patch to drop some tokens out of the
  parser.

 I believe these were treated this way *specifically* because of the
 keyword-is-not-an-identifier issue.  SQL99 calls out most of these
 as being keywords:

Well, I think that the reserved keywords are fine as tokens in a
lexer/parser, but think that the unreserved keywords should be dropped
of the token status if possible.

 and if we don't treat them as keywords then we will have a couple of
 problems. One is case-conversion issues in locales where the standard
 downcasing is not an extension of ASCII (Turkish is the only one I know
 of offhand).

Do you mean it should use an ASCII-only strcasecmp, not a possibly
localised version? I agree, but this is just a proof of concept
patch to show that you don't need so many tokens in the parser.

 Another is that depending on where you put the renaming that this patch
 removes without replacing :-(,

I do not understand your point. It seems to me that the renaming is
performed when a type name is expected? The boolean  keyword (not token)
is translated to system bool type in the GenericType rule?? ???

 it would be possible for the renaming transformation to get applied to
 user-defined types with similar names, or for user-defined types to
 unexpectedly shadow system definitions.

I don't think that the patch changes the result of the parsing. It drops
*TOKENS* out of the lexer, but they are still *KEYWORDS*, although they
are not explicitly in the lexer list.

keyword.c deals with tokens, the file name was ill-chosen. If you think
that keywords can only be lexical tokens, then you end-up with an
automaton larger than necessary, IMVHO.

Note that the removed tokens are still keywords as they are treated
*especially* anyway. It is not a semantical transformation.

Also, if you don't want these names as candidate function names, they
could be filtered out at some other point in the parser. They really don't
need to be special tokens.

My point is that you can have the very same *semantical* result with a
smaller automaton if you chose a different trade-off within the
lexer/parser/post filtering. I don't want to change the language.

 The former would be surprising and the latter would violate the spec.

I'm really not sure this is the case with the patch I sent.

 Check the archives; I'm sure this was discussed in the 7.3 development
 cycle and we concluded that treating these names as keywords was the
 only practical solution.

Hmmm... I can check the archive, but I cannot see how different the
language is with the patch. Maybe there is a missing filter out, or
strcasecmp is not the right version, but no more.

I think it is a small technical issue in the parser internals, and has
nothing to do with great principles and whether this or that is a keyword.
It's about what keywords need to be tokens.

-- 
Fabien Coelho - [EMAIL PROTECTED]

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PATCHES] hint infrastructure setup (v3)

2004-04-06 Thread Fabien COELHO

 (b) write a new recursive descendant parser, and drop gram.y

 er, that's recursive descent :-)

Sorry for my French version.

 Well, unless you are a serious masochist,

I'm not a masochist;-) I'm arguing about where hint should/could be put.

 In fact, considering this thread I'm not sure any of the suggested steps
 will achieve Fabien's goal. ISTM that a smarter training wheels
 frontend, perhaps with some modest backend improvements, is likely to
 have better results. (Oh, you found an error near there - now what do I
 suggest belongs there?)

I cannot see what you're suggesting practically as a frontend.

I don't think having another parser next to the first one for better error
messages is a serious option? I would not like to put another parser that
need to be kept synchronized with the first one. So either it is
integrated or linked with the current parser, or it is not there?

Out of the parser, the only information is the offending token (embedded
in the error string) and the character number in the string, that is quite
small a clue to give a hint.

-- 
Fabien Coelho - [EMAIL PROTECTED]

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PATCHES] hint infrastructure setup (v3)

2004-04-06 Thread Fabien COELHO

Dear Tom,

 In particular you can't any longer tell the difference between BOOLEAN
 and boolean (with quotes), which are not the same thing --- a quoted
 string is never a keyword, per spec. [...]

Ok, so you mean that on -boolean- the lexer returns a BOOLEAN_P token, but
with -boolean- it returns an Ident and -boolean- as a lval. Indeed, in
such a case I cannot recognize that simply boolean vs boolean if they
are both idents that look the same.

As a matter of fact, this can also be fixed with some post-filtering. Say,
all quoted idents could be returned with a leading  to show it was
dquoted, and the IDENT rules in the parser could remove when it is not
needed anymore to distinguish the case.

Not beautiful, I agree, but my point is that the current number of tokens
and number of states and automaton size are not inherent to SQL but to the
way the lexing/parsing is performed in postgresql.

 The basic point here is that eliminating tokens as you propose will
 result in small changes in behavior, none of which are good or per spec.
 Making the parser automaton smaller would be nice, but not at that
 price.

Ok. I don't want to change the spec. I still stand that it can be done,
although some more twicking is required. It was just a proof of concept,
not a patch submission. Well, a proof of concept must still be a proof;-)

I attach a small patch that solve the boolean vs boolean issue, still as
a proof of concept that it is 'doable' to preserve semantics with a
different lexer/parser balance. I don't claim that it should be applied, I
just claim that the automaton size could be smaller, especially by
shortening the unreserved_keyword list.

 You have not proven that you can have the same result.

Well, I passed the regression tests, but that does not indeed prove
anything, because these issues are not tested at all.

Maybe you could consider to add the regression part of the attached
patcht, which creates a user boolean type.

Anyway, my motivation is about hints and advises, and that does not
help a lot to solve these issues.

-- 
Fabien.*** ./src/backend/parser/gram.y.origTue Apr  6 18:15:39 2004
--- ./src/backend/parser/gram.y Tue Apr  6 17:56:46 2004
***
*** 95,100 
--- 95,102 
  static Node *doNegate(Node *n);
  static void doNegateFloat(Value *v);
  
+ #define clean_dqname(n) (((*(n))!='')? (n): pstrdup((n)+1))
+ 
  %}
  
  
***
*** 336,343 
AGGREGATE ALL ALSO ALTER ANALYSE ANALYZE AND ANY ARRAY AS ASC
ASSERTION ASSIGNMENT AT AUTHORIZATION
  
!   BACKWARD BEFORE BEGIN_P BETWEEN BIGINT BINARY BIT
!   BOOLEAN_P BOTH BY
  
CACHE CALLED CASCADE CASE CAST CHAIN CHAR_P
CHARACTER CHARACTERISTICS CHECK CHECKPOINT CLASS CLOSE
--- 338,345 
AGGREGATE ALL ALSO ALTER ANALYSE ANALYZE AND ANY ARRAY AS ASC
ASSERTION ASSIGNMENT AT AUTHORIZATION
  
!   BACKWARD BEFORE BEGIN_P BETWEEN BINARY BIT
!   BOTH BY
  
CACHE CALLED CASCADE CASE CAST CHAIN CHAR_P
CHARACTER CHARACTERISTICS CHECK CHECKPOINT CLASS CLOSE
***
*** 362,368 
  
ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IN_P INCLUDING INCREMENT
INDEX INHERITS INITIALLY INNER_P INOUT INPUT_P
!   INSENSITIVE INSERT INSTEAD INT_P INTEGER INTERSECT
INTERVAL INTO INVOKER IS ISNULL ISOLATION
  
JOIN
--- 364,370 
  
ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IN_P INCLUDING INCREMENT
INDEX INHERITS INITIALLY INNER_P INOUT INPUT_P
!   INSENSITIVE INSERT INSTEAD INTERSECT
INTERVAL INTO INVOKER IS ISNULL ISOLATION
  
JOIN
***
*** 386,398 
PRECISION PRESERVE PREPARE PRIMARY 
PRIOR PRIVILEGES PROCEDURAL PROCEDURE
  
!   READ REAL RECHECK REFERENCES REINDEX RELATIVE_P RENAME REPEATABLE REPLACE
RESET RESTART RESTRICT RETURNS REVOKE RIGHT ROLLBACK ROW ROWS
RULE
  
SCHEMA SCROLL SECOND_P SECURITY SELECT SEQUENCE
SERIALIZABLE SESSION SESSION_USER SET SETOF SHARE
!   SHOW SIMILAR SIMPLE SMALLINT SOME STABLE START STATEMENT
STATISTICS STDIN STDOUT STORAGE STRICT_P SUBSTRING SYSID
  
TABLE TEMP TEMPLATE TEMPORARY THEN TIME TIMESTAMP
--- 388,400 
PRECISION PRESERVE PREPARE PRIMARY 
PRIOR PRIVILEGES PROCEDURAL PROCEDURE
  
!   READ RECHECK REFERENCES REINDEX RELATIVE_P RENAME REPEATABLE REPLACE
RESET RESTART RESTRICT RETURNS REVOKE RIGHT ROLLBACK ROW ROWS
RULE
  
SCHEMA SCROLL SECOND_P SECURITY SELECT SEQUENCE
SERIALIZABLE SESSION SESSION_USER SET SETOF SHARE
!   SHOW SIMILAR SIMPLE SOME STABLE START STATEMENT
STATISTICS STDIN STDOUT STORAGE STRICT_P SUBSTRING SYSID
  
TABLE TEMP TEMPLATE TEMPORARY THEN TIME TIMESTAMP
***
*** 959,965 
}
| IDENT
{
!   $$ = makeStringConst($1, NULL);
  

Re: [PATCHES] hint infrastructure setup (v3)

2004-04-05 Thread Fabien COELHO

Dear Tom,

First, thanks for the discussion about my hint infrastructure patch
submission. Whatever the opinion, it helps.


 We'd have to write our own version of bison's verbose-error code anyway,
 because the canned code doesn't support localization --- it uses
 hardwired strings for expecting and so on.  But it looks possibly
 doable to me.

I also played a little bit with the parser over the weekend. I found the
same issues that you noted in your mail, tried more options, and found
other problems. I sent here the full mail I wrote.

Although I agree that it is doable, I have stronger reserve than yours.
Also, I do not find it an appealing solution to change gram.c a lot.

The problems I found are:


(1) not available raw information.

The automaton stack keeps states, which are not directly linked to rules
and terminals. The terminals are not available, they must be kept
separatly if you want them. This can be done in yylex().

The internal state, stack, token... are local variables within yyparse().
As a result, they are not accessible from yyerror. I haven't found any
available hook, so you have to hack gram.c to get this information. It
is a 1 line hack, but it modifies the generated code. Or you have to
produce a new hacked gram.c, which is what you suggest.


(2) hard to compute follow set

The information about the syntax is there, obviously! However, from a
given automaton state, the actually allowed tokens are not directly
available. Only some are directly available, so the result is:

DROP ? ;
ERROR:  syntax error at or near ? at character 6
HINT:  hint status is state=422 n=796 char=575 type=320=Op stack= 0 16 422
   tokens= 88:DROP 320:Op
   expected= 150:LANGUAGE?

Other tokens may be found, but you have either:

- to *simulate* the automaton forward through reductions to check whether
  it is finally accepted. Not really beautiful, but that can be done.
  It is just the bison code without the reduction production rules;-)

- move backwards before doing the above, if some reductions where
  performed because of the submitted token and finally resulted in the error,
  the state that lead to the error may not be the highest available one, so
  maybe other allowed tokens may also be missed. We would need to have
  the last state before any reduction.
  Another small hack in generated gram.c would be needed to get it.

However,


(3) the computed follow set is often useless, as you noted.

The reason for that is that keywords are accepted in place of identifiers.
The use of rules: reserved_keyword, func_name_keyword, col_name_keyword,
unreserved_keyword as possible column/function/... identifiers is IMHO the
main reason for the size of the generated automaton.

If you drop that, everything would be much simpler, because it would
reduce a lot the number of arcs in the automaton. In particular, anything
which is just an identifier should not be given a name (say, BOOLEAN_P or
SMALLINT should not be tokens, but could rather be recognized as a special
case of identifier from within the parser, maybe with some simple
post-filtering).

As you noted, for things like SHOW 123, the follow set basically
includes all keywords although you can have SHOW ALL or SHOW name.

So, as you suggested, you can either say ident as a simplification, but
you miss ALL which is meaningful, or you say all keywords, which is
useless.

An alternative is to know that you're within a SHOW something, that is
you somehow reparse the code from within the syntax error:-( It is
necessary if you want to say something more interesting than ident, say
option name. You may also get this information by simulating the
automaton forward, or noticing that some of the current on stack states
can only lead to the reduction of a ShowStmt...

Well, on the positive side, it is sometimes right, say:

REVOKE ? ;
ERROR:  syntax error, unexpected Op at or near ? at character 8
HINT:  hint status is state=481 n=1297 char=575 type=320=Op
  stack= 0 31 481
  tokens= 234:REVOKE 320:Op
  expected= 10:ALL 59:CREATE 80:DELETE_P 98:EXECUTE 136:INSERT 224:REFERENCES
239:RULE 244:SELECT 268:TEMP 270:TEMPORARY 279:TRIGGER 292:UPDATE
293:USAGE?


(4) the available automaton data are inverted with respect to what would
be needed... It would be nice to know whether we're heading for a create
statement or something like that, but the information is hidden down the
automaton. basically, the automaton would need to be inverted to compute
this information. We would need the initial information in gram.y, which
was inverted to build the automaton. So the code somehow needs to undo
what bison compilation has done.


(5) anything that can be done would be hardwired to one version of bison.
There is a lot of asumptions in the code and data structures, and any
newer/older version with some different internal representation would
basically break any code that would rely on that. So postgres would not be
bison portable:-( I don't think 

Re: [PATCHES] hint infrastructure setup (v3)

2004-04-05 Thread Fabien COELHO

 As another teaching aid idea, is there something that would auto-format
 queries to they are more readable?

Not that I know of. But I guess it must exist in some interface,
maybe with Oracle or DB2.

My emacs does seem to do that. It just put colors on keywords.

If I had to put it somewhere, I would try to enhance emacs sql-mode.
But I don't like much lisp programming.

-- 
Fabien Coelho - [EMAIL PROTECTED]

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PATCHES] hint infrastructure setup (v3)

2004-04-05 Thread Bruce Momjian

As another teaching aid idea, is there something that would auto-format
queries to they are more readable?

---

Fabien COELHO wrote:
 
 
 
 Dear Tom,
 
  No, just redefine yyerror as a macro that passes additional parameters.
 
 Ok!  That's just the kind of the hack-hook I was looking for!!
 
 
   The terminals are not available, they must be kept separatly if you
   want them. This can be done in yylex().
 
  We don't need them. Any already-shifted tokens are to the left of where
  the error is, no?
 
 Yes and no. I agree that you don't need them for computing the follow set.
 However it would help a lot to generate nicer hints. I'm looking for help
 the user, and the follow set is just part of the help.
 
 Think of typical SELECT foo, bla, FROM xxx (it looks stupid on a one
 line query, but not so on a very large query) which is quite easier to
 detect and hint about because of FROM is just after a comma.
 
 The rule part is also a real issue. There is no easy way to translate
 states into rules, to know whether we're somewhere in a ShowStmt or
 AlterTableStmt...
 
 If you're after a comma in state 1232, should you just say IDENT... I'd
 rather say user name or table name if I can... Otherwise the hint
 stays at a low parser level. Good hints requires to know about the
 current context, and it is not easy to get that deep in the automaton.
 
 
  Yeah, I had come to the same conclusion --- state moves made without
  consuming input would need to be backed out if we wanted to report the
  complete follow set.  I haven't yet looked to see how to do that.
 
 Well, following you're previous suggestion, one can redefine the YYLEX
 macro to save the current state each time a token is required.
 
 
   As you noted, for things like SHOW 123, the follow set basically
   includes all keywords although you can have SHOW ALL or SHOW name.
   So, as you suggested, you can either say ident as a simplification, but
 
  You're ignoring the distinction between classes of keywords.  I would
  not recommend treating reserved_keywords as a subclass of ident.
 
 Ok, I agree that it can help in this very case a little bit here because
 ALL is reserved. I did not make this distinction when I played with bison.
 
 
   (5) anything that can be done would be hardwired to one version of bison.
 
  I think this argument is completely without merit.
 
 Possible;-)
 
 However I don't like writing hacks that depends on other people future
 behavior.
 
 
   (b) write a new recursive descendant parser, and drop gram.y
 
  Been there, done that, not impressed with the idea.  There's a reason
  why people invented parser generators...
 
 Sure. It was just a list of options;-)
 
 
   As a side effect of my inspection is that the automaton generated by bison
   could be simplified if a different tradeoff between the lexer, the parser
   and the post-processing would be chosen. Namelly, all tokens that are
   just identifiers should be dropped and processed differently.
 
  We are not going to reserve the keywords that are presently unreserved.
 
 Reserving keywords would help, but I would not think it could be accepted,
 because it is not SQL philosophy. SQL is about 30 years also, as old
 as yacc ideas... but they did not care at the time;-) When you look at
 the syntax, it was designed with a recursive parser in mind.
 
 Part of my comment was to explain why the generated automaton is large.
 SQL is a small language, but it has a big automaton in postgresql, and
 this is IMHO the explanation.
 
 
  If you can think of a reasonable way to stop treating them as separate
  tokens inside the grammar without altering the user-visible behavior,
  I'm certainly interested.
 
 I was thinking about type names especially, and maybe others.
 
 I join a small proof-of-concept patch to drop some tokens out of the
 parser. As a results, 6 tokens, 6 rules and 9 states are removed,
 and the automaton table is reduced by 438 entries (1.5%). Not too bad;-)
 I think others could be dealt with similarily, or with more work.
 
 
 Thanks for the discussion,
 
 -- 
 Fabien.

Content-Description: 

[ Attachment, skipping... ]

 
 ---(end of broadcast)---
 TIP 9: the planner will ignore your desire to choose an index scan if your
   joining column's datatypes do not match

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PATCHES] hint infrastructure setup (v3)

2004-04-05 Thread Tom Lane
Fabien COELHO [EMAIL PROTECTED] writes:
 If you can think of a reasonable way to stop treating them as separate
 tokens inside the grammar without altering the user-visible behavior,
 I'm certainly interested.

 I join a small proof-of-concept patch to drop some tokens out of the
 parser.

I believe these were treated this way *specifically* because of the
keyword-is-not-an-identifier issue.  SQL99 calls out most of these
as being keywords:

 SQL defines predefined data types named by the following key
 words: CHARACTER, CHARACTER VARYING, CHARACTER LARGE OBJECT,
 BINARY LARGE OBJECT, BIT, BIT VARYING, NUMERIC, DECIMAL, INTEGER,
 SMALLINT, FLOAT, REAL, DOUBLE PRECISION, BOOLEAN, DATE, TIME,
 TIMESTAMP, and INTERVAL. These names are used in the type

and if we don't treat them as keywords then we will have a couple of
problems.  One is case-conversion issues in locales where the standard
downcasing is not an extension of ASCII (Turkish is the only one I know
of offhand).  Another is that depending on where you put the renaming
that this patch removes without replacing :-(, it would be possible for
the renaming transformation to get applied to user-defined types with
similar names, or for user-defined types to unexpectedly shadow system
definitions.  The former would be surprising and the latter would
violate the spec.  Check the archives; I'm sure this was discussed in
the 7.3 development cycle and we concluded that treating these names
as keywords was the only practical solution.

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] hint infrastructure setup (v3)

2004-04-05 Thread Andrew Dunstan
Tom Lane wrote:

Fabien COELHO [EMAIL PROTECTED] writes:
 

(b) write a new recursive descendant parser, and drop gram.y
   

er, that's recursive descent :-)

Been there, done that, not impressed with the idea.  There's a reason
why people invented parser generators...
 



Well, unless you are a serious masochist, cutting a parser of the LR 
family (including LALR(1), such as yacc/bison produce) by hand is very 
difficult and error prone. That is not the case with LL type parsers. 
Also, there are parser generators for this family, both table driven and 
recursive descent. The table driven variety especially can have quite 
well tuned error reporting and recovery. (Among the reasons I know this 
is that I actually wrote such a beast around 13 years ago).

That is not to say that I think we should move from the present setup. 
Familiarity of the developer community with the tool used suggests we 
should not, quite apart from any other reasons. Changing to, say, an RD 
parser, would be a massive and probably error prone change and the 
benefit just doesn't seem worth it.

In fact, considering this thread I'm not sure any of the suggested steps 
will achieve Fabien's goal. ISTM that a smarter training wheels 
frontend, perhaps with some modest backend improvements, is likely to 
have better results. (Oh, you found an error near there - now what do I 
suggest belongs there?)

cheers

andrew







---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PATCHES] hint infrastructure setup (v3)

2004-04-05 Thread Fabien COELHO

Dear Tom,

 No, just redefine yyerror as a macro that passes additional parameters.

Ok!  That's just the kind of the hack-hook I was looking for!!


  The terminals are not available, they must be kept separatly if you
  want them. This can be done in yylex().

 We don't need them. Any already-shifted tokens are to the left of where
 the error is, no?

Yes and no. I agree that you don't need them for computing the follow set.
However it would help a lot to generate nicer hints. I'm looking for help
the user, and the follow set is just part of the help.

Think of typical SELECT foo, bla, FROM xxx (it looks stupid on a one
line query, but not so on a very large query) which is quite easier to
detect and hint about because of FROM is just after a comma.

The rule part is also a real issue. There is no easy way to translate
states into rules, to know whether we're somewhere in a ShowStmt or
AlterTableStmt...

If you're after a comma in state 1232, should you just say IDENT... I'd
rather say user name or table name if I can... Otherwise the hint
stays at a low parser level. Good hints requires to know about the
current context, and it is not easy to get that deep in the automaton.


 Yeah, I had come to the same conclusion --- state moves made without
 consuming input would need to be backed out if we wanted to report the
 complete follow set.  I haven't yet looked to see how to do that.

Well, following you're previous suggestion, one can redefine the YYLEX
macro to save the current state each time a token is required.


  As you noted, for things like SHOW 123, the follow set basically
  includes all keywords although you can have SHOW ALL or SHOW name.
  So, as you suggested, you can either say ident as a simplification, but

 You're ignoring the distinction between classes of keywords.  I would
 not recommend treating reserved_keywords as a subclass of ident.

Ok, I agree that it can help in this very case a little bit here because
ALL is reserved. I did not make this distinction when I played with bison.


  (5) anything that can be done would be hardwired to one version of bison.

 I think this argument is completely without merit.

Possible;-)

However I don't like writing hacks that depends on other people future
behavior.


  (b) write a new recursive descendant parser, and drop gram.y

 Been there, done that, not impressed with the idea.  There's a reason
 why people invented parser generators...

Sure. It was just a list of options;-)


  As a side effect of my inspection is that the automaton generated by bison
  could be simplified if a different tradeoff between the lexer, the parser
  and the post-processing would be chosen. Namelly, all tokens that are
  just identifiers should be dropped and processed differently.

 We are not going to reserve the keywords that are presently unreserved.

Reserving keywords would help, but I would not think it could be accepted,
because it is not SQL philosophy. SQL is about 30 years also, as old
as yacc ideas... but they did not care at the time;-) When you look at
the syntax, it was designed with a recursive parser in mind.

Part of my comment was to explain why the generated automaton is large.
SQL is a small language, but it has a big automaton in postgresql, and
this is IMHO the explanation.


 If you can think of a reasonable way to stop treating them as separate
 tokens inside the grammar without altering the user-visible behavior,
 I'm certainly interested.

I was thinking about type names especially, and maybe others.

I join a small proof-of-concept patch to drop some tokens out of the
parser. As a results, 6 tokens, 6 rules and 9 states are removed,
and the automaton table is reduced by 438 entries (1.5%). Not too bad;-)
I think others could be dealt with similarily, or with more work.


Thanks for the discussion,

-- 
Fabien.*** ./src/backend/parser/gram.y.origMon Apr  5 12:06:42 2004
--- ./src/backend/parser/gram.y Mon Apr  5 18:21:21 2004
***
*** 336,343 
AGGREGATE ALL ALSO ALTER ANALYSE ANALYZE AND ANY ARRAY AS ASC
ASSERTION ASSIGNMENT AT AUTHORIZATION
  
!   BACKWARD BEFORE BEGIN_P BETWEEN BIGINT BINARY BIT
!   BOOLEAN_P BOTH BY
  
CACHE CALLED CASCADE CASE CAST CHAIN CHAR_P
CHARACTER CHARACTERISTICS CHECK CHECKPOINT CLASS CLOSE
--- 336,343 
AGGREGATE ALL ALSO ALTER ANALYSE ANALYZE AND ANY ARRAY AS ASC
ASSERTION ASSIGNMENT AT AUTHORIZATION
  
!   BACKWARD BEFORE BEGIN_P BETWEEN BINARY BIT
!   BOTH BY
  
CACHE CALLED CASCADE CASE CAST CHAIN CHAR_P
CHARACTER CHARACTERISTICS CHECK CHECKPOINT CLASS CLOSE
***
*** 362,368 
  
ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IN_P INCLUDING INCREMENT
INDEX INHERITS INITIALLY INNER_P INOUT INPUT_P
!   INSENSITIVE INSERT INSTEAD INT_P INTEGER INTERSECT
INTERVAL INTO INVOKER IS ISNULL ISOLATION
  
JOIN
--- 362,368 
  
ILIKE IMMEDIATE IMMUTABLE 

Re: [PATCHES] hint infrastructure setup (v3)

2004-04-03 Thread Fabien COELHO

Dear Tom,

  Not really in terms of state. The state should basically be the same.
  However yes in terms of explicit state that are given explicit names.
  And definitely in terms of actions, as you say.

 But mid-rule actions are implemented by inventing additional internal
 productions

Sure.

 That's not only more states, but more symbols, which is going to impose
 an O(N^2) cost on the raw tabular representation of the parsing rules.
 Maybe much of this will be bought back when bison compresses the tables,
 and maybe not.

Mmh. Maybe. I don't know at the time.

 Have you checked how much the size of gram.o grows with the stuff
 you've installed so far?

I have not looked at that. It was just a quick and dirty implementation,
just to convince myself of what can be achieved.

 (I'm also slightly worried about what this will do to parsing speed,

Well, more reductions are performed, and I'm not sure that the switch()
implementation is really intelligent. Having a hash table could help big
grammars, but that is bison problems, and I will not rewrite that.

However, I'm not sure that parsing overhead is a big issue wrt other costs
in the backend, but this is not a reason to make it explode.

  I'm afraid it looks like internal state 1232, 43425 and [...],

 The string names of the grammar symbols are all embedded in gram.c

Yes, I've noticed yytname[].

 anyway, so if you can figure out the symbols that are expected next,
 their names could be printed directly.

That is done with YYERROR_VERBOSE, but the result is really poor
most of the time, because it does not look for all possible terminals,
just the ones easilly accessible. So you have something like:

DROP TABL foo;
ERROR: syntax error at unexpected Ident TABL, LANGUAGE expected.

no comment. A better job could be done, but you need to touch a little
bit gram.c for that.

 We could alter the symbol names to be more useful to novices, or
 alternatively install an additional lookup table to substitute
 reader-friendly phrases.

Yep, I thought about that also. Some symbols are appended _P to avoid
clashes.


I'm investigating the internal way. Not really optimistic because
of the details, but I may find workarounds. I'll post later when I'll
have a definite opinion.

-- 
Fabien COELHO _ http://www.cri.ensmp.fr/~coelho _ [EMAIL PROTECTED]
   CRI-ENSMP, 35, rue Saint-Honoré, 77305 Fontainebleau cedex, France
   phone: (+33|0) 1 64 69 {voice: 48 52, fax: 47 09, standard: 47 08}
     All opinions expressed here are mine  _

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PATCHES] hint infrastructure setup (v3)

2004-04-03 Thread Fabien COELHO

  You *really* don't want to go there. If you want to see what the parser
  is doing you can run bison -r all over the grammar and examine the
  .output file. But please, let's not examine the internal states. Talk
  about unmaintainability!

 What I was suggesting was that we might be able to extract the follow
 set from bison's tables, ie, the set of grammar symbols that are legal
 next inputs given the current parse state stack.

 I surely agree that we don't want code that goes like if state is N
 then print message X ... but the follow set should be stable

These are 2 different issues.

(1) computing the set of expected following tokens.

It is possible to do that, with some processing of bison tables.


(2) give advices based on guesses:
for what I have looked so far, it could look like:

 - I'm here for rule user_list, the previous token was ','
   OR I'm here for select_expressions_list, last token was ',' and
  current token is FROM

   = HINT: remove previous comma

I don't think that (2) would be a bad idea, especially for my students.
The good point is that the cost would only be paid at error time.


 One way of describing Fabien's existing patch is that it's essentially
 keeping track of the follow set by hand :-(

Yes. I give name to existing states, and states leads to expected
follow tokens.

  Also, I suspect that bison does a good bit of
  optimisation by way of combining states that removes some of the
  information you might need, but I haven't looked into it closely.

 That could be a showstopper if true, but it's all speculation at this
 point.

I think the information is there.

-- 
Fabien Coelho - [EMAIL PROTECTED]

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PATCHES] hint infrastructure setup (v3)

2004-04-03 Thread Tom Lane
Fabien COELHO [EMAIL PROTECTED] writes:
 That is done with YYERROR_VERBOSE, but the result is really poor
 most of the time, because it does not look for all possible terminals,
 just the ones easilly accessible.

I wasn't aware that bison had a built-in facility for better messages
--- this is kind of cool actually.  I played with it a little and found
that:

1. There's an arbitrary restriction in the bison code to show no more
than four alternatives; if there's more than four then it shows none
of them, rather than doing something helpful like etc.

2. The problem you exhibit with DROP seems to be due to our use of
empty productions:

 DROP TABL foo;
 ERROR: syntax error at unexpected Ident TABL, LANGUAGE expected.

It looks like the parser makes one additional state move, reducing
opt_procedural to empty, before raising the error.  We might be
able to suppress that; alternatively we could look at the stacked
state(s) and add their follow sets to the printout.  In any case it
is clear that we can extract the follow set from the tables.

A bigger problem with this, though, is the verbosity of the results
in some cases.  I diked out the limitation to four outputs and soon
found examples like

regression=# grant;
ERROR:  syntax error, unexpected ';', expecting ALL or CREATE or DELETE_P or EXECUTE 
or INSERT or REFERENCES or RULE or SELECT or TEMP or TEMPORARY or TRIGGER or UPDATE or 
USAGE at or near ; at character 6
LINE 1: grant;
 ^

which is useful, and

regression=# grant select  on t to;
ERROR:  syntax error, unexpected ';', expecting ABORT_P or ABSOLUTE_P or ACCESS or 
ACTION or ADD or AFTER or AGGREGATE or ALTER or ASSERTION or ASSIGNMENT or AT or 
BACKWARD or BEFORE or BEGIN_P or BIGINT or BIT or BOOLEAN_P or BY or CACHE or CALLED 
or CASCADE or CHAIN or CHAR_P or CHARACTER or CHARACTERISTICS or CHECKPOINT or CLASS 
or CLOSE or CLUSTER or COALESCE or COMMENT or COMMIT or COMMITTED or CONSTRAINTS or 
CONVERSION_P or CONVERT or COPY or CREATEDB or CREATEUSER or CURSOR or CYCLE or 
DATABASE or DAY_P or DEALLOCATE or DEC or DECIMAL_P or DECLARE or DEFAULTS or DEFERRED 
or DEFINER or DELETE_P or DELIMITER or DELIMITERS or DOMAIN_P or DOUBLE_P or DROP or 
EACH or ENCODING or ENCRYPTED or ESCAPE or EXCLUDING or EXCLUSIVE or EXECUTE or EXISTS 
or EXPLAIN or EXTERNAL or EXTRACT or FETCH or FIRST_P or FLOAT_P or FORCE or FORWARD 
or FUNCTION or GLOBAL or GROUP_P or HANDLER or HOLD or HOUR_P or IMMEDIATE or 
IMMUTABLE or IMPLICIT_P or INCLUDING or INCREMENT or INDEX or INHERIT!
 S or INOUT or INPUT_P or INSENSITIVE or INSERT or INSTEAD or INT_P or INTEGER or 
INTERVAL or INVOKER or ISOLATION or KEY or LANCOMPILER or LANGUAGE or LARGE_P or 
LAST_P or LEVEL or LISTEN or LOAD or LOCAL or LOCATION or LOCK_P or MATCH or MAXVALUE 
or MINUTE_P or MINVALUE or MODE or MONTH_P or MOVE or NAMES or NATIONAL or NCHAR or 
NEXT or NO or NOCREATEDB or NOCREATEUSER or NONE or NOTHING or NOTIFY or NULLIF or 
NUMERIC or OBJECT_P or OF or OIDS or OPERATOR or OPTION or OUT_P or OVERLAY or OWNER 
or PARTIAL or PASSWORD or PATH_P or PENDANT or POSITION or PRECISION or PRESERVE or 
PREPARE or PRIOR or PRIVILEGES or PROCEDURAL or PROCEDURE or READ or REAL or RECHECK 
or REINDEX or RELATIVE_P or RENAME or REPEATABLE or REPLACE or RESET or RESTART or 
RESTRICT or RETURNS or REVOKE or ROLLBACK or ROW or ROWS or RULE or SCHEMA or SCROLL 
or SECOND_P or SECURITY or SEQUENCE or SERIALIZABLE or SESSION or SET or SETOF or 
SHARE or SHOW or SIMPLE or SMALLINT or STABLE or START or STATEMENT o!
 r STATISTICS or STDIN or STDOUT or STORAGE or STRICT_P or SUBSTRING or
 SYSID or TEMP or TEMPLATE or TEMPORARY or TIME or TIMESTAMP or TOAST or TRANSACTION 
or TREAT or TRIGGER or TRIM or TRUNCATE or TRUSTED or TYPE_P or UNCOMMITTED or 
UNENCRYPTED or UNKNOWN or UNLISTEN or UNTIL or UPDATE or USAGE or VACUUM or VALID or 
VALIDATOR or VALUES or VARCHAR or VARYING or VERSION or VIEW or VOLATILE or WITH or 
WITHOUT or WORK or WRITE or YEAR_P or ZONE or IDENT at or near ; at character 22
LINE 1: grant select  on t to;
 ^

which is not real useful at all :-(.  You really want to see just
expecting IDENT in such a case.  Still we might be able to do some
postprocessing on the collected set of valid follow symbols, such as
removing all the unreserved_keywords when they are present along with
IDENT.  It'd be fairly reasonable to embed knowledge about this in
keywords.c and/or scan.l.

We'd have to write our own version of bison's verbose-error code anyway,
because the canned code doesn't support localization --- it uses
hardwired strings for expecting and so on.  But it looks possibly
doable to me.

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] hint infrastructure setup (v3)

2004-04-02 Thread Fabien COELHO

Dear Tom,

 I quite agree that you shouldn't do a complete implementation when it's
 not clear if we'll accept it or not.  What I'd like to see is a demo,
 basically.  How about one example of each of several different kinds
 of hints?  We need to get a feeling for what can be accomplished and how
 messy the grammar would get.

I've already tried some implementation but it was not showable, and
because the infrastructure was not in place, inappropriate hints could
be shown.

It is not that messy. Just prefix are given a name to attach a rule
for hint updates.

 BTW, it seems like you are putting in a lot of infrastructure to
 recreate externally the parse-stack state that bison has internally.

Yes.

 Maybe an interesting alternative would be to look at touching that
 state directly.  I realize that bison doesn't consider that part of
 their official API, but bison is a stable tool and seems unlikely to
 change much in the future.  We could probably get away with it...

Interesting, however:

I would not like to break postgresql portability with other parser
generators. It would be enough to reject the patch from my point of
view;-)

Using some non standard undocumented API would not help other people who
would like to touch the parser.

It would not help me either, because I would have to learn the
undocumented API;-)

I really need to give a name to the state so as to be able to attach a
hint. I cannot set how I could guess the state number given after GRANT
just from the source, and without generating lots of conflicts.

So I'll do some more programming maybe over the week-end a resubmit
something soon.

-- 
Fabien.

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PATCHES] hint infrastructure setup (v3)

2004-04-02 Thread Bruce Momjian
Fabien COELHO wrote:
 I would not like to break postgresql portability with other parser
 generators. It would be enough to reject the patch from my point of
 view;-)

We require bison to build, period. I am sure we use enough
bison-specific stuff now.  No one has suggested another parser generator
to support either.

 
 Using some non standard undocumented API would not help other people who
 would like to touch the parser.
 
 It would not help me either, because I would have to learn the
 undocumented API;-)
 
 I really need to give a name to the state so as to be able to attach a
 hint. I cannot set how I could guess the state number given after GRANT
 just from the source, and without generating lots of conflicts.
 
 So I'll do some more programming maybe over the week-end a resubmit
 something soon.

Please show us an example of what a typical hint would look like to the
user.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PATCHES] hint infrastructure setup (v3)

2004-04-02 Thread Fabien COELHO

 Please show us an example of what a typical hint would look like to the
 user.

As an illustration, here is the kind of thing I had with an early
quick and dirty proof of concept implementation I made for myself
one month ago.

As a result of this first try, I've convinced myself that I should do
that cleanly and incrementally, one command at a time, with a proper clean
infrastructure, and intermediate validations...

My idea of the target audience for those hints is my students while
they're learning SQL.


--
-- test HINTs provided on parser errors.
-- 
-- it should:
-- (1) trigger all possible hints
-- (2) include ACTUAL syntax errors by REAL people
-- 
-- Note that these test assume a psql like interface which
-- can only send lexable and even commands to the backend.
-- Thus I cannot test badly closed quotes or parentheses.
-- 
--
-- (1) all hints
--
--
-- sql command expected

foo;
ERROR:  syntax error at or near foo at character 1
HINT:  sql command such as SELECT, CREATE, GRANT... expected

123;
ERROR:  syntax error at or near 123 at character 1
HINT:  sql command such as SELECT, CREATE, GRANT... expected

+;
ERROR:  syntax error at or near + at character 1
HINT:  sql command such as SELECT, CREATE, GRANT... expected

'hello world';
ERROR:  syntax error at or near 'hello world' at character 1
HINT:  sql command such as SELECT, CREATE, GRANT... expected

NULL;
ERROR:  syntax error at or near NULL at character 1
HINT:  sql command such as SELECT, CREATE, GRANT... expected

COLUMN;
ERROR:  syntax error at or near COLUMN at character 1
HINT:  sql command such as SELECT, CREATE, GRANT... expected

CREAT TABLE foo(id INT4);
ERROR:  syntax error at or near CREAT at character 1
HINT:  sql command such as SELECT, CREATE, GRANT... expected

--
-- CREATE xxx

CREATE 123;
ERROR:  syntax error at or near 123 at character 8
HINT:  OR REPLACE, TABLE, DATABASE, USER... expected

CREATE foo;
ERROR:  syntax error at or near foo at character 8
HINT:  OR REPLACE, TABLE, DATABASE, USER... expected

CREATE + CREATE ();
ERROR:  syntax error at or near + at character 8
HINT:  OR REPLACE, TABLE, DATABASE, USER... expected

CREATE INT4;
ERROR:  syntax error at or near INT4 at character 8
HINT:  OR REPLACE, TABLE, DATABASE, USER... expected

CREATE OR foo;
ERROR:  syntax error at or near foo at character 11
HINT:  REPLACE expected

--
-- DROP xxx

DROP 123;
ERROR:  syntax error at or near 123 at character 6
HINT:  object such as TABLE, DATABASE, USER... expected

DROP foo;
ERROR:  syntax error at or near foo at character 6
HINT:  object such as TABLE, DATABASE, USER... expected

DROP USER 123;
ERROR:  syntax error at or near 123 at character 11
HINT:  user id such as calvin expected

DROP SYSID 123;
ERROR:  syntax error at or near SYSID at character 6
HINT:  object such as TABLE, DATABASE, USER... expected

DROP USER SYSID 123;
ERROR:  syntax error at or near 123 at character 17

DROP GROUP 321;
ERROR:  syntax error at or near 321 at character 12
HINT:  group id such as admin expected

DROP GROUP calvin, foo;
ERROR:  syntax error at or near , at character 18
HINT:  ; expected

DROP TABLE (foo);
ERROR:  syntax error at or near ( at character 12
HINT:  table name... expected

-- 
-- CREATE USER

CREATE USER COLUMN;
ERROR:  syntax error at or near COLUMN at character 13
HINT:  user id such as calvin expected

CREATE USER 123;
ERROR:  syntax error at or near 123 at character 13
HINT:  user id such as calvin expected

CREATE USER 'hello';
ERROR:  syntax error at or near 'hello' at character 13
HINT:  user id such as calvin expected

CREATE USER calvin foo;
ERROR:  syntax error at or near foo at character 20
HINT:  user options or SET or RESET expected

CREATE USER calvin WITH foo;
ERROR:  syntax error at or near foo at character 25
HINT:  user options expected

CREATE USER calvin WITH CREATEDB foo;
ERROR:  syntax error at or near foo at character 34
HINT:  other user options or ';' expected

CREATE USER calvin PASSWORD 123;
ERROR:  syntax error at or near 123 at character 29
HINT:  password string such as 'secret' expected

CREATE USER calvin WITH PASSWORD 123;
ERROR:  syntax error at or near 123 at character 34
HINT:  password string such as 'secret' expected

CREATE USER calvin WITH ENCRYPTED 'pass';
ERROR:  syntax error at or near 'pass' at character 35
HINT:  PASSWORD expected

CREATE USER calvin WITH UNENCRYPTED 'pass';
ERROR:  syntax error at or near 'pass' at character 37
HINT:  PASSWORD expected

CREATE USER calvin WITH ENCRYPTED PASS 123;
ERROR:  syntax error at or near PASS at character 35
HINT:  PASSWORD expected

CREATE USER calvin WITH ENCRYPTED PASSWORD 123;
ERROR:  syntax error at or near 123 at character 44
HINT:  password string such as 'secret' expected

CREATE USER calvin WITH UNENCRYPTED PASSWORD 'secret' CREADB;
ERROR:  syntax error at or near CREADB at character 55
HINT:  other user options or ';' expected

CREATE USER calvin NOCREATEDB foo;
ERROR:  syntax error at or near foo at character 31
HINT:  other 

Re: [PATCHES] hint infrastructure setup (v3)

2004-04-02 Thread Tom Lane
Fabien COELHO [EMAIL PROTECTED] writes:
 I would not like to break postgresql portability with other parser
 generators.

I agree with Bruce's opinion that this is a nonissue.  In particular I'd
point out that the current grammar is already too big for any known yacc
clone besides bison (even with bison you need a very recent version),
and your proposed changes are going to make the grammar a whole lot
larger yet, at least in terms of states and actions.  Even if there was
any usefulness to supporting other yacc clones, the odds of making one
work seem minuscule.

 Using some non standard undocumented API would not help other people who
 would like to touch the parser.

It would help them a *lot* if it meant that they didn't have to be aware
of this stuff in order to do anything at all with the grammar.  My
fundamental objection to this proposal continues to be that it will make
the grammar unreadable and unmaintainable.  If we could push the error
message generation issues into a subroutine of yyerror() and not touch
the grammar rules at all, the chances of getting this accepted would
rise about a thousand percent.

The sample hints you show in another message still aren't impressing me
much, but in any case they look like they only depend on being able to
see what grammar symbols can follow the current point.  The last time
I studied this stuff (which was quite a while back) the follow set was
something an LR parser generator would have to compute anyway.  I don't
know whether bison's internal tables expose that set in any useful
fashion, but it surely seems worth a look.

regards, tom lane

PS: another reason for doing it that way is that I suspect your current
approach is a dead end anyhow.  You've already hit one blocker problem
where you got reduce/reduce errors when you tried to insert
state-recording actions, and you've only gone as far as hinting the very
first symbol, which is hardly one of the more complex parts of the
grammar.  There are large parts of the grammar that only manage to avoid
ambiguity by the skin of their teeth --- I don't believe you'll be able
to do anything of this sort in those areas without breaking it.  Take a
look at SelectStmt and subsidiary productions as an example ... that's
a fragile bit of work that took a long time to get right.

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] hint infrastructure setup (v3)

2004-04-02 Thread Fabien COELHO

Dear Tom,

  I would not like to break postgresql portability with other parser
  generators.

 I agree with Bruce's opinion that this is a nonissue.  In particular I'd
 point out that the current grammar is already too big for any known yacc
 clone besides bison (even with bison you need a very recent version),
 and your proposed changes are going to make the grammar a whole lot
 larger yet, at least in terms of states and actions.

Not really in terms of state. The state should basically be the same.
However yes in terms of explicit state that are given explicit names.
And definitely in terms of actions, as you say.


  Using some non standard undocumented API would not help other people who
  would like to touch the parser.

 It would help them a *lot* if it meant that they didn't have to be aware
 of this stuff in order to do anything at all with the grammar.  My
 fundamental objection to this proposal continues to be that it will make
 the grammar unreadable and unmaintainable.  If we could push the error
 message generation issues into a subroutine of yyerror() and not touch
 the grammar rules at all, the chances of getting this accepted would
 rise about a thousand percent.

Hmmm. I'm so much below zero! So maybe I think I won't resubmit a
infrastructure patch with some hints for some commands, as it seems
just a lose of time.


 The sample hints you show in another message still aren't impressing me
 much,

I don't mean them to be impressive! It is not an expert system or a seer
I'm planing. I just want them to be useful to my students!


 but in any case they look like they only depend on being able to
 see what grammar symbols can follow the current point.

Yes. That's basically where the parser is when it generates an error, it
is expecting something and cannot find it. I could do more, but with more
efforts.


 The last time I studied this stuff (which was quite a while back) the
 follow set was something an LR parser generator would have to compute
 anyway.

Yes.


 I don't know whether bison's internal tables expose that set in any
 useful fashion, but it surely seems worth a look.

Having a look is something I can do;-)

I'm afraid it looks like internal state 1232, 43425 and 42523, but there
may be some support enough in the generated code to get something more
interesting. It would require to be able to get textual meta informations
out of the state number, which is possible if bison/flex people did
something about it.

I can remembre something like that, somewhere deep in my memory...


 PS: another reason for doing it that way is that I suspect your current
 approach is a dead end anyhow.  You've already hit one blocker problem
 where you got reduce/reduce errors when you tried to insert
 state-recording actions, and you've only gone as far as hinting the very
 first symbol, which is hardly one of the more complex parts of the
 grammar.

These states exists anyway. Once the first keyword(s) are passed, there is
much less troubles as the SQL syntax is really verbose, so it is usually
hard to get lost. The only real problem I had passed the first symbol, for
the part I played with, is with SET [SESSION|LOCAL] SESSION AUTHORIZATION
which needed a real bit of refactoring.

Also, it is better to avoid some factorizations if the semantics
is different. For instance, user_list is used both for groups and users,
so I switched during my tests to a user_list and a group_list.

 There are large parts of the grammar that only manage to avoid
 ambiguity by the skin of their teeth --- I don't believe you'll be able
 to do anything of this sort in those areas without breaking it.  Take a
 look at SelectStmt and subsidiary productions as an example ... that's
 a fragile bit of work that took a long time to get right.

I noticed that great care has been taken to avoid any ambiguity.
I have no intention to change that.

-- 
Fabien Coelho - [EMAIL PROTECTED]

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] hint infrastructure setup (v3)

2004-04-02 Thread Andrew Dunstan
Fabien COELHO wrote:

The last time I studied this stuff (which was quite a while back) the
follow set was something an LR parser generator would have to compute
anyway.
   

Yes.

 

I don't know whether bison's internal tables expose that set in any
useful fashion, but it surely seems worth a look.
   

Having a look is something I can do;-)

I'm afraid it looks like internal state 1232, 43425 and 42523, but there
may be some support enough in the generated code to get something more
interesting. It would require to be able to get textual meta informations
out of the state number, which is possible if bison/flex people did
something about it.
 

You *really* don't want to go there. If you want to see what the parser 
is doing you can run bison -r all over the grammar and examine the 
.output file. But please, let's not examine the internal states. Talk 
about unmaintainability! Also, I suspect that bison does a good bit of 
optimisation by way of combining states that removes some of the 
information you might need, but I haven't looked into it closely.

If we really wanted to go down this route, a predictive (i.e. LL(n) 
type) parser rather than a bottom up parser would probably be more 
suitable, but I doubt anyone has any stomach for doing that work, even 
if there was agreement on the need for it.

cheers

andrew



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [PATCHES] hint infrastructure setup (v3)

2004-04-02 Thread Tom Lane
Fabien COELHO [EMAIL PROTECTED] writes:
 ... your proposed changes are going to make the grammar a whole lot
 larger yet, at least in terms of states and actions.

 Not really in terms of state. The state should basically be the same.
 However yes in terms of explicit state that are given explicit names.
 And definitely in terms of actions, as you say.

But mid-rule actions are implemented by inventing additional internal
productions (see the bison manual's discussion of them; the O'Reilly
lex  yacc book says that all yaccs do it like that).  That's not only
more states, but more symbols, which is going to impose an O(N^2) cost
on the raw tabular representation of the parsing rules.  Maybe much of
this will be bought back when bison compresses the tables, and maybe
not.  Have you checked how much the size of gram.o grows with the stuff
you've installed so far?

(I'm also slightly worried about what this will do to parsing speed,
but evidence about that would have to wait for further development of
the patch.  In any case it'd be more attractive if the cost of this
could be paid only when we've already detected an error...)

 I'm afraid it looks like internal state 1232, 43425 and 42523, but there
 may be some support enough in the generated code to get something more
 interesting. It would require to be able to get textual meta informations
 out of the state number, which is possible if bison/flex people did
 something about it.

The string names of the grammar symbols are all embedded in gram.c
anyway, so if you can figure out the symbols that are expected next,
their names could be printed directly.  We could alter the symbol names
to be more useful to novices, or alternatively install an additional
lookup table to substitute reader-friendly phrases.

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PATCHES] hint infrastructure setup (v3)

2004-04-02 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
 You *really* don't want to go there. If you want to see what the parser 
 is doing you can run bison -r all over the grammar and examine the 
 .output file. But please, let's not examine the internal states. Talk 
 about unmaintainability!

What I was suggesting was that we might be able to extract the follow
set from bison's tables, ie, the set of grammar symbols that are legal
next inputs given the current parse state stack.  I surely agree that
we don't want code that goes like if state is N then print message X
... but the follow set should be stable.  One way of describing Fabien's
existing patch is that it's essentially keeping track of the follow set
by hand :-(

 Also, I suspect that bison does a good bit of 
 optimisation by way of combining states that removes some of the 
 information you might need, but I haven't looked into it closely.

That could be a showstopper if true, but it's all speculation at this
point.

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PATCHES] hint infrastructure setup (v3)

2004-04-02 Thread Bruce Momjian
Fabien COELHO wrote:
 CREATE USER calvin WITH IN GROUP admin, CREATEDB;
 ERROR:  group admin does not exist
 
 CREATE USER calvin WITH IN GROUP admin CREATEDB foo;
 ERROR:  syntax error at or near foo at character 49
 HINT:  other user options or ';' expected
 
 CREATE USER calvin NOCREATEUSER CREATEDB foo;
 ERROR:  syntax error at or near foo at character 42
 HINT:  other user options or ';' expected
 
 CREATE USER calvin CREATEUSER foo;
 ERROR:  syntax error at or near foo at character 31
 HINT:  other user options or ';' expected

I hate to say it but I don't see these hints as being very helpful.

Basically, if you look at the developers FAQ, you should discuss what
you want to do, how you want to do it, then code.  I realize you spent a
lot of time on a patch, but I don't see a lot of value in adding such
hints.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PATCHES] hint infrastructure setup (v3)

2004-04-02 Thread Stephan Szabo

On Fri, 2 Apr 2004, Bruce Momjian wrote:

 Fabien COELHO wrote:
  CREATE USER calvin WITH IN GROUP admin, CREATEDB;
  ERROR:  group admin does not exist
 
  CREATE USER calvin WITH IN GROUP admin CREATEDB foo;
  ERROR:  syntax error at or near foo at character 49
  HINT:  other user options or ';' expected
 
  CREATE USER calvin NOCREATEUSER CREATEDB foo;
  ERROR:  syntax error at or near foo at character 42
  HINT:  other user options or ';' expected
 
  CREATE USER calvin CREATEUSER foo;
  ERROR:  syntax error at or near foo at character 31
  HINT:  other user options or ';' expected

 I hate to say it but I don't see these hints as being very helpful.

I can see them as potentially being useful for people who don't have alot
of knowledge of SQL or our dialect thereof.  I think some of the ones
shown may need better wording (for example the ones above basically seem
to mean go look at the help for create user which is pretty much the same
as the error on its own, but perhaps with a longer hint, it might be less
so) but I think it illustrates the point for these kinds of queries.  I'm
not sure that the HINT strings would be as meaningful in the middle of
complicated select/update/etc queries, but that would be something to see.

I'm not sure it's PostgreSQL's responsibility to teach SQL or even really
to teach our own commands, but if it were possible to do without much of a
performance, readability or maintenance cost, it'd probably be worth
doing.  I can't really say much specifically about the patch either way on
any of those grounds (having not actually looked).


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] hint infrastructure setup (v3)

2004-04-02 Thread Bruce Momjian
Stephan Szabo wrote:
  I hate to say it but I don't see these hints as being very helpful.
 
 I can see them as potentially being useful for people who don't have alot
 of knowledge of SQL or our dialect thereof.  I think some of the ones
 shown may need better wording (for example the ones above basically seem
 to mean go look at the help for create user which is pretty much the same
 as the error on its own, but perhaps with a longer hint, it might be less
 so) but I think it illustrates the point for these kinds of queries.  I'm
 not sure that the HINT strings would be as meaningful in the middle of
 complicated select/update/etc queries, but that would be something to see.
 
 I'm not sure it's PostgreSQL's responsibility to teach SQL or even really
 to teach our own commands, but if it were possible to do without much of a
 performance, readability or maintenance cost, it'd probably be worth
 doing.  I can't really say much specifically about the patch either way on
 any of those grounds (having not actually looked).

I wonder if we could just do a \h command as a hint.  In some cases, it
might be clearer than printing some text.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PATCHES] hint infrastructure setup (v3)

2004-04-02 Thread Tom Lane
Stephan Szabo [EMAIL PROTECTED] writes:
 I'm not sure that the HINT strings would be as meaningful in the middle of
 complicated select/update/etc queries, but that would be something to see.

That's pretty much my stumbling block too.  The examples Fabien has
shown so far don't seem to me to be all that much more helpful than a
simple facility to show the \h command syntax summary.  Now \h
doesn't know anything about syntax errors within expressions, and so
I'm prepared to believe that there's some gold to be mined here if we
can get the parser to offer useful hints in the context of complex
expressions.  What I've not seen yet is any proof that that can be done
at all, much less at a reasonable cost in terms of grammar complexity...

 I'm not sure it's PostgreSQL's responsibility to teach SQL or even really
 to teach our own commands, but if it were possible to do without much of a
 performance, readability or maintenance cost, it'd probably be worth
 doing.

Right; we have to consider those costs and trade them off against the
benefit of better error reporting.

One thought here is that the extra costs might be better spent in a
client program that's in a training wheels mode.  Would it be
reasonable to invent a psql-student client that has its own version of
the parser with better error detection?  Then production clients
wouldn't have to pay any cycles towards support of novices.  I'm not
sure right now how hard it would be to keep such a client in sync with
the production backend, but it's an idea to consider...

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PATCHES] hint infrastructure setup (v3)

2004-04-01 Thread Bruce Momjian
Fabien COELHO wrote:
 
 Dear patchers,
 
 Well, as the discussion rages over my previous patch submissions, I've
 time to improve the code;-)
 
 I finally figured out that there is 2 errhint functions (elog.c vs
 ipc_text.c), and the one I'm calling is the fist one, so I need to put a
 format. The second appears to ignore it's arguments after the first.
 
 Anyway, please consider the following patch for inclusion to current
 head. It validates for me. I need it to be able to go on.

Why did all the tags have to be renamed:

+ cmdGRANT: GRANT {noH;};

Also, what is typical output for a hint?  Can you show one?

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PATCHES] hint infrastructure setup (v3)

2004-04-01 Thread Fabien COELHO

Dear Bruce,

 Why did all the tags have to be renamed:

   + cmdGRANT: GRANT {noH;};

that's a good question.

In order to add hints, I want to attach them to the states of the parser
automaton. So as to do that, I need I put a name on every state, so I need
to refactor the prefix that lead to a state, at give it a name.


Otherwise, I would have :

xxx: GRANT {noH;} ALL
   | GRANT {noH;} CREATE
   ;

(1) I would have to put the after GRANT hint twice, one after each
GRANT occurence.

(2) this would result in shift/reduce conflicts, as the parser does not
know which hint should be put. Well, it is the same code in both case,
but yacc does not look at that.

Also, I need to stop hints, otherwise the last 'pushed' hint would be
shown on any error later.


 Also, what is typical output for a hint?  Can you show one?

Well, the current status of the infrastructure is that there is no hint;-)
The only hint with the patch is if you do not put an SQL command.

It may look like :

psql creat table foo();
ERROR: syntax error at or near creat at character 1
creat table foo();
^
HINT: sql command such as SELECT or CREATE... ?


All other syntax errors result in no hint being displayed, thanks to
all added noH calls I put.


Also, As far as I remember, I put a new option to activate these hints.
Hints are disactivated by default. This is because some people do not
like advices and may want to turn them on or off.


-- 
Fabien Coelho - [EMAIL PROTECTED]

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PATCHES] hint infrastructure setup (v3)

2004-04-01 Thread Tom Lane
Fabien COELHO [EMAIL PROTECTED] writes:
 Also, what is typical output for a hint?  Can you show one?

 Well, the current status of the infrastructure is that there is no hint;-)

Ah, now I remember why I was waiting to review that stuff: I was expecting
you to come out with a version that actually did some things :-(

What you've got at the moment is a patch that significantly uglifies the
grammar without actually doing anything useful, or even clearly pointing
the way to what else will need to happen before it does do something
useful.  So it's difficult to form any reasoned judgment about whether
we want to commit to following this path or not.

As I said back at the beginning, I'm unconvinced that this path leads
to anything useful --- it seems like an extremely painful way of
accomplishing less than what a simple change to auto-show the command's
syntax summary could do.  To change my mind, you've got to show me some
concrete results of hints that are more useful than \h command is.

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] hint infrastructure setup (v3)

2004-04-01 Thread Fabien COELHO

Dear Tom,

  Well, the current status of the infrastructure is that there is no hint;-)

 Ah, now I remember why I was waiting to review that stuff: I was expecting
 you to come out with a version that actually did some things :-(

Well, if you wait for something from me, it is better to tell me directly.
I was waiting for anything to happen to the patch (accept, discuss or
reject) before submitting anything else.


 What you've got at the moment is a patch that significantly uglifies the
 grammar without actually doing anything useful, or even clearly pointing
 the way to what else will need to happen before it does do something
 useful.  So it's difficult to form any reasoned judgment about whether
 we want to commit to following this path or not.

I can see your point.


The reasonnable way out of the deadlock that I can suggest is the
following:

I can resubmit a new patch that would provide the needed infrastructure
AND some hints on some commands as a proof of concept of what can be
achieved.

Then you can decide whether it is worth having this kind of thing on all
commands, or not.

If not, I won't have passed 3 week-ends in the parser instead if my
garden for nothing. If you are okay then you apply, and I'll submit some
new patches later on, one command at a time, and when I have time.

Does this sound reasonnable enough?

-- 
Fabien Coelho - [EMAIL PROTECTED]

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PATCHES] hint infrastructure setup (v3)

2004-04-01 Thread Tom Lane
Fabien COELHO [EMAIL PROTECTED] writes:
 I can resubmit a new patch that would provide the needed infrastructure
 AND some hints on some commands as a proof of concept of what can be
 achieved.

I quite agree that you shouldn't do a complete implementation when it's
not clear if we'll accept it or not.  What I'd like to see is a demo,
basically.  How about one example of each of several different kinds
of hints?  We need to get a feeling for what can be accomplished and how
messy the grammar would get.

BTW, it seems like you are putting in a lot of infrastructure to
recreate externally the parse-stack state that bison has internally.
Maybe an interesting alternative would be to look at touching that
state directly.  I realize that bison doesn't consider that part of
their official API, but bison is a stable tool and seems unlikely to
change much in the future.  We could probably get away with it...

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])