[PATCHES] dollar quoting with flex

2004-02-24 Thread Andrew Dunstan
(Fourth try ;-)

Attached is a patch for dollar quoting in the backend and in psql (with 
the new flex scanner). I'm fairly confident about the backend (because 
this is mainly Tom's work adapted :-) ) but rather less so about psql - 
I don't entirely understand all the odd states in psql's scanner. I'm 
not sure that I have freed up memory in all the necessary cases. Nor am 
I sure what the state is or should be if we end an included file in a 
dollar-quoting state, nor how to handle such a situation. So, some extra 
eyeballs would be appreciated.

However - it does seem to work in my simple testing.
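
For readers who want to check the matching rule outside flex, here is a rough standalone C sketch of what the dolq rules in the scan.l hunk of the patch do. It is not part of the patch. The names scan_dollar_quoted, delimiter_len, is_tag_start and is_tag_cont are hypothetical, and the sketch assumes a NUL-terminated input string instead of the scanner's literal buffer.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Mirrors dolq_start: a letter, underscore, or high-bit byte (\200-\377). */
static int is_tag_start(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           c == '_' || c >= 0x80;
}

/* Mirrors dolq_cont: dolq_start plus digits. */
static int is_tag_cont(unsigned char c)
{
    return is_tag_start(c) || (c >= '0' && c <= '9');
}

/* Length of a $tag$ delimiter at s (tag may be empty), or 0 if none. */
static size_t delimiter_len(const char *s)
{
    size_t i = 1;

    if (s[0] != '$')
        return 0;
    if (is_tag_start((unsigned char) s[i]))
    {
        i++;
        while (is_tag_cont((unsigned char) s[i]))
            i++;
    }
    return (s[i] == '$') ? i + 1 : 0;
}

/*
 * Hypothetical helper: if input starts with a dollar-quoted string, return
 * a malloc'd copy of the body (copied byte for byte, no escape processing)
 * and store the number of input bytes consumed in *consumed.  Returns NULL,
 * leaving *consumed untouched, when there is no opening delimiter, the
 * string is unterminated, or malloc fails.
 */
char *scan_dollar_quoted(const char *input, size_t *consumed)
{
    size_t open_len = delimiter_len(input);
    const char *body;
    const char *p;

    if (open_len == 0)
        return NULL;
    body = input + open_len;
    for (p = body; *p != '\0'; p++)
    {
        /* Every '$' is a candidate start of the closing delimiter. */
        if (*p == '$' && strncmp(p, input, open_len) == 0)
        {
            size_t body_len = (size_t) (p - body);
            char *result = malloc(body_len + 1);

            if (result == NULL)
                return NULL;
            memcpy(result, body, body_len);
            result[body_len] = '\0';
            *consumed = open_len + body_len + open_len;
            return result;
        }
    }
    return NULL;                /* unterminated dollar-quoted string */
}
```

Retrying at every '$' corresponds to the yyless(yyleng-1) step in the patch: in $delim$...$junk$delim$ the text $junk is kept as part of the body, and its trailing $ is rescanned as the start of the real closing delimiter.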

If this is all OK, the remaining tasks would include pg_dump, docs (Jon 
Jensen says he will attack these two) and some regression tests (any 
volunteers?)

cheers

andrew
Index: src/backend/parser/scan.l
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/backend/parser/scan.l,v
retrieving revision 1.114
diff -c -r1.114 scan.l
*** src/backend/parser/scan.l   21 Feb 2004 00:34:52 -0000  1.114
--- src/backend/parser/scan.l   24 Feb 2004 17:33:01 -0000
***************
*** 37,42 ****
--- 37,43 ----
  extern YYSTYPE yylval;
  
  static int    xcdepth = 0;    /* depth of nesting in slash-star comments */
+ static char  *dolqstart;      /* current $foo$ quote start string */
  
  /*
   * literalbuf is used to accumulate literal values when multiple rules
***************
*** 94,99 ****
--- 95,101 ----
   *  xd delimited identifiers (double-quoted identifiers)
   *  xh hexadecimal numeric string
   *  xq quoted strings
+  *  dolq $foo$ quoted strings
   */
  
  %x xb
***************
*** 101,106 ****
--- 103,109 ----
  %x xd
  %x xh
  %x xq
+ %x dolq
  
  /*
   * In order to make the world safe for Windows and Mac clients as well as
***************
*** 175,180 ****
--- 178,194 ----
  xqoctesc  [\\][0-7]{1,3}
  xqcat {quote}{whitespace_with_newline}{quote}
  
+ /* $foo$ style quotes (dollar quoting)
+  * The quoted string starts with $foo$ where foo is an optional string
+  * in the form of an identifier, except that it may not contain $, 
+  * and extends to the first occurrence of an identical string.  
+  * There is *no* processing of the quoted text.
+  */
+ dolq_start    [A-Za-z\200-\377_]
+ dolq_cont     [A-Za-z\200-\377_0-9]
+ dolqdlm       \$({dolq_start}{dolq_cont}*)?\$
+ dolqins       [^$]+
+ 
  /* Double quote
   * Allows embedded spaces and other special characters into identifiers.
   */
***************
*** 242,248 ****
  other .
  
  /*
!  * Quoted strings must allow some special characters such as single-quote
   *  and newline.
   * Embedded single-quotes are implemented both in the SQL standard
   *  style of two adjacent single quotes '' and in the Postgres/Java style
--- 256,263 ----
  other .
  
  /*
!  * Dollar quoted strings are totally opaque, and no escaping is done on them.
!  * Other quoted strings must allow some special characters such as single-quote
   *  and newline.
   * Embedded single-quotes are implemented both in the SQL standard
   *  style of two adjacent single quotes '' and in the Postgres/Java style
***************
*** 390,395 ****
--- 405,439 ----
                }
  <xq><<EOF>>   { yyerror("unterminated quoted string"); }
  
+ {dolqdlm}     {
+                   token_start = yytext;
+                   dolqstart = pstrdup(yytext);
+                   BEGIN(dolq);
+                   startlit();
+               }
+ <dolq>{dolqdlm} {
+                   if (strcmp(yytext, dolqstart) == 0)
+                   {
+                       pfree(dolqstart);
+                       BEGIN(INITIAL);
+                       yylval.str = litbufdup();
+                       return SCONST;
+                   }
+                   /*
+                    * When we fail to match $...$ to dolqstart, transfer
+                    * the $... part to the output, but put back the final
+                    * $ for rescanning.  Consider $delim$...$junk$delim$
+                    */
+                   addlit(yytext, yyleng-1);
+                   yyless(yyleng-1);
+               }
+ <dolq>{dolqins} {
+                   addlit(yytext, yyleng);
+               }
+ <dolq>.       {
+                   addlitchar(yytext[0]);
+               }
+ <dolq><<EOF>> { yyerror("unterminated dollar-quoted string"); }
  {xdstart}     {
                    token_start = yytext;
                    BEGIN(xd);
***************
*** 407,413 ****
   

Re: [PATCHES] dollar quoting with flex

2004-02-24 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
> Attached is a patch for dollar quoting in the backend and in psql (with 
> the new flex scanner). I'm fairly confident about the backend (because 
> this is mainly Tom's work adapted :-) ) but rather less so about psql - 
> I don't entirely understand all the odd states in psql's scanner. I'm 
> not sure that I have freed up memory in all the necessary cases. Nor am 
> I sure what the state is or should be if we end an included file in a 
> dollar-quoting state, nor how to handle such a situation. So, some extra 
> eyeballs would be appreciated.

I'll take a look soon.  The psql behavior is that a new lexer is
instantiated for each include-file level, which means that quoting
states can't persist across file boundaries.  This emulates the behavior
of the old handmade lexing code, and seems fairly reasonable to me.
(By definition, you weren't in a quoting state when you recognized the
\i command, and so you shouldn't be when you come out of the include
file.)  We could argue about that if people want to reconsider it, but
it seems orthogonal to the dollar-quoting change to me.
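
A toy C sketch of the per-include-level behavior described above, for illustration only. It is not psql's code: toy_file, toy_lexer and toy_process_file are hypothetical names, files are modelled as in-memory arrays of lines, and the only quoting state tracked is an untagged $$ toggle.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory "file": a name plus a NULL-terminated line array. */
struct toy_file
{
    const char *name;
    const char *const *lines;
};

/* Hypothetical per-lexer state; the real psql scanner tracks far more. */
struct toy_lexer
{
    int in_dollar_quote;
};

static const struct toy_file *
find_file(const struct toy_file *files, size_t nfiles, const char *name)
{
    size_t i;

    for (i = 0; i < nfiles; i++)
        if (strcmp(files[i].name, name) == 0)
            return &files[i];
    return NULL;
}

/*
 * Process one file with its own fresh lexer state.  Returns 1 if the file
 * ended inside an open $$ quote, 0 otherwise (also 0 for an unknown file
 * or when the arbitrary nesting limit is exceeded).
 */
int toy_process_file(const struct toy_file *files, size_t nfiles,
                     const char *name, int depth)
{
    struct toy_lexer lexer = { 0 };     /* new state per include level */
    const struct toy_file *f = find_file(files, nfiles, name);
    size_t i;

    if (f == NULL || depth > 16)
        return 0;
    for (i = 0; f->lines[i] != NULL; i++)
    {
        const char *line = f->lines[i];
        const char *p;

        /* "\i name" is only recognized outside a quote. */
        if (!lexer.in_dollar_quote && strncmp(line, "\\i ", 3) == 0)
        {
            /* The included file's final state is deliberately dropped. */
            (void) toy_process_file(files, nfiles, line + 3, depth + 1);
            continue;
        }
        /* Each "$$" on the line flips the quoting state. */
        for (p = line; (p = strstr(p, "$$")) != NULL; p += 2)
            lexer.in_dollar_quote = !lexer.in_dollar_quote;
    }
    return lexer.in_dollar_quote;
}
```

In this sketch an included file that ends inside an open quote has no effect on the including file, because the nested call's state is discarded on return. What psql should report for the unterminated quote inside the include file is a separate question that the sketch does not address.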

regards, tom lane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PATCHES] dollar quoting with flex

2004-02-24 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
> Attached is a patch for dollar quoting in the backend and in psql (with 
> the new flex scanner).

Applied with minor fixes.

> If this is all OK, the remaining tasks would include pg_dump, docs (Jon 
> Jensen says he will attack these two) and some regression tests (any 
> volunteers?)

I think plpgsql's lexer also needs to be taught about dollar-quoting.

regards, tom lane
