Re: Sync ECPG scanner with core

2018-11-20 Thread John Naylor
I wrote:

> On 11/14/18, Tom Lane  wrote:
>> Alvaro Herrera  writes:
>>> Maybe time to compile the ecpg test cases during "make check-world"?
>>
>> I'm dubious that the lexer is a significant part of that, though I could
>> be wrong ...
>
> If it were, it'd be much easier to try a Flex flag other than the
> default, most compact representation, something like -Cfe. That'd be a
> prerequisite for no-backtracking to matter anyway. That's easy enough
> to test, so I'll volunteer to do that sometime.

The preproc phase of make check in the ecpg dir only takes 150ms.
Compiling and linking the test executables takes another 2s, and
running the tests takes another 5s on my machine. Even without setting
up and tearing down the temp instance, preproc is trivial.

-John Naylor



Re: Sync ECPG scanner with core

2018-11-13 Thread John Naylor
On 11/14/18, Tom Lane  wrote:
> Looks like good stuff, pushed with a small additional amount of work.

Thanks.

>> -Does ECPG still need its own functions for mm_alloc() and
>> mm_strdup(), if we have equivalents in src/common?
>
> The error handling isn't the same, so I don't think we can just replace
> them with the src/common functions.  It is, however, potentially worth
> our trouble to fix things so that within pgc.l, palloc() and pstrdup()
> are macros for mm_alloc/mm_strdup, thereby further reducing the diffs
> to the core lexer.
>
> I'd be pretty tempted to also change all the hard references to
> base_yylval to go through a pointer variable, so that diffs like
> this go away:
>
> -yylval->str = pstrdup(yytext);
> +base_yylval.str = mm_strdup(yytext);
>
> I believe that that'd result in more compact code, too --- on most
> machines, references to global variables are more expensive in
> code bytes than indirecting through a local pointer variable.

I'll look into that as time permits.

On 11/14/18, Tom Lane  wrote:
> Alvaro Herrera  writes:
>> On 2018-Nov-13, Tom Lane wrote:
>>> More generally, I wonder if it'd be practical to make pgc.l backup-free
>>> altogether.  I'm not sure that the resulting gain in lexing speed would
>>> really be of interest to anyone,
>
>> Maybe time to compile the ecpg test cases during "make check-world"?
>
> I'm dubious that the lexer is a significant part of that, though I could
> be wrong ...

If it were, it'd be much easier to try a Flex flag other than the
default, most compact representation, something like -Cfe. That'd be a
prerequisite for no-backtracking to matter anyway. That's easy enough
to test, so I'll volunteer to do that sometime.

-John Naylor



Re: Sync ECPG scanner with core

2018-11-13 Thread Tom Lane
Alvaro Herrera  writes:
> On 2018-Nov-13, Tom Lane wrote:
>> More generally, I wonder if it'd be practical to make pgc.l backup-free
>> altogether.  I'm not sure that the resulting gain in lexing speed would
>> really be of interest to anyone,

> Maybe time to compile the ecpg test cases during "make check-world"?

I'm dubious that the lexer is a significant part of that, though I could
be wrong ...

regards, tom lane



Re: Sync ECPG scanner with core

2018-11-13 Thread Alvaro Herrera
On 2018-Nov-13, Tom Lane wrote:

> More generally, I wonder if it'd be practical to make pgc.l backup-free
> altogether.  I'm not sure that the resulting gain in lexing speed would
> really be of interest to anyone,

Maybe time to compile the ecpg test cases during "make check-world"?

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Sync ECPG scanner with core

2018-11-13 Thread Tom Lane
John Naylor  writes:
> Commit 21c8ee7946 was one of several that sync'd the backend scanner
> with the psql scanner, with the commit message stating "Someday it'd
> be nice to make ecpg's pgc.l more easily diff'able too, but today is
> not that day." Attached is a set of patches to make progress towards
> that. The ECPG tests pass. There's probably more that could be done,
> but this seems like a good base to start from.

Looks like good stuff, pushed with a small additional amount of work.

> -base_yyerror() is used once, but every other call site uses
> mmerror(), so use that instead.

Yeah, I think this is a bug, though minor.  The other places that
deal with unexpected EOF use mmfatal(PARSE_ERROR, ...) though,
so I did it like that.

> In the course of doing this, I noticed a few other possible ECPG
> scanner cleanups to consider, although the payoff might not justify
> the work involved:

> -Copy process_integer_literal() from the core scanner (maybe only if
> decimal_fail is also brought over).

I went ahead and did both parts of that, mainly because as things
stood the ecpg lexer would produce a different token sequence for
"1..10" than the core would.  I don't think this really matters
right at the moment, but if for instance ecpg decided to insert
spaces where it thought the token boundaries were, it would matter.

> -Match core's handling of unicode escapes, even though pgc.l is not 
> backup-free.

Yeah, the amount of remaining diffs due to the omission of the anti-backup
rules is a bit annoying and confusing.

More generally, I wonder if it'd be practical to make pgc.l backup-free
altogether.  I'm not sure that the resulting gain in lexing speed would
really be of interest to anyone, but just in terms of using uniform lexer
design rules, it might be worthwhile.  I tried a run with -b just to see,
and indeed there are a bunch of backup cases in the ECPG-specific rules,
so it'd take some work.

> -Does ECPG still need its own functions for mm_alloc() and
> mm_strdup(), if we have equivalents in src/common?

The error handling isn't the same, so I don't think we can just replace
them with the src/common functions.  It is, however, potentially worth
our trouble to fix things so that within pgc.l, palloc() and pstrdup()
are macros for mm_alloc/mm_strdup, thereby further reducing the diffs
to the core lexer.

I'd be pretty tempted to also change all the hard references to
base_yylval to go through a pointer variable, so that diffs like
this go away:

-yylval->str = pstrdup(yytext);
+base_yylval.str = mm_strdup(yytext);

I believe that that'd result in more compact code, too --- on most
machines, references to global variables are more expensive in
code bytes than indirecting through a local pointer variable.

> -Use reentrant APIs?

Hm.  There's no functional value in that for ecpg, but maybe worth doing
anyway to get rid of diffs like

-addlit(yytext, yyleng, yyscanner);
+addlit(yytext, yyleng);

Guess it depends on how nitpicky you're feeling.  In principle,
reducing these remaining diffs would ease future maintenance of
the lexers, but they're not something we change often.

regards, tom lane



Re: Sync ECPG scanner with core

2018-10-02 Thread John Naylor
It seems the pach tester is confused by the addition of the
demonstration diff file. I'm reattaching just the patchset to see if
it turns green.

-John Naylor
From 107e3c8a0b65b0196ea4370a724c8b2a1b0fdf79 Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Sun, 30 Sep 2018 12:51:41 +0700
Subject: [PATCH v1 1/4] First pass at syncing ECPG scanner with the core
 scanner.

Adjust whitespace and formatting, clean up some comments, and move
the block of whitespace rules.
---
 src/backend/parser/scan.l |   2 +-
 src/fe_utils/psqlscan.l   |   2 +-
 src/interfaces/ecpg/preproc/pgc.l | 773 --
 3 files changed, 408 insertions(+), 369 deletions(-)

diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index 950b8b8591..a2454732a1 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -192,7 +192,7 @@ extern void core_yyset_column(int column_no, yyscan_t yyscanner);
  * XXX perhaps \f (formfeed) should be treated as a newline as well?
  *
  * XXX if you change the set of whitespace characters, fix scanner_isspace()
- * to agree, and see also the plpgsql lexer.
+ * to agree.
  */
 
 space			[ \t\n\r\f]
diff --git a/src/fe_utils/psqlscan.l b/src/fe_utils/psqlscan.l
index fdf49875a7..25253b54ea 100644
--- a/src/fe_utils/psqlscan.l
+++ b/src/fe_utils/psqlscan.l
@@ -151,7 +151,7 @@ extern void psql_yyset_column(int column_no, yyscan_t yyscanner);
  * XXX perhaps \f (formfeed) should be treated as a newline as well?
  *
  * XXX if you change the set of whitespace characters, fix scanner_isspace()
- * to agree, and see also the plpgsql lexer.
+ * to agree.
  */
 
 space			[ \t\n\r\f]
diff --git a/src/interfaces/ecpg/preproc/pgc.l b/src/interfaces/ecpg/preproc/pgc.l
index 0792118cfe..b96f17ca20 100644
--- a/src/interfaces/ecpg/preproc/pgc.l
+++ b/src/interfaces/ecpg/preproc/pgc.l
@@ -108,16 +108,19 @@ static struct _if_value
  * We use exclusive states for quoted strings, extended comments,
  * and to eliminate parsing troubles for numeric strings.
  * Exclusive states:
- *	 bit string literal
- *	 extended C-style comments in C
- *	 extended C-style comments in SQL
- *	 delimited identifiers (double-quoted identifiers) - thomas 1997-10-27
- *	 hexadecimal numeric string - thomas 1997-11-16
- *	 standard quoted strings - thomas 1997-07-30
- *	 standard quoted strings in C - michael
- *	 extended quoted strings (support backslash escape sequences)
- *	 national character quoted strings
+ *   bit string literal
+ *   extended C-style comments in C
+ *   extended C-style comments in SQL
+ *   delimited identifiers (double-quoted identifiers)
+ *  
+ *   hexadecimal numeric string
+ *   standard quoted strings
+ *   extended quoted strings (support backslash escape sequences)
+ *   national character quoted strings
+ *   standard quoted strings in C
  *   $foo$ quoted strings
+ *  
+ *  
  *   quoted identifier with Unicode escapes
  *   quoted string with Unicode escapes
  */
@@ -138,6 +141,48 @@ static struct _if_value
 %x xui
 %x xus
 
+/*
+ * In order to make the world safe for Windows and Mac clients as well as
+ * Unix ones, we accept either \n or \r as a newline.  A DOS-style \r\n
+ * sequence will be seen as two successive newlines, but that doesn't cause
+ * any problems.  SQL-style comments, which start with -- and extend to the
+ * next newline, are treated as equivalent to a single whitespace character.
+ *
+ * NOTE a fine point: if there is no newline following --, we will absorb
+ * everything to the end of the input as a comment.  This is correct.  Older
+ * versions of Postgres failed to recognize -- as a comment if the input
+ * did not end with a newline.
+ *
+ * XXX perhaps \f (formfeed) should be treated as a newline as well?
+ *
+ * XXX if you change the set of whitespace characters, fix ecpg_isspace()
+ * to agree.
+ */
+
+space			[ \t\n\r\f]
+horiz_space		[ \t\f]
+newline			[\n\r]
+non_newline		[^\n\r]
+
+comment			("--"{non_newline}*)
+
+whitespace		({space}+|{comment})
+
+/*
+ * SQL requires at least one newline in the whitespace separating
+ * string literals that are to be concatenated.  Silly, but who are we
+ * to argue?  Note that {whitespace_with_newline} should not have * after
+ * it, whereas {whitespace} should generally have a * after it...
+ */
+
+horiz_whitespace		({horiz_space}|{comment})
+whitespace_with_newline	({horiz_whitespace}*{newline}{whitespace}*)
+
+quote			'
+quotestop		{quote}{whitespace}*
+quotecontinue	{quote}{whitespace_with_newline}{quote}
+quotefail		{quote}{whitespace}*"-"
+
 /* Bit string
  */
 xbstart			[bB]{quote}
@@ -216,17 +261,17 @@ xdcinside		({xdcqq}|{xdcqdq}|{xdcother})
  * The "extended comment" syntax closely resembles allowable operator syntax.
  * The tricky part here is to get lex to recognize a string starting with
  * slash-star as a comment, when interpreting it as an operator would produce
- * a longer match --- remember lex will prefer a longer match!	Also, if we
+ * a longer match 

Re: Sync ECPG scanner with core

2018-10-01 Thread John Naylor
On 9/30/18, Michael Paquier  wrote:
> It would be nice to add this patch to the next commit fest:
> https://commitfest.postgresql.org/20/

Done, thanks.

-John Naylor



Re: Sync ECPG scanner with core

2018-09-30 Thread Michael Paquier
On Sun, Sep 30, 2018 at 04:32:53PM +0700, John Naylor wrote:
> Commit 21c8ee7946 was one of several that sync'd the backend scanner
> with the psql scanner, with the commit message stating "Someday it'd
> be nice to make ecpg's pgc.l more easily diff'able too, but today is
> not that day." Attached is a set of patches to make progress towards
> that. The ECPG tests pass. There's probably more that could be done,
> but this seems like a good base to start from.

It would be nice to add this patch to the next commit fest:
https://commitfest.postgresql.org/20/
--
Michael


signature.asc
Description: PGP signature