Re: Bugs in ecpg's macro mechanism

2024-04-15 Thread Tom Lane
Andres Freund  writes:
> On 2024-04-15 20:47:16 -0400, Tom Lane wrote:
>> Ah, thanks.  I guess this depends on getopt_long reordering arguments
>> (since the "-o outfile" bit will come later).  That is safe enough
>> in HEAD since 411b72034, but it might fail on weird platforms in v16.
>> How much do we care about that?  (We can avoid that hazard in the
>> makefile build easily enough.)

> As moving the arguments around would just be the following, I see no reason
> not to just do so.

Fair enough.  I'm inclined to include that change only in v16, though.

regards, tom lane




Re: Bugs in ecpg's macro mechanism

2024-04-15 Thread Andres Freund
Hi,

On 2024-04-15 20:47:16 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2024-04-15 17:48:32 -0400, Tom Lane wrote:
> >> But I have no idea about making it work in meson.  Any suggestions?
> 
> > So you just want to compile define.c twice? The below should suffice:
> 
> > -  'define': ['-DCMDLINESYM=123'],
> > +  'define': ['-DCMDLINESYM=123', files('define.pgc')],
> 
> Ah, thanks.  I guess this depends on getopt_long reordering arguments
> (since the "-o outfile" bit will come later).  That is safe enough
> in HEAD since 411b72034, but it might fail on weird platforms in v16.
> How much do we care about that?  (We can avoid that hazard in the
> makefile build easily enough.)

Oh, I didn't even think of that. If we do care, we can just move the -o to
earlier. Or just officially add it as another input, that'd just be a bit of
notational overhead.

As moving the arguments around would just be the following, I see no reason
not to just do so.

diff --git i/src/interfaces/ecpg/test/meson.build w/src/interfaces/ecpg/test/meson.build
index c1e508ccc82..d7c0e9de7d6 100644
--- i/src/interfaces/ecpg/test/meson.build
+++ w/src/interfaces/ecpg/test/meson.build
@@ -45,9 +45,10 @@ ecpg_preproc_test_command_start = [
   '--regression',
   '-I@CURRENT_SOURCE_DIR@',
   '-I@SOURCE_ROOT@' + '/src/interfaces/ecpg/include/',
+  '-o', '@OUTPUT@',
 ]
 ecpg_preproc_test_command_end = [
-  '-o', '@OUTPUT@', '@INPUT@'
+  '@INPUT@',
 ]
 
 ecpg_test_dependencies = []


Greetings,

Andres Freund




Re: Bugs in ecpg's macro mechanism

2024-04-15 Thread Tom Lane
Andres Freund  writes:
> On 2024-04-15 17:48:32 -0400, Tom Lane wrote:
>> But I have no idea about making it work in meson.  Any suggestions?

> So you just want to compile define.c twice? The below should suffice:

> -  'define': ['-DCMDLINESYM=123'],
> +  'define': ['-DCMDLINESYM=123', files('define.pgc')],

Ah, thanks.  I guess this depends on getopt_long reordering arguments
(since the "-o outfile" bit will come later).  That is safe enough
in HEAD since 411b72034, but it might fail on weird platforms in v16.
How much do we care about that?  (We can avoid that hazard in the
makefile build easily enough.)
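
To make the hazard concrete, here is a minimal sketch, assuming a
simplified option set (only -D, -o and --regression, not ecpg's real
option table).  find_outfile() is a hypothetical helper and the file
names are illustrative.  The leading '+' in the option string is used
only to imitate a getopt_long that does not reorder argv, and resetting
optind to 0 assumes glibc-style behavior.

```c
#include <getopt.h>
#include <stddef.h>

/*
 * Hypothetical helper: scan a simplified ecpg-like command line (only -D,
 * -o and --regression, not ecpg's real option table) and return the
 * argument of -o, or NULL if the scan never reached it.
 *
 * With strict != 0, the leading '+' in the option string asks getopt_long
 * to stop at the first non-option argument, imitating an implementation
 * that does not reorder argv.
 */
static const char *
find_outfile(int argc, char **argv, int strict)
{
    static const struct option longopts[] = {
        {"regression", no_argument, NULL, 1},
        {NULL, 0, NULL, 0}
    };
    const char *outfile = NULL;
    int         c;

    optind = 0;                 /* assumption: glibc-style reset between scans */
    while ((c = getopt_long(argc, argv, strict ? "+D:o:" : "D:o:",
                            longopts, NULL)) != -1)
    {
        if (c == 'o')
            outfile = optarg;
    }
    return outfile;
}
```

Given "ecpg -DCMDLINESYM=123 define_prelim.pgc -o define.c define.pgc",
a reordering getopt_long still reaches "-o define.c", while the strict
scan stops at the first input file and never sees the output name.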

> I assume that was just a test hack, because it leads to the build failing
> because of main being duplicated. But it'd work the same with another, "non
> overlapping", file.

Yeah, I hadn't actually worked through what to do in detail.
Here's a v2 that adds that testing.  I also added some more
user-facing doco, and fixed a small memory leak that I noted
from valgrind testing.  (It's hardly the only one in ecpg,
but it was easy to fix as part of this patch.)

regards, tom lane

From 10181143871d2fd927d8119771ecba38bb51951d Mon Sep 17 00:00:00 2001
From: Tom Lane 
Date: Mon, 15 Apr 2024 20:37:50 -0400
Subject: [PATCH v2] Fix assorted bugs in ecpg's macro mechanism.

The code associated with EXEC SQL DEFINE was unreadable and full of
bugs, notably:

* it'd attempt to free a non-malloced string if the ecpg program
tries to redefine a macro that was defined on the command line;

* possible memory stomp if user writes "-D=foo";

* undef'ing or redefining a macro defined on the command line would
change the state visible to the next file, when multiple files are
specified on the command line;

* missing "break" in defining a new macro meant that redefinition
of an existing name would cause an extra entry to be added to the
definition list.  While not immediately harmful, a subsequent undef
would result in the prior entry becoming visible again.

* The interactions with input buffering are subtle and were entirely
undocumented.
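
The redefinition item in the list above can be illustrated with a
minimal sketch.  This is not the code from the patch: struct macro and
the macro_* functions are hypothetical stand-ins for ecpg's definition
list.  The point is that a redefinition replaces the value in place and
stops searching, and that every stored string is malloc'd so freeing it
is always safe.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an entry in ecpg's macro list. */
struct macro
{
    char       *name;
    char       *value;
    struct macro *next;
};

/* Copy a string into malloc'd storage, so every value can safely be freed. */
static char *
copy_string(const char *s)
{
    char       *copy = malloc(strlen(s) + 1);

    if (copy == NULL)
        abort();
    strcpy(copy, s);
    return copy;
}

/* Return the current value of a macro, or NULL if it is not defined. */
static const char *
macro_lookup(const struct macro *list, const char *name)
{
    for (; list != NULL; list = list->next)
    {
        if (strcmp(list->name, name) == 0)
            return list->value;
    }
    return NULL;
}

/* Define or redefine a macro; returns the (possibly new) list head. */
static struct macro *
macro_define(struct macro *list, const char *name, const char *value)
{
    struct macro *m;

    for (m = list; m != NULL; m = m->next)
    {
        if (strcmp(m->name, name) == 0)
        {
            /* Redefinition: replace the value in place and stop here. */
            free(m->value);
            m->value = copy_string(value);
            return list;
        }
    }

    /* New name: push a fresh entry onto the front of the list. */
    m = malloc(sizeof(*m));
    if (m == NULL)
        abort();
    m->name = copy_string(name);
    m->value = copy_string(value);
    m->next = list;
    return m;
}

/* Remove a macro if present; returns the (possibly new) list head. */
static struct macro *
macro_undef(struct macro *list, const char *name)
{
    struct macro **link;

    for (link = &list; *link != NULL; link = &(*link)->next)
    {
        if (strcmp((*link)->name, name) == 0)
        {
            struct macro *dead = *link;

            *link = dead->next;
            free(dead->name);
            free(dead->value);
            free(dead);
            break;
        }
    }
    return list;
}
```

With this shape, defining NAMELEN twice and then undefining it leaves
no earlier entry behind to become visible again.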

It's not that surprising that we hadn't noticed these bugs,
because there was no test coverage at all of either the -D
command line switch or multiple input files.  This patch adds
such coverage (in a rather hacky way I guess).

In addition to the code bugs, the user documentation was confused
about whether the -D switch defines a C macro or an ecpg one, and
it failed to mention that you can write "-Dsymbol=value".
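
As a rough illustration of the "symbol[=value]" form, here is a minimal
sketch of a parser for the -D argument.  parse_define() and
copy_prefix() are hypothetical names, not functions from ecpg, and
rejecting an empty symbol name (as in "-D=foo") is this sketch's own
choice rather than something stated in this thread.  The default value
of 1 follows the documentation change in the patch.

```c
#include <stdlib.h>
#include <string.h>

/* Duplicate the first len bytes of s into a NUL-terminated malloc'd string. */
static char *
copy_prefix(const char *s, size_t len)
{
    char       *copy = malloc(len + 1);

    if (copy == NULL)
        abort();
    memcpy(copy, s, len);
    copy[len] = '\0';
    return copy;
}

/*
 * Hypothetical parser for the argument of -D, i.e. "symbol[=value]".
 * On success, stores malloc'd strings in *name and *value and returns 0.
 * Returns -1 if the symbol name is empty, as in "-D=foo" (rejecting that
 * case is this sketch's choice, not something the thread specifies).
 */
static int
parse_define(const char *arg, char **name, char **value)
{
    const char *eq = strchr(arg, '=');
    size_t      namelen = (eq != NULL) ? (size_t) (eq - arg) : strlen(arg);

    if (eq != NULL)
    {
        /* Strip spaces before '=', never stepping in front of the string. */
        while (namelen > 0 && arg[namelen - 1] == ' ')
            namelen--;
    }
    if (namelen == 0)
        return -1;

    *name = copy_prefix(arg, namelen);
    /* Without "=value", the symbol is defined with the value 1. */
    *value = (eq != NULL) ? copy_prefix(eq + 1, strlen(eq + 1))
                          : copy_prefix("1", 1);
    return 0;
}
```

Counting the name length down with an index, rather than walking a
pointer backwards from the '=', keeps the scan inside the string even
when nothing precedes the '='.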

Discussion: https://postgr.es/m/998011.1713217...@sss.pgh.pa.us
---
 doc/src/sgml/ecpg.sgml                   |   8 ++
 doc/src/sgml/ref/ecpg-ref.sgml           |   6 +-
 src/interfaces/ecpg/preproc/ecpg.c       |  79 ++-
 src/interfaces/ecpg/preproc/pgc.l        | 127 --
 src/interfaces/ecpg/preproc/type.h       |  21 ++-
 .../ecpg/test/expected/sql-define.c      |  65 -
 .../ecpg/test/expected/sql-define.stderr |  24
 .../ecpg/test/expected/sql-define.stdout |   3 +
 src/interfaces/ecpg/test/sql/Makefile    |   3 +
 src/interfaces/ecpg/test/sql/define.pgc  |  25
 .../ecpg/test/sql/define_prelim.pgc      |   6 +
 src/interfaces/ecpg/test/sql/meson.build |   1 +
 12 files changed, 288 insertions(+), 80 deletions(-)
 create mode 100644 src/interfaces/ecpg/test/sql/define_prelim.pgc

diff --git a/doc/src/sgml/ecpg.sgml b/doc/src/sgml/ecpg.sgml
index b332aa435d..e7a53f3c9d 100644
--- a/doc/src/sgml/ecpg.sgml
+++ b/doc/src/sgml/ecpg.sgml
@@ -5793,6 +5793,14 @@ EXEC SQL UPDATE Tbl SET col = MYNUMBER;
 embedded SQL query because in this case the embedded SQL precompiler is not
 able to see this declaration.

+
+   
+If multiple input files are named on the ecpg
+preprocessor's command line, the effects of EXEC SQL
+DEFINE and EXEC SQL UNDEF do not carry
+across files: each file starts with only the symbols defined
+by -D switches on the command line.
+   
   
 
   
diff --git a/doc/src/sgml/ref/ecpg-ref.sgml b/doc/src/sgml/ref/ecpg-ref.sgml
index f3b6034f42..43f2d8bdaa 100644
--- a/doc/src/sgml/ref/ecpg-ref.sgml
+++ b/doc/src/sgml/ref/ecpg-ref.sgml
@@ -93,10 +93,12 @@ PostgreSQL documentation
 
 
 
- -D symbol
+ -D symbol[=value]
  
   
-   Define a C preprocessor symbol.
+   Define a preprocessor symbol, equivalently to the EXEC SQL
+   DEFINE directive.  If no value is
+   specified, the symbol is defined with the value 1.
   
  
 
diff --git a/src/interfaces/ecpg/preproc/ecpg.c b/src/interfaces/ecpg/preproc/ecpg.c
index 93e66fc60f..cbdc7866cd 100644
--- a/src/interfaces/ecpg/preproc/ecpg.c
+++ b/src/interfaces/ecpg/preproc/ecpg.c
@@ -82,35 +82,46 @@ add_include_path(char *path)
 	}
 }
 
+/*

Re: Bugs in ecpg's macro mechanism

2024-04-15 Thread Andres Freund
Hi,

On 2024-04-15 17:48:32 -0400, Tom Lane wrote:
> I started looking into the ideas discussed at [1] about reimplementing
> ecpg's string handling.  Before I could make any progress I needed
> to understand the existing input code, part of which is the macro
> expansion mechanism ... and the more I looked at that the more bugs
> I found, not to mention that it uses misleading field names and is
> next door to uncommented.

As part of the discussion leading to [1] I had looked at parse.pl and found it
fairly impressively obfuscated and devoid of helpful comments.


> I found two ways to crash ecpg outright and several more cases in which it'd
> produce surprising behavior.

:/


> One thing it's missing is any test of the behavior when command-line macro
> definitions are carried from one file to the next one.  To test that, we'd
> need to compile more than one ecpg input file at a time.  I can see how to
> kluge the Makefiles to make that happen, basically this'd do:
> 
>  define.c: define.pgc $(ECPG_TEST_DEPENDENCIES)
> - $(ECPG) -DCMDLINESYM=123 -o $@ $<
> + $(ECPG) -DCMDLINESYM=123 -o $@ $< $<
> 
> But I have no idea about making it work in meson.  Any suggestions?

So you just want to compile define.c twice? The below should suffice:

diff --git i/src/interfaces/ecpg/test/sql/meson.build w/src/interfaces/ecpg/test/sql/meson.build
index e04684065b0..202dc69c6ea 100644
--- i/src/interfaces/ecpg/test/sql/meson.build
+++ w/src/interfaces/ecpg/test/sql/meson.build
@@ -31,7 +31,7 @@ pgc_files = [
 ]
 
 pgc_extra_flags = {
-  'define': ['-DCMDLINESYM=123'],
+  'define': ['-DCMDLINESYM=123', files('define.pgc')],
   'oldexec': ['-r', 'questionmarks'],
 }
 

I assume that was just a test hack, because it leads to the build failing
because of main being duplicated. But it'd work the same with another, "non
overlapping", file.

Greetings,

Andres Freund




Bugs in ecpg's macro mechanism

2024-04-15 Thread Tom Lane
I started looking into the ideas discussed at [1] about reimplementing
ecpg's string handling.  Before I could make any progress I needed
to understand the existing input code, part of which is the macro
expansion mechanism ... and the more I looked at that the more bugs
I found, not to mention that it uses misleading field names and is
next door to uncommented.  I found two ways to crash ecpg outright
and several more cases in which it'd produce surprising behavior.
As an example,

$ cd .../src/interfaces/ecpg/test/preproc/
$ ../../preproc/ecpg --regression -I./../../include -I. -DNAMELEN=99 -o define.c define.pgc
munmap_chunk(): invalid pointer
Aborted

Attached is a patch that cleans all that up and attempts to add a
little documentation about how things work.  One thing it's missing
is any test of the behavior when command-line macro definitions are
carried from one file to the next one.  To test that, we'd need to
compile more than one ecpg input file at a time.  I can see how
to kluge the Makefiles to make that happen, basically this'd do:

 define.c: define.pgc $(ECPG_TEST_DEPENDENCIES)
-   $(ECPG) -DCMDLINESYM=123 -o $@ $<
+   $(ECPG) -DCMDLINESYM=123 -o $@ $< $<

But I have no idea about making it work in meson.  Any suggestions?

regards, tom lane

[1] https://www.postgresql.org/message-id/3897526.1712710536%40sss.pgh.pa.us

From 9fa59a96edf28d0b5e7a483234b3ae70a95046d5 Mon Sep 17 00:00:00 2001
From: Tom Lane 
Date: Mon, 15 Apr 2024 17:33:30 -0400
Subject: [PATCH v1] Fix assorted bugs in ecpg's macro mechanism.

The code associated with EXEC SQL DEFINE was unreadable and full of
bugs, notably:

* it'd attempt to free a non-malloced string if the ecpg program
tries to redefine a macro that was defined on the command line;

* possible memory stomp if user writes "-D=foo";

* undef'ing or redefining a macro defined on the command line would
change the state visible to the next file, when multiple files are
specified on the command line;

* missing "break" in defining a new macro meant that redefinition
of an existing name would cause an extra entry to be added to the
definition list.  While not immediately harmful, a subsequent undef
would result in the prior entry becoming visible again.

* The interactions with input buffering are subtle and were entirely
undocumented.
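
The multiple-file item in the list above comes down to keeping the
command-line definitions separate from whatever a file does to them.
A minimal sketch of one way to get that behavior follows.  It is an
assumption for illustration, not necessarily how the patch does it:
struct symtab and the sym_* functions are hypothetical, and the table
stores caller-owned string pointers to keep the example short.

```c
#include <string.h>

#define MAX_SYMBOLS 16

/* Hypothetical fixed-size symbol table; ecpg itself keeps a linked list. */
struct symtab
{
    int         count;
    const char *names[MAX_SYMBOLS];     /* caller-owned strings */
    const char *values[MAX_SYMBOLS];    /* caller-owned strings */
};

/* Return the index of name in the table, or -1 if it is not defined. */
static int
sym_find(const struct symtab *tab, const char *name)
{
    int         i;

    for (i = 0; i < tab->count; i++)
    {
        if (strcmp(tab->names[i], name) == 0)
            return i;
    }
    return -1;
}

/* Define or redefine a symbol; returns 0, or -1 if the table is full. */
static int
sym_define(struct symtab *tab, const char *name, const char *value)
{
    int         i = sym_find(tab, name);

    if (i < 0)
    {
        if (tab->count >= MAX_SYMBOLS)
            return -1;
        i = tab->count++;
        tab->names[i] = name;
    }
    tab->values[i] = value;
    return 0;
}

/* Remove a symbol if present by moving the last entry into its slot. */
static void
sym_undef(struct symtab *tab, const char *name)
{
    int         i = sym_find(tab, name);

    if (i >= 0)
    {
        tab->count--;
        tab->names[i] = tab->names[tab->count];
        tab->values[i] = tab->values[tab->count];
    }
}

/* Return a symbol's value, or NULL if it is not defined. */
static const char *
sym_value(const struct symtab *tab, const char *name)
{
    int         i = sym_find(tab, name);

    return (i >= 0) ? tab->values[i] : NULL;
}

/* Start a new input file: discard per-file changes, keep only -D symbols. */
static void
begin_file(struct symtab *file_syms, const struct symtab *cmdline_syms)
{
    *file_syms = *cmdline_syms;
}
```

Calling begin_file() before each input file means an undef or
redefinition in one file cannot change what the next file sees.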

In addition to the code bugs, the user documentation was confused
about whether the -D switch defines a C macro or an ecpg one, and
it failed to mention that you can write "-Dsymbol=value".

This patch includes test additions that exercise the first and
fourth of these complaints, but I'm not sure how to persuade our
test scaffolding to compile more than one file at once.
---
 doc/src/sgml/ref/ecpg-ref.sgml           |   6 +-
 src/interfaces/ecpg/preproc/ecpg.c       |  79 ++-
 src/interfaces/ecpg/preproc/pgc.l        | 125 --
 src/interfaces/ecpg/preproc/type.h       |  21 ++-
 .../ecpg/test/expected/sql-define.c      |  45 ++-
 .../ecpg/test/expected/sql-define.stderr |  24
 .../ecpg/test/expected/sql-define.stdout |   3 +
 src/interfaces/ecpg/test/sql/Makefile    |   3 +
 src/interfaces/ecpg/test/sql/define.pgc  |  20 +++
 src/interfaces/ecpg/test/sql/meson.build |   1 +
 10 files changed, 247 insertions(+), 80 deletions(-)

diff --git a/doc/src/sgml/ref/ecpg-ref.sgml b/doc/src/sgml/ref/ecpg-ref.sgml
index f3b6034f42..43f2d8bdaa 100644
--- a/doc/src/sgml/ref/ecpg-ref.sgml
+++ b/doc/src/sgml/ref/ecpg-ref.sgml
@@ -93,10 +93,12 @@ PostgreSQL documentation
 
 
 
- -D symbol
+ -D symbol[=value]
  
   
-   Define a C preprocessor symbol.
+   Define a preprocessor symbol, equivalently to the EXEC SQL
+   DEFINE directive.  If no value is
+   specified, the symbol is defined with the value 1.
   
  
 
diff --git a/src/interfaces/ecpg/preproc/ecpg.c b/src/interfaces/ecpg/preproc/ecpg.c
index 93e66fc60f..cbdc7866cd 100644
--- a/src/interfaces/ecpg/preproc/ecpg.c
+++ b/src/interfaces/ecpg/preproc/ecpg.c
@@ -82,35 +82,46 @@ add_include_path(char *path)
 	}
 }
 
+/*
+ * Process a command line -D switch
+ */
 static void
 add_preprocessor_define(char *define)
 {
-	struct _defines *pd = defines;
-	char	   *ptr,
-			   *define_copy = mm_strdup(define);
+	/* copy the argument to avoid relying on argv storage */
+	char	   *define_copy = mm_strdup(define);
+	char	   *ptr;
+	struct _defines *newdef;
 
-	defines = mm_alloc(sizeof(struct _defines));
+	newdef = mm_alloc(sizeof(struct _defines));
 
 	/* look for = sign */
 	ptr = strchr(define_copy, '=');
 	if (ptr != NULL)
 	{
+		/* symbol has a value */
 		char	   *tmp;
 
-		/* symbol has a value */
-		for (tmp = ptr - 1; *tmp == ' '; tmp--);
+		/* strip any spaces between name and '=' */
+		for (tmp = ptr - 1; tmp >=