Re: [HACKERS] [PATCH] Make various variables read-only (const)

2014-01-18 Thread Tom Lane
I wrote:
 [ speculation about refactoring the API of transformRelOptions ]

This morning I remembered that there's another patch in the queue with
an interest in the API and behavior of transformRelOptions:
https://commitfest.postgresql.org/action/patch_view?id=1318

While I think that one has no chance of getting committed in exactly
the submitted form, some descendant patch may well make it in; if we do
anything to transformRelOptions right now it'll create merge issues for
any work in that area.

Moreover, the amount of .data space that'd be saved by twiddling
transformRelOptions' API is negligible, so it's not really that
interesting for the purposes of this patch.  So I think the right
thing to do for now is just drop the relevant parts of this patch.
We can revisit the issue, if still needed, after the extension-options
dust settles.

I'll keep looking at the rest of this patch.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Make various variables read-only (const)

2014-01-18 Thread Tom Lane
Oskari Saarenmaa o...@ohmu.fi writes:
 On Sun, Dec 22, 2013 at 09:43:57PM -0500, Robert Haas wrote:
 -#define DEF_ENC2NAME(name, codepage) { #name, PG_##name }
 +/* The extra NUL-terminator will make sure a warning is raised if the
 + * storage space for name is too small, otherwise when strlen(name) ==
 + * sizeof(pg_enc2name.name) the NUL-terminator would be silently dropped.
 + */
 +#define DEF_ENC2NAME(name, codepage) { #name \0, PG_##name }
 
 - The above hunk is not related to the primary purpose of this patch.

 It sort-of is.  Without fixed size char-arrays it's not possible to move
 everything to .rodata, but fixed size char-arrays come with the drawback of
 silently dropping the NUL-terminator when strlen(str) == sizeof(array), by
 forcing a NUL-terminator in we always get a warning if it would've been
 dropped and the size of the array can then be increased.

AFAICT, this change and some similar ones are based on a false assumption.
It is *not* necessary to replace pointers by fixed-length arrays in order
to get things into .rodata.  For example, in chklocale.c where we already
had this coding:

struct encoding_match
{
enum pg_enc pg_enc_code;
const char *system_enc_name;
};

static const struct encoding_match encoding_match_list[] = {
{PG_EUC_JP, EUC-JP},
{PG_EUC_JP, eucJP},
...

what I see in gcc -S (on x86_64 Linux) is:

.section.rodata
.align 32
.type   encoding_match_list, @object
.size   encoding_match_list, 1776
encoding_match_list:
.long   1
.zero   4
.quad   .LC5
.long   1
.zero   4
.quad   .LC6
...
.section.rodata.str1.1
.LC5:
.string EUC-JP
.LC6:
.string eucJP
...

So it's all read-only data anyway, as can be confirmed with size
which shows that the .o file contains *no* data outside the text
segment.  There might be some platforms where this isn't the case,
but I don't think they're mainstream.

Converting the pointer arrangement to a fixed-length struct member
as per the patch has the following implications:

* We save the pointer, and possibly some alignment padding, at the cost
that now every string has to be padded to the maximal length.  This
might produce a net space savings if all the strings in the table
are of similar length, but it's hardly a guaranteed win.  Note also
that we no longer get to share storage with any other uses of the
same strings as plain literals.

* Possibly there's some savings in executable startup time as a
consequence of eliminating pointer-relocation work.  This would be
platform dependent, and in any case I've never heard any complaints
suggesting that this is a metric we need to care about at all.

* We now have a risk of possible silent breakage if any string reaches
the maximum length.  This is exacerbated by the temptation to not leave
a lot of daylight in the chosen maximum length, so as to minimize the
amount of bloat created.

The hunk Robert is complaining about is attempting to deal with that last
drawback; but it seems like a mighty ugly technique to me, and there are
other places where you've replaced pointers by fixed-length arrays without
adding any such protection (because it's notationally impractical).

TBH I don't find this to be an improvement, especially in cases where
it forces code changes beyond just const-ifying the data declarations.
I don't have a lot of confidence for instance that you found all the
places that were checking for a NULL pointer as an array terminator.
Having to change existing code greatly weakens the argument that this
is a correctness-improving patch, because now you have to factor in a
risk of actively breaking something.

So I think the fixed-length-array changes in this patch are a bad
idea; it's sufficient to make sure we've const-ified both the pointers
and the containing structs, as in the existing chklocale.c coding.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Make various variables read-only (const)

2014-01-18 Thread Tom Lane
I wrote:
 AFAICT, this change and some similar ones are based on a false assumption.
 It is *not* necessary to replace pointers by fixed-length arrays in order
 to get things into .rodata.

After further experimentation I find that this claim is true when
compiling normally, but apparently not when using -fpic, at least
not on RHEL6 x86_64.  Even when const-ified, the tables in encnames.o
and wchar.o end up in the data segment (though the underlying strings
are not).  So if we want a meaningful shrinkage in the size of the data
segment in libpq.so, we'd have to do something similar to what Oskari
proposes.

However, I'm still against doing so; the other points I made before
still apply, and I think on balance fixed-length arrays are still a
bad idea.  It seems like a particularly bad idea to expose a limit
on the length of an encoding name as part of the ABI defined by
pg_wchar.h, as the submitted patch did.  I'm on board with making
changes like this where they can be argued to improve correctness and
maintainability, but surely moving to fixed-length arrays is the
opposite of that.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Make various variables read-only (const)

2014-01-18 Thread Tom Lane
Oskari Saarenmaa o...@ohmu.fi writes:
 This allows the variables to be moved from .data to .rodata section which
 means that more data can be shared by processes and makes sure that nothing
 can accidentally overwrite the read-only definitions.  On a x86-64 Linux
 system this moves roughly 9 kilobytes of previously writable data to the
 read-only data segment in the backend and 4 kilobytes in libpq.

Committed, with the changes mentioned upthread and some other minor
editorialization.  I also adopted Wim Lewis' suggestion to not export
pg_encname_tbl[] at all anymore, since it's hard to see what the point
is of doing anything besides pg_char_to_encoding() with it.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Make various variables read-only (const)

2014-01-17 Thread Tom Lane
Oskari Saarenmaa o...@ohmu.fi writes:
 On Sun, Dec 22, 2013 at 09:43:57PM -0500, Robert Haas wrote:
 - Why change the API of transformRelOptions()?

 The comment was changed to reflect the new API, I modified
 transformRelOptions to only accept a single valid namespace to make things
 simpler in the calling code.  Nothing used more than one valid namespace
 anyway, and it allows us to just use a constant toast without having to
 create a 2 char* array with a NULL.

That may be true today, but the code was obviously designed to allow for
more than one namespace in the future.  I'm inclined to reject this part
of the patch, or at least rework it to const-ify the existing data
structure rather than editorialize on what's needed.

However, I believe this code was Alvaro's to begin with, so I'd be
interested in his opinion on how important it is for transformRelOptions
to allow more than one namespace per call.

Actually, I'm kind of wondering why the code has a concept of namespaces
at all, given the fact that it fails to store them in the resulting array.
It seems impossible to verify after the fact that an option was given with
the right namespace, so isn't the feature pretty much pointless?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Make various variables read-only (const)

2014-01-17 Thread Tom Lane
I wrote:
 However, I believe this code was Alvaro's to begin with, so I'd be
 interested in his opinion on how important it is for transformRelOptions
 to allow more than one namespace per call.

 Actually, I'm kind of wondering why the code has a concept of namespaces
 at all, given the fact that it fails to store them in the resulting array.
 It seems impossible to verify after the fact that an option was given with
 the right namespace, so isn't the feature pretty much pointless?

After thinking about it some more, I realize that the intended usage
pattern is to call transformRelOptions once for each allowed namespace.
Since each call would ignore options outside its namespace, that would
result in any options with invalid namespace names being silently ignored;
so the fix was to add a parameter saying which namespaces are valid,
and then each call checks that, independently of extracting the options
it cares about.

This design seems a bit odd to me; it's certainly wasting effort, since
each namespace check after the first one is redundant.  I'm inclined to
propose that we factor out the namespace validity checking into a separate
function, such that you do something like

void checkOptionNamespaces(List *defList, const char * const validnsps[])

just once, and then call transformRelOptions for each of the desired
namespaces; transformRelOptions's validnsps argument goes away.  In at
least some places this looks like it would net out cleaner; for instance,
there is no good reason why callers that are trying to extract toast
options should have to know which other namespaces are allowed.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Make various variables read-only (const)

2014-01-03 Thread Oskari Saarenmaa
On Sun, Dec 22, 2013 at 09:43:57PM -0500, Robert Haas wrote:
 On Fri, Dec 20, 2013 at 12:01 PM, Oskari Saarenmaa o...@ohmu.fi wrote:
  This allows the variables to be moved from .data to .rodata section which
  means that more data can be shared by processes and makes sure that nothing
  can accidentally overwrite the read-only definitions.  On a x86-64 Linux
  system this moves roughly 9 kilobytes of previously writable data to the
  read-only data segment in the backend and 4 kilobytes in libpq.
 
  https://github.com/saaros/postgres/compare/constify
 
  24 files changed, 108 insertions(+), 137 deletions(-)
 
 This sounds like a broadly good thing, but I've had enough painful
 experiences with const to be a little wary.  And how much does this
 really affect data sharing?  Doesn't copy-on-write do the same thing
 for writable data?  Could we get most of the benefit by const-ifying
 one or two large data structures and forget the rest?

Thanks for the review and sorry for the late reply, I was offline for a
while.

As Wim Lewis pointed out in his mail the const data is most likely
mixed with non-const data and copy-on-write won't help with all of it. 
Also, some of the const data includes duplicates and thus .data actually
shrinks more than .rodata grows.

We'd probably get most of the space-saving benefits by just constifying the
biggest variables, but I think applying const to more things will also make
things more correct.

 Other comments:
 
 - The first hunk of the patch mutilates the comment it modifies for no
 apparent reason.  Please revert.
 
 - Why change the API of transformRelOptions()?

The comment was changed to reflect the new API, I modified
transformRelOptions to only accept a single valid namespace to make things
simpler in the calling code.  Nothing used more than one valid namespace
anyway, and it allows us to just use a constant toast without having to
create a 2 char* array with a NULL.

 -#define DEF_ENC2NAME(name, codepage) { #name, PG_##name }
 +/* The extra NUL-terminator will make sure a warning is raised if the
 + * storage space for name is too small, otherwise when strlen(name) ==
 + * sizeof(pg_enc2name.name) the NUL-terminator would be silently dropped.
 + */
 +#define DEF_ENC2NAME(name, codepage) { #name \0, PG_##name }
 
 - The above hunk is not related to the primary purpose of this patch.

It sort-of is.  Without fixed size char-arrays it's not possible to move
everything to .rodata, but fixed size char-arrays come with the drawback of
silently dropping the NUL-terminator when strlen(str) == sizeof(array), by
forcing a NUL-terminator in we always get a warning if it would've been
dropped and the size of the array can then be increased.

Thanks,
Oskari


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Make various variables read-only (const)

2014-01-02 Thread Wim Lewis
By an odd coincidence, I also decided to try to const-ify libpq
recently, and noticed this thread as I was cleaning up my patch for
submission. For what it's worth, I've attached my patch to this message.
It doesn't move as much data into the text segment as Oskari Saarenmaa's
patch does, but it is less intrusive (simply adding const modifiers here
and there). I just went after the low-hanging fruit in libpq.

As an aside, it might make sense to make pg_encname_tbl and
pg_encname_tbl_sz static, since as far as I can tell those symbols are
never used outside of encnames.c nor are they likely to be. I didn't
make that change in this patch though.

On Mon, Dec 23, 2013, Robert Haas robertmh...@gmail.com wrote:
 And how much does this really affect data sharing?  Doesn't copy-on-write do 
 the same thing for writable data?

It can have a surprisingly large effect if read-only data gets
intermixed on pages with actual read-write data and can get COWd
unnecessarily.

My motivation, though, was more about code correctness than memory
sharing, though sharing is a nice benefit--- I was examining unexpected
symbols in .data/.bss in case they represented a thread-safety problem
for my program linked against libpq. (They didn't, FWIW, except for the
known and documented issue with PQoidStatus(). Not that I really
expected to find a problem in libpq, but marking those structures const
makes it nice and clear that they're not mutable.)

diff --git a/src/backend/utils/mb/encnames.c b/src/backend/utils/mb/encnames.c
index 772d4a5..3f8c592 100644
--- a/src/backend/utils/mb/encnames.c
+++ b/src/backend/utils/mb/encnames.c
@@ -29,7 +29,7 @@
  * Karel Zak, Aug 2001
  * --
  */
-pg_encname pg_encname_tbl[] =
+const pg_encname   pg_encname_tbl[] =
 {
{
abc, PG_WIN1258
@@ -291,7 +291,7 @@ pg_encname  pg_encname_tbl[] =
}   /* last */
 };
 
-unsigned int pg_encname_tbl_sz = \
+const unsigned int pg_encname_tbl_sz = \
 sizeof(pg_encname_tbl) / sizeof(pg_encname_tbl[0]) - 1;
 
 /* --
@@ -304,7 +304,7 @@ sizeof(pg_encname_tbl) / sizeof(pg_encname_tbl[0]) - 1;
 #else
 #define DEF_ENC2NAME(name, codepage) { #name, PG_##name, codepage }
 #endif
-pg_enc2name pg_enc2name_tbl[] =
+const pg_enc2name pg_enc2name_tbl[] =
 {
DEF_ENC2NAME(SQL_ASCII, 0),
DEF_ENC2NAME(EUC_JP, 20932),
@@ -356,7 +356,7 @@ pg_enc2name pg_enc2name_tbl[] =
  * This covers all encodings except MULE_INTERNAL, which is alien to gettext.
  * --
  */
-pg_enc2gettext pg_enc2gettext_tbl[] =
+const pg_enc2gettext pg_enc2gettext_tbl[] =
 {
{PG_SQL_ASCII, US-ASCII},
{PG_UTF8, UTF-8},
@@ -469,11 +469,11 @@ clean_encoding_name(const char *key, char *newkey)
  * Search encoding by encoding name
  * --
  */
-pg_encname *
+const pg_encname *
 pg_char_to_encname_struct(const char *name)
 {
unsigned int nel = pg_encname_tbl_sz;
-   pg_encname *base = pg_encname_tbl,
+   const pg_encname *base = pg_encname_tbl,
   *last = base + nel - 1,
   *position;
int result;
@@ -521,7 +521,7 @@ pg_char_to_encname_struct(const char *name)
 int
 pg_char_to_encoding(const char *name)
 {
-   pg_encname *p;
+   const pg_encname *p;
 
if (!name)
return -1;
@@ -545,7 +545,7 @@ pg_encoding_to_char(int encoding)
 {
if (PG_VALID_ENCODING(encoding))
{
-   pg_enc2name *p = pg_enc2name_tbl[encoding];
+   const pg_enc2name *p = pg_enc2name_tbl[encoding];
 
Assert(encoding == p-encoding);
return p-name;
diff --git a/src/backend/utils/mb/mbutils.c b/src/backend/utils/mb/mbutils.c
index 6d1cd8e..08440e9 100644
--- a/src/backend/utils/mb/mbutils.c
+++ b/src/backend/utils/mb/mbutils.c
@@ -55,9 +55,9 @@ static FmgrInfo *ToClientConvProc = NULL;
 /*
  * These variables track the currently-selected encodings.
  */
-static pg_enc2name *ClientEncoding = pg_enc2name_tbl[PG_SQL_ASCII];
-static pg_enc2name *DatabaseEncoding = pg_enc2name_tbl[PG_SQL_ASCII];
-static pg_enc2name *MessageEncoding = pg_enc2name_tbl[PG_SQL_ASCII];
+static const pg_enc2name *ClientEncoding = pg_enc2name_tbl[PG_SQL_ASCII];
+static const pg_enc2name *DatabaseEncoding = pg_enc2name_tbl[PG_SQL_ASCII];
+static const pg_enc2name *MessageEncoding = pg_enc2name_tbl[PG_SQL_ASCII];
 
 /*
  * During backend startup we can't set client encoding because we (a)
diff --git a/src/backend/utils/mb/wchar.c b/src/backend/utils/mb/wchar.c
index 45bc3c1..6d03a10 100644
--- a/src/backend/utils/mb/wchar.c
+++ b/src/backend/utils/mb/wchar.c
@@ -1720,7 +1720,7 @@ pg_eucjp_increment(unsigned char *charptr, int length)
  * XXX must be sorted by the same order as enum pg_enc (in mb/pg_wchar.h)
  *---
  */
-pg_wchar_tbl pg_wchar_table[] = {
+const pg_wchar_tbl 

Re: [HACKERS] [PATCH] Make various variables read-only (const)

2013-12-22 Thread Robert Haas
On Fri, Dec 20, 2013 at 12:01 PM, Oskari Saarenmaa o...@ohmu.fi wrote:
 This allows the variables to be moved from .data to .rodata section which
 means that more data can be shared by processes and makes sure that nothing
 can accidentally overwrite the read-only definitions.  On a x86-64 Linux
 system this moves roughly 9 kilobytes of previously writable data to the
 read-only data segment in the backend and 4 kilobytes in libpq.

 https://github.com/saaros/postgres/compare/constify

 24 files changed, 108 insertions(+), 137 deletions(-)

This sounds like a broadly good thing, but I've had enough painful
experiences with const to be a little wary.  And how much does this
really affect data sharing?  Doesn't copy-on-write do the same thing
for writable data?  Could we get most of the benefit by const-ifying
one or two large data structures and forget the rest?

Other comments:

- The first hunk of the patch mutilates the comment it modifies for no
apparent reason.  Please revert.

- Why change the API of transformRelOptions()?

-#define DEF_ENC2NAME(name, codepage) { #name, PG_##name }
+/* The extra NUL-terminator will make sure a warning is raised if the
+ * storage space for name is too small, otherwise when strlen(name) ==
+ * sizeof(pg_enc2name.name) the NUL-terminator would be silently dropped.
+ */
+#define DEF_ENC2NAME(name, codepage) { #name \0, PG_##name }

- The above hunk is not related to the primary purpose of this patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers