Re: [HACKERS] [PROPOSAL] Backup and recovery of pg_statistic

2015-12-31 Thread Bruce Momjian
On Thu, Dec 24, 2015 at 05:28:11PM +0300, Dmitry Ivanov wrote:
> Suppose you want to upgrade from 9.4 to 9.6. In that case you would use the 
> pg_upgrade utility provided by release 9.6, which means that it is pg_dump 
> that would have to connect to the older instance and prepare tuples to be 
> inserted into the pg_statistic of the newer instance. The pg_dump utility 
> would have to convert the statistical data to the new format (for example, 
> add placeholders for new columns), so the generated INSERT statements would 
> be fine provided that pg_dump is up to date.
> 
> The documentation states that we should always run the pg_upgrade binary of 
> the new server, not the old one [http://www.postgresql.org/docs/9.5/static/
> pgupgrade.html, Usage, #9]. This means that pg_upgrade will definitely use a 
> fresh version of the pg_dump utility that is aware of all possible pitfalls.
> 
> Furthermore, each INSERT statement consists of textually-serialized columns 
> of pg_statistic. Columns of 'anyarray' type are deserialized using the 
> 'array_in' procedure, which performs various sanity checks, including an 
> element type check. Thus it is not possible to insert an anyarray object 
> which would cause server death.

My idea was to do the insert via a function, with a version number at
the head:

SELECT pg_stats_insert('1.0', '{row value}');

When the pg_stats format is changed, the version number is bumped.  The
backend would know the pg_stats version it supports and either remap or
ignore pg_stats_insert() calls for older versions.
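
A minimal sketch of what such a versioned entry point could look like
(hypothetical: neither the function nor the row-value encoding exists in
core; this only illustrates the version-dispatch idea):

	-- Reject or remap rows whose format the backend doesn't support.
	CREATE FUNCTION pg_stats_insert(fmt_version text, row_value text)
	RETURNS void AS $$
	BEGIN
	    IF fmt_version <> '1.0' THEN
	        -- Older or unknown format: remap here if possible, else skip.
	        RAISE WARNING 'ignoring pg_statistic row in unsupported format %',
	                      fmt_version;
	        RETURN;
	    END IF;
	    -- ... decode row_value and insert into pg_statistic ...
	END;
	$$ LANGUAGE plpgsql;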

To get more complicated, you could have a version number for data types
too and just invalidate inserts for data type format changes, rather
than requiring the entire pg_stats version to be bumped.  I am not sure
how we would consistently record the data type name.  pg_upgrade
preserves pg_type.oid, so that would work for it, but pg_type.oid is not
preserved for non-pg_upgrade usage of non-builtin data types.  For those
cases, I guess the type name would be sufficient.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +




Re: [HACKERS] [PROPOSAL] Backup and recovery of pg_statistic

2015-12-28 Thread Dmitry Ivanov
Here's the patch with an 'array_elemtype' procedure which returns an array's 
element type as an oid. It should apply cleanly on top of commit fc995b. I 
wasn't sure if I should start a new thread.


-- 
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e08bf60..671c6d5 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -11695,6 +11695,9 @@ SELECT NULLIF(value, '(none)') ...
      <primary>array_dims</primary>
     </indexterm>
     <indexterm>
+     <primary>array_elemtype</primary>
+    </indexterm>
+    <indexterm>
      <primary>array_fill</primary>
     </indexterm>
     <indexterm>
@@ -11794,6 +11797,17 @@ SELECT NULLIF(value, '(none)') ...
       <row>
        <entry>
         <literal>
+         <function>array_elemtype</function>(<type>anyarray</type>)
+        </literal>
+       </entry>
+       <entry><type>oid</type></entry>
+       <entry>returns the element type of an array as oid</entry>
+       <entry><literal>array_elemtype(ARRAY[1,2,3])</literal></entry>
+       <entry><literal>23</literal></entry>
+      </row>
+      <row>
+       <entry>
+        <literal>
          <function>array_fill</function>(<type>anyelement</type>, <type>int[]</type>,
          <optional>, <type>int[]</type></optional>)
         </literal>
diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c
index 72d308a..c2883e9 100644
--- a/src/backend/utils/adt/arrayfuncs.c
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -158,6 +158,18 @@ static int width_bucket_array_variable(Datum operand,
 			Oid collation,
 			TypeCacheEntry *typentry);
 
+/*
+ * array_elemtype :
+ *		returns the element type of the array
+ *		pointed to by "v" as an Oid.
+ */
+Datum
+array_elemtype(PG_FUNCTION_ARGS)
+{
+	AnyArrayType *v = PG_GETARG_ANY_ARRAY(0);
+
+	PG_RETURN_OID(AARR_ELEMTYPE(v));
+}
 
 /*
  * array_in :
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index d8640db..de55c01 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -892,6 +892,8 @@ DATA(insert OID = 2092 (  array_upper	   PGNSP PGUID 12 1 0 0 0 f f f f t f i s
 DESCR("array upper dimension");
 DATA(insert OID = 2176 (  array_length	   PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 23 "2277 23" _null_ _null_ _null_ _null_ _null_ array_length _null_ _null_ _null_ ));
 DESCR("array length");
+DATA(insert OID = 3317 (  array_elemtype   PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 26 "2277" _null_ _null_ _null_ _null_ _null_	array_elemtype _null_ _null_ _null_ ));
+DESCR("array element type");
 DATA(insert OID = 3179 (  cardinality	   PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "2277" _null_ _null_ _null_ _null_ _null_ array_cardinality _null_ _null_ _null_ ));
 DESCR("array cardinality");
 DATA(insert OID = 378 (  array_append	   PGNSP PGUID 12 1 0 0 0 f f f f f f i s 2 0 2277 "2277 2283" _null_ _null_ _null_ _null_ _null_ array_append _null_ _null_ _null_ ));
diff --git a/src/include/utils/array.h b/src/include/utils/array.h
index 716e756..22c0fc9 100644
--- a/src/include/utils/array.h
+++ b/src/include/utils/array.h
@@ -328,6 +328,7 @@ extern bool Array_nulls;
 /*
  * prototypes for functions defined in arrayfuncs.c
  */
+extern Datum array_elemtype(PG_FUNCTION_ARGS);
 extern Datum array_in(PG_FUNCTION_ARGS);
 extern Datum array_out(PG_FUNCTION_ARGS);
 extern Datum array_recv(PG_FUNCTION_ARGS);
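
For illustration, a quick usage sketch assuming the patch above is applied
(the pg_statistic query shows the intended use case from this thread;
stavalues1 is just one of its anyarray columns):

	SELECT array_elemtype(ARRAY[1,2,3]);            -- 23 (int4)
	SELECT array_elemtype(ARRAY['a','b']::text[]);  -- 25 (text)

	-- Capture the element type needed to rebuild pg_statistic's
	-- anyarray columns from their textual form via array_in:
	SELECT starelid, staattnum, array_elemtype(stavalues1)::regtype
	FROM pg_statistic
	WHERE stavalues1 IS NOT NULL;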



Re: [HACKERS] [PROPOSAL] Backup and recovery of pg_statistic

2015-12-24 Thread Dmitry Ivanov
> There's still the problem that it won't work across a major version
> upgrade that makes any change whatsoever to the rowtype of pg_statistic.

I'm sorry if my previous explanation was poor; this time I will be more 
detailed.

Suppose you want to upgrade from 9.4 to 9.6. In that case you would use the 
pg_upgrade utility provided by release 9.6, which means that it is pg_dump 
that would have to connect to the older instance and prepare tuples to be 
inserted into the pg_statistic of the newer instance. The pg_dump utility 
would have to convert the statistical data to the new format (for example, 
add placeholders for new columns), so the generated INSERT statements would 
be fine provided that pg_dump is up to date.

The documentation states that we should always run the pg_upgrade binary of 
the new server, not the old one [http://www.postgresql.org/docs/9.5/static/
pgupgrade.html, Usage, #9]. This means that pg_upgrade will definitely use a 
fresh version of the pg_dump utility that is aware of all possible pitfalls.

Furthermore, each INSERT statement consists of textually-serialized columns 
of pg_statistic. Columns of 'anyarray' type are deserialized using the 
'array_in' procedure, which performs various sanity checks, including an 
element type check. Thus it is not possible to insert an anyarray object 
which would cause server death.
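
For illustration, any textual array cast goes through array_in, so malformed
input is rejected rather than accepted into the catalog (runnable on any
recent server):

	SELECT '{1,2,3}'::int4[];    -- ok
	SELECT '{1,two,3}'::int4[];  -- ERROR: invalid input syntax for integer: "two"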


-- 
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company




Re: [HACKERS] [PROPOSAL] Backup and recovery of pg_statistic

2015-12-23 Thread Tom Lane
Dmitry Ivanov  writes:
>> That concern is exactly the reason why we never did this originally.
>> In particular, emitting raw INSERTs into pg_statistic is just plain
>> foolish; we have changed the rowtype of pg_statistic in the past and
>> are likely to do so again.  At a minimum we would need such a facility
>> to be proof against addition of more statistic "slots" (more columns)
>> to pg_statistic.

> While this approach may indeed look dumb, it is intended to be used only in 
> conjunction with the 'binary-upgrade' option, which effectively means that 
> the pg_dump-generated INSERT statement has to be compatible only with the 
> release that includes this very pg_dump version. Thus, there's no need for 
> validation.

There's still the problem that it won't work across a major version
upgrade that makes any change whatsoever to the rowtype of pg_statistic.
That's a sufficient reason to reject it.

If the amount of infrastructure required were trivial, I'd probably be
okay with just putting it in as a kluge that pg_upgrade could use for
some version transitions and not others.  But your proposal seems quite
complicated, which makes me think we could solve the more general problem
for roughly comparable amounts of work.

regards, tom lane




Re: [HACKERS] [PROPOSAL] Backup and recovery of pg_statistic

2015-12-23 Thread Dmitry Ivanov
> That concern is exactly the reason why we never did this originally.
> In particular, emitting raw INSERTs into pg_statistic is just plain
> foolish; we have changed the rowtype of pg_statistic in the past and
> are likely to do so again.  At a minimum we would need such a facility
> to be proof against addition of more statistic "slots" (more columns)
> to pg_statistic.

While this approach may indeed look dumb, it is intended to be used only in 
conjunction with the 'binary-upgrade' option, which effectively means that the 
pg_dump-generated INSERT statement has to be compatible only with the release 
that includes this very pg_dump version. Thus, there's no need for validation.

> And of course there was code to emit such a dump, producing one
> dump statement per occupied "slot" in pg_statistic plus one call to
> the other function per pg_statistic row.

> An API like this seems a good deal more future-proof than plain INSERTs.

This sounds really good, but I doubt that this is necessary if we're to just 
preserve statistical data during an upgrade.

> Ideally, ordinary users
> could use this facility to transfer statistics for their own tables, just
> as they can use pg_dump ... but without adequate validity checking, it's
> way too much of a security hazard to allow that.

This could be implemented separately from pg_dump if needed. The current 
proposal aims at preserving statistical data during a database upgrade, which 
is a rather important feature required by many DBAs. Of course, it would be 
great if there were a way for users to dump and restore stats for their own 
tables on a whim, but, in my opinion, that is far less important.





Re: [HACKERS] [PROPOSAL] Backup and recovery of pg_statistic

2015-12-22 Thread Bruce Momjian
On Fri, Nov 27, 2015 at 06:52:59PM +0300, Dmitry Ivanov wrote:
> Proposed changes to pg_dump
> ===========================
> 
> Now that the approach has been developed, it may be applied to improve the 
> 'pg_dump' utility. Some minor code changes would make 'pg_dump' emit 
> specially-formed recovery INSERTs for 'pg_statistic' in 'binary-upgrade' 
> mode, thus allowing us to restore saved stats after an upgrade.

I think this is great.  My only question for the list is how many new
compatibility requirements between major versions this dumping would
create.  While pg_upgrade could disable it, pg_dump can't, because it
doesn't know which PG version its output will be loaded into.  Can we
create a format that is portable to all future PG versions, or at least
detect incompatibility?  I am thinking we would need some pg_statistic
version number in the dump data that can cause the data to be ignored.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +




Re: [HACKERS] [PROPOSAL] Backup and recovery of pg_statistic

2015-12-22 Thread Tom Lane
Bruce Momjian  writes:
> On Fri, Nov 27, 2015 at 06:52:59PM +0300, Dmitry Ivanov wrote:
>> Now that the approach has been developed, it may be applied to improve the 
>> 'pg_dump' utility. Some minor code changes would make 'pg_dump' emit 
>> specially-formed recovery INSERTs for 'pg_statistic' in 'binary-upgrade' 
>> mode, thus allowing us to restore saved stats after an upgrade.

> I think this is great.  My only question for the list is how many new
> compatibility requirements between major versions this dumping would
> create.

That concern is exactly the reason why we never did this originally.
In particular, emitting raw INSERTs into pg_statistic is just plain
foolish; we have changed the rowtype of pg_statistic in the past and
are likely to do so again.  At a minimum we would need such a facility
to be proof against addition of more statistic "slots" (more columns)
to pg_statistic.

While at Salesforce, I did some work on this problem, and came up
with code that emitted calls to functions that would insert data for
a single "slot".  That is, the dump consisted of calls to functions
defined more or less like this:

pg_load_statistics_slot (table_oid regclass,
                         attr_name name,
                         slot_kind int,
                         slot_op oid,
                         slot_numbers float4[],
                         slot_values anyarray);

and there was another one to load the non-slot columns of a pg_statistic
entry.  And of course there was code to emit such a dump, producing one
dump statement per occupied "slot" in pg_statistic plus one call to
the other function per pg_statistic row.
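
To make that concrete, a dump produced under such an API might look roughly
like this (a sketch only: the function name follows the description above,
while the table, operator, and values are invented):

	-- One call per occupied slot; slot_kind 1 is STATISTIC_KIND_MCV,
	-- whose slot_op is the type's equality operator and whose
	-- slot_numbers are the frequencies of the most common values.
	SELECT pg_load_statistics_slot(
	    'public.accounts'::regclass,              -- table_oid
	    'balance',                                -- attr_name
	    1,                                        -- slot_kind (MCV)
	    '=(numeric,numeric)'::regoperator::oid,   -- slot_op
	    '{0.31,0.24,0.11}'::float4[],             -- slot_numbers
	    '{0,100,250}'::numeric[]);                -- slot_values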

An API like this seems a good deal more future-proof than plain INSERTs.
Even if the underlying pg_statistic representation changes, such functions
could try to adapt the presented data to the new format, or at worst they
could be redefined as no-ops.

The major problem that I never had a very good solution for is how the
"load" function could validate that the presented data was actually valid
for the specified table attribute's data type and slot kind.  The absolute
minimum that you would want it to do is to cross-check that the
slot_values array has the correct element datatype, because if it doesn't
the backend is just about certain to crash when it tries to use the data.
I think there are probably other assumptions that would need to be
validated to be safe, depending on the code that looks at each slot kind.
I have not found an answer other than the load function knowing all there
is to know about each defined slot kind, which seems like a maintenance
problem, not to mention breaking extensions (like PostGIS) that define
their own slot kinds.
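
At the SQL level, that minimal element-type cross-check would amount to
something like this (a sketch; array_elemtype is the helper from Dmitry's
patch in this thread, and the $n parameters are placeholders):

	-- Reject a slot_values array ($1) whose element type does not
	-- match the declared type of the target attribute ($2, $3).
	SELECT array_elemtype($1) = a.atttypid AS element_type_ok
	FROM pg_attribute a
	WHERE a.attrelid = $2::regclass
	  AND a.attname  = $3;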

Failing any good solution to that, we could probably get by with restricting
use of these functions to superusers (which is certainly where direct use
of INSERTs would have to stop).  But that puts a pretty considerable crimp
in the usefulness of the stats dump/load process.  Ideally, ordinary users
could use this facility to transfer statistics for their own tables, just
as they can use pg_dump ... but without adequate validity checking, it's
way too much of a security hazard to allow that.

I don't have access to this code anymore, but could probably persuade
Salesforce to donate it.  Or we could just write something similar
from scratch.

regards, tom lane




Re: [HACKERS] [PROPOSAL] Backup and recovery of pg_statistic

2015-12-08 Thread Simon Riggs
On 27 November 2015 at 15:52, Dmitry Ivanov  wrote:

> Currently there is no way to back up 'pg_statistic' because columns of
> 'anyarray' type cannot be reconstructed solely from their textual
> representation. Meanwhile, all that is needed to solve this problem is a
> small extension capable of retrieving the element type of an 'anyarray'
> object and recreating this particular 'anyarray' object using the
> 'array_in' procedure.

Please submit a patch against core for this.

-- 
Simon Riggs   http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services