Re: [HACKERS] [PATCH] add --progress option to pgbench (submission 3)

2013-06-01 Thread Fabien COELHO



New submission which puts the option help in alphabetical position, as
per Peter Eisentraut's commit f0ed3a8a99b052d2d5e0b6153a8907b90c486636

This is for reference to the next commitfest.


Patch updated after a conflict induced by the pgindent run, for the next
commitfest.
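
For reviewers: the per-thread report in the patch computes tps from the
transaction-count delta over the elapsed time in microseconds. A standalone
sketch of just that arithmetic (names here are illustrative, not taken
verbatim from the patch):

#include <stdint.h>
#include <stdio.h>

/* tps over one reporting interval: transactions done / elapsed seconds */
static double
interval_tps(int64_t count, int64_t last_count, int64_t run_us)
{
	return 1000000.0 * (double) (count - last_count) / (double) run_us;
}

int
main(void)
{
	/* e.g. 5210 - 4710 = 500 transactions over 5.02 s of wall-clock time */
	printf("%.1f tps\n", interval_tps(5210, 4710, 5020000));	/* ~99.6 */
	return 0;
}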


--
Fabien.

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 8ff6623..c583f39 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -164,6 +164,7 @@ bool		use_log;			/* log transaction latencies to a file */
 bool		use_quiet;			/* quiet logging onto stderr */
 int			agg_interval;		/* log aggregates instead of individual
  * transactions */
+int			progress = 0;   /* report thread progress about every this many seconds */
 bool		is_connect;			/* establish connection for each transaction */
 bool		is_latencies;		/* report per-command latencies */
 int			main_pid;			/* main process id used in log filename */
@@ -354,6 +355,8 @@ usage(void)
 		  "protocol for submitting queries to server (default: simple)\n"
 		 "-n   do not run VACUUM before tests\n"
 		 "-N   do not update tables \"pgbench_tellers\" and \"pgbench_branches\"\n"
+		 "-P SEC, --progress SEC\n"
+		  "show thread progress report about every SEC seconds\n"
 		 "-r   report average latency per command\n"
 		 "-s NUM   report this scale factor in output\n"
 		 "-S   perform SELECT-only transactions\n"
@@ -2112,6 +2115,7 @@ main(int argc, char **argv)
 		{"unlogged-tables", no_argument, &unlogged_tables, 1},
 		{"sampling-rate", required_argument, NULL, 4},
 		{"aggregate-interval", required_argument, NULL, 5},
+		{"progress", required_argument, NULL, 'P'},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -2178,7 +2182,7 @@ main(int argc, char **argv)
 	state = (CState *) pg_malloc(sizeof(CState));
 	memset(state, 0, sizeof(CState));
 
-	while ((c = getopt_long(argc, argv, "ih:nvp:dqSNc:j:Crs:t:T:U:lf:D:F:M:", long_options, &optindex)) != -1)
+	while ((c = getopt_long(argc, argv, "ih:nvp:dqSNc:j:Crs:t:T:U:lf:D:F:M:P:", long_options, &optindex)) != -1)
 	{
 		switch (c)
 		{
@@ -2333,6 +2337,16 @@ main(int argc, char **argv)
 	exit(1);
 }
 break;
+			case 'P':
+				progress = atoi(optarg);
+				if (progress <= 0)
+				{
+					fprintf(stderr,
+							"thread progress delay (-P) must not be negative (%s)\n",
+							optarg);
+					exit(1);
+				}
+				break;
 			case 0:
 /* This covers long options which take no argument. */
 break;
@@ -2695,6 +2709,9 @@ threadRun(void *arg)
 	int			nstate = thread->nstate;
 	int			remains = nstate;		/* number of remaining clients */
 	int			i;
+	/* for reporting progress: */
+	int64		last_report = INSTR_TIME_GET_MICROSEC(thread->start_time);
+	int64		last_count = 0;
 
 	AggVals		aggs;
 
@@ -2858,6 +2875,29 @@ threadRun(void *arg)
 				st->con = NULL;
 			}
 		}
+
+		/* per thread progress report, about every 5s */
+		if (progress)
+		{
+			instr_time	now_time;
+			int64		now, run;
+			INSTR_TIME_SET_CURRENT(now_time);
+			now = INSTR_TIME_GET_MICROSEC(now_time);
+			run = now - last_report;
+			if (run >= progress * 1000000)
+			{
+				/* generate and show report */
+				int64		count = 0;
+				for (i = 0; i < nstate; i++)
+					count += state[i].cnt;
+				fprintf(stderr, "thread %d running at %f tps after %.1f s\n",
+						thread->tid, 1000000.0 * (count - last_count) / run,
+						(now - INSTR_TIME_GET_MICROSEC(thread->start_time)) /
+						1000000.0);
+				last_count = count;
+				last_report = now;
+			}
+		}
 	}
 
 done:
diff --git a/doc/src/sgml/pgbench.sgml b/doc/src/sgml/pgbench.sgml
index e9900d3..e58ea58 100644
--- a/doc/src/sgml/pgbench.sgml
+++ b/doc/src/sgml/pgbench.sgml
@@ -392,6 +392,16 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
 </varlistentry>
 
 <varlistentry>
+  <term><option>-P</option> <replaceable>sec</></term>
+  <term><option>--progress</option> <replaceable>sec</></term>
+  <listitem>
+   <para>
+	Show thread progress report about every <literal>sec</> seconds.
+   </para>
+  </listitem>
+ </varlistentry>
+
+ <varlistentry>
   <term><option>-s</option> <replaceable>scale_factor</></term>
   <listitem>
    <para>

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] detecting binary backup in progress

2013-06-01 Thread Michael Paquier
On Sat, Jun 1, 2013 at 4:55 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:

 Heikki Linnakangas wrote:

  The manual says:
  pg_is_in_backup()  bool  True if an on-line exclusive backup is
 still in progress.
 
  So clearly that is intentional. That could use some rephrasing,
  though; a layman won't know what an exclusive backup is.

 Heck, I don't understand what it is either.

Same here. Does it mean taking a backup not with pg_basebackup but by
running external operations yourself between the pg_start/stop_backup calls?
-- 
Michael


Re: [HACKERS] detecting binary backup in progress

2013-06-01 Thread Andres Freund
On 2013-06-01 17:05:57 +0900, Michael Paquier wrote:
 On Sat, Jun 1, 2013 at 4:55 AM, Alvaro Herrera 
 alvhe...@2ndquadrant.comwrote:
 
  Heikki Linnakangas wrote:
 
   The manual says:
   pg_is_in_backup()  bool  True if an on-line exclusive backup is
  still in progress.
  
   So clearly that is intentional. That could use some rephrasing,
   though; a layman won't know what an exclusive backup is.
 
  Heck, I don't understand what it is either.
 
 Same here. Does it mean taking a backup not with pg_basebackup but by
 running external operations yourself between the pg_start/stop_backup calls?

Basically yes. pg_start/stop_backup places the backup label into the data
directory itself so there can only be one of them at a time since it has
to have a fixed name. With the streaming protocol the backup label is
only added to the streamed data, so there can be multiple ones at the
same time.
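
As a purely illustrative model (a sketch, not server code), the difference
boils down to where the label lives; the fixed on-disk file name is what
forces exclusivity:

#include <stdio.h>

/* exclusive: the label sits at a fixed path in the data directory */
int
start_exclusive_backup(const char *datadir)
{
	char		path[1024];
	FILE	   *f;

	snprintf(path, sizeof(path), "%s/backup_label", datadir);
	f = fopen(path, "wx");		/* C11 "x": fail if the file already exists */
	if (f == NULL)
		return -1;				/* an exclusive backup is already running */
	fputs("START WAL LOCATION: ...\n", f);
	fclose(f);
	return 0;
}

/* streamed: the label only goes into the backup data sent to the client */
int
build_streamed_label(char *buf, size_t len, int backup_no)
{
	return snprintf(buf, len, "LABEL: streamed base backup #%d\n", backup_no);
}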

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Combo xids

2013-06-01 Thread Simon Riggs
Currently, we hold xmin and xmax for each tuple.

For xmax, we have the multixact mechanism that allows us to represent
an array of xids with just a single pseudo xid.

So why not hold xmin and xmax as part of a multixact? Instead of
caching two xids on each tuple, why not hold just one and allow access
to the array of xids via multixact?

An idea very similar to combo cids, hence the name combo xids.

When would this make sense?
Frequently. Most of the time a tuple needs only one xid set. In most
cases, we set xmin and xmax a long time apart. Very few cases end with
both of them set inside the *same* xmin horizon. In a heavy
transactional environment, the horizon moves forwards quickly, on the
order of a few seconds. Very few rows get inserted and then
updated/deleted that quickly. With long reporting queries, data tends
to be updated less, so again the rows aren't touched within the same
horizon. As a result, we hardly ever need both xmin and xmax at the
same time - when we need to set xmax, xmin is already
committed/cleaned.
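
To make the shape of the idea concrete, a rough sketch (invented names, not
actual heap code): a single stored xid slot whose interpretation depends on
a flag, with the multixact lookup standing in for whatever the real
implementation would use:

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define HEAP_XID_IS_MULTI	0x0001	/* invented flag: t_xid is a multixact */

/* one stored xid slot instead of separate xmin and xmax */
typedef struct ComboXidTuple
{
	TransactionId t_xid;			/* a plain xid, or a multixact id */
	uint16_t	t_infomask;
} ComboXidTuple;

/* stand-in for a real multixact lookup returning the member xids */
static void
multixact_get_members(TransactionId multi, TransactionId *xmin, TransactionId *xmax)
{
	*xmin = 100 + multi;			/* dummy values, just for the sketch */
	*xmax = 200 + multi;
}

static void
get_xmin_xmax(const ComboXidTuple *tup, TransactionId *xmin, TransactionId *xmax)
{
	if (tup->t_infomask & HEAP_XID_IS_MULTI)
		multixact_get_members(tup->t_xid, xmin, xmax);	/* rare: both set */
	else
	{
		*xmin = tup->t_xid;			/* common case: only the inserter matters */
		*xmax = 0;					/* i.e. InvalidTransactionId */
	}
}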

What is the benefit?
Merging xmin/xmax would save 4 bytes per row. On servers with 8 byte
word length, that means that we'd save 8 bytes per row for tables that
have between 9 and 40 columns, which covers the majority of tables.

How?
Clearly this would require changes to tuple format and therefore not
something I would suggest for this year ahead, but seems worth getting
the idea down, even if it gets ruled out.

Multixact is now persistent, so this change wouldn't take much to implement.

Given that it would require a change in on-disk format, we might also
consider that it could be possible to cache multixactid data on the
disk blocks that need it. A list of multixactids could be stored
amongst the tuples on the block, with the head of the list being a 2 byte
addition to the block header. This could be used to reduce the
overhead of multixactid use.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Deferring transaction wraparound

2013-06-01 Thread Simon Riggs
As we get faster, we'll need to wrap the xid counter more quickly. At
current full speed, we could wrap the xid counter every 72 hours.

This is a concern for very large tables, since currently we have to
rescan the whole table. This is something I am interested in changing in
the next release.

We could solve that with a freeze map. Avoiding wraparound vacuums
will get harder and harder for us and an approach which is a minor
improvement on existing freezing won't help for long. People often ask
about 8 byte xids directly, to which I reply: been discussed, major
downsides. So I've been trying to rethink this from first principles
to come up with a better idea.

A much better idea is to hold the xmin epoch on the tuple, in addition
to the xid, if there was a good place to hold this.

CommandId is 32 bits, but very few transactions need 2 billion commands.
So the suggestion is that we optionally store the xid
epoch in 16 bits of the cmdid field. We would still allow very large
commandIds and combocids, but we optimise the common usage. By doing
this, we will be able to reduce xid wraparounds from every 72 hours to
once every 538 years, at current transaction rates, in most cases. And
in much less busy databases, this will drop to essentially never.

In detail: Whenever we write a tuple with a t_cid of less than 65536
we set a new flag, infomask2 HEAP_HAS_EPOCH and write the xid epoch to
the remaining 16 bits. (Or at least, the epoch modulo 65536). So this
approach is binary compatible with current storage format (even though
I was happy to break it if necessary, I don't think that is helpful).
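
A minimal sketch of that packing, assuming for illustration a 32-bit t_cid
whose high 16 bits carry the epoch (this is a model of the idea, not heap
code):

#include <stdbool.h>
#include <stdint.h>

#define HEAP_HAS_EPOCH	0x0001		/* would live in t_infomask2 */

typedef struct EpochCid
{
	uint32_t	t_cid;				/* low 16 bits: cid, high 16 bits: epoch */
	uint16_t	t_infomask2;
} EpochCid;

static bool
store_cid_with_epoch(EpochCid *hdr, uint32_t cid, uint32_t xid_epoch)
{
	if (cid >= 65536)
		return false;				/* big transaction: keep the plain cid */
	hdr->t_cid = cid | ((xid_epoch % 65536) << 16);
	hdr->t_infomask2 |= HEAP_HAS_EPOCH;
	return true;
}

static uint32_t
stored_epoch(const EpochCid *hdr)
{
	/* only meaningful when HEAP_HAS_EPOCH is set */
	return hdr->t_cid >> 16;
}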

We hold the relnextfreezeepoch on pg_class, which starts at the epoch
when the table is created -1 (or 65535 at bootstrap). i.e. we next do
a freeze scan in 65536 epochs. Anytime we write a t_cid that is > 65535
we reset the relnextfreezeepoch for the table to currentepoch+1 using
a non-transactional update. Autovacuum looks at the relnextfreezeepoch
when deciding whether to kick off a wraparound freeze vacuum. Setting
relnextfreezeepoch needs good interlocks to avoid resetting it when a
long running transaction is running. Various designs, but suggest
using something similar to vacuum cycleid, for when a long running
transaction and an wraparound freeze vacuum occur concurrently. That
way we only need to take special care when we have a very long
transaction running.

Very long transactions, that is transactions with > 65536 commands, will
need freezing within 1 cycle just as we currently do, but only for
changes made in the later part of the transaction. However, those are
typically pg_dump reloads and we should be able to use COPY FREEZE
with those so they get loaded frozen and don't then need later
freezing at all.

Hang on! Which xid is the epoch for? Well, for xmin of course. If xmax
is set, its either a lock or a delete. And either way we get to clean
things up with regular vacuums, which we would still need to do each
cycle. It's only the xmin that is annoying, because there isn't much
other need to revisit those data blocks. If we don't like that, we
could use a few extra bits from the epoch field to determine which xid
it applies to, but I think that's too much.

This seemed a bit radical when I first thought of this, but it still
seems solid now.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Optimising Foreign Key checks

2013-06-01 Thread Simon Riggs
FK checks can be expensive, especially when loading large volumes of
data into an existing table or partition. A couple of ideas for
improving performance are discussed here:

1. Use Case: Bulk loading
COPY pgbench_accounts;  -- references pgbench_branches with many
repeated values

Proposal: Transactions that need multiple checks can be optimised by
simply LOCKing the whole referenced table, once. We can then hold the
referenced table as a Hash, like we do with a Hash Join (its almost
exactly the same thing). This works in two ways: it speeds up checks
and it also reduces the locking overhead.

This would require explicit permission of the user, which would be
given by a new table parameter, set on the referenced table.

  WITH (foreign_key_lock_level = row | table)

Setting this would lock out changes on that table, so it would only be
suitable for read-mostly tables. But such reference tables are exactly
the tables most frequently referenced by an FK anyway, so the
optimisation is appropriate in probably the majority of cases.

2. Use Case: Transactional repetition
BEGIN;
INSERT INTO order VALUES (ordid, )
INSERT INTO order_line VALUES (ordid, 1, .)
INSERT INTO order_line VALUES (ordid, 2, .)
INSERT INTO order_line VALUES (ordid, 3, .)
INSERT INTO order_line VALUES (ordid, 4, .)
...
COMMIT;
The inserts into order_line repeatedly execute checks against the same
ordid. Deferring and then de-duplicating the checks would optimise the
transaction.

Proposal: De-duplicate multiple checks against same value. This would
be implemented by keeping a hash of rows that we had already either
inserted and/or locked as the transaction progresses, so we can use
the hash to avoid queuing up after triggers.

We could also use this technique to de-duplicate checks within a
single statement.

In both cases we are building up a local hash table with values and
then using those values to avoid queuing constraint triggers. So code
is similar for both.
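
As a rough illustration of the shared mechanism (an invented stand-in, not
a proposal for the actual backend data structure), a tiny open-addressing
hash set that remembers referenced-key values already checked in the
current transaction and says whether another after-trigger needs queuing:

#include <stdbool.h>
#include <stdint.h>

#define FK_CACHE_SLOTS 1024

/* referenced-key values already checked/locked in this transaction */
static int64_t fk_seen[FK_CACHE_SLOTS];
static bool fk_used[FK_CACHE_SLOTS];
static int	fk_fill = 0;

/* true the first time a key is seen (queue the check); false for repeats */
static bool
fk_check_needed(int64_t key)
{
	uint32_t	h = (uint32_t) key * 2654435761u % FK_CACHE_SLOTS;

	if (fk_fill >= FK_CACHE_SLOTS - 1)
		return true;				/* cache full: fall back to normal checks */
	while (fk_used[h])
	{
		if (fk_seen[h] == key)
			return false;			/* duplicate: skip queuing another trigger */
		h = (h + 1) % FK_CACHE_SLOTS;
	}
	fk_used[h] = true;
	fk_seen[h] = key;
	fk_fill++;
	return true;
}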

Thoughts?

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] pgbench --throttle (submission 7 - with lag measurement)

2013-06-01 Thread Fabien COELHO


New submission for the next commit fest.

This new version also reports the average lag time, i.e. the delay between 
scheduled and actual transaction start times. This may help detect whether 
things went smoothly, or whether at some point some delay was introduced
because of the load and some catch-up was done afterwards.


Question 1: should it report the maximum lag encountered?

Question 2: the next step would be to have the current lag shown under 
option --progress, but that would mean having a combined --throttle 
--progress patch submission, or maybe dependencies between patches.
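
For readers skimming the patch: the throttle draws an exponentially
distributed delay (so transaction starts approximate a Poisson process at
the target rate), and the lag is simply actual minus scheduled start time.
A standalone sketch of both computations (rand() is used here purely for
illustration; the patch uses pgbench's own getrand()):

#include <math.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Exponential inter-transaction delay in microseconds for a mean delay of
 * throttle_us, via inverse transform sampling of a uniform variate.
 */
static int64_t
next_delay_us(int64_t throttle_us)
{
	double		u = (rand() % 1000000 + 1) / 1000000.0;		/* in (0, 1] */

	return (int64_t) (throttle_us * -log(u));
}

/* lag of one transaction: how late it actually started, in microseconds */
static int64_t
throttle_lag_us(int64_t scheduled_start_us, int64_t actual_start_us)
{
	return actual_start_us - scheduled_start_us;
}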


--
Fabien.

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 8ff6623..9b5adc2 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -137,6 +137,12 @@ int			unlogged_tables = 0;
 double		sample_rate = 0.0;
 
 /*
+ * whether clients are throttled to a given rate, expressed as a delay in us.
+ * 0, the default means no throttling.
+ */
+int64		throttle = 0;
+
+/*
  * tablespace selection
  */
 char	   *tablespace = NULL;
@@ -205,6 +211,9 @@ typedef struct
 	int			nvariables;
 	instr_time	txn_begin;		/* used for measuring transaction latencies */
 	instr_time	stmt_begin;		/* used for measuring statement latencies */
+	int64		trigger;		/* previous/next throttling (us) */
+	bool		throttled;  /* whether current transaction was throttled */
+	int64		throttle_lag;   /* transaction lag behind throttling */
 	int			use_file;		/* index in sql_files for this client */
 	bool		prepared[MAX_FILES];
 } CState;
@@ -348,6 +357,9 @@ usage(void)
 		 "-D VARNAME=VALUE\n"
 		  "define variable for use by custom script\n"
 		 "-f FILENAME  read transaction script from FILENAME\n"
+		 "-H SPEC, --throttle SPEC\n"
+		  "delay in seconds to throttle each client\n"
+		  "sample specs: 0.025 40tps 25ms 25000us\n"
 		 "-j NUM   number of threads (default: 1)\n"
 		 "-l   write transaction times to log file\n"
 		 "-M simple|extended|prepared\n"
@@ -902,13 +914,40 @@ doCustom(TState *thread, CState *st, instr_time *conn_time, FILE *logfile, AggVa
 top:
 	commands = sql_files[st->use_file];
 
+	/* handle throttling once per transaction by inserting a sleep.
+	 * this is simpler than doing it at the end.
+	 */
+	if (throttle && !st->throttled)
+	{
+		/* compute delay to approximate a Poisson distribution
+		 * 1000000 => 13.8 .. 0 multiplier
+		 * if transactions are too slow or a given wait shorter than
+		 * a transaction, the next transaction will start right away.
+		 */
+		int64 wait = (int64)
+			throttle * -log(getrand(thread, 1, 1000000)/1000000.0);
+		st->trigger += wait;
+		st->sleeping = 1;
+		st->until = st->trigger;
+		st->throttled = true;
+		if (debug)
+			fprintf(stderr, "client %d throttling " INT64_FORMAT " us\n",
+					st->id, wait);
+	}
+
 	if (st->sleeping)
 	{			/* are we sleeping? */
 		instr_time	now;
-
+		int64 now_us;
 		INSTR_TIME_SET_CURRENT(now);
-		if (st->until <= INSTR_TIME_GET_MICROSEC(now))
+		now_us = INSTR_TIME_GET_MICROSEC(now);
+		if (st->until <= now_us)
+		{
 			st->sleeping = 0;	/* Done sleeping, go ahead with next command */
+			if (throttle && st->state == 0)
+				/* measure lag of throttled transaction */
+				st->throttle_lag += (now_us - st->until);
+		}
+		}
 		else
 			return true;		/* Still sleeping, nothing to do here */
 	}
@@ -1091,6 +1130,7 @@ top:
 			st->state = 0;
 			st->use_file = (int) getrand(thread, 0, num_files - 1);
 			commands = sql_files[st->use_file];
+			st->throttled = false;
 		}
 	}
 
@@ -2012,7 +2052,8 @@ process_builtin(char *tb)
 
 /* print out results */
 static void
-printResults(int ttype, int normal_xacts, int nclients,
+printResults(int ttype, int normal_xacts,
+			 CState *clients, int nclients,
 			 TState *threads, int nthreads,
 			 instr_time total_time, instr_time conn_total_time)
 {
@@ -2052,6 +2093,23 @@ printResults(int ttype, int normal_xacts, int nclients,
 		printf("number of transactions actually processed: %d\n",
 			   normal_xacts);
 	}
+
+	if (throttle)
+	{
+		/* Report average transaction lag under throttling, i.e. the delay
+		   between scheduled and actual start times for the transaction.
+		   The measured lag may be linked to the thread/client load,
+		   the database load, or the Poisson throttling process.
+		   should it report the maximum encountered lag?
+		 */
+		int64 throttle_lag = 0;
+		int c;
+		for (c = 0; c < nclients; c++)
+			throttle_lag += clients[c].throttle_lag;
+		printf("average transaction lag: %.3f ms\n",
+			   0.001 * throttle_lag / normal_xacts);
+	}
+
 	printf("tps = %f (including connections establishing)\n", tps_include);
 	printf("tps = %f (excluding connections establishing)\n", tps_exclude);
 
@@ -2112,6 +2170,7 @@ main(int argc, char **argv)
 		{"unlogged-tables", no_argument, &unlogged_tables, 1},
 		{"sampling-rate", required_argument, NULL, 4},
 		{"aggregate-interval", required_argument, NULL, 5},
+		{"throttle", required_argument, NULL, 'H'},
 		{NULL, 0, 

Re: [HACKERS] Combo xids

2013-06-01 Thread Martijn van Oosterhout
On Sat, Jun 01, 2013 at 09:22:05AM +0100, Simon Riggs wrote:
 When would this make sense?
 Frequently. Most of the time a tuple needs only one xid set. In most
 cases, we set xmin and xmax a long time apart. Very few cases end with
 both of them set inside the *same* xmin horizon. In a heavy
 transactional environment, the horizon moves forwards quickly, on the
 order of a few seconds. Very few rows get inserted and then
 updated/deleted that quickly. With long reporting queries, data tends
 to be updated less, so again the rows aren't touched within the same
 horizon. As a result, we hardly ever need both xmin and xmax at the
 same time - when we need to set xmax, xmin is already
 committed/cleaned.

Is this really true? Consider a long running query A and a tuple
created by B after A. If another transaction comes to update that tuple you
can't throw away the xmin because you need it to prove that A can't see
the tuple.

Or is the idea to create multixacts for each combination of xmin/xmax
encountered? And the assumption is that there aren't that many? That
could be measured.

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 He who writes carelessly confesses thereby at the very outset that he does
 not attach much importance to his own thoughts.
   -- Arthur Schopenhauer


signature.asc
Description: Digital signature


Re: [HACKERS] detecting binary backup in progress

2013-06-01 Thread Christoph Moench-Tegeder
## Joe Conway (m...@joeconway.com):

 However there is a period of time after pg_start_backup() is first
 executed to when it completes, during which backup_label file does not
 exist yet, but the backup has essentially been started. Is there any
 way to detect this state?

When I did some research on the very same question just a few weeks
ago, I settled for external locks (lockfile, pg_advisory_lock(), ...,
depending on your exact situation) around the backup-taking code.

Regards,
Christoph

-- 
Spare Space


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Which table stored in which file in PGDATA/base/[db-oid]

2013-06-01 Thread Soroosh Sardari
Dear Hackers

I've created a new DB, and a bunch of files were created in base/12054
(12054 is the OID of the new DB).
I want to find what table is stored in each file.
BTW, I read this
http://www.postgresql.org/docs/9.2/interactive/storage-file-layout.html
I have 156 files with numerical names, vm and fsm file are ignored.
107 files are the same as pg_class.reltoastrelid, so there are 49 files for
which I do not know what tables are stored in them.
Any idea how to find out?

Thanks,
Soroosh


Re: [HACKERS] Which table stored in which file in PGDATA/base/[db-oid]

2013-06-01 Thread Martijn van Oosterhout
On Sat, Jun 01, 2013 at 03:11:50PM +0430, Soroosh Sardari wrote:
 Dear Hackers
 
 I've created a new DB, and  a bunch of files created in base/12054, 12054
 is oid of the new DB.
 I want to find what table stored in each file.
 BTW, I read this
 http://www.postgresql.org/docs/9.2/interactive/storage-file-layout.html
 I have 156 files with numerical names, vm and fsm file are ignored.
 107 files are same as pg_class.reltoastrelid,so I have 49 files that I do
 not know what tables stored in them.
 Any idea to find ?

From that page:

Each table and index is stored in a separate file. For ordinary
relations, these files are named after the table or index's filenode
number, which can be found in pg_class.relfilenode. 

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 He who writes carelessly confesses thereby at the very outset that he does
 not attach much importance to his own thoughts.
   -- Arthur Schopenhauer


signature.asc
Description: Digital signature


Re: [HACKERS] Which table stored in which file in PGDATA/base/[db-oid]

2013-06-01 Thread Soroosh Sardari
Yes, I have some files which are not in pg_class.relfilenode for any table
or index.
I want to know which table or index is stored in such files.



 From that page:

 Each table and index is stored in a separate file. For ordinary
 relations, these files are named after the table or index's filenode
 number, which can be found in pg_class.relfilenode.

 Have a nice day,
 --
 Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
  He who writes carelessly confesses thereby at the very outset that he
 does
  not attach much importance to his own thoughts.
-- Arthur Schopenhauer





Re: [HACKERS] Which table stored in which file in PGDATA/base/[db-oid]

2013-06-01 Thread Martijn van Oosterhout
On Sat, Jun 01, 2013 at 03:27:40PM +0430, Soroosh Sardari wrote:
 Yes, I have some files which is not in pg_class.relfilenode of any table or
 index.
 I want to know which table or index stored in such files.

That shouldn't happen. Are you sure you're looking in the right
database? Can you list the filenames?

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 He who writes carelessly confesses thereby at the very outset that he does
 not attach much importance to his own thoughts.
   -- Arthur Schopenhauer


signature.asc
Description: Digital signature


Re: [HACKERS] Which table stored in which file in PGDATA/base/[db-oid]

2013-06-01 Thread Victor Yegorov
2013/6/1 Martijn van Oosterhout klep...@svana.org

 On Sat, Jun 01, 2013 at 03:27:40PM +0430, Soroosh Sardari wrote:
  Yes, I have some files which is not in pg_class.relfilenode of any table
 or
  index.
  I want to know which table or index stored in such files.

 That shouldn't happen. Are you sure you're looking in the right
 database? Can you list the filenames?


According to the docs, it is possible if there had been
operations on the table that change the relfilenode, like TRUNCATE.
Also, some relations have relfilenode 0, like the pg_class catalogue
table.

Check more here:
http://www.postgresql.org/docs/current/interactive/storage-file-layout.html

It is recommended to use pg_relation_filenode() function:
http://www.postgresql.org/docs/current/interactive/functions-admin.html#FUNCTIONS-ADMIN-DBLOCATION


-- 
Victor Y. Yegorov


Re: [HACKERS] Which table stored in which file in PGDATA/base/[db-oid]

2013-06-01 Thread Andres Freund
On 2013-06-01 13:04:55 +0200, Martijn van Oosterhout wrote:
 On Sat, Jun 01, 2013 at 03:27:40PM +0430, Soroosh Sardari wrote:
  Yes, I have some files which is not in pg_class.relfilenode of any table or
  index.
  I want to know which table or index stored in such files.
 
 That shouldn't happen. Are you sure you're looking in the right
  database? Can you list the filenames?

It's actually entirely normal. For some system tables the actual
relfilenode isn't stored in the system catalog but in the relation
mapper. Those are
a) tables needed to access the catalogs themselves like pg_class,
   pg_attribute, pg_proc, ..
b) shared tables where we otherwise couldn't change the relfilenode from
   another database

To get the actual relfilenode you actually need to do something like:
SELECT relname, pg_relation_filenode(pg_class.oid) FROM pg_class;


Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Which table stored in which file in PGDATA/base/[db-oid]

2013-06-01 Thread Soroosh Sardari
Yes, I'm sure.

The difference between filenodes and new files changed w.r.t. my first
mail, because I added a table.
I attached 3 files:
newfile.pg : list of numerical files in base/[db-oid], ls | grep
'[[:digit:]]\'
filenode.pg : select distinct relfilenode from pg_class
newfile-filenode.pg : set of OIDs which exist in newfile.pg but not
in filenode.pg


On Sat, Jun 1, 2013 at 3:34 PM, Martijn van Oosterhout klep...@svana.org wrote:

 On Sat, Jun 01, 2013 at 03:27:40PM +0430, Soroosh Sardari wrote:
  Yes, I have some files which is not in pg_class.relfilenode of any table
 or
  index.
  I want to know which table or index stored in such files.

 That shouldn't happen. Are you sure you're looking in the right
  database? Can you list the filenames?

 Have a nice day,



newfile.pg
Description: Binary data


filenode.pg
Description: Binary data


newfile-filenode.pg
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] detecting binary backup in progress

2013-06-01 Thread Simon Riggs
On 31 May 2013 21:06, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-05-31 22:53:14 +0300, Heikki Linnakangas wrote:
 On 31.05.2013 22:36, Andres Freund wrote:
 On 2013-05-31 22:29:45 +0300, Heikki Linnakangas wrote:
 Note that pg_is_in_backup() just checks for presence of
 $PGDATA/backup_label. Also note that pg_basebackup doesn't create
 backup_label in the server. It's included in the backup that's sent to the
 client, but it's never written to disk in the server. So checking for
 backup_label manually or with pg_is_in_backup() will return false even if
 pg_basebackup is running.
 
 Whoa. You are right, but I'd call that a bug. I don't understand why we
 aren't just checking
 XLogCtl->Insert.(nonExclusiveBackups || exclusiveBackup)?

 Well, depends on what you imagine the function is used for. If you think of
 it as will pg_start_backup() throw an error if I call it now, or do I
 need to call pg_stop_backup(), then the current behavior is correct.

 The manual says:
 pg_is_in_backup()  bool  True if an on-line exclusive backup is still
 in progress.

 So clearly that is intentional.

 Well, just because it's intentional, doesn't mean its a good idea
 ;). There very well are reasons to check for in progress non-exclusive
 backups as well. You e.g. wouldn't want to restart the database while
 the weekly base backup of your 1TB database is in progress, just because
 it's done via the replication protocol.

 If we weren't in beta 1 already I'd vote for making it into:
 pg_backup_in_progress(OUT bool exclusive, OUT int non_exclusive) or
 similar. Perhaps we should do that anyway?

 That could use some rephrasing, though; a layman won't know what an
 exclusive backup is.

 True. Although I have to admit I can't come up with a succinct name for
 it right now.

I see that this exact discussion has happened once before, after the
initial commit.

AFAICS nobody likes the fact that pg_is_in_backup() only covers
exclusive backups. The problem seems to be that we can't find a better
term.

But the fact remains that having a function called
pg_is_in_backup() that *clearly* does *not* do what it says is a
problem. Yes, few people will understand what an exclusive backup is,
but that is a very good reason to not split hairs in the definition.

The way to resolve this is to have two functions:

 pg_is_in_backup()  - which covers both/all kinds of backup
 pg_is_in_exclusive_backup() - which covers just the exclusive backup mode

and some clear documentation that explains why the two functions are necessary.
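
For illustration only, a toy standalone model (not the actual backend code,
which would have to read the shared backup counters under the proper lock)
of how the two functions would relate:

#include <stdbool.h>

/* toy model of the shared backup state */
typedef struct BackupState
{
	bool	exclusive_backup;		/* pg_start_backup() style */
	int		non_exclusive_backups;	/* streaming-protocol backups */
} BackupState;

static bool
pg_is_in_backup(const BackupState *s)
{
	/* any kind of base backup in progress */
	return s->exclusive_backup || s->non_exclusive_backups > 0;
}

static bool
pg_is_in_exclusive_backup(const BackupState *s)
{
	/* only the backup_label-on-disk variety */
	return s->exclusive_backup;
}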

Any objections to me committing those changes?

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freezing without write I/O

2013-06-01 Thread Simon Riggs
On 30 May 2013 14:33, Heikki Linnakangas hlinnakan...@vmware.com wrote:
 Since we're bashing around ideas around freezing, let me write down the idea
 I've been pondering and discussing with various people for years. I don't
 think I invented this myself, apologies to whoever did for not giving
 credit.

 The reason we have to freeze is that otherwise our 32-bit XIDs wrap around
 and become ambiguous. The obvious solution is to extend XIDs to 64 bits, but
 that would waste a lot space. The trick is to add a field to the page header
 indicating the 'epoch' of the XID, while keeping the XIDs in tuple header
 32-bit wide (*).

 The other reason we freeze is to truncate the clog. But with 64-bit XIDs, we
 wouldn't actually need to change old XIDs on disk to FrozenXid. Instead, we
 could implicitly treat anything older than relfrozenxid as frozen.

 That's the basic idea. Vacuum freeze only needs to remove dead tuples, but
 doesn't need to dirty pages that contain no dead tuples.

I have to say this is pretty spooky. I'd not read hackers all week, so
I had no idea so many other people were thinking about freezing as
well. This idea is damn near identical to what I've suggested. My
suggestion came because I was looking to get rid of fields in the
tuple header, which didn't come to much. The good news is that this is
complete chance, so it must mean we're on the right track.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freezing without write I/O

2013-06-01 Thread Simon Riggs
On 30 May 2013 19:39, Robert Haas robertmh...@gmail.com wrote:
 On Thu, May 30, 2013 at 9:33 AM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:
 The reason we have to freeze is that otherwise our 32-bit XIDs wrap around
 and become ambiguous. The obvious solution is to extend XIDs to 64 bits, but
 that would waste a lot space. The trick is to add a field to the page header
 indicating the 'epoch' of the XID, while keeping the XIDs in tuple header
 32-bit wide (*).

 Check.

 The other reason we freeze is to truncate the clog. But with 64-bit XIDs, we
 wouldn't actually need to change old XIDs on disk to FrozenXid. Instead, we
 could implicitly treat anything older than relfrozenxid as frozen.

 Check.

 That's the basic idea. Vacuum freeze only needs to remove dead tuples, but
 doesn't need to dirty pages that contain no dead tuples.

 Check.

Yes, this is the critical point. Large insert-only tables don't need
to be completely re-written twice.


 Since we're not storing 64-bit wide XIDs on every tuple, we'd still need to
 replace the XIDs with FrozenXid whenever the difference between the smallest
 and largest XID on a page exceeds 2^31. But that would only happen when
 you're updating the page, in which case the page is dirtied anyway, so it
 wouldn't cause any extra I/O.

 It would cause some extra WAL activity, but it wouldn't dirty the page
 an extra time.

 This would also be the first step in allowing the clog to grow larger than 2
 billion transactions, eliminating the need for anti-wraparound freezing
 altogether. You'd still want to truncate the clog eventually, but it would
 be nice to not be pressed against the wall with run vacuum freeze now, or
 the system will shut down.

 Interesting.  That seems like a major advantage.

 (*) Adding an epoch is inaccurate, but I like to use that as my mental
 model. If you just add a 32-bit epoch field, then you cannot have xids from
 different epochs on the page, which would be a problem. In reality, you
 would store one 64-bit XID value in the page header, and use that as the
 reference point for all the 32-bit XIDs on the tuples. See existing
 convert_txid() function for how that works. Another method is to store the
 32-bit xid values in tuple headers as offsets from the per-page 64-bit
 value, but then you'd always need to have the 64-bit value at hand when
 interpreting the XIDs, even if they're all recent.

 As I see it, the main downsides of this approach are:

 (1) It breaks binary compatibility (unless you do something to
 provided for it, like put the epoch in the special space).

 (2) It consumes 8 bytes per page.  I think it would be possible to get
 this down to say 5 bytes per page pretty easily; we'd simply decide
 that the low-order 3 bytes of the reference XID must always be 0.
 Possibly you could even do with 4 bytes, or 4 bytes plus some number
 of extra bits.

Yes, the idea of having a base Xid on every page is complicated and
breaks compatibility. The same idea can work well if we do this via tuple
headers.


 (3) You still need to periodically scan the entire relation, or else
 have a freeze map as Simon and Josh suggested.

I don't think that is needed with this approach.

(The freeze map was Andres' idea, not mine. I just accepted it as what
I thought was the only way forwards. Now I see other ways)

 The upsides of this approach as compared with what Andres and I are
 proposing are:

 (1) It provides a stepping stone towards allowing indefinite expansion
 of CLOG, which is quite appealing as an alternative to a hard
 shut-down.

I would be against expansion of the CLOG beyond its current size. If
we have removed all aborted rows and marked hints, then we don't need
the CLOG values and can trim that down.

I don't mind the hints, its the freezing we don't need.


 convert_txid() function for how that works. Another method is to store the
 32-bit xid values in tuple headers as offsets from the per-page 64-bit
 value, but then you'd always need to have the 64-bit value at hand when
 interpreting the XIDs, even if they're all recent.

You've touched here on the idea of putting the epoch in the tuple
header, which is where what I posted comes together. We don't need
anything at page level, we just need something on each tuple.

Please can you look at my recent post on how to put this in the tuple header?

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Running pgindent

2013-06-01 Thread Stephen Frost
* Bruce Momjian (br...@momjian.us) wrote:
 OK.

Done.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] detecting binary backup in progress

2013-06-01 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 The way to resolve this is to have two functions:
  pg_is_in_backup()  - which covers both/all kinds of backup
  pg_is_in_exclusive_backup() - which covers just the exclusive backup mode

What will you do with pg_backup_start_time()?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Placing hints in line pointers

2013-06-01 Thread Simon Riggs
Notes on a longer term idea...

An item pointer (also called line pointer) is used to allow an
external pointer to an item, while allowing us to place the tuple
anywhere on the page. An ItemId is 4 bytes long and currently consists
of (see src/include/storage/itemid.h)...

typedef struct ItemIdData
{
	unsigned	lp_off:15,		/* offset to tuple (from start of page) */
				lp_flags:2,		/* state of item pointer, see below */
				lp_len:15;		/* byte length of tuple */
} ItemIdData;

The offset to the tuple is 15 bits, which is sufficient to point to
32768 separate byte positions, and hence why we limit ourselves to
32kB blocks.

If we use 4 byte alignment for tuples, then that would mean we
wouldn't ever use the lower 2 bits of lp_off, nor would we use the
lower 2 bits of lp_len. They are always set at zero. (Obviously, with
8 byte alignment we would have 3 bits spare in each, but I'm looking
for something that works the same on various architectures for
simplicity).

So my suggestion is to make lp_off and lp_len store the values in
terms of 4 byte chunks, which would allow us to rework the data
structure like this...

typedef struct ItemIdData
{
	unsigned	lp_off:13,		/* offset to tuple (from start of page),
								 * number of 4 byte chunks */
				lp_xmin_hint:2,	/* committed and invalid hints for xmin */
				lp_flags:2,		/* state of item pointer, see below */
				lp_len:13,		/* byte length of tuple,
								 * number of 4 byte chunks */
				lp_xmax_hint:2;	/* committed and invalid hints for xmax */
} ItemIdData;

i.e. we have room for 4 additional bits and we use those to put the
tuple hints for xmin and xmax

Doing this would have two purposes:

* We wouldn't need to follow the pointer if the row is marked aborted.
This would save a random memory access for that tuple

* It would isolate the tuple hint values into a smaller area of the
block, so we would be able to avoid the annoyance of recalculating the
checksums for the whole block when a single bit changes.

We wouldn't need to do a FPW when a hint changes, we would only need
to take a copy of the ItemId array, which is much smaller. And it
could be protected by its own checksum.

(In addition, if we wanted, this could be used to extend block size to
64kB if we used 8-byte alignment for tuples)
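
To make the arithmetic concrete, a small standalone sketch of the chunk
conversion (illustrative only, not proposed code), showing that 13 bits
still cover a 32 kB page once offsets and lengths are held in 4-byte chunks:

#include <assert.h>
#include <stdint.h>

#define CHUNK_SIZE		4			/* tuples assumed 4-byte aligned */
#define MAX_PAGE_SIZE	32768

/* byte offset -> 13-bit chunk offset: 32768 / 4 = 8192 = 2^13 values */
static uint16_t
to_chunks(uint32_t byte_off)
{
	assert(byte_off % CHUNK_SIZE == 0 && byte_off < MAX_PAGE_SIZE);
	return (uint16_t) (byte_off / CHUNK_SIZE);
}

/* ... and back to bytes when following the pointer */
static uint32_t
to_bytes(uint16_t chunk_off)
{
	return (uint32_t) chunk_off * CHUNK_SIZE;
}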

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump with postgis extension dumps rules separately

2013-06-01 Thread Joe Conway

On 05/31/2013 08:46 PM, Robert Haas wrote:
 On Wed, May 29, 2013 at 6:55 PM, Joe Conway m...@joeconway.com
 wrote:
 OK, simple enough. New patch attached. I still need to do some 
 testing to verify this does not break anything, but other than 
 that, any complaints (including the notion of backpatching
 this back to 9.1)?
 
 Here's a cleaned up version, which also includes documentation.
 I'll commit back to 9.1 in a day or two unless there are any
 objections.
 
 Changing SQL syntax in the back-branches isn't normally something
 we do, but I confess I don't see any real reason not to do it in
 this case.

That was part of my hesitation, but I don't see any better way to fix
existing installations and this is pretty well self-contained. Any
other opinions out there?

Joe

-- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump with postgis extension dumps rules separately

2013-06-01 Thread Tom Lane
Joe Conway m...@joeconway.com writes:
 On 05/31/2013 08:46 PM, Robert Haas wrote:
 Changing SQL syntax in the back-branches isn't normally something
 we do, but I confess I don't see any real reason not to do it in
 this case.

 That was part of my hesitation, but I don't see any better way to fix
 existing installations and this is pretty well self-contained. Any
 other opinions out there?

I don't like this approach much.

1. It does nothing to fix the issue in *existing* databases, which
won't have pg_depend entries like this.

2. In general, we have assumed that properties of tables, such as
indexes and constraints, cannot be independent members of extensions.
It's not clear to me why rules should be different.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] detecting binary backup in progress

2013-06-01 Thread Joe Conway

On 06/01/2013 02:43 AM, Christoph Moench-Tegeder wrote:
 ## Joe Conway (m...@joeconway.com):
 
 However there is a period of time after pg_start_backup() is
 first executed to when it completes, during which backup_label
 file does not exist yet, but the backup has essentially been
 started. Is there any way to detect this state?
 
 When I did some research on the very same question just a few
 weeks ago, I settled for external locks (lockfile,
 pg_advisory_lock(), ..., depending on your exact situation) around
 the backup-taking code.

Right, and an external lockfile is good except there is a race condition.

Proc1, t0) SELECT pg_start_backup(...) -- command starts to execute

Proc2, t1) lay down a lockfile (or other mechanism)

Proc2, t2) check for running backup by looking for backup_label

Proc1, t3) SELECT pg_start_backup(...) -- command finishes,
   -- backup_label created

So you are forced to sleep for some arbitrary time just in case
pg_start_backup() has started but not completed at the point when you
try to lock out the backup.

Or better (at least my solution): you check pg_stat_activity for
the presence of a running SELECT pg_start_backup(...) command prior
to checking for backup_label. I imagine there is still a small race
condition there but at least much smaller.
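
A hedged sketch of that pg_stat_activity check via libpq (assuming the 9.2+
"query" and "pid" column names; adjust for current_query/procpid on older
servers):

#include <stdlib.h>
#include <libpq-fe.h>

/*
 * Returns 1 if some other backend appears to be inside pg_start_backup(),
 * 0 if not, -1 on error.
 */
static int
start_backup_running(PGconn *conn)
{
	PGresult   *res;
	int			running;

	res = PQexec(conn,
				 "SELECT count(*) FROM pg_stat_activity "
				 "WHERE query LIKE '%pg_start_backup%' "
				 "AND pid <> pg_backend_pid()");
	if (PQresultStatus(res) != PGRES_TUPLES_OK)
	{
		PQclear(res);
		return -1;
	}
	running = atoi(PQgetvalue(res, 0, 0)) > 0;
	PQclear(res);
	return running;
}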


Joe

-- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] detecting binary backup in progress

2013-06-01 Thread Andres Freund
On 2013-06-01 08:11:26 -0700, Joe Conway wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1
 
 On 06/01/2013 02:43 AM, Christoph Moench-Tegeder wrote:
  ## Joe Conway (m...@joeconway.com):
  
  However there is a period of time after pg_start_backup() is
  first executed to when it completes, during which backup_label
  file does not exist yet, but the backup has essentially been
  started. Is there any way to detect this state?
  
  When I did some research on the very same question just a few
  weeks ago, I settled for external locks (lockfile,
  pg_advisory_lock(), ..., depending on your exact situation) around
  the backup-taking code.
 
 Right, and an external lockfile is good except there is a race condition.
 
 Proc1, t0) SELECT pg_start_backup(...) -- command starts to execute
 
 Proc2, t1) lay down a lockfile (or other mechanism)
 
 Proc2, t2) check for running backup by looking for backup_label
 
 Proc1, t3) SELECT pg_start_backup(...) -- command finishes,
-- backup_label created
 
 So you are forced to sleep for some arbitrary time just in case
 pg_start_backup() has started but not completed at the point when you
 try to lock out the backup.

Uh. Why would you do the lock(file) thingy *after* calling
pg_start_backup? You should take the lock before calling start backup and
remove the lock after calling stop backup. In that case I don't see
where the race condition is?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump with postgis extension dumps rules separately

2013-06-01 Thread Andres Freund
On 2013-06-01 11:07:53 -0400, Tom Lane wrote:
 Joe Conway m...@joeconway.com writes:
  On 05/31/2013 08:46 PM, Robert Haas wrote:
  Changing SQL syntax in the back-branches isn't normally something
  we do, but I confess I don't see any real reason not to do it in
  this case.
 
  That was part of my hesitation, but I don't see any better way to fix
  existing installations and this is pretty well self-contained. Any
  other opinions out there?
 
 I don't like this approach much.
 
 1. It does nothing to fix the issue in *existing* databases, which
 won't have pg_depend entries like this.

Well, you can now write an extension upgrade script that adds the
missing dependencies. To me that sounds better than letting it fiddle
with pg_depend.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump with postgis extension dumps rules separately

2013-06-01 Thread Joe Conway

On 06/01/2013 08:07 AM, Tom Lane wrote:
 Joe Conway m...@joeconway.com writes:
 On 05/31/2013 08:46 PM, Robert Haas wrote:
 Changing SQL syntax in the back-branches isn't normally
 something we do, but I confess I don't see any real reason not
 to do it in this case.
 
 That was part of my hesitation, but I don't see any better way to
 fix existing installations and this is pretty well
 self-contained. Any other opinions out there?
 
 I don't like this approach much.
 
 1. It does nothing to fix the issue in *existing* databases, which 
 won't have pg_depend entries like this.

The grammar change does allow the pg_depend entries to be added
through ALTER EXTENSION. It works perfectly.

 2. In general, we have assumed that properties of tables, such as 
 indexes and constraints, cannot be independent members of
 extensions. It's not clear to me why rules should be different.

I can look at having pg_dump ignore these entries, but I suspect that
will be quite a bit more invasive.

Joe

-- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] detecting binary backup in progress

2013-06-01 Thread Joe Conway

On 06/01/2013 08:14 AM, Andres Freund wrote:
 On 2013-06-01 08:11:26 -0700, Joe Conway wrote:
 -BEGIN PGP SIGNED MESSAGE- Hash: SHA1
 
 On 06/01/2013 02:43 AM, Christoph Moench-Tegeder wrote:
 ## Joe Conway (m...@joeconway.com):
 
 However there is a period of time after pg_start_backup() is 
 first executed to when it completes, during which
 backup_label file does not exist yet, but the backup has
 essentially been started. Is there any way to detect this
 state?
 
 When I did some research on the very same question just a few 
 weeks ago, I settled for external locks (lockfile, 
 pg_advisory_lock(), ..., depending on your exact situation)
 around the backup-taking code.
 
 Right, and an external lockfile is good except there is a race
 condition.
 
 Proc1, t0) SELECT pg_start_backup(...) -- command starts to
 execute
 
 Proc2, t1) lay down a lockfile (or other mechanism)
 
 Proc2, t2) check for running backup by looking for backup_label
 
 Proc1, t3) SELECT pg_start_backup(...) -- command finishes, --
 backup_label created
 
 So you are forced to sleep for some arbitrary time just in case 
 pg_start_backup() has started but not completed at the point when
 you try to lock out the backup.
 
 Uh. Why would you do the lock(file) thingy *after* calling 
 pg_start_backup? You should do lock before calling start backup
 and remove the lock after calling stop backup. In that case I don't
 see where the race condition is?

No, the point is I have no control over Proc1. I am trying to prevent
Proc2 from running concurrently with a binary backup (Proc1). So I
need to lock out Proc1, or detect it is running and wait for it to finish.

The problem is that in between t0 and t3 there is no good way to
determine that Proc1 has started its backup because backup_label does
not yet exist, and there is no other evidence besides pg_stat_activity
(which as I said is better than nothing but probably not perfect).

Joe


-- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] detecting binary backup in progress

2013-06-01 Thread Andres Freund
On 2013-06-01 08:27:42 -0700, Joe Conway wrote:
  Uh. Why would you do the lock(file) thingy *after* calling 
  pg_start_backup? You should do lock before calling start backup
  and remove the lock after calling stop backup. In that case I don't
  see where the race condition is?
 
 No, the point is I have no control over Proc1. I am trying to prevent
 Proc2 from running concurrently with a binary backup (Proc1). So I
 need to lock out Proc1, or detect it is running and wait for it to finish.

Backups over which you don't have control sound a bit scary ;). I think
at that point you have a race condition no matter what since the backup
could be started between your check and when you call pg_start_backup
anyay. So just calling pg_start_backup and handling the error properly
sounds like the way to go in that case.

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump with postgis extension dumps rules separately

2013-06-01 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2013-06-01 11:07:53 -0400, Tom Lane wrote:
 I don't like this approach much.
 
 1. It does nothing to fix the issue in *existing* databases, which
 won't have pg_depend entries like this.

 Well, you can now write an extension upgrade script that adds the
 missing dependencies. To me that sounds better than letting it fiddle
 with pg_depend.

Per my point #2, that would be the wrong solution, quite aside from the
wrongness of dumping the fixup burden on the extension author.  For one
thing, the extension author has no idea whether his script is being
loaded into a database that has this patch.  If it doesn't, adding a
command like this would cause the script to fail outright.  If it does,
then the command is unnecessary, since the patch also includes a code
change that adds the dependency.

But in any case, making rules act differently from other table
properties for this purpose seems seriously wrong.

regards, tom lane




Re: [HACKERS] pg_dump with postgis extension dumps rules separately

2013-06-01 Thread Andres Freund
On 2013-06-01 11:31:05 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2013-06-01 11:07:53 -0400, Tom Lane wrote:
  I don't like this approach much.
  
  1. It does nothing to fix the issue in *existing* databases, which
  won't have pg_depend entries like this.
 
  Well, you can now write an extension upgrade script that adds the
  missing dependencies. To me that sounds better than letting it fiddle
  with pg_depend.
 
 Per my point #2, that would be the wrong solution, quite aside from the
 wrongness of dumping the fixup burden on the extension author.  For one
 thing, the extension author has no idea whether his script is being
 loaded into a database that has this patch.  If it doesn't, adding a
 command like this would cause the script to fail outright.  If it does,
 then the command is unnecessary, since the patch also includes a code
 change that adds the dependency.

 But in any case, making rules act differently from other table
 properties for this purpose seems seriously wrong.

What's your proposal to fix this situation then?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pg_dump with postgis extension dumps rules separately

2013-06-01 Thread Joe Conway

On 06/01/2013 08:32 AM, Andres Freund wrote:
 On 2013-06-01 11:31:05 -0400, Tom Lane wrote:
 But in any case, making rules act differently from other table 
 properties for this purpose seems seriously wrong.
 
 What's your proposal to fix this situation then?

I gather Tom would rather see this handled by filtering in pg_dump.

Joe

-- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support




Re: [HACKERS] pg_dump with postgis extension dumps rules separately

2013-06-01 Thread Tom Lane
Joe Conway m...@joeconway.com writes:
 I can look at having pg_dump ignore these entries, but I suspect that
 will be quite a bit more invasive.

Actually, I believe the answer is just that getSchemaData() is doing
things in the wrong order:

    if (g_verbose)
        write_msg(NULL, "reading rewrite rules\n");
    getRules(fout, &numRules);

    /*
     * Identify extension member objects and mark them as not to be dumped.
     * This must happen after reading all objects that can be direct members
     * of extensions, but before we begin to process table subsidiary objects.
     */
    if (g_verbose)
        write_msg(NULL, "finding extension members\n");
    getExtensionMembership(fout, extinfo, numExtensions);

Per that comment, getRules() should be called down where indexes,
constraints, and triggers are processed.
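
In other words, roughly this shape (an abridged sketch, not the actual
diff; only the order of the existing calls changes):

    if (g_verbose)
        write_msg(NULL, "finding extension members\n");
    getExtensionMembership(fout, extinfo, numExtensions);

    ... other table subsidiary objects ...

    if (g_verbose)
        write_msg(NULL, "reading triggers\n");
    getTriggers(fout, tblinfo, numTables);

    if (g_verbose)
        write_msg(NULL, "reading rewrite rules\n");
    getRules(fout, &numRules);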

regards, tom lane




Re: [HACKERS] pg_dump with postgis extension dumps rules separately

2013-06-01 Thread Tom Lane
I wrote:
 Actually, I believe the answer is just that getSchemaData() is doing
 things in the wrong order:

BTW, I'm inclined to think it's also wrong that the getEventTriggers()
call was just added at the end; those things are certainly not table
subsidiary objects.  I don't know if we allow event triggers to be
extension members, but if we did, that call would have to occur before
getExtensionMembership().

regards, tom lane




Re: [HACKERS] pg_dump with postgis extension dumps rules separately

2013-06-01 Thread Joe Conway

On 06/01/2013 09:39 AM, Tom Lane wrote:
 I wrote:
 Actually, I believe the answer is just that getSchemaData() is
 doing things in the wrong order:
 
 BTW, I'm inclined to think it's also wrong that the
 getEventTriggers() call was just added at the end; those things are
 certainly not table subsidiary objects.  I don't know if we allow
 event triggers to be extension members, but if we did, that call
 would have to occur before getExtensionMembership().

Thanks! I'll have a look.

Joe


-- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support




Re: [HACKERS] Vacuum, Freeze and Analyze: the big picture

2013-06-01 Thread Simon Riggs
On 30 May 2013 19:48, Josh Berkus j...@agliodbs.com wrote:

 There's currently some great ideas bouncing around about eliminating the
 overhead associated with FREEZE.  However, I wanted to take a step back
 and take a look at the big picture for VACUUM, FREEZE and ANALYZE.

That is a very commendable approach. We should do that more often.

 The big, big picture is this:

90% of our users need to think about VACUUM/ANALYZE
at least 10% of the time
and 10% of our users need to think about it
almost 90% of the time.

When you say stuff like that, you should add "speculating from my
personal experience". People might get the impression you'd measured
this somehow, and it could confuse the issue if you try to assemble a
high-level viewpoint and then add in factoids that are just opinions.

We should strive to measure such things.

 That's considerably better than was the case 5 years ago, when vacuum
 management was a daily or weekly responsibility for nearly 100% of our
 users, but it's still not good enough.  Our target should be that only
 those with really unusual setups should have to *ever* think about
 vacuum and analyze.

I think that's where we already are given that 1000s of users have
quite small databases.

The problem increases with scale. Larger databases have bigger
problems and make it easier to notice things are happening.

I think you should mention that the evidence for these issues is
anecdotal and take careful notes of the backgrounds in which they
occurred. Saying things occur in all cases wouldn't be accurate or
helpful to their resolution.

We should be seeking to contrast this against other databases to see
if we are better or worse than other systems. For example, recording
the moans of someone who is currently managing a 1 TB database, yet
hasn't ever managed anything else that big, is less valuable than a
balanced, experienced viewpoint (if such exists).

Anyway, I support this approach, just wanted to make sure we do it in
sufficient detail to be useful.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] getting rid of freezing

2013-06-01 Thread Simon Riggs
On 28 May 2013 15:15, Robert Haas robertmh...@gmail.com wrote:
 On Sat, May 25, 2013 at 6:14 AM, Simon Riggs si...@2ndquadrant.com wrote:

 I think the right way is actually to rethink and simplify all this
 complexity of Freezing/Pruning/Hinting/Visibility

 I agree, but I think that's likely to have to wait until we get a
 pluggable storage API, and then a few years beyond that for someone to
 develop the technology to enable the new and better way.  In the
 meantime, if we can eliminate or even reduce the impact of freezing in
 the near term, I think that's worth doing.

I think we can do better more quickly than that.

Andres' basic idea of skipping freeze completely was a valuable one
and is the right way forward. And it looks like the epoch-based
approach that Heikki and I have come up with seems likely to end up
somewhere workable.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] detecting binary backup in progress

2013-06-01 Thread Simon Riggs
On 1 June 2013 15:45, Tom Lane t...@sss.pgh.pa.us wrote:
 Simon Riggs si...@2ndquadrant.com writes:
 The way to resolve this is to have two functions:
  pg_is_in_backup()  - which covers both/all kinds of backup
  pg_is_in_exclusive_backup() - which covers just the exclusive backup mode

 What will you do with pg_backup_start_time()?

Hmm, at least all of those functions use the "backup" name
consistently. I guess I wasn't suggesting we rename pg_start_backup()
to pg_start_exclusive_backup(), so maybe it makes sense.

pg_start_backup() talks about an online backup, while
pg_is_in_backup() talks about an exclusive backup. Minimum change here
would be to make pg_start_backup talk about an exclusive backup also.

What we need is a function that says whether or not it is possible to
shut down because of a backup. pg_basebackup is an active task,
whereas an exclusive backup never is. So we need a function to tell us
that although nothing else is running, we are still running an
exclusive backup. Changing pg_is_in_backup() to refer to all kinds of
backup still allows it to be used for its primary purpose - to tell
whether it's OK to shut down or not - but it also makes it clearer.
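
For illustration, a shutdown script could then check something like
this (pg_is_in_exclusive_backup() is only the name proposed above, not
an existing function; the other two are the existing 9.3 functions):

    SELECT pg_is_in_backup()            AS some_backup_running,
           pg_is_in_exclusive_backup()  AS exclusive_backup_running,
           pg_backup_start_time()       AS exclusive_backup_started;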

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Combo xids

2013-06-01 Thread Heikki Linnakangas

On 01.06.2013 11:22, Simon Riggs wrote:

What is the benefit?
Merging xmin/xmax would save 4 bytes per row. On servers with 8 byte
word length, that means that we'd save 8 bytes per row for tables that
have between 9 and 40 columns. Which is the majority of tables.


Hmm, it would probably be much easier to squeeze, say, one byte from the
tuple header, than a full four bytes. Then you could store a null
bitmap for up to 16 columns, without crossing the 24-byte mark. That
would already get you much of the benefit.
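
(Rough arithmetic behind those numbers, assuming 8-byte MAXALIGN and
rows that actually carry a null bitmap:

    23-byte header -> t_hoff 24; the 1 spare byte holds a bitmap for
                      up to 8 columns
    22-byte header -> t_hoff still 24; 2 spare bytes cover up to 16
                      columns
    19-byte header -> t_hoff still 24; 5 spare bytes cover up to 40
                      columns, which is where the 9..40 column range
                      above comes from, since 9 or more columns
                      currently push t_hoff to 32.)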


- Heikki




Re: [HACKERS] Combo xids

2013-06-01 Thread Simon Riggs
On 1 June 2013 19:25, Heikki Linnakangas hlinnakan...@vmware.com wrote:

 Hmm, it would probably be much easier to squeeze, say, one byte from the
 tuple header, than a full four bytes. Then you could store a null bitmap
 for up to 16 columns, without crossing the 24-byte mark. That would already
 get you much of the benefit.

It seemed worth recording the idea, though I agree that now it's
recorded it's not the best idea for reducing table size.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [BUGS] COPY .... (FORMAT binary) syntax doesn't work

2013-06-01 Thread Simon Riggs
On 27 May 2013 15:31, Tom Lane t...@sss.pgh.pa.us wrote:
 Simon Riggs si...@2ndquadrant.com writes:
 On 26 May 2013 17:10, Tom Lane t...@sss.pgh.pa.us wrote:
 More readable would be to invent an intermediate nonterminal falling
 between ColId and ColLabel, whose expansion would be IDENT |
 unreserved_keyword | col_name_keyword | type_func_name_keyword, and
 then replace ColId_or_Sconst with whatever-we-call-that_or_Sconst.
 Any thoughts about a name for that new nonterminal?

 Do you think complicating the parser in that way is worth the trouble
 for this case? Could that slow down parsing?

 It makes the grammar tables a bit larger (1% or so IIRC).  There would
 be some distributed penalty for that, but probably not much.  Of course
 there's always the slippery-slope argument about that.

 We don't actually have to fix it; clearly not too many people are
 bothered, since no complaints in 3 years. Documenting 'binary' seems
 better.

 Well, my thought is there are other cases.  For instance:

 regression=# create role binary;
 ERROR:  syntax error at or near "binary"
 LINE 1: create role binary;
                     ^
 regression=# create user cross;
 ERROR:  syntax error at or near "cross"
 LINE 1: create user cross;
                     ^

 If we don't have to treat type_func_name_keywords as reserved in these
 situations, shouldn't we avoid doing so?


Seems a reasonable argument, so +1. Sorry for the delayed reply.
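
For reference, the new nonterminal Tom describes would look something
like this in gram.y (the name is a placeholder only; the thread has
not settled on one):

    NonReservedWord:
          IDENT                     { $$ = $1; }
        | unreserved_keyword        { $$ = pstrdup($1); }
        | col_name_keyword          { $$ = pstrdup($1); }
        | type_func_name_keyword    { $$ = pstrdup($1); }
        ;

    NonReservedWord_or_Sconst:
          NonReservedWord           { $$ = $1; }
        | Sconst                    { $$ = $1; }
        ;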

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Freezing without write I/O

2013-06-01 Thread Heikki Linnakangas

On 31.05.2013 06:02, Robert Haas wrote:

On Thu, May 30, 2013 at 2:39 PM, Robert Haasrobertmh...@gmail.com  wrote:

Random thought: Could you compute the reference XID based on the page
LSN?  That would eliminate the storage overhead.


After mulling this over a bit, I think this is definitely possible.
We begin a new half-epoch every 2 billion transactions.  We remember
the LSN at which the current half-epoch began and the LSN at which the
previous half-epoch began.  When a new half-epoch begins, the first
backend that wants to stamp a tuple with an XID from the new
half-epoch must first emit a new half-epoch WAL record, which
becomes the starting LSN for the new half-epoch.


Clever! Pages in unlogged tables need some extra treatment, as they 
don't normally have a valid LSN, but that shouldn't be too hard.



We define a new page-level bit, something like PD_RECENTLY_FROZEN.
When this bit is set, it means there are no unfrozen tuples on the
page with XIDs that predate the current half-epoch.  Whenever we know
this to be true, we set the bit.  If the page LSN crosses more than
one half-epoch boundary at a time, we freeze the page and set the bit.
  If the page LSN crosses exactly one half-epoch boundary, then (1) if
the bit is set, we clear it and (2) if the bit is not set, we freeze
the page and set the bit.


Yep, I think that would work. Want to write the patch, or should I? ;-)

- Heikki




Re: [HACKERS] Freezing without write I/O

2013-06-01 Thread Simon Riggs
On 1 June 2013 19:48, Heikki Linnakangas hlinnakan...@vmware.com wrote:
 On 31.05.2013 06:02, Robert Haas wrote:

 On Thu, May 30, 2013 at 2:39 PM, Robert Haasrobertmh...@gmail.com
 wrote:

 Random thought: Could you compute the reference XID based on the page
 LSN?  That would eliminate the storage overhead.


 After mulling this over a bit, I think this is definitely possible.
 We begin a new half-epoch every 2 billion transactions.  We remember
 the LSN at which the current half-epoch began and the LSN at which the
 previous half-epoch began.  When a new half-epoch begins, the first
 backend that wants to stamp a tuple with an XID from the new
 half-epoch must first emit a new half-epoch WAL record, which
 becomes the starting LSN for the new half-epoch.


 Clever! Pages in unlogged tables need some extra treatment, as they don't
 normally have a valid LSN, but that shouldn't be too hard.

I like the idea of using the LSN to indicate the epoch. It saves any
other work we might consider, such as setting page or tuple level
epochs.


 We define a new page-level bit, something like PD_RECENTLY_FROZEN.
 When this bit is set, it means there are no unfrozen tuples on the
 page with XIDs that predate the current half-epoch.  Whenever we know
 this to be true, we set the bit.  If the page LSN crosses more than
 one half-epoch boundary at a time, we freeze the page and set the bit.
   If the page LSN crosses exactly one half-epoch boundary, then (1) if
 the bit is set, we clear it and (2) if the bit is not set, we freeze
 the page and set the bit.


 Yep, I think that would work. Want to write the patch, or should I? ;-)

If we set a bit, surely we need to write the page. Isn't that what we
were trying to avoid?

Why set a bit at all? If we know the LSN of the page and we know the
epoch boundaries, then we can work out when the page was last written
to and infer that the page is virtually frozen.

As soon as we make a change to a virtually frozen page, we can
actually freeze it and then make the change.

But we still have the problem of knowing which pages have been frozen
and which haven't.

Can we clear up those points first? Or at least my understanding of them.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Freezing without write I/O

2013-06-01 Thread Robert Haas
On Sat, Jun 1, 2013 at 2:48 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 We define a new page-level bit, something like PD_RECENTLY_FROZEN.
 When this bit is set, it means there are no unfrozen tuples on the
 page with XIDs that predate the current half-epoch.  Whenever we know
 this to be true, we set the bit.  If the page LSN crosses more than
 one half-epoch boundary at a time, we freeze the page and set the bit.
   If the page LSN crosses exactly one half-epoch boundary, then (1) if
 the bit is set, we clear it and (2) if the bit is not set, we freeze
 the page and set the bit.

 Yep, I think that would work. Want to write the patch, or should I? ;-)

Have at it.  I think the tricky part is going to be figuring out the
synchronization around half-epoch boundaries.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Freezing without write I/O

2013-06-01 Thread Robert Haas
On Sat, Jun 1, 2013 at 3:22 PM, Simon Riggs si...@2ndquadrant.com wrote:
 If we set a bit, surely we need to write the page. Isn't that what we
 were trying to avoid?

No, the bit only gets set in situations when we were going to dirty
the page for some other reason anyway.  Specifically, if a page
modification discovers that we've switched epochs (but just once) and
the bit isn't already set, we can set it in lieu of scanning the
entire page for tuples that need freezing.

Under this proposal, pages that don't contain any dead tuples needn't
be dirtied for freezing, ever.  Smells like awesome.
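
A rough pseudocode rendering of that rule, for concreteness
(PD_RECENTLY_FROZEN, half_epochs_crossed_since() and
freeze_all_tuples() are invented names for the pieces discussed in
this thread, not existing symbols; assume the caller is already
dirtying the page):

    #include "storage/bufpage.h"

    static void
    maybe_freeze_on_modify(Page page)
    {
        PageHeader  ph = (PageHeader) page;
        int         crossed = half_epochs_crossed_since(PageGetLSN(page));

        if (crossed > 1)
        {
            freeze_all_tuples(page);            /* whole-page freeze */
            ph->pd_flags |= PD_RECENTLY_FROZEN;
        }
        else if (crossed == 1)
        {
            if (ph->pd_flags & PD_RECENTLY_FROZEN)
                ph->pd_flags &= ~PD_RECENTLY_FROZEN;
            else
            {
                freeze_all_tuples(page);
                ph->pd_flags |= PD_RECENTLY_FROZEN;
            }
        }
        /* crossed == 0: LSN is within the current half-epoch, nothing to do */
    }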

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Optimising Foreign Key checks

2013-06-01 Thread Noah Misch
On Sat, Jun 01, 2013 at 09:41:13AM +0100, Simon Riggs wrote:
 FK checks can be expensive, especially when loading large volumes of
 data into an existing table or partition. A couple of ideas for
 improving performance are discussed here:
 
 1. Use Case: Bulk loading
 COPY pgbench_accounts;  -- references pgbench_branches with many
 repeated values
 
 Proposal: Transactions that need multiple checks can be optimised by
 simply LOCKing the whole referenced table, once. We can then hold the
 referenced table as a Hash, like we do with a Hash Join (its almost
 exactly the same thing). This works in two ways: it speeds up checks
 and it also reduces the locking overhead.

 2. Use Case: Transactional repetition
 BEGIN;
 INSERT INTO order VALUES (ordid, )
 INSERT INTO order_line VALUES (ordid, 1, .)
 INSERT INTO order_line VALUES (ordid, 2, .)
 INSERT INTO order_line VALUES (ordid, 3, .)
 INSERT INTO order_line VALUES (ordid, 4, .)

 Proposal: De-duplicate multiple checks against same value. This would
 be implemented by keeping a hash of rows that we had already either
 inserted and/or locked as the transaction progresses, so we can use
 the hash to avoid queuing up after triggers.

I find (2) more promising than (1).  It helps automatically, and it helps in
more cases.  The main advantage of (1) is avoiding the writes of tuple locks
onto the PK table.  Therefore, I expect (1) to distinguish itself over (2)
when the referenced values are _not_ repeated too much.  If referenced values
are repeated, tuple locking costs would tend to disappear into the noise after
the deduplication of (2).
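
For concreteness, the per-transaction hash in (2) could be little more
than a dynahash table keyed by (constraint, key value).  A sketch only:
the key layout and the call site in the after-trigger queuing code are
assumptions, a single pass-by-value key column is assumed, and real
code would allocate the table in a per-transaction context and flush
it on the events discussed below:

    #include "postgres.h"
    #include "access/hash.h"        /* tag_hash */
    #include "utils/hsearch.h"

    typedef struct RIDedupKey
    {
        Oid     constraintOid;      /* which FK constraint */
        Datum   keyValue;           /* single pass-by-value key column */
    } RIDedupKey;

    static HTAB *ri_dedup_hash = NULL;

    /* Return true if this key was already checked in this transaction. */
    static bool
    ri_key_already_checked(Oid constraintOid, Datum keyValue)
    {
        RIDedupKey  key;
        HASHCTL     ctl;
        bool        found;

        if (ri_dedup_hash == NULL)
        {
            MemSet(&ctl, 0, sizeof(ctl));
            ctl.keysize = sizeof(RIDedupKey);
            ctl.entrysize = sizeof(RIDedupKey);
            ctl.hash = tag_hash;
            ri_dedup_hash = hash_create("RI dedup keys", 256, &ctl,
                                        HASH_ELEM | HASH_FUNCTION);
        }

        MemSet(&key, 0, sizeof(key));   /* zero padding for stable hashing */
        key.constraintOid = constraintOid;
        key.keyValue = keyValue;
        hash_search(ri_dedup_hash, &key, HASH_ENTER, &found);
        return found;                   /* true => skip queuing the check */
    }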

 In both cases we are building up a local hash table with values and
 then using those values to avoid queuing constraint triggers. So code
 is similar for both.
 
 Thoughts?

Will this add too much cost where it doesn't help?  I don't know what to
predict there.  There's the obvious case of trivial transactions with no more
than one referential integrity check per FK, but there's also the case of a
transaction with many FK checks all searching different keys.  If the hash hit
rate (key duplication rate) is low, the hash can consume considerably more
memory than the trigger queue without preventing many RI queries.  What sort
of heuristic could we use to avoid pessimizing such cases?

Same-transaction UPDATE or DELETE of the PK table, as well as subtransaction
abort, potentially invalidates hash entries.  I recommend thinking relatively
early about how best to handle that.


Before deciding what to think overall, I needed to see a benchmark.  I ran
this simple one based on your scenarios:

BEGIN;
CREATE TABLE "order" (orderid int PRIMARY KEY);
CREATE TABLE order_line (
    orderid int,
    lineno int,
    PRIMARY KEY (orderid, lineno),
    FOREIGN KEY (orderid) REFERENCES "order"
);
INSERT INTO "order" VALUES (1);
INSERT INTO order_line SELECT 1, n FROM generate_series(1,100) t(n);
ROLLBACK;

See attached output from perf report -s parent -g graph,5,caller; I suggest
browsing under less -S.  It confirms that the expensive part is something
your proposals would address.


A different way to help the bulk loading case would be to lock more keys with
a single query.  Currently, we issue a query like this for every INSERTed row:

  SELECT 1 FROM ONLY pktable WHERE pkcol = $1 FOR KEY SHARE OF x

We could instead consider queued tuples in batches:

  SELECT 1 FROM ONLY pktable WHERE pkcol = ANY (ARRAY[$1,$2,...,$100]) FOR KEY 
SHARE OF x

This would have the advantage of not adding a long-lived, potentially-large
data structure and not depending on the rate of referenced key duplication.
But it would not help non-bulk scenarios.  (However, the user could make your
example for (2) become a bulk scenario by deferring the FK constraint.)
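
To spell that last point out (illustrative names; the constraint must
have been declared DEFERRABLE for this to work, which the benchmark
schema above does not do):

    BEGIN;
    SET CONSTRAINTS order_line_orderid_fkey DEFERRED;
    INSERT INTO order_line VALUES (1, 1), (1, 2), (1, 3), (1, 4);
    COMMIT;   -- all FK checks fire here, where batching could apply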

Thanks,
nm

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com
# Events: 20K cycles
#
# Overhead  Parent symbol
# ........  .............
#
   100.00%  [other]
            |
            --- (nil)
                |
                --96.88%-- __libc_start_main
                           generic_start_main.isra.0
                           |
                           --96.77%-- main
                                      PostmasterMain
                                      |
                                      --95.62%-- PostgresMain
                                                 |
                                                 --95.51%-- PortalRun
                                                            PortalRunMulti
                                                            |
                                                            --95.50%-- ProcessQuery
                                                                       |
                                                                       |--71.37%-- standard_ExecutorFinish

Re: [HACKERS] Vacuum, Freeze and Analyze: the big picture

2013-06-01 Thread Robert Haas
On Sat, Jun 1, 2013 at 1:49 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On 30 May 2013 19:48, Josh Berkus j...@agliodbs.com wrote:

 There's currently some great ideas bouncing around about eliminating the
 overhead associated with FREEZE.  However, I wanted to take a step back
 and take a look at the big picture for VACUUM, FREEZE and ANALYZE.

 That is a very commendable approach. We should do that more often.

 The big, big picture is this:

90% of our users need to think about VACUUM/ANALYZE
at least 10% of the time
and 10% of our users need to think about it
almost 90% of the time.

 When you say stuff like that, you should add "speculating from my
 personal experience". People might get the impression you'd measured
 this somehow, and it could confuse the issue if you try to assemble a
 high-level viewpoint and then add in factoids that are just opinions.

 We should strive to measure such things.

 That's considerably better than was the case 5 years ago, when vacuum
 management was a daily or weekly responsibility for nearly 100% of our
 users, but it's still not good enough.  Our target should be that only
 those with really unusual setups should have to *ever* think about
 vacuum and analyze.

 I think that's where we already are given that 1000s of users have
 quite small databases.

 The problem increases with scale. Larger databases have bigger
 problems and make it easier to notice things are happening.

 I think you should mention that the evidence for these issues is
 anecdotal and take careful notes of the backgrounds in which they
 occurred. Saying things occur in all cases wouldn't be accurate or
 helpful to their resolution.

 We should be seeking to contrast this against other databases to see
 if we are better or worse than other systems. For example, recording
 the moans of someone who is currently managing a 1 TB database, yet
 hasn't ever managed anything else that big, is less valuable than a
 balanced, experienced viewpoint (if such exists).

 Anyway, I support this approach, just wanted to make sure we do it in
 sufficient detail to be useful.

I agree with all that.  I don't have any data either, but AFAICT it
seems to mostly be a problem for large (terabyte-scale)
databases, or ones that are dreadfully short of I/O bandwidth.  AWS,
I'm looking at you.

It would be interesting to make a list of what other issues people
have seen using PostgreSQL on very large data sets.  Complaints I've
heard include:

1. Inexplicable failure of the planner to use indexes on very large
tables, preferring an obviously-stupid sequential scan.  This might be
fixed by the latest index-size fudge factor work.

2. Lack of concurrent DDL.

On VACUUM and ANALYZE specifically, I'd have to say that the most
common problems I encounter are (a) anti-wraparound vacuums kicking in
at inconvenient times and eating up too many resources and (b) users
making ridiculous settings changes to avoid the problems caused by
anti-wraparound vacuums kicking in at inconvenient times and eating up
too many resources.  The changes we've been discussing elsewhere may
not completely solve this problem, because we'll still have to read
all pages that aren't yet all-visible... but they should surely help.
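
The tables about to attract an anti-wraparound vacuum are at least
easy to spot ahead of time; a rough query (200 million being the
default autovacuum_freeze_max_age):

    SELECT c.oid::regclass AS table_name,
           age(c.relfrozenxid) AS xid_age
    FROM pg_class c
    WHERE c.relkind = 'r'
    ORDER BY age(c.relfrozenxid) DESC
    LIMIT 20;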

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Freezing without write I/O

2013-06-01 Thread Simon Riggs
On 1 June 2013 21:26, Robert Haas robertmh...@gmail.com wrote:
 On Sat, Jun 1, 2013 at 3:22 PM, Simon Riggs si...@2ndquadrant.com wrote:
 If we set a bit, surely we need to write the page. Isn't that what we
 were trying to avoid?

 No, the bit only gets set in situations when we were going to dirty
 the page for some other reason anyway.  Specifically, if a page
 modification discovers that we've switched epochs (but just once) and
 the bit isn't already set, we can set it in lieu of scanning the
 entire page for tuples that need freezing.

 Under this proposal, pages that don't contain any dead tuples needn't
 be dirtied for freezing, ever.  Smells like awesome.

Agreed, well done both.

What I especially like about it is how little logic it will require,
and no page format changes.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

