Re: [HACKERS] reprise: pretty print viewdefs

2011-12-27 Thread Andrew Dunstan



On 12/22/2011 06:14 PM, Andrew Dunstan wrote:



On 12/22/2011 06:11 PM, Andrew Dunstan wrote:



On 12/22/2011 02:17 PM, Andrew Dunstan wrote:



On 12/22/2011 01:05 PM, Tom Lane wrote:

Maybe, though I fear it might complicate the ruleutils code a bit.
You'd probably have to build the output for a column first and then
see how long it is before deciding whether to insert a newline.

In short, I don't mind trying to make this work better, but I think it
will take more work than a two-line patch.




OK. Let me whip something up. I had already come to the conclusion 
you did about how best to do this.





Here's a WIP patch. At least it's fairly localized, but as expected 
it's rather more than 2 lines :-). Sample output:


regression=# \pset format unaligned
Output format is unaligned.
regression=# select pg_get_viewdef('shoelace',true);
pg_get_viewdef
 SELECT s.sl_name, s.sl_avail, s.sl_color, s.sl_len, s.sl_unit,
s.sl_len * u.un_fact AS sl_len_cm
   FROM shoelace_data s, unit u
  WHERE s.sl_unit = u.un_name;
(1 row)
regression=#


I just had an idea. We could invent a flavor of pg_get_viewdef() that 
takes an int as its second param, specifying the wrap width. For the 
boolean case as above, a width of 80 would be implied, and 0 would mean 
always wrap. psql could be made to default to the window width, or maybe 
the window width - 1, but we could provide a psql setting that would 
override it.





This time with patch.





Updated, with docs and tests. Since the docs mark the versions of 
pg_get_viewdef() that take text as the first param as deprecated, I 
removed that variant of the new flavor. I left adding extra psql support 
for another day - I think this already does a good job of cleaning 
things up without any further adjustments.


I'll add this to the upcoming commitfest.

cheers

andrew
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 2e06346..0418591 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -13758,7 +13758,8 @@ SELECT pg_type_is_visible('myschema.widget'::regtype);
  <row>
 <entry><literal><function>pg_get_viewdef(<parameter>view_name</parameter>, <parameter>pretty_bool</>)</function></literal></entry>
 <entry><type>text</type></entry>
-   <entry>get underlying <command>SELECT</command> command for view (<emphasis>deprecated</emphasis>)</entry>
+   <entry>get underlying <command>SELECT</command> command for view, 
+  lines with fields are wrapped to 80 columns if pretty_bool is true (<emphasis>deprecated</emphasis>)</entry>
   </row>
  <row>
 <entry><literal><function>pg_get_viewdef(<parameter>view_oid</parameter>)</function></literal></entry>
@@ -13768,7 +13769,14 @@ SELECT pg_type_is_visible('myschema.widget'::regtype);
  <row>
 <entry><literal><function>pg_get_viewdef(<parameter>view_oid</parameter>, <parameter>pretty_bool</>)</function></literal></entry>
 <entry><type>text</type></entry>
-   <entry>get underlying <command>SELECT</command> command for view</entry>
+   <entry>get underlying <command>SELECT</command> command for view, 
+  lines with fields are wrapped to 80 columns if pretty_bool is true</entry>
+  </row>
+  <row>
+   <entry><literal><function>pg_get_viewdef(<parameter>view_oid</parameter>, <parameter>wrap_int</>)</function></literal></entry>
+   <entry><type>text</type></entry>
+   <entry>get underlying <command>SELECT</command> command for view, 
+  wrapping lines with fields as specified, pretty printing is implied</entry>
+  </row>
   </row>
  <row>
 <entry><literal><function>pg_options_to_table(<parameter>reloptions</parameter>)</function></literal></entry>
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 29df748..5b774f0 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -73,6 +73,8 @@
 #define PRETTYFLAG_PAREN		1
 #define PRETTYFLAG_INDENT		2
 
+#define PRETTY_WRAP_DEFAULT 79
+
 /* macro to test if pretty action needed */
 #define PRETTY_PAREN(context)	((context)->prettyFlags & PRETTYFLAG_PAREN)
 #define PRETTY_INDENT(context)	((context)->prettyFlags & PRETTYFLAG_INDENT)
@@ -136,6 +138,7 @@ static SPIPlanPtr plan_getrulebyoid = NULL;
 static const char *query_getrulebyoid = "SELECT * FROM pg_catalog.pg_rewrite WHERE oid = $1";
 static SPIPlanPtr plan_getviewrule = NULL;
 static const char *query_getviewrule = "SELECT * FROM pg_catalog.pg_rewrite WHERE ev_class = $1 AND rulename = $2";
+static int pretty_wrap = PRETTY_WRAP_DEFAULT;
 
 /* GUC parameters */
 bool		quote_all_identifiers = false;
@@ -381,6 +384,23 @@ pg_get_viewdef_ext(PG_FUNCTION_ARGS)
 }
 
 Datum
+pg_get_viewdef_wrap(PG_FUNCTION_ARGS)
+{
+	/* By OID */
+	Oid			viewoid = PG_GETARG_OID(0);
+	int		wrap = PG_GETARG_INT32(1);
+	int			prettyFlags;
+	char   *result;
+
+	/* calling this implies we want pretty printing */
+	prettyFlags = PRETTYFLAG_PAREN | PRETTYFLAG_INDENT;
+	pretty_wrap = wrap;
+	result = pg_get_viewdef_worker(viewoid, prettyFlags);
+	pretty_wrap = PRETTY_WRAP_DEFAULT;
+	PG_RETURN_TEXT_P(string_to_text(result));
+}
+
+Datum
 

Re: [HACKERS] Standalone synchronous master

2011-12-27 Thread Alexander Björnhagen
Okay,

Here’s version 3 then, which piggy-backs on the existing flag:

synchronous_commit = on | off | local | fallback

Where “fallback” now means “fall back from sync replication when no
(suitable) standbys are connected”.

This was done on input from Guillaume Lelarge.

 That said, I agree it's not necessarily reasonable to try to defend
 against that in a two node cluster.

That’s what I’ve been trying to say all along, but I didn’t give enough
context before, so I understand why we took a turn there.

You can always walk up to any setup and say “hey, if you nuke that
site from orbit and crash that other thing, and ...” ;) I’m just
kidding of course but you get the point. Nothing is absolute.

And so we get back to the three likelihoods in our two-node setup:

1. The master fails
  - Okay, promote the standby

2. The standby fails
  - Okay, the system still works but you no longer have data
redundancy. Deal with it.

3. Both fail, together or one after the other.

I’ve stated that 1 and 2 together cover way more than 99.9% of what’s
expected in my setup on any given day.

But 3. is what we’ve been talking about ... And well in that case
there is no reason to just go ahead and promote a standby because,
granted, it could be lagging behind if the master decided to switch to
standalone mode just before going down itself.

As long as you do not prematurely or rather instinctively promote the
standby when it has *possibly* lagged behind, you’re good and there is
no risk of data loss. The data might be sitting on a crashed or
otherwise unavailable master, but it’s not lost. Promoting the standby
however is basically saying “forget the master and its data, continue
from where the standby is currently at”.

Now, granted, this is operationally harder/more complicated than just
synchronous replication where you can always, in any case, just
promote the standby after a master failure, knowing that all data is
guaranteed to be replicated.

 I'm worried that the interface seems a bit
 fragile and that it's hard to be sure.

With this setup, you can’t promote the standby without first checking
if the replication link was disconnected prior to the master failure.

For me, the benefits outweigh this one drawback because I get more
standby replication guarantee than async replication and more master
availability than sync replication in the most plausible outcomes.

Cheers,

/A


sync-standalone-v3.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CLOG contention

2011-12-27 Thread Simon Riggs
On Sat, Dec 24, 2011 at 9:25 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Thu, Dec 22, 2011 at 4:20 PM, Robert Haas robertmh...@gmail.com wrote:

 Also, if it is that, what do we do about it?  I don't think any of the
 ideas proposed so far are going to help much.

 If you don't like guessing, don't guess, don't think. Just measure.

 Does increasing the number of buffers solve the problems you see? That
 must be the first port of call - is that enough, or not? If not, we
 can discuss the various ideas, write patches and measure them.

Just in case you want a theoretical prediction to test:

increasing NUM_CLOG_BUFFERS should reduce the frequency of the spikes
you measured earlier. That should happen proportionally: as the buffer
count is increased, the spikes become even less frequent. But a larger
buffer will not decrease the impact of each event when it happens.
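
(For anyone wanting to reproduce this: the buffer count is a compile-time
define. The snippet below shows the stock value as it stands in the current
tree -- treat the exact file and value as assumptions on my part -- and how a
test build would raise it.)

/* src/include/access/clog.h -- stock definition (location/value assumed) */
#define NUM_CLOG_BUFFERS	8

/*
 * A test build would simply raise the define, e.g.
 *
 *		#define NUM_CLOG_BUFFERS	32
 *
 * and recompile.  This only sizes the clog SLRU's shared buffers, so no
 * initdb is needed; restart and re-run the same benchmark.
 */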

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page Checksums

2011-12-27 Thread Jeff Davis
On Mon, 2011-12-19 at 07:50 -0500, Robert Haas wrote:
 I
 think it would be regrettable if everyone had to give up 4 bytes per
 page because some people want checksums.

I can understand that some people might not want the CPU expense of
calculating CRCs, or the upgrade expense of converting to new pages; but do
you think 4 bytes out of 8192 is a real concern?

(Aside: it would be MAXALIGNed anyway, so probably more like 8 bytes.)

I was thinking we'd go in the other direction: expanding the header
would take so much effort, why not expand it a little more to give some
breathing room for the future?

Regards,
Jeff Davis




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page Checksums

2011-12-27 Thread Jeff Davis
On Mon, 2011-12-19 at 01:55 +, Greg Stark wrote:
 On Sun, Dec 18, 2011 at 7:51 PM, Jesper Krogh jes...@krogh.cc wrote:
  I don't know if it would be seen as a half-baked feature.. or similar,
  and I don't know if the hint bit problem is solvable at all, but I could
  easily imagine checksumming just skipping the hint bits entirely.
 
 That was one approach discussed. The problem is that the hint bits are
 currently in each heap tuple header which means the checksum code
 would have to know a fair bit about the structure of the page format.

Which is actually a bigger problem, because it might not be the backend
that's reading the page. It might be your backup script taking a new
base backup.

The kind of person to care about CRCs would also want the base backup
tool to verify them during the copy so that you don't overwrite your
previous (good) backup with a bad one. The more complicated we make the
verification process, the less workable that becomes.

I vote for a simple way to calculate the checksum -- fixed offsets of
each page (of course, it would need to know the page size), and a
standard checksum algorithm.
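
To illustrate the scale of what such a tool would need, here is a minimal
sketch in C. The fixed checksum offset, the choice of Fletcher-16, and all
names below are illustrative assumptions, not a proposed on-disk format:

#include <stdint.h>
#include <stddef.h>

#define BLCKSZ          8192
#define CHECKSUM_OFFSET 8		/* assumed fixed offset of the stored 16-bit sum */

/* Fletcher-16 over a page, skipping the stored checksum bytes themselves. */
static uint16_t
page_fletcher16(const uint8_t *page)
{
	uint32_t	sum1 = 0;
	uint32_t	sum2 = 0;

	for (size_t i = 0; i < BLCKSZ; i++)
	{
		if (i == CHECKSUM_OFFSET || i == CHECKSUM_OFFSET + 1)
			continue;			/* don't checksum the checksum */
		sum1 = (sum1 + page[i]) % 255;
		sum2 = (sum2 + sum1) % 255;
	}
	return (uint16_t) ((sum2 << 8) | sum1);
}

A base backup tool would need only that function, the page size, and the
field offset -- nothing about tuple layout.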

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page Checksums

2011-12-27 Thread Jeff Davis
On Mon, 2011-12-19 at 22:18 +0200, Heikki Linnakangas wrote:
 Or you could just use a filesystem that does CRCs...

That just moves the problem. Correct me if I'm wrong, but I don't think
there's anything special that the filesystem can do that we can't.

The filesystems that support CRCs are more like ZFS than ext3. They do
all writes to a new location, thus fragmenting the files. That may be a
good trade-off for some people, but it's not free.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page Checksums

2011-12-27 Thread Jeff Davis
On Sun, 2011-12-25 at 22:18 +, Greg Stark wrote:
 2) The i/o system was in the process of writing out blocks and the
 system lost power or crashed as they were being written out. In this
 case there will probably only be 0 or 1 torn pages -- perhaps as many
 as the scsi queue depth if there's some weird i/o scheduling going on.

That would also depend on how many disks you have and what configuration
they're in, right?

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page Checksums + Double Writes

2011-12-27 Thread Jeff Davis
On Thu, 2011-12-22 at 03:50 -0600, Kevin Grittner wrote:
 Now, on to the separate-but-related topic of double-write.  That
 absolutely requires some form of checksum or CRC to detect torn
 pages, in order for the technique to work at all.  Adding a CRC
 without double-write would work fine if you have a storage stack
 which prevents torn pages in the file system or hardware driver.  If
 you don't have that, it could create a damaged page indication after
 a hardware or OS crash, although I suspect that would be the
 exception, not the typical case.  Given all that, and the fact that
 it would be cleaner to deal with these as two separate patches, it
 seems the CRC patch should go in first.

I think it could be broken down further.

Taking a step back, there are several types of HW-induced corruption,
and checksums only catch some of them. For instance, the disk losing
data completely and just returning zeros won't be caught, because we
assume that a zero page is just fine.
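
For reference, the current page-read validation really does accept an
all-zeroes page. The function below is a simplified paraphrase of the
PageHeaderIsValid() logic in bufpage.c, not the verbatim code:

#include <stdbool.h>

#define BLCKSZ 8192

/*
 * Simplified paraphrase of PageHeaderIsValid(): the real code first
 * sanity-checks pd_lower/pd_upper/pd_special and the page size/version
 * fields, and only falls back to this all-zeroes test when those fail.
 */
static bool
page_is_all_zeroes(const char *page)
{
	for (int i = 0; i < BLCKSZ; i++)
	{
		if (page[i] != 0)
			return false;		/* garbage header: reported as corruption */
	}
	return true;				/* all zeroes: treated as valid-but-new */
}

Point (1) below amounts to making that final "return true" unjustified
anywhere outside recovery.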

From a development standpoint, I think a better approach would be:

1. Investigate if there are reasonable ways to ensure that (outside of
recovery) pages are always initialized, so that zero pages can be
treated as corruption.

2. Make some room in the page header for checksums and maybe some other
simple sanity information (like file and page number). It will be a big
project to sort out the pg_upgrade issues (as Tom and others have
pointed out).

3. Attack hint bits problem.

If (1) and (2) were complete, we would catch many common types of
corruption, and we'd be in a much better position to think clearly about
hint bits, double writes, etc.
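
Concretely, (2) might mean carrying something like the following per page.
Field names and widths here are purely illustrative; the hard part is the
pg_upgrade story, not the struct:

#include <stdint.h>

/* Illustrative only: extra sanity fields a future page header could carry. */
typedef struct PageSanityInfo
{
	uint16_t	checksum;		/* detects bit-level corruption */
	uint32_t	relfile;		/* which file the page should live in... */
	uint32_t	blocknum;		/* ...and where, catching misdirected writes */
} PageSanityInfo;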

Regards,
Jeff Davis



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] improve line-breaking and indentation of foreign options dump

2011-12-27 Thread Peter Eisentraut
Currently, pg_dump dumps foreign options on a single line, for example:

CREATE SERVER cluster FOREIGN DATA WRAPPER plproxy OPTIONS (p0 'host=host0', p1 'host=host1', p2 'host=host2', p3 'host=host3');

I think it would be nicer if it looked more like this:

CREATE SERVER cluster FOREIGN DATA WRAPPER plproxy OPTIONS (
p0 'host=host0',
p1 'host=host1',
p2 'host=host2',
p3 'host=host3'
);

Attached is a patch to implement that, and a test file to play around
with.
diff --git i/src/bin/pg_dump/pg_dump.c w/src/bin/pg_dump/pg_dump.c
index afeae6f..6dd4895 100644
--- i/src/bin/pg_dump/pg_dump.c
+++ w/src/bin/pg_dump/pg_dump.c
@@ -5671,7 +5671,7 @@ getTableAttrs(TableInfo *tblinfo, int numTables)
 			  "SELECT pg_catalog.quote_ident(option_name) || "
 			  "' ' || pg_catalog.quote_literal(option_value) "
 			  "FROM pg_catalog.pg_options_to_table(attfdwoptions)"
-			  "), ', ') AS attfdwoptions "
+			  "), E',\n') AS attfdwoptions "
 			 "FROM pg_catalog.pg_attribute a LEFT JOIN pg_catalog.pg_type t "
 			  "ON a.atttypid = t.oid "
 			  "WHERE a.attrelid = '%u'::pg_catalog.oid "
@@ -6488,7 +6488,7 @@ getForeignDataWrappers(int *numForeignDataWrappers)
 		  "SELECT quote_ident(option_name) || ' ' || "
 		  "quote_literal(option_value) "
 		  "FROM pg_options_to_table(fdwoptions)"
-		  "), ', ') AS fdwoptions "
+		  "), E',\n') AS fdwoptions "
 		  "FROM pg_foreign_data_wrapper",
 		  username_subquery);
 	}
@@ -6502,7 +6502,7 @@ getForeignDataWrappers(int *numForeignDataWrappers)
 		  "SELECT quote_ident(option_name) || ' ' || "
 		  "quote_literal(option_value) "
 		  "FROM pg_options_to_table(fdwoptions)"
-		  "), ', ') AS fdwoptions "
+		  "), E',\n') AS fdwoptions "
 		  "FROM pg_foreign_data_wrapper",
 		  username_subquery);
 	}
@@ -6591,7 +6591,7 @@ getForeignServers(int *numForeignServers)
 	  "SELECT quote_ident(option_name) || ' ' || "
 	  "quote_literal(option_value) "
 	  "FROM pg_options_to_table(srvoptions)"
-	  "), ', ') AS srvoptions "
+	  "), E',\n') AS srvoptions "
 	  "FROM pg_foreign_server",
 	  username_subquery);
 
@@ -11484,7 +11484,7 @@ dumpForeignDataWrapper(Archive *fout, FdwInfo *fdwinfo)
 		appendPQExpBuffer(q, " VALIDATOR %s", fdwinfo->fdwvalidator);
 
 	if (strlen(fdwinfo->fdwoptions) > 0)
-		appendPQExpBuffer(q, " OPTIONS (%s)", fdwinfo->fdwoptions);
+		appendPQExpBuffer(q, " OPTIONS (\n%s\n)", fdwinfo->fdwoptions);
 
 	appendPQExpBuffer(q, ";\n");
 
@@ -11588,7 +11588,7 @@ dumpForeignServer(Archive *fout, ForeignServerInfo *srvinfo)
 	appendPQExpBuffer(q, "%s", fmtId(fdwname));
 
 	if (srvinfo->srvoptions && strlen(srvinfo->srvoptions) > 0)
-		appendPQExpBuffer(q, " OPTIONS (%s)", srvinfo->srvoptions);
+		appendPQExpBuffer(q, " OPTIONS (\n%s\n)", srvinfo->srvoptions);
 
 	appendPQExpBuffer(q, ";\n");
 
@@ -11679,7 +11679,7 @@ dumpUserMappings(Archive *fout,
 	  "SELECT quote_ident(option_name) || ' ' || "
 	  "quote_literal(option_value) "
 	  "FROM pg_options_to_table(umoptions)"
-	  "), ', ') AS umoptions "
+	  "), E',\n') AS umoptions "
 	  "FROM pg_user_mappings "
 	  "WHERE srvid = '%u'",
 	  catalogId.oid);
@@ -11704,7 +11704,7 @@ dumpUserMappings(Archive *fout,
 		appendPQExpBuffer(q, " SERVER %s", fmtId(servername));
 
 		if (umoptions && strlen(umoptions) > 0)
-			appendPQExpBuffer(q, " OPTIONS (%s)", umoptions);
+			appendPQExpBuffer(q, " OPTIONS (\n%s\n)", umoptions);
 
 		appendPQExpBuffer(q, ";\n");
 
@@ -12337,7 +12337,7 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo)
 			  "SELECT pg_catalog.quote_ident(option_name) || "
 			  "' ' || pg_catalog.quote_literal(option_value) "
 			  "FROM pg_catalog.pg_options_to_table(ftoptions)"
-			  "), ', ') AS ftoptions "
+			  "), E',\n') AS ftoptions "
 			  "FROM pg_catalog.pg_foreign_table ft "
 			  "JOIN pg_catalog.pg_foreign_server fs "
 			  "ON (fs.oid = ft.ftserver) "
@@ -12566,7 +12566,7 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo)
 
 		/* Dump generic options if any */
 		if (ftoptions && ftoptions[0])
-			appendPQExpBuffer(q, "\nOPTIONS (%s)", ftoptions);
+			appendPQExpBuffer(q, "\nOPTIONS (\n%s\n)", ftoptions);
 
 		appendPQExpBuffer(q, ";\n");
 
@@ -12766,7 +12766,7 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo)
   fmtId(tbinfo->dobj.name));
 appendPQExpBuffer(q, "ALTER COLUMN %s ",
   fmtId(tbinfo->attnames[j]));
-appendPQExpBuffer(q, "OPTIONS (%s);\n",
+appendPQExpBuffer(q, "OPTIONS (\n%s\n);\n",
   tbinfo->attfdwoptions[j]);
 			}
 		}
CREATE FOREIGN DATA WRAPPER plproxy OPTIONS (
query_timeout '1800'
);

CREATE SERVER cluster FOREIGN DATA WRAPPER plproxy OPTIONS (
p0 'host=host0',
p1 'host=host1',
p2 'host=host2',
p3 'host=host3'
);

CREATE USER MAPPING FOR peter SERVER cluster OPTIONS (
user 'peter',
password 'sekret'
);

CREATE FOREIGN DATA WRAPPER foo;

CREATE SERVER foo FOREIGN DATA WRAPPER foo;

CREATE FOREIGN TABLE foobar (a 

Re: [HACKERS] sorting table columns

2011-12-27 Thread Alvaro Herrera

Excerpts from Tom Lane's message of mar dic 20 22:23:36 -0300 2011:
 
 Alvaro Herrera alvhe...@commandprompt.com writes:
  Excerpts from Tom Lane's message of mar dic 20 18:24:29 -0300 2011:
  You do *not* want to store either of the latter two numbers in
  parse-time Var nodes, because then you can't rearrange columns without
  having to update stored rules.  But it might be useful to decree that
  one thing setrefs.c does is renumber Vars in scan nodes to use the
  physical column numbers instead of the permanent IDs.
 
  Hmm, having the numbers in Var nodes seems a fundamental part of the way
  I'm attacking the problem.  Hopefully after I give setrefs.c a read I
  will have a clearer picture of the way to do it without that.
 
 To clarify a bit: one thing that setrefs.c already does is to renumber
 Var nodes above the scan level, so that their attnums refer not to
 original table column attnums but to column numbers in the output of the
 next plan level down.  Vars in scan nodes currently don't need any
 renumbering, but it'd be easy enough to extend the logic to do something
 to them as well.  I'm visualizing the run-time transformation from
 physical to logical column ordering as a sort of projection, much like
 the mapping that happens in a join node.

After more playing with this, it turned out that those logical numbers
stored in Var and TargetEntry are actually completely useless; after
they served their purpose in helping me track down that I actually
needed to sort columns at the RangeTblEntry level, I was able to revert
all those bits and things work fine (actually they work better).  So
far I have had no need to touch setrefs.c, as far as I can see.  The reason is
that * expansion happens much earlier than setrefs.c is involved, at the
parse analysis level; the target lists generated at that point must
already follow the logical column order.  So that part of the patch
becomes this:

diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 9e277c5..f640bd8 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -701,6 +701,7 @@ typedef struct RangeTblEntry
 */
Oid relid;  /* OID of the relation */
charrelkind;/* relation kind (see pg_class.relkind) */
+   List   *lognums;/* int list of logical column numbers */
 
/*
 * Fields valid for a subquery RTE (else NULL):

Note that the eref->colnames list is built in logical column order
(which is what it should be, because it then matches the alias->colnames
list).  With all that, it's easy to map the attnums to the logical
numbers when the target list is being constructed.  And things work fine
from that point onwards, because we still keep track of the original
attnum to reference the TupleDesc.

A RTE in a stored rule looks like this:

{RTE :alias <> :eref {ALIAS :aliasname bar :colnames ("z" "y" "x")} :rtekind 0
:relid 16404 :relkind r :lognums (i 3 2 1) :inh true :inFromCl true
:requiredPerms 2 :checkAsUser 0 :selectedCols (b 9 10 11) :modifiedCols (b)}

The original table was created with columns x, y, z, and then I
reversed the order.  So if I change the column order in the original
table, the rule does not need any change and it continues to return the
logical order that the table had when the view was originally defined.

(I wonder what the other two RTEs, those named "new" and "old", are
for.)


One thing I'm finding necessary (for COPY support as well as things that
travel through different DestReceivers, such as SQL functions) is that
TupleTableSlots need to keep track of logical vs. physical order, and
form/deform tuples using the correct ordering.  So the values/isnull
arrays may be in either order depending on what the caller is doing.  At
some point a MinimalTuple might be constructed in logical order, for
example, and the caller must be aware of this so that it can be
deconstructed correctly later on.  I mention this so that there's time
for bells to ring ...

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 16-bit page checksums for 9.2

2011-12-27 Thread Heikki Linnakangas

On 25.12.2011 15:01, Kevin Grittner wrote:

I don't believe that.  Double-writing is a technique to avoid torn
pages, but it requires a checksum to work.  This chicken-and-egg
problem requires the checksum to be implemented first.


I don't think double-writes require checksums on the data pages 
themselves, just on the copies in the double-write buffers. In the 
double-write buffer, you'll need some extra information per-page anyway, 
like a relfilenode and block number that indicates which page it is in 
the buffer.
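
A minimal sketch of what one slot of such a buffer might carry -- the names
and layout are illustrative assumptions on my part, not a design:

#include <stdint.h>

#define BLCKSZ 8192

/*
 * Hypothetical double-write buffer slot.  The checksum protects only this
 * buffered copy; the tag fields say which page it is, so recovery knows
 * where to copy it back to.
 */
typedef struct DoubleWriteSlot
{
	uint32_t	spcnode;		/* tablespace OID */
	uint32_t	dbnode;			/* database OID */
	uint32_t	relnode;		/* relfilenode */
	uint32_t	forknum;		/* main, fsm, vm, ... */
	uint32_t	blocknum;		/* block within the fork */
	uint16_t	checksum;		/* covers this buffered copy only */
	char		page[BLCKSZ];	/* verbatim copy of the data page */
} DoubleWriteSlot;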


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Misleading CREATE TABLE error

2011-12-27 Thread Peter Eisentraut
On tis, 2011-11-08 at 21:49 +, Thom Brown wrote:
 I found the following error message misleading:
 
 test=# create table cows2 (LIKE cows);
 ERROR:  inherited relation "cows" is not a table
 STATEMENT:  create table cows2 (LIKE cows);
 
 I'm not trying to inherit a relation, I'm trying to base a table on
 it.

It's not only the error message that's misleading, but the whole code,
because the entire code for CREATE TABLE ... (LIKE ...) claims to do
inheritance based on an ancient understanding of the SQL standard.  I
know this has confused me many times already, so I decided to clean this
up and rename all the internal parser structures, split up the
regression tests for real inheritance and CREATE TABLE LIKE, and adjust
the error messages.  Patch attached.

 As it happens, cows is a foreign table, which *is* a table,
 just not a regular table.  It might be useful to add support to clone
 foreign tables into regular tables, the use-case being that you may
 wish to import all the data locally into a table of the same
 structure.

This is easy to fix, and I mangled it into my big renaming patch, which
I shouldn't have.  Anyway, one question that's perhaps worth discussing
is whether we should allow and disallow the various INCLUDING options
depending on the relation type.  For example, views don't have indexes,
so should we disallow INCLUDING INDEXES or just assume they don't have
any?
diff --git i/doc/src/sgml/ref/create_table.sgml w/doc/src/sgml/ref/create_table.sgml
index 30e4154..97968bb 100644
--- i/doc/src/sgml/ref/create_table.sgml
+++ w/doc/src/sgml/ref/create_table.sgml
@@ -24,7 +24,7 @@ PostgreSQL documentation
CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXISTS ] <replaceable class="PARAMETER">table_name</replaceable> ( [
  { <replaceable class="PARAMETER">column_name</replaceable> <replaceable class="PARAMETER">data_type</replaceable> [ COLLATE <replaceable>collation</replaceable> ] [ <replaceable class="PARAMETER">column_constraint</replaceable> [ ... ] ]
| <replaceable>table_constraint</replaceable>
-| LIKE <replaceable>parent_table</replaceable> [ <replaceable>like_option</replaceable> ... ] }
+| LIKE <replaceable>source_table</replaceable> [ <replaceable>like_option</replaceable> ... ] }
[, ... ]
] )
[ INHERITS ( <replaceable>parent_table</replaceable> [, ... ] ) ]
@@ -312,7 +312,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
 </varlistentry>
 
 <varlistentry>
-<term><literal>LIKE <replaceable>parent_table</replaceable> [ <replaceable>like_option</replaceable> ... ]</literal></term>
+<term><literal>LIKE <replaceable>source_table</replaceable> [ <replaceable>like_option</replaceable> ... ]</literal></term>
 <listitem>
  <para>
   The <literal>LIKE</literal> clause specifies a table from which
diff --git i/src/backend/nodes/copyfuncs.c w/src/backend/nodes/copyfuncs.c
index dd2dd25..a4d3912 100644
--- i/src/backend/nodes/copyfuncs.c
+++ w/src/backend/nodes/copyfuncs.c
@@ -2726,10 +2726,10 @@ _copyCreateStmt(const CreateStmt *from)
 	return newnode;
 }
 
-static InhRelation *
-_copyInhRelation(const InhRelation *from)
+static TableLikeClause *
+_copyTableLikeClause(const TableLikeClause *from)
 {
-	InhRelation *newnode = makeNode(InhRelation);
+	TableLikeClause *newnode = makeNode(TableLikeClause);
 
 	COPY_NODE_FIELD(relation);
 	COPY_SCALAR_FIELD(options);
@@ -4133,8 +4133,8 @@ copyObject(const void *from)
 		case T_CreateStmt:
 			retval = _copyCreateStmt(from);
 			break;
-		case T_InhRelation:
-			retval = _copyInhRelation(from);
+		case T_TableLikeClause:
+			retval = _copyTableLikeClause(from);
 			break;
 		case T_DefineStmt:
 			retval = _copyDefineStmt(from);
diff --git i/src/backend/nodes/equalfuncs.c w/src/backend/nodes/equalfuncs.c
index c2fdb2b..c1bf94f 100644
--- i/src/backend/nodes/equalfuncs.c
+++ w/src/backend/nodes/equalfuncs.c
@@ -1159,7 +1159,7 @@ _equalCreateStmt(const CreateStmt *a, const CreateStmt *b)
 }
 
 static bool
-_equalInhRelation(const InhRelation *a, const InhRelation *b)
+_equalTableLikeClause(const TableLikeClause *a, const TableLikeClause *b)
 {
 	COMPARE_NODE_FIELD(relation);
 	COMPARE_SCALAR_FIELD(options);
@@ -2676,8 +2676,8 @@ equal(const void *a, const void *b)
 		case T_CreateStmt:
 			retval = _equalCreateStmt(a, b);
 			break;
-		case T_InhRelation:
-			retval = _equalInhRelation(a, b);
+		case T_TableLikeClause:
+			retval = _equalTableLikeClause(a, b);
 			break;
 		case T_DefineStmt:
 			retval = _equalDefineStmt(a, b);
diff --git i/src/backend/nodes/outfuncs.c w/src/backend/nodes/outfuncs.c
index cdc2cab..9d14141 100644
--- i/src/backend/nodes/outfuncs.c
+++ w/src/backend/nodes/outfuncs.c
@@ -2064,9 +2064,9 @@ _outDefElem(StringInfo str, const DefElem *node)
 }
 
 static void
-_outInhRelation(StringInfo str, const InhRelation *node)
+_outTableLikeClause(StringInfo str, const TableLikeClause *node)
 {
-	WRITE_NODE_TYPE("INHRELATION");
+	WRITE_NODE_TYPE("TABLELIKECLAUSE");
 
 	WRITE_NODE_FIELD(relation);
 	

Re: [HACKERS] Page Checksums + Double Writes

2011-12-27 Thread Merlin Moncure
On Tue, Dec 27, 2011 at 1:24 PM, Jeff Davis pg...@j-davis.com wrote:
 3. Attack hint bits problem.

A large number of problems would go away if the current hint bit
system could be replaced with something that did not require writing
to the tuple itself.  FWIW, moving the bits around seems like a
non-starter -- you're trading a problem for a much bigger problem
(locking, wal logging, etc).  But perhaps a clog caching strategy
would be a win.  You get a full nibble back in the tuple header,
significant i/o reduction for some workloads, crc becomes relatively
trivial, etc etc.
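
(The "nibble" being the four per-tuple visibility hints in t_infomask; the
values below are quoted from src/include/access/htup.h:)

#define HEAP_XMIN_COMMITTED		0x0100	/* t_xmin committed */
#define HEAP_XMIN_INVALID		0x0200	/* t_xmin invalid/aborted */
#define HEAP_XMAX_COMMITTED		0x0400	/* t_xmax committed */
#define HEAP_XMAX_INVALID		0x0800	/* t_xmax invalid/aborted */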

My first attempt at a process-local cache for hint bits wasn't perfect,
but it proved (at least to me) that you can sneak a tight cache in there
without significantly impacting the general case.  Maybe the angle of
attack was wrong anyway -- I bet if you kept a judicious number of
clog pages in each local process with some smart invalidation, you
could cover enough cases that scribbling the bits down would become
unnecessary.  Proving that is a tall order of course, but IMO it merits
another attempt.
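
A minimal sketch of the shape I mean -- a tiny backend-local cache consulted
before going to the shared SLRU. Everything below (names, sizes, the
lookup-only API) is hypothetical illustration; the hard part, invalidation,
only appears as a comment:

#include <stdint.h>

#define CLOG_PAGE_SIZE		8192
#define LOCAL_CLOG_SLOTS	4		/* the "judicious number" */

typedef struct LocalClogSlot
{
	int64_t		pageno;			/* which clog page; -1 marks an empty slot */
	uint64_t	last_used;		/* for crude LRU replacement */
	char		data[CLOG_PAGE_SIZE];
} LocalClogSlot;

static LocalClogSlot local_clog[LOCAL_CLOG_SLOTS] = {
	{-1, 0, {0}}, {-1, 0, {0}}, {-1, 0, {0}}, {-1, 0, {0}}
};
static uint64_t use_counter;

/*
 * Return the cached copy of a clog page, or NULL so the caller falls back
 * to the shared SLRU.  Only pages wholly in the past (every xid on them
 * older than any running transaction) are safe to keep here without the
 * "smart invalidation" mentioned above, because such pages can no longer
 * change.
 */
static char *
local_clog_lookup(int64_t pageno)
{
	int			i;

	for (i = 0; i < LOCAL_CLOG_SLOTS; i++)
	{
		if (local_clog[i].pageno == pageno)
		{
			local_clog[i].last_used = ++use_counter;
			return local_clog[i].data;
		}
	}
	return NULL;
}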

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 16-bit page checksums for 9.2

2011-12-27 Thread Simon Riggs
On Tue, Dec 27, 2011 at 8:05 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 25.12.2011 15:01, Kevin Grittner wrote:

 I don't believe that.  Double-writing is a technique to avoid torn
 pages, but it requires a checksum to work.  This chicken-and-egg
 problem requires the checksum to be implemented first.


 I don't think double-writes require checksums on the data pages themselves,
 just on the copies in the double-write buffers. In the double-write buffer,
 you'll need some extra information per-page anyway, like a relfilenode and
 block number that indicates which page it is in the buffer.

How would you know when to look in the double write buffer?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page Checksums + Double Writes

2011-12-27 Thread Jeff Davis
On Tue, 2011-12-27 at 16:43 -0600, Merlin Moncure wrote:
 On Tue, Dec 27, 2011 at 1:24 PM, Jeff Davis pg...@j-davis.com wrote:
  3. Attack hint bits problem.
 
 A large number of problems would go away if the current hint bit
 system could be replaced with something that did not require writing
 to the tuple itself.

My point was that neither the zero-page problem nor the upgrade problem
is solved by addressing the hint bits problem. They can be solved
independently, and in my opinion it makes sense to solve those
problems before the hint bits problem (in the context of detecting
hardware corruption).

Of course, don't let that stop you from trying to get rid of hint bits,
that has numerous potential benefits.

Regards,
Jeff Davis



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pgstat wait timeout

2011-12-27 Thread Steve Crawford
I have a system (9.0.4 on Ubuntu Server 10.04 LTS x86_64) that is 
currently in test/dev mode. I'm seeing the following messages 
occur every few seconds:


...
Dec 27 17:43:22 foo postgres[23693]: [6-1] : WARNING:  pgstat wait timeout
Dec 27 17:43:27 foo postgres[27324]: [71400-1] : WARNING:  pgstat wait timeout
Dec 27 17:43:33 foo postgres[23695]: [6-1] : WARNING:  pgstat wait timeout
Dec 27 17:43:54 foo postgres[27324]: [71401-1] : WARNING:  pgstat wait timeout
Dec 27 17:43:59 foo postgres[23697]: [6-1] : WARNING:  pgstat wait timeout
Dec 27 17:44:04 foo postgres[27324]: [71402-1] : WARNING:  pgstat wait timeout
Dec 27 17:44:09 foo postgres[23715]: [6-1] : WARNING:  pgstat wait timeout
Dec 27 17:44:17 foo postgres[27324]: [71403-1] : WARNING:  pgstat wait timeout
Dec 27 17:44:22 foo postgres[23716]: [6-1] : WARNING:  pgstat wait timeout
Dec 27 17:44:27 foo postgres[27324]: [71404-1] : WARNING:  pgstat wait timeout
Dec 27 17:44:33 foo postgres[23718]: [6-1] : WARNING:  pgstat wait timeout
Dec 27 17:44:54 foo postgres[27324]: [71405-1] : WARNING:  pgstat wait timeout
Dec 27 17:44:59 foo postgres[23824]: [6-1] : WARNING:  pgstat wait timeout
Dec 27 17:45:04 foo postgres[27324]: [71406-1] : WARNING:  pgstat wait timeout


I can't correlate events exactly, but the messages seem to have started 
shortly after I dropped a pgbench user and database. My Googling turned 
up various requests for debugging info on hackers. Since the system 
isn't live, I haven't touched it in case anyone wants me to collect 
debugging info.


Otherwise, I plan on just blowing the install away and replacing it with 9.1.

Cheers,
Steve


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Misleading CREATE TABLE error

2011-12-27 Thread Thom Brown
On 27 December 2011 20:16, Peter Eisentraut pete...@gmx.net wrote:
 It's not only the error message that's misleading, but the whole code,
 because the entire code for CREATE TABLE ... (LIKE ...) claims to do
 inheritance based on an ancient understanding of the SQL standard.  I
 know this has confused me many times already, so I decided to clean this
 up and rename all the internal parser structures, split up the
 regression tests for real inheritance and CREATE TABLE LIKE, and adjust
 the error messages.  Patch attached.

Thanks for the patch. +1 for changing "parent" to "source" in the
docs.  The patch doesn't apply cleanly for me for some reason, though.

 Anyway, one question that's perhaps worth discussing
 is whether we should allow and disallow the various INCLUDING options
 depending on the relation type.  For example, views don't have indexes,
 so should we disallow INCLUDING INDEXES or just assume they don't have
 any?

I'd personally prefer the latter, primarily because it won't create
another syntax variation with no discernible benefit.

-- 
Thom

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 16-bit page checksums for 9.2

2011-12-27 Thread Simon Riggs
On Sun, Dec 25, 2011 at 1:01 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:

 This chicken-and-egg
 problem requires the checksum to be implemented first.

v2 of checksum patch, using a conditional copy if checksumming is
enabled, so locking is removed.

Thanks to Andres for thwacking me with the cluestick, though I have
used a simple copy rather than a copy & calc.

Tested using make installcheck with parameter on/off, then restart and
vacuumdb to validate all pages.

Reviews, objections, user interface tweaks all welcome.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
***
*** 1701,1706  SET ENABLE_SEQSCAN TO OFF;
--- 1701,1748 
 </listitem>
   </varlistentry>
  
+  <varlistentry id="guc-page-checksums" xreflabel="page_checksums">
+   <indexterm>
+    <primary><varname>page_checksums</> configuration parameter</primary>
+   </indexterm>
+   <term><varname>page_checksums</varname> (<type>boolean</type>)</term>
+   <listitem>
+    <para>
+     When this parameter is on, the <productname>PostgreSQL</> server
+     calculates checksums when it writes main database pages to disk,
+     flagging the page as checksum protected.  When this parameter is off,
+     no checksum is written, only a standard watermark in the page header.
+     The database may thus contain a mix of pages with checksums and pages
+     without checksums.
+    </para>
+ 
+    <para>
+     When pages are read into shared buffers, any page flagged with a
+     checksum has the checksum re-calculated and compared against the
+     stored value to provide greatly improved validation of page contents.
+    </para>
+ 
+    <para>
+     Writes via temp_buffers are not checksummed.
+    </para>
+ 
+    <para>
+     Turning this parameter off speeds normal operation, but
+     might allow data corruption to go unnoticed. The checksum is a
+     16-bit value computed with the fast Fletcher-16 algorithm. With this
+     parameter enabled there is still a non-zero probability that an error
+     could go undetected, as well as a non-zero probability of false
+     positives.
+    </para>
+ 
+    <para>
+     This parameter can only be set in the <filename>postgresql.conf</>
+     file or on the server command line.
+     The default is <literal>off</>.
+    </para>
+   </listitem>
+  </varlistentry>
+ 
   <varlistentry id="guc-wal-buffers" xreflabel="wal_buffers">
    <term><varname>wal_buffers</varname> (<type>integer</type>)</term>
    <indexterm>
*** a/src/backend/storage/buffer/bufmgr.c
--- b/src/backend/storage/buffer/bufmgr.c
***
*** 440,446  ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
  			smgrread(smgr, forkNum, blockNum, (char *) bufBlock);
  
  			/* check for garbage data */
! 			if (!PageHeaderIsValid((PageHeader) bufBlock))
  			{
  if (mode == RBM_ZERO_ON_ERROR || zero_damaged_pages)
  {
--- 440,446 
  			smgrread(smgr, forkNum, blockNum, (char *) bufBlock);
  
  			/* check for garbage data */
! 			if (!PageIsVerified((Page) bufBlock))
  			{
  if (mode == RBM_ZERO_ON_ERROR || zero_damaged_pages)
  {
***
*** 1860,1865  FlushBuffer(volatile BufferDesc *buf, SMgrRelation reln)
--- 1860,1867 
  {
  	XLogRecPtr	recptr;
  	ErrorContextCallback errcontext;
+ 	Block		bufBlock;
+ 	char		*bufCopy;
  
  	/*
  	 * Acquire the buffer's io_in_progress lock.  If StartBufferIO returns
***
*** 1907,1916  FlushBuffer(volatile BufferDesc *buf, SMgrRelation reln)
  	buf->flags &= ~BM_JUST_DIRTIED;
  	UnlockBufHdr(buf);
  
  	smgrwrite(reln,
  			  buf->tag.forkNum,
  			  buf->tag.blockNum,
! 			  (char *) BufHdrGetBlock(buf),
  			  false);
  
  	pgBufferUsage.shared_blks_written++;
--- 1909,1932 
  	buf->flags &= ~BM_JUST_DIRTIED;
  	UnlockBufHdr(buf);
  
+ 	/*
+ 	 * Set page verification info immediately before we write the buffer to disk.
+ 	 * Once we have flushed it, the buffer is marked clean again, meaning it can
+ 	 * be replaced quickly and silently with another data block, so we must
+ 	 * write verification info now. For efficiency, the process of cleaning
+ 	 * and page replacement is asynchronous, so we can't do this *only* when
+ 	 * we are about to replace the buffer; we need to do it for every flush.
+ 	 */
+ 	bufBlock = BufHdrGetBlock(buf);
+ 	bufCopy = PageSetVerificationInfo((Page) bufBlock);
+ 
+ 	/*
+ 	 * bufCopy is either the shared buffer or a copy, as appropriate.
+ 	 */
  	smgrwrite(reln,
  			  buf->tag.forkNum,
  			  buf->tag.blockNum,
! 			  (char *) bufCopy,
  			  false);
  
  	pgBufferUsage.shared_blks_written++;
***
*** 1921,1926  FlushBuffer(volatile BufferDesc *buf, SMgrRelation reln)
--- 1937,1944 
  	 */
  	TerminateBufferIO(buf, true, 0);
  
+ 	/* XXX 

Re: [HACKERS] Review of VS 2010 support patches

2011-12-27 Thread Brar Piening

Brar Piening wrote:
I have to admit that it's currently broken (it builds but fails during 
regression tests because it can't connect) when building with Visual 
Studio 2010 or Windows SDK 7.1, because of commit 
1a0c76c32fe470142d3663dd84ac960d75a4e8db (Enable compiling with the 
mingw-w64 32 bit compiler).


It seems like VS 2010 has a few of the E... constants in 
src/include/port/win32.h already defined, but obviously in a way that 
breaks postgres.
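
In case it helps whoever picks this up: one direction I could test is to
guard each mapping in src/include/port/win32.h so that a definition the
compiler's own errno.h already made is replaced rather than conflicted
with. This is an untested sketch, and which constants actually clash under
VS 2010 is an assumption on my part:

/*
 * Untested sketch for src/include/port/win32.h: discard any definition the
 * compiler's errno.h already made before installing the WSA mapping, so
 * every translation unit agrees on one value.  The WSAE... values come
 * from the winsock headers already included there.
 */
#ifdef EAGAIN
#undef EAGAIN
#endif
#define EAGAIN WSAEWOULDBLOCK

#ifdef EINTR
#undef EINTR
#endif
#define EINTR WSAEINTR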


Because of my lack of experience, and as I don't have a MinGW-w64 build 
environment, I don't feel like I could fix that without breaking 
anything else.


I'd like to add that I'm certainly willing to test suggested fixes or 
patches in my VS 2010 build environment.


Regards,
Brar

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Collect frequency statistics for arrays

2011-12-27 Thread Noah Misch
On Tue, Dec 20, 2011 at 04:37:37PM +0400, Alexander Korotkov wrote:
 On Wed, Nov 16, 2011 at 1:43 AM, Nathan Boley npbo...@gmail.com wrote:
 
  FYI, I've added myself as the reviewer for the current commitfest.
 
 How is the review going now?

I will examine this patch within the week.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review of VS 2010 support patches

2011-12-27 Thread Magnus Hagander
On Wed, Dec 28, 2011 at 05:09, Brar Piening b...@gmx.de wrote:
 Brar Piening wrote:

 I have to admit that it's currently broken (it builds but fails during
 regression tests because it can't connect) when building with Visual Studio
 2010 or Windows SDK 7.1, because of commit
 1a0c76c32fe470142d3663dd84ac960d75a4e8db (Enable compiling with the
 mingw-w64 32 bit compiler).

 It seems like VS 2010 has a few of the E... constants in
 src/include/port/win32.h already defined, but obviously in a way that breaks
 postgres.

 Because of my lack of experience, and as I don't have a MinGW-w64 build
 environment, I don't feel like I could fix that without breaking anything
 else.


 I'd like to add that I'm certainly willing to test suggested fixes or
 patches in my VS 2010 build environment.


What actual error do you get when trying to connect?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 16-bit page checksums for 9.2

2011-12-27 Thread Heikki Linnakangas

On 28.12.2011 01:39, Simon Riggs wrote:

On Tue, Dec 27, 2011 at 8:05 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:

On 25.12.2011 15:01, Kevin Grittner wrote:


I don't believe that.  Double-writing is a technique to avoid torn
pages, but it requires a checksum to work.  This chicken-and-egg
problem requires the checksum to be implemented first.



I don't think double-writes require checksums on the data pages themselves,
just on the copies in the double-write buffers. In the double-write buffer,
you'll need some extra information per-page anyway, like a relfilenode and
block number that indicates which page it is in the buffer.


How would you know when to look in the double write buffer?


You scan the double-write buffer, and copy every page in it that has a 
valid checksum back to the main storage. There's no need to check 
validity of pages in the main storage.
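
In pseudo-C, the recovery pass could be as simple as the sketch below; 
every name here is hypothetical:

#include <stdint.h>

/* Hypothetical slot layout, as discussed upthread. */
typedef struct DwSlot
{
	int64_t		blocknum;		/* home block; -1 marks an unused slot */
	uint16_t	checksum;		/* covers the buffered copy only */
	char		page[8192];
} DwSlot;

extern uint16_t dw_slot_checksum(const DwSlot *slot);	/* assumed helpers */
extern void dw_write_home(const DwSlot *slot);

/*
 * Crash recovery: copy every slot with a valid checksum back to its home
 * location.  A slot with a bad checksum was torn while being buffered,
 * which means its home-location write never started, so the home copy is
 * intact and the slot can simply be skipped.
 */
static void
dw_recover(DwSlot *slots, int nslots)
{
	int			i;

	for (i = 0; i < nslots; i++)
	{
		if (slots[i].blocknum < 0)
			continue;			/* unused slot */
		if (dw_slot_checksum(&slots[i]) != slots[i].checksum)
			continue;			/* torn buffer write; home copy is fine */
		dw_write_home(&slots[i]);	/* home copy may be torn; overwrite it */
	}
}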


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers