Re: [HACKERS] Moving ExecInsertIndexTuples and friends to new file

2015-04-23 Thread Heikki Linnakangas

On 04/24/2015 06:30 AM, Stephen Frost wrote:
> * Peter Geoghegan (p...@heroku.com) wrote:
>> On Thu, Apr 23, 2015 at 12:05 PM, Heikki Linnakangas  wrote:
>>> While looking at Peter's INSERT ... ON CONFLICT patch, I started to feel
>>> that ExecInsertIndexTuples() and friends would deserve a file of their own,
>>> and not be buried in the middle of execUtils.c. I propose that we split
>>> execUtils.c into two, moving ExecOpenIndices(), ExecCloseIndices()
>>> ExecInsertIndexTuples() and related functions into a new file called
>>> executor/execIndexing.c.
>>
>> That split makes a lot of sense to me.
>
> No objections here.

Ok, moved.

- Heikki





Re: [HACKERS] forward vs backward slashes in msvc build code

2015-04-23 Thread Michael Paquier
On Fri, Mar 13, 2015 at 6:04 PM, Michael Paquier
 wrote:
> On Fri, Mar 13, 2015 at 6:20 AM, Alvaro Herrera wrote:
>> Peter Eisentraut wrote:
>>> This is contrib/chkpass not finding the crypt symbol, which is
>>> presumably in some library.  But I can't see how it would normally find
>>> it, without my patch.
>>
>> It seems crypt is provided by libpgport.  So chkpass should be mentioned
>> in @contrib_uselibpgport, but isn't.  Maybe the fix is just to add it
>> there?
>
> I had a closer look at this patch, and yes indeed, the problem was
> exactly that. Now honestly I cannot understand why this dependency
> with libpgport was not necessary before... In any case, attached is a
> patch rebased on HEAD that builds correctly with MSVC.

Now that the stuff related to the move from contrib/ to src/bin/,
modulescheck and tmp_install has been committed, shouldn't we give this
patch another shot? Attached is a rebased version.
-- 
Michael
From 3599bab7b0e37df2a47e497dc9049aa7d78a880b Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Thu, 23 Apr 2015 20:13:30 -0700
Subject: [PATCH] Replace backslashes by slash in MSVC build script

This makes it easier to run on Linux to check file paths.
---
 src/tools/msvc/MSBuildProject.pm |   2 +-
 src/tools/msvc/Mkvcbuild.pm  | 344 ---
 src/tools/msvc/Project.pm|  44 +++--
 src/tools/msvc/Solution.pm   | 138 
 4 files changed, 264 insertions(+), 264 deletions(-)

diff --git a/src/tools/msvc/MSBuildProject.pm b/src/tools/msvc/MSBuildProject.pm
index 37958f9..a16f9ac 100644
--- a/src/tools/msvc/MSBuildProject.pm
+++ b/src/tools/msvc/MSBuildProject.pm
@@ -127,7 +127,7 @@ EOF
 	foreach my $fileNameWithPath (sort keys %{ $self->{files} })
 	{
 		confess "Bad format filename '$fileNameWithPath'\n"
-		  unless ($fileNameWithPath =~ /^(.*)\\([^\\]+)\.(c|cpp|y|l|rc)$/);
+		  unless ($fileNameWithPath =~ m!^(.*)/([^/]+)\.(c|cpp|y|l|rc)$!);
 		my $dir  = $1;
 		my $fileName = $2;
 		if ($fileNameWithPath =~ /\.y$/ or $fileNameWithPath =~ /\.l$/)
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index 8654bfe..2f0e837 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -33,10 +33,12 @@ my $contrib_defines = { 'refint' => 'REFINT_VERBOSE' };
 my @contrib_uselibpq =
   ('dblink', 'oid2name', 'postgres_fdw', 'vacuumlo');
 my @contrib_uselibpgport = (
+	'chkpass',
 	'oid2name',
 	'pg_standby',
 	'vacuumlo');
 my @contrib_uselibpgcommon = (
+	'chkpass',
 	'oid2name',
 	'pg_standby',
 	'vacuumlo');
@@ -44,8 +46,8 @@ my $contrib_extralibs = undef;
 my $contrib_extraincludes =
   { 'tsearch2' => ['contrib/tsearch2'], 'dblink' => ['src/backend'] };
 my $contrib_extrasource = {
-	'cube' => [ 'contrib\cube\cubescan.l', 'contrib\cube\cubeparse.y' ],
-	'seg' => [ 'contrib\seg\segscan.l', 'contrib\seg\segparse.y' ], };
+	'cube' => [ 'contrib/cube/cubescan.l', 'contrib/cube/cubeparse.y' ],
+	'seg' => [ 'contrib/seg/segscan.l', 'contrib/seg/segparse.y' ], };
 my @contrib_excludes = ('pgcrypto', 'commit_ts', 'intagg', 'sepgsql');
 
 # Set of variables for frontend modules
@@ -59,12 +61,12 @@ my $frontend_extralibs = {
 	'pgbench'=> ['ws2_32.lib'],
 	'psql'   => ['ws2_32.lib'] };
 my $frontend_extraincludes = {
-	'initdb' => ['src\timezone'],
-	'psql'   => [ 'src\bin\pg_dump', 'src\backend' ] };
+	'initdb' => ['src/timezone'],
+	'psql'   => [ 'src/bin/pg_dump', 'src/backend' ] };
 my $frontend_extrasource = {
-	'psql' => ['src\bin\psql\psqlscan.l'],
+	'psql' => ['src/bin/psql/psqlscan.l'],
 	'pgbench' =>
-		[ 'src\bin\pgbench\exprscan.l', 'src\bin\pgbench\exprparse.y' ],
+		[ 'src/bin/pgbench/exprscan.l', 'src/bin/pgbench/exprparse.y' ],
 };
 my @frontend_excludes =
   ('pgevent', 'pg_basebackup', 'pg_rewind', 'pg_dump', 'pg_xlogdump', 'scripts');
@@ -73,9 +75,9 @@ sub mkvcbuild
 {
 	our $config = shift;
 
-	chdir('..\..\..') if (-d '..\msvc' && -d '..\..\..\src');
+	chdir('../../..') if (-d '../msvc' && -d '../../../src');
 	die 'Must run from root or msvc directory'
-	  unless (-d 'src\tools\msvc' && -d 'src');
+	  unless (-d 'src/tools/msvc' && -d 'src');
 
 	my $vsVersion = DetermineVisualStudioVersion();
 
@@ -114,37 +116,37 @@ sub mkvcbuild
 
 	$libpgport = $solution->AddProject('libpgport', 'lib', 'misc');
 	$libpgport->AddDefine('FRONTEND');
-	$libpgport->AddFiles('src\port', @pgportfiles);
+	$libpgport->AddFiles('src/port', @pgportfiles);
 
 	$libpgcommon = $solution->AddProject('libpgcommon', 'lib', 'misc');
 	$libpgcommon->AddDefine('FRONTEND');
-	$libpgcommon->AddFiles('src\common', @pgcommonfrontendfiles);
+	$libpgcommon->AddFiles('src/common', @pgcommonfrontendfiles);
 
-	$postgres = $solution->AddProject('postgres', 'exe', '', 'src\backend');
-	$postgres->AddIncludeDir('src\backend');
-	$postgres->AddDir('src\backend\port\win32');
-	$postgres->AddFile('src\backend\utils\fmgrtab.c');
+	$postgres = $solution->AddProject('postgres', 'exe', '', 'src/backend');
+	$post

Re: [HACKERS] [BUGS] Failure to coerce unknown type to specific type

2015-04-23 Thread Kyotaro HORIGUCHI
Hello,

At Thu, 23 Apr 2015 14:49:29 -0500, Jim Nasby  wrote 
in <55394cc9.5050...@bluetreble.com>
> On 4/23/15 5:07 AM, Kyotaro HORIGUCHI wrote:
> > This is because parsing of UNION immediately converts constants
> > of unknown type in the UNION's both arms to text so the top level
> > select won't be bothered by this problem. But the problematic
> > query doesn't have appropriate timing to do that until the
> > function I patched.
> 
> FWIW, I think that's more accidental than anything.

I guess so. This behavior doesn't look intentional at all.

> I'm no expert in our casting and type handling code but I spent a lot
> of time stuck in it while working on the variant type, and it seems
> very scattered. There's stuff in the actual casting code, there's some
> stuff in other parts of parse/plan, there's stuff in individual types
> (array and record at least).
> 
> Some stuff is handled by casting; some stuff is handled by mangling
> the parse tree.

That's what makes me less confident. But if coercion is always done by
coerce_type, coercion is properly considered at all places that need
it, and this coercion step is appropriate, we will see nothing bad.
I hope.

> Something else I noticed is we're not consistent with handling typmod
> either. I don't remember the exact example I found, but there's cases
> involving casting of constants where we ignore it (I don't think it
> was as simple as SELECT 1::int::variant(...), but it was something
> like that).

Mmm... It's a serious bug if explicit casts are ignored. If some
cast procedure does the wrong thing, it should be fixed.

> I don't know how much of this is just historical and how much is
> intentional, but it'd be nice if we could consolidate it more.

Yeah, but it seems tough to do it thoroughly.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [HACKERS] Reducing tuple overhead

2015-04-23 Thread Amit Kapila
On Fri, Apr 24, 2015 at 1:03 AM, Jim Nasby  wrote:
>
> On 4/23/15 11:45 AM, Petr Jelinek wrote:
>>
>> On 23/04/15 18:24, Andres Freund wrote:
>>>
>>> Whether that's feasible complexity wise is debatable, but it's certainly
>>> possible.
>>>
>>>
>>> I do wonder what, in realistic cases, is actually the bigger contributor
>>> to the overhead. The tuple header or the padding we liberally add in
>>> many cases...
>>>
>>
>> The logical ordering patch + auto optimizations of column layout on
>> table creation/rewrite might help partially there.
>>
>> But what seems to be clear is that we need more in depth analysis of
>> what really contributes most to the tuple size in various use-cases and
>> then we can debate what we can do about it.
>
>
> Also, what Robert posted was that while we started at something like
> 15%-30% larger, we ended the test at 80% larger. That makes me think that
> the bigger win is not in reducing tuple size but tackling bloat.
>

I agree with you.  I think one of the major reasons for bloat is that
the index segment doesn't have visibility information, due to which
cleaning up the index has to be tied to the heap.  Now if we can move
transaction information to the page level, then we can even think of
having it in the index segment as well, and then the index can
delete/prune its tuples on its own, which can reduce index bloat
significantly; there is a benefit for Vacuum as well.  This has some
downsides too, like Delete needing to traverse the index segment as
well to delete-mark the tuples, but I think the upside of reducing
bloat can certainly outweigh the downsides.

In short, moving transaction information to the page level can not
only reduce the tuple size but also help in reducing bloat.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] [committers] pgsql: RLS fixes, new hooks, and new test module

2015-04-23 Thread Stephen Frost
Christian,

* Christian Ullrich (ch...@chrullrich.net) wrote:
> * Stephen Frost wrote:
> 
> >RLS fixes, new hooks, and new test module
> 
> The buildfarm says that with -DCLOBBER_CACHE_ALWAYS, the RLS
> violations get blamed on the wrong tables. Mostly, they are catalogs
> (I have seen pg_opclass, pg_am, and pg_amproc), but some also come
> up with binary garbage instead, e.g.
> 
> - http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=markhor&dt=2015-04-23%2000%3A00%3A12
> - http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=friarbird&dt=2015-04-23%2004%3A20%3A00
> - http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=tick&dt=2015-04-22%2019%3A56%3A53

Yup, thanks, Robert pointed this out on another thread and I'm looking
into it.

Thanks again!

Stephen




Re: [HACKERS] Moving ExecInsertIndexTuples and friends to new file

2015-04-23 Thread Stephen Frost
* Peter Geoghegan (p...@heroku.com) wrote:
> On Thu, Apr 23, 2015 at 12:05 PM, Heikki Linnakangas  wrote:
> > While looking at Peter's INSERT ... ON CONFLICT patch, I started to feel
> > that ExecInsertIndexTuples() and friends would deserve a file of their own,
> > and not be buried in the middle of execUtils.c. I propose that we split
> > execUtils.c into two, moving ExecOpenIndices(), ExecCloseIndices()
> > ExecInsertIndexTuples() and related functions into a new file called
> > executor/execIndexing.c.
> 
> That split makes a lot of sense to me.

No objections here.

Thanks!

Stephen




Re: [HACKERS] Fwd: [GENERAL] 4B row limit for CLOB tables

2015-04-23 Thread Tom Lane
Stephen Frost  writes:
> * Bruce Momjian (br...@momjian.us) wrote:
>> On Sun, Feb  1, 2015 at 03:54:03PM +0100, Álvaro Hernández Tortosa wrote:
>>> The problem here is that performance degrades exponentially, or
>>> worse. Speaking here from experience, we already tested this for a
>>> very similar case (table creation, where two oids are consumed from
>>> a global sequence when inserting to pg_class). Have a look at
>>> http://www.slideshare.net/nosys/billion-tables-project-nycpug-2013,
>>> slides 43-45. We tested this scenario there and showed that table
>>> creations per second dropped from 10K to a few per second and then
>>> to a few per day. In the graphs you can't even tell that more tables
>>> were being created. At around 8K tables short of the theoretical
>>> limit of 4B oids consumed, the process basically stopped (doing more
>>> insertions).

>> We don't report the maximum number of tables per database, or the
>> maximum number of TOAST values.  Agreed?

> For my 2c, this limitation is a surprise to users and therefore we
> should add documentation to point out that it exists, unless we're going
> to actually fix it (which is certainly what I'd prefer to see...).

TBH, I've got very little enthusiasm for fixing this given the number
of reports of trouble from the field, which so far as I recall is zero.
Álvaro's case came up through intentionally trying to create an
unreasonable number of tables, not from real usage.  This thread likewise
appears to contain lots of speculation and no reports of anyone hitting
a problem in practice.

Certainly this is likely to become an issue at some point in the future,
but I'm not finding it very compelling to worry about now.  By the time
it does become an issue, we may have additional considerations or use
cases that should inform a solution; which seems to me to be a good
argument not to try to fix it in advance of real problems.  Perhaps,
for example, we'd find that at the same time we ought to relax the 1GB
limit on individual-value size; or perhaps not.

Having said all that, if we did try to fix it today, I'd imagine changing
TOAST value identifiers to int64 and inventing a new TOAST pointer format
for use when 32 bits isn't wide enough for the ID.  But I think we're best
advised to hold off doing that until the need becomes pressing.
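
For illustration only, a wider TOAST pointer might look roughly like the
following, modeled on the existing 32-bit varatt_external; the struct name
and layout here are hypothetical, not something actually proposed:

    /* hypothetical sketch, not part of any patch */
    typedef struct varatt_external64
    {
        int32       va_rawsize;     /* original data size (includes header) */
        int32       va_extsize;     /* external saved size (doesn't) */
        uint64      va_valueid;     /* 64-bit unique ID within the TOAST table */
        Oid         va_toastrelid;  /* RelID of TOAST table containing it */
    } varatt_external64;

On disk such a pointer would need its own vartag so readers could tell the
wide form apart from today's 32-bit pointers.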

regards, tom lane




Re: [HACKERS] a fast bloat measurement tool (was Re: Measuring relation free space)

2015-04-23 Thread Abhijit Menon-Sen
At 2015-04-24 08:35:40 +0530, amit.kapil...@gmail.com wrote:
>
> > Just stick a PG_RETURN_NULL() at the end?
> 
> That should also work.

OK, updated patch attached with just that one change.

I'm not doing anything about the rename. I don't know if it's possible
to make patch(1) rename a file at all (it wasn't, but maybe something
has changed recently?).

Note to friendly neighbourhood committers: the patch is expected to
rename pgstattuple--1.2.sql to pgstattuple--1.3.sql, which it does
if applied using git apply.

-- Abhijit
>From f67ddc68ab03fe8237058dcdca1e410ef9e95b5e Mon Sep 17 00:00:00 2001
From: Abhijit Menon-Sen 
Date: Fri, 26 Dec 2014 12:37:13 +0530
Subject: Add pgstatbloat to pgstattuple

---
 contrib/pgstattuple/Makefile   |   4 +-
 contrib/pgstattuple/pgstatbloat.c  | 389 +
 contrib/pgstattuple/pgstattuple--1.2--1.3.sql  |  18 +
 .../{pgstattuple--1.2.sql => pgstattuple--1.3.sql} |  18 +-
 contrib/pgstattuple/pgstattuple.control|   2 +-
 doc/src/sgml/pgstattuple.sgml  | 135 +++
 6 files changed, 562 insertions(+), 4 deletions(-)
 create mode 100644 contrib/pgstattuple/pgstatbloat.c
 create mode 100644 contrib/pgstattuple/pgstattuple--1.2--1.3.sql
 rename contrib/pgstattuple/{pgstattuple--1.2.sql => pgstattuple--1.3.sql} (73%)

diff --git a/contrib/pgstattuple/Makefile b/contrib/pgstattuple/Makefile
index 862585c..d7d27a5 100644
--- a/contrib/pgstattuple/Makefile
+++ b/contrib/pgstattuple/Makefile
@@ -1,10 +1,10 @@
 # contrib/pgstattuple/Makefile
 
 MODULE_big	= pgstattuple
-OBJS		= pgstattuple.o pgstatindex.o $(WIN32RES)
+OBJS		= pgstattuple.o pgstatindex.o pgstatbloat.o $(WIN32RES)
 
 EXTENSION = pgstattuple
-DATA = pgstattuple--1.2.sql pgstattuple--1.1--1.2.sql pgstattuple--1.0--1.1.sql pgstattuple--unpackaged--1.0.sql
+DATA = pgstattuple--1.3.sql pgstattuple--1.2--1.3.sql pgstattuple--1.1--1.2.sql pgstattuple--1.0--1.1.sql pgstattuple--unpackaged--1.0.sql
 PGFILEDESC = "pgstattuple - tuple-level statistics"
 
 REGRESS = pgstattuple
diff --git a/contrib/pgstattuple/pgstatbloat.c b/contrib/pgstattuple/pgstatbloat.c
new file mode 100644
index 000..612e22b
--- /dev/null
+++ b/contrib/pgstattuple/pgstatbloat.c
@@ -0,0 +1,389 @@
+/*
+ * contrib/pgstattuple/pgstatbloat.c
+ *
+ * Abhijit Menon-Sen 
+ * Portions Copyright (c) 2001,2002	Tatsuo Ishii (from pgstattuple)
+ *
+ * Permission to use, copy, modify, and distribute this software and
+ * its documentation for any purpose, without fee, and without a
+ * written agreement is hereby granted, provided that the above
+ * copyright notice and this paragraph and the following two
+ * paragraphs appear in all copies.
+ *
+ * IN NO EVENT SHALL THE AUTHOR BE LIABLE TO ANY PARTY FOR DIRECT,
+ * INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING
+ * LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS
+ * DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED
+ * OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * THE AUTHOR SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE.  THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS
+ * IS" BASIS, AND THE AUTHOR HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE,
+ * SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
+ */
+
+#include "postgres.h"
+
+#include "access/visibilitymap.h"
+#include "access/transam.h"
+#include "access/xact.h"
+#include "access/multixact.h"
+#include "access/htup_details.h"
+#include "catalog/namespace.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "storage/bufmgr.h"
+#include "storage/freespace.h"
+#include "storage/procarray.h"
+#include "storage/lmgr.h"
+#include "utils/builtins.h"
+#include "utils/tqual.h"
+#include "commands/vacuum.h"
+
+PG_FUNCTION_INFO_V1(pgstatbloat);
+
+/*
+ * tuple_percent, dead_tuple_percent and free_percent are computable,
+ * so not defined here.
+ */
+typedef struct pgstatbloat_output_type
+{
+	uint64		table_len;
+	uint64		tuple_count;
+	uint64		misc_count;
+	uint64		tuple_len;
+	uint64		dead_tuple_count;
+	uint64		dead_tuple_len;
+	uint64		free_space;
+	uint64		total_pages;
+	uint64		scanned_pages;
+} pgstatbloat_output_type;
+
+static Datum build_output_type(pgstatbloat_output_type *stat,
+			   FunctionCallInfo fcinfo);
+static Datum pgstatbloat_heap(Relation rel, FunctionCallInfo fcinfo);
+
+/*
+ * build a pgstatbloat_output_type tuple
+ */
+static Datum
+build_output_type(pgstatbloat_output_type *stat, FunctionCallInfo fcinfo)
+{
+#define NCOLUMNS	10
+#define NCHARS		32
+
+	HeapTuple	tuple;
+	char	   *values[NCOLUMNS];
+	char		values_buf[NCOLUMNS][NCHARS];
+	int			i;
+	double		tuple_percent;
+	double		dead_tuple_percent;
+	double		free_percent;	/* free/reusable space in % */
+	double		scanned_percent;
+	TupleDesc	tupdesc;
+	AttInMetadata *attinmeta;
+
+	/* Build a tuple descriptor for our result type */
+	if (get_cal

Re: [HACKERS] a fast bloat measurement tool (was Re: Measuring relation free space)

2015-04-23 Thread Amit Kapila
On Fri, Apr 24, 2015 at 8:04 AM, Abhijit Menon-Sen 
wrote:
>
> At 2015-04-24 07:22:27 +0530, amit.kapil...@gmail.com wrote:
> >
> > Few minor issues:
> > 1.
> > warning on windows
> >
> > \pgstatbloat.c(193): warning C4715: 'pgstatbloat' : not all control paths
> > return a value
>
> This is because the function ends with an ereport(ERROR, …). What would
> you suggest here?

The similar code in pgstattuple.c (pgstat_relation()) simply does
"return 0;".

> Just stick a PG_RETURN_NULL() at the end?
>

That should also work.
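
For reference, the pattern being discussed looks roughly like this (a
sketch only, omitting the usual includes and the real per-relkind logic;
the error text is a placeholder):

    Datum
    pgstatbloat(PG_FUNCTION_ARGS)
    {
        Oid         relid = PG_GETARG_OID(0);

        /* ... per-relkind handling, each branch returning its result ... */

        ereport(ERROR,
                (errcode(ERRCODE_WRONG_OBJECT_TYPE),
                 errmsg("relation with OID %u is of an unsupported kind", relid)));

        /* not reached; present only to silence MSVC warning C4715 */
        PG_RETURN_NULL();
    }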

> > 2.
> > Failed to install pgstattuple--1.3.sql
> >
> > You have to name sql file as pgstattuple--1.3.sql rather
> > than pgstattuple--1.2.sql
>
> How are you applying the patch? It already contains this:
>
> diff --git a/contrib/pgstattuple/pgstattuple--1.2.sql b/contrib/pgstattuple/pgstattuple--1.3.sql
> similarity index 73%
> rename from contrib/pgstattuple/pgstattuple--1.2.sql
> rename to contrib/pgstattuple/pgstattuple--1.3.sql
> index e5fa2f5..93593ee 100644
> --- a/contrib/pgstattuple/pgstattuple--1.2.sql
> +++ b/contrib/pgstattuple/pgstattuple--1.3.sql
>
> So it should do the right thing with git apply.
>

I was using patch -p1 ...


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Fwd: [GENERAL] 4B row limit for CLOB tables

2015-04-23 Thread Stephen Frost
* Bruce Momjian (br...@momjian.us) wrote:
> On Sun, Feb  1, 2015 at 03:54:03PM +0100, Álvaro Hernández Tortosa wrote:
> > The problem here is that performance degrades exponentially, or
> > worse. Speaking here from experience, we already tested this for a
> > very similar case (table creation, where two oids are consumed from
> > a global sequence when inserting to pg_class). Have a look at
> > http://www.slideshare.net/nosys/billion-tables-project-nycpug-2013,
> > slides 43-45. We tested this scenario there and showed that table
> > creations per second dropped from 10K to a few per second and then
> > to a few per day. In the graphs you can't even tell that more tables
> > were being created. At around 8K tables short of the theoretical
> > limit of 4B oids consumed, the process basically stopped (doing more
> > insertions).
> 
> I had a look at our FAQ about Postgres limitations and I don't see
> anything that needs changing:
> 
>   
> https://wiki.postgresql.org/wiki/FAQ#What_is_the_maximum_size_for_a_row.2C_a_table.2C_and_a_database.3F
> 
>   Maximum size for a database? unlimited (32 TB databases exist)
>   Maximum size for a table? 32 TB
>   Maximum size for a row? 400 GB
>   Maximum size for a field? 1 GB
>   Maximum number of rows in a table? unlimited
>   Maximum number of columns in a table? 250-1600 depending on column types
>   Maximum number of indexes on a table? unlimited
> 
> We don't report the maximum number of tables per database, or the
> maximum number of TOAST values.  Agreed?

For my 2c, this limitation is a surprise to users and therefore we
should add documentation to point out that it exists, unless we're going
to actually fix it (which is certainly what I'd prefer to see...).

As for the other discussion on the thread, having a per-table sequence
would be far better as it'd reduce the wrap-around risk quite a bit and,
further, we should be able to provide that without breaking the on-disk
format.  What would be really nice is a way to expand the size of the
key when needed- in other words, instead of wrapping around, if we
actually hit 4B toasted values in a table then stick a flag somewhere
for the next toasted value that says "this value is in the second toast
table/fork" and then go up to 4B on that one, etc.  That allows us to
support more than 4B toasted values and doesn't require searching for
holes in the values assigned.

If we end up with empty toast tables eventually, then allow reusing
them.  Perhaps vacuum can even be used to make a note somewhere saying
"this toast table is now empty and can be reused".

In the end, I'd like to think we can do better here than having a hard
limit at 4B when it comes to how many values over a few KB we can store.
As mentioned, that isn't all that much these days.  I'm not saying that
my proposal or what's been proposed upthread is an answer, but I've
certainly built PG systems which store over 4B rows and it's not hard to
imagine cases where I might have wanted a toasted value for each of
those rows.

Thanks,

Stephen




Re: [HACKERS] a fast bloat measurement tool (was Re: Measuring relation free space)

2015-04-23 Thread Abhijit Menon-Sen
At 2015-04-24 07:22:27 +0530, amit.kapil...@gmail.com wrote:
>
> Few minor issues:
> 1.
> warning on windows
> 
> \pgstatbloat.c(193): warning C4715: 'pgstatbloat' : not all control paths
> return a value

This is because the function ends with an ereport(ERROR, …). What would
you suggest here? Just stick a PG_RETURN_NULL() at the end?

> 2.
> Failed to install pgstattuple--1.3.sql
> 
> You have to name sql file as pgstattuple--1.3.sql rather
> than pgstattuple--1.2.sql

How are you applying the patch? It already contains this:

diff --git a/contrib/pgstattuple/pgstattuple--1.2.sql b/contrib/pgstattuple/pgstattuple--1.3.sql
similarity index 73%
rename from contrib/pgstattuple/pgstattuple--1.2.sql
rename to contrib/pgstattuple/pgstattuple--1.3.sql
index e5fa2f5..93593ee 100644
--- a/contrib/pgstattuple/pgstattuple--1.2.sql
+++ b/contrib/pgstattuple/pgstattuple--1.3.sql

So it should do the right thing with git apply.

-- Abhijit




Re: [HACKERS] Supporting TAP tests with MSVC and Windows

2015-04-23 Thread Michael Paquier
On Mon, Apr 20, 2015 at 9:01 PM, Michael Paquier
 wrote:
> On Sun, Apr 19, 2015 at 10:01 PM, Michael Paquier
>  wrote:
>> Note as well that this patch uses the following patches fixing
>> independent issues:
>> ...
>
> Attached is v4. I added a switch in config.pl to be consistent with
> ./configure and --enable-tap-tests.

Attached is v5, rebased on HEAD (2c47fe16) after conflicts with
dcae5fac and 54a16df0.
-- 
Michael
From 9c348d985a3da0098be9d9f64a14e936068117a0 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Mon, 20 Apr 2015 04:57:37 -0700
Subject: [PATCH 1/2] Add support for TAP tests on Windows

Nodes initialized by the TAP tests use SSPI to securely perform the
tests, and test scripts are patched in a couple of places to support
Windows grammar. In the case of MSVC, tests can be run with this
command:
vcregress tapcheck
---
 doc/src/sgml/install-windows.sgml  |  1 +
 src/Makefile.global.in |  2 +-
 src/bin/initdb/t/001_initdb.pl | 18 --
 src/bin/pg_basebackup/t/010_pg_basebackup.pl   | 79 --
 src/bin/pg_controldata/t/001_pg_controldata.pl |  5 +-
 src/bin/pg_ctl/t/001_start_stop.pl | 14 -
 src/bin/pg_ctl/t/002_status.pl | 12 +++-
 src/bin/pg_rewind/RewindTest.pm| 56 ++
 src/bin/scripts/t/020_createdb.pl  |  3 +
 src/test/perl/TestLib.pm   | 16 --
 src/tools/msvc/Solution.pm |  1 +
 src/tools/msvc/config_default.pl   |  1 +
 src/tools/msvc/vcregress.pl| 48 +++-
 13 files changed, 195 insertions(+), 61 deletions(-)

diff --git a/doc/src/sgml/install-windows.sgml b/doc/src/sgml/install-windows.sgml
index d154b44..2047790 100644
--- a/doc/src/sgml/install-windows.sgml
+++ b/doc/src/sgml/install-windows.sgml
@@ -439,6 +439,7 @@ $ENV{CONFIG}="Debug";
 vcregress modulescheck
 vcregress ecpgcheck
 vcregress isolationcheck
+vcregress tapcheck
 vcregress upgradecheck
 
 
diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index 81cae94..f030dac 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -338,7 +338,7 @@ cd $(srcdir) && TESTDIR='$(CURDIR)' PATH="$(bindir):$$PATH" PGPORT='6$(DEF_PGPOR
 endef
 
 define prove_check
-cd $(srcdir) && TESTDIR='$(CURDIR)' $(with_temp_install) PGPORT='6$(DEF_PGPORT)' top_builddir='$(CURDIR)/$(top_builddir)' $(PROVE) $(PG_PROVE_FLAGS) $(PROVE_FLAGS) t/*.pl
+cd $(srcdir) && TESTDIR='$(CURDIR)' TESTREGRESS='$(top_builddir)/src/test/regress/pg_regress' $(with_temp_install) PGPORT='6$(DEF_PGPORT)' $(PROVE) $(PG_PROVE_FLAGS) $(PROVE_FLAGS) t/*.pl
 endef
 
 else
diff --git a/src/bin/initdb/t/001_initdb.pl b/src/bin/initdb/t/001_initdb.pl
index d12be84..0865107 100644
--- a/src/bin/initdb/t/001_initdb.pl
+++ b/src/bin/initdb/t/001_initdb.pl
@@ -1,6 +1,8 @@
 use strict;
 use warnings;
+use Config;
 use TestLib;
+use File::Path qw(rmtree);
 use Test::More tests => 19;
 
 my $tempdir = TestLib::tempdir;
@@ -18,27 +20,31 @@ command_fails([ 'initdb', '-S', "$tempdir/data3" ],
 mkdir "$tempdir/data4" or BAIL_OUT($!);
 command_ok([ 'initdb', "$tempdir/data4" ], 'existing empty data directory');
 
-system_or_bail "rm -rf '$tempdir'/*";
-
+rmtree($tempdir);
+mkdir $tempdir;
 command_ok([ 'initdb', '-X', "$tempdir/pgxlog", "$tempdir/data" ],
 	'separate xlog directory');
 
-system_or_bail "rm -rf '$tempdir'/*";
+rmtree($tempdir);
+mkdir $tempdir;
 command_fails(
 	[ 'initdb', "$tempdir/data", '-X', 'pgxlog' ],
 	'relative xlog directory not allowed');
 
-system_or_bail "rm -rf '$tempdir'/*";
+rmtree($tempdir);
+mkdir $tempdir;
 mkdir "$tempdir/pgxlog";
 command_ok([ 'initdb', '-X', "$tempdir/pgxlog", "$tempdir/data" ],
 	'existing empty xlog directory');
 
-system_or_bail "rm -rf '$tempdir'/*";
+rmtree($tempdir);
+mkdir $tempdir;
 mkdir "$tempdir/pgxlog";
 mkdir "$tempdir/pgxlog/lost+found";
 command_fails([ 'initdb', '-X', "$tempdir/pgxlog", "$tempdir/data" ],
 	'existing nonempty xlog directory');
 
-system_or_bail "rm -rf '$tempdir'/*";
+rmtree($tempdir);
+mkdir $tempdir;
 command_ok([ 'initdb', '-T', 'german', "$tempdir/data" ],
 	'select default dictionary');
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index 7e9a776..4cb1b5a 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -1,8 +1,9 @@
 use strict;
 use warnings;
 use Cwd;
+use Config;
 use TestLib;
-use Test::More tests => 35;
+use Test::More tests => ($Config{osname} eq "MSWin32") ? 26 : 35;
 
 program_help_ok('pg_basebackup');
 program_version_ok('pg_basebackup');
@@ -17,10 +18,20 @@ command_fails(
 	[ 'pg_basebackup', '-D', "$tempdir/backup" ],
 	'pg_basebackup fails because of hba');
 
+# Use SSPI on Windows, node has been initialized already accordingly
+# by pg_regress --config-auth.
 open HBA, ">>$tempdir/pgdata/pg_hba.conf";
-print HBA "

Re: [HACKERS] anole - test case sha2 fails on all branches

2015-04-23 Thread Noah Misch
On Thu, Apr 23, 2015 at 10:54:45AM -0400, Tom Lane wrote:
> I wrote:
> > Given that anole is the only one reporting this, I'm not sure that we
> > should immediately blame Postgres itself.  I have a vague recollection
> > that we've seen this symptom before and traced it to a bug in some
> > supporting library.  Is anole using any particularly out-of-date versions
> > of openssl, kerberos, etc?
> 
> A bit of digging in the archives suggests that my hindbrain remembered this:
> http://www.postgresql.org/message-id/flat/4f5a8404.8020...@dunslane.net
> 
> The specifics probably don't apply to anole, but the conclusion that
> inconsistent openssl header and library files triggered the bug might.

A library problem is plausible.  anole builds without OpenSSL, and I have no
guess for which remaining library could be at fault.  I could not reproduce
this in an HP-UX IA-64 build configured as follows (no HP compiler available):
  ./configure --enable-debug --enable-cassert --enable-depend 
--without-readline --without-zlib CC='gcc -pthread -mlp64'

Sandeep, I suggest trying a build at commit 36e5247, the last REL9_4_STABLE
commit known good on anole.  If that build fails, you'll know there's an
environmental problem, like a broken dependency library.  If that build
succeeds, please use "git bisect" to find which commit broke things, and
report the commit hash here.

Thanks,
nm




Re: [HACKERS] Freeze avoidance of very large table.

2015-04-23 Thread Sawada Masahiko
On Fri, Apr 24, 2015 at 1:31 AM, Jim Nasby  wrote:
> On 4/23/15 11:06 AM, Petr Jelinek wrote:
>>
>> On 23/04/15 17:45, Bruce Momjian wrote:
>>>
>>> On Thu, Apr 23, 2015 at 09:45:38AM -0400, Robert Haas wrote:
>>> Agreed, no extra file, and the same write volume as currently.  It would
>>> also match pg_clog, which uses two bits per transaction --- maybe we can
>>> reuse some of that code.
>>>
>>
>> Yeah, this approach seems promising. We probably can't reuse code from
>> clog because the usage pattern is different (key for clog is xid, while
>> for visibility/freeze map ctid is used). But visibility map storage
>> layer is pretty simple so it should be easy to extend it for this use.
>
>
> Actually, there may be some bit manipulation functions we could reuse;
> things like efficiently counting how many things in a byte are set. Probably
> doesn't make sense to fully refactor it, but at least CLOG is a good source
> for cut/paste/whack.
>

I agree with adding a bit to the VM that indicates the corresponding
page is all-frozen, just like CLOG.
I'll change the patch accordingly for the second version.
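
As a rough sketch, loosely modeled on the existing visibilitymap.c macros
(which track one bit per heap page today), two bits per heap page could
look like this; the names are illustrative only, not the actual patch:

    #define BITS_PER_HEAPBLOCK      2
    #define VM_ALL_VISIBLE          0x01    /* page is all-visible */
    #define VM_ALL_FROZEN           0x02    /* page is also all-frozen */

    #define HEAPBLOCKS_PER_BYTE     (BITS_PER_BYTE / BITS_PER_HEAPBLOCK)
    #define HEAPBLK_TO_MAPBYTE(blk) ((blk) / HEAPBLOCKS_PER_BYTE)
    #define HEAPBLK_TO_MAPSHIFT(blk) \
        (((blk) % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK)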

Regards,

---
Sawada Masahiko




Re: [HACKERS] a fast bloat measurement tool (was Re: Measuring relation free space)

2015-04-23 Thread Amit Kapila
On Wed, Apr 22, 2015 at 6:33 PM, Abhijit Menon-Sen 
wrote:
>
> At 2015-04-18 12:28:36 +0530, amit.kapil...@gmail.com wrote:
> >
> > I think you have missed to address the below point:
> >
> > >> 4) prefetch
>
> Updated patch attached, as well as the diff against the earlier version
> just to make it easier to see the exact change I made (which is to copy
> the skip logic from lazy_scan_heap, as you suggested).
>

Few minor issues:
1.
warning on windows

\pgstatbloat.c(193): warning C4715: 'pgstatbloat' : not all control paths
return a value

2.
Failed to install pgstattuple--1.3.sql

You have to name sql file as pgstattuple--1.3.sql rather
than pgstattuple--1.2.sql


Other than the above 2 points, the patch looks good to me.



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] adding more information about process(es) cpu and memory usage

2015-04-23 Thread Tom Lane
Jeff Janes  writes:
> On Thu, Apr 23, 2015 at 10:17 AM, Heikki Linnakangas 
> wrote:
>> In a nutshell, I don't think PostgreSQL should get involved in that...

> I have often wanted an SQL function which would expose the back-end's
> rusage statistics to the front-end. This could support a \timing feature
> variant to psql that reports more than just wall-clock time.

> I don't use RDS, and use shell access and "top" (and "strace" and "gdb")
> quite enthusiastically, but still it is a pain to correlate any given
> front-end to any given back-end.

> Would such an addition to core be welcome?

The reason nobody's gotten around to that in the last fifteen years is
that per-process rusage isn't actually all that interesting; there's
too much that happens in background daemons, for instance.

regards, tom lane




Re: [HACKERS] adding more information about process(es) cpu and memory usage

2015-04-23 Thread Jeff Janes
On Thu, Apr 23, 2015 at 10:17 AM, Heikki Linnakangas 
wrote:

> On 04/23/2015 08:00 PM, Radovan Jablonovsky wrote:
>
>> In current encounters with Amazon Web Services - RDS, the DBA does not
>> have access to the OS/Linux shell of the underlying instance. That renders
>> some PostgreSQL monitoring techniques for process CPU and memory usage
>> not useful. Even if AWS provides internal tools/programming interfaces for
>> monitoring, it could be very useful to have this information provided by
>> PostgreSQL system table(s)/views/functions/API: information about how much
>> CPU (similar to the output of top) and memory each PostgreSQL background
>> process is using. It could be something as simple as adding cpu/memory
>> information fields to pg_stat_activity.
>>
>
> You can write an extension to do that. Of course, Amazon won't let you run
> your own C extension either (otherwise you could use that to escape into
> shell), but if you do it well and publish and get it included into standard
> distributions, they just might pick it up. Unless they don't want you to
> see that information. If they don't, then they wouldn't let you use the
> system views either.
>
> In a nutshell, I don't think PostgreSQL should get involved in that...
>
>
I have often wanted an SQL function which would expose the back-end's
rusage statistics to the front-end. This could support a \timing feature
variant to psql that reports more than just wall-clock time.

I don't use RDS, and use shell access and "top" (and "strace" and "gdb")
quite enthusiastically, but still it is a pain to correlate any given
front-end to any given back-end.

Would such an addition to core be welcome?

Cheers,

Jeff
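
As a minimal sketch of the kind of function being floated (the function
name and return shape are invented, it assumes a POSIX getrusage(), and it
would need a matching CREATE FUNCTION declaration to be callable):

    #include "postgres.h"

    #include <sys/resource.h>

    #include "fmgr.h"

    PG_MODULE_MAGIC;

    PG_FUNCTION_INFO_V1(pg_backend_user_cpu);

    /* Return this backend's accumulated user CPU time, in seconds. */
    Datum
    pg_backend_user_cpu(PG_FUNCTION_ARGS)
    {
        struct rusage ru;

        if (getrusage(RUSAGE_SELF, &ru) != 0)
            ereport(ERROR,
                    (errcode(ERRCODE_SYSTEM_ERROR),
                     errmsg("getrusage failed: %m")));

        PG_RETURN_FLOAT8(ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1000000.0);
    }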


Re: [HACKERS] [committers] pgsql: RLS fixes, new hooks, and new test module

2015-04-23 Thread Christian Ullrich

* Stephen Frost wrote:

> RLS fixes, new hooks, and new test module

The buildfarm says that with -DCLOBBER_CACHE_ALWAYS, the RLS violations
get blamed on the wrong tables. Mostly, they are catalogs (I have seen
pg_opclass, pg_am, and pg_amproc), but some also come up with binary
garbage instead, e.g.

- http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=markhor&dt=2015-04-23%2000%3A00%3A12
- http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=friarbird&dt=2015-04-23%2004%3A20%3A00
- http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=tick&dt=2015-04-22%2019%3A56%3A53


--
Christian






Re: [HACKERS] INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

2015-04-23 Thread Peter Geoghegan
On Thu, Apr 23, 2015 at 12:45 PM, Peter Geoghegan  wrote:
>>> * We need to figure out the tuple lock strength details. I think this
>>> is doable, but it is the greatest challenge to committing ON CONFLICT
>>> UPDATE at this point. Andres feels that we should require no greater
>>> lock strength than an equivalent UPDATE. I suggest we get to this
>>> after committing the IGNORE variant. We probably need to discuss this
>>> some more.
>>
>> I want to see a clear way forward before we commit parts. It doesn't
>> have to be completely iron-clad, but the way forward should be pretty
>> clear. What's the problem you're seeing right now making this work? A
>> couple days on jabber you seemed to see a way doing this?
>
> I was really just identifying it as the biggest problem the patch now
> faces, and I want to face those issues down ASAP. Of course, that's
> good, because as you say it is a small problem in an absolute sense.
> The second most significant open item is rebasing on top of the recent
> RLS changes, IMV.


OK, I pushed out code to do this a while ago. I must admit that I
probably significantly over-estimated the difficulty of doing this.

With Heikki's problematic commit reverted this works fine (I have not
actually reverted it myself... I'll leave it to Heikki to clean that up
when he gets around to it). The usual jjanes_upsert stress tests show
no problems.

Curious what you think about my proposal to go back to putting the
inference specification's WHERE clause within the parentheses, since
that solves several problems, including clarifying to users that the
predicate is part of the inference clause.
-- 
Peter Geoghegan




Re: [HACKERS] Freeze avoidance of very large table.

2015-04-23 Thread Michael Paquier
On Thu, Apr 23, 2015 at 10:42 PM, Robert Haas wrote:
> On Thu, Apr 23, 2015 at 4:19 AM, Simon Riggs  wrote:
>> We only need a freeze/backup map for larger relations. So if we map 1000
>> blocks per map page, we skip having a map at all when size < 1000.
>
> Agreed.  We might also want to map multiple blocks per map slot - e.g.
> one slot per 32 blocks.  That would keep the map quite small even for
> very large relations, and would not compromise efficiency that much
> since reading 256kB sequentially probably takes only a little longer
> than reading 8kB.
>
> I think the idea of integrating the freeze map into the VM fork is
> also worth considering.  Then, the incremental backup map could be
> optional; if you don't want incremental backup, you can shut it off
> and have less overhead.

When I read that, I think about something configurable at relation
level. There are cases where you may want finer granularity of this
information at the block level by having the VM slots track fewer than
32 blocks, and vice versa.
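
To put rough numbers on the one-slot-per-32-blocks idea (a sketch, not
from any patch): one map bit covering 32 heap blocks means a 1 TB relation
of ~134 million 8 kB blocks needs only about 512 kB of map. Making the
granularity a reloption is essentially what the relation-level
configurability above would amount to.

    /* illustrative only: one freeze/backup-map bit per 32 heap blocks */
    #define FRZMAP_BLOCKS_PER_BIT   32

    #define HEAPBLK_TO_FRZMAP_BIT(blk)  ((blk) / FRZMAP_BLOCKS_PER_BIT)

    /* map size in bytes for a relation of nblocks heap blocks, rounded up */
    static inline Size
    frzmap_size(BlockNumber nblocks)
    {
        return (nblocks + FRZMAP_BLOCKS_PER_BIT * 8 - 1) / (FRZMAP_BLOCKS_PER_BIT * 8);
    }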
-- 
Michael




Re: [HACKERS] INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

2015-04-23 Thread Peter Geoghegan
On Thu, Apr 23, 2015 at 1:08 PM, Heikki Linnakangas  wrote:
> That said, I'd actually like to see a separate heap_super_delete() function
> for that, rather than piggybacking on heap_delete(). It's a quite different
> operation. There'll be some duplication, but seems better than a maze of
> if-else's in heap_delete.

I found a bug that seems to be down to commit
e3144183562d08e347f832f0b29daefe8bac617b on Github:

"""
commit e3144183562d08e347f832f0b29daefe8bac617b
Author: Heikki Linnakangas 
Date:   Thu Apr 23 18:38:11 2015 +0300

Minor cleanup of check_exclusion_or_unique_constraint.

To improve readability.

"""

At least, that's what a "git bisect" session showed.

Basically, at high client counts (128 clients only), after a few
iterations of the B-Tree test, the latest version of jjanes_upsert
would see this happen (the error originates from ExecOnConflictUpdate(),
with custom instrumentation added to identify the invisible tuple):

2015-04-23 15:10:48 PDT [ txid:  0 ]: LOG:  database system was shut
down at 2015-04-23 15:10:09 PDT
2015-04-23 15:10:48 PDT [ txid:  0 ]: LOG:  database system is ready
to accept connections
2015-04-23 15:12:55 PDT [ txid:  472418 ]: ERROR:  attempted to lock
invisible tuple (140,39)
2015-04-23 15:12:55 PDT [ txid:  472418 ]: STATEMENT:  insert into
upsert_race_test (index, filler, count) values ('3896',
random_characters(), '1') on conflict (index)
 update set count=TARGET.count + EXCLUDED.count, filler =
EXCLUDED.filler
 where TARGET.index = EXCLUDED.index
 returning count

This seemed to only show up when fsync = on on my laptop, whereas in
the past some race conditions that I've found were easier to recreate
with fsync = off.

I attach some notes from my bisect/debugging session, including
pg_xlogdump output (from a psql session - I like to manipulate
pg_xlogdump output using SQL). That's probably not all that
interesting, but I attach it all the same. Hopefully this is something
that Heikki can easily debug, since the commit implicated here only
concerned readability. A simple oversight? If you want, Heikki, I can
try and debug it, but it seems like something we're better off having
you sign off on if possible.

Thanks
-- 
Peter Geoghegan
error for dump:

pg@hamster:~/postgresql/src/backend/executor$ postgres   
(git)-[insert_conflict_ignore]
2015-04-23 15:10:48 PDT [ txid:  0 ]: LOG:  database system was shut down at 
2015-04-23 15:10:09 PDT
2015-04-23 15:10:48 PDT [ txid:  0 ]: LOG:  database system is ready to accept 
connections
2015-04-23 15:12:55 PDT [ txid:  472418 ]: ERROR:  attempted to lock invisible 
tuple (140,39)
2015-04-23 15:12:55 PDT [ txid:  472418 ]: STATEMENT:  insert into 
upsert_race_test (index, filler, count) values ('3896', random_characters(), 
'1') on conflict (index)
  update set count=TARGET.count + EXCLUDED.count, filler = 
EXCLUDED.filler
  where TARGET.index = EXCLUDED.index
  returning count

This was after 3 full iterations of jjanes_upsert, B-Tree only with 128 clients 
only (fsync = on, the default). Settings:

max_connections = 150
shared_buffers = 6GB
work_mem = 5GB
maintenance_work_mem = 5GB
autovacuum = off
max_wal_size=10GB
wal_level = logical
max_replication_slots = 4
#log_min_duration_statement=0
#log_error_verbosity=verbose
log_line_prefix='%t [ %x ]: '
wal_keep_segments=500
max_prepared_transactions=5
shared_preload_libraries='pg_stat_statements'

pg_xlogdump output up to and including the above xact's abort:

(18545) /postgres=# select * from xlogdump_records where r_lsn <= '0/82F7F10' order by r_lsn desc limit 50;
    rmgr     | len_rec | len_tot |   tx   |   r_lsn   | prev_lsn  | descr                                                        | relation
-------------+---------+---------+--------+-----------+-----------+--------------------------------------------------------------+------------------------
 Transaction |       8 |      34 | 472418 | 0/82F7F10 | 0/82F7EE0 | ABORT 2015-04-23 15:12:55.784621 PDT                         | [null]
 Heap        |       2 |      48 | 472420 | 0/82F7EE0 | 0/82F7EA0 | HEAP_CONFIRM off 144 blkref #0: rel 1663/12488/16427 blk 33  | upsert_race_test
 Btree       |       2 |      64 | 472420 | 0/82F7EA0 | 0/82F7E48 | INSERT_LEAF off 313 blkref #0: rel 1663/12488/16433 blk 22   | upsert_race_test_pkey
 Heap        |       3 |      83 | 472420 | 0/82F7E48 | 0/82F7E18 | INSERT off 144 blkref #0: rel 1663/12488/16427 blk 33        | upsert_race_test
 Transaction |      20 |      46 | 472419 | 0/82F7E18 | 0/82F7DD8 | COMMIT 2015-04-23 15:12:55.784507 PDT

Re: [HACKERS] pg_dump: largeobject behavior issues (possible bug)

2015-04-23 Thread Andrew Dunstan


On 04/23/2015 04:04 PM, Andrew Gierth wrote:

"Joshua" == Joshua D Drake  writes:

  Joshua> The database dumps fine as long as we don't dump large
  Joshua> objects. However, if we try to dump the large objects, FreeBSD
  Joshua> will kill pg_dump as it will consume all free memory and
  Joshua> swap. With Andrew's help we were able to determine the
  Joshua> following:

  Joshua> There is a memory cost of about 160 bytes per largeobject.

I may have the exact number here wrong, it was just a quick eyeball of
the data structures (and depends on malloc overheads anyway).

The relevant code is getBlobs in pg_dump.c, which queries the whole of
pg_largeobject_metadata without using a cursor (so the PGresult is
already huge thanks to having >100 million rows), and then mallocs a
BlobInfo array and populates it from the PGresult, also using pg_strdup
for the oid string, owner name, and ACL if any.




I'm surprised this hasn't come up before. I have a client that I 
persuaded to convert all their LOs to bytea fields because of problems 
with pg_dump handling millions of LOs, and kept them on an older 
postgres version until they made that change.


cheers

andrew







Re: [HACKERS] tablespaces inside $PGDATA considered harmful

2015-04-23 Thread David Steele
On 4/23/15 4:30 PM, Andres Freund wrote:
> On 2015-04-23 16:26:09 -0400, Robert Haas wrote:
>> But pg_upgrade automates all that, so you can't use pg_upgrade in that
>> case.  If we add a GUC as I suggested, you can still use pg_upgrade.
> 
> But we also have to live with data directories being in a shit state
> forever onward. We won't really be able to remove the option
> realistically.
> 
> It's not that hard to just move the tablespace out of the data directory
> while the server is shut down. As long as you move it on the same
> partition, it's even fast.

I agree.  It wouldn't be that hard to do a bit of directory manipulation
before upgrading - and that's only for the people who have put
tablespaces in $PGDATA.  I've never seen it before, but I have no doubt
that it happens.  I can see how it might make a weird sort of sense
depending on the level of experience.

-- 
- David Steele
da...@pgmasters.net





Re: [HACKERS] tablespaces inside $PGDATA considered harmful

2015-04-23 Thread Andres Freund
On 2015-04-23 16:26:09 -0400, Robert Haas wrote:
> But pg_upgrade automates all that, so you can't use pg_upgrade in that
> case.  If we add a GUC as I suggested, you can still use pg_upgrade.

But we also have to live with data directories being in a shit state
forever onward. We won't really be able to remove the option
realistically.

It's not that hard to just move the tablespace out of the data directory
while the server is shut down. As long as you move it on the same
partition, it's even fast.

Greetings,

Andres Freund




Re: [HACKERS] tablespaces inside $PGDATA considered harmful

2015-04-23 Thread Robert Haas
On Thu, Apr 23, 2015 at 3:57 PM, Andres Freund  wrote:
> On 2015-04-23 15:46:20 -0400, Robert Haas wrote:
>> Well, we've made backward-incompatible changes before.  Not to this
>> specific thing, but in general.  I don't think there's anything
>> preventing us from doing so here, except that we don't want to annoy
>> too many users.
>
> I think the number of users that have done this, and haven't yet
> (knowingly or unknowingly) been bitten by it is pretty low. In that
> scenario it seems much better to break compatibility given that it's
> pretty easy to fix during restore (just precreate the tablespace).  It's
> not something you have to retest a whole application for.
>
> If you want to avoid that one error you can still do pg_dumpall
> --globals, edit and run that script, and only then restore the
> actual databases.

But pg_upgrade automates all that, so you can't use pg_upgrade in that
case.  If we add a GUC as I suggested, you can still use pg_upgrade.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] tablespaces inside $PGDATA considered harmful

2015-04-23 Thread Andres Freund
On 2015-04-23 15:17:55 -0500, Jim Nasby wrote:
> Yes, but only after creating a brand new cluster from scratch, which would
> then disallow them from putting tablespaces in $PGDATA.

pg_dumpall output includes tablespaces.




Re: [HACKERS] tablespaces inside $PGDATA considered harmful

2015-04-23 Thread Jim Nasby

On 4/23/15 11:01 AM, Andres Freund wrote:
> On April 23, 2015 6:12:05 PM GMT+03:00, Jim Nasby  wrote:
>> On 4/22/15 9:41 PM, Bruce Momjian wrote:
>>> The case this doesn't catch is referencing a
>>> symbolic link that points to the same directory.  We can't make it an
>>> error so people can use pg_upgrade on these setups.
>>
>> Couldn't we make it an ERROR unless IsBinaryUpgrade?
>
> People still upgrade without pg_upgrade.

Yes, but only after creating a brand new cluster from scratch, which
would then disallow them from putting tablespaces in $PGDATA.


Or are you saying people do binary upgrades without pg_upgrade? I don't 
think we have any obligation to support that...

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

2015-04-23 Thread Andres Freund
On 2015-04-23 23:08:34 +0300, Heikki Linnakangas wrote:
> The heapam API is not that stable, we've added arguments to those functions
> every once in a while, and I don't recall any complaints.

I heard some, but not too many, that's true. I know that I've written
code that'd be broken/needed even more ifdefs ;)

> That said, I'd actually like to see a separate heap_super_delete() function
> for that, rather than piggybacking on heap_delete(). It's a quite different
> operation. There'll be some duplication, but seems better than a maze of
> if-else's in heap_delete.

+many. I've requested that changes a couple times now.

Greetings,

Andres Freund




Re: [HACKERS] INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

2015-04-23 Thread Heikki Linnakangas

On 04/23/2015 10:53 PM, Andres Freund wrote:

> On 2015-04-23 12:45:59 -0700, Peter Geoghegan wrote:
>> On Thu, Apr 23, 2015 at 12:55 AM, Andres Freund  wrote:
>>> I think you misread my statement: I'm saying we don't need the new
>>> argument anymore, even if we still do the super-deletion in
>>> heap_delete(). Now that the speculative insertion will not be visible
>>> (as in seen on a tuple they could delete) to other backends we can just
>>> do the super deletion if we see that the tuple is a promise one.
>>
>> I disagree. I think it's appropriate that the one and only "super"
>> heap_delete() caller make a point of indicating that that's what it's
>> doing. The cost is only that the two other such callers must say that
>> they're not doing it. Super deletion is a special thing, that logical
>> decoding knows all about for example, and I think it's appropriate
>> that callers ask explicitly.
>
> Unconvinced. Not breaking an API has its worth.


The heapam API is not that stable, we've added arguments to those 
functions every once in a while, and I don't recall any complaints. So 
I'm with Peter that super-deletion should be requested explicitly by the 
caller.


That said, I'd actually like to see a separate heap_super_delete() 
function for that, rather than piggybacking on heap_delete(). It's a 
quite different operation. There'll be some duplication, but seems 
better than a maze of if-else's in heap_delete.
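
Purely to illustrate the shape of that split (no such function exists
today; the signature below is invented):

    /*
     * Sketch: remove a speculatively-inserted "promise" tuple that lost
     * its race.  Unlike heap_delete(), it would skip the usual visibility
     * and locking dance and emit a WAL record that logical decoding can
     * recognize as a super-deletion.
     */
    extern void heap_super_delete(Relation relation, ItemPointer tid);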


- Heikki





Re: [HACKERS] INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

2015-04-23 Thread Peter Geoghegan
On Thu, Apr 23, 2015 at 12:53 PM, Andres Freund  wrote:
> Unconvinced. Not breaking an API has its worth.

Yeah, and I acknowledge that - but it's not something that it's
appropriate to encapsulate IMV.

Let's just leave it to Heikki...I'd say he has the deciding vote,
especially as the reviewer that is more in charge of the executor
stuff than anything else.

>> The second most significant open item is rebasing on top of the recent
>> RLS changes, IMV.
>
> Not sure I agree. That part seems pretty mechanical to me.

Hopefully so.

> * The docs aren't suitable for end users. I think this will take a fair
>   amount of work.

It's hard to explain why something like the collation field in the
inference specification matters to users...because it's only supported
at all due to forwards compatibility concerns. It's hard to document
certain things like that in an accessible way. In general, much of the
effort of the last year was making the feature play nice, and
considering a bunch of usability edge cases that are unlikely to come
up, but still must be documented.

> * We're not sure about the syntax yet. In addition to the keyword
>   issue I'm unconvinced about having two WHERE clauses in the same
>   statement. I think that'll end up confusing users a fair bit. Might
>   still be the best way forward.

My previous approach to allowing an index predicate did at least
clearly show that the predicate belonged to the inference
specification, since it appeared between parenthesis. Maybe we should
do that after all? I don't think it much matters if the inference
specification is not identical to CREATE INDEX. I don't want to give
up inference of partial indexes, and I don't want to give up allowing
the UPDATE to have a limited WHERE clause, and we can't just spell
those differently here. So what alternative does that leave?

Anyone else have an opinion?

> * The executor integration still isn't pretty, although Heikki is making
>   strides there

That's just clean-up, though. I'm not worried about the risk of Heikki
not succeeding at that.

> * I don't think anybody seriously has looked at the planner side yet.

Good point. That certainly needs to be looked at (and I include the
rewriter parts in that). It's really not that much code, but (ideally)
a subject matter expert should look into the whole ExcludedExpr dance
in particular.

-- 
Peter Geoghegan




Re: [HACKERS] pg_dump: largeobject behavior issues (possible bug)

2015-04-23 Thread Andrew Gierth
> "Joshua" == Joshua D Drake  writes:

 Joshua> The database dumps fine as long as we don't dump large
 Joshua> objects. However, if we try to dump the large objects, FreeBSD
 Joshua> will kill pg_dump as it will consume all free memory and
 Joshua> swap. With Andrew's help we were able to determine the
 Joshua> following:

 Joshua> There is a memory cost of about 160 bytes per largeobject.

I may have the exact number here wrong; it was just a quick eyeball of
the data structures (and it depends on malloc overheads anyway).

The relevant code is getBlobs in pg_dump.c, which queries the whole of
pg_largeobject_metadata without using a cursor (so the PGresult is
already huge thanks to having >100 million rows), and then mallocs a
BlobInfo array and populates it from the PGresult, also using pg_strdup
for the oid string, owner name, and ACL if any.
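
For illustration only - this is not the actual pg_dump code, and the function
name here is made up - a cursor-based variant of that query would bound the
PGresult to one batch at a time:

#include <libpq-fe.h>

/*
 * Hypothetical sketch, not the real getBlobs(): fetch large-object
 * metadata in batches through a cursor so the PGresult never holds all
 * >100 million rows at once.  Assumes "conn" is an open connection and
 * a transaction is already in progress (DECLARE CURSOR needs one).
 */
static void
dump_blob_metadata_with_cursor(PGconn *conn)
{
    PGresult   *res;

    res = PQexec(conn, "DECLARE blobcur CURSOR FOR "
                 "SELECT oid, lomowner, lomacl FROM pg_largeobject_metadata");
    PQclear(res);

    for (;;)
    {
        int         ntups,
                    i;

        res = PQexec(conn, "FETCH 1000 IN blobcur");
        ntups = PQntuples(res);

        for (i = 0; i < ntups; i++)
        {
            /* one row: oid, owner, ACL -- emit it here rather than
             * keeping an entry for all 100M+ objects in memory */
            const char *loid = PQgetvalue(res, i, 0);

            (void) loid;
        }

        PQclear(res);
        if (ntups == 0)
            break;              /* cursor exhausted */
    }

    res = PQexec(conn, "CLOSE blobcur");
    PQclear(res);
}

Whether pg_dump's TOC machinery could actually consume the rows incrementally
like that is a separate question, of course.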

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

2015-04-23 Thread Andres Freund
On 2015-04-23 12:45:59 -0700, Peter Geoghegan wrote:
> On Thu, Apr 23, 2015 at 12:55 AM, Andres Freund  wrote:
> > I think you misread my statement: I'm saying we don't need the new
> > argument anymore, even if we still do the super-deletion in
> > heap_delete(). Now that the speculative insertion will not be visible
> > (as in seen on a tuple they could delete) to other backends we can just
> > do the super deletion if we see that the tuple is a promise one.
> 
> I disagree. I think it's appropriate that the one and only "super"
> heap_delete() caller make a point of indicating that that's what it's
> doing. The cost is only that the two other such callers must say that
> they're not doing it. Super deletion is a special thing that logical
> decoding, for example, knows all about, and I think it's appropriate
> that callers ask explicitly.

Unconvinced. Not breaking an API has its worth.

> The second most significant open item is rebasing on top of the recent
> RLS changes, IMV.

Not sure I agree. That part seems pretty mechanical to me.

* The docs aren't suitable for end users. I think this will take a fair
  amount of work.

* We're not sure about the syntax yet. In addition to the keyword
  issue I'm unconvinced about having two WHERE clauses in the same
  statement. I think that'll end up confusing users a fair bit. Might
  still be the best way forward.
* The executor integration still isn't pretty, although Heikki is making
  strides there
* I don't think anybody seriously has looked at the planner side yet.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tablespaces inside $PGDATA considered harmful

2015-04-23 Thread Andres Freund
On 2015-04-23 15:46:20 -0400, Robert Haas wrote:
> Well, we've made backward-incompatible changes before.  Not to this
> specific thing, but in general.  I don't think there's anything
> preventing us from doing so here, except that we don't want to annoy
> too many users.

I think the number of users that have done this, and haven't yet
(knowingly or unknowingly) been bitten by it is pretty low. In that
scenario it seems much better to break compatibility given that it's
pretty easy to fix during restore (just precreate the tablespace).  It's
not something you have to retest a whole application for.

If you want to avoid that one error you can still do pg_dumpall
--globals-only, edit and run that script, and only then restore the
actual databases.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] Failure to coerce unknown type to specific type

2015-04-23 Thread Jim Nasby

On 4/23/15 5:07 AM, Kyotaro HORIGUCHI wrote:

This is because parsing of UNION immediately converts constants
of unknown type in both of the UNION's arms to text, so the top-level
select won't be bothered by this problem. But the problematic
query doesn't have an appropriate point at which to do that until the
function I patched.


FWIW, I think that's more accidental than anything.

I'm no expert in our casting and type handling code but I spent a lot of 
time stuck in it while working on the variant type, and it seems very 
scattered. There's stuff in the actual casting code, there's some stuff 
in other parts of parse/plan, there's stuff in individual types (array 
and record at least).


Some stuff is handled by casting; some stuff is handled by mangling the 
parse tree.


Something else I noticed is we're not consistent with handling typmod 
either. I don't remember the exact example I found, but there are cases 
involving casting of constants where we ignore it (I don't think it was 
as simple as SELECT 1::int::variant(...), but it was something like that).


I don't know how much of this is just historical and how much is 
intentional, but it'd be nice if we could consolidate it more.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tablespaces inside $PGDATA considered harmful

2015-04-23 Thread Robert Haas
On Thu, Apr 23, 2015 at 11:00 AM, Bruce Momjian  wrote:
>> > I have developed the attached patch to warn about creating tablespaces
>> > inside the data directory.  The case this doesn't catch is referencing a
>> > symbolic link that points to the same directory.  We can't make it an
>> > error so people can use pg_upgrade these setups.  This would be for 9.5
>> > only.
>>
>> I think this is a good thing to do, but I sure wish we could go
>> further and block it completely.  That may require more thought than
>> we have time to put in at this stage of the release cycle, though, so
>> +1 for doing at least this much.
>
> OK, good.  Thinking to 9.6, I am not sure how we could throw an error
> because we have allowed this in the past and pg_dump is going to be
> restored with a raw SQL CREATE TABLESPACE command.
>
> We have had this type of problem before, but never resolved it.  We
> almost need pg_dump to set a GUC variable telling the backend it is
> restoring a dump and issue a warning, but throw an error if the same
> command was issued outside of a pg_dump restore.  FYI, pg_upgrade
> already throws a warning related to the non-creation of a delete script.

Well, we've made backward-incompatible changes before.  Not to this
specific thing, but in general.  I don't think there's anything
preventing us from doing so here, except that we don't want to annoy
too many users.

I don't think the right solution is to add a GUC so that pg_dump
ignores this, and otherwise deny it.  It's bad if you do it as part of
a restore, and it's bad if you do it some other time, too.

What I'd recommend is that we add a GUC stupid_tablespaces=off.  If
you have done this in the past, and you want to upgrade (whether via
pg_dump or pg_upgrade) to a new release, you'll have to configure the
new cluster for stupid_tablespaces=on.  If you don't, you'll get an
error.  If you do, you'll get a warning.  That way, people can still
upgrade, but they have to set the GUC to make it work, so they'll be
clearly aware that they're doing something that is not recommended.

(Of course we might want to call the GUC something like other than
stupid_tablespaces, like allow_tablespaces_in_data_directory, but you
get the idea.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

2015-04-23 Thread Peter Geoghegan
On Thu, Apr 23, 2015 at 12:55 AM, Andres Freund  wrote:
> I think you misread my statement: I'm saying we don't need the new
> argument anymore, even if we still do the super-deletion in
> heap_delete(). Now that the speculative insertion will not be visible
> (as in seen on a tuple they could delete) to other backends we can just
> do the super deletion if we see that the tuple is a promise one.

I disagree. I think it's appropriate that the one and only "super"
heap_delete() caller make a point of indicating that that's what it's
doing. The cost is only that the two other such callers must say that
they're not doing it. Super deletion is a special thing that logical
decoding, for example, knows all about, and I think it's appropriate
that callers ask explicitly.

>> > * breinbaas on IRC just mentioned that it'd be nice to have upsert as a
>> >   link in the insert. Given that that's the pervasive term that doesn't
>> >   seem absurd.
>>
>> Not sure what you mean. Where would the link appear?
>
> The index, i.e. it'd just be another indexterm. It seems good to give
> people a link.

Oh, okay. Sure. Done on Github.

>> * We need to figure out the tuple lock strength details. I think this
>> is doable, but it is the greatest challenge to committing ON CONFLICT
>> UPDATE at this point. Andres feels that we should require no greater
>> lock strength than an equivalent UPDATE. I suggest we get to this
>> after committing the IGNORE variant. We probably need to discuss this
>> some more.
>
> I want to see a clear way forward before we commit parts. It doesn't
> have to be completely iron-clad, but the way forward should be pretty
> clear. What's the problem you're seeing right now making this work? A
> couple days on jabber you seemed to see a way doing this?

I was really just identifying it as the biggest problem the patch now
faces, and I want to face those issues down ASAP. Of course, that's
good, because as you say it is a small problem in an absolute sense.
The second most significant open item is rebasing on top of the recent
RLS changes, IMV.

I can see yourself and Heikki continuing to change small stylistic
things, which I expect to continue for a little while. That's fine,
but naturally I want to be aggressive about closing off these open
issues that are not just general clean-up, but have some small level
of risk of becoming more significant blockers.

> The reason I think this has to use the appropriate lock level is that
> it'll otherwise re-introduce the deadlocks that fk locks resolved. Given
> how much pain we endured to get fk locks, that seems like a bad deal.

Right.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Turning off HOT/Cleanup sometimes

2015-04-23 Thread Andres Freund
On 2015-04-23 15:40:36 -0400, Robert Haas wrote:
> The issue is that you have to vacuum a table frequently enough to
> avoid accumulating bloat.  The frequency with which you need to vacuum
> varies depending on the size of the table and how frequently it's
> updated.  However, a large, heavily-updated table can take long enough
> to vacuum that, by the time you get done, it's already overdue to be
> vacuumed again.  That's a problem.

Especially because the indexes are scanned fully. In many cases I've
observed the heap scans themselves being fast; but scanning hundreds
(yes) of gigabytes of indexes taking ages.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Turning off HOT/Cleanup sometimes

2015-04-23 Thread Robert Haas
On Thu, Apr 23, 2015 at 10:44 AM, Jim Nasby  wrote:
> On 4/23/15 8:25 AM, Robert Haas wrote:
>> Some users are partitioning tables just so that each
>> partition can be autovac'd separately.  That really shouldn't be
>> required.
>
> Are they doing this for improved heap scan performance? Index scan
> performance? If the table wasn't partitioned, would they need more than one
> pass through the indexes due to exhausting maintenance_work_mem?

I don't know of anyone with a properly-configured system who needs
more than one pass through the indexes due to exhausting
maintenance_work_mem.  The issue is that you have to vacuum a table
frequently enough to avoid accumulating bloat.  The frequency with
which you need to vacuum varies depending on the size of the table and
how frequently it's updated.  However, a large, heavily-updated table
can take long enough to vacuum that, by the time you get done, it's
already overdue to be vacuumed again.  That's a problem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reducing tuple overhead

2015-04-23 Thread Jim Nasby

On 4/23/15 11:45 AM, Petr Jelinek wrote:

On 23/04/15 18:24, Andres Freund wrote:

Whether that's feasible complexity wise is debatable, but it's certainly
possible.


I do wonder what, in realistic cases, is actually the bigger contributor
to the overhead. The tuple header or the padding we liberally add in
many cases...



The logical ordering patch + auto optimizations of column layout on
table creation/rewrite might help partially there.

But what seems to be clear is that we need more in depth analysis of
what really contributes most to the tuple size in various use-cases and
then we can debate what we can do about it.


Also, what Robert posted was that while we started at something like 
15%-30% larger, we ended the test at 80% larger. That makes me think 
that the bigger win is not in reducing tuple size but tackling bloat.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pg_dump: largeobject behavior issues (possible bug)

2015-04-23 Thread Joshua D. Drake


Hello,

I have been working on a problem with Andrew Gierth (sp?) in regard to 
pg_dump. Here is the basic breakdown:


FreeBSD 10.1
PostgreSQL 9.3.6
~64GB memory
500GB database
228GB of large objects (106M objects)

The database dumps fine as long as we don't dump large objects. However, 
if we try to dump the large objects, FreeBSD will kill pg_dump as it 
will consume all free memory and swap. With Andrew's help we were able 
to determine the following:


There is a memory cost of about 160 bytes per large object. Based on the 
number of large objects we have, that would be about 16GB of memory. Also, 
when pg_dump is reading in the large object list there is a point where 
pg_dump has a PGresult containing the entire contents of 
pg_largeobject_metadata and a malloc of an array where it is going to 
copy the data to. That could easily get above 40GB, thus causing FreeBSD 
to kill the process.


tl;dr

The memory issue comes down to the fact that in the prep stage, pg_dump 
creates a TOC entry for every individual large object.


It seems that pg_dump should be much more efficient about dumping these
objects.

Sincerely,

JD


--
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing "I'm offended" is basically telling the world you can't
control your own emotions, so everyone else should do it for you.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Moving ExecInsertIndexTuples and friends to new file

2015-04-23 Thread Peter Geoghegan
On Thu, Apr 23, 2015 at 12:05 PM, Heikki Linnakangas  wrote:
> While looking at Peter's INSERT ... ON CONFLICT patch, I started to feel
> that ExecInsertIndexTuples() and friends would deserve a file of their own,
> and not be buried in the middle of execUtils.c. I propose that we split
> execUtils.c into two, moving ExecOpenIndices(), ExecCloseIndices()
> ExecInsertIndexTuples() and related functions into a new file called
> executor/execIndexing.c.

That split makes a lot of sense to me.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Improving vacuum/VM/etc

2015-04-23 Thread Jim Nasby
I mentioned this idea in the "other"[1] vacuum thread [2], but I think 
it got lost.


Kevin Grittner pointed out that there's a potentially huge number of 
writes we incur over the life of a tuple [3]:


(1) WAL log the insert.
(2) Write the tuple.
(3) Hint and rewrite the tuple.
(4) WAL log the freeze of the tuple.
(5) Rewrite the frozen tuple.
(6) WAL-log the delete.
(7) Rewrite the deleted tuple.
(8) Prune and rewrite the page.
(9) Free line pointers and rewrite the page.

He mentioned that a lot of these writes could be combined if they 
happened close enough together. We can further add an all-visible state
at step 3.5.


Instead of simply adding all-frozen information to the VM, we could 
store 4 different page states and potentially address a lot of 
different cleanup woes at one time.


Unfortunately, the states I came up with using existing semantics don't 
look hugely useful[4], but if we take Robert's idea and make all-visible 
mean all-frozen, we can do much better:


0: Newly inserted tuples
Tracking this state allows us to aggressively set hint bits.

1: Newly deleted
There are tuples that have been deleted but not pruned. There may also 
be newly inserted tuples that need hinting (state 0).


Similar to state 0, we'd want to be fairly aggressive with these pages, 
because as soon as the deleting XID is committed and older than all 
snapshots we can prune. Because we can prune without hitting indexes, 
this is still a fairly cheap operation, though not as cheap as 0.


2: Fully hinted, not frozen
This is the really painful state to clean up, because we have to deal 
with indexes. We must enter this state after being in 1.


3: All-visible-frozen
Every tuple on the page is visible and frozen. Pages in this state need 
no maintenance at all. We might be able to enter this state directly 
from state 0.
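
To make the encoding concrete, here is a rough standalone sketch (the names 
are mine, this is not existing visibilitymap.c code): with 2 bits per heap 
page, all four states fit in a map that packs four heap pages per byte.

#include <stdint.h>

/* Sketch only; these state names and macros are hypothetical. */
typedef enum PageCleanupState
{
    PAGESTATE_NEWLY_INSERTED  = 0,      /* state 0: needs hinting */
    PAGESTATE_NEWLY_DELETED   = 1,      /* state 1: needs pruning */
    PAGESTATE_HINTED_UNFROZEN = 2,      /* state 2: needs index/freeze work */
    PAGESTATE_ALL_FROZEN      = 3       /* state 3: no maintenance at all */
} PageCleanupState;

#define STATE_BITS      2
#define STATE_MASK      0x03
#define PAGES_PER_BYTE  (8 / STATE_BITS)

static PageCleanupState
map_get_state(const uint8_t *map, uint32_t heapBlk)
{
    int     shift = (heapBlk % PAGES_PER_BYTE) * STATE_BITS;

    return (PageCleanupState) ((map[heapBlk / PAGES_PER_BYTE] >> shift) & STATE_MASK);
}

static void
map_set_state(uint8_t *map, uint32_t heapBlk, PageCleanupState state)
{
    uint32_t    byteno = heapBlk / PAGES_PER_BYTE;
    int         shift = (heapBlk % PAGES_PER_BYTE) * STATE_BITS;

    map[byteno] = (uint8_t) ((map[byteno] & ~(STATE_MASK << shift)) |
                             ((unsigned) state << shift));
}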



BENEFITS
This tracking should help with at least 3 problems: the need to set hint bits 
after insert, SELECT queries doing pruning (Simon's recent complaint), 
and needing to scan an entire table for freezing.


The improvement in hinting and pruning is based on the idea that 
normally there would not be a lot of pages in state 0 or 1, and pages 
that were in those states are very likely to still be in disk cache (if 
not shared buffers). That means we can have a background process (or 2) 
that is very aggressive at targeting pages in these states.


Not needing to scan everything that's frozen is thanks to state 3. I 
think it's OK (at least for now) if only vacuum puts pages into this 
state, which means it can actually freeze the tuples when it does it 
(thanks to 37484ad we won't lose forensic data doing this). That means 
there's no extra work necessary by a foreground process that's dirtying 
a page.


Because of 37484ad, I think as part of this we should also deprecate 
vacuum_freeze_min_age, or at least change its behavior. AFAIK the only 
objection to aggressive freezing was loss of forensic data, and that's 
gone now. So vacuum (and presumably the bg process(es) that handle state 
0 and 1) should freeze tuples if it would allow the whole page to be 
frozen. Possibly it should just do it any time it's dirtying the page. 
(We could actually do this right now; it would let us eliminate the GUC, 
but I'm not sure there'd be other benefit without the rest of this.)


DOWNSIDES
This does mean doubling the size of the VM. It would still be 32,000 
times smaller than the heap with 8k pages (and 128,000 times smaller 
with the common warehouse 32k page size), so I suspect this is a 
non-issue, but it's worth mentioning. It might have some effect on an 
almost entirely read-only system; but I suspect in most other cases the 
other benefits will outweigh this.


This approach still does nothing to help the index related activity in 
vacuum. My gut says state 2 should be further split; but I'm not sure 
why. Perhaps if we had another state we could do something more 
intelligent with index cleanup...


This might put a lot more read pressure on the VMs. We might want some 
way to summarize per-table VMs (or ranges of VMs) so that we're not 
constantly scanning them.


We'd still have to freeze, as opposed to what might be possible with 
XID-LSN. OTOH, most of the changes to do this would be limited to 
current VM code and callers. I don't think vacuum itself would need a 
lot of changes, and I hope the BG code for state 0/1 wouldn't be that 
complicated; it shouldn't need the complexity of autovacuum or vacuum. 
So this should be much lower risk than something like XID-LSN.


So... what am I missing? :)

[1] 
http://postgresql.org/message-id/flat/20140912135413.gk4...@eldon.alvh.no-ip.org
[2] 
http://postgresql.org/message-id/flat/2011829201.2201963.1429726992897.javamail.ya...@mail.yahoo.com
[3] 
http://postgresql.org/message-id/771351984.2266772.1429728671811.javamail.ya...@mail.yahoo.com


[4]
1a: All-visible
What we have today. Page still needs to be visited for freeze, but has 
no newly inser

[HACKERS] Moving ExecInsertIndexTuples and friends to new file

2015-04-23 Thread Heikki Linnakangas
While looking at Peter's INSERT ... ON CONFLICT patch, I started to feel 
that ExecInsertIndexTuples() and friends would deserve a file of their 
own, and not be buried in the middle of execUtils.c. I propose that we 
split execUtils.c into two, moving ExecOpenIndices(), ExecCloseIndices() 
ExecInsertIndexTuples() and related functions into a new file called 
executor/execIndexing.c.


Moving functions makes backpatching harder, so it's not something to be 
done lightly, but I think it would be worth it in this case. There have 
been few changes to those functions in years, so I doubt they'll need 
much backpatching in the near future either.


For comparison, this is what the file sizes of executor/exec*.c will 
look like after the split:


-rw-r--r-- 1 heikki heikki  14710 Apr 22 09:07 execAmi.c
-rw-r--r-- 1 heikki heikki   9711 Apr 22 09:07 execCurrent.c
-rw-r--r-- 1 heikki heikki  16659 Apr 22 09:07 execGrouping.c
-rw-r--r-- 1 heikki heikki  16023 Apr 23 21:57 execIndexing.c
-rw-r--r-- 1 heikki heikki   8276 Apr 22 09:07 execJunk.c
-rw-r--r-- 1 heikki heikki  80102 Apr 22 09:07 execMain.c
-rw-r--r-- 1 heikki heikki  18694 Apr 22 09:07 execProcnode.c
-rw-r--r-- 1 heikki heikki 160700 Apr 22 09:07 execQual.c
-rw-r--r-- 1 heikki heikki   9957 Apr 22 09:07 execScan.c
-rw-r--r-- 1 heikki heikki  37796 Apr 22 09:07 execTuples.c
-rw-r--r-- 1 heikki heikki  28004 Apr 23 21:50 execUtils.c

Any objections? I'm planning to do this as a separate commit, before the 
INSERT ... ON CONFLICT patch goes in.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reducing tuple overhead

2015-04-23 Thread Alvaro Herrera
Thanks for posting this.

Joshua D. Drake wrote:

> First of all I should note that I discussed the approach mentioned above to
> pgsql-hackers and got a very interesting comment from Tom Lane that adding
> sorting info to Var and TargetEntry nodes was not a very good idea because
> it'd break stored rules whenever a table column changed. So I went back and
> studied that code and noticed that it was really the change in
> RangeTableEntry that's doing the good magic; those other changes are
> fortunately not necessary. (Though they were a necessary vehicle for me to
> understand how the other stuff works.)

So in the logical columns thread I was saying that this change of eref
didn't work either; it seems that most of the time (if not always) the
list should continue to be in attnum order, and the
logical-ordering-info data should be obtained from the tuple descriptor
and whatever needs to be sorted should be sorted afterwards, i.e. in
setrefs.c, using data previously stored in RelOptInfo.  I tried doing
that and ran into some other mess elsewhere.

> I've been continuing to study the backend code looking for uses of attribute
> lists that assume a single ordering. As I get more into it, more complex
> cases appear. The number of cases is fortunately bounded, though.



> However, some other places are not like that. I have "fixed" SQL functions
> two times now, and I just found out that the second fix (which I believed to
> be "mostly correct") was to be the final one, but I found out just now that
> it's not, and the proper fix is going to require something a bit more
> low-level (namely, a projection step that reorders columns correctly after
> the fact). Fortunately, I believe that this extra projection step is going
> to fix a lot of other cases too, which I originally had no idea how to
> attack. Moreover, understanding that bit means I also figured out what Tom
> Lane meant on the second half of his response to my original pgsql-hackers
> comment. So I think we're good on that front.

I had forgotten my intention to rework things using projection.

In any case, I have posted a lot of patches now which others could
study, if there's interest.  I would sure like to see this split, and
there are multiple benefits such as reduction of padding space.  I'm
sure there's a nonempty set of guys brighter than me that could figure
it out in not that much time.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] Failure to coerce unknown type to specific type

2015-04-23 Thread David G. Johnston
On Thursday, April 23, 2015, Jeff Davis  wrote:

> On Thu, Apr 23, 2015 at 1:49 AM, David G. Johnston
> > wrote:
> > Reading and writing all this I'm convinced you have gotten the idea in
> your
> > mind an expectation of equivalency and consistency where there really is
> > little or none from an overall design perspective.  And none insofar as
> > would merit trying to force the two example queries you provide to behave
> > identically.  There are a number of things about SQL that one either
> simply
> > lives with or goes through mind contortions to understand the, possibly
> > imperfect, reasoning behind.  This is one of those things: and while it's
> > been fun to do those contortions in the end I am only a little bit better
> > off than when I simply accepted the fact the unknown and untyped were
> > similar but different (even if I hadn't considered giving them different
> > names).
>
> You make some good points, but I disagree for two reasons:
>
> 1. This is a postgres issue, not a general SQL issue, so we can't
> simply say "SQL is weird" (like we can with a lot of the strange NULL
> semantics).


But it is ok to admit that we are weird when we are.  Though yes, we are
being inefficient here even with the current behavior taken as desired.

> 2. Postgres has determined that the unknown column reference "b" in
> the query "SELECT a=b FROM (SELECT ''::text, '  ') x(a,b)" can and
> should be coerced to text.


So the error should be "operator does not exist: text = unknown"...instead
it tries and fails to cast the unknown to text.

Or allow for the coercion to proceed.

There would be fewer obvious errors, and simple cases that fail would begin
to work sensibly, but it still feels like implicit casting from text which
was recently largely removed from the system for cause.  Albeit to the
disgruntlement of some and accusations of being draconian by others.


> So postgres has already made that decision.
> It's just that when it tries to coerce it, it fails. Even if you
> believe that literals and column references are not equivalent, I
> don't see any justification at all for this behavior -- postgres
> should not decide to coerce it in the first place if it's going to
> fail.
>
>
This is true - but obviously one solution is to not attempt it in the first
place.

David J.


Re: [HACKERS] [BUGS] Failure to coerce unknown type to specific type

2015-04-23 Thread Jeff Davis
On Thu, Apr 23, 2015 at 1:49 AM, David G. Johnston
 wrote:
> Reading and writing all this I'm convinced you have gotten the idea in your
> mind an expectation of equivalency and consistency where there really is
> little or none from an overall design perspective.  And none insofar as
> would merit trying to force the two example queries you provide to behave
> identically.  There are a number of things about SQL that one either simply
> lives with or goes through mind contortions to understand the, possibly
> imperfect, reasoning behind.  This is one of those things: and while it's
> been fun to do those contortions in the end I am only a little bit better
> off than when I simply accepted the fact the unknown and untyped were
> similar but different (even if I hadn't considered giving them different
> names).

You make some good points, but I disagree for two reasons:

1. This is a postgres issue, not a general SQL issue, so we can't
simply say "SQL is weird" (like we can with a lot of the strange NULL
semantics).
2. Postgres has determined that the unknown column reference "b" in
the query "SELECT a=b FROM (SELECT ''::text, '  ') x(a,b)" can and
should be coerced to text. So postgres has already made that decision.
It's just that when it tries to coerce it, it fails. Even if you
believe that literals and column references are not equivalent, I
don't see any justification at all for this behavior -- postgres
should not decide to coerce it in the first place if it's going to
fail.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] improving speed of make check-world

2015-04-23 Thread Jeff Janes
On Thu, Aug 14, 2014 at 10:45 PM, Peter Eisentraut  wrote:

> make check-world creates a temporary installation in every subdirectory
> it runs a test in, which is stupid: it's very slow and uses a lot of
> disk space.  It's enough to do this once per run.  That is the essence
> of what I have implemented.  It cuts the time for make check-world in
> half or less, and it saves gigabytes of disk space.
>

Something about this commit (dcae5faccab64776376d354d) broke "make check"
under parallel make when started from a clean directory.  It fails with
a different error each time; one example:

make -j4 check > /dev/null

In file included from gram.y:14515:
scan.c: In function 'yy_try_NUL_trans':
scan.c:10307: warning: unused variable 'yyg'
/usr/bin/ld: tab-complete.o: No such file: No such file or directory
collect2: ld returned 1 exit status
make[3]: *** [psql] Error 1
make[2]: *** [all-psql-recurse] Error 2
make[2]: *** Waiting for unfinished jobs
make[1]: *** [all-bin-recurse] Error 2
make: *** [all-src-recurse] Error 2
make: *** Waiting for unfinished jobs

If I rerun it without cleaning the tree, it usually passes the second
time.  Or I can just separate the make and the check like "make -j4 >
/dev/null && make check > /dev/null", but I've grown accustomed to being
able to combine them since this problem was first fixed a couple of years ago
(in a commit I can't seem to find).

I have:
GNU Make 4.0
Built for x86_64-unknown-linux-gnu

I was using ccache, but I still get the problem without using it.

Cheers,

Jeff


Re: [HACKERS] adding more information about process(es) cpu and memory usage

2015-04-23 Thread Heikki Linnakangas

On 04/23/2015 08:00 PM, Radovan Jablonovsky wrote:

With Amazon Web Services RDS, the DBA does not have access to the
OS/Linux shell of the underlying instance. That renders some PostgreSQL
monitoring techniques for process CPU and memory usage unusable. Even if
AWS provides internal tools or a programming interface for monitoring, it
could be very useful to have this information provided by PostgreSQL
system table(s)/views/functions/API: how much CPU (similar to the output
of the top command) and memory each PostgreSQL backend/background process
is using. It could be something as simple as adding CPU and memory
information fields to pg_stat_activity.


You can write an extension to do that. Of course, Amazon won't let you 
run your own C extension either (otherwise you could use that to escape 
into a shell), but if you do it well, publish it, and get it included in 
standard distributions, they just might pick it up. Unless they don't 
want you to see that information - if they don't, then they wouldn't let 
you use the system views either.
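
As a trivial illustration of the extension route (the function name is made 
up, and this only reports the backend's own usage via getrusage(), nothing 
like a full top view), the C side can be as small as:

#include "postgres.h"
#include "fmgr.h"

#include <sys/resource.h>

PG_MODULE_MAGIC;

/*
 * Hypothetical sketch.  The SQL wrapper would be something like:
 *   CREATE FUNCTION pg_backend_cpu_seconds() RETURNS float8
 *       AS 'MODULE_PATHNAME' LANGUAGE C STRICT;
 */
PG_FUNCTION_INFO_V1(pg_backend_cpu_seconds);

Datum
pg_backend_cpu_seconds(PG_FUNCTION_ARGS)
{
    struct rusage ru;
    double      seconds;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        ereport(ERROR, (errmsg("getrusage failed: %m")));

    seconds = ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1000000.0 +
              ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1000000.0;

    PG_RETURN_FLOAT8(seconds);
}

Covering other backends' memory and system-wide numbers takes more than 
that, of course, which is where it becomes a real project.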


In a nutshell, I don't think PostgreSQL should get involved in that...

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reducing tuple overhead

2015-04-23 Thread Joshua D. Drake


On 04/23/2015 09:42 AM, Jim Nasby wrote:


On 4/23/15 11:24 AM, Andres Freund wrote:

I do wonder what, in realistic cases, is actually the bigger contributor
to the overhead. The tuple header or the padding we liberally add in
many cases...


Assuming you're talking about padding between fields...

Several years ago Enova paid Command Prompt to start work on logical
column ordering, and part of the motivation for that was to allow
re-ordering physical tuples into the most efficient on-disk format
possible. I think I did some tests re-arranging some tables into the
theoretically most efficient order and measuring heap size. I think
there was some modest size improvement, maybe 10-15%? This was several
years ago so it's all foggy. Maybe Josh can find some of this in CMD's
ticketing system?


Yeah I dug around. I don't see anything about size improvement but here 
are our notes:


Alvaro said:

I ended up not producing notes as regularly as I had initially hoped. To 
try and make up for it, here's an update covering everything I've done 
since I started working on this issue.


This patch turned out to be completely different from what we had 
initially thought. We had thought it was going to be a matter of finding 
the places that used "attnum" and replacing it with either attnum, 
attlognum or attphysnum, depending on what order was necessary in any 
given spot. This wasn't an easy thing to do because there are several 
hundred of those. So it was supposed to be amazingly time-consuming and 
rather boring work.


This has nothing to do with reality: anywhere from parser down to 
optimizer and executor, the way things work is that a list of attributes 
is built, processed, and referenced. Some places assume that the list is 
in a certain order that's always the same order for those three cases. 
So the way to develop this feature is to change those places so that 
instead of receiving the list in one of these orders, they instead 
receive it in a different order.


So what I had to do early on was find a way to retrieve the sort order 
from catalogs, preserve it when TupleDescriptors are built, and ensure 
the attribute list is extracted from TupleDesc in the correct order. But 
it turned out that this is not enough, because down in the parser guts, 
a target list is constructed; and later, a TupleDescriptor is built from 
the target list. So it's necessary to preserve the sorting info from the 
original tuple descriptor into the target list (which means adding order 
info to Var and TargetEntry nodes), so that the new TupleDesc can also 
have it.


Today I'm finding that even more than that is necessary. It turns out 
that the RangeTableEntries (i.e. the entries in the FROM clause of a 
query) have an item dubbed "eref" which is a list of column names; due 
to my changes in the earlier parser stages, this list is sorted in 
logical column order; but the code to resolve things such as columns 
used in JOIN/ON clauses walks the list (which is in logical order) and 
then uses the number of times it had to walk the elements in the list to 
construct a Var (column reference) in "attnum" order -- so it finds a 
different column, and it all fails.


So what I'm doing now is modify the RangeTableEntry node to keep a 
mapping list of logical to identity numbers. Then I'll have to search 
for places using the rte->eref->colnames and make sure that they 
correctly use attlognum as index into it.


And then later:

First of all I should note that I discussed the approach mentioned above 
to pgsql-hackers and got a very interesting comment from Tom Lane that 
adding sorting info to Var and TargetEntry nodes was not a very good 
idea because it'd break stored rules whenever a table column changed. So 
I went back and studied that code and noticed that it was really the 
change in RangeTableEntry that's doing the good magic; those other 
changes are fortunately not necessary. (Though they were a necessary 
vehicle for me to understand how the other stuff works.)


I've been continuing to study the backend code looking for uses of 
attribute lists that assume a single ordering. As I get more into it, 
more complex cases appear. The number of cases is fortunately bounded, 
though. Most of the uses of straight attribute lists are in places that 
do not require modification, or require little work or thought to update 
correctly.


However, some other places are not like that. I have "fixed" SQL 
functions two times now, and I just found out that the second fix (which 
I believed to be "mostly correct") was to be the final one, but I found 
out just now that it's not, and the proper fix is going to require 
something a bit more low-level (namely, a projection step that reorders 
columns correctly after the fact). Fortunately, I believe that this 
extra projection step is going to fix a lot of other cases too, which I 
originally had no idea how to attack. Moreover, understanding that bit 
means I also figured out what Tom

[HACKERS] adding more information about process(es) cpu and memory usage

2015-04-23 Thread Radovan Jablonovsky
With Amazon Web Services RDS, the DBA does not have access to the
OS/Linux shell of the underlying instance. That renders some PostgreSQL
monitoring techniques for process CPU and memory usage unusable. Even if
AWS provides internal tools or a programming interface for monitoring, it
could be very useful to have this information provided by PostgreSQL
system table(s)/views/functions/API: how much CPU (similar to the output
of the top command) and memory each PostgreSQL backend/background process
is using. It could be something as simple as adding CPU and memory
information fields to pg_stat_activity.

-- 

*Radovan Jablonovsky* | SaaS DBA | Phone 1-403-262-6519 (ext. 7256) | Fax
1-403-233-8046



Replicon | Hassle-Free Time & Expense Management Software - 7,300 Customers - 70 Countries | www.replicon.com


Re: [HACKERS] Reducing tuple overhead

2015-04-23 Thread Petr Jelinek

On 23/04/15 18:24, Andres Freund wrote:

Whether that's feasible complexity wise is debatable, but it's certainly
possible.


I do wonder what, in realistic cases, is actually the bigger contributor
to the overhead. The tuple header or the padding we liberally add in
many cases...



The logical ordering patch + auto optimizations of column layout on 
table creation/rewrite might help partially there.


But what seems to be clear is that we need more in depth analysis of 
what really contributes most to the tuple size in various use-cases and 
then we can debate what we can do about it.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reducing tuple overhead

2015-04-23 Thread Jim Nasby

On 4/23/15 11:24 AM, Andres Freund wrote:

I do wonder what, in realistic cases, is actually the bigger contributor
to the overhead. The tuple header or the padding we liberally add in
many cases...


Assuming you're talking about padding between fields...

Several years ago Enova paid Command Prompt to start work on logical 
column ordering, and part of the motivation for that was to allow 
re-ordering physical tuples into the most efficient on-disk format 
possible. I think I did some tests re-arranging some tables into the 
theoretically most efficient order and measuring heap size. I think 
there was some modest size improvement, maybe 10-15%? This was several 
years ago so it's all foggy. Maybe Josh can find some of this in CMD's 
ticketing system?


Aside from that, something else that might be interesting is Tom's 
recent work on decoupling on-page representation of types from what's 
passed around internally. That might offer some gains here, even if it's 
just in reducing the need for alignment.


I also wonder if a similar technique would be useful at the tuple level. 
One possibility would be attempting to compress the tuple before putting 
it on the page.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freeze avoidance of very large table.

2015-04-23 Thread Jim Nasby

On 4/23/15 11:06 AM, Petr Jelinek wrote:

On 23/04/15 17:45, Bruce Momjian wrote:

On Thu, Apr 23, 2015 at 09:45:38AM -0400, Robert Haas wrote:
Agreed, no extra file, and the same write volume as currently.  It would
also match pg_clog, which uses two bits per transaction --- maybe we can
reuse some of that code.



Yeah, this approach seems promising. We probably can't reuse code from
clog because the usage pattern is different (key for clog is xid, while
for visibility/freeze map ctid is used). But visibility map storage
layer is pretty simple so it should be easy to extend it for this use.


Actually, there may be some bit manipulation functions we could reuse; 
things like efficiently counting how many bits in a byte are set. 
Probably doesn't make sense to fully refactor it, but at least CLOG is a 
good source for cut/paste/whack.
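
For example, one helper I have in mind is just counting the bits set across 
a map page; a quick portable sketch of the idea (visibilitymap.c does the 
same job with a 256-entry lookup table):

#include <stddef.h>
#include <stdint.h>

/* Sketch only: count set bits in a byte array. */
static int
count_set_bits(const uint8_t *map, size_t len)
{
    int     count = 0;
    size_t  i;

    for (i = 0; i < len; i++)
    {
        uint8_t     b = map[i];

        while (b)
        {
            b &= (uint8_t) (b - 1);     /* clear lowest set bit */
            count++;
        }
    }
    return count;
}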

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Reducing tuple overhead

2015-04-23 Thread Andres Freund
Split into a new thread, the other one is already growing fast
enough. This discussion started at
http://archives.postgresql.org/message-id/55391469.5010506%40iki.fi

On April 23, 2015 6:48:57 PM GMT+03:00, Heikki Linnakangas  
wrote:
>Stop right there. You need to reserve enough space on the page to store
>an xmax for *every* tuple on the page. Because if you don't, what are
>you going to do when every tuple on the page is deleted by a different
>transaction.
>
>Even if you store the xmax somewhere else than the page header, you need
>to reserve the same amount of space for them, so it doesn't help at all.

Depends on how you do it and what you optimize for (disk space, runtime,
code complexity...).  You can e.g. apply a somewhat similar trick to
xmin/xmax as was done for cmin/cmax; only that the data structure needs to be
persistent.

In fact, we already have a combocid-like structure for xids that's
persistent - multixacts. We could just have one xid saved that's either
xmin or xmax (indicated by bits) or a multixact.  When a tuple is
updated/deleted whose xmin is still required we could replace the former
xmin with a multixact, otherwise just change the flag to say it's now an
xmax without an xmin.  To check visibility when the xid is a multixact
we'd just have to look up the relevant member for the actual xmin and
xmax.

To avoid excessive overhead when a tuple is repeatedly updated within one
session we could store some of the data in the combocid entry that we
need in that case anyway.

Whether that's feasible complexity wise is debatable, but it's certainly
possible.
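
Spelled out as a sketch (field and flag names here are purely illustrative, 
not a worked-out on-disk format), the idea is something like:

#include <stdint.h>

typedef uint32_t TransactionId;     /* stand-ins for the real typedefs */
typedef uint32_t MultiXactId;

#define TUPLE_XID_IS_XMIN   0x0001  /* slot holds the inserting xid */
#define TUPLE_XID_IS_XMAX   0x0002  /* slot holds the deleting xid; the
                                     * xmin is no longer needed by anyone */
#define TUPLE_XID_IS_MULTI  0x0004  /* slot holds a multixact whose members
                                     * carry both xmin and xmax */

typedef struct CompactTupleHeaderData
{
    TransactionId   t_xid_slot;     /* xmin, xmax, or multixact, per flags */
    uint16_t        t_flags;
    /* ... ctid, infomask, hoff etc. much as today ... */
} CompactTupleHeaderData;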


I do wonder what, in realistic cases, is actually the bigger contributor
to the overhead. The tuple header or the padding we liberally add in
many cases...

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] anole - test case sha2 fails on all branches

2015-04-23 Thread Stephen Frost
Robert,

On Thursday, April 23, 2015, Robert Haas  wrote:

> On Thu, Apr 23, 2015 at 6:13 AM, Sandeep Thakkar
> > wrote:
> > The test case sha2 in the contrib pgcrypto module is failing with a diff on
> > anole because the results file contains additional lines such as:
> > --
> >
> > + WARNING:  detected write past chunk end in ExprContext 6021cbb0
>
> That's clearly a bug, but it's a little hard to tell which commit
> introduced it because anole hasn't reported in so long.  :-(
>
> I notice that friarbird, leech, markhor, and jaguarundi are failing
> with changes that appear to be clearly RLS-related:


Interesting.  I'll take a deeper look at what's going on there tonight.

Thanks!

Stephen


Re: [HACKERS] Fwd: [GENERAL] 4B row limit for CLOB tables

2015-04-23 Thread Bruce Momjian
On Sun, Feb  1, 2015 at 03:54:03PM +0100, Álvaro Hernández Tortosa wrote:
> 
> On 31/01/15 14:55, Roger Pack wrote:
> >[...]
> >
> >Now, the reality is that GetNewOidWithIndex() is going to keep
> >incrementing the global OID counter until it finds an OID that isn't in
> >the toast table. That means that if you actually get anywhere close to
> >using 4B OIDs you're going to become extremely unhappy with the
> >performance of toasting new data.
> >OK so "system stability" doesn't degrade per se when it wraps [since
> >they all use that GetNewOid method or similar [?] good to know.
> >
> >So basically when it gets near 4B TOAST'ed rows it may have to wrap that
> >counter and search for "unused" number, and for each number it's
> >querying the TOAST table to see if it's already used, degrading
> >performance.
> 
> 
> The problem here is that performance degrades exponentially, or
> worse. Speaking here from experience, we already tested this for a
> very similar case (table creation, where two oids are consumed from
> a global sequence when inserting to pg_class). Have a look at
> http://www.slideshare.net/nosys/billion-tables-project-nycpug-2013,
> slides 43-45. We tested this scenario there and showed that table
> creations per second dropped from 10K to a few per second and then
> to a few per day. In the graphs you can't even tell that more
> tables were being created. At around 8K tables short of the theoretical
> limit of 4B oids consumed, the process basically stopped (doing more
> insertions).

I had a look at our FAQ about Postgres limitations and I don't see
anything that needs changing:


https://wiki.postgresql.org/wiki/FAQ#What_is_the_maximum_size_for_a_row.2C_a_table.2C_and_a_database.3F

Maximum size for a database? unlimited (32 TB databases exist)
Maximum size for a table? 32 TB
Maximum size for a row? 400 GB
Maximum size for a field? 1 GB
Maximum number of rows in a table? unlimited
Maximum number of columns in a table? 250-1600 depending on column types
Maximum number of indexes on a table? unlimited

We don't report the maximum number of tables per database, or the
maximum number of TOAST values.  Agreed?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

2015-04-23 Thread Peter Geoghegan
On Thu, Apr 23, 2015 at 9:02 AM, Heikki Linnakangas  wrote:
> That code in ExecWithCheckOptions is not translatable. See style guide:
> http://www.postgresql.org/docs/devel/static/nls-programmer.html#NLS-GUIDELINES

It's probably going to need to change when I rebase on top of
Dean's/Stephen's work, anyway.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freeze avoidance of very large table.

2015-04-23 Thread Petr Jelinek

On 23/04/15 17:45, Bruce Momjian wrote:

On Thu, Apr 23, 2015 at 09:45:38AM -0400, Robert Haas wrote:

Right.  My point is that either you do X 2M times to maintain that fork
and the overhead of the file existence, or you do one VACUUM FREEZE.  I
am saying that 2M is a large number and adding all those X's might
exceed the cost of a VACUUM FREEZE.


I agree, but if we instead make this part of the visibility map
instead of a separate fork, the cost is much less.  It won't be any
more expensive to clear 2 consecutive bits any time a page is touched
than it is to clear 1.  The VM fork will be twice as large, but still
tiny.  And the fact that you'll have only half as many pages mapping
to the same VM page may even improve performance in some cases by
reducing contention.  Even when it reduces performance, I think the
impact will be so tiny as not to be worth caring about.


Agreed, no extra file, and the same write volume as currently.  It would
also match pg_clog, which uses two bits per transaction --- maybe we can
reuse some of that code.



Yeah, this approach seems promising. We probably can't reuse code from 
clog because the usage pattern is different (key for clog is xid, while 
for visibility/freeze map ctid is used). But visibility map storage 
layer is pretty simple so it should be easy to extend it for this use.
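
For what it's worth, as a sketch (the flag names are invented here), clearing 
both bits when a heap page is modified stays a single read-modify-write of 
one map byte:

#include <stdint.h>

#define VM_ALL_VISIBLE      0x01
#define VM_ALL_FROZEN       0x02
#define VM_BITS_PER_PAGE    2
#define VM_PAGES_PER_BYTE   (8 / VM_BITS_PER_PAGE)

static void
vm_clear_heap_page(uint8_t *map, uint32_t heapBlk)
{
    uint32_t    byteno = heapBlk / VM_PAGES_PER_BYTE;
    int         shift = (heapBlk % VM_PAGES_PER_BYTE) * VM_BITS_PER_PAGE;

    /* one operation clears both flags for this heap page */
    map[byteno] &= (uint8_t) ~((VM_ALL_VISIBLE | VM_ALL_FROZEN) << shift);
}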


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

2015-04-23 Thread Heikki Linnakangas

On 04/20/2015 07:37 AM, Peter Geoghegan wrote:


if (wco->commandType == CMD_INSERT)
    command = "INSERT-applicable ";
else if (wco->commandType == CMD_UPDATE)
    command = "UPDATE-applicable ";
else if (wco->commandType == CMD_DELETE)
    command = "DELETE-applicable ";
else if (wco->commandType == CMD_SELECT)
    command = "SELECT-applicable ";

ereport(ERROR,
        (errcode(ERRCODE_WITH_CHECK_OPTION_VIOLATION),
         errmsg("new row violates %sWITH CHECK OPTION %sfor \"%s\"",
                command ? command : "",
                wco->secBarrier ? "(originally security barrier) " : "",
                wco->viewname),
         val_desc ? errdetail("Failing row contains %s.", val_desc) : 0));


That code in ExecWithCheckOptions is not translatable. See style guide: 
http://www.postgresql.org/docs/devel/static/nls-programmer.html#NLS-GUIDELINES
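
One translatable shape for it (the wording here is just an example, not a 
proposal for the final message text) is to keep each errmsg() a complete 
literal sentence per command type:

switch (wco->commandType)
{
    case CMD_INSERT:
        ereport(ERROR,
                (errcode(ERRCODE_WITH_CHECK_OPTION_VIOLATION),
                 errmsg("new row violates WITH CHECK OPTION for INSERT on \"%s\"",
                        wco->viewname),
                 val_desc ? errdetail("Failing row contains %s.", val_desc) : 0));
        break;
    case CMD_UPDATE:
        ereport(ERROR,
                (errcode(ERRCODE_WITH_CHECK_OPTION_VIOLATION),
                 errmsg("new row violates WITH CHECK OPTION for UPDATE on \"%s\"",
                        wco->viewname),
                 val_desc ? errdetail("Failing row contains %s.", val_desc) : 0));
        break;
    /* ... CMD_DELETE and CMD_SELECT handled the same way ... */
    default:
        break;
}

That way each message reaches translators as a whole sentence instead of 
being assembled from fragments at run time.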


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tablespaces inside $PGDATA considered harmful

2015-04-23 Thread Andres Freund
On April 23, 2015 6:12:05 PM GMT+03:00, Jim Nasby  
wrote:
>On 4/22/15 9:41 PM, Bruce Momjian wrote:
>> The case this doesn't catch is referencing a
>> symbolic link that points to the same directory.  We can't make it an
>> error so people can use pg_upgrade these setups.
>
>Couldn't we make it an ERROR unless IsBinaryUpgrade?

People still upgrade without pg_upgrade.

Andres

--- 
Please excuse brevity and formatting - I am writing this on my mobile phone.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freeze avoidance of very large table.

2015-04-23 Thread Bruce Momjian
On Thu, Apr 23, 2015 at 06:52:20PM +0300, Heikki Linnakangas wrote:
> >Agreed.  Are you saying we can't find a way to fit an 8-byte value into
> >the existing page in a backward-compatible way?
> 
> I'm sure we can find a way. We've discussed ways to handle page
> format updates in pg_upgrade before, and I don't want to get into
> that discussion here, but it's not trivial.

OK, good to know, thanks.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freeze avoidance of very large table.

2015-04-23 Thread Heikki Linnakangas

On 04/23/2015 06:38 PM, Bruce Momjian wrote:

On Thu, Apr 23, 2015 at 10:42:59AM +0300, Heikki Linnakangas wrote:

On 04/22/2015 09:24 PM, Robert Haas wrote:

I would feel safer if we added a completely new "epoch" counter to the page

header, instead of reusing LSNs. But as we all know, changing the page
format is a problem for in-place upgrade, and takes some space too.

Yeah.  We have a serious need to reduce the size of our on-disk
format.  On a TPC-C-like workload Jan Wieck recently tested, our data
set was 34% larger than another database at the beginning of the test,
and 80% larger by the end of the test.  And we did twice the disk
writes.  See "The Elephants in the Room.pdf" at
https://sites.google.com/site/robertmhaas/presentations


Meh. Adding an 8-byte header to every 8k block would add 0.1% to the
disk size. No doubt it would be nice to reduce our disk footprint,
but the page header is not the elephant in the room.


Agreed.  Are you saying we can't find a way to fit an 8-byte value into
the existing page in a backward-compatible way?


I'm sure we can find a way. We've discussed ways to handle page format 
updates in pg_upgrade before, and I don't want to get into that 
discussion here, but it's not trivial.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Shouldn't CREATE TABLE LIKE copy the relhasoids property?

2015-04-23 Thread Bruce Momjian
On Thu, Apr 23, 2015 at 09:26:50AM -0400, Robert Haas wrote:
> >> What I meant was - I didn't see an attachment on that message.
> >
> > I didn't attach it as people have told me they can just as easily see
> > the patch via git, and since it was so similar, I didn't repost it.
> > Should I have?  I can easily do that.
> 
> No, I just misread your email.  I thought you said you had attached
> the patch; rereading it, I see that you said you had applied the
> patch.  Silly me.

You were not the only one confused --- I got a private email on the same
topic.  My only guess is that I normally say "attached" in that case so
"applied" just looked too similar.  I will try to mix it up in the
future.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freeze avoidance of very large table.

2015-04-23 Thread Heikki Linnakangas

On 04/23/2015 06:39 PM, Petr Jelinek wrote:

On 23/04/15 17:24, Heikki Linnakangas wrote:

On 04/23/2015 05:52 PM, Jim Nasby wrote:

I've often wondered if there was some way we could consolidate XMIN/XMAX
from multiple tuples at the page level; that could be a big win for OLAP
environments where most of your tuples belong to a pretty small range of
XIDs. In many workloads you could have 80%+ of the tuples in a table
having a single inserting XID.


It would be doable for xmin - IIRC someone even posted a patch for that
years ago - but xmax (and ctid) is difficult. When a tuple is inserted,
Xmax is basically just a reservation for the value that will be put
there later. You have no idea what that value is, and you can't
influence it, and when it's time to delete/update the row, you *must*
have the space for that xmax. So we can't opportunistically use the
space for anything else, or compress them or anything like that.


That depends; if we are going to change the page format we can move the xmax
into some map of ctid->xmax in the header (with no values for tuples
with no xmax) ...


Stop right there. You need to reserve enough space on the page to store 
an xmax for *every* tuple on the page. Because if you don't, what are 
you going to do when every tuple on the page is deleted by a different 
transaction?


Even if you store the xmax somewhere else than the page header, you need 
to reserve the same amount of space for them, so it doesn't help at all.


- Heikki





Re: [HACKERS] Freeze avoidance of very large table.

2015-04-23 Thread Bruce Momjian
On Thu, Apr 23, 2015 at 09:45:38AM -0400, Robert Haas wrote:
> > Right.  My point is that either you do X 2M times to maintain that fork
> > and the overhead of the file existence, or you do one VACUUM FREEZE.  I
> > am saying that 2M is a large number and adding all those X's might
> > exceed the cost of a VACUUM FREEZE.
> 
> I agree, but if we instead make this part of the visibility map
> instead of a separate fork, the cost is much less.  It won't be any
> more expensive to clear 2 consecutive bits any time a page is touched
> than it is to clear 1.  The VM fork will be twice as large, but still
> tiny.  And the fact that you'll have only half as many pages mapping
> to the same VM page may even improve performance in some cases by
> reducing contention.  Even when it reduces performance, I think the
> impact will be so tiny as not to be worth caring about.

Agreed, no extra file, and the same write volume as currently.  It would
also match pg_clog, which uses two bits per transaction --- maybe we can
reuse some of that code.
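
For readers skimming the thread, the maintenance cost the map is meant to
avoid is the periodic whole-table freeze scan; roughly (the table name here
is made up):

  -- how far a table has fallen behind on freezing; once this age passes
  -- autovacuum_freeze_max_age, today's anti-wraparound vacuum must read the
  -- whole heap, even pages that are already all-frozen
  SELECT relname, age(relfrozenxid) FROM pg_class WHERE relname = 'big_history';

  -- the expensive operation an all-frozen bit in the VM would let VACUUM skip
  VACUUM (FREEZE, VERBOSE) big_history;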

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Freeze avoidance of very large table.

2015-04-23 Thread Bruce Momjian
On Thu, Apr 23, 2015 at 06:24:00PM +0300, Heikki Linnakangas wrote:
> >I've often wondered if there was some way we could consolidate XMIN/XMAX
> >from multiple tuples at the page level; that could be a big win for OLAP
> >environments where most of your tuples belong to a pretty small range of
> >XIDs. In many workloads you could have 80%+ of the tuples in a table
> >having a single inserting XID.
> 
> It would be doable for xmin - IIRC someone even posted a patch for
> that years ago - but xmax (and ctid) is difficult. When a tuple is
> inserted, Xmax is basically just a reservation for the value that
> will be put there later. You have no idea what that value is, and
> you can't influence it, and when it's time to delete/update the row,
> you *must* have the space for that xmax. So we can't
> opportunistically use the space for anything else, or compress them
> or anything like that.

Also SELECT FOR UPDATE uses the per-row xmax too.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Freeze avoidance of very large table.

2015-04-23 Thread Petr Jelinek

On 23/04/15 17:24, Heikki Linnakangas wrote:

On 04/23/2015 05:52 PM, Jim Nasby wrote:

On 4/23/15 2:42 AM, Heikki Linnakangas wrote:

On 04/22/2015 09:24 PM, Robert Haas wrote:

Yeah.  We have a serious need to reduce the size of our on-disk
format.  On a TPC-C-like workload Jan Wieck recently tested, our data
set was 34% larger than another database at the beginning of the test,
and 80% larger by the end of the test.  And we did twice the disk
writes.  See "The Elephants in the Room.pdf" at
https://sites.google.com/site/robertmhaas/presentations


Meh. Adding an 8-byte header to every 8k block would add 0.1% to the
disk size. No doubt it would be nice to reduce our disk footprint, but
the page header is not the elephant in the room.


I've often wondered if there was some way we could consolidate XMIN/XMAX
from multiple tuples at the page level; that could be a big win for OLAP
environments where most of your tuples belong to a pretty small range of
XIDs. In many workloads you could have 80%+ of the tuples in a table
having a single inserting XID.


It would be doable for xmin - IIRC someone even posted a patch for that
years ago - but xmax (and ctid) is difficult. When a tuple is inserted,
Xmax is basically just a reservation for the value that will be put
there later. You have no idea what that value is, and you can't
influence it, and when it's time to delete/update the row, you *must*
have the space for that xmax. So we can't opportunistically use the
space for anything else, or compress them or anything like that.



That depends. If we are going to change the page format, we could move the 
xmax into a map of ctid->xmax in the header (with no entries for tuples 
with no xmax), or have a bitmap there of tuples that have an xmax, etc. 
Basically, not saving xmax (and potentially other info) inline for each 
tuple, but keeping some info in the header only for the tuples that need 
it. That might have bad performance side effects of course, but there are 
definitely some potential ways of doing things differently which we 
could explore.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Freeze avoidance of very large table.

2015-04-23 Thread Bruce Momjian
On Thu, Apr 23, 2015 at 10:42:59AM +0300, Heikki Linnakangas wrote:
> On 04/22/2015 09:24 PM, Robert Haas wrote:
> >>I would feel safer if we added a completely new "epoch" counter to the page
> >>>header, instead of reusing LSNs. But as we all know, changing the page
> >>>format is a problem for in-place upgrade, and takes some space too.
> >Yeah.  We have a serious need to reduce the size of our on-disk
> >format.  On a TPC-C-like workload Jan Wieck recently tested, our data
> >set was 34% larger than another database at the beginning of the test,
> >and 80% larger by the end of the test.  And we did twice the disk
> >writes.  See "The Elephants in the Room.pdf" at
> >https://sites.google.com/site/robertmhaas/presentations
> 
> Meh. Adding an 8-byte header to every 8k block would add 0.1% to the
> disk size. No doubt it would be nice to reduce our disk footprint,
> but the page header is not the elephant in the room.

Agreed.  Are you saying we can't find a way to fit an 8-byte value into
the existing page in a backward-compatible way?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

2015-04-23 Thread Bruce Momjian
On Thu, Apr 23, 2015 at 05:02:19PM +0200, Andres Freund wrote:
> On 2015-04-23 15:52:40 +0100, Geoff Winkless wrote:
> > When I set out I was really only hoping to express a preference as a user;
> > on balance I would really rather not have DO IGNORE, if it were possible to
> > avoid, because it's really ugly, but DO UPDATE/DO NOTHING I could just
> > about cope with (and means you don't need to add IGNORE as a keyword,
> > win!), although it still mildly pains me that there's an additional
> > unnecessary word.
> 
> Yea, DO NOTHING is a good alternative. And I do like we're adding one
> keyword less (which is also good for the parser's
> size/performance).

No question that DO IGNORE sounds awkward.  DO NOTHING also matches
CREATE RULE --- another plus.
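
For reference, CREATE RULE already uses that spelling (rule and table names
below are only illustrative):

  CREATE RULE ignore_ins AS ON INSERT TO log_table
      DO INSTEAD NOTHING;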

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] tablespaces inside $PGDATA considered harmful

2015-04-23 Thread Bruce Momjian
On Thu, Apr 23, 2015 at 05:05:14PM +0200, Andres Freund wrote:
> On 2015-04-23 11:00:43 -0400, Bruce Momjian wrote:
> > On Thu, Apr 23, 2015 at 09:13:52AM -0400, Robert Haas wrote:
> > > I think this is a good thing to do, but I sure wish we could go
> > > further and block it completely.  That may require more thought than
> > > we have time to put in at this stage of the release cycle, though, so
> > > +1 for doing at least this much.
> > 
> > OK, good.  Thinking to 9.6, I am not sure how we could throw an error
> > because we have allowed this in the past and pg_dump is going to be
> > restored with a raw SQL CREATE TABLESPACE command.  
> 
> We could just document that you need to pre-create the tablespace and
> ignore the resulting error. This isn't going to affect too many people.

This approach is going to cause any object in that tablespace to not
restore --- are we sure that enough people check for restore errors that
we will not have people losing data on a restore?

Also, the error is going to cause pg_upgrade to fail.  We could have
pg_upgrade --check detect these cases and force people to fix their
setups before they run pg_upgrade --- at least that would be consistent
with the pg_dump behavior.  Jim Nasby suggested throwing an error unless
IsBinaryUpgrade is set, and that would work, but it means we are
allowing such tablespaces to be upgraded using pg_upgrade only, which
seems kind of odd.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Freeze avoidance of very large table.

2015-04-23 Thread Heikki Linnakangas

On 04/23/2015 05:52 PM, Jim Nasby wrote:

On 4/23/15 2:42 AM, Heikki Linnakangas wrote:

On 04/22/2015 09:24 PM, Robert Haas wrote:

Yeah.  We have a serious need to reduce the size of our on-disk
format.  On a TPC-C-like workload Jan Wieck recently tested, our data
set was 34% larger than another database at the beginning of the test,
and 80% larger by the end of the test.  And we did twice the disk
writes.  See "The Elephants in the Room.pdf" at
https://sites.google.com/site/robertmhaas/presentations


Meh. Adding an 8-byte header to every 8k block would add 0.1% to the
disk size. No doubt it would be nice to reduce our disk footprint, but
the page header is not the elephant in the room.


I've often wondered if there was some way we could consolidate XMIN/XMAX
from multiple tuples at the page level; that could be a big win for OLAP
environments where most of your tuples belong to a pretty small range of
XIDs. In many workloads you could have 80%+ of the tuples in a table
having a single inserting XID.


It would be doable for xmin - IIRC someone even posted a patch for that 
years ago - but xmax (and ctid) is difficult. When a tuple is inserted, 
Xmax is basically just a reservation for the value that will be put 
there later. You have no idea what that value is, and you can't 
influence it, and when it's time to delete/update the row, you *must* 
have the space for that xmax. So we can't opportunistically use the 
space for anything else, or compress them or anything like that.


- Heikki





Re: [HACKERS] PL/pgSQL, RAISE and error context

2015-04-23 Thread Pavel Stehule
2015-04-23 16:12 GMT+02:00 Robert Haas :

> On Thu, Apr 23, 2015 at 9:55 AM, Pavel Stehule 
> wrote:
> >> On Thu, Apr 23, 2015 at 4:56 AM, Pavel Stehule  >
> >> wrote:
> >> > I don't see a contradiction. There is clean agreement, so ERROR level
> >> > should
> >> > to show the context. NOTICE and WARNINGs doesn't need it - and there
> is
> >> > a
> >> > backward compatibility and usability reasons don't do it.
> >>
> >> Whether notices and warnings need it is a matter of opinion.  I don't
> >> think your idea is bad, and it might be a good rule of thumb in many
> >> cases, but I slightly prefer Marko's approach of adding a new option.
> >
> > I am not sure I understand you.
> >
> > Please, can you write more about your idea?
>
> Your idea, as I understand it, is that for logs at severity levels
> lower than ERROR, we can always omit the context, because it's not
> necessary.  But I'm not sure that's right: some people might find that
> context helpful.  If, as Marko proposes, we add an explicit option,
> then everyone can choose the behavior that is right for them.
>

I am not sure I explained it well. I would emit the context for ERROR and
higher by default, and I would not emit the context for anything less than
ERROR by default (I am not sure about the WARNING level).

But it could be changed by some option in the RAISE statement, as Marko
proposes - and possibly globally by a GUC, because it doesn't change the
behavior of the application.

With the current behavior I have a problem at the ERROR level in plpgsql,
where the context is missing now. On the other hand, I think the current
behavior is OK for the NOTICE level.

I am not against any new option in the RAISE statement.

If there is any disagreement between me and Marko, it is over what the
default behavior for the NOTICE level should be. I strongly prefer not to
show the context there, but I can accept some global switch too.
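
To make the cases concrete (the function below is invented), the open
question is which of these two statements should carry a line like
"CONTEXT:  PL/pgSQL function raise_demo() line N at RAISE" by default:

  CREATE OR REPLACE FUNCTION raise_demo() RETURNS void LANGUAGE plpgsql AS $$
  BEGIN
      RAISE NOTICE 'just progress chatter';        -- context by default is debated
      RAISE EXCEPTION 'something actually broke';  -- context wanted here by everyone
  END;
  $$;
  SELECT raise_demo();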

Regards

Pavel


>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: [HACKERS] tablespaces inside $PGDATA considered harmful

2015-04-23 Thread Jim Nasby

On 4/22/15 9:41 PM, Bruce Momjian wrote:

The case this doesn't catch is referencing a
symbolic link that points to the same directory.  We can't make it an
error so people can use pg_upgrade these setups.


Couldn't we make it an ERROR unless IsBinaryUpgrade?
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Split the 'Server Programming' chapter into two?

2015-04-23 Thread Jim Nasby

On 4/23/15 9:23 AM, Alvaro Herrera wrote:

Robert Haas wrote:

On Thu, Apr 23, 2015 at 8:21 AM, Andres Freund  wrote:



To me at least 44 - 47 don't really fit well to the rest. I think we
either should invent a new category for them, or move them to
'Internals'. Maybe we could introduce 'Extending the Server' category
for those and a couple more? Candidates for that least seem to be
52. Writing A Procedural Language Handler
53. Writing A Foreign Data Wrapper
54. Writing A Custom Scan Provider


I like the "extending the server" idea.  Maybe "Server Extensions".


+1 for Extending the Server.  I don't like Server Extensions too much;
seems confusing with sql-level EXTENSION objects.


I definitely think it would be useful to split the C stuff out on its 
own. We could also stand to have more information there, especially 
around things like fcinfo, but that's another topic...

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] tablespaces inside $PGDATA considered harmful

2015-04-23 Thread Andres Freund
On 2015-04-23 11:00:43 -0400, Bruce Momjian wrote:
> On Thu, Apr 23, 2015 at 09:13:52AM -0400, Robert Haas wrote:
> > I think this is a good thing to do, but I sure wish we could go
> > further and block it completely.  That may require more thought than
> > we have time to put in at this stage of the release cycle, though, so
> > +1 for doing at least this much.
> 
> OK, good.  Thinking to 9.6, I am not sure how we could throw an error
> because we have allowed this in the past and pg_dump is going to be
> restored with a raw SQL CREATE TABLESPACE command.  

We could just document that you need to pre-create the tablespace and
ignore the resulting error. This isn't going to affect too many people.
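
Sketched out (names and paths below are invented), the workflow would be:
create the tablespace by hand at a sane location first, and then treat the
dump's own CREATE TABLESPACE failure as expected noise:

  -- done manually before the restore, outside the new data directory
  CREATE TABLESPACE ts_old LOCATION '/srv/pgsql/ts_old';

  -- the dump will then replay something like
  --   CREATE TABLESPACE ts_old LOCATION '/var/lib/pgsql/9.4/data/ts_old';
  -- which fails with 'tablespace "ts_old" already exists' and can be ignored;
  -- the objects restore into the pre-created tablespace instead.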

Greetings,

Andres Freund




Re: [HACKERS] Freeze avoidance of very large table.

2015-04-23 Thread Jim Nasby

On 4/23/15 8:42 AM, Robert Haas wrote:

On Thu, Apr 23, 2015 at 4:19 AM, Simon Riggs  wrote:

We were talking about having an incremental backup map also. Which sounds a
lot like the freeze map.


Yeah, possibly.  I think we should try to set things up so that the
backup map can be updated asynchronously by a background worker, so
that we're not adding more work to the foreground path just for the
benefit of maintenance operations.  That might make the logic for
autovacuum to use it a little bit more complex, but it seems
manageable.


I'm not sure an actual map makes sense... for incremental backups you 
need some kind of stream that tells you not only what changed but when 
it changed. A simple freeze map won't work for that because the 
operation of freezing itself writes data (and the same can be true for 
VM). Though, if the backup utility was actually comparing live data to 
an actual backup maybe this would work...



We only need a freeze/backup map for larger relations. So if we map 1000
blocks per map page, we skip having a map at all when size < 1000.


Agreed.  We might also want to map multiple blocks per map slot - e.g.
one slot per 32 blocks.  That would keep the map quite small even for
very large relations, and would not compromise efficiency that much
since reading 256kB sequentially probably takes only a little longer
than reading 8kB.


The problem with mapping a range of pages per bit is dealing with 
locking when you set the bit. Currently that's easy because we're 
holding the cleanup lock on the page, but you can't do that if you have 
a range of pages. Though, if each 'slot' wasn't a simple binary value we 
could have a 3rd state that indicates we're in the process of marking 
that slot as all visible/frozen, but you still need to consider the bit 
as cleared.


Honestly though, I think concerns about the size of the map are a bit 
overblown. Even if we double its size, it's still 32,000 times smaller 
than the heap is with 8k pages. I suspect that if you have tables large 
enough that you'll care, you'll also be using 32k pages, which means 
it'd be 128,000 times smaller than the heap. I have a hard time 
believing that's going to be even a faint blip on the performance radar.
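
The scale is easy to check against any existing table (the name here is
invented); pg_relation_size() can report the visibility map fork separately:

  SELECT pg_size_pretty(pg_relation_size('big_history', 'main')) AS heap,
         pg_size_pretty(pg_relation_size('big_history', 'vm'))   AS visibility_map;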

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Add CINE for ALTER TABLE ... ADD COLUMN

2015-04-23 Thread Fabrízio de Royes Mello
On Wed, Apr 22, 2015 at 3:48 PM, Payal Singh  wrote:
>
> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, failed
> Implements feature:   not tested
> Spec compliant:   not tested
> Documentation:not tested
>
> Seeing this when trying to apply the patch:
>
> Patching file src/backend/commands/tablecmds.c using Plan A...
> Hunk #1 FAILED at 328.
> Hunk #2 succeeded at 2294 (offset 11 lines).
> Hunk #3 FAILED at 3399.
> Hunk #4 FAILED at 3500.
> Hunk #5 succeeded at 4658 with fuzz 1 (offset 65 lines).
> Hunk #6 succeeded at 4753 (offset 66 lines).
> Hunk #7 succeeded at 4989 with fuzz 2 (offset 66 lines).
> Hunk #8 succeeded at 5003 (offset 69 lines).
> Hunk #9 succeeded at 5017 (offset 69 lines).
> Hunk #10 succeeded at 5033 (offset 69 lines).
>
> The new status of this patch is: Waiting on Author
>

The patch needs a "rebase". Done!
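
For anyone skimming, the feature being rebased adds this spelling (table and
column names are only illustrative); per the documentation hunk below, the
second statement becomes a no-op instead of failing:

  ALTER TABLE mytable ADD COLUMN note text;
  ALTER TABLE mytable ADD COLUMN IF NOT EXISTS note text;  -- no error raised
  -- without IF NOT EXISTS the second command would fail with
  --   ERROR:  column "note" of relation "mytable" already exists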

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 6a82730..3041b09 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -36,7 +36,7 @@ ALTER TABLE ALL IN TABLESPACE name
 
 where action is one of:
 
-ADD [ COLUMN ] column_name data_type [ COLLATE collation ] [ column_constraint [ ... ] ]
+ADD [ COLUMN ] [ IF NOT EXISTS ]column_name data_type [ COLLATE collation ] [ column_constraint [ ... ] ]
 DROP [ COLUMN ] [ IF EXISTS ] column_name [ RESTRICT | CASCADE ]
 ALTER [ COLUMN ] column_name [ SET DATA ] TYPE data_type [ COLLATE collation ] [ USING expression ]
 ALTER [ COLUMN ] column_name SET DEFAULT expression
@@ -96,11 +96,12 @@ ALTER TABLE ALL IN TABLESPACE name
 
   

-ADD COLUMN
+ADD COLUMN [ IF NOT EXISTS ]
 
  
   This form adds a new column to the table, using the same syntax as
-  .
+  . If IF NOT EXISTS
+  is specified and the column already exists, no error is thrown.
  
 

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 06e4332..8cd436d 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -328,8 +328,8 @@ static void ATPrepAddColumn(List **wqueue, Relation rel, bool recurse, bool recu
 bool is_view, AlterTableCmd *cmd, LOCKMODE lockmode);
 static ObjectAddress ATExecAddColumn(List **wqueue, AlteredTableInfo *tab,
 Relation rel, ColumnDef *colDef, bool isOid,
-bool recurse, bool recursing, LOCKMODE lockmode);
-static void check_for_column_name_collision(Relation rel, const char *colname);
+bool recurse, bool recursing, bool if_not_exists, LOCKMODE lockmode);
+static bool check_for_column_name_collision(Relation rel, const char *colname, bool if_not_exists);
 static void add_column_datatype_dependency(Oid relid, int32 attnum, Oid typid);
 static void add_column_collation_dependency(Oid relid, int32 attnum, Oid collid);
 static void ATPrepAddOids(List **wqueue, Relation rel, bool recurse,
@@ -2294,7 +2294,7 @@ renameatt_internal(Oid myrelid,
 		oldattname)));
 
 	/* new name should not already exist */
-	check_for_column_name_collision(targetrelation, newattname);
+	check_for_column_name_collision(targetrelation, newattname, false);
 
 	/* apply the update */
 	namestrcpy(&(attform->attname), newattname);
@@ -3443,11 +3443,11 @@ ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
 		case AT_AddColumnToView:		/* add column via CREATE OR REPLACE
 		 * VIEW */
 			address = ATExecAddColumn(wqueue, tab, rel, (ColumnDef *) cmd->def,
-	  false, false, false, lockmode);
+			false, false, false, false, lockmode);
 			break;
 		case AT_AddColumnRecurse:
 			address = ATExecAddColumn(wqueue, tab, rel, (ColumnDef *) cmd->def,
-	  false, true, false, lockmode);
+	  false, true, false, cmd->missing_ok, lockmode);
 			break;
 		case AT_ColumnDefault:	/* ALTER COLUMN DEFAULT */
 			address = ATExecColumnDefault(rel, cmd->name, cmd->def, lockmode);
@@ -3490,19 +3490,19 @@ ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
 			break;
 		case AT_AddConstraint:	/* ADD CONSTRAINT */
 			address =
-ATExecAddConstraint(wqueue, tab, rel, (Constraint *) cmd->def,
-	false, false, lockmode);
+			ATExecAddConstraint(wqueue, tab, rel, (Constraint *) cmd->def,
+false, false, lockmode);
 			break;
 		case AT_AddConstraintRecurse:	/* ADD CONSTRAINT with recursion */
 			address =
-ATExecAddConstraint(wqueue, tab, rel, (Constraint *) cmd->def,
-	true, false, lockmode);
+			ATExecAddConstraint(wqueue, tab, rel, (Constraint *) cmd->def,
+true, false, lockmode);
 			break;
 		case AT_ReAddConstraint:		/* Re-add p

Re: [HACKERS] INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

2015-04-23 Thread Andres Freund
On 2015-04-23 15:52:40 +0100, Geoff Winkless wrote:
> When I set out I was really only hoping to express a preference as a user;
> on balance I would really rather not have DO IGNORE, if it were possible to
> avoid, because it's really ugly, but DO UPDATE/DO NOTHING I could just
> about cope with (and means you don't need to add IGNORE as a keyword,
> win!), although it still mildly pains me that there's an additional
> unnecessary word.

Yea, DO NOTHING is a good alternative. And I do like we're adding one
keyword less (which is also good for the parser's
size/performance).

DO {UPDATE ... | NOTHING | LOCK} doesn't sound too bad to me (yes, LOCK
doesn't exist yet, except by writing UPDATE .. WHERE false ;)).
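
Spelled out against the syntax as it currently stands in the patch (the
table, columns, and the EXCLUDED alias are illustrative and may still change
before commit):

  INSERT INTO mytable (id, somevalue) VALUES (1, 999)
      ON CONFLICT (id) DO NOTHING;

  INSERT INTO mytable (id, somevalue) VALUES (1, 999)
      ON CONFLICT (id) DO UPDATE SET somevalue = EXCLUDED.somevalue;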

Greetings,

Andres Freund




Re: [HACKERS] tablespaces inside $PGDATA considered harmful

2015-04-23 Thread Bruce Momjian
On Thu, Apr 23, 2015 at 09:13:52AM -0400, Robert Haas wrote:
> On Wed, Apr 22, 2015 at 10:41 PM, Bruce Momjian  wrote:
> >> What is a real problem is that we don't block creating tablespaces
> >> anywhere at all, including in obviously problematic places like the
> >> transaction log directory:
> >>
> >> josh=# create tablespace tbl2 location '/home/josh/pg94/data/pg_xlog/';
> >> CREATE TABLESPACE
> >>
> >> It really seems like we ought to block *THAT*.  Of course, if we block
> >> tablespace creation in PGDATA generally, then that's covered.
> >
> > I have developed the attached patch to warn about creating tablespaces
> > inside the data directory.  The case this doesn't catch is referencing a
> > symbolic link that points to the same directory.  We can't make it an
> > error so people can use pg_upgrade these setups.  This would be for 9.5
> > only.
> 
> I think this is a good thing to do, but I sure wish we could go
> further and block it completely.  That may require more thought than
> we have time to put in at this stage of the release cycle, though, so
> +1 for doing at least this much.

OK, good.  Thinking to 9.6, I am not sure how we could throw an error
because we have allowed this in the past and pg_dump is going to be
restored with a raw SQL CREATE TABLESPACE command.  

We have had this type of problem before, but never resolved it.  We
almost need pg_dump to set a GUC variable telling the backend it is
restoring a dump and issue a warning, but throw an error if the same
command was issued outside of a pg_dump restore.  FYI, pg_upgrade
already throws a warning related to the non-creation of a delete script.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] anole - test case sha2 fails on all branches

2015-04-23 Thread Tom Lane
I wrote:
> Given that anole is the only one reporting this, I'm not sure that we
> should immediately blame Postgres itself.  I have a vague recollection
> that we've seen this symptom before and traced it to a bug in some
> supporting library.  Is anole using any particularly out-of-date versions
> of openssl, kerberos, etc?

A bit of digging in the archives suggests that my hindbrain remembered this:
http://www.postgresql.org/message-id/flat/4f5a8404.8020...@dunslane.net

The specifics probably don't apply to anole, but the conclusion that
inconsistent openssl header and library files triggered the bug might.

Maybe it'd be worth trying to see if we could detect such inconsistencies
during configure?

regards, tom lane




Re: [HACKERS] Freeze avoidance of very large table.

2015-04-23 Thread Jim Nasby

On 4/23/15 2:42 AM, Heikki Linnakangas wrote:

On 04/22/2015 09:24 PM, Robert Haas wrote:

Yeah.  We have a serious need to reduce the size of our on-disk
format.  On a TPC-C-like workload Jan Wieck recently tested, our data
set was 34% larger than another database at the beginning of the test,
and 80% larger by the end of the test.  And we did twice the disk
writes.  See "The Elephants in the Room.pdf" at
https://sites.google.com/site/robertmhaas/presentations


Meh. Adding an 8-byte header to every 8k block would add 0.1% to the
disk size. No doubt it would be nice to reduce our disk footprint, but
the page header is not the elephant in the room.


I've often wondered if there was some way we could consolidate XMIN/XMAX 
from multiple tuples at the page level; that could be a big win for OLAP 
environments where most of your tuples belong to a pretty small range of 
XIDs. In many workloads you could have 80%+ of the tuples in a table 
having a single inserting XID.


Dunno how much it would help for OLTP though... :/
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

2015-04-23 Thread Geoff Winkless
On 23 April 2015 at 14:50, Andres Freund  wrote:

> > Maybe I'm misreading it, but isn't index_predicate meant to be inside
> > the brackets?
> >
> > http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/on-conflict-docs/sql-insert.html
>
> That has changed since.


Oh, helpful. :)

I'll shut up. I have a feeling that my objection is really with the very
idea of unreserved keywords and I have a feeling that there will be rather
more people shouting me down if I go off on that particular rant; meanwhile
it's 20 years since I used yacc in earnest and it's too hazy to be able to
argue about what it is or isn't capable of.

When I set out I was really only hoping to express a preference as a user;
on balance I would really rather not have DO IGNORE, if it were possible to
avoid, because it's really ugly, but DO UPDATE/DO NOTHING I could just
about cope with (and means you don't need to add IGNORE as a keyword,
win!), although it still mildly pains me that there's an additional
unnecessary word.

But I certainly don't object enough to hold up you guys doing the actual
work for my benefit (among others, obviously!).

G


Re: [HACKERS] Allow SQL/plpgsql functions to accept record

2015-04-23 Thread Jim Nasby

On 4/23/15 8:33 AM, Pavel Stehule wrote:

I agree that that would be useful.  I think the problem with an
expression like rowvar.something is that PL/pgsql cannot infer the
type of the result, and nothing else works without that.  I doubt that
it's practical to lift that restriction.  PL/pgsql could introduce
dedicated syntax for this operation, like DYNAMIC_EXTRACT(rowvar,
colname, resulttype) or something, but that's going to be clunky at
best.  Whether we eventually do that or not, if we can allow rows to
be passed in and then let people use json or hstore operators on them,
that would be a helpful step forward, IMHO.  I'm not sure if that's
practical either, but maybe...


This needs significant changes in plpgsql - it could force working with
different types in expressions inside a loop - so we would either have to
abandon plan-based expressions or introduce alternative plans for different
input types.


Is this fundamentally an issue of not knowing what we're being handed 
when we compile the function? Perhaps a way around this limitation would 
be to recompile during execution if any record arguments get a different 
base type. In reality, I suspect that won't happen during a single query.


I'll take a look at at least allowing passing a record in so you can 
hand it to some other function. Any pointers on how to do that would be 
welcome; I've never hacked on plpgsql or SQL function code before.
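
A sketch of what accepting record would enable (the function name is
invented; the record parameter is exactly the part that is rejected today,
while row_to_json() and the json operators already exist):

  CREATE FUNCTION pick_col(r record, col text) RETURNS text LANGUAGE sql AS $$
      SELECT row_to_json(r) ->> col;
  $$;

  -- hypothetical usage once record arguments are accepted:
  SELECT pick_col(t.*, 'name') FROM mytable AS t;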

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Turning off HOT/Cleanup sometimes

2015-04-23 Thread Jim Nasby

On 4/23/15 8:25 AM, Robert Haas wrote:

Some users are partitioning tables just so that each
partition can be autovac'd separately.  That really shouldn't be
required.


Are they doing this for improved heap scan performance? Index scan 
performance? If the table wasn't partitioned, would they need more than 
one pass through the indexes due to exhausting maintenance_work_mem?
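
One way to check for that last case (table name invented): if the dead-tuple
list overflows maintenance_work_mem, VACUUM VERBOSE reports the same index
being scanned more than once within a single run:

  SET maintenance_work_mem = '64MB';
  VACUUM VERBOSE big_history;
  -- repeated "scanned index ... to remove N row versions" lines for the same
  -- index mean multiple passes, i.e. maintenance_work_mem was exhausted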


There's probably some fairly low-hanging fruit for parallelizing vacuum, 
but it really depends on what problems people are running into.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] anole - test case sha2 fails on all branches

2015-04-23 Thread Tom Lane
Robert Haas  writes:
> On Thu, Apr 23, 2015 at 6:13 AM, Sandeep Thakkar
>  wrote:
>> The test case sha2 in contrib pgcrypto module is failing with a diff on
>> anole because the results file contains the additional lines as:
>> + WARNING:  detected write past chunk end in ExprContext 6021cbb0

> That's clearly a bug, but it's a little hard to tell which commit
> introduced it because anole hasn't reported in so long.  :-(

Given that anole is the only one reporting this, I'm not sure that we
should immediately blame Postgres itself.  I have a vague recollection
that we've seen this symptom before and traced it to a bug in some
supporting library.  Is anole using any particularly out-of-date versions
of openssl, kerberos, etc?

regards, tom lane




Re: [HACKERS] anole - test case sha2 fails on all branches

2015-04-23 Thread Alvaro Herrera
Robert Haas wrote:

> There are lots of machines failing in pg_upgradeCheck, but I don't see
> details of the failures in the logs.

Yeah, I think the buildfarm script is failing to save the error log.
Anyway AFAIR this is related to the move from contrib to src/bin;
machines that have updated to buildfarm 4.15 work fine, those that are
on older versions report failure.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Split the 'Server Programming' chapter into two?

2015-04-23 Thread Alvaro Herrera
Robert Haas wrote:
> On Thu, Apr 23, 2015 at 8:21 AM, Andres Freund  wrote:

> > To me at least 44 - 47 don't really fit well to the rest. I think we
> > either should invent a new category for them, or move them to
> > 'Internals'. Maybe we could introduce 'Extending the Server' category
> > for those and a couple more? Candidates for that least seem to be
> > 52. Writing A Procedural Language Handler
> > 53. Writing A Foreign Data Wrapper
> > 54. Writing A Custom Scan Provider
> 
> I like the "extending the server" idea.  Maybe "Server Extensions".

+1 for Extending the Server.  I don't like Server Extensions too much;
seems confusing with sql-level EXTENSION objects.

> > Also, shouldn't there at least be a link to
> > http://www.postgresql.org/docs/devel/static/xfunc-sql.html in
> > http://www.postgresql.org/docs/devel/static/xplang.html ?
> 
> Wouldn't hurt.

Agreed.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] PL/pgSQL, RAISE and error context

2015-04-23 Thread Robert Haas
On Thu, Apr 23, 2015 at 9:55 AM, Pavel Stehule  wrote:
>> On Thu, Apr 23, 2015 at 4:56 AM, Pavel Stehule 
>> wrote:
>> > I don't see a contradiction. There is clean agreement, so ERROR level
>> > should
>> > to show the context. NOTICE and WARNINGs doesn't need it - and there is
>> > a
>> > backward compatibility and usability reasons don't do it.
>>
>> Whether notices and warnings need it is a matter of opinion.  I don't
>> think your idea is bad, and it might be a good rule of thumb in many
>> cases, but I slightly prefer Marko's approach of adding a new option.
>
> I am not sure I understand you.
>
> Please, can you write more about your idea?

Your idea, as I understand it, is that for logs at severity levels
lower than ERROR, we can always omit the context, because it's not
necessary.  But I'm not sure that's right: some people might find that
context helpful.  If, as Marko proposes, we add an explicit option,
then everyone can choose the behavior that is right for them.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Split the 'Server Programming' chapter into two?

2015-04-23 Thread Robert Haas
On Thu, Apr 23, 2015 at 8:21 AM, Andres Freund  wrote:
> While playing around with where exactly to put the replication
> origin/progress docs I once more noticed that the 'Server Programming'
> book is a mix of different topics.
>
> It currently contains:
> 35. Extending SQL
> 36. Triggers
> 37. Event Triggers
> 38. The Rule System
> 39. Procedural Languages
> 40. PL/pgSQL - SQL Procedural Language
> 41. PL/Tcl - Tcl Procedural Language
> 42. PL/Perl - Perl Procedural Language
> 43. PL/Python - Python Procedural Language
> 44. Server Programming Interface
> 45. Background Worker Processes
> 46. Logical Decoding
> 47. Replication Progress Tracking
>
> To me at least 44 - 47 don't really fit well to the rest. I think we
> either should invent a new category for them, or move them to
> 'Internals'. Maybe we could introduce 'Extending the Server' category
> for those and a couple more? Candidates for that least seem to be
> 52. Writing A Procedural Language Handler
> 53. Writing A Foreign Data Wrapper
> 54. Writing A Custom Scan Provider

I like the "extending the server" idea.  Maybe "Server Extensions".

> Also, shouldn't there at least be a link to
> http://www.postgresql.org/docs/devel/static/xfunc-sql.html in
> http://www.postgresql.org/docs/devel/static/xplang.html ?

Wouldn't hurt.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] anole - test case sha2 fails on all branches

2015-04-23 Thread Robert Haas
On Thu, Apr 23, 2015 at 6:13 AM, Sandeep Thakkar
 wrote:
> The test case sha2 in contrib pgcrypto module is failing with a diff on
> anole because the results file contains the additional lines as:
> --
>
> + WARNING:  detected write past chunk end in ExprContext 6021cbb0

That's clearly a bug, but it's a little hard to tell which commit
introduced it because anole hasn't reported in so long.  :-(

I notice that friarbird, leech, markhor, and jaguarundi are failing
with changes that appear to be clearly RLS-related:

http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=friarbird&dt=2015-04-23%2004%3A20%3A00
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=leech&dt=2015-04-23%2003%3A47%3A14
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=markhor&dt=2015-04-23%2000%3A00%3A12
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=jaguarundi&dt=2015-04-22%2020%3A03%3A09

There are lots of machines failing in pg_upgradeCheck, but I don't see
details of the failures in the logs.

I suppose whatever's going wrong on anole must be the result of a
commit that was back-patched all the way to 9.0, post November 30th,
and probably touching contrib/pgcrypto.  That seems to include the
following four commits:

Author: Noah Misch 
Branch: master [8b59672d8] 2015-02-02 10:00:45 -0500
Branch: REL9_4_STABLE Release: REL9_4_1 [258e294db] 2015-02-02 10:00:49 -0500
Branch: REL9_3_STABLE Release: REL9_3_6 [a558ad3a7] 2015-02-02 10:00:50 -0500
Branch: REL9_2_STABLE Release: REL9_2_10 [d1972da8c] 2015-02-02 10:00:51 -0500
Branch: REL9_1_STABLE Release: REL9_1_15 [8d412e02e] 2015-02-02 10:00:52 -0500
Branch: REL9_0_STABLE Release: REL9_0_19 [0a3ee8a5f] 2015-02-02 10:00:52 -0500

Cherry-pick security-relevant fixes from upstream imath library.

This covers alterations to buffer sizing and zeroing made between imath
1.3 and imath 1.20.  Valgrind Memcheck identified the buffer overruns
and reliance on uninitialized data; their exploit potential is unknown.
Builds specifying --with-openssl are unaffected, because they use the
OpenSSL BIGNUM facility instead of imath.  Back-patch to 9.0 (all
supported versions).

Security: CVE-2015-0243

Author: Noah Misch 
Branch: master [1dc755158] 2015-02-02 10:00:45 -0500
Branch: REL9_4_STABLE Release: REL9_4_1 [82806cf4e] 2015-02-02 10:00:49 -0500
Branch: REL9_3_STABLE Release: REL9_3_6 [6994f0790] 2015-02-02 10:00:50 -0500
Branch: REL9_2_STABLE Release: REL9_2_10 [d95ebe0ac] 2015-02-02 10:00:51 -0500
Branch: REL9_1_STABLE Release: REL9_1_15 [11f738a8a] 2015-02-02 10:00:51 -0500
Branch: REL9_0_STABLE Release: REL9_0_19 [ce6f261cd] 2015-02-02 10:00:52 -0500

Fix buffer overrun after incomplete read in pullf_read_max().

Most callers pass a stack buffer.  The ensuing stack smash can crash the
server, and we have not ruled out the viability of attacks that lead to
privilege escalation.  Back-patch to 9.0 (all supported versions).

Marko Tiikkaja

Security: CVE-2015-0243

Author: Tom Lane 
Branch: master [a59ee8819] 2015-01-30 13:05:30 -0500
Branch: REL9_4_STABLE Release: REL9_4_1 [70da7aeba] 2015-01-30 13:04:59 -0500
Branch: REL9_3_STABLE Release: REL9_3_6 [f08cf8ad9] 2015-01-30 13:05:01 -0500
Branch: REL9_2_STABLE Release: REL9_2_10 [a97dfdfd9] 2015-01-30 13:05:04 -0500
Branch: REL9_1_STABLE Release: REL9_1_15 [8f51c432c] 2015-01-30 13:05:07 -0500
Branch: REL9_0_STABLE Release: REL9_0_19 [7c41a32b3] 2015-01-30 13:05:09 -0500

Fix Coverity warning about contrib/pgcrypto's mdc_finish().

Coverity points out that mdc_finish returns a pointer to a local buffer
(which of course is gone as soon as the function returns), leaving open
a risk of misbehaviors possibly as bad as a stack overwrite.

In reality, the only possible call site is in process_data_packets()
which does not examine the returned pointer at all.  So there's no
live bug, but nonetheless the code is confusing and risky.  Refactor
to avoid the issue by letting process_data_packets() call mdc_finish()
directly instead of going through the pullf_read() API.

Although this is only cosmetic, it seems good to back-patch so that
the logic in pgp-decrypt.c stays in sync across all branches.

Marko Kreen

Author: Tom Lane 
Branch: master [586dd5d6a] 2015-01-24 13:05:42 -0500
Branch: REL9_4_STABLE Release: REL9_4_1 [d51d4ff31] 2015-01-24 13:05:45 -0500
Branch: REL9_3_STABLE Release: REL9_3_6 [7240f9200] 2015-01-24 13:05:49 -0500
Branch: REL9_2_STABLE Release: REL9_2_10 [502e5f9c3] 2015-01-24 13:05:53 -0500
Branch: REL9_1_STABLE Release: REL9_1_15 [b00a08859] 2015-01-24 13:05:56 -0500
Branch: REL9_0_STABLE Release: REL9_0_19 [3a3ee655c] 2015-01-24 13:05:58 -0500

Replace a bunch more uses of strncpy() with safer coding.

strncpy() has a well-deserved reputation for being unsafe, so make an
effort to get rid of nearly all occurrences in HEAD.

A large fraction of the remaining uses were passing length less than or
equal to the k

Re: [HACKERS] Code paths where LWLock should be released on failure

2015-04-23 Thread Kevin Grittner
Michael Paquier  wrote:

> I have also been surprised by the inconsistencies particularly in
> predicate.c or other places regarding LWLock releases. Sometimes
> they are released on failure, sometimes not.

Those are not needed for correctness; they are a micro-optimization
for SerializableXactHashLock, which is a particularly hot LWLock
when serializable transactions are used.  There are a few places
where this is not done for this lock, which look like the places
where changes were made late in the development of the feature.
Most probably we went through at one point and added them before
throwing errors when that lock was held (as an optimization), but
neglected to do that with subsequent changes.  Removing the ones
that are there, or adding to the places we could, would not affect
correctness; it might make a very small difference in how quickly a
transaction which is going to cancel due to a serialization failure
gets out of the way of other transactions.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

2015-04-23 Thread Andres Freund
On 2015-04-23 14:34:02 +0100, Geoff Winkless wrote:
> > A syntax error. DO is a reserved keyword. Update is just unreserved (and
> > thus can be used as a column label). Ignore is unreserved with the patch
> > and was unreserved before.  We obviously can make both reserved, but if so
> > we have to do it for real, not by hiding the conflicts
> >
> 
> Sorry, I misunderstood: so it's not the fact that it can't be used as a
> column label (because it can) but the fact that it can't then be referenced
> within a WHERE clause without quoting

Meh. You can use any keyword in quotes - because then they're not
keywords anymore.


> INSERT INTO mytable (somevalue) VALUES (999) ON CONFLICT ('myindex') WHERE
> update UPDATE update=1
> 
> but I would have to do
> 
> INSERT INTO mytable (somevalue) VALUES (999) ON CONFLICT ('myindex') WHERE
> "do" UPDATE "do"=1

Yes.
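
Concretely (the table name is invented):

  SELECT * FROM t WHERE update = 1;   -- OK: UPDATE is unreserved
  SELECT * FROM t WHERE do = 1;       -- syntax error: DO is fully reserved
  SELECT * FROM t WHERE "do" = 1;     -- OK: quoted, it's just an identifier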

> Maybe I'm misreading it, but isn't index_predicate meant to be inside the
> brackets?
> 
> http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/on-conflict-docs/sql-insert.html

That has changed since. And for good reason: it's pretty odd to have the
WHERE clause inside the brackets when that's not the case for CREATE
INDEX. But more importantly with multiple columns for stuff like ON
CONFLICT (a, b WHERE foo) it's unclear where the WHERE is actually
attached to. We have that problem with aggregates and it repeatedly
caused confusion.

> As I said, it's my personal belief that anyone using keywords (of any
> kind) unquoted deserves what they get, but I see your point.

Given that IGNORE isn't even a keyword right now (9.5 master) that
policy isn't particularly meaningful anyway.

> I think I object to the fact that you're talking about adding extra
> syntactic sugar to work around a parser-implementation problem, not an
> actual syntax problem (since UPDATE SET is unambiguous, isn't it?).

I fail to see the point of such an objection. We have an LALR parser
(generated by bison). That implies a certain expressiveness. You're
suggesting that we change to a different kind of parser?

I don't think it's necessarily unambiguous. I'm not particularly
motivated to prove it though - the point is that we rely on bison to
prevent ambiguities. That only works if we're listening. And not if
we're silencing warnings about ambiguities over the whole grammar.

Greetings,

Andres Freund




Re: [HACKERS] PL/pgSQL, RAISE and error context

2015-04-23 Thread Pavel Stehule
2015-04-23 15:47 GMT+02:00 Robert Haas :

> On Thu, Apr 23, 2015 at 4:56 AM, Pavel Stehule 
> wrote:
> > I don't see a contradiction. There is clean agreement, so ERROR level
> should
> > to show the context. NOTICE and WARNINGs doesn't need it - and there is a
> > backward compatibility and usability reasons don't do it.
>
> Whether notices and warnings need it is a matter of opinion.  I don't
> think your idea is bad, and it might be a good rule of thumb in many
> cases, but I slightly prefer Marko's approach of adding a new option.
>

I am not sure I understand you.

Please, can you write more about your idea?




>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: [HACKERS] PL/pgSQL, RAISE and error context

2015-04-23 Thread Robert Haas
On Thu, Apr 23, 2015 at 4:56 AM, Pavel Stehule  wrote:
> I don't see a contradiction. There is clean agreement, so ERROR level should
> to show the context. NOTICE and WARNINGs doesn't need it - and there is a
> backward compatibility and usability reasons don't do it.

Whether notices and warnings need it is a matter of opinion.  I don't
think your idea is bad, and it might be a good rule of thumb in many
cases, but I slightly prefer Marko's approach of adding a new option.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Freeze avoidance of very large table.

2015-04-23 Thread Robert Haas
On Wed, Apr 22, 2015 at 8:55 PM, Bruce Momjian  wrote:
> On Wed, Apr 22, 2015 at 06:36:23PM -0500, Jim Nasby wrote:
>> On 4/22/15 6:12 PM, Bruce Momjian wrote:
>> >My point is that for the life of 200M transactions, you would have the
>> >overhead of an additional file per table in the file system, and updates
>> >of that.  I just don't know if the overhead over the long time period
>> >would be smaller than the VACUUM FREEZE.  It might be fine --- I don't
>> >know.  People seem to focus on the big activities, while many small
>> >activities can lead to larger slowdowns.
>>
>> Ahh. This wouldn't be for the life of 200M transactions; it would be
>> a permanent fork, just like the VM is.
>
> Right.  My point is that either you do X 2M times to maintain that fork
> and the overhead of the file existence, or you do one VACUUM FREEZE.  I
> am saying that 2M is a large number and adding all those X's might
> exceed the cost of a VACUUM FREEZE.

I agree, but if we instead make this part of the visibility map
instead of a separate fork, the cost is much less.  It won't be any
more expensive to clear 2 consecutive bits any time a page is touched
than it is to clear 1.  The VM fork will be twice as large, but still
tiny.  And the fact that you'll have only half as many pages mapping
to the same VM page may even improve performance in some cases by
reducing contention.  Even when it reduces performance, I think the
impact will be so tiny as not to be worth caring about.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Freeze avoidance of very large table.

2015-04-23 Thread Robert Haas
On Thu, Apr 23, 2015 at 4:19 AM, Simon Riggs  wrote:
> We were talking about having an incremental backup map also. Which sounds a
> lot like the freeze map.

Yeah, possibly.  I think we should try to set things up so that the
backup map can be updated asynchronously by a background worker, so
that we're not adding more work to the foreground path just for the
benefit of maintenance operations.  That might make the logic for
autovacuum to use it a little bit more complex, but it seems
manageable.

> We only need a freeze/backup map for larger relations. So if we map 1000
> blocks per map page, we skip having a map at all when size < 1000.

Agreed.  We might also want to map multiple blocks per map slot - e.g.
one slot per 32 blocks.  That would keep the map quite small even for
very large relations, and would not compromise efficiency that much
since reading 256kB sequentially probably takes only a little longer
than reading 8kB.

I think the idea of integrating the freeze map into the VM fork is
also worth considering.  Then, the incremental backup map could be
optional; if you don't want incremental backup, you can shut it off
and have less overhead.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

2015-04-23 Thread Geoff Winkless
On 23 April 2015 at 13:51, Andres Freund  wrote:

> On April 23, 2015 3:34:07 PM GMT+03:00, Geoff Winkless <
> pgsqlad...@geoff.dj> wrote:
> > And what's to stop me having "SELECT true AS do" in the where clause
> > (as per your UPDATE objection)?
>
> A syntax error. DO is a reserved keyword. Update is just unreserved (and
> thus can be used as a column label). Ignore is unreserved with the patch
> and was unreserved before.  We obviously can make both reserved, but if so
> we have to do it for real, not by hiding the conflicts
>

Sorry, I misunderstood: so it's not the fact that it can't be used as a
column label (because it can) but the fact that it can't then be referenced
within a WHERE clause without quoting. Which is in itself utterly horrible,
but that's a separate argument and I can at least now understand your point.

So I could end up with

INSERT INTO mytable (somevalue) VALUES (999) ON CONFLICT ('myindex') WHERE
update UPDATE update=1

but I would have to do

INSERT INTO mytable (somevalue) VALUES (999) ON CONFLICT ('myindex') WHERE
"do" UPDATE "do"=1

?

> > Apologies for butting in but can I (as a user) express a preference as
> > a user against DO?
>
> Sure. If you propose an alternative ;)
>

Maybe I'm misreading it, but isn't index_predicate meant to be inside the
brackets?

http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/on-conflict-docs/sql-insert.html

certainly states that.

> It's not one right now. And ignore isn't a keyword at all atm.
>

As I said, it's my personal belief that anyone using keywords (of any
kind) unquoted deserves what they get, but I see your point.


I think I object to the fact that you're talking about adding extra
syntactic sugar to work around a parser-implementation problem, not an
actual syntax problem (since UPDATE SET is unambiguous, isn't it?).

> (Please don't top post)
>

Mea culpa. I blame google :)

FWIW "DO IGNORE" just reads disgustingly. If you do finally go down the DO
route, perhaps "DO NOTHING"? :)

Geoff


Re: [HACKERS] Allow SQL/plpgsql functions to accept record

2015-04-23 Thread Pavel Stehule
2015-04-23 15:19 GMT+02:00 Robert Haas :

> On Wed, Apr 22, 2015 at 6:12 PM, Jim Nasby 
> wrote:
> > On 4/22/15 2:12 PM, Merlin Moncure wrote:
> >> That being said, I think json types with their associated API, given
> >> that they are core types, will ultimately handle these types of
> >> problems.  That way, at least, we can avoid adding syntax and
> >> functionality that will basically do the same thing.  This reminds me
> >> a little bit of the json_build() vs enhanced row() syntax we discussed
> >> some time back.  I didn't say so at the time, but for posterity, I
> >> think you were right...json_build() is working fine for building
> >> arbitrary record types and moving a record to json and deconstructing
> >> it should work just as well.
> >
> > The one part I don't care for in that is it seems rather inefficient to
> cast
> > something to JSON just so we can do things we really should be able to do
> > with a record. But perhaps it's not all that costly.
> >
> > As for allowing SQL and plpgsql functions to accept a record, I think our
> > JSON functionality just provided plenty of reason we should allow
> accepting
> > them, even if you can't do much with it: you *can* hand it to
> row_to_json(),
> > which does allow you to do something useful with it. So it seems
> reasonable
> > to me that we should at least accept it as a function argument.
>
> I agree that that would be useful.  I think the problem with an
> expression like rowvar.something is that PL/pgsql cannot infer the
> type of the result, and nothing else works without that.  I doubt that
> it's practical to lift that restriction.  PL/pgsql could introduce
> dedicated syntax for this operation, like DYNAMIC_EXTRACT(rowvar,
> colname, resulttype) or something, but that's going to be clunky at
> best.  Whether we eventually do that or not, if we can allow rows to
> be passed in and then let people use json or hstore operators on them,
> that would be a helpful step forward, IMHO.  I'm not sure if that's
> practical either, but maybe...
>

This needs significant changes in plpgsql - it would force expressions inside
a loop to work with different types on different iterations - so we would
either have to stop basing expression evaluation on cached plans, or
introduce alternative plans for the different input types.

Pavel




>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
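
For illustration, a minimal sketch of the row_to_json() route mentioned above
(the table mytable, its column somecol, and the helper get_field() are all
hypothetical, not part of any proposed patch):

    -- Accepts any composite value and pulls one field out via json,
    -- so no record-typed argument is needed.
    CREATE FUNCTION get_field(r anyelement, field text) RETURNS text AS $$
      SELECT row_to_json(r)->>field;
    $$ LANGUAGE sql;

    -- Usage: pass the whole row and name the column at run time.
    SELECT get_field(t, 'somecol') FROM mytable t;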


Re: [HACKERS] Shouldn't CREATE TABLE LIKE copy the relhasoids property?

2015-04-23 Thread Robert Haas
On Wed, Apr 22, 2015 at 8:57 PM, Bruce Momjian  wrote:
> On Tue, Apr 21, 2015 at 05:36:41PM -0400, Robert Haas wrote:
>> On Mon, Apr 20, 2015 at 5:41 PM, Bruce Momjian  wrote:
>> > On Mon, Apr 20, 2015 at 05:04:14PM -0400, Robert Haas wrote:
>> >> On Mon, Apr 20, 2015 at 4:11 PM, Bruce Momjian  wrote:
>> >> > Slightly improved patch applied.
>> >>
>> >> Is it?
>> >
>> > The patch has a slightly modified 'if' statement to check a constant
>> > before calling a function, and use elseif:
>> >
>> > < +             if (!interpretOidsOption(stmt->options, true) && cxt.hasoids)
>> > ---
>> > > +             if (cxt.hasoids && !interpretOidsOption(stmt->options, true))
>> > 47c57
>> > < +             if (interpretOidsOption(stmt->options, true) && !cxt.hasoids)
>> > ---
>> > > +             else if (!cxt.hasoids && interpretOidsOption(stmt->options, true))
>> >
>> > I realize the change is subtle.
>>
>> What I meant was - I didn't see an attachment on that message.
>
> I didn't attach it as people have told me they can just as easily see
> the patch via git, and since it was so similar, I didn't repost it.
> Should I have?  I can easily do that.

No, I just misread your email.  I thought you said you had attached
the patch; rereading it, I see that you said you had applied the
patch.  Silly me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Turning off HOT/Cleanup sometimes

2015-04-23 Thread Robert Haas
On Wed, Apr 22, 2015 at 5:17 PM, Bruce Momjian  wrote:
> On Wed, Apr 22, 2015 at 06:07:00PM -0300, Alvaro Herrera wrote:
>> > Good point, but doesn't vacuum remove the need for pruning as it removes
>> > all the old rows?
>>
>> Sure.  The point, I think, is to make autovacuum runs of some sort that
>> don't actually vacuum but only do HOT-pruning.  Maybe this is a
>> reasonable solution to the problem that queries don't prune anymore
>> after Simon's patch.  If we made autovac HOT-prune periodically, we
>> could have read-only queries prune only already-dirty pages.  Of course,
>> that would need further adjustments to default number of autovac
>> workers, I/O allocation, etc.
>
> Do we really want to make vacuum more complex for this?  vacuum does
> have the delay settings we would need though.

I think it's abundantly clear that, as wonderful as autovacuum is
compared with what we had before autovacuum, it's not good enough.
This is one area where I think improvement is definitely needed, and
I've suggested it before.  Discussion began here:

http://www.postgresql.org/message-id/AANLkTimd3ieGCm9pXV39ci6-owy3rX0mzz_N1tL=0...@mail.gmail.com

Some of the things I suggested then seem dumb in hindsight, but I
think the basic concept is still valid: if we scan the heap and find
only a few dead tuples, the expense of scanning all of the indexes may
not be justified.  Also, the fact that a relation can currently only
be vacuumed by one process at a time is coming to seem like a major
limitation.  Some users are partitioning tables just so that each
partition can be autovac'd separately.  That really shouldn't be
required.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
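
As an aside, a minimal sketch of the partitioning workaround mentioned above
(the events tables are hypothetical): with inheritance children, autovacuum
can process each piece independently, and each one can be tuned on its own.

    -- One child per month; autovacuum treats each child as its own table.
    CREATE TABLE events_2015_04 () INHERITS (events);

    -- Per-table autovacuum settings for the busiest partition.
    ALTER TABLE events_2015_04
        SET (autovacuum_vacuum_scale_factor = 0.05,
             autovacuum_vacuum_cost_delay = 10);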



