Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

2015-04-06 Thread Kouhei Kaigai
Hanada-san,

Thanks for your dedicated efforts on the remote-join feature.
Below are my comments.


* Bug - mixture of ctid system column and whole row-reference
When the "ctid" system column is required, deparseSelectSql()
adds a ctid reference at the base relation scan level.
On the other hand, a whole-row reference is transformed into
a reference to the underlying relation. This works fine as
long as no system column is specified; however, a system
column reference breaks the expected tuple layout.
Eventually it leads to an error.

postgres=# select ft1.ctid,ft1 from ft1,ft2 where a=b;
ERROR:  malformed record literal: "(2,2,bbb,"(0,2)")"
DETAIL:  Too many columns.
CONTEXT:  column "" of foreign table "foreign join"
STATEMENT:  select ft1.ctid,ft1 from ft1,ft2 where a=b;

postgres=# explain verbose
   select ft1.ctid,ft1 from ft1,ft2 where a=b;
   QUERY PLAN

 Foreign Scan  (cost=200.00..208.35 rows=835 width=70)
   Output: ft1.ctid, ft1.*
   Remote SQL: SELECT l.a1, l.a2 FROM (SELECT l.a7, l, l.a10 FROM (SELECT id a9,
 a a10, atext a11, ctid a7 FROM public.t1) l) l (a1, a2, a3) INNER JOIN (SELECT
b a10 FROM public.t2) r (a1) ON ((l.a3 = r.a1))

"l" of the first SELECT represents a whole-row reference.
However, underlying SELECT contains system columns in its target-
list.

Would it be possible to construct a query like this instead?
  SELECT l.a1, l.a2 FROM (SELECT (id,a,atext), ctid) l (a1, a2) ...
  ^^
Probably, this is a job for deparseProjectionSql().


* postgresGetForeignJoinPaths()
It walks root->simple_rel_array to check whether all the
relations involved are managed by the same foreign server
with the same credentials.
There may be a more graceful way to do this: look at the
fdw_private field of innerrel/outerrel. If both have a valid
fdw_private, it means the FDW driver (postgres_fdw) considers
all the underlying relation scans/joins runnable on the
remote server. So, all we need to check is whether the
server-id and user-id of both relations are identical.
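
A rough sketch of what I mean (the field names below are my assumption,
not necessarily what your v7 patch uses):

    PgFdwRelationInfo *fpinfo_o = (PgFdwRelationInfo *) outerrel->fdw_private;
    PgFdwRelationInfo *fpinfo_i = (PgFdwRelationInfo *) innerrel->fdw_private;

    if (fpinfo_o == NULL || fpinfo_i == NULL)
        return;     /* some underlying relation is not entirely remote */

    if (fpinfo_o->serverid != fpinfo_i->serverid ||
        fpinfo_o->userid != fpinfo_i->userid)
        return;     /* different foreign server or credentials */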


* merge_fpinfo()
It seems to me that fpinfo->rows should simply be joinrel->rows,
and fpinfo->width should likewise be joinrel->width.
There is no need for special intelligence here, is there?
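
In other words, something as simple as this (a sketch; I assume fpinfo
keeps the rows/width fields used for costing, as in the base-relation
case):

    /* just take the planner's own estimates for the joinrel */
    fpinfo->rows  = joinrel->rows;
    fpinfo->width = joinrel->width;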


* explain output

The EXPLAIN output may be insufficient to understand what the
foreign scan actually tries to do.

postgres=# explain select * from ft1,ft2 where a = b;
   QUERY PLAN

 Foreign Scan  (cost=200.00..212.80 rows=1280 width=80)
(1 row)

Even though this is not an immediate request, it seems better
to show users the joined relations and the remote ON/WHERE
clauses here.


Please don't hesitate to consult me, if you have any questions.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 


> -Original Message-
> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Shigeru Hanada
> Sent: Friday, April 03, 2015 7:32 PM
> To: Kaigai Kouhei(海外 浩平)
> Cc: Ashutosh Bapat; Robert Haas; Tom Lane; Thom Brown;
> pgsql-hackers@postgreSQL.org
> Subject: Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
> 
> Attached is the patch which adds join push-down support to postgres_fdw (v7).
> It supports SELECT statements with JOIN, but has some more possible 
> enhancements
> (see below).  I'd like to share the WIP patch here to get comments about new 
> FDW
> API design provided by KaiGai-san's v11 patch.
> 
> To make reviewing easier, I summarized changes against Custom/Foreign join v11
> patch.
> 
> Changes for Join push-down support
> ==
> - Add FDW API GetForeignJoinPaths().  It generates a ForeignPath which 
> represents
> a scan against pseudo join relation represented by given RelOptInfo.
> - Expand deparsing module to handle multi-relation queries.  Steps of 
> deparsing
> a join query:
> 
> 1) Optimizer calls postgresGetForeignPaths() for each BASEREL.  Here
> postgres_fdw does the same things as before, except adding column aliases in 
> SELECT
> clause.
> 2) Optimizer calls postgresGetForeignJoinPaths() for each JOINREL.  Optimizer
> calls once per RelOptInfo with reloptkind == RELOPT_JOINREL, so postgres_fdw
> should consider both A JOIN B and B JOIN A in one call.
> 
> postgres_fdw checks whether the join can be pushed down.
> 
> a) Both outer and inner relations can be pushed down (NULL in
> RelOptInfo#fdw_private indicates such situation)
> b) Outmost command is a SELECT (this can be relaxed in the future)
> c) Join type is inner or one of outer
> d) Server of all relations in the join are identical
> e) Effective user ids for all relations in the join are identical (they
> might be different if some were accessed via views)
> f) No local filters (this can be relaxed if inner && non-volatile)
> g) Join conditions doesn't contain any "unsafe" 

[HACKERS] Ignoring some binaries generated in src/test

2015-04-06 Thread Michael Paquier
Hi all,

A couple of binaries in src/test that are not part of the main make
flow can be built, but they are not ignored in the tree:
examples/testlibpq
examples/testlibpq2
examples/testlibpq3
examples/testlibpq4
examples/testlo
examples/testlo64
locale/test-ctype
thread/thread_test
I recall that some of them were targets for removal; still, shouldn't
they have their own entries in a .gitignore, as in the attached patch?
Regards,
-- 
Michael
diff --git a/src/test/examples/.gitignore b/src/test/examples/.gitignore
new file mode 100644
index 000..1957ec1
--- /dev/null
+++ b/src/test/examples/.gitignore
@@ -0,0 +1,6 @@
+/testlibpq
+/testlibpq2
+/testlibpq3
+/testlibpq4
+/testlo
+/testlo64
diff --git a/src/test/locale/.gitignore b/src/test/locale/.gitignore
new file mode 100644
index 000..620d3df
--- /dev/null
+++ b/src/test/locale/.gitignore
@@ -0,0 +1 @@
+/test-ctype
diff --git a/src/test/thread/.gitignore b/src/test/thread/.gitignore
new file mode 100644
index 000..1d54d54
--- /dev/null
+++ b/src/test/thread/.gitignore
@@ -0,0 +1 @@
+/thread_test

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_rewind and log messages

2015-04-06 Thread Michael Paquier
On Mon, Apr 6, 2015 at 9:10 PM, Fujii Masao  wrote:
> I'm not familiar with native language support (sorry), but don't we need to
> add the gettext shortcut to every call of pg_log and pg_fatal, e.g.,
> change pg_fatal("xxx") to pg_fatal(_("xxx"))? I know that fprintf() in
> pg_log_v() has such a shortcut, but I'm not sure if that's enough or not.

I think I addressed those things...

>  case PG_FATAL:
> -printf("\n%s", _(message));
> -printf("%s", _("Failure, exiting\n"));
> +fprintf(stderr, _("\n%s: fatal: %s\n"), progname, message);
> +fprintf(stderr, _("Failure, exiting\n"));
>
> Why do we need to output the error level like fatal? This seems also
> inconsistent with the log message policy of other tools.

initdb and psql do not for a couple of warning messages, but perhaps I
am just taking consistency with the original code too seriously.

> Why do we need to output \n at the head of the message?
> The second message "Failure, exiting" is really required?

I think that those things were here to highlight the fact that this is
a fatal bailout, but the error code would help the same way I guess...

>> I eliminated a bunch of
>> newlines in the log messages that seemed really unnecessary to me,
>> simplifying a bit the whole. While hacking this stuff, I noticed as
>> well that pg_rewind could be called as root on non-Windows platform,
>> that's dangerous from a security point of view as process manipulates
>> files in PGDATA. Hence let's block that. On Windows, a restricted
>> token should be used.
>
> Good catch! I think it's better to implement it as a separate patch
> because it's very different from log message problem.

Attached are new patches:
- 0001 fixes the restriction issues: restricted token on Windows, and
forbid non-root user on non-Windows.
- 0002 includes the improvements for logging, addressing the comments above.

Regards,
-- 
Michael
From 6fa9c9d427352a01d589ce1871b6adecd88cf49c Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 7 Apr 2015 11:21:17 +0900
Subject: [PATCH 1/2] Fix process handling of pg_rewind

To begin with, pg_rewind should not be allowed to run as root on
non-Windows platforms, as it manipulates data folders and file permissions.
On Windows, it can run under a user that has Administrator rights,
but in that case a restricted token needs to be used.
---
 src/bin/pg_rewind/pg_rewind.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/src/bin/pg_rewind/pg_rewind.c b/src/bin/pg_rewind/pg_rewind.c
index dda3a79..200e001 100644
--- a/src/bin/pg_rewind/pg_rewind.c
+++ b/src/bin/pg_rewind/pg_rewind.c
@@ -24,6 +24,7 @@
 #include "access/xlog_internal.h"
 #include "catalog/catversion.h"
 #include "catalog/pg_control.h"
+#include "common/restricted_token.h"
 #include "getopt_long.h"
 #include "storage/bufpage.h"
 
@@ -174,6 +175,21 @@ main(int argc, char **argv)
 		exit(1);
 	}
 
+	/*
+	 * Don't allow pg_rewind to be run as root, to avoid overwriting the
+	 * ownership of files in the data directory. We need only check for root
+	 * -- any other user won't have sufficient permissions to modify files in
+	 * the data directory.
+	 */
+#ifndef WIN32
+	if (geteuid() == 0)
+		pg_fatal("cannot be executed by \"root\"\n"
+ "You must run %s as the PostgreSQL superuser.\n",
+ progname);
+#endif
+
+	get_restricted_token(progname);
+
 	/* Connect to remote server */
 	if (connstr_source)
 		libpqConnect(connstr_source);
-- 
2.3.5

From 79103f93393e34e1a795001afd3f43a3a8201500 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Mon, 6 Apr 2015 17:18:21 +0900
Subject: [PATCH 2/2] Fix inconsistent handling of logs in pg_rewind

pg_rewind was handling a couple of things differently compared to the
other src/bin utilities:
- Logging output needs to be flushed to stderr, not stdout
- Logging messages should be prefixed with the application name
- pg_fatal exits with an error code of 1, but it was sometimes followed by
subsequent logs, causing that information to be lost in the process.
---
 src/bin/pg_rewind/copy_fetch.c  | 24 -
 src/bin/pg_rewind/datapagemap.c |  4 ++-
 src/bin/pg_rewind/file_ops.c| 30 +++---
 src/bin/pg_rewind/filemap.c | 16 ++--
 src/bin/pg_rewind/libpq_fetch.c | 52 ++---
 src/bin/pg_rewind/logging.c | 13 +-
 src/bin/pg_rewind/logging.h |  2 +-
 src/bin/pg_rewind/parsexlog.c   | 20 +++
 src/bin/pg_rewind/pg_rewind.c   | 57 ++---
 src/bin/pg_rewind/pg_rewind.h   |  2 ++
 src/bin/pg_rewind/timeline.c| 27 +++
 11 files changed, 122 insertions(+), 125 deletions(-)

diff --git a/src/bin/pg_rewind/copy_fetch.c b/src/bin/pg_rewind/copy_fetch.c
index 887fec9..d3430e5 100644
--- a/src/bin/pg_rewind/copy_fetch.c
+++ b/src/bin/pg_rewind/copy_fetch.c
@@ -58,7 +58,7 @@ recurse_dir(const char *datadir, const char *parentpath,
 
 	xl

[HACKERS] Typo in a comment in set_rel_size()

2015-04-06 Thread Amit Langote

Hi,

Attached fixes what I suppose is a typo:

 * so set up a single dummy path for it.  Here we only check this for
 * regular baserels; if it's an otherrel, CE was already checked in
-* set_append_rel_pathlist().
+* set_append_rel_size().
 *

Thanks,
Amit
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 58d78e6..c4b0c79 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -241,7 +241,7 @@ set_rel_size(PlannerInfo *root, RelOptInfo *rel,
 		 * We proved we don't need to scan the rel via constraint exclusion,
 		 * so set up a single dummy path for it.  Here we only check this for
 		 * regular baserels; if it's an otherrel, CE was already checked in
-		 * set_append_rel_pathlist().
+		 * set_append_rel_size().
 		 *
 		 * In this case, we go ahead and set up the relation's path right away
 		 * instead of leaving it for set_rel_pathlist to do.  This is because

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_rewind and log messages

2015-04-06 Thread Michael Paquier
On Mon, Apr 6, 2015 at 10:01 PM, Alvaro Herrera wrote:
> I'm not sure about translation of generic strings such as "%s: %s".  My
> first impression is that they shouldn't be translated, but maybe it is
> important that they are for languages I know nothing about, such as
> Japanese.

I misunderstood things here; pg_fatal and others should not use _()
directly for generic strings. It seems like overkill.

> Another thing is compound messages like "foo has happened\nSee --help
> for usage.\n" and "bar didn't happen\.See --help for usage".  In those
> cases, the "see --help" part would need to be translated over and over,
> so it's best to separate them in phrases to avoid repetitive work for
> translators.  Not sure how to do this -- maybe something like
> _("foo has happened\n") _("See --help")
> but I'm not sure how to appease the compiler.  Having them in two
> separate pg_log() calls (or whatever) was handy for this reason.

OK, let's switch to two calls to pg_log or similar in those cases to
make the life of translators easier.
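
Something like the following, as a sketch (the message texts are
placeholders, and I'm assuming pg_rewind's pg_log() helper here):

    /* one self-contained translatable phrase per call */
    pg_log(PG_WARNING, "foo has happened\n");
    pg_log(PG_WARNING, "Try \"%s --help\" for more information.\n", progname);
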
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freeze avoidance of very large table.

2015-04-06 Thread Sawada Masahiko
On Tue, Apr 7, 2015 at 7:53 AM, Jim Nasby  wrote:
> On 4/6/15 5:18 PM, Greg Stark wrote:
>>
>> Only I would suggest thinking of it in terms of two orthogonal boolean
>> flags rather than three states. It's easier to reason about whether a
>> table has a specific property than trying to control a state machine in
>> a predefined pathway.
>>
>> So I would say the two flags are:
>> READONLY: guarantees nothing can be dirtied
>> ALLFROZEN: guarantees no unfrozen tuples are present
>>
>> In practice you can't have the later without the former since vacuum
>> can't know everything is frozen unless it knows nobody is inserting. But
>> perhaps there will be cases in the future where that's not true.
>
>
> I'm not so sure about that. There's a logical state progression here (see
> below). ISTM it's easier to just enforce that in one place instead of a
> bunch of places having to check multiple conditions. But, I'm not wed to a
> single field.
>
>> Incidentally there are number of other optimisations tat over had in
>> mind that are only possible on frozen read-only tables:
>>
>> 1) Compression: compress the pages and pack them one after the other.
>> Build a new fork with offsets for each page.
>>
>> 2) Automatic partition elimination where the statistics track the
>> minimum and maximum value per partition (and number of tuples) and treat
>> then as implicit constraints. In particular it would magically make read
>> only empty parent partitions be excluded regardless of the where clause.
>
>
> AFAICT neither of those actually requires ALLFROZEN, no? You'll need to
> uncompact and re-compact for #1 when you actually freeze (which maybe isn't
> worth it), but freezing isn't absolutely required. #2 would only require
> that everything in the relation is visible; not frozen.
>
> I think there's value here to having an ALLVISIBLE state as well as
> ALLFROZEN.
>

Based on these suggestions, I'm going to deal with the FM first, as one
patch. In that first patch it will be a simple mechanism, similar to the
VM (see the sketch below):
- Each bit of the FM represents a single page
- The bit is set only by vacuum
- The bit is un-set by inserting, updating and deleting
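
The interface I have in mind is roughly the following sketch (function
names and signatures are hypothetical, modeled on the visibility map
API, not actual code):

    /* all hypothetical -- modeled on access/heap/visibilitymap.c */

    /* vacuum sets the bit once every tuple on the page is frozen */
    extern void frozenmap_set(Relation rel, BlockNumber heapBlk, Buffer fmBuf);

    /* any insert/update/delete on the page clears the bit again */
    extern void frozenmap_clear(Relation rel, BlockNumber heapBlk);

    /* anti-wraparound vacuum can skip pages whose bit is already set */
    extern bool frozenmap_test(Relation rel, BlockNumber heapBlk, Buffer *fmBuf);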

Second, I'll deal with a simple read-only table and two states,
Read/Write (default) and ReadOnly, as one patch. ISTM that having
the Frozen state needs more discussion. A read-only table just
allows us to disable any updates to the table, and it is controlled
by a read-only flag in pg_class. The DDL command which changes this
status would be something like ALTER TABLE ... SET READ ONLY or
SET READ WRITE.
Also, as Alvaro suggested, a read-only table affects not only
freezing but also performance optimizations. I'll consider
including them when I deal with the read-only table.

Regards,

---
Sawada Masahiko


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] PATCH: use foreign keys to improve join estimates v1

2015-04-06 Thread Tomas Vondra

Hi,

attached is a first version of a patch that aims to improve cardinality
estimates of joins by matching foreign keys between the tables (which
were ignored by the planner until now).


This significantly improves estimates when joining two tables using 
multi-column conditions, matching a foreign key between the tables.


Consider for example this simple case

CREATE TABLE dim (a int, b int, primary key (a,b));
CREATE TABLE fact (a int, b int, foreign key (a,b) references dim(a,b));

INSERT INTO dim SELECT i,i FROM generate_series(1,1000000) s(i);
INSERT INTO fact SELECT i,i FROM generate_series(1,1000000) s(i);

ANALYZE dim;
ANALYZE fact;

EXPLAIN SELECT * FROM fact f JOIN dim d USING (a,b);

   QUERY PLAN
---
Hash Join  (cost=29425.00..51350.01 rows=1 width=16)
  Hash Cond: ((f.a = d.a) AND (f.b = d.b))
  ->  Seq Scan on fact f (cost=0.00..14425.00 rows=1000000 width=8)
  ->  Hash  (cost=14425.00..14425.00 rows=1000000 width=8)
  ->  Seq Scan on dim d (cost=0.00..14425.00 rows=1000000 width=8)
(5 rows)

which is of course completely off, because the query produces 1M rows.

In practice, underestimates like this often cause much more serious
issues in the subsequent steps - for example the next join would
probably be executed as a nested loop, which makes sense with a single
row but is an awful choice with 1M rows.


With the patch, the planner realizes there is a matching foreign key, 
and tweaks the selectivities in calc_joinrel_size_estimate().


 QUERY PLAN
-
 Hash Join  (cost=29426.25..250323877.62 rows=1000050 width=8)
   Hash Cond: ((fact.a = dim.a) AND (fact.b = dim.b))
   ->  Seq Scan on fact  (cost=0.00..14425.50 rows=1000050 width=8)
   ->  Hash  (cost=14425.50..14425.50 rows=1000050 width=8)
  ->  Seq Scan on dim  (cost=0.00..14425.50 rows=1000050 width=8)
(5 rows)
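
Conceptually the tweak amounts to something like this (a sketch of the
idea only; clauses_match_foreign_key() is a made-up helper, not the
actual patch code):

    /*
     * If the join clauses exactly cover a FK between the two relations,
     * each referencing row matches exactly one referenced row, so the
     * join size is simply the size of the referencing side instead of
     * the product of per-clause selectivities.
     */
    if (clauses_match_foreign_key(root, restrictlist))
        nrows = outer_rows;
    else
        nrows = outer_rows * inner_rows * jselec;   /* current behaviour */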


There are a few unsolved questions/issues:

(1) The current patch only does the trick when the FK matches the
conditions perfectly - when there are no missing columns (present
in the FK, not covered by a condition).

I believe this might be relaxed in both directions. When the
conditions don't cover all the FK columns, we know there's at least
one matching row (and we can estimate the number of matches). In
the other direction, we can estimate just the 'extra' conditions.

(2) Adding further conditions may further break the estimates, for
example after adding "WHERE d.a = d.b" this happens

   QUERY PLAN

 Hash Join  (cost=16987.50..33931.50 rows=25 width=8)
   Hash Cond: (f.a = d.a)
   ->  Seq Scan on fact f  (cost=0.00..16925.00 rows=5000 width=8)
 Filter: (a = b)
   ->  Hash  (cost=16925.00..16925.00 rows=5000 width=8)
 ->  Seq Scan on dim d  (cost=0.00..16925.00 rows=5000 width=8)
   Filter: (a = b)
(7 rows)

One issue is that the "a=b" condition is poorly estimated due to
correlation (which might be improved by multi-variate stats). It also
removes one of the conditions from the join restrict list, so the list
only contains "f.a = d.a", which covers just one of the FK columns,
and so the patch fails to detect the FK :-(

Not sure how to fix this - one way might be performing the check
sooner, before the second join clause is removed (not sure where
that happens). Another option is reconstructing the clauses somehow.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 5450be4c3af455ddae37b8949b293f8c01fc5bdd Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Wed, 1 Apr 2015 22:26:46 +0200
Subject: [PATCH] improve cardinality estimation of joins on foreign keys

When estimating joins with multi-column join clauses matching
FK constraints, the cardinality estimate is incorrect because
the selectivities of the individual clauses are multiplied together.

This may be a significant issue, for example with models involving
junction tables with multi-column primary keys, or other models
using multi-column primary keys.

CREATE TABLE dimension (a INT, b INT, PRIMARY KEY (a,b));

CREATE TABLE fact (a INT, b INT, FOREIGN KEY (a,b)
  REFERENCES dimension (a,b));

INSERT INTO dimension SELECT i,i
FROM generate_series(1,1000) s(i);

INSERT INTO fact SELECT mod(i,1000)+1, mod(i,1000)+1
   FROM generate_series(1,1000000) s(i);

ANALYZE;

EXPLAIN SELECT * FROM fact JOIN dimension USING (a,b);

This should estimate the join cardinality as 1.000.000, but the
actual estimate is 1.000 (because of the multiplication).
The patch fixes this by matching the join clauses against the
foreign key constraints in calc_joinrel_size_estimate()

Re: [HACKERS] Auditing extension for PostgreSQL (Take 2)

2015-04-06 Thread Simon Riggs
On 6 April 2015 at 20:38, David Steele  wrote:

>> The earlier version of pg_audit generated different output.
>> Specifically, it allowed you to generate output for each object
>> tracked; one line per object.

That discussion covers recursive SQL. That is important too, but not
what I am saying.

My point is what we log when an SQL statement covers multiple tables,
e.g. join SELECTs, or inheritance cases, views.

> That is still doable, but is covered by object-level auditing.  Even
> so, multiple log entries are possible (and even likely) with session
> auditing.  See my response to Peter for details.
>
>> The present version can trigger an audit trail event for a
>> statement, without tracking the object that was being audited. This
>> prevents you from searching for "all SQL that touches table X",
>> i.e. we know the statements were generated, but not which ones they
>> were. IMHO that makes the resulting audit trail unusable for
>> auditing purposes. I would like to see that functionality put back
>> before it gets committed, if that occurs.
>
> Bringing this back would be easy (it actually requires removing, not
> adding code) but I'd prefer to make it configurable.

That is my preference also. My concern was raised when it was
*removed* without confirming others agreed.

Typical questions:

Who has written to table X?
Who has read data from table Y yesterday between time1 and time2?
Has anyone accessed a table directly, rather than through a security view?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, RemoteDBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auditing extension for PostgreSQL (Take 2)

2015-04-06 Thread David Steele

On 4/6/15 4:47 PM, Simon Riggs wrote:
> On 6 April 2015 at 16:34, Peter Eisentraut 
> wrote:
>> "Audit" is a "big word".  It might imply regulatory or standards 
>> compliance on some level.  We already have ways to log
>> everything.  If customers want "auditing" instead, they will
>> hopefully have a precise requirements set, and we need a way to
>> map that to a system configuration.  I don't think "we need
>> auditing" -> "oh there's this pg_audit thing, and it has a bunch
>> of settings you can play with" is going to be enough of a
>> workflow.
> 
> Yes, this needs better documentation, as does RLS.

Discussions like these definitely help when it comes to knowing what
to put in the documentation.  The "what" is hard enough, the "why"
gets into some scary territory.

Still, audit requirements vary wildly and I'm not sure how much
justice I could do to the topic in the contrib docs.  I think more
discussion of what's technically possible might be more fruitful.

>> For starters, I would consider the textual server log to be
>> potentially lossy in many circumstances, so there would need to
>> be additional information on how to configure that to be robust.
> 
> It was intended to be used with a log filter plugin, to allow it to
> be routed wherever is considered safe.

That would certainly work.

>> (Also, more accurately, this is an "audit trail", not an "audit".
>> An audit is an examination of a system, not a record of
>> interactions with a system.  An audit trail might be useful for
>> an audit.)
> 
> No problem with calling it pg_audit_trail

Nor I.

>> I see value in what you call object auditing, which is something
>> you can't easily do at the moment.  But what you call session
>> auditing seems hardly distinct from statement logging.  If we
>> enhance log_statements a little bit, there will not be a need for
>> an extra module to do almost the same thing.
> 
> Agreed: generating one line per statement isn't much different
> from log_statements.
> 
> The earlier version of pg_audit generated different output. 
> Specifically, it allowed you to generate output for each object 
> tracked; one line per object.

That is still doable, but is covered by object-level auditing.  Even
so, multiple log entries are possible (and even likely) with session
auditing.  See my response to Peter for details.

> The present version can trigger an audit trail event for a
> statement, without tracking the object that was being audited. This
> prevents you from searching for "all SQL that touches table X",
> i.e. we know the statements were generated, but not which ones they
> were. IMHO that makes the resulting audit trail unusable for
> auditing purposes. I would like to see that functionality put back
> before it gets committed, if that occurs.

Bringing this back would be easy (it actually requires removing, not
adding code) but I'd prefer to make it configurable.

--
David Steele
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-04-06 Thread Fabrízio de Royes Mello
On Mon, Apr 6, 2015 at 10:21 AM, Sawada Masahiko 
wrote:
>
> On Fri, Mar 13, 2015 at 5:10 PM, Kyotaro HORIGUCHI
>  wrote:
> > Hello, I have some trivial comments about the latest patch.
> >
> > At Thu, 12 Mar 2015 21:15:14 +0900, Sawada Masahiko <
sawada.m...@gmail.com> wrote in 
> > sawada.mshk> On Thu, Mar 12, 2015 at 6:36 AM, Jim Nasby <
jim.na...@bluetreble.com> wrote:
> >> >>> >Are the parenthesis necessary? No other WITH option requires
them, other
> >> >>> >than create table/matview (COPY doesn't actually require them).
> >> >>> >
> >> >>
> >> >> I was imagining EXPLAIN syntax.
> >> >> Is there some possibility of supporting multiple options for REINDEX
> >> >> command in future?
> >> >> If there is, syntax will be as follows, REINDEX { INDEX | ... } name
> >> >> WITH VERBOSE, XXX, XXX;
> >> >> I thought style with parenthesis is better than above style.
> >> >
> >> >
> >> > The thing is, ()s are actually an odd-duck. Very little supports it,
and
> >> > while COPY allows it they're not required. EXPLAIN is a different
story,
> >> > because that's not WITH; we're actually using () *instead of* WITH.
> >> >
> >> > So because almost all commands that use WITH don't even accept (),
I don't
> >> > think this should either. It certainly shouldn't require them,
because
> >> > unlike EXPLAIN, there's no need to require them.
> >> >
> >>
> >> I understood what your point is.
> >> Attached patch is changed syntax, it does not have parenthesis.
> >
> > As I looked into the code to find what the syntax would be, I
> > found some points which would be better to be fixed.
> >
> > In gram.y the options is a list of cstring but it is not necessary
> > to be a list because there's only one kind of option now.
> >
> > If you prefer it to be a list, I have a comment for the way to
> > make string list in gram.y. You stored bare cstring in the
> > options list but I think it is not the preferable form. I suppose
> > the followings are preferable. Corresponding fixes are needed in
> > ReindexTable, ReindexIndex, ReindexMultipleTables.
> >
> > $$ = list_make1(makeString($1));
> >  
> > $$ = lappend($1, list_make1(makeString($3)));
> >
> >
> > In equalfuncs.c, _equalReindexStmt forgets to compare the member
> > options. _copyReindexStmt also forgets to copy it. The way you
> > constructed the options list prevents them from doing their jobs
> > using prepared methods. Comparing and copying the member "option"
> > is needed even if it becomes a simple string.
> >
>
> I revised patch, and changed gram.y as I don't use the list.
> So this patch adds new syntax,
> REINDEX { INDEX | ... } name WITH VERBOSE;
>
> Also documentation is updated.
> Please give me feedbacks.
>

Some notes:

1) There is a trailing space in src/backend/parser/gram.y:

-   | REINDEX DATABASE name opt_force
+   | REINDEX reindex_target_multitable name WITH opt_verbose
{
ReindexStmt *n = makeNode(ReindexStmt);
-   n->kind = REINDEX_OBJECT_DATABASE;
+   n->kind = $2;
n->name = $3;
n->relation = NULL;
+   n->verbose = $5;
$$ = (Node *)n;
}
;


2) The documentation was updated and is in accordance with the behaviour.


3) psql autocomplete is ok.


4) Lack of regression tests. I think you should add some regression tests
like this:

fabrizio=# \set VERBOSITY terse
fabrizio=# create table reindex_verbose(id integer primary key);
CREATE TABLE
fabrizio=# reindex table reindex_verbose with verbose;
INFO:  index "reindex_verbose_pkey" was reindexed.
REINDEX


5) Code style and organization is ok


6) You should add the new field ReindexStmt->verbose to
src/backend/nodes/copyfuncs.c


Regards,


--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello


Re: [HACKERS] Auditing extension for PostgreSQL (Take 2)

2015-04-06 Thread David Steele

On 4/6/15 4:34 PM, Peter Eisentraut wrote:
> On 2/14/15 9:34 PM, David Steele wrote:
>> The patch I've attached satisfies the requirements that I've had
>> from customers in the past.
> 
> What I'm missing is a more precise description/documentation of
> what those requirements might be.

Admittedly I'm not a financial or ISO certification auditor, but I've
been in the position of providing data to auditors on many
occasions.  The requests generally fall into three categories:

1) Data requests.  Perhaps all the CDRs for a particular customer for
a particular month.  Bulk data requests are not addressed by pg_audit.

2) DDL log.  A list of all DDL changes made to the database.  For
instance, the last time a function was updated and who did it.  The
auditor would like to be sure that the function update timestamp
matches up with the last maintenance window and the person who is on
record as having done the updates.

3) DML log.  This can be done with triggers, but requires quite a bit
of work and vigilance.

> "Audit" is a "big word".  It might imply regulatory or standards 
> compliance on some level.  We already have ways to log everything.
> If customers want "auditing" instead, they will hopefully have a
> precise requirements set, and we need a way to map that to a
> system configuration.  I don't think "we need auditing" -> "oh
> there's this pg_audit thing, and it has a bunch of settings you can
> play with" is going to be enough of a workflow.  For starters, I
> would consider the textual server log to be potentially lossy in
> many circumstances, so there would need to be additional
> information on how to configure that to be robust.

Nothing is perfect, but there's a big difference between being able to
log everything and being able to use the data you logged to satisfy an
audit.  Auditors tend to be reasonably tech savvy but there are
limits.  An example of how pg_audit can provide better logging is at
the end of this email.

I agree that server logs are potentially lossy but that really
describes anywhere audit records might be written.  Agreed that there
are better ways to do it (like writing back to the DB, or a remote
DB), but I thought those methods should be saved for a future version.

In my past experience having retention policies in place and being
able to show that they normally work are enough to satisfy an auditor.
 Accidents happen and that's understood - as long as an explanation
for the failure is given.  Such as, "We lost a backup tape, here's the
ticket for the incident and the name of the employee who handled the
case so you can follow up."  Or, "On this date we had a disk failure
and lost the logs before the could be shipped, here's the ticket, etc."

> (Also, more accurately, this is an "audit trail", not an "audit".
> An audit is an examination of a system, not a record of
> interactions with a system.  An audit trail might be useful for an
> audit.)

You are correct and I'd be happy to call it pg_audit_trail (as Simon
suggested) or pg_audit_log or something that's more descriptive.

> I see value in what you call object auditing, which is something
> you can't easily do at the moment.  But what you call session
> auditing seems hardly distinct from statement logging.  If we
> enhance log_statements a little bit, there will not be a need for
> an extra module to do almost the same thing.

Even with session auditing you can have multiple log entries per
backend call.  This is particularly true for DO blocks and functions
calls.

Here's a relatively simple example, but it shows how complex this
could get.  Let's say we have a DO block with dynamic SQL:

do $$
declare
table_name text = 'do_table';
begin
execute 'create table ' || table_name || ' ("weird name" int)';
execute 'drop table ' || table_name;
end; $$

Setting log_statement=all will certainly work, but you only get the DO
block logged:

LOG:  statement: do $$
declare
table_name text = 'do_table';
begin
execute 'create table ' || table_name || ' ("weird name" int)';
execute 'drop table ' || table_name;
end; $$

With pg_audit you get (forgive the LFs that email added) much more:

LOG:  AUDIT: SESSION,38,1,FUNCTION,DO,,,"do $$
declare
table_name text = 'do_table';
begin
execute 'create table ' || table_name || ' (""weird name"" int)';
execute 'drop table ' || table_name;
end; $$"
LOG:  AUDIT: SESSION,38,2,DDL,CREATE
TABLE,TABLE,public.do_table,"CREATE  TABLE  public.do_table (""weird
name"" pg_catalog.int4   )  WITH (oids=OFF)  "
LOG:  AUDIT: SESSION,38,3,DDL,DROP TABLE,TABLE,public.do_table,drop
table do_table

Not only is the DO block logged but each sub statement is logged as
well.  They are logically grouped by the statement ID (in this case
38) so it's clear they were run as a single command.  The commands
(DO, DROP TABLE, CREATE TABLE) and fully-qualified object names are
provided, and the statements are quoted and escaped when needed.

Re: [HACKERS] Freeze avoidance of very large table.

2015-04-06 Thread Jim Nasby

On 4/6/15 5:18 PM, Greg Stark wrote:

Only I would suggest thinking of it in terms of two orthogonal boolean
flags rather than three states. It's easier to reason about whether a
table has a specific property than trying to control a state machine in
a predefined pathway.

So I would say the two flags are:
READONLY: guarantees nothing can be dirtied
ALLFROZEN: guarantees no unfrozen tuples are present

In practice you can't have the later without the former since vacuum
can't know everything is frozen unless it knows nobody is inserting. But
perhaps there will be cases in the future where that's not true.


I'm not so sure about that. There's a logical state progression here 
(see below). ISTM it's easier to just enforce that in one place instead 
of a bunch of places having to check multiple conditions. But, I'm not 
wed to a single field.



Incidentally there are number of other optimisations tat over had in
mind that are only possible on frozen read-only tables:

1) Compression: compress the pages and pack them one after the other.
Build a new fork with offsets for each page.

2) Automatic partition elimination where the statistics track the
minimum and maximum value per partition (and number of tuples) and treat
then as implicit constraints. In particular it would magically make read
only empty parent partitions be excluded regardless of the where clause.


AFAICT neither of those actually requires ALLFROZEN, no? You'll need to 
uncompact and re-compact for #1 when you actually freeze (which maybe 
isn't worth it), but freezing isn't absolutely required. #2 would only 
require that everything in the relation is visible; not frozen.


I think there's value here to having an ALLVISIBLE state as well as 
ALLFROZEN.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freeze avoidance of very large table.

2015-04-06 Thread Greg Stark
On 6 Apr 2015 09:17, "Jim Nasby"  wrote:
>
>
> No. You would be free to set a table as ReadOnly any time you wanted,
without scanning anything. All that setting does is disable any DML on the
table.
>
> The Frozen state would only be set by the vacuum code, IFF:
> - The table state is ReadOnly *at the start of vacuum* and did not change
during vacuum
> - Vacuum ensured that there were no un-frozen tuples in the table
>
> That does not necessitate 2 scans.

This is exactly what I would suggest.

Only I would suggest thinking of it in terms of two orthogonal boolean
flags rather than three states. It's easier to reason about whether a table
has a specific property than trying to control a state machine in a
predefined pathway.

So I would say the two flags are:
READONLY: guarantees nothing can be dirtied
ALLFROZEN: guarantees no unfrozen tuples are present

In practice you can't have the later without the former since vacuum can't
know everything is frozen unless it knows nobody is inserting. But perhaps
there will be cases in the future where that's not true.

Incidentally there are number of other optimisations tat over had in mind
that are only possible on frozen read-only tables:

1) Compression: compress the pages and pack them one after the other. Build
a new fork with offsets for each page.

2) Automatic partition elimination where the statistics track the minimum
and maximum value per partition (and number of tuples) and treat then as
implicit constraints. In particular it would magically make read only empty
parent partitions be excluded regardless of the where clause.


Re: [HACKERS] BRIN range operator class

2015-04-06 Thread Alvaro Herrera
Thanks for the updated patch; I will look at it as soon as time allows.  (Not
really all that soon, regrettably.)

Judging from a quick look, I think patches 1 and 5 can be committed
quickly; they imply no changes to other parts of BRIN.  (Not sure why 1
and 5 are separate.  Any reason for this?)  Also patch 2.

Patch 4 looks like a simple bugfix (or maybe a generalization) of BRIN
framework code; should also be committable right away.  Needs a closer
look of course.

Patch 3 is a problem.  That code is there because the union proc is only
used in a corner case in Minmax, so if we remove it, user-written Union
procs are very likely to remain buggy for long.  If you have a better
idea to test Union in Minmax, or some other way to turn that stuff off
for the range stuff, I'm all ears.  Let's just make sure the support
procs are tested to avoid stupid bugs.  Before I introduced that, my
Minmax Union proc was all wrong.

Patch 7 I don't understand.  Will have to look closer.  Are you saying
Minmax will depend on Btree opclasses?  I remember thinking of doing it
that way at some point, but wasn't convinced for some reason.

Patch 6 seems the real meat of your own stuff.  I think there should be
a patch 8 also but it's not attached ... ??

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auditing extension for PostgreSQL (Take 2)

2015-04-06 Thread Alvaro Herrera
Simon Riggs wrote:

> The present version can trigger an audit trail event for a statement,
> without tracking the object that was being audited. This prevents you
> from searching for "all SQL that touches table X", i.e. we know the
> statements were generated, but not which ones they were. IMHO that
> makes the resulting audit trail unusable for auditing purposes. I
> would like to see that functionality put back before it gets
> committed, if that occurs.

Is there a consensus that the current version is the one that we should
be reviewing, rather than the one Abhijit submitted?  Last I checked,
that wasn't at all clear.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auditing extension for PostgreSQL (Take 2)

2015-04-06 Thread Simon Riggs
On 6 April 2015 at 16:34, Peter Eisentraut  wrote:
> On 2/14/15 9:34 PM, David Steele wrote:
>> The patch I've attached satisfies the requirements that I've had from
>> customers in the past.
>
> What I'm missing is a more precise description/documentation of what
> those requirements might be.
>
> "Audit" is a "big word".  It might imply regulatory or standards
> compliance on some level.  We already have ways to log everything.  If
> customers want "auditing" instead, they will hopefully have a precise
> requirements set, and we need a way to map that to a system
> configuration.  I don't think "we need auditing" -> "oh there's this
> pg_audit thing, and it has a bunch of settings you can play with" is
> going to be enough of a workflow.

Yes, this needs better documentation, as does RLS.

> For starters, I would consider the
> textual server log to be potentially lossy in many circumstances, so
> there would need to be additional information on how to configure that
> to be robust.

It was intended to be used with a log filter plugin, to allow it to be
routed wherever is considered safe.

> (Also, more accurately, this is an "audit trail", not an "audit".  An
> audit is an examination of a system, not a record of interactions with a
> system.  An audit trail might be useful for an audit.)

No problem with calling it pg_audit_trail

> I see value in what you call object auditing, which is something you
> can't easily do at the moment.  But what you call session auditing seems
> hardly distinct from statement logging.  If we enhance log_statements a
> little bit, there will not be a need for an extra module to do almost
> the same thing.

Agreed: generating one line per statement isn't much different from
log_statements.

The earlier version of pg_audit generated different output.
Specifically, it allowed you to generate output for each object
tracked; one line per object.

The present version can trigger an audit trail event for a statement,
without tracking the object that was being audited. This prevents you
from searching for "all SQL that touches table X", i.e. we know the
statements were generated, but not which ones they were. IMHO that
makes the resulting audit trail unusable for auditing purposes. I
would like to see that functionality put back before it gets
committed, if that occurs.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, RemoteDBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auditing extension for PostgreSQL (Take 2)

2015-04-06 Thread Peter Eisentraut
On 2/14/15 9:34 PM, David Steele wrote:
> The patch I've attached satisfies the requirements that I've had from
> customers in the past.

What I'm missing is a more precise description/documentation of what
those requirements might be.

"Audit" is a "big word".  It might imply regulatory or standards
compliance on some level.  We already have ways to log everything.  If
customers want "auditing" instead, they will hopefully have a precise
requirements set, and we need a way to map that to a system
configuration.  I don't think "we need auditing" -> "oh there's this
pg_audit thing, and it has a bunch of settings you can play with" is
going to be enough of a workflow.  For starters, I would consider the
textual server log to be potentially lossy in many circumstances, so
there would need to be additional information on how to configure that
to be robust.

(Also, more accurately, this is an "audit trail", not an "audit".  An
audit is an examination of a system, not a record of interactions with a
system.  An audit trail might be useful for an audit.)

I see value in what you call object auditing, which is something you
can't easily do at the moment.  But what you call session auditing seems
hardly distinct from statement logging.  If we enhance log_statements a
little bit, there will not be a need for an extra module to do almost
the same thing.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freeze avoidance of very large table.

2015-04-06 Thread Jim Nasby

On 4/6/15 1:28 PM, Josh Berkus wrote:

On 04/06/2015 10:07 AM, Jim Nasby wrote:

Actually, I would start simply with ReadOnly and ReadWrite.

As I understand it, the goal here is to prevent huge amounts of periodic
freeze work due to XID wraparound. I don't think we need the Freeze
state to accomplish that.

With a single bit per page in the Frozen Map, checking a 800GB table
would require reading a mere 100MB of FM. That's pretty tiny, and
largely accomplishes the goal.

Obviously it would be nice to eliminate even that 100MB read, but I
suggest you leave that for a 3rd patch. I think you'll find that just
getting the first 2 accomplished will be a significant amount of work.

Also, note that you don't really even need the ReadOnly patch. As long
as you're not actually touching the table at all the FM will eventually
read as everything is frozen; that gets you 80% of the way there. So I'd
suggest starting with the FM, then doing ReadOnly, and only then
attempting to add the Frozen state.


+1

There was some reason why we didn't have  Freeze Map before, though;
IIRC these were the problems:

1. would need to make sure it gets sync'd to disk and/or WAL-logged


Same as VM.


2. every time a page is modified, the map would need to get updated


Not every time, just the first time, and only if the FM bit for the page
was set. It would only be set by vacuum, just like the VM.



3. Yet Another Relation File (not inconsequential for the cases we're
discussing).


Sure, which is why I think it might be interesting to either allow for 
more than one page per bit, or perhaps some form of compression. That 
said, I don't think it's worth worrying about too much because it's 
still a 64,000-1 ratio with 8k pages. If you use 32k pages it becomes 
256,000-1, or 4GB of FM for 1PB of heap.



Also, given that the Visibility Map necessarily needs to have the
superset of the Frozen Map, maybe combining them in some way would make
sense.


The thing is, I think in many workloads the patterns here will actually 
be radically different, in that it's way easier to get a page to be 
all-visible than it is to freeze it.


Perhaps there's something we can do here when we look at other ways to 
reduce space usage for FM (and maybe VM too), but I don't think now is 
the time to put effort into this.



I agree with Jim that if we have a trustworthy Frozen Map, having a
ReadOnly flag is of marginal value, unless such a ReadOnly flag allowed
us to skip updating the individual row XIDs entirely.  I can think of
some ways to do that, but they have severe tradeoffs.


Aside from Alvaro's points, I think many users would find it useful as 
an easy way to ensure no one is writing to a table, which could be 
valuable for any number of reasons. As long as the patch isn't too 
complicated I don't see a reason not to do it.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freeze avoidance of very large table.

2015-04-06 Thread Josh Berkus
On 04/06/2015 11:35 AM, Alvaro Herrera wrote:
> Josh Berkus wrote:
> 
>> I agree with Jim that if we have a trustworthy Frozen Map, having a
>> ReadOnly flag is of marginal value, unless such a ReadOnly flag allowed
>> us to skip updating the individual row XIDs entirely.  I can think of
>> some ways to do that, but they have severe tradeoffs.
> 
> If you're thinking that the READ ONLY flag is only useful for freezing,
> then yeah maybe it's of marginal value.  But in the foreign key
> constraint area, consider that you could have tables with
> frequently-referenced PKs marked as READ ONLY -- then you don't need to
> acquire row locks when inserting/updating rows in the referencing
> tables.  That might give you a good performance benefit that's not in
> any way related to freezing, as well as reducing your multixact
> consumption rate.

Hmmm.  Yeah, that would make it worthwhile, although it would be a
fairly obscure bit of performance optimization for anyone not on this
list ;-)

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freeze avoidance of very large table.

2015-04-06 Thread Alvaro Herrera
Josh Berkus wrote:

> I agree with Jim that if we have a trustworthy Frozen Map, having a
> ReadOnly flag is of marginal value, unless such a ReadOnly flag allowed
> us to skip updating the individual row XIDs entirely.  I can think of
> some ways to do that, but they have severe tradeoffs.

If you're thinking that the READ ONLY flag is only useful for freezing,
then yeah maybe it's of marginal value.  But in the foreign key
constraint area, consider that you could have tables with
frequently-referenced PKs marked as READ ONLY -- then you don't need to
acquire row locks when inserting/updating rows in the referencing
tables.  That might give you a good performance benefit that's not in
any way related to freezing, as well as reducing your multixact
consumption rate.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freeze avoidance of very large table.

2015-04-06 Thread Josh Berkus
On 04/06/2015 10:07 AM, Jim Nasby wrote:
> Actually, I would start simply with ReadOnly and ReadWrite.
> 
> As I understand it, the goal here is to prevent huge amounts of periodic
> freeze work due to XID wraparound. I don't think we need the Freeze
> state to accomplish that.
> 
> With a single bit per page in the Frozen Map, checking a 800GB table
> would require reading a mere 100MB of FM. That's pretty tiny, and
> largely accomplishes the goal.
> 
> Obviously it would be nice to eliminate even that 100MB read, but I
> suggest you leave that for a 3rd patch. I think you'll find that just
> getting the first 2 accomplished will be a significant amount of work.
> 
> Also, note that you don't really even need the ReadOnly patch. As long
> as you're not actually touching the table at all the FM will eventually
> read as everything is frozen; that gets you 80% of the way there. So I'd
> suggest starting with the FM, then doing ReadOnly, and only then
> attempting to add the Frozen state.

+1

There was some reason why we didn't have a Freeze Map before, though;
IIRC these were the problems:

1. would need to make sure it gets sync'd to disk and/or WAL-logged

2. every time a page is modified, the map would need to get updated

3. Yet Another Relation File (not inconsequential for the cases we're
discussing).

Also, given that the Visibility Map necessarily needs to have the
superset of the Frozen Map, maybe combining them in some way would make
sense.

I agree with Jim that if we have a trustworthy Frozen Map, having a
ReadOnly flag is of marginal value, unless such a ReadOnly flag allowed
us to skip updating the individual row XIDs entirely.  I can think of
some ways to do that, but they have severe tradeoffs.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing PG_VERSION_NUM in pg_config

2015-04-06 Thread Jim Nasby

On 4/1/15 1:25 AM, Andrew Gierth wrote:

"Michael" == Michael Paquier  writes:


  Michael> For an extension that has a single branch compatible with a
  Michael> set of multiple major versions of Postgres, the cases are
  Michael> custom values for REGRESS_OPTS and REGRESS depending on the
  Michael> backend version. I also manipulate on a daily basis the same
  Michael> set of scripts across many platforms (on Windows as well using
  Michael> msysgit, and MSVC installation does not have pgxs stuff), so I
  Michael> would use it to simplify them. It is true that you can already
  Michael> do that by parsing the output of "pg_config --version",

What _exactly_ would you be doing that you could not already do better
with $(MAJORVERSION) which is already defined in Makefile.global?


FWIW, I'm currently using this as a poor-man's version of configure, to 
deal with some minor changes to function parameters between versions.
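
The pattern is basically compile-time branching on the version number,
e.g. (a sketch; the helper and the 9.5 cutoff are made up for
illustration):

    #include "postgres.h"   /* pg_config.h defines PG_VERSION_NUM, e.g. 90402 */

    /* hypothetical helper whose answer differs by server version */
    static bool
    have_new_api(void)
    {
    #if PG_VERSION_NUM >= 90500
        return true;
    #else
        return false;
    #endif
    }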


I probably would also need to screw around with regression tests but 
I've given up on pretty much the whole idea of our regression 
methodology and use pgTap instead. I do still run it using the built-in 
framework, because of all the other stuff I get. If I had some way to 
just override the notion of "diff the output", I'd do that.


If I wasn't using pgTap I'm pretty sure I'd be having fun and games with 
cross-version output tweaking.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump / copy bugs with "big lines" ?

2015-04-06 Thread Jim Nasby

On 3/31/15 3:46 AM, Ronan Dunklau wrote:

>StringInfo uses int's to store length, so it could possibly be changed,
>but then you'd just error out due to MaxAllocSize.
>
>Now perhaps those could both be relaxed, but certainly not to the extent
>that you can shove an entire 1.6TB row into an output buffer.

Another way to look at it would be to work in small chunks. For the first test
case (rows bigger than 1GB), maybe the copy command could be rewritten to work
in chunks, flushing the output more often if needed.


Possibly; I'm not sure how well the FE/BE protocol or code would 
actually support that.



>The other issue is that there's a LOT of places in code that blindly
>copy detoasted data around, so while we technically support 1GB toasted
>values you're probably going to be quite unhappy with performance. I'm
>actually surprised you haven't already seen this with 500MB objects.
>
>So long story short, I'm not sure how worthwhile it would be to try and
>fix this. We probably should improve the docs though.
>

I think that having data that can't be output by pg_dump is quite surprising,
and if this is not fixable, I agree that it should clearly be documented.


>Have you looked at using large objects for what you're doing? (Note that
>those have their own set of challenges and limitations.)

Yes I do. This particular customer of ours did not mind the performance
penalty of using bytea objects as long as it was convenient to use.


What do they do when they hit 1GB? Presumably if they're this close to 
the limit they're already hitting 1GB, no? Or is this mostly hypothetical?



>

> >We also hit a second issue, this time related to bytea encoding.

>
>There's probably several other places this type of thing could be a
>problem. I'm thinking of conversions in particular.

Yes, that's what the two other test cases I mentioned are about: any conversion
leading to a size greater than 1GB results in an error, even implicit
conversions like doubling backslashes in the output.


I think the big issue with encoding is going to be the risk of changing 
encoding and ending up with something too large to fit back into 
storage. They might need to consider using something like bytea(990MB).


In any case, I don't think it would be terribly difficult to allow a bit 
more than 1GB in a StringInfo. Might need to tweak palloc too; ISTR 
there are some 1GB limits there as well.
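
As a tiny illustration of the escaping issue above (a made-up two-byte
value, not one of the reported test cases), each stored backslash byte
doubles in the escaped text form, which is how a value well under 1GB on
disk can overflow the 1GB output buffer during COPY:

-- two stored backslash bytes become four characters once escape-encoded
SELECT octet_length('\x5c5c'::bytea)             AS stored_bytes,   -- 2
       length(encode('\x5c5c'::bytea, 'escape')) AS escaped_chars;  -- 4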

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freeze avoidance of very large table.

2015-04-06 Thread Jim Nasby

On 4/6/15 12:29 PM, k...@rice.edu wrote:

On Mon, Apr 06, 2015 at 12:07:47PM -0500, Jim Nasby wrote:

...
As I understand it, the goal here is to prevent huge amounts of
periodic freeze work due to XID wraparound. I don't think we need
the Freeze state to accomplish that.

With a single bit per page in the Frozen Map, checking an 800GB table
would require reading a mere 100MB of FM. That's pretty tiny, and
largely accomplishes the goal.

Obviously it would be nice to eliminate even that 100MB read, but I
suggest you leave that for a 3rd patch. I think you'll find that
just getting the first 2 accomplished will be a significant amount
of work.



Hi,
I may have my math wrong, but 800GB ~ 100M pages or 12.5MB and not
100MB.


Doh! 8 bits per byte and all that...
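
For reference, the corrected arithmetic (assuming the default 8KB block
size) can be double-checked with a quick query:

SELECT 800::bigint * 1024 * 1024 * 1024 / 8192     AS heap_pages,  -- 104,857,600 pages
       800::bigint * 1024 * 1024 * 1024 / 8192 / 8 AS map_bytes;   -- 13,107,200 bytes, ~12.5MB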
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freeze avoidance of very large table.

2015-04-06 Thread k...@rice.edu
On Mon, Apr 06, 2015 at 12:07:47PM -0500, Jim Nasby wrote:
> ...
> As I understand it, the goal here is to prevent huge amounts of
> periodic freeze work due to XID wraparound. I don't think we need
> the Freeze state to accomplish that.
> 
> With a single bit per page in the Frozen Map, checking a 800GB table
> would require reading a mere 100MB of FM. That's pretty tiny, and
> largely accomplishes the goal.
> 
> Obviously it would be nice to eliminate even that 100MB read, but I
> suggest you leave that for a 3rd patch. I think you'll find that
> just getting the first 2 accomplished will be a significant amount
> of work.
> 

Hi,
I may have my math wrong, but 800GB ~ 100M pages or 12.5MB and not
100MB.

Regards,
Ken


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freeze avoidance of very large table.

2015-04-06 Thread Jim Nasby

On 4/6/15 11:12 AM, Sawada Masahiko wrote:

On Mon, Apr 6, 2015 at 10:17 PM, Jim Nasby  wrote:

On 4/6/15 1:46 AM, Sawada Masahiko wrote:


On Sun, Apr 5, 2015 at 8:21 AM, Jeff Janes  wrote:


On Sat, Apr 4, 2015 at 3:10 PM, Jim Nasby 
wrote:



On 4/3/15 12:59 AM, Sawada Masahiko wrote:



+   case HEAPTUPLE_LIVE:
+   case HEAPTUPLE_RECENTLY_DEAD:
+   case HEAPTUPLE_INSERT_IN_PROGRESS:
+   case HEAPTUPLE_DELETE_IN_PROGRESS:
+   if
(heap_prepare_freeze_tuple(tuple.t_data, freezelimit,
+
mxactcutoff, &frozen[nfrozen]))
+
frozen[nfrozen++].offset
= offnum;
+   break;




This doesn't seem safe enough to me. Can't there be tuples that are
still
new enough that they can't be frozen, and are still live?




Yep.  I've set a table to read only while it contained unfreezable
tuples,
and the tuples remain unfrozen yet the read-only action claims to have
succeeded.




Somewhat related... instead of forcing the freeze to happen
synchronously,
can't we set this up so a table is in one of three states? Read/Write,
Read
Only, Frozen. AT_SetReadOnly and AT_SetReadWrite would simply change to
the
appropriate state, and all the vacuum infrastructure would continue to
process those tables as it does today. lazy_vacuum_rel would become
responsible for tracking if there were any non-frozen tuples if it was
also
attempting a freeze. If it discovered there were none, AND the table was
marked as ReadOnly, then it would change the table state to Frozen and
set
relfrozenxid = InvalidTransactionId and relminxid = InvalidMultiXactId.
AT_SetReadWrite could change relfrozenxid to it's own Xid as an
optimization. Doing it that way leaves all the complicated vacuum code
in
one place, and would eliminate concerns about race conditions with still
running transactions, etc.




+1 here as well.  I might want to set tables to read only for reasons
other
than to avoid repeated freezing.  (After all, the name of the command
suggests it is a general purpose thing) and wouldn't want to
automatically
trigger a vacuum freeze in the process.



Thank you for comments.


Somewhat related... instead of forcing the freeze to happen
synchronously, can't we set this up so a table is in one of three states?
Read/Write, Read Only, Frozen. AT_SetReadOnly and AT_SetReadWrite would
simply change to the appropriate state, and all the vacuum infrastructure
would continue to process those tables as it does today. lazy_vacuum_rel
would become responsible for tracking if there were any non-frozen tuples if
it was also attempting a freeze. If it discovered there were none, AND the
table was marked as ReadOnly, then it would change the table state to Frozen
and set relfrozenxid = InvalidTransactionId and relminxid =
InvalidMultiXactId. AT_SetReadWrite could change relfrozenxid to it's own
Xid as an optimization. Doing it that way leaves all the complicated vacuum
code in one place, and would eliminate concerns about race conditions with
still running transactions, etc.



I agree with 3 status, Read/Write, ReadOnly and Frozen.
But I'm not sure when we should do to freeze tuples, e.g., scan whole
tables.
I think that the any changes to table are completely
ignored/restricted if table is marked as ReadOnly table,
and it's accompanied by freezing tuples, just mark as ReadOnly.
Frozen table ensures that all tuples of its table completely has been
frozen, so it also needs to scan whole table as well.
e.g., we should need to scan whole table at two times. right?



No. You would be free to set a table as ReadOnly any time you wanted,
without scanning anything. All that setting does is disable any DML on the
table.

The Frozen state would only be set by the vacuum code, IFF:
- The table state is ReadOnly *at the start of vacuum* and did not change
during vacuum
- Vacuum ensured that there were no un-frozen tuples in the table

That does not necessitate 2 scans.



I understood this concept, and have a question as I wrote below.


+1 here as well.  I might want to set tables to read only for reasons
other than to avoid repeated freezing.  (After all, the name of the command
suggests it is a general purpose thing) and wouldn't want to automatically
trigger a
vacuum freeze in the process.

There is another possibility here, too. We can completely divorce a
ReadOnly mode (which I think is useful for other things besides freezing)
from the question of whether we need to force-freeze a relation if we create
a
FrozenMap, similar to the visibility map. This has the added advantage of
helping freeze scans on relations that are not ReadOnly in the case of
tables that are insert-mostly or any other pattern where most pages stay
all-frozen.
Prior to the visibility map this would have been a rather daunting
project, but I believe this could piggyback on the VM code rather nicely.
Anytime you clea

Re: [HACKERS] Auditing extension for PostgreSQL (Take 2)

2015-04-06 Thread David Steele
On 4/6/15 8:40 AM, Sawada Masahiko wrote:
> On Fri, Apr 3, 2015 at 10:01 PM, David Steele  wrote:
>> On 4/3/15 3:59 AM, Sawada Masahiko wrote:
>>> On Thu, Apr 2, 2015 at 2:46 AM, David Steele  wrote:
 Let me know if you see any other issues.

>>>
>>> I pulled HEAD, and then tried to compile source code after applied
>>> following "deparsing utility command patch" without #0001 and #0002.
>>> (because these two patches are already pushed)
>>> 
>>>
>>> But I could not use new pg_audit patch due to compile error (that
>>> deparsing utility command patch might have).
>>> I think I'm missing something, but I'm not sure.
>>> Could you tell me which patch should I apply before compiling pg_audit?
>>
>> When Alvaro posted his last patch set he recommended applying them to
>> b3196e65:
>>
>> http://www.postgresql.org/message-id/20150325175954.gl3...@alvh.no-ip.org
>>
>> Applying the 3/25 deparse patches onto b3196e65 (you'll see one trailing
>> space error) then applying pg_audit-v6.patch works fine.
>>
> 
> I tested v6 patch, but I got SEGV when I executed test() function I
> mentioned up thread.
> Could you address this problem?

Unfortunately I'm not able to reproduce the issue.  Here's what I tried
based on your original test:

postgres=# create table hoge (id int);
CREATE TABLE
postgres=# create function test() returns int as $$
postgres$# declare
postgres$# cur1 cursor for select * from hoge;
postgres$# tmp int;
postgres$# begin
postgres$# open cur1;
postgres$# fetch cur1 into tmp;
postgres$#return tmp;
postgres$# end$$
postgres-# language plpgsql ;
CREATE FUNCTION
postgres=# select test();
 test
--

(1 row)
postgres=# insert into hoge values (1);
INSERT 0 1
postgres=# select test();
 test
--
1
(1 row)

And the log output was:

prefix LOG:  statement: create table hoge (id int);
prefix DEBUG:  EventTriggerInvoke 16385
prefix LOG:  AUDIT: SESSION,3,1,DDL,CREATE
TABLE,TABLE,public.hoge,CREATE  TABLE  public.hoge (id pg_catalog.int4
 )  WITH (oids=OFF)
prefix LOG:  statement: create function test() returns int as $$
declare
cur1 cursor for select * from hoge;
tmp int;
begin
open cur1;
fetch cur1 into tmp;
   return tmp;
end$$
language plpgsql ;
prefix DEBUG:  EventTriggerInvoke 16385
prefix LOG:  AUDIT: SESSION,4,1,DDL,CREATE
FUNCTION,FUNCTION,public.test(),"CREATE  FUNCTION public.test() RETURNS
 pg_catalog.int4 LANGUAGE plpgsql  VOLATILE  CALLED ON NULL INPUT
SECURITY INVOKER COST 100   AS '
declare
cur1 cursor for select * from hoge;
tmp int;
begin
open cur1;
fetch cur1 into tmp;
   return tmp;
end'"
prefix LOG:  statement: select test();
prefix LOG:  AUDIT: SESSION,5,1,READ,SELECT,,,select test();
prefix LOG:  AUDIT:
SESSION,5,2,FUNCTION,EXECUTE,FUNCTION,public.test,select test();
prefix LOG:  AUDIT: SESSION,5,3,READ,SELECT,,,select * from hoge
prefix CONTEXT:  PL/pgSQL function test() line 6 at OPEN
prefix LOG:  statement: insert into hoge values (1);
prefix LOG:  AUDIT: SESSION,6,1,WRITE,INSERT,,,insert into hoge values (1);
prefix LOG:  statement: select test();
prefix LOG:  AUDIT: SESSION,7,1,READ,SELECT,,,select test();
prefix LOG:  AUDIT:
SESSION,7,2,FUNCTION,EXECUTE,FUNCTION,public.test,select test();
prefix LOG:  AUDIT: SESSION,7,3,READ,SELECT,,,select * from hoge
prefix CONTEXT:  PL/pgSQL function test() line 6 at OPEN

Does the above look like what you did?  Any other information about your
environment would also be helpful.  I'm thinking it might be some kind
of build issue.

-- 
- David Steele
da...@pgmasters.net



signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] Freeze avoidance of very large table.

2015-04-06 Thread Sawada Masahiko
On Mon, Apr 6, 2015 at 10:17 PM, Jim Nasby  wrote:
> On 4/6/15 1:46 AM, Sawada Masahiko wrote:
>>
>> On Sun, Apr 5, 2015 at 8:21 AM, Jeff Janes  wrote:
>>>
>>> On Sat, Apr 4, 2015 at 3:10 PM, Jim Nasby 
>>> wrote:


 On 4/3/15 12:59 AM, Sawada Masahiko wrote:
>
>
> +   case HEAPTUPLE_LIVE:
> +   case HEAPTUPLE_RECENTLY_DEAD:
> +   case HEAPTUPLE_INSERT_IN_PROGRESS:
> +   case HEAPTUPLE_DELETE_IN_PROGRESS:
> +   if
> (heap_prepare_freeze_tuple(tuple.t_data, freezelimit,
> +
> mxactcutoff, &frozen[nfrozen]))
> +
> frozen[nfrozen++].offset
> = offnum;
> +   break;



 This doesn't seem safe enough to me. Can't there be tuples that are
 still
 new enough that they can't be frozen, and are still live?
>>>
>>>
>>>
>>> Yep.  I've set a table to read only while it contained unfreezable
>>> tuples,
>>> and the tuples remain unfrozen yet the read-only action claims to have
>>> succeeded.
>>>
>>>

 Somewhat related... instead of forcing the freeze to happen
 synchronously,
 can't we set this up so a table is in one of three states? Read/Write,
 Read
 Only, Frozen. AT_SetReadOnly and AT_SetReadWrite would simply change to
 the
 appropriate state, and all the vacuum infrastructure would continue to
 process those tables as it does today. lazy_vacuum_rel would become
 responsible for tracking if there were any non-frozen tuples if it was
 also
 attempting a freeze. If it discovered there were none, AND the table was
 marked as ReadOnly, then it would change the table state to Frozen and
 set
 relfrozenxid = InvalidTransactionId and relminxid = InvalidMultiXactId.
 AT_SetReadWrite could change relfrozenxid to it's own Xid as an
 optimization. Doing it that way leaves all the complicated vacuum code
 in
 one place, and would eliminate concerns about race conditions with still
 running transactions, etc.
>>>
>>>
>>>
>>> +1 here as well.  I might want to set tables to read only for reasons
>>> other
>>> than to avoid repeated freezing.  (After all, the name of the command
>>> suggests it is a general purpose thing) and wouldn't want to
>>> automatically
>>> trigger a vacuum freeze in the process.
>>>
>>
>> Thank you for comments.
>>
>>> Somewhat related... instead of forcing the freeze to happen
>>> synchronously, can't we set this up so a table is in one of three states?
>>> Read/Write, Read Only, Frozen. AT_SetReadOnly and AT_SetReadWrite would
>>> simply change to the appropriate state, and all the vacuum infrastructure
>>> would continue to process those tables as it does today. lazy_vacuum_rel
>>> would become responsible for tracking if there were any non-frozen tuples if
>>> it was also attempting a freeze. If it discovered there were none, AND the
>>> table was marked as ReadOnly, then it would change the table state to Frozen
>>> and set relfrozenxid = InvalidTransactionId and relminxid =
>>> InvalidMultiXactId. AT_SetReadWrite could change relfrozenxid to it's own
>>> Xid as an optimization. Doing it that way leaves all the complicated vacuum
>>> code in one place, and would eliminate concerns about race conditions with
>>> still running transactions, etc.
>>
>>
>> I agree with 3 status, Read/Write, ReadOnly and Frozen.
>> But I'm not sure when we should do to freeze tuples, e.g., scan whole
>> tables.
>> I think that the any changes to table are completely
>> ignored/restricted if table is marked as ReadOnly table,
>> and it's accompanied by freezing tuples, just mark as ReadOnly.
>> Frozen table ensures that all tuples of its table completely has been
>> frozen, so it also needs to scan whole table as well.
>> e.g., we should need to scan whole table at two times. right?
>
>
> No. You would be free to set a table as ReadOnly any time you wanted,
> without scanning anything. All that setting does is disable any DML on the
> table.
>
> The Frozen state would only be set by the vacuum code, IFF:
> - The table state is ReadOnly *at the start of vacuum* and did not change
> during vacuum
> - Vacuum ensured that there were no un-frozen tuples in the table
>
> That does not necessitate 2 scans.
>

I understood this concept, and have a question as I wrote below.

>>> +1 here as well.  I might want to set tables to read only for reasons
>>> other than to avoid repeated freezing.  (After all, the name of the command
>>> suggests it is a general purpose thing) and wouldn't want to automatically
>>> trigger a
>>> vacuum freeze in the process.
>>>
>>> There is another possibility here, too. We can completely divorce a
>>> ReadOnly mode (which I think is useful for other things besides freezing)
>>> from the question of whether we need to force-freeze a relation if we crea

Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-04-06 Thread Sawada Masahiko
On Fri, Mar 13, 2015 at 5:10 PM, Kyotaro HORIGUCHI
 wrote:
> Hello, I have some trivial comments about the latest patch.
>
> At Thu, 12 Mar 2015 21:15:14 +0900, Sawada Masahiko  
> wrote in 
> sawada.mshk> On Thu, Mar 12, 2015 at 6:36 AM, Jim Nasby 
>  wrote:
>> >>> >Are the parenthesis necessary? No other WITH option requires them, other
>> >>> >than create table/matview (COPY doesn't actually require them).
>> >>> >
>> >>
>> >> I was imagining EXPLAIN syntax.
>> >> Is there some possibility of supporting multiple options for REINDEX
>> >> command in future?
>> >> If there is, syntax will be as follows, REINDEX { INDEX | ... } name
>> >> WITH VERBOSE, XXX, XXX;
>> >> I thought style with parenthesis is better than above style.
>> >
>> >
>> > The thing is, ()s are actually an odd-duck. Very little supports it, and
>> > while COPY allows it they're not required. EXPLAIN is a different story,
>> > because that's not WITH; we're actually using () *instead of* WITH.
>> >
>> > So because almost all commands that use WITH doen't even accept (), I don't
>> > think this should either. It certainly shouldn't require them, because
>> > unlike EXPLAIN, there's no need to require them.
>> >
>>
>> I understood what your point is.
>> Attached patch is changed syntax, it does not have parenthesis.
>
> As I looked into the code to find what the syntax would be, I
> found some points which would be better to be fixed.
>
> In gram.y the options is a list of cstring but it is not necessary
> to be a list because there's only one kind of option now.
>
> If you prefer it to be a list, I have a comment for the way to
> make string list in gram.y. You stored bare cstring in the
> options list but I think it is not the preferable form. I suppose
> the followings are preferable. Corresponding fixes are needed in
> ReindexTable, ReindexIndex, ReindexMultipleTables.
>
> $$ = list_make1(makeString($1));
>  
> $$ = lappend($1, makeString($3));
>
>
> In equalfuncs.c, _equalReindexStmt forgets to compare the member
> options. _copyReindexStmt also forgets to copy it. The way you
> constructed the options list prevents them from doing their jobs
> using prepared methods. Comparing and copying the member "option"
> is needed even if it becomes a simple string.
>

I revised the patch and changed gram.y so it no longer uses a list.
So this patch adds the new syntax:
REINDEX { INDEX | ... } name WITH VERBOSE;

The documentation is also updated.
Please give me feedback.
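
A minimal usage sketch of the proposed syntax (the table names here are
made up):

REINDEX TABLE pgbench_accounts WITH VERBOSE;
-- per the posted diff, each reindexed table in a schema-wide run is
-- reported at INFO level instead of DEBUG when VERBOSE is given
REINDEX SCHEMA public WITH VERBOSE;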

Regards,

---
Sawada Masahiko
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 0a4c7d4..27be1a4 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -22,6 +22,7 @@ PostgreSQL documentation
  
 
 REINDEX { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } name [ FORCE ]
+REINDEX { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } name WITH VERBOSE
 
  
 
@@ -159,6 +160,15 @@ REINDEX { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } rd_rel->relpersistence;
 	index_close(irel, NoLock);
 
-	reindex_index(indOid, false, persistence);
+	reindex_index(indOid, false, persistence, verbose);
 
 	return indOid;
 }
@@ -1775,7 +1775,7 @@ RangeVarCallbackForReindexIndex(const RangeVar *relation,
  *		Recreate all indexes of a table (and of its toast table, if any)
  */
 Oid
-ReindexTable(RangeVar *relation)
+ReindexTable(RangeVar *relation, bool verbose)
 {
 	Oid			heapOid;
 
@@ -1785,7 +1785,8 @@ ReindexTable(RangeVar *relation)
 
 	if (!reindex_relation(heapOid,
 		  REINDEX_REL_PROCESS_TOAST |
-		  REINDEX_REL_CHECK_CONSTRAINTS))
+		  REINDEX_REL_CHECK_CONSTRAINTS,
+		  verbose))
 		ereport(NOTICE,
 (errmsg("table \"%s\" has no indexes",
 		relation->relname)));
@@ -1802,7 +1803,7 @@ ReindexTable(RangeVar *relation)
  * That means this must not be called within a user transaction block!
  */
 void
-ReindexMultipleTables(const char *objectName, ReindexObjectType objectKind)
+ReindexMultipleTables(const char *objectName, ReindexObjectType objectKind, bool verbose)
 {
 	Oid			objectOid;
 	Relation	relationRelation;
@@ -1814,6 +1815,7 @@ ReindexMultipleTables(const char *objectName, ReindexObjectType objectKind)
 	List	   *relids = NIL;
 	ListCell   *l;
 	int			num_keys;
+	int			elevel = verbose ? INFO : DEBUG2;
 
 	AssertArg(objectName);
 	Assert(objectKind == REINDEX_OBJECT_SCHEMA ||
@@ -1938,9 +1940,10 @@ ReindexMultipleTables(const char *objectName, ReindexObjectType objectKind)
 		PushActiveSnapshot(GetTransactionSnapshot());
 		if (reindex_relation(relid,
 			 REINDEX_REL_PROCESS_TOAST |
-			 REINDEX_REL_CHECK_CONSTRAINTS))
-			ereport(DEBUG1,
-	(errmsg("table \"%s.%s\" was reindexed",
+			 REINDEX_REL_CHECK_CONSTRAINTS,
+			 verbose))
+			ereport(elevel,
+	(errmsg("indexes of whole table \"%s.%s\" were reindexed",
 			get_namespace_name(get_rel_namespace(relid)),
 			get_rel_name(relid;
 		PopActiveSnapshot();
diff --git a/src/backend/commands/tablecmds.c b/src/backe

Re: [HACKERS] TABLESAMPLE patch

2015-04-06 Thread Petr Jelinek

On 06/04/15 15:07, Amit Kapila wrote:

On Mon, Apr 6, 2015 at 5:56 PM, Petr Jelinek mailto:p...@2ndquadrant.com>> wrote:
 >
 > On 06/04/15 12:33, Amit Kapila wrote:
 >>
 >>
 >> But I think the Update on target table with sample scan is
 >> supported via views which doesn't seem to be the right thing
 >> in case you just want to support it via FROM/USING, example
 >>
 >> postgres=# create view vw_test As select * from test_tablesample
 >> TABLESAMPLE sys
 >> tem(30);
 >> postgres=# explain update vw_test set id = 4;
 >>  QUERY PLAN
 >>
---
 >>   Update on test_tablesample  (cost=0.00..4.04 rows=4 width=210)
 >> ->  Sample Scan on test_tablesample  (cost=0.00..4.04 rows=4
width=210)
 >> (2 rows)
 >>
 >
 > Right, I'll make those views not auto-updatable.
 >
 >>
 >>  > Standard is somewhat useless for UPDATE and DELETE as it only defines
 >> quite limited syntax there. From what I've seen when doing research
 >> MSSQL also only supports it in their equivalent of FROM/USING list,
 >> Oracle does not seem to support their SAMPLING clause outside of SELECTs
 >> at all and if I got the cryptic DB2 manual correctly I think they don't
 >> support it outside of (sub)SELECTs either.
 >>  >
 >>
 >> By the way, what is the usecase to support sample scan in
 >> Update or Delete statement?
 >>
 >
 > Well for the USING/FROM part the use-case is same as for SELECT -
providing sample of the data for the query (it can be useful also for
getting pseudo random rows fast). And if we didn't support it, it could
still be done using sub-select so why not have it directly.
 >

I can understand why someone wants to read sample data via
SELECT, but I can't clearly see why someone would want to
Update or Delete random data in a table, and if there is a
valid case, why support it only via sub-selects used in the
WHERE clause or a table reference in the FROM/USING list.
Can't we keep it simple, such that we either support
Update/Delete based on the Tablesample clause or prohibit it
in all cases?



Well, I don't understand why somebody would do it either, but then again 
during research for this feature I found questions on Stack Overflow 
and similar sites about how to do it, so people must have use-cases.


And in any case, as you say, a sub-select would work there, so there is no 
reason to explicitly disable it. Plus there is already a difference 
between what can be the target table in DELETE/UPDATE versus what can be 
in the FROM/USING clause, and I think the TABLESAMPLE behavior follows 
that separation nicely - it's well demonstrated by the fact that we 
would have to add explicit exceptions in some places in the code to disallow it.
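
To make the distinction concrete, a sketch with made-up tables; the
sampled relation may appear in FROM/USING, but the DML target itself
cannot carry a TABLESAMPLE clause:

-- allowed: the sample only drives which target rows are matched via the join
UPDATE orders SET flagged = true
FROM customers TABLESAMPLE SYSTEM (10)
WHERE orders.customer_id = customers.id;

DELETE FROM orders
USING customers TABLESAMPLE BERNOULLI (10)
WHERE orders.customer_id = customers.id;

-- not allowed (syntax error, by design):
-- DELETE FROM orders TABLESAMPLE SYSTEM (10);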


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freeze avoidance of very large table.

2015-04-06 Thread Jim Nasby

On 4/6/15 1:46 AM, Sawada Masahiko wrote:

On Sun, Apr 5, 2015 at 8:21 AM, Jeff Janes  wrote:

On Sat, Apr 4, 2015 at 3:10 PM, Jim Nasby  wrote:


On 4/3/15 12:59 AM, Sawada Masahiko wrote:


+   case HEAPTUPLE_LIVE:
+   case HEAPTUPLE_RECENTLY_DEAD:
+   case HEAPTUPLE_INSERT_IN_PROGRESS:
+   case HEAPTUPLE_DELETE_IN_PROGRESS:
+   if
(heap_prepare_freeze_tuple(tuple.t_data, freezelimit,
+
mxactcutoff, &frozen[nfrozen]))
+   frozen[nfrozen++].offset
= offnum;
+   break;



This doesn't seem safe enough to me. Can't there be tuples that are still
new enough that they can't be frozen, and are still live?



Yep.  I've set a table to read only while it contained unfreezable tuples,
and the tuples remain unfrozen yet the read-only action claims to have
succeeded.




Somewhat related... instead of forcing the freeze to happen synchronously,
can't we set this up so a table is in one of three states? Read/Write, Read
Only, Frozen. AT_SetReadOnly and AT_SetReadWrite would simply change to the
appropriate state, and all the vacuum infrastructure would continue to
process those tables as it does today. lazy_vacuum_rel would become
responsible for tracking if there were any non-frozen tuples if it was also
attempting a freeze. If it discovered there were none, AND the table was
marked as ReadOnly, then it would change the table state to Frozen and set
relfrozenxid = InvalidTransactionId and relminxid = InvalidMultiXactId.
AT_SetReadWrite could change relfrozenxid to it's own Xid as an
optimization. Doing it that way leaves all the complicated vacuum code in
one place, and would eliminate concerns about race conditions with still
running transactions, etc.



+1 here as well.  I might want to set tables to read only for reasons other
than to avoid repeated freezing.  (After all, the name of the command
suggests it is a general purpose thing) and wouldn't want to automatically
trigger a vacuum freeze in the process.



Thank you for comments.


Somewhat related... instead of forcing the freeze to happen synchronously, can't we set 
this up so a table is in one of three states? Read/Write, Read Only, Frozen. 
AT_SetReadOnly and AT_SetReadWrite would simply change to the appropriate state, 
and all the vacuum infrastructure would continue to process those tables as it does 
today. lazy_vacuum_rel would become responsible for tracking if there were any 
non-frozen tuples if it was also attempting a freeze. If it discovered there were 
none, AND the table was marked as ReadOnly, then it would change the table state to 
Frozen and set relfrozenxid = InvalidTransactionId and relminxid = InvalidMultiXactId. 
AT_SetReadWrite could change relfrozenxid to it's own Xid as an optimization. Doing 
it that way leaves all the complicated vacuum code in one place, and would eliminate 
concerns about race conditions with still running transactions, etc.


I agree with 3 status, Read/Write, ReadOnly and Frozen.
But I'm not sure when we should do to freeze tuples, e.g., scan whole tables.
I think that the any changes to table are completely
ignored/restricted if table is marked as ReadOnly table,
and it's accompanied by freezing tuples, just mark as ReadOnly.
Frozen table ensures that all tuples of its table completely has been
frozen, so it also needs to scan whole table as well.
e.g., we should need to scan whole table at two times. right?


No. You would be free to set a table as ReadOnly any time you wanted, 
without scanning anything. All that setting does is disable any DML on 
the table.


The Frozen state would only be set by the vacuum code, IFF:
- The table state is ReadOnly *at the start of vacuum* and did not 
change during vacuum

- Vacuum ensured that there were no un-frozen tuples in the table

That does not necessitate 2 scans.
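
A hypothetical flow of the three-state idea; the command spellings here
are purely illustrative (the patch may name them differently), the point
being that only vacuum ever promotes ReadOnly to Frozen:

ALTER TABLE archive_2014 SET READ ONLY;   -- cheap: just flips the state, no scan
VACUUM (FREEZE) archive_2014;             -- if no unfrozen tuples remain and the state is
                                          -- still ReadOnly, vacuum marks the table Frozen
ALTER TABLE archive_2014 SET READ WRITE;  -- leaves Frozen again; relfrozenxid gets reset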


+1 here as well.  I might want to set tables to read only for reasons other 
than to avoid repeated freezing.  (After all, the name of the command suggests 
it is a general purpose thing) and wouldn't want to automatically trigger a
vacuum freeze in the process.

There is another possibility here, too. We can completely divorce a ReadOnly 
mode (which I think is useful for other things besides freezing) from the 
question of whether we need to force-freeze a relation if we create a
FrozenMap, similar to the visibility map. This has the added advantage of 
helping freeze scans on relations that are not ReadOnly in the case of tables 
that are insert-mostly or any other pattern where most pages stay all-frozen.
Prior to the visibility map this would have been a rather daunting project, but 
I believe this could piggyback on the VM code rather nicely. Anytime you clear 
the VM you clearly must clear the FrozenMap as well. The logic for
setting the FM is clearly differe

Re: [HACKERS] initdb -S and tablespaces

2015-04-06 Thread Alvaro Herrera
Abhijit Menon-Sen wrote:

Hi,

> At 2015-04-03 13:32:32 -0300, alvhe...@2ndquadrant.com wrote:

> > I also noticed that walkdir() thinks it is completely agnostic on
> > what the passed action is; except that the comment at the bottom talks
> > about how fsync on directories is important, which seems out of place.
> 
> Yes. The function behaves as documented, but the comment is clearly too
> specific. I'm not sure where to put it. I could make walkdir() NOT do
> it, and instead do it in the caller with the same comment. Thoughts?

I think it's enough to state in the function comment that the action is
applied to the top element too.  Maybe add the fsync comment on the
callsite.

> > Hmm ... Actually, since surely we must follow symlinks everywhere,
> > why do we have to do this separately for pg_tblspc? Shouldn't that
> > link-following occur automatically when walking PGDATA in the first
> > place?
> 
> I'm not sure about that (and that's why I've not attached an updated
> patch here). The original idea was to follow only those links that we
> expect to be in PGDATA.
> 
> I suppose it would be easier in terms of the code to follow all links,
> but I don't know if it's the right thing. If that's what you think we
> should do, I can post a simplified patch.

Well, we have many things that can be set as symlinks; some you can have
initdb create the links for you (initdb --xlogdir), others you can move
manually.  I think not following all links might lead to impossible-to-
detect bugs such as failing to fsync new pgdata subdirectories we add in
the future, for example.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TABLESAMPLE patch

2015-04-06 Thread Amit Kapila
On Mon, Apr 6, 2015 at 5:56 PM, Petr Jelinek  wrote:
>
> On 06/04/15 12:33, Amit Kapila wrote:
>>
>>
>> But I think the Update on target table with sample scan is
>> supported via views which doesn't seem to be the right thing
>> in case you just want to support it via FROM/USING, example
>>
>> postgres=# create view vw_test As select * from test_tablesample
>> TABLESAMPLE sys
>> tem(30);
>> postgres=# explain update vw_test set id = 4;
>>  QUERY PLAN
>>
---
>>   Update on test_tablesample  (cost=0.00..4.04 rows=4 width=210)
>> ->  Sample Scan on test_tablesample  (cost=0.00..4.04 rows=4
width=210)
>> (2 rows)
>>
>
> Right, I'll make those views not auto-updatable.
>
>>
>>  > Standard is somewhat useless for UPDATE and DELETE as it only defines
>> quite limited syntax there. From what I've seen when doing research
>> MSSQL also only supports it in their equivalent of FROM/USING list,
>> Oracle does not seem to support their SAMPLING clause outside of SELECTs
>> at all and if I got the cryptic DB2 manual correctly I think they don't
>> support it outside of (sub)SELECTs either.
>>  >
>>
>> By the way, what is the usecase to support sample scan in
>> Update or Delete statement?
>>
>
> Well for the USING/FROM part the use-case is same as for SELECT -
providing sample of the data for the query (it can be useful also for
getting pseudo random rows fast). And if we didn't support it, it could
still be done using sub-select so why not have it directly.
>

I can understand why someone wants to read sample data via
SELECT, but I can't clearly see why someone would want to
Update or Delete random data in a table, and if there is a
valid case, why support it only via sub-selects used in the
WHERE clause or a table reference in the FROM/USING list.
Can't we keep it simple, such that we either support
Update/Delete based on the Tablesample clause or prohibit it
in all cases?

>> Also, isn't it better to mention in the docs for Update and
>> Delete in case we are going to support tablesample clause
>> for them?
>>
>
> Most of other clauses that we support in FROM are not mentioned in
UPDATE/DELETE docs, both of those commands just say something like "refer
to the SELECT FROM docs for more info". Do you think TABLESAMPLE deserves
special treatment in this regard?
>

Nothing too important; I just got confused while using it, and
someone else could get confused as well, but I think we can
leave it.

>>
>>  > And we do this type of coercion even for table data (you can insert
>> -2.3 into integer column and it will work) so I don't see what's wrong
>> with it here.
>>  >
>>
>> I am not sure we can compare it with column of a table.  I think we
>> can support it within a valid range (similar to tablesample method) and
>> if user inputs value outside the range, then return error.
>>
>
> But that's not what standard says, it says any numeric value expression
is valid. The fact that Oracle limits it to some range should not make us
do the same. I think most important thing here is that using -2.3 will
produce same results if called repeatedly (if there are no changes to data,
vacuum etc). Yes passing -2 will produce same results, I don't know if that
is a problem. The main reason why I have the coercion there is so that
users don't have to explicitly typecast expression results.
>

Actually, not a big point, but I felt it would be clearer if there were
a valid range, since we don't actually do anything with the negative
(part) of the seed input by the user.
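
A sketch of the behaviour in question, reusing the test table from the
earlier examples and assuming the v11 grammar: since the REPEATABLE
argument is coerced to int4, a fractional seed silently collapses onto
its rounded integer value:

-- both should return the same sample, because -2.3 is coerced to -2
SELECT count(*) FROM test_tablesample TABLESAMPLE SYSTEM (30) REPEATABLE (-2);
SELECT count(*) FROM test_tablesample TABLESAMPLE SYSTEM (30) REPEATABLE (-2.3);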


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] pg_rewind and log messages

2015-04-06 Thread Alvaro Herrera
Fujii Masao wrote:
> On Mon, Apr 6, 2015 at 5:33 PM, Michael Paquier
>  wrote:
> > On Mon, Apr 6, 2015 at 1:41 PM, Michael Paquier wrote:
> >> I guess that you are working on a patch? If not, you are looking for one?
> >
> > Code-speaking, this gives the patch attached.
> 
> Thanks! Here are the review comments:
> 
> I'm not familiar with native language support (sorry), but don't we need to
> add the shortcut of gettext into every calls of pg_log and pg_fatal, e.g.,
> change pg_fatal("xxx") to pg_fatal(_("xxx"))? I know that fprintf() in
> pg_Log_v() has such shortcut, but I'm not sure if that's enough or not.

It's not necessary for pg_fatal and the like, because those functions
are marked to have their first argument automatically translated in
nls.mk.  This means that the string literal is automatically extracted
into pg_rewind.pot for translators.  Of course, the function itself must
call _() (or some variant thereof) to actually fetch the translated
string at run time.

I'm not sure about translation of generic strings such as "%s: %s".  My
first impression is that they shouldn't be translated, but maybe it is
important that they are for languages I know nothing about, such as
Japanese.

Another thing is compound messages like "foo has happened\nSee --help
for usage.\n" and "bar didn't happen.\nSee --help for usage".  In those
cases, the "see --help" part would need to be translated over and over,
so it's best to separate them into phrases to avoid repetitive work for
translators.  Not sure how to do this -- maybe something like
_("foo has happened\n") _("See --help")
but I'm not sure how to appease the compiler.  Having them in two
separate pg_log() calls (or whatever) was handy for this reason.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auditing extension for PostgreSQL (Take 2)

2015-04-06 Thread Sawada Masahiko
On Fri, Apr 3, 2015 at 10:01 PM, David Steele  wrote:
> On 4/3/15 3:59 AM, Sawada Masahiko wrote:
>> On Thu, Apr 2, 2015 at 2:46 AM, David Steele  wrote:
>>> Let me know if you see any other issues.
>>>
>>
>> I pulled HEAD, and then tried to compile source code after applied
>> following "deparsing utility command patch" without #0001 and #0002.
>> (because these two patches are already pushed)
>> 
>>
>> But I could not use new pg_audit patch due to compile error (that
>> deparsing utility command patch might have).
>> I think I'm missing something, but I'm not sure.
>> Could you tell me which patch should I apply before compiling pg_audit?
>
> When Alvaro posted his last patch set he recommended applying them to
> b3196e65:
>
> http://www.postgresql.org/message-id/20150325175954.gl3...@alvh.no-ip.org
>
> Applying the 3/25 deparse patches onto b3196e65 (you'll see one trailing
> space error) then applying pg_audit-v6.patch works fine.
>

I tested v6 patch, but I got SEGV when I executed test() function I
mentioned up thread.
Could you address this problem?

Regards,

---
Sawada Masahiko


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TABLESAMPLE patch

2015-04-06 Thread Petr Jelinek

On 06/04/15 11:02, Simon Riggs wrote:

On 2 April 2015 at 17:36, Petr Jelinek  wrote:


so here is version 11.


Looks great.

Comment on docs:

The SELECT docs refer only to SYSTEM and BERNOULLI. It doesn't mention
that if other methods are available they could be used also. The
phrasing was "sampling method can be one of ."



Will reword.


Are we ready for a final detailed review and commit?



I plan to send v12 in the evening with some additional changes that came 
up from Amit's comments + some improvements to error reporting. I think 
it will be ready then.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TABLESAMPLE patch

2015-04-06 Thread Petr Jelinek

On 06/04/15 12:33, Amit Kapila wrote:

On Sat, Apr 4, 2015 at 8:25 PM, Petr Jelinek mailto:p...@2ndquadrant.com>> wrote:
 >
 > Yes I want extensibility here. And I think the tablesample method
arguments are same thing as function arguments given that in the end
they are arguments for the init function of tablesampling method.
 >
 > I would be ok with just expr_list, naming parameters here isn't
overly important, but ability to have different types and numbers of
parameters for custom TABLESAMPLE methods *is* important.
 >

Yeah, named parameters are one thing which I think won't
be required for sample methods, and they aren't mentioned
in the docs either (if a user had to use them, how would he
pass them?).  The other is the number of arguments for
sampling methods, which currently seems to be the same as
FUNC_MAX_ARGS; I think that is sufficient, but do we really
want to support that many arguments for a sampling method?



I think I'll go with a simple list of a_exprs. The reason is that 
I can foresee sampling methods that need multiple parameters, but I 
don't think naming them is very important. Also, support for named 
parameters can be added in the future, if we so decide, without 
breaking compatibility. A side benefit is that it makes hinting about 
what is wrong with the input somewhat easier.


I don't think we need to come up with a different limit from 
FUNC_MAX_ARGS. I don't think any sampling method would need that many 
parameters, but I also don't see what an additional, smaller limit would give us.




 >>
 >> 2.
 >> postgres=# explain update test_tablesample TABLESAMPLE system(30) set id
 >> = 2;
 >> ERROR:  syntax error at or near "TABLESAMPLE"
 >> LINE 1: explain update test_tablesample TABLESAMPLE system(30) set i...
 >>
 >> postgres=# Delete from test_tablesample TABLESAMPLE system(30);
 >> ERROR:  syntax error at or near "TABLESAMPLE"
 >> LINE 1: Delete from test_tablesample TABLESAMPLE system(30);
 >>
 >> Isn't TABLESAMPLE clause suppose to work with Update/Delete
 >> statements?
 >>
 >
 > It's supported in the FROM part of UPDATE and USING part of DELETE. I
think that that's sufficient.
 >

But I think the Update on target table with sample scan is
supported via views which doesn't seem to be the right thing
in case you just want to support it via FROM/USING, example

postgres=# create view vw_test As select * from test_tablesample
TABLESAMPLE sys
tem(30);
postgres=# explain update vw_test set id = 4;
 QUERY PLAN
---
  Update on test_tablesample  (cost=0.00..4.04 rows=4 width=210)
->  Sample Scan on test_tablesample  (cost=0.00..4.04 rows=4 width=210)
(2 rows)



Right, I'll make those views not auto-updatable.



 > Standard is somewhat useless for UPDATE and DELETE as it only defines
quite limited syntax there. From what I've seen when doing research
MSSQL also only supports it in their equivalent of FROM/USING list,
Oracle does not seem to support their SAMPLING clause outside of SELECTs
at all and if I got the cryptic DB2 manual correctly I think they don't
support it outside of (sub)SELECTs either.
 >

By the way, what is the usecase to support sample scan in
Update or Delete statement?



Well for the USING/FROM part the use-case is same as for SELECT - 
providing sample of the data for the query (it can be useful also for 
getting pseudo random rows fast). And if we didn't support it, it could 
still be done using sub-select so why not have it directly.



Also, isn't it better to mention in the docs for Update and
Delete in case we are going to support tablesample clause
for them?



Most of other clauses that we support in FROM are not mentioned in 
UPDATE/DELETE docs, both of those commands just say something like 
"refer to the SELECT FROM docs for more info". Do you think TABLESAMPLE 
deserves special treatment in this regard?



 >
 >> 7.
 >> ParseTableSample()
 >> {
 >> ..
 >> arg = coerce_to_specific_type(pstate, arg, INT4OID, "REPEATABLE");
 >> ..
 >> }
 >>
 >> What is the reason for coercing value of REPEATABLE clause to INT4OID
 >> when numeric value is expected for the clause.  If user gives the
 >> input value as -2.3, it will generate a seed which doesn't seem to
 >> be right.
 >>
 >
 > Because the REPEATABLE is numeric expression so it can produce
whatever number but we need int4 internally (well float4 would also work
just the code would be slightly uglier).
 >

Okay, I understand that part. Here the real point is why not just expose
it as int4 to user rather than telling in docs that it is numeric and
actually we neither expect nor use it as numeric.

Even Oracle supports it as an int, with the description below:
The seed_value must be an integer between 0 and 4294967295


Well the thing with SQL Standard's "numeric value expression" is that it 
actually does not mean numeric data type, it's just simple arithmetic 
expression with some given rules (by the standard)

Re: [HACKERS] pg_rewind and log messages

2015-04-06 Thread Fujii Masao
On Mon, Apr 6, 2015 at 5:33 PM, Michael Paquier
 wrote:
> On Mon, Apr 6, 2015 at 1:41 PM, Michael Paquier wrote:
>> I guess that you are working on a patch? If not, you are looking for one?
>
> Code-speaking, this gives the patch attached.

Thanks! Here are the review comments:

I'm not familiar with native language support (sorry), but don't we need to
add the shortcut of gettext into every calls of pg_log and pg_fatal, e.g.,
change pg_fatal("xxx") to pg_fatal(_("xxx"))? I know that fprintf() in
pg_Log_v() has such shortcut, but I'm not sure if that's enough or not.

 case PG_FATAL:
-printf("\n%s", _(message));
-printf("%s", _("Failure, exiting\n"));
+fprintf(stderr, _("\n%s: fatal: %s\n"), progname, message);
+fprintf(stderr, _("Failure, exiting\n"));

Why do we need to output the error level like fatal? This seems also
inconsistent with the log message policy of other tools.

Why do we need to output \n at the head of the message?

The second message "Failure, exiting" is really required?

> I eliminated a bunch of
> newlines in the log messages that seemed really unnecessary to me,
> simplifying a bit the whole. While hacking this stuff, I noticed as
> well that pg_rewind could be called as root on non-Windows platform,
> that's dangerous from a security point of view as process manipulates
> files in PGDATA. Hence let's block that. On Windows, a restricted
> token should be used.

Good catch! I think it's better to implement it as a separate patch
because it's very different from log message problem.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TABLESAMPLE patch

2015-04-06 Thread Amit Kapila
On Sat, Apr 4, 2015 at 8:25 PM, Petr Jelinek  wrote:
>
> On 04/04/15 14:57, Amit Kapila wrote:
>>
>>
>> 1.
>> +tablesample_clause:
>> +TABLESAMPLE ColId '(' func_arg_list ')' opt_repeatable_clause
>>
>> It seems to me that you want to allow it to make it extendable
>> to user defined Tablesample methods, but not sure if it is
>> right to use func_arg_list for the same because sample method
>> arguments can have different definition than function arguments.
>> Now even if we want to keep sample method arguments same as
>> function arguments that sounds like a separate discussion.
>>
>
> Yes I want extensibility here. And I think the tablesample method
arguments are same thing as function arguments given that in the end they
are arguments for the init function of tablesampling method.
>
> I would be ok with just expr_list, naming parameters here isn't overly
important, but ability to have different types and numbers of parameters
for custom TABLESAMPLE methods *is* important.
>

Yeah, named parameters are one thing which I think won't
be required for sample methods, and they aren't mentioned
in the docs either (if a user had to use them, how would he
pass them?).  The other is the number of arguments for
sampling methods, which currently seems to be the same as
FUNC_MAX_ARGS; I think that is sufficient, but do we really
want to support that many arguments for a sampling method?

I have shared my thoughts regarding this point with you, if
you don't agree with the same, then proceed as you think is
the best way.

>>
>> 2.
>> postgres=# explain update test_tablesample TABLESAMPLE system(30) set id
>> = 2;
>> ERROR:  syntax error at or near "TABLESAMPLE"
>> LINE 1: explain update test_tablesample TABLESAMPLE system(30) set i...
>>
>> postgres=# Delete from test_tablesample TABLESAMPLE system(30);
>> ERROR:  syntax error at or near "TABLESAMPLE"
>> LINE 1: Delete from test_tablesample TABLESAMPLE system(30);
>>
>> Isn't TABLESAMPLE clause suppose to work with Update/Delete
>> statements?
>>
>
> It's supported in the FROM part of UPDATE and USING part of DELETE. I
think that that's sufficient.
>

But I think the Update on target table with sample scan is
supported via views which doesn't seem to be the right thing
in case you just want to support it via FROM/USING, example

postgres=# create view vw_test As select * from test_tablesample
TABLESAMPLE sys
tem(30);
postgres=# explain update vw_test set id = 4;
QUERY PLAN
---
 Update on test_tablesample  (cost=0.00..4.04 rows=4 width=210)
   ->  Sample Scan on test_tablesample  (cost=0.00..4.04 rows=4 width=210)
(2 rows)


> Standard is somewhat useless for UPDATE and DELETE as it only defines
quite limited syntax there. From what I've seen when doing research MSSQL
also only supports it in their equivalent of FROM/USING list, Oracle does
not seem to support their SAMPLING clause outside of SELECTs at all and if
I got the cryptic DB2 manual correctly I think they don't support it
outside of (sub)SELECTs either.
>

By the way, what is the usecase to support sample scan in
Update or Delete statement?

Also, isn't it better to mention in the docs for Update and
Delete in case we are going to support tablesample clause
for them?

>
>> 7.
>> ParseTableSample()
>> {
>> ..
>> arg = coerce_to_specific_type(pstate, arg, INT4OID, "REPEATABLE");
>> ..
>> }
>>
>> What is the reason for coercing value of REPEATABLE clause to INT4OID
>> when numeric value is expected for the clause.  If user gives the
>> input value as -2.3, it will generate a seed which doesn't seem to
>> be right.
>>
>
> Because the REPEATABLE is numeric expression so it can produce whatever
number but we need int4 internally (well float4 would also work just the
code would be slightly uglier).
>

Okay, I understand that part. Here the real point is why not just expose
it as int4 to user rather than telling in docs that it is numeric and
actually we neither expect nor use it as numeric.

Even Oracle supports it as an int, with the description below:
The seed_value must be an integer between 0 and 4294967295

> And we do this type of coercion even for table data (you can insert -2.3
into integer column and it will work) so I don't see what's wrong with it
here.
>

I am not sure we can compare it with column of a table.  I think we
can support it within a valid range (similar to tablesample method) and
if user inputs value outside the range, then return error.

>>
>> 8.
>> +DATA(insert OID = 3295 (  tsm_system_initPGNSP PGUID 12 1 0 0 0 f f f f
>> t f v 3 0 2278 "2281
>> 23 700" _null_ _null_ _null_ _null_tsm_system_init _null_ _null_ _null_
));
>> +DATA(insert OID = 3301 (  tsm_bernoulli_initPGNSP PGUID 12 1 0 0 0 f f
>> f f t f v 3 0 2278 "2281
>> 23 700" _null_ _null_ _null_ _null_tsm_bernoulli_init _null_ _null_
>> _null_ ));
>>
>> Datatype for second argument is kept as  700 (Float4), shouldn't
>> it be 1700 (Numeric)?
>

Re: [HACKERS] TABLESAMPLE patch

2015-04-06 Thread Simon Riggs
On 2 April 2015 at 17:36, Petr Jelinek  wrote:

> so here is version 11.

Looks great.

Comment on docs:

The SELECT docs refer only to SYSTEM and BERNOULLI. It doesn't mention
that if other methods are available they could be used also. The
phrasing was "sampling method can be one of ."
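
For example, if an extension provided its own sampling method (table and
method names here are entirely hypothetical), the same clause should
accept it just like the built-in ones:

SELECT * FROM big_table TABLESAMPLE my_ext_sampler (5, 'fast') REPEATABLE (42);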

Are we ready for a final detailed review and commit?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, RemoteDBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_rewind and log messages

2015-04-06 Thread Michael Paquier
On Mon, Apr 6, 2015 at 1:41 PM, Michael Paquier wrote:
> I guess that you are working on a patch? If not, you are looking for one?

Code-speaking, this gives the patch attached. I eliminated a bunch of
newlines in the log messages that seemed really unnecessary to me,
simplifying the whole thing a bit. While hacking on this, I noticed as
well that pg_rewind can be called as root on non-Windows platforms,
which is dangerous from a security point of view as the process
manipulates files in PGDATA. Hence let's block that. On Windows, a
restricted token should be used.
Regards,
-- 
Michael
From 32b0aeef2602f7388c7e9fca6290046d8d9dfe67 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Mon, 6 Apr 2015 17:18:21 +0900
Subject: [PATCH] Fix inconsistent error and process handling in pg_rewind

pg_rewind was handling a couple of things differently compared to the
other src/bin utilities:
- pg_rewind manipulates files in a data folder, hence it should not run
as root on Non-Windows platforms. On Windows, it should use a restricted
token
- Logging output needs to be flushed on stderr, not stdout
- Logging messages should be prefixed with the application name
- pg_fatal exits with error code of 1, but it was sometimes followed by
subsequent logs, making this information lost in the process.
---
 src/bin/pg_rewind/copy_fetch.c  | 22 +--
 src/bin/pg_rewind/datapagemap.c |  4 +-
 src/bin/pg_rewind/file_ops.c| 30 +++---
 src/bin/pg_rewind/filemap.c | 18 -
 src/bin/pg_rewind/libpq_fetch.c | 52 -
 src/bin/pg_rewind/logging.c | 16 +---
 src/bin/pg_rewind/logging.h |  1 +
 src/bin/pg_rewind/parsexlog.c   | 20 +-
 src/bin/pg_rewind/pg_rewind.c   | 86 +++--
 src/bin/pg_rewind/pg_rewind.h   |  2 +
 src/bin/pg_rewind/timeline.c| 27 +
 11 files changed, 144 insertions(+), 134 deletions(-)

diff --git a/src/bin/pg_rewind/copy_fetch.c b/src/bin/pg_rewind/copy_fetch.c
index 887fec9..de5fe4a 100644
--- a/src/bin/pg_rewind/copy_fetch.c
+++ b/src/bin/pg_rewind/copy_fetch.c
@@ -58,7 +58,7 @@ recurse_dir(const char *datadir, const char *parentpath,
 
 	xldir = opendir(fullparentpath);
 	if (xldir == NULL)
-		pg_fatal("could not open directory \"%s\": %s\n",
+		pg_fatal("could not open directory \"%s\": %s",
  fullparentpath, strerror(errno));
 
 	while (errno = 0, (xlde = readdir(xldir)) != NULL)
@@ -113,13 +113,13 @@ recurse_dir(const char *datadir, const char *parentpath,
 
 			len = readlink(fullpath, link_target, sizeof(link_target) - 1);
 			if (len == -1)
-pg_fatal("readlink() failed on \"%s\": %s\n",
+pg_fatal("readlink() failed on \"%s\": %s",
 		 fullpath, strerror(errno));
 
 			if (len == sizeof(link_target) - 1)
 			{
 /* path was truncated */
-pg_fatal("symbolic link \"%s\" target path too long\n",
+pg_fatal("symbolic link \"%s\" target path too long",
 		 fullpath);
 			}
 
@@ -132,18 +132,18 @@ recurse_dir(const char *datadir, const char *parentpath,
 			if (strcmp(parentpath, "pg_tblspc") == 0)
 recurse_dir(datadir, path, callback);
 #else
-			pg_fatal("\"%s\" is a symbolic link, but symbolic links are not supported on this platform\n",
+			pg_fatal("\"%s\" is a symbolic link, but symbolic links are not supported on this platform",
 	 fullpath);
 #endif   /* HAVE_READLINK */
 		}
 	}
 
 	if (errno)
-		pg_fatal("could not read directory \"%s\": %s\n",
+		pg_fatal("could not read directory \"%s\": %s",
  fullparentpath, strerror(errno));
 
 	if (closedir(xldir))
-		pg_fatal("could not close archive location \"%s\": %s\n",
+		pg_fatal("could not close archive location \"%s\": %s",
  fullparentpath, strerror(errno));
 }
 
@@ -163,11 +163,11 @@ copy_file_range(const char *path, off_t begin, off_t end, bool trunc)
 
 	srcfd = open(srcpath, O_RDONLY | PG_BINARY, 0);
 	if (srcfd < 0)
-		pg_fatal("could not open source file \"%s\": %s\n",
+		pg_fatal("could not open source file \"%s\": %s",
  srcpath, strerror(errno));
 
 	if (lseek(srcfd, begin, SEEK_SET) == -1)
-		pg_fatal("could not seek in source file: %s\n", strerror(errno));
+		pg_fatal("could not seek in source file: %s", strerror(errno));
 
 	open_target_file(path, trunc);
 
@@ -184,17 +184,17 @@ copy_file_range(const char *path, off_t begin, off_t end, bool trunc)
 		readlen = read(srcfd, buf, len);
 
 		if (readlen < 0)
-			pg_fatal("could not read file \"%s\": %s\n",
+			pg_fatal("could not read file \"%s\": %s",
 	 srcpath, strerror(errno));
 		else if (readlen == 0)
-			pg_fatal("unexpected EOF while reading file \"%s\"\n", srcpath);
+			pg_fatal("unexpected EOF while reading file \"%s\"", srcpath);
 
 		write_target_range(buf, begin, readlen);
 		begin += readlen;
 	}
 
 	if (close(srcfd) != 0)
-		pg_fatal("error closing file \"%s\": %s\n", srcpath, strerror(errno));
+		pg_fatal("error closing file \"%s\": %s", srcpath, strerror(errno));
 }
 
 /*
diff --git a/src/bin/pg_rewind/datapagemap.c b/

Re: [HACKERS] Auditing extension for PostgreSQL (Take 2)

2015-04-06 Thread Sawada Masahiko
On Fri, Apr 3, 2015 at 10:01 PM, David Steele  wrote:
> On 4/3/15 3:59 AM, Sawada Masahiko wrote:
>> On Thu, Apr 2, 2015 at 2:46 AM, David Steele  wrote:
>>> Let me know if you see any other issues.
>>>
>>
>> I pulled HEAD, and then tried to compile source code after applied
>> following "deparsing utility command patch" without #0001 and #0002.
>> (because these two patches are already pushed)
>> 
>>
>> But I could not use new pg_audit patch due to compile error (that
>> deparsing utility command patch might have).
>> I think I'm missing something, but I'm not sure.
>> Could you tell me which patch should I apply before compiling pg_audit?
>
> When Alvaro posted his last patch set he recommended applying them to
> b3196e65:
>
> http://www.postgresql.org/message-id/20150325175954.gl3...@alvh.no-ip.org
>
> Applying the 3/25 deparse patches onto b3196e65 (you'll see one trailing
> space error) then applying pg_audit-v6.patch works fine.
>

Thank you for your advice!
I'm testing pg_audit, so I will send report to you as soon as possible.

Regards,

---
Sawada Masahiko


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] initdb -S and tablespaces

2015-04-06 Thread Abhijit Menon-Sen
Hi Álvaro.

Thanks for taking a look at the patch.

At 2015-04-03 13:32:32 -0300, alvhe...@2ndquadrant.com wrote:
>
> But then, since the name is already telling us that it's all about
> PGDATA, maybe we can remove the "recursively" part of the name?

Sure, that makes sense too. Since you and Andres both like
«fsync_pgdata(data_directory)», I'll change it accordingly.

> I also noticed that walkdir() thinks it is completely agnostic on
> what the passed action is; except that the comment at the bottom talks
> about how fsync on directories is important, which seems out of place.

Yes. The function behaves as documented, but the comment is clearly too
specific. I'm not sure where to put it. I could make walkdir() NOT do
it, and instead do it in the caller with the same comment. Thoughts?

> Hmm ... Actually, since surely we must follow symlinks everywhere,
> why do we have to do this separately for pg_tblspc? Shouldn't that
> link-following occur automatically when walking PGDATA in the first
> place?

I'm not sure about that (and that's why I've not attached an updated
patch here). The original idea was to follow only those links that we
expect to be in PGDATA.

I suppose it would be easier in terms of the code to follow all links,
but I don't know if it's the right thing. If that's what you think we
should do, I can post a simplified patch.

> Maybe fsync_recursively should Assert() that it's only being called
> with enableFsync=on; or perhaps we can have it return early if it's
> unset.

I prefer the latter. Will change.

-- Abhijit


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers