Re: [HACKERS] Parallel Seq Scan

2015-08-03 Thread Amit Kapila
On Sun, Aug 2, 2015 at 8:06 AM, Kouhei Kaigai  wrote:
>
> Amit,
>
> Let me ask three more detailed questions.
>
> Why does Funnel have a valid qual of the subplan?
> The 2nd argument of make_funnel() is the qualifier of the subplan
> (PartialSeqScan); it is initialized at ExecInitFunnel,
> but never executed at run time. Why does the Funnel node have a
> useless qualifier expression here (even though it is harmless)?
>

The idea is that if in some case the qualification can't be
pushed down (consider a qualification containing parallel-restricted
functions, i.e. functions that can only be executed in the master
backend) and therefore must be executed only in the master backend,
then we need it in the Funnel node, so that it can be executed for
the tuples passed up by worker backends.  It is currently not used,
but I think we should retain it as it is, because it can be used in
some cases either as part of this patch itself or in the future.  As
of now, it is also used in other places in the patch (such as during
Explain); we might want to optimize that, but overall I think it is
required.

> Why is Funnel derived from Scan? Even though it constructs
> a target-list compatible with the underlying partial-scan node,
> that does not require the node to also be derived from Scan.

It needs its own target-list for the reason mentioned above for the
qual, and for another reason as well: the same is required for
FunnelState, which in turn is required for the ScanSlot used to
retrieve tuples from workers.  Also, it is not exactly the same as
the PartialSeqScan's, because when the PartialSeqScan node is
executed by a worker we modify the targetlist as well;
refer to create_parallel_worker_plannedstmt().

>
> Does ExecFunnel() need to have a special code path to handle
> EvalPlanQual()? Probably it just calls the underlying node in the
> local context. ExecScan() of PartialSeqScan will check its
> qualifier against estate->es_epqTuple[].
>

Isn't EvalPlanQual() called for the ModifyTable node, which won't be
allowed in parallel mode?  So I think EvalPlanQual() handling is not
required in the ExecFunnel path.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] tablecmds.c and lock hierarchy

2015-08-03 Thread Michael Paquier
On Tue, Aug 4, 2015 at 3:35 PM, Simon Riggs  wrote:
> Please provide the link to the discussion of this. I don't see a problem
> here right now that can't be solved by saying

Thread:
http://www.postgresql.org/message-id/cafcns+ox7jvenc_3i54fdq3ibmogmknc2tmevdsmvojbsxg...@mail.gmail.com

Particularly those messages:
http://www.postgresql.org/message-id/20150731022857.gc11...@alap3.anarazel.de
http://www.postgresql.org/message-id/20150731200012.gc2...@postgresql.org
http://www.postgresql.org/message-id/CAB7nPqSK-hSZG7T1tAJ_=HEYsi6P1ejgX2x5LW3LYXJ7=9c...@mail.gmail.com

> Assert(locklevel==ShareUpdateExclusiveLock ||
> locklevel>ShareRowExclusiveLock);

Yep, true as things stand now. But this would get broken if we add a
new lock level between ShareRowExclusiveLock and AccessExclusiveLock
that does not respect the current monotone hierarchy between those.
-- 
Michael




Re: [HACKERS] tablecmds.c and lock hierarchy

2015-08-03 Thread Simon Riggs
On 4 August 2015 at 05:56, Michael Paquier wrote:

> Hi all,
>
> As mentioned in the thread related to lowering locks of autovacuum
> reloptions in ALTER TABLE SET
> (
> http://www.postgresql.org/message-id/cafcns+ox7jvenc_3i54fdq3ibmogmknc2tmevdsmvojbsxg...@mail.gmail.com
> ),
> I have noticed the following code in
> AlterTableGetLockLevel@tablecmds.c:
> /*
>  * Take the greatest lockmode from any subcommand
>  */
> if (cmd_lockmode > lockmode)
> lockmode = cmd_lockmode;
>
> The thing is that, as mentioned by Alvaro and Andres on this thread,
> we have no guarantee that the different relation locks compared have a
> monotone hierarchy, and we may end up taking a lock that does not
> behave as intended. We are lucky enough for now that ALTER TABLE
> only uses ShareUpdateExclusiveLock, ShareRowExclusiveLock and
> AccessExclusiveLock, which do have a hierarchy, so this is not a
> problem yet.
> However it may become a problem if we add more lock modes in the
> future that are used by ALTER TABLE.
>

Please provide the link to the discussion of this. I don't see a problem
here right now that can't be solved by saying

Assert(locklevel==ShareUpdateExclusiveLock ||
locklevel>ShareRowExclusiveLock);

-- 
Simon Riggs                http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-08-03 Thread Michael Paquier
On Tue, Aug 4, 2015 at 3:27 PM, Beena Emerson  wrote:
> Michael Paquier wrote:
>> And this is the case for any format. String format validation
>> for a GUC occurs when the server is reloaded or restarted; one advantage
>> of JSON is that the parser/validator is already there, so we don't need
>> to reinvent new machinery for that.
>
> IIUC, we would also have to add code to check that the given JSON
> has the required keys and entries. For example, the "group" mentioned
> in "s_s_names" should be defined in the "groups" section, etc.

Yep, true as well.
-- 
Michael




Re: [HACKERS] tablecmds.c and lock hierarchy

2015-08-03 Thread Michael Paquier
On Tue, Aug 4, 2015 at 2:43 PM, Alvaro Herrera  wrote:
> Michael Paquier wrote:
>
>> As mentioned in the thread related to lowering locks of autovacuum
>> reloptions in ALTER TABLE SET
>> (http://www.postgresql.org/message-id/cafcns+ox7jvenc_3i54fdq3ibmogmknc2tmevdsmvojbsxg...@mail.gmail.com),
>> I have noticed the following code in
>> AlterTableGetLockLevel@tablecmds.c:
>> /*
>>  * Take the greatest lockmode from any subcommand
>>  */
>> if (cmd_lockmode > lockmode)
>> lockmode = cmd_lockmode;
>>
>> The thing is that, as mentioned by Alvaro and Andres on this thread,
>> we have no guarantee that the different relation locks compared have a
>> monotone hierarchy, and we may end up taking a lock that does not
>> behave as intended.
>
> Maybe the solution to this is to add the concept of "addition" of two
> lock modes, where the result is another lock mode that conflicts with
> any lock that would conflict with either of the two operand lock modes.
> For instance, if you add a lock mode to itself, you get the same lock
> mode; if you "add" anything to AccessExclusive, you get AccessExclusive;
> if you "add" anything to AccessShare, you end up with that other lock
> mode.  The most interesting case in our current lock table is if you
> "add" ShareUpdateExclusive to Share, where you end up with
> ShareRowExclusive.  In essence, holding the result of that operation is
> the same as holding both locks, which AFAICS is the behavior we want.

That's commutative: it basically looks at the conflict table to get
the union of the bits indicating all the locks conflicting with lock
A and with lock B, and then we select the lock in the table whose
conflict set covers that whole union while conflicting with as few
locks as possible.

Now, let's take for example this case with locks A, B, C, D:
- Lock A conflicts with ACD
- B with BCD
- C with itself
- D with itself
What would you choose as a result of add(C,D)? A or B? Or the super
lock conflicting with all of them?
-- 
Michael




Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-08-03 Thread Beena Emerson
Michael Paquier wrote:
> And this is the case for any format. String format validation
> for a GUC occurs when the server is reloaded or restarted; one advantage
> of JSON is that the parser/validator is already there, so we don't need
> to reinvent new machinery for that.

IIUC, we would also have to add code to check that the given JSON
has the required keys and entries. For example, the "group" mentioned
in "s_s_names" should be defined in the "groups" section, etc.





-
Beena Emerson





Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-08-03 Thread Michael Paquier
On Tue, Aug 4, 2015 at 2:57 PM, Masahiko Sawada wrote:
> On Thu, Jul 30, 2015 at 2:16 PM, Beena Emerson wrote:
>> Since there will not be much nesting and grouping, I still prefer a new
>> language to JSON.
>> I understand one can easily modify/add groups in JSON using built-in
>> functions, but I think changes will not be done too often.
>>
>
> If we decided to use a dedicated language, a syntax checker for that
> language would be needed, exposed via SQL or something similar.

Well, sure, both approaches have downsides.

> Otherwise we will not be able to know whether parsing that value
> will succeed until the server is reloaded or restarted.

And this is the case for any format. String format validation
for a GUC occurs when the server is reloaded or restarted; one advantage
of JSON is that the parser/validator is already there, so we don't need
to reinvent new machinery for that.
-- 
Michael




Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-08-03 Thread Masahiko Sawada
On Thu, Jul 30, 2015 at 2:16 PM, Beena Emerson  wrote:
> Hello,
>
> Just looking at how the 2 different methods can be used to set the s_s_names
> value.
>
> 1. For a simple case where quorum is required for a single group the JSON
> could be:
>
>  {
>  "sync_standby_names":
>  {
>  "quorum":2,
>  "nodes":
>  [ "node1","node2","node3" ]
>  }
>  }
>
> or
>
>  {
>  "sync_standby_names":
>  {
>  "quorum":2,
>  "group": "cluster1"
>  },
>  "groups":
>  {
>  "cluster1":["node1","node2","node3"]
>  }
>  }
>
> Language:
> 2(node1, node2, node3)
>
>
> 2. For having quorum between different groups and node:
>  {
>  "sync_standby_names":
>  {
>  "quorum":2,
>  "nodes":
> [
> {"priority":1,"nodes":["node0"]},
> {"quorum":2,"group": "cluster1"}
> ]
>  },
>  "groups":
>  {
>  "cluster1":["node1","node2","node3"]
>  }
>  }
>
> or
>  {
>  "sync_standby_names":
>  {
>  "quorum":2,
>  "nodes":
> [
> {"priority":1,"group": "cluster2"},
> {"quorum":2,"group": "cluster1"}
> ]
>  },
>  "groups":
>  {
>  "cluster1":["node1","node2","node3"],
>  "cluster2":["node0"]
>  }
>  }
>
> Language:
> 2 (node0, cluster1: 2(node1, node2, node3))
>
> Since there will not be much nesting and grouping, I still prefer a new
> language to JSON.
> I understand one can easily modify/add groups in JSON using built-in
> functions, but I think changes will not be done too often.
>

If we decided to use a dedicated language, a syntax checker for that
language would be needed, exposed via SQL or something similar.
Otherwise we will not be able to know whether parsing that value
will succeed until the server is reloaded or restarted.
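
To illustrate what such a syntax checker could look like, here is a toy,
standalone sketch in C (the grammar is only guessed from Beena's examples
upthread; it is not a proposal for the real implementation, which would
also have to verify that referenced group names exist):

/*
 * Toy checker for strings like "2 (node0, cluster1: 2(node1, node2, node3))".
 * Guessed grammar:  spec := count '(' item { ',' item } ')'
 *                   item := spec | name [ ':' spec ]
 */
#include <ctype.h>
#include <stdio.h>

static const char *p;

static void skipws(void) { while (isspace((unsigned char) *p)) p++; }

static int parse_spec(void);

static int
parse_name(void)
{
	skipws();
	if (!isalnum((unsigned char) *p) && *p != '_')
		return 0;
	while (isalnum((unsigned char) *p) || *p == '_')
		p++;
	return 1;
}

static int
parse_item(void)
{
	skipws();
	if (isdigit((unsigned char) *p))
		return parse_spec();
	if (!parse_name())
		return 0;
	skipws();
	if (*p == ':')
	{
		p++;						/* "group: spec" form */
		return parse_spec();
	}
	return 1;
}

static int
parse_spec(void)
{
	skipws();
	if (!isdigit((unsigned char) *p))
		return 0;
	while (isdigit((unsigned char) *p))	/* the quorum/priority count */
		p++;
	skipws();
	if (*p++ != '(')
		return 0;
	do
	{
		if (!parse_item())
			return 0;
		skipws();
	} while (*p == ',' ? (p++, 1) : 0);
	return *p++ == ')';
}

int
main(int argc, char **argv)
{
	const char *input = (argc > 1) ? argv[1]
		: "2 (node0, cluster1: 2(node1, node2, node3))";

	p = input;
	if (parse_spec())
	{
		skipws();
		if (*p == '\0')
		{
			printf("syntax OK: %s\n", input);
			return 0;
		}
	}
	printf("syntax error at offset %d in: %s\n", (int) (p - input), input);
	return 1;
}

The real checker would have to follow whatever grammar we finally agree
on; the point is only that exposing it (e.g. via an SQL function) lets a
value be verified before it goes into the configuration file.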

Regards,

--
Masahiko Sawada




Re: [HACKERS] tablecmds.c and lock hierarchy

2015-08-03 Thread Alvaro Herrera
Michael Paquier wrote:

> As mentioned in the thread related to lowering locks of autovacuum
> reloptions in ALTER TABLE SET
> (http://www.postgresql.org/message-id/cafcns+ox7jvenc_3i54fdq3ibmogmknc2tmevdsmvojbsxg...@mail.gmail.com),
> I have noticed the following code in
> AlterTableGetLockLevel@tablecmds.c:
> /*
>  * Take the greatest lockmode from any subcommand
>  */
> if (cmd_lockmode > lockmode)
> lockmode = cmd_lockmode;
> 
> The thing is that, as mentioned by Alvaro and Andres on this thread,
> we have no guarantee that the different relation locks compared have a
> monotone hierarchy, and we may end up taking a lock that does not
> behave as intended.

Maybe the solution to this is to add the concept of "addition" of two
lock modes, where the result is another lock mode that conflicts with
any lock that would conflict with either of the two operand lock modes.
For instance, if you add a lock mode to itself, you get the same lock
mode; if you "add" anything to AccessExclusive, you get AccessExclusive;
if you "add" anything to AccessShare, you end up with that other lock
mode.  The most interesting case in our current lock table is if you
"add" ShareUpdateExclusive to Share, where you end up with
ShareRowExclusive.  In essence, holding the result of that operation is
the same as holding both locks, which AFAICS is the behavior we want.

This can be implemented with a simple table and solves the problem for
both ALTER TABLE SET and this existing usage you cite and is not as
invasive as your first proposed solution.
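
To illustrate the lookup concretely, here is a toy, self-contained sketch
(the four-mode conflict table below is made up for the example; it is not
the real lock.c table or API): "adding" two modes just means finding the
weakest mode whose conflict set covers the union of the operands' conflict
sets.

/* Toy sketch of lock-mode "addition"; not actual PostgreSQL code. */
#include <stdio.h>

#define NMODES 4				/* hypothetical lock modes 0..3 */
typedef unsigned int lockmask;

/* conflicts[m] = bitmask of the modes that conflict with mode m */
static const lockmask conflicts[NMODES] = {
	0x0,						/* mode 0 conflicts with nothing */
	0x8,						/* mode 1 conflicts with mode 3 */
	0xC,						/* mode 2 conflicts with modes 2,3 */
	0xE							/* mode 3 conflicts with modes 1,2,3 */
};

/*
 * "Add" two lock modes: return the weakest mode whose conflict set
 * covers the union of the operands' conflict sets, so that holding
 * the result behaves like holding both operands.
 */
static int
add_lock_modes(int a, int b)
{
	lockmask	need = conflicts[a] | conflicts[b];
	int			m;

	for (m = 0; m < NMODES; m++)
		if ((conflicts[m] & need) == need)
			return m;
	return NMODES - 1;			/* no single cover: fall back to strongest */
}

int
main(void)
{
	printf("add(1, 2) = %d\n", add_lock_modes(1, 2));	/* prints 2 */
	return 0;
}

The open policy questions are which candidate to prefer when several modes
cover the union, and what to do when none of them does short of the
strongest one.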

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Re: Using quicksort and a merge step to significantly improve on tuplesort's single run "external sort"

2015-08-03 Thread Jeff Janes
On Jul 31, 2015 4:22 AM, "Jeremy Harris"  wrote:
>
> On 30/07/15 02:05, Peter Geoghegan wrote:
> > Since heapification is now a big fraction of the total cost of a sort
> > sometimes, even where the heap invariant need not be maintained for
> > any length of time afterwards, it might be worth revisiting the patch
> > to make that an O(n) rather than a O(log n) operation [3].
>
>  O(n log n) ?
>
> Heapification is O(n) already, whether siftup (existing) or down.

They are both linear on average, but the way we currently do it has an
NlogN worst case, while the other way is linear even in the worst case.

Cheers, Jeff


Re: [HACKERS] Minimum tuple threshold to decide last pass of VACUUM

2015-08-03 Thread Michael Paquier
On Tue, Aug 4, 2015 at 2:04 AM, Simon Riggs  wrote:
> On 3 August 2015 at 17:36, Tom Lane  wrote:
>>
>> Alvaro Herrera  writes:
>> > Simon Riggs wrote:
>> >> * For emergency anti-wraparound VACUUMs we shouldn't scan indexes at
>> >> all,
>> >> since they aren't critical path activities at that point
>>
>> > It is not possible to skip scanning indexes completely, unless no tuples
>> > are to be removed from the heap.
>>
>> Right.
>>
>> > But actually this is an interesting point and I don't think we do this:
>> > if in emergency mode, maybe we shouldn't try to remove any dead tuples
>> > at all, and instead only freeze very old tuples.
>>
>> +1 ... not sure if that's what Simon had in mind exactly, but it seems
>> like a correct statement of what he was getting at.
>
>
> Yes, that's what I was thinking, I just didn't actually say it. I'd been
> thinking about having VACUUM do just Phase 1 for some time, since it's so
> much faster to do that. Will code.

Interesting. I'll be happy to have a look at any patch produced,
that's surely something we want to improve in emergency mode.
-- 
Michael




Re: [HACKERS] pg_rewind tap test unstable

2015-08-03 Thread Michael Paquier
On Mon, Aug 3, 2015 at 5:35 PM, Christoph Berg  wrote:
> Re: Michael Paquier 2015-07-28 
> 
>> On Tue, Jul 28, 2015 at 5:57 PM, Christoph Berg  wrote:
>> > for something between 10% and 20% of the devel builds for 
>> > apt.postgresql.org
>> > (which happen every 6h if there's a git change, so it happens every few 
>> > days),
>> > I'm seeing this:
>> > Dubious, test returned 1 (wstat 256, 0x100)
>> > Failed 1/8 subtests
>> >
>> > I don't have the older logs available, but from memory, the subtest
>> > failing and the two numbers mentioned are always the same.
>>
>> There should be some output logs in src/bin/pg_rewind/tmp_check/log/*?
>> Could you attach them here if you have them? That would be helpful to
>> understand what is happening.
>
> It took me a few attempts to tell the build environment to save a copy
> on failure and not shred everything right away. So here we go:

In test case 2, the failure is that the standby did not have time to
replicate the database "beforepromotion" that was created on the
master. One possible explanation for this failure is that the standby
was promoted before all the WAL needed for the tests had been
replayed, hence we had better make sure that the replay_location of
the standby matches pg_current_xlog_location() before promotion. On
the buildfarm, hamster, the legendary slowest machine of the whole
set, does not complain about that. Is your environment that heavily
loaded when you run the tests?

Perhaps the attached patch helps?
-- 
Michael
diff --git a/src/bin/pg_rewind/RewindTest.pm b/src/bin/pg_rewind/RewindTest.pm
index b66ff0d..fce725f 100644
--- a/src/bin/pg_rewind/RewindTest.pm
+++ b/src/bin/pg_rewind/RewindTest.pm
@@ -232,6 +232,13 @@ sub promote_standby
	# Now run the test-specific parts to run after standby has been started
 	# up standby
 
+	# Wait until the standby has caught up with the primary, by polling
+	# pg_stat_replication.
+	my $caughtup_query =
+"SELECT pg_current_xlog_location() = replay_location FROM pg_stat_replication WHERE application_name = 'rewind_standby';";
+	poll_query_until($caughtup_query, $connstr_master)
+	  or die "Timed out while waiting for standby to catch up";
+
 	# Now promote slave and insert some new data on master, this will put
 	# the master out-of-sync with the standby. Wait until the standby is
 	# out of recovery mode, and is ready to accept read-write connections.



[HACKERS] FSM versus GIN pending list bloat

2015-08-03 Thread Jeff Janes
For a GIN index with fastupdate turned on, both the user backends and
the autoanalyze routine will clear out the pending list, pushing the
entries into the normal index structure and deleting the pages used by
the pending list.  But those deleted pages will not get added to the
free space map until a vacuum is done.  This leads to horrible bloat on
insert-only tables, as they are never vacuumed and so the pending list
space is never reused.  And the pending list is very inefficient in
space usage to start with, even compared to the old-style posting lists
and especially compared to the new compressed ones.  (If the pages were
aggressively recycled, this inefficient use wouldn't be much of a
problem.)

Even on a table receiving mostly updates after its initial population (and
so being vacuumed regularly) with default autovac setting, there is a lot
of bloat.

The attached proof-of-concept patch greatly improves the bloat for both
the insert and the update cases.  You need to turn on both features:
adding the pages to the fsm, and vacuuming the fsm, to get the benefit
(so JJ_GIN=3).  The first of those two things could probably be adopted
for real, but the second probably is not acceptable.  What is the right
way to do this?  Could a variant of RecordFreeIndexPage bubble the free
space up the map immediately rather than waiting for a vacuum?  It would
only have to move up until it found a page with free space already
recorded in it, which the vast majority of the time would mean looking
up one level and then not writing to it, assuming the pending list pages
remain well clustered.

Or would a completely different approach be better, like managing the
vacated pending list pages directly in the index without going to the fsm?
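
For reference, the fsm half of the experiment boils down to roughly the
following at the point where the pending-list pages have just been deleted
(hand-waved sketch, not the actual patch; the deletedBlkno[] bookkeeping is
invented here purely for illustration):

	/* after the pending-list pages have been deleted in ginInsertCleanup() */
	for (i = 0; i < nDeletedPages; i++)
		RecordFreeIndexPage(index, deletedBlkno[i]);	/* JJ_GIN & 1 */

	/*
	 * RecordFreeIndexPage only updates the bottom FSM level, so searches
	 * that start from the FSM root (GetFreeIndexPage) may not see these
	 * pages until the upper levels are updated, hence the second step:
	 */
	FreeSpaceMapVacuum(index);							/* JJ_GIN & 2 */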

Cheers,

Jeff


gin_fast_freespace.patch
Description: Binary data



[HACKERS] tablecmds.c and lock hierarchy

2015-08-03 Thread Michael Paquier
Hi all,

As mentioned in the thread related to lowering locks of autovacuum
reloptions in ALTER TABLE SET
(http://www.postgresql.org/message-id/cafcns+ox7jvenc_3i54fdq3ibmogmknc2tmevdsmvojbsxg...@mail.gmail.com),
I have noticed the following code in
AlterTableGetLockLevel@tablecmds.c:
/*
 * Take the greatest lockmode from any subcommand
 */
if (cmd_lockmode > lockmode)
lockmode = cmd_lockmode;

The thing is that, as mentioned by Alvaro and Andres on this thread,
we have no guarantee that the different relation locks compared have a
monotone hierarchy, and we may end up taking a lock that does not
behave as intended. We are lucky enough for now that ALTER TABLE
only uses ShareUpdateExclusiveLock, ShareRowExclusiveLock and
AccessExclusiveLock, which do have a hierarchy, so this is not a
problem yet.
However it may become a problem if we add more lock modes in the
future that are used by ALTER TABLE.

One idea mentioned in the ALTER TABLE SET thread was to escalate the
lock to AccessExclusiveLock should the locks of any two subcommands
differ. Another idea was to collect all the locks of the subcommands
present and open the relations with all of them. Tracking all those
locks would require some heavy rewriting, for example using a LOCKMASK
to track the locks that need to be taken, and more extended routines
for relation_open. Other possibilities are to add a comment on top of
AlterTableGetLockLevel so that we do not use a lock in ALTER TABLE
commands that may result in a choice conflict with the other ones, to
simply do nothing because committers are usually very careful people,
or to track all the locks taken in AlterTableGetLockLevel and then run
DoLockModesConflict and ERROR if there are incompatible locks.

I am attaching a patch that implements the first approach mentioned to
give a short idea of how things could be improved.
Thoughts?
-- 
Michael
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 970abd4..0bcdd76 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -2807,7 +2807,8 @@ AlterTableInternal(Oid relid, List *cmds, bool recurse)
  * Function is called before and after parsing, so it must give same
  * answer each time it is called. Some subcommands are transformed
  * into other subcommand types, so the transform must never be made to a
- * lower lock level than previously assigned. All transforms are noted below.
+ * lock level different than previously assigned. All transforms are noted
+ * below.
  *
  * Since this is called before we lock the table we cannot use table metadata
  * to influence the type of lock we acquire.
@@ -2837,6 +2838,7 @@ AlterTableGetLockLevel(List *cmds)
 	 */
 	ListCell   *lcmd;
 	LOCKMODE	lockmode = ShareUpdateExclusiveLock;
+	bool		is_init = false;
 
 	foreach(lcmd, cmds)
 	{
@@ -3057,10 +3059,22 @@ AlterTableGetLockLevel(List *cmds)
 		}
 
 		/*
-		 * Take the greatest lockmode from any subcommand
+		 * At the first command evaluated, take the lock corresponding to
+		 * it. When looking at the next commands, check if the lock taken
+		 * is different than the first one taken and enforce it to
+		 * AccessExclusiveLock, which is the only lock superseding all
+		 * the others.
 		 */
-		if (cmd_lockmode > lockmode)
+		if (!is_init)
+		{
 			lockmode = cmd_lockmode;
+			is_init = true;
+		}
+		else
+		{
+			if (cmd_lockmode != lockmode)
+lockmode = AccessExclusiveLock;
+		}
 	}
 
 	return lockmode;



[HACKERS] cost_agg() with AGG_HASHED does not account for startup costs

2015-08-03 Thread David Rowley
While working on allowing the planner to perform GROUP BY before
joining, I've noticed that cost_agg() completely ignores
input_startup_cost when aggstrategy == AGG_HASHED.

I can see at least 3 calls to cost_agg() which pass aggstrategy as
AGG_HASHED and also pass a possibly non-zero startup cost.

The attached patch changes cost_agg() to include the startup cost, but
does nothing about the plans that change in the regression tests.

Is this really intended?

Regards

David Rowley

--
 David Rowley   http://www.2ndQuadrant.com/

 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 7069f60..06870a3 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1639,7 +1639,8 @@ cost_agg(Path *path, PlannerInfo *root,
else
{
/* must be AGG_HASHED */
-   startup_cost = input_total_cost;
+   startup_cost = input_startup_cost;
+   startup_cost += input_total_cost;
startup_cost += aggcosts->transCost.startup;
startup_cost += aggcosts->transCost.per_tuple * input_tuples;
startup_cost += (cpu_operator_cost * numGroupCols) * input_tuples;



Re: [HACKERS] patch: prevent user from setting wal_buffers over 2GB bytes

2015-08-03 Thread Takashi Horikawa
> >> Why does this cause a core dump?  We could consider fixing whatever
> >> the problem is rather than capping the value.
As far as I have experimented in my own evaluation environment, using
PostgreSQL 9.4.4 on x86_64 Linux, this problem can be fixed with the
attached patch.

I have confirmed that applying the patch makes 'wal_buffers = 4GB' work
fine, while the original PostgreSQL 9.4.4 results in a core dump
(segfault). I'd be happy if anyone could reconfirm this.

--
Takashi Horikawa
NEC Corporation
Knowledge Discovery Research Laboratories


> -Original Message-
> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Robert Haas
> Sent: Tuesday, August 04, 2015 2:29 AM
> To: Josh Berkus
> Cc: PostgreSQL-development
> Subject: Re: [HACKERS] patch: prevent user from setting wal_buffers over
> 2GB bytes
>
> On Fri, Jul 31, 2015 at 8:09 PM, Josh Berkus  wrote:
> > On 07/31/2015 10:43 AM, Robert Haas wrote:
> >> On Thu, Jul 30, 2015 at 9:17 PM, Josh Berkus  wrote:
> >>> In guc.c, the maximum for wal_buffers is INT_MAX.  However,
> >>> wal_buffers is actually measured in 8KB buffers, not in bytes.  This
> >>> means that users are able to set wal_buffers > 2GB.  When the
> >>> database is started, this can cause a core dump if the WAL offset is
> > 2GB.
> >>
> >> Why does this cause a core dump?  We could consider fixing whatever
> >> the problem is rather than capping the value.
> >
> > The underlying issue is that byte position in wal_buffers is a 32-bit
> > INT, so as soon as you exceed that, core dump.
>
> OK.  So capping it sounds like the right approach, then.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL
> Company
>
>


allow_wal_buffer_over_2G.patch
Description: Binary data




Re: [HACKERS] Re: [COMMITTERS] pgsql: Map basebackup tablespaces using a tablespace_map file

2015-08-03 Thread Amit Kapila
On Mon, Aug 3, 2015 at 7:44 PM, Fujii Masao  wrote:
>
>
> Thanks! Pushed.
>

Thanks to you as well for committing the patch.

> BTW, while reading the code related to tablespace_map, I found that
> CancelBackup() emits the WARNING message "online backup mode was not
> canceled" when rename() fails. Isn't this confusing (or incorrect)?

Yes, it looks confusing.

> ISTM that we can
> see that the online backup mode has already been canceled if the
> backup_label file is successfully removed, whether the tablespace_map
> file remains or not. No?
>

I think what we should do is display the successful-cancellation message
only when both of the files are renamed.  I have drafted a patch (I still
need to verify/test it; I will do that if you think the fix is in the
right direction) to show what I have in mind.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


fix_rename_backup_and_mapfile_cancel_v1.patch
Description: Binary data



Re: [HACKERS] buffer locking fix for lazy_scan_heap

2015-08-03 Thread Robert Haas
On Mon, Aug 3, 2015 at 12:03 AM, 高增琦  wrote:
> Sorry for asking about this really old commit.
>
> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=7ab9b2f3b79177e501a1ef90ed004cc68788abaf
>
> I could not understand why "releases the lock on the buffer" is
> a problem, since vacuum will lock and check the page bit again before
> setting the vm. Did I miss something?

Hmm, you might be right.  It certainly seems safer to do all the work
without releasing the buffer lock temporarily in the middle, but maybe
the old way wasn't actually broken.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Doubt about AccessExclusiveLock in ALTER TABLE .. SET ( .. );

2015-08-03 Thread Michael Paquier
On Tue, Aug 4, 2015 at 9:12 AM, Fabrízio de Royes Mello wrote:
> Are you sure we need to do all these changes just to check the highest
> lock level based on the reloptions?

Well, looking at the code, that's what it looks like. That's a large
change, and not that straightforward.

> Or have I misunderstood the whole thing?

No I think we are on the same page.

> Maybe we should just check DoLockModesConflict() in the new method
> GetReloptionsLockLevel() and, if true, fall back to AccessExclusiveLock
> (as Michael suggested in a previous email).

Honestly that's what I would suggest for this patch, and also to fix the
existing problem in tablecmds.c that makes the same assumption. It
seems saner to me for now than adding a whole new level of routines
and wrappers, and your patch has IMO great value when each ALTER
TABLE command is issued individually.
-- 
Michael




Re: [HACKERS] Transactions involving multiple postgres foreign servers

2015-08-03 Thread Amit Langote
On 2015-08-03 PM 09:24, Ashutosh Bapat wrote:
> On Sat, Aug 1, 2015 at 12:18 AM, Robert Haas  wrote:
>>
>> OK, sure.  But let's make sure postgres_fdw gets a server-level option
>> to control this.
>>
>>
> For postgres_fdw it's a boolean server-level option 'twophase_compliant'
> (suggestions for name welcome).
> 

How about just 'twophase'?

Thanks,
Amit







Re: [HACKERS] Doubt about AccessExclusiveLock in ALTER TABLE .. SET ( .. );

2015-08-03 Thread Fabrízio de Royes Mello
On Mon, Aug 3, 2015 at 2:15 AM, Michael Paquier wrote:
>
> On Sat, Aug 1, 2015 at 9:20 PM, Andres Freund wrote:
> > On August 1, 2015 2:17:24 PM GMT+02:00, Michael Paquier wrote:
> >>> For instance, if you told me to choose between ShareLock and
> >>> ShareUpdateExclusiveLock I wouldn't know which one is strongest.  I
> >>> don't it's sensible to have the "lock mode compare" primitive
> >>honestly.
> >>> I don't have any great ideas to offer ATM sadly.
> >>
> >>Yes, the thing is that lowering the lock levels is good for
> >>concurrency, but the non-monotony of the lock levels makes it
> >>impossible to choose an intermediate state correctly.
> >
> > How about simply acquiring all the locks individually if they're
> > different types? These few acquisitions won't matter.
>
> As long as this only applies on master, this may be fine... We could
> basically pass a LOCKMASK to the multiple layers of tablecmds.c
> instead of LOCKMODE to track all the locks that need to be taken, and
> all the relations open during operations. Would it be worth having
> some routines like relation_multi_[open|close]() as well? Like that:
> Relation[] relation_multi_open(relation Oid, LOCKMASK):
> void relation_multi_close(Relation[]);
>
> If we do something, we may consider patching 9.5 as well; it seems to
> me that tablecmds.c is broken by assuming that the lock hierarchy is
> monotone in AlterTableGetLockLevel().
>

Hey guys,

Are you sure we need to do all these changes just to check the highest
lock level based on the reloptions? Or have I misunderstood the whole thing?

Maybe we should just check DoLockModesConflict() in the new method
GetReloptionsLockLevel() and, if true, fall back to AccessExclusiveLock
(as Michael suggested in a previous email).


Regards,


*** This work is funded by Zenvia Mobile Results (http://www.zenvia.com.br)

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello


Re: [HACKERS] Arguable RLS security bug, EvalPlanQual() paranoia

2015-08-03 Thread Peter Geoghegan
On Mon, Aug 3, 2015 at 3:07 PM, Stephen Frost  wrote:
> Thoughts?  Trying to keep it straight-forward and provide a simple
> solution for users to be able to address the issue, if they're worried
> about it.  Perhaps this, plus an additional paragraph which goes into
> more detail about exactly what's going on?

I'm still thinking about it, but I think you have the right idea here.

However, as the docs put it when talking about conventional access
controls and SELECT: "The use of FOR UPDATE or FOR SHARE requires
UPDATE privilege as well [as SELECT privilege] (for at least one
column of each table so selected)". I wonder if RLS needs to consider
this, too.

If you can just say that you don't have to worry about this when the
user has no right to UPDATE or DELETE the rows in the first place,
that makes it more practical to manage the issue. ISTM that as things
stand, you can't say that because RLS does not consider SELECT FOR
UPDATE special in any way.

-- 
Peter Geoghegan




Re: [HACKERS] Arguable RLS security bug, EvalPlanQual() paranoia

2015-08-03 Thread Stephen Frost
* Peter Geoghegan (p...@heroku.com) wrote:
> On Mon, Jun 1, 2015 at 12:29 AM, Peter Geoghegan  wrote:
> > If you're using another well known MVCC database system that has RLS,
> > I imagine when this happens the attacker similarly waits on the
> > conflicting (privileged) xact to finish (in my example in the patch,
> > Bob's xact). However, unlike with the Postgres READ COMMITTED mode,
> > Mallory would then have her malicious UPDATE statement entirely rolled
> > back, and her statement would acquire an entirely new MVCC snapshot,
> > to be used by the USING security barrier qual (and everything else)
> > from scratch. This other system would then re-run her UPDATE with the
> > new MVCC snapshot. This would repeat until Mallory's UPDATE statement
> > completes without encountering any concurrent UPDATEs/DELETEs to her
> > would-be affected rows.
> >
> > In general, with this other database system, an UPDATE must run to
> > completion without violating MVCC, even in READ COMMITTED mode. For
> > that reason, I think we can take no comfort from the presumption that
> > this flexibility in USING security barrier quals (allowing subqueries,
> > etc) works securely in this other system. (I actually didn't check
> > this out, but I imagine it's true).
> 
> Where are we on this?
> 
> Discussion during pgCon with Heikki and Andres led me to believe that
> the issue is acceptable. The issue can be documented to help ensure
> that user expectation is in line with actual user-visible behavior.
> Unfortunately, I think that that will be a clunky documentation patch.

It's important to realize that this is an issue beyond RLS and that it
impacts Security Barrier Views also.

One idea which I have for making the documentation patch a bit less
clunky is to provide a simple way for users to address the issue.  Along
those lines, here's what I'd suggest (certainly open for discussion):

---
When reducing the set of rows which a user has access to, through
modifications to relations referenced by Row-Level Security Policies or
Security Barrier Views, be aware that users with a currently open
transaction might have a lock on an existing row and be able to see that
row after the change.  The best approach to avoid any possible leak of
information during a reduction of a user's rights is to ensure that the
user does not have any open transactions, perhaps by ensuring they have
no concurrent sessions running.
---

Thoughts?  Trying to keep it straight-forward and provide a simple
solution for users to be able to address the issue, if they're worried
about it.  Perhaps this, plus an additional paragraph which goes into
more detail about exactly what's going on?

Thanks!

Stephen




Re: [HACKERS] Using quicksort and a merge step to significantly improve on tuplesort's single run "external sort"

2015-08-03 Thread Peter Geoghegan
On Mon, Aug 3, 2015 at 1:36 PM, Robert Haas  wrote:
> I don't think that's what Heikki is talking about.  He can correct me
> if I'm wrong, but what I think he's saying is that we should try to
> exploit the fact that we've already determined which in-memory tuples
> can be part of the current run and which in-memory tuples must become
> part of the next run.  Suppose half the tuples in memory can become
> part of the current run and the other half must become part of the
> next run.  If we throw all of those tuples into a single bucket and
> quicksort it, we're losing the benefit of the comparisons we did to
> figure out which tuples go in which runs.  Instead, we could quicksort
> the current-run tuples and, separately, quick-sort the next-run
> tuples.  Ignore the current-run tuples completely until the tape is
> empty, and then merge them with any next-run tuples that remain.

Oh. Well, the benefit of "the comparisons we did to figure out which
tuples go in which runs" is that we can determine the applicability of
this optimization. Also, by keeping run 1 (if any) usually in memory,
and run 0 partially on disk we avoid having to worry about run 1 as a
thing that spoils the optimization (in the current "single run
optimization", dumping all tuples can make us "acknowledge" run 1
(i.e. currentRun++), preventing single run optimization, which we
handily avoid in the patch). Finally, it saves us a bunch of real
COMPARETUP() comparisons as HEAPCOMPARE() is called as tuples are
inserted into the still-heapified memtuples array.

> I'm not sure if there's any reason to believe that would be faster
> than your approach.  In general, sorting is O(n lg n) so sorting two
> arrays that are each half as large figures to be slightly faster than
> sorting one big array.  But the difference may not amount to much.

IMV, the smart way of avoiding wasting "the comparisons we did to
figure out which tuples go in which runs" is to rig HEAPCOMPARE() to
only do a COMPARETUP() for the currentRun, and make sure that we don't
mess up and forget that if we don't end up quicksorting.

The second run that is in memory can only consist of whatever tuples
were added after heapification that were less than any of those
previously observed tuples (a big majority, usually). So like you, I
can't see any of these techniques helping much, even my "smart"
technique. Maybe I should look at a case involving text or something
to be sure.

Thinking about it some more, I don't think it would be easy to
maintain a clear separation between run 0 and run 1 in the memtuples
array in terms of a cutoff point. It's still a heap at that stage, of
course. You'd have to rig each tuple comparator so that COMPARETUP()
cared about tupindex before comparing datum1 just for this, which
seems rather unappealing.

-- 
Peter Geoghegan




Re: [HACKERS] Minimum tuple threshold to decide last pass of VACUUM

2015-08-03 Thread Jim Nasby

On 8/3/15 12:04 PM, Simon Riggs wrote:

Yes, that's what I was thinking, I just didn't actually say it. I'd been
thinking about having VACUUM do just Phase 1 for some time, since it's so
much faster to do that. Will code.


I'd like to see that exposed as an option as well. There are certain 
situations where you'd really like to just freeze things as fast as 
possible, without waiting for a full vacuum.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Using quicksort and a merge step to significantly improve on tuplesort's single run "external sort"

2015-08-03 Thread Robert Haas
On Mon, Aug 3, 2015 at 3:33 PM, Peter Geoghegan  wrote:
>> When it's time to drain the heap, in performsort, divide the array into two
>> arrays, based on the run number of each tuple, and then quicksort the arrays
>> separately. The first array becomes the in-memory tail of the current tape,
>> and the second array becomes the in-memory tail of the next tape.
>>
>> You wouldn't want to actually allocate two arrays and copy SortTuples
>> around, but keep using the single large array, just logically divided into
>> two. So the bookkeeping isn't trivial, but seems doable.
>
> You're talking about a new thing here, that happens when it is
> necessary to dump everything and do a conventional merging of on-tape
> runs. IOW, we cannot fit a significant fraction of overall tuples in
> memory, and we need much of the memtuples array for the next run
> (usually this ends as a TSS_FINALMERGE). That being the case
> (...right?),

I don't think that's what Heikki is talking about.  He can correct me
if I'm wrong, but what I think he's saying is that we should try to
exploit the fact that we've already determined which in-memory tuples
can be part of the current run and which in-memory tuples must become
part of the next run.  Suppose half the tuples in memory can become
part of the current run and the other half must become part of the
next run.  If we throw all of those tuples into a single bucket and
quicksort it, we're losing the benefit of the comparisons we did to
figure out which tuples go in which runs.  Instead, we could quicksort
the current-run tuples and, separately, quick-sort the next-run
tuples.  Ignore the current-run tuples completely until the tape is
empty, and then merge them with any next-run tuples that remain.

I'm not sure if there's any reason to believe that would be faster
than your approach.  In general, sorting is O(n lg n) so sorting two
arrays that are each half as large figures to be slightly faster than
sorting one big array.  But the difference may not amount to much.
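
(Back of the envelope: quicksorting two halves costs about
2 * (n/2) * lg(n/2) = n lg n - n comparisons versus n lg n for the single
array, so the saving is on the order of n comparisons -- only a few
percent once n is in the millions.)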

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Re: Using quicksort and a merge step to significantly improve on tuplesort's single run "external sort"

2015-08-03 Thread Robert Haas
On Sat, Aug 1, 2015 at 9:56 AM, Andres Freund  wrote:
> On 2015-07-31 13:31:54 -0400, Robert Haas wrote:
>> On Fri, Jul 31, 2015 at 7:21 AM, Jeremy Harris  wrote:
>> > Heapification is O(n) already, whether siftup (existing) or down.
>>
>> That's not my impression, or what Wikipedia says.  Source?
>
> Building a binary heap via successive insertions is O(n log n), but
> building it directly is O(n). Looks like wikipedia agrees too
> https://en.wikipedia.org/wiki/Binary_heap#Building_a_heap

That doesn't really address the sift-up vs. sift-down question.  Maybe
I'm just confused about the terminology.  I think that Wikipedia
article is saying that if you iterate from the middle element of an
unsorted array towards the beginning, establishing the heap invariant
for every item as you reach it, you will take only O(n) time.  But
that is not what inittapes() does.  It instead starts at the beginning
of the array and inserts each element one after the other.  If this is
any different from building the heap via successive insertions, I
don't understand how.
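
For reference, the distinction I understand the Wikipedia article to be
drawing is between the following two ways of building a heap (a generic,
self-contained illustration over an int max-heap; this is not tuplesort.c
code):

/* Two heap-building strategies: bottom-up (Floyd) vs. successive insertion. */
#include <stdio.h>

static void
sift_down(int *a, int n, int i)
{
	for (;;)
	{
		int		largest = i,
				l = 2 * i + 1,
				r = 2 * i + 2;

		if (l < n && a[l] > a[largest]) largest = l;
		if (r < n && a[r] > a[largest]) largest = r;
		if (largest == i)
			break;
		int tmp = a[i]; a[i] = a[largest]; a[largest] = tmp;
		i = largest;
	}
}

/* O(n) worst case: start at the last internal node and sift down. */
static void
heapify_bottom_up(int *a, int n)
{
	for (int i = n / 2 - 1; i >= 0; i--)
		sift_down(a, n, i);
}

/* O(n log n) worst case: insert elements one by one, sifting up. */
static void
heapify_by_insertion(int *a, int n)
{
	for (int i = 1; i < n; i++)
	{
		int		j = i;

		while (j > 0 && a[(j - 1) / 2] < a[j])
		{
			int tmp = a[j]; a[j] = a[(j - 1) / 2]; a[(j - 1) / 2] = tmp;
			j = (j - 1) / 2;
		}
	}
}

int
main(void)
{
	int		a[] = {3, 9, 1, 7, 5, 8, 2}, b[] = {3, 9, 1, 7, 5, 8, 2};

	heapify_bottom_up(a, 7);
	heapify_by_insertion(b, 7);
	printf("bottom-up root: %d, insertion root: %d\n", a[0], b[0]);
	return 0;
}

The O(n) bound applies to the bottom-up form; inserting the elements one
after the other, which is what inittapes() effectively does, has the
O(n log n) worst case.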

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Re: Using quicksort and a merge step to significantly improve on tuplesort's single run "external sort"

2015-08-03 Thread Robert Haas
On Sat, Aug 1, 2015 at 9:49 AM, Jeremy Harris  wrote:
> On 31/07/15 18:31, Robert Haas wrote:
>> On Fri, Jul 31, 2015 at 7:21 AM, Jeremy Harris  wrote:
>>> Heapification is O(n) already, whether siftup (existing) or down.
>>
>> That's not my impression, or what Wikipedia says.  Source?
>
> Measurements done last year:
>
> http://www.postgresql.org/message-id/52f35462.3030...@wizmail.org
> (spreadsheet attachment)
>
> http://www.postgresql.org/message-id/52f40ce9.1070...@wizmail.org
> (measurement procedure and spreadsheet explanation)

I don't think that running benchmarks is the right way to establish
the asymptotic runtime of an algorithm.  I mean, if you test
quicksort, it will run in much less than O(n^2) time on almost any
input.  But that does not mean that the worst-case run time is
anything other than O(n^2).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Using quicksort and a merge step to significantly improve on tuplesort's single run "external sort"

2015-08-03 Thread Peter Geoghegan
On Fri, Jul 31, 2015 at 12:59 AM, Heikki Linnakangas  wrote:
> Oh, ok, I was confused on how the heap works. You could still abstract this
> as "in-memory tails" of the tapes, but it's more complicated than I thought
> at first:
>
> When it's time to drain the heap, in performsort, divide the array into two
> arrays, based on the run number of each tuple, and then quicksort the arrays
> separately. The first array becomes the in-memory tail of the current tape,
> and the second array becomes the in-memory tail of the next tape.
>
> You wouldn't want to actually allocate two arrays and copy SortTuples
> around, but keep using the single large array, just logically divided into
> two. So the bookkeeping isn't trivial, but seems doable.

Since you're talking about the case where we must drain all tuples
within tuplesort_performsort(), I think you're talking about a
distinct idea here (since surely you're not suggesting giving up on my
idea of avoiding dumping most tuples, which is a key strength of the
patch). That's fine, but I just want to be clear on that. Importantly,
my optimization does not care about the fact that the array may have
two runs in it, because it quicksorts the array and forgets about it
being a heap. It only cares whether or not more than one run made it
out to tape, which makes it more widely applicable than it would
otherwise be. Also, the fact that much I/O can be avoided is clearly
something that can only happen when work_mem is at least ~50% of a
work_mem setting that would have resulted in an (entirely) internal
sort.

You're talking about a new thing here, that happens when it is
necessary to dump everything and do a conventional merging of on-tape
runs. IOW, we cannot fit a significant fraction of overall tuples in
memory, and we need much of the memtuples array for the next run
(usually this ends as a TSS_FINALMERGE). That being the case
(...right?), I'm confused that you're talking about doing something
clever within tuplesort_performsort(). In the case you're targeting,
won't the vast majority of tuple dumping (through calls to
dumptuples()) occur within puttuple_common()?

I think that my optimization should probably retain its own
state.status even if we do this (i.e. TSS_MEMTAPEMERGE should stay).

> Hmm, I can see another possible optimization here, in the way the heap is
> managed in TSS_BUILDRUNS state. Instead of keeping a single heap, with
> tupindex as the leading key, it would be more cache efficient to keep one
> heap for the currentRun, and an unsorted array of tuples belonging to
> currentRun + 1. When the heap becomes empty, and currentRun is incemented,
> quicksort the unsorted array to become the new heap.
>
> That's a completely separate idea from your patch, although if you did it
> that way, you wouldn't need the extra step to divide the large array into
> two, as you'd maintain that division all the time.

This sounds to me like a refinement of your first idea (the idea that
I just wrote about).

I think the biggest problem with tuplesort after this patch of mine is
committed is that it is still too attached to the idea of
incrementally spilling and sifting. It makes sense to some degree
where it makes my patch possible...if we hang on to the idea of
incrementally spilling tuples on to tape in sorted order for a while,
then maybe we can hang on for long enough to quicksort most tuples,
*and* to avoid actually dumping most tuples (incremental spills make
the latter possible). But when work_mem is only, say, 10% of the
setting required for a fully internal sort, then the incremental
spilling and sifting starts to look dubious, at least to me, because
the TSS_MEMTAPEMERGE optimization added by my patch could not possibly
apply, and dumping and merging many times is inevitable.

What I think you're getting at here is that we still have a heap, but
we don't use the heap to distinguish between tuples within a run. In
other words, HEAPCOMPARE() often/always only cares about run number.
We quicksort after a deferred period of time, probably just before
dumping everything. Perhaps I've misunderstood, but I don't see much
point in quicksorting a run before being sure that you're sorting as
opposed to heapifying at that point (you're not clear on what we've
decided on once we quicksort). I think it could make sense to make
HEAPCOMPARE() not care about tuples within a run that is not
currentRun, though.

I think that anything that gives up on replacement selection's ability
to generate large runs, particularly for already sorted inputs, will be
too hard a sell (not that I think that's what you proposed). That's
way, way less of an advantage than it was in the past (back when
external sorts took place using actual magnetic tape, it was huge),
but the fact remains that it is an advantage. And so, I've been
prototyping an approach where we don't heapify once it is established
that this TSS_MEMTAPEMERGE optimization of mine cannot possibly apply.
We quicksort in batches r

Re: [HACKERS] RLS restrictive hook policies

2015-08-03 Thread Stephen Frost
Dean,

* Dean Rasheed (dean.a.rash...@gmail.com) wrote:
> On 3 August 2015 at 16:09, Stephen Frost  wrote:
> > * Stephen Frost (sfr...@snowman.net) wrote:
> >> Agreed.  I'm happy to commit that change and back-patch it to 9.5,
> >> barring objections.  Given that the only way to have restrictive
> >> policies currently is using hooks, I don't believe there's any
> >> documentation update required either; of course, I'll update the comment
> >> to reflect this discussion/decision and make any necessary changes to
> >> test_rls_hooks.
> >
> > Patch attached to make this change, with appropriate comment updates and
> > fixes to the test_rls_hooks modules.
> >
> > Comments?
> >
> 
> Looks good to me.

Thanks!  Pushed to master and 9.5.

Stephen




Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c

2015-08-03 Thread Tom Lane
Piotr Stefaniak  writes:
> How about this one?
> 1 ERROR:  could not find RelOptInfo for given relids

That would be a bug, for sure ...

> It's triggered on 13bba02271dce865cd20b6f49224889c73fed4e7 by this query 
> and the attached one:

... but I can't reproduce it on HEAD with either of these queries.
Not clear why you're getting different results.

regards, tom lane




Re: [HACKERS] pageinspect patch, for showing tuple data

2015-08-03 Thread Alvaro Herrera
Nikolay Shaplov wrote:

> This patch adds several new functions, available from SQL queries. All these
> functions are based on heap_page_items, but accept slightly different
> arguments and have one additional column in the result set:
>
> heap_page_tuples - accepts a relation name and a bulkno, and returns the
> usual heap_page_items set with an additional column that contains a snapshot
> of the tuple data area, stored in bytea.

I think the API around get_raw_page is more useful generally.  You might
have table blocks stored in a bytea column, for instance, because you
extracted them from some remote server and inserted them into a local
table for examination.  If you only accept relname/blkno, there's no way
to examine data other than blocks directly in the database directory,
which is limiting.

> There is also one strange function, _heap_page_items; it is useless for
> practical purposes. Like heap_page_items, it accepts page data as bytea,
> but it also gets a relation name. It tries to apply the tuple descriptor
> of that relation to that page data.

For BRIN, I added something similar (brin_page_items), but it receives
the index OID rather than name, and constructs a tuple descriptor to
read the data.  I think OID is better than name generally.  (You can
cast the relation name to regclass).

It's easy to misuse, but these functions are superuser-only, so there
should be no security issue at least.  The possibility of a crash
remains real, though, so maybe we should try to come up with a way to
harden that.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] ON CONFLICT DO UPDATE using EXCLUDED.column gives an error about mismatched types

2015-08-03 Thread Peter Geoghegan
On Mon, Aug 3, 2015 at 8:53 AM, Geoff Winkless  wrote:
> If I create a copy of the table using
>
> CREATE mytab (LIKE brokentab INCLUDING ALL);
> INSERT INTO mytab SELECT * FROM brokentab;

Also, did you drop any columns from the original "brokentab" table
where the bug can be reproduced?

-- 
Peter Geoghegan




Re: [HACKERS] ON CONFLICT DO UPDATE using EXCLUDED.column gives an error about mismatched types

2015-08-03 Thread Peter Geoghegan
On Mon, Aug 3, 2015 at 8:53 AM, Geoff Winkless  wrote:
> the new table does not exhibit the same problem (so I'm assuming it's not
> easily reproducible and giving you a creation script isn't going to help).
>
> VACUUM FULL on the table makes no difference.
>
> Is there anything you guys can suggest that I can do to help narrow down the
> problem?

Yes. You should enable all of the following settings:

debug_print_parse (boolean)
debug_print_rewritten (boolean)
debug_print_plan (boolean)
debug_pretty_print (boolean)

which you can do dynamically from psql.
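
For example, something like this before running the failing statement (the
parse/rewritten/plan trees then show up in the server log):

SET debug_print_parse = on;
SET debug_print_rewritten = on;
SET debug_print_plan = on;
SET debug_pretty_print = on;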

Then post the output for both the failing version (on that table where
you can reproduce the issue), and the other table where it seems that
you ought to be able to reproduce the problem, but you can't. Maybe
that'll show where the problem is.

Thanks
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minimum tuple threshold to decide last pass of VACUUM

2015-08-03 Thread Robert Haas
On Mon, Aug 3, 2015 at 4:13 AM, Simon Riggs  wrote:
> * For normal VACUUMs we should scan indexes only if (num_dead_tuples * 20) >
> (blocks to be scanned in any one index), which allows some index bloat but
> not much

I think this kind of heuristic is good, but I think we should expose a
setting for it.  There's no way for us to know without testing whether
the right value for that multiplier is 2 or 20 or 200 or 2000, and if
we don't make it easy to tweak, we'll never find out.  It may even be
workload-dependent.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minimum tuple threshold to decide last pass of VACUUM

2015-08-03 Thread Robert Haas
On Mon, Aug 3, 2015 at 11:52 AM, Alvaro Herrera
 wrote:
> Simon Riggs wrote:
>> * For emergency anti-wraparound VACUUMs we shouldn't scan indexes at all,
>> since they aren't critical path activities at that point
>
> It is not possible to skip scanning indexes completely, unless no tuples
> are to be removed from the heap.  Otherwise, index tuples become
> lingering pointers (and when such heap addresses are later refilled, they
> cause corrupted index scans).

Well, if we skip the index scans, we can't do the second heap pass
either, but that's OK.  I think we're all talking about the same thing
here, which is to do only the first heap pass in some cases.  That
will prune dead tuples to line pointers, freeze old XIDs, and mark
pages all-visible where appropriate.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: prevent user from setting wal_buffers over 2GB bytes

2015-08-03 Thread Robert Haas
On Fri, Jul 31, 2015 at 8:09 PM, Josh Berkus  wrote:
> On 07/31/2015 10:43 AM, Robert Haas wrote:
>> On Thu, Jul 30, 2015 at 9:17 PM, Josh Berkus  wrote:
>>> In guc.c, the maximum for wal_buffers is INT_MAX.  However, wal_buffers
>>> is actually measured in 8KB buffers, not in bytes.  This means that
>>> users are able to set wal_buffers > 2GB.  When the database is started,
>>> this can cause a core dump if the WAL offset is > 2GB.
>>
>> Why does this cause a core dump?  We could consider fixing whatever
>> the problem is rather than capping the value.
>
> The underlying issue is that byte position in wal_buffers is a 32-bit
> INT, so as soon as you exceed that, core dump.

OK.  So capping it sounds like the right approach, then.
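
For reference, with the default 8kB WAL block size the 2GB boundary works out
to 262144 buffers, and guc.c currently still accepts settings beyond that,
e.g. (takes effect only after a restart):

ALTER SYSTEM SET wal_buffers = '4GB';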

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] BRIN trivial tweaks

2015-08-03 Thread Kevin Grittner
I happened to notice two declarations of functions for BRIN that
are not actually defined, and a comment that looked like it was
incompletely edited.  Attached patch is a suggestion, but I leave
it to those working on the feature to commit, since there might be
a reason I'm missing for the declarations and my wording might not
be ideal.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



brin-unused-declarations.diff
Description: invalid/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minimum tuple threshold to decide last pass of VACUUM

2015-08-03 Thread Simon Riggs
On 3 August 2015 at 17:36, Tom Lane  wrote:

> Alvaro Herrera  writes:
> > Simon Riggs wrote:
> >> * For emergency anti-wraparound VACUUMs we shouldn't scan indexes at
> all,
> >> since they aren't critical path activities at that point
>
> > It is not possible to skip scanning indexes completely, unless no tuples
> > are to be removed from the heap.
>
> Right.
>
> > But actually this is an interesting point and I don't think we do this:
> > if in emergency mode, maybe we shouldn't try to remove any dead tuples
> > at all, and instead only freeze very old tuples.
>
> +1 ... not sure if that's what Simon had in mind exactly, but it seems
> like a correct statement of what he was getting at.
>

Yes, that's what I was thinking, I just didn't actually say it. I'd been
thinking about having VACUUM do just Phase 1 for some time, since it's so
much faster to do that. Will code.

-- 
Simon Riggs                http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] nodes/*funcs.c inconsistencies

2015-08-03 Thread Joe Conway

On 08/03/2015 09:55 AM, Stephen Frost wrote:
> * Noah Misch (n...@leadboat.com) wrote:
>> On Sun, Aug 02, 2015 at 11:31:16PM -0400, Tom Lane wrote:
>>> That being the case, it would probably be a good idea to get
>>> them done before alpha2, as there may not be a good opportunity
>>> afterwards.
>> 
>> Freedom to bump catversion after alpha2 will be
>> barely-distinguishable from freedom to do so now.  I have planned
>> to leave my usual comment period of a few days, though skipping
>> that would be rather innocuous in this case.
> 
> This is clearly necessary, of course, and I wouldn't be surprised
> if we have another necessary bump post-alpha2, but at the same
> time, it'd certainly be nice if we are able to put out alpha2 and
> then a beta1 in the future without needing to do a bump.
> 
> In other words, +1 from me for going ahead and committing this for 
> alpha2, if possible.

+1


-- 
Joe Conway


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] nodes/*funcs.c inconsistencies

2015-08-03 Thread Alvaro Herrera
Peter Geoghegan wrote:

> Couldn't we adopt
> AssertVariableIsOfType()/AssertVariableIsOfTypeMacro() to macros like
> READ_UINT_FIELD()?
> 
> I'm surprised that this stuff was only ever used for logical decoding
> infrastructure so far.

The reason it's only used there is that Andres is the one who introduced
those macros precisely for that code.  We've not yet had time to adjust 
the rest of the code to have more sanity checks.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] nodes/*funcs.c inconsistencies

2015-08-03 Thread Alvaro Herrera
Peter Geoghegan wrote:
> On Sun, Aug 2, 2015 at 3:07 PM, Peter Geoghegan  wrote:
> > I'm surprised that this stuff was only ever used for logical decoding
> > infrastructure so far.
> 
> On second thought, having tried it, one reason is that that breaks
> things that are considered legitimate for historical reasons. For
> example, AttrNumber is often used with READ_INT_FIELD(), which is an
> int16. Whether or not it's worth fixing that by introducing a
> READ_ATTRNUM_FIELD() (and so on) is not obvious to me.

If it allows us to introduce additional checking for new code, I'm all
for it.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] nodes/*funcs.c inconsistencies

2015-08-03 Thread Stephen Frost
* Noah Misch (n...@leadboat.com) wrote:
> On Sun, Aug 02, 2015 at 11:31:16PM -0400, Tom Lane wrote:
> > That being the case, it would probably be a good idea to get them done
> > before alpha2, as there may not be a good opportunity afterwards.
> 
> Freedom to bump catversion after alpha2 will be barely-distinguishable from
> freedom to do so now.  I have planned to leave my usual comment period of a
> few days, though skipping that would be rather innocuous in this case.

This is clearly necessary, of course, and I wouldn't be surprised if we
have another necessary bump post-alpha2, but at the same time, it'd
certainly be nice if we are able to put out alpha2 and then a beta1 in
the future without needing to do a bump.

In other words, +1 from me for going ahead and committing this for
alpha2, if possible.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Minimum tuple threshold to decide last pass of VACUUM

2015-08-03 Thread Tom Lane
Alvaro Herrera  writes:
> Simon Riggs wrote:
>> * For emergency anti-wraparound VACUUMs we shouldn't scan indexes at all,
>> since they aren't critical path activities at that point

> It is not possible to skip scanning indexes completely, unless no tuples
> are to be removed from the heap.

Right.

> But actually this is an interesting point and I don't think we do this:
> if in emergency mode, maybe we shouldn't try to remove any dead tuples
> at all, and instead only freeze very old tuples.

+1 ... not sure if that's what Simon had in mind exactly, but it seems
like a correct statement of what he was getting at.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] optimizing vacuum truncation scans

2015-08-03 Thread Jeff Janes
On Mon, Jul 27, 2015 at 1:40 PM, Simon Riggs  wrote:

> On 22 July 2015 at 17:11, Jeff Janes  wrote:
>
>> On Wed, Jul 22, 2015 at 6:59 AM, Robert Haas 
>> wrote:
>>
>>> On Mon, Jun 29, 2015 at 1:54 AM, Jeff Janes 
>>> wrote:
>>> > Attached is a patch that implements the vm scan for truncation.  It
>>> > introduces a variable to hold the last blkno which was skipped during
>>> the
>>> > forward portion.  Any blocks after both this blkno and after the last
>>> > inspected nonempty page (which the code is already tracking) must have
>>> been
>>> > observed to be empty by the current vacuum.  Any other process
>>> rendering the
>>> > page nonempty are required to clear the vm bit, and no other process
>>> can set
>>> > the bit again during the vacuum's lifetime.  So if the bit is still
>>> set, the
>>> > page is still empty without needing to inspect it.
>>>
>>> Urgh.  So if we do this, that forever precludes having HOT pruning set
>>> the all-visible bit.
>>
>>
>> I wouldn't say forever, as it would be easy to revert the change if
>> something more important came along that conflicted with it.
>>
>
> I think what is being said here is that someone is already using this
> technique, or if not, then we actively want to encourage them to do so as
> an extension or as a submission to core.
>
> In that case, I think the rely-on-VM technique sinks again, sorry Jim,
> Jeff. Probably needs code comments added.
>

Sure, that sounds like the consensus.  The VM method was very efficient,
but I agree it is pretty fragile and restricting.


>
> That does still leave the prefetch technique, so all is not lost.
>
> Can we see a patch with just prefetch, probably with a simple choice of
> stride? Thanks.
>

I probably won't get back to it this commit fest, so it can be set to
returned with feedback.  But if anyone has good ideas for how to set the
stride (or detect that it is on SSD and so is pointless to try) I'd love to
hear about them anytime.

Cheers,

Jeff


[HACKERS] Re: ON CONFLICT DO UPDATE using EXCLUDED.column gives an error about mismatched types

2015-08-03 Thread Geoff Winkless
On 3 August 2015 at 16:53, Geoff Winkless  wrote:

> If I create a copy of the table using
>
> CREATE mytab (LIKE brokentab INCLUDING ALL);
>
> Of course I meant CREATE TABLE here... finger slippage :)


Re: [HACKERS] Planner debug views

2015-08-03 Thread Alvaro Herrera
Qingqing Zhou wrote:
> On Thu, Jul 30, 2015 at 2:42 PM, Jim Nasby  wrote:
> >
> > I think a better option would be shoving it into a backend tuplestore and
> > just leaving it there (maybe with a command to clear it for the paranoid).
> > That gives a relation you can query against, insert into another table, etc.
> 
> This is something I don't know how to do it: in my understanding, a
> tuplestore is an internal store, which means it has no name exposed.
> Thus we can't reference it later.

Yeah, that doesn't sound like the kind of project you should attempt here.
EXPLAIN already knows to return tuples, so I was assuming you would
return your stuff using that mechanism.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] ON CONFLICT DO UPDATE using EXCLUDED.column gives an error about mismatched types

2015-08-03 Thread Geoff Winkless
Hi

We've come across a weirdness with ON CONFLICT, where UPSERTing a smallint
value produces an error:

db=# INSERT INTO brokentab(id, k1,k2,k3,k4,k5,k6,k7, smallval) VALUES
(5,0,0,0,1,0,1,0, 0) ON CONFLICT (id, k1,k2,k3,k4,k5,k6,k7) DO UPDATE SET
smallval=EXCLUDED.smallval;
ERROR:  attribute 29 has wrong type
DETAIL:  Table has type integer, but query expects smallint.

If you change the SET to smallval=0 the problem goes away, although using
SET smallval=CAST(EXCLUDED.smallval AS smallint) - or indeed AS int -
doesn't help at all.

If I create a copy of the table using

CREATE mytab (LIKE brokentab INCLUDING ALL);
INSERT INTO mytab SELECT * FROM brokentab;

the new table does not exhibit the same problem (so I'm assuming it's not
easily reproducible and giving you a creation script isn't going to help).

VACUUM FULL on the table makes no difference.

Is there anything you guys can suggest that I can do to help narrow down
the problem?

Linux Centos 6.5, kernel 2.6.32-431.el6.i686, pgsql alpha1, built from
source using gcc 4.4.7.

Thanks

Geoff


Re: [HACKERS] Minimum tuple threshold to decide last pass of VACUUM

2015-08-03 Thread Alvaro Herrera
Simon Riggs wrote:

> * For emergency anti-wraparound VACUUMs we shouldn't scan indexes at all,
> since they aren't critical path activities at that point

It is not possible to skip scanning indexes completely, unless no tuples
are to be removed from the heap.  Otherwise, index tuples become
lingering pointers (and when such heap addresses are later refilled, they
cause corrupted index scans).

But actually this is an interesting point and I don't think we do this:
if in emergency mode, maybe we shouldn't try to remove any dead tuples
at all, and instead only freeze very old tuples.  That would make such
vacuums go much quicker.  (More accurately, if the updating xid is older
than the freeze point, then remove the tuple, but otherwise keep it.)

My point is that emergency vacuums are troublesome for various reasons
and it would be better if they did only the minimum possible.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Microvacuum for gist.

2015-08-03 Thread Anastasia Lubennikova


On Mon, Aug 3, 2015 at 12:27 PM, Anastasia Lubennikova
<a.lubennik...@postgrespro.ru> wrote:


1) Test and results are in attachments. Everything seems to work
as expected.
2) I dropped these notices. It was done only for debug purposes.
Updated patch is attached.
3) fixed


Good! Another couple of notes from me:
1) I think gistvacuumpage() and gistkillitems() need function-level 
comments.
2) ItemIdIsDead() can be used in index scan like it's done 
in _bt_checkkeys().


--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com 


The Russian Postgres Company

I've added some comments.
ItemIdIsDead() is used now (dead tuples are just skipped as not matching
the quals). And there is one more check of the LSN in gistkillitems() to
make sure that the page was not changed between reads.


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

*** a/src/backend/access/gist/gist.c
--- b/src/backend/access/gist/gist.c
***
*** 36,42  static bool gistinserttuples(GISTInsertState *state, 
GISTInsertStack *stack,
 bool unlockbuf, bool unlockleftchild);
  static void gistfinishsplit(GISTInsertState *state, GISTInsertStack *stack,
GISTSTATE *giststate, List *splitinfo, bool 
releasebuf);
! 
  
  #define ROTATEDIST(d) do { \
SplitedPageLayout 
*tmp=(SplitedPageLayout*)palloc(sizeof(SplitedPageLayout)); \
--- 36,42 
 bool unlockbuf, bool unlockleftchild);
  static void gistfinishsplit(GISTInsertState *state, GISTInsertStack *stack,
GISTSTATE *giststate, List *splitinfo, bool 
releasebuf);
! static void gistvacuumpage(Relation rel, Page page, Buffer buffer);
  
  #define ROTATEDIST(d) do { \
SplitedPageLayout 
*tmp=(SplitedPageLayout*)palloc(sizeof(SplitedPageLayout)); \
***
*** 209,214  gistplacetopage(Relation rel, Size freespace, GISTSTATE 
*giststate,
--- 209,225 
 * because the tuple vector passed to gistSplit won't include this 
tuple.
 */
is_split = gistnospace(page, itup, ntup, oldoffnum, freespace);
+ 
+   /*
+* If leaf page is full, try at first to delete dead tuples. And then
+* check again.
+*/
+   if ((is_split) && GistPageIsLeaf(page) && GistPageHasGarbage(page))
+   {
+   gistvacuumpage(rel, page, buffer);
+   is_split = gistnospace(page, itup, ntup, oldoffnum, freespace);
+   }
+ 
if (is_split)
{
/* no space for insertion */
***
*** 1440,1442  freeGISTstate(GISTSTATE *giststate)
--- 1451,1522 
/* It's sufficient to delete the scanCxt */
MemoryContextDelete(giststate->scanCxt);
  }
+ 
+ /*
+  * gistvacuumpage() -- try to remove LP_DEAD items from the given page.
+  */
+ static void
+ gistvacuumpage(Relation rel, Page page, Buffer buffer)
+ {
+   OffsetNumber deletable[MaxOffsetNumber];
+   int ndeletable = 0;
+   OffsetNumber offnum,
+   minoff,
+   maxoff;
+ 
+   /*
+* Scan over all items to see which ones need to be deleted according to
+* LP_DEAD flags.
+*/
+   maxoff = PageGetMaxOffsetNumber(page);
+   for (offnum = FirstOffsetNumber;
+offnum <= maxoff;
+offnum = OffsetNumberNext(offnum))
+   {
+   ItemId  itemId = PageGetItemId(page, offnum);
+ 
+   if (ItemIdIsDead(itemId))
+   deletable[ndeletable++] = offnum;
+   }
+ 
+   if (ndeletable > 0)
+   {
+   START_CRIT_SECTION();
+ 
+   PageIndexMultiDelete(page, deletable, ndeletable);
+ 
+   /*
+* Mark the page as not containing any LP_DEAD items.  This is 
not
+* certainly true (there might be some that have recently been 
marked,
+* but weren't included in our target-item list), but it will 
almost
+* always be true and it doesn't seem worth an additional page 
scan to
+* check it. Remember that F_HAS_GARBAGE is only a hint anyway.
+*/
+   GistClearPageHasGarbage(page);
+ 
+   MarkBufferDirty(buffer);
+ 
+   /* XLOG stuff */
+   if (RelationNeedsWAL(rel))
+   {
+   XLogRecPtr  recptr;
+ 
+   recptr = gistXLogUpdate(rel->rd_node, buffer,
+   
deletable, ndeletable,
+   NULL, 
0, InvalidBuffer);
+ 
+   PageSetLSN(page, recptr);
+   }
+   else
+   PageSetLSN(page,

Re: [HACKERS] RLS restrictive hook policies

2015-08-03 Thread Dean Rasheed
On 3 August 2015 at 16:09, Stephen Frost  wrote:
> Dean, all,
>
> * Stephen Frost (sfr...@snowman.net) wrote:
>> Agreed.  I'm happy to commit that change and back-patch it to 9.5,
>> barring objections.  Given that the only way to have restrictive
>> policies currently is using hooks, I don't believe there's any
>> documentation update required either; of course, I'll update the comment
>> to reflect this discussion/decision and make any necessary changes to
>> test_rls_hooks.
>
> Patch attached to make this change, with appropriate comment updates and
> fixes to the test_rls_hooks modules.
>
> Comments?
>

Looks good to me.

Regards,
Dean


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RLS restrictive hook policies

2015-08-03 Thread Stephen Frost
Dean, all,

* Stephen Frost (sfr...@snowman.net) wrote:
> Agreed.  I'm happy to commit that change and back-patch it to 9.5,
> barring objections.  Given that the only way to have restrictive
> policies currently is using hooks, I don't believe there's any
> documentation update required either; of course, I'll update the comment
> to reflect this discussion/decision and make any necessary changes to
> test_rls_hooks.

Patch attached to make this change, with appropriate comment updates and
fixes to the test_rls_hooks modules.

Comments?

Thanks!

Stephen
From 173db8f942636e92f20145844b7dadd04ccac765 Mon Sep 17 00:00:00 2001
From: Stephen Frost 
Date: Mon, 3 Aug 2015 10:59:24 -0400
Subject: [PATCH] RLS: Keep deny policy when only restrictive exist

Only remove the default deny policy when a permissive policy exists
(either from the hook or defined by the user).  If only restrictive
policies exist then no rows will be visible, as restrictive policies
shouldn't make rows visible.  To address this requirement, a single
"USING (true)" permissive policy can be created.

Update the test_rls_hooks regression tests to create the necessary
"USING (true)" permissive policy.

Back-patch to 9.5 where RLS was added.

Per discussion with Dean.
---
 src/backend/rewrite/rowsecurity.c  | 14 ++
 .../modules/test_rls_hooks/expected/test_rls_hooks.out |  7 +++
 src/test/modules/test_rls_hooks/sql/test_rls_hooks.sql |  8 
 src/test/modules/test_rls_hooks/test_rls_hooks.c   |  5 +
 4 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/src/backend/rewrite/rowsecurity.c b/src/backend/rewrite/rowsecurity.c
index 562dbc9..5a81db3 100644
--- a/src/backend/rewrite/rowsecurity.c
+++ b/src/backend/rewrite/rowsecurity.c
@@ -225,12 +225,18 @@ get_row_security_policies(Query *root, CmdType commandType, RangeTblEntry *rte,
 	}
 
 	/*
-	 * If the only built-in policy is the default-deny one, and hook policies
-	 * exist, then use the hook policies only and do not apply the
+	 * If the only built-in policy is the default-deny one, and permissive hook
+	 * policies exist, then use the hook policies only and do not apply the
 	 * default-deny policy.  Otherwise, we will apply both sets below.
+	 *
+	 * Note that we do not remove the defaultDeny policy if only *restrictive*
+	 * policies exist as restrictive policies should only ever be reducing what
+	 * is visible.  Therefore, at least one permissive policy must exist which
+	 * allows records to be seen before restrictive policies can remove rows
+	 * from that set.  A single "true" policy can be created to address this
+	 * requirement, if necessary.
 	 */
-	if (defaultDeny &&
-		(hook_policies_restrictive != NIL || hook_policies_permissive != NIL))
+	if (defaultDeny && hook_policies_permissive != NIL)
 	{
 		rowsec_expr = NULL;
 		rowsec_with_check_expr = NULL;
diff --git a/src/test/modules/test_rls_hooks/expected/test_rls_hooks.out b/src/test/modules/test_rls_hooks/expected/test_rls_hooks.out
index 3a7a4c3..4587eb0 100644
--- a/src/test/modules/test_rls_hooks/expected/test_rls_hooks.out
+++ b/src/test/modules/test_rls_hooks/expected/test_rls_hooks.out
@@ -13,6 +13,11 @@ CREATE TABLE rls_test_restrictive (
 supervisor  name,
 datainteger
 );
+-- At least one permissive policy must exist, otherwise
+-- the default deny policy will be applied.  For
+-- testing the only-restrictive-policies from the hook,
+-- create a simple 'allow all' policy.
+CREATE POLICY p1 ON rls_test_restrictive USING (true);
 -- initial test data
 INSERT INTO rls_test_restrictive VALUES ('r1','s1',1);
 INSERT INTO rls_test_restrictive VALUES ('r2','s2',2);
@@ -109,6 +114,8 @@ RESET ROLE;
 -- Create "internal" policies, to check that the policies from
 -- the hooks are combined correctly.
 CREATE POLICY p1 ON rls_test_permissive USING (data % 2 = 0);
+-- Remove the original allow-all policy
+DROP POLICY p1 ON rls_test_restrictive;
 CREATE POLICY p1 ON rls_test_restrictive USING (data % 2 = 0);
 CREATE POLICY p1 ON rls_test_both USING (data % 2 = 0);
 SET ROLE r1;
diff --git a/src/test/modules/test_rls_hooks/sql/test_rls_hooks.sql b/src/test/modules/test_rls_hooks/sql/test_rls_hooks.sql
index ece4ab9..3071213 100644
--- a/src/test/modules/test_rls_hooks/sql/test_rls_hooks.sql
+++ b/src/test/modules/test_rls_hooks/sql/test_rls_hooks.sql
@@ -17,6 +17,12 @@ CREATE TABLE rls_test_restrictive (
 datainteger
 );
 
+-- At least one permissive policy must exist, otherwise
+-- the default deny policy will be applied.  For
+-- testing the only-restrictive-policies from the hook,
+-- create a simple 'allow all' policy.
+CREATE POLICY p1 ON rls_test_restrictive USING (true);
+
 -- initial test data
 INSERT INTO rls_test_restrictive VALUES ('r1','s1',1);
 INSERT INTO rls_test_restrictive VALUES ('r2','s2',2);
@@ -101,6 +107,8 @@ RESET ROLE;
 -- the hooks are combined correctly.
 CREATE POLICY 

[HACKERS] track_commit_timestamp and COMMIT PREPARED

2015-08-03 Thread Fujii Masao
Hi,

track_commit_timestamp tracks COMMIT PREPARED as expected in the standby
server, but not in the master server. Is this intentional? Shouldn't it track
COMMIT PREPARED even in the master? Otherwise, we cannot use the commit
timestamp feature to check the replication lag properly when using 2PC.
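
For example, something like this is how I'd expect to observe it (the table
name is arbitrary; assumes track_commit_timestamp = on and
max_prepared_transactions > 0):

BEGIN;
INSERT INTO test_2pc VALUES (1);
PREPARE TRANSACTION 'tx1';
COMMIT PREPARED 'tx1';
SELECT pg_xact_commit_timestamp(xmin) FROM test_2pc;
-- per the above, this returns NULL on the master, while a standby
-- records a timestamp for the same transaction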

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Map basebackup tablespaces using a tablespace_map file

2015-08-03 Thread Fujii Masao
On Mon, Aug 3, 2015 at 8:55 PM, Amit Kapila  wrote:
> On Thu, Jul 16, 2015 at 9:58 PM, Robert Haas  wrote:
>>
>> On Thu, Jul 16, 2015 at 9:41 AM, Fujii Masao 
>> wrote:
>> > Here are some minor comments:
>> >
>> > +ereport(LOG,
>> > +(errmsg("ignoring \"%s\" file because no
>> > \"%s\" file exists",
>> > +TABLESPACE_MAP, BACKUP_LABEL_FILE),
>> > + errdetail("could not rename file \"%s\" to
>> > \"%s\": %m",
>> > +   TABLESPACE_MAP,
>> > TABLESPACE_MAP_OLD)));
>> >
>> > WARNING is better than LOG here because it indicates a problematic case?
>>
>> No, that's not the right distinction.  Remember that, when sending
>> messages to the client, WARNING > LOG, and when sending messages to
>> the log, LOG > WARNING.  So messages that a user is more likely to
>> care about than the administrator should be logged at WARNNG; those
>> that the administrator is more likely to care about should be LOG.  I
>> think LOG is clearly the appropriate thing here.
>>
>> > In detail message, the first word of sentence needs to be capitalized.
>> >
>> > + errdetail("renamed file \"%s\" to \"%s\"",
>> > +   TABLESPACE_MAP, TABLESPACE_MAP_OLD)));
>> >
>> > In detail message, basically we should use a complete sentence.
>> > So like other similar detail messages in xlog.c, I think that it's
>> > better
>> > to use "\"%s\" was renamed to \"%s\"." as the detail message here.
>>
>> Right, that's what the style guidelines say.
>>
>
> I have changed the errdetail messages as per above suggestion.
> Also, there was one issue (it was displaying 2 messages when
> rename fails) in patch which I have fixed in the updated version.

Thanks! Pushed.

BTW, while reading the code related to tablespace_map, I found that
CancelBackup() emits the WARNING message "online backup mode was not canceled"
when rename() fails. Isn't this confusing (or incorrect)? ISTM that we can
see that the online backup mode has already been canceled if the backup_label
file is successfully removed, whether the tablespace_map file remains or not. No?

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RLS restrictive hook policies

2015-08-03 Thread Stephen Frost
Dean,

* Dean Rasheed (dean.a.rash...@gmail.com) wrote:
> In get_row_security_policies():
> 
> /*
>  * If the only built-in policy is the default-deny one, and hook policies
>  * exist, then use the hook policies only and do not apply the
>  * default-deny policy.  Otherwise, we will apply both sets below.
>  */
> if (defaultDeny &&
> (hook_policies_restrictive != NIL || hook_policies_permissive != NIL))
> {
> rowsec_expr = NULL;
> rowsec_with_check_expr = NULL;
> }
> 
> So if the only policies that exist are restrictive policies from an
> extension, the default-deny policy is not applied and the restrictive
> policies are applied instead, which is effectively a "default-allow"
> situation with each restrictive policy further limiting access to the
> table.

Right, the idea there being that applying the default-deny would render
the restrictive policies pointless, since nothing would ever be visible,
however..

> I think this is potentially very dangerous when both kinds of policy
> exist. Consider for example a situation where initially multiple
> permissive policies are set up for different users allowing them
> access to different subsets of the table, and users not covered by any
> of those permissive policies have no access. Then suppose that later a
> restrictive policy is added, applying to all users -- the intention
> being to prevent any user from accessing some particularly sensitive
> data. The result of adding that restrictive policy would be that all
> the users who previously had no access at all would suddenly have
> access to everything not prohibited by the restrictive policy.
> 
> That goes against everything I would expect from a restrictive policy
> -- adding more restrictive policies should only ever reduce the number
> of rows visible, not ever open up more. So I think the test above
> should only disable the default-deny policy if there are any
> permissive policies from the extension.
> 
> Of course that will make life a little harder for people who only want
> to use restrictive policies, since they will be forced to first add a
> permissive policy, possibly just one that's always true, but I think
> it's the safest approach.

Requiring a permissive policy which allows the records first, to avoid
the default-deny, makes a ton of sense.

> Possibly if/when we add proper SQL support for defining restrictive
> policies, we might also add an option to ALTER TABLE ... ENABLE ROW
> LEVEL SECURITY to allow a table to have RLS enabled in default-allow
> mode, for users who know they only want to add restrictive policies.

Perhaps...  I'm not sure that's really better than simply requiring a
'create policy p1 on t1 using (true);' to be done.

> However, I think that should be something the user has to explicitly
> ask for, not an automatic decision based on the existence of
> restrictive policies.

Agreed.  I'm happy to commit that change and back-patch it to 9.5,
barring objections.  Given that the only way to have restrictive
policies currently is using hooks, I don't believe there's any
documentation update required either; of course, I'll update the comment
to reflect this discussion/decision and make any necessary changes to
test_rls_hooks.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] pg_rewind failure by file deletion in source server

2015-08-03 Thread Fujii Masao
On Fri, Jul 17, 2015 at 12:28 PM, Michael Paquier
 wrote:
> On Wed, Jul 1, 2015 at 9:31 PM, Fujii Masao  wrote:
>> On Wed, Jul 1, 2015 at 2:21 AM, Heikki Linnakangas  wrote:
>>> On 06/29/2015 09:44 AM, Michael Paquier wrote:

 On Mon, Jun 29, 2015 at 4:55 AM, Heikki Linnakangas wrote:
>
> But we'll still need to handle the pg_xlog symlink case somehow. Perhaps
> it
> would be enough to special-case pg_xlog for now.


 Well, sure, pg_rewind does not copy the soft links either way. Now it
 would be nice to have an option to be able to recreate the soft link
 of at least pg_xlog even if it can be scripted as well after a run.
>>>
>>>
>>> Hmm. I'm starting to think that pg_rewind should ignore pg_xlog entirely. In
>>> any non-trivial scenarios, just copying all the files from pg_xlog isn't
>>> enough anyway, and you need to set up a recovery.conf after running
>>> pg_rewind that contains a restore_command or primary_conninfo, to fetch the
>>> WAL. So you can argue that by not copying pg_xlog automatically, we're
>>> actually doing a favour to the DBA, by forcing him to set up the
>>> recovery.conf file correctly. Because if you just test simple scenarios
>>> where not much time has passed between the failover and running pg_rewind,
>>> it might be enough to just copy all the WAL currently in pg_xlog, but it
>>> would not be enough if more time had passed and not all the required WAL is
>>> present in pg_xlog anymore.  And by not copying the WAL, we can avoid some
>>> copying, as restore_command or streaming replication will only copy what's
>>> needed, while pg_rewind would copy all WAL it can find the target's data
>>> directory.
>>>
>>> pg_basebackup also doesn't include any WAL, unless you pass the --xlog
>>> option. It would be nice to also add an optional --xlog option to pg_rewind,
>>> but with pg_rewind it's possible that all the required WAL isn't present in
>>> the pg_xlog directory anymore, so you wouldn't always achieve the same
>>> effect of making the backup self-contained.
>>>
>>> So, I propose the attached. It makes pg_rewind ignore the pg_xlog directory
>>> in both the source and the target.
>>
>> If pg_xlog is simply ignored, some old WAL files may remain in target server.
>> Don't these old files cause the subsequent startup of target server as new
>> standby to fail? That is, it's the case where the WAL file with the same name
>> but different content exist both in target and source. If that's harmfull,
>> pg_rewind also should remove the files in pg_xlog of target server.
>
> This would reduce usability. The rewound node will replay WAL from the
> previous checkpoint where WAL forked up to the minimum recovery point
> of source node where pg_rewind has been run. Hence if we remove
> completely the contents of pg_xlog we'd lose a portion of the logs
> that need to be replayed until timeline is switched on the rewound
> node when recovering it (while streaming from the promoted standby,
> whatever).

Even if we remove the WAL files in the *target server*, we don't lose any
files in the *source server* that we will need to replay later.

> I don't really see why recycled segments would be a
> problem, as that's perhaps what you are referring to, but perhaps I am
> missing something.

Please imagine the case where the WAL files with the same name were
created in both servers after the fork. Their contents may be different.
After pg_rewind is executed successfully, the rewound server
(i.e., target server) should retrieve and replay that WAL file from
the *source* server. But the problem is that the rewound server tries to
replay the WAL file from its local since the file exists locally (even
if primary_conninfo is specified).

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-08-03 Thread Ildus Kurbangaliev

On 07/28/2015 10:28 PM, Heikki Linnakangas wrote:

On 07/27/2015 01:20 PM, Ildus Kurbangaliev wrote:

Hello.
In the attached patch I've made a refactoring for tranches.
The prefix for them was extended, and I've split LWLockAssign into two
functions (one with a tranche and a second for user-defined LWLocks).


This needs some work in order to be maintainable:

* The patch requires that the LWLOCK_INDIVIDUAL_NAMES array is kept in 
sync with the list of individual locks in lwlock.h. Sooner or later 
someone will add an LWLock and forget to update the names-array. That 
needs to be made less error-prone, so that the names are maintained in 
the same place as the #defines. Perhaps something like rmgrlist.h.


* The "base" tranches are a bit funny. They all have the same 
array_base, pointing to MainLWLockArray. If there are e.g. 5 clog 
buffer locks, I would expect the T_NAME() to return "ClogBufferLocks" 
for all of them, and T_ID() to return numbers between 0-4. But in 
reality, T_ID() will return something like 55-59.


Instead of passing a tranche-id to LWLockAssign(), I think it would be 
more clear to have a new function to allocate a contiguous block of 
lwlocks as a new tranche. It could then set the base correctly.


* Instead of having LWLOCK_INDIVIDUAL_NAMES to name "individual" 
locks, how about just giving each one of them a separate tranche?


* User manual needs to be updated to explain the new column in 
pg_stat_activity.


- Heikki



Hello. Thanks for the review. I attached a new version of the patch.

It adds a new field in pg_stat_activity that shows the current wait event in
the backend.

I've done some refactoring of the LWLock tranche mechanism. In lwlock.c only
individual and user-defined LWLocks are created; other LWLocks are created by
the modules that need them. I think that is more logical (the user knows
about the module, not the module about all of its users). It also simplifies
LWLock acquisition.

Now each individual LWLock and the other groups of LWLocks have their own
tranche, and can be easily identified. If somebody adds a new individual
LWLock and forgets to add its name, postgres will show a message. Individual
LWLocks are still allocated in one array, but their tranches point to the
particular index in the main array.

Sample:

b1=# select pid, wait_event from pg_stat_activity; \watch 1

 pid  |   wait_event
--+-
 7722 | Storage: READ
 7653 |
 7723 | Network: WRITE
 7725 | Network: READ
 7727 | Locks: Transaction
 7731 | Storage: READ
 7734 | Network: READ
 7735 | Storage: READ
 7739 | LWLocks: WALInsertLocks
 7738 | Locks: Transaction
 7740 | LWLocks: BufferMgrLocks
 7741 | Network: READ
 7742 | Network: READ
 7743 | Locks: Transaction


--
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index e64b7ef..2e4e67e 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -630,6 +630,11 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
  True if this backend is currently waiting on a lock
 
 
+ wait_event
+ text
+ Name of a current wait event in backend
+
+
  state
  text
  Current overall state of this backend.
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 3a58f1e..10c25cf 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -457,7 +457,8 @@ CLOGShmemInit(void)
 {
 	ClogCtl->PagePrecedes = CLOGPagePrecedes;
 	SimpleLruInit(ClogCtl, "CLOG Ctl", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE,
-  CLogControlLock, "pg_clog");
+  CLogControlLock, "pg_clog",
+  "CLogBufferLocks");
 }
 
 /*
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 5ad35c0..dd7578f 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -466,7 +466,8 @@ CommitTsShmemInit(void)
 
 	CommitTsCtl->PagePrecedes = CommitTsPagePrecedes;
 	SimpleLruInit(CommitTsCtl, "CommitTs Ctl", CommitTsShmemBuffers(), 0,
-  CommitTsControlLock, "pg_commit_ts");
+  CommitTsControlLock, "pg_commit_ts",
+  "CommitTSBufferLocks");
 
 	commitTsShared = ShmemInitStruct("CommitTs shared",
 	 sizeof(CommitTimestampShared),
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 1933a87..b905c59 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1842,10 +1842,12 @@ MultiXactShmemInit(void)
 
 	SimpleLruInit(MultiXactOffsetCtl,
   "MultiXactOffset Ctl", NUM_MXACTOFFSET_BUFFERS, 0,
-  MultiXactOffsetControlLock, "pg_multixact/offsets");
+  MultiXactOffsetControlLock, "pg_multixact/offsets",
+  "MultiXactOffsetBufferLocks");
 	SimpleLruInit(MultiXactMemberCtl,
   "MultiXactMember Ctl", NUM_MXACTMEMBER_BUFFERS, 0,
-  MultiXactMemberControlLock

Re: [HACKERS] Autonomous Transaction is back

2015-08-03 Thread Merlin Moncure
On Sun, Aug 2, 2015 at 11:37 PM, Rajeev rastogi
 wrote:
> On 31 July 2015 23:10, Robert Haas Wrote:
>>I think we're going entirely down the wrong path here.  Why is it ever useful 
>>for a backend's lock requests to conflict with themselves, even with 
>>autonomous transactions?  That seems like an artifact of somebody else's 
>>implementation that we should be happy we don't need to copy.
>
> IMHO, since most of the locking is managed at the transaction level, not the
> backend level, and we consider the main and autonomous transactions to be
> independent transactions, they may practically conflict, right?
> It is also right, as you said, that there are no really useful use-cases
> where an autonomous transaction conflicts with the main (parent) transaction.
> But we cannot take it for granted, as a user might make a mistake. So at
> least we should have some mechanism to handle this rare case, for which as of
> now I think throwing an error from the autonomous transaction is one
> solution. Once the error is thrown from the autonomous transaction, the main
> transaction may continue as it is (or should the main transaction be aborted
> as well??).

hm.  OK, what's the behavior of:

BEGIN
  UPDATE foo SET x = x + 1 WHERE foo_id = 1;

  BEGIN WITH AUTONOMOUS TRANSACTION
UPDATE foo SET x = x + 1 WHERE foo_id = 1;
  END;

  RAISE EXCEPTION ...;
EXCEPTION ...

END;

Also,
*) What do the other candidate implementations do?  IMO, compatibility
should be the underlying design principle.

*) What will the "SQL only" feature look like?

*) Is the SPI interface going to be extended to expose AT?

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reduce ProcArrayLock contention

2015-08-03 Thread Amit Kapila
On Fri, Jul 31, 2015 at 10:11 AM, Amit Kapila 
wrote:
>
> On Wed, Jul 29, 2015 at 11:48 PM, Andres Freund 
wrote:
> >
> > On 2015-07-29 12:54:59 -0400, Robert Haas wrote:
> > > I would try to avoid changing lwlock.c.  It's pretty easy when so
> > > doing to create mechanisms that work now but make further upgrades to
> > > the general lwlock mechanism difficult.  I'd like to avoid that.
> >
> > I'm massively doubtful that re-implementing parts of lwlock.c is the
> > better outcome. Then you have two different infrastructures you need to
> > improve over time.
>
> I agree and modified the patch to use 32-bit atomics based on idea
> suggested by Robert and didn't modify lwlock.c.

While looking at the patch, I found that the way it was initialising the list
to be empty was wrong: it was using pgprocno 0 to initialise the list, but 0
is a valid pgprocno.  I think we should use a number greater than
PROCARRAY_MAXPROC (the maximum number of procs in the proc array).

Apart from the above fix, I have modified src/backend/access/transam/README
to include information about the improvement this patch brings to reducing
ProcArrayLock contention.



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


group_xid_clearing_at_trans_end_v3.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_rewind failure by file deletion in source server

2015-08-03 Thread Heikki Linnakangas

On 08/03/2015 07:01 AM, Michael Paquier wrote:

On Sun, Aug 2, 2015 at 4:01 AM, Heikki Linnakangas wrote:

Perhaps it's best if we copy all the WAL files from source in copy-mode, but
not in libpq mode. Regarding old WAL files in the target, it's probably best
to always leave them alone. They should do no harm, and as a general
principle it's best to avoid destroying evidence.

It'd be nice to get some fix for this for alpha2, so I'll commit a fix to do
that on Monday, unless we come to a different conclusion before that.


+1. Both things sound like a good plan to me.


I had some trouble implementing that. Recovery seemed to get confused 
sometimes, when it didn't find some of the WAL files in the pg_xlog 
directory, even though it could fetch them through streaming 
replication. I'll have to investigate that further, but in the meantime, 
to have some fix in place for alpha2, I committed an even simpler fix 
for the immediate issue that pg_xlog is a symlink: just pretend that 
"pg_xlog" is a normal directory, even when it's a symlink.


I'll continue to investigate what was wrong with my initial attempt. And 
it would be nice to avoid copying the pre-allocated WAL files from the 
source, because it's really unnecessary. But this fixes the immediate 
problem that pg_rewind didn't work at all if pg_xlog was a symlink.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] RLS restrictive hook policies

2015-08-03 Thread Dean Rasheed
In get_row_security_policies():

/*
 * If the only built-in policy is the default-deny one, and hook policies
 * exist, then use the hook policies only and do not apply the
 * default-deny policy.  Otherwise, we will apply both sets below.
 */
if (defaultDeny &&
(hook_policies_restrictive != NIL || hook_policies_permissive != NIL))
{
rowsec_expr = NULL;
rowsec_with_check_expr = NULL;
}

So if the only policies that exist are restrictive policies from an
extension, the default-deny policy is not applied and the restrictive
policies are applied instead, which is effectively a "default-allow"
situation with each restrictive policy further limiting access to the
table.

I think this is potentially very dangerous when both kinds of policy
exist. Consider for example a situation where initially multiple
permissive policies are set up for different users allowing them
access to different subsets of the table, and users not covered by any
of those permissive policies have no access. Then suppose that later a
restrictive policy is added, applying to all users -- the intention
being to prevent any user from accessing some particularly sensitive
data. The result of adding that restrictive policy would be that all
the users who previously had no access at all would suddenly have
access to everything not prohibited by the restrictive policy.

That goes against everything I would expect from a restrictive policy
-- adding more restrictive policies should only ever reduce the number
of rows visible, not ever open up more. So I think the test above
should only disable the default-deny policy if there are any
permissive policies from the extension.

Of course that will make life a little harder for people who only want
to use restrictive policies, since they will be forced to first add a
permissive policy, possibly just one that's always true, but I think
it's the safest approach.
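
For example, a blanket permissive policy of that sort could be as simple as
(the table name is just illustrative):

CREATE POLICY allow_all ON my_table USING (true);

after which any restrictive policies from the hook only ever narrow what is
visible.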

Possibly if/when we add proper SQL support for defining restrictive
policies, we might also add an option to ALTER TABLE ... ENABLE ROW
LEVEL SECURITY to allow a table to have RLS enabled in default-allow
mode, for users who know they only want to add restrictive policies.
However, I think that should be something the user has to explicitly
ask for, not an automatic decision based on the existence of
restrictive policies.

Thoughts?

Regards,
Dean


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transactions involving multiple postgres foreign servers

2015-08-03 Thread Ashutosh Bapat
On Sat, Aug 1, 2015 at 12:18 AM, Robert Haas  wrote:

> On Fri, Jul 31, 2015 at 6:33 AM, Ashutosh Bapat
>  wrote:>
> >> I'm not hung up on the table-level attribute, but I think having a
> >> server-level attribute rather than a global GUC is a good idea.
> >> However, I welcome other thoughts on that.
> >
> > The patch supports server level attribute. Let me repeat the relevant
> > description from my earlier mail
> > --
> > Every FDW needs to register the connection while starting new
> transaction on
> > a foreign connection (RegisterXactForeignServer()). A foreign server
> > connection is identified by foreign server oid and the local user oid
> > (similar to the entry cached by postgres_fdw). While registering, FDW
> also
> > tells whether the foreign server is capable of participating in two-phase
> > commit protocol. How to decide that is left entirely to the FDW. An FDW
> like
> > file_fdw may not have 2PC support at all, so all its foreign servers do
> not
> > comply with 2PC. An FDW might have all its servers 2PC compliant. An FDW
> > like postgres_fdw can have some of its servers compliant and some not,
> > depending upon server version, configuration (max_prepared_transactions
> = 0)
> > etc.
> > --
> >
> > Does that look good?
>
> OK, sure.  But let's make sure postgres_fdw gets a server-level option
> to control this.
>
>
For postgres_fdw it's a boolean server-level option 'twophase_compliant'
(suggestions for name welcome).


> >> > Done, there are three hooks now
> >> > 1. For preparing a foreign transaction
> >> > 2. For resolving a prepared foreign transaction
> >> > 3. For committing/aborting a running foreign transaction (more
> >> > explanation
> >> > later)
> >>
> >> (2) and (3) seem like the same thing.  I don't see any further
> >> explanation later in your email; what am I missing?
> >
> > In case of postgres_fdw, 2 always fires COMMIT/ROLLBACK PREPARED 'xyz'
> (fill
> > the prepared transaction id) and 3 always fires COMMIT/ABORT TRANSACTION
> > (notice absence of PREPARED and 'xyz').
>
> Oh, OK.  But then isn't #3 something we already have?  i.e.
> pgfdw_xact_callback?
>

While transactions are being prepared on the foreign connections, if any
prepare fails, we have to abort transactions on the rest of the connections
(and abort the prepared transactions). pgfdw_xact_callback wouldn't know
which connections have prepared transactions and which do not. So,
even in the case of two-phase commit we need all three hooks. Since we have
to define these three hooks, we might as well centralize all the
transaction processing and let the foreign transaction manager decide which
of the hooks to invoke. So, the patch moves most of the code in
pgfdw_xact_callback in the relevant hook and foreign transaction manager
invokes appropriate hook. Only thing that remains in pgfdw_xact_callback
now is end of transaction handling like resetting cursor numbering.


> >> That seems totally broken.  Before PITR, the database might be
> >> inconsistent, in which case you can't call any functions at all.
> >> Also, you shouldn't be trying to resolve any transactions until the
> >> end of recovery, because you don't know when you see that the
> >> transaction was prepared whether, at some subsequent time, you will
> >> see it resolved.  You need to finish recovery and, only after entering
> >> normal running, decide whether to resolve any transactions that are
> >> still sitting around.
> >
> > That's how it works in the patch for unresolved prepared foreign
> > transactions belonging to xids within the known range. For those
> belonging
> > to xids in future (beyond of known range of xids after PITR), we can not
> > determine the status of that local transaction (as those do not appear in
> > the xlog) and hence can not decide the fate of prepared foreign
> transaction.
> > You seem to be suggesting that we should let the recovery finish and mark
> > those prepared foreign transaction as "can not be resolved" or something
> > like that. A DBA can remove those entries once s/he has dealt with them
> on
> > the foreign server.
> >
> > There's little problem with that approach. Triplet (xid, serverid,
> userid)
> > is used to identify the a foreign prepared transaction entry in memory
> and
> > is used to create unique file name for storing it on the disk. If we
> allow a
> > future xid after PITR, it might conflict with an xid of a transaction
> that
> > might take place after PITR. It will cause problem if exactly same
> foreign
> > server and user participate in the transaction with conflicting xid (rare
> > but possible).
> >
> > Other problem is that the foreign server on which the transaction was
> > prepared (or the user whose mapping was used to prepare the transaction),
> > might have got added in a future time wrt PITR, in which case, we can not
> > even know which foreign server this transaction was prepared on.
> >
> >> There should be no situation (short of e.g. OS
> >> errors writing the s

Re: [HACKERS] Re: [COMMITTERS] pgsql: Map basebackup tablespaces using a tablespace_map file

2015-08-03 Thread Amit Kapila
On Thu, Jul 16, 2015 at 9:58 PM, Robert Haas  wrote:
>
> On Thu, Jul 16, 2015 at 9:41 AM, Fujii Masao 
wrote:
> > Here are some minor comments:
> >
> > +ereport(LOG,
> > +(errmsg("ignoring \"%s\" file because no
> > \"%s\" file exists",
> > +TABLESPACE_MAP, BACKUP_LABEL_FILE),
> > + errdetail("could not rename file \"%s\" to
> > \"%s\": %m",
> > +   TABLESPACE_MAP,
TABLESPACE_MAP_OLD)));
> >
> > WARNING is better than LOG here because it indicates a problematic case?
>
> No, that's not the right distinction.  Remember that, when sending
> messages to the client, WARNING > LOG, and when sending messages to
> the log, LOG > WARNING.  So messages that a user is more likely to
> care about than the administrator should be logged at WARNNG; those
> that the administrator is more likely to care about should be LOG.  I
> think LOG is clearly the appropriate thing here.
>
> > In detail message, the first word of sentence needs to be capitalized.
> >
> > + errdetail("renamed file \"%s\" to \"%s\"",
> > +   TABLESPACE_MAP, TABLESPACE_MAP_OLD)));
> >
> > In detail message, basically we should use a complete sentence.
> > So like other similar detail messages in xlog.c, I think that it's
better
> > to use "\"%s\" was renamed to \"%s\"." as the detail message here.
>
> Right, that's what the style guidelines say.
>

I have changed the errdetail messages as per above suggestion.
Also, there was one issue (it was displaying 2 messages when
rename fails) in patch which I have fixed in the updated version.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


rename_mapfile_if_backupfile_not_present_v4.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pageinspect patch, for showing tuple data

2015-08-03 Thread Nikolay Shaplov
In a message of 3 August 2015 14:30:46, Michael Paquier wrote:
> On Mon, Aug 3, 2015 at 1:03 AM, Nikolay Shaplov wrote:
> > Hi!
> > 
> > I've created a patch for pageinspect that allows to see data stored in the
> > tuple.
> > 
> > This patch has two main purposes:
> > 
> > 1. Practical: Make manual DB recovery more simple
> 
> To what are you referring to in this case? Manual manipulation of
> on-disk data manually?
Yes, for example when the DB is broken.

> 
> > b) I have plenty of sanity checks while reading/parsing that tuple; for
> > this function I've changed the error level from ERROR to WARNING. This
> > function will allow writing proper tests that all these checks work as
> > they were designed (I hope to write these tests sooner or later).
> 
> +ereport(inter_call_data->error_level,
> +(errcode(ERRCODE_DATA_CORRUPTED),
> +errmsg("Data corruption: Iterating over
> tuple data reached out of actual tuple size")));
> I don't think that the possibility to raise a WARNING is a good thing
> in those code paths. If data is corrupted this may crash, and I am not
> sure that anybody would want that even for educational purposes.
Hm... I consider _heap_page_items a really dangerous function, with a big
red "do not call it unless you are sure what you are doing" warning. Reading
data with the wrong attribute descriptor is dangerous anyway. But when I wrote
that code I did not have those checks at first, and it was really interesting
to substitute one piece of data for another and see what would happen. I
thought other explorers might like to do the same, and that is only possible
in warning mode. So I left warnings only in _heap_page_items, and set errors
for all other cases.
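To make that concrete, the idea is roughly the following (the identifier
names here are mine, not necessarily what the patch uses):

/*
 * Chosen once per call: heap_page_items tolerates bad data with warnings,
 * every other entry point treats it as corruption.
 */
inter_call_data->error_level =
    called_from_heap_page_items ? WARNING : ERROR;

/* ... and each sanity check then reports through it: */
ereport(inter_call_data->error_level,
        (errcode(ERRCODE_DATA_CORRUPTED),
         errmsg("iterating over tuple data reached beyond the actual tuple size")));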


-- 
Nikolay Shaplov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Microvacuum for gist.

2015-08-03 Thread Alexander Korotkov
On Mon, Aug 3, 2015 at 12:27 PM, Anastasia Lubennikova <
a.lubennik...@postgrespro.ru> wrote:

> 1) Test and results are in attachments. Everything seems to work as
> expected.
> 2) I dropped these notices. It was done only for debug purposes. Updated
> patch is attached.
> 3) fixed
>

Good! Another couple of notes from me:
1) I think gistvacuumpage() and gistkillitems() need function-level
comments.
2) ItemIdIsDead() can be used in the index scan, like it's done
in _bt_checkkeys().
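For (2), a rough sketch of what that could look like inside the GiST
leaf-page loop, modeled on _bt_checkkeys() (variable names here are my
assumptions, not the patch):

/* somewhere in the loop over page offsets, e.g. in gistScanPage() */
ItemId      iid = PageGetItemId(page, offnum);

if (scan->ignore_killed_tuples && ItemIdIsDead(iid))
    continue;           /* skip tuples already marked LP_DEAD */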

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [PATCH] Microvacuum for gist.

2015-08-03 Thread Anastasia Lubennikova



On 30.07.2015 16:33, Alexander Korotkov wrote:

Hi!

On Thu, Jul 30, 2015 at 2:51 PM, Anastasia Lubennikova wrote:


I have written microvacuum support for gist access method.
Briefly microvacuum includes two steps:
1. When search tells us that the tuple is invisible to all
transactions it is marked LP_DEAD and page is marked as "has dead
tuples",
2. Then, when insert touches full page which has dead tuples it
calls microvacuum instead of splitting page.
You can find a kind of review here [1].

[1]

http://www.google-melange.com/gsoc/proposal/public/google/gsoc2015/ivanitskiy_ilya/5629499534213120

Patch is in attachements. Please review it.


Nice!

Some notes about this patch.

1) Could you give some test case demonstrating that microvacuum really
works with the patch? Finally, we should see the index growing less with
microvacuum.


2) Generating notices for every dead tuple would be too noisy. I suggest
replacing the notice with one of the debug levels.


+ elog(NOTICE, "gistkillitems. Mark Item Dead offnum %hd, blkno %d", 
offnum, BufferGetBlockNumber(buffer));



3) Please recheck the coding style. For instance, this line needs more
spaces, and the open brace should be on the next line.


+ if ((scan->kill_prior_tuple)&&(so->curPageData > 
0)&&(so->curPageData == so->nPageData)) {


--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com 


The Russian Postgres Company
1) Test and results are in attachments. Everything seems to work as 
expected.
2) I dropped these notices. It was done only for debug purposes. Updated 
patch is attached.

3) Fixed.
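For reference, the reformatted version of the line Alexander flagged would
presumably end up roughly like this (the attached patch may differ in
detail):

if (scan->kill_prior_tuple && so->curPageData > 0 &&
    so->curPageData == so->nPageData)
{
    ...
}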

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

*** a/src/backend/access/gist/gist.c
--- b/src/backend/access/gist/gist.c
***
*** 36,42  static bool gistinserttuples(GISTInsertState *state, 
GISTInsertStack *stack,
 bool unlockbuf, bool unlockleftchild);
  static void gistfinishsplit(GISTInsertState *state, GISTInsertStack *stack,
GISTSTATE *giststate, List *splitinfo, bool 
releasebuf);
! 
  
  #define ROTATEDIST(d) do { \
SplitedPageLayout 
*tmp=(SplitedPageLayout*)palloc(sizeof(SplitedPageLayout)); \
--- 36,42 
 bool unlockbuf, bool unlockleftchild);
  static void gistfinishsplit(GISTInsertState *state, GISTInsertStack *stack,
GISTSTATE *giststate, List *splitinfo, bool 
releasebuf);
! static void gistvacuumpage(Relation rel, Page page, Buffer buffer);
  
  #define ROTATEDIST(d) do { \
SplitedPageLayout 
*tmp=(SplitedPageLayout*)palloc(sizeof(SplitedPageLayout)); \
***
*** 209,214  gistplacetopage(Relation rel, Size freespace, GISTSTATE 
*giststate,
--- 209,225 
 * because the tuple vector passed to gistSplit won't include this 
tuple.
 */
is_split = gistnospace(page, itup, ntup, oldoffnum, freespace);
+ 
+   /*
+* If leaf page is full, try at first to delete dead tuples. And then
+* check again.
+*/
+   if ((is_split) && GistPageIsLeaf(page) && GistPageHasGarbage(page))
+   {
+   gistvacuumpage(rel, page, buffer);
+   is_split = gistnospace(page, itup, ntup, oldoffnum, freespace);
+   }
+ 
if (is_split)
{
/* no space for insertion */
***
*** 1440,1442  freeGISTstate(GISTSTATE *giststate)
--- 1451,1519 
/* It's sufficient to delete the scanCxt */
MemoryContextDelete(giststate->scanCxt);
  }
+ 
+ static void
+ gistvacuumpage(Relation rel, Page page, Buffer buffer)
+ {
+   OffsetNumber deletable[MaxOffsetNumber];
+   int ndeletable = 0;
+   OffsetNumber offnum,
+   minoff,
+   maxoff;
+ 
+   /*
+* Scan over all items to see which ones need to be deleted according to
+* LP_DEAD flags.
+*/
+   maxoff = PageGetMaxOffsetNumber(page);
+   for (offnum = FirstOffsetNumber;
+offnum <= maxoff;
+offnum = OffsetNumberNext(offnum))
+   {
+   ItemId  itemId = PageGetItemId(page, offnum);
+ 
+   if (ItemIdIsDead(itemId))
+   deletable[ndeletable++] = offnum;
+   }
+ 
+   if (ndeletable > 0)
+   {
+   START_CRIT_SECTION();
+ 
+   PageIndexMultiDelete(page, deletable, ndeletable);
+ 
+   /*
+* Mark the page as not containing any LP_DEAD items.  This is 
not
+* certainly true (there might be some that have recently been 
marked,
+* but weren't included in our target-item list), but it will 
almost
+* always be true and it doesn't seem worth an additional pa

Re: [HACKERS] pg_rewind tap test unstable

2015-08-03 Thread Christoph Berg
Re: Michael Paquier 2015-07-28 

> On Tue, Jul 28, 2015 at 5:57 PM, Christoph Berg  wrote:
> > for something between 10% and 20% of the devel builds for apt.postgresql.org
> > (which happen every 6h if there's a git change, so it happens every few 
> > days),
> > I'm seeing this:
> > Dubious, test returned 1 (wstat 256, 0x100)
> > Failed 1/8 subtests
> >
> > I don't have the older logs available, but from memory, the subtest
> > failing and the two numbers mentioned are always the same.
> 
> There should be some output logs in src/bin/pg_rewind/tmp_check/log/*?
> Could you attach them here if you have them? That would be helpful to
> understand what is happening.

It took me a few attempts to tell the build environment to save a copy
on failure and not shred everything right away. So here we go:

Christoph
-- 
c...@df7cb.de | http://www.df7cb.de/


master.log.gz
Description: application/gzip


regress_log_001_basic.gz
Description: application/gzip


regress_log_002_databases.gz
Description: application/gzip


regress_log_003_extrafiles.gz
Description: application/gzip


standby.log.gz
Description: application/gzip

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minimum tuple threshold to decide last pass of VACUUM

2015-08-03 Thread Simon Riggs
On 2 August 2015 at 13:13, Michael Paquier 
wrote:

> Hi all,
>
> Commit 4046e58c (dated of 2001) has introduced the following comment
> in vacuumlazy.c:
> +   /* If any tuples need to be deleted, perform final vacuum cycle */
> +   /* XXX put a threshold on min number of tuples here? */
> +   if (vacrelstats->num_dead_tuples > 0)
> In short, we may want to have a reloption to decide whether or not we do
> the last pass of VACUUM, depending on a given number of
> remaining tuples. Is this still something we would like to have?
>

I don't think we want a new user parameter, but we should have an internal
limit with a heuristic, similar to how we decide whether to truncate.

I would suggest the following internal logic (a rough sketch follows the list)...

* If it's a VACUUM FREEZE then index_scan_threshold = 0, i.e. always scan if
needed, since the user is requesting maximum vacuum

* For emergency anti-wraparound VACUUMs we shouldn't scan indexes at all,
since they aren't critical path activities at that point

* For normal VACUUMs we should scan indexes only if (num_dead_tuples * 20) is
greater than (blocks to be scanned in any one index), which allows some index
bloat but not much
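
As a rough sketch of that decision (the names below are made up for
illustration; the freeze/wraparound flags would come from whatever the
vacuum options already carry):

static bool
lazy_should_scan_indexes(LVRelStats *vacrelstats, bool is_freeze,
                         bool is_emergency_wraparound,
                         BlockNumber index_blocks_to_scan)
{
    if (vacrelstats->num_dead_tuples == 0)
        return false;       /* nothing to clean up at all */

    if (is_emergency_wraparound)
        return false;       /* index cleanup is not critical path here */

    if (is_freeze)
        return true;        /* user asked for maximum vacuum */

    /* normal VACUUM: allow some index bloat, but not much */
    return (double) vacrelstats->num_dead_tuples * 20 > index_blocks_to_scan;
}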

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services