Re: [HACKERS] Parallel Seq Scan

2015-02-06 Thread Robert Haas
On Fri, Feb 6, 2015 at 2:13 PM, Robert Haas  wrote:
> My first comment here is that I think we should actually teach
> heapam.c about parallelism.

I coded this up; see attached.  I'm also attaching an updated version
of the parallel count code revised to use this API.  It's now called
"parallel_count" rather than "parallel_dummy" and I removed some
stupid stuff from it.  I'm curious to see what other people think, but
this seems much cleaner to me.  With the old approach, the
parallel-count code was duplicating some of the guts of heapam.c and
dropping the rest on the floor; now it just asks for a parallel scan
and away it goes.  Similarly, if your parallel-seqscan patch wanted to
scan block-by-block rather than splitting the relation into equal
parts, or if it wanted to participate in the synchronized-seqscan
stuff, there was no clean way to do that.  With this approach, those
decisions are - as they quite properly should be - isolated within
heapam.c, rather than creeping into the executor.

(These patches should be applied over parallel-mode-v4.patch.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
commit 096b3d5bdb4df5de095104fd3f58efa97e08a2ff
Author: Robert Haas 
Date:   Fri Feb 6 21:19:40 2015 -0500

Support parallel heap scans.

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 50bede8..abfe8c2 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -62,6 +62,7 @@
 #include "storage/predicate.h"
 #include "storage/procarray.h"
 #include "storage/smgr.h"
+#include "storage/spin.h"
 #include "storage/standby.h"
 #include "utils/datum.h"
 #include "utils/inval.h"
@@ -79,8 +80,10 @@ bool		synchronize_seqscans = true;
 static HeapScanDesc heap_beginscan_internal(Relation relation,
 		Snapshot snapshot,
 		int nkeys, ScanKey key,
+		ParallelHeapScanDesc parallel_scan,
 		bool allow_strat, bool allow_sync,
 		bool is_bitmapscan, bool temp_snap);
+static BlockNumber heap_parallelscan_nextpage(ParallelHeapScanDesc);
 static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup,
 	TransactionId xid, CommandId cid, int options);
 static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
@@ -221,7 +224,10 @@ initscan(HeapScanDesc scan, ScanKey key, bool is_rescan)
 	 * results for a non-MVCC snapshot, the caller must hold some higher-level
 	 * lock that ensures the interesting tuple(s) won't change.)
 	 */
-	scan->rs_nblocks = RelationGetNumberOfBlocks(scan->rs_rd);
+	if (scan->rs_parallel != NULL)
+		scan->rs_nblocks = scan->rs_parallel->phs_nblocks;
+	else
+		scan->rs_nblocks = RelationGetNumberOfBlocks(scan->rs_rd);
 
 	/*
 	 * If the table is large relative to NBuffers, use a bulk-read access
@@ -480,7 +486,18 @@ heapgettup(HeapScanDesc scan,
 tuple->t_data = NULL;
 return;
 			}
-			page = scan->rs_startblock; /* first page */
+			if (scan->rs_parallel != NULL)
+			{
+page = heap_parallelscan_nextpage(scan->rs_parallel);
+if (page >= scan->rs_nblocks)
+{
+	Assert(!BufferIsValid(scan->rs_cbuf));
+	tuple->t_data = NULL;
+	return;
+}
+			}
+			else
+page = scan->rs_startblock; /* first page */
 			heapgetpage(scan, page);
 			lineoff = FirstOffsetNumber;		/* first offnum */
 			scan->rs_inited = true;
@@ -503,6 +520,9 @@ heapgettup(HeapScanDesc scan,
 	}
 	else if (backward)
 	{
+		/* backward parallel scan not supported */
+		Assert(scan->rs_parallel == NULL);
+
 		if (!scan->rs_inited)
 		{
 			/*
@@ -655,11 +675,19 @@ heapgettup(HeapScanDesc scan,
 		}
 		else
 		{
-			page++;
-			if (page >= scan->rs_nblocks)
-page = 0;
-			finished = (page == scan->rs_startblock) ||
-(scan->rs_numblocks != InvalidBlockNumber ? --scan->rs_numblocks <= 0 : false);
+			if (scan->rs_parallel != NULL)
+			{
+page = heap_parallelscan_nextpage(scan->rs_parallel);
+finished = (page >= scan->rs_nblocks);
+			}
+			else
+			{
+page++;
+if (page >= scan->rs_nblocks)
+	page = 0;
+finished = (page == scan->rs_startblock) ||
+	(scan->rs_numblocks != InvalidBlockNumber ? --scan->rs_numblocks <= 0 : false);
+			}
 
 			/*
 			 * Report our new scan position for synchronization purposes. We
@@ -757,7 +785,18 @@ heapgettup_pagemode(HeapScanDesc scan,
 tuple->t_data = NULL;
 return;
 			}
-			page = scan->rs_startblock; /* first page */
+			if (scan->rs_parallel != NULL)
+			{
+page = heap_parallelscan_nextpage(scan->rs_parallel);
+if (page >= scan->rs_nblocks)
+{
+	Assert(!BufferIsValid(scan->rs_cbuf));
+	tuple->t_data = NULL;
+	return;
+}
+			}
+			else
+page = scan->rs_startblock; /* first page */
 			heapgetpage(scan, page);
 			lineindex = 0;
 			scan->rs_inited = true;
@@ -777,6 +816,9 @@ heapgettup_pagemode(HeapScanDesc scan,
 	}
 	else if (backward)
 	{
+		/* backward parallel scan not supported */
+		Assert(scan->rs_parallel == NULL);
+
 		if (!scan->rs_inited

Re: [HACKERS] New CF app deployment

2015-02-06 Thread Peter Geoghegan
On Fri, Feb 6, 2015 at 4:02 PM, Jeff Janes  wrote:
> I liked to add comments which would point out some fact that was important
> to testing but which was non-obvious. Often this fact was mentioned
> somewhere in the 300 message thread, but it needs to be called out
> specifically for people interested in testing but not so interested in
> architectural debates.  Obviously adding another email to an overly-long
> thread is going the wrong way when it comes to making things stand out
> better.  (Also, if the comment is about an uncommitted dependency, then the
> comment can be deleted once the dependency is committed)

Good point. I think that your role in testing is very valuable, and
that we should make a particular effort to accommodate that sort of
thing.


-- 
Peter Geoghegan




Re: [HACKERS] New CF app deployment

2015-02-06 Thread Jeff Janes
On Fri, Feb 6, 2015 at 5:20 AM, Robert Haas  wrote:

> On Fri, Feb 6, 2015 at 5:51 AM, Magnus Hagander 
> wrote:
> > So in an attempt to actually move this forward in a constructive way I'm
> > going to ignore  a bunch of what happened after this email, and fork the
> > discussion at this point.
>
> Thanks, and I probably owe you an apology for some of that, so, sorry
> about that.
>
> I think the core of the problem here is that the old application saw
> its goal in life as *summarizing* the thread.  The idea is that people
> would go in and add comments (which could be flagged as comment,
> patch, or review) pointing to particularly important messages in the
> discussion.  The problem with this is that it had to be manually
> updated, and some people didn't like that.[1]  The new app attaches
> the entire thread, which has the advantage that everything is always
> there.  The problem with that is that the unimportant stuff is there,
> too, and there's no way to mark the important stuff so that you can
> distinguish between that and the unimportant stuff.  I think that's
> the problem we need to solve.
>

I'd like the ability to add a comment which does not get turned into an
email.

I liked to add comments which would point out some fact that was important
to testing but which was non-obvious. Often this fact was mentioned
somewhere in the 300 message thread, but it needs to be called out
specifically for people interested in testing but not so interested in
architectural debates.  Obviously adding another email to an overly-long
thread is going the wrong way when it comes to making things stand out
better.  (Also, if the comment is about an uncommitted dependency, then the
comment can be deleted once the dependency is committed)


>
> One thing that would probably *help* is if the list of attachments
> mentioned the names of the files that were attached to each message
> rather than just noting that they have some kind of attachment.  If
> people name their attachments sensibly, then you'll be able to
> distinguish parallel-seqscan-v23.patch from
> test-case-that-breaks-parallel-seqscan.sql, and that would be nice.
>

Yes, I was going to request that as well.

Cheers,

Jeff


Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

2015-02-06 Thread Bruce Momjian
On Wed, Feb  4, 2015 at 04:49:46PM -0800, Peter Geoghegan wrote:
> On Tue, Feb 2, 2015 at 01:06 AM, Andres Freund  wrote:
> > A first (not actually that quick :() look through the patches to see
> > what actually happened in the last months. I didn't keep up with the
> > thread.
> 
> So, let me get this out of the way: This is the first in-depth
> technical review that this work has had in a long time. Thank you for
> your help here.

I looked at all the patches too.  The patch is only 9k lines, not huge. 

Other than the locking part, the biggest part of this patch is adjusting
things so that an INSERT can change into an UPDATE.  The code that
handles SELECT/INSERT/UPDATE/DELETE is already complex, and this makes
it even more so.  I have no idea how we can be sure we have hit every
single case, but I am also unclear how we will _ever_ know we have hit
them all.

We know people want this feature, and this patch seems to be our best
bet to getting it.  If we push this off for 9.6, I am not sure what that
buys us.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Parallel Seq Scan

2015-02-06 Thread Robert Haas
On Fri, Feb 6, 2015 at 2:13 PM, Robert Haas  wrote:
> The complicated part here seems to me to figure out what we need to
> pass from the parallel leader to the parallel worker to create enough
> state for quals and projection.  If we want to be able to call
> ExecScan() without modification, which seems like a good goal, we're
> going to need a ScanState node, which is going to need to contain
> valid pointers to (at least) a ProjectionInfo, an ExprContext, and a
> List of quals.  That in turn is going to require an ExecutorState.
> Serializing those things directly doesn't seem very practical; what we
> instead want to do is figure out what we can pass that will allow easy
> reconstruction of those data structures.  Right now, you're passing
> the target list, the qual list, the range table, and the params, but
> the range table doesn't seem to be getting used anywhere.  I wonder if
> we need it.  If we could get away with just passing the target list
> and qual list, and params, we'd be doing pretty well, I think.  But
> I'm not sure exactly what that looks like.

IndexBuildHeapRangeScan shows how to do qual evaluation with
relatively little setup:

estate = CreateExecutorState();
econtext = GetPerTupleExprContext(estate);
slot = MakeSingleTupleTableSlot(RelationGetDescr(heapRelation));

/* Arrange for econtext's scan tuple to be the tuple under test */
econtext->ecxt_scantuple = slot;

/* Set up execution state for predicate, if any. */
predicate = (List *)
ExecPrepareExpr((Expr *) indexInfo->ii_Predicate,
estate);

Then, for each tuple:

   ExecStoreTuple(heapTuple, slot, InvalidBuffer, false);

And:

if (!ExecQual(predicate, econtext, false))
continue;

This looks like a good model to follow for parallel sequential scan.
The point though is that I think we should do it directly rather than
letting the portal machinery do it for us.  Not sure how to get
projection working yet.
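
Pulled together, a worker's per-tuple qual check might look roughly like
this (a sketch only, not code from any patch; it assumes "rel", "qual",
and "scan" have already been set up, with "scan" coming from something
like heap_beginscan_parallel()):

    EState         *estate = CreateExecutorState();
    ExprContext    *econtext = GetPerTupleExprContext(estate);
    TupleTableSlot *slot = MakeSingleTupleTableSlot(RelationGetDescr(rel));
    List           *qualstate;
    HeapTuple       tuple;

    /* Arrange for econtext's scan tuple to be the tuple under test */
    econtext->ecxt_scantuple = slot;
    qualstate = (List *) ExecPrepareExpr((Expr *) qual, estate);

    while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
    {
        ExecStoreTuple(tuple, slot, InvalidBuffer, false);
        if (ExecQual(qualstate, econtext, false))
        {
            /* tuple passes the quals; send it to the leader here */
        }
        ResetExprContext(econtext);
    }

    ExecDropSingleTupleTableSlot(slot);
    FreeExecutorState(estate);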

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Early Setup of instrumentation information in pg_stat_statements

2015-02-06 Thread Robert Haas
On Thu, Feb 5, 2015 at 10:28 AM, Amit Kapila  wrote:
> Currently in pg_stat_statements, the setup to track
> instrumentation/totaltime information is done after
> ExecutorStart().  Can we do this before ExecutorStart()?
> In particular, I am referring to below code:
>
> static void
> pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
> {
> ..
> standard_ExecutorStart(queryDesc, eflags);
> ..
> if (pgss_enabled() && queryDesc->plannedstmt->queryId != 0)
> {
> ..
> if (queryDesc->totaltime == NULL)
> {
> ..
> queryDesc->totaltime = InstrAlloc(1, INSTRUMENT_ALL);
> ..
> }
> }
> }
>
> The reason why I am asking is that to track instrumentation
> information (like buffer usage parameters) in case of parallel
> sequence scan, we need to pass this information at start of
> backend workers which are started in ExecutorStart() phase
> and at that time, there is no information available which can
> guarantee (we have queryId stored in planned stmt, but I think
> that is not sufficient to decide) that such an information is
> needed by plugin.  This works well for Explain statement as
> that has the information for instrumentation available before
> ExecutorStart() phase.
>
> Please suggest me if there is a better way to make this
> information available in ExecutorStart() phase?

This seems like a clue that maybe starting the workers during the
ExecutorStart() phase is a bad idea.  It might be a bad idea for other
reasons, too: what happens if that node never gets executed?  I think
the general pattern of executor nodes is that they're not supposed to
do any real work until somebody tries to pull a tuple from them.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Seq Scan

2015-02-06 Thread Robert Haas
On Fri, Feb 6, 2015 at 12:34 PM, Robert Haas  wrote:
> 4.

Obviously that went out a bit too soon.  Anyway, what I think we
should do here is back up a bit and talk about what the problems are
that we need to solve here and how each of them should be solved.  I
think there is some good code in this patch, but we really need to
think about what the interfaces should look like and achieve a clean
separation of concerns.

Looking at the code for the non-parallel SeqScan node, there are
basically two things going on here:

1. We call heap_getnext() to get the next tuple and store it into a
TupleTableSlot.
2. Via ExecScan(), we do projection and apply quals.

My first comment here is that I think we should actually teach
heapam.c about parallelism.  In other words, let's have an interface
like this:

extern Size heap_parallelscan_estimate(Snapshot snapshot);
extern void heap_parallelscan_initialize(ParallelHeapScanDesc target,
Relation relation, Snapshot snapshot);
extern HeapScanDesc heap_beginscan_parallel(ParallelHeapScanDesc);

So the idea is that if you want to do a parallel scan, you call
heap_parallelscan_estimate() to figure out how much space to reserve
in your dynamic shared memory segment.  Then you call
heap_parallelscan_initialize() to initialize the chunk of memory once
you have it.  Then each backend that wants to assist in the parallel
scan calls heap_beginscan_parallel() on that chunk of memory and gets
its own HeapScanDesc.  Then, they can all call heap_getnext() just as
in the non-parallel case.  The ParallelHeapScanDesc needs to contain
the relation OID, the snapshot, the ending block number, and a
current-block counter.  Instead of automatically advancing to the next
block, they use one of Andres's nifty new atomic ops to bump the
current-block counter and then scan the block just before the new
value.  All this seems pretty straightforward, and if we decide to
later change the way the relation gets scanned (e.g. in 1GB chunks
rather than block-by-block) it can be handled here pretty easily.
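
To make the intended call sequence concrete, something along these lines
(a sketch only; names like "rel", "snapshot", and "pscan_space" are
assumptions, not part of the proposal):

    /* In the leader, while sizing and filling the DSM segment: */
    Size    size = heap_parallelscan_estimate(snapshot);
    /* ... reserve "size" bytes; later, "pscan_space" points at them ... */
    ParallelHeapScanDesc pscan = (ParallelHeapScanDesc) pscan_space;
    heap_parallelscan_initialize(pscan, rel, snapshot);

    /* In each worker (and in the leader too, if it participates): */
    HeapScanDesc scan = heap_beginscan_parallel(pscan);
    HeapTuple    tuple;

    while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
    {
        /* evaluate quals and/or queue the tuple for the leader */
    }
    heap_endscan(scan);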

Now, let's suppose that we have this interface and for some reason we
don't care about quals and projection - we just want to get the tuples
back to the master.  It's easy enough to create a parallel context
that fires up a worker and lets the worker call
heap_beginscan_parallel() and then heap_getnext() in a loop, but what
does it do with the resulting tuples?  We need a tuple queue that can
be used to send the tuples back to master.  That's also pretty easy:
just set up a shm_mq and use shm_mq_send() to send each tuple.  Use
shm_mq_receive() in the master to read them back out.  The only thing
we need to be careful about is that the tuple descriptors match.  It
must be that they do, because the way the current parallel context
patch works, the master is guaranteed to hold a lock on the relation
from before the worker starts up until after it dies.  But we could
stash the tuple descriptor in shared memory and cross-check that it
matches just to be sure.  Anyway, this doesn't seem terribly complex
although we might want to wrap some abstraction around it somehow so
that every kind of parallelism that uses tuple queues can benefit from
it.  Perhaps this could even be built into the parallel context
machinery somehow, or maybe it's something executor-specific.  At any
rate it looks simpler than what you've got now.
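
As a rough sketch of that (assuming "mqh" is a shm_mq_handle already
attached to a queue living in the DSM segment; error handling omitted):

    /* Worker side: ship each raw tuple to the leader. */
    while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
    {
        if (shm_mq_send(mqh, tuple->t_len, tuple->t_data, false) == SHM_MQ_DETACHED)
            break;              /* leader went away; stop scanning */
    }

    /* Leader side: read tuples back out of the queue. */
    Size    nbytes;
    void   *data;

    while (shm_mq_receive(mqh, &nbytes, &data, false) == SHM_MQ_SUCCESS)
    {
        HeapTupleData htup;

        htup.t_len = nbytes;
        htup.t_data = (HeapTupleHeader) data;
        /* ... store htup in a slot or return it to the parent node ... */
    }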

The complicated part here seems to me to figure out what we need to
pass from the parallel leader to the parallel worker to create enough
state for quals and projection.  If we want to be able to call
ExecScan() without modification, which seems like a good goal, we're
going to need a ScanState node, which is going to need to contain
valid pointers to (at least) a ProjectionInfo, an ExprContext, and a
List of quals.  That in turn is going to require an ExecutorState.
Serializing those things directly doesn't seem very practical; what we
instead want to do is figure out what we can pass that will allow easy
reconstruction of those data structures.  Right now, you're passing
the target list, the qual list, the range table, and the params, but
the range table doesn't seem to be getting used anywhere.  I wonder if
we need it.  If we could get away with just passing the target list
and qual list, and params, we'd be doing pretty well, I think.  But
I'm not sure exactly what that looks like.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Seq Scan

2015-02-06 Thread Robert Haas
On Fri, Feb 6, 2015 at 9:43 AM, Amit Kapila  wrote:
> Here is the latest patch which fixes reported issues and supported
> Prepared Statements and Explain Statement for parallel sequential
> scan.
>
> The main purpose is to get feedback, if possible, on the overall
> structure/design of the code before I go ahead.

I'm not very happy with the way this is modularized:

1. The new parallel sequential scan node runs only in the master.  The
workers are running a regular sequential scan with a hack to make them
scan a subset of the blocks.  I think this is wrong; parallel
sequential scan shouldn't require this kind of modifications to the
non-parallel case.

2. InitiateWorkers() is entirely specific to the concerns of parallel
sequential scan.  After looking this over, I think there are three
categories of things that need to be clearly separated.  Some stuff is
going to be needed for any parallel query; some stuff is going to be
needed only for parallel scans but will be needed for any type of
parallel scan, not just parallel sequential scan[1]; some stuff is
needed for any type of node that returns tuples but not for nodes that
don't return tuples (e.g. needed for ParallelSeqScan and
ParallelHashJoin, but not needed for ParallelHash); and some stuff is
only going to be needed for parallel sequential scan specifically.
This patch mixes all of those concerns together in a single function.
That won't do; this needs to be easily extensible to whatever someone
wants to parallelize next.

3. I think the whole idea of using the portal infrastructure for this
is wrong.  We've talked about this before, but the fact that you're
doing it this way is having a major impact on the whole design of the
patch, and I can't believe it's ever going to be committable this way.
To create a portal, you have to pretend that you received a protocol
message, which you didn't; and you have to pretend there is an SQL
query so you can call PortalDefineQuery.   That's ugly.  As far as I
can see the only thing we really get out of any of that is that we can
use the DestReceiver stuff to get the tuples back to the master, but
that doesn't really work either, because you're having to hack
printtup.c anyway.  So from my point of view you're going through a
bunch of layers that really don't have any value.  Considering the way
the parallel mode patch has evolved, I no longer think there's much
point to passing anything other than raw tuples between the backends,
so the whole idea of going through a deform/send/recv/form cycle seems
like something we can entirely skip.

4.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] It is of course arguable whether a parallel index-scan or parallel
bitmap index-scan or parallel index-only-scan or parallel custom scan
makes sense, but this patch shouldn't assume that we won't want to do
those things.  We have other places in the code that know about the
concept of a scan as opposed to some other kind of executor construct,
and we should preserve that distinction here.




Re: [HACKERS] ExplainModifyTarget doesn't work as expected

2015-02-06 Thread Tom Lane
Etsuro Fujita  writes:
> On 2015/02/03 15:32, Ashutosh Bapat wrote:
>> Instead, can we show all the relations that are being modified e.g
>> Update on child1, child2, child3. That will disambiguate everything.

> That's an idea, but my concern about that is the cases where there are a 
> large number of child tables as the EXPLAIN would be difficult to read 
> in such cases.

I concur, that would *not* be an improvement in readability.  Moreover,
I don't think it really fixes the issue: what we want to show is a table
name in Modify that matches what the user wrote in the query.  Given that
context, the user should be able to understand why some tables are listed
below that and others are not.

IIRC, this code was written at a time when we didn't have NO INHERIT check
constraints and so it was impossible for the parent table to get optimized
away while leaving children.  So the comment in ExplainModifyTarget was
good at the time.  But it no longer is.

I think your basic idea of preserving the original parent table's relid
is correct; but instead of doing it like this patch does, I'd be inclined
to make ModifyTable inherit from Scan not Plan, and use the scan.scanrelid
field to carry the parent RTI.  Then you would probably end up with a net
savings of code rather than net addition; certainly ExplainModifyTarget
would go away entirely since you'd just treat ModifyTable like any other
Scan in this part of EXPLAIN.
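
For illustration, the node change being suggested might look roughly like
this (a sketch of the idea only, not a worked-out patch; remaining fields
elided):

    typedef struct ModifyTable
    {
        Scan        scan;           /* was "Plan plan"; scan.scanrelid would
                                     * carry the original parent RTI */
        CmdType     operation;      /* INSERT, UPDATE, or DELETE */
        List       *resultRelations;    /* RT indexes of the result relation(s) */
        /* ... other existing fields unchanged ... */
    } ModifyTable;

EXPLAIN could then label the node from scan.scanrelid the same way it does
for any other Scan node.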

regards, tom lane




Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-02-06 Thread Michael Paquier
On Fri, Feb 6, 2015 at 6:35 PM, Syed, Rahila wrote:
> The compression patch can use the latest interface
> MemoryContextAllocExtended to proceed without compression when sufficient
> memory is not available for the scratch buffer.
> The attached patch introduces an OutOfMem flag which is set when
> MemoryContextAllocExtended returns NULL.

TBH, I don't think that brings much, as this allocation is done once
and the process would surely fail before reaching the first code path
doing a WAL record insertion. In any case, OutOfMem is useless; you
could simply check if compression_scratch is NULL when assembling a
record.
-- 
Michael




Re: [HACKERS] RangeType internal use

2015-02-06 Thread Tom Lane
Amit Langote  writes:
> I wonder why I cannot find a way to get a range type for a given (sub-)
> type. I would like to build a RangeType from Datum's of lower and upper
> bounds. Much like how construct_array() builds an ArrayType from a Datum
> array of elements given elements' type info.

> Is there some way I do not seem to know? If not, would it be worthwhile
> to make something like construct_range() that returns a RangeType given
> Datum's of lower and upper bounds and subtype info?

There is no good reason to assume that a range type exists at all, much
less that it is unique for a subtype.  (Read the CREATE TYPE documentation
if you're unclear as to why not.)  You have not said what you're trying to
accomplish, but this seems like a dead end.

regards, tom lane




Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

2015-02-06 Thread Thom Brown
On 29 January 2015 at 23:38, Peter Geoghegan  wrote:

> On Sat, Jan 17, 2015 at 6:48 PM, Peter Geoghegan  wrote:
> > I continued with this since posting V2.0.
>
> Attached version (V2.1) fixes bit-rot caused by the recent changes by
> Stephen ("Fix column-privilege leak in error-message paths"). More
> precisely, it is rebased on top of today's 17792b commit.
>

Patch 0002 no longer applies due to a conflict in
src/backend/executor/execUtils.c.

Thom


Re: [HACKERS] Parallel Seq Scan

2015-02-06 Thread Amit Kapila
On Thu, Jan 22, 2015 at 10:30 AM, Amit Kapila 
wrote:
>
> On Thu, Jan 22, 2015 at 6:37 AM, Kouhei Kaigai 
wrote:
> >
> > (Please point out me if my understanding is incorrect.)
> >
> > What happens if a dynamic background worker process tries to reference
> > temporary tables? Because buffers of temporary table blocks are allocated
> > in private address space, their recent status is not visible to other
> > processes unless it is flushed to the storage every time.
> >
> > Do we need to prohibit create_parallelscan_paths() from generating a path
> > when the target relation is a temporary one?
> >
>
> Yes, we need to prohibit parallel scans on temporary relations.  Will fix.
>

Here is the latest patch which fixes reported issues and supported
Prepared Statements and Explain Statement for parallel sequential
scan.

The main purpose is to get feedback, if possible, on the overall
structure/design of the code before I go ahead.

Note -
a. it is still based on parallel-mode-v1 [1] patch of Robert.
b. based on CommitId - fd496129 [on top of this commit, apply
Robert's patch and then the attached patch]
c. just built and tested on Windows; my linux box has some
problem, will fix that soon and verify this on linux as well.

[1]
http://www.postgresql.org/message-id/ca+tgmozduk4k3xhbxc9vm-82khourezdvqwtfglhwsd2r2a...@mail.gmail.com

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


parallel_seqscan_v6.patch
Description: Binary data



Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-02-06 Thread Syed, Rahila

>In any case, those things have been introduced by what I did in previous 
>versions... And attached is a new patch.
Thank you for feedback.

>   /* allocate scratch buffer used for compression of block images */
>+  if (compression_scratch == NULL)
>+  compression_scratch = MemoryContextAllocZero(xloginsert_cxt,
>+  
> BLCKSZ);
 >}
The compression patch can use the latest interface MemoryContextAllocExtended
to proceed without compression when sufficient memory is not available for the
scratch buffer.
The attached patch introduces an OutOfMem flag which is set when
MemoryContextAllocExtended returns NULL.

Thank you,
Rahila Syed

-Original Message-
From: Michael Paquier [mailto:michael.paqu...@gmail.com] 
Sent: Friday, February 06, 2015 12:46 AM
To: Syed, Rahila
Cc: PostgreSQL mailing lists
Subject: Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

On Thu, Feb 5, 2015 at 11:06 PM, Syed, Rahila  wrote:
>>/*
>>+* We recheck the actual size even if pglz_compress() report success,
>>+* because it might be satisfied with having saved as little as one byte
>>+* in the compressed data.
>>+*/
>>+   *len = (uint16) compressed_len;
>>+   if (*len >= orig_len - 1)
>>+   return false;
>>+   return true;
>>+}
>
> As per the latest code, when compression is 'on' we introduce an additional 2 bytes
> in the header of each block image for storing raw_length of the compressed 
> block.
> In order to achieve compression while accounting for these two additional 
> bytes, we must ensure that compressed length is less than original length - 2.
> So, IIUC the above condition should rather be
>
> If (*len >= orig_len -2 )
> return false;
> return true;
> The attached patch contains this. It also has a cosmetic change - renaming
> compressBuf to uncompressBuf, as it is used to store the uncompressed page.

Agreed on both things.

Just looking at your latest patch after some time to let it cool down, I 
noticed a couple of things.

 #define MaxSizeOfXLogRecordBlockHeader \
 (SizeOfXLogRecordBlockHeader + \
- SizeOfXLogRecordBlockImageHeader + \
+ SizeOfXLogRecordBlockImageHeader, \
+ SizeOfXLogRecordBlockImageCompressionInfo + \
There is a comma here instead of a sum sign. We should really sum up all those 
sizes to evaluate the maximum size of a block header.

+ * Permanently allocate readBuf uncompressBuf.  We do it this way,
+ * rather than just making a static array, for two reasons:
This comment is just weird; "readBuf AND uncompressBuf" is more appropriate.

+ * We recheck the actual size even if pglz_compress() report success,
+ * because it might be satisfied with having saved as little as one byte
+ * in the compressed data. We add two bytes to store raw_length  with the
+ * compressed image. So for compression to be effective
compressed_len should
+ * be atleast < orig_len - 2
This comment block should be reworked, and it is missing a dot at its end. I
rewrote it as follows; hopefully that's clearer:
+   /*
+* We recheck the actual size even if pglz_compress() reports
success and see
+* if at least 2 bytes of length have been saved, as this
corresponds to the
+* additional amount of data stored in WAL record for a compressed block
+* via raw_length.
+*/

In any case, those things have been introduced by what I did in previous 
versions... And attached is a new patch.
--
Michael


Support-compression-for-full-page-writes-in-WAL_v17.patch
Description: Support-compression-for-full-page-writes-in-WAL_v17.patch



Re: [HACKERS] Parallel Seq Scan

2015-02-06 Thread Daniel Bausch
Hi David and others!

David Fetter  writes:

> On Tue, Jan 27, 2015 at 08:02:37AM +0100, Daniel Bausch wrote:
>> 
>> Tom Lane  writes:
>> 
>> >> Wait for first IO, issue second IO request
>> >> Compute
>> >> Already have second IO request, issue third
>> >> ...
>> >
>> >> We'd be a lot less sensitive to IO latency.
>> >
>> > It would take about five minutes of coding to prove or disprove this:
>> > stick a PrefetchBuffer call into heapgetpage() to launch a request for the
>> > next page as soon as we've read the current one, and then see if that
>> > makes any obvious performance difference.  I'm not convinced that it will,
>> > but if it did then we could think about how to make it work for real.
>> 
>> Sorry for dropping in so late...
>> 
>> I have done all this two years ago.  For TPC-H Q8, Q9, Q17, Q20, and Q21
>> I see a speedup of ~100% when using IndexScan prefetching + Nested-Loops
>> Look-Ahead (the outer loop!).
>> (On SSD with 32 Pages Prefetch/Look-Ahead + Cold Page Cache / Small RAM)
>
> Would you be so kind as to pass along any patches (ideally applicable
> to git master), tests, and specific measurements you made?

Attached find my patches based on the old revision
36f4c7843cf3d201279855ed9a6ebc1deb3c9463
(Adjust cube.out expected output for new test queries.)

I have not tested applicability against HEAD yet.

Disclaimer: This was just a proof-of-concept, so the implementation
quality is poor.  Nevertheless, performance looked promising, though it
still needs a lot of extra rules for special cases, like detecting
accidental sequential scans.  The general assumption is: no concurrency -
a single query owning the machine.

Here is a comparison using dbt3.  Q8, Q9, Q17, Q20, and Q21 are
significantly improved.

|     |   baseline |  indexscan | indexscan+nestloop |
|     |            |  patch 1+2 |            patch 3 |
|-----+------------+------------+--------------------|
| Q1  |  76.124261 |  73.165161 |          76.323119 |
| Q2  |   9.676956 |  11.211073 |          10.480668 |
| Q3  |  36.836417 |  36.268022 |          36.837226 |
| Q4  |  48.707501 |    64.2255 |          30.872218 |
| Q5  |  59.371467 |  59.205048 |          58.646096 |
| Q6  |  70.514214 |  73.021006 |           72.64643 |
| Q7  |  63.667594 |  63.258499 |          62.758288 |
| Q8  |  70.640973 |  33.144454 |          32.530732 |
| Q9  | 446.630473 | 379.063773 |         219.926094 |
| Q10 |  49.616125 |  49.244744 |          48.411664 |
| Q11 |   6.122317 |   6.158616 |           6.160189 |
| Q12 |  74.294292 |  87.780442 |          87.533936 |
| Q13 |   32.37932 |  32.771938 |          33.483444 |
| Q14 |  47.836053 |  48.093996 |           47.72221 |
| Q15 | 139.350038 | 138.880208 |         138.681336 |
| Q16 |  12.092429 |  12.120661 |          11.668971 |
| Q17 |   9.346636 |   4.106042 |           4.018951 |
| Q18 |  66.106875 | 123.754111 |         122.623193 |
| Q19 |  22.750504 |  23.191532 |           22.34084 |
| Q20 |  80.481986 |  29.906274 |           28.58106 |
| Q21 | 396.897269 |  355.45988 |          214.44184 |
| Q22 |   6.834841 |   6.600922 |           6.524032 |

Regards,
Daniel
-- 
MSc. Daniel Bausch
Research Assistant (Computer Science)
Technische Universität Darmstadt
http://www.dvs.tu-darmstadt.de/staff/dbausch
From 569398929d899100b769abfd919bc3383626ac9f Mon Sep 17 00:00:00 2001
From: Daniel Bausch 
Date: Tue, 22 Oct 2013 15:22:25 +0200
Subject: [PATCH 1/4] Quick proof-of-concept for indexscan prefetching

This implements a prefetching queue of tuples whose tid is read ahead.
Their block number is quickly checked for random properties (not current
block and not the block prefetched last).  Random reads are prefetched.
Up to 32 tuples are considered by default.  The tids are queued in a
fixed ring buffer.

The prefetching is implemented in the generic part of the index scan, so
it applies to all access methods.
---
 src/backend/access/index/indexam.c | 96 ++
 src/include/access/relscan.h   | 12 +
 2 files changed, 108 insertions(+)

diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index b878155..1c54ef5 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -251,6 +251,12 @@ index_beginscan(Relation heapRelation,
 	scan->heapRelation = heapRelation;
 	scan->xs_snapshot = snapshot;
 
+#ifdef USE_PREFETCH
+	scan->xs_prefetch_head = scan->xs_prefetch_tail = -1;
+	scan->xs_last_prefetch = -1;
+	scan->xs_done = false;
+#endif
+
 	return scan;
 }
 
@@ -432,6 +438,55 @@ index_restrpos(IndexScanDesc scan)
 	FunctionCall1(procedure, PointerGetDatum(scan));
 }
 
+static int
+index_prefetch_queue_space(IndexScanDesc scan)
+{
+	if (scan->xs_prefetch_tail < 0)
+		return INDEXSCAN_PREFETCH_COUNT;
+
+	Assert(scan->xs_prefetch_head >= 0);
+
+	return (INDEXSCAN_PREFETCH_COUNT
+			- (scan->xs_prefetch_tail - scan->xs_prefetch_head + 1))
+		% INDEXSCAN_PREFETCH_COUNT;
+}
+
+/* makes copy of ItemPoint

Re: [HACKERS] Possible problem with pgcrypto

2015-02-06 Thread Jan Wieck

On 02/05/2015 02:15 PM, Jan Wieck wrote:

On 02/05/2015 01:18 PM, Marko Tiikkaja wrote:



  "pgcrypto bug"


That doesn't look too good, but I can't reproduce it against 9.3.6 either.


Attached is an improved script and the final output from it. I ran it
overnight and it did not reproduce the pgcrypto bug message, but
instead the expected "Corrupt data" plus one that I haven't seen before,
which is "Unsupported compression algorithm".



Jan

--
Jan Wieck
Senior Software Engineer
http://slony.info


pgcrypto_test2.sh
Description: application/shellscript
select * from pgp_test2 order by 1;
 id |  
vdata   
|  vkey   |  verrmsg  
+--+-+---
  1 | 
\xc30d040703027123de71fa32937175d23a01cc0377628c3b58119e4c8e51804b74cccf961c8b500b8a283db7084ed809833f9bfd7827b70cf06aa0254707e1c08b3db8419e6e2eda697637
 | key_1   | ERROR:  Wrong key or corrupt data
  2 | 
\xc30d040703027123de71fa32937175d23a01cc0377628c3b58119e4c8e51804b74cccf961c8b500b8a283db7084ed809833f9bfd7827b70cf06aa0254707e1c08b3db8419e6e2eda697637
 | key_686 | ERROR:  Not text data
  3 | 
\xc30d04070302b9688f5b4b1db6bb77d23a01ee1cad33770344503c564496fb6463d9b11a30013424fcae8b80cfbafba415ba3c7047aec7499b8e74254069b29390990c0f2a34740a7d2085
 | key_972 | ERROR:  Unsupported compression algorithm
  4 | 
\xc30d04070302c011ec7879402f4178d23a0165d2462bb4e6a7a204f8af20fc0cfadfc72f6a55902f013479697f316659bbc028ddfe624578e462aee3279b65f3e66f0993305d378ae35593
 | key_816 | ERROR:  Corrupt data
(4 rows)



Re: [HACKERS] pg_basebackup, tablespace mapping and path canonicalization

2015-02-06 Thread Robert Haas
On Thu, Feb 5, 2015 at 10:21 PM, Ian Barwick  wrote:
> I stumbled on what appears to be inconsistent handling of double slashes
> in tablespace paths when using pg_basebackup with the -T/--tablespace-mapping
> option:
>
> ibarwick:postgresql (master)$ mkdir /tmp//foo-old
[...]
> The attached patch adds the missing canonicalization; I can't see any
> reason not to do this. Thoughts?

Seems OK to me.  Anyone think differently?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] New CF app deployment

2015-02-06 Thread Robert Haas
On Fri, Feb 6, 2015 at 5:51 AM, Magnus Hagander  wrote:
> So in an attempt to actually move this forward in a constructive way I'm
> going to ignore  a bunch of what happened after this email, and fork the
> discussion at this point.

Thanks, and I probably owe you an apology for some of that, so, sorry
about that.

I think the core of the problem here is that the old application saw
its goal in life as *summarizing* the thread.  The idea is that people
would go in and add comments (which could be flagged as comment,
patch, or review) pointing to particularly important messages in the
discussion.  The problem with this is that it had to be manually
updated, and some people didn't like that.[1]  The new app attaches
the entire thread, which has the advantage that everything is always
there.  The problem with that is that the unimportant stuff is there,
too, and there's no way to mark the important stuff so that you can
distinguish between that and the unimportant stuff.  I think that's
the problem we need to solve.

One thing that would probably *help* is if the list of attachments
mentioned the names of the files that were attached to each message
rather than just noting that they have some kind of attachment.  If
people name their attachments sensibly, then you'll be able to
distinguish parallel-seqscan-v23.patch from
test-case-that-breaks-parallel-seqscan.sql, and that would be nice.
But that doesn't help with, say, distinguishing useful reviews from
general discussion on the thread.  I don't think there's any way to do
that in an automated way, so it's got to be something that somebody
does manually.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] My personal opinion is that this complaint was misguided.  I have
yet to see an issue tracker that didn't require scads of manual work
to keep the information in it relevant, and ours had a higher
signal-to-noise ratio than many I've had to use professionally.  That
having been said, I understand the frustration.




Re: [HACKERS] New CF app deployment

2015-02-06 Thread Robert Haas
On Fri, Feb 6, 2015 at 5:55 AM, Magnus Hagander  wrote:
> On Mon, Jan 26, 2015 at 10:29 PM, Robert Haas  wrote:
>> (While I'm complaining, the links only go to the "flat" version of the
>> thread, while I happen to prefer the version that shows one message at
>> a time with a message-ID selector to switch messages.)
>
> Then you're clicking the wrong link :) As long as you click the link for the
> message (first or last) you get the "regular" view. The "thread" link always
> goes to the same message as the "first" one, except the flat view. So you
> have a choice of both (and these links are both always visible, none of them
> is in the "collapsed" area which holds the attachments)

OK.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code

2015-02-06 Thread Fujii Masao
On Fri, Feb 6, 2015 at 1:23 PM, Michael Paquier
 wrote:
> On Thu, Feb 5, 2015 at 11:58 PM, Michael Paquier wrote:
>> An updated patch is attached.
> I just noticed that the patch I sent was incorrect:
> - Parameter name was still wal_availability_check_interval and not
> wal_retrieve_retry_interval
> - Documentation was incorrect.
> Please use the patch attached instead for further review.

Thanks for updating the patch!

timestamp.c:1708: warning: implicit declaration of function
'HandleStartupProcInterrupts'

I got the above warning at the compilation.

+pg_usleep(wait_time);
+HandleStartupProcInterrupts();
+total_time -= wait_time;

It seems strange to call the startup-process-specific function in
the "generic" function like TimestampSleepDifference().

 * 5. Sleep 5 seconds, and loop back to 1.

In WaitForWALToBecomeAvailable(), the above comment should
be updated.

- * Wait for more WAL to arrive. Time out after 5 seconds,
- * like when polling the archive, to react to a trigger
- * file promptly.
+ * Wait for more WAL to arrive. Time out after
the amount of
+ * time specified by wal_retrieve_retry_interval, like
+ * when polling the archive, to react to a
trigger file promptly.
  */
 WaitLatch(&XLogCtl->recoveryWakeupLatch,
   WL_LATCH_SET | WL_TIMEOUT,
-  5000L);
+  wal_retrieve_retry_interval * 1000L);

This change can prevent the startup process from reacting to
a trigger file. Imagine the case where a large interval is set
and the user wants to promote the standby by using the trigger file
instead of pg_ctl promote. I think that the sleep time should be 5s
if the interval is set to more than 5s. Thought?
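
In other words, something along these lines (just a sketch of the idea,
assuming the GUC is in seconds as in the current patch):

    WaitLatch(&XLogCtl->recoveryWakeupLatch,
              WL_LATCH_SET | WL_TIMEOUT,
              Min(wal_retrieve_retry_interval * 1000L, 5000L));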

+# specifies an optional internal to wait for WAL to become available when

s/internal/interval?

+This parameter specifies the amount of time to wait when a failure
+occurred when reading WAL from a source (be it via streaming
+replication, local pg_xlog or WAL archive) for a node
+in standby mode, or when WAL is expected from a source.

At least to me, it seems not easy to understand what the parameter is
from this description. What about the following, for example?

This parameter specifies the amount of time to wait when WAL is not
available from any sources (streaming replication, local
pg_xlog or WAL archive) before retrying to retrieve WAL

Isn't it better to place this parameter in postgresql.conf rather than
recovery.conf? I'd like to change the value of this parameter without
restarting the server.

Regards,

-- 
Fujii Masao




Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-02-06 Thread Michael Paquier
On Fri, Feb 6, 2015 at 4:30 PM, Michael Paquier wrote:
> On Fri, Feb 6, 2015 at 3:03 PM, Fujii Masao wrote:
>> Do we always need extra two bytes for compressed backup block?
>> ISTM that extra bytes are not necessary when the hole length is zero.
>> In this case the length of the original backup block (i.e., uncompressed)
>> must be BLCKSZ, so we don't need to save the original size in
>> the extra bytes.
>
> Yes, we would need an additional bit to identify that. We could steal
> it from length in XLogRecordBlockImageHeader.
>
>> Furthermore, when fpw compression is disabled and the hole length
>> is zero, we seem to be able to save one byte from the header of
>> backup block. Currently we use 4 bytes for the header, 2 bytes for
>> the length of backup block, 15 bits for the hole offset and 1 bit for
>> the flag indicating whether block is compressed or not. But in that case,
>> the length of backup block doesn't need to be stored because it must
>> be BLCKSZ. Shouldn't we optimize the header in this way? Thought?
>
> If we do it, that's something to tackle even before this patch on
> HEAD, because you could use the 16th bit of the first 2 bytes of
> XLogRecordBlockImageHeader to do necessary sanity checks, to actually
> not reduce record by 1 byte, but 2 bytes as hole-related data is not
> necessary. I imagine that a patch optimizing that wouldn't be that
> hard to write as well.

Actually, as Heikki pointed out to me... A block image is 8k and pages
without holes are rare, so it may not be worth sacrificing code
simplicity for a record reduction on the order of 0.1% or something like
that, and the current patch is light because it keeps things simple.
-- 
Michael




Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-02-06 Thread Michael Paquier
On Fri, Feb 6, 2015 at 3:03 PM, Fujii Masao wrote:
> Do we always need extra two bytes for compressed backup block?
> ISTM that extra bytes are not necessary when the hole length is zero.
> In this case the length of the original backup block (i.e., uncompressed)
> must be BLCKSZ, so we don't need to save the original size in
> the extra bytes.

Yes, we would need an additional bit to identify that. We could steal
it from length in XLogRecordBlockImageHeader.

> Furthermore, when fpw compression is disabled and the hole length
> is zero, we seem to be able to save one byte from the header of
> backup block. Currently we use 4 bytes for the header, 2 bytes for
> the length of backup block, 15 bits for the hole offset and 1 bit for
> the flag indicating whether block is compressed or not. But in that case,
> the length of backup block doesn't need to be stored because it must
> be BLCKSZ. Shouldn't we optimize the header in this way? Thought?

If we do it, that's something to tackle even before this patch on
HEAD, because you could use the 16th bit of the first 2 bytes of
XLogRecordBlockImageHeader to do the necessary sanity checks, and actually
reduce the record not by 1 byte but by 2 bytes, as hole-related data is not
necessary. I imagine that a patch optimizing that wouldn't be that
hard to write as well.

> +int page_len = BLCKSZ - hole_length;
> +char *scratch_buf;
> +if (hole_length != 0)
> +{
> +scratch_buf = compression_scratch;
> +memcpy(scratch_buf, page, hole_offset);
> +memcpy(scratch_buf + hole_offset,
> +   page + (hole_offset + hole_length),
> +   BLCKSZ - (hole_length + hole_offset));
> +}
> +else
> +scratch_buf = page;
> +
> +/* Perform compression of block */
> +if (XLogCompressBackupBlock(scratch_buf,
> +page_len,
> +regbuf->compressed_page,
> +&compress_len))
> +{
> +/* compression is done, add record */
> +is_compressed = true;
> +}
>
> You can refactor XLogCompressBackupBlock() and move all the
> above code to it for more simplicity.

Sure.
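
For what it's worth, the refactored helper might end up looking roughly
like this (a sketch only; the exact signature, the "- 2" header allowance,
and the pglz calls are assumptions based on this thread, not the final
patch):

    static bool
    XLogCompressBackupBlock(char *page, uint16 hole_offset, uint16 hole_length,
                            char *dest, uint16 *dlen)
    {
        int32   orig_len = BLCKSZ - hole_length;
        int32   compressed_len;
        char   *source;

        if (hole_length != 0)
        {
            /* copy the page into the scratch buffer, skipping the hole */
            source = compression_scratch;
            memcpy(source, page, hole_offset);
            memcpy(source + hole_offset,
                   page + (hole_offset + hole_length),
                   BLCKSZ - (hole_length + hole_offset));
        }
        else
            source = page;

        compressed_len = pglz_compress(source, orig_len, dest,
                                       PGLZ_strategy_default);
        if (compressed_len < 0)
            return false;       /* not compressible */

        /*
         * Reject the result unless it saves at least the two extra header
         * bytes used to store the raw length of the compressed block.
         */
        *dlen = (uint16) compressed_len;
        if (*dlen >= orig_len - 2)
            return false;
        return true;
    }
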
-- 
Michael




Re: [HACKERS] New CF app deployment

2015-02-06 Thread Andres Freund
On 2015-02-06 11:51:50 +0100, Magnus Hagander wrote:
> So in an attempt to actually move this forward in a constructive way I'm
> going to ignore  a bunch of what happened after this email, and fork the
> discussion at this point.

Sounds good.

> First of all - assuming we'lI fix this particular thing. Are there *other*
> things that you would also say makes the tool "essentially unusable"?

I personally don't have any really fundamental complaints but this
one. I'd like to be able to add some CF app only annotations to the
whole patch, but that's not that important.

> What kind of annotation are we actually talking about here?

Well, to explain where, at least I, am coming from, let's have a look at
an example:
https://commitfest.postgresql.org/4/17/

a) The linked messages aren't going to help either the author, a
   committer or prospective reviewers to find existing reviews. And
   looking through the whole thread isn't realistic at that message
   count. And all of them need to.
b) The 'latest attachment' isn't actually the patch that's supposed to
   be reviewed. The first patch is completely outdated. So one would
   need to grovel through all the linked patches.
c) By just looking at the CF entry I have no clue what the status of the
   patch actually is. Sure it "Needs review", but that can mean minor
   changes or major reworks. In the old app the comments to the patches
   and reviews could say things like "major issues identified,
   significant work needed", "some cosmetic issues remain", "new version
   of patch with major redesign" or "rebased patch without other changes".

I don't think any of these questions can be answered by improved
automation alone.

This really doesn't matter much on a 5-10 message thread about a simple
patch. But we regularly have thread with hundreds of messages.


Makes sense?


> Are you looking for something that's basically "star a message" (like
> you'd do in a MUA in a lot of cases)? Or "assign tag" (from a set of -
> editable of course - pre-existing tags). Or a complete free-text
> field?

I think it has to be free form. If you look at the above, it's hard to
do that sanely with tags.

> Then, what do you need to be able to annotate on?

I think both individual messages and the CF entry itself, without an
email, make sense.

> Right now the app only shows messages that actually have attachments on
> them, assuming that you'd want to read the whole thread if you want to look
> at other things.

I don't think anybody can realistically do that on more complex
patches. Often the discussion in the beginning has little bearing on
the later state, so even if you had the time to do so, it'd be better spent
doing something else.

> Are you saying you want to be able to attach an annotation to *any*
> message in the thread?

Yes.

> If so, we'd have to pull in and show every single message on the
> thread in the UI - I'm thinking that might become more than a little
> cluttered in a lot of cases.

Well, we only would need to show commented on messages in the UI when
viewing the entry.

> If that is what you're talking about, any
> suggestions for how to actually make that into an UI that's not that
> cluttered?

The simplest seems to be to simply allow comments using the message id
:(. I've no really good idea how to allow selecting the message to
comment on. Even if we find something else, it should probably be at
least an option - it certainly is quicker for me to grab the message id
of an email I just read or wrote and paste it than pretty much any other
method.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pg_basebackup may fail to send feedbacks.

2015-02-06 Thread Fujii Masao
On Fri, Feb 6, 2015 at 3:22 PM, Kyotaro HORIGUCHI
 wrote:
> Sorry, I misunderstood that.
>
>> > At Wed, 4 Feb 2015 19:22:39 +0900, Fujii Masao  
>> > wrote in 
>> > 
>> >> On Wed, Feb 4, 2015 at 4:58 PM, Kyotaro HORIGUCHI
>> >>  wrote:
>> >> > I'm very sorry for confused report. The problem found in 9.4.0
>> >> > and the diagnosis was mistakenly done on master.
>> >> >
>> >> > 9.4.0 has no problem of feedback delay caused by slow xlog
>> >> > receiving on pg_basebackup mentioned in the previous mail. But
>> >> > the current master still has this problem.
>> >>
>> >> Seems walreceiver has the same problem. No?
>> >
>> > pg_receivexlog.c would have the same problem since it uses the
>> > same function with pg_basebackup.c.
>> >
>> > The correspondent of HandleCopyStream in the walreceiver is
>> > WalReceiverMain, and it doesn't seem to have the same kind of
>> > loop shown below. It seems to surely send feedback per one
>> > record.
>> >
>> > |   r = stream_reader();
>> > |   while (r > 0)
>> > |   {
>> > |  ... wal record processing stuff without sending feedback..
>> > |  r = stream_reader();
>> > |   }
>>
>> WalReceiverMain() has the similar code as follows.
>>
>> len = walrcv_receive(NAPTIME_PER_CYCLE, &buf);
>> if (len != 0)
>> {
>> for (;;)
>> {
>> if (len > 0)
>> {
>> 
>> len = walrcv_receive(0, &buf);
>> }
>> }
>
> The loop seems a bit different but eventually the same about this
> discussion.
>
> 408> len = walrcv_receive(NAPTIME_PER_CYCLE, &buf);
> 409> if (len != 0)
> 410> {
> 415> for (;;)
> 416> {
> 417> if (len > 0)
> 418> {
> 425>XLogWalRcvProcessMsg(buf[0], &buf[1], len - 1);
> 426> }
> 427-438> else {break;}
> 439> len = walrcv_receive(0, &buf);
> 440> }
> 441> }
>
> XLogWalRcvProcessMsg doesn't send feedback when processing
> 'w'=WAL record packet. So the same thing as in pg_basebackup and
> pg_receivexlog will occur in the walreceiver.
>
> Exiting the for(;;) loop via TimestampDifferenceExceeds just
> before line 439 is a measure equivalent to what I proposed for
> receivelog.c, but calling XLogWalRcvSendHSFeedback(false) there
> is seemingly simpler (though I feel a bit uncomfortable with the
> latter).

I'm concerned about the performance degradation caused by calling
gettimeofday() per record.

> Or other measures?

Introduce the maximum number of records that we can receive and
process between feedbacks? For example, we can change
XLogWalRcvSendHSFeedback() so that it's called per at least
8 records. I'm not sure what number is good, though...
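
A minimal sketch of that idea (the constant name and its placement are
invented here, not taken from any patch):

    #define MAX_RECORDS_PER_FEEDBACK 8

    int     nrecords = 0;

    len = walrcv_receive(NAPTIME_PER_CYCLE, &buf);
    if (len != 0)
    {
        for (;;)
        {
            if (len > 0)
            {
                XLogWalRcvProcessMsg(buf[0], &buf[1], len - 1);

                /* force a feedback message every few records */
                if (++nrecords >= MAX_RECORDS_PER_FEEDBACK)
                {
                    XLogWalRcvSendHSFeedback(false);
                    nrecords = 0;
                }
            }
            else
                break;

            len = walrcv_receive(0, &buf);
        }
    }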

Regards,

-- 
Fujii Masao




Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-02-06 Thread Fujii Masao
On Fri, Feb 6, 2015 at 4:15 AM, Michael Paquier
 wrote:
> On Thu, Feb 5, 2015 at 11:06 PM, Syed, Rahila  wrote:
>>>/*
>>>+* We recheck the actual size even if pglz_compress() report success,
>>>+* because it might be satisfied with having saved as little as one byte
>>>+* in the compressed data.
>>>+*/
>>>+   *len = (uint16) compressed_len;
>>>+   if (*len >= orig_len - 1)
>>>+   return false;
>>>+   return true;
>>>+}
>>
>> As per the latest code, when compression is 'on' we introduce an additional 2 bytes
>> in the header of each block image for storing raw_length of the compressed 
>> block.
>> In order to achieve compression while accounting for these two additional 
>> bytes, we must ensure that compressed length is less than original length - 
>> 2.
>> So, IIUC the above condition should rather be
>>
>> If (*len >= orig_len -2 )
>> return false;

"2" should be replaced with the macro variable indicating the size of
extra header for compressed backup block.

Do we always need the extra two bytes for a compressed backup block?
ISTM that the extra bytes are not necessary when the hole length is zero.
In this case the length of the original backup block (i.e., uncompressed)
must be BLCKSZ, so we don't need to save the original size in
the extra bytes.
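
Stated as a condition, that is roughly (illustrative only, not code from
the patch; the function name is made up):

static bool
need_raw_length_in_header(uint16 hole_length)
{
    /*
     * The extra raw-length bytes matter only when the image has a hole;
     * a hole-free image always expands to exactly BLCKSZ bytes.
     */
    return hole_length != 0;
}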

Furthermore, when fpw compression is disabled and the hole length
is zero, we seem to be able to save one byte from the header of the
backup block. Currently we use 4 bytes for the header: 2 bytes for
the length of the backup block, 15 bits for the hole offset and 1 bit for
the flag indicating whether the block is compressed or not. But in that
case the length of the backup block doesn't need to be stored, because it
must be BLCKSZ. Shouldn't we optimize the header in this way? Thoughts?

+    int page_len = BLCKSZ - hole_length;
+    char *scratch_buf;
+    if (hole_length != 0)
+    {
+        scratch_buf = compression_scratch;
+        memcpy(scratch_buf, page, hole_offset);
+        memcpy(scratch_buf + hole_offset,
+               page + (hole_offset + hole_length),
+               BLCKSZ - (hole_length + hole_offset));
+    }
+    else
+        scratch_buf = page;
+
+    /* Perform compression of block */
+    if (XLogCompressBackupBlock(scratch_buf,
+                                page_len,
+                                regbuf->compressed_page,
+                                &compress_len))
+    {
+        /* compression is done, add record */
+        is_compressed = true;
+    }

You can refactor XLogCompressBackupBlock() and move all the
above code to it for more simplicity.
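
Something along these lines, for illustration (the signature, the scratch
buffer and the compressor call are only a sketch, not the actual patch
code):

static bool
XLogCompressBackupBlock(const char *page, uint16 hole_offset,
                        uint16 hole_length, char *dest, uint16 *dest_len)
{
    int         orig_len = BLCKSZ - hole_length;
    int         compressed_len;
    const char *source = page;
    char        scratch[BLCKSZ];    /* stand-in for compression_scratch */

    /* Remove the hole in the middle of the page, if there is one. */
    if (hole_length != 0)
    {
        memcpy(scratch, page, hole_offset);
        memcpy(scratch + hole_offset,
               page + (hole_offset + hole_length),
               BLCKSZ - (hole_offset + hole_length));
        source = scratch;
    }

    /*
     * compress_image() is a placeholder for the real compressor call;
     * assume it returns the compressed length, or -1 on failure.
     */
    compressed_len = compress_image(source, orig_len, dest);
    if (compressed_len < 0)
        return false;

    /*
     * Require a real saving; the "2" stands for the extra raw-length
     * header bytes discussed above and would be a macro in real code.
     */
    if (compressed_len >= orig_len - 2)
        return false;

    *dest_len = (uint16) compressed_len;
    return true;
}

With something like this, the caller would only do
XLogCompressBackupBlock(page, hole_offset, hole_length,
regbuf->compressed_page, &compress_len) and set is_compressed on success.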

Regards,

-- 
Fujii Masao




Re: [HACKERS] New CF app deployment

2015-02-06 Thread Magnus Hagander
On Mon, Jan 26, 2015 at 10:16 PM, Robert Haas  wrote:

> On Mon, Jan 26, 2015 at 4:07 PM, Magnus Hagander 
> wrote:
> > I assume what was referred to was that the old cf app would show the
> last 3
> > (I think it was) comments/patches/whatnot on a patch on the summary page
> > (and then clickthrough for more details).
>
> Yep.
>
>
So doing this for whatever activity is under patch->history is trivial. Is
that the kind of information that people would like to see? (This assumes
that making annotations etc., per the other message in this thread, will
create an entry in there, of course.) Or is it something else?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] New CF app deployment

2015-02-06 Thread Magnus Hagander
On Mon, Jan 26, 2015 at 10:29 PM, Robert Haas  wrote:

> (While I'm complaining, the links only go to the "flat" version of the
> thread, while I happen to prefer the version that shows one message at
> a time with a message-ID selector to switch messages.)
>

Then you're clicking the wrong link :) As long as you click the link for
the message (first or last) you get the "regular" view. The "thread" link
always goes to the same message as the "first" one, except that it uses the
flat view. So you have a choice of both (and these links are both always
visible; neither of them is in the "collapsed" area which holds the
attachments).

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] New CF app deployment

2015-02-06 Thread Magnus Hagander
On Mon, Jan 26, 2015 at 9:46 PM, Robert Haas  wrote:

> On Mon, Jan 26, 2015 at 3:24 PM, Magnus Hagander 
> wrote:
> > Yes, and the agreement after that feedback was to try it out and then
> figure
> > out what changes were needed? As about half the feedback said it was
> better
> > without and half said it was better with.
>
> Well, I can't speak to anyone else's opinion, but I'm quite sure I
> raised the issue that we need a way to call out which messages in the
> thread are important, and I think that's pretty much what Peter is
> saying, too.  I find the new tool essentially unusable - having one
> link to the whole thread instead of individual links to just the
> important messages in that thread is a huge regression for me, as is
> the lack of the most recent activity on the summary page.  I don't
> know how much more feedback than that you need to be convinced, but
> I'm going to shut up now before I say something I will later regret.
>
>
So in an attempt to actually move this forward in a constructive way I'm
going to ignore  a bunch of what happened after this email, and fork the
discussion at this point.

Since it's pretty clear that several people, many of whom are definitely
more productive pg developers than me (and some of the others who
specifically said they did not want this feature), have spoken up about it
*after* the release of it (there's no point in arguing who said what
beforehand), we should look into doing that.


First of all, assuming we'll fix this particular thing: are there *other*
things that you would also say make the tool "essentially unusable"?
Because if there are, can we get that out of the way early? I wouldn't want
to spend a bunch of time fixing this, just to get back into the same round
of argument again, where basically the only thing that's accepted is a
rollback. Because if that's what people want, then let's roll back - I'm not
going to waste time on it in that case. (I'm equally not going to spend
time on fixing anything in the old one anymore, so the next time there's a
security issue or such around it, it will be shut down until someone fixes
it - but that's still better than a tool people don't like to use.)


Second, so let's look at how to actually do this.

What kind of annotation are we actually talking about here? Are you looking
for something that's basically "star a message" (like you'd do in a MUA in
a lot of cases)? Or "assign a tag" (from a set of - editable, of course -
pre-existing tags)? Or a complete free-text field?

Then, what do you need to be able to annotate on?

Right now the app only shows messages that actually have attachments on
them, assuming that you'd want to read the whole thread if you want to look
at other things. For this reason, for the message thread itself it only
keeps track of the first and last message (so you can know when it has
updated). Are you saying you want to be able to attach an annotation to
*any* message in the thread? If so, we'd have to pull in and show every
single message on the thread in the UI - I'm thinking that might become
more than a little cluttered in a lot of cases. If that is what you're
talking about, any suggestions for how to actually make that into a UI
that's not that cluttered?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Simplify sleeping while reading/writing from client

2015-02-06 Thread Michael Paquier
On Fri, Feb 6, 2015 at 1:46 PM, Heikki Linnakangas wrote:
> It simplifies the code to do all the sleeping and interrupt handling code in
> the upper level, in secure_[read|write]. Do you see a problem with it?
Not directly. Reading the code, I got uneasy with the fact that we set
those values unconditionally even if the port is in block mode.
-- 
Michael




Re: [HACKERS] Simplify sleeping while reading/writing from client

2015-02-06 Thread Andres Freund
On 2015-02-05 16:45:50 +0200, Heikki Linnakangas wrote:
> Looking again at the code after Andres' interrupt-handling patch series, I
> got confused by the fact that there are several wait-retry loops in
> different layers, and reading and writing works slightly differently.

They don't really work all that differently?

> I propose the attached, which pulls all the wait-retry logic up to
> secure_read() and secure_write(). This makes the code a lot more
> understandable.

Generally a good idea. Especially if we get more ssl implementations.

But I think it'll make the already broken renegotiation handling even
worse. Because now we're always entering do_handshake in nonblocking
mode (essentially).

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Simplify sleeping while reading/writing from client

2015-02-06 Thread Heikki Linnakangas

On 02/06/2015 10:38 AM, Michael Paquier wrote:

On Thu, Feb 5, 2015 at 6:45 PM, Heikki Linnakangas wrote:

Looking again at the code after Andres' interrupt-handling patch series, I
got confused by the fact that there are several wait-retry loops in
different layers, and reading and writing works slightly differently.

I propose the attached, which pulls all the wait-retry logic up to
secure_read() and secure_write(). This makes the code a lot more
understandable.


Are you sure that it is a good idea to move the check of port->noblock
out of be_tls_[read|write] to an upper level? ISTM that we should set
errno and n to respectively EWOULDBLOCK and -1 in be_tls_[write|read]
when port->noblock and do nothing otherwise. In your patch those
values are set even if the port is in block mode.


It simplifies the code to do all the sleeping and interrupt handling 
code in the upper level, in secure_[read|write]. Do you see a problem 
with it?


- Heikki





Re: [HACKERS] PATCH: Reducing lock strength of trigger and foreign key DDL

2015-02-06 Thread Andreas Karlsson

On 02/06/2015 08:16 AM, Michael Paquier wrote:

On Fri, Jan 30, 2015 at 10:26 PM, Andreas Karlsson wrote:

On 01/30/2015 07:48 AM, Michael Paquier wrote:

Looking at the latest patch, it seems that in
AlterTableGetLockLevel@tablecmds.c we ought to put AT_ReAddConstraint,
AT_AddConstraintRecurse and AT_ProcessedConstraint under the same
banner as AT_AddConstraint. Thoughts?


Good point. I think moving those would be a good thing even though it is
technically not necessary for AT_AddConstraintRecurse, since that one should
only be related to check constraints.


Andreas, are you planning to send an updated patch?


Yes, I will hopefully send it later today or tomorrow.

Andreas






Re: [HACKERS] ExplainModifyTarget doesn't work as expected

2015-02-06 Thread Ashutosh Bapat
Well, let's see what others think. Also, we might want to show that
information separately, probably under a "result relations" heading.

On Fri, Feb 6, 2015 at 1:35 PM, Etsuro Fujita 
wrote:

> Hi Ashutosh,
>
> Thank you for the review!
>
>
> On 2015/02/03 15:32, Ashutosh Bapat wrote:
>
>> I agree that it's a problem, and it looks more severe when there are
>> multiple children
>> postgres=# create table parent (a int check (a < 0) no inherit);
>> CREATE TABLE
>> postgres=# create table child1 (a int check (a >= 0));
>> CREATE TABLE
>> postgres=# create table child2 (a int check (a >= 0));
>> CREATE TABLE
>> postgres=# create table child3 (a int check (a >= 0));
>> CREATE TABLE
>> postgres=# alter table child1 inherit parent;
>> ALTER TABLE
>> postgres=# alter table child2 inherit parent;
>> ALTER TABLE
>> postgres=# alter table child3 inherit parent;
>> ALTER TABLE
>> postgres=# explain update parent set a = a * 2 where a >= 0;
>> QUERY PLAN
>> --------------------------------------------------------------
>>   Update on child1  (cost=0.00..126.00 rows=2400 width=10)
>> ->  Seq Scan on child1  (cost=0.00..42.00 rows=800 width=10)
>>   Filter: (a >= 0)
>> ->  Seq Scan on child2  (cost=0.00..42.00 rows=800 width=10)
>>   Filter: (a >= 0)
>> ->  Seq Scan on child3  (cost=0.00..42.00 rows=800 width=10)
>>   Filter: (a >= 0)
>> (7 rows)
>>
>> It's certainly confusing why an update on child1 would cause scans on
>> child*.
>>
>
> Yeah, I think so too.
>
>>  But I also think that showing the parent's name with Update would be
>> misleading, especially when the user expects it to get filtered out
>> because of constraint exclusion.
>>
>> Instead, can we show all the relations that are being modified e.g
>> Update on child1, child2, child3. That will disambiguate everything.
>>
>
> That's an idea, but my concern is the case where there are a
> large number of child tables, as the EXPLAIN output would become
> difficult to read in such cases.
>
> Best regards,
> Etsuro Fujita
>



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] Simplify sleeping while reading/writing from client

2015-02-06 Thread Michael Paquier
On Thu, Feb 5, 2015 at 6:45 PM, Heikki Linnakangas wrote:
> Looking again at the code after Andres' interrupt-handling patch series, I
> got confused by the fact that there are several wait-retry loops in
> different layers, and reading and writing works slightly differently.
>
> I propose the attached, which pulls all the wait-retry logic up to
> secure_read() and secure_write(). This makes the code a lot more
> understandable.

Are you sure that it is a good idea to move the check of port->noblock
out of be_tls_[read|write] to an upper level? ISTM that we should set
errno and n to respectively EWOULDBLOCK and -1 in be_tls_[write|read]
when port->noblock and do nothing otherwise. In your patch those
values are set even if the port is in block mode.
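
A minimal sketch of what I mean (a fragment only, not the actual patch):

    if (port->noblock)
    {
        /* simulate a would-block result only for non-blocking ports */
        errno = EWOULDBLOCK;
        n = -1;
    }
    /* otherwise leave n and errno as returned by the SSL call */
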
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] EvalPlanQual behaves oddly for FDW queries involving system columns

2015-02-06 Thread Etsuro Fujita

Hi Ashutosh,

On 2015/02/03 16:44, Ashutosh Bapat wrote:

I am having some minor problems running this repro



[Terminal 2]
postgres=# create foreign table ft (a int) server loopback options
(table_name 'lbt');



There isn't any table "lbt" mentioned here. Do you mean "t" here?


Sorry, my explanation was not clear enough.  "lbt" is the table on the
remote side (i.e., in the database that the foreign server "loopback"
points to) which the foreign table references.  It'd be defined, e.g.,
in the following manner:


$ createdb efujita

efujita=# create table lbt (a int);
CREATE TABLE

postgres=# create server loopback foreign data wrapper postgres_fdw 
options (dbname 'efujita');

CREATE SERVER
postgres=# create user mapping for current_user server loopback;
CREATE USER MAPPING
postgres=# create foreign table ft (a int) server loopback options 
(table_name 'lbt');

CREATE FOREIGN TABLE

Thanks for the review!

Best regards,
Etsuro Fujita




Re: [HACKERS] [POC] FETCH limited by bytes.

2015-02-06 Thread Kyotaro HORIGUCHI
Hello,

> Redshift has a table, stv_inflight, which serves about the same purpose as
> pg_stat_activity. Redshift seems to perform better with very high fetch
> sizes (10,000 is a good start), so the default foreign data wrapper didn't
> perform so well.

I agree with you.

> I was able to confirm that the first query showed "FETCH 150 FROM c1" as
> the query, which is normally highly unhelpful, but in this case it confirms
> that tables created in redshift_server do by default use the fetch_size
> option given during server creation.
> 
> I was also able to confirm that the second query showed "FETCH 151 FROM c1"
> as the query, which shows that per-table overrides also work.
> 
> I'd be happy to stop here, but Kyotaro might feel differently.

This is enough in its own way, of course.

> With this
> limited patch, one could estimate the number of rows that would fit into
> the desired memory block based on the row width and set fetch_size
> accordingly.

Users, including me, will be happy with it as long as they know
how to determine the fetch size, especially when the remote tables
are few or the configuration is stable enough.

On widely distributed systems, it would be far more difficult to tune
the fetch size manually for every foreign table, so in the end it would
be left at the safe default size of 100 or so.

This is similar to the discussion about max_wal_size on another
thread. Calculating the fetch size is far tougher for users than
setting the expected memory usage, I think.
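
(For reference, a rough manual estimate is possible today; the sketch below
assumes the fetch_size option from this patch, a 16MB target buffer, a
foreign table in the "public" schema, and that ANALYZE has been run on it
so that pg_stats holds its average column widths. Names follow the example
earlier in this thread.)

-- rows that would roughly fit in 16MB, from the average row width
SELECT ceil((16 * 1024 * 1024)::numeric / sum(avg_width)) AS fetch_size_hint
FROM pg_stats
WHERE schemaname = 'public'
  AND tablename = 'redshift_tab150';

-- then set it per table, e.g.
-- ... options (table_name 'redacted_table', schema_name 'redacted_schema',
--              fetch_size '20000')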

> But we could go further, and have a fetch_max_memory option only at the
> table level, and the fdw could do that same memory / estimated_row_size
> calculation and derive fetch_size that way *at table creation time*, not
> query time.

We cannot know the real length of text-type data in advance.
Moreover, even for a column defined as char(n), n is only the
theoretical (or by-design) maximum size of the field; in most
cases the mean length of the real data will be far smaller than
that. For that reason, calculating the ratio at table creation
time seems difficult.

However, I agree with Tom's suggestion that the changes to the
FETCH statement are definitely ugly; especially the "overhead"
argument is prohibitive even for me :(

> Thanks to Kyotaro and Bruce Momjian for their help.

Not at all.

regards,


At Wed, 4 Feb 2015 18:06:02 -0500, Corey Huinker  
wrote in 
> I applied this patch to REL9_4_STABLE, and I was able to connect to a
> foreign database (redshift, actually).
> 
> the basic outline of the test is below, names changed to protect my
> employment.
> 
> create extension if not exists postgres_fdw;
> 
> create server redshift_server foreign data wrapper postgres_fdw
> options ( host 'some.hostname.ext', port '5439', dbname 'redacted',
> fetch_size '150' );
> 
> create user mapping for public server redshift_server options ( user
> 'redacted_user', password 'comeonyouarekiddingright' );
> 
> create foreign table redshift_tab150 (  )
> server redshift_server options (table_name 'redacted_table', schema_name
> 'redacted_schema' );
> 
> create foreign table redshift_tab151 (  )
> server redshift_server options (table_name 'redacted_table', schema_name
> 'redacted_schema', fetch_size '151' );
> 
> -- i don't expect the fdw to push the aggregate, this is just a test to see
> what query shows up in stv_inflight.
> select count(*) from redshift_ccp150;
> 
> -- i don't expect the fdw to push the aggregate, this is just a test to see
> what query shows up in stv_inflight.
> select count(*) from redshift_ccp151;
> 
> 
> For those of you that aren't familiar with Redshift, it's a columnar
> database that seems to be a fork of postgres 8.cough. You can connect to it
> with modern libpq programs (psql, psycopg2, etc).
> Redshift has a table, stv_inflight, which serves about the same purpose as
> pg_stat_activity. Redshift seems to perform better with very high fetch
> sizes (10,000 is a good start), so the default foreign data wrapper didn't
> perform so well.
> 
> I was able to confirm that the first query showed "FETCH 150 FROM c1" as
> the query, which is normally highly unhelpful, but in this case it confirms
> that tables created in redshift_server do by default use the fetch_size
> option given during server creation.
> 
> I was also able to confirm that the second query showed "FETCH 151 FROM c1"
> as the query, which shows that per-table overrides also work.
> 
> I'd be happy to stop here, but Kyotaro might feel differently. With this
> limited patch, one could estimate the number of rows that would fit into
> the desired memory block based on the row width and set fetch_size
> accordingly.
> 
> But we could go further, and have a fetch_max_memory option only at the
> table level, and the fdw could do that same memory / estimated_row_size
> calculation and derive fetch_size that way *at table creation time*, not
> query time.
> 
> Thanks to Kyotaro and Bruce Momjian for their help.
> 
> 
> 
> 
> 
> 
> On Mon, Feb 2, 20

Re: [HACKERS] ExplainModifyTarget doesn't work as expected

2015-02-06 Thread Etsuro Fujita

Hi Ashutosh,

Thank you for the review!

On 2015/02/03 15:32, Ashutosh Bapat wrote:

I agree that it's a problem, and it looks more severe when there are
multiple children
postgres=# create table parent (a int check (a < 0) no inherit);
CREATE TABLE
postgres=# create table child1 (a int check (a >= 0));
CREATE TABLE
postgres=# create table child2 (a int check (a >= 0));
CREATE TABLE
postgres=# create table child3 (a int check (a >= 0));
CREATE TABLE
postgres=# alter table child1 inherit parent;
ALTER TABLE
postgres=# alter table child2 inherit parent;
ALTER TABLE
postgres=# alter table child3 inherit parent;
ALTER TABLE
postgres=# explain update parent set a = a * 2 where a >= 0;
QUERY PLAN
--------------------------------------------------------------
  Update on child1  (cost=0.00..126.00 rows=2400 width=10)
->  Seq Scan on child1  (cost=0.00..42.00 rows=800 width=10)
  Filter: (a >= 0)
->  Seq Scan on child2  (cost=0.00..42.00 rows=800 width=10)
  Filter: (a >= 0)
->  Seq Scan on child3  (cost=0.00..42.00 rows=800 width=10)
  Filter: (a >= 0)
(7 rows)

It's certainly confusing why an update on child1 would cause scans on child*.


Yeah, I think so too.


But I also think that showing the parent's name with Update would be
misleading, especially when the user expects it to get filtered out
because of constraint exclusion.

Instead, can we show all the relations that are being modified e.g
Update on child1, child2, child3. That will disambiguate everything.


That's an idea, but my concern is the case where there are a
large number of child tables, as the EXPLAIN output would become
difficult to read in such cases.


Best regards,
Etsuro Fujita

