Re: [HACKERS] WIP: Rework access method interface

2015-09-26 Thread Amit Kapila
On Fri, Sep 25, 2015 at 7:41 PM, Teodor Sigaev  wrote:

>> I'm OK about continuing work on amvalidate if we can build consensus on
>> its design.
>> Could you give some feedback on the amvalidate version of the patch, please?
>>
>> http://www.postgresql.org/message-id/capphfds8zywenz9vw6te5rzxbol1vu_wsw181veq+mu+v1d...@mail.gmail.com
>>
>
> Attached is a slightly modified patch based on the 4th version; if there
> are no strong objections, I will commit it. Waiting on this patch blocks
> Alexander's development of CREATE ACCESS METHOD, and he needs to move
> forward.
>
>
1.
+InitIndexAmRoutine(Relation relation)
+{
+    IndexAmRoutine *result,
+                   *tmp;
+
+    result = (IndexAmRoutine *) MemoryContextAlloc(CacheMemoryContext,
+                                                   sizeof(IndexAmRoutine));
+    tmp = (IndexAmRoutine *) DatumGetPointer(
+                    OidFunctionCall0(relation->rd_am->amhandler));
+    memcpy(result, tmp, sizeof(IndexAmRoutine));
+    relation->amroutine = result;
+}

Is it appropriate to use CacheMemoryContext here?
Currently in load_relcache_init_file(), the other per-index information
in Relation, like rd_indoption and rd_aminfo, is allocated in indexcxt,
which is of course itself allocated in CacheMemoryContext. Still, is
there a reason why this shouldn't also be allocated in the same
context?
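
For illustration, an untested sketch of that alternative (the amroutine
field name follows the patch; this is not a tested change):

static void
InitIndexAmRoutine(Relation relation)
{
    IndexAmRoutine *result;
    IndexAmRoutine *tmp;

    tmp = (IndexAmRoutine *) DatumGetPointer(
                    OidFunctionCall0(relation->rd_am->amhandler));

    /* copy into the index's own context, like rd_indoption etc. */
    result = (IndexAmRoutine *) MemoryContextAlloc(relation->rd_indexcxt,
                                                   sizeof(IndexAmRoutine));
    memcpy(result, tmp, sizeof(IndexAmRoutine));
    relation->amroutine = result;
}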

2.
- aform = (Form_pg_am) MemoryContextAlloc(CacheMemoryContext, sizeof
*aform);
+ aform = (Form_pg_am)MemoryContextAlloc(CacheMemoryContext, sizeof *aform);

Spurious change.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] pageinspect patch, for showing tuple data

2015-09-26 Thread Michael Paquier
On Sat, Sep 26, 2015 at 1:46 AM, Nikolay Shaplov wrote:
> In a message of 25 September 2015 20:59:29, Michael Paquier wrote:
>> Thanks! I just had a short look at it:
>> - I am not convinced that it is worth declaring 3 versions of
>> tuple_data_split.
> Which of them should we leave, then?

The one with the most arguments. Or perhaps we could have two of them:
one with the minimum number of arguments plus default values, and a
second one with the maximum number of arguments.

>> - The patch does not respect the project code style: in particular,
>> one-line "if (condition) {foo;}" blocks are not appropriate, and code
>> lines are limited to 80 characters, etc. The list is basically here:
>> http://www.postgresql.org/docs/current/static/source.html
> I did my best. Results are attached.

Thanks, it looks better.

>> - Be aware of typos: s/whitch/which/ is one.
> I've run a spellchecker on all the comments and hope that I removed most
> of the mistakes. But as I am not a native speaker, I will not be able to
> eliminate them all; I will need help here.

I have not spotted new mistakes regarding that.

>> +t_infomask2
>> +integer
>> +stores the number of attributes (11 bits of the word). The
>> +rest are used for flag bits:
>> +
>> +0x2000 - tuple was updated and key cols modified, or tuple deleted
>> +0x4000 - tuple was HOT-updated
>> +0x8000 - this is heap-only tuple
>> +0xE000 - visibility-related bits (so called "hint bits")
>> +
>> This large chunk of documentation is a duplicate of storage.sgml. If
>> that's really necessary, it seems more appropriate to me to have more
>> detailed comments at code level, directly in heapfuncs.c.
> Hm... There is no explanation of t_infomask/t_infomask2 bits in storage.sgml.
>
> If there is no source of information other than the source code, then the
> documentation is not good.

pageinspect is already something that works at a low level. My guess is
that users of this module are going to have a look at the code either
way to understand how tuple data is manipulated within a page.

> So I would consider two options: either move t_infomask/t_infomask2 details
> into storage.sgml or leave it as it is.

The documentation redirects the reader to look at htup_details.h (the
documentation is actually incorrect; I sent a separate patch), and I
see no point in duplicating such low-level descriptions; that would be
double maintenance for the same result.

> I am lazy and do not feel confident about touching the main documentation,
> so I would prefer to leave it as it is.

Your patch is proving the contrary :) Honestly I would just rip out
the part you have added to describe all the fields related to
HeapTupleHeaderData, and have only a simple self-contained example to
show how to use the new function tuple_data_split.

>> The example of tuple_data_split in the docs would be more interesting
>> if embedded with a call to heap_page_items.
>
> Such an example would be almost unreadable. Maybe I should add one more
> phrase explaining where the arguments for tuple_data_split come from.

I would also just remove heap_page_item_attrs; an example in the
docs would be enough IMO (note that I never mind being outvoted).

Btw, shouldn't the ereport messages in tuple_data_split use
error_level instead of ERROR?
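I.e. something like this, with error_level being the severity the caller
asked for (sketch only; the errcode and message text here are made up):

    ereport(error_level,
            (errcode(ERRCODE_DATA_CORRUPTED),
             errmsg("unexpected end of tuple data")));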
Regards,
-- 
Michael




Re: [HACKERS] Partitioned checkpointing

2015-09-26 Thread Fabien COELHO


Hello,

These are interesting runs.

In a situation in which small values are set in dirty_bytes and 
dirty_background_bytes, a buffer is likely stored to disk immediately 
after the checkpointer writes it to the kernel. Thus, I tried a quick 
hack to make the checkpointer invoke the write system call for a dirty 
buffer immediately followed by a store operation for that buffer, 
implemented with the sync_file_range() system call. # For reference, I 
attach the patch. As shown in file_sync_range.JPG, this strategy appears 
to have been effective.


Indeed. This approach is part of this current patch:

https://commitfest.postgresql.org/6/260/

Basically, what you do is to call sync_file_range on each block, and you 
tested on a high-end system probably with a lot of BBU disk cache, which I 
guess allows the disk to reorder writes so as to benefit from sequential 
write performance.


In conclusion, as far as pgbench execution on Linux is concerned, 
using sync_file_range() is a promising solution.


I found that calling sync_file_range for every block could degrade 
performance a bit under some conditions, at least on my low-end systems 
(just a [raid] disk, no significant disk cache in front of it), so the 
above patch aggregates neighboring writes so as to issue fewer 
sync_file_range calls.
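
For illustration, an untested sketch of that aggregation idea (a
hypothetical helper, not the actual patch code): coalesce consecutive
dirty blocks of one file into a single sync_file_range() call.

#define _GNU_SOURCE             /* sync_file_range is Linux-specific */
#include <fcntl.h>

/*
 * Given a sorted array of dirty block numbers belonging to one file,
 * issue one sync_file_range() per contiguous run of blocks instead of
 * one call per block.
 */
static void
flush_sorted_blocks(int fd, const unsigned int *blocks, int nblocks,
                    size_t blcksz)
{
    int         i = 0;

    while (i < nblocks)
    {
        int         j = i;

        /* extend the run while the block numbers stay consecutive */
        while (j + 1 < nblocks && blocks[j + 1] == blocks[j] + 1)
            j++;

        (void) sync_file_range(fd,
                               (off_t) blocks[i] * blcksz,
                               (off_t) (j - i + 1) * blcksz,
                               SYNC_FILE_RANGE_WRITE);
        i = j + 1;
    }
}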


That is, the checkpointer invokes sync_file_range() to store a buffer 
immediately after it writes the buffer in the kernel.


Yep. It is interesting that sync_file_range alone improves stability a lot 
on your high-end system, although sorting is mandatory for low-end 
systems.


My interpretation, already stated above, is that the hardware does the 
sorting on the cached data at the disk level in your system.


--
Fabien.




Re: [HACKERS] pgbench stats per script & other stuff

2015-09-26 Thread Fabien COELHO


Here is a v10, which is a rebase because of the "--progress-timestamp" 
option addition.


It also includes the fix for the tps-without-connection computation and 
some minor code simplifications, so it is redundant with this bug-fix 
patch:


https://commitfest.postgresql.org/7/378/

--
Fabien.




Re: [HACKERS] Parallel Seq Scan

2015-09-26 Thread Amit Kapila
On Sat, Sep 26, 2015 at 5:52 AM, Robert Haas  wrote:
>
> On Fri, Sep 25, 2015 at 12:00 AM, Amit Kapila 
wrote:
> > I think initPlan will work with the existing patches as we are always
> > executing it in master and then sending the result to workers. Refer
> > below code in funnel patch:
>
> Sure, *if* that's what we're doing, then it will work.  But if an
> initPlan actually attaches below a funnel, then it will break.
>

Currently, initPlans are handled only for the left tree of the Funnel;
if we want to push multiple nodes under the Funnel, it won't work as it
is.  I think that even to make that work, we would need to traverse the
whole tree under the Funnel and do for each initPlan we encounter what
is currently done for the single one, as sketched below.  In general, I
think the idea of passing the results of initPlans to workers is the
right way of dealing with initPlans; by the way, this was a suggestion
you made some time back.
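
An untested sketch of such a traversal (a hypothetical helper; nodes
with extra children, such as Append, would need handling too):

/*
 * Recursively collect the initPlans of every node under the Funnel, so
 * each one can be evaluated in the master and its result shipped to the
 * workers.
 */
static List *
collect_initplans(Plan *plan)
{
    List       *result = NIL;

    if (plan == NULL)
        return NIL;

    result = list_concat(result, list_copy(plan->initPlan));
    result = list_concat(result, collect_initplans(plan->lefttree));
    result = list_concat(result, collect_initplans(plan->righttree));

    return result;
}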


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] PATCH: index-only scans with partial indexes

2015-09-26 Thread Tomas Vondra

Hi,

On 09/18/2015 03:46 AM, Kyotaro HORIGUCHI wrote:

Hello,

At Thu, 17 Sep 2015 17:40:27 +0200, Tomas Vondra  wrote 
in <55fadeeb.4000...@2ndquadrant.com>

Yes, this seems sane. I've been poking at this a bit too, and I came
to the same plan in general, except that I think it's better to build a
list of clauses that are *not* implied by the index, because that's
what we need both in cost_index and check_index_only.


I intended to isolate IndexOptInfo from the RelOptInfo it belongs to,
but the exclusion list also binds them tightly, and one IndexOptInfo
belongs to only one RelOptInfo, so there is no need to isolate them. So
keeping the not-implied restrict clauses in IndexOptInfo would be
preferable.


It also seems to me that this change (arguably a bug fix) should
pretty much make the original patch irrelevant, because
check_index_only can simply walk over the new list.


Yeah. This seems to be a bug unrelated to your index-only-scan
patch.



Attached is a patch that makes this work correctly, I believe. The only 
difference compared to the v1 of the patch is this piece of code in 
match_clause_to_index (indxpath.c):


    if ((index->indpred != NIL) &&
        (predicate_implied_by(lappend(NIL, rinfo->clause),
                              index->indpred)))
        continue;

which effectively says that we should ignore clauses implied by the 
index predicate.


This fixes the way we build IndexClauseSet for the two indexes - 
originally we'd add the implied clause on the large index but not on the 
small one (because it does not have the column referenced in the 
condition). And as far as I can tell, this also fixes all the costing 
issues I've described, because cost_index sees the same thing for 
both indexes.


It's also early enough to fix the actual path, so EXPLAIN shows the same 
thing for both indexes (so no "Index Cond" present for only one of the 
indexes).


I've originally tried to fix this in extract_nonindex_conditions, but 
that's way too late - it can only fix the costing, not the path.


Arguably this part of the patch is a bugfix of an issue present on master.

The patch does not change the check_index_only implementation - it still 
needs to check the clauses, just like in v1 of the patch. To make this 
re-check unnecessary, we'd have to stick the remaining clauses 
somewhere, so that check_index_only can use only the filtered list 
(instead of walking through the complete list of restrictions).


Or perhaps we can do the check in match_clause_to_index, pretty much for 
free? If the clause is not implied by the index predicate (which we know 
thanks to the fix), and we don't assign it to any of the index columns, 
it means we can't use IOS, no?


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 7069f60..d2c9f51 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -86,6 +86,7 @@
 #include "optimizer/placeholder.h"
 #include "optimizer/plancat.h"
 #include "optimizer/planmain.h"
+#include "optimizer/predtest.h"
 #include "optimizer/restrictinfo.h"
 #include "parser/parsetree.h"
 #include "utils/lsyscache.h"
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 9da5444..2d3c0f8 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -1798,13 +1798,13 @@ check_index_only(RelOptInfo *rel, IndexOptInfo *index)
 	 * Check that all needed attributes of the relation are available from the
 	 * index.
 	 *
-	 * XXX this is overly conservative for partial indexes, since we will
-	 * consider attributes involved in the index predicate as required even
-	 * though the predicate won't need to be checked at runtime.  (The same is
-	 * true for attributes used only in index quals, if we are certain that
-	 * the index is not lossy.)  However, it would be quite expensive to
-	 * determine that accurately at this point, so for now we take the easy
-	 * way out.
+	 * For partial indexes we won't consider attributes involved in clauses
+	 * implied by the index predicate, as those won't be needed at runtime.
+	 *
+	 * XXX The same is true for attributes used only in index quals, if we
+	 * are certain that the index is not lossy. However, it would be quite
+	 * expensive to determine that accurately at this point, so for now we
+	 * take the easy way out.
 	 */
 
 	/*
@@ -1819,6 +1819,28 @@ check_index_only(RelOptInfo *rel, IndexOptInfo *index)
 	{
 		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
 
+		/*
+		 * If the index is partial, we won't consider the clauses that are
+		 * implied by the index predicate (those are not needed at runtime).
+		 */
+		if (index->indpred != NIL)
+		{
+			bool	implied = false;
+			List   *clauses  = NIL;
+
+			/* need a list for the 'implied_by' call */
+			clauses = lappend(clauses, (Node *) 

Re: [HACKERS] Improving test coverage of extensions with pg_dump

2015-09-26 Thread Michael Paquier
On Wed, Sep 23, 2015 at 1:04 AM, Alvaro Herrera 
wrote:

> Euler Taveira wrote:
> > On 17-09-2015 14:21, Michael Paquier wrote:
> > >pg_dump relies on attnum to define the column ordering, so one
> > >possibility would be to do things more consistently at backend level.
>
> We discussed this in some other thread, not long ago.  I looked briefly
> in the archives but couldn't find it.  I think the conclusion was
> something along the lines of "hmm, tough".
>

That's hours, even days of fun ahead for sure.


> > Someone could say that we could assign an attnum for column "d"
> > considering the whole inheritance tree. However, attnum is used as an
> > index into arrays (we could bloat some of those) and some logic relies
> > on it to count the number of columns. It would turn tablecmds.c into
> > spaghetti.
>

That was my first impression, but that's just a band-aid. Switching the
logic in stable branches to assign a correct attnum for a child table,
while changing the way the attnum refers to the parent's entries, is a
scary idea. I suspect that this would break many other things to fix a
narrow case, and the user can still get away with it by using COPY or
INSERT with an explicit column list when taking a dump.

> We don't need any more spaghetti there, thanks!

Agreed.


> > IMHO a possible way to solve it is adding support for logical column
> > ordering. An ALTER TABLE command (emitted if a parameter is given)
> > during the dump could handle it. BTW, the last thread [1] about logical
> > column ordering seems to have died a few months ago. Alvaro?
>
> Tomas Vondra also worked a bit on this patch, and we eventually gave up
> on it due to lack of time.  We might be able to get back on it someday,
> but do not hold your breath.  If you want the current bug fixed, do not
> wait for logical column numbering.
>

Honestly, I have the feeling that the discussion in this thread is
getting unproductive. Let's not forget that the patch presented here
just aims at adding one test case to ensure that extensions using
dumpable relations with FKs get dumped correctly, which is to ensure
that we do not reintroduce a bug that existed in the extension facility
since its introduction in 9.1. That being said, the patch is just fine
for this purpose, but that's just my opinion.
-- 
Michael


Re: [HACKERS] Parallel Seq Scan

2015-09-26 Thread Amit Kapila
On Sat, Sep 26, 2015 at 12:38 PM, Amit Kapila 
wrote:
>
> On Sat, Sep 26, 2015 at 6:07 AM, Robert Haas 
wrote:
> >
>
> > Assuming I'm not confused, I'm planning to see about fixing this...
> >
>
> Can't we just traverse the queryDesc->planstate tree and fetch/add
> all the instrument information if there are multiple nodes?
>

I think my suggestion above won't work, because we want this
information on a per-node basis in the master, as EXPLAIN will display
each node's information separately.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Tab completion for ALTER COLUMN SET STATISTICS

2015-09-26 Thread Michael Paquier
On Sat, Sep 26, 2015 at 7:18 AM, Jeff Janes  wrote:
> If I have "alter table foo alter COLUMN bar SET STATISTICS" in the line
> buffer, it tab-completes to add " TO", which is not legal.
>
> The attached patch makes it not tab-complete anything at all, which is at
> least not actively misleading.
> I thought of having it complete "-1", "" so that it gives a clue
> about what is needed, but I didn't see any precedent for non-literal
> clue-giving and I did not want to try to create a new one.

+1 for the way you are doing it in your patch.
-- 
Michael




Re: [HACKERS] Parallel Seq Scan

2015-09-26 Thread Robert Haas
On Sat, Sep 26, 2015 at 3:08 AM, Amit Kapila  wrote:
>> memcpy() can cope with unaligned data; structure member assignment can't.
>
> So doesn't coping mean it anyway has to pay the performance penalty to
> make it equivalent to aligned address access?  Apart from that, today I
> read about memcpy's behaviour in the case of unaligned addresses, and it
> seems from some of the information on the net that it could be unsafe
> [1],[2].

I'm not concerned about the performance penalty for unaligned access
in this case; I'm concerned about the fact that on some platforms it
causes a segmentation fault.  The links you've provided there are
examples of cases where that wasn't true, and people reported that as
a bug in memcpy.
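
A standalone illustration of the difference (not patch code):

#include <string.h>

typedef unsigned int Oid;       /* stand-in for the PostgreSQL typedef */

/*
 * Reading an Oid from an arbitrary (possibly unaligned) position in a
 * serialized buffer.  memcpy copies byte by byte and is safe everywhere;
 * the commented-out direct dereference may fault on alignment-strict
 * platforms.
 */
static Oid
read_oid_at(const char *buf)
{
    Oid         val;

    memcpy(&val, buf, sizeof(Oid));     /* no alignment assumption */
    /* return *((const Oid *) buf);        may SIGBUS if buf is unaligned */
    return val;
}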

> Yes, you have figured it out correctly; I was under the impression that
> we would have single-node execution in the worker for the first version
> and then extend it later.

No, I really want it to work with multiple nodes from the start, and
I've pretty much got that working here now.

> QueryDesc's totaltime is for instrumentation information for plugins
> like pg_stat_statements, and we need only the total buffer usage of
> each worker to make that work, as the other information is already
> collected in the master backend; so I think it should work as I have
> written it.

I don't think that's right at all.  First, an extension can choose to
look at any part of the Instrumentation, not just the buffer usage.
Secondly, the buffer usage inside QueryDesc's totaltime isn't the same
as the global pgBufferUsage.

>> Assuming I'm not confused, I'm planning to see about fixing this...
>
> Can't we just traverse the queryDesc->planstate tree and fetch/add
> all the instrument information if there are multiple nodes?

Well, you need to add each node's information in each worker to the
corresponding node in the leader.  You're not just adding them all up.
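
I.e. something along these lines (an untested sketch with a hypothetical
helper; only a few of the BufferUsage counters are shown, the rest would
be added the same way):

/*
 * Fold one worker's Instrumentation for a plan node into the leader's
 * Instrumentation for the *matching* node, rather than summing
 * everything into a single counter.
 */
static void
accumulate_worker_instrumentation(Instrumentation *leader,
                                  const Instrumentation *worker)
{
    leader->ntuples += worker->ntuples;
    leader->nloops += worker->nloops;
    leader->startup += worker->startup;
    leader->total += worker->total;

    /* BufferUsage is a plain struct of counters; add it field by field */
    leader->bufusage.shared_blks_hit += worker->bufusage.shared_blks_hit;
    leader->bufusage.shared_blks_read += worker->bufusage.shared_blks_read;
    /* ... remaining BufferUsage fields likewise ... */
}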

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Seq Scan

2015-09-26 Thread Amit Kapila
On Sat, Sep 26, 2015 at 6:07 AM, Robert Haas  wrote:
>
> On Fri, Sep 25, 2015 at 7:46 AM, Amit Kapila 
wrote:
> > I have a question here, which is why this format doesn't have a similar
> > problem to the current version: basically, in the current patch the
> > second read of SerializedParamExternData can be misaligned, and for the
> > same reason, in your patch the second read of an Oid could be misaligned?
>
> memcpy() can cope with unaligned data; structure member assignment can't.
>

So doesn't coping mean it anyway has to pay the performance penalty to
make it equivalent to aligned address access?  Apart from that, today I
read about memcpy's behaviour in the case of unaligned addresses, and it
seems from some of the information on the net that it could be unsafe
[1],[2].

> I've worked some of this code over fairly heavily today and I'm pretty
> happy with how my copy of execParallel.c looks now, but I've run into
> one issue where I wanted to check with you. There are three places
> where Instrumentation can be attached to a query: a ResultRelInfo's
> ri_TrigInstrument (which doesn't matter for us because we don't
> support parallel write queries, and triggers don't run on reads), a
> PlanState's instrument, and a QueryDesc's total time.
>
> Your patch makes provision to copy ONE Instrumentation structure per
> worker back to the parallel leader.  I assumed this must be the
> QueryDesc's totaltime, but it looks like it's actually the PlanState
> of the top node passed to the worker.  That's of course no good if we
> ever push more than one node down to the worker, which we may very
> well want to do in the initial version, and surely want to do
> eventually.  We can't just deal with the top node and forget all the
> others.  Is that really what's happening here, or am I confused?
>

Yes, you have figured it out correctly; I was under the impression that
we would have single-node execution in the worker for the first version
and then extend it later.

QueryDesc's totaltime is for instrumentation information for plugins
like pg_stat_statements, and we need only the total buffer usage of
each worker to make that work, as the other information is already
collected in the master backend; so I think it should work as I have
written it.

> Assuming I'm not confused, I'm planning to see about fixing this...
>

Can't we just traverse the queryDesc->planstate tree and fetch/add
all the instrument information if there are multiple nodes?


[1] - http://gcc.gnu.org/ml/gcc-bugs/2000-03/msg00155.html
[2] - https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53016


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] PATCH: index-only scans with partial indexes

2015-09-26 Thread Tomas Vondra

Hi,

On 09/26/2015 01:28 PM, Tomas Vondra wrote:


The patch does not change the check_index_only implementation - it
still needs to check the clauses, just like in v1 of the patch. To
make this re-check unnecessary, we'd have to stick the remaining
clauses somewhere, so that check_index_only can use only the filtered
list (instead of walking through the complete list of restrictions).


After thinking about this a bit more, I realized we already have a good 
place for keeping this list - IndexClauseSet. So I've done that, and 
removed most of the code from check_index_only - it still needs to 
decide whether to use the full list of restrictions (regular indexes) or 
the filtered list (for partial indexes).


Calling predicate_implied_by in match_clause_to_index makes the check a 
bit earlier, compared to the v1 of the patch. So theoretically there 
might be cases where we'd interrupt the execution between those places, 
and the v1 would not pay the price for the call. But those places are 
fairly close together, so I find that unlikely and I've been unable to 
reproduce a plausible example of such regression despite trying.


The only regression test this breaks is "aggregates", and I believe 
that's actually correct because it changes the plans like this:


    ->  Index Only Scan Backward using minmaxtest2i on minmaxtest2
          Index Cond: (f1 IS NOT NULL)
    ->  Index Only Scan using minmaxtest3i on minmaxtest3
-         Index Cond: (f1 IS NOT NULL)

i.e. it removes the Index Condition from scans of minmaxtest3i, which is 
perfectly sensible because the index is defined like this:


create index minmaxtest3i on minmaxtest3(f1) where f1 is not null;

So it's a partial index, and the condition is implied by the predicate 
(it's actually exactly the same).


I haven't fixed those regression tests for now.



Or perhaps we can do the check in match_clause_to_index, pretty much
for free? If the clause is not implied by the index predicate (which
we know thanks to the fix), and we don't assign it to any of the
index columns, it means we can't use IOS, no?


After thinking about this a bit more, this is clearly nonsense. It fails 
on conditions referencing multiple columns (WHERE a=b) which can't be 
assigned to a single index column. So the logic would have to be much 
more complicated, effectively doing what check_index_only is doing just 
a tiny bit later.



regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 9da5444..d257441 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -59,6 +59,7 @@ typedef struct
 	bool		nonempty;		/* True if lists are not all empty */
 	/* Lists of RestrictInfos, one per index column */
 	List	   *indexclauses[INDEX_MAX_KEYS];
+	List	   *rclauses;		/* clauses not implied by predicate */
 } IndexClauseSet;
 
 /* Per-path data used within choose_bitmap_and() */
@@ -129,7 +130,7 @@ static PathClauseUsage *classify_index_clause_usage(Path *path,
 static Relids get_bitmap_tree_required_outer(Path *bitmapqual);
 static void find_indexpath_quals(Path *bitmapqual, List **quals, List **preds);
 static int	find_list_position(Node *node, List **nodelist);
-static bool check_index_only(RelOptInfo *rel, IndexOptInfo *index);
+static bool check_index_only(RelOptInfo *rel, IndexOptInfo *index, List *clauses);
 static double get_loop_count(PlannerInfo *root, Index cur_relid, Relids outer_relids);
 static double adjust_rowcount_for_semijoins(PlannerInfo *root,
 			  Index cur_relid,
@@ -1019,7 +1020,7 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 	 * index data retrieval anyway.
 	 */
 	index_only_scan = (scantype != ST_BITMAPSCAN &&
-	   check_index_only(rel, index));
+	   check_index_only(rel, index, clauses->rclauses));
 
 	/*
 	 * 4. Generate an indexscan path if there are relevant restriction clauses
@@ -1782,7 +1783,7 @@ find_list_position(Node *node, List **nodelist)
  *		Determine whether an index-only scan is possible for this index.
  */
 static bool
-check_index_only(RelOptInfo *rel, IndexOptInfo *index)
+check_index_only(RelOptInfo *rel, IndexOptInfo *index, List *clauses)
 {
 	bool		result;
 	Bitmapset  *attrs_used = NULL;
@@ -1790,6 +1791,8 @@ check_index_only(RelOptInfo *rel, IndexOptInfo *index)
 	ListCell   *lc;
 	int			i;
 
+	List	   *rclauses;
+
 	/* Index-only scans must be enabled */
 	if (!enable_indexonlyscan)
 		return false;
@@ -1798,13 +1801,13 @@ check_index_only(RelOptInfo *rel, IndexOptInfo *index)
 	 * Check that all needed attributes of the relation are available from the
 	 * index.
 	 *
-	 * XXX this is overly conservative for partial indexes, since we will
-	 * consider attributes involved in the index predicate as required even
-	 * though the predicate won't need to be checked at runtime.  

Re: [HACKERS] Improving test coverage of extensions with pg_dump

2015-09-26 Thread Andres Freund
On 2015-09-26 21:54:46 +0900, Michael Paquier wrote:
> On Wed, Sep 23, 2015 at 1:04 AM, Alvaro Herrera 
> wrote:
> > We discussed this in some other thread, not long ago.  I looked briefly
> > in the archives but couldn't find it.  I think the conclusion was
> > something along the lines of "hmm, tough".

> That's hours, even days of fun ahead for sure.

To me that's a somewhat serious bug, and something that we probably need
to address at some point.

> Honestly, I have the feeling that the discussion in this thread is
> getting unproductive. Let's not forget that the patch presented here
> just aims at adding one test case to ensure that extensions using
> dumpable relations with FKs get dumped correctly, which is to ensure
> that we do not reintroduce a bug that existed in the extension facility
> since its introduction in 9.1. That being said, the patch is just fine
> for this purpose, but that's just my opinion.

It's an unsustainable test model. Adding separate test runners,
directories, initdb runs etc. for a simple regression test of a couple
of lines won't hurt for maybe the first five, but after that it starts
to become unmaintainable.

Greetings,

Andres Freund




Re: [HACKERS] Rework the way multixact truncations work

2015-09-26 Thread Andres Freund
Hi,

I pushed this to 9.5 and master, committing the xlog page magic bump
separately. To avoid using a magic value from master in 9.5 I bumped the
numbers by two in both branches.

Should this get a release note entry given that we're not (at least
immediately) backpatching this?

Greetings,

Andres Freund




Re: [HACKERS] WIP: Rework access method interface

2015-09-26 Thread Petr Jelinek

On 2015-09-25 17:45, Alvaro Herrera wrote:


I think the API of getOpFamilyInfo is a bit odd; is the caller expected
to fill intype and keytype always, and then it only sets the procs/opers
lists?  I wonder if it would be more sensible to have that routine
receive the pg_opclass tuple (or even the opclass OID) instead of the
opfamily OID.

I think "amapi.h" is not a great file name.  What about am_api.h?



Well, we have the related fdwapi.h and tsmapi.h (and several unrelated 
*api.h files which also don't use "_" in the name), so amapi.h seems fine 
to me.



I'm unsure about BRIN_NPROC.  Why did you set it to 15?  It's not
unthinkable that future opclass frameworks will have different numbers


BRIN_NPROC should probably be defined in brin.c, since it's only used 
for sizing a local array variable in amvalidate; it should then be used 
to set amsupport in the init function as well.



of support procs.  For BRIN I'm thinking that we could add another
support proc which validates the opclass definition using the specific
framework; that way we will be able to check that the set of operators
defined are correct, etc (something that the current approach cannot
do).


As I said before in the thread, I would prefer a more granular approach 
to validation: have amvalidateopclass in the struct for the current 
functionality, so that we can easily add more validators in the future. 
There can still be one amvalidate function exposed at the SQL level that 
just calls all the amvalidate* functions the AM defines.
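
For illustration, an untested sketch of that idea (the amvalidateopclass
field is the suggestion above, not the current API):

/*
 * The single SQL-level amvalidate entry point just invokes whichever
 * validator callbacks the access method filled in.
 */
static bool
amvalidate_internal(IndexAmRoutine *amroutine, Oid opclassoid)
{
    bool        ok = true;

    if (amroutine->amvalidateopclass != NULL)
        ok = amroutine->amvalidateopclass(opclassoid) && ok;

    /* future, more specific validators would be chained here */

    return ok;
}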


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [PATCH] postgres_fdw extension support

2015-09-26 Thread Michael Paquier
On Sat, Sep 26, 2015 at 4:29 AM, Paul Ramsey wrote:
> On Thu, Aug 20, 2015 at 6:01 PM, Michael Paquier wrote:
> src/backend/utils/adt/format_type.c
> +/*
> + * This version allows a nondefault typemod to be specified and fully
> + * qualified.
> + */
> +char *
> +format_type_with_typemod_qualified(Oid type_oid, int32 typemod)
> +{
> +   return format_type_internal(type_oid, typemod, true, false, true);
> +}

Patch 0001 in this email has a routine called format_type_detailed
that I think is basically what we need for this stuff:
http://www.postgresql.org/message-id/CACACo5Q_UXYwF117LBhjZ3xaMPyrgqnqE=mxvrhefjj51ac...@mail.gmail.com
Could you look at it?

> I've put in a very, very small regression file that tests the shippable
> feature; it can be fleshed out further as needed.
>
> Thanks so much!

Nice. Thanks for the addition of the regression tests. It is beginning
to take a nice shape.

+ * shippable.c
+ *   Non-built-in objects cache management and utilities.
+ *
+ * Is a non-built-in shippable to the remote server? Only if
+ * the object is in an extension declared by the user in the
+ * OPTIONS of the wrapper or the server.
This is rather unclear. It would be simpler to describe that as
"Facility to track database objects shippable to a foreign server".

+extern bool extractExtensionList(char *extensionString,
+   List **extensionOids);
What's the point of the boolean status in this new routine? The return
value of extractExtensionList is never checked, and it would be simpler
to just return the parsed list as the return value, no?
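
I.e. something like this hypothetical signature:

/* hypothetical simplified interface: return the parsed list directly */
extern List *extractExtensionList(char *extensionString);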

-REGRESS = postgres_fdw
+REGRESS = postgres_fdw shippable
+EXTRA_INSTALL = contrib/seg
The order of the tests is important and should be mentioned, as
shippable.sql uses the loopback server created by the postgres_fdw test.

+-- ===
+-- clean up
+-- ===
Perhaps here you meant dropping the schema and the foreign tables
created previously?

+CREATE SCHEMA "SH 1";
Is there a reason to use double-quoted relation names? There is no real
reason not to use them; I just point out that this is not common
practice in the other regression tests.
-- 
Michael




Re: [HACKERS] Rework the way multixact truncations work

2015-09-26 Thread Tom Lane
Andres Freund  writes:
> Should this get a release note entry given that we're not (at least
> immediately) backpatching this?

I'll probably put something in when I update the release notes for beta1
(next week sometime); no real need to deal with it individually.

regards, tom lane




Re: [HACKERS] Parallel Seq Scan

2015-09-26 Thread Robert Haas
On Sat, Sep 26, 2015 at 10:16 AM, Robert Haas  wrote:
> On Sat, Sep 26, 2015 at 8:38 AM, Robert Haas  wrote:
>>> QueryDesc's totaltime is for instrumentation information for plugins
>>> like pg_stat_statements, and we need only the total buffer usage of
>>> each worker to make that work, as the other information is already
>>> collected in the master backend; so I think it should work as I have
>>> written it.
>>
>> I don't think that's right at all.  First, an extension can choose to
>> look at any part of the Instrumentation, not just the buffer usage.
>> Secondly, the buffer usage inside QueryDesc's totaltime isn't the same
>> as the global pgBufferUsage.
>
> Oh... but I'm wrong.  As long as our local pgBufferUsage gets updated
> correctly to incorporate the data from the other workers, the
> InstrStopNode(queryDesc->totaltime) will suck in those statistics.
> And the only other things getting updated are nTuples (which shouldn't
> count anything that the workers did), firsttuple (similarly), and
> counter (where the behavior is a bit more arguable, but just counting
> the master's wall-clock time is at least defensible).  So now I think
> you're right: this should be OK.

OK, so here's a patch extracted from your
parallel_seqscan_partialseqscan_v18.patch with a fairly substantial
amount of rework by me:

- I left out the Funnel node itself; this is just the infrastructure
portion of the patch.  I also left out the stop-the-executor-early
stuff and the serialization of PARAM_EXEC values.  I want to have
those things, but I think they need more thought and study first.
- I reorganized the code a fair amount into a form that I thought
was clearer, and certainly is closer to what I did previously in
parallel.c.  I found your version had lots of functions with lots of
parameters, and I found that made the logic difficult to follow, at
least for me.  As part of that, I munged the interface a bit so that
execParallel.c returns a structure with a bunch of pointers in it
instead of separately returning each one as an out parameter (see the
sketch after this list).  I think that's cleaner.  If we need to add
more stuff in the future, that way we don't break existing callers.
- I reworked the interface with instrument.c and tried to preserve
something of an abstraction boundary there.  I also changed the way
that stuff accumulated statistics to include more things; I couldn't
see any reason to make it as narrow as you had it.
- I did a bunch of cosmetic cleanup, including changing function names
and rewriting comments.
- I replaced your code for serializing and restoring a ParamListInfo
with my version.
- I fixed the code so that it can handle collecting instrumentation
data from multiple nodes, bringing all the data back to the leader and
associating it with the right plan node.  This involved giving every
plan node a unique ID, as discussed with Tom on another recent thread.
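
For illustration, the struct-return interface mentioned above looks
roughly like this (an untested sketch; the field names are illustrative
rather than a quote of the committed code):

/*
 * The setup routine hands back one structure of pointers instead of a
 * long list of out parameters, so a field can be added later without
 * breaking existing callers.
 */
typedef struct ParallelExecutorInfo
{
    PlanState  *planstate;      /* plan tree the workers will execute */
    ParallelContext *pcxt;      /* underlying parallel context */
    BufferUsage *buffer_usage;  /* per-worker buffer usage, in shared memory */
    SharedExecutorInstrumentation *instrumentation;     /* per-node stats */
    shm_mq_handle **tqueue;     /* tuple queues, one per worker */
} ParallelExecutorInfo;

extern ParallelExecutorInfo *ExecInitParallelPlan(PlanState *planstate,
                              EState *estate, int nworkers);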

After I did all that, I wrote some test code, which is also attached
here, that adds a new GUC force_parallel_worker.  If you set that GUC,
when you run a query, it'll run the query in a parallel worker and
feed the results back to the master.  I've tested this and it seems to
work, at least on the queries where you'd expect it to work.  It's
just test code, so it doesn't have error checking or make any attempt
not to push down queries that will fail in parallel mode. But you can
use it to see what happens.  You can also run queries under EXPLAIN
ANALYZE this way and, lo and behold, the worker stats show up attached
to the correct plan nodes.

I intend to commit this patch (but not the crappy test code, of
course) pretty soon, and then I'm going to start working on the
portion of the patch that actually adds the Funnel node, which I think
you are working on renaming to Gather.  I think that getting that part
committed is likely to be pretty straightforward; it doesn't need to
do a lot more than call this stuff and tell it to go do its thing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From 7c5fa754fada96553930820b970da876fb1dc468 Mon Sep 17 00:00:00 2001
From: Robert Haas 
Date: Fri, 25 Sep 2015 17:50:19 -0400
Subject: [PATCH 2/2] Test code.

---
 src/backend/executor/execMain.c | 47 -
 src/backend/executor/execParallel.c |  4 +++-
 src/backend/utils/misc/guc.c| 11 +
 src/include/utils/guc.h |  2 ++
 4 files changed, 62 insertions(+), 2 deletions(-)

diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 85ff46b..4863afd 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -45,6 +45,8 @@
 #include "commands/matview.h"
 #include "commands/trigger.h"
 #include "executor/execdebug.h"
+#include "executor/execParallel.h"
+#include "executor/tqueue.h"
 #include "foreign/fdwapi.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
@@ -339,13 

Re: [HACKERS] unclear about row-level security USING vs. CHECK

2015-09-26 Thread Peter Eisentraut
On 9/23/15 3:41 PM, Stephen Frost wrote:
> The CREATE POLICY documentation discusses how lack of a WITH CHECK
> policy means the USING expression is used:
> 
> """
> Policies can be applied for specific commands or for specific roles. The
> default for newly created policies is that they apply for all commands
> and roles, unless otherwise specified. If multiple policies apply to a
> given query, they will be combined using OR (although ON CONFLICT DO
> UPDATE and INSERT policies are not combined in this way, but rather
> enforced as noted at each stage of ON CONFLICT execution). Further, for
> commands which can have both USING and WITH CHECK policies (ALL and
> UPDATE), if no WITH CHECK policy is defined then the USING policy will
> be used for both what rows are visible (normal USING case) and which
> rows will be allowed to be added (WITH CHECK case).
> """

I see.  But it is a bit odd to hide this very fundamental behavior
somewhere in a paragraph that starts out with something about roles.

There is also a mistake, I believe: DELETE policies also take both a
CHECK and a USING clause.

I still find something about this weird, but I'm not sure what.  It's
not clear to me at what level this USING->CHECK mapping is applied.  I
can write FOR ALL USING and it will be mapped to CHECK for all actions,
including INSERT, but when I write FOR INSERT USING it complains.  Why
doesn't it do the mapping in that case, too?

>> (Btw., what's the meaning of a policy for DELETE?)
> 
> The DELETE policy controls what records a user is able to delete.

That needs to be documented somewhere.





Re: [HACKERS] Parallel Seq Scan

2015-09-26 Thread Robert Haas
On Sat, Sep 26, 2015 at 8:38 AM, Robert Haas  wrote:
>> QueryDesc's totaltime is for instrumentation information for plugins
>> like pg_stat_statements, and we need only the total buffer usage of
>> each worker to make that work, as the other information is already
>> collected in the master backend; so I think it should work as I have
>> written it.
>
> I don't think that's right at all.  First, an extension can choose to
> look at any part of the Instrumentation, not just the buffer usage.
> Secondly, the buffer usage inside QueryDesc's totaltime isn't the same
> as the global pgBufferUsage.

Oh... but I'm wrong.  As long as our local pgBufferUsage gets updated
correctly to incorporate the data from the other workers, the
InstrStopNode(queryDesc->totaltime) will suck in those statistics.
And the only other things getting updated are nTuples (which shouldn't
count anything that the workers did), firsttuple (similarly), and
counter (where the behavior is a bit more arguable, but just counting
the master's wall-clock time is at least defensible).  So now I think
you're right: this should be OK.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Reusing abbreviated keys during second pass of ordered [set] aggregates

2015-09-26 Thread Peter Geoghegan
On Sun, Sep 6, 2015 at 9:35 PM, Thomas Munro
 wrote:
> Running various contrived aggregate queries on a large low cardinality
> dataset in a small range (therefore frequently the same weight & size), I
> managed to measure a small improvement of up to a few percent with the
> attached patch.  I also wonder whether numeric_cmp could be profitably
> implemented with memcmp on big endian systems and some unrolled loops on
> little endian systems when size & weight match.

I think that setting up numeric SortSupport to have scratch memory
used across authoritative numeric comparator calls would also help.

We should prefer to do this kind of thing in a datatype independent
way, of course. I'm not opposed to doing what you outline too, but I
don't think it will be especially helpful for the cases here. I think
that what you're talking about would live in the SortSupport
authoritative comparator, and would benefit non-leading-attribute
comparisons most.

> Of course there are a ton of other overheads involved with numeric.  I
> wonder how crazy or difficult it would be to make it so that we could
> optionally put a pass-by-value NUMERIC in a Datum, setting a low order tag
> bit to say 'I'm an immediate value, not a pointer', and then packing 3
> digits (= 12 significant figures) + sign + weight into the other 63 bits.

That seems possible, but very invasive. I'd want to get a good sense
of the pay-off before undertaking such a project.
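
To make the idea concrete, here is a standalone sketch of one possible
encoding (purely illustrative: the exact bit layout is a design
decision, and this assumes a 64-bit Datum):

#include <stdint.h>
#include <stdbool.h>

typedef uintptr_t Datum;        /* stand-in for the PostgreSQL typedef */

/*
 * Illustrative immediate-numeric encoding:
 *   bit 0       tag, 1 = immediate value, 0 = ordinary pointer
 *   bit 1       sign
 *   bits 2-9    weight (biased 8-bit exponent)
 *   bits 10-51  three 14-bit base-10000 digits
 * Pointers are at least 4-byte aligned, so bit 0 is free for the tag.
 */
#define NUMERIC_IMMEDIATE_TAG   ((Datum) 1)

static inline bool
numeric_datum_is_immediate(Datum d)
{
    return (d & NUMERIC_IMMEDIATE_TAG) != 0;
}

static inline Datum
numeric_make_immediate(bool negative, uint8_t weight,
                       uint16_t d0, uint16_t d1, uint16_t d2)
{
    return NUMERIC_IMMEDIATE_TAG |
        ((Datum) negative << 1) |
        ((Datum) weight << 2) |
        ((Datum) (d0 & 0x3FFF) << 10) |
        ((Datum) (d1 & 0x3FFF) << 24) |
        ((Datum) (d2 & 0x3FFF) << 38);
}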

-- 
Peter Geoghegan

