Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-01-12 Thread Amit Langote
On 2016/01/12 11:28, Vinayak Pokale wrote:
> On Jan 12, 2016 11:22 AM, "Amit Langote" 
> wrote:
>>
>> On 2016/01/12 10:30, Amit Langote wrote:
>>> I'm slightly concerned that the latest patch doesn't incorporate any
>>> revisions to the original pgstat interface per Robert's comments in [1].
>>
>> I meant to say "originally proposed pgstat interface on this thread".
> 
> Yes.
> Robert's comments related to the pgstat interface need to be addressed.
> I will update it.

So, I updated the patch status to "Waiting on Author".

Thanks,
Amit






Re: [HACKERS] Some bugs in psql_complete of psql

2016-01-12 Thread Peter Eisentraut
On 12/22/15 4:44 AM, Kyotaro HORIGUCHI wrote:
> 1. 0001-Fix-tab-complete-of-CREATE-INDEX.patch
> 
>   Fixes completion for CREATE INDEX in ordinary way.

This part has been fixed in another thread.  Please check whether that
satisfies all your issues.

> 3. 0002-Fix-tab-completion-for-DROP-INDEX.patch
> 
>   Fix of DROP INDEX completion in the type-2 way.

I agree that we could use completion support for DROP INDEX
CONCURRENTLY, but I would rather not throw IF NOT EXISTS into the same
patch.  We don't have support for IF NOT EXISTS anywhere else.  If you
think about it, it's rather unnecessary, because tab completion will
determine for you whether an object exists.





Re: [HACKERS] checkpointer continuous flushing

2016-01-12 Thread Amit Kapila
On Tue, Jan 12, 2016 at 7:24 PM, Andres Freund  wrote:
> On 2016-01-12 19:17:49 +0530, Amit Kapila wrote:
> > Why can't we do it at larger intervals (relative to total amount of
> > writes)?
> > To explain, what I have in mind, let us assume that checkpoint interval
> > is longer (10 mins) and in the mean time all the writes are being done
> > by bgwriter
>
> But that's not the scenario with the regression here, so I'm not sure
> why you're bringing it up?
>
> And if we're flushing significant portion of the writes, how does that
> avoid the performance problem pointed out two messages upthread? Where
> sorting leads to flushing highly contended buffers together, leading to
> excessive wal flushing?
>

I think it will avoid that problem, because what I am suggesting is not to sort
the buffers before writing, but rather to sort the flush requests.  If I remember
correctly, Fabien's initial patch doesn't have sorting at the buffer
level, but he is still able to see the benefits in many cases.

>
> But more importantly, unless you also want to delay the writes
> themselves, leaving that many dirty buffers in the kernel page cache
> will bring back exactly the type of stalls (where the kernel flushes all
> the pending dirty data in a short amount of time) we're trying to avoid
> with the forced flushing. So doing flushes in large batches is
> something we really fundamentally do *not* want!
>

Could it be because of random I/O?

> > which it registers in shared memory so that later checkpoint
> > can perform corresponding fsync's, now when the request queue
> > becomes threshold size (let us say 1/3rd) full, then we can perform
> > sorting and merging and issue flush hints.
>
> Which means that a significant portion of the writes won't be able to be
> collapsed, since only a random 1/3 of the buffers is sorted together.
>
>
> > Basically, I think this can lead to lesser merging of neighbouring
> > writes, but might not hurt if sync_file_range() API is cheap.
>
> The cost of writing out data does correspond heavily with the number of
> random writes - which is what you get if you reduce the number of
> neighbouring writes.
>

Yeah, that's right, but I am not sure how much difference it would
make if we sort everything in one shot versus doing it in batches.  In any
case, I am just trying to think out loud to see if we
can find some solution to the regression you have seen above
without disabling sorting altogether for certain cases.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] PATCH: add pg_current_xlog_flush_location function

2016-01-12 Thread Simon Riggs
On 12 January 2016 at 05:58, Michael Paquier 
wrote:

> On Tue, Jan 12, 2016 at 6:35 AM, Simon Riggs wrote:
> > Comments in xlog.c say
> >
> > "In addition to the shared variable, each backend has a private copy of
> > LogwrtResult, which is updated when convenient."
> > It is therefore valid to update the value of both Write and Flush
> positions
> > at the same time, any time either is required.
>
> Yes, I saw this one yesterday when looking at this code. My comment
> regarded the potential interactions between this field and XLogFlush,
> but now I see that my concerns are not valid; updating LogwrtResult
> more frequently may save some cycles, though.
>
> > My suggested commit pattern for this is...
> > 1. Update existing function to maintain LogwrtResult more eagerly
> (separate
> > patch)
>
> The only place I see now that would benefit a bit from that is
> UpdateMinRecoveryPoint when info_lck is taken, which can be called by
> XLogFlush. Though I would expect this to have minimal impact.
>
> > 2. Have the patch use the existing function name (main patch)
>
> Yeah, we had better just use GetFlushRecPtr and be done with it. It
> seems that there is little point in adding a new function, and it is not
> going to be called that much, so its effect on updating LogwrtResult
> would be minimized for a single backend.
>

Patch committed; thanks for the patch and the review.
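
For anyone following along, the accessor in question has roughly this shape, per the
comments quoted above (treat it as an illustrative sketch rather than a verbatim copy
of xlog.c): it refreshes the backend-private LogwrtResult copy, both Write and Flush,
under info_lck, and returns the flush pointer.

XLogRecPtr
GetFlushRecPtr(void)
{
	/* refresh the backend-private copy of both the Write and Flush positions */
	SpinLockAcquire(&XLogCtl->info_lck);
	LogwrtResult = XLogCtl->LogwrtResult;
	SpinLockRelease(&XLogCtl->info_lck);

	return LogwrtResult.Flush;
}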

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Fuzzy substring searching with the pg_trgm extension

2016-01-12 Thread Teodor Sigaev

!   float4  tmpsml = cnt_sml(qtrg, key, *recheck);

    /* strange bug at freebsd 5.2.1 and gcc 3.3.3 */
!   res = (*(int *) &tmpsml == *(int *) &nlimit || tmpsml > nlimit) ? true : false;


What's the compiler bug about?  This code and comment were introduced in
cbfa4092bb (May 2004) without any explanation.  Do we still need to keep
it, if  is now a local variable instead of a global?  FWIW the
oldest GCC in the buildfarm is 3.4.2/3.4.3 (except for Gaur which uses


As I remember, the problem was with the x87 math coprocessor. The compiler (I suppose
a modern compiler could do it too) keeps tmpsml in an internal 80-bit wide register
and compares it with a 32-bit wide float. Of course, it depends on the level of
optimization and so on, so the result of the comparison differed between
optimization-enabled and optimization-disabled builds. I chose this strange way to
force the compiler to use main memory for the tmpsml variable. Actually, I don't know
a better way even now.
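
To illustrate the point, here is a small standalone program (my own sketch, not the
pg_trgm code; the function names are made up, and memcpy() is used because the
*(int *)& cast in the old code violates strict aliasing in modern C while doing the
same bit comparison):

#include <stdio.h>
#include <string.h>

/*
 * Compare the raw 32-bit patterns of two floats.  This forces both values
 * out of any 80-bit x87 register and into 32-bit memory, so "equal" really
 * means equal at float4 precision, independent of the optimization level.
 */
static int
same_float_bits(float a, float b)
{
	int		ia,
			ib;

	memcpy(&ia, &a, sizeof(ia));
	memcpy(&ib, &b, sizeof(ib));
	return ia == ib;
}

static int
reaches_limit(float similarity, float limit)
{
	/* mirrors the pg_trgm test: bitwise-equal OR strictly greater */
	return same_float_bits(similarity, limit) || similarity > limit;
}

int
main(void)
{
	printf("%d\n", reaches_limit(0.3f, 0.3f));	/* prints 1 */
	printf("%d\n", reaches_limit(0.2f, 0.3f));	/* prints 0 */
	return 0;
}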


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




Re: [HACKERS] Patch: fix lock contention for HASHHDR.mutex

2016-01-12 Thread Aleksander Alekseev
Hello, Álvaro

> So you were saying some posts upthread that the new hash tables would
> "borrow" AN elements from the freelist then put AN elements back when
> too many got unused.  Is that idea embodied in this patch?

Right. It didn't satisfy the "use all memory" requirement anyway. It's
better to sacrifice a few TPS (according only to my benchmark, which
doesn't represent all possible use cases) than to accidentally break
someone's system after upgrading to the next version of PostgreSQL. See
the pg_dump example given by Robert Haas above.

>> +/* Number of partitions of the shared buffer mapping hashtable */
>> +#define NUM_BUFFER_PARTITIONS NUM_LOCK_PARTITIONS
>> +
>>  /* Number of partitions the shared predicate lock tables are divided into */
>>  #define LOG2_NUM_PREDICATELOCK_PARTITIONS  4
>>  #define NUM_PREDICATELOCK_PARTITIONS  (1 << LOG2_NUM_PREDICATELOCK_PARTITIONS)
>
> I find this odd.  Why is it a good idea to define the bufMapping
> partitions like this? What's the explanation for the performance
> numbers you saw?

My mistake. I thought that the last table was self-explanatory.

We see that a single spinlock for accessing a freeList (the current
implementation) is obviously a bottleneck. Replacing it with, say, an
array of spinlocks reduces lock contention and therefore gives more TPS
(see the first row). We can also increase the concurrency level even further
by increasing the number of lock partitions (see the "no locks", "lwlock"
and "spinlock array" columns). Previously that couldn't help us (see the
"master" column) because of the bottleneck.
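
To make the idea concrete, here is a minimal standalone sketch of a partitioned
freelist. It uses pthread spinlocks only to keep the example self-contained; the
actual patch works on dynahash's HASHHDR with PostgreSQL's slock_t, and all names
below are illustrative:

#include <pthread.h>
#include <stdint.h>
#include <stddef.h>

#define NUM_FREELISTS 32

typedef struct FreeElement
{
	struct FreeElement *next;
} FreeElement;

typedef struct FreeListPartition
{
	pthread_spinlock_t lock;	/* protects this partition only */
	FreeElement *head;			/* chain of free elements */
} FreeListPartition;

static FreeListPartition freelists[NUM_FREELISTS];

static void
freelists_init(void)
{
	for (int i = 0; i < NUM_FREELISTS; i++)
	{
		pthread_spin_init(&freelists[i].lock, PTHREAD_PROCESS_PRIVATE);
		freelists[i].head = NULL;
	}
}

/* Take one free element; callers contend only within one partition. */
static FreeElement *
freelist_get(uint32_t hashcode)
{
	FreeListPartition *p = &freelists[hashcode % NUM_FREELISTS];
	FreeElement *e;

	pthread_spin_lock(&p->lock);
	e = p->head;
	if (e != NULL)
		p->head = e->next;
	pthread_spin_unlock(&p->lock);
	return e;
}

/* Return an element to the partition chosen by its hash code. */
static void
freelist_put(uint32_t hashcode, FreeElement *e)
{
	FreeListPartition *p = &freelists[hashcode % NUM_FREELISTS];

	pthread_spin_lock(&p->lock);
	e->next = p->head;
	p->head = e;
	pthread_spin_unlock(&p->lock);
}

int
main(void)
{
	FreeElement elem = {0};

	freelists_init();
	freelist_put(12345u, &elem);
	return freelist_get(12345u) == &elem ? 0 : 1;
}

The point is simply that two backends whose hash codes land in different partitions
never touch the same spinlock.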

If you have any other questions regarding this patch please don't
hesitate to ask.

Best regards,
Aleksander




Re: [HACKERS] extend pgbench expressions with functions

2016-01-12 Thread Fabien COELHO


Hello Michaël,


1) When specifying a negative parameter to the gaussian and exponential
functions, an assertion is triggered:
Assertion failed: (parameter > 0.0), function getExponentialRand, file
pgbench.c, line 494.
Abort trap: 6 (core dumped)
An explicit error is better.


Ok for assert -> error.


2) When specifying a value between 0 and 2, pgbench will loop
infinitely. For example:
\set cid debug(random_gaussian(-100, 100, 0))
In both cases we just lack sanity checks with PGBENCH_RANDOM,
PGBENCH_RANDOM_EXPONENTIAL and PGBENCH_RANDOM_GAUSSIAN. I think that
those checks would be better if moved into getExponentialRand & co
with the assertions removed. I would personally have those functions
return a boolean status and have the result in a pointer as a function
argument.


ISTM that if pgbench is to be stopped, the simplest option is just to
abort with a nicer error message from the get*Rand function; there is no
need to change the function signature and transfer the error management
upwards.
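
For instance, a standalone sketch of what I have in mind (illustrative only, not the
actual pgbench function, which draws its uniform value from the per-thread PRNG):

#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Abort with a readable message instead of an Assert() when the parameter
 * is not strictly positive.  "uniform" is expected in (0, 1].
 */
static int64_t
exponential_rand(int64_t min, int64_t max, double parameter, double uniform)
{
	double		cut,
				rand;

	if (parameter <= 0.0)
	{
		fprintf(stderr,
				"exponential distribution parameter must be greater than zero (not %g)\n",
				parameter);
		exit(1);
	}

	cut = exp(-parameter);
	/* inner expression is in (cut, 1], so rand ends up in [0, 1) */
	rand = -log(cut + (1.0 - cut) * uniform) / parameter;
	return min + (int64_t) ((max - min + 1) * rand);
}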



Another thing that itches me is that ENODE_OPERATOR is actually useless
now that there is a proper function layer. Removing it and replacing
it with a set of functions would be more appropriate and make the code
simpler, at the cost of more functions and of changing the parser so that
an operator, once found, is transformed into a function expression.


Hmmm. Why not, although the operator management is slightly more efficient
(e.g. always 2 operands...).



I am attaching a patch where I tweaked the docs a bit and added some
comments for clarity. The patch is switched to "waiting on author" for the
time being.


Ok. I'm hesitating about removing the operator management, especially if 
I'm told to put it back afterwards.


--
Fabien.


Re: [HACKERS] Minor code improvements to create_foreignscan_plan/ExecInitForeignScan

2016-01-12 Thread Etsuro Fujita

On 2016/01/12 2:36, Alvaro Herrera wrote:

I wonder,


--- 2166,2213 ----
  	}
  
  	/*
! 	 * If rel is a base relation, detect whether any system columns are
! 	 * requested from the rel.  (If rel is a join relation, rel->relid will be
! 	 * 0, but there can be no Var in the target list with relid 0, so we skip
! 	 * this in that case.  Note that any such system columns are assumed to be
! 	 * contained in fdw_scan_tlist, so we never need fsSystemCol to be true in
! 	 * the joinrel case.)  This is a bit of a kluge and might go away someday,
! 	 * so we intentionally leave it out of the API presented to FDWs.
  	 */
! 	scan_plan->fsSystemCol = false;
! 	if (scan_relid > 0)
  	{
! 		Bitmapset  *attrs_used = NULL;
! 		ListCell   *lc;
! 		int			i;
  
! 		/*
! 		 * First, examine all the attributes needed for joins or final output.
! 		 * Note: we must look at reltargetlist, not the attr_needed data,
! 		 * because attr_needed isn't computed for inheritance child rels.
! 		 */
! 		pull_varattnos((Node *) rel->reltargetlist, scan_relid, &attrs_used);
  
! 		/* Add all the attributes used by restriction clauses. */
! 		foreach(lc, rel->baserestrictinfo)
  		{
! 			RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
! 
! 			pull_varattnos((Node *) rinfo->clause, scan_relid, &attrs_used);
  		}
  
! 		/* Now, are any system columns requested from rel? */
! 		for (i = FirstLowInvalidHeapAttributeNumber + 1; i < 0; i++)
! 		{
! 			if (bms_is_member(i - FirstLowInvalidHeapAttributeNumber, attrs_used))
! 			{
! 				scan_plan->fsSystemCol = true;
! 				break;
! 			}
! 		}
! 
! 		bms_free(attrs_used);
! 	}
  
  	return scan_plan;
  }


Would it make sense to call pull_varattnos(reltargetlist), then walk the
bitmapset and break if we see a system column, then call
pull_varattnos() on the rinfo->clause?  That way, if the targetlist
requests a system column we don't have to walk the RestrictInfos.


Seems like a good idea.  Will update the patch.
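
Something along these lines, I think (a rough sketch of the suggested reordering,
using the same functions as in the hunk above; not the final patch):

/*
 * Sketch: check the targetlist attributes first, and look at the
 * restriction clauses only if no system column was found there.
 */
static bool
rel_requests_system_column(RelOptInfo *rel, Index scan_relid)
{
	Bitmapset  *attrs_used = NULL;
	ListCell   *lc;
	bool		found = false;
	int			i;

	pull_varattnos((Node *) rel->reltargetlist, scan_relid, &attrs_used);

	for (i = FirstLowInvalidHeapAttributeNumber + 1; i < 0; i++)
	{
		if (bms_is_member(i - FirstLowInvalidHeapAttributeNumber, attrs_used))
		{
			found = true;		/* no need to walk the RestrictInfos */
			break;
		}
	}

	if (!found)
	{
		foreach(lc, rel->baserestrictinfo)
		{
			RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);

			pull_varattnos((Node *) rinfo->clause, scan_relid, &attrs_used);
		}

		for (i = FirstLowInvalidHeapAttributeNumber + 1; i < 0; i++)
		{
			if (bms_is_member(i - FirstLowInvalidHeapAttributeNumber, attrs_used))
			{
				found = true;
				break;
			}
		}
	}

	bms_free(attrs_used);
	return found;
}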

Best regards,
Etsuro Fujita






Re: [HACKERS] Speedup twophase transactions

2016-01-12 Thread Simon Riggs
On 12 January 2016 at 06:35, Michael Paquier 
wrote:

> On Tue, Jan 12, 2016 at 4:57 AM, Simon Riggs 
> wrote:
> > Performance tests for me show that the patch is effective; my results
> match
> > Jesper's roughly in relative numbers.
> >
> > My robustness review is that the approach and implementation are safe.
> >
> > It's clear there are various additional tuning opportunities, but the
> > objective of the current patch to improve performance is very, very
> clearly
> > met, so I'm aiming to commit *this* patch soon.
>
> -   /* initialize LSN to 0 (start of WAL) */
> -   gxact->prepare_lsn = 0;
> +   /* initialize LSN to InvalidXLogRecPtr */
> +   gxact->prepare_start_lsn = InvalidXLogRecPtr;
> +   gxact->prepare_end_lsn = InvalidXLogRecPtr;
>

OK


> I think that it would be better to explicitly initialize gxact->ondisk
> to false here.
>
> +   xlogreader = XLogReaderAllocate(&logical_read_local_xlog_page, NULL);
> +   if (!xlogreader)
> +   ereport(ERROR,
> +   (errcode(ERRCODE_OUT_OF_MEMORY),
> +    errmsg("out of memory"),
> +    errdetail("Failed while allocating an XLog reading processor.")));
> Depending on something that is part of logical decoding to decode WAL
> is not a good idea.


Well, if you put it like that, it sounds wrong, clearly; that's not how I
saw it when I reviewed it.

I think any fuss can be avoided simply by renaming
logical_read_local_xlog_page() to read_local_xlog_page()


> If you want to move on with this approach, you
> should have a dedicated function.


The code is exactly what we need, apart from the point that, for 2PC, the LSN is
always known to be flushed by the time we execute it.


> Even better, it would be nice to
> come up with a generic function used by both 2PC and logical decoding.
>

Surely that is exactly what has been done?

A specific function could have been written, which would simply have
duplicated about 160 lines of code. Reusing the existing code makes the
code generic. So let's just rename the function, as mentioned above.

Should we just move the code somewhere else just to imply it is generic? That
seems like pointless refactoring to me.

The code is clearly due for refactoring once we can share elog with client
programs, as described in comments on the functions.

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Speedup twophase transactions

2016-01-12 Thread Simon Riggs
On 12 January 2016 at 06:41, Michael Paquier 
wrote:

>
> +   if (log_checkpoints && n > 0)
> +   ereport(LOG,
> +   (errmsg("%u two-phase state files were written "
> +           "for long-running prepared transactions",
> +           n)));
> This would be better as an independent change. That looks useful for
> debugging, and I guess that's why you added it.
>

The typical case is that no LOG message would be written at all, since that
only happens minutes after a prepared transaction is created and then not
terminated. Restarting a transaction manager likely won't take that long,
so it implies a crash or emergency shutdown of the transaction manager.

I think it is sensible and useful to be notified of this, as it is a condition the
operator would wish to know about. The message doesn't recur every
checkpoint; it occurs only once, at the point the files are created, so it's
not log spam either.

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Relation extension scalability

2016-01-12 Thread Dilip Kumar
On Thu, Jan 7, 2016 at 4:53 PM, Andres Freund  wrote:

> On 2016-01-07 16:48:53 +0530, Amit Kapila wrote:
>
> I think it's a worthwhile approach to pursue. But until it actually
> fixes the problem of leaving around uninitialized pages I don't think
> it's very meaningful to do performance comparisons.
>

The attached patch solves this issue: I am allocating the buffer for each page
and initializing the page, and only after that adding it to the FSM.
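
Roughly, the per-page loop looks like this (an illustrative sketch of the sequence
only, not the patch itself; locking against concurrent extension, WAL considerations
and error handling are left out):

#include "postgres.h"
#include "storage/bufmgr.h"
#include "storage/bufpage.h"
#include "storage/freespace.h"
#include "utils/rel.h"

/*
 * Extend by npages, one page at a time: allocate the buffer, initialize
 * the page while holding the buffer lock, and only then advertise it in
 * the free space map.
 */
static void
extend_and_register_pages(Relation relation, int npages)
{
	int			i;

	for (i = 0; i < npages; i++)
	{
		Buffer		buf = ReadBuffer(relation, P_NEW);
		Page		page;
		BlockNumber blkno;
		Size		freespace;

		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
		page = BufferGetPage(buf);
		PageInit(page, BufferGetPageSize(buf), 0);
		MarkBufferDirty(buf);

		blkno = BufferGetBlockNumber(buf);
		freespace = PageGetHeapFreeSpace(page);
		UnlockReleaseBuffer(buf);

		/* now other backends can find the new page through the FSM */
		RecordPageWithFreeSpace(relation, blkno, freespace);
	}
}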


> > a. Extend the relation page by page and add it to FSM without initializing
> > it.  I think this is what the current patch of Dilip seems to be doing.  If
> > we
>
> I think that's pretty much unacceptable, for the non-error path at
> least.
>

Performance results:

Test Case:

./psql -d postgres -c "COPY (select g.i::text FROM generate_series(1,
1) g(i)) TO '/tmp/copybinary' WITH BINARY";

echo COPY data from '/tmp/copybinary' WITH BINARY; > copy_script

./psql -d postgres -c "truncate table data"
./psql -d postgres -c "checkpoint"
./pgbench -f copy_script -T 120 -c$ -j$ postgres

Test Summary:

1. I have measured the performance of base and patch.
2. With the patch there are multiple results, with different values of
"extend_num_pages" (a parameter which says how many extra blocks to extend).

Test with Data on magnetic Disk and WAL on SSD

Shared Buffer :48GB
max_wal_size :10GB
Storage  :  Magnetic Disk
WAL   :  SSD

                 tps with different values of extend_num_pages
---------------------------------------------------------------
Client    Base    10-Page    20-Page    50-Page
1         105     103        157        129
2         217     219        255        288
4         210     421        494        486
8         166     605        702        701
16        145     484        563        686
32        124     477        480        745

Test with Data and WAL on SSD
---

Shared Buffer :   48GB
Max Wal Size :   10GB
Storage  :   SSD

                 tps with different values of extend_num_pages
---------------------------------------------------------------
Client    Base    10-Page    20-Page    50-Page    100-Page
1         152     153        155        147        157
2         281     281        292        275        287
4         236     505        502        508        514
8         171     662        687        767        764
16        145     527        639        826        907

Note: Test with both data and WAL on magnetic disk: no significant
improvement visible
-- I think WAL write is becoming the bottleneck in this case.

Currently I have kept extend_num_pages as a session-level parameter, but I
think later we can make it a table property.
Any suggestions on this?

Apart from this approach, I also tried extending the file by multiple blocks
in one extend call, but this approach (extending one page at a time) performs
better.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
*** a/src/backend/access/heap/hio.c
--- b/src/backend/access/heap/hio.c
***
*** 24,29 
--- 24,30 
  #include "storage/lmgr.h"
  #include "storage/smgr.h"
  
+ int extend_num_pages = 0;
  
  /*
   * RelationPutHeapTuple - place tuple at specified page
***
*** 238,243  RelationGetBufferForTuple(Relation relation, Size len,
--- 239,245 
  	BlockNumber targetBlock,
  otherBlock;
  	bool		needLock;
+ 	int			totalBlocks;
  
  	len = MAXALIGN(len);		/* be conservative */
  
***
*** 449,467  RelationGetBufferForTuple(Relation relation, Size len,
  	 * it worth keeping an accurate file length in shared memory someplace,
  	 * rather than relying on the kernel to do it for us?
  	 */
! 	buffer = ReadBufferBI(relation, P_NEW, bistate);
! 
! 	/*
! 	 * We can be certain that locking the otherBuffer first is OK, since it
! 	 * must have a lower page number.
! 	 */
! 	if (otherBuffer != InvalidBuffer)
! 		LockBuffer(otherBuffer, BUFFER_LOCK_EXCLUSIVE);
! 
! 	/*
! 	 * Now acquire lock on the new page.
! 	 */
! 	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
  
  	/*
  	 * Release the file-extension lock; it's now OK for someone else to extend
--- 451,491 
  	 * it worth keeping an accurate file length in shared memory someplace,
  	 * rather than relying on the kernel to do it for us?
  	 */
! 
! 	totalBlocks = extend_num_pages;
! 
! 	do {
! 
! 
! 		buffer = ReadBufferBI(relation, P_NEW, bistate);
! 
! 		/*
! 		 * We can be certain that locking the otherBuffer first is OK, since it
! 		 * must have a lower page number.
! 		 */
! 		if ((otherBuffer != InvalidBuffer) && !totalBlocks)
! 			LockBuffer(otherBuffer, BUFFER_LOCK_EXCLUSIVE);
! 
! 		/*
! 		 * Now acquire lock on the new page.
! 		

Re: [HACKERS] Question about DROP TABLE

2016-01-12 Thread Pavel Stehule
Hi

2016-01-12 11:57 GMT+01:00 Michal Novotny :

> Dear PostgreSQL Hackers,
> I've discovered an issue with dropping a large table (~5T). I was
> thinking DROP TABLE is a fast operation; however, I found out my assumption
> was wrong.
>
> Is there any way to tune it so that dropping a large table takes a matter of
> seconds or minutes? Any configuration variable in postgresql.conf or
> any tuning options available?
>

DROP TABLE should be fast.

There can be two reasons why it is not:

1. locks - are you sure this statement wasn't waiting on some lock?

2. filesystem issue - can you check how fast removing a 5TB file is on your I/O?

Regards

Pavel





>
> PostgreSQL version used is PgSQL 9.4.
>
> Thanks a lot!
> Michal
>
>
>


Re: [HACKERS] Question about DROP TABLE

2016-01-12 Thread Marko Tiikkaja

On 12/01/16 12:17, Pavel Stehule wrote:

2016-01-12 12:14 GMT+01:00 Michal Novotny :


Hi Pavel,
thanks for the information. I've been doing more investigation of this
issue and there's an autovacuum running on the table; however, it's
starting automatically even though "autovacuum = off" is set in the
postgresql.conf configuration file.



A real autovacuum is automatically cancelled. It looks like a VACUUM started by
cron, maybe?


Not if it's to prevent wraparound, which isn't unlikely if autovacuum=off.


.m




Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

2016-01-12 Thread Rushabh Lathia
On Thu, Jan 7, 2016 at 6:15 PM, Etsuro Fujita 
wrote:

> On 2016/01/06 18:58, Rushabh Lathia wrote:
>
>> I started looking at the updated patch and I definitely liked the new
>> approach.
>>
>
> Thanks for the review!
>
>> With the initial look and testing, overall things are looking great. I am still
>> reviewing the code changes, but here are a few early doubts/questions:
>>
>
>> .) What is the need for the following change?
>>
>> @@ -833,9 +833,6 @@ appendWhereClause(StringInfo buf,
>>  int nestlevel;
>>  ListCell   *lc;
>>
>> -   if (params)
>> -   *params = NIL;  /* initialize result list to empty */
>> -
>>  /* Set up context struct for recursion */
>>  context.root = root;
>>  context.foreignrel = baserel;
>> @@ -971,6 +968,63 @@ deparseUpdateSql(StringInfo buf, PlannerInfo *root,
>>   }
>>
>
> It is needed so that deparsePushedDownUpdateSql can store the params from both
> the WHERE clauses and the expressions to assign to the target columns
> in one params_list list.
>

Hmm, sorry, but I am still not getting the point; can you provide an
example to explain this?

.) When Tom Lane and Stephen Frost suggested getting the core code involved,
>> I thought that we could do the mandatory checks in core itself and keep them
>> completely out of dml_is_pushdown_safe(). Please correct me
>>
>
> The reason why I put that function in postgres_fdw.c is Check point 4:
>
> +  * 4. We can't push an UPDATE down, if any expressions to assign to the
> target
> +  * columns are unsafe to evaluate on the remote server.
>
>
Here I was talking about checks related to triggers, or to LIMIT. I think the
earlier thread talked about moving those mandatory checks to the core. So maybe
we can move those checks into make_modifytable() before calling
PlanDMLPushdown.

I think this depends on the capabilities of the FDW.
>
> .) Documentation for the new API is missing (fdw-callbacks).
>>
>
> Will add the docs.
>





-- 
Rushabh Lathia


[HACKERS] Fwd: [JDBC] Re: 9.4-1207 behaves differently with server side prepared statements compared to 9.2-1102

2016-01-12 Thread Dave Cramer
We have an interesting problem, and the reporter has been kind enough to
provide logs for behaviour we can't explain.

I'd be interested to hear any plausible explanations for a prepared plan
suddenly going from 2ms to 60ms for the same input values.


Dave Cramer

da...@postgresintl.com
www.postgresintl.com

-- Forwarded message --
From: Thomas Kellerer 
Date: 12 January 2016 at 04:03
Subject: [JDBC] Re: 9.4-1207 behaves differently with server side prepared
statements compared to 9.2-1102
To: pgsql-j...@postgresql.org


> Is it possible to get server logs ?

I have picked out one of the statements that suffered from this and
sanitized the logfile.

   http://sql-workbench.net/pg_jdbc_94.log

The complete statement is at the top of the file and in the messages
themselves,
I have replaced each occurrence of the statement with "select "

The interesting thing (at least for me) is that the first few executions of
the server-side statement have pretty much the same runtime as the ones before
the prepare.
And then suddenly the runtime shoots through the roof, going up from 1ms to
40ms or even 60ms.

Regards
Thomas






Re: [HACKERS] Fwd: [JDBC] Re: 9.4-1207 behaves differently with server side prepared statements compared to 9.2-1102

2016-01-12 Thread Dave Cramer
Hi Marko,

Interesting. So why would it choose a worse plan at that point? Why would
it change at all if the current plan is working well?

Dave Cramer

da...@postgresintl.com
www.postgresintl.com

On 12 January 2016 at 07:15, Marko Tiikkaja  wrote:

> On 12/01/16 13:00, Dave Cramer wrote:
>
>> We have an interesting problem, and the reporter has been kind enough to
>> provide logs for behaviour we can't explain.
>>
>> I'd be interested to hear any plausible explanations for a prepared plan
>> suddenly going from 2ms to 60ms for the same input values.
>>
>
> This is a new feature in 9.2, where on the fifth (or sixth, not sure)
> execution the planner might choose to use a generic plan.  From the 9.2
> release notes (though I'm fairly certain this is documented somewhere in
> the manual as well):
>
> In the past, a prepared statement always had a single "generic" plan that
> was used for all parameter values, which was frequently much inferior to
> the plans used for non-prepared statements containing explicit constant
> values. Now, the planner attempts to generate custom plans for specific
> parameter values. A generic plan will only be used after custom plans have
> repeatedly proven to provide no benefit. This change should eliminate the
> performance penalties formerly seen from use of prepared statements
> (including non-dynamic statements in PL/pgSQL).
>
>
> .m
>


Re: [HACKERS] checkpointer continuous flushing

2016-01-12 Thread Amit Kapila
On Tue, Jan 12, 2016 at 12:57 AM, Andres Freund  wrote:>
>
> My theory is that this happens due to the sorting: pgbench is an update
> heavy workload, the first few pages are always going to be used if
> there's free space as freespacemap.c essentially prefers those. Due to
> the sorting all a relation's early pages are going to be in "in a row".
>

Not sure what the best way to tackle this problem is, but I think one way could
be to perform sorting at the flush-request level rather than before writing
to OS buffers.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Question about DROP TABLE

2016-01-12 Thread Michal Novotny
Hi Pavel,
thanks for the information. I've been doing more investigation of this
issue and there's an autovacuum running on the table; however, it's
starting automatically even though "autovacuum = off" is set in the
postgresql.conf configuration file.

The test of rm on a 5T file was fast, not taking 24 hours. I guess
the autovacuum is the issue. Is there any way to disable it? When I
killed the process using 'kill -9' yesterday, the process started again.

Is there any way to cancel this process and disallow PgSQL from running
autovacuum again, and do the drop instead?

Thanks,
Michal

On 01/12/2016 12:01 PM, Pavel Stehule wrote:
> Hi
> 
> 2016-01-12 11:57 GMT+01:00 Michal Novotny  >:
> 
> Dear PostgreSQL Hackers,
> I've discovered an issue with dropping a large table (~5T). I was
> thinking DROP TABLE is a fast operation; however, I found out my assumption
> was wrong.
> 
> Is there any way to tune it so that dropping a large table takes a matter of
> seconds or minutes? Any configuration variable in postgresql.conf or
> any tuning options available?
> 
> 
> DROP TABLE should be fast.
> 
> There can be two reasons why it is not:
> 
> 1. locks - are you sure this statement wasn't waiting on some lock?
> 
> 2. filesystem issue - can you check how fast removing a 5TB file is on your I/O?
> 
> Regards
> 
> Pavel
> 
> 
> 
>  
> 
> 
> PostgreSQL version used is PgSQL 9.4.
> 
> Thanks a lot!
> Michal
> 
> 
> 
> 




Re: [HACKERS]WIP: Covering + unique indexes.

2016-01-12 Thread Anastasia Lubennikova


04.01.2016 11:49, David Rowley:
On 2 December 2015 at 01:53, Anastasia Lubennikova 
> 
wrote:


Finally, completed patch "covering_unique_3.0.patch" is here.
It includes the functionality discussed above in the thread,
regression tests and docs update.
I think it's quite ready for review.


Hi Anastasia,

I've maybe mentioned before that I think this is a great feature and I 
think it will be very useful to have, so I've signed up to review the 
patch, and below are the results of my first pass from reading the 
code. Apologies if some of the things seem like nitpicks; I've 
basically just listed everything I've noticed during the read, no matter how small.


First of all, I would like to thank you for writing such a detailed review.
All mentioned style problems, comments and typos are fixed in the patch 
v4.0.
+   An access method that supports this feature sets 
pg_am.amcanincluding true.


I don't think this belongs under the "Index Uniqueness Checks" title. 
I think the "Columns included with clause INCLUDING  aren't used to 
enforce uniqueness." that you've added before it is a good idea, but 
perhaps the details of amcanincluding are best explained elsewhere.

agree


+This clause specifies additional columns to be appended to 
the set of index columns.
+Included columns don't support any constraints 
(UNIQUE, PRMARY KEY, EXCLUSION CONSTRAINT).
+These columns can improve the performance of some queries 
 through using advantages of index-only scan
+(Or so called covering indexes. 
Covering index is the index that
+covers all columns required in the query and prevents a table 
access).
+Besides that, included attributes are not stored in index 
inner pages.
+It allows to decrease index size and furthermore it provides 
a way to extend included
+columns to store atttributes without suitable opclass (not 
implemented yet).
+This clause could be applied to both unique and nonunique 
indexes.
+It's possible to have non-unique covering index, which 
behaves as a regular

+multi-column index with a bit smaller index-size.
+Currently, only the B-tree access method supports this feature.

"PRMARY KEY" should be "PRIMARY KEY". I ended up rewriting this 
paragraph as follows.


"An optional INCLUDING clause allows a list of columns to 
be specified which will be included in the index, in the non-key 
portion of the index. Columns which are part of this clause cannot 
also exist in the indexed columns portion of the index, and vice 
versa. The INCLUDING columns exist solely to allow more 
queries to benefit from index only scans by including 
certain columns in the index, the value of which would otherwise have 
to be obtained by reading the table's heap. Having these columns in 
the INCLUDING clause in some cases allows 
PostgreSQL to skip the heap read completely. This also 
allows UNIQUE indexes to be defined on one set of columns, 
which can include another set of columns in the INCLUDING 
clause, upon which the uniqueness is not enforced. This can also be 
useful for non-unique indexes, as any columns which are not required 
for the searching or ordering of records can be defined in the 
INCLUDING clause, which can often reduce the size of the 
index."


Maybe not perfect, but maybe it's an improvement?



Yes, this explanation is much better. I've just added a couple of notes.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 97ef618..d17a06c 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -644,6 +644,13 @@
   Does an index of this type manage fine-grained predicate locks?
  
 
+  
+  amcaninclude
+  bool
+  
+  Does the access method support included columns?
+ 
+
  
   amkeytype
   oid
@@ -3714,6 +3721,14 @@
   pg_class.relnatts)
  
 
+  
+  indnkeyatts
+  int2
+  
+  The number of key columns in the index. "Key columns" are ordinary
+  index columns in contrast with "included" columns.
+ 
+
  
   indisunique
   bool
diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index 1c09bae..a102391 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/indexam.sgml
@@ -765,9 +765,10 @@ amrestrpos (IndexScanDesc scan);
   
PostgreSQL enforces SQL uniqueness constraints
using unique indexes, which are indexes that disallow
-   multiple entries with identical keys.  An access method that supports this
+   multiple entries with identical keys. An access method that supports this
feature sets pg_am.amcanunique true.
-   (At present, only b-tree supports it.)
+   Columns included with clause INCLUDING  aren't used to enforce uniqueness.
+   (At present, only b-tree supports 

Re: [HACKERS] Question about DROP TABLE

2016-01-12 Thread Pavel Stehule
2016-01-12 12:14 GMT+01:00 Michal Novotny :

> Hi Pavel,
> thanks for the information. I've been doing more investigation of this
> issue and there's an autovacuum running on the table; however, it's
> starting automatically even though "autovacuum = off" is set in the
> postgresql.conf configuration file.
>

A real autovacuum is automatically cancelled. It looks like a VACUUM started by
cron, maybe?


>
> The test of rm 5T file was fast and not taking 24 hours already. I guess
> the autovacuum is the issue. Is there any way how to disable it? If I
> killed the process using 'kill -9' yesterday the process started again.
>
> Is there any way how to cancel this process and disallow PgSQL to run
> autovacuum again and do the drop instead?
>
> Thanks,
> Michal
>
> On 01/12/2016 12:01 PM, Pavel Stehule wrote:
> > Hi
> >
> > 2016-01-12 11:57 GMT+01:00 Michal Novotny  > >:
> >
> > Dear PostgreSQL Hackers,
> > I've discovered an issue with dropping a large table (~5T). I was
> > thinking DROP TABLE is a fast operation; however, I found out my assumption
> > was wrong.
> >
> > Is there any way to tune it so that dropping a large table takes a matter of
> > seconds or minutes? Any configuration variable in postgresql.conf or
> > any tuning options available?
> >
> >
> > DROP TABLE should be fast.
> >
> > There can be two reasons why it is not:
> >
> > 1. locks - are you sure this statement wasn't waiting on some lock?
> >
> > 2. filesystem issue - can you check how fast removing a 5TB file is on your I/O?
> >
> > Regards
> >
> > Pavel
> >
> >
> >
> >
> >
> >
> > PostgreSQL version used is PgSQL 9.4.
> >
> > Thanks a lot!
> > Michal
> >
> >
> >
> >
>


[HACKERS] Allow to specify (auto-)vacuum cost limits relative to the database/cluster size?

2016-01-12 Thread Andres Freund
Hi,

right now the defaults for autovacuum cost limiting are so low that they
regularly cause problems for our users. It's not exactly obvious that
any installation above a couple gigabytes definitely needs to change
autovacuum_vacuum_cost_delay &
autovacuum_vacuum_cost_limit/vacuum_cost_limit. Especially
anti-wraparound/full table vacuums basically take forever with the
default settings.

On the other hand we don't want a database of a couple hundred megabytes
to be vacuumed as fast as possible and trash the poor tiny system. So we
can't just massively increase the limits by default; although I believe
some default adjustment would be appropriate anyway.

I wonder if it makes sense to compute the delays / limits in relation to
either cluster or relation size. If you have a 10 TB table, you
obviously don't want to scan with a few megabytes a second, which the
default settings will do for you. With that in mind we could just go for
something like the autovacuum_*_scale_factor settings. But e.g. for
partitioned workloads with hundreds of tables in the couple-gigabyte
range that'd not work that well.

Somehow computing the speed in relation to the cluster/database size is
probably possible, but I wonder how we can do so without constantly
re-computing something relatively expensive?
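
Just to make the shape of the idea concrete, one very rough sketch (the names, the
5-minute refresh interval and the log-based scaling rule are all made up for
illustration, not a worked-out design):

#include <math.h>
#include <stdint.h>
#include <time.h>

/*
 * Derive an effective cost limit from the database size, refreshing the
 * (relatively expensive) size lookup at most every few minutes.
 */
static int
effective_vacuum_cost_limit(int base_cost_limit,
							uint64_t (*get_database_size)(void))
{
	static int	cached_limit = 0;
	static time_t last_computed = 0;
	time_t		now = time(NULL);

	if (cached_limit == 0 || now - last_computed > 300)
	{
		double		gib = (double) get_database_size() / (1024.0 * 1024.0 * 1024.0);

		/*
		 * Scale roughly with the logarithm of the size: a 10 TB cluster gets
		 * a much higher limit than a 1 GB one, without letting a tiny system
		 * be trashed by full-speed vacuuming.
		 */
		cached_limit = (int) (base_cost_limit * (1.0 + log2(1.0 + gib)));
		last_computed = now;
	}
	return cached_limit;
}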

Thoughts?


- Andres




Re: [HACKERS] Speedup twophase transactions

2016-01-12 Thread Michael Paquier
On Tue, Jan 12, 2016 at 5:21 PM, Simon Riggs  wrote:
> Should we just move the code somewhere just to imply it is generic? Seems
> pointless refactoring to me.

Er, why not xlogutils.c? Having the 2PC code depending directly on
something that is within logicalfuncs.c is weird.
-- 
Michael




Re: [HACKERS] Fwd: [JDBC] Re: 9.4-1207 behaves differently with server side prepared statements compared to 9.2-1102

2016-01-12 Thread Albe Laurenz
Vladimir Sitnikov wrote:
> Here's the simplified testcase:
> https://gist.github.com/vlsi/df08cbef370b2e86a5c1
> 
> It reproduces the problem in both 9.4.4 and 9.5rc1.
> It is reproducible via both psql and pgjdbc.
> 
> I use a single table, however my production case includes a join of
> two tables and the query is like
> select ... from foo, bar where foo.skewed=? and bar.non_skewed=? and
> foo.bar_id=bar.id
> 
> Note: my application _always_ sends *the same* *bad* value for skewed
> column (it effectively is used as a filtering column in the particular
> query).
> Unfortunately, on 6th execution backend switches to the plan that uses
> skewed index access.
> 
> Is it something that can be fixed/improved?
> 
> Good plan (the first 5 executions):
> Index Scan using non_skewed__flipper on plan_flipper
> (cost=0.43..42.77 rows=10 width=113) (actual time=0.030..0.072 rows=10
> loops=1)
>   Index Cond: (non_skewed = 42)
>   Filter: (skewed = 0)
>   Rows Removed by Filter: 10
>   Buffers: shared hit=20 read=3
> Execution time: 0.094 ms
> 
> Bad plan (all the subsequent executions):
> Index Scan using skewed__flipper on plan_flipper  (cost=0.43..6.77
> rows=1 width=113) (actual time=0.067..355.867 rows=10 loops=1)
>   Index Cond: (skewed = $1)
>   Filter: (non_skewed = $2)
>   Rows Removed by Filter: 90
>   Buffers: shared hit=18182 read=2735
> Execution time: 355.901 ms

The problem is that the index "skewed__flipper" is more selective than
"non_skewed__flipper" except when "skewed = 0", so the generic plan prefers it.

I don't know if there is a good solution other than disabling server-side prepared
statements.

Yours,
Laurenz Albe



Re: [HACKERS] Question about DROP TABLE

2016-01-12 Thread Michal Novotny
Hi Andres,

thanks a lot. I've managed to run DROP TABLE and then cancel process
using pg_cancel_backend(autovacuum_pid) and it passed and dropped the 5T
table.

Thanks a lot!
Michal

On 01/12/2016 12:37 PM, Andres Freund wrote:
> Hi,
> 
> On 2016-01-12 12:17:01 +0100, Michal Novotny wrote:
>> thanks a lot for your reply. Unfortunately I've found out that it didn't
>> really start the DROP TABLE yet; it's waiting on an autovacuum running on
>> the table, and even if I kill the process it starts again and again.
> 
> Start the DROP TABLE and *then* cancel the autovacuum session. That
> should work.
> 
>> Is there any way how to do the DROP TABLE and bypass/disable autovacuum
>> entirely? Please note the "autovacuum = off" is set in the config file
>> (postgresql.conf).
> 
> That actually is likely to have caused the problem. Every
> autovacuum_freeze_max_age transactions, tables need to be vacuumed - otherwise
> the data can't be interpreted correctly anymore at some point. That's called an
> 'anti-wraparound vacuum'. It's started even if you disabled autovacuum,
> to prevent database corruption.
> 
> If you disable autovacuum, you really should start vacuums in some other
> way.
> 
> Greetings,
> 
> Andres Freund
> 




Re: [HACKERS] checkpointer continuous flushing

2016-01-12 Thread Andres Freund
On 2016-01-12 19:17:49 +0530, Amit Kapila wrote:
> Why can't we do it at larger intervals (relative to total amount of writes)?
> To explain, what I have in mind, let us assume that checkpoint interval
> is longer (10 mins) and in the mean time all the writes are being done
> by bgwriter

But that's not the scenario with the regression here, so I'm not sure
why you're bringing it up?

And if we're flushing significant portion of the writes, how does that
avoid the performance problem pointed out two messages upthread? Where
sorting leads to flushing highly contended buffers together, leading to
excessive wal flushing?

But more importantly, unless you also want to delay the writes
themselves, leaving that many dirty buffers in the kernel page cache
will bring back exactly the type of stalls (where the kernel flushes all
the pending dirty data in a short amount of time) we're trying to avoid
with the forced flushing. So doing flushes in a large patches is
something we really fundamentally do *not* want!

> which it registers in shared memory so that later checkpoint
> can perform corresponding fsync's, now when the request queue
> becomes threshold size (let us say 1/3rd) full, then we can perform
> sorting and merging and issue flush hints.

Which means that a significant portion of the writes won't be able to be
collapsed, since only a random 1/3 of the buffers is sorted together.


> Basically, I think this can lead to lesser merging of neighbouring
> writes, but might not hurt if sync_file_range() API is cheap.

The cost of writing out data does correspond heavily with the number of
random writes - which is what you get if you reduce the number of
neighbouring writes.

Greetings,

Andres Freund




Re: [HACKERS] Fwd: [JDBC] Re: 9.4-1207 behaves differently with server side prepared statements compared to 9.2-1102

2016-01-12 Thread Vladimir Sitnikov
Here's the simplified testcase:
https://gist.github.com/vlsi/df08cbef370b2e86a5c1

It reproduces the problem in both 9.4.4 and 9.5rc1.
It is reproducible via both psql and pgjdbc.

I use a single table, however my production case includes a join of
two tables and the query is like
select ... from foo, bar where foo.skewed=? and bar.non_skewed=? and
foo.bar_id=bar.id

Note: my application _always_ sends *the same* *bad* value for skewed
column (it effectively is used as a filtering column in the particular
query).
Unfortunately, on 6th execution backend switches to the plan that uses
skewed index access.

Is it something that can be fixed/improved?

Good plan (the first 5 executions):
Index Scan using non_skewed__flipper on plan_flipper
(cost=0.43..42.77 rows=10 width=113) (actual time=0.030..0.072 rows=10
loops=1)
  Index Cond: (non_skewed = 42)
  Filter: (skewed = 0)
  Rows Removed by Filter: 10
  Buffers: shared hit=20 read=3
Execution time: 0.094 ms

Bad plan (all the subsequent executions):
Index Scan using skewed__flipper on plan_flipper  (cost=0.43..6.77
rows=1 width=113) (actual time=0.067..355.867 rows=10 loops=1)
  Index Cond: (skewed = $1)
  Filter: (non_skewed = $2)
  Rows Removed by Filter: 90
  Buffers: shared hit=18182 read=2735
Execution time: 355.901 ms


Vladimir





Re: [HACKERS] checkpointer continuous flushing

2016-01-12 Thread Andres Freund
On 2016-01-12 13:54:21 +0100, Fabien COELHO wrote:
> >I measured it in a different number of cases, both on SSDs
> >and spinning rust.
> 
> Argh! This is a key point: the sort/flush is designed to help HDDs, and
> would have limited effect on SSDs, and it seems that you are showing that
> the effect is in fact negative on SSDs, too bad:-(

As you quoted, I could reproduce the slowdown both with SSDs *and* with
rotating disks.

> On SSDs, the Linux I/O scheduler works quite well, so this is a place where I
> would consider simply deactivating flushing and/or sorting.

Not my experience. In different scenarios, primarily with a large
shared_buffers fitting the whole hot working set, the patch
significantly improves performance.

> >postgres-ckpt14 \
> >   -D /srv/temp/pgdev-dev-800/ \
> >   -c maintenance_work_mem=2GB \
> >   -c fsync=on \
> >   -c synchronous_commit=off \
> 
> I'm not sure I like this one. I guess the intention is to focus on
> checkpointer writes and reduce the impact of WAL writes. Why not.

Not sure what you mean? s_c = off is *very* frequent in the field.

> >My laptop 1 EVO 840, 1 i7-4800MQ, 16GB ram:
> >master:
> >scaling factor: 800
> 
> The DB is probably about 12GB, so it fits in memory in the end, meaning that
> there should be only write activity after some time? So this is not really
> the case where it does not fit in memory, but it is large enough to get
> mostly random IOs both in read & write, so why not.

Doesn't really fit into ram - shared buffers uses some space (which will
be double buffered) and the xlog will use some more.

> >ckpt-14 (flushing by backends disabled):
> 
> Is this comment refering to "synchronous_commit = off"?
> I guess this is the same on master above, even if not written?

No, what I mean by that is that I didn't activate flushing writes in
backends - something I found hugely effective in reducing jitter in a
number of workloads, but which doesn't help throughput.

> >As you can see there's roughly a 30% performance regression on the
> >slower SSD and a ~9% on the faster one. HDD results are similar (but I
> >can't repeat on the laptop right now since the 2nd hdd is now an SSD).
> 
> Ok, that is what I would have expected, the larger the database, the smaller
> the impact of sorting & flushing on SSDs.

Again: "HDD results are similar". I primarily tested on a 4 disk raid10
of 4 disks, and a raid0 of 20 disks.

Greetings,

Andres Freund




Re: [HACKERS] checkpointer continuous flushing

2016-01-12 Thread Amit Kapila
On Tue, Jan 12, 2016 at 5:52 PM, Andres Freund  wrote:
>
> On 2016-01-12 17:50:36 +0530, Amit Kapila wrote:
> > On Tue, Jan 12, 2016 at 12:57 AM, Andres Freund 
wrote:>
> > >
> > > My theory is that this happens due to the sorting: pgbench is an
update
> > > heavy workload, the first few pages are always going to be used if
> > > there's free space as freespacemap.c essentially prefers those. Due to
> > > the sorting all a relation's early pages are going to be in "in a
row".
> > >
> >
> > Not sure, what is best way to tackle this problem, but I think one way
could
> > be to perform sorting at flush requests level rather than before writing
> > to OS buffers.
>
> I'm not following. If you just sort a couple hundred more or less random
> buffers - which is what you get if you look in buf_id order through
> shared_buffers - the likelihood of actually finding neighbouring writes
> is pretty low.
>

Why can't we do it at larger intervals (relative to total amount of writes)?
To explain, what I have in mind, let us assume that checkpoint interval
is longer (10 mins) and in the mean time all the writes are being done
by bgwriter which it registers in shared memory so that later checkpoint
can perform corresponding fsync's, now when the request queue
becomes threshold size (let us say 1/3rd) full, then we can perform
sorting and merging and issue flush hints.  Checkpointer task can
also follow somewhat similar technique which means that once it
has written 1/3rd or so of buffers (which we need to track), it can
perform flush hints after sort+merge.  Now, I think we can also
do it in checkpointer alone rather than in bgwriter and checkpointer.
Basically, I think this can lead to lesser merging of neighbouring
writes, but might not hurt if sync_file_range() API is cheap.
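
A rough standalone sketch of the sort+merge step I have in mind (all names here are
illustrative; the real thing would track relfilenode/segment pairs and use
sync_file_range() as the flush hint):

#include <stdlib.h>
#include <stdint.h>
#include <stddef.h>

typedef struct FlushRequest
{
	uint32_t	fileid;		/* stand-in for a relfilenode/segment */
	uint32_t	block;		/* block number within that file */
} FlushRequest;

static int
flush_request_cmp(const void *a, const void *b)
{
	const FlushRequest *fa = a;
	const FlushRequest *fb = b;

	if (fa->fileid != fb->fileid)
		return (fa->fileid < fb->fileid) ? -1 : 1;
	if (fa->block != fb->block)
		return (fa->block < fb->block) ? -1 : 1;
	return 0;
}

/* Sort the pending queue and emit one hint per contiguous run of blocks. */
static void
issue_flush_hints(FlushRequest *queue, size_t n,
				  void (*hint)(uint32_t fileid, uint32_t first, uint32_t count))
{
	size_t		i = 0;

	qsort(queue, n, sizeof(FlushRequest), flush_request_cmp);

	while (i < n)
	{
		size_t		j = i;

		/* extend the run while blocks stay on the same file and contiguous */
		while (j + 1 < n &&
			   queue[j + 1].fileid == queue[i].fileid &&
			   queue[j + 1].block == queue[j].block + 1)
			j++;

		hint(queue[i].fileid, queue[i].block, (uint32_t) (j - i + 1));
		i = j + 1;
	}
}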



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS]WIP: Covering + unique indexes.

2016-01-12 Thread Jeff Janes
On Tue, Jan 12, 2016 at 8:59 AM, Anastasia Lubennikova
 wrote:
> 08.01.2016 00:12, David Rowley:
>
> On 7 January 2016 at 06:36, Jeff Janes  wrote:
>>

> But now I see the reason to create a non-unique index with included columns -
> the lack of a suitable opclass on column "b".
> It's impossible to add it to the index as a key column, but that's not a
> problem with the INCLUDING clause.
> Look at this example.
>
> create table t1 (a int, b box);
> create index t1_a_inc_b_idx on t1 (a) including (b);
> create index on tbl (a,b);
> ERROR:  data type box has no default operator class for access method
> "btree"
> HINT:  You must specify an operator class for the index or define a default
> operator class for the data type.
> create index on tbl (a) including (b);
> CREATE INDEX
>
> This functionality is provided by the attached patch "omit_opclass_4.0",
> which must be applied over covering_unique_4.0.patch.

Thanks for the updates.

Why is omit_opclass a separate patch?  If the included columns now
never participate in the index ordering, shouldn't it be an inherent
property of the main patch that you can "cover" things without btree
opclasses?

Are you keeping them separate just to make review easier?  Or do you
think there might be a reason to commit one but not the other?  I
think that if we decide not to use the omit_opclass patch, then we
should also not allow covering columns to be specified on non-unique
indexes.

It looks like the "covering" patch, with or without the "omit_opclass"
patch, does not support expressions as included columns:

create table foobar (x text, y xml);
create index on foobar (x) including  (md5(x));
ERROR:  unrecognized node type: 904
create index on foobar (x) including  ((y::text));
ERROR:  unrecognized node type: 911

I think we would probably want it to work with those (or at least to
throw a better error message).

Thanks,

Jeff




Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-01-12 Thread Alvaro Herrera
Michael Paquier wrote:
> On Fri, Jan 8, 2016 at 1:53 PM, Kyotaro HORIGUCHI
>  wrote:
> > Hello,
> >
> > At Mon, 4 Jan 2016 15:29:34 +0900, Michael Paquier 
> >  wrote in 
> > 

Re: [HACKERS] Speedup twophase transactions

2016-01-12 Thread Alvaro Herrera
Michael Paquier wrote:
> On Tue, Jan 12, 2016 at 5:21 PM, Simon Riggs  wrote:
> > Should we just move the code somewhere just to imply it is generic? Seems
> > pointless refactoring to me.
> 
> Er, why not xlogutils.c? Having the 2PC code depending directly on
> something that is within logicalfuncs.c is weird.

Yes, I agree with Michael -- it's better to place code in its logical
location than keep it somewhere else just because historically it was
there.  That way, future coders can find the function more easily.

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Speedup twophase transactions

2016-01-12 Thread Andres Freund
Hi,

On 2016-01-11 19:39:14 +, Simon Riggs wrote:
> Currently, the patch reuses all of the code related to reading/writing state
> files, so it is the minimal patch that can implement the important things
> for performance. The current patch succeeds in its goal to improve
> performance, so I personally see no further need for code churn.

Sorry, I don't buy that argument. This approach leaves us with a bunch
of code related to statefiles that's barely ever going to be exercised,
and leaves the performance bottleneck on WAL replay in place.

> As you suggest, we could also completely redesign the state file mechanism
> and/or put it in WAL at checkpoint. That's all very nice but is much more
> code and doesn't do anything more for performance, since the current mainline
> path writes ZERO files at checkpoint.

Well, on the primary, yes.

> If you want that for some other reason or refactoring, I won't stop
> you, but its a separate patch for a separate purpose.

Maintainability/complexity very much has to be considered during review
and can't just be argued away with "but this is what we implemented".


> - *   In order to survive crashes and shutdowns, all prepared
> - *   transactions must be stored in permanent storage. This includes
> - *   locking information, pending notifications etc. All that state
> - *   information is written to the per-transaction state file in
> - *   the pg_twophase directory.
> + *   Information to recover prepared transactions in case of crash is
> + *   now stored in WAL for the common case. In some cases there will 
> be
> + *   an extended period between preparing a GXACT and commit/abort, 
> in

Absolutely minor: The previous lines were differently indented (two tabs
before, one space + two tabs now), which will probably mean pgindent
will yank all of it around, besides looking confusing with different tab
settings.


> * * In case of crash replay will move data from xlog to files, if 
> that
> *   hasn't happened before. XXX TODO - move to shmem in replay 
> also

This is a bit confusing - causing my earlier confusion about using
XlogReadTwoPhaseData in recovery - because what this actually means is
that we get the data from normal WAL replay, not our new way of getting
things from the WAL.


> @@ -772,7 +769,7 @@ TwoPhaseGetGXact(TransactionId xid)
>* During a recovery, COMMIT PREPARED, or ABORT PREPARED, we'll be 
> called
>* repeatedly for the same XID.  We can save work with a simple cache.
>*/
> - if (xid == cached_xid)
> + if (xid == cached_xid && cached_gxact)
>   return cached_gxact;

What's that about? When can cached_xid be equal to xid and cached_gxact
not be set? And why did that change with this patch?


> /*
>  * Finish preparing state file.
>  *
>  * Calculates CRC and writes state file to WAL and in pg_twophase directory.
>  */
> void
> EndPrepare(GlobalTransaction gxact)

In contrast to that comment we're not writing to pg_twophase anymore.


>   /*
>* If the file size exceeds MaxAllocSize, we won't be able to read it in
>* ReadTwoPhaseFile. Check for that now, rather than fail at commit 
> time.
>*/
>   if (hdr->total_len > MaxAllocSize)
>   ereport(ERROR,
>   (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
>errmsg("two-phase state file maximum length 
> exceeded")));
> 

Outdated comment.


> +/*
> + * Reads 2PC data from xlog. During checkpoint this data will be moved to
> + * twophase files and ReadTwoPhaseFile should be used instead.
> + */
> +static void
> +XlogReadTwoPhaseData(XLogRecPtr lsn, char **buf, int *len)
> +{
> + XLogRecord *record;
> + XLogReaderState *xlogreader;
> + char   *errormsg;
> +
> + xlogreader = XLogReaderAllocate(_read_local_xlog_page, NULL);
> + if (!xlogreader)
> + ereport(ERROR,
> + (errcode(ERRCODE_OUT_OF_MEMORY),
> +  errmsg("out of memory"),
> +  errdetail("Failed while allocating an XLog 
> reading processor.")));

Creating and deleting an xlogreader for every 2pc transaction isn't
particularly efficient. Reading the 2pc state from WAL will often also
mean hitting disk if there's significant WAL volume (we even hint that
we want the cache to be thrown away for low wal_level!).


If we really go this way, we really need a) a comment here explaining
why timelines are never an issue, and b) an assert preventing it from being
called during recovery.


> + record = XLogReadRecord(xlogreader, lsn, );
> + if (record == NULL ||
> + XLogRecGetRmid(xlogreader) != RM_XACT_ID ||
> + (XLogRecGetInfo(xlogreader) & XLOG_XACT_OPMASK) != 
> XLOG_XACT_PREPARE)
> + ereport(ERROR,
> + (errcode_for_file_access(),
> +   

Re: [HACKERS] bootstrap pg_shseclabel in relcache initialization

2016-01-12 Thread Adam Brightwell
> Pushed, with one cosmetic change (update comment on formrdesc).  I also
> bumped the catversion, but didn't verify whether this is critical.

Excellent! Thanks!

-Adam


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Rework access method interface

2016-01-12 Thread Tom Lane
Alvaro Herrera  writes:
> I just noticed that this patch had remained in Waiting-for-author status
> after Alexander had already submitted the updated version of the patch.
> I marked it as ready-for-committer, because Petr stated so in the
> thread.

> Tom, you're registered as committer for this patch.  Do you have time in
> the near future to review this patch?

Yeah, I will look at it soon.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] bootstrap pg_shseclabel in relcache initialization

2016-01-12 Thread Adam Brightwell
> So this looks like a bugfix that we should backpatch, but on closer
> inspection it turns out that we need the rowtype OID to be fixed, which
> it isn't unless this:
>
>> -CATALOG(pg_shseclabel,3592) BKI_SHARED_RELATION BKI_WITHOUT_OIDS
>> +CATALOG(pg_shseclabel,3592) BKI_SHARED_RELATION BKI_ROWTYPE_OID(4066) 
>> BKI_WITHOUT_OIDS BKI_SCHEMA_MACRO
>
> so I'm afraid this cannot be backpatched at all; if we did, the rowtype
> wouldn't match for already-initdb'd installations.
>
> I'm gonna push this to master only, which means you won't be able to
> rely on pg_shseclabel until 9.6.

I thinks that's fair.

-Adam


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump dumps event triggers and transforms unconditionally

2016-01-12 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> AFAICT event triggers don't belong to any particular schema, so I'd
>> propose that they be dumped only when include_everything is true,
>> which is the usual rule for objects that don't belong to any schema.
>> 
>> Transforms have got the same issue, which means that the dump of a
>> database containing, eg, hstore_plperl is outright broken: it will
>> dump both the extension *and* a CREATE TRANSFORM command.  Again, they
>> ought to obey the default rules for schema-less objects, which in
>> this case is "dump if include_everything and not an extension member".

> Sounds reasonable.  (I assume this last detailed rule is what applies to
> both object types.  Surely event triggers can be part of an extension
> too.)

Don't know offhand if they can or not, but the code will do the right
thing if so.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Removing Functionally Dependent GROUP BY Columns

2016-01-12 Thread Julien Rouhaud
On 01/12/2015 12:07, David Rowley wrote:
> On 1 December 2015 at 17:09, Marko Tiikkaja  > wrote:
> 
> On 2015-12-01 05:00, David Rowley wrote:
> 
> We already allow a SELECT's target list to contain
> non-aggregated columns
> in a GROUP BY query in cases where the non-aggregated column is
> functionally dependent on the GROUP BY clause.
> 
> For example a query such as;
> 
> SELECT p.product_id,p.description, SUM(s.quantity)
> FROM product p
> INNER JOIN sale s ON p.product_id = s.product_id
> GROUP BY p.product_id;
> 
> is perfectly fine in PostgreSQL, as p.description is
> functionally dependent
> on p.product_id (assuming product_id is the PRIMARY KEY of product).
> 
> 
> This has come up before (on other forums, at least), and my main
> concern has been that unlike the case where we go from throwing an
> error to allowing a query, this has a chance to make the planning of
> currently legal queries slower.  Have you tried to measure the
> impact of this on queries where there's no runtime gains to be had?
> 
> 
> I've performed a series of benchmarks on the following queries:
> 
> Test1: explain select id1,id2 from t1 group by id1,id2;
> Test2: explain select id from t2 group by id;
> Test3: explain select t1.id1,t1.id2 from t2 inner join t1 on
> t1.id1=t2.id  group by t1.id1,t1.id2;
>  
> I ran each of these with pgbench for 60 seconds, 3 runs per query. In
> each case below I've converted the TPS into seconds using the average
> TPS over the 3 runs.
> 
> In summary:
> 
> Test1 is the worst case test. It's a very simple query so planning
> overhead of join searching is non-existent. The fact that there's 2
> columns in the GROUP BY means that the fast path cannot be used. I added
> this as if there's only 1 column in the GROUP BY then there's no point
> in searching for something to remove.
> 
> [...]
> 
> It seems that the worst case test adds about 7.6 microseconds onto
> planning time. To get this worse case result I had to add two GROUP BY
> columns, as having only 1 triggers a fast path as the code knows it
> can't remove any columns, since there's only 1. A similar fast path also
> exists which will only lookup the PRIMARY KEY details if there's more
> than 1 column per relation in the GROUP BY, so for example GROUP BY
> rel1.col1, rel2.col1 won't lookup any PRIMARY KEY constraint.
> 
> Given that the extra code really only does anything if the GROUP BY has
> 2 or more expressions, are you worried that this will affect too many
> short and fast to execute queries negatively?
> 
> 

In identify_key_vars()

+   /* Only PK constraints are of interest for now, see comment above */
+   if (con->contype != CONSTRAINT_PRIMARY)
+   continue;

I think the comment should better refer to
remove_useless_groupby_columns() (don't optimize further than what
check_functional_grouping() does).


+   /*
+* Exit if the constraint is deferrable, there's no point in looking
+* for another PK constriant, as there can only be one.
+*/

small typo on "constriant"



+   {
+   varlistmatches = bms_add_member(varlistmatches,
varlistidx);
+   found_col = true;
+   /* don't break, there might be dupicates */
+   }

small typo on "dupicates". Also, I thought transformGroupClauseExpr()
called in the parse analysis make sure that there isn't any duplicate in
the GROUP BY clause. Do I miss something?


in remove_useless_groupby_columns()

+   /*
+* Skip if there were no Vars found in the GROUP BY clause which
belong
+* to this relation. We can also skip if there's only 1 Var, as
there
+* needs to be more than one Var for there to be any columns
which are
+* functionally dependent on another column.
+*/


This comment doesn't sound very clear. I'm not a native English speaker,
so maybe it's just me.

+   /*
+* If we found any surplus Vars in the GROUP BY clause, then we'll build
+* a new GROUP BY clause without these surplus Vars.
+*/
+   if (anysurplus)
+   {
+   List *new_groupby = NIL;
+
+   foreach (lc, root->parse->groupClause)
+   {
+   SortGroupClause *sgc = (SortGroupClause *) lfirst(lc);
+   TargetEntry *tle;
+   Var *var;
+
+   tle = get_sortgroupclause_tle(sgc, root->parse->targetList);
+   var = (Var *) tle->expr;
+
+   if (!IsA(var, Var))
+   continue;
+ [...]

if var isn't a Var, it needs to be added in new_groupby.


+   /* XXX do we to alter tleSortGroupRef to remove gaps? */

no idea on that :/

-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Re: [HACKERS] Question about DROP TABLE

2016-01-12 Thread Michal Novotny
Hi Andres,
thanks a lot for your reply. Unfortunately I've found out it most likely didn't
really start the DROP TABLE yet; it's blocked on an autovacuum running on the
table, and even if I kill the process the autovacuum starts again and again.

Is there any way how to do the DROP TABLE and bypass/disable autovacuum
entirely? Please note the "autovacuum = off" is set in the config file
(postgresql.conf).

Thanks a lot,
Michal

On 01/12/2016 12:05 PM, Andres Freund wrote:
> Hi Michal,
> 
> This isn't really a question for -hackers, the list for postgres
> development. -general or -performance would have been more appropriate.
> 
> On 2016-01-12 11:57:05 +0100, Michal Novotny wrote:
>> I've discovered an issue with dropping a large table (~5T). I was
>> thinking drop table is fast operation however I found out my assumption
>> was wrong.
> 
> What exactly did you do, and how long did it take. Is there any chance
> you were actually waiting for the lock on that large table, instead of
> waiting for the actual execution?
> 
>> Is there any way how to tune it to drop a large table in the matter of
>> seconds or minutes? Any configuration variable in the postgresql.conf or
>> any tune up options available?
> 
> The time for dropping a table primarily is spent on three things:
> 1) acquiring the exclusive lock. How long this takes entirely depends on
>the concurrent activity. If there's a longrunning session using that
>table it'll take till that session is finished.
> 2) The cached portion of that table needs to be eviced from cache. How
>long that takes depends on the size of shared_buffers - but usually
>this is a relatively short amount of time, and only matters if you
>drop many, many relations.
> 3) The time the filesystem takes to actually remove the, in your case
>5000 1GB, files. This will take a while, but shouldn't be minutes.
> 
> 
> Greetings,
> 
> Andres Freund
> 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Question about DROP TABLE

2016-01-12 Thread Pavel Stehule
2016-01-12 12:22 GMT+01:00 Marko Tiikkaja :

> On 12/01/16 12:17, Pavel Stehule wrote:
>
>> 2016-01-12 12:14 GMT+01:00 Michal Novotny :
>>
>> Hi Pavel,
>>> thanks for the information. I've been doing more investigation of this
>>> issue and there's autovacuum running on the table however it's
>>> automatically starting even if there is "autovacuum = off" in the
>>> postgresql.conf configuration file.
>>>
>>>
>> Real autovacuum is automatically cancelled. It looks like VACUUM started
>> by
>> cron, maybe?
>>
>
> Not if it's to prevent wraparound, which isn't unlikely if autovacuum=off.
>

I didn't know it.

Thank you

Pavel


>
>
> .m
>


Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

2016-01-12 Thread Etsuro Fujita

On 2016/01/07 21:45, Etsuro Fujita wrote:

On 2016/01/06 18:58, Rushabh Lathia wrote:

.) Documentation for the new API is missing (fdw-callbacks).



Will add the docs.


I added docs for new FDW APIs.

Other changes:

* Rename relation_has_row_level_triggers to the shorter 
relation_has_row_triggers, and move it to rewriteHandler.c.  I'm not sure 
rewriteHandler.c is a good place for it, though.


* Revise code, including a helper function get_result_result, which I 
implemented using a modified version of store_returning_result in the 
previous patch, but on second thought, I think that that is a bit too 
invasive.  So, I re-implemented that function directly using 
make_tuple_from_result_row.


* Add more comments.

* Add more regression tests.

Attached is an updated version of the patch.  Comments are welcome! 
(If the fix [1] is okay, I'd like to update this patch on top of the 
patch in [1].)


Best regards,
Etsuro Fujita

[1] http://www.postgresql.org/message-id/568f4430.6060...@lab.ntt.co.jp
*** a/contrib/postgres_fdw/deparse.c
--- b/contrib/postgres_fdw/deparse.c
***
*** 816,822  deparseTargetList(StringInfo buf,
   *
   * If params is not NULL, it receives a list of Params and other-relation Vars
   * used in the clauses; these values must be transmitted to the remote server
!  * as parameter values.
   *
   * If params is NULL, we're generating the query for EXPLAIN purposes,
   * so Params and other-relation Vars should be replaced by dummy values.
--- 816,822 
   *
   * If params is not NULL, it receives a list of Params and other-relation Vars
   * used in the clauses; these values must be transmitted to the remote server
!  * as parameter values.  Caller is responsible for initializing it to empty.
   *
   * If params is NULL, we're generating the query for EXPLAIN purposes,
   * so Params and other-relation Vars should be replaced by dummy values.
***
*** 833,841  appendWhereClause(StringInfo buf,
  	int			nestlevel;
  	ListCell   *lc;
  
- 	if (params)
- 		*params = NIL;			/* initialize result list to empty */
- 
  	/* Set up context struct for recursion */
  	context.root = root;
  	context.foreignrel = baserel;
--- 833,838 
***
*** 971,976  deparseUpdateSql(StringInfo buf, PlannerInfo *root,
--- 968,1030 
  }
  
  /*
+  * deparse remote UPDATE statement
+  *
+  * The statement text is appended to buf, and we also create an integer List
+  * of the columns being retrieved by RETURNING (if any), which is returned
+  * to *retrieved_attrs.
+  */
+ void
+ deparsePushedDownUpdateSql(StringInfo buf, PlannerInfo *root,
+ 		   Index rtindex, Relation rel,
+ 		   List	*targetlist,
+ 		   List *targetAttrs,
+ 		   List	*remote_conds,
+ 		   List **params_list,
+ 		   List *returningList,
+ 		   List **retrieved_attrs)
+ {
+ 	RelOptInfo *baserel = root->simple_rel_array[rtindex];
+ 	deparse_expr_cxt context;
+ 	bool		first;
+ 	ListCell   *lc;
+ 
+ 	if (params_list)
+ 		*params_list = NIL;		/* initialize result list to empty */
+ 
+ 	/* Set up context struct for recursion */
+ 	context.root = root;
+ 	context.foreignrel = baserel;
+ 	context.buf = buf;
+ 	context.params_list = params_list;
+ 
+ 	appendStringInfoString(buf, "UPDATE ");
+ 	deparseRelation(buf, rel);
+ 	appendStringInfoString(buf, " SET ");
+ 
+ 	first = true;
+ 	foreach(lc, targetAttrs)
+ 	{
+ 		int			attnum = lfirst_int(lc);
+ 		TargetEntry *tle = get_tle_by_resno(targetlist, attnum);
+ 
+ 		if (!first)
+ 			appendStringInfoString(buf, ", ");
+ 		first = false;
+ 
+ 		deparseColumnRef(buf, rtindex, attnum, root);
+ 		appendStringInfoString(buf, " = ");
+ 		deparseExpr((Expr *) tle->expr, );
+ 	}
+ 	if (remote_conds)
+ 		appendWhereClause(buf, root, baserel, remote_conds,
+ 		  true, params_list);
+ 
+ 	deparseReturningList(buf, root, rtindex, rel, false,
+ 		 returningList, retrieved_attrs);
+ }
+ 
+ /*
   * deparse remote DELETE statement
   *
   * The statement text is appended to buf, and we also create an integer List
***
*** 993,998  deparseDeleteSql(StringInfo buf, PlannerInfo *root,
--- 1047,1082 
  }
  
  /*
+  * deparse remote DELETE statement
+  *
+  * The statement text is appended to buf, and we also create an integer List
+  * of the columns being retrieved by RETURNING (if any), which is returned
+  * to *retrieved_attrs.
+  */
+ void
+ deparsePushedDownDeleteSql(StringInfo buf, PlannerInfo *root,
+ 		   Index rtindex, Relation rel,
+ 		   List	*remote_conds,
+ 		   List	**params_list,
+ 		   List *returningList,
+ 		   List **retrieved_attrs)
+ {
+ 	RelOptInfo *baserel = root->simple_rel_array[rtindex];
+ 
+ 	if (params_list)
+ 		*params_list = NIL;		/* initialize result list to empty */
+ 
+ 	appendStringInfoString(buf, "DELETE FROM ");
+ 	deparseRelation(buf, rel);
+ 	if (remote_conds)
+ 		appendWhereClause(buf, root, baserel, remote_conds,
+ 		  true, params_list);
+ 
+ 	

Re: [HACKERS] Patch: fix lock contention for HASHHDR.mutex

2016-01-12 Thread Andres Freund
On 2016-01-11 21:16:42 -0300, Alvaro Herrera wrote:
> Andres, Robert, are you still reviewing this patch?

I don't think I've signed up to review this patch - or actually reviewed it.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fwd: [JDBC] Re: 9.4-1207 behaves differently with server side prepared statements compared to 9.2-1102

2016-01-12 Thread Marko Tiikkaja

On 12/01/16 13:00, Dave Cramer wrote:

We have an interesting problem, and the reporter has been kind enough to
provide logs for which we can't explain.

I'd be interested to hear any plausible explanations for a prepared plan
suddenly going from 2ms to 60ms for the same input values ?


This is a new feature in 9.2, where on the fifth (or sixth, not sure) 
execution the planner might choose to use a generic plan.  From the 9.2 
release notes (though I'm fairly certain this is documented somewhere in 
the manual as well):


In the past, a prepared statement always had a single "generic" plan 
that was used for all parameter values, which was frequently much 
inferior to the plans used for non-prepared statements containing 
explicit constant values. Now, the planner attempts to generate custom 
plans for specific parameter values. A generic plan will only be used 
after custom plans have repeatedly proven to provide no benefit. This 
change should eliminate the performance penalties formerly seen from use 
of prepared statements (including non-dynamic statements in PL/pgSQL).
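
A minimal way to observe the switch (a sketch; the table, index and values 
are illustrative, not taken from the reported case):

  CREATE TABLE t (k int, v text);
  INSERT INTO t SELECT i % 10, 'x' FROM generate_series(1, 100000) i;
  CREATE INDEX ON t (k);
  ANALYZE t;

  PREPARE p(int) AS SELECT count(*) FROM t WHERE k = $1;

  -- The first five executions are planned with the actual parameter value;
  -- from then on the planner may switch to a cached generic plan.
  EXPLAIN EXECUTE p(1);
  EXPLAIN EXECUTE p(1);
  EXPLAIN EXECUTE p(1);
  EXPLAIN EXECUTE p(1);
  EXPLAIN EXECUTE p(1);
  EXPLAIN EXECUTE p(1);  -- may now show "k = $1" rather than "k = 1"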



.m


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extend pgbench expressions with functions

2016-01-12 Thread Michael Paquier
On Tue, Jan 12, 2016 at 5:52 PM, Fabien COELHO  wrote:
>> 2) When precising a value between 0 and 2, pgbench will loop
>> infinitely. For example:
>> \set cid debug(random_gaussian(-100, 100, 0))
>> In both cases we just lack sanity checks with PGBENCH_RANDOM,
>> PGBENCH_RANDOM_EXPONENTIAL and PGBENCH_RANDOM_GAUSSIAN. I think that
>> those checks would be better if moved into getExponentialRand & co
>> with the assertions removed. I would personally have those functions
>> return a boolean status and have the result in a pointer as a function
>> argument.
>
>
> ISTM that if pgbench is to be stopped, the simplest option is just to abort
> with a nicer error message from the get*Rand function, there is no need to
> change the function signature and transfer the error management upwards.

That's fine to me, as long as the solution is elegant.

>> I am attaching a patch where I tweaked a bit the docs and added some
>> comments for clarity. Patch is switched to "waiting on author" for the
>> time being.
>
> Ok. I'm hesitating about removing the operator management, especially if I'm
> told to put it back afterwards.

I can understand that, things like that happen all the time here and
that's not a straight-forward patch that we have here. I am sure that
additional opinions here would be good to have before taking one
decision or another. With the current status quo, let's just do what
you think is best.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speedup twophase transactions

2016-01-12 Thread Michael Paquier
On Tue, Jan 12, 2016 at 5:26 PM, Simon Riggs  wrote:
> On 12 January 2016 at 06:41, Michael Paquier 
> wrote:
>>
>>
>> +   if (log_checkpoints && n > 0)
>> +   ereport(LOG,
>> +   (errmsg("%u two-phase state files were
>> written "
>> +   "for long-running
>> prepared transactions",
>> +   n)));
>> This would be better as an independent change. That looks useful for
>> debugging, and I guess that's why you added it.
>
>
> The typical case is that no LOG message would be written at all, since that
> only happens minutes after a prepared transaction is created and then not
> terminated. Restarting a transaction manager likely won't take that long, so
> it implies a crash or emergency shutdown of the transaction manager.

Thanks for the detailed explanation.

> I think it is sensible and useful to be notified of this as a condition the
> operator would wish to know about. The message doesn't recur every
> checkpoint, it occurs only once at the point the files are created, so its
> not log spam either.

Well, I am not saying that this is bad, quite the contrary actually.
It is just that this seems unrelated to this patch and would still be
useful even now with CheckPointTwoPhase.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Question about DROP TABLE

2016-01-12 Thread Andres Freund
On 2016-01-12 12:17:09 +0100, Pavel Stehule wrote:
> 2016-01-12 12:14 GMT+01:00 Michal Novotny :
> 
> > Hi Pavel,
> > thanks for the information. I've been doing more investigation of this
> > issue and there's autovacuum running on the table however it's
> > automatically starting even if there is "autovacuum = off" in the
> > postgresql.conf configuration file.
> >
> 
> Real autovacuum is automatically cancelled. It looks like VACUUM started by
> cron, maybe?

Unless it's an anti-wraparound autovacuum...

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Odd behavior in foreign table modification (Was: Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW)

2016-01-12 Thread Thom Brown
On 8 January 2016 at 05:08, Etsuro Fujita  wrote:
> On 2016/01/07 21:50, Etsuro Fujita wrote:
>>
>> On 2016/01/06 20:37, Thom Brown wrote:
>>>
>>> On 25 December 2015 at 10:00, Etsuro Fujita
>>>  wrote:

 Attached is an updated version of the patch, which is
 still WIP, but I'd be happy if I could get any feedback.
>
>
>>> I've run into an issue:
>>>
>>> *# UPDATE master_customers SET id = 22 WHERE id = 16 RETURNING
>>> tableoid::regclass;
>>> ERROR:
>>> CONTEXT:  Remote SQL command: UPDATE public.customers SET id = 22
>>> WHERE ((id = 16)) RETURNING NULL
>
>
>> Will fix.
>
>
> While working on this, I noticed that the existing postgres_fdw system shows
> similar behavior, so I changed the subject.
>
> IIUC, the reason for that is when the local query specifies "RETURNING
> tableoid::regclass", the FDW has fmstate->has_returning=false while the
> remote query executed at ModifyTable has "RETURNING NULL", as shown in the
> above example; that would cause an abnormal exit in executing the remote
> query in postgresExecForeignUpdate, since that the FDW would get
> PGRES_TUPLES_OK as a result of the query while the FDW would think that the
> right result to get should be PGRES_COMMAND_OK, from the flag
> fmstate->has_returning=false.
>
> Attached is a patch to fix that.

I can't apply this patch in tandem with FDW DML pushdown patch (either
v2 or v3).

Thom


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Question about DROP TABLE

2016-01-12 Thread Andres Freund
Hi,

On 2016-01-12 12:17:01 +0100, Michal Novotny wrote:
> thanks a lot for your reply. Unfortunately I've found out most it didn't
> really start DROP TABLE yet and it's locked on autovacuum running for
> the table and even if I kill the process it's autostarting again and again.

Start the DROP TABLE and *then* cancel the autovacuum session. That
should work.
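
A minimal sketch of that sequence (the table name and filter are illustrative):

  -- session 1: queue the lock request; it will wait behind the autovacuum
  DROP TABLE big_table;

  -- session 2: cancel the autovacuum worker holding the lock
  SELECT pg_cancel_backend(pid)
  FROM pg_stat_activity
  WHERE query LIKE 'autovacuum:%big_table%';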

> Is there any way how to do the DROP TABLE and bypass/disable autovacuum
> entirely? Please note the "autovacuum = off" is set in the config file
> (postgresql.conf).

That actually is likely to have caused the problem. Every
autovacuum_freeze_max_age tables need to be vacuumed - otherwise the
data can't be interpreted correctly anymore at some point. That's called
"anti-wraparound vacuum". It's started even if you disabled autovacuum,
to prevent database corruption.

If you disable autovacuum, you really should start vacuums in some other
way.
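
For instance, a sketch of finding the tables closest to the wraparound 
threshold and freezing them manually (names are illustrative):

  SELECT c.oid::regclass AS table_name, age(c.relfrozenxid) AS xid_age
  FROM pg_class c
  WHERE c.relkind = 'r'
  ORDER BY age(c.relfrozenxid) DESC
  LIMIT 10;

  SHOW autovacuum_freeze_max_age;

  VACUUM FREEZE some_old_table;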

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Odd behavior in foreign table modification (Was: Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW)

2016-01-12 Thread Etsuro Fujita

On 2016/01/12 20:36, Thom Brown wrote:

On 8 January 2016 at 05:08, Etsuro Fujita  wrote:



On 2016/01/06 20:37, Thom Brown wrote:

I've run into an issue:

*# UPDATE master_customers SET id = 22 WHERE id = 16 RETURNING
tableoid::regclass;
ERROR:
CONTEXT:  Remote SQL command: UPDATE public.customers SET id = 22
WHERE ((id = 16)) RETURNING NULL



While working on this, I noticed that the existing postgres_fdw system shows
similar behavior, so I changed the subject.

IIUC, the reason for that is when the local query specifies "RETURNING
tableoid::regclass", the FDW has fmstate->has_returning=false while the
remote query executed at ModifyTable has "RETURNING NULL", as shown in the
above example; that would cause an abnormal exit in executing the remote
query in postgresExecForeignUpdate, since that the FDW would get
PGRES_TUPLES_OK as a result of the query while the FDW would think that the
right result to get should be PGRES_COMMAND_OK, from the flag
fmstate->has_returning=false.



Attached is a patch to fix that.



I can't apply this patch in tandem with FDW DML pushdown patch (either
v2 or v3).


That patch is for fixing a similar issue in the existing postgres_fdw 
system.  So, please apply that patch without the DML pushdown patch.  If 
that patch is reasonable as a fix for the issue, I'll update the DML 
pushdown patch (v3) on top of that patch.


Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Question about DROP TABLE

2016-01-12 Thread Michal Novotny
Hi Andres,

On 01/12/2016 12:37 PM, Andres Freund wrote:
> Hi,
> 
> On 2016-01-12 12:17:01 +0100, Michal Novotny wrote:
>> thanks a lot for your reply. Unfortunately I've found out most it didn't
>> really start DROP TABLE yet and it's locked on autovacuum running for
>> the table and even if I kill the process it's autostarting again and again.
> 
> Start the DROP TABLE and *then* cancel the autovacuum session. That
> should work.


By cancelling the autovacuum session you mean to run
pg_cancel_backend(pid int) *after* running DROP TABLE ?


> 
>> Is there any way how to do the DROP TABLE and bypass/disable autovacuum
>> entirely? Please note the "autovacuum = off" is set in the config file
>> (postgresql.conf).


So should I set autovacuum to enable (on) and restart pgsql before doing
DROP TABLE (and pg_cancel_backend() as mentioned above)?


> 
> That actually is likely to have caused the problem. Every
> autovacuum_freeze_max_age tables need to be vacuumed - otherwise the
> data can't be interpreted correctly anymore at some point. That's called
> 'anti-wraparound vacuum". It's started even if you disabled autovacuum,
> to prevent database corruption.

Ok, any recommendation how to set autovacuum_freeze_max_age?

Thanks,
Michal


> 
> If you disable autovacuum, you really should start vacuums in some other
> way.
> 
> Greetings,
> 
> Andres Freund
> 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] checkpointer continuous flushing

2016-01-12 Thread Fabien COELHO


Hello Andres,

Thanks for the details. Many comments and some questions below.


Also, maybe you could answer a question I had about the performance
regression you observed; I could not find the post where you gave the
detailed information about it, so that I could try reproducing it: what are
the exact settings and conditions (shared_buffers, pgbench scaling, host
memory, ...), what is the observed regression (tps? other?), and what is the
responsiveness of the database under the regression (eg % of seconds with 0
tps for instance, or something like that).


I measured it in a different number of cases, both on SSDs
and spinning rust.


Argh! This is a key point: the sort/flush is designed to help HDDs, and 
would have limited effect on SSDs, and it seems that you are showing that 
the effect is in fact negative on SSDs, too bad:-(


The bad news is that I do not have a host with a SSD available for 
reproducing such results.


On SSDs, the linux IO scheduler works quite well, so this is a place where 
I would consider simply deactivating flushing and/or sorting.


ISTM that I would rather update the documentation to "do not activate on 
SSD" than try to find a miraculous solution which may or may not exist. 
Basically I would use your results to give better advice in the 
documentation, not as a motivation to rewrite the patch from scratch.



postgres-ckpt14 \
   -D /srv/temp/pgdev-dev-800/ \
   -c maintenance_work_mem=2GB \
   -c fsync=on \
   -c synchronous_commit=off \


I'm not sure I like this one. I guess the intention is to focus on 
checkpointer writes and reduce the impact of WAL writes. Why not.



   -c shared_buffers=2GB \
   -c wal_level=hot_standby \
   -c max_wal_senders=10 \
   -c max_wal_size=100GB \
   -c checkpoint_timeout=30s


That is a very short one, but the point is to exercise the checkpoint, so 
why not.



My laptop 1 EVO 840, 1 i7-4800MQ, 16GB ram:
master:
scaling factor: 800


The DB is probably about 12GB, so it fits in memory in the end, meaning 
that there should be only write activity after some time? So this is not 
really the case where it does not fit in memory, but it is large enough to 
get mostly random IOs both in read & write, so why not.



query mode: prepared
number of clients: 16
number of threads: 16
duration: 300 s
number of transactions actually processed: 1155733


Assuming one buffer accessed per transaction on average, and considering a 
uniform random distribution, this means about 50% of the pages are actually 
loaded in memory at the end of the run (1 - e^(-1155733/(800*2048)) ≈ 0.51, 
with 2048 pages per scale unit).
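
(For reference, a sketch of that estimate, writing P = 800 * 2048 = 1638400 
pages and N = 1155733 transactions:

  \Pr[\text{page untouched}] = \left(1 - \tfrac{1}{P}\right)^{N} \approx e^{-N/P},
  \qquad 1 - e^{-1155733/1638400} \approx 0.51

so roughly half the pages have been touched at least once.)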



latency average: 4.151 ms
latency stddev: 8.712 ms
tps = 3851.242965 (including connections establishing)
tps = 3851.725856 (excluding connections establishing)



ckpt-14 (flushing by backends disabled):


Is this comment referring to "synchronous_commit = off"?
I guess this is the same on master above, even if not written?

[...] In neither case there are periods of 0 tps, but both have times of 
1000 tps with noticeably increased latency.


Ok, but we are talking SSDs, things are not too bad, even if there are ups 
and downs.



The endresults are similar with a sane checkpoint timeout - the tests
just take much longer to give meaningful results. Constantly running
long tests on prosumer level SSDs isn't nice - I've now killed 5 SSDs
with postgres testing...


Indeed. It wears out and costs, too bad:-(


As you can see there's roughly a 30% performance regression on the
slower SSD and a ~9% on the faster one. HDD results are similar (but I
can't repeat on the laptop right now since the 2nd hdd is now an SSD).


Ok, that is what I would have expected, the larger the database, the 
smaller the impact of sorting & flushin on SSDs. Now I would have hoped 
that flushing would help get a more constant load even in this case, at 
least this is what I measured in my tests. The closest to your setting 
test I ran is scale=660, and the sort/flush got 400 tps vs 100 tps 
without, with 30 minutes checkpoints, but HDDs do not compare to SSDs...


My overall comment about this SSD regression is that the patch is really 
designed to make a difference for HDDs, so I would advise not activating it 
on SSDs if there is a regression in such a case.


Now this is a little disappointing as on paper sorted writes should also 
be slightly better on SSDs, but if the bench says the contrary, I have to 
believe the bench:-)


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Question about DROP TABLE

2016-01-12 Thread Andres Freund
Hi Michal,

This isn't really a question for -hackers, the list for postgres
development. -general or -performance would have been more appropriate.

On 2016-01-12 11:57:05 +0100, Michal Novotny wrote:
> I've discovered an issue with dropping a large table (~5T). I was
> thinking drop table is fast operation however I found out my assumption
> was wrong.

What exactly did you do, and how long did it take. Is there any chance
you were actually waiting for the lock on that large table, instead of
waiting for the actual execution?
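
One quick way to check that from another session while the DROP appears stuck 
(a sketch):

  SELECT a.pid, a.query, l.mode, l.granted
  FROM pg_locks l
  JOIN pg_stat_activity a ON a.pid = l.pid
  WHERE NOT l.granted;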

> Is there any way how to tune it to drop a large table in the matter of
> seconds or minutes? Any configuration variable in the postgresql.conf or
> any tune up options available?

The time for dropping a table primarily is spent on three things:
1) acquiring the exclusive lock. How long this takes entirely depends on
   the concurrent activity. If there's a longrunning session using that
   table it'll take till that session is finished.
2) The cached portion of that table needs to be eviced from cache. How
   long that takes depends on the size of shared_buffers - but usually
   this is a relatively short amount of time, and only matters if you
   drop many, many relations.
3) The time the filesystem takes to actually remove the, in your case
   5000 1GB, files. This will take a while, but shouldn't be minutes.


Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Question about DROP TABLE

2016-01-12 Thread Michal Novotny

On 01/12/2016 12:20 PM, Andres Freund wrote:
> On 2016-01-12 12:17:09 +0100, Pavel Stehule wrote:
>> 2016-01-12 12:14 GMT+01:00 Michal Novotny :
>>
>>> Hi Pavel,
>>> thanks for the information. I've been doing more investigation of this
>>> issue and there's autovacuum running on the table however it's
>>> automatically starting even if there is "autovacuum = off" in the
>>> postgresql.conf configuration file.
>>>
>>
>> Real autovacuum is automatically cancelled. It looks like VACUUM started by
>> cron, maybe?
> 
> Unless it's an anti-wraparound autovacuum...
> 
> Andres
> 

Autovacuum is not started by CRON. How should I understand the
"anti-wraparound autovacuum" ?

Thanks,
Michal


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] checkpointer continuous flushing

2016-01-12 Thread Andres Freund
On 2016-01-12 17:50:36 +0530, Amit Kapila wrote:
> On Tue, Jan 12, 2016 at 12:57 AM, Andres Freund  wrote:>
> >
> > My theory is that this happens due to the sorting: pgbench is an update
> > heavy workload, the first few pages are always going to be used if
> > there's free space as freespacemap.c essentially prefers those. Due to
> > the sorting all a relation's early pages are going to be in "in a row".
> >
> 
> Not sure, what is best way to tackle this problem, but I think one way could
> be to perform sorting at flush requests level rather than before writing
> to OS buffers.

I'm not following. If you just sort a couple hundred more or less random
buffers - which is what you get if you look in buf_id order through
shared_buffers - the likelihood of actually finding neighbouring writes
is pretty low.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Question about DROP TABLE

2016-01-12 Thread Michal Novotny
Dear PostgreSQL Hackers,
I've discovered an issue with dropping a large table (~5T). I was
thinking drop table is fast operation however I found out my assumption
was wrong.

Is there any way how to tune it to drop a large table in the matter of
seconds or minutes? Any configuration variable in the postgresql.conf or
any tune up options available?

PostgreSQL version used is PgSQL 9.4.

Thanks a lot!
Michal


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pg_dump dumps event triggers and transforms unconditionally

2016-01-12 Thread Tom Lane
I noticed $subject while trying to fix the extension membership problem
Karsten Hilbert complained of last week.  I do not think that if I ask
for a dump of a single table, that should include event triggers.

AFAICT event triggers don't belong to any particular schema, so I'd
propose that they be dumped only when include_everything is true,
which is the usual rule for objects that don't belong to any schema.
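
For context, a minimal sketch of such a schema-less object (names are 
illustrative); a dump of a single table should not pull this in:

  CREATE FUNCTION et_log_ddl() RETURNS event_trigger
  LANGUAGE plpgsql AS $$
  BEGIN
    RAISE NOTICE 'ddl command: %', tg_tag;
  END $$;

  CREATE EVENT TRIGGER et_demo ON ddl_command_start
    EXECUTE PROCEDURE et_log_ddl();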

Transforms have got the same issue, which means that the dump of a
database containing, eg, hstore_plperl is outright broken: it will
dump both the extension *and* a CREATE TRANSFORM command.  Again, they
ought to obey the default rules for schema-less objects, which in
this case is "dump if include_everything and not an extension member".

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump dumps event triggers and transforms unconditionally

2016-01-12 Thread Alvaro Herrera
Tom Lane wrote:
> I noticed $subject while trying to fix the extension membership problem
> Karsten Hilbert complained of last week.  I do not think that if I ask
> for a dump of a single table, that should include event triggers.

Ugh, certainly not.

> AFAICT event triggers don't belong to any particular schema, so I'd
> propose that they be dumped only when include_everything is true,
> which is the usual rule for objects that don't belong to any schema.
> 
> Transforms have got the same issue, which means that the dump of a
> database containing, eg, hstore_plperl is outright broken: it will
> dump both the extension *and* a CREATE TRANSFORM command.  Again, they
> ought to obey the default rules for schema-less objects, which in
> this case is "dump if include_everything and not an extension member".

Sounds reasonable.  (I assume this last detailed rule is what applies to
both object types.  Surely event triggers can be part of an extension
too.)

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speedup twophase transactions

2016-01-12 Thread Simon Riggs
On 12 January 2016 at 18:14, Andres Freund  wrote:

> Hi,


Thank you for the additional review.


> On 2016-01-11 19:39:14 +, Simon Riggs wrote:
> > Currently, the patch reuses all of the code related to reading/write
> state
> > files, so it is the minimal patch that can implement the important things
> > for performance. The current patch succeeds in its goal to improve
> > performance, so I personally see no further need for code churn.
>
> Sorry, I don't buy that argument. This approach leaves us with a bunch
> of code related to statefiles that's barely ever going to be exercised,
> and leaves the performance bottleneck on WAL replay in place.


I raised the issue of WAL replay performance before you were involved, as
has been mentioned already. I don't see it as a blocker for this patch. I
have already requested it from Stas and he has already agreed to write that.

Anyway, we know the statefile code works, so I'd prefer to keep it, rather
than write a whole load of new code that would almost certainly fail.
Whatever the code looks like, the frequency of usage is the same. As I
already said, you can submit a patch for the new way if you wish; the
reality is that this code works and there's no additional performance gain
from doing it a different way.


> > As you suggest, we could also completely redesign the state file
> mechanism
> > and/or put it in WAL at checkpoint. That's all very nice but is much more
> > code and doesn't do anything more for performance, since the current
> mainline
> > path writes ZERO files at checkpoint.
>
> Well, on the primary, yes.


Your changes proposed earlier wouldn't change performance on the standby.


> > If you want that for some other reason or refactoring, I won't stop
> > you, but its a separate patch for a separate purpose.
>
> Maintainability/complexity very much has to be considered during review
> and can't just be argued away with "but this is what we implemented".
>

;-) ehem, please don't make the mistake of thinking that because your
judgement differs from mine, you can claim that you are the only one who
has thought about maintainability and complexity.

I'm happy to do some refactoring if you and Michael think it necessary.


> > - *   In order to survive crashes and shutdowns, all prepared
> > - *   transactions must be stored in permanent storage. This
> includes
> > - *   locking information, pending notifications etc. All that
> state
> > - *   information is written to the per-transaction state file in
> > - *   the pg_twophase directory.
> > + *   Information to recover prepared transactions in case of
> crash is
> > + *   now stored in WAL for the common case. In some cases there
> will be
> > + *   an extended period between preparing a GXACT and
> commit/abort, in
>
> Absolutely minor: The previous lines were differently indented (two tabs
> before, one space + two tabs now), which will probably mean pgindent
> will yank all of it around, besides looking confusing with different tab
> settings.
>
>
> > * * In case of crash replay will move data from xlog to
> files, if that
> > *   hasn't happened before. XXX TODO - move to shmem in
> replay also
>
> This is a bit confusing - causing my earlier confusion about using
> XlogReadTwoPhaseData in recovery - because what this actually means is
> that we get the data from normal WAL replay, not our new way of getting
> things from the WAL.
>
>
> > @@ -772,7 +769,7 @@ TwoPhaseGetGXact(TransactionId xid)
> >* During a recovery, COMMIT PREPARED, or ABORT PREPARED, we'll be
> called
> >* repeatedly for the same XID.  We can save work with a simple
> cache.
> >*/
> > - if (xid == cached_xid)
> > + if (xid == cached_xid && cached_gxact)
> >   return cached_gxact;
>
> What's that about? When can cached_xid be be equal xid and cached_gxact
> not set? And why did that change with this patch?
>
>
> > /*
> >  * Finish preparing state file.
> >  *
> >  * Calculates CRC and writes state file to WAL and in pg_twophase
> directory.
> >  */
> > void
> > EndPrepare(GlobalTransaction gxact)
>
> In contrast to that comment we're not writing to pg_twophase anymore.
>
>
> >   /*
> >* If the file size exceeds MaxAllocSize, we won't be able to read
> it in
> >* ReadTwoPhaseFile. Check for that now, rather than fail at
> commit time.
> >*/
> >   if (hdr->total_len > MaxAllocSize)
> >   ereport(ERROR,
> >   (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
> >errmsg("two-phase state file maximum
> length exceeded")));
> >
>
> Outdated comment.


Ack all above.


> > +/*
> > + * Reads 2PC data from xlog. During checkpoint this data will be moved
> to
> > + * twophase files and ReadTwoPhaseFile should be used instead.
> > + */
> > +static void
> > 

Re: [HACKERS] Speedup twophase transactions

2016-01-12 Thread Simon Riggs
On 12 January 2016 at 12:53, Michael Paquier 
wrote:

> On Tue, Jan 12, 2016 at 5:21 PM, Simon Riggs 
> wrote:
> > Should we just move the code somewhere just to imply it is generic? Seems
> > pointless refactoring to me.
>
> Er, why not xlogutils.c? Having the 2PC code depending directly on
> something that is within logicalfuncs.c is weird.
>

If that sounds better, I'm happy to move the code there.

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] WIP: Rework access method interface

2016-01-12 Thread Alvaro Herrera
I just noticed that this patch had remained in Waiting-for-author status
after Alexander had already submitted the updated version of the patch.
I marked it as ready-for-committer, because Petr stated so in the
thread.

Tom, you're registered as committer for this patch.  Do you have time in
the near future to review this patch?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speedup twophase transactions

2016-01-12 Thread Stas Kelvich
My +1 for moving the function to xlogutils.c too.

Right now the call to this function goes through a series of callbacks, so it 
is hard to find.
Personally, I found it only after I had implemented the same function myself 
(based on the code in pg_xlogdump).

Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


> On 12 Jan 2016, at 18:56, Alvaro Herrera  wrote:
> 
> Michael Paquier wrote:
>> On Tue, Jan 12, 2016 at 5:21 PM, Simon Riggs  wrote:
>>> Should we just move the code somewhere just to imply it is generic? Seems
>>> pointless refactoring to me.
>> 
>> Er, why not xlogutils.c? Having the 2PC code depending directly on
>> something that is within logicalfuncs.c is weird.
> 
> Yes, I agree with Michael -- it's better to place code in its logical
> location than keep it somewhere else just because historically it was
> there.  That way, future coders can find the function more easily.
> 
> -- 
> Álvaro Herrerahttp://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> 
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freeze avoidance of very large table.

2016-01-12 Thread Masahiko Sawada
On Mon, Dec 28, 2015 at 6:38 PM, Masahiko Sawada  wrote:
> On Mon, Dec 21, 2015 at 11:54 PM, Robert Haas  wrote:
>> On Mon, Dec 21, 2015 at 3:27 AM, Kyotaro HORIGUCHI
>>  wrote:
>>> Hello,
>>>
>>> At Fri, 18 Dec 2015 12:09:43 -0500, Robert Haas  
>>> wrote in 
>>> 
 On Thu, Dec 17, 2015 at 1:17 AM, Michael Paquier
  wrote:
 > I am not really getting the meaning of this sentence. Shouldn't this
 > be reworded something like:
 > "Freezing occurs on the whole table once all pages of this relation 
 > require it."

 That statement isn't remotely true, and I don't think this patch
 changes that.  Freezing occurs on the whole table once relfrozenxid is
 old enough that we think there might be at least one page in the table
 that requires it.
>>>
>>> I doubt I can explain this accurately, but I took the original
>>> phrase as that if and only if all pages of the table are marked
>>> as "requires freezing" by accident, all pages are frozen. It's
>>> quite obvious but it is what I think "happen to require freezing"
>>> means. Does this make sense?
>>>
>>> The phrase might not be necessary if this is correct.
>>
>> Maybe you are trying to say something like "only those pages which
>> require freezing are frozen?".
>>
>
> I was thinking the same as what Horiguchi-san said.
> That is, even if relfrozenxid is old enough, freezing on the whole
> table is not required if the table are marked as "not requires
> freezing".
> In other word, only those pages which are marked as "not frozen" are frozen.
>

The recent changes to HEAD conflict with the freeze map patch, so I've
updated and attached the latest freeze map patch.
The other patch, which enhances the debug log message of the visibility map,
is attached to the previous mail.

Please review it.

Regards,

--
Masahiko Sawada
diff --git a/contrib/pgstattuple/pgstatapprox.c b/contrib/pgstattuple/pgstatapprox.c
index 001988b..5d08c73 100644
--- a/contrib/pgstattuple/pgstatapprox.c
+++ b/contrib/pgstattuple/pgstatapprox.c
@@ -87,7 +87,7 @@ statapprox_heap(Relation rel, output_type *stat)
 		 * If the page has only visible tuples, then we can find out the free
 		 * space from the FSM and move on.
 		 */
-		if (visibilitymap_test(rel, blkno, ))
+		if (VM_ALL_VISIBLE(rel, blkno, ))
 		{
 			freespace = GetRecordedFreeSpace(rel, blkno);
 			stat->tuple_len += BLCKSZ - freespace;
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 392eb70..c43443a 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5916,7 +5916,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
   
   

-VACUUM performs a whole-table scan if the table's
+VACUUM performs an eager freezing if the table's
 pg_class.relfrozenxid field has reached
 the age specified by this setting.  The default is 150 million
 transactions.  Although users can set this value anywhere from zero to
@@ -5960,7 +5960,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
   
   

-VACUUM performs a whole-table scan if the table's
+VACUUM performs an eager freezing if the table's
 pg_class.relminmxid field has reached
 the age specified by this setting.  The default is 150 million multixacts.
 Although users can set this value anywhere from zero to two billions,
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 5204b34..7cc975d 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -352,9 +352,9 @@
 Vacuum maintains a visibility map for each
 table to keep track of which pages contain only tuples that are known to be
 visible to all active transactions (and all future transactions, until the
-page is again modified).  This has two purposes.  First, vacuum
-itself can skip such pages on the next run, since there is nothing to
-clean up.
+page is again modified), and pages contain only frozen tuples.
+This has two purposes.  First, vacuum itself can skip such pages
+on the next run, since there is nothing to clean up.

 

@@ -438,28 +438,25 @@

 

-VACUUM normally skips pages that don't have any dead row
-versions, but those pages might still have row versions with old XID
-values.  To ensure all old row versions have been frozen, a
-scan of the whole table is needed.
+VACUUM skips pages that don't have any dead row
+versions, and pages that have only frozen rows. To ensure all old
+row versions have been frozen, a scan of all unfrozen pages is needed.
  controls when
-VACUUM does that: a 

Re: [HACKERS] Fwd: [JDBC] Re: 9.4-1207 behaves differently with server side prepared statements compared to 9.2-1102

2016-01-12 Thread Vladimir Sitnikov
> I don't know if there is a good solution except disabling server prepared 
> statements.

Why doesn't backend reuse already existing good plan?
The plan does account for the skew.

Can backend take selectivities from the original bind values?

Vladimir


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fwd: [JDBC] Re: 9.4-1207 behaves differently with server side prepared statements compared to 9.2-1102

2016-01-12 Thread Pavel Stehule
2016-01-12 16:52 GMT+01:00 Vladimir Sitnikov :

> > I don't know if there is a good solution except disabling server
> prepared statements.
>
> Why doesn't backend reuse already existing good plan?
>

probably the good plan is more expensive than the wrong plan :(

this logic is driven by plan cost, not by plan execution time.



> The plan does account for the skew.
>
> Can backend take selectivities from the original bind values?
>

> Vladimir
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


Re: [HACKERS] Fwd: [JDBC] Re: 9.4-1207 behaves differently with server side prepared statements compared to 9.2-1102

2016-01-12 Thread Vladimir Sitnikov
VS>>Why doesn't backend reuse already existing good plan?
PS>this logic is driven by plan cost, not by plan execution time.

I do understand that currently PG replans with $1, $2 and uses
default selectivities for that.

What I am asking is to make PG aware of "previously used bind values",
so it would calculate proper selectivities for $1, $2.

PS. It is not the first time the problem bites me, so I hand-crafted a testcase.

Vladimir


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench stats per script & other stuff

2016-01-12 Thread Alvaro Herrera
Fabien COELHO wrote:
> 
> Here is a two part v12, which:
> 
> part a (refactoring of scripts and their stats):
>  - fix option checks (-i alone)
>  - s/repleacable/replaceable/ in doc
>  - keep small description in doc and help for -S & -N
>  - fix 2 comments for pg style
>  - show builtin list if not found

I'm looking at this part of your patch and I think it's far too big to
be a simple refactoring.  Would you split it up please?  I think the
StatsData / SimpleStat addition should be one patch; then there's the -b
changes.  Then there may (or may not) be a bunch of other minor
cleanups, not sure.

I'm willing to commit these patches if I can easily review what they do,
which I cannot with the current state.

Please pgindent; make sure to add /*--- here to avoid pgindent mangling
the comment:

if (pg_strcasecmp(my_commands->argv[0], "setrandom") == 0)
{
/*
 * parsing:


> part b (weight)
>  - check that the weight is an int

This part looks okay to me.  Minor nitpick,

+   int i = 0, w = 0, wc = (int) getrand(thread, 0, total_weight - 1);

should be three lines, not one.  Also the @W part in the --help output
should be in brackets, as FILE[@W], right?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] blocking the alter role command

2016-01-12 Thread Fabrízio de Royes Mello
On Tue, Jan 12, 2016 at 8:32 PM, JotaComm  wrote:
>
> Hello,
>
> I would like to know if is it possible to block the following command:
ALTER USER myuser PASSWORD;
>
> My target is allow to execute this command from a specific address.
>
> Thanks a lot.
>

You should implement an extension using ProcessUtility_hook to do that. See
an example in [1].

Regards,

[1] https://github.com/michaelpq/pg_plugins/tree/master/hook_utility

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello


[HACKERS] Weird behavior during CREATE EXTENSION

2016-01-12 Thread Jim Nasby

This behavior had be quite baffled...


~@decina.local/29760# create extension "trunklet-format" CASCADE;
NOTICE:  installing required extension "trunklet"
NOTICE:  installing required extension "variant"
CREATE EXTENSION
~@decina.local/29760# create extension "pgxntool-test";
ERROR:  syntax error at or near "-"
LINE 1: create extension "pgxntool-test";
^
~@decina.local/29760# select * from pg_available_extensions where name ~'-';
      name       | default_version | installed_version |                     comment
-----------------+-----------------+-------------------+--------------------------------------------------
 pgxntool-test   | 0.1.0           |                   | 
 trunklet-format | 0.1.1           | 0.1.1             | A format()-based template language for trunklet
(2 rows)


Eventually, I realized the problem was the first line of the extension 
file itself:


CREATE FUNCTION pgxntool-test(

wrapping that in double quotes fixed the issue. (The reason that still doesn't 
line up with the ^ above is because the ^ is accounting for "LINE 1: ".)


This makes debugging extensions quite tedious. Part of the explanation 
is in the comment for execute_sql_string():



/*
 * Execute given SQL string.
 *
 * filename is used only to report errors.
 *
 * Note: it's tempting to just use SPI to execute the string, but that does
 * not work very well.  The really serious problem is that SPI will parse,
 * analyze, and plan the whole string before executing any of it; of course
 * this fails if there are any plannable statements referring to objects
 * created earlier in the script.  A lesser annoyance is that SPI insists
 * on printing the whole string as errcontext in case of any error, and that
 * could be very long.
 */


I can think of 4 ways to fix this:

1) Have psql parse the script into separate commands for us.
2) Pull enough of psql's parsing into the backend code to be able to do #1
3) Add *file* line numbers to the output of pg_parse_query()
4) Have ereport spit out the context you'd normally get from SPI if it 
sees that it was called from somewhere underneath execute_sql_string() 
(a rough sketch of one approach follows below).
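
For what it's worth, here is a rough, hypothetical sketch of how #4-style 
context could be provided with the existing error-context-callback machinery, 
rather than by teaching ereport about execute_sql_string() specifically; 
execute_sql_string_with_context() is an invented wrapper name:

#include "postgres.h"

/* Argument for the callback: just the script path being executed. */
typedef struct ExtensionScriptErrCtx
{
    const char *filename;
} ExtensionScriptErrCtx;

static void
extension_script_error_callback(void *arg)
{
    ExtensionScriptErrCtx *ctx = (ExtensionScriptErrCtx *) arg;

    errcontext("extension script file \"%s\"", ctx->filename);
}

/* Hypothetical wrapper around the per-statement execution loop. */
static void
execute_sql_string_with_context(const char *sql, const char *filename)
{
    ErrorContextCallback errcallback;
    ExtensionScriptErrCtx ctx;

    ctx.filename = filename;
    errcallback.callback = extension_script_error_callback;
    errcallback.arg = (void *) &ctx;
    errcallback.previous = error_context_stack;
    error_context_stack = &errcallback;

    /* ... parse and execute each statement of "sql", as today ... */

    error_context_stack = errcallback.previous;
}

Any ERROR raised while running the script would then carry a CONTEXT line 
naming the file; extending the callback with the current statement's line 
offset would address #3 as well.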


My preference would actually be #1, because that would make it easy for 
any tool that wanted to get access to that. Jon Erdman actually 
looked into a similar alternative in the past and it was only a few 
lines of code. Basically, when the "parse file" option is chosen don't 
even attempt to connect to a database, just parse things, execute \ 
commands and print the results instead of sending them via libpq. That 
wouldn't work directly here because we want to split commands apart, but 
it wouldn't be hard to have psql spit out a special command separator 
line and then look for that. psql would have to ignore \quit in this 
mode though, but I think that's fine.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




[HACKERS] PGCon 2016 CFP - one week left

2016-01-12 Thread Dan Langille
Hello

There is one week left in the PGCon CFP.  Details below.  Please submit.  
Thanks.

PGCon 2016 will be on 17-21 May 2016 at University of Ottawa.

* 17-18 (Tue-Wed) tutorials
* 19 & 20 (Thu-Fri) talks - the main part of the conference
* 18 & 21 (Wed & Sat) The Developer Unconference & the User Unconference (both 
very popular)

PLEASE NOTE: PGCon 2016 is in May.

See http://www.pgcon.org/2016/

We are now accepting proposals for the main part of the conference (19-20 May).
Proposals can be quite simple. We do not require academic-style papers.

If you are doing something interesting with PostgreSQL, please submit
a proposal.  You might be one of the backend hackers or work on a
PostgreSQL related project and want to share your know-how with
others. You might be developing an interesting system using PostgreSQL
as the foundation. Perhaps you migrated from another database to
PostgreSQL and would like to share details.  These, and other stories
are welcome. Both users and developers are encouraged to share their
experiences.

Here are some ideas to jump start your proposal process:

- novel ways in which PostgreSQL is used
- migration of production systems from another database
- data warehousing
- tuning PostgreSQL for different work loads
- replication and clustering
- hacking the PostgreSQL code
- PostgreSQL derivatives and forks
- applications built around PostgreSQL
- benchmarking and performance engineering
- case studies
- location-aware and mapping software with PostGIS
- The latest PostgreSQL features and features in development
- research and teaching with PostgreSQL
- things the PostgreSQL project could do better
- integrating PostgreSQL with 3rd-party software

Both users and developers are encouraged to share their experiences.

The schedule is:

1 Dec 2015 Proposal acceptance begins
19 Jan 2016 Proposal acceptance ends
19 Feb 2016 Confirmation of accepted proposals

NOTE: the call for lightning talks will go out very close to the conference.
Do not submit lightning talks proposals until then.

See also 

Instructions for submitting a proposal to PGCon 2016 are available
from: 





Re: [HACKERS] WIP: Covering + unique indexes.

2016-01-12 Thread David Rowley
On 13 January 2016 at 05:59, Anastasia Lubennikova <
a.lubennik...@postgrespro.ru> wrote:

> 08.01.2016 00:12, David Rowley:
>
> On 7 January 2016 at 06:36, Jeff Janes  wrote:
>
>> On Tue, Jan 5, 2016 at 11:55 PM, David Rowley
>>  wrote:
>> > create table ab (a int,b int);
>> > insert into ab select x,y from generate_series(1,20) x(x),
>> > generate_series(10,1,-1) y(y);
>> > create index on ab (a) including (b);
>> > explain select * from ab order by a,b;
>> > QUERY PLAN
>> > --
>> >  Sort  (cost=10.64..11.14 rows=200 width=8)
>> >Sort Key: a, b
>> >->  Seq Scan on ab  (cost=0.00..3.00 rows=200 width=8)
>> > (3 rows)
>>
>> If you set enable_sort=off, then you get the index-only scan with no
>> sort.  So it believes the index can be used for ordering (correctly, I
>> think), just sometimes it thinks it is not faster to do it that way.
>>
>> I'm not sure why this would be a correctness problem.  The covered
>> column does not participate in uniqueness checks, but it still usually
>> participates in index ordering.  (That is why dummy op-classes are
>> needed if you want to include non-sortable-type columns as being
>> covered.)
>>
>
> If that's the case, then it appears that I've misunderstood INCLUDING.
> From reading _bt_doinsert() it appeared that it'll ignore the INCLUDING
> columns and just find the insert position based on the key columns. Yet
> that's not the way that it appears to work. I was also a bit confused, as
> from working with another database which has very similar syntax to this,
> that one only includes the columns to allow index only scans, and the
> included columns are not indexed, therefore can't be part of index quals
> and the index only provides a sorted path for the indexed columns, and not
> the included columns.
>
>
> Thank you for properly testing. The ORDER BY clause in this case definitely
> doesn't work as expected.
> The problem is fixed by patching the planner function
> build_index_pathkeys(): it disables use of the index if sorting by included
> columns is required.
> The test example works correctly now - it always performs a seq scan and sort.
>
>
Thank you for updating the patch.
That's cleared up my confusion. All the code I read seemed to indicate that
INCLUDING columns were leaf-only; it just confused me as to why the index
appeared to search and order on all columns, including the INCLUDING
columns. Thanks for clearing up my confusion and fixing the patch.


> Saying that, I'm now a bit confused as to why the following does not produce
> 2 indexes which are the same size:
>
> create table t1 (a int, b text);
> insert into t1 select x,md5(random()::text) from
> generate_series(1,100) x(x);
> create index t1_a_inc_b_idx on t1 (a) including (b);
> create index t1_a_b_idx on t1 (a,b);
> select pg_relation_Size('t1_a_b_idx'),pg_relation_size('t1_a_inc_b_idx');
>  pg_relation_size | pg_relation_size
> --+--
>  59064320 | 58744832
> (1 row)
>
>
> I suppose you've already found that in discussion above. Included columns
> are stored only in leaf index pages. The difference is the size of
> attributes 'b' which are situated in inner pages of index "t1_a_b_idx".
>

Yeah, I saw that from the code too. I just was confused as they appeared to
work like normal indexes.

I've made another pass of the covering_unique_4.0.patch. Again, some things
are nitpicky (sorry), but it made sense to write them down as I noticed
them.

-   multiple entries with identical keys.  An access method that supports
this
+   multiple entries with identical keys. An access method that supports
this

Space removed by mistake.

feature sets pg_am.amcanunique true.
-   (At present, only b-tree supports it.)
+   Columns included with clause INCLUDING  aren't used to enforce
uniqueness.
+   (At present, only b-tree supports them.)

Maybe

+   (At present amcanunique is only supported by b-tree
+   indexes.)

Since the extra line you've added confuses what "it" or "them" means, it's
probably best to clarify that.


+   INCLUDING aren't used to enforce constraints
(UNIQUE, PRIMARY KEY, etc).

Goes beyond 80 chars.


  right_item = CopyIndexTuple(item);
+ right_item = index_reform_tuple(rel, right_item, rel->rd_index->indnatts,
rel->rd_index->indnkeyatts);

Duplicate assignment. Should this perhaps be:

+ if (rel->rd_index->indnatts == rel->rd_index->indnkeyatts)
+   right_item = CopyIndexTuple(item);
+ else
+ right_item = index_reform_tuple(rel, right_item, rel->rd_index->indnatts,
rel->rd_index->indnkeyatts);

?

- natts = RelationGetNumberOfAttributes(rel);
- indoption = rel->rd_indoption;

- skey = (ScanKey) palloc(natts * sizeof(ScanKeyData));
+ Assert(rel->rd_index->indnkeyatts != 0);
+ Assert(rel->rd_index->indnkeyatts <= rel->rd_index->indnatts);

- for (i = 0; i < natts; i++)
+ nkeyatts = rel->rd_index->indnkeyatts;

Since 

Re: [HACKERS] pg_dump and premature optimizations for objects not to be dumped

2016-01-12 Thread Tom Lane
I wrote:
> I looked into Karsten Hilbert's report here
> http://www.postgresql.org/message-id/20160108114529.gb22...@hermes.hilbert.loc
> of being unable to run pg_upgrade on a database containing the pg_trgm
> extension.  After some investigation, the problem is explained like this:
> ...
> The thing that seems possibly most robust to me is to pull in the
> extension membership data *first* and incorporate it into the initial
> selectDumpableObject tests, thus getting us back to the pre-9.1 state
> of affairs where the initial settings of the object dump flags could
> be trusted.  This might be expensive though; and it would certainly add
> a good chunk of work to the race-condition window where we've taken
> pg_dump's transaction snapshot but not yet acquired lock on any of
> the tables.

Attached is a draft patch that does things this way.  Some simple testing
suggests it's just about exactly the same speed as 9.5 pg_dump, which
means that the "expensive" objection is dismissible.  It's hard to tell
though whether the pre-table-lock race-condition window has gotten
meaningfully wider.

In addition to fixing Karsten's problem, this deals with the bugs I noted
earlier about event triggers and transforms being dumped unconditionally.

I was also able to get rid of some unsightly special cases for procedural
languages, FDWs, and foreign servers, in favor of setting their dump
flags according to standard rules up front.

If we were to put a test rejecting initdb-created objects (those with
OID < FirstNormalObjectId) into selectDumpableObject, it'd be possible
to get rid of selectDumpableDefaultACL, selectDumpableCast, and/or
selectDumpableProcLang, since those would then have various subsets
of the selectDumpableObject rules.  I'm not sure if this would be an
improvement or just confusing; any thoughts?

I'm not very sure what to do with this patch.  I think it should
definitely go into 9.5: it applies cleanly there and it will fix our two
new-in-9.5 bugs with event triggers and transforms.  I'm less enthused
about back-porting it further.  In principle, the extension membership
issues exist all the way back to 9.1, but we haven't had complaints before,
and there's a nonzero chance of changing corner-case behaviors.  (I think
any such changes would likely be for the better, but nonetheless they
would be changes.)  Back-porting it further than about 9.4 would also
be quite a lot of work :-(

Comments?

regards, tom lane

diff --git a/src/bin/pg_dump/common.c b/src/bin/pg_dump/common.c
index 7869ee9..ded14f3 100644
*** a/src/bin/pg_dump/common.c
--- b/src/bin/pg_dump/common.c
*** static int	numCatalogIds = 0;
*** 40,69 
  
  /*
   * These variables are static to avoid the notational cruft of having to pass
!  * them into findTableByOid() and friends.  For each of these arrays, we
!  * build a sorted-by-OID index array immediately after it's built, and then
!  * we use binary search in findTableByOid() and friends.  (qsort'ing the base
!  * arrays themselves would be simpler, but it doesn't work because pg_dump.c
!  * may have already established pointers between items.)
   */
- static TableInfo *tblinfo;
- static TypeInfo *typinfo;
- static FuncInfo *funinfo;
- static OprInfo *oprinfo;
- static NamespaceInfo *nspinfo;
- static int	numTables;
- static int	numTypes;
- static int	numFuncs;
- static int	numOperators;
- static int	numCollations;
- static int	numNamespaces;
  static DumpableObject **tblinfoindex;
  static DumpableObject **typinfoindex;
  static DumpableObject **funinfoindex;
  static DumpableObject **oprinfoindex;
  static DumpableObject **collinfoindex;
  static DumpableObject **nspinfoindex;
  
  
  static void flagInhTables(TableInfo *tbinfo, int numTables,
  			  InhInfo *inhinfo, int numInherits);
--- 40,69 
  
  /*
   * These variables are static to avoid the notational cruft of having to pass
!  * them into findTableByOid() and friends.  For each of these arrays, we build
!  * a sorted-by-OID index array immediately after the objects are fetched,
!  * and then we use binary search in findTableByOid() and friends.  (qsort'ing
!  * the object arrays themselves would be simpler, but it doesn't work because
!  * pg_dump.c may have already established pointers between items.)
   */
  static DumpableObject **tblinfoindex;
  static DumpableObject **typinfoindex;
  static DumpableObject **funinfoindex;
  static DumpableObject **oprinfoindex;
  static DumpableObject **collinfoindex;
  static DumpableObject **nspinfoindex;
+ static DumpableObject **extinfoindex;
+ static int	numTables;
+ static int	numTypes;
+ static int	numFuncs;
+ static int	numOperators;
+ static int	numCollations;
+ static int	numNamespaces;
+ static int	numExtensions;
  
+ /* This is an array of object identities, not actual DumpableObjects */
+ static ExtensionMemberId *extmembers;
+ static int	numextmembers;
  
  static void flagInhTables(TableInfo *tbinfo, int numTables,
  			  

Re: [HACKERS] WIP: Covering + unique indexes.

2016-01-12 Thread David Rowley
On 13 January 2016 at 06:47, Jeff Janes  wrote:

>
> Why is omit_opclass a separate patch?  If the included columns now
> never participate in the index ordering, shouldn't it be an inherent
> property of the main patch that you can "cover" things without btree
> opclasses?
>
>
I also wondered this. We can't have covering indexes without fixing the
problem with the following arrays:

  info->indexkeys = (int *) palloc(sizeof(int) * ncolumns);
  info->indexcollations = (Oid *) palloc(sizeof(Oid) * ncolumns);
  info->opfamily = (Oid *) palloc(sizeof(Oid) * ncolumns);

These need to be sized according to the number of key columns, not the
total number of columns. Of course, the TODO item in the patch states this
too.

I don't personally think the covering_unique_4.0.patch is that close to
being too big to review; I think things would make more sense if the
omit_opclass_4.0.patch was included together with this.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] WIP: Covering + unique indexes.

2016-01-12 Thread Anastasia Lubennikova

08.01.2016 00:12, David Rowley:
On 7 January 2016 at 06:36, Jeff Janes wrote:


On Tue, Jan 5, 2016 at 11:55 PM, David Rowley wrote:
> create table ab (a int,b int);
> insert into ab select x,y from generate_series(1,20) x(x),
> generate_series(10,1,-1) y(y);
> create index on ab (a) including (b);
> explain select * from ab order by a,b;
> QUERY PLAN
> --
>  Sort  (cost=10.64..11.14 rows=200 width=8)
>Sort Key: a, b
>->  Seq Scan on ab  (cost=0.00..3.00 rows=200 width=8)
> (3 rows)

If you set enable_sort=off, then you get the index-only scan with no
sort.  So it believes the index can be used for ordering (correctly, I
think), just sometimes it thinks it is not faster to do it that way.

I'm not sure why this would be a correctness problem. The covered
column does not participate in uniqueness checks, but it still usually
participates in index ordering.  (That is why dummy op-classes are
needed if you want to include non-sortable-type columns as being
covered.)


If that's the case, then it appears that I've misunderstood INCLUDING. 
From reading _bt_doinsert() it appeared that it'll ignore the 
INCLUDING columns and just find the insert position based on the key 
columns. Yet that's not the way that it appears to work. I was also a 
bit confused, as from working with another database which has very 
similar syntax to this, that one only includes the columns to allow 
index only scans, and the included columns are not indexed, therefore 
can't be part of index quals and the index only provides a sorted path 
for the indexed columns, and not the included columns.


Thank you for properly testing. The ORDER BY clause in this case definitely 
doesn't work as expected.
The problem is fixed by patching the planner function 
build_index_pathkeys(): it disables use of the index if sorting by 
included columns is required.

The test example works correctly now - it always performs a seq scan and sort.

Saying that, I'm now a bit confused as to why the following does not 
produce 2 indexes which are the same size:


create table t1 (a int, b text);
insert into t1 select x,md5(random()::text) from 
generate_series(1,100) x(x);

create index t1_a_inc_b_idx on t1 (a) including (b);
create index t1_a_b_idx on t1 (a,b);
select pg_relation_Size('t1_a_b_idx'),pg_relation_size('t1_a_inc_b_idx');
 pg_relation_size | pg_relation_size
--+--
 59064320 | 58744832
(1 row)


I suppose you've already found that in discussion above. Included 
columns are stored only in leaf index pages. The difference is the size 
of attributes 'b' which are situated in inner pages of index "t1_a_b_idx".


Also, if we want INCLUDING() to mean "uniqueness is not enforced on 
these columns, but they're still in the index", then I don't really 
think allowing types without a btree opclass is a good idea. It's 
likely too surprise-filled and might not be what the user actually 
wants. I'd suggest that these non-indexed columns would be better 
defined by further expanding the syntax, the first (perhaps not very 
good) thing that comes to mind is:


create unique index idx_name on table (unique_col) also index 
(other,idx,cols) including (leaf,onlycols);


Looking up thread, I don't think I was the first to be confused by this.


Included columns are still in the index physically - they are stored in 
the index relation. But they are not indexed in the true sense of the 
word: it's impossible to use them for an index scan or for ordering. At 
the beginning, my idea was that included columns would be used for a 
combination of a unique index on some columns and covering of others. 
Only in very rare instances would one prefer a non-unique index with 
included columns, "t1_a_inc_b_idx", to a regular multicolumn index 
"t1_a_b_idx". Frankly, I didn't see such use cases at all. The index 
size reduction is not considerable, while we lose some useful index 
functionality on the included columns. I think that this should be 
mentioned as a note in the documentation, but I need help phrasing it 
clearly.


But now I see a reason to create a non-unique index with included 
columns - the lack of a suitable opclass on column "b".
It's impossible to add it into the index as a key column, but that's not 
a problem with the INCLUDING clause.

Look at example.

create table t1 (a int, b box);
create index t1_a_inc_b_idx on t1 (a) including (b);
create index on t1 (a,b);
ERROR:  data type box has no default operator class for access method 
"btree"
HINT:  You must specify an operator class for the index or define a 
default operator class for the data type.

create index on t1 (a) including (b);
CREATE INDEX

This functionality is provided by the attached patch 

Re: [HACKERS] proposal: PL/Pythonu - function ereport

2016-01-12 Thread Catalin Iacob
On Tue, Jan 12, 2016 at 5:50 PM, Pavel Stehule  wrote:
> Error and Fatal exception classes are introduced in my patch - it was Peter's
> request (if I remember correctly), and now I am thinking it is not a good
> idea.

Now everybody is confused :). Your patch does:

-PLy_exc_error = PyErr_NewException("plpy.Error", NULL, NULL);
-PLy_exc_fatal = PyErr_NewException("plpy.Fatal", NULL, NULL);
-PLy_exc_spi_error = PyErr_NewException("plpy.SPIError", NULL, NULL);

[snip]

+PLy_exc_error = PyErr_NewException("plpy.Error", PLy_exc_base, NULL);
+PLy_exc_fatal = PyErr_NewException("plpy.Fatal", PLy_exc_base, NULL);
+PLy_exc_spi_error = PyErr_NewException("plpy.SPIError",
PLy_exc_base, NULL);

So they are there without the patch; you now make them inherit from
the new BaseError, whereas previously they just inherited from Exception.

More soon in another reply I was just typing when this came in.




Re: [HACKERS] proposal: PL/Pythonu - function ereport

2016-01-12 Thread Pavel Stehule
2016-01-12 17:59 GMT+01:00 Catalin Iacob :

> On Tue, Jan 12, 2016 at 5:50 PM, Pavel Stehule 
> wrote:
> > Error and Fatal exception classes are introduced in my patch - it was
> > Peter's request (if I remember correctly), and now I am thinking it is not
> > a good idea.
>
> Now everybody is confused :). Your patch does:
>
> -PLy_exc_error = PyErr_NewException("plpy.Error", NULL, NULL);
> -PLy_exc_fatal = PyErr_NewException("plpy.Fatal", NULL, NULL);
> -PLy_exc_spi_error = PyErr_NewException("plpy.SPIError", NULL, NULL);
>
> [snip]
>
> +PLy_exc_error = PyErr_NewException("plpy.Error", PLy_exc_base, NULL);
> +PLy_exc_fatal = PyErr_NewException("plpy.Fatal", PLy_exc_base, NULL);
> +PLy_exc_spi_error = PyErr_NewException("plpy.SPIError",
> PLy_exc_base, NULL);
>
> So they are there without the patch; you now make them inherit from
> the new BaseError, whereas previously they just inherited from Exception.
>

I was wrong, I am sorry.


>
> More soon in another reply I was just typing when this came in.
>


Re: [HACKERS] count_nulls(VARIADIC "any")

2016-01-12 Thread Marko Tiikkaja

On 03/01/16 22:49, Jim Nasby wrote:

In the unit test, I'd personally prefer just building a table with the
test cases and the expected NULL/NOT NULL results, at least for all the
calls that would fit that paradigm. That should significantly reduce the
size of the test. Not a huge deal though...


I don't really see the point.  "The size of the test" doesn't seem like 
a worthwhile optimization target, unless the test scripts are somehow 
really unnecessarily large.


Further, if you were developing code related to this, previously you 
could just copy-paste the defective test case in order to easily 
reproduce a problem.  But now suddenly you need a ton of different setup.


I don't expect to really have a say in this, but I think the tests are 
now worse than they were before.



.m




Re: [HACKERS] proposal: PL/Pythonu - function ereport

2016-01-12 Thread Pavel Stehule
2016-01-11 20:11 GMT+01:00 Jim Nasby :

> On 1/11/16 12:46 PM, Pavel Stehule wrote:
>
>> 2016-01-11 19:41 GMT+01:00 Jim Nasby wrote:
>>
>> On 1/11/16 12:33 PM, Pavel Stehule wrote:
>>
>> 1. break compatibility and SPIError replace by Error
>>
>>
>> At this point I've lost track... what's the incompatibility between
>> the two?
>>
>>
>> the name and internal format (but this structure can be visible to user
>> space)
>>
>
> Were Error and Fatal ever documented as classes? All I see is "raise
> plpy.Error(msg) and raise plpy.Fatal(msg) are equivalent to calling
> plpy.error and plpy.fatal, respectively." which doesn't lead me to believe
> I should be trapping on those.
>

Error and Fatal exception classes are introduced in my patch - it was
Peter's request (if I remember correctly), and now I am thinking it is not
a good idea.


>
> It's not clear to me why you'd want to handle error and fatal differently
> anyway; an error is an error. Unless fatal isn't supposed to be trappable?
> [1] leads me to believe that you shouldn't be able to trap a FATAL because
> it's supposed to cause the entire session to abort.
>

> Since spiexceptions and SPIError are the only documented exceptions
> classes, I'd say we should stick with those and get rid of the others.
> Worst-case, we can have a compatability GUC, but I think plpy.Error and
> plpy.Fatal were just poorly thought out.
>

I have the same opinion now. I have removed it from my patch.

Pavel



>
> [1]
> http://www.postgresql.org/docs/9.5/static/runtime-config-logging.html#RUNTIME-CONFIG-SEVERITY-LEVELS
>
> --
> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
> Experts in Analytics, Data Architecture and PostgreSQL
> Data in Trouble? Get it in Treble! http://BlueTreble.com
>


Re: [HACKERS] Fwd: [JDBC] Re: 9.4-1207 behaves differently with server side prepared statements compared to 9.2-1102

2016-01-12 Thread Pavel Stehule
2016-01-12 17:01 GMT+01:00 Vladimir Sitnikov :

> VS>>Why doesn't backend reuse already existing good plan?
> PS>this logic is driven by plan cost, not by plan execution time.
>
> It do understand that currently PG replans with $1, $2 and uses
> default selectivities for that.
>
> What I am asking is to make PG aware of "previously used bind values",
> so it would calculate proper selectivities for $1, $2.
>

the implementation is simple - but it is hard to design something really
general - it is a task for the UI

Pavel



>
> PS. It is not the first time the problem bites me, so I hand-crafted a
> testcase.
>



>
> Vladimir
>


Re: [HACKERS] proposal: PL/Pythonu - function ereport

2016-01-12 Thread Catalin Iacob
On Mon, Jan 11, 2016 at 7:33 PM, Pavel Stehule  wrote:
> I see it.
>
> it looks like distinguishing between Error and SPIError is the wrong way. And I
> have another ugly use case.
>
> If I raise an Error from one PL/Pythonu function, then in another PL/Python
> function I'll trap an SPIError exception, because the information about the
> class was lost inside Postgres. And it can get pretty messy.
> I have no information about whether the exception was user-based or SPI-based.

This isn't that bad. The real Python exception is only lost if the
first function calls into Postgres, which then ends up in another
PL/Python function which raises. That's just the way it is. If you use
Psycopg2 on the client and a PL/Python function in the server you also
don't get in the client the real exception that PL/Python raised even
though it's Python at both ends. The analogy isn't perfect since one
is in process and the other cross process but you get the point. If a
PL/Python function calls another one directly and that one raises
something, the caller can catch that just like in regular Python.

> The differentiation between Error and SPIError is wrong, because there isn't
> any difference in reality.

They're similar but not really the same thing. raise Error and
plpy.error are both ways to call ereport(ERROR, ...) while SPIError is
raised when coming back after calling into Postgres to execute SQL
that itself raises an error. Now indeed, that executed SQL raised an
error itself via another ereport(ERROR, ...) somewhere but that's a
different thing.

I'm getting more and more convinced that SPIError should be left
unchanged and should not even inherit from BaseError as it's a
different thing. And as I said this means BaseError isn't the base of
all exceptions that can be raised by the plpy module so then it should
probably not be called BaseError. Maybe something like ReportableError
(the base class of ereport frontends).

Or don't have a base class at all and just allow constructing both
Error and Fatal instances with message, detail, hint etc. What you
lose is that you can't catch both Error and Fatal with a single except
ReportableError block but Fatal is maybe not meant to be caught and
Error is also not really meant to be caught either, it's meant to be
raised in order to call ereport(ERROR, ...). I now like this option
(no base class but same attributes in both instances) most.

As I see it, what this patch tries to solve is that raise Error and
plpy.error (and all the others) are not exposing all features of
ereport, they can only pass a message to ereport but can't pass all
the other things ereport accepts: detail, hint etc. The enhancement is
being able to pass all those arguments (as positional or keyword
arguments) when constructing and raising an Error or Fatal instance or
when using the plpy.error helpers. Since the existing helpers accept
multiple arguments already (which they unfortunately and weirdly
concatenate in a tuple whose string representation is pushed into
ereport as message) we can't repurpose the multiple arguments as
ereport detail, hint etc., so Pavel invented parallel plpy.raise_error
and friends which do that. If a bold committer says the string
representation of the tuple is too weird to be relied on we could even
just change the meaning of plpy.error and accept (and document) the
compatibility break.

So the patch is almost there I think, it just needs to stop messing
around with SPIError and the spidata attribute.

