Re: [HACKERS] Extension Templates S03E11
On 12/03/2013 09:25 AM, Jeff Davis wrote: On Mon, 2013-12-02 at 11:07 +0200, Heikki Linnakangas wrote: There should be no difference between file-based extensions and catalog-based extensions. It's just two different ways to install the same extension. The extension author doesn't need to care about that, it's the DBA that decides which method to use to install it. I'm going to object loudly to any proposal that doesn't meet that criteria. But we're arguably already in this position today. For a SQL-only extension, the author can choose between: 1. Using a schema/namespace a. installable over libpq b. installable by non-superusers c. no special handling when it comes to administration 2. Using an extension a. convenient metadata (e.g. "requires") b. upgrade scripts c. omitted from pg_dump so can be managed separately d. relocatable e. not tied to one schema And if the author decides to change, it requires porting the extension to the other form. Note: I'm using "extension" loosely here. We might call the SQL-only, user-installable variety something else. Good point. It's not too hard to install an "extension" written as an extension as plain schema objects, though. You can just run the .sql script through psql. That's what you used to do before extensions were invented. (the scripts in contrib contain an explicit check against that, but I don't think that's common outside contrib) Another perspective is that that's already a situation we'd rather not have. Let's not make it worse by introducing a third way to install an extension, which again requires the extension author to package the extension differently. So how do we get to the point where we have clear advice to the author of a SQL-only extension? And how do we do that without asking them to port anything? Yeah, that's the crucial question of this whole thread. 
Stephen mentioned using external tools and/or metadata, but to me that sounds like it would require porting the extension away from what's on PGXN today. Why? The external tool can pick the extension in its current form from PGXN, and install it via libpq. The tool might have to jump through some hoops to do it, and we might need some new backend functionality to support it, but I don't see why the extension author needs to do anything. That said, it might make the tool easier to write if we place some new requirements for extension authors. Like, stipulate that the .sql file is in the top-level directory of the extension tarball. But the same extension would still be installable with "make; make install". - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6.7
Hello, These are rather trivial and superficial comments, as I have not fully grasped the workings of this patchset. - Some patches have line offsets against master. Rebase needed. Other random comments follow, = 0001: - You assigned HeapTupleGetOid(tuple) into relid for reading at several points, with no modification, yet you left HeapTupleGetOid unreplaced elsewhere. I think a 'relid' used just for repeated reads has far smaller merit than the demerit of lowered readability. You'd better make them uniform either way. = 0002: - You identify the wal_level with the expression 'wal_level >= WAL_LEVEL_LOGICAL', but that seems somewhat improper. - In heap_insert, you added the following comment and code, > * Also, if this is a catalog, we need to transmit combocids to > * properly decode, so log that as well. > */ >need_tuple_data = RelationIsLogicallyLogged(relation); >if (RelationIsAccessibleInLogicalDecoding(relation)) >log_heap_new_cid(relation, heaptup); Actually 'is a catalog' is checked in RelationIsAccessibleInLogicalDecoding(), but either the naming or the comment should be changed for a consistent look. (I think the name of the macro is rather long yet gives only a vague illustration of the function..) - RelationIsAccessibleInLogicalDecoding and RelationIsLogicallyLogged are identical in code. Together with the name ambiguity, this is quite confusing and a source of future misuse of these macros, I suppose. Plus the names seem too long. - In heap_insert, the information conveyed in rdata[3] seems better placed in rdata[2] because of the scarcity of the additional information. Only XLOG_HEAP_CONTAINS_NEW_TUPLE seems to be needed. The same applies to heap_multi_insert and heap_update. - In heap_multi_insert, need_cids is referred to only once, so it might better be removed. - In heap_delete, at the point of adding the replica identity, as above, rdata[3] could be moved into rdata[2] by making a new type like 'xl_heap_replident'. 
- In heapam_xlog.h, the new type xl_heap_header_len is inadequate both in its naming, which is confusing, and in its construction, in which the header in xl_heap_header is no longer a header and the member name 't_len' is indecisive.. - In heapam_xlog.h, XLOG_HEAP_CONTAINS_OLD looks incomplete, and it seems to be used nowhere in this patchset. It should be removed. - log_heap_new_cid() is called at several spots just before other xlog records are inserted. I suppose this should be built into the target xlog structures. - In RecoverPreparedTransactions(), a comment is needed on the reason for calling XLogLogicalInfoActive().. - In xact.c, the comment for the member 'didLogXid' in TransactionStateData seems to differ from its meaning. It becomes true when any WAL record for the current transaction id has just been written to the WAL buffer. So the comment, > /* has xid been included in WAL record? */ would better be something like (needs correcting, as I'm not a native speaker) /* Any WAL record for this transaction has been emitted ? */ and the member name should also be something like XidIsLogged. (Not much of a change?) - The name of the function MarkCurrentTransactionIdLoggedIfAny, although irregular abbreviations are discouraged, seems too long. Isn't MarkCur(r/rent)XidLoggedIfAny sufficient? Anyway, the work involving this function seems as if it would better be done in some other way.. - The comment for RelationGetIndexAttrBitmap() should be edited for attrKind. - The macro name INDEX_ATTR_BITMAP_KEY should be INDEX_ATTR_BITMAP_FKEY, and INDEX_ATTR_BITMAP_IDENTITY_KEY should be INDEX_ATTR_BITMAP_REPLID_KEY or something. - In rel.h, the member name 'rd_idattr' would be better as 'rd_replidattr' or something like that. = 0004: - Could the macro name 'RelationIsUsedAsCatalogTable' be as simple as IsUserCatalogRelation or something? This is from the viewpoint not only of simplicity but also of similarity to other macros and/or functions with closely related functionality. 
You already call the table 'user_catalog_table' in rel.h. To be continued in the next mail. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [HACKERS] In-core regression tests for replication, cascading, archiving, PITR, etc.
On Mon, Dec 2, 2013 at 11:41 PM, Tom Lane wrote: > Andres Freund writes: > At the same time, I'm pretty skeptical that any simple regression-test > type facility would have caught the bugs we've fixed lately ... The replication bug would have been reproducible at least; Heikki produced a simple test case able to reproduce it. For the MultiXact stuff... well, some more infrastructure in core might be needed before having a wrapper calling test scripts aimed at manipulating clusters of nodes. -- Michael
Re: [HACKERS] Logging WAL when updating hintbit
On Tue, Dec 3, 2013 at 4:28 PM, Michael Paquier wrote: > On Tue, Dec 3, 2013 at 3:30 PM, Sawada Masahiko wrote: >> On Tue, Dec 3, 2013 at 12:02 PM, Michael Paquier >> wrote: Indeed, I forgot this code path. Completing it for saving the state and xlog_redo for replay would be enough. >>> Wait a minute, I retract this argument. By using this method a master >>> server would be able to produce WAL files with inconsistent hint bit >>> data when they are replayed if log_hint_bits is changed after a >>> restart of the master. >> >> In what case does it occur? >> I think pg_rewind can refuse to rewind if log_hint_bits is changed to >> 'OFF'. >> Is this not enough? > > After more thinking... > Before performing a rewind on a node, what we need to know is that > log_hint_bits was set to true when WAL forked, because of the issue > that Robert mentioned here: > http://www.postgresql.org/message-id/519e5493.5060...@vmware.com > It does not really matter if the node used log_hint_bits set to false > in its latest state (Node to-be-rewinded might have been restarted > after WAL forked). > > So, after more thinking, yes using XLOG_PARAMETER_CHANGE and > PGC_POSTMASTER for this parameter would be enough. However on the > pg_rewind side we would need to track the value of log_hint_bits when > analyzing the WALs and ensure that it was set to true at fork point. > This is not something that the core should care about, though. Yep, pg_rewind needs to track the value of wal_log_hintbits. I think the value of wal_log_hintbits always needs to have been set to true after the fork point. And if wal_log_hintbits is set to false when we perform pg_rewind, we cannot do that. Regards, --- Sawada Masahiko
[HACKERS] Expiring tuples from aggregation in window frames efficiently
Hi folks, I was casting my mind back to an idea that came up when windowing functions were first implemented in 8.4, during some talk about how ROWS BETWEEN might be implemented in the future. This has now been implemented, but with the implementation there is some quadratic behaviour when tuples move off the top of the window frame. In this case the aggregate value must be re-calculated over all the remaining rows in the frame for every row that exits the window's frame, e.g. with SUM(value) OVER(ORDER BY id ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING), where the frame head is always at the current row. There is a comment in nodeWindowAgg.c on line 457 explaining that it may be possible to optimise this by having functions which can "expire" tuples out of the aggregate value. These are described as "negative transition functions". Currently I'm looking over this to try to determine how hard it would be to implement, with a view that if it is within my ability then I'd like to have a patch ready for the next commitfest. The purpose of this email is just to publish my rough implementation ideas with the hope that someone can tell me if I'm on the right track or that I'm off track somewhere. Also there may be optimisation opportunities that I've not thought of. The idea would be to allow an extra function type when a user does CREATE AGGREGATE; this would be the function which removes a tuple from aggregation, and I'll call these "aggregate subtraction functions" from now on. For SUM() I think the aggregate subtraction function would just do a - instead of a +, and look pretty much the same as intN_sum(). It looks like it would be similar for AVG(), where we'd just remove from the count and the sum. COUNT(col) and COUNT(*) I would imagine would be quite similar, though from what I've managed to figure out so far the implementation of this is a bit special. 
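To make the idea above concrete, here is a minimal sketch in Python (not PostgreSQL's C, and the function names are invented for illustration) of how an aggregate subtraction function for SUM() turns the per-row recomputation into constant-time work for the frame described above:

```python
def sum_sfunc(state, value):
    """Ordinary (forward) transition function, analogous to intN_sum()."""
    return state + value

def sum_negfunc(state, value):
    """Hypothetical aggregate subtraction function: a - instead of a +."""
    return state - value

def running_sums(values):
    """Model of SUM(value) OVER (ORDER BY id ROWS BETWEEN CURRENT ROW AND
    UNBOUNDED FOLLOWING): the frame head is the current row, so exactly one
    tuple exits the frame per output row."""
    state = 0
    for v in values:                   # build the full-frame sum once
        state = sum_sfunc(state, v)
    results = []
    for v in values:
        results.append(state)
        state = sum_negfunc(state, v)  # O(1) per row instead of O(frame size)
    return results
```

Without the subtraction function, each output row would require re-aggregating the whole remaining frame, which is the quadratic behaviour described above.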
The specification of this aggregate subtraction function would be optional in CREATE AGGREGATE, and if present it would be used instead of looping over all the rows in the frame each time the head of the window frame moves to the next row. I'm thinking it would be useful to make these aggregate subtraction functions return true if they've managed to remove the tuple, and false to tell the executor to fall back and loop over each tuple in the frame. I feel this would be useful as it would allow MAX() and MIN() to at least partially get on board with this optimisation. Let me explain: If the current MAX() is 10 and we ask the aggregate subtraction function to subtract a tuple with 9, then the MAX() is still 10. So, in this case we could just compare 9 with 10, see that it is lower, and just return true, thus pretending we actually did the subtraction, when in fact no subtraction was required. The same applies to MIN(), though of course with the comparison the other way around. It may also be possible to store a counter which we increase each time MAX() receives a tuple with the current maximum value; this would be reset to 1 when MAX() receives a higher value. The aggregate subtraction function could decrement that counter each time it is asked to subtract the current maximum value and only return false when that counter reaches 0. Though I think this might mean that MAX() could take a small performance hit even when not used as a windowing aggregate. 
The pseudo logic would look something along the lines of: IN > "1" MAX() = "1", maxcount = 1 IN > "2" MAX() = "2", maxcount = 1 (max increased, reset maxcount to 1) IN > "2" MAX() = "2", maxcount = 2 OUT > "1" MAX() = "2", maxcount = 2 (value is less than current MAX, so don't touch the maxcount) OUT > "2" MAX() = "2", maxcount = 1 OUT > "2" MAX() = ???, maxcount = 0 (maxcount is 0 so return false as we need to recalculate) Tonight I'm trying to figure out how all of this could be implemented; I'll list below my rough idea of how this might be done. Please shout out if there is something out of order here. 1. Modify pg_proc.h, adding a new bool flag for the aggsubtract function. 2. In CREATE FUNCTION, allow AGGSUBTRACT to be specified similarly to how WINDOW is, only demanding that these functions return BOOLEAN. 3. Alter CREATE AGGREGATE to allow the AGGSUBTRACT function type; this will be optional. 4. Write aggregate subtraction functions for COUNT(col), COUNT(*), SUM(), AVG(), MIN() and MAX(). 5. Change eval_windowaggregates() to attempt to use the aggregate subtraction function if it exists for the aggregate, and if that function returns true, use the updated aggregate state. I think perhaps as a simple initial implementation I could skip steps 2 and 3 and for step 4 just implement int4_sum_neg(), though I'm not quite sure yet if these steps would be required for initdb to work. Also, passing volatile functions to the aggregate requires some thought. I would imagine the optimisation needs to be disabled in that case, but I'm not quite sure actually how the code sho
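The maxcount pseudo logic above can be sketched as follows (an illustrative Python model of the proposal, not the actual patch; all names are invented):

```python
class MaxState:
    """Illustrative aggregate state for MAX() with the maxcount idea."""
    def __init__(self):
        self.max = None
        self.maxcount = 0

def max_sfunc(state, value):
    """Forward transition: track the maximum and how many times it was seen."""
    if state.max is None or value > state.max:
        state.max = value
        state.maxcount = 1            # max increased, reset maxcount to 1
    elif value == state.max:
        state.maxcount += 1
    return state

def max_negfunc(state, value):
    """Subtraction: return True if the state is still valid, False to tell
    the executor to fall back and recalculate over the whole frame."""
    if value < state.max:
        return True                   # below current MAX: nothing to do
    state.maxcount -= 1
    return state.maxcount > 0         # 0 means the MAX itself just left
```

Run against the IN/OUT trace above, this model reproduces it step for step, returning False only on the final OUT > "2".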
Re: [HACKERS] Get more from indices.
I wrote: > I've modified the patch to work in such a way. Also, as ISTM the patch > is more complicated than what the patch really does, I've simplified the > patch. I've revised the patch a bit. Please find attached the patch. Thanks, Best regards, Etsuro Fujita pathkey_and_uniqueindx_v7_20131203.patch Description: Binary data
Re: [HACKERS] logical changeset generation v6.7
Hello, these are continued comments. > = 0004: > To be continued to next mail. = 0005: - In heapam.c, it seems better to replace t_self only during logical decoding. - In GetOldestXmin(), the parameter name 'alreadyLocked' would be better as simply 'nolock', since alreadyLocked seems to me to suggest that it will unlock a lock acquired beforehand. - Before that, in LogicalDecodingAcquireFreeSlot, the lock window for the procarray is extended beyond GetOldestXmin, but the procarray does not seem to be accessed during the additional period. If you are holding this lock for a purpose other than accessing the procarray, it'd be better to provide its own lock object. > LWLockAcquire(ProcArrayLock, LW_SHARED); > slot->effective_xmin = GetOldestXmin(true, true, true); > slot->xmin = slot->effective_xmin; > > if (!TransactionIdIsValid(LogicalDecodingCtl->xmin) || > NormalTransactionIdPrecedes(slot->effective_xmin, LogicalDecodingCtl->xmin)) > LogicalDecodingCtl->xmin = slot->effective_xmin; > LWLockRelease(ProcArrayLock); - In dropdb in dbcommands.c, ereport is invoked referring to the result of LogicalDecodingCountDBSlots, but it seems better to me to issue this exception within LogicalDecodingCountDBSlots, even if a goto is required. - In LogStandbySnapshot in standby.c, two complementary conditions are imposed on two identical unlocks. It might be somewhat paranoid, but it is safer written like the following, | XLogRecPtr recptr = InvalidXLogRecPtr; | | | /* LogCurrentRunningXacts should be done before unlock when logical decoding */ | if (wal_level >= WAL_LEVEL_LOGICAL) |recptr = LogCurrentRunningXacts(running); | | LWLockRelease(ProcArrayLock); | | if (recptr == InvalidXLogRecPtr) |recptr = LogCurrentRunningXacts(running); - In tqual.c, in Setup/RevertFrom DecodingSnapshots, the name CatalogSnapshotData lacks unity with the other Snapshot*Data's. = 0007: - In heapam.c, the new global variable 'RecentGlobalDataXmin' is quite similar to 'RecentGlobalXmin' and does not represent what it is. 
The name should be changed; RecentGlobalNonCatalogXmin would be preferable.. - Although simply a matter of my taste, the following part in heapam.c > if (IsSystemRelation(scan->rs_rd) > || RelationIsAccessibleInLogicalDecoding(scan->rs_rd)) > heap_page_prune_opt(scan->rs_rd, buffer, RecentGlobalXmin); > else > heap_page_prune_opt(scan->rs_rd, buffer, RecentGlobalDataXmin); would be more readable as, > if (IsSystemRelation(scan->rs_rd) > || RelationIsAccessibleInLogicalDecoding(scan->rs_rd)) > xmin = RecentGlobalXmin > else > xmin = RecentGlobalDataXmin > heap_page_prune_opt(scan->rs_rd, buffer, xmin); index_fetch_heap in indexam.c has a part similar to this, and you coded in the latter style in IndexBuildHeapScan in index.c. - In procarray.c, you should add documentation for the new parameter 'systable' of GetOldestXmin. And the name 'systable' seems somewhat confusing, since its fully-spelled meaning is 'including systables'. This name should be changed to 'include_systable', or to 'only_usertable' or something with the sense inverted.. 0008 and after to come later.. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing
On 2013-12-03 00:47:07 -0500, Noah Misch wrote: > On Sat, Nov 30, 2013 at 01:06:09AM +, Alvaro Herrera wrote: > > Fix a couple of bugs in MultiXactId freezing > > > > Both heap_freeze_tuple() and heap_tuple_needs_freeze() neglected to look > > into a multixact to check the members against cutoff_xid. > > > ! /* > > !* This is a multixact which is not marked LOCK_ONLY, > > but which > > !* is newer than the cutoff_multi. If the update_xid > > is below the > > !* cutoff_xid point, then we can just freeze the Xmax > > in the > > !* tuple, removing it altogether. This seems simple, > > but there > > !* are several underlying assumptions: > > !* > > !* 1. A tuple marked by an multixact containing a very > > old > > !* committed update Xid would have been pruned away by > > vacuum; we > > !* wouldn't be freezing this tuple at all. > > !* > > !* 2. There cannot possibly be any live locking members > > remaining > > !* in the multixact. This is because if they were > > alive, the > > !* update's Xid would had been considered, via the > > lockers' > > !* snapshot's Xmin, as part the cutoff_xid. > > READ COMMITTED transactions can reset MyPgXact->xmin between commands, > defeating that assumption; see SnapshotResetXmin(). I have attached an > isolationtester spec demonstrating the problem. Any idea how to cheat our way out of that one given the current way heap_freeze_tuple() works (running on both primary and standby)? My only idea was to MultiXactIdWait() if !InRecovery but that's extremely grotty. We can't even realistically create a new multixact with fewer members with the current format of xl_heap_freeze. > The test spec additionally > covers a (probably-related) assertion failure, new in 9.3.2. Too bad it's too late to do anything about it for 9.3.2. :(. At least the last seems actually unrelated, I am not sure why it's 9.3.2 only. Alvaro, are you looking? 
> That was the only concrete runtime problem I found during a study of the > newest heap_freeze_tuple() and heap_tuple_needs_freeze() code. I'd even be interested in fuzzy problems ;). If 9.3 hadn't been released, the interactions between cutoff_xid/multi would have caused me to say "back to the drawing board"... I'm not surprised if further things are lurking there. > One thing that > leaves me unsure is the fact that vacuum_set_xid_limits() does no locking to > ensure a consistent result between GetOldestXmin() and GetOldestMultiXactId(). > Transactions may start or end between those calls, making the > GetOldestMultiXactId() result represent a later set of transactions than the > GetOldestXmin() result. I suspect that's fine. New transactions have no > immediate effect on either cutoff, and transaction end can only increase a > cutoff. Using a slightly-lower cutoff than the maximum safe cutoff is always > okay; consider vacuum_defer_cleanup_age. Yes, that seems fine to me, with the same reasoning. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] Skip hole in log_newpage
The log_newpage function, used to WAL-log a full copy of a page, is missing the trick we normally use for full-page images to leave out the unused space on the block. That's pretty trivial to implement, so we should. The place where this matters the most is when building a new B-tree index. When wal_level > minimal, all pages in the created index are logged with log_newpage, and by default we leave 10% free space on index pages. So implementing this reduces the amount of WAL generated by index creation by roughly 10%. Anyone see a problem with this? - Heikki *** a/src/backend/access/gin/gininsert.c --- b/src/backend/access/gin/gininsert.c *** *** 435,444 ginbuildempty(PG_FUNCTION_ARGS) START_CRIT_SECTION(); GinInitMetabuffer(MetaBuffer); MarkBufferDirty(MetaBuffer); ! log_newpage_buffer(MetaBuffer); GinInitBuffer(RootBuffer, GIN_LEAF); MarkBufferDirty(RootBuffer); ! log_newpage_buffer(RootBuffer); END_CRIT_SECTION(); /* Unlock and release the buffers. */ --- 435,444 START_CRIT_SECTION(); GinInitMetabuffer(MetaBuffer); MarkBufferDirty(MetaBuffer); ! log_newpage_buffer(MetaBuffer, false); GinInitBuffer(RootBuffer, GIN_LEAF); MarkBufferDirty(RootBuffer); ! log_newpage_buffer(RootBuffer, false); END_CRIT_SECTION(); /* Unlock and release the buffers. */ *** a/src/backend/access/gist/gist.c --- b/src/backend/access/gist/gist.c *** *** 83,89 gistbuildempty(PG_FUNCTION_ARGS) START_CRIT_SECTION(); GISTInitBuffer(buffer, F_LEAF); MarkBufferDirty(buffer); ! log_newpage_buffer(buffer); END_CRIT_SECTION(); /* Unlock and release the buffer */ --- 83,89 START_CRIT_SECTION(); GISTInitBuffer(buffer, F_LEAF); MarkBufferDirty(buffer); ! log_newpage_buffer(buffer, true); END_CRIT_SECTION(); /* Unlock and release the buffer */ *** a/src/backend/access/heap/heapam.c --- b/src/backend/access/heap/heapam.c *** *** 6207,6222 log_heap_update(Relation reln, Buffer oldbuf, * memory and writing them directly to smgr. If you're using buffers, call * log_newpage_buffer instead. * ! 
* Note: the NEWPAGE log record is used for both heaps and indexes, so do ! * not do anything that assumes we are touching a heap. */ XLogRecPtr log_newpage(RelFileNode *rnode, ForkNumber forkNum, BlockNumber blkno, ! Page page) { xl_heap_newpage xlrec; XLogRecPtr recptr; ! XLogRecData rdata[2]; /* NO ELOG(ERROR) from here till newpage op is logged */ START_CRIT_SECTION(); --- 6207,6228 * memory and writing them directly to smgr. If you're using buffers, call * log_newpage_buffer instead. * ! * If the page follows the standard page layout, with a PageHeader and unused ! * space between pd_lower and pd_upper, set 'page_std' to TRUE. That allows ! * the unused space to be left out from the WAL record, making it smaller. */ XLogRecPtr log_newpage(RelFileNode *rnode, ForkNumber forkNum, BlockNumber blkno, ! Page page, bool page_std) { xl_heap_newpage xlrec; XLogRecPtr recptr; ! XLogRecData rdata[3]; ! ! /* ! * Note: the NEWPAGE log record is used for both heaps and indexes, so do ! * not do anything that assumes we are touching a heap. ! */ /* NO ELOG(ERROR) from here till newpage op is logged */ START_CRIT_SECTION(); *** *** 6225,6239 log_newpage(RelFileNode *rnode, ForkNumber forkNum, BlockNumber blkno, xlrec.forknum = forkNum; xlrec.blkno = blkno; rdata[0].data = (char *) &xlrec; rdata[0].len = SizeOfHeapNewpage; rdata[0].buffer = InvalidBuffer; rdata[0].next = &(rdata[1]); ! rdata[1].data = (char *) page; ! rdata[1].len = BLCKSZ; ! rdata[1].buffer = InvalidBuffer; ! 
rdata[1].next = NULL; recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_NEWPAGE, rdata); --- 6231,6288 xlrec.forknum = forkNum; xlrec.blkno = blkno; + if (page_std) + { + /* Assume we can omit data between pd_lower and pd_upper */ + uint16 lower = ((PageHeader) page)->pd_lower; + uint16 upper = ((PageHeader) page)->pd_upper; + + if (lower >= SizeOfPageHeaderData && + upper > lower && + upper <= BLCKSZ) + { + xlrec.hole_offset = lower; + xlrec.hole_length = upper - lower; + } + else + { + /* No "hole" to compress out */ + xlrec.hole_offset = 0; + xlrec.hole_length = 0; + } + } + else + { + /* Not a standard page header, don't try to eliminate "hole" */ + xlrec.hole_offset = 0; + xlrec.hole_length = 0; + } + rdata[0].data = (char *) &xlrec; rdata[0].len = SizeOfHeapNewpage; rdata[0].buffer = InvalidBuffer; rdata[0].next = &(rdata[1]); ! if (xlrec.hole_length == 0) ! { ! rdata[1].data = (char *) page; ! rdata[1].len = BLCKSZ; ! rdata[1].buffer = InvalidBuffer; ! rdata[1].next = NULL; ! } ! else ! { ! /* must skip the hole */ ! rdata[1].data = (char *) page; ! rdata[1].len = xlrec.hole_offset; ! rdata[1
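The hole-skipping logic in the patch above can be modelled roughly like this (a Python sketch of the byte layout only, not the C code; it assumes, as for ordinary full-page images, that redo zero-fills the omitted range, and the 24-byte header size is assumed for illustration):

```python
BLCKSZ = 8192
PAGE_HEADER_SIZE = 24   # assumed stand-in for SizeOfPageHeaderData

def compress_page(page, pd_lower, pd_upper, page_std=True):
    """For a standard page layout, the bytes between pd_lower and pd_upper
    are unused, so the WAL record can carry the page minus that range plus
    a (hole_offset, hole_length) pair."""
    if (page_std and pd_lower >= PAGE_HEADER_SIZE
            and pd_upper > pd_lower and pd_upper <= BLCKSZ):
        hole_offset, hole_length = pd_lower, pd_upper - pd_lower
        data = page[:hole_offset] + page[pd_upper:]
    else:
        hole_offset, hole_length = 0, 0   # no "hole" to compress out
        data = page
    return hole_offset, hole_length, data

def decompress_page(hole_offset, hole_length, data):
    """Replay side: reinsert the hole as zeros; its contents don't matter
    for a standard page."""
    if hole_length == 0:
        return data
    return data[:hole_offset] + b'\x00' * hole_length + data[hole_offset:]
```

With the default 10% fillfactor on B-tree index pages, the omitted range is roughly 10% of each page, which is where the quoted WAL savings come from.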
Re: [HACKERS] Skip hole in log_newpage
Hi, On 2013-12-03 13:03:41 +0200, Heikki Linnakangas wrote: > The log_newpage function, used to WAL-log a full copy of a page, is missing > the trick we normally use for full-page images to leave out the unused space > on the block. That's pretty trivial to implement, so we should. > > The place where this matters the most is when building a new B-tree index. > When wal_level > minimal, all pages in the created index are logged with > log_newpage, and by default we leave 10% free space on index pages. So > implementing this reduces the amount of WAL generated by index creation by > roughly 10%. Sounds like a good idea to me. > Anyone see a problem with this? I haven't looked thoroughly through all callsites, but shouldn't the vacuumlazy callsite use std = true? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Skip hole in log_newpage
On 12/03/2013 01:37 PM, Andres Freund wrote: I haven't looked thoroughly through all callsites, but shouldn't the vacuumlazy callsite use std = true? Well, it's logging an empty page, ie. a page full of zeros. I'm not sure if you'd consider that a "standard" page. Like the backup-block code in xlog.c, log_newpage actually makes a full page image without the hole if pd_lower == 0, even if you pass std = 'true', so the end result is the same. - Heikki
Re: [HACKERS] Skip hole in log_newpage
On 2013-12-03 13:57:04 +0200, Heikki Linnakangas wrote: > On 12/03/2013 01:37 PM, Andres Freund wrote: > >I haven't looked thoroughly through all callsites, but shouldn't the > >vacuumlazy callsite use std = true? > > Well, it's logging an empty page, ie. a page full of zeros. I'm not sure if > you'd consider that a "standard" page. Like the backup-block code in xlog.c, > log_newpage actually makes a full page image without the hole if pd_lower == > 0, even if you pass std = 'true', so the end result is the same. Hm. It should have been PageInit()ed and thus have sensible pd_lower/upper, right? Otherwise we'd have entered the PageIsNew() branch above it. It's obviously not critical, but it seems like a shame to write 8kb when it's not necessary. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Skip hole in log_newpage
On 12/03/2013 02:03 PM, Andres Freund wrote: On 2013-12-03 13:57:04 +0200, Heikki Linnakangas wrote: On 12/03/2013 01:37 PM, Andres Freund wrote: I haven't looked thoroughly through all callsites, but shouldn't the vacuumlazy callsite use std = true? Well, it's logging an empty page, ie. a page full of zeros. I'm not sure if you'd consider that a "standard" page. Like the backup-block code in xlog.c, log_newpage actually makes a full page image without the hole if pd_lower == 0, even if you pass std = 'true', so the end result is the same. Hm. It should have been PageInit()ed and thus have sensible pd_lower/upper, right? Otherwise we'd have entered the PageIsNew() branch above it. It's obviously not critical, but it seems like a shame to write 8kb when it's not necessary. Ah, you're right. I was thinking that that is the PageIsNew() branch, but it's not. - Heikki
[HACKERS] [bug fix] "pg_ctl stop" times out when it should respond quickly
Hello, I've encountered a small bug and fixed it. I guess this occurs on all major releases. I saw this happen on 9.2 and 9.4devel. Please find attached the patch and commit this. [Problem] If I mistakenly set an invalid value for listen_addresses, say '-1', and start the database server, it fails to start as follows. In my environment (RHEL6 for Intel64), it takes about 15 seconds before postgres prints the messages. This is OK. [maumau@myhost pgdata]$ pg_ctl -w start waiting for server to start... LOG: could not translate host name "-1", service "5450" to address: Temporary failure in name resolution WARNING: could not create listen socket for "-1" FATAL: could not create any TCP/IP sockets stopped waiting pg_ctl: could not start server Examine the log output. [maumau@myhost pgdata]$ When I start the server without -w and try to stop it, "pg_ctl stop" waits for 60 seconds and times out before it fails. This is what I'm seeing as a problem. I expected "pg_ctl stop" to respond quickly with success or failure depending on the timing. [maumau@myhost pgdata]$ pg_ctl start server starting ...(a few seconds later) [maumau@myhost ~]$ pg_ctl stop waiting for server to shut down. .. failed pg_ctl: server does not shut down HINT: The "-m fast" option immediately disconnects sessions rather than waiting for session-initiated disconnection. [maumau@myhost ~]$ [Cause] The problem occurs in the sequence below: 1. postmaster creates $PGDATA/postmaster.pid. 2. postmaster tries to resolve the value of listen_addresses to IP addresses. This took about 15 seconds in my failure scenario. 3. During 2, pg_ctl sends SIGTERM to postmaster. 4. postmaster terminates immediately without deleting $PGDATA/postmaster.pid. This is because it hasn't set signal handlers yet. 5. "pg_ctl stop" waits in a loop until $PGDATA/postmaster.pid disappears. But the file does not disappear and it times out. [Fix] Make pg_ctl check if postmaster is still alive, because postmaster might have crashed unexpectedly. 
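The wait loop described in [Cause]/[Fix] can be sketched like this (a hypothetical Python model, not pg_ctl's actual C code; the function name is invented, and signal 0 is used purely as an existence probe):

```python
import errno
import os
import time

def wait_for_shutdown(pidfile, timeout=60, interval=1.0):
    """Wait for postmaster.pid to disappear, but also probe whether the
    postmaster process is still alive, so a stale pid file left behind by
    a postmaster that died before installing its signal handlers does not
    make us wait until the timeout."""
    for _ in range(int(timeout / interval)):
        try:
            with open(pidfile) as f:
                pid = int(f.readline())
        except FileNotFoundError:
            return True                  # normal path: pid file removed
        try:
            os.kill(pid, 0)              # signal 0 checks existence only
        except OSError as e:
            if e.errno == errno.ESRCH:
                return True              # postmaster died; pid file stale
        time.sleep(interval)
    return False                         # genuine shutdown timeout
```

With only the pid-file check (the current behaviour described above), step 4's orphaned postmaster.pid forces the full 60-second wait; the liveness probe lets the loop return as soon as the process is gone.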
Regards MauMau pg_stop_fail.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Skip hole in log_newpage
On Tue, Dec 3, 2013 at 6:03 AM, Heikki Linnakangas wrote:
> The log_newpage function, used to WAL-log a full copy of a page, is missing
> the trick we normally use for full-page images to leave out the unused space
> on the block. That's pretty trivial to implement, so we should.
>
> The place where this matters the most is when building a new B-tree index.
> When wal_level > minimal, all pages in the created index are logged with
> log_newpage, and by default we leave 10% free space on index pages. So
> implementing this reduces the amount of WAL generated by index creation by
> roughly 10%.
>
> Anyone see a problem with this?

Looks good to me.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] UNNEST with multiple args, and TABLE with multiple funcs
On Mon, Dec 2, 2013 at 11:26 PM, Noah Misch wrote:
> On Mon, Dec 02, 2013 at 08:56:03PM -0500, Tom Lane wrote:
>> Noah Misch writes:
>> > ... I propose merely changing the syntax to "TABLE FOR ROWS (...)".
>>
>> Ugh :-(. Verbose and not exactly intuitive, I think. I don't like
>> any of the other options you listed much better. Still, the idea of
>> using more than one word might get us out of the bind that a single
>> word would have to be a fully reserved one.
>>
>> > ROWS FROM
>>
>> This one's a little less awful than the rest. What about "ROWS OF"?
>
> I had considered ROWS OF and liked it, but I omitted it from the list on
> account of the shift/reduce conflict from a naturally-written Bison rule.
> Distinguishing it from a list of column aliases takes extra look-ahead. We
> could force that to work. However, if we ever wish to allow an arbitrary
> from_item in the list, it would become ambiguous: is this drawing rows from
> "a" or just using an alias with a column list?
>
> WITH a AS (SELECT oid FROM pg_am ORDER BY 1) SELECT * FROM rows of(a, a);
>
> ROWS FOR is terse and conflict-free. "FOR" evokes the resemblance to looping
> over the parenthesized section with the functions acting as generators.

I like the idea of using ROWS + some additional word. I think I mildly prefer
Tom's suggestion of ROWS FROM to your suggestion of ROWS FOR, but I can live
with either one.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] logical changeset generation v6.7
On 2013-11-29 01:16:39 -0500, Robert Haas wrote:
> On Thu, Nov 14, 2013 at 8:46 AM, Andres Freund wrote:
> > [ new patches ]
>
> Here's an updated version of patch #2. I didn't really like the
> approach you took in the documentation, so I revised it.

Fair enough.

> Apart from that, I spent a lot of time looking at
> HeapSatisfiesHOTandKeyUpdate. I'm not very happy with your changes.
> The idea seems to be that we'll iterate through all of the HOT columns
> regardless, but that might be very inefficient. Suppose there are 100
> HOT columns, the last one is the only key column, and only the first
> one has been modified. Once we look at #1 and determine that it's not
> HOT, we should zoom forward and skip over the next 98, and only look
> at the last one; your version does not behave like that.

Well, the hot bitmap only contains indexed columns, so that's only going to
happen if there actually are indexes over all those columns. And in that
case it seems unlikely that the performance of that routine matters. That
said, keeping the old performance characteristics seems like a good idea to
me. Not sure anymore why I changed it that way.

> I've taken a crack at rewriting
> this logic, and the result looks cleaner and simpler to me, but I
> haven't tested it beyond the fact that it passes make check. See what
> you think.

Hm. I think it actually will not abort early in all cases either, but that
looks fixable. Imagine what happens if id_attrs or key_attrs is empty; ISTM
that we'll check all hot columns in that case.

Greetings, Andres Freund

-- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Extension Templates S03E11
On Tue, Dec 3, 2013 at 1:31 AM, Jeff Davis wrote:
> On Sun, 2013-12-01 at 15:48 +0100, Dimitri Fontaine wrote:
>> Jeff Davis writes:
>> > I don't see why we are trying to accommodate a case where the author
>> > doesn't offer enough full SQL scripts and offers broken downgrade
>> > scripts; or why that case is different from offering broken upgrade
>> > scripts.
>>
>> That's fair enough I guess. I will work on automating the choice of the
>> first full script to use then, for next patch version.
>
> Can we separate this feature out? It's an issue with extensions today,
> and I'm eager to make some progress after the explosion of differing
> opinions today.

+1 for separating that part out. I thought it was separated, at some point.

> Robert, do you think this is an acceptable approach to solve your pet
> peeve here:
>
> http://www.postgresql.org/message-id/CA+tgmoae3qs4qbqfxouzzfxrsxa0zy8ibsoysuutzdumpea...@mail.gmail.com

I'd need to look at exactly what's being proposed in more detail.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] logical changeset generation v6.7
Hi,

On 2013-11-28 21:15:18 -0500, Robert Haas wrote:
> OK, I've committed the patch to adjust the definition of
> IsSystemRelation()/IsSystemClass() and add
> IsCatalogRelation()/IsCatalogClass().

Thanks for taking care of this!

> I kibitzed your decision about
> which function to use in a few places - specifically, I made all of
> the places that cared about allow_system_table_mods use the IsSystem
> functions, and all the places that cared about invalidation messages
> use the IsCatalog functions. I don't think any of these changes are
> more than cosmetic, but I think it may reduce the chance of errors or
> inconsistencies in the face of future changes.

Agreed.

Do you think we need to do anything about the
  ERROR: cannot remove dependency on schema pg_catalog because it is a system object
thingy? Imo the current state is much more consistent than the earlier one,
but that's still a quite surprising leftover...

Greetings, Andres Freund

-- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] [bug fix or improvement?] Correctly place DLLs for ECPG apps in bin folder
Hello,

The attached patch implements the below proposal, and moves libecpg.dll, libecpg_compat.dll, and libpgtypes.dll from the lib folder to the bin folder on Windows, where they should be.

http://www.postgresql.org/message-id/10470B394DB8486F93AC60107CC44C8B@maumau

As Andrew-san said, I don't expect it to be back-ported.

In addition, I'll remove libpq.dll from the lib folder unless somebody objects. Currently, libpq.dll is placed in both bin and lib. I guess libpq.dll was left in lib because it was considered necessary for the ECPG DLLs.

Would removing libpq.dll from lib cause any problem? No. The following modules in lib need libpq.dll when they are loaded by postgres.exe via LoadLibrary(). But the necessary libpq.dll is loaded from the bin folder: according to the DLL search order rules, DLLs are loaded from the same folder as the loading program, which is postgres.exe in our case.

dblink.dll
libpqwalreceiver.dll
postgres_fdw.dll

Regards MauMau

ECPG_DLL_not_move_libpq.patch Description: Binary data
Re: [HACKERS] Extension Templates S03E11
On Mon, Dec 2, 2013 at 7:46 PM, Robert Haas wrote:
>> Just tossing an idea out there. What if you could install an extension
>> by specifying not a local file name but a URL. Obviously there's a
>> security issue but for example we could allow only https URLs with
>> verified domain names that are in a list of approved domain names
>> specified by a GUC.
>
> That's a different feature, but I don't see anything preventing
> someone from implementing that as an extension, today, without any
> core support at all. It would only be usable in cases where the share
> directory is writable by the database server (i.e. low-security
> installations) and you'd have to make it a function call rather than
> piggybacking on CREATE EXTENSION, but neither of those things sound
> bad to me. (And if they are bad, they could be addressed by providing
> hooks or event triggers, leaving the rest of the functionality in the
> extension module.)

Well, all of this is implementable as an extension if you have write access
to the database server's share directory. This is all about UI. CREATE
EXTENSION is about having the core do the bookkeeping about which files
belong to which version of which extension.

I thought the fundamental problem the "in-catalog" extensions were trying to
solve was the issue of not having access to the filesystem. If that's the
case then being able to say create extension from http://... would solve
that.

If the fundamental problem is that you want multi-tenant databases to be
able to have different .so files visible depending on which database is
opened, then that's a bit trickier.

-- greg
Re: [HACKERS] Extension Templates S03E11
* Tom Lane (t...@sss.pgh.pa.us) wrote: > > On 3 December 2013 02:02, Dimitri Fontaine wrote: > > ISTM that the real solution to this particular problem is to decouple > > the extensions that are currently in contrib from a specific postgres > > version. > > "Problem"? It's not a bug that you get hstore 1.2 when you dump from 9.2 > and reload into 9.3; that's a feature. You wanted an upgrade, presumably, I don't buy this argument at *all* and it's not going to fly when we've got multiple versions of an extension available concurrently. I'm willing to accept that we have limitations when it comes from a packaging perspective (today) around extensions but the notion that my backup utility should intentionally omit information which is required to restore the database to the same state it was in is ridiculous. > or you'd not have been going to 9.3 in the first place. This notion that a given major version of PG only ever has one version of an extension associated with it is *also* wrong and only makes any sense for contrib extensions- which are the exception rather than the rule today. > The entire reason > why the extension mechanism works like it does is to allow that sort of > reasonably-transparent upgrade. It would not be a step forward to break > that by having pg_dump prevent it (which it would fail to do anyway, > likely, since the receiving installation might not have 1.1 available > to install). I agree that there should be a way to *allow* such an upgrade to happen transparently and perhaps we keep it the way it is for contrib extensions as a historical artifact, but other extensions are independent of the PG major version and multiple versions will be available concurrently for them and having pg_dump willfully ignore the extension version is a receipe for broken backups. Step back and consider a user who is just trying to restore his backup of his 9.2 database into a new server, also with 9.2, as quickly as he can to get his system online again. 
Having four different versions of extension X installed and available for 9.2, no clue or information about which version was installed into which databases and getting mysterious failures and errors because they're not all compatible is not what anyone is going to want to deal with in that situation. I certainly don't see extensions (outside of contrib) in the general sense as being either tied to specific PG versions or being required to maintain the same API that they started with on day 1. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Extension Templates S03E11
Robert Haas writes:
>> Can we separate this feature out? It's an issue with extensions today,
>> and I'm eager to make some progress after the explosion of differing
>> opinions today.
>
> +1 for separating that part out. I thought it was separated, at some point.

http://www.postgresql.org/message-id/caltqxtetvi-exhdbspuey3trthug51eswuod8cur2t+rxtg...@mail.gmail.com
http://www.postgresql.org/message-id/m2r4k8jpfl@2ndquadrant.fr

The only way to bugfix all the reported problems was to have regression
testing… and it became a necessary dependency of the extension templates
patch, so I just included it. My interdependent-git-branches development fu
seems to have totally disappeared after the main extension patch that needed
7 of those…

> I'd need to look exactly what's being proposed in more detail.

What I did propose is a new GUC, default_full_version:

+ default_full_version (string)
+
+   This option allows an extension author to avoid shipping all versions
+   of all scripts when shipping an extension. When a version is requested
+   and the matching script does not exist on disk, set
+   default_full_version to the first script you still ship and PostgreSQL
+   will apply the intermediate upgrade scripts as per the ALTER EXTENSION
+   UPDATE command.
+
+   For example, say you did provide the extension pair version 1.0 and are
+   now providing version 1.1. If you want both current and new users to be
+   able to install the new version, you can provide both the scripts
+   pair--1.0--1.1.sql and pair--1.1.sql, adding to the already existing
+   pair--1.0.sql.
+
+   When specifying default_version and default_full_version = 1.0 you can
+   instead provide only the scripts pair--1.0.sql and pair--1.0--1.1.sql.
+   CREATE EXTENSION pair; will then automatically use the aforementioned
+   scripts to install version 1.0 and then update it to 1.1.
What Jeff is proposing is to simplify that down and have PostgreSQL
auto-discover the upgrade cycle when the version asked for isn't directly
available with a creation script. We would keep the behavior depicted here,
just in a fully automated way.

Working on a separate patch for that, then.

Regards,
-- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Extension Templates S03E11
* Tom Dunstan (pg...@tomd.cc) wrote: > Extensions in contrib live in a weird place. Totally builtin stuff > should obviously be dumped without versions, and stuff which is > completely separate and follows its own release schedule should > obviously be versioned. I guess we consider all modules in contrib to > offer the same transparent upgrade guarantees as builtins, so they > shouldn't be versioned, but it feels like some of them should be, if > only because some aren't particularly tied in to the backend all that > tightly. But I guess that's a bogus metric, the true metric is whether > we want people to treat them as basically built-in, with the upgrade > guarantees that go along with that. Note that we don't actually make guarantees about either builtins or contrib modules when it comes to major version upgrades. The current way we package contrib and how we tie contrib releases to PG releases means that we can get away with omitting the version and saying "well, if you restore to the same PG major version then you'll get the same contrib extension version, and if you restore into a different version then obviously you want the version of contrib with that major version" but that *only* works for contrib. We need to accept that other extensions exist and that they aren't tied to PG major versions nor to our release schedule. There's a lot more extensions out there today which are *not* in contrib than there are ones which *are*. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Trust intermediate CA for client certificates
* Bruce Momjian (br...@momjian.us) wrote: > Uh, this thread actually started with Ian's feature request, and has > changed to document the current behavior. Whoops, apologies for that then- I clearly came into it (or perhaps more accurately, was brought into it) after the start of the thread. I'm happy to start a new thread for the discussion around documenting the current behavior, if that would help. :) Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] Extension Templates S03E11
* Jeff Davis (pg...@j-davis.com) wrote: > Stephen mentioned using external tools and/or metadata, but to me that > sounds like it would require porting the extension away from what's on > PGXN today. Not at all- and that'd be the point. An external tool could take the PGXN extension, run 'make', then 'make install' (into a userland directory), extract out the script and then, with a little help from PG, run that script in "extension creation mode" via libpq. Another option, which I generally like better, is to have a new package format for PGXN that contains the results of "make install", more-or-less, synonymous to Debian source vs. .deb packages. Perhaps we could even have psql understand that format and be able to install the extension via a backslash command instead of having an external tool, but I think an external tool for dependency tracking and downloading of necessary dependencies ala Debian would be better than teaching psql to do that. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing
On Tue, Dec 03, 2013 at 11:56:07AM +0100, Andres Freund wrote: > On 2013-12-03 00:47:07 -0500, Noah Misch wrote: > > On Sat, Nov 30, 2013 at 01:06:09AM +, Alvaro Herrera wrote: > > > Fix a couple of bugs in MultiXactId freezing > > > > > > Both heap_freeze_tuple() and heap_tuple_needs_freeze() neglected to look > > > into a multixact to check the members against cutoff_xid. > > > > > ! /* > > > ! * This is a multixact which is not marked > > > LOCK_ONLY, but which > > > ! * is newer than the cutoff_multi. If the > > > update_xid is below the > > > ! * cutoff_xid point, then we can just freeze > > > the Xmax in the > > > ! * tuple, removing it altogether. This seems > > > simple, but there > > > ! * are several underlying assumptions: > > > ! * > > > ! * 1. A tuple marked by an multixact containing > > > a very old > > > ! * committed update Xid would have been pruned > > > away by vacuum; we > > > ! * wouldn't be freezing this tuple at all. > > > ! * > > > ! * 2. There cannot possibly be any live locking > > > members remaining > > > ! * in the multixact. This is because if they > > > were alive, the > > > ! * update's Xid would had been considered, via > > > the lockers' > > > ! * snapshot's Xmin, as part the cutoff_xid. > > > > READ COMMITTED transactions can reset MyPgXact->xmin between commands, > > defeating that assumption; see SnapshotResetXmin(). I have attached an > > isolationtester spec demonstrating the problem. > > Any idea how to cheat our way out of that one given the current way > heap_freeze_tuple() works (running on both primary and standby)? My only > idea was to MultiXactIdWait() if !InRecovery but that's extremly grotty. > We can't even realistically create a new multixact with fewer members > with the current format of xl_heap_freeze. Perhaps set HEAP_XMAX_LOCK_ONLY on the tuple? We'd then ensure all update XID consumers check HEAP_XMAX_IS_LOCKED_ONLY() first, much like xmax consumers are already expected to check HEAP_XMAX_INVALID first. 
Seems doable, albeit yet another injection of complexity. > > The test spec additionally > > covers a (probably-related) assertion failure, new in 9.3.2. > > Too bad it's too late to do anthing about it for 9.3.2. :(. At least the > last seems actually unrelated, I am not sure why it's 9.3.2 > only. Alvaro, are you looking? (For clarity, the other problem demonstrated by the test spec is also a 9.3.2 regression.) > > That was the only concrete runtime problem I found during a study of the > > newest heap_freeze_tuple() and heap_tuple_needs_freeze() code. > > I'd even be interested in fuzzy problems ;). If 9.3. wouldn't have been > released the interactions between cutoff_xid/multi would have caused me > to say "back to the drawing" board... I'm not suprised if further things > are lurking there. heap_freeze_tuple() of 9.2 had an XXX comment about the possibility of getting spurious lock contention due to wraparound of the multixact space. The comment is gone, and that mechanism no longer poses a threat. However, a non-wrapped multixact containing wrapped locker XIDs (we don't freeze locker XIDs, just updater XIDs) may cause similar spurious contention. > + /* > + * The multixact has an update hidden within. > Get rid of it. > + * > + * If the update_xid is below the cutoff_xid, > it necessarily > + * must be an aborted transaction. In a > primary server, such > + * an Xmax would have gotten marked invalid by > + * HeapTupleSatisfiesVacuum, but in a replica > that is not > + * called before we are, so deal with it in the > same way. > + * > + * If not below the cutoff_xid, then the tuple > would have been > + * pruned by vacuum, if the update committed > long enough ago, > + * and we wouldn't be freezing it; so it's > either recently > + * committed, or in-progress. Deal with this > by setting the > + * Xmax to the update Xid directly and remove > the IS_MULTI > + * bit. 
(We know there cannot be running > lockers in this > + * multi, because it's below the cutoff_multi > value.) > + */ > + > + if (TransactionIdPrecedes(update_
Re: [HACKERS] Extension Templates S03E11
* Greg Stark (st...@mit.edu) wrote: > I thought the fundamental problem the "in-catalog" extensions were > trying to solve were the issue with not having access to the > filesystem. If that's the case then being able to say create extension > from http://... would solve that. That's not really 'solved' unless you feel we can depend on that "create extension from URL" to work at pg_restore time... I wouldn't have guessed that people would accept that, but I've already been wrong about such things in this thread once. Thanks, Stephen signature.asc Description: Digital signature
[HACKERS] Recovery bug in GIN, missing full page image
While looking at Alexander's GIN patch, I noticed an ancient bug in the
WAL-logging of GIN entry-tree insertions. The entryPlaceToPage and
dataPlaceToPage functions don't make a full-page image of the page when
inserting a downlink on a non-leaf page. The comment says:

/*
 * Prevent full page write if child's split occurs. That is needed to
 * remove incomplete splits while replaying WAL
 *
 * data.updateBlkno contains new block number (of newly created right
 * page) for recently splited page.
 */

The code is doing what the comment says, but that's wrong. You can't just
skip the full page write; it's needed for torn page protection like in any
other case. The correct fix would've been to change the redo-routine to do
the incomplete split tracking for the page even if it's restored from a full
page image.

This was broken by this commit back in 2007:

commit 853d1c3103fa961ae6219f0281885b345593d101
Author: Teodor Sigaev
Date: Mon Jun 4 15:56:28 2007 +

    Fix bundle bugs of GIN:
    - Fix possible deadlock between UPDATE and VACUUM queries. Bug never was
      observed in 8.2, but it still exist there. HEAD is more sensitive to
      bug after recent "ring" of buffer improvements.
    - Fix WAL creation: if parent page is stored as is after split then
      incomplete split isn't removed during replay. This happens rather
      rare, only on large tables with a lot of updates/inserts.
    - Fix WAL replay: there was wrong test of XLR_BKP_BLOCK_* for left page
      after deletion of page. That causes wrong rightlink field: it pointed
      to deleted page.
    - add checking of match of clearing incomplete split
    - cleanup incomplete split list after proceeding

    All of this chages doesn't change on-disk storage, so backpatch...
    But second point may be an issue for replaying logs from previous
    version.

The relevant part is the "Fix WAL creation" item. I searched the archives
but couldn't find any discussion leading to this fix.
In 2010, Tom actually fixed the redo-routine in commit 4016bdef8aded77b4903c457050622a5a1815c16, along with other fixes. So all we need to do now is to fix that bogus logic in entryPlaceToPage to not skip the full-page-image. Attached is a patch for 9.3. I've whacked the code a lot in master, but the same bug is present there too. - Heikki diff --git a/src/backend/access/gin/gindatapage.c b/src/backend/access/gin/gindatapage.c index 13ab448..32eba4c 100644 --- a/src/backend/access/gin/gindatapage.c +++ b/src/backend/access/gin/gindatapage.c @@ -381,7 +381,6 @@ dataPlaceToPage(GinBtree btree, Buffer buf, OffsetNumber off, XLogRecData **prda { Page page = BufferGetPage(buf); int sizeofitem = GinSizeOfDataPageItem(page); - int cnt = 0; /* these must be static so they can be returned to caller */ static XLogRecData rdata[3]; @@ -401,32 +400,25 @@ dataPlaceToPage(GinBtree btree, Buffer buf, OffsetNumber off, XLogRecData **prda data.isLeaf = GinPageIsLeaf(page) ? TRUE : FALSE; /* - * Prevent full page write if child's split occurs. That is needed to - * remove incomplete splits while replaying WAL - * - * data.updateBlkno contains new block number (of newly created right - * page) for recently splited page. + * For incomplete-split tracking, we need the updateBlkno information and + * the inserted item even when we make a full page image of the page, so + * put the buffer reference in a separate XLogRecData entry. 
*/ - if (data.updateBlkno == InvalidBlockNumber) - { - rdata[0].buffer = buf; - rdata[0].buffer_std = FALSE; - rdata[0].data = NULL; - rdata[0].len = 0; - rdata[0].next = &rdata[1]; - cnt++; - } + rdata[0].buffer = buf; + rdata[0].buffer_std = FALSE; + rdata[0].data = NULL; + rdata[0].len = 0; + rdata[0].next = &rdata[1]; - rdata[cnt].buffer = InvalidBuffer; - rdata[cnt].data = (char *) &data; - rdata[cnt].len = sizeof(ginxlogInsert); - rdata[cnt].next = &rdata[cnt + 1]; - cnt++; + rdata[1].buffer = InvalidBuffer; + rdata[1].data = (char *) &data; + rdata[1].len = sizeof(ginxlogInsert); + rdata[1].next = &rdata[2]; - rdata[cnt].buffer = InvalidBuffer; - rdata[cnt].data = (GinPageIsLeaf(page)) ? ((char *) (btree->items + btree->curitem)) : ((char *) &(btree->pitem)); - rdata[cnt].len = sizeofitem; - rdata[cnt].next = NULL; + rdata[2].buffer = InvalidBuffer; + rdata[2].data = (GinPageIsLeaf(page)) ? ((char *) (btree->items + btree->curitem)) : ((char *) &(btree->pitem)); + rdata[2].len = sizeofitem; + rdata[2].next = NULL; if (GinPageIsLeaf(page)) { @@ -442,7 +434,7 @@ dataPlaceToPage(GinBtree btree, Buffer buf, OffsetNumber off, XLogRecData **prda btree->curitem++; } data.nitem = btree->curitem - savedPos; - rdata[cnt].len = sizeofitem * data.nitem; + rdata[2].len = sizeofitem * data.nitem; } else { diff --git a/src/backend/access/gin/ginentrypage.c b/src/backend/access/gin/ginentrypage.c index 8ead38f..f6c06
Re: [HACKERS] UNNEST with multiple args, and TABLE with multiple funcs
Noah Misch writes:
> On Mon, Dec 02, 2013 at 08:56:03PM -0500, Tom Lane wrote:
>> Ugh :-(. Verbose and not exactly intuitive, I think. I don't like
>> any of the other options you listed much better. Still, the idea of
>> using more than one word might get us out of the bind that a single
>> word would have to be a fully reserved one.
>
> I had considered ROWS OF and liked it, but I omitted it from the list on
> account of the shift/reduce conflict from a naturally-written Bison rule.
> Distinguishing it from a list of column aliases takes extra look-ahead.

Hmm, yeah, you're right --- at least one of the first two words needs to be
reserved (not something that can be a ColId, at least), or else we can't tell
them from a table name and alias. So this approach doesn't give us all that
much extra wiggle room. We do have a number of already-fully-reserved
prepositions (FOR, FROM, IN, ON, TO) but none of them seem like great
choices.

> ROWS FOR is terse and conflict-free. "FOR" evokes the resemblance to looping
> over the parenthesized section with the functions acting as generators.

Meh. I don't find that analogy compelling. After sleeping on it, your other
suggestion of TABLE OF, or possibly TABLE FROM, is starting to grow on me.

Who else has an opinion?

regards, tom lane
Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing
Andres Freund writes:
> On 2013-12-03 00:47:07 -0500, Noah Misch wrote:
>> The test spec additionally
>> covers a (probably-related) assertion failure, new in 9.3.2.
>
> Too bad it's too late to do anything about it for 9.3.2. :(. At least the
> last seems actually unrelated, I am not sure why it's 9.3.2
> only. Alvaro, are you looking?

Is this bad enough that we need to re-wrap the release?

regards, tom lane
Re: [HACKERS] Skip hole in log_newpage
Heikki Linnakangas writes:
> The log_newpage function, used to WAL-log a full copy of a page, is
> missing the trick we normally use for full-page images to leave out the
> unused space on the block. That's pretty trivial to implement, so we should.
>
> Anyone see a problem with this?

+1. Don't forget to bump the WAL version identifier.

regards, tom lane
Re: [HACKERS] Proposed feature: Selective Foreign Keys
On Mon, Dec 2, 2013 at 6:08 PM, Tom Dunstan wrote: > On 3 Dec 2013, at 03:37, Robert Haas wrote: >> I also like this feature. It would be really neat if a FOREIGN KEY >> constraint with a WHERE clause could use a *partial* index on the >> foreign table provided that the index would be guaranteed to be predOK >> for all versions of the foreign key checking query. That might be >> hard to implement, though. > > Well, with this patch, under the hood the FK query is doing (in the case of > RESTRICT): > > SELECT 1 FROM ONLY "public"."comment" x WHERE (the id) OPERATOR(pg_catalog.=) > "parent_id" AND (parent_entity = 'event') FOR KEY SHARE OF x; > > If we stick a partial index on the column, disable seq scans and run the > query, we get: > > tom=# create index comment_event_id on comment (parent_id) where > parent_entity = 'event'; > CREATE INDEX > tom=# set enable_seqscan = off; > SET > tom=# explain SELECT 1 FROM ONLY "public"."comment" x WHERE 20 > OPERATOR(pg_catalog.=) "parent_id" AND (parent_entity = 'event') FOR KEY > SHARE OF x; >QUERY PLAN > > LockRows (cost=0.12..8.15 rows=1 width=6) >-> Index Scan using comment_event_id on comment x (cost=0.12..8.14 > rows=1 width=6) > Index Cond: (20 = parent_id) > Filter: (parent_entity = 'event'::commentable_entity) > (4 rows) > > Is that what you had in mind? Yeah, more or less, but the key is ensuring that it wouldn't let you create the constraint in the first place if the partial index specified *didn't* match the WHERE clause. For example, suppose the partial index says WHERE parent_entity = 'event' but the constraint definition is WHERE parent_event = 'somethingelse'. That ought to fail, just as creating a regular foreign constraint will fail if there's no matching unique index. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing
On 2013-12-03 09:48:23 -0500, Tom Lane wrote: > Andres Freund writes: > > On 2013-12-03 00:47:07 -0500, Noah Misch wrote: > >> The test spec additionally > >> covers a (probably-related) assertion failure, new in 9.3.2. > > > Too bad it's too late to do anything about it for 9.3.2. :(. At least the > > last seems actually unrelated, I am not sure why it's 9.3.2 > > only. Alvaro, are you looking? > > Is this bad enough that we need to re-wrap the release? Tentatively I'd say no; the only risk is losing locks, AFAICS. That's much better than corrupting rows as in 9.3.1. But I'll look into it in a bit more detail as soon as I am off the phone call I am on. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] UNNEST with multiple args, and TABLE with multiple funcs
* Tom Lane (t...@sss.pgh.pa.us) wrote: > After sleeping on it, your other suggestion of TABLE OF, or possibly > TABLE FROM, is starting to grow on me. > > Who else has an opinion? Alright, for my 2c, I like having this syntax include 'TABLE' simply because it's what folks coming from Oracle might be looking for. Following from that, to keep it distinct from the spec's notion of 'TABLE', my preference is 'TABLE FROM'. I don't particularly like 'TABLE OF', nor do I like the various 'ROWS' suggestions. Thanks, Stephen
Re: [HACKERS] Extension Templates S03E11
On Tue, Dec 3, 2013 at 8:44 AM, Stephen Frost wrote: > * Tom Lane (t...@sss.pgh.pa.us) wrote: >> > On 3 December 2013 02:02, Dimitri Fontaine wrote: >> > ISTM that the real solution to this particular problem is to decouple >> > the extensions that are currently in contrib from a specific postgres >> > version. >> >> "Problem"? It's not a bug that you get hstore 1.2 when you dump from 9.2 >> and reload into 9.3; that's a feature. You wanted an upgrade, presumably, > > I don't buy this argument at *all* and it's not going to fly when we've > got multiple versions of an extension available concurrently. I'm > willing to accept that we have limitations when it comes to a > packaging perspective (today) around extensions but the notion that my > backup utility should intentionally omit information which is required > to restore the database to the same state it was in is ridiculous. > >> or you'd not have been going to 9.3 in the first place. > > This notion that a given major version of PG only ever has one version > of an extension associated with it is *also* wrong and only makes any > sense for contrib extensions- which are the exception rather than the > rule today. > >> The entire reason >> why the extension mechanism works like it does is to allow that sort of >> reasonably-transparent upgrade. It would not be a step forward to break >> that by having pg_dump prevent it (which it would fail to do anyway, >> likely, since the receiving installation might not have 1.1 available >> to install). > > I agree that there should be a way to *allow* such an upgrade to happen > transparently and perhaps we keep it the way it is for contrib > extensions as a historical artifact, but other extensions are > independent of the PG major version and multiple versions will be > available concurrently for them and having pg_dump willfully ignore the > extension version is a recipe for broken backups.
> > Step back and consider a user who is just trying to restore his backup > of his 9.2 database into a new server, also with 9.2, as quickly as he > can to get his system online again. Having four different versions of > extension X installed and available for 9.2, no clue or information > about which version was installed into which databases and getting > mysterious failures and errors because they're not all compatible is not > what anyone is going to want to deal with in that situation. I think Tom's original idea here was that new versions of extensions *shouldn't ever* be backward-incompatible, and therefore if this problem arises it's the extension author's fault. It isn't however clear that this dream is likely to be realized in practice. For example, the only difference between hstore 1.0 and hstore 1.1 is that we dropped the => operator, for the very good reason that we have been slowly working towards deprecating => as an operator name so that we can eventually use it for the purpose that the SQL standard specifies. Given that we've done it in core, we can hardly say that no one will ever do this anywhere else. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing
On 2013-12-03 09:16:18 -0500, Noah Misch wrote: > On Tue, Dec 03, 2013 at 11:56:07AM +0100, Andres Freund wrote: > > On 2013-12-03 00:47:07 -0500, Noah Misch wrote: > > > On Sat, Nov 30, 2013 at 01:06:09AM +, Alvaro Herrera wrote: > > Any idea how to cheat our way out of that one given the current way > > heap_freeze_tuple() works (running on both primary and standby)? My only > > idea was to MultiXactIdWait() if !InRecovery but that's extremely grotty. > > We can't even realistically create a new multixact with fewer members > > with the current format of xl_heap_freeze. > > Perhaps set HEAP_XMAX_LOCK_ONLY on the tuple? We'd then ensure all update XID > consumers check HEAP_XMAX_IS_LOCKED_ONLY() first, much like xmax consumers are > already expected to check HEAP_XMAX_INVALID first. Seems doable, albeit yet > another injection of complexity. I think it's pretty much checked that way already, but the problem seems to be how to avoid checks on xid commit/abort in that case. I've complained in 20131121200517.gm7...@alap2.anarazel.de that the old pre-condition that multixacts aren't checked when they can't be relevant (via OldestVisibleM*) isn't observed anymore. So, if we re-introduce that condition again, we should be on the safe side with that, right? > > > The test spec additionally > > > covers a (probably-related) assertion failure, new in 9.3.2. > > > > Too bad it's too late to do anything about it for 9.3.2. :(. At least the > > last seems actually unrelated, I am not sure why it's 9.3.2 > > only. Alvaro, are you looking? > > (For clarity, the other problem demonstrated by the test spec is also a 9.3.2 > regression.) Yea, I just don't see why yet... Looking now. > heap_freeze_tuple() of 9.2 had an XXX comment about the possibility of getting > spurious lock contention due to wraparound of the multixact space. The > comment is gone, and that mechanism no longer poses a threat.
However, a > non-wrapped multixact containing wrapped locker XIDs (we don't freeze locker > XIDs, just updater XIDs) may cause similar spurious contention. Yea, I noticed that that comment was missing as well. I think what we should do now is to rework freezing in HEAD to make all this more reasonable. > Numerous comments in the vicinity (e.g. ones at MultiXactStateData) reflect a > pre-9.3 world. Most or all of that isn't new with the patch at hand, but it > does complicate study. Yea, Alvaro sent a patch for that somewhere; it seems a patch in the series got lost when foreign key locks were originally applied. I think we seriously need to put a good amount of work into the multixact.c stuff in the next months. Otherwise it will be a maintenance nightmare for a fair bit longer. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Extension Templates S03E11
On Tue, Dec 3, 2013 at 8:43 AM, Dimitri Fontaine wrote: > What Jeff is proposing is to simplify that down and have PostgreSQL auto > discover the upgrade cycle when the version asked for isn't directly > available with a creation script. > > We would keep the behavior depicted here, just in a fully automated way. > > Working on a separate patch for that, then. I like the idea of making it automatic, but it won't work in all cases. For example, suppose someone ships 1.0, 1.0--1.2, 1.1, and 1.1--1.2. Since versions aren't intrinsically ordered, the system has no way of knowing whether it's preferable to run 1.0 and then 1.0--1.2 or instead run 1.1 and then 1.1--1.2. So I think we will need either to introduce a way of ordering version numbers (which Tom has previously opposed) or some concept like your default_full_version. In more normal cases, however, the system can (and probably should) figure out what was intended by choosing the *shortest* path to get to the intended version. For example, if someone ships 1.0, 1.0--1.1, 1.1, and 1.1--1.2, the system should choose to run 1.1 and then 1.1--1.2, not 1.0 and then 1.0--1.1 and then 1.1--1.2. But that can be automatic: only if there are two paths of equal length (as in the example in the previous paragraph) do we need help from the user to figure out what to do. We should also consider the possibility of a user trying to deliberately install an older release. For example, if the user has 1.0, 1.0--1.1, 1.1, 1.1--1.2, and 1.2--1.0 (a downgrade script) with default_full_version = 1.2, an attempt to install 1.0 should run just the 1.0 script, NOT 1.2 and then 1.2--1.0. Putting all that together, I'm inclined to suggest that what we really need is a LIST of version numbers, rather than just one. If one path to the version we're installing is shorter than any other, we choose it, period.
If there are multiple paths of equal length, we break the tie by choosing which version number appears first in the aforementioned list. If that still doesn't break the tie, either because none of the starting points are mentioned in that list or because there are multiple equal-length paths starting in the same place, we give up and emit an error. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
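The selection rules Robert describes — shortest path wins, ties broken by a user-supplied preference list, otherwise error — can be modeled as a breadth-first search over the upgrade-script graph. A speculative Python sketch (names invented; not a proposed implementation):

```python
from collections import deque

def choose_path(scripts, target, preference):
    """scripts: names like '1.1' (full creation script) or '1.0--1.1'
    (upgrade script).  Returns the chosen version sequence, e.g.
    ['1.1', '1.2'] meaning: run the 1.1 script, then the 1.1--1.2 upgrade."""
    fulls = [s for s in scripts if "--" not in s]
    edges = {}
    for s in scripts:
        if "--" in s:
            src, dst = s.split("--")
            edges.setdefault(src, []).append(dst)
    best = {}                       # start version -> shortest path to target
    for start in fulls:             # BFS from every full-install script
        queue, seen = deque([[start]]), {start}
        while queue:
            path = queue.popleft()
            if path[-1] == target:
                best[start] = path
                break
            for nxt in edges.get(path[-1], []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(path + [nxt])
    if not best:
        raise ValueError("no way to install version %s" % target)
    shortest = min(len(p) for p in best.values())
    candidates = [p for p in best.values() if len(p) == shortest]
    if len(candidates) == 1:
        return candidates[0]
    for pref in preference:         # tie-break with the user-supplied list
        for path in candidates:
            if path[0] == pref:
                return path
    raise ValueError("ambiguous install paths: %r" % candidates)
```

This reproduces all three of Robert's examples: the unambiguous shortest path, the tie broken by the list, and the downgrade script being ignored when installing 1.0 directly.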
Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing
On 2013-12-03 09:16:18 -0500, Noah Misch wrote: > > > The test spec additionally > > > covers a (probably-related) assertion failure, new in 9.3.2. > > > > Too bad it's too late to do anything about it for 9.3.2. :(. At least the > > last seems actually unrelated, I am not sure why it's 9.3.2 > > only. Alvaro, are you looking? > > (For clarity, the other problem demonstrated by the test spec is also a 9.3.2 > regression.) The backtrace for the Assert() you found is:

#4 0x004f1da5 in CreateMultiXactId (nmembers=2, members=0x1ce17d8)
   at /home/andres/src/postgresql/src/backend/access/transam/multixact.c:708
#5 0x004f1aeb in MultiXactIdExpand (multi=6241831, xid=6019366, status=MultiXactStatusUpdate)
   at /home/andres/src/postgresql/src/backend/access/transam/multixact.c:462
#6 0x004a5d8e in compute_new_xmax_infomask (xmax=6241831, old_infomask=4416, old_infomask2=16386, add_to_xmax=6019366, mode=LockTupleExclusive, is_update=1 '\001', result_xmax=0x7fffca02a700, result_infomask=0x7fffca02a6fe, result_infomask2=0x7fffca02a6fc)
   at /home/andres/src/postgresql/src/backend/access/heap/heapam.c:4651
#7 0x004a2d27 in heap_update (relation=0x7f9fc45cc828, otid=0x7fffca02a8d0, newtup=0x1ce1740, cid=0, crosscheck=0x0, wait=1 '\001', hufd=0x7fffca02a850, lockmode=0x7fffca02a82c)
   at /home/andres/src/postgresql/src/backend/access/heap/heapam.c:3304
#8 0x00646f04 in ExecUpdate (tupleid=0x7fffca02a8d0, oldtuple=0x0, slot=0x1ce12c0, planSlot=0x1ce0740, epqstate=0x1ce0120, estate=0x1cdfe98, canSetTag=1 '\001')
   at /home/andres/src/postgresql/src/backend/executor/nodeModifyTable.c:690

So imo it isn't really a new problem; it existed all along :/. We only didn't hit it with your test case before because we spuriously thought that a tuple was in-progress if *any* member of the old multi was still running in some cases, instead of just the updater. But I am pretty sure it can also be reproduced in 9.3.1.
Imo the MultiXactIdSetOldestMember() call in heap_update() needs to be moved outside of the if (satisfies_key). Everything else is vastly more complex. Alvaro, correct? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing
On Tue, Dec 03, 2013 at 04:08:23PM +0100, Andres Freund wrote: > On 2013-12-03 09:16:18 -0500, Noah Misch wrote: > > On Tue, Dec 03, 2013 at 11:56:07AM +0100, Andres Freund wrote: > > > On 2013-12-03 00:47:07 -0500, Noah Misch wrote: > > > > On Sat, Nov 30, 2013 at 01:06:09AM +, Alvaro Herrera wrote: > > > Any idea how to cheat our way out of that one given the current way > > > heap_freeze_tuple() works (running on both primary and standby)? My only > > > idea was to MultiXactIdWait() if !InRecovery but that's extremly grotty. > > > We can't even realistically create a new multixact with fewer members > > > with the current format of xl_heap_freeze. > > > > Perhaps set HEAP_XMAX_LOCK_ONLY on the tuple? We'd then ensure all update > > XID > > consumers check HEAP_XMAX_IS_LOCKED_ONLY() first, much like xmax consumers > > are > > already expected to check HEAP_XMAX_INVALID first. Seems doable, albeit yet > > another injection of complexity. > > I think its pretty much checked that way already, but the problem seems > to be how to avoid checks on xid commit/abort in that case. I've > complained in 20131121200517.gm7...@alap2.anarazel.de that the old > pre-condition that multixacts aren't checked when they can't be relevant > (via OldestVisibleM*) isn't observed anymore. > So, if we re-introduce that condition again, we should be on the safe > side with that, right? What specific commit/abort checks do you have in mind? > > > > The test spec additionally > > > > covers a (probably-related) assertion failure, new in 9.3.2. > > > > > > Too bad it's too late to do anthing about it for 9.3.2. :(. At least the > > > last seems actually unrelated, I am not sure why it's 9.3.2 > > > only. Alvaro, are you looking? > > > > (For clarity, the other problem demonstrated by the test spec is also a > > 9.3.2 > > regression.) > > Yea, I just don't see why yet... Looking now. Sorry, my original report was rather terse. 
I speak of the scenario exercised by the second permutation in that isolationtester spec. The multixact is later than VACUUM's cutoff_multi, so 9.3.1 does not freeze it at all. 9.3.2 does freeze it to InvalidTransactionId per the code I cited in my first response on this thread, which wrongly removes a key lock. -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
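As a toy model (emphatically not PostgreSQL code) of the regression Noah describes: the tuple's xmax is a multixact whose members can include a locker that is still running, and replacing the whole xmax with InvalidTransactionId discards that member — and with it the key lock:

```python
# Toy model: xmax is either None (InvalidTransactionId) or a list of
# (xid, mode) multixact members.  All names here are invented.
RUNNING_XIDS = {100}   # transactions still in progress

def freeze_931(members):
    """9.3.1 left a multixact later than the freeze cutoff untouched."""
    return members

def freeze_932(members):
    """Sketch of the 9.3.2 regression: the multixact is replaced wholesale
    with InvalidTransactionId, members and all."""
    return None

def key_lock_held_by(members, xid):
    return members is not None and any(m == xid and mode == "keyshare"
                                       for m, mode in members)

xmax = [(100, "keyshare"), (90, "update")]   # xid 100 still holds KEY SHARE
survives = key_lock_held_by(freeze_931(xmax), 100)
removed = key_lock_held_by(freeze_932(xmax), 100)
```

The model only captures the observable effect (a live KEY SHARE lock silently vanishing), not the actual freezing logic or WAL interaction.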
Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing
On 2013-12-03 10:29:54 -0500, Noah Misch wrote: > Sorry, my original report was rather terse. I speak of the scenario exercised > by the second permutation in that isolationtester spec. The multixact is > later than VACUUM's cutoff_multi, so 9.3.1 does not freeze it at all. 9.3.2 > does freeze it to InvalidTransactionId per the code I cited in my first > response on this thread, which wrongly removes a key lock. That one is clear; I was only confused about the Assert() you reported, but I think I've explained that elsewhere. I currently don't see how to fix the erroneous freezing of lockers (not the updater, though) without changing the WAL format or synchronously waiting for all lockers to end, both of which seem like a no-go. While it's still a major bug, it seems much better than the previous case of either inaccessible or reappearing rows. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing
Hi, On 2013-12-03 10:29:54 -0500, Noah Misch wrote: > On Tue, Dec 03, 2013 at 04:08:23PM +0100, Andres Freund wrote: > > On 2013-12-03 09:16:18 -0500, Noah Misch wrote: > > > On Tue, Dec 03, 2013 at 11:56:07AM +0100, Andres Freund wrote: > > > > On 2013-12-03 00:47:07 -0500, Noah Misch wrote: > > > > > On Sat, Nov 30, 2013 at 01:06:09AM +, Alvaro Herrera wrote: > > > > Any idea how to cheat our way out of that one given the current way > > > > heap_freeze_tuple() works (running on both primary and standby)? My only > > > > idea was to MultiXactIdWait() if !InRecovery but that's extremly grotty. > > > > We can't even realistically create a new multixact with fewer members > > > > with the current format of xl_heap_freeze. > > > > > > Perhaps set HEAP_XMAX_LOCK_ONLY on the tuple? We'd then ensure all > > > update XID > > > consumers check HEAP_XMAX_IS_LOCKED_ONLY() first, much like xmax > > > consumers are > > > already expected to check HEAP_XMAX_INVALID first. Seems doable, albeit > > > yet > > > another injection of complexity. > > > > I think its pretty much checked that way already, but the problem seems > > to be how to avoid checks on xid commit/abort in that case. I've > > complained in 20131121200517.gm7...@alap2.anarazel.de that the old > > pre-condition that multixacts aren't checked when they can't be relevant > > (via OldestVisibleM*) isn't observed anymore. > > So, if we re-introduce that condition again, we should be on the safe > > side with that, right? > > What specific commit/abort checks do you have in mind? MultiXactIdIsRunning() does a TransactionIdIsInProgress() for each member which in turn does TransactionIdDidCommit(). Similar when locking a tuple that's locked/updated without a multixact where we go for a TransactionIdIsInProgress() in XactLockTableWait(). 
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] Dynamic configuration via LDAP in postmaster
I need advice about the best place to add such a feature. Currently I have found that the postmaster has an event loop (including SIGHUP handling) inside PostgresMain (postgres.c) for reloading the configuration file; based on my investigation, my plan is to handle LDAP events just before SIGHUP. PS: I guess tomorrow I will start implementing this feature inside the postmaster, but before I start I would like an expert opinion about the best place for dispatching incoming messages (that the configuration has changed, etc.) from LDAP. -- Regards, Soshnikov Vasily mailto:dedok@gmail.com
Review: [HACKERS] ECPG infrastructure changes part 1, was: Re: ECPG fixes
The changes are very well divided into logical units, so I think I could understand the ideas. I'm not too familiar with the ecpg internals, so consider this at least a coding review.

git apply: clean, except for one finding below.
Build: no errors/warnings.
Documentation build: no errors/warnings; the changes do appear in ecpg.html.
Regression tests: all passed.
Non-Linux platforms: I can't verify, but I haven't noticed any new dependencies added.
Comments (in the source code): good.

(My) additional comments:

22.patch: tuples_left > 0 instead of just tuples_left seems safer to me in for-loops. Not sure if there's a convention for this, though.

23.patch:

git apply --verbose ~/cybertec/patches/ecpq/23.patch
/home/anton/cybertec/patches/ecpq/23.patch:494: space before tab in indent.
    /*--
/home/anton/cybertec/patches/ecpq/23.patch:495: space before tab in indent.
    translator: this string will be truncated at 149 characters expanded. */
/home/anton/cybertec/patches/ecpq/23.patch:4019: trailing whitespace.
    exec sql close :curname;

Tests - 23.patch, src/interfaces/ecpg/test/sql/cursorsubxact.pgc:

/*
 * Test the implicit RELEASE SAVEPOINT if a SAVEPOINT
 * is used with an already existing name.
 */

Shouldn't it be "... if a CURSOR is used with an already existing name"? Or just "... implicit RELEASE SAVEPOINT after an error"? I'd also appreciate a comment on where exactly the savepoint is (implicitly) released.

23.patch and 24.patch: SO_MAJOR_VERSION and also interfaces/ecpg/include/ecpglib.h are changed, thus all client applications probably need to be preprocessed & compiled against PG 9.4. I don't know how this is usually enforced; I'm mentioning it for the sake of completeness.
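On the tuples_left remark above: the reason `> 0` is the safer guard is that a counter decremented in batch-sized steps can overshoot zero, and a negative value is still "true" to a bare truthiness test. A small illustration (hypothetical, not the ecpg code):

```python
def drain(total, batch):
    """Consume `total` items in chunks of `batch`; returns (iterations, leftover)."""
    iterations, tuples_left = 0, total
    while tuples_left > 0:        # a bare `while tuples_left:` would keep looping
        tuples_left -= batch      # once the counter overshoots past zero,
        iterations += 1           # since a negative count is still truthy
        if iterations > 1000:     # safety valve for the demonstration only
            break
    return iterations, tuples_left
```

With `> 0` the loop terminates even when 10 items are drained in chunks of 4 and the counter lands on -2; with the bare test it would spin until the safety valve.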
// Antonin Houska (Tony) On 11/28/2013 03:21 PM, Boszormenyi Zoltan wrote: > 2013-11-20 14:53 keltezéssel, Boszormenyi Zoltan írta: >> 2013-11-20 14:41 keltezéssel, Boszormenyi Zoltan írta: >>> 2013-11-12 07:15 keltezéssel, Boszormenyi Zoltan írta: 2013-11-12 07:01 keltezéssel, Noah Misch írta: > On Mon, Nov 11, 2013 at 10:17:54AM +0100, Boszormenyi Zoltan wrote: >> The old contents of my GIT repository was removed so you need to >> clone it fresh. https://github.com/zboszor/ecpg-readahead.git >> I won't post the humongous patch again, since sending a 90KB >> compressed file to everyone on the list is rude. > Patches of that weight show up on a regular basis. I don't think it's > rude. OK, here it is. >>> >>> ... >>> Subsequent patches will come as reply to this email. >> >> Infrastructure changes in ecpglib/execute.c to split up >> ECPGdo and ecpg_execute() and expose the parts as >> functions internal to ecpglib. > > Rebased after killing the patch that changed the DECLARE CURSOR command tag. > All infrastructure patches are attached, some of them compressed. > > Best regards, > Zoltán Böszörményi > > > > -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade segfaults when given an invalid PGSERVICE value
On Fri, Nov 29, 2013 at 04:14:00PM -0500, Bruce Momjian wrote: > On Thu, Mar 28, 2013 at 03:06:30PM -0400, Steve Singer wrote: > > So to summarise: > > > > Plan A: The first patch I attached for pg_upgrade + documentation > > changes, and changing the other places that call PQconndefaults() to > > accept failures on either out of memory, or an invalid PGSERVICE > > > > Plan B: Create a new function PQconndefaults2(char * errorBuffer) or > > something similar that returned error information to the caller. > > > > Plan C: PQconndefaults() just ignores an invalid service but > > connection attempts fail because other callers of > > conninfo_add_defaults still pay attention to connection failures. > > This is the second patch I sent. > > > > Plan D: Service lookup failures are always ignored by > > conninfo_add_defaults. If you attempt to connect with a bad > > PGSERVICE set it will behave as if no PGSERVICE value was set. I > > don't think anyone explicitly proposed this yet. > > > > Plan 'D' is the only option that I'm opposed to, it will effect a > > lot more applications then ones that call PQconndefaults() and I > > feel it will confuse users. > > > > I'm not convinced plan B is worth the effort of having to maintain > > two versions of PQconndefaults() for a while to fix a corner case. > > OK, I am back to looking at this issue from March. I went with option > C, patch attached, which seems to be the most popular. It is similar to > Steve's patch, but structured differently and includes a doc patch. Patch applied. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Trust intermediate CA for client certificates
On Mon, Dec 2, 2013 at 05:35:06PM -0500, Andrew Dunstan wrote: > > On 12/02/2013 04:17 PM, Tom Lane wrote: > >Bruce Momjian writes: > >>Sorry, I should have said: > >>Tom is saying that for his openssl version, a client that passed > >>an intermediate certificate had to supply a certificate _matching_ > >>something in the remote root.crt, not just signed by it. > >>At least I think that was the issue, rather than requiring the client to > >>supply a "root" certificate, meaning the client can supply an > >>intermediate or root certificate, as long as it appears in the > >>root.crt file on the remote end. > >As far as the server is concerned, anything listed in its root.crt *is* a > >trusted root CA. Doesn't matter if it's a child of some other CA. > > > But it does need to be signed by a trusted signatory. At least in my > test script (pretty ugly, but shown below for completeness), the > Intermediate CA cert is signed with the Root cert rather than being > self-signed as the Root cert is, and so if the server doesn't have > that root cert as a trusted cert the validation fails. > > In case 1, we put the root CA cert on the server and append the > intermediate CA cert to the client's cert. This succeeds. In case 2, > we put the intermediate CA cert on the server without the root CA's > cert, and use the bare client cert. This fails. In case 3, we put > both the root and the intermediate certs in the server's root.crt, > and use the bare client key, and as expected this succeeds. > > So the idea that you can just plonk any Intermediate CA cert in > root.crt and have all keys it signs validated is not true, AFAICT. > > OpenSSL version 1.0.0j was used in these tests, on a Fedora 16 box. OK, that behavior matches the behavior Ian observed and also matches my most recent doc patch. I know Tom saw something different, but unless he can reproduce it, I am thinking my doc patch is our best solution.
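Andrew's three cases can be captured in a toy trust-chain model (an assumption-laden sketch, not OpenSSL): validation walks issuers through the certs presented by the client plus the server's root.crt, and succeeds only when the walk terminates at a self-signed cert that is itself in root.crt — matching the pre-"partial chain" behavior observed in his tests:

```python
def validates(presented, store):
    """presented/store: lists of (name, issuer) pairs; a root has issuer == name.
    Succeed iff a chain from presented[0] reaches a self-signed cert in store."""
    available = {name: issuer for name, issuer in presented + store}
    store_names = {name for name, _ in store}
    name, seen = presented[0][0], set()
    while name in available and name not in seen:
        seen.add(name)
        issuer = available[name]
        if issuer == name:              # self-signed: the chain terminates here
            return name in store_names
        name = issuer
    return False                        # chain broke before reaching a root

# The three cases from Andrew's test script, in miniature.
case1 = validates([("client", "inter"), ("inter", "root")], [("root", "root")])
case2 = validates([("client", "inter")], [("inter", "root")])
case3 = validates([("client", "inter")], [("inter", "root"), ("root", "root")])
```

Case 2 fails precisely because the walk reaches "root" but no such cert exists anywhere, i.e. the intermediate alone is not enough.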
-- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Re: [HACKERS] Extension Templates S03E11
Stephen Frost writes: > That's not really 'solved' unless you feel we can depend on that "create > extension from URL" to work at pg_restore time... I wouldn't have > guessed that people would accept that, but I've already been wrong about > such things in this thread once. Basically, with the extra software I want to build out-of-core, what you have is an externally maintained repository and the scripts are downloaded at CREATE EXTENSION time. With the Extension Template, you then have a solid cache you can rely on at pg_restore time. -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add full object name to the tag field
Robert Haas wrote > On Mon, Dec 2, 2013 at 9:49 AM, Asit Mahato < > rigid.asit@ > > wrote: >> Hi all, >> >> I am a newbie. I am unable to understand the to do statement given below. >> >> Add full object name to the tag field. eg. for operators we need >> '=(integer, >> integer)', instead of just '='. >> >> please help me out with an example. >> >> Thanks and Regards, >> Asit Mahato > > Cast the OID of the operator to regoperator instead of regoper. This seems too simple an answer to be useful, and utterly confusing to the OP. The ToDo item in question is in the pg_dump/pg_restore section. It would seem to refer to this e-mail thread: "Re: pg_dump sort order for functions" http://www.postgresql.org/message-id/flat/9837222c1001120712t1e64eb2fg9f502757b6002...@mail.gmail.com#9837222c1001120712t1e64eb2fg9f502757b6002...@mail.gmail.com though there is no link to prior discussion attached to the ToDo item. The thread itself suggests there was yet prior discussion on the topic, though my quick search did not turn anything up. It seems that not all objects (though it appears functions are currently one exception) are fully descriptive in their tag/name output, which makes deterministic ordering more difficult. The goal is, say for operators, to output not only the base operator symbol (regoper) but the types associated with the left-hand and right-hand sides (regoperator). The additional type information makes the entire name unique (barring cross-schema conflicts at least, which can be mitigated) and allows for deterministic ordering. Asit, hopefully this gives you enough context to ask better questions which others can answer more fully. Also, it may help to give more details about yourself and your goals and not just throw out that you are a newbie. The latter gives people little guidance in how they should structure their help. David J.
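The point of the fuller tag can be shown with made-up dump entries: two overloads of `=` are indistinguishable by bare tag, so any sort keyed on them leaves their relative order arbitrary, while regoperator-style tags are unique and order deterministically:

```python
# Hypothetical dump entries for two overloads of the '=' operator.
entries = [
    {"tag": "=", "args": "(integer,integer)"},
    {"tag": "=", "args": "(text,text)"},
]

# Bare tags collide: sorting on them cannot distinguish the two entries.
bare = sorted(e["tag"] for e in entries)

# regoperator-style tags include the argument types, so they are unique
# and yield a reproducible dump order.
full = sorted(e["tag"] + e["args"] for e in entries)
```

The same argument applies to any object type whose bare name can be overloaded.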
Re: [HACKERS] Extension Templates S03E11
* Dimitri Fontaine (dimi...@2ndquadrant.fr) wrote: > Stephen Frost writes: > > That's not really 'solved' unless you feel we can depend on that "create > > extension from URL" to work at pg_restore time... I wouldn't have > > guessed that people would accept that, but I've already been wrong about > > such things in this thread once. > > Basically, with the extra software I want to build out-of-core, what you > have is an externally maintained repository and the scripts are > downloaded at CREATE EXTENSION time. I should have included above "unless we actually dump the extension objects out during pg_dump." > With the Extension Template, you then have a solid cache you can rely on > at pg_restore time. Extension templates are not needed for us to be able to dump and restore extensions. I do not understand why you continue to argue for extension templates as a solution to that problem when it's utter overkill and far worse than just dumping the extension objects out at pg_dump time and having a way to add them back as part of the extension on restore. I understand that you once proposed that and it was shot down but I think we need to move past that now that we've seen what the alternative is.. That isn't to say anything about the code or about you specifically, but, for my part, I really don't like nor see the value of sticking script blobs into the catalog as some kind of representation of database objects. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Time-Delayed Standbys
Hi Fabrizio, looks good to me. I did some testing on 9.2.4, 9.2.5 and HEAD. It applies and compiles w/o errors or warnings. I set up a master and two hot standbys replicating from the master, one with 5 minutes delay and one without delay. After that I created a new database and generated some test data: CREATE TABLE test (val INTEGER); INSERT INTO test (val) (SELECT * FROM generate_series(0, 100)); The non-delayed standby nearly instantly had the data replicated, the delayed standby was replicated after exactly 5 minutes. I did not notice any problems, errors or warnings. Greetings, CK -- Christian Kruse http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] 9.2.1 & index-only scans : abnormal heap fetches after VACUUM FULL
On Thu, Nov 28, 2013 at 05:38:05PM -0500, Bruce Momjian wrote: > > I wonder if we ought to mark each page as all-visible in > > raw_heap_insert() when we first initialize it, and then clear the flag > > when we come across a tuple that isn't all-visible. We could try to > > set hint bits on the tuple before placing it on the page, too, though > > I'm not sure of the details. > > I went with the per-page approach because I wanted to re-use the vacuum > lazy function. Is there some other code that does this already? I am > trying to avoid yet-another set of routines that would need to be > maintained or could be buggy. This hint bit setting is tricky. > > And thanks much for the review! So, should I put this in the next commit fest? I still have an unknown about the buffer number to use here: ! /* XXX use 0 or real offset? */ ! ItemPointerSet(&(tuple.t_self), BufferIsValid(buf) ? ! BufferGetBlockNumber(buf) : 0, offnum); Is everyone else OK with this approach? Updated patch attached. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. 
+ diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c new file mode 100644 index 951894c..44ae5d8 *** a/src/backend/access/heap/rewriteheap.c --- b/src/backend/access/heap/rewriteheap.c *** *** 107,112 --- 107,114 #include "access/rewriteheap.h" #include "access/transam.h" #include "access/tuptoaster.h" + #include "access/visibilitymap.h" + #include "commands/vacuum.h" #include "storage/bufmgr.h" #include "storage/smgr.h" #include "utils/memutils.h" *** typedef OldToNewMappingData *OldToNewMap *** 172,177 --- 174,180 /* prototypes for internal functions */ static void raw_heap_insert(RewriteState state, HeapTuple tup); + static void update_page_vm(Relation relation, Page page, BlockNumber blkno); /* *** end_heap_rewrite(RewriteState state) *** 280,285 --- 283,289 state->rs_buffer); RelationOpenSmgr(state->rs_new_rel); + update_page_vm(state->rs_new_rel, state->rs_buffer, state->rs_blockno); PageSetChecksumInplace(state->rs_buffer, state->rs_blockno); smgrextend(state->rs_new_rel->rd_smgr, MAIN_FORKNUM, state->rs_blockno, *** raw_heap_insert(RewriteState state, Heap *** 632,637 --- 636,642 */ RelationOpenSmgr(state->rs_new_rel); + update_page_vm(state->rs_new_rel, page, state->rs_blockno); PageSetChecksumInplace(page, state->rs_blockno); smgrextend(state->rs_new_rel->rd_smgr, MAIN_FORKNUM, *** raw_heap_insert(RewriteState state, Heap *** 677,679 --- 682,704 if (heaptup != tup) heap_freetuple(heaptup); } + + static void + update_page_vm(Relation relation, Page page, BlockNumber blkno) + { + Buffer vmbuffer = InvalidBuffer; + TransactionId visibility_cutoff_xid; + + visibilitymap_pin(relation, blkno, &vmbuffer); + Assert(BufferIsValid(vmbuffer)); + + if (!visibilitymap_test(relation, blkno, &vmbuffer) && + heap_page_is_all_visible(relation, InvalidBuffer, page, + &visibility_cutoff_xid)) + { + PageSetAllVisible(page); + visibilitymap_set(relation, blkno, InvalidBlockNumber, + InvalidXLogRecPtr, vmbuffer, visibility_cutoff_xid); + 
} + ReleaseBuffer(vmbuffer); + } diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c new file mode 100644 index 7f40d89..a42511b *** a/src/backend/access/heap/visibilitymap.c --- b/src/backend/access/heap/visibilitymap.c *** visibilitymap_set(Relation rel, BlockNum *** 278,284 map[mapByte] |= (1 << mapBit); MarkBufferDirty(vmBuf); ! if (RelationNeedsWAL(rel)) { if (XLogRecPtrIsInvalid(recptr)) { --- 278,284 map[mapByte] |= (1 << mapBit); MarkBufferDirty(vmBuf); ! if (RelationNeedsWAL(rel) && BufferIsValid(heapBuf)) { if (XLogRecPtrIsInvalid(recptr)) { diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c new file mode 100644 index fe2d9e7..4f6578f *** a/src/backend/commands/vacuumlazy.c --- b/src/backend/commands/vacuumlazy.c *** static void lazy_record_dead_tuple(LVRel *** 151,158 ItemPointer itemptr); static bool lazy_tid_reaped(ItemPointer itemptr, void *state); static int vac_cmp_itemptr(const void *left, const void *right); - static bool heap_page_is_all_visible(Relation rel, Buffer buf, - TransactionId *visibility_cutoff_xid); /* --- 151,156 *** lazy_vacuum_page(Relation onerel, BlockN *** 1197,1203 * check if the page has become all-visible. */ if (!visibilitymap_test(onerel, blkno, vmbuffer) && ! heap_page_is_all_visible(onerel, buffer, &visibility_cutoff_xid)) { Assert(BufferIsValid(*vmbuffer)); PageSe
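The check the patch reuses (heap_page_is_all_visible()) boils down to: a page's visibility-map bit may be set only if every tuple on it is visible to all current and future snapshots. A simplified sketch, with tuples modeled as dicts and hint bits, multixacts, and in-progress transactions all ignored:

```python
def page_all_visible(tuples, oldest_xmin):
    """May this rewritten heap page get its VM bit set?  Every tuple must
    have been inserted by a transaction that committed before the oldest
    xmin any snapshot could still see, and must not be deleted.  (Sketch
    only; the real check also handles hint bits and locked tuples.)"""
    return all(
        t["xmin_committed"] and t["xmin"] < oldest_xmin and t["xmax"] is None
        for t in tuples
    )
```

A single tuple inserted by a still-visible-to-someone transaction (xmin at or above the cutoff) is enough to keep the whole page's bit clear, which is why Robert suggests clearing a per-page flag as soon as such a tuple is placed.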
Re: [HACKERS] Extension Templates S03E11
Robert Haas writes: > In more normal cases, however, the system can (and probably should) > figure out what was intended by choosing the *shortest* path to get to > the intended version. For example, if someone ships 1.0, 1.0--1.1, > 1.1, and 1.1--1.2, the system should choose to run 1.1 and then > 1.1--1.2, not 1.0 and then 1.0--1.1 and then 1.1--1.2. But that can > be automatic: only if there are two paths of equal length (as in the > example in the previous paragraph) do we need help from the user to > figure out what to do. Yeah. > We should also consider the possibility of a user trying to > deliberately install an older release. For example, if the user has > 1.0, 1.0--1.1, 1.1, 1.1--1.2, and 1.2--1.0 (a downgrade script) with > default_full_version = 1.2, an attempt to install 1.0 should run just > the 1.0 script, NOT 1.2 and then 1.2--1.0. In what I did, if you want version 1.0 and we have a script --1.0.sql around, then we just use that script, never kicking the path chooser. The path chooser at CREATE EXTENSION time is only exercised when we don't have a direct script supporting the specific version you're asking for. > Putting all that together, I'm inclined to suggest that what we really > need is a LIST of version numbers, rather than just one. If one > path to the version we're installing is shorter than any other, we > choose that, period. If there are multiple paths of equal length, we That's what Jeff did propose, yes. > break the tie by choosing which version number appears first in the > aforementioned list. If that still doesn't break the tie, either > because none of the starting points are mentioned in that list or > because there are multiple equal-length paths starting in the same > place, we give up and emit an error. Jeff also mentioned tiebreakers, without going into any level of detail. 
We won't be able to just use default_version as the tiebreaker list here, because of the following example: default_version = 1.2, 1.0 create extension foo version '1.1'; With such a setup it would prefer 1.2--1.1 to 1.0--1.1, which doesn't look like what we want. Instead, we want default_version = 1.2 create_from_version_candidates = 1.0 create extension foo version '1.1'; Then the tiebreaker is the 1.0 in "create_from_version_candidates" so we would run foo--1.0.sql and then foo--1.0--1.1.sql. Comments? Barring objections, I'm going to prepare a new branch to support developing that behavior against only file-based extensions, and submit a spin-off patch to the current CF entry. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
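Robert's shortest-path rule plus the tiebreaker list discussed here can be sketched as a breadth-first search over the available scripts. This is an illustration of the proposed behavior, not PostgreSQL's actual implementation; the function and argument names are made up.

```python
def choose_install_path(full_scripts, upgrade_scripts, target, preference=()):
    """Pick the shortest script sequence that installs `target`.
    full_scripts: versions having a standalone foo--X.sql.
    upgrade_scripts: (from, to) pairs having a foo--X--Y.sql.
    A direct script always wins outright; otherwise ties between
    equally short paths go to the starting version listed first in
    `preference` (the create_from_version_candidates idea)."""
    if target in full_scripts:
        return [target]
    level = [[v] for v in full_scripts]          # BFS frontier, one step at a time
    while level:
        done = [p for p in level if p[-1] == target]
        if done:
            if len(done) > 1:                    # equal-length paths: need the tiebreaker
                done = sorted((p for p in done if p[0] in preference),
                              key=lambda p: preference.index(p[0]))
                if not done:
                    raise ValueError("ambiguous upgrade path; no tiebreaker applies")
            return done[0]
        level = [p + [dst]
                 for p in level
                 for src, dst in upgrade_scripts
                 if src == p[-1] and dst not in p]
    raise ValueError("no script path to version " + target)
```

With Robert's first example (1.0, 1.0--1.1, 1.1, 1.1--1.2), installing 1.2 runs 1.1 then 1.1--1.2; with Dimitri's example above, preferring 1.0 as the creation candidate yields foo--1.0.sql then foo--1.0--1.1.sql; and installing a version that has its own full script never enters the chooser at all.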
Re: [HACKERS] Dynamic configuration via LDAP in postmaster
On 12/03/2013 05:44 PM, Vasily Soshnikov wrote: I need advice about the best place for adding such a feature. Currently I found that 'postmaster' has an event loop (including handling SIGHUP) inside PostgresMain (postgres.c) for reloading the configuration file; based on my investigation, my plan is to handle LDAP events just before SIGHUP. PS I guess tomorrow I will start implementing this feature inside 'postmaster', but before I start I wish to get an expert opinion about the best place for dispatching incoming messages (about configuration changes etc.) from LDAP. postmaster is kept very small on purpose, because if that process dies, it will take the whole server down with it. I'd suggest writing a custom background worker that talks with the ldap server. It can then write the changes to the configuration files, and send SIGHUP to reload. - Heikki
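Heikki's suggested division of labor — a worker process that watches LDAP, rewrites a configuration file, and then signals the server to reload — can be sketched like this. The function names and file layout here are illustrative assumptions; a real worker would use the C background-worker API and PostgreSQL's own configuration-file parsing and quoting code.

```python
import os
import signal

def render_conf(settings):
    """Render a settings dict as postgresql.conf-style lines.  Single
    quotes inside values are doubled, matching the file's quoting rules."""
    lines = ["%s = '%s'" % (name, str(value).replace("'", "''"))
             for name, value in sorted(settings.items())]
    return "\n".join(lines) + "\n"

def push_settings(settings, conf_path, postmaster_pid):
    """What the hypothetical ldap-watcher worker does when it sees a
    change: rewrite its include file, then ask the server to reload.
    Nothing runs inside postmaster itself, so a crash in this code
    cannot take the whole server down."""
    with open(conf_path, "w") as f:
        f.write(render_conf(settings))
    os.kill(postmaster_pid, signal.SIGHUP)   # triggers the config reload
```

The key design point, as Heikki notes, is that the reload path already exists (SIGHUP); the worker only needs to produce a file the existing machinery can read.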
Re: Review: [HACKERS] ECPG infrastructure changes part 1, was: Re: ECPG fixes
Thanks for the review. On 2013-12-03 16:48, Antonin Houska wrote: The changes are very well divided into logical units, so I think I could understand the ideas. I'm not too familiar with the ecpg internals, so consider this at least a coding review. git apply: Clean, except for one finding below Build: no errors/warnings Documentation build: no errors/warnings. The changes do appear in ecpg.html. Regression tests: all passed Non-Linux platforms: I can't verify, but I haven't noticed any new dependencies added. Comments (in the source code): good. (My) additional comments: 22.patch tuples_left > 0 instead of just tuples_left seems to me safer in for-loops. Not sure if there's a convention for this though. I'll look at it, maybe the >0 had a reason. 23.patch git apply --verbose ~/cybertec/patches/ecpq/23.patch /home/anton/cybertec/patches/ecpq/23.patch:494: space before tab in indent. /*-- /home/anton/cybertec/patches/ecpq/23.patch:495: space before tab in indent. translator: this string will be truncated at 149 characters expanded. */ /home/anton/cybertec/patches/ecpq/23.patch:4019: trailing whitespace. exec sql close :curname; I will fix the extra spaces. Tests - 23.patch src/interfaces/ecpg/test/sql/cursorsubxact.pgc /* * Test the implicit RELEASE SAVEPOINT if a SAVEPOINT * is used with an already existing name. */ Shouldn't it be "... if a CURSOR is used with an already existing name?". Or just "... implicit RELEASE SAVEPOINT after an error"? I'd also appreciate a comment where exactly the savepoint is (implicitly) released. If you do this: SAVEPOINT A; SAVEPOINT A; /* same savepoint name */ then the operations between the two are implicitly committed to the higher subtransaction, i.e. it works as if there was a RELEASE SAVEPOINT A; before the second SAVEPOINT A; It happens to be tested with a DECLARE CURSOR statement and the runtime has to refresh its knowledge about the cursor by propagating it into a subtransaction one level higher. 
23.patch and 24.patch - SO_MAJOR_VERSION and also interfaces/ecpg/include/ecpglib.h is changed Thus all client applications probably need to be preprocessed & compiled against the PG 9.4. I don't know how this is usually enforced. I'm mentioning it for the sake of completeness. The soversion has changed because of the changes in already existing exported functions. // Antonin Houska (Tony) On 11/28/2013 03:21 PM, Boszormenyi Zoltan wrote: On 2013-11-20 14:53, Boszormenyi Zoltan wrote: On 2013-11-20 14:41, Boszormenyi Zoltan wrote: On 2013-11-12 07:15, Boszormenyi Zoltan wrote: On 2013-11-12 07:01, Noah Misch wrote: On Mon, Nov 11, 2013 at 10:17:54AM +0100, Boszormenyi Zoltan wrote: The old contents of my GIT repository were removed so you need to clone it fresh. https://github.com/zboszor/ecpg-readahead.git I won't post the humongous patch again, since sending a 90KB compressed file to everyone on the list is rude. Patches of that weight show up on a regular basis. I don't think it's rude. OK, here it is. ... Subsequent patches will come as replies to this email. Infrastructure changes in ecpglib/execute.c to split up ECPGdo and ecpg_execute() and expose the parts as functions internal to ecpglib. Rebased after killing the patch that changed the DECLARE CURSOR command tag. All infrastructure patches are attached, some of them compressed. Best regards, Zoltán Böszörményi -- -- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing
Hi Alvaro, Noah, On 2013-12-03 15:57:10 +0100, Andres Freund wrote: > On 2013-12-03 09:48:23 -0500, Tom Lane wrote: > > Andres Freund writes: > > > On 2013-12-03 00:47:07 -0500, Noah Misch wrote: > > >> The test spec additionally > > >> covers a (probably-related) assertion failure, new in 9.3.2. > > > > > Too bad it's too late to do anything about it for 9.3.2. :(. At least the > > > last seems actually unrelated, I am not sure why it's 9.3.2 > > > only. Alvaro, are you looking? > > > > Is this bad enough that we need to re-wrap the release? > > Tentatively I'd say no, the only risk is losing locks afaics. That's > much better than corrupting rows as in 9.3.1. But I'll look into it in > a bit more detail as soon as I am off the phone call I am on. After looking, I think I am revising my opinion. The broken locking behaviour (outside of freeze, which I don't see how we can fix in time), is actually bad. Would that stop us from making the release date with packages? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Allowing parallel pg_restore from pipe
On Wed, Apr 24, 2013 at 03:33:42PM -0400, Andrew Dunstan wrote: > > On 04/23/2013 07:53 PM, Timothy Garnett wrote: > >Hi All, > > > >Currently the -j option to pg_restore, which allows for > >parallelization in the restore, can only be used if the input file > >is a regular file and not, for ex., a pipe. However this is a > >pretty common occurrence for us (usually in the form of pg_dump | > >pg_restore to copy an individual database or some tables thereof > >from one machine to another). While there's no good way to > >parallelize the data load steps when reading from a pipe, the > >index and constraint building can still be parallelized and as > >they are generally CPU bound on our machines we've found quite a > >bit of speedup from doing so. > > > >Attached is two diffs off of the REL9_2_4 tag that I've been > >using. The first is a simple change that serially loads the data > >section before handing off the remainder of the restore to the > >existing parallelized restore code (the .ALT. diff). The second > >which gets more parallelization but is a bit more of a change uses > >the existing dependency analysis code to allow index building etc. > >to occur in parallel with data loading. The data loading tasks are > >still performed serially in the main thread, but non-data loading > >tasks are scheduled in parallel as their dependencies are > >satisfied (with the caveat that the main thread can only dispatch > >new tasks between data loads). > > > >Anyways, the question is if people think this is generally useful. > >If so I can clean up the preferred choice a bit and rebase it off > >of master, etc. > > > I don't think these are bad ideas at all, and probably worth doing. > Note that there are some fairly hefty changes affecting this code in > master, so your rebasing could be tricky. Is there any progress on this: doing parallel pg_restore from a pipe? -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. 
+
Re: [HACKERS] Extension Templates S03E11
Stephen Frost writes: > I understand that you once proposed that and it was shot down but I > think we need to move past that now that we've seen what the alternative > is.. That isn't to say anything about the code or about you > specifically, but, for my part, I really don't like nor see the value of > sticking script blobs into the catalog as some kind of representation of > database objects. Well yeah, I'm having quite a hard time withdrawing that proposal, which is the fourth one in three years, and that had been proposed to me on this very mailing list, and got the infamous "community buy-in" about a year ago. It's always a hard time when you're being told that the main constraints you had to work with are suddenly no more, because after all this work, we realize that imposing those constraints actually made no sense. I understand that it can happen, it still really sucks when it does. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Time-Delayed Standbys
On Tue, Dec 3, 2013 at 2:33 PM, Christian Kruse wrote: > Hi Fabrizio, > > looks good to me. I did some testing on 9.2.4, 9.2.5 and HEAD. It > applies and compiles w/o errors or warnings. I set up a master and two > hot standbys replicating from the master, one with 5 minutes delay and > one without delay. After that I created a new database and generated > some test data: > > CREATE TABLE test (val INTEGER); > INSERT INTO test (val) (SELECT * FROM generate_series(0, 100)); > > The non-delayed standby nearly instantly had the data replicated, the > delayed standby was replicated after exactly 5 minutes. I did not > notice any problems, errors or warnings. > > Thanks for your review Christian... Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL >> Timbira: http://www.timbira.com.br >> Blog sobre TI: http://fabriziomello.blogspot.com >> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello >> Twitter: http://twitter.com/fabriziomello
Re: [HACKERS] UNNEST with multiple args, and TABLE with multiple funcs
2013/12/3 Stephen Frost > * Tom Lane (t...@sss.pgh.pa.us) wrote: > > After sleeping on it, your other suggestion of TABLE OF, or possibly > > TABLE FROM, is starting to grow on me. > > > > Who else has an opinion? > > Alright, for my 2c, I like having this syntax include 'TABLE' simply > because it's what folks coming from Oracle might be looking for. > Following from that, to keep it distinct from the spec's notion of > 'TABLE', my preference is 'TABLE FROM'. I don't particularly like > 'TABLE OF', nor do I like the various 'ROWS' suggestions. > +1 Pavel > > Thanks, > > Stephen >
Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing
Andres Freund writes: >> On 2013-12-03 09:48:23 -0500, Tom Lane wrote: >>> Is this bad enough that we need to re-wrap the release? > After looking, I think I am revising my opinion. The broken locking > behaviour (outside of freeze, which I don't see how we can fix in time), > is actually bad. > Would that stop us from making the release date with packages? That's hardly answerable when you haven't specified how long you think it'd take to fix. In general, though, I'm going to be exceedingly unhappy if this release introduces new regressions. If we have to put off the release to fix something, maybe we'd better do so. And we'd damn well better get it right this time. regards, tom lane
Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing
On 2013-12-03 12:22:33 -0500, Tom Lane wrote: > Andres Freund writes: > >> On 2013-12-03 09:48:23 -0500, Tom Lane wrote: > >>> Is this bad enough that we need to re-wrap the release? > > > After looking, I think I am revising my opinion. The broken locking > > behaviour (outside of freeze, which I don't see how we can fix in time), > > is actually bad. > > Would that stop us from making the release date with packages? > > That's hardly answerable when you haven't specified how long you > think it'd take to fix. There are two things that are broken as-is: 1) the freezing of multixacts: The new state is much better than the old one because the old one corrupted data, while the new one sometimes removes locks when you explicitly FREEZE. 2) The broken locking behaviour in Noah's test without the FREEZE. I don't see a realistic chance of fixing 1) in 9.3. Not even sure if it can be done without changing the freeze WAL format. But 2) should be fixed and is basically a one-liner + comments + test. Alvaro? > In general, though, I'm going to be exceedingly unhappy if this release > introduces new regressions. If we have to put off the release to fix > something, maybe we'd better do so. And we'd damn well better get it > right this time. I think that's really hard for the multixacts stuff. There's lots of stuff not really okay in there, but we can't do much about that now :( Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] WITHIN GROUP patch
Atri Sharma writes: > Please find attached the latest patch for WITHIN GROUP. This patch is > after fixing the merge conflicts. I've started to look at this patch now. I have a couple of immediate reactions to the catalog changes: 1. I really hate the way you've overloaded the transvalue to do something that has approximately nothing to do with transition state (and haven't updated catalogs.sgml to explain that, either). Seems like it'd be cleaner to just hardwire a bool column that distinguishes regular and hypothetical input rows. And why do you need aggtranssortop for that? I fail to see the point of sorting on the flag column. 2. I also don't care for the definition of aggordnargs, which is the number of direct arguments to an ordered set function, except when it isn't. Rather than overloading it to be both a count and a couple of flags, I wonder whether we shouldn't expand aggisordsetfunc to be a three-way "aggregate kind" field (ordinary agg, ordered set, or hypothetical set function). regards, tom lane
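Tom's three-way "aggregate kind" suggestion can be sketched as an enum. The names and one-letter codes below are illustrative only, not the committed catalog definition:

```python
from enum import Enum

class AggKind(Enum):
    """Hypothetical replacement for a bool aggisordsetfunc column:
    one field that distinguishes all three kinds of aggregate."""
    ORDINARY = "n"        # plain aggregate, e.g. sum()
    ORDERED_SET = "o"     # e.g. percentile_disc(...) WITHIN GROUP (...)
    HYPOTHETICAL = "h"    # e.g. rank(...) WITHIN GROUP (...)

def takes_within_group(kind):
    # Both non-ordinary kinds use the WITHIN GROUP syntax; with the kind
    # explicit, a direct-argument count column can stay a plain count.
    return kind in (AggKind.ORDERED_SET, AggKind.HYPOTHETICAL)
```

The point of the design is that consumers branch on one explicit discriminator instead of decoding flags folded into an argument count.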
Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing
Andres Freund writes: > Any idea how to cheat our way out of that one given the current way > heap_freeze_tuple() works (running on both primary and standby)? My only > idea was to MultiXactIdWait() if !InRecovery but that's extremely grotty. > We can't even realistically create a new multixact with fewer members > with the current format of xl_heap_freeze. Maybe we should just bite the bullet and change the WAL format for heap_freeze (inventing an all-new record type, not repurposing the old one, and allowing WAL replay to continue to accept the old one). The implication for users would be that they'd have to update slave servers before the master when installing the update; which is unpleasant, but better than living with a known data corruption case. regards, tom lane
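The compatibility scheme Tom describes — introduce a new record type while replay keeps accepting the old one — is the standard trick for WAL-format transitions. A sketch with made-up record ids (the real ids and payloads are not these):

```python
XLOG_FREEZE_OLD = 0x30   # hypothetical id of the legacy freeze record
XLOG_FREEZE_NEW = 0x40   # hypothetical id of the replacement record

def replay(record_id, payload, handlers):
    """Dispatch one WAL record to its redo handler.  A standby running
    the updated code keeps a handler for the old record, so it can still
    replay WAL written by a not-yet-updated master; a record id with no
    handler is the PANIC-on-unrecognized-record case, which is why the
    standby must be updated before the master."""
    try:
        handler = handlers[record_id]
    except KeyError:
        raise RuntimeError("PANIC: unrecognized WAL record id 0x%x" % record_id)
    return handler(payload)
```

Dropping the old handler later (once no supported master can emit the old record) completes the transition.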
Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing
On Tue, Dec 3, 2013 at 7:11 PM, Tom Lane wrote: > Andres Freund writes: > > Any idea how to cheat our way out of that one given the current way > > heap_freeze_tuple() works (running on both primary and standby)? My only > > idea was to MultiXactIdWait() if !InRecovery but that's extremely grotty. > > We can't even realistically create a new multixact with fewer members > > with the current format of xl_heap_freeze. > > Maybe we should just bite the bullet and change the WAL format for > heap_freeze (inventing an all-new record type, not repurposing the old > one, and allowing WAL replay to continue to accept the old one). The > implication for users would be that they'd have to update slave servers > before the master when installing the update; which is unpleasant, but > better than living with a known data corruption case. > Agreed. It may suck, but it sucks less. How badly will it break if they do the upgrade in the wrong order, though? Will the slaves just stop (I assume this?) or is there a risk of a wrong-order upgrade causing extra breakage? And if they do shut down, would just upgrading the slave fix it, or would they then have to rebuild the slave? (actually, don't we recommend they always rebuild the slave *anyway*? In which case the problem is even smaller.) I think we've always told people to upgrade the slave first, and it's the logical thing that AFAIK most other systems require as well, so that's not an unreasonable requirement at all. I assume we'd then get rid of the old record type completely in 9.4, right? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Extension Templates S03E11
On Dec 3, 2013, at 9:14 AM, Dimitri Fontaine wrote: > I understand that it can happen, it still really sucks when it does. I have not followed this project closely, Dimitri, but I for one have appreciated your tenacity in following through on it. Extensions are awesome, thanks to you, and I’m happy to see all efforts to make it more so. Thank you. Best, David
Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing
On Tue, Dec 03, 2013 at 04:37:58PM +0100, Andres Freund wrote: > On 2013-12-03 10:29:54 -0500, Noah Misch wrote: > > Sorry, my original report was rather terse. I speak of the scenario > > exercised > > by the second permutation in that isolationtester spec. The multixact is > > later than VACUUM's cutoff_multi, so 9.3.1 does not freeze it at all. 9.3.2 > > does freeze it to InvalidTransactionId per the code I cited in my first > > response on this thread, which wrongly removes a key lock. > > That one is clear, I was only confused about the Assert() you > reported. But I think I've explained that elsewhere. > > I currently don't see fixing the erroneous freezing of lockers (not the > updater though) without changing the wal format or synchronously waiting > for all lockers to end. Which both seem like a no-go? Not fixing it at all is the real no-go. We'd take both of those undesirables before just tolerating the lost locks in 9.3. The attached patch illustrates the approach I was describing earlier. It fixes the test case discussed above. I haven't verified that everything else in the system is ready for it, so this is just for illustration purposes. -- Noah Misch EnterpriseDB http://www.enterprisedb.com *** a/src/backend/access/heap/heapam.c --- b/src/backend/access/heap/heapam.c *** *** 5384,5412 heap_freeze_tuple(HeapTupleHeader tuple, TransactionId cutoff_xid, TransactionId update_xid; /* !* This is a multixact which is not marked LOCK_ONLY, but which !* is newer than the cutoff_multi. If the update_xid is below the !* cutoff_xid point, then we can just freeze the Xmax in the !* tuple, removing it altogether. This seems simple, but there !* are several underlying assumptions: !* !* 1. A tuple marked by an multixact containing a very old !* committed update Xid would have been pruned away by vacuum; we !* wouldn't be freezing this tuple at all. !* !* 2. There cannot possibly be any live locking members remaining !* in the multixact. 
This is because if they were alive, the !* update's Xid would had been considered, via the lockers' !* snapshot's Xmin, as part the cutoff_xid. !* !* 3. We don't create new MultiXacts via MultiXactIdExpand() that !* include a very old aborted update Xid: in that function we only !* include update Xids corresponding to transactions that are !* committed or in-progress. */ update_xid = HeapTupleGetUpdateXid(tuple); if (TransactionIdPrecedes(update_xid, cutoff_xid)) ! freeze_xmax = true; } } else if (TransactionIdIsNormal(xid) && --- 5384,5404 TransactionId update_xid; /* !* This is a multixact which is not marked LOCK_ONLY, but which is !* newer than the cutoff_multi. To advance relfrozenxid, we must !* eliminate any updater XID that falls below cutoff_xid. !* Affected multixacts may also contain outstanding lockers, which !* we must preserve. Forming a new multixact is tempting, but !* solving it that way requires a WAL format change. Instead, !* mark the tuple LOCK_ONLY. The updater XID is still present, !* but consuming code will notice LOCK_ONLY and assume it's gone. */ update_xid = HeapTupleGetUpdateXid(tuple); if (TransactionIdPrecedes(update_xid, cutoff_xid)) ! { ! tuple->t_infomask |= HEAP_XMAX_LOCK_ONLY; ! changed = true; ! } } } else if (TransactionIdIsNormal(xid) && -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
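The core of Noah's illustration patch is a single decision: when the multixact's updater XID precedes the freeze cutoff, set HEAP_XMAX_LOCK_ONLY rather than clearing xmax, so any still-live lockers survive. A sketch of that decision (plain integer comparison stands in for TransactionIdPrecedes, which really does modular XID arithmetic; the bit value is borrowed from htup_details.h):

```python
HEAP_XMAX_LOCK_ONLY = 0x0080   # infomask bit: xmax holds only locks, no update

def freeze_shared_xmax(update_xid, cutoff_xid, infomask):
    """If the updater XID in a shared (multixact) xmax falls below the
    freeze cutoff, keep the xmax -- clearing it would drop live lockers --
    and mark it LOCK_ONLY so readers ignore the dead updater.  Returns
    (new_infomask, changed)."""
    if update_xid < cutoff_xid:
        return infomask | HEAP_XMAX_LOCK_ONLY, True
    return infomask, False
```

This avoids the WAL-format change debated in the thread, at the cost of leaving a stale updater XID behind a flag that consumers must honor.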
Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing
Tom Lane wrote: > Andres Freund writes: > > Any idea how to cheat our way out of that one given the current way > > heap_freeze_tuple() works (running on both primary and standby)? My only > > idea was to MultiXactIdWait() if !InRecovery but that's extremely grotty. > > We can't even realistically create a new multixact with fewer members > > with the current format of xl_heap_freeze. > > Maybe we should just bite the bullet and change the WAL format for > heap_freeze (inventing an all-new record type, not repurposing the old > one, and allowing WAL replay to continue to accept the old one). The > implication for users would be that they'd have to update slave servers > before the master when installing the update; which is unpleasant, but > better than living with a known data corruption case. That was my suggestion too (modulo, I admit, the bit about it being a new, separate record type.) -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing
Magnus Hagander writes: > On Tue, Dec 3, 2013 at 7:11 PM, Tom Lane wrote: >> Maybe we should just bite the bullet and change the WAL format for >> heap_freeze (inventing an all-new record type, not repurposing the old >> one, and allowing WAL replay to continue to accept the old one). The >> implication for users would be that they'd have to update slave servers >> before the master when installing the update; which is unpleasant, but >> better than living with a known data corruption case. > Agreed. It may suck, but it sucks less. > How badly will it break if they do the upgrade in the wrong order though. > Will the slaves just stop (I assume this?) or is there a risk of a > wrong-order upgrade causing extra breakage? I assume what would happen is the slave would PANIC upon seeing a WAL record code it didn't recognize. Installing the updated version should allow it to resume functioning. Would be good to test this, but if it doesn't work like that, that'd be another bug to fix IMO. We've always foreseen the possible need to do something like this, so it ought to work reasonably cleanly. > I assume we'd then get rid of the old record type completely in 9.4, right? Yeah, we should be able to drop it in 9.4, since we'll surely have other WAL format changes anyway. regards, tom lane
Re: [HACKERS] Dynamic configuration via LDAP in postmaster
Thank you for the advice. Nowadays we (the company where I work) use such a scheme, but that scheme is not always useful at the stage of development of the back-end infrastructure. Also, we have found a better solution: have LDAP for dynamic configuration out of the box. So the question is the following: if you got such a task, where would you put the LDAP functionality inside the postgres daemon?

PS I have read the basics of the PostgreSQL ideology and I know why you guys keep the main server tiny in functionality, and this is great, but we would like to have a more complex solution. This feature may be useful not only for me and my company.

On 3 December 2013 20:54, Heikki Linnakangas wrote:
> On 12/03/2013 05:44 PM, Vasily Soshnikov wrote:
>> I need advice about where is the best place for adding such features.
>>
>> Currently I found that 'postmaster' has an event loop (including handling
>> SIGHUP) inside PostgresMain (postgres.c) for reloading the configuration
>> file; based on my investigation, my plan is to handle LDAP events just
>> before SIGHUP.
>>
>> PS I guess tomorrow I will start implementing this feature inside
>> 'postmaster', but before I start I wish to know expert opinion about the
>> best place for dispatching incoming messages (that the configuration has
>> changed, etc.) from LDAP.
>
> postmaster is kept very small on purpose, because if that process dies, it
> will take the whole server down with it. I'd suggest writing a custom
> background worker that talks with the ldap server. It can then write the
> changes to the configuration files, and send SIGHUP to reload.
>
> - Heikki

-- 
Regards,
Soshnikov Vasily
mailto:dedok@gmail.com
Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing
On Tue, Dec 3, 2013 at 7:20 PM, Tom Lane wrote: > Magnus Hagander writes: > > On Tue, Dec 3, 2013 at 7:11 PM, Tom Lane wrote: > >> Maybe we should just bite the bullet and change the WAL format for > >> heap_freeze (inventing an all-new record type, not repurposing the old > >> one, and allowing WAL replay to continue to accept the old one). The > >> implication for users would be that they'd have to update slave servers > >> before the master when installing the update; which is unpleasant, but > >> better than living with a known data corruption case. > > > Agreed. It may suck, but it sucks less. > > > How badly will it break if they do the upgrade in the wrong order though. > > Will the slaves just stop (I assume this?) or is there a risk of a > > wrong-order upgrade causing extra breakage? > > I assume what would happen is the slave would PANIC upon seeing a WAL > record code it didn't recognize. Installing the updated version should > allow it to resume functioning. Would be good to test this, but if it > doesn't work like that, that'd be another bug to fix IMO. We've always > foreseen the possible need to do something like this, so it ought to > work reasonably cleanly. > Yeah, as long as that's tested and actually works, that sounds like an acceptable thing to deal with. > > I assume we'd then get rid of the old record type completely in 9.4, > right? > > Yeah, we should be able to drop it in 9.4, since we'll surely have other > WAL format changes anyway. > And even if not, there's no point in keeping it unless we actually support replication from 9.3 -> 9.4, I think, and I don't believe anybody has even considered working on that yet :) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing
On 2013-12-03 13:14:38 -0500, Noah Misch wrote:
> On Tue, Dec 03, 2013 at 04:37:58PM +0100, Andres Freund wrote:
> > On 2013-12-03 10:29:54 -0500, Noah Misch wrote:
> > > Sorry, my original report was rather terse. I speak of the scenario
> > > exercised by the second permutation in that isolationtester spec.
> > > The multixact is later than VACUUM's cutoff_multi, so 9.3.1 does not
> > > freeze it at all. 9.3.2 does freeze it to InvalidTransactionId per
> > > the code I cited in my first response on this thread, which wrongly
> > > removes a key lock.
> >
> > That one is clear, I was only confused about the Assert() you
> > reported. But I think I've explained that elsewhere.
> >
> > I currently don't see fixing the erroneous freezing of lockers (not the
> > updater though) without changing the WAL format or synchronously waiting
> > for all lockers to end. Which both seem like a no-go?
>
> Not fixing it at all is the real no-go. We'd take both of those undesirables
> before just tolerating the lost locks in 9.3.

I think it's changing the WAL format then.

> The attached patch illustrates the approach I was describing earlier. It
> fixes the test case discussed above. I haven't verified that everything else
> in the system is ready for it, so this is just for illustration purposes.

That might be better than the current state, because the potential damage
from such unfrozen locks would be to get "could not access status of
transaction ..." errors (*).
But the problem I see with it is that a FOR UPDATE/NO KEY UPDATE lock taken
out by UPDATE is different from the respective lock taken out by SELECT FOR
SHARE:

typedef enum
{
    MultiXactStatusForKeyShare = 0x00,
    MultiXactStatusForShare = 0x01,
    MultiXactStatusForNoKeyUpdate = 0x02,
    MultiXactStatusForUpdate = 0x03,
    /* an update that doesn't touch "key" columns */
    MultiXactStatusNoKeyUpdate = 0x04,
    /* other updates, and delete */
    MultiXactStatusUpdate = 0x05
} MultiXactStatus;

Ignoring the difference that way isn't going to fly nicely.

*: which already are possible because we do not check multis properly
against OldestVisibleMXactId[] anymore.

Greetings,

Andres Freund

-- 
Andres Freund   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing
Noah Misch wrote:
> > On 2013-12-03 10:29:54 -0500, Noah Misch wrote:
> > > Sorry, my original report was rather terse. I speak of the scenario
> > > exercised by the second permutation in that isolationtester spec.
> > > The multixact is later than VACUUM's cutoff_multi, so 9.3.1 does not
> > > freeze it at all. 9.3.2 does freeze it to InvalidTransactionId per
> > > the code I cited in my first response on this thread, which wrongly
> > > removes a key lock.
> >
> > That one is clear, I was only confused about the Assert() you
> > reported. But I think I've explained that elsewhere.
> >
> > I currently don't see fixing the erroneous freezing of lockers (not the
> > updater though) without changing the WAL format or synchronously waiting
> > for all lockers to end. Which both seem like a no-go?
>
> Not fixing it at all is the real no-go. We'd take both of those undesirables
> before just tolerating the lost locks in 9.3.

Well, unless I misunderstand, this is only a problem in the case that
cutoff_multi is not yet past but cutoff_xid is, and there are locker
transactions still running. So it's really a corner case. Not saying it's
impossible to hit, mind.

> The attached patch illustrates the approach I was describing earlier. It
> fixes the test case discussed above. I haven't verified that everything else
> in the system is ready for it, so this is just for illustration purposes.

Wow, this is scary. I don't oppose it in principle, but we'd better go over
the whole thing once more to ensure every place that cares is prepared to
deal with this.

-- 
Álvaro Herrera   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing
On 2013-12-03 13:11:13 -0500, Tom Lane wrote:
> Andres Freund writes:
> > Any idea how to cheat our way out of that one given the current way
> > heap_freeze_tuple() works (running on both primary and standby)? My only
> > idea was to MultiXactIdWait() if !InRecovery but that's extremely grotty.
> > We can't even realistically create a new multixact with fewer members
> > with the current format of xl_heap_freeze.
>
> Maybe we should just bite the bullet and change the WAL format for
> heap_freeze (inventing an all-new record type, not repurposing the old
> one, and allowing WAL replay to continue to accept the old one). The
> implication for users would be that they'd have to update slave servers
> before the master when installing the update; which is unpleasant, but
> better than living with a known data corruption case.

I wondered about that myself. How would you suggest the format should look?
ISTM per tuple we'd need:

* OffsetNumber off
* uint16 infomask
* TransactionId xmin
* TransactionId xmax

I don't see why we'd need infomask2, but so far being overly skimpy in that
place hasn't shown itself to be the greatest idea?

Greetings,

Andres Freund

-- 
Andres Freund   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Extension Templates S03E11
On Tue, 2013-12-03 at 10:08 +0200, Heikki Linnakangas wrote:
> Another perspective is that that's already a situation we'd rather not
> have. Let's not make it worse by introducing a third way to install an
> extension, which again requires the extension author to package the
> extension differently.

+1.

> Why? The external tool can pick the extension in its current form from
> PGXN, and install it via libpq. The tool might have to jump through some
> hoops to do it, and we might need some new backend functionality to
> support it, but I don't see why the extension author needs to do anything.

Ideally, it would end up the same whether it was installed by either method
-- the same entries in pg_extension, etc. I assume that's what you mean by
"new backend functionality".

This sounds like Inline Extensions to me, which was previously proposed. If
I recall, that proposal trailed off because of issues with dump/reload. If
you dump the contents of the extension, it's not really an extension; but
if you don't, then the administrator can't back up the database (because he
might not have all of the extension templates for the extensions
installed). That's when the idea appeared for extension templates stored in
the catalog, so that the administrator would always have all of the
necessary templates present.

A few interesting messages I found:

http://www.postgresql.org/message-id/CA+TgmobJ-yCHt_utgJJL9WiiPssUAJWFd=3=ulrob9nhbpc...@mail.gmail.com
http://www.postgresql.org/message-id/CA+tgmoztujw7beyzf1dzdcrbg2djcvtyageychx8d52oe1g...@mail.gmail.com
http://www.postgresql.org/message-id/CA+tgmoa_0d6ef8upc03qx0unhjfzozeosno_ofucf5jgw+8...@mail.gmail.com
http://www.postgresql.org/message-id/18054.1354751...@sss.pgh.pa.us

(+Robert, Tom because I am referencing their comments)

Regards,
Jeff Davis
Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing
Andres Freund wrote:

> I wondered about that myself. How would you suggest the format to look
> like? ISTM per tuple we'd need:
>
> * OffsetNumber off
> * uint16 infomask
> * TransactionId xmin
> * TransactionId xmax
>
> I don't see why we'd need infomask2, but so far being overly skimpy in
> that place hasn't shown itself to be the greatest idea?

I don't see that we need the xmin; a simple bit flag indicating whether the
Xmin was frozen should be enough. For xmax we need more detail, as you
propose.

In infomask, are you proposing to store the complete infomask, or just the
flags being changed? Note we have a set of bits used in other WAL records,
XLHL_XMAX_IS_MULTI and friends, which perhaps we can use here.

-- 
Álvaro Herrera   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] WITHIN GROUP patch
> "Tom" == Tom Lane writes: Tom> 1. I really hate the way you've overloaded the transvalue to do Tom> something that has approximately nothing to do with transition Tom> state (and haven't updated catalogs.sgml to explain that, Tom> either). Seems like it'd be cleaner to just hardwire a bool Tom> column that distinguishes regular and hypothetical input rows. The intention here is that while the provided functions all fit the spec's idea of how inverse distribution or hypothetical set functions work, the actual implementation mechanisms are more generally applicable than that and a user-supplied function could well find something else useful to do with them. Accordingly, hardcoding stuff seemed inappropriate. Tom> And why do you need aggtranssortop for that? I fail to see the Tom> point of sorting on the flag column. It is convenient to the implementation to be able to rely on encountering the hypothetical row deterministically before (or in some cases after, as in cume_dist) its peers in the remainder of the sort order. Removing that sort would make the results of the functions incorrect. There should probably be some comments about that. oops. Tom> 2 I also don't care for the definition of aggordnargs, which is Tom> the number of direct arguments to an ordered set function, Tom> except when it isn't. Rather than overloading it to be both a Tom> count and a couple of flags, I wonder whether we shouldn't Tom> expand aggisordsetfunc to be a three-way "aggregate kind" field Tom> (ordinary agg, ordered set, or hypothetical set function). It would still be overloaded in some sense because a non-hypothetical ordered set function could still take an arbitrary number of args (using variadic "any") - there aren't any provided, but there's no good reason to disallow user-defined functions doing that - so you'd still need a special value like -1 for aggordnargs to handle that. 
-- Andrew (irc:RhodiumToad)
Re: [HACKERS] Extension Templates S03E11
On Tue, 2013-12-03 at 09:20 -0500, Stephen Frost wrote: > * Jeff Davis (pg...@j-davis.com) wrote: > > Stephen mentioned using external tools and/or metadata, but to me that > > sounds like it would require porting the extension away from what's on > > PGXN today. > > Not at all- and that'd be the point. An external tool could take the > PGXN extension, run 'make', then 'make install' (into a userland > directory), extract out the script and then, with a little help from PG, > run that script in "extension creation mode" via libpq. What is stopping Extension Templates, as proposed, from being this special "extension creation mode"? What would be a better design? It seems like the porting issue is just a matter of finding someone to write a tool to reliably translate packages from PGXN into a form suitable to be sent using SQL commands; which we would need anyway for this special mode. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Why we are going to have to go DirectIO
All, https://lkml.org/lkml/2013/11/24/133 What this means for us: http://citusdata.com/blog/72-linux-memory-manager-and-your-big-data It seems clear that Kernel.org, since 2.6, has been in the business of pushing major, hackish, changes to the IO stack without testing them or even thinking too hard about what the side-effects might be. This is perhaps unsurprising given that two of the largest sponsors of the Kernel -- who, incidentally, do 100% of the performance testing -- don't use the IO stack. This says to me that Linux will clearly be an undependable platform in the future with the potential to destroy PostgreSQL performance without warning, leaving us scrambling for workarounds. Too bad the alternatives are so unpopular. I don't know where we'll get the resources to implement our own storage, but it's looking like we don't have a choice. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Time-Delayed Standbys
On Tue, Dec 3, 2013 at 12:36 PM, Fabrízio de Royes Mello wrote: > On Tue, Dec 3, 2013 at 2:33 PM, Christian Kruse > wrote: >> >> Hi Fabrizio, >> >> looks good to me. I did some testing on 9.2.4, 9.2.5 and HEAD. It >> applies and compiles w/o errors or warnings. I set up a master and two >> hot standbys replicating from the master, one with 5 minutes delay and >> one without delay. After that I created a new database and generated >> some test data: >> >> CREATE TABLE test (val INTEGER); >> INSERT INTO test (val) (SELECT * FROM generate_series(0, 100)); >> >> The non-delayed standby nearly instantly had the data replicated, the >> delayed standby was replicated after exactly 5 minutes. I did not >> notice any problems, errors or warnings. >> > > Thanks for your review Christian... So, I proposed this patch previously and I still think it's a good idea, but it got voted down on the grounds that it didn't deal with clock drift. I view that as insufficient reason to reject the feature, but others disagreed. Unless some of those people have changed their minds, I don't think this patch has much future here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing
On Tue, Dec 03, 2013 at 07:26:38PM +0100, Andres Freund wrote:
> On 2013-12-03 13:14:38 -0500, Noah Misch wrote:
> > On Tue, Dec 03, 2013 at 04:37:58PM +0100, Andres Freund wrote:
> > > I currently don't see fixing the erroneous freezing of lockers (not the
> > > updater though) without changing the WAL format or synchronously waiting
> > > for all lockers to end. Which both seem like a no-go?
> >
> > Not fixing it at all is the real no-go. We'd take both of those
> > undesirables before just tolerating the lost locks in 9.3.
>
> I think it's changing the WAL format then.

I'd rather have a readily-verifiable fix that changes the WAL format than a
tricky fix that avoids doing so. So, modulo not having seen the change, +1.

> > The attached patch illustrates the approach I was describing earlier. It
> > fixes the test case discussed above. I haven't verified that everything
> > else in the system is ready for it, so this is just for illustration
> > purposes.
>
> That might be better than the current state because the potential damage
> from such unfrozen locks would be to get "could not access status of
> transaction ..." errors (*).
>
> *: which already are possible because we do not check multis properly
> against OldestVisibleMXactId[] anymore.

Separate issue. That patch adds to the ways we can end up with an extant
multixact referencing a locker XID no longer found in CLOG, but it doesn't
introduce that problem.

-- 
Noah Misch
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] Why we are going to have to go DirectIO
On Tue, Dec 3, 2013 at 1:44 PM, Josh Berkus wrote: > All, > > https://lkml.org/lkml/2013/11/24/133 > > What this means for us: > > http://citusdata.com/blog/72-linux-memory-manager-and-your-big-data > > It seems clear that Kernel.org, since 2.6, has been in the business of > pushing major, hackish, changes to the IO stack without testing them or > even thinking too hard about what the side-effects might be. This is > perhaps unsurprising given that two of the largest sponsors of the > Kernel -- who, incidentally, do 100% of the performance testing -- don't > use the IO stack. > > This says to me that Linux will clearly be an undependable platform in > the future with the potential to destroy PostgreSQL performance without > warning, leaving us scrambling for workarounds. Too bad the > alternatives are so unpopular. > > I don't know where we'll get the resources to implement our own storage, > but it's looking like we don't have a choice. This seems like a strange reaction to an article that's mostly about how Linux is now *fixing* a problem that could cause PostgreSQL to experience performance problems. I agree that we'll probably eventually need to implement our own storage layer, but this article isn't evidence of urgency so far as I can determine on first read-through. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Time-Delayed Standbys
On 12/03/2013 10:46 AM, Robert Haas wrote: On Tue, Dec 3, 2013 at 12:36 PM, Fabrízio de Royes Mello wrote: On Tue, Dec 3, 2013 at 2:33 PM, Christian Kruse wrote: Hi Fabrizio, looks good to me. I did some testing on 9.2.4, 9.2.5 and HEAD. It applies and compiles w/o errors or warnings. I set up a master and two hot standbys replicating from the master, one with 5 minutes delay and one without delay. After that I created a new database and generated some test data: CREATE TABLE test (val INTEGER); INSERT INTO test (val) (SELECT * FROM generate_series(0, 100)); The non-delayed standby nearly instantly had the data replicated, the delayed standby was replicated after exactly 5 minutes. I did not notice any problems, errors or warnings. Thanks for your review Christian... So, I proposed this patch previously and I still think it's a good idea, but it got voted down on the grounds that it didn't deal with clock drift. I view that as insufficient reason to reject the feature, but others disagreed. Unless some of those people have changed their minds, I don't think this patch has much future here. I would agree that it is a good idea. Joshua D. Drake -- Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc For my dreams of your image that blossoms a rose in the deeps of my heart. - W.B. Yeats -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing
On 2013-12-03 13:49:49 -0500, Noah Misch wrote:
> On Tue, Dec 03, 2013 at 07:26:38PM +0100, Andres Freund wrote:
> > On 2013-12-03 13:14:38 -0500, Noah Misch wrote:
> > > Not fixing it at all is the real no-go. We'd take both of those
> > > undesirables before just tolerating the lost locks in 9.3.
> >
> > I think it's changing the WAL format then.
>
> I'd rather have a readily-verifiable fix that changes the WAL format than a
> tricky fix that avoids doing so. So, modulo not having seen the change, +1.

Well, who's going to write that then? I can write something up, but I
really would not like to be solely responsible for it. That means we cannot
release 9.3 on schedule anyway, right?

> > > The attached patch illustrates the approach I was describing earlier.
> > > It fixes the test case discussed above. I haven't verified that
> > > everything else in the system is ready for it, so this is just for
> > > illustration purposes.
> >
> > That might be better than the current state because the potential damage
> > from such unfrozen locks would be to get "could not access status of
> > transaction ..." errors (*).
> >
> > *: which already are possible because we do not check multis properly
> > against OldestVisibleMXactId[] anymore.
>
> Separate issue. That patch adds to the ways we can end up with an extant
> multixact referencing a locker XID no longer found in CLOG, but it doesn't
> introduce that problem.

Sure, that was an argument in favor of your idea, not against it ;).

Greetings,

Andres Freund

-- 
Andres Freund   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] UNNEST with multiple args, and TABLE with multiple funcs
On Tue, Dec 03, 2013 at 10:03:32AM -0500, Stephen Frost wrote: > * Tom Lane (t...@sss.pgh.pa.us) wrote: > > After sleeping on it, your other suggestion of TABLE OF, or possibly > > TABLE FROM, is starting to grow on me. > > > > Who else has an opinion? > > Alright, for my 2c, I like having this syntax include 'TABLE' simply > because it's what folks coming from Oracle might be looking for. > Following from that, to keep it distinct from the spec's notion of > 'TABLE', my preference is 'TABLE FROM'. I don't particularly like > 'TABLE OF', nor do I like the various 'ROWS' suggestions. I like having "ROWS" in there somehow, because it denotes the distinction from SQL-standard TABLE(). Suppose we were to implement the SQL-standard TABLE(), essentially just mapping it to UNNEST(). Then we'd have "TABLE (f())" that unpacks the single array returned by f(), and we'd have "TABLE FROM (f())" that unpacks the set of rows returned by f(). The word "FROM" alone does not indicate that difference the way including "ROWS" does. (I don't object to having "FROM" in addition to "ROWS".) -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing
On 2013-12-03 15:40:44 -0300, Alvaro Herrera wrote: > Andres Freund wrote: > > > I wondered about that myself. How would you suggest the format to look > > like? > > ISTM per tuple we'd need: > > > > * OffsetNumber off > > * uint16 infomask > > * TransactionId xmin > > * TransactionId xmax > > > > I don't see why we'd need infomask2, but so far being overly skimpy in > > that place hasn't shown itself to be the greatest idea? > I don't see that we need the xmin; a simple bit flag indicating whether > the Xmin was frozen should be enough. Yea, that would work as well. > For xmax we need more detail, as you propose. In infomask, are you > proposing to store the complete infomask, or just the flags being > changed? Note we have a set of bits used in other wal records, > XLHL_XMAX_IS_MULTI and friends, which perhaps we can use here. Tentatively the complete one. I don't think we'd win enough by using compute_infobits/fix_infomask_from_infobits and we'd need to extend the bits stored in there unless we are willing to live with not transporting XMIN/XMAX_COMMITTED which doesn't seem like a good idea. Btw, why is it currently ok to modify the tuple in heap_freeze_tuple() without being in a critical section? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Time-Delayed Standbys
On 2013-12-03 13:46:28 -0500, Robert Haas wrote: > On Tue, Dec 3, 2013 at 12:36 PM, Fabrízio de Royes Mello > wrote: > > On Tue, Dec 3, 2013 at 2:33 PM, Christian Kruse > > wrote: > >> > >> Hi Fabrizio, > >> > >> looks good to me. I did some testing on 9.2.4, 9.2.5 and HEAD. It > >> applies and compiles w/o errors or warnings. I set up a master and two > >> hot standbys replicating from the master, one with 5 minutes delay and > >> one without delay. After that I created a new database and generated > >> some test data: > >> > >> CREATE TABLE test (val INTEGER); > >> INSERT INTO test (val) (SELECT * FROM generate_series(0, 100)); > >> > >> The non-delayed standby nearly instantly had the data replicated, the > >> delayed standby was replicated after exactly 5 minutes. I did not > >> notice any problems, errors or warnings. > >> > > > > Thanks for your review Christian... > > So, I proposed this patch previously and I still think it's a good > idea, but it got voted down on the grounds that it didn't deal with > clock drift. I view that as insufficient reason to reject the > feature, but others disagreed. I really fail to see why clock drift should be this patch's responsibility. It's not like the world would go under^W data corruption would ensue if the clocks drift. Your standby would get delayed imprecisely. Big deal. From what I know of potential users of this feature, they would set it to at the very least 30min - that's WAY above the range for acceptable clock-drift on servers. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why we are going to have to go DirectIO
On 12/03/2013 10:44 AM, Josh Berkus wrote: All, https://lkml.org/lkml/2013/11/24/133 What this means for us: http://citusdata.com/blog/72-linux-memory-manager-and-your-big-data It seems clear that Kernel.org, since 2.6, has been in the business of pushing major, hackish, changes to the IO stack without testing them or even thinking too hard about what the side-effects might be. This is perhaps unsurprising given that two of the largest sponsors of the Kernel -- who, incidentally, do 100% of the performance testing -- don't use the IO stack. This says to me that Linux will clearly be an undependable platform in the future with the potential to destroy PostgreSQL performance without warning, leaving us scrambling for workarounds. Too bad the alternatives are so unpopular. I don't know where we'll get the resources to implement our own storage, but it's looking like we don't have a choice. This seems rather half cocked. I read the article. They found a problem, that really will only affect a reasonably small percentage of users, created a test case, reported it, and a patch was produced. Kind of like how we do it. JD -- Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc For my dreams of your image that blossoms a rose in the deeps of my heart. - W.B. Yeats -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.2.1 & index-only scans : abnormal heap fetches after VACUUM FULL
On Tue, Dec 3, 2013 at 11:25 AM, Bruce Momjian wrote:
> On Thu, Nov 28, 2013 at 05:38:05PM -0500, Bruce Momjian wrote:
>> > I wonder if we ought to mark each page as all-visible in
>> > raw_heap_insert() when we first initialize it, and then clear the flag
>> > when we come across a tuple that isn't all-visible. We could try to
>> > set hint bits on the tuple before placing it on the page, too, though
>> > I'm not sure of the details.
>>
>> I went with the per-page approach because I wanted to re-use the vacuum
>> lazy function. Is there some other code that does this already? I am
>> trying to avoid yet another set of routines that would need to be
>> maintained or could be buggy. This hint bit setting is tricky.
>>
>> And thanks much for the review!
>
> So, should I put this in the next commit fest?

+1.

> I still have an unknown about the buffer number to use here:
>
> !     /* XXX use 0 or real offset? */
> !     ItemPointerSet(&(tuple.t_self), BufferIsValid(buf) ?
> !         BufferGetBlockNumber(buf) : 0, offnum);
>
> Is everyone else OK with this approach? Updated patch attached.

I started looking at this a little more the other day but got bogged down
in other things before I got through it all. I think we're going to want to
revise this so that it doesn't go through the functions that currently
assume that they're always dealing with a shared buffer, but I haven't
figured out exactly what the most elegant way of doing that is yet.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
[HACKERS] Parallel Select query performance and shared buffers
We have several independent tables on a multi-core machine serving Select queries. These tables fit into memory, and each Select query goes over one table's pages sequentially. In this experiment, there are no indexes or table joins. When we send concurrent Select queries to these tables, query performance doesn't scale out with the number of CPU cores. We find that complex Select queries scale out better than simpler ones. We also find that increasing the block size from 8 KB to 32 KB, or increasing shared_buffers to include the working set, mitigates the problem to some extent. For our experiments, we chose an 8-core machine with 68 GB of memory from Amazon's EC2 service. We installed PostgreSQL 9.3.1 on the instance, and set shared_buffers to 4 GB. We then generated 1, 2, 4, and 8 separate tables using the data generator from the industry standard TPC-H benchmark. Each table we generated, called lineitem-1, lineitem-2, etc., had about 750 MB of data. Next, we sent 1, 2, 4, and 8 concurrent Select queries to these tables to observe the scale-out behavior. Our expectation was that since this machine had 8 cores, our run times would stay constant throughout. We also expected the machine's CPU utilization to go up to 100% at 8 concurrent queries. Neither of those assumptions held true. We found that query run times degraded as we increased the number of concurrent Select queries, and that CPU utilization flattened out at less than 50% for the simpler queries. Full results with a block size of 8 KB are below:

                       select count(*)  TPC-H Simple (#6)[2]  TPC-H Complex (#1)[1]
 1 Table  / 1 query        1.5 s             2.5 s                 8.4 s
 2 Tables / 2 queries      1.5 s             2.5 s                 8.4 s
 4 Tables / 4 queries      2.0 s             2.9 s                 8.8 s
 8 Tables / 8 queries      3.3 s             4.0 s                 9.6 s

We then increased the block size (BLCKSZ) from 8 KB to 32 KB and recompiled PostgreSQL. This change had a positive impact on query completion times.
Here are the new results with a block size of 32 KB:

                       select count(*)  TPC-H Simple (#6)[2]  TPC-H Complex (#1)[1]
 1 Table  / 1 query        1.5 s             2.3 s                 8.0 s
 2 Tables / 2 queries      1.5 s             2.3 s                 8.0 s
 4 Tables / 4 queries      1.6 s             2.4 s                 8.1 s
 8 Tables / 8 queries      1.8 s             2.7 s                 8.3 s

As a quick aside, we also repeated the same experiment on an EC2 instance with 16 CPU cores, and found that the scale-out behavior became worse there. (We also tried increasing shared_buffers to 30 GB. This change completely solved the scaling problem on this instance type, but hurt our performance on the hi1.4xlarge instances.) Unfortunately, increasing the block size from 8 to 32 KB has other implications for some of our customers. Could you help us out with the problem here? What can we do to identify the problem's root cause? Can we work around it? Thank you, Metin [1] http://examples.citusdata.com/tpch_queries.html#query-1 [2] http://examples.citusdata.com/tpch_queries.html#query-6
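For anyone wanting to reproduce this, a rough driver script might look like the following. This is only a sketch: the table names (lineitem_1 .. lineitem_N), the database name, and the connection details are assumptions, not taken from the report, and with DRY_RUN=1 (the default here) it only prints the psql invocations instead of running them.

```shell
#!/bin/sh
# Sketch: fire N identical sequential scans at N separate tables in
# parallel and time the whole batch. Table and database names are
# hypothetical; adjust them to match your own TPC-H load.
DB=${DB:-tpch}
N=${1:-8}                 # number of concurrent queries / tables
DRY_RUN=${DRY_RUN:-1}     # set DRY_RUN=0 to actually run the queries

run_query() {
    q="SELECT count(*) FROM lineitem_$1;"
    if [ "$DRY_RUN" = 1 ]; then
        # Just show what would be executed.
        echo "psql -d $DB -c \"$q\""
    else
        psql -d "$DB" -c "$q" > /dev/null
    fi
}

start=$(date +%s)
i=1
while [ "$i" -le "$N" ]; do
    run_query "$i" &      # one backend per table, all started together
    i=$((i + 1))
done
wait                      # block until every query has finished
end=$(date +%s)
echo "$N concurrent queries finished in $((end - start)) s"
```

Comparing the wall-clock times for N = 1, 2, 4, 8 across `select count(*)` and the two TPC-H queries should reproduce the flattening described above, assuming the tables fit in memory.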
[HACKERS] Problem with displaying "wide" tables in psql
Hi. psql definitely has a problem with displaying "wide" tables. Even in expanded mode, they look horrible. So I tried to solve this problem. Before the patch:

postgres=# \x 1
Expanded display (expanded) is on.
postgres=# \pset border 2
Border style (border) is 2.
postgres=# select * from pg_stats;
+-[ RECORD 1 ]---+-- ... --+
| schemaname | pg_catalog
| tablename  | pg_proc

... and after:

+-[ RECORD 1 ]-----+---------------------------------------+
| schemaname       | pg_catalog                            |
| tablename        | pg_proc                               |
| attname          | proname                               |
| inherited        | f                                     |
| null_frac        | 0                                     |
| avg_width        | 64                                    |
| n_distinct       | -0.823159                             |
| most_common_vals | {max,min,overlaps,has_column_privileg
Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing
Andres Freund wrote: > On 2013-12-03 13:49:49 -0500, Noah Misch wrote: > > On Tue, Dec 03, 2013 at 07:26:38PM +0100, Andres Freund wrote: > > > On 2013-12-03 13:14:38 -0500, Noah Misch wrote: > > > > Not fixing it at all is the real no-go. We'd take both of those > > > > undesirables > > > > before just tolerating the lost locks in 9.3. > > > > > > I think it's changing the wal format then. > > > > I'd rather have a readily-verifiable fix that changes WAL format than a > > tricky fix that avoids doing so. So, modulo not having seen the change, +1. > > Well, who's going to write that then? I can write something up, but I > really would not like to be solely responsible for it. I will give this a shot. -- Álvaro Herrera  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Problem with displaying "wide" tables in psql
Hello

do you know the pager less trick? http://merlinmoncure.blogspot.cz/2007/10/better-psql-with-less.html

Regards

Pavel Stehule

2013/12/3 Sergey Muraviov
> Hi.
>
> psql definitely has a problem with displaying "wide" tables.
> Even in expanded mode, they look horrible.
> So I tried to solve this problem.
>
> [...]
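For the archives, the trick in that post boils down to a couple of environment variables (quoted from memory of the linked article, so treat the exact flag set as a suggestion and double-check against the post):

```shell
# Page psql output through less with line-chopping enabled. -S is the
# flag that matters for wide tables: it chops long lines instead of
# wrapping them, so rows can be scrolled sideways with the arrow keys.
# The remaining flags are quality-of-life options from the post.
PAGER=less
LESS="-iMSx4 -FX"
export PAGER LESS
```

With these exported before starting psql, a wide `select * from pg_stats;` stays on one screen line per row instead of wrapping into an unreadable mess.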
Re: [HACKERS] Why we are going to have to go DirectIO
On 12/03/2013 10:59 AM, Joshua D. Drake wrote: > This seems rather half cocked. I read the article. They found a problem, > that really will only affect a reasonably small percentage of users, > created a test case, reported it, and a patch was produced. "Users with at least one file bigger than 50% of RAM" is unlikely to be a small percentage. > > Kind of like how we do it. I like to think we'd have at least researched the existing literature on 2Q algorithms (which is extensive) before implementing our own. Oh, wait, we *did*. Nor is this the first ill-considered performance hack pushed into mainline kernels without any real testing. It's not even the first *that year*. While I am angry over this -- no matter what Kernel.org fixes now, we're going to have to live with their mistake for 3 years -- the DirectIO thing isn't just me; when I've gone to Linux Kernel events to talk about IO, that's the response I've gotten from most Linux hackers: "you shouldn't be using the filesystem, use DirectIO and implement your own storage." That's why they don't feel that it's a problem to break the IO stack; they really don't believe that anyone who cares about performance should be using it. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Extension Templates S03E11
* Jeff Davis (pg...@j-davis.com) wrote: > This sounds like Inline Extensions to me, which was previously proposed. I've not looked at that proposal very carefully, but I agree that what we're describing is a lot closer to 'inline extensions' than 'extension templates'. > If I recall, that proposal trailed off because of issues with > dump/reload. If you dump the contents of the extension, it's not really > an extension; but if you don't, then the administrator can't back up the > database (because he might not have all of the extension templates for > the extensions installed). That's when the idea appeared for extension > templates stored in the catalog, so that the administrator would always > have all of the necessary templates present. When it comes to dump/reload, I'd much rather see a mechanism which uses our deep understanding of the extension's objects (as database objects) to implement the dump/reload than a text blob which is carried forward from major version to major version and may even fail to run. I realize that's different from extension files which are out on the filesystem, but I do not see that as a bad thing. Thanks, Stephen
Re: [HACKERS] UNNEST with multiple args, and TABLE with multiple funcs
Noah Misch writes: > On Tue, Dec 03, 2013 at 10:03:32AM -0500, Stephen Frost wrote: >> Alright, for my 2c, I like having this syntax include 'TABLE' simply >> because it's what folks coming from Oracle might be looking for. >> Following from that, to keep it distinct from the spec's notion of >> 'TABLE', my preference is 'TABLE FROM'. I don't particularly like >> 'TABLE OF', nor do I like the various 'ROWS' suggestions. > I like having "ROWS" in there somehow, because it denotes the distinction from > SQL-standard TABLE(). Suppose we were to implement the SQL-standard TABLE(), > essentially just mapping it to UNNEST(). Then we'd have "TABLE (f())" that > unpacks the single array returned by f(), and we'd have "TABLE FROM (f())" > that unpacks the set of rows returned by f(). The word "FROM" alone does not > indicate that difference the way including "ROWS" does. Hm ... fair point, except that "ROWS" doesn't seem to suggest the right thing either, at least not to me. After further thought I've figured out what's been grating on me about Noah's suggestions: he suggests that we're distinguishing "TABLE [FROM ELEMENTS]" from "TABLE FROM ROWS", but this is backwards. What UNNEST() really does is take an array, extract the elements, and make a table of those. Similarly, what our feature does is take a set (the result of a set-returning function), extract the rows, and make a table of those. So what would seem appropriate to me is "TABLE [FROM ARRAY]" versus "TABLE FROM SET". Now I find either of those phrases to be one word too many, but the key point is that I'd probably prefer something involving SET over something involving ROWS. (Both of those are unreserved_keyword, so this doesn't move the ball at all in terms of finding an unambiguous syntax.) Another issue is that if you are used to the Oracle syntax, in which an UNNEST() is presumed, it's not exactly clear that TABLE ROWS, or any other phrase including TABLE, *doesn't* also imply an UNNEST. 
So to me that's kind of a strike against Stephen's preference --- I'm thinking we might be better off not using the word TABLE. regards, tom lane
Re: [HACKERS] Extension Templates S03E11
Stephen Frost writes: > When it comes to dump/reload, I'd much rather see a mechanism which uses > our deep understanding of the extension's objects (as database objects) > to implement the dump/reload than a text blob which is carried forward > from major version to major version and may even fail to run. Note that we're already doing that in the binary_upgrade code path. I agree that generalizing that approach sounds like a better idea than keeping a text blob around. regards, tom lane
Re: [HACKERS] 9.2.1 & index-only scans : abnormal heap fetches after VACUUM FULL
On Tue, Dec 3, 2013 at 02:01:52PM -0500, Robert Haas wrote: > On Tue, Dec 3, 2013 at 11:25 AM, Bruce Momjian wrote: > > On Thu, Nov 28, 2013 at 05:38:05PM -0500, Bruce Momjian wrote: > >> > I wonder if we ought to mark each page as all-visible in > >> > raw_heap_insert() when we first initialize it, and then clear the flag > >> > when we come across a tuple that isn't all-visible. We could try to > >> > set hint bits on the tuple before placing it on the page, too, though > >> > I'm not sure of the details. > >> > >> I went with the per-page approach because I wanted to re-use the vacuum > >> lazy function. Is there some other code that does this already? I am > >> trying to avoid yet-another set of routines that would need to be > >> maintained or could be buggy. This hint bit setting is tricky. > >> > >> And thanks much for the review! > > > > So, should I put this in the next commit fest? > > +1. OK, I added it to the January commit fest. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Re: [HACKERS] Time-Delayed Standbys
On 18 October 2013 19:03, Fabrízio de Royes Mello wrote: > The attached patch is a continuation of Robert's work [1]. Reviewing v2... > I made some changes: > - use of Latches instead of pg_usleep, so we don't have to wakeup regularly. OK > - call HandleStartupProcInterrupts() before CheckForStandbyTrigger() because > might change the trigger file's location OK > - compute recoveryUntilDelayTime in XLOG_XACT_COMMIT and > XLOG_XACT_COMMIT_COMPACT checks Why just those? Why not aborts and restore points also? > - don't care about clockdrift because it's an admin problem. A few minor points:

* The code with the comment "Clear any previous recovery delay time" is in the wrong place; move it down or remove it completely. Setting the delay to zero doesn't prevent calling recoveryDelay(), so that logic looks wrong anyway.
* The loop exit in recoveryDelay() is inelegant; it should break if <= 0.
* There's a spelling mistake in the sample.
* The patch has stray whitespace in one place.

And one important point:

* The delay loop happens AFTER we check for a pause. Which means if the user notices a problem on a commit, then hits the pause button on the standby, the pause will have no effect and the next commit will be applied anyway. Maybe just one commit, but it's an off-by-one error that removes the benefit of the patch. So I think we need to test this again after we finish delaying:

    if (xlogctl->recoveryPause)
        recoveryPausesHere();

We need to explain in the docs that this is intended only for use in a live streaming deployment. It will have little or no meaning in a PITR. I think recovery_time_delay should be called _apply_delay to highlight the point that it is the apply of records that is delayed, not the receipt. And hence the need to document that sync rep is NOT slowed down by setting this value.
And to make the name consistent with other parameters, I suggest min_standby_apply_delay. We also need to document caveats about the patch, in that it only delays on timestamped WAL records, and other records may be applied sooner than the delay in some circumstances, so it is not a way to avoid all cancellations. We also need to document that the behaviour of the patch is to apply all data received as quickly as possible once triggered, so the specified delay does not slow down promoting the server to a master. That might also be seen as a negative behaviour, since promoting the standby effectively sets recovery_time_delay to zero. I will handle the additional documentation, if you can update the patch with the main review comments. Thanks. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
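For illustration, the standby-side configuration under discussion would presumably end up looking something like this. This is only a sketch: the final parameter name (min_standby_apply_delay vs. recovery_time_delay) is exactly what is being debated in this thread, and the connection string is made up.

```
# recovery.conf on the standby (illustrative, not final syntax)
standby_mode = 'on'
primary_conninfo = 'host=primary.example.com user=replicator'

# Hold WAL apply back by five minutes. Receipt of WAL (and hence sync
# rep) is not delayed, only the apply; pausing recovery within this
# window can stop a mistaken commit from ever reaching the standby.
min_standby_apply_delay = '5min'
```

The point of the caveats above is that this window only reliably covers timestamped records such as commits, and that promotion applies everything immediately.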
Re: [HACKERS] Extension Templates S03E11
* Jeff Davis (pg...@j-davis.com) wrote: > On Tue, 2013-12-03 at 09:20 -0500, Stephen Frost wrote: > > * Jeff Davis (pg...@j-davis.com) wrote: > > > Stephen mentioned using external tools and/or metadata, but to me that > > > sounds like it would require porting the extension away from what's on > > > PGXN today. > > > > Not at all- and that'd be the point. An external tool could take the > > PGXN extension, run 'make', then 'make install' (into a userland > > directory), extract out the script and then, with a little help from PG, > > run that script in "extension creation mode" via libpq. > > What is stopping Extension Templates, as proposed, from being this > special "extension creation mode"? What would be a better design? The extra catalog tables which store SQL scripts in text columns are one of my main objections to the as-proposed Extension Templates. I view those scripts as a poor man's definition of database objects which are defined properly in the catalog already. The other big issue is that there isn't an easy way to see how we could open up the ability to create extensions to non-superusers with this approach. What I think we should really be mulling over is whether we need anything further when it comes to non-superuser extensions: a new namespace (eg: schemas for extensions, or maybe a prefix for user extensions, or just a notion of ownership which gets combined with the name when doing operations with an extension)? a new name (not extensions, but something else)? > It seems like the porting issue is just a matter of finding someone to > write a tool to reliably translate packages from PGXN into a form > suitable to be sent using SQL commands; which we would need anyway for > this special mode. That's what I was thinking and hoping. :) Of course, we haven't yet figured out exactly what we want this special mode to look like, so it's a bit tricky to ask anyone to write such a tool.
I keep thinking this should be something like: create a schema, set the search path to that schema, run the extension script more-or-less as is, then 'register' that schema as being an extension with a certain version. That 'registration' process could also handle renaming the schema, if the user wants the extension in a different schema (or perhaps the initial schema was some kind of "temporary" schema) or moving the objects into an existing schema, if that's what is requested. Thanks, Stephen
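As a straw-man for what a client-side tool could emit for that flow, something like the sketch below. Every server-side command here is hypothetical: no "REGISTER AS EXTENSION" syntax exists; it merely stands in for whatever the registration step would become.

```shell
# Generate the SQL a hypothetical installer tool would send over libpq:
# build a scratch schema, replay the extension's script into it, then
# "register" the schema as an extension. The ALTER SCHEMA ... REGISTER
# command is invented here purely to illustrate the proposed flow.
gen_install_sql() {
    ext=$1 ver=$2 script=$3
    printf '%s\n' \
        "CREATE SCHEMA ${ext}_install;" \
        "SET search_path = ${ext}_install;" \
        "\\i $script" \
        "ALTER SCHEMA ${ext}_install REGISTER AS EXTENSION $ext VERSION '$ver';"
}

gen_install_sql myext 1.0 myext--1.0.sql
```

The registration step is where renaming the scratch schema, or moving the objects into an existing schema, would happen.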
Re: [HACKERS] UNNEST with multiple args, and TABLE with multiple funcs
* Tom Lane (t...@sss.pgh.pa.us) wrote: > Another issue is that if you are used to the Oracle syntax, in which an > UNNEST() is presumed, it's not exactly clear that TABLE ROWS, or any other > phrase including TABLE, *doesn't* also imply an UNNEST. So to me that's > kind of a strike against Stephen's preference --- I'm thinking we might be > better off not using the word TABLE. I see the concern there, but I would think a bit of documentation around that would help them find UNNEST quickly, if that's what they're really looking for. On the flip side, I imagine it could be jarring seeing 'TABLE FROM' when you're used to Oracle's 'TABLE'. I haven't got any great suggestions about how to incorporate 'SET', and I do still like 'TABLE' as that's what we're building, but I'll be happy to have this capability even if it's 'TABLE FROM SET ROWS THING'. Thanks, Stephen