Re: [HACKERS] 9.5: Better memory accounting, towards memory-bounded HashAgg

2014-08-20 Thread Jeff Davis
On Tue, 2014-08-19 at 12:54 +0200, Tomas Vondra wrote:
 The use-case for this is tracking a chosen subtree of contexts - e.g.
 aggcontext and below, so I'd expect the tracked subtrees to be relatively
 shallow. Am I right?

Right.

 My fear is that by removing the inheritance bit, we'll hurt cases with a
 lot of child contexts. For example, array_agg currently creates a separate
 context for each group - what happens if you have 100k groups and do
 MemoryContextGetAllocated? I guess iterating over 100k groups is not free.

Good point. We don't want to make checking the allocated space into an
expensive operation, because then we will be forced to call it less
frequently, which implies that we'd be sloppier about actually fitting
in work_mem.

 Wouldn't the solution with inheritance and propagating the accounting info
 to the parent actually be better? Or maybe allowing both, having two flags
 when creating a context instead of one?

That seems overly complicated. We only have one driving use case, so two
orthogonal options sounds excessive. Perhaps one option if we can't
solve the performance problem and we need to isolate the changes to
hashagg.

 Also, do we really need to track allocated bytes - couldn't we track
 kilobytes or something and use smaller data types to get below the 64B?

Good idea.

I attached a patch that uses two uint32 fields so that it doesn't
increase the size of MemoryContextData, and it tracks memory usage for
all contexts. I was unable to detect any performance regression vs.
master, but on my machine the results are noisy.
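
(Purely to illustrate the idea for readers skimming the diff below, here is a
toy, self-contained sketch of the propagation scheme: an allocation delta
walked up the parent chain, as the update_allocation() calls in the diff below
appear to do. The struct and names here are simplified stand-ins, not the
patch code.)

#include <stddef.h>

typedef struct ToyContext
{
	struct ToyContext *parent;			/* NULL for the top-level context */
	ptrdiff_t	total_allocated;		/* block bytes in this subtree */
} ToyContext;

/*
 * Add (or, with a negative delta, subtract) block space in this context and
 * every ancestor, so reading any subtree's total stays O(1).  The price is
 * that each block allocation or free costs O(depth of the context tree),
 * which is why the thread cares about tracked subtrees staying shallow.
 */
static void
toy_update_allocation(ToyContext *context, ptrdiff_t delta)
{
	ToyContext *ctx;

	for (ctx = context; ctx != NULL; ctx = ctx->parent)
		ctx->total_allocated += delta;
}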

It would be easier to resolve the performance concern if I could
reliably get the results Robert is getting. I think I was able to
reproduce the regression with the old patch, but the results were still
noisy.

Regards,
Jeff Davis

*** a/src/backend/utils/mmgr/aset.c
--- b/src/backend/utils/mmgr/aset.c
***
*** 242,247  typedef struct AllocChunkData
--- 242,249 
  #define AllocChunkGetPointer(chk)	\
  	((AllocPointer)(((char *)(chk)) + ALLOC_CHUNKHDRSZ))
  
+ static void update_allocation(MemoryContext context, size_t size);
+ 
  /*
   * These functions implement the MemoryContext API for AllocSet contexts.
   */
***
*** 250,256  static void AllocSetFree(MemoryContext context, void *pointer);
  static void *AllocSetRealloc(MemoryContext context, void *pointer, Size size);
  static void AllocSetInit(MemoryContext context);
  static void AllocSetReset(MemoryContext context);
! static void AllocSetDelete(MemoryContext context);
  static Size AllocSetGetChunkSpace(MemoryContext context, void *pointer);
  static bool AllocSetIsEmpty(MemoryContext context);
  static void AllocSetStats(MemoryContext context, int level);
--- 252,258 
  static void *AllocSetRealloc(MemoryContext context, void *pointer, Size size);
  static void AllocSetInit(MemoryContext context);
  static void AllocSetReset(MemoryContext context);
! static void AllocSetDelete(MemoryContext context, MemoryContext parent);
  static Size AllocSetGetChunkSpace(MemoryContext context, void *pointer);
  static bool AllocSetIsEmpty(MemoryContext context);
  static void AllocSetStats(MemoryContext context, int level);
***
*** 500,505  AllocSetContextCreate(MemoryContext parent,
--- 502,510 
  	 errdetail("Failed while creating memory context \"%s\".",
  			   name)));
  		}
+ 
+ 		update_allocation((MemoryContext) context, blksize);
+ 
  		block->aset = context;
  		block->freeptr = ((char *) block) + ALLOC_BLOCKHDRSZ;
  		block->endptr = ((char *) block) + blksize;
***
*** 590,595  AllocSetReset(MemoryContext context)
--- 595,601 
  		else
  		{
  			/* Normal case, release the block */
+ 			update_allocation(context, -(block->endptr - ((char *) block)));
  #ifdef CLOBBER_FREED_MEMORY
  			wipe_mem(block, block->freeptr - ((char *) block));
  #endif
***
*** 611,617  AllocSetReset(MemoryContext context)
   * But note we are not responsible for deleting the context node itself.
   */
  static void
! AllocSetDelete(MemoryContext context)
  {
  	AllocSet	set = (AllocSet) context;
  	AllocBlock	block = set->blocks;
--- 617,623 
   * But note we are not responsible for deleting the context node itself.
   */
  static void
! AllocSetDelete(MemoryContext context, MemoryContext parent)
  {
  	AllocSet	set = (AllocSet) context;
  	AllocBlock	block = set->blocks;
***
*** 623,628  AllocSetDelete(MemoryContext context)
--- 629,644 
  	AllocSetCheck(context);
  #endif
  
+ 	/*
+ 	 * Parent is already unlinked from context, so can't use
+ 	 * update_allocation().
+ 	 */
+ 	while (parent != NULL)
+ 	{
+ 		parent->total_allocated -= context->total_allocated;
+ 		parent = parent->parent;
+ 	}
+ 
  	/* Make it look empty, just in case... */
  	MemSetAligned(set->freelist, 0, sizeof(set->freelist));
  	set->blocks = NULL;
***
*** 678,683  AllocSetAlloc(MemoryContext context, Size size)
--- 

Re: [HACKERS] Trove with PostgreSQL-XC

2014-08-20 Thread Mark Kirkwood

On 19/08/14 23:14, Vivek Singh Raghuwanshi wrote:


Hi All,

Please let me know if it is possible to use OpenStack Trove with Postgres-XC,
with instances and bare metal (after the Juno release).
I know it is possible to use other databases like MySQL or PostgreSQL, but
I am not sure about XC.


AFAIK [1], vanilla Postgres support for Trove is still at the review 
stage, and there is quite a bit to finish, and some stuff to fix.


I'd think that getting Trove to provision a complete XC VM (or more 
sensibly a set of VMs that run XC) is going to require a fair bit of extra 
work.


It looks like replication will be deployable via Trove for the supported 
flavours (MySQL, Mongo, Cassandra, Couch, Redis); see 
https://wiki.openstack.org/wiki/Trove/Replication-And-Clustering . I'm 
not seeing any specific discussion of sharding that I notice.


Cheers

Mark

[1] I've been helping debug the Postgres Trove patch



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup vs. Windows and tablespaces

2014-08-20 Thread Dilip kumar
15 July 2014 19:29 Amit Kapila Wrote,

Implementation details:
---
1. This feature is implemented only for tar format in windows
as native windows utilities are not able to create symlinks while
extracting files from tar (It might be possible to create symlinks
if cygwin is installed on your system, however I feel we need this
feature to work for native windows as well).  Another reason to not
create it for non-tar (plain) format is that plain format can update the
symlinks via -T option and backing up symlink file during that
operation can lead to spurious symlinks after archive recovery.

I have reviewed the patch and do not have any major comments.

There are some comments I would like to share with you


1.  Rebase the patch to current GIT head.



2.  +  * Construct symlink file

+  */

+  initStringInfo(&symlinkfbuf);
I think the declaration and initialization of the symlinkfbuf string can be 
moved under the #ifdef WIN32 compile-time macro;
for other platforms it's simply allocated and freed, which can be avoided.


3.  +  /*

+  * native windows utilites are not able 
create symlinks while

+  * extracting files from tar.

+  */

Rephrase the above sentence and fix spelling mistake  (utilities 
are not able to create)

I haven't done the testing yet; once I finish testing, I will share the 
results with you.


Regards,
Dilip



Re: [HACKERS] KNN searches support for SP-GiST [GSOC'14]

2014-08-20 Thread Heikki Linnakangas

On 08/20/2014 03:35 AM, Vladislav Sterzhanov wrote:

Hi there, pg-Hackers!
Here I go with the patch which brings up the possibility to perform
nearest-neighbour searches on SP-GiSTs (as of now includes implementation
for quad and kd trees). Pre-reviewed by my GSoC mentor Alexander Korotkov.
Sample benchmarking script included in the attachment (dumps the current
geonames archive and runs several searches on the (latitude, longitude)
points), which demonstrates the dramatic improvements against plain
searches and sorting.


Cool!

I think this needs a high-level description in access/spgist/README on 
how this works. You can probably copy-paste the similar description from 
gist's README, because it's the same design. If I understand correctly, 
the support functions return distances between the nodes and the query, 
and the SP-GiST code uses those distances to return the rows in order. 
An important detail there is that the distance returned for an inner 
node is a lower bound of the distance of any node in that branch.
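
As a concrete illustration of that lower-bound property (my example, not code
from the patch): for a quadtree inner node covering an axis-aligned box, the
distance from the query point to the box never exceeds the distance to any
point stored under that node.

#include <math.h>

/*
 * Distance from point (px, py) to the box [xmin, xmax] x [ymin, ymax];
 * zero when the point lies inside the box.  This is a valid lower bound
 * on the distance to any point contained in the box.
 */
static double
point_box_distance(double px, double py,
				   double xmin, double ymin, double xmax, double ymax)
{
	double		dx = fmax(fmax(xmin - px, 0.0), px - xmax);
	double		dy = fmax(fmax(ymin - py, 0.0), py - ymax);

	return sqrt(dx * dx + dy * dy);
}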


A binary heap would be a better data structure than a red-black tree for 
the queue of nodes to visit/return. I know that GiST also uses a 
red-black tree for the same thing, but that's no reason not to do better 
here. There is a binary heap implementation in the backend too (see 
src/backend/lib/binaryheap.c), so it's not any more difficult to use.
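
To make the suggestion concrete, here is a rough sketch of driving the search
queue with the backend's binaryheap (illustration only, backend environment
assumed; KnnItem and how items get produced are made up, and note that this
binaryheap has a fixed capacity chosen at allocation time):

#include "postgres.h"
#include "lib/binaryheap.h"

typedef struct KnnItem
{
	double		distance;		/* lower-bound distance for inner nodes */
	bool		isLeaf;			/* hypothetical: leaf tuple vs. inner node */
	void	   *payload;		/* page / tuple reference, details omitted */
} KnnItem;

/*
 * binaryheap keeps the element that compares largest on top, so for a
 * min-heap ordered by distance the comparison is reversed.
 */
static int
knn_item_cmp(Datum a, Datum b, void *arg)
{
	KnnItem    *ia = (KnnItem *) DatumGetPointer(a);
	KnnItem    *ib = (KnnItem *) DatumGetPointer(b);

	if (ia->distance < ib->distance)
		return 1;
	if (ia->distance > ib->distance)
		return -1;
	return 0;
}

static void
knn_scan_sketch(KnnItem *root, int max_queue_items)
{
	binaryheap *queue = binaryheap_allocate(max_queue_items,
											knn_item_cmp, NULL);

	binaryheap_add(queue, PointerGetDatum(root));
	while (!binaryheap_empty(queue))
	{
		KnnItem    *item = (KnnItem *)
			DatumGetPointer(binaryheap_remove_first(queue));

		if (item->isLeaf)
			continue;			/* emit this tuple to the caller here */

		/* otherwise compute child distances and binaryheap_add() them */
	}
}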


Does the code handle ties correctly? The distance function is just an 
approximation, so it's possible that there are two items with same 
distance, but are not equal according to the ordering operator. As an 
extreme example, if you hack the distance function to always return 0.0, 
do you still get correct results (albeit slowly)?


The new suppLen field in spgConfigOut is not mentioned in the docs. Why 
not support pass-by-value supplementary values?


A couple of quick comments on the code:


@@ -137,8 +138,8 @@ spg_quad_picksplit(PG_FUNCTION_ARGS)
 {
spgPickSplitIn *in = (spgPickSplitIn *) PG_GETARG_POINTER(0);
spgPickSplitOut *out = (spgPickSplitOut *) PG_GETARG_POINTER(1);
-   int i;
-   Point  *centroid;
+   int i;
+   Point *centroid;

 #ifdef USE_MEDIAN
/* Use the median values of x and y as the centroid point */


Please remove all unnecessary whitespace changes like this.


@@ -213,14 +215,19 @@ spg_quad_inner_consistent(PG_FUNCTION_ARGS)
}

	Assert(in->nNodes == 4);
+
+   if (in->norderbys > 0)
+   {
+   out->distances = palloc(in->nNodes * sizeof (double *));
+   }


I think that should be sizeof(double).


+   *distances = (double *) malloc(norderbys * sizeof (double *));


No mallocs allowed in the backend, should be palloc.

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] August commitfest

2014-08-20 Thread Heikki Linnakangas
Per the schedule agreed on in the Developer Meeting in Ottawa, the 
August commitfest began five days ago.


We don't seem to have a commitfest manager, and the commitfest has so 
far failed to manage itself. To get things going, I'm picking up the 
Commitfest Manager Mace I found from behind the dumpster now.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] August commitfest

2014-08-20 Thread David Rowley
On Wed, Aug 20, 2014 at 7:17 PM, Heikki Linnakangas hlinnakan...@vmware.com
 wrote:

 Per the schedule agreed on in the Developer Meeting in Ottawa, the August
 commitfest began five days ago.

 We don't seem to have a commitfest manager, and the commitfest has so far
 failed to manage itself. To get things going, I'm picking up the Commitfest
 Manager Mace I found from behind the dumpster now.


Thanks Heikki,
I was wondering if you knew what the plan is for the remaining ready for
committer patches from the June commitfest?

Regards

David Rowley


Re: [HACKERS] New Model For Role Attributes and Fine Grained Permssions

2014-08-20 Thread Heikki Linnakangas

On 08/19/2014 04:27 AM, Brightwell, Adam wrote:

Hi All,

This is a proof-of-concept patch for a new model around role attributes
and fine grained permissions meant to alleviate the current over dependence
on superuser.


Hmm. How does this get us any closer to fine-grained permissions? I 
guess we need this, so that you can grant/revoke the permissions, but I 
thought the hard part is defining what the fine-grained permissions are, 
in a way that you can't escalate yourself to full superuser through any 
of them.


The new syntax could be useful even if we never get around to doing 
anything about current superuser checks, so I'm going to give this a 
quick review just on its own merits.


Please add documentation. That's certainly required before this can be 
committed, but it'll make reviewing the syntax much easier too.



This is not yet complete and only serves as a proof-of-concept at this
point, but I wanted to share it in the hopes of receiving comments,
suggestions and general feedback.

The general gist of this patch is as follows:

* New system catalog pg_permission that relates role id's to permissions.

* New syntax.
   - GRANT permission TO role
   - REVOKE permission FROM role
where, permission is one of an enumerated value, such as CREATE ROLE or
CREATE DATABASE.


The syntax for GRANT/REVOKE is quite complicated already. Do we want to 
overload it even more? Also keep in mind that the SQL standard specifies 
GRANT/REVOKE, so we have to be careful to not clash with the SQL 
standard syntax, or any conceivable future syntax the SQL committee 
might add to it (although we have plenty of PostgreSQL extensions in it 
already). For example, this is currently legal:


GRANT CREATE ALL ON TABLESPACE typename TO role

Is that too close to the new syntax, to cause confusion? Does the new 
syntax work for all the more fine-grained permissions you have in mind? 
I'm particularly worried of conflicts with this existing syntax:


GRANT role_name [, ...] TO role_name [, ...] [ WITH ADMIN OPTION ]

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] August commitfest

2014-08-20 Thread Heikki Linnakangas

On 08/20/2014 10:42 AM, David Rowley wrote:

On Wed, Aug 20, 2014 at 7:17 PM, Heikki Linnakangas hlinnakan...@vmware.com

wrote:



Per the schedule agreed on in the Developer Meeting in Ottawa, the August
commitfest began five days ago.

We don't seem to have a commitfest manager, and the commitfest has so far
failed to manage itself. To get things going, I'm picking up the Commitfest
Manager Mace I found from behind the dumpster now.



Thanks Heikki,
I was wondering if you knew what the plan is for the remaining ready for
committer patches from the June commitfest?


There is no plan. I just moved them to the August commitfest, hopefully 
they will actually get committed or rejected by a committer.


I'm doing a quick pass through the list of patches now, and after that 
I'll start nagging people to get stuff reviewed/committed. (There are a 
bunch of patches that are in Waiting on author state, and no activity 
for months; I'm marking them directly as Returned with feedback.)


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] introduce XLogLockBlockRangeForCleanup()

2014-08-20 Thread Heikki Linnakangas

On 07/07/2014 11:46 AM, Abhijit Menon-Sen wrote:

At 2014-07-07 14:02:15 +0530, amit.khande...@enterprisedb.com wrote:


Other than some minor comments as mentioned below, I don't have any
more issues, it looks all good.


Thank you. (Updated patch attached.)


I don't think the new GetBufferWithoutRelcache function is in line with 
the existing ReadBuffer API. I think it would be better to add a new 
ReadBufferMode - RBM_CACHE_ONLY? - that only returns the buffer if it's 
already in cache, and InvalidBuffer otherwise.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] August commitfest

2014-08-20 Thread Heikki Linnakangas
I have cleaned up the Commitfest of patches that have been in Waiting 
for Author state for weeks or more. I believe the list is now an 
accurate representation of the actual state of the patches.


Patch authors
-

Please check if a patch of yours is in Waiting for Author state. It 
means that you need to amend the patch and post a new version. If you do 
nothing, the patch will be marked returned with feedback.


Also, please pick a patch or two, submitted by other people, and review 
them.


Reviewers
-

Please review patches. Every little helps.

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Fix search_path default value separator.

2014-08-20 Thread Heikki Linnakangas

On 08/15/2014 04:58 PM, Bruce Momjian wrote:

On Fri, Aug 15, 2014 at 10:40:59PM +0900, Fujii Masao wrote:

Heh.  I'm not particularly averse to changing this, but I guess I
don't see any particular benefit of changing it either.  Either comma
or comma-space is a legal separator, so why worry about it?


This change might cause me to update the existing documents (which
I need to maintain in my company) including the output example of
default search_path. If the change is for the improvement, I'd be
happy to do that, but it seems not.

Also there might be some PostgreSQL extensions which their regression test
shows the default search_path. This patch would make their developers
spend the time to update the test. I'm sure that they are fine with that if
it's for an improvement. But not.


Well, we rename GUCs for clarity often too, so I don't see adjusting
white-space as something to avoid either.  It is always about short-term
adjustments vs. long-term clarity.


I think this is an improvement, although a really minor one. Although 
Robert & Fujii questioned if this is worth it, I didn't hear anyone 
objecting strongly, so committed.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] better atomics - v0.5

2014-08-20 Thread Heikki Linnakangas

Hi Andres,

Are you planning to continue working on this? Summarizing the discussion 
so far:


* Robert listed a bunch of little cleanup tasks 
(http://www.postgresql.org/message-id/ca+tgmozshvvqjakul6p3kdvzvpibtgkzoti3m+fvvjg5v+x...@mail.gmail.com). 
Amit posted yet more detailed comments.


* We talked about changing the file layout. I think everyone is happy 
with your proposal here: 
http://www.postgresql.org/message-id/20140702131729.gp21...@awork2.anarazel.de, 
with an overview description of what goes where 
(http://www.postgresql.org/message-id/CA+TgmoYuxfsm2dSy48=jgutrh1mozrvmjlhgqrfku7_wpv-...@mail.gmail.com)


* Talked about nested spinlocks. The consensus seems to be to (continue 
to) forbid nested spinlocks, but allow atomic operations while holding a 
spinlock. When atomics are emulated with spinlocks, it's OK to acquire 
the emulation spinlock while holding another spinlock.


* Talked about whether emulating atomics with spinlocks is a good idea. 
You posted performance results showing that at least with the patch to
use atomics to implement LWLocks, the emulation performs more or less 
the same as the current spinlock-based implementation. I believe 
everyone was more or less satisfied with that.



So ISTM we have consensus that the approach to spinlock emulation and 
nesting stuff is OK. To finish the patch, you'll just need to address 
the file layout and the laundry list of other little details that have 
been raised.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] implement subject alternative names support for SSL connections

2014-08-20 Thread Heikki Linnakangas

On 07/25/2014 07:10 PM, Alexey Klyukin wrote:

Greetings,

I'd like to propose a patch for checking subject alternative names entry in
the SSL certificate for DNS names during SSL authentication.


Thanks! I just ran into this missing feature last week, while working on 
my SSL test suite. So +1 for having the feature.


This patch needs to be rebased over current master branch, thanks to my 
refactoring that moved all OpenSSL-specific stuff to be-secure-openssl.c.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Compute attr_needed for child relations (was Re: [HACKERS] inherit support for foreign tables)

2014-08-20 Thread Etsuro Fujita

Hi Ashutosh,

(2014/08/14 22:30), Ashutosh Bapat wrote:

On Thu, Aug 14, 2014 at 10:05 AM, Etsuro Fujita
fujita.ets...@lab.ntt.co.jp mailto:fujita.ets...@lab.ntt.co.jp wrote:

(2014/08/08 18:51), Etsuro Fujita wrote:
  (2014/06/30 22:48), Tom Lane wrote:
  I wonder whether it isn't time to change that.  It was coded like that
  originally only because calculating the values would've been a waste of
  cycles at the time.  But this is at least the third place where it'd be
  useful to have attr_needed for child rels.

  I've revised the patch.

There was a problem with the previous patch, which will be described
below.  Attached is the updated version of the patch addressing that.



Here are some more comments


Thank you for the further review!


attr_needed also has the attributes used in the restriction clauses as
seen in distribute_qual_to_rels(), so, it looks unnecessary to call
pull_varattnos() on the clauses in baserestrictinfo in functions
check_selective_binary_conversion(), remove_unused_subquery_outputs(),
check_index_only().


IIUC, I think it's *necessary* to do that in those functions since the 
attributes used in the restriction clauses in baserestrictinfo are not 
added to attr_needed due to the following code in distribute_qual_to_rels.


/*
 * If it's a join clause (either naturally, or because delayed by
 * outer-join rules), add vars used in the clause to targetlists of their
 * relations, so that they will be emitted by the plan nodes that scan
 * those relations (else they won't be available at the join node!).
 *
 * Note: if the clause gets absorbed into an EquivalenceClass then this
 * may be unnecessary, but for now we have to do it to cover the case
 * where the EC becomes ec_broken and we end up reinserting the original
 * clauses into the plan.
 */
if (bms_membership(relids) == BMS_MULTIPLE)
{
List   *vars = pull_var_clause(clause,
   PVC_RECURSE_AGGREGATES,
   PVC_INCLUDE_PLACEHOLDERS);

add_vars_to_targetlist(root, vars, relids, false);
list_free(vars);
}


Although in case of RTE_RELATION, the 0th entry in attr_needed
corresponds to FirstLowInvalidHeapAttributeNumber + 1, it's always safer
to use RelOptInfo::min_attr, in case get_relation_info() wants to
change that assumption or somewhere down the line some other part of the
code wants to change the attr_needed information. It may be unlikely that
it would change, but it is better to stick to the comments in RelOptInfo:
  443 AttrNumber  min_attr;   /* smallest attrno of rel (often 0) */
  444 AttrNumber  max_attr;   /* largest attrno of rel */
  445 Relids *attr_needed;    /* array indexed [min_attr .. max_attr] */


Good point!  Attached is the revised version of the patch.

Thanks,

Best regards,
Etsuro Fujita
*** a/contrib/file_fdw/file_fdw.c
--- b/contrib/file_fdw/file_fdw.c
***
*** 799,806  check_selective_binary_conversion(RelOptInfo *baserel,
  	}
  
  	/* Collect all the attributes needed for joins or final output. */
! 	pull_varattnos((Node *) baserel->reltargetlist, baserel->relid,
! 				   &attrs_used);
  
  	/* Add all the attributes used by restriction clauses. */
  	foreach(lc, baserel-baserestrictinfo)
--- 799,810 
  	}
  
  	/* Collect all the attributes needed for joins or final output. */
! 	for (i = baserel->min_attr; i <= baserel->max_attr; i++)
! 	{
! 		if (!bms_is_empty(baserel->attr_needed[i - baserel->min_attr]))
! 			attrs_used = bms_add_member(attrs_used,
! 		i - FirstLowInvalidHeapAttributeNumber);
! 	}
  
  	/* Add all the attributes used by restriction clauses. */
  	foreach(lc, baserel-baserestrictinfo)
*** a/src/backend/optimizer/path/allpaths.c
--- b/src/backend/optimizer/path/allpaths.c
***
*** 577,588  set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
  		childrel->has_eclass_joins = rel->has_eclass_joins;
  
  		/*
! 		 * Note: we could compute appropriate attr_needed data for the child's
! 		 * variables, by transforming the parent's attr_needed through the
! 		 * translated_vars mapping.  However, currently there's no need
! 		 * because attr_needed is only examined for base relations not
! 		 * otherrels.  So we just leave the child's attr_needed empty.
  		 */
  
  		/*
  		 * Compute the child's size.
--- 577,585 
  		childrel->has_eclass_joins = rel->has_eclass_joins;
  
  		/*
! 		 * Compute the child's attr_needed.
  		 */
+ 		adjust_appendrel_attr_needed(rel, childrel, appinfo);
  
  		/*
  		 * Compute the child's size.
***
*** 2173,2178  remove_unused_subquery_outputs(Query *subquery, RelOptInfo *rel)
--- 2170,2176 
  {
  	Bitmapset  *attrs_used = NULL;
  	ListCell   *lc;
+ 	int			i;
  
  	/*
  	 * Do nothing if subquery has UNION/INTERSECT/EXCEPT: in principle we
***
*** 

Re: [HACKERS] better atomics - v0.5

2014-08-20 Thread Andres Freund
Hi,

On 2014-08-20 12:43:05 +0300, Heikki Linnakangas wrote:
 Are you planning to continue working on this?

Yes, I am! I've recently been on holiday and now I'm busy with catching
up with everything that has happened since. I hope to have a next
version ready early next week.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] inherit support for foreign tables

2014-08-20 Thread Etsuro Fujita

Hi Noah,

Thank you for the review!

(2014/07/02 11:23), Noah Misch wrote:

On Fri, Jun 20, 2014 at 05:04:06PM +0900, Kyotaro HORIGUCHI wrote:

Attached is the rebased patch of v11 up to the current master.


I've been studying this patch.

SELECT FOR UPDATE on the inheritance parent fails with a can't-happen error
condition, even when SELECT FOR UPDATE on the child foreign table alone would
have succeeded.


To fix this, I've modified the planner and executor so that the planner 
adds wholerow as well as ctid and tableoid as resjunk output columns to 
the plan for an inheritance tree that contains foreign tables, and that
while the executor uses the ctid and tableoid in the EPQ processing for 
child regular tables as before, the executor uses the wholerow and 
tableoid for child foreign tables.  Patch attached.  This is based on 
the patch [1].



The patch adds zero tests.  Add principal tests to postgres_fdw.sql.  Also,
create a child foreign table in foreign_data.sql; this will make dump/reload
tests of the regression database exercise an inheritance tree that includes a
foreign table.


Done.  I used your tests as reference.  Thanks!


The inheritance section of ddl.sgml should mention child foreign tables, at
least briefly.  Consider mentioning it in the partitioning section, too.


Done.


Your chosen ANALYZE behavior is fair, but the messaging from a database-wide
ANALYZE VERBOSE needs work:

INFO:  analyzing test_foreign_inherit.parent
INFO:  parent: scanned 1 of 1 pages, containing 1 live rows and 0 dead rows; 
1 rows in sample, 1 estimated total rows
INFO:  analyzing test_foreign_inherit.parent inheritance tree
WARNING:  relcache reference leak: relation child not closed
WARNING:  relcache reference leak: relation tchild not closed
WARNING:  relcache reference leak: relation parent not closed

Please arrange to omit the 'analyzing tablename inheritance tree' message,
since this ANALYZE actually skipped that task.  (The warnings obviously need a
fix, too.)  I do find it awkward that adding a foreign table to an inheritance
tree will make autovacuum stop collecting statistics on that inheritance tree,
but I can't think of a better policy.


I think it would be better to handle this in the same way as an 
inheritance tree that turns out to be a single table that doesn't have 
any descendants in acquire_inherited_sample_rows().  That would still 
output the message as shown below, but I think that would be more 
consistent with the existing code.  The patch does so.  (The warnings 
are also fixed.)


INFO:  analyzing public.parent
INFO:  parent: scanned 0 of 0 pages, containing 0 live rows and 0 dead 
rows; 0 rows in sample, 0 estimated total rows

INFO:  analyzing public.parent inheritance tree
INFO:  analyzing pg_catalog.pg_authid
INFO:  pg_authid: scanned 1 of 1 pages, containing 1 live rows and 0 
dead rows; 1 rows in sample, 1 estimated total rows



The rest of these review comments are strictly observations; they're not
requests for you to change the patch, but they may deserve more discussion.


I'd like to give my opinions on those things later on.

Sorry for the long delay.

[1] http://www.postgresql.org/message-id/53f4707c.8030...@lab.ntt.co.jp

Best regards,
Etsuro Fujita
*** a/contrib/file_fdw/file_fdw.c
--- b/contrib/file_fdw/file_fdw.c
***
*** 115,120  static void fileGetForeignRelSize(PlannerInfo *root,
--- 115,125 
  static void fileGetForeignPaths(PlannerInfo *root,
  	RelOptInfo *baserel,
  	Oid foreigntableid);
+ static ForeignPath *fileReparameterizeForeignPath(PlannerInfo *root,
+   RelOptInfo *baserel,
+   Oid foreigntableid,
+   ForeignPath *path,
+   Relids required_outer);
  static ForeignScan *fileGetForeignPlan(PlannerInfo *root,
     RelOptInfo *baserel,
     Oid foreigntableid,
***
*** 143,149  static bool check_selective_binary_conversion(RelOptInfo *baserel,
  static void estimate_size(PlannerInfo *root, RelOptInfo *baserel,
  			  FileFdwPlanState *fdw_private);
  static void estimate_costs(PlannerInfo *root, RelOptInfo *baserel,
! 			   FileFdwPlanState *fdw_private,
  			   Cost *startup_cost, Cost *total_cost);
  static int file_acquire_sample_rows(Relation onerel, int elevel,
  		 HeapTuple *rows, int targrows,
--- 148,154 
  static void estimate_size(PlannerInfo *root, RelOptInfo *baserel,
  			  FileFdwPlanState *fdw_private);
  static void estimate_costs(PlannerInfo *root, RelOptInfo *baserel,
! 			   FileFdwPlanState *fdw_private, List *join_conds,
  			   Cost *startup_cost, Cost *total_cost);
  static int file_acquire_sample_rows(Relation onerel, int elevel,
  		 HeapTuple *rows, int targrows,
***
*** 161,166  file_fdw_handler(PG_FUNCTION_ARGS)
--- 166,172 
  
  	fdwroutine->GetForeignRelSize = fileGetForeignRelSize;
  	fdwroutine->GetForeignPaths = fileGetForeignPaths;
+ 	fdwroutine->ReparameterizeForeignPath = 

Re: [HACKERS] August commitfest

2014-08-20 Thread Etsuro Fujita

Hi Heikki,

(2014/08/20 17:50), Heikki Linnakangas wrote:

I have cleaned up the Commitfest of patches that have been in Waiting
for Author state for weeks or more. I believe the list is now an
accurate representation of the actual state of the patches.


Thank you for the work!

I changed the state of the following patch from Returned with Feedback 
to Needs Review.


https://commitfest.postgresql.org/action/patch_view?id=1386


Patch authors
-



Also, please pick a patch or two, submitted by other people, and review
them.


Will do.

Thanks,

Best regards,
Etsuro Fujita


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] add line number as prompt option to psql

2014-08-20 Thread Jeevan Chalke
Hi,

I have reviewed this:

I have initialized cur_lineno to UINTMAX - 2, and then observed the following
behaviour to check wrap-around.

postgres=# \set PROMPT1 '%/[%l]%R%# '
postgres[18446744073709551613]=# \set PROMPT2 '%/[%l]%R%# '
postgres[18446744073709551613]=# select
postgres[18446744073709551614]-# a
postgres[18446744073709551615]-# ,
postgres[0]-# b
postgres[1]-# from
postgres[2]-# dual;

It is wrapping to 0, whereas line numbers always start with 1. Any issues?

I would like to ignore this, as UINTMAX lines are too many for an input
buffer to hold; the chances of hitting this are almost nil.


However, I think you need to use UINT64_FORMAT when printing a uint64
value. Currently it is %u, which wraps around at UINT_MAX.
See how pset.lineno is displayed.
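
A tiny sketch of that suggestion (frontend environment assumed, as for psql;
cur_lineno stands in for the prompt counter from the patch):

#include "postgres_fe.h"

static void
format_lineno(char *buf, size_t buflen, uint64 cur_lineno)
{
	/* wrong: %u wraps at UINT_MAX, dropping the upper 32 bits */
	snprintf(buf, buflen, "%u", (unsigned int) cur_lineno);

	/*
	 * right: UINT64_FORMAT (from c.h) expands to the platform's printf
	 * conversion for 64-bit unsigned integers
	 */
	snprintf(buf, buflen, UINT64_FORMAT, cur_lineno);
}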

Apart from this, I didn't see any issues in my testing.

Patch applies cleanly. make / make install / initdb / make check all work fine.

Thanks

-- 
Jeevan B Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


Re: [HACKERS] tweaking NTUP_PER_BUCKET

2014-08-20 Thread Heikki Linnakangas

On 07/20/2014 07:17 PM, Tomas Vondra wrote:

On 19.7.2014 20:24, Tomas Vondra wrote:

On 13.7.2014 21:32, Tomas Vondra wrote:

The current patch only implements this for tuples in the main
hash table, not for skew buckets. I plan to do that, but it will
require separate chunks for each skew bucket (so we can remove it
without messing with all of them). The chunks for skew buckets
should be smaller (I'm thinking about ~4kB), because there'll be
more of them, but OTOH those buckets are for frequent values so the
chunks should not remain empty.


I've looked into extending the dense allocation to the skew buckets,
and I think we shouldn't do that. I got about 50% of the changes and
then just threw it out because it turned out quite pointless.

The amount of memory for skew buckets is limited to 2% of work mem,
so even with 100% overhead it'll use ~4% of work mem. So there's
pretty much nothing to gain here. So the additional complexity
introduced by the dense allocation seems pretty pointless.

I'm not entirely happy with the code allocating some memory densely
and some using traditional palloc, but it certainly seems cleaner
than the code I had.

So I think the patch is mostly OK as is.


Attached is v4 of the patch, mostly with minor improvements and cleanups
(removed MemoryContextStats, code related to skew buckets).


Can you remind me what kind of queries this is supposed to help with? 
Could you do some performance testing to demonstrate the effect? And 
also a worst-case scenario.


At a quick glance, I think you're missing a fairly obvious trick in 
ExecHashIncreaseNumBatches: if a chunk contains only a single tuple, you 
can avoid copying it to a new chunk and just link the old chunk to the 
new list instead. Not sure if ExecHashIncreaseNumBatches is called often 
enough for that to be noticeable, but at least it should help in extreme 
cases where you push around huge, >100MB tuples.
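
To spell the trick out with a toy, self-contained sketch (simplified stand-in
types and names, not the actual hash join chunk code):

#include <stddef.h>

typedef struct ToyChunk
{
	struct ToyChunk *next;		/* link in a chunk list */
	int			ntuples;		/* number of tuples stored in this chunk */
	size_t		used;			/* bytes used in the data area */
	/* tuple data follows */
} ToyChunk;

/*
 * When repartitioning, a chunk holding a single (possibly huge) tuple can be
 * relinked onto the new chunk list as-is; only multi-tuple chunks need their
 * tuples copied out one by one.
 */
static void
repartition_chunk(ToyChunk *chunk, ToyChunk **newlist)
{
	if (chunk->ntuples == 1)
	{
		chunk->next = *newlist;	/* no memcpy of the tuple needed */
		*newlist = chunk;
	}
	else
	{
		/* copy each tuple into the appropriate new chunk or batch file */
	}
}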


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hokey wrong versions of libpq in apt.postgresql.org

2014-08-20 Thread Greg Stark
On Sat, Aug 9, 2014 at 2:23 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 If the Debian guidelines think that only SO major version need
 be considered, they're wrong, at least for the way we've been treating
 that.

The Debian approach is that you should have precisely one installed
copy of a library for each soname. I guess there's no particular
reason you can't have multiple versions in the repository (possibly
built by different source packages) but users will only be able to
install one of them. This follows from there being a single /usr/lib
and ldconfig picking a single version for each soname.


-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Incremental backup: add backup profile to base backup

2014-08-20 Thread Heikki Linnakangas
I think this has had enough review for a WIP patch. I'm marking this as 
Returned with Feedback in the commitfest because:


* should use LSNs instead of a md5
* this doesn't do anything useful on its own, hence we would need to see 
the whole solution before committing
* not clear how this would be extended if you wanted to do something more 
fine-grained than file-level diffs.


But please feel free to continue discussing those items.

On 08/18/2014 03:04 AM, Marco Nenciarini wrote:

Hi Hackers,

This is the first piece of file level incremental backup support, as
described on wiki page https://wiki.postgresql.org/wiki/Incremental_backup

It is not yet complete, but I wish to share it on the list to receive
comments and suggestions.

The point of the patch is adding an option to pg_basebackup and
replication protocol BASE_BACKUP command to generate a backup_profile file.

When taking a full backup with pg_basebackup, the user can request
Postgres to generate a backup_profile file through the --profile option
(-B short option, which I've arbitrarily picked because both -P and
-p were already taken)

At the moment the backup profile consists of a file with one line per
file detailing modification time, md5, size, tablespace and path
relative to tablespace root (PGDATA or the tablespace)

To calculate the md5 checksum I've used the md5 code present in pgcrypto
contrib as the code in src/include/libpq/md5.h is not suitable for large
files. Since a core feature cannot depend on a piece of contrib, I've
moved the files

contrib/pgcrypto/md5.c
contrib/pgcrypto/md5.h

to

src/backend/utils/hash/md5.c
src/include/utils/md5.h

changing the pgcrypto extension to use them.

There are still some TODOs:

* User documentation

* Remove the pg_basebackup code duplication I've introduced with the
ReceiveAndUnpackTarFileToDir function, which is almost the same of
ReceiveAndUnpackTarFile but does not expect to handle a tablespace. It
instead simply extract a tar stream in a destination directory. The
latter could probably be rewritten using the former as component, but it
needs some adjustment to the progress reporting part, which is not
present at the moment in ReceiveAndUnpackTarFileToDir.

* Add header section to backup_profile file which at the moment contains
only the body part. I'm thinking of changing the original design and putting
the whole backup label in the header, which is IMHO clearer and well known.
I would use something like:

START WAL LOCATION: 0/E28 (file 0001000E)
CHECKPOINT LOCATION: 0/E60
BACKUP METHOD: streamed
BACKUP FROM: master
START TIME: 2014-08-14 18:54:01 CEST
LABEL: pg_basebackup base backup
END LABEL

I've attached the current patch based on master.

Any comment will be appreciated.

Regards,
Marco




--
- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [patch] pg_copy - a command for reliable WAL archiving

2014-08-20 Thread Greg Stark
On Tue, Aug 19, 2014 at 10:42 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Is there a way to create a link to a file which only exists as an open
 file descriptor?   If there was, you could create a temp file, open an
 fd, then delete the file.  That would remove the issue with files being
 leaked due to failures of various kinds.


Sort of. On recent Linuxen you can create a file with open(O_TMPFILE)
then use linkat(2) to create a link for it only once it's fully
written.

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [patch] pg_copy - a command for reliable WAL archiving

2014-08-20 Thread Greg Stark
c.f.:

   O_TMPFILE (since Linux 3.11)
  Create an unnamed temporary file.  The pathname argument
  specifies a directory; an unnamed inode will be created in
  that directory's filesystem.  Anything written to the
  resulting file will be lost when the last file descriptor is
  closed, unless the file is given a name.

  O_TMPFILE must be specified with one of O_RDWR or O_WRONLY
  and, optionally, O_EXCL.  If O_EXCL is not specified, then
  linkat(2) can be used to link the temporary file into the
  filesystem, making it permanent, using code like the
  following:

  char path[PATH_MAX];
  fd = open("/path/to/dir", O_TMPFILE | O_RDWR,
  S_IRUSR | S_IWUSR);

  /* File I/O on 'fd'... */

  snprintf(path, PATH_MAX, "/proc/self/fd/%d", fd);
  linkat(AT_FDCWD, path, AT_FDCWD, "/path/for/file",
  AT_SYMLINK_FOLLOW);

  In this case, the open() mode argument determines the file
  permission mode, as with O_CREAT.

  Specifying O_EXCL in conjunction with O_TMPFILE prevents a
  temporary file from being linked into the filesystem in the
  above manner.  (Note that the meaning of O_EXCL in this case
  is different from the meaning of O_EXCL otherwise.)

  There are two main use cases for O_TMPFILE:

  *  Improved tmpfile(3) functionality: race-free creation of
 temporary files that (1) are automatically deleted when
 closed; (2) can never be reached via any pathname; (3) are
 not subject to symlink attacks; and (4) do not require the
 caller to devise unique names.

  *  Creating a file that is initially invisible, which is then
 populated with data and adjusted to have appropriate
 filesystem attributes (chown(2), chmod(2), fsetxattr(2),
 etc.)  before being atomically linked into the filesystem
 in a fully formed state (using linkat(2) as described
 above).

  O_TMPFILE requires support by the underlying filesystem; only
  a subset of Linux filesystems provide that support.  In the
  initial implementation, support was provided in the ext2,
  ext3, ext4, UDF, Minix, and shmem filesystems.  XFS support
  was added in Linux 3.15.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tweaking NTUP_PER_BUCKET

2014-08-20 Thread Tomas Vondra
On 20 Srpen 2014, 14:05, Heikki Linnakangas wrote:
 On 07/20/2014 07:17 PM, Tomas Vondra wrote:
 On 19.7.2014 20:24, Tomas Vondra wrote:
 On 13.7.2014 21:32, Tomas Vondra wrote:
 The current patch only implements this for tuples in the main
 hash table, not for skew buckets. I plan to do that, but it will
 require separate chunks for each skew bucket (so we can remove it
 without messing with all of them). The chunks for skew buckets
 should be smaller (I'm thinking about ~4kB), because there'll be
 more of them, but OTOH those buckets are for frequent values so the
 chunks should not remain empty.

 I've looked into extending the dense allocation to the skew buckets,
 and I think we shouldn't do that. I got about 50% of the changes and
 then just threw it out because it turned out quite pointless.

 The amount of memory for skew buckets is limited to 2% of work mem,
 so even with 100% overhead it'll use ~4% of work mem. So there's
 pretty much nothing to gain here. So the additional complexity
 introduced by the dense allocation seems pretty pointless.

 I'm not entirely happy with the code allocating some memory densely
 and some using traditional palloc, but it certainly seems cleaner
 than the code I had.

 So I think the patch is mostly OK as is.

 Attached is v4 of the patch, mostly with minor improvements and cleanups
 (removed MemoryContextStats, code related to skew buckets).

 Can you remind me what kind of queries this is supposed to help with?
 Could you do some performance testing to demonstrate the effect? And
 also a worst-case scenario.

The dense allocation? That was not really directed at any specific
queries, it was about lowering hashjoin memory requirements in general.

First to make it more correct with respect to work_mem (hashjoin does not
account for the palloc overhead, so the actual memory consumption might be
much higher than work_mem).

Second to compensate for the memory for more buckets due to
NTUP_PER_BUCKET, which is tweaked in the 'hashjoin dynamic nbuckets'
patch.

There are some numbers / more detailed analysis in
http://www.postgresql.org/message-id/a17d6217fe0c9e459cb45cb764ad727c.squir...@sq.gransy.com


 At a quick glance, I think you're missing a fairly obvious trick in
 ExecHashIncreaseNumBatches: if a chunk contains only a single tuple, you
 can avoid copying it to a new chunk and just link the old chunk to the
 new list instead. Not sure if ExecHashIncreaseNumBatches is called often
 enough for that to be noticeable, but at least it should help in extreme
 cases where you push around huge  100MB tuples.

Yeah, I thought about that too, but it seemed like a rare corner case. But
maybe you're right - it will also eliminate the 'fluctuation' (allocating
100MB chunk, which might get us over work_mem, ...). Not sure if this is
going to help, but it's easy enough to fix I guess.

The other optimization I'm thinking about is that when increasing number
of batches, the expectation is only about 1/2 the chunks will be
necessary. So instead of freeing the chunk, we might keep it and reuse it
later. That might lower the palloc overhead a bit (but it's already very
low, so I don't think that's a measurable difference).

regards
Tomas





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [patch] pg_copy - a command for reliable WAL archiving

2014-08-20 Thread Alvaro Herrera
Greg Stark wrote:

   char path[PATH_MAX];
   fd = open("/path/to/dir", O_TMPFILE | O_RDWR,
   S_IRUSR | S_IWUSR);
 
   /* File I/O on 'fd'... */
 
   snprintf(path, PATH_MAX, "/proc/self/fd/%d", fd);
   linkat(AT_FDCWD, path, AT_FDCWD, "/path/for/file",
   AT_SYMLINK_FOLLOW);

Hmm, the real trick here is linkat(... /proc/self/foobar), not the
O_TMPFILE: you can have an open file descriptor to an invisible file
simply by creating a normal file and unlinking it.  I looked at linkat()
yesterday but the idea of using /proc/self didn't occur to me.  Nasty
trick :-(  It seems linkat() is quite a bit more portable than
O_TMPFILE, fortunately ...

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [patch] pg_copy - a command for reliable WAL archiving

2014-08-20 Thread Andres Freund
On 2014-08-19 17:42:06 -0400, Alvaro Herrera wrote:
 MauMau wrote:
 
  With that said, copying to a temporary file like dest.tmp and
  renaming it to dest sounds worthwhile even as a basic copy
  utility.  I want to avoid copying to a temporary file with a fixed
  name like _copy.tmp, because some advanced utility may want to run
  multiple instances of pg_copy to copy several files into the same
  directory simultaneously.  However, I'm afraid multiple dest.tmp
  files might continue to occupy disk space after canceling copy or
  power failure in some use cases, where the copy of the same file
  won't be retried.  That's also the reason why I chose to not use a
  temporary file like cp/copy.
 
 Is there a way to create a link to a file which only exists as an open
 file descriptor?   If there was, you could create a temp file, open an
 fd, then delete the file.  That would remove the issue with files being
 leaked due to failures of various kinds.

Isn't this a solution looking for a problem? We're using tempfiles in
dozens of other places and I really don't see why this is the place to
stop doing so. Just copy to dest.tmp and move it into place. If things
crash during that, the command will be repeated shortly afterwards again
*anyway*. Let's not get into platform specific games here.
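
A minimal sketch of that approach (illustration only, abbreviated error
handling; not the pg_copy patch itself): write everything to a temporary
name, fsync it, then rename() it into place so the final name only appears
once the copy is complete.

#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

static int
copy_then_rename(const char *src, const char *tmp, const char *dst)
{
	char		buf[65536];
	ssize_t		n;
	int			in,
				out;

	if ((in = open(src, O_RDONLY)) < 0)
		return -1;
	if ((out = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0600)) < 0)
		return -1;

	while ((n = read(in, buf, sizeof(buf))) > 0)
		if (write(out, buf, n) != n)
			return -1;

	/* make the data durable before the final name becomes visible */
	if (n < 0 || fsync(out) < 0)
		return -1;
	close(in);
	close(out);

	/* atomic within one filesystem; a crash leaves either tmp or dst */
	return rename(tmp, dst);
}

A failed attempt leaves only the stale temporary file, which the next run of
the archive command can simply overwrite (O_TRUNC above), matching the point
made here.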

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [patch] pg_copy - a command for reliable WAL archiving

2014-08-20 Thread Alvaro Herrera
Andres Freund wrote:
 On 2014-08-19 17:42:06 -0400, Alvaro Herrera wrote:
  MauMau wrote:
  
   With that said, copying to a temporary file like dest.tmp and
   renaming it to dest sounds worthwhile even as a basic copy
   utility.  I want to avoid copying to a temporary file with a fixed
   name like _copy.tmp, because some advanced utility may want to run
   multiple instances of pg_copy to copy several files into the same
   directory simultaneously.  However, I'm afraid multiple dest.tmp
   files might continue to occupy disk space after canceling copy or
   power failure in some use cases, where the copy of the same file
   won't be retried.  That's also the reason why I chose to not use a
   temporary file like cp/copy.
  
  Is there a way to create a link to a file which only exists as an open
  file descriptor?   If there was, you could create a temp file, open an
  fd, then delete the file.  That would remove the issue with files being
  leaked due to failures of various kinds.
 
 Isn't this a solution looking for a problem? We're using tempfiles in
 dozens of other places and I really don't see why this is the place to
 stop doing so. Just copy to dest.tmp and move it into place. If things
 crash during that, the command will be repeated shortly afterwards again
 *anyway*. Let's not get into platform specific games here.

The issue is what happens if there's a crash while the temp file is in
the middle of being filled.  I agree there are bigger problems to solve
in this area though.  Yes, I agree that a fixed name such as _copy.tmp
is a really bad choice for several reasons.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [patch] pg_copy - a command for reliable WAL archiving

2014-08-20 Thread Greg Stark
On Wed, Aug 20, 2014 at 2:27 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Hmm, the real trick here is linkat(... /proc/self/foobar), not the
 O_TMPFILE: you can have an open file descriptor to an invisible file
 simply by creating a normal file and unlinking it.  I looked at linkat()
 yesterday but the idea of using /proc/self didn't occur to me.  Nasty
 trick :-(  It seems linkat() is quite a bit more portable than
 O_TMPFILE, fortunately ...

Supposedly linkat(2) on Linux refuses to create a link to a file that
was opened normally and had its last link removed with unlink(2) due
to concerns that this would create security holes.


-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [patch] pg_copy - a command for reliable WAL archiving

2014-08-20 Thread Andres Freund
On 2014-08-20 09:50:56 -0400, Alvaro Herrera wrote:
 Andres Freund wrote:
  On 2014-08-19 17:42:06 -0400, Alvaro Herrera wrote:
   MauMau wrote:
   
With that said, copying to a temporary file like dest.tmp and
renaming it to dest sounds worthwhile even as a basic copy
utility.  I want to avoid copying to a temporary file with a fixed
name like _copy.tmp, because some advanced utility may want to run
multiple instances of pg_copy to copy several files into the same
directory simultaneously.  However, I'm afraid multiple dest.tmp
files might continue to occupy disk space after canceling copy or
power failure in some use cases, where the copy of the same file
won't be retried.  That's also the reason why I chose to not use a
temporary file like cp/copy.
   
   Is there a way to create a link to a file which only exists as an open
   file descriptor?   If there was, you could create a temp file, open an
   fd, then delete the file.  That would remove the issue with files being
   leaked due to failures of various kinds.
  
  Isn't this a solution looking for a problem? We're using tempfiles in
  dozens of other places and I really don't see why this is the place to
  stop doing so. Just copy to dest.tmp and move it into place. If things
  crash during that, the command will be repeated shortly afterwards again
  *anyway*. Let's not get into platform specific games here.
 
 The issue is what happens if there's a crash while the temp file is in
 the middle of being filled.

The archive command will be run again in a couple of seconds and remove the
half-filled temp file.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [patch] pg_copy - a command for reliable WAL archiving

2014-08-20 Thread Alvaro Herrera
Greg Stark wrote:
 On Wed, Aug 20, 2014 at 2:27 PM, Alvaro Herrera
 alvhe...@2ndquadrant.com wrote:
  Hmm, the real trick here is linkat(... /proc/self/foobar), not the
  O_TMPFILE: you can have an open file descriptor to an invisible file
  simply by creating a normal file and unlinking it.  I looked at linkat()
  yesterday but the idea of using /proc/self didn't occur to me.  Nasty
  trick :-(  It seems linkat() is quite a bit more portable than
  O_TMPFILE, fortunately ...
 
 Supposedly linkat(2) on Linux refuses to create a link to a file that
 was opened normally and had its last link removed with unlink(2) due
 to concerns that this would create security holes.

Sigh.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [patch] pg_copy - a command for reliable WAL archiving

2014-08-20 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-08-20 09:50:56 -0400, Alvaro Herrera wrote:
 Andres Freund wrote:
 Isn't this a solution looking for a problem? We're using tempfiles in
 dozens of other places and I really don't see why this is the place to
 stop doing so. Just copy to dest.tmp and move it into place.

 The issue is what happens if there's a crash while the temp file is in
 the middle of being filled.

 The archive command will be be run again a couple seconds and remove the
 half-filled temp file.

Alternatively, you could use the process PID as part of the temp file
name; which is probably a good idea anyway.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [patch] pg_copy - a command for reliable WAL archiving

2014-08-20 Thread Andres Freund
On 2014-08-20 10:19:33 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2014-08-20 09:50:56 -0400, Alvaro Herrera wrote:
  Andres Freund wrote:
  Isn't this a solution looking for a problem? We're using tempfiles in
  dozens of other places and I really don't see why this is the place to
  stop doing so. Just copy to dest.tmp and move it into place.
 
  The issue is what happens if there's a crash while the temp file is in
  the middle of being filled.
 
  The archive command will be run again in a couple of seconds and remove the
  half-filled temp file.
 
 Alternatively, you could use the process PID as part of the temp file
 name; which is probably a good idea anyway.

I think that's actually worse, because nothing will clean up those
unless you explicitly scan for all whatever.$pid files, and somehow
kill them.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup vs. Windows and tablespaces

2014-08-20 Thread Amit Kapila
On Wed, Aug 20, 2014 at 12:12 PM, Dilip kumar dilip.ku...@huawei.com
wrote:

 I have reviewed the patch and did not find any major comments.

Thanks for the review.

 There are some comments I would like to share with you



 1.  Rebase the patch to current GIT head.

Done.


 2.  +  * Construct symlink file

 +  */

 +  initStringInfo(symlinkfbuf);

 I think the declaration and initialization of the symlinkfbuf string
can be moved under the #ifdef WIN32 compile-time macro,

 for other platforms it's simply allocated and freed, which can be avoided.

Agreed, I have changed the patch as per your suggestion.


 3.  +  /*

 +  * native windows utilites are not able
create symlinks while

 +  * extracting files from tar.

 +  */



 Rephrase the above sentence and fix spelling mistake
 (utilities are not able to create)

Done.



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


extend_basebackup_to_include_symlink_v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [patch] pg_copy - a command for reliable WAL archiving

2014-08-20 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-08-20 10:19:33 -0400, Tom Lane wrote:
 Alternatively, you could use the process PID as part of the temp file
 name; which is probably a good idea anyway.

 I think that's actually worse, because nothing will clean up those
 unless you explicitly scan for all whatever.$pid files, and somehow
 kill them.

True.  As long as the copy command is prepared to get rid of a
pre-existing target file, using a fixed .tmp extension should be fine.
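For illustration, the pattern being settled on looks roughly like this (a sketch only, not the actual pg_copy code; the function and buffer sizes are made up):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* copy src to "<dst>.tmp", fsync it, then rename into place */
static int
copy_file_atomically(const char *src, const char *dst)
{
	char	tmppath[1024];
	char	buf[8192];
	ssize_t nread;
	int		srcfd,
			tmpfd;

	snprintf(tmppath, sizeof(tmppath), "%s.tmp", dst);

	srcfd = open(src, O_RDONLY);
	if (srcfd < 0)
		return -1;
	/* O_TRUNC clobbers any half-written file left behind by a crashed run */
	tmpfd = open(tmppath, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (tmpfd < 0)
	{
		close(srcfd);
		return -1;
	}

	while ((nread = read(srcfd, buf, sizeof(buf))) > 0)
	{
		if (write(tmpfd, buf, nread) != nread)
			goto fail;
	}
	if (nread < 0 || fsync(tmpfd) < 0)
		goto fail;
	close(srcfd);
	close(tmpfd);

	/* rename() is atomic within a filesystem, so readers never see a partial file */
	return rename(tmppath, dst);

fail:
	close(srcfd);
	close(tmpfd);
	return -1;
}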

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-20 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 On 08/15/2014 04:19 PM, Tom Lane wrote:
 Personally I'd prefer to go to the all-lengths approach, but a large
 part of that comes from a subjective assessment that the hybrid approach
 is too messy.  Others might well disagree.

 ... So, that extraction test is about 1% *slower* than the basic Tom Lane
 lengths-only patch, and still 80% slower than original JSONB.  And it's
 the same size as the lengths-only version.

Since it's looking like this might be the direction we want to go, I took
the time to flesh out my proof-of-concept patch.  The attached version
takes care of cosmetic issues (like fixing the comments), and includes
code to avoid O(N^2) penalties in findJsonbValueFromContainer and
JsonbIteratorNext.  I'm not sure whether those changes will help
noticeably on Josh's test case; for me, they seemed worth making, but
they do not bring the code back to full speed parity with the all-offsets
version.  But as we've been discussing, it seems likely that those costs
would be swamped by compression and I/O considerations in most scenarios
with large documents; and of course for small documents it hardly matters.

Even if we don't go this way, there are parts of this patch that would
need to get committed.  I found for instance that convertJsonbArray and
convertJsonbObject have insufficient defenses against overflowing the
overall length field for the array or object.

For my own part, I'm satisfied with the patch as attached (modulo the
need to teach pg_upgrade about the incompatibility).  There remains the
question of whether to take this opportunity to add a version ID to the
binary format.  I'm not as excited about that idea as I originally was;
having now studied the code more carefully, I think that any expansion
would likely happen by adding more type codes and/or commandeering the
currently-unused high-order bit of JEntrys.  We don't need a version ID
in the header for that.  Moreover, if we did have such an ID, it would be
notationally painful to get it to most of the places that might need it.

regards, tom lane

diff --git a/src/backend/utils/adt/jsonb.c b/src/backend/utils/adt/jsonb.c
index 2fd87fc..456011a 100644
*** a/src/backend/utils/adt/jsonb.c
--- b/src/backend/utils/adt/jsonb.c
*** jsonb_from_cstring(char *json, int len)
*** 196,207 
  static size_t
  checkStringLen(size_t len)
  {
! 	if (len > JENTRY_POSMASK)
  		ereport(ERROR,
  (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
   errmsg(string too long to represent as jsonb string),
   errdetail(Due to an implementation restriction, jsonb strings cannot exceed %d bytes.,
! 		   JENTRY_POSMASK)));
  
  	return len;
  }
--- 196,207 
  static size_t
  checkStringLen(size_t len)
  {
! 	if (len > JENTRY_LENMASK)
  		ereport(ERROR,
  (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
   errmsg(string too long to represent as jsonb string),
   errdetail(Due to an implementation restriction, jsonb strings cannot exceed %d bytes.,
! 		   JENTRY_LENMASK)));
  
  	return len;
  }
diff --git a/src/backend/utils/adt/jsonb_util.c b/src/backend/utils/adt/jsonb_util.c
index 04f35bf..e47eaea 100644
*** a/src/backend/utils/adt/jsonb_util.c
--- b/src/backend/utils/adt/jsonb_util.c
***
*** 26,40 
   * in MaxAllocSize, and the number of elements (or pairs) must fit in the bits
   * reserved for that in the JsonbContainer.header field.
   *
!  * (the total size of an array's elements is also limited by JENTRY_POSMASK,
   * but we're not concerned about that here)
   */
  #define JSONB_MAX_ELEMS (Min(MaxAllocSize / sizeof(JsonbValue), JB_CMASK))
  #define JSONB_MAX_PAIRS (Min(MaxAllocSize / sizeof(JsonbPair), JB_CMASK))
  
! static void fillJsonbValue(JEntry *array, int index, char *base_addr,
  			   JsonbValue *result);
! static bool	equalsJsonbScalarValue(JsonbValue *a, JsonbValue *b);
  static int	compareJsonbScalarValue(JsonbValue *a, JsonbValue *b);
  static Jsonb *convertToJsonb(JsonbValue *val);
  static void convertJsonbValue(StringInfo buffer, JEntry *header, JsonbValue *val, int level);
--- 26,41 
   * in MaxAllocSize, and the number of elements (or pairs) must fit in the bits
   * reserved for that in the JsonbContainer.header field.
   *
!  * (the total size of an array's elements is also limited by JENTRY_LENMASK,
   * but we're not concerned about that here)
   */
  #define JSONB_MAX_ELEMS (Min(MaxAllocSize / sizeof(JsonbValue), JB_CMASK))
  #define JSONB_MAX_PAIRS (Min(MaxAllocSize / sizeof(JsonbPair), JB_CMASK))
  
! static void fillJsonbValue(JEntry *children, int index,
! 			   char *base_addr, uint32 offset,
  			   JsonbValue *result);
! static bool equalsJsonbScalarValue(JsonbValue *a, JsonbValue *b);
  static int	compareJsonbScalarValue(JsonbValue *a, JsonbValue *b);
  static Jsonb *convertToJsonb(JsonbValue *val);
  static void convertJsonbValue(StringInfo buffer, JEntry *header, JsonbValue *val, int level);

Re: [HACKERS] [GSoC2014] Patch ALTER TABLE ... SET LOGGED

2014-08-20 Thread Thom Brown
On 17 August 2014 21:45, Fabrízio de Royes Mello fabriziome...@gmail.com
wrote:


 On Mon, Jul 28, 2014 at 2:24 PM, Fabrízio de Royes Mello 
 fabriziome...@gmail.com wrote:
 
 
  On Mon, Jul 28, 2014 at 1:41 PM, Christoph Berg c...@df7cb.de wrote:
  
   Re: Fabrízio de Royes Mello 2014-07-28
 CAFcNs+pctx4Q2UYsLOvVFWaznO3U0XhPpkMx5DRhR=jw8w3...@mail.gmail.com
There are something that should I do on this patch yet?
  
   I haven't got around to have a look at the newest incarnation yet, but
   I plan to do that soonish. (Of course that shouldn't stop others from
   doing that as well if they wish.)
  
 
  Thanks!
 

 Updated version.


Hi Fabrizio,

+  This form changes the table persistence type from unlogged to
permanent or
+  from unlogged to permanent (see xref
linkend=SQL-CREATETABLE-UNLOGGED).

Shouldn't this read unlogged to permanent or from permanent to unlogged?


ERROR:  table test is not permanent

Perhaps this would be better as table test is unlogged as permanent
doesn't match the term used in the DDL syntax.


Gave the patch a quick test-drive on a replicated instance, and it appears
to operate as described.
-- 
Thom


[HACKERS] Patching for increasing the number of columns

2014-08-20 Thread Mayeul Kauffmann

Hello,

I am trying to patch the server source to increase the number of columns 
above 1600. I'm not planning to commit this but as suggested elsewhere 
[1], someone might suggest a configure option based on this.
I came up with a patch which seems to work (see below), but 3 of the 136 
tests fail.


I understand some will question the db design but, as written elsewhere, 
"What would be most helpful though is if the answer to this question 
stop being an attack on the business requirement analysis, database 
design skills, and/or sanity of the requester" [1]


I based my attempts on these discussions:
http://www.postgresql.org/message-id/200512221633.jbmgxwm13...@candle.pha.pa.us
http://www.postgresql.org/message-id/8914.1289620...@sss.pgh.pa.us
http://dba.stackexchange.com/questions/40137/in-postgresql-is-it-possible-to-change-the-maximum-number-of-columns-a-table-ca

I built this on Ubuntu 14.04, 64-bit. Bash session follows:

==
sudo apt-get install flex
sudo apt-get install bison build-essential
sudo apt-get install libreadline6-dev
sudo apt-get install zlib1g-dev
sudo apt-get install libossp-uuid-dev

version=3_64  # change this if you want to build several versions of 
postgres in parallel

# see also MODIFY THIS TOO below

echo current version is  $version
mkdir -p ~/bin/postgresql_9.3.4
cd ~/bin/postgresql_9.3.4
wget ftp://ftp.postgresql.org/pub/source/v9.3.4/postgresql-9.3.4.tar.bz2
mkdir -p ~/bin/postgresql_9.3.4/patched_$version
tar -xvf postgresql-9.3.*.tar.bz2 -C ~/bin/postgresql_9.3.4/patched_$version
cd patched_$version/postgresql-9.3.*

# use kate (KDE) or your preferred text editor:
kate src/include/access/htup_details.h

# See: 
http://dba.stackexchange.com/questions/40137/in-postgresql-is-it-possible-to-change-the-maximum-number-of-columns-a-table-ca

# Replace this:
#define MaxTupleAttributeNumber 1664	/* 8 * 208 */
# by this: (the '#' sign and 'define' should be included)
#define MaxTupleAttributeNumber 6656	/* 32 * 208 */
# or this:
#define MaxTupleAttributeNumber 13312	/* 64 * 208 */

# Replace this:
#define MaxHeapAttributeNumber	1600	/* 8 * 200 */
# by this: (the '#' sign before 'define' should be included)
#define MaxHeapAttributeNumber	6400	/* 32 * 200 */
# or this:
#define MaxHeapAttributeNumber	12800	/* 64 * 200 */


# See: 
http://www.postgresql.org/message-id/8914.1289620...@sss.pgh.pa.us 
which suggests this: uint16 t_hoff;
# Replace this: (in TWO PLACES) (near lines 148 and 523. If you 
miss one, postgresql segfaults.)

  uint8		t_hoff;		/* sizeof header incl. bitmap, padding */
# by this: (in TWO PLACES)
  uint32	t_hoff;		/* sizeof header incl. bitmap, padding */
# or by this: (in TWO PLACES)
  uint64	t_hoff;		/* sizeof header incl. bitmap, padding */

# Save and close htup_details.h
# (TODO: write the above as a command-line patch)

./configure --with-blocksize=32 --prefix=/usr/local/pgsql_patched_$version

make
make check

# join ... FAILED
# select_views ... FAILED
# without_oid  ... FAILED
# 
#  3 of 136 tests failed. FIXME
# 
(not sure whether I can attach the log and diff of the test here).

I launched the server anyway and logged in with pgadmin3. I created a 
few tables with 2000 integer fields or so and performed a few inserts, 
selects, updates and joins without any issue.
So at least basic joins work. And in pgadmin3, the 'has OIDs' property 
of the tables I created is not checked.


Just to be sure, I performed all the tests again with 'make check' 
without any patch and without raising the blocksize (configure option), 
and this time all the tests passed (NO failures).


Would anyone have some hint or advice?
Thank you!
Best regards,
Mayeul


[1] http://www.postgresql.org/message-id/8914.1289620...@sss.pgh.pa.us

PS: and since it's my first post here: thank you all so much for this 
wonderful DBMS :-)



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch status: delta relations in AFTER triggers

2014-08-20 Thread David Fetter
On Wed, Aug 20, 2014 at 07:29:39AM -0700, Kevin Grittner wrote:
 Heikki Linnakangas hlinnakan...@vmware.com wrote:
 
  What's the status of the delta relations in AFTER triggers
  patch? I saw you had a lively discussion with Amit in the last
  two weeks, but I couldn't tell if it's still Needs review or
  should be Waiting on Author ?
 
 The patch that David offered to use the tuplestores in C should 
 probably be updated to show both direct use based on what is in 
 TriggerData and SPI use.  I will leave that to David to submit; it 
 doesn't need to be in the same patch-set as the rest; it might, in 
 fact, be better to offer it separately once the API is solid (i.e., 
 after commit of the patch-sets I've just described.)

I'd be happy to.  Is there a repo I can refer to, or should I wait to
start until those patch sets are submitted to -hackers?

 Thanks for picking up the job of CF manager, BTW.  It's a dirty
 job, but somebody's got to do it.

Indeed, and thanks for wrangling this, Heikki.

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patching for increasing the number of columns

2014-08-20 Thread Tom Lane
Mayeul Kauffmann mayeul.kauffm...@free.fr writes:
 I am trying to patch the server source to increase the number of columns 
 above 1600. I'm not planning to commit this but as suggested elsewhere 
 [1], someone might suggest a configure option based on this.
 I came up with a patch which seems to work (see below), but 3 of the 136 
 tests fail.

You would have to show us the actual failure diffs to get much useful
comment, but in general increasing the size of tuple headers could
easily lead to changes in plan choices, which would affect output
row ordering (not to mention EXPLAIN results).  This is particularly
the case if you're testing on a 64-bit machine, since the maxalign'd
size of the header would go from 24 to 32 bytes ...

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GSoC2014] Patch ALTER TABLE ... SET LOGGED

2014-08-20 Thread Fabrízio de Royes Mello
On Wed, Aug 20, 2014 at 12:35 PM, Thom Brown t...@linux.com wrote:

 Hi Fabrizio,

 +  This form changes the table persistence type from unlogged to
permanent or
 +  from unlogged to permanent (see xref
linkend=SQL-CREATETABLE-UNLOGGED).

 Shouldn't this read unlogged to permanent or from permanent to unlogged?


Fixed.


 ERROR:  table test is not permanent

 Perhaps this would be better as table test is unlogged as permanent
doesn't match the term used in the DDL syntax.


Fixed.


 Gave the patch a quick test-drive on a replicated instance, and it
appears to operate as described.


Thanks for your review.

Att,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog: http://fabriziomello.github.io
 Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello
 Github: http://github.com/fabriziomello
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 69a1e14..397b4e6 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -59,16 +59,17 @@ ALTER TABLE [ IF EXISTS ] replaceable class=PARAMETERname/replaceable
 ENABLE ALWAYS RULE replaceable class=PARAMETERrewrite_rule_name/replaceable
 CLUSTER ON replaceable class=PARAMETERindex_name/replaceable
 SET WITHOUT CLUSTER
+SET {LOGGED | UNLOGGED}
 SET WITH OIDS
 SET WITHOUT OIDS
 SET ( replaceable class=PARAMETERstorage_parameter/replaceable = replaceable class=PARAMETERvalue/replaceable [, ... ] )
+SET TABLESPACE replaceable class=PARAMETERnew_tablespace/replaceable
 RESET ( replaceable class=PARAMETERstorage_parameter/replaceable [, ... ] )
 INHERIT replaceable class=PARAMETERparent_table/replaceable
 NO INHERIT replaceable class=PARAMETERparent_table/replaceable
 OF replaceable class=PARAMETERtype_name/replaceable
 NOT OF
 OWNER TO replaceable class=PARAMETERnew_owner/replaceable
-SET TABLESPACE replaceable class=PARAMETERnew_tablespace/replaceable
 REPLICA IDENTITY {DEFAULT | USING INDEX replaceable class=PARAMETERindex_name/replaceable | FULL | NOTHING}
 
 phraseand replaceable class=PARAMETERtable_constraint_using_index/replaceable is:/phrase
@@ -447,6 +448,20 @@ ALTER TABLE [ IF EXISTS ] replaceable class=PARAMETERname/replaceable
/varlistentry
 
varlistentry
+termliteralSET {LOGGED | UNLOGGED}/literal/term
+listitem
+ para
+  This form changes the table persistence type from unlogged to permanent or
+  from permanent to unlogged (see xref linkend=SQL-CREATETABLE-UNLOGGED).
+ /para
+ para
+  Changing the table persistence type acquires an literalACCESS EXCLUSIVE/literal lock
+  and rewrites the table contents and associated indexes into new disk files.
+ /para
+/listitem
+   /varlistentry
+
+   varlistentry
 termliteralSET WITH OIDS/literal/term
 listitem
  para
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index b1c411a..7f497c7 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -574,7 +574,8 @@ rebuild_relation(Relation OldHeap, Oid indexOid, bool verbose)
 	heap_close(OldHeap, NoLock);
 
 	/* Create the transient table that will receive the re-ordered data */
-	OIDNewHeap = make_new_heap(tableOid, tableSpace, false,
+	OIDNewHeap = make_new_heap(tableOid, tableSpace,
+			   OldHeap-rd_rel-relpersistence,
 			   AccessExclusiveLock);
 
 	/* Copy the heap data into the new table in the desired order */
@@ -601,7 +602,7 @@ rebuild_relation(Relation OldHeap, Oid indexOid, bool verbose)
  * data, then call finish_heap_swap to complete the operation.
  */
 Oid
-make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, bool forcetemp,
+make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, char relpersistence,
 			  LOCKMODE lockmode)
 {
 	TupleDesc	OldHeapDesc;
@@ -613,7 +614,6 @@ make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, bool forcetemp,
 	Datum		reloptions;
 	bool		isNull;
 	Oid			namespaceid;
-	char		relpersistence;
 
 	OldHeap = heap_open(OIDOldHeap, lockmode);
 	OldHeapDesc = RelationGetDescr(OldHeap);
@@ -636,16 +636,10 @@ make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, bool forcetemp,
 	if (isNull)
 		reloptions = (Datum) 0;
 
-	if (forcetemp)
-	{
+	if (relpersistence == RELPERSISTENCE_TEMP)
 		namespaceid = LookupCreationNamespace(pg_temp);
-		relpersistence = RELPERSISTENCE_TEMP;
-	}
 	else
-	{
 		namespaceid = RelationGetNamespace(OldHeap);
-		relpersistence = OldHeap-rd_rel-relpersistence;
-	}
 
 	/*
 	 * Create the new heap, using a temporary name in the same namespace as
@@ -1146,6 +1140,7 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
 	Oid			relfilenode1,
 relfilenode2;
 	Oid			swaptemp;
+	char		swaprelpersistence;
 	CatalogIndexState indstate;
 
 	/* We need writable copies of both pg_class tuples. */
@@ -1177,6 +1172,10 @@ swap_relation_files(Oid r1, Oid r2, bool 

Re: [HACKERS] Proposal to add a QNX 6.5 port to PostgreSQL

2014-08-20 Thread Robert Haas
On Mon, Aug 18, 2014 at 11:02 AM, Baker, Keith [OCDUS Non-JJ]
kbak...@its.jnj.com wrote:
 My proof of concept code (steps a through e below) avoided any reading or 
 writing to the pipe (and associated handling of SIGPIPE); it just relied on 
 the postmaster open of the PIPE with ENXIO to indicate all is clear.

I'm not following.

 Robert, Assuming an algorithm choice is agreed upon in the near future, would 
 you be the logical choice to implement the change?
 I am happy to help, especially with any QNX-specific aspects, but don't want 
 to step on anyone's toes.

I'm unlikely to have time to work on this in the immediate future, but
I may be able to help review.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patching for increasing the number of columns

2014-08-20 Thread Mayeul Kauffmann

Tom wrote:
 You would have to show us the actual failure diffs to get much useful 
comment, but in general increasing the size of tuple headers could 
easily lead to

 changes in plan choices

Thank you Tom. So there is some hope! In effect the query plan is 
different for the join and the view tests. The result set is different 
only for the 'without_oid' test.
A side question: Are these tests comprehensive, or should I run other 
tests just to be sure? Hints on where to find those tests are welcome.

Thanks!
(diff below)
Mayeul
*** 
~/bin/postgresql_9.3.4/patched_3_64/postgresql-9.3.4/src/test/regress/expected/join.out 
2014-03-17 19:35:47.0 +
--- 
~/bin/postgresql_9.3.4/patched_3_64/postgresql-9.3.4/src/test/regress/results/join.out 
2014-08-20 15:40:56.248603754 +0100

***
*** 2791,2814 
join int4_tbl i1 on b.thousand = f1
right join int4_tbl i2 on i2.f1 = b.tenthous
order by 1;
!QUERY PLAN
! 
-

   Sort
 Sort Key: b.unique1
!-  Nested Loop Left Join
!  -  Seq Scan on int4_tbl i2
   -  Nested Loop Left Join
 Join Filter: (b.unique1 = 42)
 -  Nested Loop
   -  Nested Loop
 -  Seq Scan on int4_tbl i1
!-  Index Scan using tenk1_thous_tenthous 
on tenk1 b
!  Index Cond: ((thousand = i1.f1) AND 
(i2.f1 = tenthous))

   -  Index Scan using tenk1_unique1 on tenk1 a
 Index Cond: (unique1 = b.unique2)
 -  Index Only Scan using tenk1_thous_tenthous on tenk1 c
   Index Cond: (thousand = a.thousand)
! (15 rows)

  select b.unique1 from
tenk1 a join tenk1 b on a.unique1 = b.unique2
--- 2791,2818 
join int4_tbl i1 on b.thousand = f1
right join int4_tbl i2 on i2.f1 = b.tenthous
order by 1;
!   QUERY PLAN
! 
---

   Sort
 Sort Key: b.unique1
!-  Hash Right Join
!  Hash Cond: (b.tenthous = i2.f1)
   -  Nested Loop Left Join
 Join Filter: (b.unique1 = 42)
 -  Nested Loop
   -  Nested Loop
 -  Seq Scan on int4_tbl i1
!-  Bitmap Heap Scan on tenk1 b
!  Recheck Cond: (thousand = i1.f1)
!  -  Bitmap Index Scan on 
tenk1_thous_tenthous

!Index Cond: (thousand = i1.f1)
   -  Index Scan using tenk1_unique1 on tenk1 a
 Index Cond: (unique1 = b.unique2)
 -  Index Only Scan using tenk1_thous_tenthous on tenk1 c
   Index Cond: (thousand = a.thousand)
!  -  Hash
!-  Seq Scan on int4_tbl i2
! (19 rows)

  select b.unique1 from
tenk1 a join tenk1 b on a.unique1 = b.unique2

==

*** 
~/bin/postgresql_9.3.4/patched_3_64/postgresql-9.3.4/src/test/regress/expected/select_views_1.out 
2014-03-17 19:35:47.0 +
--- 
~/bin/postgresql_9.3.4/patched_3_64/postgresql-9.3.4/src/test/regress/results/select_views.out 
2014-08-20 15:41:01.212603532 +0100

***
*** 1413,1423 
 WHERE f_leak(cnum) AND ymd = '2011-10-01' AND ymd  '2011-11-01';
QUERY PLAN
--
!  Nested Loop
!Join Filter: (l.cid = r.cid)
 -  Seq Scan on credit_usage r
   Filter: ((ymd = '10-01-2011'::date) AND (ymd  
'11-01-2011'::date))

!-  Materialize
   -  Subquery Scan on l
 Filter: f_leak(l.cnum)
 -  Hash Join
--- 1413,1423 
 WHERE f_leak(cnum) AND ymd = '2011-10-01' AND ymd  '2011-11-01';
QUERY PLAN
--
!  Hash Join
!Hash Cond: (r.cid = l.cid)
 -  Seq Scan on credit_usage r
   Filter: ((ymd = '10-01-2011'::date) AND (ymd  
'11-01-2011'::date))

!-  Hash
   -  Subquery Scan on l
 Filter: f_leak(l.cnum)
 -  Hash Join
***
*** 1446,1456 

   Subquery Scan on my_credit_card_usage_secure
 Filter: f_leak(my_credit_card_usage_secure.cnum)
!-  Nested Loop
!  Join Filter: (l.cid = r.cid)
   -  Seq Scan on credit_usage r
 Filter: ((ymd = '10-01-2011'::date) AND (ymd  
'11-01-2011'::date))

!  -  Materialize
 

Re: [HACKERS] Patch status: delta relations in AFTER triggers

2014-08-20 Thread Kevin Grittner
David Fetter da...@fetter.org wrote:
 On Wed, Aug 20, 2014 at 07:29:39AM -0700, Kevin Grittner wrote:

 The patch that David offered to use the tuplestores in C should
 probably be updated to show both direct use based on what is in
 TriggerData and SPI use.  I will leave that to David to submit; it
 doesn't need to be in the same patch-set as the rest; it might, in
 fact, be better to offer it separately once the API is solid (i.e.,
 after commit of the patch-sets I've just described.)

 I'd be happy to.  Is there a repo I can refer to, or should I wait to
 start until those patch sets are submitted to -hackers?

It would be best to wait for the patches to be posted, or possibly
for them to be committed.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch status: delta relations in AFTER triggers

2014-08-20 Thread David Fetter
On Wed, Aug 20, 2014 at 09:59:11AM -0700, Kevin Grittner wrote:
 David Fetter da...@fetter.org wrote:
  On Wed, Aug 20, 2014 at 07:29:39AM -0700, Kevin Grittner wrote:
 
  The patch that David offered to use the tuplestores in C should
  probably be updated to show both direct use based on what is in
  TriggerData and SPI use.  I will leave that to David to submit; it
  doesn't need to be in the same patch-set as the rest; it might,
  in fact, be better to offer it separately once the API is solid
  (i.e., after commit of the patch-sets I've just described.)
 
  I'd be happy to.  Is there a repo I can refer to, or should I wait
  to start until those patch sets are submitted to -hackers?
 
 It would be best to wait for the patches to be posted, or possibly
 for them to be committed.

Works for me. :)

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patching for increasing the number of columns

2014-08-20 Thread Tom Lane
Mayeul Kauffmann mayeul.kauffm...@free.fr writes:
 Tom wrote:
 You would have to show us the actual failure diffs to get much useful 
 comment, but in general increasing the size of tuple headers could 
 easily lead to changes in plan choices

 Thank you Tom. So there is some hope! In effect the query plan is 
 different for the join and the view tests. The result set is different 
 only for the 'without_oid' test.

Hm.  I think the without_oid test is not showing that anything is broken;
what it's testing is whether a table with oids is physically bigger (more
pages) than one without oids but the same data.  It's not implausible
that your change results in the same number of tuples fitting onto a page
in both cases.  It'd be worth doing the math to make sure that makes
sense.  Not sure if there's an easy way to change the table schema so that
you get different physical sizes in both cases.

The other tests aren't showing any functional issue either AFAICS.
The change away from a nestloop plan in join.out is a bit annoying,
because IIRC that test is specifically intended to check nestloop
parameter management; but that just means the test is brittle.

 A side question: Are these tests comprehensive, or should I run other 
 tests just to be sure? Hints on where to find those tests are welcome.

No, they're not comprehensive, and no, we don't have more :-(

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] After switching primary server while using replication slot.

2014-08-20 Thread Robert Haas
On Tue, Aug 19, 2014 at 6:25 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Mon, Aug 18, 2014 at 11:16 PM, Sawada Masahiko sawada.m...@gmail.com 
 wrote:
 Hi all,
 After switching the primary server while using replication slots, the
 standby server will not be able to connect to the new primary server.
 Imagine this situation, if primary server has two ASYNC standby
 servers, also use each replication slots.
 And the one standby(A) apply WAL without problems. But another one
 standby(B) has stopped after connected to primary server.
 (or sending WAL is too delayed)

 In this situation, the standby(B) has not received WAL segment file
 while stopping itself.
 And the primary server can not remove WAL segments which has not been
 received to all standby.
 Therefore the primary server have to keep the WAL segment file which
 has not been received to all standby.
 But standby(A) can do checkpoint itself, and then it's possible to
 recycle WAL segments.
 The number of WAL segment of each server are different.
 ( The number of WAL files of standby(A) having smaller than primary server.)
 After the primary server is crashed, the standby(A) promote to primary,
 we can try to connect standby(B) to standby(A) as new standby server.
 But it will be failed because the standby(A) server might not have WAL
 segment files that standby(B) required.

 This sounds valid concern.

 To resolve this situation, I think that we should make master server
 to notify about removal of WAL segment to all standby servers.
 And the standby servers recycle WAL segments files base on that information.

 Thought?

 How does the server recycle WAL files after it's promoted from the
 standby to master?
 It does that as it likes? If yes, your approach would not be enough.

 The approach prevents unexpected removal of WAL files while the standby
 is running. But after the standby is promoted to master, it might recycle
 needed WAL files immediately. So another standby may still fail to retrieve
 the required WAL file after the promotion.

 ISTM that, in order to address this, we might need to log all the replication
 slot activities and replicate them to the standby. I'm not sure if this
 breaks the design of replication slot at all, though.

Yuck.

I believe that the reason why replication slots are not currently
replicated is because we had the idea that the standby could have
slots that don't exist on the master, for cascading replication.  I'm
not sure that works yet, but I think Andres definitely had it in mind
in the original design.

It seems to me that if every machine needs to keep not only the WAL it
requires for itself, but also the WAL that any other machine in the
replication hierarchy might need, that pretty much sucks.  Suppose
you have a master with 10 standbys, and each standby has 10 cascaded
standbys.  If one of those standbys goes down, do we really want all
100 other machines to keep copies of all the WAL?  That seems rather
unfortunate, since it's likely that only a few of those many standbys
are machines to which we would consider failing over.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] failures on barnacle (CLOBBER_CACHE_RECURSIVELY) because of memory leaks

2014-08-20 Thread Tomas Vondra
Hi,

On 13.8.2014 19:17, Tomas Vondra wrote:
 On 13.8.2014 17:52, Tom Lane wrote:

 * I'm a bit dubious about testing -DRANDOMIZE_ALLOCATED_MEMORY in the
 same build as -DCLOBBER_CACHE_RECURSIVELY, because each of these is
 darned expensive and it's not clear you'd learn anything by running
 them both together.  I think you might be better advised to run two
 separate buildfarm critters with those two options, and thereby perhaps
 get turnaround in something less than 80 days.
 
 OK, I removed this for barnacle/addax/mite, let's see what's the impact.
 
 FWIW We have three other animals running with CLOBBER_CACHE_ALWAYS +
 RANDOMIZE_ALLOCATED_MEMORY, and it takes ~20h per branch. But maybe the
 price when combined with CLOBBER_CACHE_RECURSIVELY is much higher.
 
 * It'd likely be a good idea to take out the TestUpgrade and TestDecoding
 modules from the config too.  Otherwise, we won't be seeing barnacle's
 next report before 2015, judging from the runtime of the check step
 compared to some of the other slow buildfarm machines.  (I wonder whether
 there's an easy way to skip the installcheck step, as that's going to
 require a much longer run than it can possibly be worth too.)
 
 OK, I did this too.
 
 I stopped the already running test on addax and started the test on
 barnacle again. Let's see in a few days/weeks/months what is the result.

It seems to be running much faster (probably after removing the
randomization), and apparently it passed the create_view tests without
crashing, but then crashed at 'constraints' (again, because of OOM).

  PortalMemory: 8192 total in 1 blocks; 7880 free (0 chunks); 312 used
PortalHeapMemory: 1024 total in 1 blocks; 840 free (0 chunks); 184 used
  ExecutorState: 769654952 total in 103 blocks; 114984 free (296
chunks); 769539968 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used

I suppose we don't expect 760MB ExecutorState here. Also, there's ~60MB
MessageContext.

It's still running, so I'm attaching the relevant part of the log (again,
with MemoryContextStats for backends with VSS >= 512MB).

FWIW, it's running against a844c29966d7c0cd6a457e9324f175349bb12df0.

regards
Tomas
[53f203cf.585e:35] LOG:  statement: INSERT INTO CHECK_TBL VALUES (1);
[53f203cf.5862:12] LOG:  statement: CREATE VIEW ro_view8 AS SELECT a, b FROM base_tbl ORDER BY a OFFSET 1;
[53f203cf.5862:13] LOG:  statement: CREATE VIEW ro_view9 AS SELECT a, b FROM base_tbl ORDER BY a LIMIT 1;
[53f203cf.5862:14] LOG:  statement: CREATE VIEW ro_view10 AS SELECT 1 AS a;
[53f203cf.5862:15] LOG:  statement: CREATE VIEW ro_view11 AS SELECT b1.a, b2.b FROM base_tbl b1, base_tbl b2;
[53f203cf.5862:16] LOG:  statement: CREATE VIEW ro_view12 AS SELECT * FROM generate_series(1, 10) AS g(a);
[53f203cf.5862:17] LOG:  statement: CREATE VIEW ro_view13 AS SELECT a, b FROM (SELECT * FROM base_tbl) AS t;
[53f203cf.5862:18] LOG:  statement: CREATE VIEW rw_view14 AS SELECT ctid, a, b FROM base_tbl;
TopMemoryContext: 61792 total in 9 blocks; 4264 free (12 chunks); 57528 used
  CFuncHash: 8192 total in 1 blocks; 1640 free (0 chunks); 6552 used
  TopTransactionContext: 8192 total in 1 blocks; 6208 free (23 chunks); 1984 used
SPI Exec: 50323456 total in 15 blocks; 2568088 free (79 chunks); 47755368 used
  CachedPlanSource: 3072 total in 2 blocks; 672 free (0 chunks); 2400 used
SPI Proc: 8192 total in 1 blocks; 8088 free (0 chunks); 104 used
  MessageContext: 4194304 total in 10 blocks; 1120840 free (130 chunks); 3073464 used
  Operator class cache: 8192 total in 1 blocks; 1640 free (0 chunks); 6552 used
  smgr relation table: 24576 total in 2 blocks; 13872 free (3 chunks); 10704 used
  TransactionAbortContext: 32768 total in 1 blocks; 32736 free (0 chunks); 32 used
  Portal hash: 8192 total in 1 blocks; 1640 free (0 chunks); 6552 used
  PortalMemory: 8192 total in 1 blocks; 7880 free (0 chunks); 312 used
PortalHeapMemory: 1024 total in 1 blocks; 840 free (0 chunks); 184 used
  ExecutorState: 243261440 total in 38 blocks; 3436120 free (250 chunks); 239825320 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
  Relcache by OID: 8192 total in 1 blocks; 592 free (0 chunks); 7600 used
  CacheMemoryContext: 516096 total in 6 blocks; 260744 free (49 chunks); 255352 used
fkeys2p_i: 1024 total in 1 blocks; 144 free (0 chunks); 880 used
fkeys2_i: 1024 total in 1 blocks; 8 free (0 chunks); 1016 used
EventTriggerCache: 8192 total in 1 blocks; 8160 free (114 chunks); 32 used
  Event Trigger Cache: 8192 total in 1 blocks; 3712 free (0 chunks); 4480 used
pg_auth_members_member_role_index: 1024 total in 1 blocks; 8 free (0 chunks); 1016 used
pg_authid_oid_index: 1024 total in 1 blocks; 88 free (0 chunks); 936 used
pg_authid_rolname_index: 1024 total in 1 blocks; 144 free (0 chunks); 880 used
pg_database_oid_index: 1024 total in 1 blocks; 88 free (0 chunks); 936 used

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-20 Thread Robert Haas
On Sun, Aug 17, 2014 at 1:17 PM, Tomas Vondra t...@fuzzy.cz wrote:
 Being able to batch inner and outer relations in a matching way is
 certainly one of the reasons why hashjoin uses that particular scheme.
 There are other reasons, though - for example being able to answer 'Does
 this group belong to this batch?' quickly, and automatically update
 number of batches.

 I'm not saying the lookup is extremely costly, but I'd be very surprised
 if it was as cheap as modulo on a 32-bit integer. Not saying it's the
 dominant cost here, but memory bandwidth is quickly becoming one of the
 main bottlenecks.

Well, I think you're certainly right that a hash table lookup is more
expensive than modulo on a 32-bit integer; so much is obvious.  But if
the load factor is not too large, I think that it's not a *lot* more
expensive, so it could be worth it if it gives us other advantages.
As I see it, the advantage of Jeff's approach is that it doesn't
really matter whether our estimates are accurate or not.  We don't
have to decide at the beginning how many batches to do, and then
possibly end up using too much or too little memory per batch if we're
wrong; we can let the amount of memory actually used during execution
determine the number of batches.  That seems good.  Of course, a hash
join can increase the number of batches on the fly, but only by
doubling it, so you might go from 4 batches to 8 when 5 would really
have been enough.  And a hash join also can't *reduce* the number of
batches on the fly, which might matter a lot.  Getting the number of
batches right avoids I/O, which is a lot more expensive than CPU.
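For comparison, the cheap arithmetic being referred to is roughly the following (a simplification of the scheme hash join uses, not verbatim source):

#include <stdint.h>

/*
 * With nbatch kept at a power of two, picking a batch is a shift and a
 * mask on the 32-bit hash.  Doubling nbatch consumes one more hash bit,
 * so a tuple in batch b either stays put or moves to b + old_nbatch;
 * it can never move to a batch that has already been processed.
 */
static inline int
batch_for_hash(uint32_t hashvalue, int nbatch, int log2_nbuckets)
{
	return (int) ((hashvalue >> log2_nbuckets) & (uint32_t) (nbatch - 1));
}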

 But the situation here isn't comparable, because there's only one
 input stream.  I'm pretty sure we'll want to keep track of which
 transition states we've spilled due to lack of memory as opposed to
 those which were never present in the table at all, so that we can
 segregate the unprocessed tuples that pertain to spilled transition
 states from the ones that pertain to a group we haven't begun yet.

 Why would that be necessary or useful? I don't see a reason for tracking
 that / segregating the tuples.

Suppose there are going to be three groups: A, B, C.  Each is an
array_agg(), and they're big, so only one of them will fit in work_mem at
a time.  However, we don't know that at the beginning, either because
we don't write the code to try or because we do write that code but
our cardinality estimates are way off; instead, we're under the
impression that all three will fit in work_mem.  So we start reading
tuples.  We see values for A and B, but we don't see any values for C
because those all occur later in the input.  Eventually, we run short
of memory and cut off creation of new groups.  Any tuples for C are
now going to get written to a tape from which we'll later reread them.
After a while, even that proves insufficient and we spill the
transition state for B to disk.  Any further tuples that show up for C
will need to be written to tape as well.  We continue processing and
finish group A.

Now it's time to do batch #2.  Presumably, we begin by reloading the
serialized transition state for group B.  To finish group B, we must
look at all the tuples that might possibly fall in that group.  If all
of the remaining tuples are on a single tape, we'll have to read all
the tuples in group B *and* all the tuples in group C; we'll
presumably rewrite the tuples that are not part of this batch onto a
new tape, which we'll then process in batch #3.  But if we took
advantage of the first pass through the input to put the tuples for
group B on one tape and the tuples for group C on another tape, we can
be much more efficient - just read the remaining tuples for group B,
not mixed with anything else, and then read a separate tape for group
C.
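A toy sketch of the bookkeeping that implies (purely illustrative; the names and structure are invented, not taken from any posted patch): once a group's transition state has been spilled, its remaining tuples are appended to a tape dedicated to that group, so a later batch rereads only what it actually needs.

#include <stddef.h>
#include <stdio.h>

typedef enum { GROUP_A, GROUP_B, GROUP_C, NUM_GROUPS } GroupId;

static int	 state_in_memory[NUM_GROUPS] = {1, 1, 1};
static FILE *tape_for_group[NUM_GROUPS];

/* called when work_mem is exhausted and group g must be evicted */
static void
spill_group(GroupId g)
{
	char	name[32];

	snprintf(name, sizeof(name), "agg_tape_%d.tmp", (int) g);
	tape_for_group[g] = fopen(name, "w+b");
	state_in_memory[g] = 0;
}

/* called for every input tuple during the first pass */
static void
route_tuple(GroupId g, const char *tuple, size_t len)
{
	if (state_in_memory[g])
		 /* advance the group's transition state in memory */ ;
	else
		fwrite(tuple, 1, len, tape_for_group[g]);	/* defer to that group's own batch */
}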

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

2014-08-20 Thread Robert Haas
On Tue, Aug 19, 2014 at 7:08 AM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Fri, Aug 15, 2014 at 12:55 AM, Robert Haas robertmh...@gmail.com wrote:

 On Mon, Aug 11, 2014 at 12:59 AM, Amit Kapila amit.kapil...@gmail.com
 wrote:
  1.
  +Number of parallel connections to perform the operation. This
  option will enable the vacuum
  +operation to run on parallel connections, at a time one table
  will
  be operated on one connection.
 
  a. How about describing w.r.t asynchronous connections
  instead of parallel connections?

 I don't think asynchronous is a good choice of word.

 Agreed.

Maybe simultaneous?

 Not sure. How about *concurrent* or *multiple*?

multiple isn't right, but we could say concurrent.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal to add a QNX 6.5 port to PostgreSQL

2014-08-20 Thread Baker, Keith [OCDUS Non-JJ]
Robert and Tom,

Sorry for any confusion, I will try to clarify.

Here is progression of events as I recall them:
- My Initial QNX 6.5 port proposal lacked a robust replacement for the existing 
System V shared memory locking mechanism, a show stopper.
- Robert proposed a nice set of possible alternatives for locking (to enable an 
all POSIX shared memory solution for future platforms).
- Tom and Robert seemed to agree that a combination of file-based locking plus 
pipe-based locking should be sufficiently robust on platforms without Sys V 
shared memory (e.g., QNX).
- I coded a proof-of-concept patch (fcntl + PIPE) which appeared to work on QNX 
(steps a through e).
- Robert countered with an 11 step algorithm (all in the postmaster)
- Tom suggested elimination of steps 5,6,7,8, and 11 (and swapping order 9 and 
10)

I was just taking a step back to ask what gaps existed in the proof-of-concept 
patch (steps a through e).
Is there a scenario it fails to cover, prompting the seemingly more complex 11 
step algorithm (which added writing data to the pipe and handling of SIGPIPE)?

I am willing to attempt coding of the set of changes for a QNX port (option for 
new locking and all POSIX shared memory, plus a few minor QNX-specific tweaks), 
provided you and Tom are satisfied that the show stoppers have been 
sufficiently addressed.

Please let me know if more discussion is required, or if it would be reasonable 
for me (or someone else of your choosing) to work on the coding effort (perhaps 
targeted for 9.5?)
If on the other hand it has been decided that a QNX port is not in the cards, I 
would like to know (I hope that is not the case given the progress made, but no 
point in wasting anyone's time).

Thanks again for your time, effort, patience, and coaching.

Keith Baker


 -Original Message-
 From: Robert Haas [mailto:robertmh...@gmail.com]
 Sent: Wednesday, August 20, 2014 12:26 PM
 To: Baker, Keith [OCDUS Non-JJ]
 Cc: Tom Lane; pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Proposal to add a QNX 6.5 port to PostgreSQL
 
 On Mon, Aug 18, 2014 at 11:02 AM, Baker, Keith [OCDUS Non-JJ]
 kbak...@its.jnj.com wrote:
  My proof of concept code (steps a through e below) avoided any reading or
 writing to the pipe (and associated handling of SIGPIPE); it just relied on
 the postmaster open of the PIPE with ENXIO to indicate all is clear.
 
 I'm not following.
 
  Robert, Assuming an algorithm choice is agreed upon in the near future,
 would you be the logical choice to implement the change?
  I am happy to help, especially with any QNX-specific aspects, but don't
 want to step on anyone's toes.
 
 I'm unlikely to have time to work on this in the immediate future, but I may
 be able to help review.
 
 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL
 Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-20 Thread Josh Berkus
On 08/20/2014 08:29 AM, Tom Lane wrote:
 Since it's looking like this might be the direction we want to go, I took
 the time to flesh out my proof-of-concept patch.  The attached version
 takes care of cosmetic issues (like fixing the comments), and includes
 code to avoid O(N^2) penalties in findJsonbValueFromContainer and
 JsonbIteratorNext

OK, will test.

This means we need a beta3, no?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-20 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 This means we need a beta3, no?

If we change the on-disk format, I'd say so.  So we don't want to wait
around too long before deciding.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patching for increasing the number of columns

2014-08-20 Thread Mayeul Kauffmann


On 20/08/14 18:17, Tom Lane wrote:
Hm. I think the without_oid test is not showing that anything is broken; 


The other tests aren't showing any functional issue either AFAICS. 

Thanks a lot Tom! That's very helpful.
I have written more details and some basic SQL tests in the wiki of the 
application (LimeSurvey) which requires this:


http://manual.limesurvey.org/Instructions_for_increasing_the_maximum_number_of_columns_in_PostgreSQL_on_Linux

I will give update here or on that wiki (where most relevant) should I 
find issues while testing.


Cheers,

mayeulk



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Verbose output of pg_dump not show schema name

2014-08-20 Thread Fabrízio de Royes Mello
On Wed, Aug 20, 2014 at 2:43 AM, Michael Paquier michael.paqu...@gmail.com
wrote:

 I had a look at this patch, and here are a couple of comments:
 1) Depending on how ArchiveEntry is called to register an object to
 dump, namespace may be NULL, but that is not the case for
 namespace->dobj.name, so you could get the namespace name at the top
 of the functions that have their verbose output improved with something
 like that:
 const char *namespace = tbinfo->dobj.namespace ?
	tbinfo->dobj.namespace->dobj.name : NULL;
 And then simplify the message output as follows:
 if (namespace)
	write_msg(blah \"%s\".\"%s\" blah, namespace, classname);
 else
	write_msg(blah \"%s\" blah, classname);
 You can as well safely remove the checks on namespace->dobj.name.

Ok

 2) I don't think that this is correct:
 -   ahlog(AH, 1, processing data
 for table \%s\\n,
 - te-tag);
 +   ahlog(AH, 1, processing data
 for table \%s\.\%s\\n,
 + AH-currSchema,
te-tag);
 There are some code paths where AH-currSchema is set to NULL, and I
 think that you should use te-namespace instead.

Yes, you are correct!


 3) Changing only this message is not enough. The following verbose
 messages need to be changed too for consistency:
 - pg_dump: creating $tag $object
 - pg_dump: setting owner and privileges for [blah]

 I have been pondering as well about doing similar modifications to the
 error message paths, but it did not seem worth it as this patch is
 aimed only for the verbose output. Btw, I have basically fixed those
 issues while doing the review, and finished with the attached patch.
 Fabrizio, is this new version fine for you?


Is fine to me.

I just changed if (tbinfo->dobj.namespace != NULL) to if
(tbinfo->dobj.namespace).

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog: http://fabriziomello.github.io
 Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello
 Github: http://github.com/fabriziomello
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 3aebac8..07cc10e 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -546,8 +546,13 @@ RestoreArchive(Archive *AHX)
 		/* Both schema and data objects might now have ownership/ACLs */
 		if ((te-reqs  (REQ_SCHEMA | REQ_DATA)) != 0)
 		{
-			ahlog(AH, 1, setting owner and privileges for %s %s\n,
-  te-desc, te-tag);
+			/* Show namespace if available */
+			if (te-namespace)
+ahlog(AH, 1, setting owner and privileges for %s \%s\.\%s\\n,
+	  te-desc, te-namespace, te-tag);
+			else
+ahlog(AH, 1, setting owner and privileges for %s \%s\\n,
+	  te-desc, te-tag);
 			_printTocEntry(AH, te, ropt, false, true);
 		}
 	}
@@ -621,7 +626,13 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te,
 
 	if ((reqs  REQ_SCHEMA) != 0)		/* We want the schema */
 	{
-		ahlog(AH, 1, creating %s %s\n, te-desc, te-tag);
+		/* Show namespace if available */
+		if (te-namespace)
+			ahlog(AH, 1, creating %s \%s\.\%s\\n,
+  te-desc, te-namespace, te-tag);
+		else
+			ahlog(AH, 1, creating %s \%s\\n, te-desc, te-tag);
+
 
 		_printTocEntry(AH, te, ropt, false, false);
 		defnDumped = true;
@@ -713,8 +724,13 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te,
 	_becomeOwner(AH, te);
 	_selectOutputSchema(AH, te-namespace);
 
-	ahlog(AH, 1, processing data for table \%s\\n,
-		  te-tag);
+	/* Show namespace if available */
+	if (te-namespace)
+		ahlog(AH, 1, processing data for table \%s\.\%s\\n,
+			  te-namespace, te-tag);
+	else
+		ahlog(AH, 1, processing data for table \%s\\n,
+			  te-tag);
 
 	/*
 	 * In parallel restore, if we created the table earlier in
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 5c0f95f..ea32b42 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -1383,6 +1383,8 @@ dumpTableData_copy(Archive *fout, void *dcontext)
 {
 	TableDataInfo *tdinfo = (TableDataInfo *) dcontext;
 	TableInfo  *tbinfo = tdinfo-tdtable;
+	const char *namespace = tbinfo-dobj.namespace ?
+		tbinfo-dobj.namespace-dobj.name : NULL;
 	const char *classname = tbinfo-dobj.name;
 	const bool	hasoids = tbinfo-hasoids;
 	const bool	oids = tdinfo-oids;
@@ -1400,7 +1402,16 @@ dumpTableData_copy(Archive *fout, void *dcontext)
 	const char *column_list;
 
 	if (g_verbose)
-		write_msg(NULL, dumping contents of table %s\n, classname);
+	{
+		/* Print namespace information if available */
+		if (namespace)
+			write_msg(NULL, dumping contents of table \%s\.\%s\\n,
+	  namespace,
+	  classname);
+		else
+			write_msg(NULL, dumping contents of table \%s\\n,
+	  classname);
+	}
 
 	/*
 	 * Make sure we are in proper schema.  We will qualify the 

[HACKERS] documentation update for doc/src/sgml/func.sgml

2014-08-20 Thread Andreas 'ads' Scherbaum


Hi,

attached is a small patch which updates doc/src/sgml/func.sgml. The 
change explains that functions like round() and others might behave 
differently depending on your operating system (because of rint(3)) and 
that this is according to an IEEE standard. It also points out that #.5 
is not always rounded up, as one would expect from a mathematical point of view.
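For anyone wanting to see the effect described, a quick stand-alone check (assuming the platform's rint() uses the usual IEEE 754 round-to-nearest-even mode; the SQL results in the comment are what one typically gets, since numeric round() rounds halves away from zero while float8 goes through rint()):

#include <math.h>
#include <stdio.h>

/*
 * Typical output with round-to-nearest-even:
 *   rint(0.5) = 0, rint(1.5) = 2, rint(2.5) = 2, rint(3.5) = 4
 *
 * Compare in SQL (results typical; float8 is platform-dependent):
 *   SELECT round(2.5::numeric);   -- 3
 *   SELECT round(2.5::float8);    -- 2
 */
int
main(void)
{
	for (double x = 0.5; x < 4.0; x += 1.0)
		printf("rint(%.1f) = %.0f\n", x, rint(x));
	return 0;
}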



Regards,

--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 13c71af..da30991 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -924,6 +924,25 @@
 /tgroup
/table
 
+   para
+For functions like round(), log() and sqrt() which run against
+either fixed-precision (NUMERIC) or floating-point numbers (e.g. REAL),
+note that the results of these operations will differ according
+to the input type due to rounding. This is most observable with
+round(), which can end up rounding down as well as up for
+any #.5 value. We use the
+a xmlns=http://en.wikipedia.org/w/index.php?title=IEEE_floating_pointoldid=622007055#Rounding_rules;IEEE's rules/a
+for rounding floating-point numbers which can be machine-specific.
+The bitwise operators work only on integral data types, whereas
+the others are available for all numeric data types.
+  /para
+
+  para
+The bitwise operators are also available for the bit string types
+typebit/type and typebit varying/type, as
+shown in xref linkend=functions-bit-string-op-table.
+   /para
+
   para
 xref linkend=functions-math-random-table shows functions for
 generating random numbers.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal to add a QNX 6.5 port to PostgreSQL

2014-08-20 Thread Alvaro Herrera
Baker, Keith [OCDUS Non-JJ] wrote:

 Please let me know if more discussion is required, or if it would be
 reasonable for me (or someone else of your choosing) to work on the
 coding effort (perhaps targeted for 9.5?)
 If on the other hand it has been decided that a QNX port is not in the
 cards, I would like to know (I hope that is not the case given the
 progress made, but no point in wasting anyone's time).

As I recall, other than the postmaster startup interlock, the other
major missing item you mentioned is SA_RESTART.  That could well turn
out to be a showstopper, so I suggest you study that in more depth.

Are there other major items missing?  Did you have to use
configure --disable-spinlocks for instance?

What's your compiler, and what are the underlying hardware platforms you
want to support?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-20 Thread Josh Berkus
On 08/20/2014 08:29 AM, Tom Lane wrote:
 Josh Berkus j...@agliodbs.com writes:
 On 08/15/2014 04:19 PM, Tom Lane wrote:
 Personally I'd prefer to go to the all-lengths approach, but a large
 part of that comes from a subjective assessment that the hybrid approach
 is too messy.  Others might well disagree.
 
 ... So, that extraction test is about 1% *slower* than the basic Tom Lane
 lengths-only patch, and still 80% slower than original JSONB.  And it's
 the same size as the lengths-only version.
 
 Since it's looking like this might be the direction we want to go, I took
 the time to flesh out my proof-of-concept patch.  The attached version
 takes care of cosmetic issues (like fixing the comments), and includes
 code to avoid O(N^2) penalties in findJsonbValueFromContainer and
 JsonbIteratorNext.  I'm not sure whether those changes will help
 noticeably on Josh's test case; for me, they seemed worth making, but
 they do not bring the code back to full speed parity with the all-offsets
 version.  But as we've been discussing, it seems likely that those costs
 would be swamped by compression and I/O considerations in most scenarios
 with large documents; and of course for small documents it hardly matters.

Table sizes and extraction times are unchanged from the prior patch
based on my workload.

We should be comparing all-lengths vs length-and-offset maybe using
another workload as well ...

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal to add a QNX 6.5 port to PostgreSQL

2014-08-20 Thread Baker, Keith [OCDUS Non-JJ]
Alvaro,

Thanks for your interest and questions.
At this point I have created a proof-of-concept QNX 6.5 port which appears to 
work on the surface (passes regression tests), but needs to be deemed 
production-quality.

To work around the lack of SA_RESTART, I added QNX-specific retry macros to port.h.
With these macros in place, make check runs cleanly (it fails in many places 
without them).

+#if defined(__QNX__)
+/* QNX does not support sigaction SA_RESTART. We must retry interrupted calls (EINTR) */
+/* Helper macros, used to build our retry macros */
+#define PG_RETRY_EINTR3(exp,val,type) ({ type _tmp_rc; do _tmp_rc = (exp); while (_tmp_rc == (val) && errno == EINTR); _tmp_rc; })
+#define PG_RETRY_EINTR(exp) PG_RETRY_EINTR3(exp,-1L,long int)
+#define PG_RETRY_EINTR_FILE(exp) PG_RETRY_EINTR3(exp,NULL,FILE *)
+/* override calls known to return EINTR when interrupted */
+#define close(a) PG_RETRY_EINTR(close(a))
+#define fclose(a) PG_RETRY_EINTR(fclose(a))
+#define fdopen(a,b) PG_RETRY_EINTR_FILE(fdopen(a,b))
+#define fopen(a,b) PG_RETRY_EINTR_FILE(fopen(a,b))
+#define freopen(a,b,c) PG_RETRY_EINTR_FILE(freopen(a,b,c))
+#define fseek(a,b,c) PG_RETRY_EINTR(fseek(a,b,c))
+#define fseeko(a,b,c) PG_RETRY_EINTR(fseeko(a,b,c))
+#define ftruncate(a,b) PG_RETRY_EINTR(ftruncate(a,b))
+#define lseek(a,b,c) PG_RETRY_EINTR(lseek(a,b,c))
+#define open(a,b,...) ({ int _tmp_rc; do _tmp_rc = open(a,b,##__VA_ARGS__); while (_tmp_rc == (-1) && errno == EINTR); _tmp_rc; })
+#define shm_open(a,b,c) PG_RETRY_EINTR(shm_open(a,b,c))
+#define stat(a,b) PG_RETRY_EINTR(stat(a,b))
+#define unlink(a) PG_RETRY_EINTR(unlink(a))
... (Macros for read and write are similar but slightly longer, so I omit them here) ...
+#endif   /* __QNX__ */
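
For illustration only (these are not the omitted macros from the patch, just a
sketch in the same statement-expression style), read/write wrappers that keep
the ssize_t return type might look roughly like this:

#define PG_RETRY_EINTR_SSIZE(exp) ({ ssize_t _tmp_rc; do _tmp_rc = (exp); while (_tmp_rc == (ssize_t) -1 && errno == EINTR); _tmp_rc; })
#define read(a,b,c)  PG_RETRY_EINTR_SSIZE(read(a,b,c))
#define write(a,b,c) PG_RETRY_EINTR_SSIZE(write(a,b,c))

A short read/write is not an error, so only the (-1, EINTR) case is retried.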

Here is what I used for configure; I am open to suggestions:
./configure --without-readline --disable-thread-safety

I am targeting QNX 6.5 on x86, using gcc 4.4.2.

Also, I have an issue to work out with locale support, but I expect I can solve
that.

Keith Baker

 -Original Message-
 From: Alvaro Herrera [mailto:alvhe...@2ndquadrant.com]
 Sent: Wednesday, August 20, 2014 4:16 PM
 To: Baker, Keith [OCDUS Non-JJ]
 Cc: Robert Haas; Tom Lane; pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Proposal to add a QNX 6.5 port to PostgreSQL
 
 Baker, Keith [OCDUS Non-JJ] wrote:
 
  Please let me know if more discussion is required, or if it would be
  reasonable for me (or someone else of your choosing) to work on the
  coding effort (perhaps targeted for 9.5?) If on the other hand it has
  been decided that a QNX port is not in the cards, I would like to know
  (I hope that is not the case given the progress made, but no point in
  wasting anyone's time).
 
 As I recall, other than the postmaster startup interlock, the other major
 missing item you mentioned is SA_RESTART.  That could well turn out to be a
 showstopper, so I suggest you study that in more depth.
 
 Are there other major items missing?  Did you have to use
 configure --disable-spinlocks for instance?
 
 What's your compiler, and what are the underlying hardware platforms you
 want to support?
 
 --
 Álvaro Herrera                http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-20 Thread Arthur Silva
What data are you using right now, Josh?

There's the github archive http://www.githubarchive.org/
Here's some sample data https://gist.github.com/igrigorik/2017462




--
Arthur Silva



On Wed, Aug 20, 2014 at 6:09 PM, Josh Berkus j...@agliodbs.com wrote:

 On 08/20/2014 08:29 AM, Tom Lane wrote:
  Josh Berkus j...@agliodbs.com writes:
  On 08/15/2014 04:19 PM, Tom Lane wrote:
  Personally I'd prefer to go to the all-lengths approach, but a large
  part of that comes from a subjective assessment that the hybrid
 approach
  is too messy.  Others might well disagree.
 
  ... So, that extraction test is about 1% *slower* than the basic Tom
 Lane
  lengths-only patch, and still 80% slower than original JSONB.  And it's
  the same size as the lengths-only version.
 
  Since it's looking like this might be the direction we want to go, I took
  the time to flesh out my proof-of-concept patch.  The attached version
  takes care of cosmetic issues (like fixing the comments), and includes
  code to avoid O(N^2) penalties in findJsonbValueFromContainer and
  JsonbIteratorNext.  I'm not sure whether those changes will help
  noticeably on Josh's test case; for me, they seemed worth making, but
  they do not bring the code back to full speed parity with the all-offsets
  version.  But as we've been discussing, it seems likely that those costs
  would be swamped by compression and I/O considerations in most scenarios
  with large documents; and of course for small documents it hardly
 matters.

 Table sizes and extraction times are unchanged from the prior patch
 based on my workload.

 We should be comparing all-lengths vs length-and-offset maybe using
 another workload as well ...

 --
 Josh Berkus
 PostgreSQL Experts Inc.
 http://pgexperts.com



Re: [HACKERS] [patch] pg_copy - a command for reliable WAL archiving

2014-08-20 Thread Bruce Momjian
On Wed, Aug 20, 2014 at 10:36:40AM -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2014-08-20 10:19:33 -0400, Tom Lane wrote:
  Alternatively, you could use the process PID as part of the temp file
  name; which is probably a good idea anyway.
 
  I think that's actually worse, because nothing will clean up those
  unless you explicitly scan for all whatever.$pid files, and somehow
  kill them.
 
 True.  As long as the copy command is prepared to get rid of a
 pre-existing target file, using a fixed .tmp extension should be fine.

Well, then we are back to this comment by MauMau:

 With that said, copying to a temporary file like dest.tmp and
 renaming it to dest sounds worthwhile even as a basic copy utility.
 I want to avoid copying to a temporary file with a fixed name like
 _copy.tmp, because some advanced utility may want to run multiple
 instances of pg_copy to copy several files into the same directory
 simultaneously.  However, I'm afraid multiple dest.tmp files might
 continue to occupy disk space after canceling copy or power failure in
 some use cases, where the copy of the same file won't be retried.
 That's also the reason why I chose to not use a temporary file like
 cp/copy.

Do we want cases where the same directory is used by multiple pg_copy
processes?  I can't imagine how that setup would make sense.

I am thinking pg_copy should emit a warning message when it removes an
old temp file.  This might alert people that something odd is happening
if they see the message often.

The pid-extension idea would work as pg_copy can test all pid extension
files to see if the pid is still active.  However, that assumes that the
pid is running on the local machine and not on another machine that has
NFS-mounted this directory, so maybe this is a bad idea, but again, we
are back to the idea that only one process should be writing into this
directory.
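
(For what it's worth, that liveness test would be the usual kill(pid, 0) probe;
a rough sketch, and as noted it only says anything about PIDs on the local
machine:)

#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>

/* Sketch only: true if a process with this PID exists on the local machine. */
static bool
pid_is_alive(pid_t pid)
{
	if (kill(pid, 0) == 0)
		return true;
	return (errno == EPERM);	/* process exists but belongs to another user */
}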

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] [patch] pg_copy - a command for reliable WAL archiving

2014-08-20 Thread Andres Freund
On 2014-08-20 18:58:05 -0400, Bruce Momjian wrote:
 On Wed, Aug 20, 2014 at 10:36:40AM -0400, Tom Lane wrote:
  Andres Freund and...@2ndquadrant.com writes:
   On 2014-08-20 10:19:33 -0400, Tom Lane wrote:
   Alternatively, you could use the process PID as part of the temp file
   name; which is probably a good idea anyway.
  
   I think that's actually worse, because nothing will clean up those
   unless you explicitly scan for all whatever.$pid files, and somehow
   kill them.
  
  True.  As long as the copy command is prepared to get rid of a
  pre-existing target file, using a fixed .tmp extension should be fine.
 
 Well, then we are back to this comment by MauMau:

  With that said, copying to a temporary file like dest.tmp and
  renaming it to dest sounds worthwhile even as a basic copy utility.
  I want to avoid copying to a temporary file with a fixed name like
  _copy.tmp, because some advanced utility may want to run multiple
  instances of pg_copy to copy several files into the same directory
  simultaneously.  However, I'm afraid multiple dest.tmp files might
  continue to occupy disk space after canceling copy or power failure in
  some use cases, where the copy of the same file won't be retried.
  That's also the reason why I chose to not use a temporary file like
  cp/copy.
 
 Do we want cases where the same directory is used by multiple pg_copy
 processes?  I can't imagine how that setup would make sense.

I don't think anybody is proposing the _copy.tmp proposal. We've just
argued about the risk of dest.tmp. And I argued - and others seem to
agree - the space usage problem isn't really relevant because archive
commands and such are rerun after failure and can then clean up the temp
file again.

 I am thinking pg_copy should emit a warning message when it removes an
 old temp file.  This might alert people that something odd is happening
 if they see the message often.

Don't really see a point in this. If the archive command or such failed,
that will already have been logged. I'd expect this to be implemented by
passing O_CREAT | O_TRUNC to open(), nothing else.
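
Something along these lines, that is: the temp file is simply truncated if it
already exists and then renamed into place. This is only a sketch under that
assumption, not pg_copy's actual code, with error handling trimmed:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static int
copy_then_rename(const char *tmppath, const char *destpath,
				 const char *buf, size_t len)
{
	/* O_TRUNC silently reuses a leftover temp file from a failed run */
	int			fd = open(tmppath, O_WRONLY | O_CREAT | O_TRUNC, 0600);

	if (fd < 0)
		return -1;
	if (write(fd, buf, len) != (ssize_t) len || fsync(fd) != 0)
	{
		close(fd);
		return -1;
	}
	if (close(fd) != 0)
		return -1;
	return rename(tmppath, destpath);	/* atomically publish the finished file */
}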

 The pid-extension idea would work as pg_copy can test all pid extension
 files to see if the pid is still active.  However, that assumes that the
 pid is running on the local machine and not on another machine that has
 NFS-mounted this directory, so maybe this is a bad idea, but again, we
 are back to the idea that only one process should be writing into this
 directory.

I don't actually think we should assume that. There very well could be
one process running an archive command, using differently prefixed file
names or such.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Minmax indexes

2014-08-20 Thread Alvaro Herrera
Alvaro Herrera wrote:

 So here's v16, rebased on top of 9bac66020.  As far as I am concerned,
 this is the last version before I start renaming everything to BRIN and
 then commit.

FWIW, in case you or others have interest, here's the diff between your
patch and v16.  Also, for illustrative purposes, the diff between your
version and mine of the code that got moved to mmpageops.c, because it's
difficult to see it from the partial patch.  (There's nothing to do with
that partial diff other than read it directly.)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
*** a/contrib/pageinspect/mmfuncs.c
--- b/contrib/pageinspect/mmfuncs.c
***
*** 29,35 
  PG_FUNCTION_INFO_V1(minmax_page_type);
  PG_FUNCTION_INFO_V1(minmax_page_items);
  PG_FUNCTION_INFO_V1(minmax_metapage_info);
- PG_FUNCTION_INFO_V1(minmax_revmap_array_data);
  PG_FUNCTION_INFO_V1(minmax_revmap_data);
  
  typedef struct mm_column_state
--- 29,34 
***
*** 388,394  minmax_revmap_data(PG_FUNCTION_ARGS)
  	values[0] = Int64GetDatum((uint64) 0);
  
  	/* Extract (possibly empty) list of TIDs in this page. */
! 	for (i = 0; i < REGULAR_REVMAP_PAGE_MAXITEMS; i++)
  	{
  		ItemPointer	tid;
  
--- 387,393 
  	values[0] = Int64GetDatum((uint64) 0);
  
  	/* Extract (possibly empty) list of TIDs in this page. */
! 	for (i = 0; i < REVMAP_PAGE_MAXITEMS; i++)
  	{
  		ItemPointer	tid;
  
*** a/src/backend/access/minmax/Makefile
--- b/src/backend/access/minmax/Makefile
***
*** 12,17  subdir = src/backend/access/minmax
  top_builddir = ../../../..
  include $(top_builddir)/src/Makefile.global
  
! OBJS = minmax.o mmrevmap.o mmtuple.o mmxlog.o mmsortable.o
  
  include $(top_srcdir)/src/backend/common.mk
--- 12,17 
  top_builddir = ../../../..
  include $(top_builddir)/src/Makefile.global
  
! OBJS = minmax.o mmpageops.o mmrevmap.o mmtuple.o mmxlog.o mmsortable.o
  
  include $(top_srcdir)/src/backend/common.mk
*** a/src/backend/access/minmax/minmax.c
--- b/src/backend/access/minmax/minmax.c
***
*** 15,45 
   */
  #include "postgres.h"
  
- #include "access/htup_details.h"
  #include "access/minmax.h"
  #include "access/minmax_internal.h"
  #include "access/minmax_page.h"
! #include "access/minmax_revmap.h"
! #include "access/minmax_tuple.h"
  #include "access/minmax_xlog.h"
  #include "access/reloptions.h"
  #include "access/relscan.h"
- #include "access/xlogutils.h"
  #include "catalog/index.h"
- #include "catalog/pg_operator.h"
- #include "commands/vacuum.h"
  #include "miscadmin.h"
  #include "pgstat.h"
  #include "storage/bufmgr.h"
  #include "storage/freespace.h"
- #include "storage/indexfsm.h"
- #include "storage/lmgr.h"
- #include "storage/smgr.h"
- #include "utils/datum.h"
- #include "utils/lsyscache.h"
- #include "utils/memutils.h"
  #include "utils/rel.h"
- #include "utils/syscache.h"
  
  
  /*
--- 15,33 
   */
  #include "postgres.h"
  
  #include "access/minmax.h"
  #include "access/minmax_internal.h"
  #include "access/minmax_page.h"
! #include "access/minmax_pageops.h"
  #include "access/minmax_xlog.h"
  #include "access/reloptions.h"
  #include "access/relscan.h"
  #include "catalog/index.h"
  #include "miscadmin.h"
  #include "pgstat.h"
  #include "storage/bufmgr.h"
  #include "storage/freespace.h"
  #include "utils/rel.h"
  
  
  /*
***
*** 75,93  static MMBuildState *initialize_mm_buildstate(Relation idxRel,
  static bool terminate_mm_buildstate(MMBuildState *state);
  static void summarize_range(MMBuildState *mmstate, Relation heapRel,
  BlockNumber heapBlk);
- static bool mm_doupdate(Relation idxrel, BlockNumber pagesPerRange,
- 			mmRevmapAccess *rmAccess, BlockNumber heapBlk,
- 			Buffer oldbuf, OffsetNumber oldoff,
- 			const MMTuple *origtup, Size origsz,
- 			const MMTuple *newtup, Size newsz,
- 			bool samepage, bool *extended);
- static void mm_doinsert(Relation idxrel, BlockNumber pagesPerRange,
- 			mmRevmapAccess *rmAccess, Buffer *buffer, BlockNumber heapblkno,
- 			MMTuple *tup, Size itemsz, bool *extended);
- static Buffer mm_getinsertbuffer(Relation irel, Buffer oldbuf, Size itemsz,
-    bool *extended);
  static void form_and_insert_tuple(MMBuildState *mmstate);
- static Size mm_page_get_freespace(Page page);
  
  
  /*
--- 63,69 
***
*** 123,128  mminsert(PG_FUNCTION_ARGS)
--- 99,105 
  	rmAccess = mmRevmapAccessInit(idxRel, pagesPerRange);
  
  restart:
+ 	CHECK_FOR_INTERRUPTS();
  	heapBlk = ItemPointerGetBlockNumber(heaptid);
  	/* normalize the block number to be the first block in the range */
  	heapBlk = (heapBlk / pagesPerRange) * pagesPerRange;
***
*** 155,161  restart:
  
  		addValue = index_getprocinfo(idxRel, keyno + 1,
  	 MINMAX_PROCNUM_ADDVALUE);
- 
  		result = FunctionCall5Coll(addValue,
     idxRel-rd_indcollation[keyno],
     PointerGetDatum(mmdesc),
--- 132,137 
***
*** 197,203  restart:
  		/*
  		 * Try to update the tuple.  

Re: [HACKERS] Verbose output of pg_dump not show schema name

2014-08-20 Thread Michael Paquier
On Thu, Aug 21, 2014 at 5:11 AM, Fabrízio de Royes Mello
fabriziome...@gmail.com wrote:
 I just changed if (tbinfo->dobj.namespace != NULL) to if
 (tbinfo->dobj.namespace).
Fine for me. I am marking this patch as ready for committer.
-- 
Michael




Re: [HACKERS] [PATCH] Incremental backup: add backup profile to base backup

2014-08-20 Thread Bruce Momjian
On Mon, Aug 18, 2014 at 04:05:07PM +0300, Heikki Linnakangas wrote:
 But more to the point, I thought the consensus was to use the
 highest LSN of all the blocks in the file, no? That's essentially
 free to calculate (if you have to read all the data anyway), and
 isn't vulnerable to collisions.

The highest-LSN approach allows you to read only the tail part of each
8k block.  Assuming 512-byte storage sector sizes, you only have to read
1/16 of the file.

Now, the problem is that you lose kernel prefetch, but maybe
posix_fadvise() would fix that problem.
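
A sketch of that idea, assuming POSIX_FADV_SEQUENTIAL/WILLNEED are available on
the platform; the hint covers the whole range even though we only consume part
of each 8k block:

#include <fcntl.h>
#include <sys/types.h>

static void
hint_prefetch(int fd, off_t offset, off_t len)
{
	/* enlarge readahead and start it now; failures are harmless */
	(void) posix_fadvise(fd, offset, len, POSIX_FADV_SEQUENTIAL);
	(void) posix_fadvise(fd, offset, len, POSIX_FADV_WILLNEED);
}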

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Proposal to add a QNX 6.5 port to PostgreSQL

2014-08-20 Thread Andres Freund
Hi,

On 2014-08-20 21:21:41 +, Baker, Keith [OCDUS Non-JJ] wrote:
 To work around the lack of SA_RESTART, I added QNX-specific retry macros to port.h.
 With these macros in place, make check runs cleanly (it fails in many places
 without them).
 
 +#if defined(__QNX__)
 +/* QNX does not support sigaction SA_RESTART. We must retry interrupted calls (EINTR) */
 +/* Helper macros, used to build our retry macros */
 +#define PG_RETRY_EINTR3(exp,val,type) ({ type _tmp_rc; do _tmp_rc = (exp); while (_tmp_rc == (val) && errno == EINTR); _tmp_rc; })
 +#define PG_RETRY_EINTR(exp) PG_RETRY_EINTR3(exp,-1L,long int)
 +#define PG_RETRY_EINTR_FILE(exp) PG_RETRY_EINTR3(exp,NULL,FILE *)
 +/* override calls known to return EINTR when interrupted */
 +#define close(a) PG_RETRY_EINTR(close(a))
 +#define fclose(a) PG_RETRY_EINTR(fclose(a))
 +#define fdopen(a,b) PG_RETRY_EINTR_FILE(fdopen(a,b))
 +#define fopen(a,b) PG_RETRY_EINTR_FILE(fopen(a,b))
 +#define freopen(a,b,c) PG_RETRY_EINTR_FILE(freopen(a,b,c))
 +#define fseek(a,b,c) PG_RETRY_EINTR(fseek(a,b,c))
 +#define fseeko(a,b,c) PG_RETRY_EINTR(fseeko(a,b,c))
 +#define ftruncate(a,b) PG_RETRY_EINTR(ftruncate(a,b))
 +#define lseek(a,b,c) PG_RETRY_EINTR(lseek(a,b,c))
 +#define open(a,b,...) ({ int _tmp_rc; do _tmp_rc = open(a,b,##__VA_ARGS__); while (_tmp_rc == (-1) && errno == EINTR); _tmp_rc; })
 +#define shm_open(a,b,c) PG_RETRY_EINTR(shm_open(a,b,c))
 +#define stat(a,b) PG_RETRY_EINTR(stat(a,b))
 +#define unlink(a) PG_RETRY_EINTR(unlink(a))
 ... (Macros for read and write are similar but slightly longer, so I omit them here) ...
 +#endif   /* __QNX__ */

I think this is a horrible way to go and unlikely to succeed. You're
surely going to miss calls and it's going to need to be maintained
continuously. We'll miss adding things which will then only break under
load, which most people won't be able to generate under QNX.

The only reasonable way to fake kernel SA_RESTART support is to do so in
$platform's libc, in the syscall wrapper.
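
Conceptually, something like this inside the platform's read() wrapper
(illustration only; __raw_read stands in for whatever the real raw syscall
stub is called):

#include <errno.h>
#include <unistd.h>

extern ssize_t __raw_read(int fd, void *buf, size_t count);	/* hypothetical syscall stub */

ssize_t
read(int fd, void *buf, size_t count)
{
	ssize_t		rc;

	do
		rc = __raw_read(fd, buf, count);
	while (rc == -1 && errno == EINTR);		/* behaves like SA_RESTART */

	return rc;
}

Done there, every caller gets the restart behaviour without the tree having to
wrap each call site.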

 Here is what I used for configure, I am open to suggestions:
 ./configure --without-readline --disable-thread-safety

Why is the --disable-thread-safety needed?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [PATCH] Incremental backup: add backup profile to base backup

2014-08-20 Thread Claudio Freire
On Wed, Aug 20, 2014 at 8:24 PM, Bruce Momjian br...@momjian.us wrote:
 On Mon, Aug 18, 2014 at 04:05:07PM +0300, Heikki Linnakangas wrote:
 But more to the point, I thought the consensus was to use the
 highest LSN of all the blocks in the file, no? That's essentially
 free to calculate (if you have to read all the data anyway), and
 isn't vulnerable to collisions.

 The highest-LSN approach allows you to read only the tail part of each
 8k block.  Assuming 512-byte storage sector sizes, you only have to read
 1/16 of the file.

 Now, the problem is that you lose kernel prefetch, but maybe
 posix_fadvise() would fix that problem.

Sequential read of 512-byte blocks or 8k blocks takes the same amount
of time in rotating media (if they're scheduled right). Maybe not in
SSD media.

Not only that, the kernel will read in 4k blocks instead of 8k (at least on Linux).

So, the benefit is dubious.




Re: [HACKERS] Proposal to add a QNX 6.5 port to PostgreSQL

2014-08-20 Thread Andres Freund
On 2014-07-25 18:29:53 -0400, Tom Lane wrote:
  * QNX lacks sigaction SA_RESTART: I modified src/include/port.h 
  to define macros to retry system calls upon EINTR (open,read,write,...) 
  when compiled on QNX
 
 That's pretty scary too.  For one thing, such macros would affect every
 call site whether it's running with SA_RESTART or not.  Do you really
 need it?  It looks to me like we just turn off HAVE_POSIX_SIGNALS if
 you don't have SA_RESTART.  Maybe that code has bit-rotted by now, but
 it did work at one time.

I have pretty much no trust that we're maintaining
!HAVE_POSIX_SIGNAL, and none that we have the capability of doing so. I
seriously doubt there are any !HAVE_POSIX_SIGNAL animals, and
873ab97219caabeb2f7b390268a4fe01e2b7518c makes it pretty darn unlikely
that we have much chance of finding such mistakes during development.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Verbose output of pg_dump not show schema name

2014-08-20 Thread Fabrízio de Royes Mello
On Wed, Aug 20, 2014 at 8:24 PM, Michael Paquier michael.paqu...@gmail.com
wrote:

 On Thu, Aug 21, 2014 at 5:11 AM, Fabrízio de Royes Mello
 fabriziome...@gmail.com wrote:
  I just changed if (tbinfo->dobj.namespace != NULL) to if
  (tbinfo->dobj.namespace).
 Fine for me. I am marking this patch as ready for committer.


Thanks!

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog: http://fabriziomello.github.io
 Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello
 Github: http://github.com/fabriziomello


Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-20 Thread Bruce Momjian
On Tue, Aug 19, 2014 at 03:26:56PM -0400, Stephen Frost wrote:
 I think there's been some improvement since I last had to go through the
 pain of setting this all up, and some of it is undoubtedly OpenSSL's
 fault, but there's definitely quite a bit more we could be doing to make
 SSL support easier.  I'm hopeful that I'll be able to spend more time on
 this in the future but it's not a priority currently.

I know I updated the docs on this in 2013:

commit fa4add50c4ea97c48881fa8cb3863df80141643c
Author: Bruce Momjian br...@momjian.us
Date:   Fri Dec 6 09:42:08 2013 -0500

docs: clarify SSL certificate authority chain docs

Previously, the requirements of how intermediate certificates were
handled and their chain to root certificates was unclear.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-20 Thread Bruce Momjian
On Tue, Aug 19, 2014 at 03:47:17PM -0400, Stephen Frost wrote:
 * Robert Haas (robertmh...@gmail.com) wrote:
  BTW, if we're beating on libpq, I wonder if we shouldn't consider
  bumping the soversion at some point.  I mean, I know that we
  technically don't need to do that if we're only *adding* functions and
  not changing any of the existing stuff in backward-incompatible ways,
  but we might *want* to make some backward-incompatible changes at some
  point, and I think there's a decent argument that any patch in this
  are is already doing that at least to PQgetSSL().  Maybe this would be
  a good time to think if there's anything else we want to do that
  would, either by itself or in combination, justify a bump.
 
 I'm not a big fan of doing it for this specific item, though it's
 technically an API breakage (which means we should actually have
 libpq2-dev packages, make everything that build-deps on libpq-dev
 update to build-dep on libpq2-dev, have libpq6, etc..).  If there are
 other backwards-incompatible things we wish to do, then I agree that
 it'd be good to do them all at the same time (perhaps in conjunction
 with 10.0...).  This is the part where I wish we had been keeping an
 updated list of things we want to change (like on the wiki..).

We have one, called Wire Protocol Changes:

https://wiki.postgresql.org/wiki/Todo

Unfortunately, the subsection link doesn't work on Firefox.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] add line number as prompt option to psql

2014-08-20 Thread Sawada Masahiko
On Wed, Aug 20, 2014 at 9:00 PM, Jeevan Chalke
jeevan.cha...@enterprisedb.com wrote:
 Hi,

 I have reviewed this:

 I have initialized cur_lineno to UINTMAX - 2, and then observed the following
 behaviour to check wrap-around.

 postgres=# \set PROMPT1 '%/[%l]%R%# '
 postgres[18446744073709551613]=# \set PROMPT2 '%/[%l]%R%# '
 postgres[18446744073709551613]=# select
 postgres[18446744073709551614]-# a
 postgres[18446744073709551615]-# ,
 postgres[0]-# b
 postgres[1]-# from
 postgres[2]-# dual;

 It is wrapping to 0, whereas line numbers always start with 1. Any issues?

 I would like to ignore this, as UINTMAX lines are too much for an input
 buffer to hold. There is almost no chance of hitting this.


 However, I think you need to use UINT64_FORMAT while printing a uint64
 number. Currently it is %u, which wraps around at UINT_MAX.
 See how pset.lineno is displayed.

 Apart from this, I didn't see any issues in my testing.

 Patch applies cleanly. make/make install/initdb/make check all are well.


Thank you for reviewing the patch!
Attached is the latest version of the patch.
I modified the output format of cur_lineno.
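
For reference, the difference the reviewer pointed out is roughly the
following; this is an illustrative standalone snippet, assuming the usual
UINT64_FORMAT / UINT64CONST macros that come in via the frontend headers:

#include <stdio.h>
#include "postgres_fe.h"	/* uint64, UINT64_FORMAT, UINT64CONST */

int
main(void)
{
	uint64		cur_lineno = UINT64CONST(18446744073709551615);

	printf("%u\n", (unsigned int) cur_lineno);	/* wraps at UINT_MAX */
	printf(UINT64_FORMAT "\n", cur_lineno);		/* prints the full 64-bit value */
	return 0;
}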

Regards,

---
Sawada Masahiko


psql-line-number_v7.patch
Description: Binary data



Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

2014-08-20 Thread Amit Kapila
On Thu, Aug 21, 2014 at 12:04 AM, Robert Haas robertmh...@gmail.com wrote:

 On Tue, Aug 19, 2014 at 7:08 AM, Amit Kapila amit.kapil...@gmail.com
wrote:
  On Fri, Aug 15, 2014 at 12:55 AM, Robert Haas robertmh...@gmail.com
wrote:
 
  On Mon, Aug 11, 2014 at 12:59 AM, Amit Kapila amit.kapil...@gmail.com
  wrote:
  
   a. How about describing w.r.t asynchronous connections
   instead of parallel connections?
 
  I don't think asynchronous is a good choice of word.
 
  Agreed.
 
 Maybe simultaneous?
 
  Not sure. How about *concurrent* or *multiple*?

 multiple isn't right, but we could say concurrent.

I also find concurrent more appropriate.
Dilip, could you please change it to concurrent in doc updates,
variables, and functions, unless you see any objection.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: Compute attr_needed for child relations (was Re: [HACKERS] inherit support for foreign tables)

2014-08-20 Thread Ashutosh Bapat
On Wed, Aug 20, 2014 at 3:25 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp
wrote:

 Hi Ashutosh,


 (2014/08/14 22:30), Ashutosh Bapat wrote:

 On Thu, Aug 14, 2014 at 10:05 AM, Etsuro Fujita
 fujita.ets...@lab.ntt.co.jp wrote:

 (2014/08/08 18:51), Etsuro Fujita wrote:
   (2014/06/30 22:48), Tom Lane wrote:
   I wonder whether it isn't time to change that.  It was coded
 like that
   originally only because calculating the values would've been a
 waste of
   cycles at the time.  But this is at least the third place
 where it'd be
   useful to have attr_needed for child rels.

   I've revised the patch.

 There was a problem with the previous patch, which will be described
 below.  Attached is the updated version of the patch addressing that.


  Here are some more comments


 Thank you for the further review!


  attr_needed also has the attributes used in the restriction clauses as
 seen in distribute_qual_to_rels(), so it looks unnecessary to call
 pull_varattnos() on the clauses in baserestrictinfo in the functions
 check_selective_binary_conversion(), remove_unused_subquery_outputs(),
 check_index_only().


 IIUC, I think it's *necessary* to do that in those functions since the
 attributes used in the restriction clauses in baserestrictinfo are not
 added to attr_needed due to the following code in distribute_qual_to_rels.


That's right. Thanks for pointing that out.


 /*
  * If it's a join clause (either naturally, or because delayed by
  * outer-join rules), add vars used in the clause to targetlists of
 their
  * relations, so that they will be emitted by the plan nodes that scan
  * those relations (else they won't be available at the join node!).
  *
  * Note: if the clause gets absorbed into an EquivalenceClass then this
  * may be unnecessary, but for now we have to do it to cover the case
  * where the EC becomes ec_broken and we end up reinserting the
 original
  * clauses into the plan.
  */
 if (bms_membership(relids) == BMS_MULTIPLE)
 {
 List   *vars = pull_var_clause(clause,
PVC_RECURSE_AGGREGATES,
PVC_INCLUDE_PLACEHOLDERS);

 add_vars_to_targetlist(root, vars, relids, false);
 list_free(vars);

 }

  Although in case of RTE_RELATION, the 0th entry in attr_needed
 corresponds to FirstLowInvalidHeapAttributeNumber + 1, it's always safer
 to use RelOptInfo::min_attr, in case get_relation_info() wants to
 change that assumption or some other part of the code down the line
 wants to change the attr_needed information. It may be unlikely that it
 would change, but it is better to stick to the comments in RelOptInfo
   443 AttrNumber  min_attr;   /* smallest attrno of rel (often
 0) */
   444 AttrNumber  max_attr;   /* largest attrno of rel */
   445 Relids *attr_needed;/* array indexed [min_attr ..
 max_attr] */


 Good point!  Attached is the revised version of the patch.


If the patch is not in the commitfest, can you please add it there? From
my side the review is done; it should be marked ready for committer,
unless somebody else wants to review it.



 Thanks,

 Best regards,
 Etsuro Fujita




-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] Support for N synchronous standby servers

2014-08-20 Thread Michael Paquier
On Fri, Aug 15, 2014 at 9:28 PM, Fujii Masao masao.fu...@gmail.com wrote:
 You added check_synchronous_standby_num() as the GUC check function for
 synchronous_standby_num, and checked that there. But that seems to be
wrong.
 You can easily see the following error messages even if
synchronous_standby_num
 is smaller than max_wal_senders. The point is that synchronous_standby_num
 should be located before max_wal_senders in postgresql.conf.

 LOG:  invalid value for parameter synchronous_standby_num: 0
 DETAIL:  synchronous_standby_num cannot be higher than max_wal_senders.
I am not sure what I can do here, so I am removing this check from the code
and simply adding a note in the docs that a value of _num higher than
max_wal_senders does not have much meaning.

 I still think that it's strange that replication can be async even when
 s_s_num is larger than zero. That is, I think that the transaction must
 wait for s_s_num sync standbys whether s_s_names is empty or not.
 OTOH, if s_s_num is zero, replication must be async whether s_s_names
 is empty or not. At least for me, it's intuitive to use s_s_num primarily
 to control the sync mode. Of course, other hackers may have different
 thoughts, so we need to keep our ear open for them.
Sure, the compromise looks to be what you propose, and I am fine with that.

 In the above design, one problem is that the number of parameters
 that those who want to set up only one sync replication need to change is
 incremented by one. That is, they need to change s_s_num additionally.
 If we are really concerned about this, we can treat a value of -1 in
 s_s_num as the special value, which allows us to control sync replication
 only by s_s_names as we do now. That is, if s_s_names is empty,
 replication would be async. Otherwise, only one standby with
 high-priority in s_s_names becomes sync one. Probably the default of
 s_s_num should be -1. Thought?

Taking into account those comments, attached is a patch doing the following
things depending on the values of _num and _names:
- If _num = -1 and _names is empty, all the standbys are considered as
async (same behavior as 9.1~, and default).
- If _num = -1 and _names has at least one item, wait for one standby, even
if it is not connected at the time of commit. If one node is found as sync,
other standbys listed in _names with higher priority than the sync one are
in potential state (same as existing behavior).
- If _num = 0, all the standbys are async, whatever the values in _names.
Priority is enforced to 0 for all the standbys. SyncStandbysDefined is set
to false in this case.
- If _num > 0, must wait for _num standbys whatever the values in _names.
The default value of _num is -1. Documentation has been updated in
consequence.
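
To illustrate the proposed semantics with a made-up configuration (the standby
names are hypothetical):

# postgresql.conf fragment (hypothetical names)
synchronous_standby_num = 2                           # commits wait for 2 sync standbys
synchronous_standby_names = 'node_a, node_b, node_c'
# _num = 0  => all standbys async regardless of _names
# _num = -1 => default; the existing single-sync-standby behavior driven by _names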

 The source code comments at the top of syncrep.c need to be udpated.
 It's worth checking whether there are other comments to be updated.
Done. I have updated some comments in other places than the header.
Regards,
-- 
Michael
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
***
*** 2586,2597  include_dir 'conf.d'
  Specifies a comma-separated list of standby names that can support
  <firstterm>synchronous replication</>, as described in
  <xref linkend="synchronous-replication">.
! At any one time there will be at most one active synchronous standby;
! transactions waiting for commit will be allowed to proceed after
! this standby server confirms receipt of their data.
! The synchronous standby will be the first standby named in this list
! that is both currently connected and streaming data in real-time
! (as shown by a state of <literal>streaming</literal> in the
  <link linkend="monitoring-stats-views-table">
  <literal>pg_stat_replication</></link> view).
  Other standby servers appearing later in this list represent potential
--- 2586,2598 
  Specifies a comma-separated list of standby names that can support
  <firstterm>synchronous replication</>, as described in
  <xref linkend="synchronous-replication">.
! At any one time there will be at a number of active synchronous standbys
! defined by <xref linkend="guc-synchronous-standby-num">, transactions
! waiting for commit will be allowed to proceed after those standby
! servers confirm receipt of their data. The synchronous standbys will be
! the first entries named in this list that are both currently connected
! and streaming data in real-time (as shown by a state of
! <literal>streaming</literal> in the
  <link linkend="monitoring-stats-views-table">
  <literal>pg_stat_replication</></link> view).
  Other standby servers appearing later in this list represent potential
***
*** 2627,2632  include_dir 'conf.d'
--- 2628,2685 
</listitem>
   </varlistentry>
  
+  <varlistentry id="guc-synchronous-standby-num">

Re: [HACKERS] pg_receivexlog --status-interval add fsync feedback

2014-08-20 Thread furuyao
 When replication slot is not specified in pg_receivexlog, the flush
 location in the feedback message always indicates invalid. So there seems
 to be no need to send the feedback as soon as fsync is issued, in that
 case.
 How should this option work when replication slot is not specified?

Thanks for the review!

At present, the patch does not check whether -S was specified.

As for the use case where no replication slot is specified:
because it still does fsync, that isn't the original intention; it amounts to
remote_write being set in synchronous_commit.

To call the user's attention to this, I appended the following to the documentation:
if you want to report the flush position to the server, you should use the -S option.

Regards,

--
Furuya Osamu


pg_receivexlog-fsync-feedback-v4.patch
Description: pg_receivexlog-fsync-feedback-v4.patch
