Re: [HACKERS] moving from contrib to bin
On Tue, Dec 23, 2014 at 3:24 PM, Michael Paquier wrote:
> On Mon, Dec 22, 2014 at 11:30 PM, Alvaro Herrera wrote:
>> Michael Paquier wrote:
>>
>>> And here is an updated patch following those lines. Similarly to the
>>> things in contrib/, a set of variables is used to define for each
>>> module what the extra libraries, include dirs, etc. are. This refactors
>>> quite a bit of code, even if there are a couple of exceptions like
>>> pg_xlogdump/ or pg_basebackup/.
>>
>> In a broad look, this looks a lot better. I think we should press
>> forward with this whole set of patches and see what the buildfarm
>> thinks.
> Here is a new series of patches for all those things, with the
> following additions:
> - Some cleanup of the MSVC scripts compared to the last patch
> - Moved documentation to ref/
> - Removed mention of the authors of the utilities moved (?)
> This set of patches is basically made of the former set, with 2
> additional patches for MSVC stuff and documentation pages moved to
> ref/.

Where are we on this? This patch has been waiting for input from its author for the last couple of weeks.
--
Michael
Re: [HACKERS] DROP PRIVILEGES OWNED BY
On Thu, Dec 18, 2014 at 1:43 AM, Marko Tiikkaja wrote:
> I don't have a problem with that. It would probably work, too, since FROM
> is already fully reserved.

Marking the patch as returned with feedback, as there has been no input from Marko in the last couple of weeks.
--
Michael
Re: [HACKERS] Support UPDATE table SET(*)=...
On Mon, Dec 15, 2014 at 7:50 PM, Atri Sharma wrote:
> I have moved the patch to the current CF and marked it as Waiting on Author
> since I plan to resubmit after addressing the concerns.

Nothing has happened in the last month, so marking as returned with feedback.
--
Michael
Re: [HACKERS] Async execution of postgres_fdw.
On Tue, Jan 13, 2015 at 8:46 PM, Kyotaro HORIGUCHI wrote:
> I'll look into the case after this, but I'd like to send a
> revised patch at this point.

Hm. Seems like this patch is not completely baked yet. Horiguchi-san, as you are obviously still working on it, would you agree to move it to the next CF?
--
Michael
Re: [HACKERS] logical column ordering
On Sun, Jan 4, 2015 at 10:37 AM, Alvaro Herrera wrote:
> So I'm reworking my patch with that in mind.

Switching to returned with feedback. Alvaro, feel free to add an entry to the next CF if you are planning to work on it again.
--
Michael
Re: [HACKERS] Join push-down support for foreign tables
On Fri, Dec 26, 2014 at 1:48 PM, Shigeru Hanada wrote:
> Hmm, I agree that supporting N-way joins is very useful. Postgres-XC's SQL
> generator seems to give us a hint for such cases; I'll check it out
> again.

Switching to returned with feedback, as this patch has been waiting for feedback for a couple of weeks now.
--
Michael
Re: [HACKERS] install libpq.dll in bin directory on Windows / Cygwin
On Wed, Dec 24, 2014 at 3:08 PM, Michael Paquier wrote:
> Attached are two patches: one for MinGW/Cygwin, a slightly modified
> version of Peter's, and a second implementing the same thing for
> the MSVC scripts. The method for MSVC is similar to what is done in
> Peter's patch: roughly, it checks if SO_MAJOR_VERSION is present in the
> Makefile of a given library; the path of the Makefile is found by looking
> at the location of the .rc file in the vcproj file (this could be better,
> but I could not come up with a better method). TBH, it would be good to be
> completely consistent in the way we build things on Windows, and we
> may as well consider a backpatch to fix this long-standing bug. The
> MSVC patch of course removes the hack copying libpq.dll from lib/ to
> bin/.
>
> I mentioned the fix for the MSVC scripts as well here:
> http://www.postgresql.org/message-id/cab7npqqiuepzphd3mmk+q7_cqqrkk_v1fvxknymri660z4d...@mail.gmail.com

Peter, this patch has been waiting for input for a couple of weeks. IMO, it would be good to finally get a fix for this bug, and we have patches for both MSVC (the patch I sent) and MinGW (your stuff).
--
Michael
Re: [HACKERS] inherit support for foreign tables
On 2014/12/23 0:36, Tom Lane wrote:
> Yeah, we need to do something about the PlanRowMark data structure.
> Aside from the pre-existing issue in postgres_fdw, we need to fix it
> to support inheritance trees in which more than one rowmark method
> is being used. That rte.hasForeignChildren thing is a crock.
> The idea I'd had about that was to convert the markType field into a
> bitmask, so that a parent node's markType could represent the logical
> OR of the rowmark methods being used by all its children. I've not
> attempted to code this up though, and probably won't get to it until
> after Christmas. One thing that's not clear is what should happen
> with ExecRowMark.

As I said before, that seems to me like a good idea, so I'll update the patch based on it if you're okay with that. Or have you found any problem with the above idea? A minimal sketch of the bitmask encoding is below my signature.

Best regards,
Etsuro Fujita
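To illustrate, a minimal sketch of the bitmask idea (the field name allowedModes and the macro are hypothetical, purely for illustration, not code from any posted patch):

/* one bit per rowmark method */
#define ROWMARK_BIT(marktype)	(1 << (marktype))

/* while planning: fold each child's method into the parent's mask */
parent_rc->allowedModes |= ROWMARK_BIT(child_rc->markType);

/* at execution: test whether a given method is in use anywhere below */
if (parent_rc->allowedModes & ROWMARK_BIT(ROW_MARK_COPY))
{
	/* at least one child relation uses ROW_MARK_COPY */
}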
Re: [HACKERS] patch : Allow toast tables to be moved to a different tablespace
On Tue, Dec 30, 2014 at 11:48 AM, Andreas Karlsson wrote:
> Here is my review of the feature.

Marked as returned with feedback.
--
Michael
Re: [HACKERS] compress method for spgist - 2
Marking this patch as returned with feedback because it has been waiting for input from the author for a couple of weeks now. Heikki, the refactoring patch has some value; are you planning to push it?
--
Michael
Re: [HACKERS] Compression of full-page-writes
Marking this patch as returned with feedback for this CF and moving it to the next one. I doubt that there will be much progress here in the next couple of days, so let's at least try to get something into this release cycle.
--
Michael
Re: [HACKERS] Fillfactor for GIN indexes
On Thu, Jan 8, 2015 at 2:03 PM, Michael Paquier wrote:
> On Thu, Jan 8, 2015 at 6:31 AM, Alexander Korotkov wrote:
>> On Wed, Jan 7, 2015 at 4:11 PM, Michael Paquier wrote:
>>> I am attaching an updated patch, with the default fillfactor value at
>>> 75%, and with the page split code using the fillfactor rate.
>>> Thoughts?
>> A rewritten version of the patch is attached. I made the following changes:
>
> Thanks! With this patch (and my previous version as well), GIN indexes
> with the default fillfactor have a size higher than 9.4 indexes, the 9.4
> behavior being consistent only with fillfactor=100 and not the default
> of 90.

Are we fine with that? IMO, this patch has value to control random updates on GIN indexes, but we should have a default fillfactor of 100 to keep index size consistent with 9.4. Thoughts?
--
Michael
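For reference, a minimal sketch of the mechanism being tuned here (illustrative only; the names and the default value are assumptions, not code from the patch):

/* default being debated above: 75 in the patch, 100 to match 9.4 */
#define GIN_DEFAULT_FILLFACTOR	75

static Size
gin_build_free_space(int fillfactor)
{
	/* space deliberately left free on each page for future random
	 * insertions; fillfactor=100 reserves nothing, as in 9.4 */
	return (Size) (BLCKSZ * (100 - fillfactor) / 100);
}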
Re: [HACKERS] [PATCH] add ssl_protocols configuration option
On Mon, Dec 15, 2014 at 11:15 PM, Alex Shulgin wrote:
> Michael Paquier writes:
>>>> Perhaps ssloptions.[ch], unless you plan to add non-option-related code there later?
>>>
>>> I don't think anything else than common options-related code fits in
>>> there, so ssloptions.c makes sense to me.
>>>
>>>> BTW, there is no Regent code in your openssl.c, so the copyright statement is incorrect.
>>>
>>> Good catch, I just blindly copied that from some other file.
>> There have been arguments in favor of and against this patch... But
>> marking it as returned with feedback because of a lack of activity in
>> the last couple of weeks. Feel free to update if you think that's not
>> correct, and please move this patch to commit fest 2014-12.
>
> Attached is a new version that addresses the earlier feedback: renamed
> the added *.[ch] files and removed the incorrect copyright line.
>
> I'm moving this to the current CF.

This patch has been at a status quo since the beginning of this CF, and the arguments remain the same. If nothing is happening, maybe it would be better to mark it as returned with feedback? Thoughts?
--
Michael
[HACKERS] EvalPlanQual behaves oddly for FDW queries involving system columns
Here is an example using postgres_fdw.

[Terminal 1]
postgres=# create table t (a int, b int);
CREATE TABLE
postgres=# insert into t values (1, 1);
INSERT 0 1
postgres=# begin;
BEGIN
postgres=# update t set b = b * 2;
UPDATE 1

[Terminal 2]
postgres=# create foreign table ft (a int) server loopback options (table_name 'lbt');
CREATE FOREIGN TABLE
postgres=# insert into ft values (1);
INSERT 0 1
postgres=# select tableoid, ctid, * from ft;
 tableoid | ctid  | a
----------+-------+---
    25092 | (0,1) | 1
(1 row)

postgres=# select ft.tableoid, ft.ctid, ft.* from t, ft where t.a = ft.a for update;

[Terminal 1]
postgres=# commit;
COMMIT

[Terminal 2]
postgres=# select ft.tableoid, ft.ctid, ft.* from t, ft where t.a = ft.a for update;
 tableoid |      ctid      | a
----------+----------------+---
        0 | (4294967295,0) | 1
(1 row)

Note that tableoid and ctid have been changed! I think the reason is that EvalPlanQualFetchRowMarks doesn't properly set tableoid and ctid for foreign tables, IIUC. I think this should be fixed. Please find attached a patch. The patch slightly relates to [1], so if it is reasonable, I'll update [1] on top of this.

[1] https://commitfest.postgresql.org/action/patch_view?id=1386

Best regards,
Etsuro Fujita

*** a/contrib/postgres_fdw/postgres_fdw.c
--- b/contrib/postgres_fdw/postgres_fdw.c
***************
*** 2947,2953 **** make_tuple_from_result_row(PGresult *res,
  	tuple = heap_form_tuple(tupdesc, values, nulls);
  
  	if (ctid)
! 		tuple->t_self = *ctid;
  
  	/* Clean up */
  	MemoryContextReset(temp_context);
--- 2947,2953 ----
  	tuple = heap_form_tuple(tupdesc, values, nulls);
  
  	if (ctid)
! 		tuple->t_self = tuple->t_data->t_ctid = *ctid;
  
  	/* Clean up */
  	MemoryContextReset(temp_context);
*** a/src/backend/executor/execMain.c
--- b/src/backend/executor/execMain.c
***************
*** 795,800 **** InitPlan(QueryDesc *queryDesc, int eflags)
--- 795,801 ----
  	{
  		PlanRowMark *rc = (PlanRowMark *) lfirst(l);
  		Oid			relid;
+ 		char		relkind;
  		Relation	relation;
  		ExecRowMark *erm;
***************
*** 817,826 **** InitPlan(QueryDesc *queryDesc, int eflags)
--- 818,833 ----
  				break;
  			case ROW_MARK_COPY:
  				/* there's no real table here ... */
+ 				relkind = rt_fetch(rc->rti, rangeTable)->relkind;
+ 				if (relkind == RELKIND_FOREIGN_TABLE)
+ 					relid = getrelid(rc->rti, rangeTable);
+ 				else
+ 					relid = InvalidOid;
  				relation = NULL;
  				break;
  			default:
  				elog(ERROR, "unrecognized markType: %d", rc->markType);
+ 				relid = InvalidOid;
  				relation = NULL;	/* keep compiler quiet */
  				break;
  		}
***************
*** 831,836 **** InitPlan(QueryDesc *queryDesc, int eflags)
--- 838,844 ----
  		erm = (ExecRowMark *) palloc(sizeof(ExecRowMark));
  		erm->relation = relation;
+ 		erm->relid = relid;
  		erm->rti = rc->rti;
  		erm->prti = rc->prti;
  		erm->rowmarkId = rc->rowmarkId;
***************
*** 2318,2325 **** EvalPlanQualFetchRowMarks(EPQState *epqstate)
  			/* build a temporary HeapTuple control structure */
  			tuple.t_len = HeapTupleHeaderGetDatumLength(td);
! 			ItemPointerSetInvalid(&(tuple.t_self));
! 			tuple.t_tableOid = InvalidOid;
  			tuple.t_data = td;
  
  			/* copy and store tuple */
--- 2326,2342 ----
  			/* build a temporary HeapTuple control structure */
  			tuple.t_len = HeapTupleHeaderGetDatumLength(td);
! 			/* if relid is valid, rel is a foreign table; set system columns */
! 			if (OidIsValid(erm->relid))
! 			{
! 				tuple.t_self = td->t_ctid;
! 				tuple.t_tableOid = erm->relid;
! 			}
! 			else
! 			{
! 				ItemPointerSetInvalid(&(tuple.t_self));
! 				tuple.t_tableOid = InvalidOid;
! 			}
  			tuple.t_data = td;
  
  			/* copy and store tuple */
*** a/src/backend/executor/nodeForeignscan.c
--- b/src/backend/executor/nodeForeignscan.c
***************
*** 22,27 ****
--- 22,28 ----
   */
  #include "postgres.h"
  
+ #include "access/htup_details.h"
  #include "executor/executor.h"
  #include "executor/nodeForeignscan.h"
  #include "foreign/fdwapi.h"
***************
*** 53,65 **** ForeignNext(ForeignScanState *node)
  	/*
  	 * If any system columns are requested, we have to force the tuple into
  	 * physical-tuple form to avoid "cannot extract system attribute from
! 	 * virtual tuple" errors later. We also insert a valid value for
! 	 * tableoid, which is the only actually-useful system column.
  	 */
  	if (plan->fsSystemCol && !TupIsNull(slot))
  	{
  		HeapTuple	tup = ExecMaterializeSlot(slot);
  
  		tup->t_tableOid = RelationGetRelid(node->ss.ss_currentRelation);
  	}
--- 54,69 ----
  	/*
  	 * If any system columns are requested, we have to force the tuple into
  	 * physical-tuple form to avoid "cannot extract system attribute from
! 	 * virtual tuple" errors later. We also insert a valid value for TID
! 	 * and tableoid, which are the only actually-useful system columns.
  	 */
  	if (plan->fsSystemCol && !TupIsNull(slot))
  	{
  		HeapTuple
Re: [HACKERS] __attribute__ for non-gcc compilers
On 15.01.2015 at 00:54, Andres Freund wrote:
> I think I'd for now simply not define pg_attribute_aligned() on
> platforms where it's not supported, instead of defining it empty. If we
> need a softer variant we can name it pg_attribute_aligned_if_possible or
> something.

Good point. All attributes that actually change program behavior (aligned and packed, for now) need to error out during compilation if they're used and not supported by the compiler. Attributes which may improve optimization or just provide more information for the developers (noreturn, unused, format) can be defined empty when they're not supported (or are not supported well enough: GCC < 3 doesn't know about %z in printf formats). A minimal sketch of that split is below.

/ Oskari
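The sketch (following the pg_attribute_* naming convention; the exact guards here are an assumption for illustration, not the committed macros):

#if defined(__GNUC__)
/* advisory attributes: losing them only costs warnings or optimization */
#define pg_attribute_noreturn __attribute__((noreturn))
#define pg_attribute_unused __attribute__((unused))
/* behavior-changing attributes: defined only when truly supported */
#define pg_attribute_aligned(a) __attribute__((aligned(a)))
#define pg_attribute_packed __attribute__((packed))
#else
/* advisory attributes can safely degrade to nothing */
#define pg_attribute_noreturn
#define pg_attribute_unused
/*
 * pg_attribute_aligned()/pg_attribute_packed are intentionally left
 * undefined here: any use on an unsupported compiler must be a hard
 * compile error, not a silently different struct layout.
 */
#endif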
[HACKERS] Commit fest 2014-12 enters money time
Hi all,

We are soon entering the money time for this CF. The last month has been mainly a vacation period, with progress being minimal on many fronts. Still, there are a couple of patches that are marked as ready for committer:
- Foreign table inheritance, whose first patch has been committed
- speedup tidbitmap patch: cache page
- pg_basebackup vs. Windows and tablespaces (extend base backup to include the symlink file used to restore symlinks)
- If pg_hba.conf has changed since last reload, emit a HINT to the client and server log
- Add restore_command_retry_interval option to control timeout of restore_command nonzero status code
- documentation update for doc/src/sgml/func.sgml
- Reducing lock strength of trigger and foreign key DDL

I am going to make a first pass on all the patches to check their status. Please note that patches waiting for input from their authors will be marked as returned with feedback.

Thanks,
--
Michael
Re: [HACKERS] PATCH: Reducing lock strength of trigger and foreign key DDL
I wrote:
> I think that we could pass it to a committer.

Marked as such.
--
Michael
Re: [HACKERS] Define and document a guaranteed ordering for RETURNING?
Apparently I'm semi-blind - the docs also note that:

> If the INSERT command contains a RETURNING clause, the result will be similar to that of a SELECT statement containing the columns and values defined in the RETURNING list, computed over the row(s) inserted by the command.

... so perhaps it's enough to just explicitly mention that ordering is preserved.
Re: [HACKERS] Overhauling our interrupt handling
Hello,

I've finally caught up with this. I think I should finalize my commitfest item for this issue with... "Rejected"?

> This mail is an answer to
> http://archives.postgresql.org/message-id/20150110022542.GF12509%40alap3.anarazel.de
> but I decided that it's a good idea to move it into a new thread since the
> patch series has outgrown its original target. It's invasive and touches
> many arcane areas of the code, so that I think more eyes than a long,
> deep thread is likely to have are a good thing.
>
> As a short description of the goal of the patch series:
> This started out as steps on the way to being able to safely handle
> terminate_backend() et al when we're reading/writing from the
> client, e.g. because the client is dead and we haven't noticed yet and
> are stuck writing, or because we want to implement an idle_in_transaction
> timeout.
>
> Making these changes allowed me to see that not doing significant work
> in signal handlers can make code much simpler and more robust. The
> number of bugs postgres has had in the past that involved doing too much in
> signal handlers is significant. Thus I've since extended the
> patch series to get rid of nearly all nontrivial work in signal
> handlers.

It sounds pretty nice.

> All the patches in the series up to 0008 have commit messages providing
> more detail. A short description of each patch follows:
>
> 0001: Replace walsender's latch with the general shared latch.
>
> A new patch that removes ImmediateInterruptOK behaviour from walsender. I
> think that's a rather good idea, because walsender currently seems to
> assume WaitLatchOrSocket is reentrant - which I don't think is really
> guaranteed.
> Hasn't been reviewed yet, but I think it's not far from being
> committable.

Doesn't this patchset contain per-socket non-blocking control for win32? It should make the code (above the win32 socket layer itself) simpler.

> 0002: Use a nonblocking socket for FE/BE communication and block using
> latches.
>
> Has previously been reviewed by Heikki. I think Noah also had a
> look, although I'm not sure how close that was.
>
> I think this can be committed soon.
>
> 0003: Introduce and use infrastructure for interrupt processing during client
> reads.
>
> From here on ImmediateInterruptOK isn't set during client
> communication. Normal interrupts and sinval/async interrupts are
> processed outside of signal handlers. Especially the sinval/async
> changes greatly simplify the respective code.
>
> Has been reviewed by Heikki in an earlier version - I think I
> addressed all the review comments.
>
> I'd like somebody else to look at the sinval/async.c changes
> before committing. I really like the simplification this change
> allowed.
>
> I think this patch might not be safe without 0004 because I can't
> convince myself that it's safe to interrupt latch waits with work
> that actually might also use the same latch (sinval
> interrupts). But it's easier to review this way as 0004 is quite
> big.
>
> 0004: Process 'die' interrupts while reading/writing from the client socket.
>
> This is the reason Horiguchi-san started this thread.
>
> I think the important debate here is whether we think it's
> acceptable that there are situations (a full socket buffer, but a
> live connection) where the client previously got an error, but
> not anymore afterwards. I think that's much better than having
> unkillable connections, but I can see others being of a different
> opinion.

This patch yields code that is a bit confusing, like the following:

| secure_raw_write(Port *port, const void *ptr, size_t len)
| {
| ..
| 	w = WaitLatchOrSocket(MyLatch,
| 	if (w & WL_LATCH_SET)
| 	...
| 		errno = EINTR;
| 	else if (w & WL_SOCKET_WRITEABLE)
| 		goto wloop;
|
| 	errno = save_errno;

The errno set on WL_LATCH_SET always vanishes. Specifically, the EINTR set by SIGTERM (WL_LATCH_SET) is overwritten by EAGAIN. As a result, pg_terminate_backend() cannot kill the backend until the write stops blocking. "errno = save_errno" should be the alternative of the line "errno = EINTR", and I confirmed that the change leads to the desirable (in my view) behavior. A sketch of the corrected flow follows at the end of this mail.

> 0005: Move deadlock and other interrupt handling in proc.c out of signal
> handlers.
>
> I'm surprised how comparatively simple this turned out to be. For
> robustness and understanding I think this is a great improvement.
>
> New and not reviewed at all. Needs significant review. But works
> in my simple testcases.
>
> 0006: Don't allow immediate interrupts during authentication anymore.
>
> So far we've set ImmediateInterruptOK to true during large parts
> of the client authentication - that's not all that pretty;
> interrupts might arrive while we're in so
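As mentioned above, a minimal sketch of the corrected flow in secure_raw_write() (this is my reading of the fix, with the surrounding function body elided, not the actual patch):

	w = WaitLatchOrSocket(MyLatch,
						  WL_LATCH_SET | WL_SOCKET_WRITEABLE,
						  port->sock, 0);
	if (w & WL_LATCH_SET)
	{
		ResetLatch(MyLatch);
		/* let the caller see EINTR instead of the stale EAGAIN */
		errno = EINTR;
	}
	else
	{
		if (w & WL_SOCKET_WRITEABLE)
			goto wloop;
		/* restore the original errno only when no latch event occurred */
		errno = save_errno;
	}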
Re: [HACKERS] orangutan seizes up during isolation-check
On Wed, Jan 14, 2015 at 04:48:53PM -0500, Peter Eisentraut wrote:
> What I'm seeing now is that the unaccent regression tests when run under
> make check-world abort with
>
> FATAL: postmaster became multithreaded during startup
> HINT: Set the LC_ALL environment variable to a valid locale.

contrib/unaccent/Makefile sets NO_LOCALE=1, so that makes sense. I expect the patch over here will fix it:
http://www.postgresql.org/message-id/20150109063015.ga2491...@tornado.leadboat.com

> The behavior is also a bit strange. The step
>
> == starting postmaster ==
>
> hangs for one minute before failing. This probably has nothing to do
> with your change, but maybe pg_regress could detect postmaster startup
> failures better.

Yeah, I think any postmaster startup failure has that effect. Not ideal.
Re: [HACKERS] Parallel Seq Scan
On Wed, Jan 14, 2015 at 9:30 AM, Ashutosh Bapat <ashutosh.ba...@enterprisedb.com> wrote:
> On Wed, Jan 14, 2015 at 9:12 AM, Amit Kapila wrote:
>> Here we have to decide what the strategy should be and how much each
>> worker should scan. As an example, one strategy could be: if the table
>> size is X MB and there are 8 workers, then divide the work as X/8 MB
>> for each worker (which I have currently used in the patch). Another
>> could be that each worker scans 1 block at a time and then checks some
>> global structure to see which block it needs to scan next; according
>> to me this could lead to random scans. I have read that some other
>> databases also divide the work based on partitions or segments (the
>> size of a segment is not very clear).
>
> A block can contain useful tuples, i.e. tuples which are visible and
> fulfil the quals, plus useless tuples, i.e. tuples which are dead,
> invisible, or that do not fulfil the quals. Depending upon the contents
> of these blocks, esp. the ratio of (useful tuples)/(useless tuples),
> even though we divide the relation into equal-sized runs, each worker
> may take a different amount of time. So, instead of dividing the
> relation into number of runs = number of workers, it might be better to
> divide it into fixed-sized runs with size < (total number of blocks /
> number of workers), and let a worker pick up a run after it finishes
> with the previous one. The smaller the size of the runs, the better the
> load balancing, but the higher the cost of starting a run. So, we have
> to strike a balance.

I think your suggestion is good and it somewhat falls in line with what Robert has suggested, but instead of block-by-block, you seem to be suggesting doing it in chunks (where the chunk size is not clear). The only point against this is that such a strategy for a parallel sequential scan could lead to random scans, which can hurt the operation badly. Nonetheless, I will think more along these lines of making work distribution dynamic so that we can ensure that all workers are kept busy. A rough sketch of chunk-based claiming is below.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
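The sketch (purely illustrative; the struct, function names, and chunk policy are assumptions, not code from the patch). Each worker claims a fixed-size run of blocks through a shared atomic counter, so I/O within a chunk stays sequential even though chunks are handed out dynamically:

/* state shared among all workers of one parallel scan */
typedef struct ParallelScanShared
{
	pg_atomic_uint32 next_block;	/* next unclaimed heap block */
	BlockNumber	nblocks;			/* total blocks in the relation */
	uint32		chunk_size;			/* blocks handed out per claim */
} ParallelScanShared;

/*
 * Claim the next run of blocks; returns false once the relation is
 * exhausted.  The caller scans [*start, *end) sequentially.
 */
static bool
claim_next_chunk(ParallelScanShared *shared,
				 BlockNumber *start, BlockNumber *end)
{
	uint32		first = pg_atomic_fetch_add_u32(&shared->next_block,
												shared->chunk_size);

	if (first >= shared->nblocks)
		return false;
	*start = first;
	*end = Min(first + shared->chunk_size, shared->nblocks);
	return true;
}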
[HACKERS] Define and document a guaranteed ordering for RETURNING?
Hi all

It's recently occurred to me that many people rely on the RETURNING clause for multi-valued INSERTs, INSERT INTO ... SELECT, etc., returning rows in the same order they were supplied as input into the INSERT. I often see lock-step iteration over the inputs to an INSERT and the RETURNING result-tuples used to map generated sequence values to the corresponding input.

I can't find anything relevant in the docs about the order of results in RETURNING, and it strikes me that this should probably be codified one way or the other. The docs simply read:

> The optional RETURNING clause causes INSERT to compute and return value(s) based on each row actually inserted. This is primarily useful for obtaining values that were supplied by defaults, such as a serial sequence number. However, any expression using the table's columns is allowed. The syntax of the RETURNING list is identical to that of the output list of SELECT.

Looking at the code, I don't see any reason why it wouldn't currently be safe to assume that INSERT ... VALUES (...), (...), ... RETURNING ... and INSERT ... SELECT ... ORDER BY ... RETURNING ... always return rows in the exact order of their input. But that may not always be true, and if it's not documented as safe, it's not necessarily wise to rely on it.

If you can't rely on RETURNING order, you have to instead identify a candidate key on the input and use a wider RETURNING result that includes that candidate key as well as whatever else you want returned, then map the results onto the inputs. No fun at all.

So - I propose to document that INSERT ... RETURNING guarantees that the result-tuples will be in the same order as the input to the INSERT ... RETURNING statement. So if that order is well-defined, as in the case of multi-VALUES clauses or SELECT ... ORDER BY ..., the result order is also well defined.

(An alternative would be adding RETURNING ... WITH ORDINALITY, which seems pretty horrible.)

Thoughts/comments? Barring objections I'll write up a docs patch for the docs list.

--
Craig Ringer
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] initdb -S and tablespaces
At 2015-01-14 11:59:08 +0100, and...@2ndquadrant.com wrote:
>
> > + if (ControlFile->state != DB_SHUTDOWNED &&
> > + ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
> > + perform_fsync(data_directory);
> > +
>
> a) Please think of a slightly more descriptive name than perform_fsync

OK. (I just copied the name initdb uses, because at the time I was still thinking in terms of a later patch moving this to src/common.) What do you think of fsync_recursively? fsync_pgdata? I think fsync_recursively(data_directory) reads well. A minimal sketch of that function is at the end of this mail.

> b) I think we should check here whether fsync is enabled.

OK, will do.

> c) I'm wondering if we should add fsync to the control file and also
>    perform an fsync if the last shutdown was clean, but fsync was
>    disabled.

Explain? "Add fsync to the control file" means store the value of the fsync GUC setting in the control file? And would the fsync you mention be dependent on the setting, or unconditional?

> > +static void
> > +pre_sync_fname(char *fname, bool isdir)
> > +{
>
> This essentially already exists in the backend, in parts. C.f.
> pg_flush_data.

OK, I missed that. I'll rework the patch to use it.

> Theoretically you should also invoke fsync on directories.

OK.

> > + * We need to name the parent of PGDATA. get_parent_directory() isn't
> > + * enough here, because it can result in an empty string.
> > + */
> > + snprintf(pdir, MAXPGPATH, "%s/..", pg_data);
> > + canonicalize_path(pdir);
>
> Hm. Why is this needed? Just syncing . should work?

Not sure, will investigate. Thanks.

-- Abhijit
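For concreteness, the sketch of fsync_recursively() (illustrative only: error handling is reduced to warnings, and the pre_sync_fname()/pg_flush_data() pre-pass is omitted):

#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

static void
fsync_recursively(const char *path)
{
	DIR		   *dir;
	struct dirent *de;
	struct stat st;
	char		sub[4096];
	int			fd;

	if ((dir = opendir(path)) == NULL)
		return;
	while ((de = readdir(dir)) != NULL)
	{
		if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
			continue;
		snprintf(sub, sizeof(sub), "%s/%s", path, de->d_name);
		if (lstat(sub, &st) < 0)
			continue;
		if (S_ISDIR(st.st_mode))
		{
			fsync_recursively(sub);		/* recurses, then syncs that dir */
			continue;
		}
		if (!S_ISREG(st.st_mode))
			continue;					/* skip symlinks, sockets, etc. */
		if ((fd = open(sub, O_RDONLY)) < 0)
			continue;
		if (fsync(fd) < 0)
			fprintf(stderr, "could not fsync \"%s\"\n", sub);
		close(fd);
	}
	closedir(dir);

	/* per the review remark above, fsync the directory itself too */
	if ((fd = open(path, O_RDONLY)) >= 0)
	{
		(void) fsync(fd);
		close(fd);
	}
}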
Re: [HACKERS] pg_rewind in contrib
On Wed, Jan 14, 2015 at 6:43 PM, Heikki Linnakangas wrote:
> Here is a new version. There are now a fair amount of code changes compared
> to the version on github, like the logging and progress information, and a
> lot of comment changes. I also improved the documentation.

Perhaps this could be a direct link to the page of pg_rewind with ?

-possibly new, system. Once complete, the primary and standby can be
+possibly new, system. The pg_rewind utility can be
+used to speed up this process on large clusters.
+Once complete, the primary and standby can be

A dot is missing at the end of the last sentence of this paragraph:

+
+   This option specifies the target data directory that is synchronized
+   with the source. The target server must shut down cleanly before
+   running pg_rewind
+

Compilation of pg_rewind on MSVC is not done; it is still listed as a contrib/ module in Mkvcbuild.pm. The compilation of pg_xlogdump fails because the inclusion of dbcommands_xlog.h is missing in rmgrdesc.c (patch attached). src/bin/Makefile has not been updated to compile pg_rewind.
--
Michael

diff --git a/contrib/pg_xlogdump/rmgrdesc.c b/contrib/pg_xlogdump/rmgrdesc.c
index 180818d..deb1b8f 100644
--- a/contrib/pg_xlogdump/rmgrdesc.c
+++ b/contrib/pg_xlogdump/rmgrdesc.c
@@ -23,6 +23,7 @@
 #include "access/xlog_internal.h"
 #include "catalog/storage_xlog.h"
 #include "commands/dbcommands.h"
+#include "commands/dbcommands_xlog.h"
 #include "commands/sequence.h"
 #include "commands/tablespace.h"
 #include "rmgrdesc.h"
Re: [HACKERS] Merging postgresql.conf and postgresql.auto.conf
On Wed, Jan 14, 2015 at 9:01 PM, Sawada Masahiko wrote:
>
> Hi all,
>
> The postgresql.auto.conf is loaded after postgresql.conf
> whenever the configuration file is loaded or reloaded.
> This means that a parameter in postgresql.auto.conf has quite high
> priority, so the parameter in postgresql.conf does not work at all
> even if the user sets it manually.
>
> If the user wants to change a stopped postgresql server, then the user
> needs to merge the two configuration files (postgresql.conf and
> postgresql.auto.conf) while maintaining consistency manually.

I think one way to currently do it is with the help of the ALTER SYSTEM ... RESET command. Basically, the user can check via pg_settings whether the variable belongs to postgresql.auto.conf; if he already knows a duplicate copy of the same setting is available in postgresql.conf and that is the value he wants to use, then the RESET command can be used to remove the setting from postgresql.auto.conf.

> From an operational perspective, having a written config with duplicate
> entries is not a good thing.
> I think we need to merge the two configuration files into one (or more
> than one, if it uses something like the 'include' directive).
>
> One solution is to add a merging tool/command which merges the two
> configuration files while maintaining consistency.

If you want to think of some solution which can make the usage of ALTER SYSTEM more convenient, then we should think only in terms of changing/removing the value in postgresql.auto.conf, as that is the place where we have added values programmatically. One thought I have along this line is that currently there doesn't seem to be a way to know whether a setting has an entry both in postgresql.conf and postgresql.auto.conf. If we had some way of knowing that (pg_settings?), then it could be convenient for the user to decide whether the value in postgresql.auto.conf is useful or not and, if it's not useful, use the ALTER SYSTEM ... RESET command to remove it from postgresql.auto.conf.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On Wed, Jan 14, 2015 at 5:23 PM, Peter Geoghegan wrote:
> My immediate observation here is that blocks 2 and 9 have identical
> metadata (from their page opaque area), but partially non-matching
> data items (however, the number of items on each block is consistent
> and correct according to that metadata, as far as it goes). I think
> block 9 is supposed to have a right-link to block 5 (block 5 seems to
> think so, at least -- its left link is 9).

I am mistaken on one detail here - blocks 2 and 9 are actually fully identical. I still have no idea why, though.
--
Peter Geoghegan
[HACKERS] Enforce creation of destination folders for source files in pg_regress (Was: pg_regress writes into source tree)
Hi all,

pg_regress will fail with test suites using only source files if the destination folders do not exist in the code tree. This is annoying because it forces us to maintain empty folders sql/ and expected/ with a .gitignore ignoring everything. The issue has been discussed here (http://www.postgresql.org/message-id/54935d9d.7010...@dunslane.net) but nobody actually sent a patch, so here is one, and a thread for this discussion.

Regards,
--
Michael

*** a/src/test/regress/pg_regress.c
--- b/src/test/regress/pg_regress.c
***************
*** 496,501 **** convert_sourcefiles_in(char *source_subdir, char *dest_dir, char *dest_subdir, c
--- 496,502 ----
  {
  	char		testtablespace[MAXPGPATH];
  	char		indir[MAXPGPATH];
+ 	char		result_dir[MAXPGPATH];
  	struct stat st;
  	int			ret;
  	char	  **name;
***************
*** 520,525 **** convert_sourcefiles_in(char *source_subdir, char *dest_dir, char *dest_subdir, c
--- 521,534 ----
  		/* Error logged in pgfnames */
  		exit(2);
  
+ 	/*
+ 	 * Enforce creation of destination directory if it does not exist yet.
+ 	 * This is useful for tests using only source files.
+ 	 */
+ 	snprintf(result_dir, MAXPGPATH, "%s/%s", dest_dir, dest_subdir);
+ 	if (!directory_exists(result_dir))
+ 		make_directory(result_dir);
+ 
  	snprintf(testtablespace, MAXPGPATH, "%s/testtablespace", outputdir);
  
  #ifdef WIN32
***************
*** 565,571 **** convert_sourcefiles_in(char *source_subdir, char *dest_dir, char *dest_subdir, c
  		/* build the full actual paths to open */
  		snprintf(prefix, strlen(*name) - 6, "%s", *name);
  		snprintf(srcfile, MAXPGPATH, "%s/%s", indir, *name);
! 		snprintf(destfile, MAXPGPATH, "%s/%s/%s.%s", dest_dir, dest_subdir,
! 				 prefix, suffix);
  
  		infile = fopen(srcfile, "r");
--- 574,580 ----
  		/* build the full actual paths to open */
  		snprintf(prefix, strlen(*name) - 6, "%s", *name);
  		snprintf(srcfile, MAXPGPATH, "%s/%s", indir, *name);
! 		snprintf(destfile, MAXPGPATH, "%s/%s.%s", result_dir, prefix, suffix);
  
  		infile = fopen(srcfile, "r");
Re: [HACKERS] WITH CHECK and Column-Level Privileges
Noah,

* Noah Misch (n...@leadboat.com) wrote:
> On Mon, Jan 12, 2015 at 05:16:40PM -0500, Stephen Frost wrote:
>> Alright, here's an updated patch which doesn't return any detail if no
>> values are visible or if only a partial key is visible.
>
> I browsed this patch. There's been no mention of foreign key constraints, but
> ri_ReportViolation() deserves similar hardening. If a user has only DELETE
> privilege on a PK table, FK violation messages should not leak the PK values.
> Modifications to the foreign side are less concerning, since the user will
> often know the attempted value; still, I would lock down both sides.

Ah, yes, good point.

> Please add a comment explaining the safety of each row-emitting message you
> haven't changed. For example, the one in refresh_by_match_merge() is safe
> because getting there requires MV ownership.

Sure.

> Instead of duplicating an entire ereport() to change whether you include an
> errdetail, use "condition ? errdetail(...) : 0".

Yeah, that's a bit nicer, will do.

Thanks!

Stephen
Re: [HACKERS] Improving RLS qual pushdown
* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
> Robert Haas wrote:
>> On Wed, Jan 14, 2015 at 9:22 AM, Dean Rasheed wrote:
>>> On 14 January 2015 at 13:29, Robert Haas wrote:
>>>> One thing they could still leak is the number of times they got
>>>> called, and thus possibly the number of unseen rows. Now if the
>>>> expressions get constant-folded away that won't be an issue, but a
>>>> clever user can probably avoid that.
>>>
>>> Right now, EXPLAIN ANALYSE can be used to tell you the number of
>>> unseen rows. Is that something that people are concerned about, and
>>> are there any plans to change it?
>>
>> Interesting question. I don't know.
>
> Wasn't this part of the "covert channel" discussion that took place way
> before RLS was committed? As I recall, it was argued that such covert
> channels are acceptable as long as their bandwidth is low.

Yes, it was part of the discussion, and no, there are no plans to try to hide row counts in explain analyze, nor to deal with things like unique constraint or foreign key reference violations. There are other areas which need improvement that will help address covert channel activity (better auditing, better control over what actions are allowed to different users when it comes to creating objects, modifying permissions and policies, throttling, etc.).

Thanks,

Stephen
Re: [HACKERS] Improving RLS qual pushdown
Robert Haas wrote:
> On Wed, Jan 14, 2015 at 9:22 AM, Dean Rasheed wrote:
>> On 14 January 2015 at 13:29, Robert Haas wrote:
>>> One thing they could still leak is the number of times they got
>>> called, and thus possibly the number of unseen rows. Now if the
>>> expressions get constant-folded away that won't be an issue, but a
>>> clever user can probably avoid that.
>>
>> Right now, EXPLAIN ANALYSE can be used to tell you the number of
>> unseen rows. Is that something that people are concerned about, and
>> are there any plans to change it?
>
> Interesting question. I don't know.

Wasn't this part of the "covert channel" discussion that took place way before RLS was committed? As I recall, it was argued that such covert channels are acceptable as long as their bandwidth is low.

--
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On Wed, Jan 14, 2015 at 5:23 PM, Peter Geoghegan wrote:
> My immediate observation here is that blocks 2 and 9 have identical
> metadata (from their page opaque area), but partially non-matching
> data items (however, the number of items on each block is consistent
> and correct according to that metadata, as far as it goes). I think
> block 9 is supposed to have a right-link to block 5 (block 5 seems to
> think so, at least -- its left link is 9).

For the convenience of others, here is a version with a normalized data column added (interpreting the datums as little-endian Oids).
--
Peter Geoghegan

page,itemoffset,ctid,itemlen,nulls,vars,data,normalized data
1,1,"(0,9)",16,f,f,16 0b 00 00 00 00 00 00,2838
1,2,"(2,50)",16,f,f,70 00 00 00 00 00 00 00,112
1,3,"(2,52)",16,f,f,71 00 00 00 00 00 00 00,113
1,4,"(2,24)",16,f,f,ae 00 00 00 00 00 00 00,174
1,5,"(2,25)",16,f,f,af 00 00 00 00 00 00 00,175
1,6,"(2,51)",16,f,f,24 02 00 00 00 00 00 00,548
1,7,"(2,53)",16,f,f,25 02 00 00 00 00 00 00,549
1,8,"(4,66)",16,f,f,3a 03 00 00 00 00 00 00,826
1,9,"(2,55)",16,f,f,3b 03 00 00 00 00 00 00,827
1,10,"(2,56)",16,f,f,3c 03 00 00 00 00 00 00,828
1,11,"(4,15)",16,f,f,70 04 00 00 00 00 00 00,1136
1,12,"(1,31)",16,f,f,71 04 00 00 00 00 00 00,1137
1,13,"(4,14)",16,f,f,bd 04 00 00 00 00 00 00,1213
1,14,"(4,17)",16,f,f,be 04 00 00 00 00 00 00,1214
1,15,"(1,34)",16,f,f,d0 04 00 00 00 00 00 00,1232
1,16,"(1,35)",16,f,f,d1 04 00 00 00 00 00 00,1233
1,17,"(0,2)",16,f,f,df 04 00 00 00 00 00 00,1247
1,18,"(3,10)",16,f,f,e1 04 00 00 00 00 00 00,1249
1,19,"(3,11)",16,f,f,e7 04 00 00 00 00 00 00,1255
1,20,"(7,6)",16,f,f,eb 04 00 00 00 00 00 00,1259
1,21,"(2,23)",16,f,f,ec 04 00 00 00 00 00 00,1260
1,22,"(4,16)",16,f,f,ed 04 00 00 00 00 00 00,1261
1,23,"(4,12)",16,f,f,ee 04 00 00 00 00 00 00,1262
1,24,"(4,64)",16,f,f,89 05 00 00 00 00 00 00,1417
1,25,"(3,31)",16,f,f,8a 05 00 00 00 00 00 00,1418
1,26,"(1,48)",16,f,f,8b 08 00 00 00 00 00 00,2187
1,27,"(4,63)",16,f,f,18 09 00 00 00 00 00 00,2328
1,28,"(0,64)",16,f,f,20 09 00 00 00 00 00 00,2336
1,29,"(0,65)",16,f,f,21 09 00 00 00 00 00 00,2337
1,30,"(4,18)",16,f,f,5c 09 00 00 00 00 00 00,2396
1,31,"(1,11)",16,f,f,5d 09 00 00 00 00 00 00,2397
1,32,"(4,40)",16,f,f,28 0a 00 00 00 00 00 00,2600
1,33,"(3,47)",16,f,f,29 0a 00 00 00 00 00 00,2601
1,34,"(3,48)",16,f,f,2a 0a 00 00 00 00 00 00,2602
1,35,"(3,49)",16,f,f,2b 0a 00 00 00 00 00 00,2603
1,36,"(3,41)",16,f,f,2c 0a 00 00 00 00 00 00,2604
1,37,"(4,45)",16,f,f,2d 0a 00 00 00 00 00 00,2605
1,38,"(3,42)",16,f,f,2e 0a 00 00 00 00 00 00,2606
1,39,"(4,48)",16,f,f,2f 0a 00 00 00 00 00 00,2607
1,40,"(4,49)",16,f,f,30 0a 00 00 00 00 00 00,2608
1,41,"(4,44)",16,f,f,31 0a 00 00 00 00 00 00,2609
1,42,"(3,43)",16,f,f,32 0a 00 00 00 00 00 00,2610
1,43,"(7,26)",16,f,f,33 0a 00 00 00 00 00 00,2611
1,44,"(3,50)",16,f,f,34 0a 00 00 00 00 00 00,2612
1,45,"(4,71)",16,f,f,35 0a 00 00 00 00 00 00,2613
1,46,"(4,47)",16,f,f,37 0a 00 00 00 00 00 00,2615
1,47,"(3,46)",16,f,f,38 0a 00 00 00 00 00 00,2616
1,48,"(3,44)",16,f,f,39 0a 00 00 00 00 00 00,2617
1,49,"(4,41)",16,f,f,3a 0a 00 00 00 00 00 00,2618
1,50,"(0,1)",16,f,f,3b 0a 00 00 00 00 00 00,2619
1,51,"(4,42)",16,f,f,3c 0a 00 00 00 00 00 00,2620
1,52,"(0,61)",16,f,f,5a 0a 00 00 00 00 00 00,2650
1,53,"(0,54)",16,f,f,5b 0a 00 00 00 00 00 00,2651
1,54,"(0,55)",16,f,f,5c 0a 00 00 00 00 00 00,2652
1,55,"(0,56)",16,f,f,5d 0a 00 00 00 00 00 00,2653
1,56,"(0,57)",16,f,f,5e 0a 00 00 00 00 00 00,2654
1,57,"(0,59)",16,f,f,5f 0a 00 00 00 00 00 00,2655
1,58,"(0,50)",16,f,f,60 0a 00 00 00 00 00 00,2656
1,59,"(7,14)",16,f,f,61 0a 00 00 00 00 00 00,2657
1,60,"(0,42)",16,f,f,62 0a 00 00 00 00 00 00,2658
1,61,"(0,43)",16,f,f,63 0a 00 00 00 00 00 00,2659
1,62,"(0,68)",16,f,f,64 0a 00 00 00 00 00 00,2660
1,63,"(0,69)",16,f,f,65 0a 00 00 00 00 00 00,2661
1,64,"(7,3)",16,f,f,66 0a 00 00 00 00 00 00,2662
1,65,"(7,4)",16,f,f,67 0a 00 00 00 00 00 00,2663
1,66,"(0,53)",16,f,f,68 0a 00 00 00 00 00 00,2664
1,67,"(7,22)",16,f,f,69 0a 00 00 00 00 00 00,2665
1,68,"(7,23)",16,f,f,6a 0a 00 00 00 00 00 00,2666
1,69,"(7,24)",16,f,f,6b 0a 00 00 00 00 00 00,2667
1,70,"(1,72)",16,f,f,6c 0a 00 00 00 00 00 00,2668
1,71,"(1,73)",16,f,f,6d 0a 00 00 00 00 00 00,2669
1,72,"(1,74)",16,f,f,6e 0a 00 00 00 00 00 00,2670
1,73,"(1,6)",16,f,f,6f 0a 00 00 00 00 00 00,2671
1,74,"(1,7)",16,f,f,70 0a 00 00 00 00 00 00,2672
1,75,"(1,75)",16,f,f,71 0a 00 00 00 00 00 00,2673
1,76,"(1,76)",16,f,f,72 0a 00 00 00 00 00 00,2674
1,77,"(1,67)",16,f,f,73 0a 00 00 00 00 00 00,2675
1,78,"(0,40)",16,f,f,74 0a 00 00 00 00 00 00,2676
1,79,"(0,41)",16,f,f,75 0a 00 00 00 00 00 00,2677
1,80,"(1,50)",16,f,f,76 0a 00 00 00 00 00 00,2678
1,81,"(1,51)",16,f,f,77 0a 00 00 00 00 00 00,2679
1,82,"(1,49)",16,f,f,78 0a 00 00 00 00 00 00,2680
1,83,"(1,58)",16,f,f,79 0a 00 00 00 00 00 00,2681
1,84,"(1,59)",16,f,f,7a 0a 00 00 00 00 00 00,2682
1,85,"(1,89)",16,f,f,7b 0a 00 00 00 00 00 00,2683
1,86,"(1,70)",16,f,f,7c 0a 00 00 00 00 00 00,2684
1,87,"(1,71)",16,f,f,7d 0a 00 00 00 00 00 00,2685
1,88
Re: [HACKERS] Safe memory allocation functions
Robert Haas wrote:
> On Tue, Jan 13, 2015 at 10:10 AM, Tom Lane wrote:
>> However, there is a larger practical problem with this whole concept,
>> which is that experience should teach us to be very wary of the assumption
>> that asking for memory the system can't give us will just lead to nice
>> neat malloc-returns-NULL behavior. Any small perusal of the mailing list
>> archives will remind you that very often the end result will be SIGSEGV,
>> OOM kills, unrecoverable trap-on-write when the kernel realizes it can't
>> honor a copy-on-write promise, yadda yadda. Agreed that it's arguable
>> that these only occur in misconfigured systems ... but misconfiguration
>> appears to be the default in a depressingly large fraction of systems.
>> (This is another reason for "_safe" not being the mot juste :-()
>
> I don't really buy this. It's pretty incredible to think that after a
> malloc() failure there is absolutely no hope of carrying on sanely.
> If that were true, we wouldn't be able to ereport() out-of-memory
> errors at any severity less than FATAL, but of course it doesn't work
> that way. Moreover, AllocSetAlloc() contains malloc() and, if that
> fails, calls malloc() again with a smaller value, without even
> throwing an error.

I understood Tom's point differently: instead of malloc() failing, malloc() will return a supposedly usable pointer, but later usage of it will lead to a crash of some sort. We know this does happen in reality, because people do report it; but we also know how to fix it. And for systems that have been correctly set up, the new behavior (using some plan B for when malloc actually fails instead of spuriously succeeding only to cause a later crash) will be much more convenient.

--
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
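To make the intended calling convention concrete, a minimal sketch of the caller-side "plan B" pattern (the name palloc_noerror is used purely for illustration here; the actual name and signature are exactly what is under debate in this thread):

	Size		request = work_mem * 1024L;
	char	   *buf;

	buf = palloc_noerror(request);
	while (buf == NULL && request > BLCKSZ)
	{
		/* plan B: retry with a smaller request instead of failing */
		request /= 2;
		buf = palloc_noerror(request);
	}
	if (buf == NULL)
		ereport(ERROR,
				(errcode(ERRCODE_OUT_OF_MEMORY),
				 errmsg("out of memory")));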
Re: [HACKERS] Out-of-bounds write and incorrect detection of trigger file in pg_standby
On Thu, Jan 15, 2015 at 7:13 AM, Robert Haas wrote:
> Instead of doing this:
>
> if (len < sizeof(buf))
>     buf[len] = '\0';
>
> ...I would suggest making the size of the buffer one greater than the
> size of the read(), and then always nul-terminating the buffer. It
> seems to me that would make the code easier to reason about.

How about the attached then? This way we still detect any invalid values the same way:

- if ((len = read(fd, buf, sizeof(buf))) < 0)
+ if ((len = read(fd, buf, sizeof(buf) - 1)) < 0)

Regards,
--
Michael

diff --git a/contrib/pg_standby/pg_standby.c b/contrib/pg_standby/pg_standby.c
index d6b1692..2f9f2b4 100644
--- a/contrib/pg_standby/pg_standby.c
+++ b/contrib/pg_standby/pg_standby.c
@@ -418,7 +418,7 @@ CheckForExternalTrigger(void)
 		return;
 	}
 
-	if ((len = read(fd, buf, sizeof(buf))) < 0)
+	if ((len = read(fd, buf, sizeof(buf) - 1)) < 0)
 	{
 		fprintf(stderr, "WARNING: could not read \"%s\": %s\n",
 				triggerPath, strerror(errno));
Partitioning: issues/ideas (Was: Re: [HACKERS] On partitioning)
On 06-01-2015 PM 03:40, Amit Langote wrote:
>
> I agree that while we are discussing these points, we could also be
> discussing how we solve the problems of the existing partitioning
> implementation using whatever the above things end up being. Proposed
> approaches to solve those problems might be useful to drive the first
> step as well, or perhaps that's how it should be done anyway.

I realize the discussion has not quite brought us to *conclusions* so far, though surely there has been valuable input from people. Anyway, starting a new thread with a summary of what has been discussed (please note that the order of listing the points does not necessarily connote their priority):

* It has been repeatedly pointed out that we may want to decouple partitioning from inheritance, because implementing partitioning as an extension of the inheritance mechanism means that we have to keep all the existing semantics, which might limit what we want to do with the special case of it which is partitioning; in other words, we would find ourselves in the difficult position of having to inject special-case code into a very generalized mechanism, that is, inheritance. Specifically, do we regard partitions as pg_inherits children of their partitioning parent?

* Syntax: do we want to make it similar to one of the many other databases out there? Or we could invent our own. I like the syntax that Robert suggested, which covers the cases of RANGE and LIST partitioning without actually having to use those keywords explicitly; something like the following:

CREATE TABLE parent PARTITION ON (column [ USING opclass ] [, ... ]);

CREATE TABLE child PARTITION OF parent_name FOR VALUES { (value, ...) [ TO (value, ...) ] }

So instead of making a hard distinction between range and list partitioning, you can say:

CREATE TABLE child_name PARTITION OF parent_name FOR VALUES (3, 5, 7);

wherein child is effectively a LIST partition;

CREATE TABLE child PARTITION OF parent_name FOR VALUES (8) TO (12);

wherein child is effectively a RANGE partition on one column;

CREATE TABLE child PARTITION OF parent_name FOR VALUES (20, 120) TO (30, 130);

wherein child is effectively a RANGE partition on two columns.

I wonder if we could add a clause like DISTRIBUTED BY to complement PARTITION ON that represents a hash-distributed/partitioned table (that could be a syntax to support sharded tables, maybe; we would definitely want to move ahead in that direction, I guess).

* Catalog: we would like to have a catalog structure suitable to implement capabilities like multi-column partitioning and sub-partitioning (with an arbitrary number of levels in the hierarchy). I had suggested that we create two new catalogs, viz. pg_partitioned_rel and pg_partition_def, to store metadata about the partition key of a partitioned relation and the partition bound info of a partition, respectively. Also, see the point about the on-disk representation of partition bounds.

* It is desirable to treat partitions as pg_class relations, with perhaps a new relkind (or relkinds). We may want to choose an implementation where only the lowest-level relations in a partitioning hierarchy have storage; those at the upper layers are mere placeholder relations, though of course with associated constraints determined by the partitioning criteria (with appropriate metadata entered into the additional catalogs). I am not quite sure whether the various kinds of relations involved in the partitioning scheme have separate namespaces and, if they do, how we implement that.

* In the initial implementation, we could just live with partitioning on a set of columns (and not arbitrary expressions of them).

* We perhaps do not need multi-column LIST partitions, as they are not very widely used and may complicate the implementation.

* There are a number of suggestions about how we represent partition bounds (on disk) - pg_node_tree, RECORD (a composite type or the rowtype associated with the relation itself), etc. An important point to consider here may be that a partition key may contain more than one column.

* How we represent a partition definition in memory (for a given partitioned relation) - an important point to remember is that such a representation should be efficient to iterate through or binary-searchable. Also see the points about tuple routing and partition pruning.

* Overflow/catchall partition: it seems we do not want/need them. It might seem desirable, for example, in cases where a big transaction enters a large number of tuples all but one of which find a defined partition; we may not want to error out in such a case but instead enter that erring tuple into the overflow partition. If we choose to implement that, we would also like to implement the capability to move the tuples into the appropriate partition once it's defined. Related is the notion of automatically creating partitions if one is not already defined for a just-entered tuple; but there may be locking troubles if many concurrent sessions try to do that.

* Tuple-routing: based on the intern
Re: [HACKERS] Parallel Seq Scan
On 1/13/15 9:42 PM, Amit Kapila wrote:
> As an example, one strategy could be: if the table size is X MB and
> there are 8 workers, then divide the work as X/8 MB for each worker
> (which I have currently used in the patch). Another could be that each
> worker scans 1 block at a time and then checks some global structure to
> see which block it needs to scan next; according to me this could lead
> to random scans. I have read that some other databases also divide the
> work based on partitions or segments (the size of a segment is not very
> clear).

Long-term I think we'll want a mix between the two approaches. Simply doing something like blkno % num_workers is going to cause imbalances, but trying to do this on a per-block basis seems like too much overhead.

Also long-term, I think we need to look at a more specialized version of parallelism at the IO layer. For example, during an index scan you'd really like to get IO requests for heap blocks started in the background while the backend is focused on the mechanics of the index scan itself. While this could be done with the stuff Robert has written, I have to wonder if it'd be a lot more efficient to use fadvise or AIO. Or perhaps it would just be better to deal with an entire index page (remembering TIDs) and then hit the heap.

But I agree with Robert; there's a lot yet to be done just to get *any* kind of parallel execution working before we start thinking about how to optimize it.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] Typo fix in alter_table.sgml
Robert Haas wrote:
> But that's not a "typo" as stated in $SUBJECT but rather a "markup fix".

Definitely. Thanks.
--
Michael
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On Wed, Jan 14, 2015 at 4:53 PM, Merlin Moncure wrote:
> yeah. via:
> cds2=# \copy (select s as page, (bt_page_items('pg_class_oid_index',
> s)).* from generate_series(1,12) s) to '/tmp/page_items.csv' csv
> header;

My immediate observation here is that blocks 2 and 9 have identical metadata (from their page opaque area), but partially non-matching data items (however, the number of items on each block is consistent and correct according to that metadata, as far as it goes). I think block 9 is supposed to have a right-link to block 5 (block 5 seems to think so, at least -- its left link is 9). So now the question is: how did that inconsistency arise? It didn't necessarily arise at the time of the (presumed) split of block 2 to create 9. It could be that the opaque area was changed by something else, some time later. I'll investigate more.
--
Peter Geoghegan
Re: [HACKERS] Turning recovery.conf into GUCs
On 01/15/2015 02:24 AM, Robert Haas wrote:
> I think your idea of adding read-only GUCs with the same names as all
> of the recovery.conf parameters is a clear win. Even if we do nothing
> more than that, it makes the values visible from the SQL level, and
> that's good. But I think we should go further and make them really be
> GUCs. Otherwise, if you want to be able to change one of those
> parameters other than at server startup time, you've got to invent a
> separate system for reloading them on SIGHUP. If you make them part
> of the GUC mechanism, you can use that. I think it's quite safe to
> say that the whole reason we *have* a GUC mechanism is so that all of
> our configuration can be done through one grand, unified mechanism
> rather than being fragmented across many files.

+1

I do find it ironic that the creator of the Grand Unified Configuration settings is arguing against unifying the files.

> Some renaming might be in order. Heikki previously suggested merging
> all of the recovery_target_whatever settings down into a single
> parameter recovery_target='kindofrecoverytarget furtherdetailsgohere',
> like recovery_target='xid 1234' or recovery_target='name bob'. Maybe
> that would be more clear (or not).

Not keen on non-atomic settings, personally.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On Wed, Jan 14, 2015 at 6:50 PM, Peter Geoghegan wrote: > This is great, but it's not exactly clear which bt_page_items() page > is which - some are skipped, but I can't be sure which. Would you mind > rewriting that query to indicate which block is under consideration by > bt_page_items()? yeah. via: cds2=# \copy (select s as page, (bt_page_items('pg_class_oid_index', s)).* from generate_series(1,12) s) to '/tmp/page_items.csv' csv header; merlin page,itemoffset,ctid,itemlen,nulls,vars,data 1,1,"(0,9)",16,f,f,16 0b 00 00 00 00 00 00 1,2,"(2,50)",16,f,f,70 00 00 00 00 00 00 00 1,3,"(2,52)",16,f,f,71 00 00 00 00 00 00 00 1,4,"(2,24)",16,f,f,ae 00 00 00 00 00 00 00 1,5,"(2,25)",16,f,f,af 00 00 00 00 00 00 00 1,6,"(2,51)",16,f,f,24 02 00 00 00 00 00 00 1,7,"(2,53)",16,f,f,25 02 00 00 00 00 00 00 1,8,"(4,66)",16,f,f,3a 03 00 00 00 00 00 00 1,9,"(2,55)",16,f,f,3b 03 00 00 00 00 00 00 1,10,"(2,56)",16,f,f,3c 03 00 00 00 00 00 00 1,11,"(4,15)",16,f,f,70 04 00 00 00 00 00 00 1,12,"(1,31)",16,f,f,71 04 00 00 00 00 00 00 1,13,"(4,14)",16,f,f,bd 04 00 00 00 00 00 00 1,14,"(4,17)",16,f,f,be 04 00 00 00 00 00 00 1,15,"(1,34)",16,f,f,d0 04 00 00 00 00 00 00 1,16,"(1,35)",16,f,f,d1 04 00 00 00 00 00 00 1,17,"(0,2)",16,f,f,df 04 00 00 00 00 00 00 1,18,"(3,10)",16,f,f,e1 04 00 00 00 00 00 00 1,19,"(3,11)",16,f,f,e7 04 00 00 00 00 00 00 1,20,"(7,6)",16,f,f,eb 04 00 00 00 00 00 00 1,21,"(2,23)",16,f,f,ec 04 00 00 00 00 00 00 1,22,"(4,16)",16,f,f,ed 04 00 00 00 00 00 00 1,23,"(4,12)",16,f,f,ee 04 00 00 00 00 00 00 1,24,"(4,64)",16,f,f,89 05 00 00 00 00 00 00 1,25,"(3,31)",16,f,f,8a 05 00 00 00 00 00 00 1,26,"(1,48)",16,f,f,8b 08 00 00 00 00 00 00 1,27,"(4,63)",16,f,f,18 09 00 00 00 00 00 00 1,28,"(0,64)",16,f,f,20 09 00 00 00 00 00 00 1,29,"(0,65)",16,f,f,21 09 00 00 00 00 00 00 1,30,"(4,18)",16,f,f,5c 09 00 00 00 00 00 00 1,31,"(1,11)",16,f,f,5d 09 00 00 00 00 00 00 1,32,"(4,40)",16,f,f,28 0a 00 00 00 00 00 00 1,33,"(3,47)",16,f,f,29 0a 00 00 00 00 00 00 1,34,"(3,48)",16,f,f,2a 0a 00 00 00 00 00 00 1,35,"(3,49)",16,f,f,2b 0a 00 00 00 00 00 00 1,36,"(3,41)",16,f,f,2c 0a 00 00 00 00 00 00 1,37,"(4,45)",16,f,f,2d 0a 00 00 00 00 00 00 1,38,"(3,42)",16,f,f,2e 0a 00 00 00 00 00 00 1,39,"(4,48)",16,f,f,2f 0a 00 00 00 00 00 00 1,40,"(4,49)",16,f,f,30 0a 00 00 00 00 00 00 1,41,"(4,44)",16,f,f,31 0a 00 00 00 00 00 00 1,42,"(3,43)",16,f,f,32 0a 00 00 00 00 00 00 1,43,"(7,26)",16,f,f,33 0a 00 00 00 00 00 00 1,44,"(3,50)",16,f,f,34 0a 00 00 00 00 00 00 1,45,"(4,71)",16,f,f,35 0a 00 00 00 00 00 00 1,46,"(4,47)",16,f,f,37 0a 00 00 00 00 00 00 1,47,"(3,46)",16,f,f,38 0a 00 00 00 00 00 00 1,48,"(3,44)",16,f,f,39 0a 00 00 00 00 00 00 1,49,"(4,41)",16,f,f,3a 0a 00 00 00 00 00 00 1,50,"(0,1)",16,f,f,3b 0a 00 00 00 00 00 00 1,51,"(4,42)",16,f,f,3c 0a 00 00 00 00 00 00 1,52,"(0,61)",16,f,f,5a 0a 00 00 00 00 00 00 1,53,"(0,54)",16,f,f,5b 0a 00 00 00 00 00 00 1,54,"(0,55)",16,f,f,5c 0a 00 00 00 00 00 00 1,55,"(0,56)",16,f,f,5d 0a 00 00 00 00 00 00 1,56,"(0,57)",16,f,f,5e 0a 00 00 00 00 00 00 1,57,"(0,59)",16,f,f,5f 0a 00 00 00 00 00 00 1,58,"(0,50)",16,f,f,60 0a 00 00 00 00 00 00 1,59,"(7,14)",16,f,f,61 0a 00 00 00 00 00 00 1,60,"(0,42)",16,f,f,62 0a 00 00 00 00 00 00 1,61,"(0,43)",16,f,f,63 0a 00 00 00 00 00 00 1,62,"(0,68)",16,f,f,64 0a 00 00 00 00 00 00 1,63,"(0,69)",16,f,f,65 0a 00 00 00 00 00 00 1,64,"(7,3)",16,f,f,66 0a 00 00 00 00 00 00 1,65,"(7,4)",16,f,f,67 0a 00 00 00 00 00 00 1,66,"(0,53)",16,f,f,68 0a 00 00 00 00 00 00 1,67,"(7,22)",16,f,f,69 0a 00 00 00 00 00 00 1,68,"(7,23)",16,f,f,6a 0a 00 00 00 00 00 00 
1,69,"(7,24)",16,f,f,6b 0a 00 00 00 00 00 00 1,70,"(1,72)",16,f,f,6c 0a 00 00 00 00 00 00 1,71,"(1,73)",16,f,f,6d 0a 00 00 00 00 00 00 1,72,"(1,74)",16,f,f,6e 0a 00 00 00 00 00 00 1,73,"(1,6)",16,f,f,6f 0a 00 00 00 00 00 00 1,74,"(1,7)",16,f,f,70 0a 00 00 00 00 00 00 1,75,"(1,75)",16,f,f,71 0a 00 00 00 00 00 00 1,76,"(1,76)",16,f,f,72 0a 00 00 00 00 00 00 1,77,"(1,67)",16,f,f,73 0a 00 00 00 00 00 00 1,78,"(0,40)",16,f,f,74 0a 00 00 00 00 00 00 1,79,"(0,41)",16,f,f,75 0a 00 00 00 00 00 00 1,80,"(1,50)",16,f,f,76 0a 00 00 00 00 00 00 1,81,"(1,51)",16,f,f,77 0a 00 00 00 00 00 00 1,82,"(1,49)",16,f,f,78 0a 00 00 00 00 00 00 1,83,"(1,58)",16,f,f,79 0a 00 00 00 00 00 00 1,84,"(1,59)",16,f,f,7a 0a 00 00 00 00 00 00 1,85,"(1,89)",16,f,f,7b 0a 00 00 00 00 00 00 1,86,"(1,70)",16,f,f,7c 0a 00 00 00 00 00 00 1,87,"(1,71)",16,f,f,7d 0a 00 00 00 00 00 00 1,88,"(1,56)",16,f,f,7e 0a 00 00 00 00 00 00 1,89,"(1,57)",16,f,f,7f 0a 00 00 00 00 00 00 1,90,"(1,52)",16,f,f,80 0a 00 00 00 00 00 00 1,91,"(1,53)",16,f,f,81 0a 00 00 00 00 00 00 1,92,"(1,43)",16,f,f,82 0a 00 00 00 00 00 00 1,93,"(7,1)",16,f,f,83 0a 00 00 00 00 00 00 1,94,"(1,61)",16,f,f,84 0a 00 00 00 00 00 00 1,95,"(1,62)",16,f,f,85 0a 00 00 00 00 00 00 1,96,"(0,29)",16,f,f,86 0a 00 00 00 00 00 00 1,97,"(0,30)",16,f,f,87 0a 00 00 00 00 00 00 1,98,"(6,35)",16,f,f,88 0a 00 00 00 00 00 00 1,99,"(1,36)",16,f,f,89 0a 00 00 00 00 00 00 1,100,"(1,37)",16,f,f,8a 0a 00 00 00 00 00 00 1,101,"(1,63)",
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
This is great, but it's not exactly clear which bt_page_items() page is which - some are skipped, but I can't be sure which. Would you mind rewriting that query to indicate which block is under consideration by bt_page_items()? Thanks -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Shouldn't CREATE TABLE LIKE copy the relhasoids property?
Andrew Dunstan writes: > On 01/14/2015 07:29 PM, Tom Lane wrote: >> If you don't find that problematic, how about this case? >> >> regression=# create table src2 (f1 int, primary key(oid)) with oids; >> CREATE TABLE >> regression=# create table dst2 (like src2 including indexes); >> ERROR: column "oid" named in key does not exist > I agree it's odd, and probably wrong, although it's been like that for a > very long time, hasn't it? Sure, LIKE has always behaved this way. It still seems wrong though. As a reference point, creating a table that inherits from src1 or src2 will result in it having oids (even if you say WITHOUT OIDS). regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Shouldn't CREATE TABLE LIKE copy the relhasoids property?
On 01/14/2015 07:29 PM, Tom Lane wrote:

> dst1 doesn't get an OID column:
>
> regression=# create table src1 (f1 int) with oids;
> CREATE TABLE
> regression=# create table dst1 (like src1);
> CREATE TABLE
> regression=# \d+ src1
>                           Table "public.src1"
>  Column |  Type   | Modifiers | Storage | Stats target | Description
> --------+---------+-----------+---------+--------------+-------------
>  f1     | integer |           | plain   |              |
> Has OIDs: yes
>
> regression=# \d+ dst1
>                           Table "public.dst1"
>  Column |  Type   | Modifiers | Storage | Stats target | Description
> --------+---------+-----------+---------+--------------+-------------
>  f1     | integer |           | plain   |              |
>
> If you don't find that problematic, how about this case?
>
> regression=# create table src2 (f1 int, primary key(oid)) with oids;
> CREATE TABLE
> regression=# create table dst2 (like src2 including indexes);
> ERROR: column "oid" named in key does not exist

I agree it's odd, and probably wrong, although it's been like that for a very long time, hasn't it?

cheers

andrew

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On Wed, Jan 14, 2015 at 6:26 PM, Merlin Moncure wrote:
> On Wed, Jan 14, 2015 at 5:39 PM, Peter Geoghegan wrote:
>> On Wed, Jan 14, 2015 at 3:38 PM, Merlin Moncure wrote:
>>> (gdb) print BufferGetBlockNumber(buf)
>>> $15 = 9
>>>
>>> ..and it stays 9, continuing several times having set breakpoint.
>>
>> And the index involved? I'm pretty sure that this is an internal page, no?
>
> The index is the oid index on pg_class. Some more info:
>
> *) temp table churn is fairly high. Several dozen get spawned and
> destroyed at the start of a replication run, all at once, due to some
> dodgy coding via dblink. During the replication run, the temp table
> churn rate drops.
>
> *) running btreecheck, I see:
> cds2=# select bt_index_verify('pg_class_oid_index');
> NOTICE: page 7 of index "pg_class_oid_index" is deleted
> NOTICE: page 10 of index "pg_class_oid_index" is deleted
> NOTICE: page 12 of index "pg_class_oid_index" is deleted
>  bt_index_verify
> ─────────────────
>
> cds2=# select bt_leftright_verify('pg_class_oid_index');
> WARNING: left link/right link pair don't comport at level 0, block 9,
> last: 2, current left: 4
> WARNING: left link/right link pair don't comport at level 0, block 9,
> last: 9, current left: 4
> WARNING: left link/right link pair don't comport at level 0, block 9,
> last: 9, current left: 4
> WARNING: left link/right link pair don't comport at level 0, block 9,
> last: 9, current left: 4
> WARNING: left link/right link pair don't comport at level 0, block 9,
> last: 9, current left: 4
> [repeat infinity until cancel]
>
> which looks like the index is corrupted? ISTM _bt_moveright is
> hanging because it's trying to move from block 9 to block 9 and so
> loops forever.

per Peter the following might be useful:

cds2=# select * from bt_metap('pg_class_oid_index');
 magic  │ version │ root │ level │ fastroot │ fastlevel
────────┼─────────┼──────┼───────┼──────────┼───────────
 340322 │       2 │    3 │     1 │        3 │         1

cds2=# select (bt_page_stats('pg_class_oid_index', s)).* from generate_series(1,12) s;
 blkno │ type │ live_items │ dead_items │ avg_item_size │ page_size │ free_size │ btpo_prev │ btpo_next │ btpo  │ btpo_flags
───────┼──────┼────────────┼────────────┼───────────────┼───────────┼───────────┼───────────┼───────────┼───────┼────────────
     1 │ l    │        119 │          0 │            16 │      8192 │      5768 │         0 │         4 │     0 │          1
     2 │ l    │         25 │          0 │            16 │      8192 │      7648 │         4 │         9 │     0 │          1
     3 │ r    │          8 │          0 │            15 │      8192 │      7996 │         0 │         0 │     1 │          2
     4 │ l    │        178 │          0 │            16 │      8192 │      4588 │         1 │         2 │     0 │          1
     5 │ l    │          7 │          0 │            16 │      8192 │      8008 │         9 │        11 │     0 │          1
     6 │ l    │          5 │          0 │            16 │      8192 │      8048 │        11 │         8 │     0 │          1
     7 │ d    │          0 │          0 │             0 │      8192 │         0 │        -1 │        -1 │ 12366 │          0
     8 │ l    │        187 │          0 │            16 │      8192 │      4408 │         6 │         0 │     0 │          1
     9 │ l    │         25 │          0 │            16 │      8192 │      7648 │         4 │         9 │     0 │          1
    10 │ d    │          0 │          0 │             0 │      8192 │         0 │        -1 │        -1 │ 12366 │          0
    11 │ l    │          6 │          0 │            16 │      8192 │      8028 │         5 │         6 │     0 │          1
    12 │ d    │          0 │          0 │             0 │      8192 │         0 │        -1 │        -1 │ 10731 │          0

merlin

cds2=# select (bt_page_items('pg_class_oid_index', s)).* from generate_series(1,12) s;
NOTICE: page is deleted
NOTICE: page is deleted
NOTICE: page is deleted
NOTICE: page is deleted
NOTICE: page is deleted
NOTICE: page is deleted
NOTICE: page is deleted
NOTICE: page is deleted
NOTICE: page is deleted
NOTICE: page is deleted
NOTICE: page is deleted
NOTICE: page is deleted
NOTICE: page is deleted
NOTICE: page is deleted
NOTICE: page is deleted
NOTICE: page is deleted
NOTICE: page is deleted
NOTICE: page is deleted
 itemoffset │ ctid   │ itemlen │ nulls │ vars │ data
────────────┼────────┼─────────┼───────┼──────┼──────────────────────────
          1 │ (0,9)  │      16 │ f     │ f    │ 16 0b 00 00 00 00 00 00
          2 │ (2,50) │      16 │ f     │ f    │ 70 00 00 00 00 00 00 00
          3 │ (2,52) │      16 │ f     │ f    │ 71 00 00 00 00 00 00 00
          4 │ (2,24) │      16 │ f     │ f    │ ae 00 00 00 00 00 00 00
          5 │ (2,25) │      16 │ f     │ f    │ af 00 00 00 00 00 00 00
          6 │ (2,51) │      16 │ f     │ f    │ 24
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On Wed, Jan 14, 2015 at 4:26 PM, Merlin Moncure wrote: > The index is the oid index on pg_class. Some more info: > > *) temp table churn is fairly high. Several dozen get spawned and > destroyed at the start of a replication run, all at once, due to some > dodgy coding via dblink. During the replication run, the temp table > churn rate drops. > > *) running btreecheck, I see: I knew I wrote that tool for a reason. :-) It would be great if I could get a raw dump of the index using contrib/pageinspect at this juncture too, as already described. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] s_lock.h default definitions are rather confused
Andres Freund writes: > Right now I think a #ifdef/undef S_UNLOCK in the relevant gcc section > is sufficient and acceptable. It's after all the HPPA section that doesn't > really play by the rules. Works for me. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Shouldn't CREATE TABLE LIKE copy the relhasoids property?
dst1 doesn't get an OID column:

regression=# create table src1 (f1 int) with oids;
CREATE TABLE
regression=# create table dst1 (like src1);
CREATE TABLE
regression=# \d+ src1
                          Table "public.src1"
 Column |  Type   | Modifiers | Storage | Stats target | Description
--------+---------+-----------+---------+--------------+-------------
 f1     | integer |           | plain   |              |
Has OIDs: yes

regression=# \d+ dst1
                          Table "public.dst1"
 Column |  Type   | Modifiers | Storage | Stats target | Description
--------+---------+-----------+---------+--------------+-------------
 f1     | integer |           | plain   |              |

If you don't find that problematic, how about this case?

regression=# create table src2 (f1 int, primary key(oid)) with oids;
CREATE TABLE
regression=# create table dst2 (like src2 including indexes);
ERROR: column "oid" named in key does not exist

regards, tom lane

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On Wed, Jan 14, 2015 at 5:39 PM, Peter Geoghegan wrote:
> On Wed, Jan 14, 2015 at 3:38 PM, Merlin Moncure wrote:
>> (gdb) print BufferGetBlockNumber(buf)
>> $15 = 9
>>
>> ..and it stays 9, continuing several times having set breakpoint.
>
> And the index involved? I'm pretty sure that this is an internal page, no?

The index is the oid index on pg_class. Some more info:

*) temp table churn is fairly high. Several dozen get spawned and destroyed at the start of a replication run, all at once, due to some dodgy coding via dblink. During the replication run, the temp table churn rate drops.

*) running btreecheck, I see:

cds2=# select bt_index_verify('pg_class_oid_index');
NOTICE: page 7 of index "pg_class_oid_index" is deleted
NOTICE: page 10 of index "pg_class_oid_index" is deleted
NOTICE: page 12 of index "pg_class_oid_index" is deleted
 bt_index_verify
─────────────────

cds2=# select bt_leftright_verify('pg_class_oid_index');
WARNING: left link/right link pair don't comport at level 0, block 9, last: 2, current left: 4
WARNING: left link/right link pair don't comport at level 0, block 9, last: 9, current left: 4
WARNING: left link/right link pair don't comport at level 0, block 9, last: 9, current left: 4
WARNING: left link/right link pair don't comport at level 0, block 9, last: 9, current left: 4
WARNING: left link/right link pair don't comport at level 0, block 9, last: 9, current left: 4
[repeat infinity until cancel]

which looks like the index is corrupted? ISTM _bt_moveright is hanging because it's trying to move from block 9 to block 9 and so loops forever.

merlin

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On Wed, Jan 14, 2015 at 3:38 PM, Merlin Moncure wrote: > (gdb) print BufferGetBlockNumber(buf) > $15 = 9 > > ..and it stays 9, continuing several times having set breakpoint. And the index involved? I'm pretty sure that this is an internal page, no? -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On Wed, Jan 14, 2015 at 2:32 PM, Peter Geoghegan wrote: > On Wed, Jan 14, 2015 at 12:24 PM, Peter Geoghegan wrote: >> Could you write some code to print out the block number (i.e. >> "BlockNumber blkno") if there are more than, say, 5 retries within >> _bt_moveright()? > > Obviously I mean that the block number should be printed, no matter > whether or not the P_INCOMPLETE_SPLIT() path is taken or not. So you > should just move this to the top of the for(;;) loop, so it's always > available to print: > > BlockNumber blkno = BufferGetBlockNumber(buf); (gdb) print BufferGetBlockNumber(buf) $15 = 9 ..and it stays 9, continuing several times having set breakpoint. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
> On Fri, Jan 9, 2015 at 10:51 AM, Kouhei Kaigai wrote: > > When custom-scan node replaced a join-plan, it shall have at least two > > child plan-nodes. The callback handler of PlanCustomPath needs to be > > able to call create_plan_recurse() to transform the underlying > > path-nodes to plan-nodes, because this custom-scan node may take other > > built-in scan or sub-join nodes as its inner/outer input. > > In case of FDW, it shall kick any underlying scan relations to remote > > side, thus we may not expect ForeignScan has underlying plans... > > Do you have an example of this? > Yes, even though full code set is too large for patch submission... https://github.com/pg-strom/devel/blob/master/src/gpuhashjoin.c#L1880 This create_gpuhashjoin_plan() is PlanCustomPath callback of GpuHashJoin. It takes GpuHashJoinPath inherited from CustomPath that has multiple underlying scan/join paths. Once it is called back from the backend, it also calls create_plan_recurse() to make inner/outer plan nodes according to the paths. In the result, we can see the following query execution plan that CustomScan takes underlying scan plans. postgres=# EXPLAIN SELECT * FROM t0 NATURAL JOIN t1 NATURAL JOIN t2; QUERY PLAN -- Custom Scan (GpuHashJoin) (cost=2968.00..140120.31 rows=3970922 width=143) Hash clause 1: (aid = aid) Hash clause 2: (bid = bid) Bulkload: On -> Custom Scan (GpuScan) on t0 (cost=500.00..57643.00 rows=409 width=77) -> Custom Scan (MultiHash) (cost=734.00..734.00 rows=4 width=37) hash keys: aid nBatches: 1 Buckets: 46000 Memory Usage: 99.99% -> Seq Scan on t1 (cost=0.00..734.00 rows=4 width=37) -> Custom Scan (MultiHash) (cost=734.00..734.00 rows=4 width=37) hash keys: bid nBatches: 1 Buckets: 46000 Memory Usage: 49.99% -> Seq Scan on t2 (cost=0.00..734.00 rows=4 width=37) (13 rows) Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei > -Original Message- > From: Robert Haas [mailto:robertmh...@gmail.com] > Sent: Thursday, January 15, 2015 2:07 AM > To: Kaigai Kouhei(海外 浩平) > Cc: Tom Lane; pgsql-hackers@postgreSQL.org; Shigeru Hanada > Subject: ##freemail## Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] > Custom Plan API) > > On Fri, Jan 9, 2015 at 10:51 AM, Kouhei Kaigai wrote: > > When custom-scan node replaced a join-plan, it shall have at least two > > child plan-nodes. The callback handler of PlanCustomPath needs to be > > able to call create_plan_recurse() to transform the underlying > > path-nodes to plan-nodes, because this custom-scan node may take other > > built-in scan or sub-join nodes as its inner/outer input. > > In case of FDW, it shall kick any underlying scan relations to remote > > side, thus we may not expect ForeignScan has underlying plans... > > Do you have an example of this? > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL > Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] s_lock.h default definitions are rather confused
On 2015-01-14 12:27:42 -0500, Robert Haas wrote: > On Mon, Jan 12, 2015 at 12:57 PM, Andres Freund > wrote: > > On 2015-01-10 23:03:36 +0100, Andres Freund wrote: > >> On 2015-01-10 16:09:42 -0500, Tom Lane wrote: > >> > I've not tried to build HEAD on my HPPA dinosaur for awhile, but I did > >> > just now, and I am presented with boatloads of this: > >> > > >> > ../../../src/include/storage/s_lock.h:759: warning: `S_UNLOCK' redefined > >> > ../../../src/include/storage/s_lock.h:679: warning: this is the location > >> > of the previous definition > >> > > >> > which is not too surprising because the "default" definition at line 679 > >> > precedes the HPPA-specific one at line 759. > >> > >> That's 0709b7ee72e4bc71ad07b7120acd117265ab51d0. > >> > >> Not too surprising that it broke and wasn't noticed without access to > >> hppa - the hppa code uses gcc inline assembly outside of the big > >> defined(__GNUC__) and inside the section headed "Platforms that use > >> non-gcc inline assembly". > >> > >> > I'm not particularly interested in untangling the effects of the recent > >> > hackery in s_lock.h enough to figure out how the overall structure got > >> > broken, but I trust one of you will clean up the mess. > >> > >> I think it's easiest solved by moving the gcc inline assembly up to the > >> rest of the gcc inline assembly. That'll require duplicating a couple > >> lines, but seems easier to understand nonetheless. Not pretty. > > > > Robert, do you have a better idea? > > How about hoisting the entire hppa section up to the top of the file, > before the __GNUC__ || __INTEL_COMPILER section? We could do that, but I'd rather not see that ugliness at the top every time I open the file :P Right now I think a #ifdef/undef S_UNLOCK in the relevant gcc section is sufficient and acceptable. It's after all the HPPA section that doesn't really play by the rules. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
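Concretely, the #undef approach would look something like this in s_lock.h's gcc/HPPA section (a sketch, not the committed fix; it assumes the generic default S_UNLOCK has already been defined earlier in the file, and TAS_ACTIVE_WORD is the existing HPPA macro):

    #if defined(__hppa) || defined(__hppa__)

    /* discard the generic default so the redefinition is deliberate */
    #ifdef S_UNLOCK
    #undef S_UNLOCK
    #endif
    #define S_UNLOCK(lock) \
        do { \
            __asm__ __volatile__("" : : : "memory"); \
            *TAS_ACTIVE_WORD(lock) = -1; \
        } while (0)

    #endif   /* __hppa || __hppa__ */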
Re: [HACKERS] __attribute__ for non-gcc compilers
On 2015-01-14 17:46:39 -0500, Robert Haas wrote: > On Wed, Jan 14, 2015 at 5:29 PM, Andres Freund wrote: > > I think it's better than the alternatives: > > > > a) Don't support 64bit atomics on any 32bit platform. I think that'd be > >sad because there's some places that could greatly benefit from being > >able to read/store/manipulate e.g. LSNs atomically. > > b) Double the size of 64bit atomics on 32bit platforms, and add > >TYPEALIGN() to every access inside the atomics implementation. > > c) Require 64 atomics to be aligned appropriately manually in every > >place they're embedded. I think that's completely impractical. > > > > The only viable one imo is a) > > I can't really fault that reasoning, but if __attribute__((align)) > only works on some platforms, then you've got silent, subtle breakage > on the ones where it doesn't. The __attribute__((align()))'s are in compiler specific code sections anyway - and there's asserts ensuring that the alignment is correct in all routines where it matters (IIRC). Those are what caught the problem. C.f. http://archives.postgresql.org/message-id/20150108204635.GK6299%40alap3.anarazel.de I think I'd for now simply not define pg_attribute_aligned() on platforms where it's not supported, instead of defining it empty. If we need a softer variant we can name it pg_attribute_aligned_if_possible or something. Sounds sane? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
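In code form the proposal amounts to something like the following sketch (the compiler tests and the atomics struct are illustrative; the exact spellings would come from Oskari's patch, and uint64 is the PostgreSQL typedef): define pg_attribute_aligned() only where it is known to work, and provide no empty fallback, so a platform that cannot honor the alignment fails at compile time instead of silently producing a misaligned atomic.

    /* sketch: per-attribute macros instead of a blanket __attribute__ define */
    #if defined(__GNUC__) || defined(__SUNPRO_C) || defined(__IBMC__)
    #define pg_attribute_aligned(a) __attribute__((aligned(a)))
    #endif
    /* deliberately no "#define pg_attribute_aligned(a)" fallback here */

    typedef struct pg_atomic_uint64
    {
        volatile uint64 value pg_attribute_aligned(8);
    } pg_atomic_uint64;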
Re: [HACKERS] __attribute__ for non-gcc compilers
On Wed, Jan 14, 2015 at 5:29 PM, Andres Freund wrote: > I think it's better than the alternatives: > > a) Don't support 64bit atomics on any 32bit platform. I think that'd be >sad because there's some places that could greatly benefit from being >able to read/store/manipulate e.g. LSNs atomically. > b) Double the size of 64bit atomics on 32bit platforms, and add >TYPEALIGN() to every access inside the atomics implementation. > c) Require 64 atomics to be aligned appropriately manually in every >place they're embedded. I think that's completely impractical. > > The only viable one imo is a) I can't really fault that reasoning, but if __attribute__((align)) only works on some platforms, then you've got silent, subtle breakage on the ones where it doesn't. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] __attribute__ for non-gcc compilers
On 2015-01-14 15:48:47 -0500, Robert Haas wrote: > On Tue, Jan 13, 2015 at 4:18 PM, Oskari Saarenmaa wrote: > > Commit db4ec2ffce35 added alignment attributes for 64-bit atomic > > variables as required on 32-bit platforms using > > __attribute__((aligned(8)). That works fine with GCC, and would work > > with Solaris Studio Compiler [1] and IBM XL C [2], but src/include/c.h > > defines __attribute__ as an empty macro when not using GCC. > > Unfortunately we can't just disable that #define and enable all > > __attributes__ for Solaris CC and XLC as we use a bunch of attributes > > that are not supported by those compilers and using them unconditionally > > would generate a lot of warnings. > > > > Attached a patch that defines custom macros for each attribute and > > enables them individually for compilers that support them and never > > defines __attribute__. > > > > I have tested this with GCC 4.9.2 on Linux x86-64 and Solaris CC 5.12 on > > Sparc (32-bit build); I don't have access to an IBM box with XLC. > > I guess my first question is whether we want to be relying on > __attribute__((aligned)) in the first place. I think it's better than the alternatives: a) Don't support 64bit atomics on any 32bit platform. I think that'd be sad because there's some places that could greatly benefit from being able to read/store/manipulate e.g. LSNs atomically. b) Double the size of 64bit atomics on 32bit platforms, and add TYPEALIGN() to every access inside the atomics implementation. c) Require 64bit atomics to be aligned appropriately manually in every place they're embedded. I think that's completely impractical. The only viable one imo is a) Since the atomics code is compiler-dependent anyway, I don't have much of a problem with relying on compiler specific alignment instructions. Really, the only problem right now is that __attribute__ is defined to be empty on compilers where it has meaning. > If we do, then this seems like a pretty sensible and necessary change. I think it's also an improvement independent from the alignment code. Having a more widely portable __attribute__((noreturn)), __attribute__((unused)), __attribute__((format(..))) seems like a good general improvement. And all of these are supported in other compilers than gcc. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Out-of-bounds write and incorrect detection of trigger file in pg_standby
On Wed, Jan 14, 2015 at 8:24 AM, Michael Paquier wrote: > In pg_standby.c, we use a 32-byte buffer in CheckForExternalTrigger into > which the content of the trigger file defined by -f is read: > CheckForExternalTrigger(void) > { > char buf[32]; > [...] > if ((len = read(fd, buf, sizeof(buf))) < 0) > { > stuff(); > } > > if (len < sizeof(buf)) > buf[len] = '\0'; > However it happens that if the file read contains 32 bytes or more, we > enforce \0 in a position out of bounds. While looking at that, I also > noticed that pg_standby can trigger a failover as long as the > beginning of buf matches either "smart" or "fast", but I think we > should check for a perfect match instead of a comparison of the first > bytes, no? > > Attached is a patch to fix the out-of-bound issue as well as > improvements for the detection of the trigger file content. I think > that the out-of-bound fix should be backpatched, while the > improvements for the trigger file are master-only. Coverity has > pointed out the out-of-bound issue. I would imagine that the goal might have been to accept not only smart but also smart\r and smart\n and smart\r\n. It's a little weird that we also accept smartypants, though. Instead of doing this: if (len < sizeof(buf)) buf[len] = '\0'; ...I would suggest making the size of the buffer one greater than the size of the read(), and then always nul-terminating the buffer. It seems to me that would make the code easier to reason about. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
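One possible shape of that suggestion, as a fragment of CheckForExternalTrigger() (a sketch only, not the committed patch; it also trims trailing newlines so "fast\n" still matches exactly):

    char        buf[33];        /* one spare byte for the terminator */
    ssize_t     len;

    if ((len = read(fd, buf, sizeof(buf) - 1)) < 0)
    {
        /* report the read error and bail out, as the current code does */
    }
    buf[len] = '\0';            /* always safe: len <= sizeof(buf) - 1 */

    /* strip trailing line endings before the exact comparison */
    while (len > 0 && (buf[len - 1] == '\n' || buf[len - 1] == '\r'))
        buf[--len] = '\0';

    if (strcmp(buf, "fast") == 0)
        ;                       /* trigger immediate failover */
    else if (strcmp(buf, "smart") == 0)
        ;                       /* trigger smart failover */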
Re: [HACKERS] pg_rewind in contrib
On Wed, Jan 7, 2015 at 8:44 AM, Andrew Dunstan wrote: > I understand, but I think "pg_rewind" is likely to be misleading to many > users who will say "but I don't want just to rewind". > > I'm not wedded to the name I suggested, but I think we should look at > possible alternative names. We do have experience of misleading names > causing confusion (e.g. "initdb"). FWIW, pg_rewind sounds pretty good to me. But maybe I'm just not understanding what the use case for this, apart from rewinding? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: decreasing memory needlessly consumed by array_agg
On 12.1.2015 01:28, Ali Akbar wrote: > > > Or else we implement what you suggest below (more comments below): > > > > Thinking about the 'release' flag a bit more - maybe we > could do > > this > > instead: > > > > if (release && astate->private_cxt) > > MemoryContextDelete(astate->mcontext); > > else if (release) > > { > > pfree(astate->dvalues); > > pfree(astate->dnulls); > > pfree(astate); > > } > > > > i.e. either destroy the whole context if possible, and > just free the > > memory when using a shared memory context. But I'm afraid > this would > > penalize the shared memory context, because that's > intended for > > cases where all the build states coexist in parallel and > then at some > > point are all converted into a result and thrown away. > Adding pfree() > > calls is no improvement here, and just wastes cycles. > > > > > > As per Tom's comment, i'm using "parent memory context" instead of > > "shared memory context" below. > > > > In the future, if some code writer decided to use > subcontext=false, > > to save memory in cases where there are many array > accumulation, and > > the parent memory context is long-living, current code can cause > > memory leak. So i think we should implement your suggestion > > (pfreeing astate), and warn the implication in the API > comment. The > > API user must choose between release=true, wasting cycles but > > preventing memory leak, or managing memory from the parent memory > > context. > > I'm wondering whether this is necessary after fixing makeArrayResult to > use the privat_cxt flag. It's still possible to call makeMdArrayResult > directly (with the wrong 'release' value). > > Another option might be to get rid of the 'release' flag altogether, and > just use the 'private_cxt' - I'm not aware of a code using release=false > with private_cxt=true (e.g. to build the same array twice from the same > astate). But maybe there's such code, and another downside is that it'd > break the existing API. > > > In one possible use case, for efficiency maybe the caller will > > create a special parent memory context for all array accumulation. > > Then uses makeArrayResult* with release=false, and in the end > > releasing the parent memory context once for all. > > Yeah, although I'd much rather not break the existing code at all. That > is - my goal is not to make it slower unless absolutely necessary (and > in that case the code may be fixed per your suggestion). But I'm not > convinced it's worth it. > > > OK. Do you think we need to note this in the comments? Something like > this: If using subcontext=false, the caller must be careful about memory > usage, because makeArrayResult* will not free the memory used. Yes, I think it's worth mentioning. > But I think it makes sense to move the error handling into > initArrayResultArr(), including the get_element_type() call, and remove > the element_type from the signature. This means initArrayResultAny() > will call the get_element_type() twice, but I guess that's negligible. > And everyone who calls initArrayResultArr() will get the error handling > for free. > > Patch v7 attached, implementing those two changes, i.e. > > * makeMdArrayResult(..., astate->private_cxt) > * move error handling into initArrayResultArr() > * remove element_type from initArrayResultArr() signature > > > Reviewing the v7 patch: > - applies cleanly to current master. 
patch format, whitespace, etc is good > - make check runs without error > - performance & memory usage still consistent > > If you think we don't have to add the comment (see above), i'll mark > this as ready for committer Attached is v8 patch, with a few comments added: 1) before initArrayResult() - explaining when it's better to use a single memory context, and when it's more efficient to use a separate memory context for each array build state 2) before makeArrayResult() - explaining that it won't free memory when allocated in a single memory context (and that a pfree() has to be used if necessary) 3) before makeMdArrayResult() - explaining that it's illegal to use release=true unless using a subcontext Otherwise the v8 patch is exactly the same as v7. Assuming the comments make it sufficiently clear, I agree with marking this as 'ready for committer'. regards -- Tomas Vondrahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services diff --git a/src/backend/executor/node
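For reference, a usage sketch of the API this patch settles on (function names per the patch under review, so subject to change; values and nvalues are hypothetical caller-supplied data): a caller accumulating many arrays can pass subcontext = false and let the parent context own the memory, accepting that makeArrayResult() then frees nothing.

    #include "postgres.h"
    #include "catalog/pg_type.h"
    #include "utils/array.h"

    /* sketch: accumulate int4 values without a private memory context */
    ArrayBuildState *astate;
    Datum            result;
    int              i;

    astate = initArrayResult(INT4OID, CurrentMemoryContext, false);
    for (i = 0; i < nvalues; i++)
        astate = accumArrayResult(astate, Int32GetDatum(values[i]), false,
                                  INT4OID, CurrentMemoryContext);

    /*
     * With subcontext = false this releases nothing; the memory lives
     * until the parent context is reset, per the v8 comments.
     */
    result = makeArrayResult(astate, CurrentMemoryContext);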
Re: [HACKERS] orangutan seizes up during isolation-check
On 1/1/15 11:04 PM, Noah Misch wrote: >> Clusters hosted on OS X fall into these categories: >> >> 1) Unaffected configuration. This includes everyone setting a valid messages >>locale via LANG, LC_ALL or LC_MESSAGES. >> 2) Affected configuration. Through luck and light use, the cluster would not >>experience the crashes/hangs. >> 3) Cluster would experience the crashes/hangs. >> >> DBAs in (3) want the FATAL at startup, but those in (2) want a LOG message >> instead. DBAs in (1) don't care. Since intermittent postmaster hangs are >> far >> worse than startup failure, if (2) and (3) have similar population, FATAL is >> the better bet. If (2) is sufficiently more populous than (3), then the many >> small pricks from startup failure do add up to hurt more than the occasional >> postmaster hang. Who knows how that calculation plays out. > > The first attached patch, for all branches, adds LOG-level messages and an > assertion. So cassert builds will fail hard, while others won't. The second > patch, for master only, changes the startup-time message to FATAL. If we > decide to use FATAL in all branches, I would just squash them into one. What I'm seeing now is that the unaccent regression tests when run under make check-world abort with FATAL: postmaster became multithreaded during startup HINT: Set the LC_ALL environment variable to a valid locale. My locale settings for this purpose are LANG="en_US.UTF-8" LC_COLLATE="en_US.UTF-8" LC_CTYPE="en_US.UTF-8" LC_MESSAGES="en_US.UTF-8" LC_MONETARY="en_US.UTF-8" LC_NUMERIC="en_US.UTF-8" LC_TIME="en_US.UTF-8" LC_ALL= The environment variable LANG is set, all the other ones are not set. I could presumably set LC_ALL, but then I lose the ability to set different locales for different categories. Running LC_ALL=$LANG make -C contrib/unaccent check doesn't fix it; I get the same error. The behavior is also a bit strange. The step == starting postmaster== hangs for one minute before failing. This probably has nothing to do with your change, but maybe pg_regress could detect postmaster startup failures better. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Async execution of postgres_fdw.
On Tue, Jan 13, 2015 at 6:46 AM, Kyotaro HORIGUCHI wrote: >> Is it possible to use the parallel query infrastructure being built by >> Robert or to do something like parallel seq scan? That will work, not just >> for Postgres FDW but all the FDWs. > > But, I think, from the performance view, every scan of multiple > foreign scans doesn't need a corresponding local process. Quite so. I think this is largely a separate project. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Improving RLS qual pushdown
On Wed, Jan 14, 2015 at 9:22 AM, Dean Rasheed wrote: > On 14 January 2015 at 13:29, Robert Haas wrote: >> One thing they could still leak is the number of times they got >> called, and thus possibly the number of unseen rows. Now if the >> expressions get constant-folded away that won't be an issue, but a >> clever user can probably avoid that. > > Right now, EXPLAIN ANALYSE can be used to tell you the number of > unseen rows. Is that something that people are concerned about, and > are there any plans to change it? Interesting question. I don't know. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Typo fix in alter_table.sgml
On Wed, Jan 14, 2015 at 2:55 AM, Michael Paquier wrote: > I noticed that SET STATISTICS was not in a block in > alter_table.sgml: > > - SET STATISTICS acquires a SHARE UPDATE EXCLUSIVE > lock. > + SET STATISTICS acquires a > + SHARE UPDATE EXCLUSIVE lock. > > That's a small detail, still.. Patch is attached. OK, committed. But that's not a "typo" as stated in $SUBJECT but rather a "markup fix". -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Seq Scan
On Tue, Jan 13, 2015 at 6:25 AM, John Gorman wrote: > One approach that I has worked well for me is to break big jobs into much > smaller bite size tasks. Each task is small enough to complete quickly. > > We add the tasks to a task queue and spawn a generic worker pool which eats > through the task queue items. > > This solves a lot of problems. > > - Small to medium jobs can be parallelized efficiently. > - No need to split big jobs perfectly. > - We don't get into a situation where we are waiting around for a worker to > finish chugging through a huge task while the other workers sit idle. > - Worker memory footprint is tiny so we can afford many of them. > - Worker pool management is a well known problem. > - Worker spawn time disappears as a cost factor. > - The worker pool becomes a shared resource that can be managed and reported > on and becomes considerably more predictable. I think this is a good idea, but for now I would like to keep our goals somewhat more modest: let's see if we can get parallel sequential scan, and only parallel sequential scan, working and committed. Ultimately, I think we may need something like what you're talking about, because if you have a query with three or six or twelve different parallelizable operations in it, you want the available CPU resources to switch between those as their respective needs may dictate. You certainly don't want to spawn a separate pool of workers for each scan. But I think getting that all working in the first version is probably harder than what we should attempt. We have a bunch of problems to solve here just around parallel sequential scan and the parallel mode infrastructure: heavyweight locking, prefetching, the cost model, and so on. Trying to add to that all of the problems that might attend on a generic task queueing infrastructure fills me with no small amount of fear. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] OOM on EXPLAIN with lots of nodes
On Tue, Jan 13, 2015 at 8:16 PM, Tom Lane wrote: > I wrote: >> Heikki Linnakangas writes: >>> But do we really need to backpatch any of this? > >> Alexey's example consumes only a couple hundred MB in 9.2, vs about 7GB >> peak in 9.3 and up. That seems like a pretty nasty regression. > > I did a bit more measurement of the time and backend memory consumption > for Alexey's example EXPLAIN: > > 9.2: 0.9 sec, circa 200 MB > HEAD: 56 sec, circa 7300 MB > with patch below: 3.7 sec, circa 300 MB > > So while this doesn't get us all the way back down to where we were before > we started trying to guarantee unique table/column identifiers in EXPLAIN > printouts, it's at least a lot closer. > > Not sure whether to just commit this to HEAD and call it a day, or to > risk back-patching. I think we need to back-patch something; that's a pretty nasty regression, and I have some EDB-internal reports that might be from the same cause. I'm not too concerned about forcibly breaking the API here, but I can understand why somebody might want to do that. If we do, I like the idea of renaming ExplainInitState() or maybe by replacing it by a NewExplainState() function that is used instead. But I'm not sure how necessary it is really. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Safe memory allocation functions
On Tue, Jan 13, 2015 at 10:10 AM, Tom Lane wrote: > However, there is a larger practical problem with this whole concept, > which is that experience should teach us to be very wary of the assumption > that asking for memory the system can't give us will just lead to nice > neat malloc-returns-NULL behavior. Any small perusal of the mailing list > archives will remind you that very often the end result will be SIGSEGV, > OOM kills, unrecoverable trap-on-write when the kernel realizes it can't > honor a copy-on-write promise, yadda yadda. Agreed that it's arguable > that these only occur in misconfigured systems ... but misconfiguration > appears to be the default in a depressingly large fraction of systems. > (This is another reason for "_safe" not being the mot juste :-() I don't really buy this. It's pretty incredible to think that after a malloc() failure there is absolutely no hope of carrying on sanely. If that were true, we wouldn't be able to ereport() out-of-memory errors at any severity less than FATAL, but of course it doesn't work that way. Moreover, AllocSetAlloc() contains malloc() and, if that fails, calls malloc() again with a smaller value, without even throwing an error. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
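As a concrete illustration of that last point, the pattern inside AllocSetAlloc() is roughly the following (a paraphrased sketch, not the verbatim aset.c code; preferred_size and required_size are stand-ins for the real block-size computations): a failed malloc() is already treated there as a recoverable event.

    /* try to allocate a generously sized block first */
    size_t  blksize = preferred_size;
    void   *block = malloc(blksize);

    /* on failure, retry with progressively smaller requests */
    while (block == NULL && blksize > required_size)
    {
        blksize >>= 1;
        if (blksize < required_size)
            blksize = required_size;
        block = malloc(blksize);
    }
    if (block == NULL)
        return NULL;            /* only now report out of memory */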
Re: [HACKERS] __attribute__ for non-gcc compilers
On Tue, Jan 13, 2015 at 4:18 PM, Oskari Saarenmaa wrote: > Commit db4ec2ffce35 added alignment attributes for 64-bit atomic > variables as required on 32-bit platforms using > __attribute__((aligned(8)). That works fine with GCC, and would work > with Solaris Studio Compiler [1] and IBM XL C [2], but src/include/c.h > defines __attribute__ as an empty macro when not using GCC. > Unfortunately we can't just disable that #define and enable all > __attributes__ for Solaris CC and XLC as we use a bunch of attributes > that are not supported by those compilers and using them unconditionally > would generate a lot of warnings. > > Attached a patch that defines custom macros for each attribute and > enables them individually for compilers that support them and never > defines __attribute__. > > I have tested this with GCC 4.9.2 on Linux x86-64 and Solaris CC 5.12 on > Sparc (32-bit build); I don't have access to an IBM box with XLC. I guess my first question is whether we want to be relying on __attribute__((aligned)) in the first place. If we do, then this seems like a pretty sensible and necessary change. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On Wed, Jan 14, 2015 at 12:24 PM, Peter Geoghegan wrote: > Could you write some code to print out the block number (i.e. > "BlockNumber blkno") if there are more than, say, 5 retries within > _bt_moveright()? Obviously I mean that the block number should be printed, no matter whether or not the P_INCOMPLETE_SPLIT() path is taken. So you should just move this to the top of the for(;;) loop, so it's always available to print: BlockNumber blkno = BufferGetBlockNumber(buf); Thanks -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
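That is, something along these lines at the top of _bt_moveright()'s loop (a sketch of the suggested instrumentation, not committed code; rel and buf are the function's existing variables):

    BlockNumber blkno;
    int         retries = 0;

    for (;;)
    {
        blkno = BufferGetBlockNumber(buf);

        if (++retries > 5)
            elog(LOG, "excessive moveright retries on block %u of index \"%s\"",
                 blkno, RelationGetRelationName(rel));

        /* ... existing incomplete-split handling and moveright logic ... */
    }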
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On Wed, Jan 14, 2015 at 11:49 AM, Merlin Moncure wrote: > so it looks like nobody ever exits from _bt_moveright. any last > requests before I start bisecting down? Could you write some code to print out the block number (i.e. "BlockNumber blkno") if there are more than, say, 5 retries within _bt_moveright()? You'll also want to print out the name of the index (so rel->rd_rel->relname.data). That'll probably only trip when this happens (5 retries are very unusual). Then, use contrib/pageinspect to dump that block's contents (I think that'll work, since no one sits on a buffer lock here). I'd like to know what index we're dealing with here. I wonder if it's "pg_attribute_relid_attnam_index". So: call "bt_page_stats('pg_whatever', n)", n being the relevant block number. Try and do the same with the right link (which bt_page_stats() will show). Then try and call bt_page_items() against one or both of those blocks/pages. I'd like to see what that shows. I'm wondering if some invariant has been violated. Which _bt_moveright() caller are we talking about here, BTW? I guess it's just the _bt_first()/ _bt_search() one. That might vary, of course. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Memory leak in vacuumlo
On Tue, Jan 13, 2015 at 2:58 AM, Michael Paquier wrote: > Coverity has pointed out that vacuumlo.c is leaking memory when > grabbing the list of candidate table names and schemas. Attached is a > patch to fix the leak. It's only leaking in error cases, the first of which is quite unlikely. But I don't see that as a reason not to fix it, so -- committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
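The leak follows the usual libpq shape: a PGresult obtained before an error exit was never freed. The standard fix looks like this (a sketch only; the committed patch may differ in details such as the message text, and it assumes the query text sits in a PQExpBuffer named buf, as vacuumlo does):

    res = PQexec(conn, buf.data);
    if (PQresultStatus(res) != PGRES_TUPLES_OK)
    {
        fprintf(stderr, "Failed to find candidate tables: %s",
                PQerrorMessage(conn));
        PQclear(res);           /* previously leaked on this early exit */
        PQfinish(conn);
        return -1;
    }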
Re: [HACKERS] PATCH: hashjoin - gracefully increasing NTUP_PER_BUCKET instead of batching
On 11.12.2014 23:46, Tomas Vondra wrote: > On 11.12.2014 22:16, Robert Haas wrote: >> On Thu, Dec 11, 2014 at 2:51 PM, Tomas Vondra wrote: >> >>> The idea was that if we could increase the load a bit (e.g. using 2 >>> tuples per bucket instead of 1), we will still use a single batch in >>> some cases (when we miss the work_mem threshold by just a bit). The >>> lookups will be slower, but we'll save the I/O. >> >> Yeah. That seems like a valid theory, but your test results so far >> seem to indicate that it's not working out like that - which I find >> quite surprising, but, I mean, it is what it is, right? > > Not exactly. My tests show that as long as the outer table batches fit > into page cache, increasing the load factor results in worse performance > than batching. > > When the outer table is "sufficiently small", the batching is faster. > > Regarding the "sufficiently small" - considering today's hardware, we're > probably talking about gigabytes. On machines with significant memory > pressure (forcing the temporary files to disk), it might be much lower, > of course. Of course, it also depends on kernel settings (e.g. > dirty_bytes/dirty_background_bytes). > > If we could identify those cases (at least the "temp files > RAM") then > maybe we could do this. Otherwise we're going to penalize all the other > queries ... > > Maybe the best solution for now is "increase the work_mem a bit" > recommendation. I think it's time to mark this patch as rejected (or maybe returned with feedback). The patch was meant as an attempt to implement Robert's idea from the hashjoin patch, but apparently we have no clear idea how to do it without hurting performance for many existing users. Maybe we can try again later, but there's no point in keeping this in the current CF. Any objections? regards Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] explain sortorder
Hi, we will also remove the following LC_COLLATE hint in the next version, showing only mandatory info as suggested. /* for those who use COLLATE although their default is already the wanted one */ if (strcmp(collname, localeptr) == 0) { appendStringInfo(sortorderInformation, " (%s is LC_COLLATE)", collname); } Anybody insisting on that? Arne Note: I see, at the moment we use the wrong default for DESC. We'll fix that. On Wed, 14 Jan 2015, Heikki Linnakangas wrote: On 01/14/2015 05:26 PM, Timmer, Marius wrote: Hello Heikki, abbreviated version: Sorry, the problem is only the unhandy patch text format, not different opinions on how to proceed. Long version: The v7 patch file already addressed your suggestions, but the file contained several (old) local commits, the new ones at the end of the patch text/file. Ah, missed that. I stopped reading when I saw the old stuff there :-). v7.1 is attached and addresses this issue providing a clean patch file. Ok, thanks, will take a look. V8 will - as mentioned - add missing docs and regression tests, Great! - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] explain sortorder
Hello Heikki, abbreviated version: Sorry, the problem is only the unhandy patch text format, not different opinions on how to proceed. Long version: The v7 patch file already addressed your suggestions, but the file contained several (old) local commits, the new ones at the end of the patch text/file. v7.1 is attached and addresses this issue providing a clean patch file. V8 will - as mentioned - add missing docs and regression tests, as Mike suggested. VlG-Arne & Marius --- Marius Timmer Zentrum für Informationsverarbeitung Westfälische Wilhelms-Universität Münster Einsteinstraße 60 mtimm...@uni-muenster.de On 13.01.2015 at 18:52, Heikki Linnakangas wrote: > On 01/13/2015 06:04 PM, Timmer, Marius wrote: >> -malloc() (StringInfo is used as suggested now). > > There really shouldn't be any snprintf() calls in the patch, when StringInfo > is used correctly... > >> @@ -1187,6 +1187,7 @@ explain (verbose, costs off) select * from matest0 >> order by 1-id; >> Sort >>Output: matest0.id, matest0.name, ((1 - matest0.id)) >>Sort Key: ((1 - matest0.id)) >> + Sort Order: ASC NULLS LAST >>-> Result >> Output: matest0.id, matest0.name, (1 - matest0.id) >> -> Append > > This patch isn't going to be committed with this output format. Please change > per my suggestion earlier: > >> I don't like this output. If there are a lot of sort keys, it gets >> difficult to match the right ASC/DESC element to the sort key it applies >> to. (Also, there seems to be double-spaces in the list) >> >> I would suggest just adding the information to the Sort Key line. As >> long as you don't print the modifiers when they are defaults (ASC and >> NULLS LAST), we could print the information even in non-VERBOSE mode. So >> it would look something like: >> >> Sort Key: sortordertest.n1 NULLS FIRST, sortordertest.n2 DESC > > Or if you don't agree with that, explain why. > > - Heikki > explain_sortorder-v7_1.patch -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
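A sketch of the per-key annotation rule Heikki describes (variable names are hypothetical, not from the patch): append the modifiers only when they differ from the defaults, remembering that the default null ordering flips for DESC, which is the very point Arne notes needs fixing.

    /* buf: StringInfo holding the Sort Key line; one pass per sort key */
    appendStringInfoString(buf, keyname);
    if (descending)
    {
        appendStringInfoString(buf, " DESC");
        if (!nulls_first)
            appendStringInfoString(buf, " NULLS LAST");  /* non-default for DESC */
    }
    else if (nulls_first)
        appendStringInfoString(buf, " NULLS FIRST");     /* non-default for ASC */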
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On Wed, Jan 14, 2015 at 9:49 AM, Andres Freund wrote: > On 2015-01-14 09:47:19 -0600, Merlin Moncure wrote: >> On Wed, Jan 14, 2015 at 9:30 AM, Andres Freund >> wrote: >> > If you gdb in, and type 'fin' a couple times, to wait till the function >> > finishes, is there actually any progress? I'm wondering whether it's >> > just many catalog accesses + contention, or some other >> > problem. Alternatively set a breakpoint on ScanPgRelation() or so and >> > see how often it's hit. >> >> well, i restarted the database, forgetting my looper was running which >> immediately spun up and it got stuck again with a similar profile >> (lots of cpu in spinlock): >> >> Samples: 3K of event 'cycles', Event count (approx.): 2695723228 >> 31.16% postgres[.] s_lock >> 22.32% postgres[.] tas >> 12.13% postgres[.] tas >> 5.93% postgres[.] spin_delay >> 5.69% postgres[.] LWLockRelease >> 3.75% postgres[.] LWLockAcquireCommon >> 3.61% perf[.] 0x000526c4 >> 2.51% postgres[.] FunctionCall2Coll >> 1.48% libc-2.17.so[.] 0x0016a132 >> >> > If you gdb in, and type 'fin' a couple times, >> (gdb) fin >> Run till exit from #0 0x7ff4c63f7a97 in semop () from >> /lib/x86_64-linux-gnu/libc.so.6 >> 0x006de073 in PGSemaphoreLock () >> (gdb) fin >> Run till exit from #0 0x006de073 in PGSemaphoreLock () >> >> It returned once. Second time, it didn't at least so far (minute or so). > > Hm, that's autovac though, not the normal user backends that actually do > stuff, right? If you could additionally check those, it'd be great. got it to reproduce. had to take a break to get some work done. (gdb) fin Run till exit from #0 0x7f7b07099dc3 in select () from /lib/x86_64-linux-gnu/libc.so.6 0x008d3ac4 in pg_usleep () (gdb) fin Run till exit from #0 0x008d3ac4 in pg_usleep () 0x00750b69 in s_lock () (gdb) fin Run till exit from #0 0x00750b69 in s_lock () 0x00750844 in LWLockRelease () (gdb) fin Run till exit from #0 0x00750844 in LWLockRelease () 0x0073 in LockBuffer () (gdb) fin Run till exit from #0 0x0073 in LockBuffer () 0x004b2db4 in _bt_relandgetbuf () (gdb) fin Run till exit from #0 0x004b2db4 in _bt_relandgetbuf () 0x004b7116 in _bt_moveright () (gdb) fin Run till exit from #0 0x004b7116 in _bt_moveright () so it looks like nobody ever exits from _bt_moveright. any last requests before I start bisecting down? merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup fails with long tablespace paths
On Tue, Jan 13, 2015 at 4:41 PM, Peter Eisentraut wrote: > I think the key point I'm approaching is that the information should > only ever be in one place, all the time. This is not dissimilar from > why we took the tablespace location out of the system catalogs. Users > might have all kinds of workflows for how they back up, restore, and > move their tablespaces. This works pretty well right now, because the > authoritative configuration information is always in plain view. The > proposal is essentially that we add another location for this > information, because the existing location is incompatible with some > operating system tools. And, when considered by a user, that second > location might or might not collide with or overwrite the first location > at some mysterious times. > > So I think the preferable fix is not to add a second location, but to > make the first location compatible with said operating system tools, > possibly in the way I propose above. I see. I'm a little concerned that following symlinks may be cheaper than whatever system we would come up with for caching the tablespace-name-to-file-name mappings. But that concern might be unfounded, and apart from it I have no reason to oppose your proposal, if you want to do the work. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Removing INNER JOINs
On 1/13/15 5:02 AM, David Rowley wrote:
> I can't quite get my head around what you mean here, as the idea sounds quite similar to something that's been discussed already and ruled out. If we're joining relation a to relation b, say the plan chosen is a merge join. If we put some special node as the parent of the merge join, then how will we know to skip or not skip any sorts that are there especially for the merge join? Or perhaps the planner chose an index scan as a sorted path, which, now that the join is removed, could become a faster seqscan. The whole plan switching node discussion came from this exact problem. Nobody seemed to like the non-optimal plan that was not properly optimised for having the relation removed.

In my mind this is the same as a root level Alternative Plan, so you'd be free to do whatever you wanted in the alternate:

-> blah blah
   -> Alternate
      -> Merge join
         ...
      -> SeqScan

I'm guessing this would be easier to code, but that's just a guess. The other advantage is if you can't eliminate the join to table A at runtime you could still eliminate table B, whereas a top-level Alternate node doesn't have that flexibility. This does have a disadvantage of creating more plan variations to consider. With a single top-level Alternate node there's only one other option. I believe multiple Alternates would create more options to consider. Ultimately, unless this is easier to code than a top-level alternate, it's probably not worth pursuing.

> It also seems that transitioning through needless nodes comes at a cost. This is why I quite liked the Alternative Plan node idea, as it allowed me to skip over the alternative plan node at plan initialisation.

For init I would expect this to result in a smaller number of nodes than a top-level Alternate, because you wouldn't be duplicating all the stuff above the joins. That said, I rather doubt it's worth worrying about the cost to init; won't it be completely swamped by extra planning cost, no matter how we go about this? -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT ... ON CONFLICT UPDATE and RLS
* Dean Rasheed (dean.a.rash...@gmail.com) wrote: > Turns out it wasn't as simple as that. prepend_row_security_policies() > really could get called multiple times for the same RTE, because the > call to query_tree_walker() at the end of fireRIRrules() would descend > into the just-added quals again. The simplest fix seems to be to > process RLS in a separate loop at the end, so that it can have its > own infinite recursion detection, which is different from that needed > for pre-existing security quals and with check options from security > barrier views. This approach simplifies things a bit, and ensures that > we only try to expand RLS once for each RTE. Right, I specifically recall prepend_row_security_policies() getting called multiple times for the same RTE. I like this approach of using a separate loop, though, and it strikes me that it lends more weight to the argument that we're better off with these as independent considerations. > > Also, I'm thinking that it would be better to refactor things a bit > > and have prepend_row_security_policies() just return the new > > securityQuals and withCheckOptions to add. Then fireRIRrules() would > > only have to recurse into the new quals being added, not the > > already-processed quals. Hmm, good point. > Turns out that refactoring actually became necessary in order to fix > this bug, but I think it makes things cleaner and more efficient. Sounds good, I'll take a look. > Here's an updated patch with a new test for this bug. I've been > developing the fixes for these RLS issues as one big patch, but I > suppose it would be easy to split up, if that's preferred. I'm alright with it as-is for now. Thanks! Stephen
Re: [HACKERS] ereport bug
On Wed, Jan 14, 2015 at 1:40 PM, Tom Lane wrote: > Robert Haas writes: >> On Mon, Jan 12, 2015 at 6:27 AM, Dmitry Voronin >> wrote: >>> I am attaching to this letter a test case that shows the behavior >>> errcontext() macro and the way to fix it. > >> So the upshot of this is that given errfinish(A, B, C), where A, B, >> and C are expressions, my gcc is choosing to evaluate C, then B, then >> A, then the errfinish call itself. But whoever wrote the errcontext() >> macro evidently thought, in this kind of situation, the compiler would >> be certain to evaluate A, then B, then C, then errfinish. But it >> doesn't. > > http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=1f9bf05e539646103c518bcbb49c04919b238f7a Oops, sorry. I missed that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT ... ON CONFLICT UPDATE and RLS
On 14 January 2015 at 08:43, Dean Rasheed wrote: > On 12 January 2015 at 14:24, Stephen Frost wrote: >> Interesting, thanks for the work! I had been suspicious that there was >> an issue with the recursion handling. >> > So continuing to review the RLS code, I spotted the following in > prepend_row_security_policies():
>
> /*
>  * We may end up getting called multiple times for the same RTE, so check
>  * to make sure we aren't doing double-work.
>  */
> if (rte->securityQuals != NIL)
>     return false;
>
> which looked suspicious. I assume that's just a hang-up from an > earlier attempt to prevent infinite recursion in RLS expansion, but > actually it's wrong because in the case of an UPDATE to a security > barrier view on top of a table with RLS enabled, the view's > securityQuals will get added to the RTE first, and that shouldn't > prevent the underlying table's RLS securityQuals from being added. > > AFAICT, it should be safe to just remove the above code. I can't see > how prepend_row_security_policies() could end up getting called more > than once for the same RTE. > Turns out it wasn't as simple as that. prepend_row_security_policies() really could get called multiple times for the same RTE, because the call to query_tree_walker() at the end of fireRIRrules() would descend into the just-added quals again. The simplest fix seems to be to process RLS in a separate loop at the end, so that it can have its own infinite recursion detection, which is different from that needed for pre-existing security quals and with check options from security barrier views. This approach simplifies things a bit, and ensures that we only try to expand RLS once for each RTE. > Also, I'm thinking that it would be better to refactor things a bit > and have prepend_row_security_policies() just return the new > securityQuals and withCheckOptions to add. Then fireRIRrules() would > only have to recurse into the new quals being added, not the > already-processed quals. > Turns out that refactoring actually became necessary in order to fix this bug, but I think it makes things cleaner and more efficient. Here's an updated patch with a new test for this bug. I've been developing the fixes for these RLS issues as one big patch, but I suppose it would be easy to split up, if that's preferred. Regards, Dean

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
new file mode 100644
index 9cbbcfb..34ddf41
*** a/src/backend/optimizer/plan/planner.c
--- b/src/backend/optimizer/plan/planner.c
*** inheritance_planner(PlannerInfo *root)
*** 790,795 ****
--- 790,796 ----
  {
  	Query	   *parse = root->parse;
  	int			parentRTindex = parse->resultRelation;
+ 	Bitmapset  *resultRTindexes = NULL;
  	List	   *final_rtable = NIL;
  	int			save_rel_array_size = 0;
  	RelOptInfo **save_rel_array = NULL;
*** inheritance_planner(PlannerInfo *root)
*** 814,820 ****
--- 815,835 ----
  	 * (1) would result in a rangetable of length O(N^2) for N targets, with
  	 * at least O(N^3) work expended here; and (2) would greatly complicate
  	 * management of the rowMarks list.
+ 	 *
+ 	 * Note that any RTEs with security barrier quals will be turned into
+ 	 * subqueries during planning, and so we must create copies of them too,
+ 	 * except where they are target relations, which will each only be used
+ 	 * in a single plan.
  	 */
+ 	resultRTindexes = bms_add_member(resultRTindexes, parentRTindex);
+ 	foreach(lc, root->append_rel_list)
+ 	{
+ 		AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(lc);
+ 		if (appinfo->parent_relid == parentRTindex)
+ 			resultRTindexes = bms_add_member(resultRTindexes,
+ 											 appinfo->child_relid);
+ 	}
+ 
  	foreach(lc, root->append_rel_list)
  	{
  		AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(lc);
*** inheritance_planner(PlannerInfo *root)
*** 885,905 ****
  		{
  			RangeTblEntry *rte = (RangeTblEntry *) lfirst(lr);
  
! 			if (rte->rtekind == RTE_SUBQUERY)
  			{
  				Index		newrti;
  
  				/*
  				 * The RTE can't contain any references to its own RT
! 				 * index, so we can save a few cycles by applying
! 				 * ChangeVarNodes before we append the RTE to the
! 				 * rangetable.
  				 */
  				newrti = list_length(subroot.parse->rtable) + 1;
  				ChangeVarNodes((Node *) subroot.parse, rti, newrti, 0);
  				ChangeVarNodes((Node *) subroot.rowMarks, rti, newrti, 0);
  				ChangeVarNodes((Node *) subroot.append_rel_list, rti, newrti, 0);
  				rte = copyObject(rte);
  				subroot.parse->rtable = lappend(subroot.parse->rtable, rte);
  			}
--- 900,928 ----
  		{
  			RangeTblEntry *rte = (RangeTblEntry *) lfirst(lr);
  
! 			/*
! 			 * Copy subquery RTEs and RTEs with security barrier quals
! 			 * that will be turned into subqueries, except those that are
! 			 * target relations.
! 			 */
! 			if (rte->rtekind == RTE_SUBQUERY ||
! 				(rte->securityQuals
Re: [HACKERS] libpq bad async behaviour
On 14 January 2015 at 08:40, Andres Freund wrote: > I think that kind of solution isn't likely to be satisfying. The amount > of porting work is just not going to be worth the cost. And it won't be > easily hideable in the API at all as the callers will expect a normal > fd. > All that consumers of the API need is something they can `select()` (or an equivalent) on. > > Yeah, this is a problem. Fixing it isn't easy, though, I think. > > I think > extern PostgresPollingStatusType PQconnectPoll(PGconn *conn); > has the right interface. It returns what upper layers need to wait > for. I think we should extend pretty much that to more interfaces. This would be a fine solution. That enum indeed has the correct values/semantics. > This > likely means that we'll need extended versions of PQflush() and > PQconsumeInput() - afaics it shouldn't be much more? > PQping? PQconnectPoll already has it. Though, I think we could probably even reduce this down to a single common function for all cases: PQpoll() or similar.
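For reference, here is what that interface already looks like in practice for connection establishment; a minimal sketch of a caller-driven wait loop built on the existing PQconnectStart()/PQconnectPoll() API (error handling abbreviated, and the helper's name is ours, not libpq's):

#include <stdio.h>
#include <sys/select.h>
#include <libpq-fe.h>

static PGconn *
connect_nonblocking(const char *conninfo)
{
	PGconn	   *conn = PQconnectStart(conninfo);
	/* after PQconnectStart, behave as if the last result was WRITING */
	PostgresPollingStatusType status = PGRES_POLLING_WRITING;

	if (conn == NULL || PQstatus(conn) == CONNECTION_BAD)
		return NULL;

	while (status != PGRES_POLLING_OK && status != PGRES_POLLING_FAILED)
	{
		int			sock = PQsocket(conn);
		fd_set		input_mask,
					output_mask;

		FD_ZERO(&input_mask);
		FD_ZERO(&output_mask);
		if (status == PGRES_POLLING_READING)
			FD_SET(sock, &input_mask);
		else
			FD_SET(sock, &output_mask);

		/* the caller owns the event loop; libpq only says what to wait for */
		if (select(sock + 1, &input_mask, &output_mask, NULL, NULL) < 0)
			break;

		status = PQconnectPoll(conn);
	}

	if (PQstatus(conn) != CONNECTION_OK)
	{
		PQfinish(conn);
		return NULL;
	}
	return conn;
}

Extended PQflush()/PQconsumeInput() variants, or a unified PQpoll(), would presumably drop into the same loop and return the same enum.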
Re: [HACKERS] ereport bug
Robert Haas writes: > On Mon, Jan 12, 2015 at 6:27 AM, Dmitry Voronin > wrote: >> I am attaching to this letter a test case that shows the behavior >> errcontext() macro and the way to fix it. > So the upshot of this is that given errfinish(A, B, C), where A, B, > and C are expressions, my gcc is choosing to evaluate C, then B, then > A, then the errfinish call itself. But whoever wrote the errcontext() > macro evidently thought, in this kind of situation, the compiler would > be certain to evaluate A, then B, then C, then errfinish. But it > doesn't. http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=1f9bf05e539646103c518bcbb49c04919b238f7a regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On Wed, Jan 14, 2015 at 7:22 AM, Merlin Moncure wrote: > I'll try to pull commits that Peter suggested and see if that helps > (I'm getting ready to bring the database down). I can send the code > off-list if you guys think it'd help. Thanks for the code! I think it would be interesting to see if the bug reproduces without the page split commit (40dae7ec537c5619fc93ad602c62f37be786d161) first (which was applied afterwards). Then, without the page deletion commit (efada2b8e920adfdf7418862e939925d2acd1b89). But without the page split commit seems most interesting at the moment. It would be easier to diagnose the issue if the bug still reproduces without that commit - that way, we can be reasonably confident that there are no confounding factors from the new page split code. There were some bugfixes to those two during the 9.4 beta cycle, too. It might be a bit tricky to generate a 9.4 that lacks one or both of those commits. Bugfix commits in reverse chronological order:

c73669c0e0168923e3f9e787beec980f55af2bd8 (deletion)
c91a9b5a285e20e54cf90f3660ce51ce3a5c2ef4 (incomplete split)
4fafc4ecd9e4d224d92c4a8549c5646860787a5d (deletion)
4a5d55ec2b711e13438a32d119a809a22ced410b (incomplete split)
77fe2b6d795f3f4ed282c9c980920e128a57624e (deletion)
7d98054f0dd115f57ad0ec1f424a66c13459013b (deletion)
954523cdfe229f1cb99a43a19e291a557ae2822d (incomplete split)

I think that's all of them (apart from the original commits, of course). -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] s_lock.h default definitions are rather confused
On Mon, Jan 12, 2015 at 12:57 PM, Andres Freund wrote: > On 2015-01-10 23:03:36 +0100, Andres Freund wrote: >> On 2015-01-10 16:09:42 -0500, Tom Lane wrote: >> > I've not tried to build HEAD on my HPPA dinosaur for awhile, but I did >> > just now, and I am presented with boatloads of this:
>> >
>> > ../../../src/include/storage/s_lock.h:759: warning: `S_UNLOCK' redefined
>> > ../../../src/include/storage/s_lock.h:679: warning: this is the location of the previous definition
>> >
>> > which is not too surprising because the "default" definition at line 679 precedes the HPPA-specific one at line 759. >> >> That's 0709b7ee72e4bc71ad07b7120acd117265ab51d0. >> >> Not too surprising that it broke and wasn't noticed without access to >> hppa - the hppa code uses gcc inline assembly outside of the big >> defined(__GNUC__) and inside the section headed "Platforms that use >> non-gcc inline assembly". >> >> > I'm not particularly interested in untangling the effects of the recent >> > hackery in s_lock.h enough to figure out how the overall structure got >> > broken, but I trust one of you will clean up the mess. >> >> I think it's easiest solved by moving the gcc inline assembly up to the >> rest of the gcc inline assembly. That'll require duplicating a couple >> lines, but seems easier to understand nonetheless. Not pretty. > > Robert, do you have a better idea? How about hoisting the entire hppa section up to the top of the file, before the __GNUC__ || __INTEL_COMPILER section? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
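The warning itself is easy to reproduce outside s_lock.h; this toy file (not the real header, just the same ordering mistake) shows why the generic fallback definition must not be reached before the platform-specific one:

#include <stdio.h>

/* generic fallback defined first, as in the broken s_lock.h ordering ... */
#define S_UNLOCK()	puts("generic unlock")

/* ... then the platform-specific section is reached further down */
#define S_UNLOCK()	puts("hppa unlock")	/* gcc: warning: "S_UNLOCK" redefined */

int
main(void)
{
	S_UNLOCK();
	return 0;
}

Both proposals above amount to making sure the hppa definitions are seen first, so the fallback either never fires or stays properly guarded.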
Re: [HACKERS] ereport bug
On Mon, Jan 12, 2015 at 6:27 AM, Dmitry Voronin wrote: > I am attaching to this letter a test case that shows the behavior > errcontext() macro and the way to fix it. So the upshot of this is that given errfinish(A, B, C), where A, B, and C are expressions, my gcc is choosing to evaluate C, then B, then A, then the errfinish call itself. But whoever wrote the errcontext() macro evidently thought, in this kind of situation, the compiler would be certain to evaluate A, then B, then C, then errfinish. But it doesn't. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
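The underlying C rule is easy to demonstrate in isolation: in a call f(A, B, C), the order in which A, B and C are evaluated is unspecified, so any macro scheme relying on left-to-right side effects is broken. A standalone demo (nothing PostgreSQL-specific about it):

#include <stdio.h>

static int
logged(const char *name)
{
	printf("evaluating %s\n", name);	/* a side effect, like errstart() etc. */
	return 0;
}

static void
f(int a, int b, int c)
{
	(void) a;
	(void) b;
	(void) c;
}

int
main(void)
{
	/* gcc may well print C, B, A here; another compiler A, B, C */
	f(logged("A"), logged("B"), logged("C"));
	return 0;
}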
Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
On Fri, Jan 9, 2015 at 10:51 AM, Kouhei Kaigai wrote: > When a custom-scan node replaces a join-plan, it shall have at least two > child plan-nodes. The callback handler of PlanCustomPath needs to be > able to call create_plan_recurse() to transform the underlying path-nodes > to plan-nodes, because this custom-scan node may take other built-in > scan or sub-join nodes as its inner/outer input. > In case of FDW, it shall kick any underlying scan relations to the remote > side, thus we may not expect ForeignScan to have underlying plans... Do you have an example of this? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Merging postgresql.conf and postgresql.auto.conf
- Original Message -
> From: Sawada Masahiko
> To: Tom Lane
> Cc: PostgreSQL-development
> Sent: Wednesday, 14 January 2015, 16:09
> Subject: Re: [HACKERS] Merging postgresql.conf and postgresql.auto.conf
>
> On Thu, Jan 15, 2015 at 12:37 AM, Tom Lane wrote: >> Sawada Masahiko writes: >>> postgresql.auto.conf is loaded after postgresql.conf >>> whenever the configuration files are loaded or reloaded. >>> This means that a parameter in postgresql.auto.conf has quite high >>> priority, so the same parameter in postgresql.conf does not take effect >>> at all even if the user sets it manually. >> >>> If the user wants to change a stopped postgresql server's configuration, >>> then the user needs to merge the two configuration files (postgresql.conf >>> and postgresql.auto.conf) while maintaining their consistency manually. >>> From an operational perspective, having a written config with duplicate >>> entries is not a good thing. >>> I think we need to merge the two configuration files into one (or more >>> than one, if something like 'include' is used). >> >>> One solution is to add a merging tool/command which merges the two >>> configuration files while maintaining consistency. >>> Has this topic already been discussed? >> >> Yes. The entire reason that postgresql.auto.conf is separate is that >> we despaired of reading and rewriting postgresql.conf automatically >> without making a hash of material in the comments. Calling the logic >> a "merge tool" does not make that problem go away. >> > > The merge tool would not only merge all the parameters of the two > configuration files into one file but also remove duplicate parameters. > That is, the configuration files would logically become one file. I think the point is that you can't really interpret the context of comments with an automated tool, so you'd probably construct a misleading conf file. I created something to mash conf files together in perl, but mashing them together is all it does: https://github.com/glynastill/pg_upgrade_conf -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Compression of full-page-writes
On Fri, Jan 2, 2015 at 11:52 AM, Bruce Momjian wrote: > OK, so given your stats, the feature gives a 12.5% reduction in I/O. If > that is significant, shouldn't we see a performance improvement? If we > don't see a performance improvement, is I/O reduction worthwhile? Is it > valuable in that it gives non-database applications more I/O to use? Is > that all? > > I suggest we at least document that this feature is mostly useful for > I/O reduction, and maybe say CPU usage and performance might be > negatively impacted. > > OK, here is the email I remember from Fujii Masao in this same thread that > showed a performance improvement for WAL compression: > > http://www.postgresql.org/message-id/CAHGQGwGqG8e9YN0fNCUZqTTT=hnr7ly516kft5ffqf4pp1q...@mail.gmail.com > > Why are we not seeing the 33% compression and 15% performance > improvement he saw? What am I missing here? Bruce, some database workloads are I/O bound and others are CPU bound. Any patch that reduces I/O by using CPU is going to be a win when the system is I/O bound and a loss when it is CPU bound. I'm not really sure what else to say about that; it seems pretty obvious. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs
On Mon, Jan 5, 2015 at 12:54 PM, Tom Lane wrote: > TBH, my first reaction to this entire patch is unfavorable: it's a > solution in search of a problem. It adds substantial complication not > only for users but for PG developers in order to solve a rather narrow > performance issue. > > What would make sense to me is to teach the planner about inlining > SQL functions that include ORDER BY clauses, so that the performance > issue of a double sort could be avoided entirely transparently to > the user. That's not a bad idea, but it only helps for SQL functions. Actually, the problem I have run into in the past was not that the planner didn't know the ordering of the SRF's return value, but that it had no statistics for it, and therefore made bad optimization decisions. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Merging postgresql.conf and postgresql.auto.conf
On Thu, Jan 15, 2015 at 1:15 AM, Tom Lane wrote: > Sawada Masahiko writes: >> On Thu, Jan 15, 2015 at 12:37 AM, Tom Lane wrote: >>> Yes. The entire reason that postgresql.auto.conf is separate is that >>> we despaired of reading and rewriting postgresql.conf automatically >>> without making a hash of material in the comments. Calling the logic >>> a "merge tool" does not make that problem go away. > >> The merge tool would not only merge all the parameters of the two >> configuration files into one file but also remove duplicate parameters. >> That is, the configuration files would logically become one file. > > I'll just say one more time that if we thought this could work, we'd not > have arrived at the separate-files design to begin with. You can work > on it if you like, but I will bet a good deal that you will not end up > with something that gets accepted. > Yep, I don't intend to propose that again. Because I thought that maintaining the configuration files would be complicated, I just thought we could add a supporting tool. Regards, --- Sawada Masahiko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC] Incremental backup v3: incremental PoC
Hi Marco,

thank you for sending an updated patch. I am writing down a report of this initial (and partial) review.

IMPORTANT: This patch is not complete, as stated by Marco. See the "Conclusions" section for my proposed TODO list.

== Patch application

I have been able to successfully apply your patch and compile it. Regression tests passed.

== Initial run

I have created a fresh new instance of PostgreSQL and activated streaming replication to be used by pg_basebackup. I have done a pgbench run with scale 100. I have taken a full consistent backup with pg_basebackup (in plain format):

pg_basebackup -v -F p -D $BACKUPDIR/backup-$(date '+%s') -x

I have been able to verify that the backup_profile is correctly placed in the destination PGDATA directory. Here is an excerpt:

POSTGRESQL BACKUP PROFILE 1
START WAL LOCATION: 0/358 (file 00010003)
CHECKPOINT LOCATION: 0/38C
BACKUP METHOD: streamed
BACKUP FROM: master
START TIME: 2015-01-14 10:07:07 CET
LABEL: pg_basebackup base backup
FILE LIST
\N \N t 1421226427 206 backup_label
\N \N t 1421225508 88 postgresql.auto.conf
...

As suggested by Marco, I have manually taken the LSN from this file (the next version must do this automatically). I have then executed pg_basebackup and activated the incremental feature by using the LSN from the previous backup, as follows:

LSN=$(awk '/^START WAL/{print $4}' backup_profile)
pg_basebackup -v -F p -D $BACKUPDIR/backup-$(date '+%s') -I $LSN -x

The time taken by this operation has been much lower than the previous one and the size is much lower (I have not done any operation in the meantime):

du -hs backup-1421226*
1,5G backup-1421226427
17M backup-1421226427

I have done some checks on the file system and then used the prototype of the recovery script in Python written by Marco.

./recover.py backup-1421226427 backup-1421226427 new-data

The cluster started successfully. I have then run a pg_dump of the pgbench database and was able to reload it on the initial cluster.

== Conclusions

The first run of this patch seems promising. While the discussion on the LSN map continues (which is mainly an optimisation of this patch), I would really like to see this patch progress, as it would be a killer feature in several contexts (not in every context). Just in this period we are releasing file-based incremental backup for Barman, and customers using the alpha version are experiencing on average a deduplication ratio between 50% and 70%. This is for example an excerpt of "barman show-backup" from one of our customers (a daily saving of 550GB is not bad):

Base backup information:
  Disk usage       : 1.1 TiB (1.1 TiB with WALs)
  Incremental size : 564.6 GiB (-50.60%)
  ...

My opinion, Marco, is that for version 5 of this patch, you:

1) update the information on the wiki (it is outdated - I know you have been busy with the LSN map optimisation)
2) modify pg_basebackup in order to accept a directory (or tar file) and automatically detect the LSN from the backup profile
3) add the documentation regarding the backup profile and pg_basebackup

Once we have all of this, we can continue trying the patch. Some unexplored paths are:

* tablespace usage
* tar format
* performance impact (in both "read-only" and heavily updated contexts)
* consistency checks

I would then leave for version 6 the pg_restorebackup utility (unless you want to do everything at once). One limitation of the current recovery script is that it cannot accept multiple incremental backups (it just accepts three parameters: base backup, incremental backup and merge destination).
Maybe you can change the syntax as follows:

./recover.py DESTINATION BACKUP_1 BACKUP_2 [BACKUP_3, ...]

Thanks a lot for working on this. I am looking forward to continuing the review. Ciao, Gabriele -- Gabriele Bartolini - 2ndQuadrant Italia - Managing Director PostgreSQL Training, Services and Support gabriele.bartol...@2ndquadrant.it | www.2ndQuadrant.it 2015-01-13 17:21 GMT+01:00 Marco Nenciarini : > On 13/01/15 12:53, Gabriele Bartolini wrote: > > Hi Marco, > > could you please send an updated version of the patch against the current > > HEAD in order to facilitate reviewers? > > Here is the updated patch for incremental file-based backup. > It is based on the current HEAD. > I'm now working on the client tool to rebuild a full backup starting > from a file-based incremental backup. > Regards, > Marco > -- > Marco Nenciarini - 2ndQuadrant Italy > PostgreSQL Training, Services and Support > marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it >
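As an aside, the LSN extraction that the review above does with awk is trivial for a client tool to fold in; a hedged C sketch of the same parse, assuming the backup_profile format shown earlier stays stable:

#include <stdio.h>
#include <string.h>

int
main(void)
{
	/* equivalent of: awk '/^START WAL/{print $4}' backup_profile */
	FILE	   *fp = fopen("backup_profile", "r");
	char		line[1024];

	if (fp == NULL)
	{
		perror("backup_profile");
		return 1;
	}
	while (fgets(line, sizeof(line), fp) != NULL)
	{
		char		lsn[64];

		if (sscanf(line, "START WAL LOCATION: %63s", lsn) == 1)
		{
			printf("%s\n", lsn);	/* e.g. "0/358" from the excerpt above */
			break;
		}
	}
	fclose(fp);
	return 0;
}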
Re: [HACKERS] Merging postgresql.conf and postgresql.auto.conf
Sawada Masahiko writes: > On Thu, Jan 15, 2015 at 12:37 AM, Tom Lane wrote: >> Yes. The entire reason that postgresql.auto.conf is separate is that >> we despaired of reading and rewriting postgresql.conf automatically >> without making a hash of material in the comments. Calling the logic >> a "merge tool" does not make that problem go away. > The merge tool would not only merge all the parameters of the two > configuration files into one file but also remove duplicate parameters. > That is, the configuration files would logically become one file. I'll just say one more time that if we thought this could work, we'd not have arrived at the separate-files design to begin with. You can work on it if you like, but I will bet a good deal that you will not end up with something that gets accepted. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Minor configure tweak to simplify adjusting gcc warnings
On 2015-01-14 11:09:10 -0500, Tom Lane wrote: > Andres Freund writes: > > I've already given up... Given how infrequent it is, suppressing it for > > gull seems sufficient. > > I'm confused --- I see no format warnings in gull's current reports. Sorry, it was me being confused. I somehow switched gaur and gull because I had tabs open for both. Turns out that gaur doesn't do %z at all... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
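For anyone not following the buildfarm details: %z is the C99 length modifier for printing size_t values, and it is exactly the kind of thing a vintage toolchain like gaur's lacks. A two-line illustration:

#include <stdio.h>

int
main(void)
{
	size_t		n = sizeof(long);

	printf("sizeof(long) = %zu\n", n);	/* C99 %z; pre-C99 compilers warn or misprint */
	return 0;
}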
Re: [HACKERS] Minor configure tweak to simplify adjusting gcc warnings
Andres Freund writes: > I've already given up... Given how infrequent it is, suppressing it for > gull seems sufficient. I'm confused --- I see no format warnings in gull's current reports. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Merging postgresql.conf and postgresql.auto.conf
On Thu, Jan 15, 2015 at 12:37 AM, Tom Lane wrote: > Sawada Masahiko writes: >> postgresql.auto.conf is loaded after postgresql.conf >> whenever the configuration files are loaded or reloaded. >> This means that a parameter in postgresql.auto.conf has quite high >> priority, so the same parameter in postgresql.conf does not take effect >> at all even if the user sets it manually. > >> If the user wants to change a stopped postgresql server's configuration, >> then the user needs to merge the two configuration files (postgresql.conf >> and postgresql.auto.conf) while maintaining their consistency manually. >> From an operational perspective, having a written config with duplicate >> entries is not a good thing. >> I think we need to merge the two configuration files into one (or more >> than one, if something like 'include' is used). > >> One solution is to add a merging tool/command which merges the two >> configuration files while maintaining consistency. >> Has this topic already been discussed? > > Yes. The entire reason that postgresql.auto.conf is separate is that > we despaired of reading and rewriting postgresql.conf automatically > without making a hash of material in the comments. Calling the logic > a "merge tool" does not make that problem go away. > The merge tool would not only merge all the parameters of the two configuration files into one file but also remove duplicate parameters. That is, the configuration files would logically become one file. It will clearly be complicated for the user to rewrite postgresql.conf manually while maintaining its consistency. (On top of that, executing the ALTER SYSTEM command is not allowed for any user except a superuser.) Would it be bad for ALTER SYSTEM to parse postgresql.conf again in AlterSystemSetConfigFile() to get the line number and file name of that parameter? (Or postgresql could continue to keep the line number and file name of each parameter.) Regards, --- Sawada Masahiko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On 2015-01-14 09:47:19 -0600, Merlin Moncure wrote: > On Wed, Jan 14, 2015 at 9:30 AM, Andres Freund wrote: > > If you gdb in, and type 'fin' a couple times, to wait till the function > > finishes, is there actually any progress? I'm wondering whether it's > > just many catalog accesses + contention, or some other > > problem. Alternatively set a breakpoint on ScanPgRelation() or so and > > see how often it's hit.
>
> well, i restarted the database, forgetting my looper was running which > immediately spun up and it got stuck again with a similar profile > (lots of cpu in spinlock):
>
> Samples: 3K of event 'cycles', Event count (approx.): 2695723228
>  31.16%  postgres      [.] s_lock
>  22.32%  postgres      [.] tas
>  12.13%  postgres      [.] tas
>   5.93%  postgres      [.] spin_delay
>   5.69%  postgres      [.] LWLockRelease
>   3.75%  postgres      [.] LWLockAcquireCommon
>   3.61%  perf          [.] 0x000526c4
>   2.51%  postgres      [.] FunctionCall2Coll
>   1.48%  libc-2.17.so  [.] 0x0016a132
>
> > If you gdb in, and type 'fin' a couple times,
> (gdb) fin
> Run till exit from #0 0x7ff4c63f7a97 in semop () from /lib/x86_64-linux-gnu/libc.so.6
> 0x006de073 in PGSemaphoreLock ()
> (gdb) fin
> Run till exit from #0 0x006de073 in PGSemaphoreLock ()
>
> It returned once. Second time, it didn't at least so far (minute or so).

Hm, that's autovac though, not the normal user backends that actually do stuff, right? If you could additionally check those, it'd be great. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On Wed, Jan 14, 2015 at 9:30 AM, Andres Freund wrote: > If you gdb in, and type 'fin' a couple times, to wait till the function > finishes, is there actually any progress? I'm wondering whether it's > just many catalog accesses + contention, or some other > problem. Alternatively set a breakpoint on ScanPgRelation() or so and > see how often it's hit. well, i restarted the database, forgetting my looper was running which immediately spun up and it got stuck again with a similar profile (lots of cpu in spinlock):

Samples: 3K of event 'cycles', Event count (approx.): 2695723228
 31.16%  postgres      [.] s_lock
 22.32%  postgres      [.] tas
 12.13%  postgres      [.] tas
  5.93%  postgres      [.] spin_delay
  5.69%  postgres      [.] LWLockRelease
  3.75%  postgres      [.] LWLockAcquireCommon
  3.61%  perf          [.] 0x000526c4
  2.51%  postgres      [.] FunctionCall2Coll
  1.48%  libc-2.17.so  [.] 0x0016a132

> If you gdb in, and type 'fin' a couple times,

(gdb) fin
Run till exit from #0 0x7ff4c63f7a97 in semop () from /lib/x86_64-linux-gnu/libc.so.6
0x006de073 in PGSemaphoreLock ()
(gdb) fin
Run till exit from #0 0x006de073 in PGSemaphoreLock ()

It returned once. Second time, it didn't at least so far (minute or so). merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Merging postgresql.conf and postgresql.auto.conf
Sawada Masahiko writes: > postgresql.auto.conf is loaded after postgresql.conf > whenever the configuration files are loaded or reloaded. > This means that a parameter in postgresql.auto.conf has quite high > priority, so the same parameter in postgresql.conf does not take effect > at all even if the user sets it manually. > If the user wants to change a stopped postgresql server's configuration, > then the user needs to merge the two configuration files (postgresql.conf > and postgresql.auto.conf) while maintaining their consistency manually. > From an operational perspective, having a written config with duplicate > entries is not a good thing. > I think we need to merge the two configuration files into one (or more > than one, if something like 'include' is used). > One solution is to add a merging tool/command which merges the two > configuration files while maintaining consistency. > Has this topic already been discussed? Yes. The entire reason that postgresql.auto.conf is separate is that we despaired of reading and rewriting postgresql.conf automatically without making a hash of material in the comments. Calling the logic a "merge tool" does not make that problem go away. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Merging postgresql.conf and postgresql.auto.conf
Hi all, postgresql.auto.conf is loaded after postgresql.conf whenever the configuration files are loaded or reloaded. This means that a parameter in postgresql.auto.conf has quite high priority, so the same parameter in postgresql.conf does not take effect at all even if the user sets it manually. If the user wants to change a stopped postgresql server's configuration, then the user needs to merge the two configuration files (postgresql.conf and postgresql.auto.conf) while maintaining their consistency manually. From an operational perspective, having a written config with duplicate entries is not a good thing. I think we need to merge the two configuration files into one (or more than one, if something like 'include' is used). One solution is to add a merging tool/command which merges the two configuration files while maintaining consistency. Has this topic already been discussed? Regards, --- Sawada Masahiko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] explain sortorder
On 01/14/2015 05:26 PM, Timmer, Marius wrote:
> Hello Heikki,
>
> abbreviated version: Sorry, the problem is only the unhandy patch text format, not different opinions on how to proceed.
>
> Long version: The v7 patch file already addressed your suggestions, but the file contained several (old) local commits, the new ones at the end of the patch text/file.

Ah, missed that. I stopped reading when I saw the old stuff there :-).

> v7.1 is attached and addresses this issue, providing a clean patch file.

Ok, thanks, will take a look.

> V8 will - as mentioned - add missing docs and regression tests,

Great! - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On 2015-01-14 09:22:45 -0600, Merlin Moncure wrote: > On Wed, Jan 14, 2015 at 9:11 AM, Andres Freund wrote: > > On 2015-01-14 10:05:01 -0500, Tom Lane wrote: > >> Merlin Moncure writes: > >> > On Wed, Jan 14, 2015 at 8:41 AM, Tom Lane wrote: > >> >> What are the autovac processes doing (according to pg_stat_activity)? > >> > >> > pid,running,waiting,query > >> > 7105,00:28:40.789221,f,autovacuum: VACUUM ANALYZE pg_catalog.pg_class > > > > It'd be interesting to know whether that vacuum gets very frequent > > semaphore wakeups. Could you strace it for a second or three?
>
> for 30 seconds+ it just looks like this:
> mmoncure@mernix2 ~ $ sudo strace -p 7105
> Process 7105 attached
> semop(5701638, {{4, -1, 0}}, 1

Ok. So that explains why it's not interruptible.

> all of the other processes are yielding out of the spinlock, for example:
> select(0, NULL, NULL, NULL, {0, 1408}) = 0 (Timeout)

Note the above isn't the spinlock, it's the process's semaphore. It'll only get set if the refcount ever indicates that nobody but autovac is holding the lock.

> > How did this perform < 9.4? > this is a new project. However, I can run it vs an earlier version. Hm. Odd. The first -g profile seemed to indicate a hell of a lot of time was spent in LWLockRelease() - indicating that there's actually progress. Later profiles/backtraces were less clear. If you gdb in, and type 'fin' a couple times, to wait till the function finishes, is there actually any progress? I'm wondering whether it's just many catalog accesses + contention, or some other problem. Alternatively set a breakpoint on ScanPgRelation() or so and see how often it's hit. > > Can you guess how many times these dynamic > > statements are planned? How many different relations are accessed in the > > dynamically planned queries? > > only once or twice, and only a couple of tables. > I can send the code off-list if you guys think it'd help. Might be interesting. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On Wed, Jan 14, 2015 at 9:11 AM, Andres Freund wrote: > On 2015-01-14 10:05:01 -0500, Tom Lane wrote: >> Merlin Moncure writes: >> > On Wed, Jan 14, 2015 at 8:41 AM, Tom Lane wrote: >> >> What are the autovac processes doing (according to pg_stat_activity)?
>>
>> > pid,running,waiting,query
>> > 7105,00:28:40.789221,f,autovacuum: VACUUM ANALYZE pg_catalog.pg_class
>
> It'd be interesting to know whether that vacuum gets very frequent > semaphore wakeups. Could you strace it for a second or three?

for 30 seconds+ it just looks like this:

mmoncure@mernix2 ~ $ sudo strace -p 7105
Process 7105 attached
semop(5701638, {{4, -1, 0}}, 1

all of the other processes are yielding out of the spinlock, for example:

select(0, NULL, NULL, NULL, {0, 1408}) = 0 (Timeout)

> How did this perform < 9.4?

this is a new project. However, I can run it vs an earlier version.

> Can you guess how many times these dynamic > statements are planned? How many different relations are accessed in the > dynamically planned queries?

only once or twice, and only a couple of tables. This is an operation that should only take a few seconds (inserting a few 10s of thousands of rows), but it has been blocked for many hours now. Usually it runs through in a few seconds. This is either a deadlock or a sequence of operations that emulates one. I'll try to pull the commits that Peter suggested and see if that helps (I'm getting ready to bring the database down). I can send the code off-list if you guys think it'd help. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On 2015-01-14 10:13:32 -0500, Tom Lane wrote: > Merlin Moncure writes: > > Yes, it is pg_class is coming from LockBufferForCleanup (). As you > > can see above, it has a shorter runtime. So it was killed off once > > about a half hour ago which did not free up the logjam. However, AV > > spawned it again and now it does not respond to cancel. > > Interesting. That seems like there might be two separate issues at > play. It's plausible that LockBufferForCleanup might be interfering > with other attempts to scan the index, but then why wouldn't killing > the AV have unstuck things? LockBufferForCleanup() unfortunately isn't interruptible. I've every now and then seen vacuums being stuck for a long while, trying to acquire cleanup locks - IIRC I complained about that on list even. So autovac will only be cancelled when going to the next page. And even if the page ever gets a zero (well, one) refcount, by the time autovac is woken up via the semaphore, it'll often end up being used again. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
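The semop() wait seen in the strace upthread is worth spelling out, because it explains the "stuck" appearance: a robust semaphore wait retries on EINTR, so a signal alone never gets the waiter out; only a matching increment does. A toy, self-contained rendition of that wait shape (not PostgreSQL's PGSemaphoreLock(), and capped at three retries so it terminates):

#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <unistd.h>

static void
on_alarm(int sig)
{
	(void) sig;					/* exists only to interrupt semop() */
}

int
main(void)
{
	/* on Linux a fresh SysV semaphore starts at 0, so this wait blocks */
	int			semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
	struct sembuf wait_op = {0, -1, 0};	/* same shape as the semop in the strace */
	int			interrupts = 0;

	signal(SIGALRM, on_alarm);
	alarm(1);

	while (semop(semid, &wait_op, 1) < 0 && errno == EINTR)
	{
		printf("signal #%d interrupted semop(); retrying the wait\n",
			   ++interrupts);
		if (interrupts >= 3)
			break;				/* a real waiter would keep looping here */
		alarm(1);
	}

	semctl(semid, 0, IPC_RMID);
	return 0;
}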