Re: [HACKERS] pg_stat_replication log positions vs base backups

2015-12-15 Thread Michael Paquier
On Mon, Dec 14, 2015 at 8:59 AM, Michael Paquier
 wrote:
> On Mon, Dec 14, 2015 at 1:01 AM, Magnus Hagander  wrote:
>> I've applied these two patches now.
>>
>> The one that fixes the initialization backpatched to 9.3 which is the oldest
>> one that has it, and the one that changes the actual 0-vs-NULL output to 9.5
>> only as it's a behaviour change.
>
> Thanks!

Interesting. Just today I got a bug report that is actually a symptom
people should be careful about: it is possible for
pg_stat_replication to report 1/potential for sync_priority/sync_state
for a WAL sender in "backup" state: a base backup just needs to reuse
the shared memory slot of a standby that was previously connected.
Magnus' commit 61c7bee fixes the issue; let's just be careful if
similar reports come in from servers that do not include this fix.
-- 
Michael




Re: [HACKERS] pgbench stats per script & other stuff

2015-12-15 Thread Michael Paquier
On Tue, Dec 15, 2015 at 8:41 PM, Fabien COELHO  wrote:
>> PG code usually avoids that, and I recall static analysis tools like
>> Coverity complaining that this may lead to undefined behavior. While I
>> agree that this would lead to NaN...
>
>
> Hmmm. In this case that is what is actually wanted. If there are no
> transactions, the tps or average latency or whatever is "NaN"; I cannot help
> it, and IEEE 754 allows that. So in this case the tool is wrong if it
> complains, or at least we are right to ignore the warning. Maybe there is
> some special comment to say "ignore the warning on the next line", if this
> is an issue.

Yeah, that's actually fine. I just had a look at the Windows stuff, and
things seem to be correct on this side for double:
https://msdn.microsoft.com/en-us/library/aa691373%28v=vs.71%29.aspx
Then I also had a look at src/port/snprintf.c, where things actually
get weird when no transactions are run for a script (emulated
with 2 scripts, one with @1 and the second with @1):
 - 0 transactions (0.0% of total, tps = 0.00)
 - latency average = -1.#IO ms
 - latency stddev = -1.#IO ms
It seems that this is a bug in fmtfloat(): it does not handle NaN
values correctly. Others, any thoughts about that?
It would be possible to address this within your patch by using isnan()
and printing another message, but I think we had better patch
snprintf.c if my analysis is right.
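For illustration, here is a minimal standalone sketch (not the actual
fmtfloat() code) of the kind of special-casing needed, using isnan()
and isinf() from <math.h> so that NaN prints portably instead of as
Windows' "-1.#IO":

#include <math.h>
#include <stdio.h>

/* Hypothetical helper, not the real fmtfloat(): print IEEE 754
 * special values by hand before handing the rest to %f. */
static void
print_double(double value)
{
	if (isnan(value))
		fputs("nan\n", stdout);
	else if (isinf(value))
		fputs(value > 0 ? "inf\n" : "-inf\n", stdout);
	else
		printf("%.2f\n", value);
}

int
main(void)
{
	print_double(nan(""));		/* the 0-transaction latency case */
	print_double(INFINITY);
	print_double(1234.5);
	return 0;
}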

Oh, and actually when trying to compile the patch on Windows things
are failing because int64_t is undefined :) After switching to int64
things worked better.
-- 
Michael




Re: [HACKERS] [PoC] Asynchronous execution again (which is not parallel)

2015-12-15 Thread Amit Kapila
On Wed, Dec 16, 2015 at 4:54 AM, Robert Haas  wrote:
>
> On Fri, Dec 11, 2015 at 11:49 PM, Amit Kapila 
wrote:
> >> But is it important enough to be worthwhile?  Maybe, maybe not.  I
> >> think we should be working toward a world where the Gather is at the
> >> top of the plan tree as often as possible, in which case
> >> asynchronously kicking off a Gather node won't be that exciting any
> >> more - see notes on the "parallelism + sorting" thread where I talk
> >> about primitives that would allow massively parallel merge joins,
> >> rather than 2 or 3 way parallel.  From my point of view, the case
> >> where we really need some kind of asynchronous execution solution is a
> >> ForeignScan, and in particular a ForeignScan which is the child of an
> >> Append.  In that case it's obviously really useful to be able to kick
> >> off all the foreign scans and then return a tuple from whichever one
> >> coughs it up first.
> >
> > How will this be better than doing the same thing in a way we have done
> > Parallel Sequential Scan at ExecutorRun() time?
>
> I'm not sure if this is what you are asking, but I think it probably
> should be done at ExecutorRun() time, rather than a separate phase.
>

Yes, that's one thing I wanted to know. Yet another point that is not
clear to me about this async infrastructure is why the current
parallelism infrastructure can't be used to achieve the async benefits
of ForeignScan?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] "pg_upgrade" cannot write to log file pg_upgrade_internal.log

2015-12-15 Thread Michael Paquier
On Wed, Dec 16, 2015 at 3:28 PM, Michael Paquier
 wrote:
> On Tue, Dec 15, 2015 at 6:54 PM, Yang, Leo  wrote:
>> I ran “pg_upgrade -xx” from PostgreSQL 8.3 to PostgreSQL 9.4.5 on Windows
>> Server 2008.  It shows this error:
>>
>> cannot write to log file pg_upgrade_internal.log
>>
>> Failure, exiting.

By the way, pgsql-hackers is the mailing list where the development of
new features is discussed. If you think you have found a bug, you had
better send a report to pgsql-bugs.
Regards,
-- 
Michael




Re: [HACKERS] "pg_upgrade" cannot write to log file pg_upgrade_internal.log

2015-12-15 Thread Michael Paquier
On Tue, Dec 15, 2015 at 6:54 PM, Yang, Leo  wrote:
> I ran “pg_upgrade -xx” from PostgreSQL 8.3 to PostgreSQL 9.4.5 on Windows
> Server 2008.  It shows this error:
>
> cannot write to log file pg_upgrade_internal.log
>
> Failure, exiting.
> It seems that the upgrade can be successful on Windows 7, so it seems to be
> a permissions problem. I found some information about restricted privileges
> in the source code: the release 9.4.2 change “Run pg_upgrade and
> pg_resetxlog with restricted privileges on Windows, so that they don't fail
> when run by an administrator (refer to
> http://www.postgresql.org/docs/9.4/static/release-9-4-2.html )”. Why did
> you do this?

The original commit introducing the use of restricted permissions is
this one:
commit: 2366761bf9eb8e9aa3820b9f3491bcfc48b8188f
author: Andrew Dunstan 
date: Mon, 30 Mar 2015 17:16:57 -0400
Run pg_upgrade and pg_resetxlog with restricted token on Windows

It addressed a problem discussed here: pg_upgrade could not be run by
a user with administrator rights on Windows:
http://www.postgresql.org/message-id/CAEB4t-NpXGiP5Bqvv3P+d+x=v4bqe+awg+g7ennbn8icpxe...@mail.gmail.com

Note that pg_upgrade_internal.log gets opened *before* the restricted
token is taken for the process running pg_upgrade when the options are
parsed, so it seems hard to believe that your problem and this fix are
related.

> How can I solve this problem? I hope you can help me.

I think that you should first check whether the user running the
upgrade has sufficient permissions to write those log files. As far
as I can see, there is nothing to blame in the code.
-- 
Michael




Re: [HACKERS] Disabling an index temporarily

2015-12-15 Thread Jeff Janes
On Tue, Dec 15, 2015 at 7:56 PM, Jim Nasby  wrote:
> On 12/13/15 9:27 PM, Tom Lane wrote:
>>
>> Corey Huinker  writes:
>>> So, I'd propose the following syntax:
>>> ALTER INDEX foo SET DISABLED
>>> -- does the SET indisvalid = false shown earlier.
>>
>> This is exactly *not* what Tatsuo-san was after, though; he was asking
>> for a session-local disable, which I would think would be by far the more
>> common use-case.  It's hard for me to see much of a reason to disable an
>> index globally while still paying all the cost to maintain it.  Seems to
>> me the typical work flow would be more like "disable index in a test
>> session, try all your queries and see how well they work, and if you
>> conclude you don't need the index then drop it".
>
>
> Both have value.
>
> Sometimes the only realistic way to test this is to disable the index
> server-wide and see if anything blows up. Actually, in my experience, that's
> far more common than having some set of queries you can test against and
> call it good.
>
> FWIW, I also don't see the use case for disabling maintenance on an index.
> Just drop it and if you know you'll want to recreate it squirrel away
> pg_get_indexdef() before you do.

If someone wants to make "squirreling away the pg_get_indexdef"
easier, particularly for an entire table, schema, or database, I
certainly wouldn't object.  I am not a masochist.

But also, while loading 1.5 million records into a table with 250
million records is horrible, rebuilding all the indexes on a
251.5-million-record table from scratch is even more horrible.  I
don't know whether suspending maintenance (either globally or just
for one session) and then doing a bulk fix-up would be less horrible,
but I would be willing to give it a test run.

Cheers,

Jeff




[HACKERS] Bug in TupleQueueReaderNext() ?

2015-12-15 Thread Rushabh Lathia
Hi All,

TupleQueueReaderNext() always passes true for nowait in its
shm_mq_receive() call. I think it needs to pass the nowait flag
that was passed by the caller of TupleQueueReaderNext().

This is useful if the caller wants TupleQueueReaderNext() to wait
until it gets a tuple from the particular queue.

PFA a patch to fix this.

Regards,
Rushabh Lathia
www.EnterpriseDB.com
diff --git a/src/backend/executor/tqueue.c b/src/backend/executor/tqueue.c
index d625b0d..276956e 100644
--- a/src/backend/executor/tqueue.c
+++ b/src/backend/executor/tqueue.c
@@ -535,7 +535,7 @@ TupleQueueReaderNext(TupleQueueReader *reader, bool nowait, bool *done)
 		void	   *data;
 
 		/* Attempt to read a message. */
-		result = shm_mq_receive(reader->queue, &nbytes, &data, true);
+		result = shm_mq_receive(reader->queue, &nbytes, &data, nowait);
 
 		/* If queue is detached, set *done and return NULL. */
 		if (result == SHM_MQ_DETACHED)



Re: [HACKERS] extend pgbench expressions with functions

2015-12-15 Thread Fabien COELHO



On Wed, Dec 16, 2015 at 6:10 AM, Robert Haas  wrote:

It looks fine to me except that I think we should spell out "param" as
"parameter" throughout, instead of abbreviating.


Fine for me. I have updated the first patch as attached (still looking
at the second).


This doc update (threshold -> parameter, in both the SGML docs and the C
code) looks OK to me.

--
Fabien.




Re: [HACKERS] Logical replication and multimaster

2015-12-15 Thread Jon Erdman
On Tue, 15 Dec 2015 21:48:52 -0600
Jim Nasby  wrote:

> On 12/13/15 7:37 AM, David Fetter wrote:
> > As I understand it, pushing these into a library has been proposed but
> > not rejected.  That it hasn't happened yet is mostly about the lack of
> > tuits (the round ones) to rewrite the functionality as libraries and
> > refactor pg_dump/pg_restore to use only calls to same.  As usual, it's
> > less about writing the code and more about the enormous amount of
> > testing any such a refactor would entail.
> 
> My understanding as well. IIRC Jon Erdman brought this question up a 
> couple years ago and the response was "It'd probably be accepted, it's 
> just that no one has done the work."

To be clear, if I understand what you are referring to, my name pops up in
this discussion because in Amsterdam, around the time that FDWs were
originally launched, I had a "brilliant" idea: a great candidate for an FDW
would be one that could read from a custom dump (-Fc) file and expose it as
though it contained regular tables, so you could restore one erroneously
deleted row (or just the table in question) from a 2TB dump without loading
the whole thing.

On the face of it this seemed relatively simple, since a custom dump has a
TOC and all the requisite goodies to make it doable. The code to interpret
that already exists (for restoring just one table out of a dump file); all
that was needed was the "glue" to hook it into the FDW API.

Initially the reaction (from Magnus, if I'm not mistaken) was "that's
stupid, who would want that?", but later Dave Page was wholly on board with
it.

At the next PGCon I spoke up on the same subject at the end of a talk about
FDWs where Tom was in attendance, and all agreed my idea had merit.
However, unexpectedly, they (including Tom) agreed that trying to turn that
part of our command-line functionality into a library (the proper solution)
was more effort than it was worth, and that if I wanted to try it I should
just cut and paste the relevant code out of pg_dump into my FDW, rather
than trying to refactor and share said code in a .so. [I was *VERY*
surprised by this!]

No one said it couldn't be done, but even the "wise men on the mount"
conceded that it was such a huge undertaking that it was not worth the
effort, and that duplicating (and subsequently maintaining) said code was
the better part of valor.
 
> > I believe that refactoring much of pg_dump's functionality for the
> > current version of the server into SQL-accessible functions and making
> > pg_dump use only those functions is achievable with available
> > resources.
> >
> > Such a refactor need not be all-or-nothing.  For example, the
> > dependency resolution stuff is a first step that appears to be worth
> > doing by itself even if the effort then pauses, possibly for some
> > time.
> 
> If someone wanted to spend time on this, I suspect it'd be worth looking 
> at how bad some of the backward compatibility issues would be if done in 
> the server. Maybe they wouldn't be that bad. I suspect the audience for 
> this code would be much larger if it was in the server as opposed to a C 
> library.
> -- 
> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
> Experts in Analytics, Data Architecture and PostgreSQL
> Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Jon Erdman (aka StuckMojo)
PostgreSQL Zealot


pgpfBipAAobAu.pgp
Description: PGP signature


Re: [HACKERS] extend pgbench expressions with functions

2015-12-15 Thread Michael Paquier
On Wed, Dec 16, 2015 at 6:10 AM, Robert Haas  wrote:
> On Mon, Dec 14, 2015 at 7:25 AM, Michael Paquier  
> wrote:
>> I have looked for now at the first patch and finished with the
>> attached while looking at it. Perhaps a committer could look already
>> at that?
>
> It looks fine to me except that I think we should spell out "param" as
> "parameter" throughout, instead of abbreviating.

Fine for me. I have updated the first patch as attached (still looking
at the second).
-- 
Michael


0001-Make-pgbench-documentation-more-precise-for-function.patch
Description: binary/octet-stream



Re: [HACKERS] Parallel Aggregate

2015-12-15 Thread Haribabu Kommi
On Tue, Dec 15, 2015 at 8:04 AM, Paul Ramsey  wrote:
> But the run dies.
>
> NOTICE:  SRID value -32897 converted to the officially unknown SRID value 0
> ERROR:  Unknown geometry type: 2139062143 - Invalid type
>
> From the message it looks like geometry gets corrupted at some point,
> causing a read to fail on very screwed up metadata.

Thanks for the test. There was a problem in advance_combination_function
in handling pass-by-reference data. Attached is an updated patch with the
fix.


Regards,
Hari Babu
Fujitsu Australia


parallelagg_poc_v2.patch
Description: Binary data



Re: [HACKERS] Disabling an index temporarily

2015-12-15 Thread Jim Nasby

On 12/13/15 9:27 PM, Tom Lane wrote:

Corey Huinker  writes:

So, I'd propose the following syntax:
ALTER INDEX foo SET DISABLED
-- does the SET indisvalid = false shown earlier.

This is exactly *not* what Tatsuo-san was after, though; he was asking
for a session-local disable, which I would think would be by far the more
common use-case.  It's hard for me to see much of a reason to disable an
index globally while still paying all the cost to maintain it.  Seems to
me the typical work flow would be more like "disable index in a test
session, try all your queries and see how well they work, if you conclude
you don't need the index then drop it".


Both have value.

Sometimes the only realistic way to test this is to disable the index 
server-wide and see if anything blows up. Actually, in my experience, 
that's far more common than having some set of queries you can test 
against and call it good.


FWIW, I also don't see the use case for disabling maintenance on an 
index. Just drop it, and if you know you'll want to recreate it, 
squirrel away the pg_get_indexdef() output before you do.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Logical replication and multimaster

2015-12-15 Thread Jim Nasby

On 12/13/15 7:37 AM, David Fetter wrote:

As I understand it, pushing these into a library has been proposed but
not rejected.  That it hasn't happened yet is mostly about the lack of
tuits (the round ones) to rewrite the functionality as libraries and
refactor pg_dump/pg_restore to use only calls to same.  As usual, it's
less about writing the code and more about the enormous amount of
testing any such a refactor would entail.


My understanding as well. IIRC Jon Erdman brought this question up a 
couple years ago and the response was "It'd probably be accepted, it's 
just that no one has done the work."



I believe that refactoring much of pg_dump's functionality for the
current version of the server into SQL-accessible functions and making
pg_dump use only those functions is achievable with available
resources.

Such a refactor need not be all-or-nothing.  For example, the
dependency resolution stuff is a first step that appears to be worth
doing by itself even if the effort then pauses, possibly for some
time.


If someone wanted to spend time on this, I suspect it'd be worth looking 
at how bad some of the backward compatibility issues would be if done in 
the server. Maybe they wouldn't be that bad. I suspect the audience for 
this code would be much larger if it was in the server as opposed to a C 
library.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] 9.5RC1 wraps *today*

2015-12-15 Thread Jim Nasby

On 12/15/15 7:31 AM, Tom Lane wrote:

Accordingly, the release team has decided to wrap 9.5RC1 today, Tuesday,
for announcement Friday the 18th.  I'll start making the tarballs around
5PM (2200 UTC).  Sorry for the short notice, but there's no better way.


Since the 18th is my birthday, I think this is actually a good luck sign. ;P
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




[HACKERS] Clarify vacuum verbose message

2015-12-15 Thread Jim Nasby
VACUUM VERBOSE spits out two different messages for the heap, one of 
which is rather confusing:


INFO:  "trades": removed 625664 row versions in 20967 pages
...
INFO:  "trades": found 3282 removable, 56891627 nonremovable row 
versions in 1986034 out of 1986034 pages


After discussion with RhodiumToad I think I now understand how this can 
happen:


20:00 < RhodiumToad> the LP_DEAD slot is where the index entries for the
deleted row point to, so that has to stay
20:01 < RhodiumToad> so for example, if you delete a lot of rows, then
try and do a lot of updates (which will hint the pages as needing
pruning),
20:01 < RhodiumToad> then do more updates or a seqscan (to let prune
look at the pages),
20:02 < RhodiumToad> then do a vacuum, the vacuum will see a lot of
LP_DEAD slots to remove index entries for, but not actual tuples

This example is from a table that was VACUUM FULL'd this weekend and had 
a nightly batch process run last night. That process INSERTs a bunch of 
rows and then does a bunch of UPDATEs on different subsets of those 
rows. I don't believe there would have been a large number of deletes;
I'll check with them tomorrow.


IMHO we need to change the messages so they are explicit about line 
pointers vs. actual tuples. Trying to obfuscate that just leads to 
confusion. heap_page_prune() needs to report only the non-root-line-pointer
tuples that were pruned. (None of the other callers care about the
return value.)

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] [PoC] Asynchronous execution again (which is not parallel)

2015-12-15 Thread Robert Haas
On Mon, Dec 14, 2015 at 3:34 AM, Kyotaro HORIGUCHI
 wrote:
> Yes, the most significant and obvious (but hard to estimate the
> benefit) target of async execution is (Merge)Append-ForeignScan,
> which is narrow but freuquently used.  And this patch has started
> from it.
>
> It is because of the startup-heavy nature of FDW. So I involved
> sort as a target later then redesigned to give the ability on all
> nodes.  If it is obviously over-done for the (currently) expected
> benefit and if it is preferable to shrink this patch so as to
> touch only the portion where async-exec has a benefit, I'll do
> so.

Suppose we equip each EState with the ability to fire "callbacks".
Callbacks have the signature:

typedef bool (*ExecCallback)(PlanState *planstate, TupleTableSlot
*slot, void *context);

Executor nodes can register immediate callbacks to be run at the
earliest possible opportunity using a function like
ExecRegisterCallback(estate, callback, planstate, slot, context).
They can register deferred callbacks that will be called when a file
descriptor becomes ready for I/O, or when the process latch is set,
using a call like ExecRegisterFileCallback(estate, fd, event,
callback, planstate, slot, context) or
ExecRegisterLatchCallback(estate, callback, planstate, slot, context).

To execute callbacks, an executor node can call ExecFireCallbacks(),
which will fire immediate callbacks in order of registration, and wait
for the file descriptors for which callbacks have been registered and
for the process latch when no immediate callbacks remain but there are
still deferred callbacks.  It will return when (1) there are no
remaining immediate or deferred callbacks or (2) one of the callbacks
returns "true".

Then, suppose we add a function bool ExecStartAsync(PlanState *target,
ExecCallback callback, PlanState *cb_planstate, void *cb_context).
For non-async-aware plan nodes, this just returns false.  async-aware
plan nodes should initiate some work, register some callbacks, and
return.  The callback that gets registered should arrange in turn to
register the callback passed as an argument when a tuple becomes
available, passing the planstate and context provided by
ExecStartAsync's caller, plus the TupleTableSlot containing the tuple.

So, in response to ExecStartAsync, if there's no tuple currently
available, postgres_fdw can send a query to the remote server and
request a callback when the fd becomes read-ready.  It must save the
callback passed to ExecStartAsync inside the PlanState someplace so
that when a tuple becomes available it can register that callback.

ExecAppend can call ExecStartAsync on each of its subplans.  For any
subplan where ExecStartAsync returns false, ExecAppend will just
execute it normally, by calling ExecProcNode repeatedly until no more
tuples are returned.  But for async-capable subplans, it can call
ExecStartAsync on all of them, and then call ExecFireCallbacks.  The
tuple-ready callback it passes to its child plans will take the tuple
provided by the child plan and store it into the Append node's slot.
It will then return true if, and only if, ExecFireCallbacks is being
invoked from ExecAppend (which it can figure out via some kind of
signalling either through its own PlanState or centralized signalling
through the EState).  That way, if ExecAppend were itself invoked
asynchronously, its tuple-ready callback could simply populate a slot
appropriately and register its invoker's tuple-ready callback.  Whether
called synchronously or asynchronously, each invocation of an
asynchronous Append after the first would just need to again call
ExecStartAsync on the child that last returned a tuple.
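To make the wait-and-fire shape concrete, here is a toy, self-contained
C sketch of that pattern, with plain poll() on pipes standing in for
remote servers; none of the names below are real executor APIs:

#include <poll.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

typedef bool (*Callback) (int fd, void *context);

/* Toy stand-in for a deferred callback registration. */
struct deferred
{
	int			fd;
	Callback	cb;
	void	   *context;
};

/* Toy tuple-ready callback: returning true stops the fire loop,
 * like a callback invoked directly from the waiting Append. */
static bool
tuple_ready(int fd, void *context)
{
	char		buf[16];
	ssize_t		n = read(fd, buf, sizeof(buf) - 1);

	if (n > 0)
	{
		buf[n] = '\0';
		printf("got \"%s\" from %s\n", buf, (char *) context);
	}
	return true;
}

int
main(void)
{
	int			a[2], b[2];

	pipe(a);
	pipe(b);
	write(b[1], "t1", 2);		/* pretend the second "scan" answers first */

	struct deferred cbs[2] = {
		{a[0], tuple_ready, "scan-a"},
		{b[0], tuple_ready, "scan-b"}
	};
	struct pollfd pfds[2] = {
		{a[0], POLLIN, 0},
		{b[0], POLLIN, 0}
	};

	/* ExecFireCallbacks() analogue: wait for readiness, fire the
	 * ready callbacks, stop as soon as one returns true. */
	for (;;)
	{
		bool		done = false;

		poll(pfds, 2, -1);
		for (int i = 0; i < 2; i++)
		{
			if (pfds[i].revents & POLLIN)
				done |= cbs[i].cb(cbs[i].fd, cbs[i].context);
		}
		if (done)
			break;
	}
	return 0;
}

A real implementation would of course also wait on the process latch,
per the proposal above, but the control flow is the same.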

It seems pretty straightforward to fit Gather into this infrastructure.

It is unclear to me how useful this is beyond ForeignScan, Gather, and
Append.  MergeAppend's ordering constraint makes it less useful; we
can asynchronously kick off the request for the next tuple before
returning the previous one, but we're going to need to have that tuple
before we can return the next one.  But it could be done.  It could
potentially even be applied to seq scans or index scans using some set
of asynchronous I/O interfaces, but I don't see how it could be
applied to joins or aggregates, which typically can't really proceed
until they get the next tuple.  They could be plugged into this
interface easily enough but it would only help to the extent that it
enabled asynchrony elsewhere in the plan tree to be pulled up towards
the root.

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Fixing warnings in back branches?

2015-12-15 Thread Noah Misch
On Tue, Dec 15, 2015 at 11:04:07AM -0500, Robert Haas wrote:
> On Tue, Dec 15, 2015 at 9:17 AM, Andres Freund  wrote:
> > On 2015-12-15 09:09:39 -0500, Tom Lane wrote:
> >> In the end, if you're building an old branch, you should be doing it with
> >> old tools.

I grant that's the most risk-averse choice, because each release probably does
get the most testing with compilers contemporary to its dot-zero release date.
Your statement takes a step beyond neglecting warnings.  It is sound advice
for packagers, but it is overkill for the average end user.  We don't
emphasize the similar risks that apply to new libc, Perl, kernels, etc.  You
have back-patched changes to let new compilers build working/optimal code,
e.g. 649839d and e709807; that's great to continue doing.  Acquiring an old
compiler is tedious, and the risk of being the first to encounter a new
miscompilation is low.

> I think it might be worth back-patching some of the warning fixes
> we've done.  Like this one:
> 
> -   if (!res || !res->cmdStatus || strncmp(res->cmdStatus, "INSERT ", 7) != 0)
> +   if (!res || strncmp(res->cmdStatus, "INSERT ", 7) != 0)
> return "";
> 
> I really don't see how back-patching that can hurt anything, and it
> gets rid of a warning, so great.  But not all cases are going to be so
> clear cut, and getting all supported back-branches to compile warning
> free on every contributor's current toolchain sounds like a treadmill
> I don't want to get on.

Agreed.  We don't have or need a firm ban on back-branch warning fixes, but
let's continue to cite the low value of contributions having that aim.

nm




Re: [HACKERS] [PATCH] Equivalence Class Filters

2015-12-15 Thread Tomas Vondra

On 12/16/2015 01:26 AM, Simon Riggs wrote:


There is an interesting real world case where we might get some use of
these thoughts.

If we have Orders and OrderItems (FK -> Orders)
and we also know (and can Assert) Orders.order_date <= OrderItems.ship_date,
then a restriction Orders.order_date > X implies OrderItems.ship_date > X
when the two tables are joined on OrderId,
and likewise a restriction OrderItems.ship_date < X implies
Orders.order_date < X when the two tables are joined on OrderId.

Such an assertion could be checked during the FK check, so would not be
expensive to maintain.

One for the future, at least, since we don't have any way of expressing
or enforcing that just yet.


There's a concept of "correlation maps", described in a paper [1]
presented at VLDB 2009. It essentially talks about deriving conditions
between attributes of the same table, but I guess it might be modified
to track the correlations through foreign keys.

Interestingly enough, the data for the paper were collected on
PostgreSQL, but the correlation maps were implemented in an application
layer on top of the database.


[1] http://www.vldb.org/pvldb/2/vldb09-199.pdf

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Performance improvement for joins where outer side is unique

2015-12-15 Thread David Rowley
On 25 August 2015 at 17:25, David Rowley 
wrote:

> On 24 August 2015 at 14:29, Tom Lane  wrote:
>
>> David Rowley  writes:
>> > I have to admit I don't much like it either, originally I had this as an
>> > extra property that was only seen in EXPLAIN VERBOSE.
>>
>> Seems like a reasonable design from here.
>
>
> The attached patch has the format in this way.
>

I've attached a rebased patch against current master, as there were some
conflicts with the recent changes to LATERAL join.

On reviewing the patch again I was reminded that the bulk of the changes
are in analyzejoins.c. These are mostly just a refactoring of the current
code to make it more reusable. The diff looks scarier than it actually is.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


unique_joins_16-12-2015_7a290ab1.patch
Description: Binary data



Re: [HACKERS] [PATCH] Equivalence Class Filters

2015-12-15 Thread Simon Riggs
On 7 December 2015 at 16:44, Simon Riggs  wrote:

> On 6 December 2015 at 16:38, Tom Lane  wrote:
>
>>> Lastly, in most cases knowing that t2.id <= 10 is just not worth all
>>> that much; it's certainly far less useful than an equality condition.
>>
>> I'd have thought that my link to a thread of a reported 1100 to 2200
>> times performance improvement might have made you think twice about
>> claiming the uselessness of this idea.
>>
>
> Personally, I think this optimization is a long way down the list of
> important items because I don't see it as a typical use case. There are
> already patches submitted for more important items, so this isn't the right
> place to be arguing such things. Not every beneficial optimization has a
> wide use case.
>

There is an interesting real world case where we might get some use of
these thoughts.

If we have Orders and OrderItems (FK -> Orders)
and we also know (and can Assert) Orders.order_date <= OrderItems.ship_date,
then a restriction Orders.order_date > X implies OrderItems.ship_date > X
when the two tables are joined on OrderId, and likewise a restriction
OrderItems.ship_date < X implies Orders.order_date < X when the two tables
are joined on OrderId. (Both follow by transitivity from
order_date <= ship_date.)

Such an assertion could be checked during the FK check, so would not be
expensive to maintain.

One for the future, at least, since we don't have any way of expressing or
enforcing that just yet.

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] [PoC] Asynchronous execution again (which is not parallel)

2015-12-15 Thread Robert Haas
On Fri, Dec 11, 2015 at 11:49 PM, Amit Kapila  wrote:
>> But is it important enough to be worthwhile?  Maybe, maybe not.  I
>> think we should be working toward a world where the Gather is at the
>> top of the plan tree as often as possible, in which case
>> asynchronously kicking off a Gather node won't be that exciting any
>> more - see notes on the "parallelism + sorting" thread where I talk
>> about primitives that would allow massively parallel merge joins,
>> rather than 2 or 3 way parallel.  From my point of view, the case
>> where we really need some kind of asynchronous execution solution is a
>> ForeignScan, and in particular a ForeignScan which is the child of an
>> Append.  In that case it's obviously really useful to be able to kick
>> off all the foreign scans and then return a tuple from whichever one
>> coughs it up first.
>
> How will this be better than doing the same thing in a way we have done
> Parallel Sequential Scan at ExecutorRun() time?

I'm not sure if this is what you are asking, but I think it probably
should be done at ExecutorRun() time, rather than a separate phase.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pg_hba_lookup function to get all matching pg_hba.conf entries

2015-12-15 Thread Tomas Vondra

Hi,

I've reviewed the patch today, after re-reading the whole discussion.

Overall I'm quite happy with the design - a function is certainly better 
for the use-case. Not just because of the keyword handling issues, but 
because the checks happen in a particular order and terminate once a 
match is found, and that's very difficult to do with a view. So +1 to 
the function approach. Also +1 to abandoning the NOTICE idea and just 
generating rows.


The one unresolved issue is what to do with 1e24cf64. My understanding is 
that the current patch still requires reverting that commit, but my 
feeling is TL won't be particularly keen on doing that. Or am I 
missing something?


The current patch (v6) also triggers a few compilation warnings about 
hostname/address being uninitialized in pg_hba_lookup(). That 
happens because 'address' is only set when (!PG_ARGISNULL(2)). Fixing 
it is as simple as

	char	   *address = NULL;
	char	   *hostname = NULL;

at the beginning of the function (this seems correct to me).

The current patch also does not handle the 'all' keyword correctly - it 
apparently just compares the values as strings, which is incorrect. For 
example this

SELECT * FROM pg_hba_lookup('all', 'all')

matches this pg_hba.conf line

local    all    all    trust

That's clearly incorrect, as Alvaro pointed out.

I'm also wondering whether we really need three separate functions in 
pg_proc:

pg_hba_lookup(database, user)
pg_hba_lookup(database, user, address)
pg_hba_lookup(database, user, address, ssl_inuse)

Clearly, that's designed to match the local/host/hostssl/hostnossl cases 
available in pg_hba.conf. But why not simply use default values instead?

pg_hba_lookup(database TEXT, user TEXT,
              address TEXT DEFAULT NULL,
              ssl_inuse BOOLEAN DEFAULT NULL)


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[HACKERS] "pg_upgrade" cannot write to log file pg_upgrade_internal.log

2015-12-15 Thread Yang, Leo
I run "pg_upgrade -xx" fromPostgreSQL8.3 to PostgreSQL9.4.5 on window server 
2008.  It will show some wrong information:

cannot write to log file pg_upgrade_internal.log
Failure, exiting.

It seems that the upgrade can be successful on window 7. It seems to be a 
permissions problem. I get some information about restricting privileges from 
source code, and I find the changes "Run pg_upgrade and pg_resetxlog with 
restricted privileges on Windows, so that they don't fail when run by an 
administrator (refer to 
http://www.postgresql.org/docs/9.4/static/release-9-4-2.html )"  for release 
9.4.2. Why did you do this? How can I solve this problem? I hope you can help 
me.

Best Regards,
Leo



Re: [HACKERS] extend pgbench expressions with functions

2015-12-15 Thread Robert Haas
On Mon, Dec 14, 2015 at 7:25 AM, Michael Paquier
 wrote:
> On Mon, Nov 9, 2015 at 6:46 AM, Robert Haas  wrote:
>> On Sat, Nov 7, 2015 at 2:45 AM, Fabien COELHO  wrote:
>>> After looking at the generated html version, I find that the "1/param" and
>>> "2/param" formula are very simple and pretty easy to read, and they would
>>> not be really enhanced with additional spacing.
>>>
>>> ISTM that adaptive spacing (no spacing for level 1 operations, some for
>>> higher level) is a good approach for readability, ie:
>>>
>>>f(i) - f(i+1)
>>>  ^ no spacing here
>>> ^ some spacing here
>>>
>>> So I would suggest to keep the submitted version, unless this is a blocker.
>>
>> Well, I think with the ".0" version it looks more like floating-point
>> math, and I like the extra white-space.  But I'm happy to hear other
>> opinions.
>
> -  defined as (max + min) / 2.0, then value i
> +  defined as (max+min)/2, with
> This thing reminds me a bit of the little TeX I know: when writing
> things like "\sqrt{1-e^2}", spaces would be printed in the converted
> html. And as that's floating-point arithmetic, we should have a .0 as
> well. So I would agree with Robert on both points.
>
> I have looked for now at the first patch and finished with the
> attached while looking at it. Perhaps a committer could look already
> at that?

It looks fine to me except that I think we should spell out "param" as
"parameter" throughout, instead of abbreviating.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PATCH] Logical decoding support for sequence advances

2015-12-15 Thread Petr Jelinek

On 2015-12-15 13:51, Andres Freund wrote:

On 2015-12-15 13:46:29 +0100, Petr Jelinek wrote:

I don't think that approach alone is good enough. It might be ok for
selective replication where the replication is driven by tables anyway, but
in general and especially for failover it's not good enough to tell user
that we handle some sequences and they have to fix the rest manually.


I think it solves roughly 80-90% of all usages of sequences. That's a
significant improvement over the status quo.

I'm not saying it's perfect, just that it's applicable to 9.4, and might
be good enough initially.


And I am saying that I think more can and should be done even for 9.4/5.




That's not much different from fixing them all, in practice, since you
script it anyway.


If you can easily script it, it's just the same type (sequences owned by
a single column); everything else starts to be a bit more complicated
anyway.



Well, there is some difference between scripting it for the general use 
case and scripting it with domain knowledge, but I see what you mean.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [PATCH] Refactoring of LWLock tranches

2015-12-15 Thread Robert Haas
On Sun, Dec 13, 2015 at 6:35 AM, and...@anarazel.de  wrote:
> On 2015-12-12 21:15:52 -0500, Robert Haas wrote:
>> On Sat, Dec 12, 2015 at 1:17 PM, and...@anarazel.de  
>> wrote:
>> > Here's two patches doing that. The first is an adaption of your
>> > constants patch, using an enum and also converting xlog.c's locks. The
>> > second is the separation into distinct tranches.
>>
>> Personally, I prefer the #define approach to the enum, but I can live
>> with doing it this way.
>
> I think not needing to adjust the 'last defined' variable is worth it...
>
>> Other than that, I think these patches look
>> good, although if it's OK with you I would like to make a pass over
>> the comments and the commit messages which seem to me that they could
>> benefit from a bit of editing (but not much substantive change).
>
> Sounds good to me. You'll then commit that?

Yes.  Done!

In terms of this project overall, NumLWLocks() now knows about only
four categories of stuff: fixed lwlocks, backend locks (proc.c),
replication slot locks, and locks needed by extensions.  I think it'd
probably be fine to move the backend locks into PGPROC directly, and
the replication slot locks into ReplicationSlot.  I don't know if that
will improve performance but it doesn't seem like it should regress
anything, though we should probably test that.  I'm not sure what to
do about extension-requested locks - maybe give those their own
tranche somehow?

I think we should also look at tranche-ifying the locks counted in
NUM_FIXED_LWLOCKS but not NUM_INDIVIDUAL_LWLOCKS.  That's basically
just the lock manager locks and the predicate lock manager locks.
That would get us to a place where every lock in the system has a
descriptive name, either via the tranche or because it's an
individually named lock, which sounds excellent.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Cube extension kNN support

2015-12-15 Thread Tomas Vondra

Hi,

On 12/07/2015 03:47 PM, Stas Kelvich wrote:

Hello, fixed.


I've looked at the patch today, seems mostly fine to me.

Three comments though:

1) (nitpicking) There seem to be some minor whitespace issues, i.e.
   trailing spaces, empty lines being added/removed, etc.

2) one of the regression tests started to fail

   SELECT '-1e-700'::cube AS cube;

   This used to return (0) but now I get (-0). As this query existed in
   1.0, it's probably due to a change in the C code. Not sure where.
   (The small strtod() demo after this list shows where the -0 likely
   comes from.)

3) I wonder why the docs were changed like this:

   
-   It does not matter which order the opposite corners of a cube are
-   entered in.  The cube functions
-   automatically swap values if needed to create a uniform
-   lower left — upper right internal representation.
+   When corners coincide cube stores only one corner along with a
special flag in order to reduce size wasted.
   

   Was the old behavior removed? I don't think so - it seems to behave
   as before, so why remove this information? Maybe it's not useful?
   But then why add the bit about optimizing storage of points?
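For reference, here is a tiny standalone C demo of where a (-0) can come
from: on common platforms (e.g. glibc), strtod() underflows "-1e-700" to
a signed zero, and whether the minus sign shows up depends on how the
value is later formatted:

#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
	/* -1e-700 underflows the double range; strtod returns -0.0 */
	double		d = strtod("-1e-700", NULL);

	printf("%g\n", d);			/* prints -0 if the sign survives */
	printf("%d\n", d == 0.0);	/* 1: negative zero compares equal to 0 */
	return 0;
}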

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] _mdfd_getseg can be expensive

2015-12-15 Thread Andres Freund
On 2014-11-01 18:23:47 +0100, Andres Freund wrote:
> On 2014-11-01 12:57:40 -0400, Tom Lane wrote:
> > Andres Freund  writes:
> > > On 2014-10-31 18:48:45 -0400, Tom Lane wrote:
> > >> While the basic idea is sound, this particular implementation seems
> > >> pretty bizarre.  What's with the "md_seg_no" stuff, and why is that
> > >> array typed size_t?
> > 
> > > It stores the length of the array of _MdfdVec entries.
> > 
> > Oh.  "seg_no" seems like not a very good choice of name then.
> > Perhaps "md_seg_count" or something like that would be more intelligible.
> 
> That's fine with me.

Went with md_num_open_segs - reading the code that was easier to
understand.

> So I'll repost a version with those fixes.

Took a while, but here we go. The attached version is a significantly
revised version of my earlier patch. Notably, I've pretty much entirely
revised the code in _mdfd_getseg() to be more similar to the state in
master. Also some comment policing.

As it's more than a bit painful to test with large numbers of 1GB
segments, I've rebuilt my local postgres with 100kB segments. Running
pgbench -s 300 with 128MB shared_buffers clearly shows the efficiency
difference: 320 tps vs. 4900 tps in an assertion-enabled build.
Obviously that's kind of an artificial situation...

But I've also verified this with a) fake relations built out of sparse
files, b) actually large relations. The latter shows performance
benefits as well, but my patience limited the amount of testing I was
willing to do...

Kevin, Robert: I've CCed you because Robert commented in the recent
mdnblocks() blocks thread that Kevin lamented the O(n)'ness of md.c...

Andres
>From a0b30154714df3e13bb668fa9590f7be40e4de35 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Tue, 15 Dec 2015 19:01:36 +0100
Subject: [PATCH 1/2] Faster PageIsVerified() for the all zeroes case.

---
 src/backend/storage/page/bufpage.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index df77bb2..de9800f 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -81,7 +81,7 @@ bool
 PageIsVerified(Page page, BlockNumber blkno)
 {
 	PageHeader	p = (PageHeader) page;
-	char	   *pagebytes;
+	size_t	   *pagebytes;
 	int			i;
 	bool		checksum_failure = false;
 	bool		header_sane = false;
@@ -118,10 +118,17 @@ PageIsVerified(Page page, BlockNumber blkno)
 			return true;
 	}
 
-	/* Check all-zeroes case */
+	/*
+	 * Check all-zeroes case. Luckily BLCKSZ is guaranteed to always be a
+	 * multiple of size_t - and it's much faster to compare memory using the
+	 * processor's native word size.
+	 */
+	StaticAssertStmt(BLCKSZ == (BLCKSZ / sizeof(size_t)) * sizeof(size_t),
+	 "BLCKSZ has to be a multiple of sizeof(size_t)");
+
 	all_zeroes = true;
-	pagebytes = (char *) page;
-	for (i = 0; i < BLCKSZ; i++)
+	pagebytes = (size_t *) page;
+	for (i = 0; i < (BLCKSZ / sizeof(size_t)); i++)
 	{
 		if (pagebytes[i] != 0)
 		{
-- 
2.6.0.rc3

>From 15150d1ba7f17d3a241be3073a03e6bb9a36a937 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Tue, 15 Dec 2015 19:01:36 +0100
Subject: [PATCH 2/2] Improve scalability of md.c for large relations.

Previously several routines in md.c were O(#segments) - a problem these
days, where it's not uncommon to have tables in the multi TB range.

Replace the linked list of segments hanging of SMgrRelationData with an
array of opened segments. That allows O(1) access to individual
segments, if they've previously been opened.
---
 src/backend/storage/smgr/md.c   | 337 +++-
 src/backend/storage/smgr/smgr.c |   4 +-
 src/include/storage/smgr.h  |   8 +-
 3 files changed, 200 insertions(+), 149 deletions(-)

diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index 42a43bb..cb163f4 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -92,27 +92,23 @@
  *	out to an unlinked old copy of a segment file that will eventually
  *	disappear.
  *
- *	The file descriptor pointer (md_fd field) stored in the SMgrRelation
- *	cache is, therefore, just the head of a list of MdfdVec objects, one
- *	per segment.  But note the md_fd pointer can be NULL, indicating
- *	relation not open.
+ *	File descriptors are stored in md_seg_fds arrays inside SMgrRelation. The
+ *	length of these arrays is stored in md_num_open_segs.  Note that
+ *	md_num_open_segs having a specific value does not necessarily mean the
+ *	relation doesn't have additional segments; we may just not have opened the
+ *	next segment yet.  (We could not have "all segments are in the array" as
+ *	an invariant anyway, since another backend could extend the relation while
+ *	we aren't looking.)  We do not have entries for inactive segments,
+ *	however; as soon as we find a partial segment, we assume that any
+ *	subsequent segments are inactive.
  *
- *	Also note that mdfd_chain == NUL

Re: [HACKERS] pam auth - add rhost item

2015-12-15 Thread Tomas Vondra
Actually, one more thing - the patch should probably update the docs 
too, because client-auth.sgml currently says this in the "auth-pam" section:


   
...
PAM is used only to validate user name/password pairs.
...
   

I believe that's no longer true, because the patch adds PAM_RHOST in
addition to the user/password fields.


Regarding the other PAM_* fields, none of them strikes me as very useful 
for our use case.


In a broader sense, I think this patch is quite desirable, despite being 
rather simple (which is good). I certainly don't agree with suggestions 
that we can already do things like this through pg_hba.conf. If we're 
providing PAM authentication, let's make it as complete/useful as 
possible. In some cases modifying PAM may not be feasible - e.g. some 
management systems rely on PAM as much as possible, and doing changes in 
other ways is a major hassle.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] pam auth - add rhost item

2015-12-15 Thread Tomas Vondra

Hi,

On 11/25/2015 01:45 PM, Grzegorz Sampolski wrote:

Well, this does not matter since pam_set_item expects this argument as a
string.
Besides, it is not always possible to map an IP address to a
hostname. So hostname is just a synonym for whatever information you
can get about the remote machine.


I'm no PAM guru, but I don't see how this implies that we should 
entirely abandon FQDN if it's available. Other tools relying on PAM have 
to face the same question, so how do they address it?


For example this [1] sssd ticket suggests that for example OpenSSH makes 
this configurable - when UseDNS=yes then it attempts to resolve the IP 
address to a FQDN, with UseDNS=no it passes the IP address without 
attempting to use DNS.


[1] https://fedorahosted.org/sssd/ticket/908

So maybe we need a knob for this, similar to UseDNS in OpenSSH?
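For context, the PAM call at the heart of the patch is pam_set_item()
with PAM_RHOST. A minimal sketch (the helper name is hypothetical; the
UseDNS-style FQDN-vs-IP decision is left to the caller):

#include <security/pam_appl.h>

/* Hand the client's address (FQDN if resolved, raw IP otherwise)
 * to PAM, so that PAM modules can match on rhost. */
int
set_pam_rhost(pam_handle_t *pamh, const char *host_or_ip)
{
	return pam_set_item(pamh, PAM_RHOST, host_or_ip);
}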

Otherwise, the patch seems fine to me, except for whitespace issues.
Please make sure you use tabs for indentation (and not spaces).



regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Comparing two PostgreSQL databases -- order of pg_dump output

2015-12-15 Thread rsindlin
Hi Joe,

I have run into what seems to be a similar issue with pg_dump --schema-only
in its trigger ordering.  Did you ever find a satisfactory solution to
this? I posted my specific problem on DBA.StackExchange, and based on some
research I did, it seems like it could be an issue related to the Collate
setting of the DB. I was wondering if you had come across anything
supporting or refuting that.

Thanks,
-Randall







Re: [HACKERS] mdnblocks() sabotages error checking in _mdfd_getseg()

2015-12-15 Thread Robert Haas
On Tue, Dec 15, 2015 at 11:08 AM, Andres Freund  wrote:
>> However, I don't think this is exactly what you are proposing.  I'm
>> skeptical of the idea that _mdfd_getseg() should probe ahead to see
>> whether we're dealing with a malformed relation where the intermediate
>> segments still exist but have zero length.
>
> That's not exactly what I was thinking of. I was thinking of doing a
> _mdnblocks(reln, forknum, v) == RELSEG_SIZE check in _mdfd_getseg()'s
> main loop, whenever nextsegno < targetseg. That'll make that check
> rather cheap.  Which sounds pretty much like your 2).

Hmm, yes it does.  But now that I think about it, we're not otherwise
doing _mdnblocks() in that loop.  So that would add a system call per
loop iteration.  That doesn't seem like such a swell idea.

If you're OK with it, I think I'll commit the original patch.  That
seems like a good thing to do regardless of what we decide about the
rest of this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] mdnblocks() sabotages error checking in _mdfd_getseg()

2015-12-15 Thread Andres Freund
On 2015-12-15 11:28:10 -0500, Robert Haas wrote:
> Hmm, yes it does.  But now that I think about it, we're not otherwise
> doing _mdnblocks() in that loop.  So that would add a system call per
> loop iteration.  That doesn't seem like such a swell idea.

We'd do that only when initially opening the relation (that far). And
after we've already executed a much more expensive open() on the same
file. So I'm not particularly bothered by that.


> If you're OK with it, I think I'll commit the original patch.  That
> seems like a good thing to do regardless of what we decide about the
> rest of this.

Ok.

I thought for a moment that that patch would need a bit more editing
because of this comment in mdtruncate():

	/*
	 * This is the last segment we want to keep. Truncate the file to
	 * the right length, and clear chain link that points to any
	 * remaining segments (which we shall zap). NOTE: if nblocks is
	 * exactly a multiple K of RELSEG_SIZE, we will truncate the K+1st
	 * segment to 0 length but keep it. This adheres to the invariant
	 * given in the header comments.
	 */

But afaics this doesn't change:
 *	On disk, a relation must consist of consecutively numbered segment
 *	files in the pattern
 *	-- Zero or more full segments of exactly RELSEG_SIZE blocks each
 *	-- Exactly one partial segment of size 0 <= size < RELSEG_SIZE blocks
 *	-- Optionally, any number of inactive segments of size 0 blocks.
at all, so I think we're good.
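As an aside, that invariant is easy to check from outside: every segment
before the last one must be exactly RELSEG_SIZE. A toy standalone checker
(the relfilenode path "base/1/16384" and 1GB segment size are just
assumptions for the example):

#include <stdio.h>
#include <sys/stat.h>

#define SEG_BYTES (1024L * 1024L * 1024L)	/* default RELSEG_SIZE * BLCKSZ */

int
main(void)
{
	char		path[64];
	struct stat st;

	for (int seg = 0;; seg++)
	{
		if (seg == 0)
			snprintf(path, sizeof(path), "base/1/16384");
		else
			snprintf(path, sizeof(path), "base/1/16384.%d", seg);
		if (stat(path, &st) != 0)
			break;				/* no more segments */
		if (st.st_size < SEG_BYTES)
			printf("segment %d is partial (%lld bytes) - must be the last\n",
				   seg, (long long) st.st_size);
	}
	return 0;
}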

Andres




Re: [HACKERS] mdnblocks() sabotages error checking in _mdfd_getseg()

2015-12-15 Thread Andres Freund
On 2015-12-15 10:53:58 -0500, Robert Haas wrote:
> On Tue, Dec 15, 2015 at 9:51 AM, Andres Freund  wrote:
> > Unless in recovery in the startup process, or when EXTENSION_CREATE is
> > passed to it. Depending on whether it or mdnblocks were called first,
> and depending on which segment is missing.  In that case it'd *possibly*
> pad the last block in a segment with zeroes. Yes, only the last block:
> 
> Yes, that's clearly inadequate.  I think the fact that it only pads
> the last block, possibly creating a sparse file, should also be fixed,
> but as a separate patch.

I was wondering about that - but after a bit of thinking I'm disinclined
to go that way. Consider the scenario where we write to the last block
in a humongous relation, drop that relation, and then crash before a
checkpoint. Right now we'll write the last block of each segment during
recovery. After such a change we'd end up rewriting the whole file...

I think we should primarily update that comment.


> 1. Before extending a relation to RELSEG_SIZE, verify that the next
> segment of the relation doesn't already exist on disk.  If we find out
> that it does, then throw an error and refuse the extension.

That bit wouldn't work directly that way IIRC - the segment is allowed
to exist as an 'inactive' segment. We only truncate segments to 0 in
mdtruncate()...  But a size 0 check should do the trick.


> 2. Teach _mdfd_getseg() that a segment of a size less than RELSEG_SIZE
> is, by definition, the last segment, and don't even check whether
> further files exist on disk.  mdnblocks() already behaves this way, so
> it would just be making things consistent.

> However, I don't think this is exactly what you are proposing.  I'm
> skeptical of the idea that _mdfd_getseg() should probe ahead to see
> whether we're dealing with a malformed relation where the intermediate
> segments still exist but have zero length.

That's not exactly what I was thinking of. I'm was thinking of doing a
_mdnblocks(reln, forknum, v) == RELSEG_SIZE check in _mdfd_getseg()'s
main loop, whenever nextsegno < targetseg. That'll make that check
rather cheap.  Which sounds pretty much like your 2).
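
To make the invariant concrete, here is a rough standalone sketch (not
server code; the segment naming convention and the default
BLCKSZ/RELSEG_SIZE values are assumptions) that walks a relation's
segment files and flags a short segment that is followed by a later one:

#include <stdio.h>
#include <sys/stat.h>

#define BLCKSZ      8192            /* default block size */
#define RELSEG_SIZE 131072          /* default blocks per segment */

int
main(int argc, char **argv)
{
    char        path[1024];
    char        next[1024];
    struct stat st;
    int         segno;

    if (argc < 2)
    {
        fprintf(stderr, "usage: %s path-to-first-segment\n", argv[0]);
        return 1;
    }

    for (segno = 0;; segno++)
    {
        /* segment 0 is the bare filenode, later ones get ".N" suffixes */
        if (segno == 0)
            snprintf(path, sizeof(path), "%s", argv[1]);
        else
            snprintf(path, sizeof(path), "%s.%d", argv[1], segno);

        if (stat(path, &st) != 0)
            break;                  /* no more segments */

        printf("%s: %lld blocks\n", path,
               (long long) (st.st_size / BLCKSZ));

        /* a short segment is only legal if no later segment exists */
        if (st.st_size < (long long) RELSEG_SIZE * BLCKSZ)
        {
            snprintf(next, sizeof(next), "%s.%d", argv[1], segno + 1);
            if (stat(next, &st) == 0)
                printf("WARNING: %s is short but %s exists (hole?)\n",
                       path, next);
        }
    }
    return 0;
}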

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fixing warnings in back branches?

2015-12-15 Thread Robert Haas
On Tue, Dec 15, 2015 at 9:17 AM, Andres Freund  wrote:
> On 2015-12-15 09:09:39 -0500, Tom Lane wrote:
>> In the end, if you're building an old branch, you should be doing it with
>> old tools.
>
> That I don't buy for even one second. Old branches are used in up2date
> environments in production. Absolutely regularly. apt.pg.o, yum.pg.o et
> al do provide them for that.

Really?  I'm kind of with Tom; I don't expect that keeping old
branches warning-free on new compilers is really workable.  I think
the situation today is actually better than it was a few years ago, at
least for me.  I get some warnings on older branches, but with other
toolchains I've used for PG hacking at other times, it was much worse.
I think that back-patching some of the warning fixes we've done would
be a good idea.  Like this one:

-   if (!res || !res->cmdStatus || strncmp(res->cmdStatus, "INSERT ", 7) != 0)
+   if (!res || strncmp(res->cmdStatus, "INSERT ", 7) != 0)
return "";

I really don't see how back-patching that can hurt anything, and it
gets rid of a warning, so great.  But not all cases are going to be so
clear cut, and getting all supported back-branches to compile warning
free on every contributor's current toolchain sounds like a treadmill
I don't want to get on.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] mdnblocks() sabotages error checking in _mdfd_getseg()

2015-12-15 Thread Robert Haas
On Tue, Dec 15, 2015 at 9:51 AM, Andres Freund  wrote:
> On 2015-12-09 16:50:06 -0500, Robert Haas wrote:
>> Now, if subsequent to this an index scan happens to sweep through and
>> try to fetch a block in 123456.2, it will work!  This happens because
>> _mdfd_getseg() doesn't care about the length of the segments; it only
>> cares whether or not they exist.
>
> Unless in recovery in the startup process, or when EXTENSION_CREATE is
> passed to it. Depending on whether it or mdnblocks were called first,
> and depending on which segment is missing.  In that case it'd *possibly*
> pad the last block in a segment with zeroes. Yes, only the last block:

Yes, that's clearly inadequate.  I think the fact that it only pads
the last block, possibly creating a sparse file, should also be fixed,
but as a separate patch.

>> If 123456.1 were actually missing,
>> then we'd never test whether 123456.2 exists and we'd get an error.
>> But because mdnblocks() created 123456.1, _mdfd_getseg() is now quite
>> happy to fetch blocks in 123456.2; it considers the empty 123456.1
>> file to be a sufficient reason to look for 123456.2, and seeing that
>> file, and finding the block it wants therein, it happily returns that
>> block, blithely ignoring the fact that it passed over a completely .1
>> segment before returning a block from .2.  This is maybe not the
>> smartest thing ever.
>
> I think it might be worthwhile to add a check against this. It'd not be
> very hard to ensure the segment is indeed large enough iff further
> segments exist. Possibly not an ERROR, but a WARNING might be useful. To
> avoid overhead and absurd log spamming we should make that check only if
the segment isn't already open.

We could:

1. Before extending a relation to RELSEG_SIZE, verify that the next
segment of the relation doesn't already exist on disk.  If we find out
that it does, then throw an error and refuse the extension.

2. Teach _mdfd_getseg() that a segment of a size less than RELSEG_SIZE
is, by definition, the last segment, and don't even check whether
further files exist on disk.  mdnblocks() already behaves this way, so
it would just be making things consistent.

If we did these things, then if you got a hole in your relation, the
entire system would agree that the data after the hole doesn't exist
any more, and if the segment prior to the hole got re-extended, it
would be prevented from "bridging" the hole.  We wouldn't be incurring
any additional overhead, either, because I bet #2 would save more
cycles than #1 costs.

However, I don't think this is exactly what you are proposing.  I'm
skeptical of the idea that _mdfd_getseg() should probe ahead to see
whether we're dealing with a malformed relation where the intermediate
segments still exist but have zero length.  If we fix mdnblocks() as
I've proposed, then that scenario will become much less probable.  Of
course, the operating system could still decide to truncate a segment
other than the last for some reason or other, but it could also decide
to remove the file completely, or to remove several files, or lots of
other things.

>> The comment in mdnblocks.c says this:
>>
>>  * Because we pass O_CREAT, we will create the next segment (with
>>  * zero length) immediately, if the last segment is of length
>>  * RELSEG_SIZE.  While perhaps not strictly necessary, this keeps
>>  * the logic simple.
>>
>> I don't really see how this "keeps the logic simple".  What it does is
>> allow sequential scans and index scans to have two completely
>> different notions of how many accessible blocks there are in the
>> relation.  Granted, this kind of thing really shouldn't happen, but
>> sometimes bad things do happen.  Therefore, I propose the attached
>> patch.
>
> I'm in favor of changing this behaviour - it's indeed weird and somewhat
> error prone.
>
> Do you want to change this in master, or backpatch a change? To me the
> code in md.c is subtle and complicated enough that I'm rather inclined
> to only change this in master.

Yeah, that's my inclination as well.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] mdnblocks() sabotages error checking in _mdfd_getseg()

2015-12-15 Thread Andres Freund
On 2015-12-09 16:50:06 -0500, Robert Haas wrote:
> Now, if subsequent to this an index scan happens to sweep through and
> try to fetch a block in 123456.2, it will work!  This happens because
> _mdfd_getseg() doesn't care about the length of the segments; it only
> cares whether or not they exist.

Unless in recovery in the startup process, or when EXTENSION_CREATE is
passed to it. Depending on whether it or mdnblocks were called first,
and depending on which segment is missing.  In that case it'd *possibly*
pad the last block in a segment with zeroes. Yes, only the last block:
 *
 * We have to maintain the invariant that segments before the last
 * active segment are of size RELSEG_SIZE; therefore, pad them out
 * with zeroes if needed.  (This only matters if caller is
 * extending the relation discontiguously, but that can happen in
 * hash indexes.)
 */
if (behavior == EXTENSION_CREATE || InRecovery)
{
    if (_mdnblocks(reln, forknum, v) < RELSEG_SIZE)
    {
        char       *zerobuf = palloc0(BLCKSZ);

        mdextend(reln, forknum,
                 nextsegno * ((BlockNumber) RELSEG_SIZE) - 1,
                 zerobuf, skipFsync);
        pfree(zerobuf);
    }
    v->mdfd_chain = _mdfd_openseg(reln, forknum, nextsegno, O_CREAT);
}


> If 123456.1 were actually missing,
> then we'd never test whether 123456.2 exists and we'd get an error.
> But because mdnblocks() created 123456.1, _mdfd_getseg() is now quite
> happy to fetch blocks in 123456.2; it considers the empty 123456.1
> file to be a sufficient reason to look for 123456.2, and seeing that
> file, and finding the block it wants therein, it happily returns that
> block, blithely ignoring the fact that it passed over a completely .1
> segment before returning a block from .2.  This is maybe not the
> smartest thing ever.

I think it might be worthwhile to add a check against this. It'd not be
very hard to ensure the segment is indeed large enough iff further
segments exist. Possibly not an ERROR, but a WARNING might be useful. To
avoid overhead and absurd log spamming we should make that check only if
the segment isn't already open.


> The comment in mdnblocks.c says this:
> 
>  * Because we pass O_CREAT, we will create the next segment (with
>  * zero length) immediately, if the last segment is of length
>  * RELSEG_SIZE.  While perhaps not strictly necessary, this keeps
>  * the logic simple.
> 
> I don't really see how this "keeps the logic simple".  What it does is
> allow sequential scans and index scans to have two completely
> different notions of how many accessible blocks there are in the
> relation.  Granted, this kind of thing really shouldn't happen, but
> sometimes bad things do happen.  Therefore, I propose the attached
> patch.

I'm in favor of changing this behaviour - it's indeed weird and somewhat
error prone.

Do you want to change this in master, or backpatch a change? To me the
code in md.c is subtle and complicated enough that I'm rather inclined
to only change this in master.


This code is really hard to test effectively :(.


Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fixing warnings in back branches?

2015-12-15 Thread Andres Freund
On 2015-12-15 09:09:39 -0500, Tom Lane wrote:
> In the end, if you're building an old branch, you should be doing it with
> old tools.

That I don't buy for even one second. Old branches are used in up2date
environments in production. Absolutely regularly. apt.pg.o, yum.pg.o et
al do provide them for that.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fixing warnings in back branches?

2015-12-15 Thread Tom Lane
Andres Freund  writes:
> I think that's an ok one-off policy. But looking back it was pretty much
> always the case that the release -3 or so started to look pretty
> horrible, warning wise.

I think that's a condition of life.  The compilers are moving targets,
no matter that they allegedly implement standards.  We endeavor to keep
HEAD able to compile warning-free on recent compilers, but I don't think
we can make such a promise for back branches.  This thread offers a great
example of why not: the changes required would sometimes be too invasive
to be justifiable.

In the end, if you're building an old branch, you should be doing it with
old tools.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallel joins, and better parallel explain

2015-12-15 Thread Robert Haas
On Mon, Dec 14, 2015 at 8:38 AM, Amit Kapila  wrote:
> set enable_hashjoin=off;
> set enable_mergejoin=off;

[ ... ]


> Now the point to observe here is that the non-parallel case uses both
> less execution time and less planning time to complete the statement.
> There is a considerable increase in planning time without any benefit
> in execution.

So, you forced the query planner to give you a bad plan, and then
you're complaining that the plan is bad?  That's not a very surprising
result.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fixing warnings in back branches?

2015-12-15 Thread Andres Freund
On 2015-12-15 08:53:25 -0500, Robert Haas wrote:
> So... that means we can't really get rid of these warnings on 9.1,
> IIUC.

Well, we could fix them. Or, as proposed here, just silence that
category.


> I agree it would be nice to do if this were no issue, but as it
> is I'm inclined to think we should just live with it for the next ~10
> months.  After that 9.1 will no longer be a supported branch.

I think that's an ok one-off policy. But looking back it was pretty much
always the case that the release -3 or so started to look pretty
horrible, warning wise.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] small query, about skipping dump in dumpAttrDef

2015-12-15 Thread Tom Lane
amul sul  writes:
> In the dumpAttrDef() function we skip the dump if the table definition
> is not dumped (i.e. by checking tbinfo->dobj.dump); it's absolutely
> alright to do this.

> But in dumpConstraint() we do the same by checking the constraint dump
> flag (coninfo->dobj.dump) instead of the table dump flag (tbinfo->dobj.dump).

> IMHO, for code consistency we should use the attribute dump flag
> (adinfo->dobj.dump) instead of the table dump flag, as shown below:

I don't see much point in doing it that way, since that code could not
possibly work right if the constraint has a different dump flag from the
table: it would either omit a required component of the table or emit an
ALTER TABLE against a nonexistent table.  So this change could not fix any
bug, and might introduce some if the flag values weren't carefully kept
equal.  Note also that the normal "not separate" code path for dumping
defaults hasn't got any test on the default's flag, so this would be
adding inconsistency compared to that.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fixing warnings in back branches?

2015-12-15 Thread Robert Haas
On Tue, Dec 15, 2015 at 8:17 AM, Andres Freund  wrote:
> On 2015-12-15 08:13:06 -0500, Robert Haas wrote:
>> On Tue, Dec 15, 2015 at 7:59 AM, Andres Freund  wrote:
>> > On 2015-12-14 11:00:32 -0500, Robert Haas wrote:
>> >> IIUC, the main thing that causes incompatible pointer type warnings on
>> >> 9.1 is the conflation of FILE with gzFile in pg_dump and
>> >> pg_basebackup.
>> >
>> > Right.
>> >
>> >> Not sure exactly which commits fixed that offhand.
>> >
>> > d923125b77c5d698bb8107a533a21627582baa43 , but that doesn't fix the
>> > 'files' output format, which was removed in
>> > 19f45565f581ce605956c29586bfd277f6012eec
>>
>> Well, we're not going to back-patch the removal of the files format, right?
>
> Obviously not.

So... that means we can't really get rid of these warnings on 9.1,
IIUC.  I agree it would be nice to do if this were no issue, but as it
is I'm inclined to think we should just live with it for the next ~10
months.  After that 9.1 will no longer be a supported branch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] 9.5RC1 wraps *today*

2015-12-15 Thread Tom Lane
A lot got done over the last couple of days to close out open issues for
9.5, and we're now at a point where we could realistically produce an RC
package.  However, we're also staring the holiday season in the face.
If we were to do a wrap per normal schedule next week, we'd be announcing
on Christmas Eve, not a schedule calculated to draw much attention from
potential testers.  The week after is no better.  So either we do it
*right now*, or we slip until the new year.

Accordingly, the release team has decided to wrap 9.5RC1 today, Tuesday,
for announcement Friday the 18th.  I'll start making the tarballs around
5PM (2200 UTC).  Sorry for the short notice, but there's no better way.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fixing warnings in back branches?

2015-12-15 Thread Andres Freund
On 2015-12-15 08:13:06 -0500, Robert Haas wrote:
> On Tue, Dec 15, 2015 at 7:59 AM, Andres Freund  wrote:
> > On 2015-12-14 11:00:32 -0500, Robert Haas wrote:
> >> IIUC, the main thing that causes incompatible pointer type warnings on
> >> 9.1 is the conflation of FILE with gzFile in pg_dump and
> >> pg_basebackup.
> >
> > Right.
> >
> >> Not sure exactly which commits fixed that offhand.
> >
> > d923125b77c5d698bb8107a533a21627582baa43 , but that doesn't fix the
> > 'files' output format, which was removed in
> > 19f45565f581ce605956c29586bfd277f6012eec
> 
> Well, we're not going to back-patch the removal of the files format, right?

Obviously not.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fixing warnings in back branches?

2015-12-15 Thread Robert Haas
On Tue, Dec 15, 2015 at 7:59 AM, Andres Freund  wrote:
> On 2015-12-14 11:00:32 -0500, Robert Haas wrote:
>> On Mon, Dec 14, 2015 at 10:06 AM, Andres Freund  wrote:
>> > to compile 9.1 without warnings. -Wincompatible-pointer-types is quite
>> > useful to detect problems. The rest indeed is pretty 'Meh'.
>>
>> IIUC, the main thing that causes incompatible pointer type warnings on
>> 9.1 is the conflation of FILE with gzFile in pg_dump and
>> pg_basebackup.
>
> Right.
>
>> Not sure exactly which commits fixed that offhand.
>
> d923125b77c5d698bb8107a533a21627582baa43 , but that doesn't fix the
> 'files' output format, which was removed in
> 19f45565f581ce605956c29586bfd277f6012eec

Well, we're not going to back-patch the removal of the files format, right?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fixing warnings in back branches?

2015-12-15 Thread Andres Freund
On 2015-12-14 11:00:32 -0500, Robert Haas wrote:
> On Mon, Dec 14, 2015 at 10:06 AM, Andres Freund  wrote:
> > to compile 9.1 without warnings. -Wincompatible-pointer-types is quite
> > useful to detect problems. The rest indeed is pretty 'Meh'.
>
> IIUC, the main thing that causes incompatible pointer type warnings on
> 9.1 is the conflation of FILE with gzFile in pg_dump and
> pg_basebackup.

Right.

> Not sure exactly which commits fixed that offhand.

d923125b77c5d698bb8107a533a21627582baa43 , but that doesn't fix the
'files' output format, which was removed in
19f45565f581ce605956c29586bfd277f6012eec

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Logical decoding support for sequence advances

2015-12-15 Thread Andres Freund
On 2015-12-15 13:46:29 +0100, Petr Jelinek wrote:
> I don't think that approach alone is good enough. It might be ok for
> selective replication where the replication is driven by tables anyway, but
> in general and especially for failover it's not good enough to tell the user
> that we handle some sequences and they have to fix the rest manually.

I think it solves roughly 80-90% of all usages of sequences. That's a
significant improvement over the status quo.

I'm not saying it's perfect, just that it's applicable to 9.4, and might
be good enough initially. I'm not arguing against adding sequence
decoding here.


> That's not much different than fixing them all in practice as you
> script it anyway.

If you can easily script it, it's just the same type (sequences owned by
a single column), everything else starts to be a bit more complicated anyway.


Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Logical decoding support for sequence advances

2015-12-15 Thread Petr Jelinek

On 2015-12-15 13:17, Andres Freund wrote:

On 2015-12-14 16:19:33 +0800, Craig Ringer wrote:


Needed to make logical replication based failover possible.


While it'll make it easier, I think it's certainly quite possible to do
so without this feature if you accept some sane restrictions. If you
e.g. decide to only handle serial columns (i.e. sequences that are
used/owned by exactly one table & column), you can do a 'good enough'
emulation in the output plugin.

Take something like BDR's bdr_relcache.c (something which you're going
to end up needing for any nontrivial replication solution). Every time
you build a cache entry you look up whether one of its columns owns a
sequence.  When decoding an insert or update to that relation, also emit
an 'increase this sequence to at least XXX' message.

While it's not perfect, this has the big advantage of being doable with 9.4.



I don't think that approach alone is good enough. It might be ok for 
selective replication where the replication is driven by tables anyway, 
but in general and especially for failover it's not good enough to tell the 
user that we handle some sequences and they have to fix the rest 
manually. That's not much different than fixing them all in practice as 
you script it anyway.


However, if it was combined with something like what londiste does, 
which is to check sequences periodically and send last_value + some 
reasonable buffer, it could work well in most cases. In this scenario 
your method would be used for checking that sequence is close to going 
over buffer and firing the periodic send sooner than it would be if it 
was purely time based.
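
For illustration, here is a minimal libpq sketch of that periodic sync
(the connection strings, sequence name, and buffer size are all made-up
placeholders, and real code would loop over all sequences):

#include <stdio.h>
#include <libpq-fe.h>

#define SYNC_BUFFER 1000    /* illustrative safety margin */

int
main(void)
{
    PGconn     *origin = PQconnectdb("dbname=origin");
    PGconn     *replica = PQconnectdb("dbname=replica");
    PGresult   *res;
    char        sql[256];

    if (PQstatus(origin) != CONNECTION_OK ||
        PQstatus(replica) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed\n");
        return 1;
    }

    /* read the current position on the origin */
    res = PQexec(origin, "SELECT last_value FROM my_seq");
    if (PQresultStatus(res) != PGRES_TUPLES_OK)
    {
        fprintf(stderr, "%s", PQerrorMessage(origin));
        return 1;
    }

    /* apply last_value plus a reasonable buffer on the replica */
    snprintf(sql, sizeof(sql), "SELECT setval('my_seq', %s + %d)",
             PQgetvalue(res, 0, 0), SYNC_BUFFER);
    PQclear(res);

    res = PQexec(replica, sql);
    if (PQresultStatus(res) != PGRES_TUPLES_OK)
        fprintf(stderr, "%s", PQerrorMessage(replica));
    PQclear(res);

    PQfinish(origin);
    PQfinish(replica);
    return 0;
}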


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: fix lock contention for HASHHDR.mutex

2015-12-15 Thread Andres Freund
On 2015-12-11 17:00:01 +0300, Aleksander Alekseev wrote:
> The problem is that code between LWLockAcquire (lock.c:881) and
> LWLockRelease (lock.c:1020) can _sometimes_ run up to 3-5 ms. Using
> old-good gettimeofday and logging method I managed to find a bottleneck:
> 
> -- proclock = SetupLockInTable [lock.c:892]
>  `-- proclock = (PROCLOCK *) hash_search_with_hash_value [lock.c:1105]
>`-- currBucket = get_hash_entry(hashp) [dynahash.c:985]
>  `-- SpinLockAcquire(&hctl->mutex) [dynahash.c:1187]
> 
> If my measurements are not wrong (I didn't place gettimeofday between
> SpinLockAquire/SpinLockRelease, etc) we sometimes spend about 3 ms here
> waiting for a spinlock, which doesn't seems right.

Well, it's quite possible that your process was scheduled out, while
waiting for that spinlock. Which'd make 3ms pretty normal.

I'd consider using a LWLock instead of a spinlock here. I've seen this
contended in a bunch of situations, and the queued behaviour, combined
with directed wakeups on the OS level, ought to improve the worst case
behaviour measurably.
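
A standalone way to see that effect (not PostgreSQL code; assumes Linux
and pthreads) is to run a tiny critical section with more threads than
cores and compare a userspace spinlock against a queued mutex under
time(1):

#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define NTHREADS 64         /* deliberately more threads than cores */
#define NITERS   100000

static pthread_spinlock_t spin;
static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static long counter;
static int use_spin;

static void *
worker(void *arg)
{
    int         i;

    (void) arg;
    for (i = 0; i < NITERS; i++)
    {
        if (use_spin)
            pthread_spin_lock(&spin);
        else
            pthread_mutex_lock(&mtx);
        counter++;              /* tiny critical section */
        if (use_spin)
            pthread_spin_unlock(&spin);
        else
            pthread_mutex_unlock(&mtx);
    }
    return NULL;
}

int
main(int argc, char **argv)
{
    pthread_t   th[NTHREADS];
    int         i;

    use_spin = (argc > 1 && strcmp(argv[1], "spin") == 0);
    pthread_spin_init(&spin, PTHREAD_PROCESS_PRIVATE);

    for (i = 0; i < NTHREADS; i++)
        pthread_create(&th[i], NULL, worker, NULL);
    for (i = 0; i < NTHREADS; i++)
        pthread_join(th[i], NULL);

    printf("counter = %ld (%s)\n", counter,
           use_spin ? "spinlock" : "mutex");
    return 0;
}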

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Logical decoding support for sequence advances

2015-12-15 Thread Andres Freund
On 2015-12-14 16:19:33 +0800, Craig Ringer wrote:
> On 14 December 2015 at 11:28, Craig Ringer  wrote:
>
> > Hi all
> >
> > Attached is a patch against 9.6 to add support for informing logical
> > decoding plugins of the new sequence last_value when sequence advance WAL
> > records are processed during decoding.
> >
>
>
> Attached a slightly updated version. It just has less spam in the
> regression tests, by adding a new option to test_decoding to show
> sequences, which it doesn't enable except in sequence specific tests.

I think this is quite the wrong approach. You're calling the logical
decoding callback directly from decode.c, circumventing
reorderbuffer.c. While you don't need the actual reordering, you *do*
need the snapshot integration.  As you've since noticed, you can't just
look into the sequence tuple and be happy with that; you need to do
pg_class lookups et al. Furthermore, callbacks can validly expect to have
a snapshot set.

At the very least what you need to do is to SetupHistoricSnapshot(),
then lookup the actual identity of the sequence using
RelidByRelfilenode() and only then you can call the callback.


> Needed to make logical replication based failover possible.

While it'll make it easier, I think it's certainly quite possible to do
so without this feature if you accept some sane restrictions. If you
e.g. decide to only handle serial columns (i.e. sequences that are
used/owned by exactly one table & column), you can do a 'good enough'
emulation in the output plugin.

Take something like BDR's bdr_relcache.c (something which you're going
to end up needing for any nontrivial replication solution). Every time
you build a cache entry you look up whether one of its columns owns a
sequence.  When decoding an insert or update to that relation, also emit
an 'increase this sequence to at least XXX' message.

While it's not perfect, this has the big advantage of being doable with 9.4.
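
A rough sketch of what that emulation could look like in an output
plugin's change callback; lookup_owned_sequence() and
extract_serial_value() are hypothetical helpers standing in for the
relcache-style lookup described above, so treat this as an outline
rather than working code:

#include "postgres.h"

#include "lib/stringinfo.h"
#include "replication/logical.h"
#include "replication/output_plugin.h"
#include "replication/reorderbuffer.h"
#include "utils/rel.h"

/* hypothetical helpers, e.g. backed by a bdr_relcache.c-style cache */
static Oid lookup_owned_sequence(Relation relation);
static int64 extract_serial_value(Relation relation,
                                  ReorderBufferChange *change);

static void
my_decode_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
                 Relation relation, ReorderBufferChange *change)
{
    Oid         seqoid = lookup_owned_sequence(relation);

    /* ... emit the row change itself, as usual ... */

    if (OidIsValid(seqoid) &&
        (change->action == REORDER_BUFFER_CHANGE_INSERT ||
         change->action == REORDER_BUFFER_CHANGE_UPDATE))
    {
        int64       val = extract_serial_value(relation, change);

        OutputPluginPrepareWrite(ctx, true);
        appendStringInfo(ctx->out,
                         "advance sequence %u to at least " INT64_FORMAT,
                         seqoid, val);
        OutputPluginWrite(ctx, true);
    }
}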

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: fix lock contention for HASHHDR.mutex

2015-12-15 Thread Aleksander Alekseev
Hello, Tom.

I was exploring this issue further and discovered something strange.

"PROCLOCK hash" and "LOCK hash" are hash tables in shared memory. All
memory for these tables is in fact pre-allocated. But for some reason
these two tables are created (lock.c:394) with init_size =/= max_size.
It causes a small overhead from calling the memory allocator after hash
table creation, and additional locking/unlocking.

I checked all other hash tables created via ShmemInitHash. All of these
tables have init_size == max_size. I suggest creating "PROCLOCK hash"
and "LOCK hash" with init_size == max_size too (see attached patch).
Granted, this change doesn't cause any noticeable performance
improvement according to pgbench. Nevertheless it looks like the right
thing to do, and at least it doesn't make anything worse.

If this patch seems OK, the next logical step I believe would be to
remove the init_size parameter from the ShmemInitHash procedure, since
in practice it always equals max_size.

On Fri, 11 Dec 2015 10:30:30 -0500
Tom Lane  wrote:

> Aleksander Alekseev  writes:
> > Turns out PostgreSQL can spend a lot of time waiting for a lock in
> > this particular place, especially if you are running PostgreSQL on
> > 60-core server. Which obviously is a pretty bad sign.
> > ...
> > I managed to fix this behaviour by modifying choose_nelem_alloc
> > procedure in dynahash.c (see attached patch).
> 
> TBH, this is just voodoo.  I don't know why this change would have
> made any impact on lock acquisition performance, and neither do you,
> and the odds are good that it's pure chance that it changed
> anything.  One likely theory is that you managed to shift around
> memory allocations so that something aligned on a cacheline boundary
> when it hadn't before.  But, of course, the next patch that changes
> allocations anywhere in shared memory could change that back.  There
> are lots of effects like this that appear or disappear based on
> seemingly unrelated code changes when you're measuring edge-case
> performance.
> 
> The patch is not necessarily bad in itself.  As time goes by and
> machines get bigger, it can make sense to allocate more memory at a
> time to reduce memory management overhead.  But arguing for it on the
> basis that it fixes lock allocation behavior with 60 cores is just
> horsepucky.  What you were measuring there was steady-state hash
> table behavior, not the speed of the allocate-some-more-memory code
> path.
> 
>   regards, tom lane
> 
> 

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 76fc615..86d2f88 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -373,16 +373,14 @@ void
 InitLocks(void)
 {
 	HASHCTL		info;
-	long		init_table_size,
-				max_table_size;
+	long		shmem_table_size;
 	bool		found;
 
 	/*
 	 * Compute init/max size to request for lock hashtables.  Note these
 	 * calculations must agree with LockShmemSize!
 	 */
-	max_table_size = NLOCKENTS();
-	init_table_size = max_table_size / 2;
+	shmem_table_size = NLOCKENTS();
 
 	/*
 	 * Allocate hash table for LOCK structs.  This stores per-locked-object
@@ -394,14 +392,13 @@ InitLocks(void)
 	info.num_partitions = NUM_LOCK_PARTITIONS;
 
 	LockMethodLockHash = ShmemInitHash("LOCK hash",
-	   init_table_size,
-	   max_table_size,
+	   shmem_table_size,
+	   shmem_table_size,
 	   &info,
 	HASH_ELEM | HASH_BLOBS | HASH_PARTITION);
 
 	/* Assume an average of 2 holders per lock */
-	max_table_size *= 2;
-	init_table_size *= 2;
+	shmem_table_size *= 2;
 
 	/*
 	 * Allocate hash table for PROCLOCK structs.  This stores
@@ -413,8 +410,8 @@ InitLocks(void)
 	info.num_partitions = NUM_LOCK_PARTITIONS;
 
 	LockMethodProcLockHash = ShmemInitHash("PROCLOCK hash",
-		   init_table_size,
-		   max_table_size,
+		   shmem_table_size,
+		   shmem_table_size,
 		   &info,
  HASH_ELEM | HASH_FUNCTION | HASH_PARTITION);
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allow replication roles to use file access functions

2015-12-15 Thread Artur Zakirov

On 03.09.2015 05:40, Michael Paquier wrote:


Ah, OK. I thought that you were referring to a protocol where caller
sends a single LSN from which it gets a differential backup that needs
to scan all the relation files of the source cluster to get the data
blocks with an LSN newer than the one sent, and then sends them back
to the caller.

I guess that what you are suggesting instead is an approach where
caller sends something like that through the replication protocol with
a relation OID and a block list:
BLOCK_DIFF relation_oid BLOCK_LIST m,n,[o, ...]
Which is close to what pg_read_binary_file does now for a superuser.
We would also need to extend BASE_BACKUP so that it does not include
relation files for this use case, though.

Regards,



Hi,

we need to run pg_rewind without using a superuser role too. What if a 
new parameter, EXCLUDE_DATA_FILES, were added to the BASE_BACKUP 
command? This parameter would force the BASE_BACKUP command to not 
include relation files.


And pg_rewind can execute, for example, the following command:

BASE_BACKUP LABEL 'pg_rewind base backup' WAL EXCLUDE_DATA_FILES

This command would be executed if the --source-server parameter is 
defined. Are there any pitfalls in this approach?


--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench stats per script & other stuff

2015-12-15 Thread Fabien COELHO



"sum" is a double so count is converted to 0.0, 0.0/0.0 == NaN, hence the
comment.


PG code usually avoids that, and I recall static analysis tools like
Coverity complaining that this may lead to undefined behavior. While I
agree that this would lead to NaN...


Hmmm. In this case that is what is actually wanted. If there is no 
transaction, the tps or average latency or whatever is "NaN", I cannot 
help it, and IEEE 754 allows that. So in this case the tool is wrong if it 
complains, or at least we are right to ignore the warning. Maybe there is 
some special comment to say "ignore this warning on the next line" if it 
occurs, if this is an issue.
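
For what it's worth, a tiny standalone illustration of the behaviour
(IEEE 754 doubles assumed), including how a caller could detect it:

#include <math.h>
#include <stdio.h>

int
main(void)
{
    int     count = 0;              /* no transactions executed */
    double  total_latency = 0.0;
    double  avg = total_latency / count;    /* 0.0/0.0 yields NaN */

    printf("latency average = %f ms\n", avg);   /* prints nan */
    if (isnan(avg))
        printf("latency average = (no transactions)\n");
    return 0;
}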


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Check for interrupts in bf and xdes crypt()

2015-12-15 Thread Andreas Karlsson

Hi,

Here is a patch which makes it possible to cancel a query which runs the 
crypt() function with the bf or xdes hashing algorithm, e.g. 
crypt('foo', gen_salt('bf', 13)). The md5 algorithm does not run for 
multiple rounds so there is no reason to patch it.


I noticed this problem when I accidentally picked an overly high n for 
the number of hash rounds.


I have added a call to CHECK_FOR_INTERRUPTS() after every round, and I 
could not measure any performance hit from this.


Andreas
diff --git a/contrib/pgcrypto/crypt-blowfish.c b/contrib/pgcrypto/crypt-blowfish.c
index 4054e6a..6feaefc 100644
--- a/contrib/pgcrypto/crypt-blowfish.c
+++ b/contrib/pgcrypto/crypt-blowfish.c
@@ -33,6 +33,7 @@
  */
 
 #include "postgres.h"
+#include "miscadmin.h"
 
 #include "px-crypt.h"
 #include "px.h"
@@ -670,6 +671,8 @@ _crypt_blowfish_rn(const char *key, const char *setting,
 
 	do
 	{
+		CHECK_FOR_INTERRUPTS();
+
 		data.ctx.P[0] ^= data.expanded_key[0];
 		data.ctx.P[1] ^= data.expanded_key[1];
 		data.ctx.P[2] ^= data.expanded_key[2];
diff --git a/contrib/pgcrypto/crypt-des.c b/contrib/pgcrypto/crypt-des.c
index 6829586..a4aa496 100644
--- a/contrib/pgcrypto/crypt-des.c
+++ b/contrib/pgcrypto/crypt-des.c
@@ -61,6 +61,7 @@
  */
 
 #include "postgres.h"
+#include "miscadmin.h"
 
 #include "px-crypt.h"
 
@@ -540,6 +541,8 @@ do_des(uint32 l_in, uint32 r_in, uint32 *l_out, uint32 *r_out, int count)
 
 	while (count--)
 	{
+		CHECK_FOR_INTERRUPTS();
+
 		/*
 		 * Do each round.
 		 */

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers