Re: [HACKERS] Bug in to_timestamp().

2016-08-24 Thread amul sul
Hi Artur Zakirov,

0001-to-timestamp-format-checking-v3.patch looks pretty reasonable to me, other 
than the following concerns:

#1.
Whitespace @ line # 317.

#2. Warning at compilation:

formatting.c: In function ‘do_to_timestamp’:
formatting.c:3049:37: warning: ‘prev_type’ may be used uninitialized in this 
function [-Wmaybe-uninitialized]
if (prev_type == NODE_TYPE_SPACE || prev_type == NODE_TYPE_SEPARATOR)
^
formatting.c:2988:5: note: ‘prev_type’ was declared here
prev_type;
^

You can avoid this by assigning zero (or introducing NODE_TYPE_INVAL) to 
prev_type at the following line: 

256 +   prev_type;
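
To illustrate, a minimal sketch of the suggested fix (NODE_TYPE_SPACE and
NODE_TYPE_SEPARATOR come from the patch under review; NODE_TYPE_INVAL and the
constant values here are illustrative only, not the actual patch):

#define NODE_TYPE_INVAL     0   /* proposed: no previous node seen yet */
#define NODE_TYPE_SPACE     4   /* illustrative values */
#define NODE_TYPE_SEPARATOR 5

static void
check_separators(const int *types, int ntypes)
{
    int     prev_type = NODE_TYPE_INVAL;    /* was left uninitialized */
    int     i;

    for (i = 0; i < ntypes; i++)
    {
        if (prev_type == NODE_TYPE_SPACE || prev_type == NODE_TYPE_SEPARATOR)
        {
            /* ... handle a node that follows whitespace/separator ... */
        }
        prev_type = types[i];
    }
}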



Regards,
Amul Sul




Re: [HACKERS] pg_stat_lwlock wait time view

2016-08-24 Thread Satoshi Nagayasu
2016-08-25 13:46 GMT+09:00 Haribabu Kommi :
> On Thu, Aug 25, 2016 at 6:57 AM, Robert Haas  wrote:
>> On Wed, Aug 24, 2016 at 4:23 AM, Haribabu Kommi
>>  wrote:
>>>
>>> Based on the performance impact with the additional timing calculations,
>>> we can decide the view default behavior, Are there any objections to the
>>> concept?
>>
>> There have been some other recent threads on extending the wait event
>> stuff.  If you haven't already read those, you should, because the
>> issues are closely related.  I think that timing LWLock waits will be
>> quite expensive.  I believe that what the Postgres Pro folks want to
>> do is add up the wait times or maybe keep a history of waits (though
>> maybe I'm wrong about that), but showing them in pg_stat_activity is
>> another idea.  That's likely to add some synchronization overhead
>> which might be even greater in this case than for a feature that just
>> publishes accumulated times, but maybe it's all a drop in the bucket
>> compared to the cost of calling gettimeofday() in the first place.
>
> Yes, I agree this is an issue for the cases where the wait time is smaller
> than the overhead of the logic that is added to calculate it. Even if we use
> clock_gettime with CLOCK_REALTIME_COARSE there will be some
> overhead; this clock method is about 8 times faster than gettimeofday
> but less accurate. Maybe we can use clock_gettime
> instead of gettimeofday in this case, as we may not need the fine-grained
> value.

Is there any other option (other than gettimeofday()) to measure elapsed
time with lower overhead?

I've heard about the RDTSC feature (a hardware counter) supported by recent
processors, and have found a few articles [1][2] suggesting it has lower
overhead than gettimeofday().

[1] 
http://stackoverflow.com/questions/15623343/using-cpu-counters-versus-gettimeofday
[2] http://stackoverflow.com/questions/6498972/faster-equivalent-of-gettimeofday

I'm not sure how much we could benefit from it so far, because I'm not
familiar with this facility and of course I don't have the numbers. In
addition to that, I guess it would bring some portability issues. But I'm
still curious to know more about this stuff.
Does anyone have experience with it?
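
For what it's worth, a minimal x86-only sketch of reading the counter around
a call (assumes GCC/Clang and <x86intrin.h>; converting cycles to time needs
an invariant TSC and calibration, which is part of the portability concern
above):

#include <stdint.h>
#include <stdio.h>
#include <sys/time.h>
#include <x86intrin.h>          /* __rdtsc(), GCC/Clang on x86 only */

int
main(void)
{
    struct timeval tv;
    uint64_t    before;
    uint64_t    after;

    before = __rdtsc();
    gettimeofday(&tv, NULL);
    after = __rdtsc();

    printf("gettimeofday() took ~%llu TSC cycles\n",
           (unsigned long long) (after - before));
    return 0;
}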

>> Personally, my preferred solution is still to have a background worker
>> that samples the published wait events and rolls up statistics, but
>> I'm not sure I've convinced anyone else.  It could report the number
>> of seconds since it detected a wait event other than the current one,
>> which is not precisely the same thing as tracking the length of the
>> current wait but it's pretty close.  I don't know for sure what's best
>> here - I think some experimentation and dialog is needed.
>
> Yes, using a background worker can reduce the load of adding all the
> wait time calculations in the main backend. I can give it a try by modifying
> the direct calculation approach so that a background worker (maybe
> pg_stat_collector) computes the wait time based on the stat messages
> received from the main backend for wait start and wait end.
>
> I am not sure how accurate the data gathered from a background worker can
> be without getting any signal or message from the main backend.

It looks like an accuracy-performance trade-off.
So, I think the use cases matter here to get a better design.

I guess that's the reason why Ilya is looking for feature requests from DBAs
in another thread [3].

[3] 
https://www.postgresql.org/message-id/cag95seuaqvj09kzlwu%2bz1b-gqdmqerzekpfr3hn0q88xzmq...@mail.gmail.com

Regards,
-- 
Satoshi Nagayasu 




Re: [HACKERS] gettimeofday is at the end of its usefulness?

2016-08-24 Thread Haribabu Kommi
On Thu, Jun 9, 2016 at 12:56 AM, Tom Lane  wrote:
> Thom Brown  writes:
>> On 15 May 2014 at 19:56, Bruce Momjian  wrote:
>>> On Tue, May 13, 2014 at 06:58:11PM -0400, Tom Lane wrote:
 A recent question from Tim Kane prompted me to measure the overhead
 costs of EXPLAIN ANALYZE, which I'd not checked in awhile.  Things
 are far worse than I thought.  On my current server (by no means
 lavish hardware: Xeon E5-2609 @2.40GHz) a simple seqscan can run
 at something like 110 nsec per row:
>
>> Did this idea die, or is it still worth considering?
>
> We still have a problem, for sure.  I'm not sure that there was any
> consensus on what to do about it.  Using clock_gettime(CLOCK_REALTIME)
> if available would be a straightforward change that should ameliorate
> gettimeofday()'s 1-usec-precision-limit problem; but it doesn't do
> anything to fix the excessive-overhead problem.  The ideas about the
> latter were all over the map, and none of them looked easy.
>
> If you're feeling motivated to work on this area, feel free.

How about using both CLOCK_REALTIME and CLOCK_REALTIME_COARSE
as the clock IDs in clock_gettime() wherever applicable? The COARSE option
could be used wherever no fine-grained timing calculation is required,
because on my laptop a significant performance difference (around 8x) is
observed compared to CLOCK_REALTIME.

If it is fine, I will try to update the code and send a patch.
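
Something like the following rough micro-benchmark could reproduce the
difference (Linux-only because of CLOCK_REALTIME_COARSE, and it may need -lrt
on older glibc; the numbers will vary by kernel and hardware):

#include <stdio.h>
#include <sys/time.h>
#include <time.h>

#define CALLS 1000000

static void call_gtod(void)   { struct timeval tv;  gettimeofday(&tv, NULL); }
static void call_real(void)   { struct timespec ts; clock_gettime(CLOCK_REALTIME, &ts); }
static void call_coarse(void) { struct timespec ts; clock_gettime(CLOCK_REALTIME_COARSE, &ts); }

static void
bench(void (*fn)(void), const char *name)
{
    struct timespec t0, t1;
    double      elapsed;
    int         i;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < CALLS; i++)
        fn();
    clock_gettime(CLOCK_MONOTONIC, &t1);
    elapsed = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%-22s %.1f ns/call\n", name, elapsed * 1e9 / CALLS);
}

int
main(void)
{
    bench(call_gtod, "gettimeofday");
    bench(call_real, "CLOCK_REALTIME");
    bench(call_coarse, "CLOCK_REALTIME_COARSE");
    return 0;
}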

Regards,
Hari Babu
Fujitsu Australia




Re: [HACKERS] increasing the default WAL segment size

2016-08-24 Thread Wolfgang Wilhelm
Hello hackers,
I'm no PG hacker, so maybe I'm completely wrong; sorry if I have wasted your 
time. I'll try to make the best of Tom Lane's comment.

What would happen if there's a database on a server initialized with an initdb 
(or whatever) parameter -with-wal-size=64MB, and later someone decides to make 
it the master in a replicated system with a slave that lacks that parameter? 
Would the slave work with the "different" WAL size of the master? How could it 
be guaranteed that in such a scenario the replication either works correctly 
or fails with a meaningful error message?

But in general I think a more flexible WAL size is a good idea. 
To answer Andres: you have found one of the (few?) users who would adjust 
initdb parameters.

Regards



Robert Haas  wrote on Thursday, 25 August 2016 at 6:43:

 On Thu, Aug 25, 2016 at 12:35 AM, Andres Freund  wrote:
> FWIW, I'm also doubtful that investing time into making this initdb
> configurable is a good use of time: The number of users that'll adjust
> initdb time parameters is going to be fairly small.

I have to admit that I was skeptical about the idea of doing anything
about this at all the first few times it came up.  16MB ought to be
good enough for anyone!  However, the time between beatings has now
gotten short enough that the bruises don't have time to heal before
the next beating arrives from a completely different customer.  I try
not to hold my views so firmly as to be impervious to contrary
evidence.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




  

Re: [HACKERS] Stopping logical replication protocol

2016-08-24 Thread Craig Ringer
On 25 August 2016 at 09:22, Craig Ringer  wrote:
> On 25 August 2016 at 03:26, Vladimir Gordiychuk  wrote:
>> Hi. A few months have already passed, but the patch is still in "needs review"
>> state. Can I help to speed up the review, or should I wait for the commitfest?
>> I plan to complete the changes in the pgjdbc driver in PR
>> https://github.com/pgjdbc/pgjdbc/pull/550, but that PR is blocked by the current
>> inability to stop logical replication.
>
> The latest patch is the updated one I posted, which was the part of
> yours that allowed return to command mode when logical decoding is
> idle.
>
> If you want to submit an updated version of the second patch, with the
> callback to allow logical decoding cancellation during
> ReorderBufferCommit, that'd be handy, but I don't think it's as
> important as the first one.
>
> As far as I'm concerned the first patch is ready. Please take a look
> and verify that you're happy with my updates and test the updated
> patch. I'll mark it ready to go if you are. It's linked to at
> https://commitfest.postgresql.org/10/621/ .

By the way, I now think that the second part of your patch, to allow
interruption during ReorderBufferCommit processing, is also very
desirable.

Not just for that feature, but because we should be processing client
messages during long ReorderBufferCommit executions so that we can
respond to feedback from the client. Right now, long-running
ReorderBufferCommit runs can trigger wal_sender_timeout because no
feedback is processed.

Alternately, ReorderBufferCommit() could call back into the walsender
with a progress update, without requiring the client to send feedback.
It knows the client is making progress because it's consuming data on
the socket. But I'd rather have client feedback processed, since it
could be useful later for client->server messages to output plugin
callbacks too.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pg_stat_lwlock wait time view

2016-08-24 Thread Haribabu Kommi
On Thu, Aug 25, 2016 at 6:57 AM, Robert Haas  wrote:
> On Wed, Aug 24, 2016 at 4:23 AM, Haribabu Kommi
>  wrote:
>>
>> Based on the performance impact with the additional timing calculations,
>> we can decide the view default behavior, Are there any objections to the
>> concept?
>
> There have been some other recent threads on extending the wait event
> stuff.  If you haven't already read those, you should, because the
> issues are closely related.  I think that timing LWLock waits will be
> quite expensive.  I believe that what the Postgres Pro folks want to
> do is add up the wait times or maybe keep a history of waits (though
> maybe I'm wrong about that), but showing them in pg_stat_activity is
> another idea.  That's likely to add some synchronization overhead
> which might be even greater in this case than for a feature that just
> publishes accumulated times, but maybe it's all a drop in the bucket
> compared to the cost of calling gettimeofday() in the first place.

Yes, I agree this is an issue for the cases where the wait time is smaller
than the overhead of the logic that is added to calculate it. Even if we use
clock_gettime with CLOCK_REALTIME_COARSE there will be some
overhead; this clock method is about 8 times faster than gettimeofday
but less accurate. Maybe we can use clock_gettime
instead of gettimeofday in this case, as we may not need the fine-grained
value.
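
To make the granularity trade-off concrete, a minimal sketch of timing one
wait with the coarse clock (Linux-specific; any wait shorter than the coarse
tick, typically a few milliseconds, simply reads as zero):

#include <time.h>

/* Returns the elapsed wait in seconds, as seen by the coarse clock. */
static double
timed_wait_seconds(void (*wait_fn)(void))
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_REALTIME_COARSE, &t0);
    wait_fn();                  /* e.g. block until the lwlock is granted */
    clock_gettime(CLOCK_REALTIME_COARSE, &t1);

    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}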

> Personally, my preferred solution is still to have a background worker
> that samples the published wait events and rolls up statistics, but
> I'm not sure I've convinced anyone else.  It could report the number
> of seconds since it detected a wait event other than the current one,
> which is not precisely the same thing as tracking the length of the
> current wait but it's pretty close.  I don't know for sure what's best
> here - I think some experimentation and dialog is needed.

Yes, using a background worker can reduce the load of adding all the
wait time calculations in the main backend. I can give it a try by modifying
the direct calculation approach so that a background worker (maybe
pg_stat_collector) computes the wait time based on the stat messages
received from the main backend for wait start and wait end.

I am not sure how accurate the data gathered from a background worker can
be without getting any signal or message from the main backend.
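
As a rough sketch of that message-based idea (all names here are
hypothetical, not an existing PostgreSQL API): the backend sends
wait-start/wait-end messages stamped with its own clock readings, and the
worker only pairs them up and accumulates:

#include <stdint.h>

/* Hypothetical stat message layout - not an actual PostgreSQL struct. */
typedef struct WaitMsg
{
    int         backend_id;     /* assumed already range-checked */
    int         lwlock_id;
    int         is_start;       /* 1 = wait began, 0 = wait ended */
    uint64_t    timestamp_us;   /* taken in the backend, not the worker */
} WaitMsg;

#define MAX_BACKENDS 128
#define MAX_LWLOCKS  64

static uint64_t wait_start_us[MAX_BACKENDS][MAX_LWLOCKS];
static uint64_t total_wait_us[MAX_LWLOCKS];

/* Worker-side handler: accumulate per-lock totals from paired messages. */
static void
process_wait_msg(const WaitMsg *msg)
{
    if (msg->is_start)
        wait_start_us[msg->backend_id][msg->lwlock_id] = msg->timestamp_us;
    else
        total_wait_us[msg->lwlock_id] +=
            msg->timestamp_us - wait_start_us[msg->backend_id][msg->lwlock_id];
}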

Regards,
Hari Babu
Fujitsu Australia




Re: [HACKERS] increasing the default WAL segment size

2016-08-24 Thread Robert Haas
On Thu, Aug 25, 2016 at 12:35 AM, Andres Freund  wrote:
> FWIW, I'm also doubtful that investing time into making this initdb
> configurable is a good use of time: The number of users that'll adjust
> initdb time parameters is going to be fairly small.

I have to admit that I was skeptical about the idea of doing anything
about this at all the first few times it came up.  16MB ought to be
good enough for anyone!  However, the time between beatings has now
gotten short enough that the bruises don't have time to heal before
the next beating arrives from a completely different customer.  I try
not to hold my views so firmly as to be impervious to contrary
evidence.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] increasing the default WAL segment size

2016-08-24 Thread Andres Freund
On 2016-08-25 00:28:58 -0400, Robert Haas wrote:
> On Wed, Aug 24, 2016 at 11:52 PM, Andres Freund  wrote:
> > On 2016-08-24 23:26:51 -0400, Robert Haas wrote:
> >> On Wed, Aug 24, 2016 at 10:54 PM, Andres Freund  wrote:
> >> > and I'm also rather doubtful that it's actually without overhead.
> >>
> >> Really?  Where do you think the overhead would come from?
> >
> > ATM we do math involving XLOG_BLCKSZ in a bunch of places (including
> > doing a lot of %). Some of that happens with exclusive lwlocks held, and
> > some even with a spinlock held IIRC. Making that variable won't be
> > free. Whether it's actually measurable - hard to say. I do remember
> > Heikki fighting hard to simplify some parts of the critical code during
> > xlog scalability stuff, and that that even involved moving minor amounts
> > of math out of critical sections.
> 
> OK, that's helpful context.
> 
> >> What sort of test would you run to try to detect it?
> >
> > Xlog scalability tests (parallel copy, parallel inserts...), and
> > decoding speed (pg_xlogdump --stats?)
> 
> Thanks; that's helpful, too.

FWIW, I'm also doubtful that investing time into making this initdb
configurable is a good use of time: The number of users that'll adjust
initdb time parameters is going to be fairly small.




Re: [HACKERS] increasing the default WAL segment size

2016-08-24 Thread Robert Haas
On Wed, Aug 24, 2016 at 11:52 PM, Andres Freund  wrote:
> On 2016-08-24 23:26:51 -0400, Robert Haas wrote:
>> On Wed, Aug 24, 2016 at 10:54 PM, Andres Freund  wrote:
>> > and I'm also rather doubtful that it's actually without overhead.
>>
>> Really?  Where do you think the overhead would come from?
>
> ATM we do math involving XLOG_BLCKSZ in a bunch of places (including
> doing a lot of %). Some of that happens with exclusive lwlocks held, and
> some even with a spinlock held IIRC. Making that variable won't be
> free. Whether it's actually measurable - hard to say. I do remember
> Heikki fighting hard to simplify some parts of the critical code during
> xlog scalability stuff, and that that even involved moving minor amounts
> of math out of critical sections.

OK, that's helpful context.

>> What sort of test would you run to try to detect it?
>
> Xlog scalability tests (parallel copy, parallel inserts...), and
> decoding speed (pg_xlogdump --stats?)

Thanks; that's helpful, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] increasing the default WAL segment size

2016-08-24 Thread Robert Haas
On Wed, Aug 24, 2016 at 11:41 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> What am I missing?
>
> Maybe nothing.  But I'll point out that of the things that can currently
> be configured at initdb time, such as LC_COLLATE, there is not one single
> one that matters to walsender/walreceiver.  If you think there is zero
> risk involved in introducing a parameter that will matter at that level,
> you have a different concept of risk than I do.
>
> If you'd presented some positive reason why we ought to be taking some
> risk here, I'd be on board.  But you haven't really.  The current default
> value for this parameter is nearly old enough to vote; how is it that
> we suddenly need to make it easily configurable?  Let's just change
> the value and be happy.

I certainly think that's a good first cut.  As I said before, I think
that increasing the value from 16MB to 64MB won't really hurt people
with mostly-default configurations.  max_wal_size=1GB currently means
64 16-MB segments; if it starts meaning 16 64-MB segments, I don't
think that will have much impact on people one way or the other.
Meanwhile, we'll significantly help people who are currently
generating painfully large but not totally insane numbers of WAL
segments.  Someone who is currently generating 32,768 WAL segments per
day - about one every 2.6 seconds - will have a significantly easier
time if they start generating 8,192 WAL segments per day - about one
every 10.5 seconds - instead.  It's just much easier for a reasonably
simple archive command to keep up, "ls" doesn't have as many directory
entries to sort, etc.

However, for people who have really high velocity systems - say
300,000 WAL segments per day - a fourfold increase in the segment size
only gets them down to 75,000 WAL segments per day, which is still
pretty nuts.  High tens of thousands of segments per day is, surely,
easier to manage than low hundreds of thousands, but it still puts
really tight requirements on how fast your archive_command has to run.
On that kind of system, you really want a segment size of maybe 1GB.
In this example that gets you down to ~4700 WAL files per day, or
about one every 18 seconds.  But 1GB is clearly too large to be the
default.
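
For reference, the arithmetic behind those rates (one day = 86,400 seconds;
restating the numbers above):

\[
\frac{86400}{32768} \approx 2.6\ \mathrm{s/segment},\qquad
\frac{86400}{8192} \approx 10.5\ \mathrm{s},\qquad
\frac{300000 \times 16\ \mathrm{MB}}{1\ \mathrm{GB}} \approx 4700\ \mathrm{segments/day}
\;\Rightarrow\; \frac{86400}{4700} \approx 18\ \mathrm{s}.
\]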

I think we're going to run into this issue more and more as people
start running PostgreSQL on larger databases.  In current releases,
the cost of wraparound autovacuums can easily be the limiting factor
here: the I/O cost is proportional to the XID burn rate multiplied by
the entire size of the database.  So mostly read-only databases or
databases that only take batch loads can be fine even if they are
really big, but it's hard to scale databases that do lots of
transaction processing beyond a certain size because you just end up
running continuous wraparound vacuums and eventually you can't even do
that fast enough.  The freeze map changes in 9.6 should help with this
problem, though, at least for databases that have hot spots rather
than uniform access, which is of course very common. I think the
result of that is likely to be that people try to scale up PostgreSQL
to larger databases than ever before.  New techniques for indexing
large amounts of data (like BRIN) and for querying it (like parallel
query, especially once we support having the driving scan be a bitmap
heap scan) are going to encourage people in that direction, too.
You're asking why we suddenly need to make this configurable as if it
were a surprising need, but I think it would be more surprising if
scaling up didn't create some new needs.  I can't think of any reason
why a 100TB database and a 100MB database should both want to use the
same WAL segment size, and I think we want to support both of those
things.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Write Ahead Logging for Hash Indexes

2016-08-24 Thread Amit Kapila
On Wed, Aug 24, 2016 at 11:46 PM, Alvaro Herrera
 wrote:
> Amit Kapila wrote:
>> $SUBJECT will make hash indexes reliable and usable on standby.
>
> Nice work.
>
> Can you split the new xlog-related stuff to a new file, say hash_xlog.h,
> instead of cramming it in hash.h?  Removing the existing #include
> "xlogreader.h" from hash.h would be nice.  I volunteer for pushing any
> preliminary header cleanup commits.
>

So, what you are expecting here is that we create hash_xlog.h and move
necessary functions like hash_redo(), hash_desc() and hash_identify()
to it. Then do whatever else is required to build it successfully.
Once that is done, I can build my patches on top of it.  Is that
right?  If yes, I can work on it.
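
Something like this minimal header sketch, I assume (the three declarations
are the ones that exist in hash.h today; whether anything else must move to
make the build work is the part to figure out):

/* Sketch of a possible src/include/access/hash_xlog.h */
#ifndef HASH_XLOG_H
#define HASH_XLOG_H

#include "access/xlogreader.h"
#include "lib/stringinfo.h"

extern void hash_redo(XLogReaderState *record);
extern void hash_desc(StringInfo buf, XLogReaderState *record);
extern const char *hash_identify(uint8 info);

#endif   /* HASH_XLOG_H */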

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] increasing the default WAL segment size

2016-08-24 Thread Andres Freund
On 2016-08-24 23:26:51 -0400, Robert Haas wrote:
> On Wed, Aug 24, 2016 at 10:54 PM, Andres Freund  wrote:
> > and I'm also rather doubtful that it's actually without overhead.
> 
> Really?  Where do you think the overhead would come from?

ATM we do math involving XLOG_BLCKSZ in a bunch of places (including
doing a lot of %). Some of that happens with exclusive lwlocks held, and
some even with a spinlock held IIRC. Making that variable won't be
free. Whether it's actually measurable - hard to say. I do remember
Heikki fighting hard to simplify some parts of the critical code during
xlog scalability stuff, and that that even involved moving minor amounts
of math out of critical sections.

> What sort of test would you run to try to detect it?

Xlog scalability tests (parallel copy, parallel inserts...), and
decoding speed (pg_xlogdump --stats?)




Re: [HACKERS] increasing the default WAL segment size

2016-08-24 Thread Tom Lane
Robert Haas  writes:
> What am I missing?

Maybe nothing.  But I'll point out that of the things that can currently
be configured at initdb time, such as LC_COLLATE, there is not one single
one that matters to walsender/walreceiver.  If you think there is zero
risk involved in introducing a parameter that will matter at that level,
you have a different concept of risk than I do.

If you'd presented some positive reason why we ought to be taking some
risk here, I'd be on board.  But you haven't really.  The current default
value for this parameter is nearly old enough to vote; how is it that
we suddenly need to make it easily configurable?  Let's just change
the value and be happy.

regards, tom lane




Re: [HACKERS] patch proposal

2016-08-24 Thread Venkata B Nagothi
On Thu, Aug 18, 2016 at 3:37 PM, Michael Paquier 
wrote:

> On Tue, Aug 16, 2016 at 11:06 PM, Stephen Frost 
> wrote:
> > I could see supporting an additional "pause" option that means "pause at
> > the end of WAL if you don't reach the recovery target point".  I'd also
> > be happy with a warning being emitted in the log if the recovery target
> > point isn't reached before reaching the end of WAL, but I don't think it
> > makes sense to change the existing behavior.
>
> Indeed, let's not change the existing behavior. A warning showing up
> by default would be useful in itself, even if there are people that I
> think set up overly high recovery targets to be sure to replay WAL as
> much as possible. As recovery_target_action has meaning when a
> recovery target has been reached, I would guess that we would want a
> new option that has the same mapping value as recovery_target_action,
> except that it activates when the target recovery is *not* reached.
> Hence it would be possible to shutdown, pause or promote at will when
> recovery completes, and be able to take a separate action if the
> recovery target is indeed reached. The default of this parameter would
> be "promote", which is what happens now.
>

Yes, a new parameter with the same options as recovery_target_action is the
idea I had in mind as well, and I have the following queries while working
through the patch design:

*Query 1*

What about the existing parameter "recovery_target", which accepts only
one value, "immediate"? Its behaviour will be similar to the "promote"
option of the to-be-introduced new parameter.
Since this parameter's behaviour will be incorporated into the new
parameter, I think it can be deprecated from the next PostgreSQL version.

*Query 2*

I am thinking that the new parameter name should be
"recovery_target_incomplete" or "recovery_target_incomplete_action", which
(by name) suggests that the recovery target point is not yet reached, and
which accepts the options "pause", "promote" and "shutdown".

The other alternative name I thought of was
"recovery_target_immediate_action", which (by name) suggests the action to
be taken when recovery does not reach the actual recovery target and
instead stops at the immediate consistent point.
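
For example, a recovery.conf along these lines; recovery_target_time and
recovery_target_action are existing parameters, while the incomplete-action
name is only the proposal being discussed here:

# recovery.conf sketch (proposed parameter included)
recovery_target_time = '2016-08-24 12:00:00'
recovery_target_action = 'pause'                # when the target IS reached
recovery_target_incomplete_action = 'shutdown'  # proposed: when end of WAL
                                                # arrives before the target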

Any comments, suggestions ?

Regards,
Venkata B N

Fujitsu Australia


Re: [HACKERS] increasing the default WAL segment size

2016-08-24 Thread Tom Lane
Robert Haas  writes:
> On Wed, Aug 24, 2016 at 10:33 PM, Tom Lane  wrote:
>> ... but I think this is just folly.  You'd have to do major amounts
>> of work to keep, eg, slave servers on the same page as the master
>> about what the segment size is.

> I said an initdb-time parameter, meaning not capable of being changed
> within the lifetime of the cluster.  So I don't see how the slave
> servers would get out of sync?

The point is that that now becomes something to worry about.  I do not
think I have to exhibit a live bug within five minutes' thought before
saying that it's a risk area.  It's something that we simply have not
worried about before, and IME that generally means there's some squishy
things there.

regards, tom lane




Re: [HACKERS] increasing the default WAL segment size

2016-08-24 Thread Robert Haas
On Wed, Aug 24, 2016 at 11:02 PM, Tom Lane  wrote:
> Andres Freund  writes:
>> On 2016-08-24 22:33:49 -0400, Tom Lane wrote:
>>> ... but I think this is just folly.  You'd have to do major amounts
>>> of work to keep, eg, slave servers on the same page as the master
>>> about what the segment size is.
>
>> Don't think it'd actually be all that complicated, we already verify
>> the compatibility of some things.  But I'm doubtful it's worth it, and
>> I'm also rather doubtful that it's actually without overhead.
>
> My point is basically that it'll introduce failure modes that we don't
> currently concern ourselves with.  Yes, you can do configure
> --with-wal-segsize, but it's on your own head whether the resulting build
> will interoperate with anything else --- and I'm quite sure nobody tests,
> eg, walsender or walreceiver to see if they fail sanely in such cases.
> I don't think we'd get to take such a laissez-faire position with respect
> to an initdb option.

I am really confused by this.  If you connect a slave to a master
other than the one that you cloned to create the slave, of course
that's going to fail.  But if the slave is cloned from the master,
then the segment size is going to match.  It seems like the only thing
we need to do to make this work is make sure to get the segment size
from the control file rather than anywhere else, which doesn't seem
very difficult.  What am I missing?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] \timing interval

2016-08-24 Thread Corey Huinker
On Wed, Aug 24, 2016 at 10:36 PM, Gerdan Santos  wrote:

> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, passed
> Implements feature:   tested, passed
> Spec compliant:   tested, passed
> Documentation:tested, passed
>
> Sorry, my mistake!
>
> I could not find a way to disable this functionality. I see that the
> impact can be big, as it changes the output structure of \timing without a
> mode to disable it. I find it great functionality, but it needs a way to
> restore the default.
>
>
>
Thanks for reviewing! I'm not really sure how to proceed, so I'll try to
summarize where it stands. Apologies if I
mischaracterize/misattribute/misremember someone's position.

Generally speaking, people disliked the third mode for \timing, and were
broadly fine with AndrewG's idea of printing the timing in both raw
milliseconds and a more human-digestible format, which means that we can:

1. keep the format exactly as is, ignoring locale issues
 + It's already done
 + lightweight
 + TomL believes there will be no confusion
 - others disagree
2. we fish out the proper locale-specific abbreviations for
days/hours/minutes/seconds
 + no additional settings
 + locale stuff can't be that hard
 - I've never dealt with it (American, surprise)
3. ignore locales and fall back to a left-trimmed DDD HH:MM:SS.mmm format
 + Easy to revert to that code
 + My original format and one PeterE advocated
 - others disliked
4. we have a \pset that sets fixed scale (seconds, minutes, hours, days),
sliding scale (what's displayed now), or interval
 + some flexibility with some easy config values
 - still have the locale issue
 - likely will miss a format somebody wanted
5. The \pset option is a time format string like "%d %h:%m:%s".
 + maximum flexibility
 + sidesteps the locale issue by putting it in the user's hands
 - those format strings are sometimes tough for users to grok
 - ISO 8601 isn't much of a help as it doesn't handle milliseconds
 - additional config variable
 - documentation changes
 - debate about what the default should be. GOTO 1.

I personally would be happy with any of these options, so I think we should
get some more feedback to see if a consensus emerges. It's a tiny patch and
trivial to test, so we have time(ing) on our side.


Re: [HACKERS] increasing the default WAL segment size

2016-08-24 Thread Robert Haas
On Wed, Aug 24, 2016 at 10:54 PM, Andres Freund  wrote:
> On 2016-08-24 22:33:49 -0400, Tom Lane wrote:
>> > Possibly it would make sense for this to be configurable at initdb
>> > time instead of requiring a recompile;
>>
>> ... but I think this is just folly.  You'd have to do major amounts
>> of work to keep, eg, slave servers on the same page as the master
>> about what the segment size is.
>
> Don't think it'd actually be all that complicated, we already verify
> the compatibility of some things.  But I'm doubtful it's worth it, and
> I'm also rather doubtful that it's actually without overhead.

Really?  Where do you think the overhead would come from?  What sort
of test would you run to try to detect it?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] increasing the default WAL segment size

2016-08-24 Thread Robert Haas
On Wed, Aug 24, 2016 at 10:33 PM, Tom Lane  wrote:
> ... but I think this is just folly.  You'd have to do major amounts
> of work to keep, eg, slave servers on the same page as the master
> about what the segment size is.

I said an initdb-time parameter, meaning not capable of being changed
within the lifetime of the cluster.  So I don't see how the slave
servers would get out of sync?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] increasing the default WAL segment size

2016-08-24 Thread Tom Lane
Andres Freund  writes:
> On 2016-08-24 22:33:49 -0400, Tom Lane wrote:
>> ... but I think this is just folly.  You'd have to do major amounts
>> of work to keep, eg, slave servers on the same page as the master
>> about what the segment size is.

> Don't think it'd actually be all that complicated, we already verify
> the compatibility of some things.  But I'm doubtful it's worth it, and
> I'm also rather doubtful that it's actually without overhead.

My point is basically that it'll introduce failure modes that we don't
currently concern ourselves with.  Yes, you can do configure
--with-wal-segsize, but it's on your own head whether the resulting build
will interoperate with anything else --- and I'm quite sure nobody tests,
eg, walsender or walreceiver to see if they fail sanely in such cases.
I don't think we'd get to take such a laissez-faire position with respect
to an initdb option.

regards, tom lane




Re: [HACKERS] increasing the default WAL segment size

2016-08-24 Thread Andres Freund
On 2016-08-24 22:33:49 -0400, Tom Lane wrote:
> > Possibly it would make sense for this to be configurable at initdb
> > time instead of requiring a recompile;
> 
> ... but I think this is just folly.  You'd have to do major amounts
> of work to keep, eg, slave servers on the same page as the master
> about what the segment size is.

Don't think it'd actually be all that complicated, we already verify
the compatibility of some things.  But I'm doubtful it's worth it, and
I'm also rather doubtful that it's actually without overhead.

Andres




Re: [HACKERS] increasing the default WAL segment size

2016-08-24 Thread Tsunakawa, Takayuki
> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Robert Haas
> Considering those three factors, I think we should consider pushing the
> default value up somewhat higher for v10.  Reverting to the 64MB size that
> we had prior to 47937403676d913c0e740eec6b85113865c6c8ab
> sounds pretty reasonable.

+1
The other downside is that the response time of transactions may degrade when 
they have to wait for a new WAL segment to be created.  That might pop up as 
occasionally slow or higher maximum response times, which would be a mystery 
to users.  Maybe it's time to use posix_fallocate() to create WAL segments.
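
A minimal sketch of that idea, for discussion (PostgreSQL today zero-fills
new segments; whether posix_fallocate() gives equivalent guarantees on every
filesystem is exactly what would need testing, and the file name below is
illustrative):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define WAL_SEG_SIZE (16 * 1024 * 1024)

int
main(void)
{
    int     fd;
    int     rc;

    fd = open("00000001000000000000002A", O_CREAT | O_EXCL | O_WRONLY, 0600);
    if (fd < 0)
    {
        perror("open");
        return 1;
    }

    /* Reserve the blocks without writing 16MB of zeroes through the cache. */
    rc = posix_fallocate(fd, 0, WAL_SEG_SIZE);
    if (rc != 0)                    /* returns an errno value, not -1 */
    {
        fprintf(stderr, "posix_fallocate: %s\n", strerror(rc));
        return 1;
    }

    if (fsync(fd) != 0)
        perror("fsync");
    close(fd);
    return 0;
}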


> Possibly it would make sense for this to be configurable at initdb time
> instead of requiring a recompile; we probably don't save any significant
> number of cycles by compiling this into the server.

+1

> 3. archive_timeout is no longer a frequently used option.  Obviously, if
> you are frequently archiving partial segments, you don't want the segment
> size to be too large, because if it is, each forced segment switch
> potentially wastes a large amount of space (and bandwidth).
> But given streaming replication and pg_receivexlog, the use case for
> archiving partial segments is, at least according to my understanding, a
> lot narrower than it used to be.  So, I think we don't have to worry as
> much about keeping forced segment switches cheap as we did during the 8.x
> series.

I'm not sure about this.  I know that users (many or not) use continuous 
archiving with archive_command and archive_timeout for backups, and don't 
want to use streaming replication, because the system is not worth the cost 
and trouble of HA.  I heard from a few users that they were surprised to 
learn that PostgreSQL generates WAL even when no update transaction is 
happening.  Is this still true?

Regards
Takayuki Tsunakawa




Re: [HACKERS] \timing interval

2016-08-24 Thread Gerdan Santos
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:tested, passed

Sorry, my mistake!

I could not find a way to disable this functionality. I see that the impact 
can be big, as it changes the output structure of \timing without a mode to 
disable it. I find it great functionality, but it needs a way to restore the 
default.



Re: [HACKERS] increasing the default WAL segment size

2016-08-24 Thread Tom Lane
Robert Haas  writes:
> I'd like to propose that we increase the default WAL segment size,
> which is currently 16MB.

That seems like a reasonable thing to consider ...

> Possibly it would make sense for this to be configurable at initdb
> time instead of requiring a recompile;

... but I think this is just folly.  You'd have to do major amounts
of work to keep, eg, slave servers on the same page as the master
about what the segment size is.  Better to keep treating it like
BLCKSZ, as a fixed parameter of a build.  (There's a reason why we
keep this number in pg_control.)

regards, tom lane




Re: [HACKERS] \timing interval

2016-08-24 Thread Gerdan Santos
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:   tested, failed
Spec compliant:   tested, failed
Documentation:tested, failed

I could not find a way to disable this functionality. I see that the impact 
can be big, as it changes the output structure of \timing without a mode to 
disable it. I find it great functionality, but it needs a way to restore the 
default.

The new status of this patch is: Waiting on Author



Re: [HACKERS] increasing the default WAL segment size

2016-08-24 Thread Claudio Freire
On Wed, Aug 24, 2016 at 10:31 PM, Robert Haas  wrote:
> 1. Transaction rates are vastly higher these days.  In 1999, I think
> we were still limited to ~2^32 transactions during the entire lifetime
> of the server; transaction ID wraparound hadn't been invented yet.[1]
> Today, some installations do that many write transactions in under a
> week.  The practical consequence of this is that WAL files fill up in
> extremely short periods of time. Some users generate multiple
> terabytes of WAL per day, which means they are generating - and very
> likely archiving - WAL files at a rate of greater than 1 per second!
> That poses multiple problems. For example, if your archive command
> happens to involve ssh, you might run into trouble because of this
> sort of thing:
>
> [rhaas pgsql]$ /usr/bin/time ssh hydra true
> 1.57 real 0.00 user 0.00 sys
...
> Considering those three factors, I think we should consider pushing
> the default value up somewhat higher for v10.  Reverting to the 64MB
> size that we had prior to 47937403676d913c0e740eec6b85113865c6c8ab
> sounds pretty reasonable.  Users with really high transaction rates
> might even prefer a higher value (e.g. 256MB, 1GB) but that's hardly
> practical for small installs given our default of max_wal_size = 1GB.
> Possibly it would make sense for this to be configurable at initdb
> time instead of requiring a recompile; we probably don't save any
> significant number of cycles by compiling this into the server.

FWIW, +1

We're already hurt by the small segments due to a phenomenon similar to
the ssh case: TCP slow start. Designing the archive/recovery
command to work around TCP slow start is quite complex, and bigger
segments would just be a better thing.

Not to mention that bigger segments compress better.




[HACKERS] increasing the default WAL segment size

2016-08-24 Thread Robert Haas
Hi,

I'd like to propose that we increase the default WAL segment size,
which is currently 16MB.  It was first set to that value in commit
47937403676d913c0e740eec6b85113865c6c8ab in October of 1999; prior to
that, it was 64MB.  Between 1999 and now, there have been three
significant changes that make me think it might be time to rethink
this value:

1. Transaction rates are vastly higher these days.  In 1999, I think
we were still limited to ~2^32 transactions during the entire lifetime
of the server; transaction ID wraparound hadn't been invented yet.[1]
Today, some installations do that many write transactions in under a
week.  The practical consequence of this is that WAL files fill up in
extremely short periods of time. Some users generate multiple
terabytes of WAL per day, which means they are generating - and very
likely archiving - WAL files at a rate of greater than 1 per second!
That poses multiple problems. For example, if your archive command
happens to involve ssh, you might run into trouble because of this
sort of thing:

[rhaas pgsql]$ /usr/bin/time ssh hydra true
1.57 real 0.00 user 0.00 sys

Also, your operating system's implementation of directories and the
commands to work with them (like ls) don't necessarily scale well to
tens or hundreds of thousands of archived files.

Furthermore, there is an enforced, synchronous fsync at the end of
every segment, which actually does hurt performance on write-heavy
workloads.[2] Of course, if that were the only reason to consider
increasing the segment size, it would probably make more sense to just
try to push that extra fsync into the background, but that's not
really the case.  From what I hear, the gigantic number of files is a
bigger pain point.

2. Disks are a bit larger these days.  In the worst case, we waste
just under twice as much space as whatever the segment size is: you
might need 1 byte from the oldest segment you're keeping and 1 byte
from the newest segment that you are keeping, but not the remaining
contents of either file.  In 1999, trying to limit disk wastage to
<32MB probably seemed reasonable, but today that's very little disk
space.  I think at that time typical hard drive sizes were around 10
GB, whereas today they are around 1 TB.[3] I'm not sure whether the
size of the sorts of high-performance storage that is likely to be
used for pg_xlog has grown as fast as hard drives generally, but even
so it seems pretty clear to me that trying to limit disk wastage to
32MB is excessively conservative on modern hardware.

3. archive_timeout is no longer a frequently used option.  Obviously,
if you are frequently archiving partial segments, you don't want the
segment size to be too large, because if it is, each forced segment
switch potentially wastes a large amount of space (and bandwidth).
But given streaming replication and pg_receivexlog, the use case for
archiving partial segments is, at least according to my understanding,
a lot narrower than it used to be.  So, I think we don't have to worry
as much about keeping forced segment switches cheap as we did during
the 8.x series.

Considering those three factors, I think we should consider pushing
the default value up somewhat higher for v10.  Reverting to the 64MB
size that we had prior to 47937403676d913c0e740eec6b85113865c6c8ab
sounds pretty reasonable.  Users with really high transaction rates
might even prefer a higher value (e.g. 256MB, 1GB) but that's hardly
practical for small installs given our default of max_wal_size = 1GB.
Possibly it would make sense for this to be configurable at initdb
time instead of requiring a recompile; we probably don't save any
significant number of cycles by compiling this into the server.

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] I believe at that time we consumed an XID even for a read-only
transaction, too; today, we can do 2^32 read transactions in a few
hours.
[2] Amit did some benchmarking on this, I believe, but I don't have
the numbers handy.
[3] https://commons.wikimedia.org/wiki/File:Hard_drive_capacity_over_time.png




Re: [HACKERS] Better tracking of free space during SP-GiST index build

2016-08-24 Thread Tomas Vondra



On 08/25/2016 01:45 AM, Tom Lane wrote:

> Over in the thread about the SP-GiST inet opclass, I threatened to post
> a patch like this, and here it is.
>
> The basic idea is to track more than just the very latest page we've used
> in each of the page categories that SP-GiST works with.  I started with an
> arrangement that gave an equal number of cache slots to each category, but
> soon realized that that was dumb, because there are usually way more leaf
> pages than anything else.  So this version has a little table of how many
> slots to give to each category.  The constants could maybe use a bit more
> fiddling, if we have some more test data sets to try this on.
>
> On the IRRExplorer data set we discussed in the other thread, this reduces
> the index size from 132MB to 120MB.  Poking into that more closely with
> pg_filedump, the total free space within the index drops from 42MB to
> 28MB.  If you think those numbers don't add up, you're right --- this
> seems to result in more non-leaf tuples than before.  I'm not sure why;
> maybe more aggressive sucking up of free space results in more splits.
> (Maybe adjustment of the default spgist fillfactor would be in order
> to counteract that?)  But the index search time doesn't seem to be hurt,
> so perhaps there's nothing to worry about.
>
> As coded, this makes no attempt to preferentially select pages with the
> most or least free space.  I don't know if it'd be worth any cycles to
> do that.
>
> I'll put this in the commitfest queue.  It could use review from someone
> with the time and motivation to do performance testing/tuning.



I can do a bit of benchmarking on this, I guess - possibly next week, 
but I can't promise that 100%. I'm not an SP-GiST expert and I won't have 
time to dive into the code, so it'll be mostly blackbox testing. Any 
hints on what would be worth/interesting to test?

ISTM it'd be interesting to test index creation, maintenance and 
querying, while varying the fillfactor. What other parameters would be 
interesting to tweak, and what datasets might be useful?


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] How to do failover in pglogical replication?

2016-08-24 Thread Craig Ringer
On 24 August 2016 at 19:42, roshan_myrepublic  wrote:

> Now, I am able to see the last added row in my subscriber table. All the
> other 4 rows which were added in the beginning are still missing. What am I
> doing wrong here?

Hi. This isn't really on topic for the pgsql-hackers mailing list.
We're adding a pglogical mailing list now that it's clear it's not
going into core, but in the meantime feel free to raise this on the
github tracker for the project.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pg_dump with tables created in schemas created by extensions

2016-08-24 Thread Martín Marqués
2016-08-24 21:34 GMT-03:00 Michael Paquier :
>
> Yes, you are right. If I look at the diffs this morning I am seeing
> the ACLs being dumped for this aggregate. So we could just fix the
> test and be done with it. I did not imagine the index issue though,
> and this is broken for some time, so that's not exclusive to 9.6 :)

Hi Michael,

Do you see any easier way than what I mentioned earlier (adding a
selectDumpableIndex() function) to fix the index dumping issue?

Regards,

-- 
Martín Marqués   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Stopping logical replication protocol

2016-08-24 Thread Craig Ringer
On 25 August 2016 at 03:26, Vladimir Gordiychuk  wrote:
> Hi. A few months have already passed, but the patch is still in "needs review"
> state. Can I help to speed up the review, or should I wait for the commitfest?
> I plan to complete the changes in the pgjdbc driver in PR
> https://github.com/pgjdbc/pgjdbc/pull/550, but that PR is blocked by the current
> inability to stop logical replication.

The latest patch is the updated one I posted, which was the part of
yours that allowed return to command mode when logical decoding is
idle.

If you want to submit an updated version of the second patch, with the
callback to allow logical decoding cancellation during
ReorderBufferCommit, that'd be handy, but I don't think it's as
important as the first one.

As far as I'm concerned the first patch is ready. Please take a look
and verify that you're happy with my updates and test the updated
patch. I'll mark it ready to go if you are. It's linked to at
https://commitfest.postgresql.org/10/621/ .


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pg_dump with tables created in schemas created by extensions

2016-08-24 Thread Michael Paquier
On Wed, Aug 24, 2016 at 11:15 PM, Stephen Frost  wrote:
> * Michael Paquier (michael.paqu...@gmail.com) wrote:
>> The patch attached includes all those tests and they are failing. We
>> are going to need a patch able to pass all that, and even for master
>> this is going to need more thoughts, and let's focus on HEAD/9.6
>> first.
>
> Are you sure you have the tests correct..?   At least for testagg(), it
> looks like you're testing for:
>
> GRANT ALL ON FUNCTION test_agg(int2) TO regress_dump_test_role;
>
> but what's in the dump is (equivilantly):
>
> GRANT ALL ON FUNCTION test_agg(smallint) TO regress_dump_test_role;
>
> I've not looked into all the failures, but at least this one seems like
> an issue in the test, not an issue in pg_dump.

Yes, you are right. If I look at the diffs this morning I am seeing
the ACLs being dumped for this aggregate. So we could just fix the
test and be done with it. I did not imagine the index issue though,
and it has been broken for some time, so that's not exclusive to 9.6 :)
-- 
Michael




Re: [HACKERS] pg_stat_lwlock wait time view

2016-08-24 Thread Tsunakawa, Takayuki
From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Haribabu Kommi
> calculations may cause performance problems. Is there any need to write
> this stats information to a file, as it just provides the wait time
> information?

Yes, saving the workload diagnosis information would be nice like Oracle AWR.  
I mean not the current accumulated total, but a series of snapshots taken 
periodically.  It would enable:

* Daily monitoring across database restarts.  e.g. The response times of 
applications degraded after applying a patch.  What's the difference between 
before and after the patch application?

* Hint on troubleshooting a crash failure.  e.g. Excessive memory use by 
PostgreSQL crashed the OS.  What was the workload like just before the crash?

The point of discussion may be whether PostgreSQL itself provides the feature 
to accumulate performance diagnosis information on persistent storage.  
pg_statsinfo would be able to take that on, but it wouldn't be convenient or 
efficient.

Regards
Takayuki Tsunakawa





[HACKERS] Better tracking of free space during SP-GiST index build

2016-08-24 Thread Tom Lane
Over in the thread about the SP-GiST inet opclass, I threatened to post
a patch like this, and here it is.

The basic idea is to track more than just the very latest page we've used
in each of the page categories that SP-GiST works with.  I started with an
arrangement that gave an equal number of cache slots to each category, but
soon realized that that was dumb, because there are usually way more leaf
pages than anything else.  So this version has a little table of how many
slots to give to each category.  The constants could maybe use a bit more
fiddling, if we have some more test data sets to try this on.

On the IRRExplorer data set we discussed in the other thread, this reduces
the index size from 132MB to 120MB.  Poking into that more closely with
pg_filedump, the total free space within the index drops from 42MB to
28MB.  If you think those numbers don't add up, you're right --- this
seems to result in more non-leaf tuples than before.  I'm not sure why;
maybe more aggressive sucking up of free space results in more splits.
(Maybe adjustment of the default spgist fillfactor would be in order
to counteract that?)  But the index search time doesn't seem to be hurt,
so perhaps there's nothing to worry about.

As coded, this makes no attempt to preferentially select pages with the
most or least free space.  I don't know if it'd be worth any cycles to
do that.

I'll put this in the commitfest queue.  It could use review from someone
with the time and motivation to do performance testing/tuning.

regards, tom lane

diff --git a/src/backend/access/spgist/spgutils.c b/src/backend/access/spgist/spgutils.c
index d570ae5..95c45fa 100644
*** a/src/backend/access/spgist/spgutils.c
--- b/src/backend/access/spgist/spgutils.c
***
*** 27,32 
--- 27,56 
  
  
  /*
+  * This array defines how many entries of the lastUsedPages[] cache are
+  * reserved for index pages of each classification known to SpGistGetBuffer().
+  */
+ struct LUPMapEntry
+ {
+ 	int			count;
+ 	int			start;			/* must equal sum of preceding counts */
+ };
+ 
+ static const struct LUPMapEntry lastUsedPagesMap[] = {
+ 	{8, 0},		/* inner pages, parity 0 */
+ 	{8, 8},		/* inner pages, parity 1 */
+ 	{8, 16},	/* inner pages, parity 2 */
+ 	{60, 24},	/* leaf pages */
+ 	{2, 84},	/* inner pages for nulls, parity 0 */
+ 	{2, 86},	/* inner pages for nulls, parity 1 */
+ 	{2, 88},	/* inner pages for nulls, parity 2 */
+ 	{10, 90}	/* leaf pages for nulls */
+ 
+ #define LASTUSEDPAGESMAP_END 100	/* must equal SPGIST_CACHED_PAGES */
+ };
+ 
+ 
+ /*
   * SP-GiST handler function: return IndexAmRoutine with access method parameters
   * and callbacks.
   */
*** spghandler(PG_FUNCTION_ARGS)
*** 35,40 
--- 59,67 
  {
  	IndexAmRoutine *amroutine = makeNode(IndexAmRoutine);
  
+ 	StaticAssertStmt(LASTUSEDPAGESMAP_END == SPGIST_CACHED_PAGES,
+ 	 "lastUsedPagesMap[] does not match SPGIST_CACHED_PAGES");
+ 
  	amroutine->amstrategies = 0;
  	amroutine->amsupport = SPGISTNProc;
  	amroutine->amcanorder = false;
*** spgGetCache(Relation index)
*** 129,140 
  
  		metadata = SpGistPageGetMeta(BufferGetPage(metabuffer));
  
! 		if (metadata->magicNumber != SPGIST_MAGIC_NUMBER)
  			elog(ERROR, "index \"%s\" is not an SP-GiST index",
   RelationGetRelationName(index));
  
- 		cache->lastUsedPages = metadata->lastUsedPages;
- 
  		UnlockReleaseBuffer(metabuffer);
  
  		index->rd_amcache = (void *) cache;
--- 156,179 
  
  		metadata = SpGistPageGetMeta(BufferGetPage(metabuffer));
  
! 		if (metadata->magicNumber == SPGIST_MAGIC_NUMBER)
! 			cache->lastUsedPages = metadata->lastUsedPages;
! 		else if (metadata->magicNumber == SPGIST_OLD_MAGIC_NUMBER)
! 		{
! 			/*
! 			 * We could make an effort to slurp up the pages that exist in the
! 			 * old-format cache, but it's probably not worth the trouble. Just
! 			 * init our cache to empty.
! 			 */
! 			int			i;
! 
! 			for (i = 0; i < SPGIST_CACHED_PAGES; i++)
! 				cache->lastUsedPages.cachedPage[i].blkno = InvalidBlockNumber;
! 		}
! 		else
  			elog(ERROR, "index \"%s\" is not an SP-GiST index",
   RelationGetRelationName(index));
  
  		UnlockReleaseBuffer(metabuffer);
  
  		index->rd_amcache = (void *) cache;
*** SpGistUpdateMetaPage(Relation index)
*** 258,263 
--- 297,305 
  		if (ConditionalLockBuffer(metabuffer))
  		{
  			metadata = SpGistPageGetMeta(BufferGetPage(metabuffer));
+ 
+ 			/* Update both the magic # and the cached page list */
+ 			metadata->magicNumber = SPGIST_MAGIC_NUMBER;
  			metadata->lastUsedPages = cache->lastUsedPages;
  
  			MarkBufferDirty(metabuffer);
*** SpGistUpdateMetaPage(Relation index)
*** 270,279 
  	}
  }
  
- /* Macro to select proper element of lastUsedPages cache depending on flags */
- /* Masking flags with SPGIST_CACHED_PAGES is just for paranoia's sake */
- #define GET_LUP(c, f)  

Re: [HACKERS] pg_dump with tables created in schemas created by extensions

2016-08-24 Thread Martín Marqués
2016-08-24 17:01 GMT-03:00 Martín Marqués :
> 2016-08-24 11:15 GMT-03:00 Stephen Frost :
>> Michael,
>>
>> * Michael Paquier (michael.paqu...@gmail.com) wrote:
>>> The patch attached includes all those tests and they are failing. We
>>> are going to need a patch able to pass all that, and even for master
>>> this is going to need more thoughts, and let's focus on HEAD/9.6
>>> first.
>>
>> Are you sure you have the tests correct..?   At least for testagg(), it
>> looks like you're testing for:
>>
>> GRANT ALL ON FUNCTION test_agg(int2) TO regress_dump_test_role;
>>
>> but what's in the dump is (equivalently):
>>
>> GRANT ALL ON FUNCTION test_agg(smallint) TO regress_dump_test_role;
>
> Yes, that was the problem there.
>
>> I've not looked into all the failures, but at least this one seems like
>> an issue in the test, not an issue in pg_dump.
>
> I see the other 12 failures regarding the CREATE INDEX that Michael
> reported, but can't quite find where they originate (or actually
> where the problem is)

OK, I see where the problem is.

Indexes don't have a selectDumpableIndex() function to decide whether we dump
them or not. We just don't gather indexes from tables whose definitions are
not being dumped:

/* Ignore indexes of tables whose definitions are not
to be dumped */
if (!(tbinfo->dobj.dump & DUMP_COMPONENT_DEFINITION))
continue;

This means we have to perform the same change we did on
selectDumpableNamespace for selectDumpableTable, and also assign the
correct value to dump_contains, which is not set there.

The problem will come when we have to decide on which indexes were
created by the extension (primary key indexes, other indexes created
by the extension) and which were created afterwards over a table which
depends on the extension (the test_table from the extension).

Right now, I'm in an intermediate state, where I got getIndexes() to
query for the indexes of these tables, but dumpIndexes is not dumping
the indexes that were queried.

I wonder if we should have a selectDumpableIndexes to set the
appropriate dobj.dump for the Indexes.
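
(For what it's worth, the extension membership such a function would have to
consult is the 'e' dependency in pg_depend; an illustrative query showing
which relations pg_dump sees as belonging to an extension:)

SELECT c.relname, c.relkind, e.extname
FROM pg_depend d
JOIN pg_class c ON d.classid = 'pg_class'::regclass AND c.oid = d.objid
JOIN pg_extension e ON d.refclassid = 'pg_extension'::regclass
                   AND e.oid = d.refobjid
WHERE d.deptype = 'e';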

-- 
Martín Marquéshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_lwlock wait time view

2016-08-24 Thread Robert Haas
On Wed, Aug 24, 2016 at 4:23 AM, Haribabu Kommi
 wrote:
> There was some discussion earlier about adding a pg_stat_lwlock view in [1].
> The main objection which I observed for that patch was showing LWLock
> information to users who don't understand what each lock is used for, etc.
>
> Currently, as part of the wait_event information in pg_stat_activity, the
> LWLock information is available to the user, and the LWLocks used in
> PostgreSQL are also listed in the documentation along with their purpose.
>
> So I feel it may be worthwhile to add this view to find out the wait times
> of the LWLocks. This information can be useful for finding bottlenecks
> around LWLocks in production environments. But adding the timing
> calculations may cause performance problems. Is there any need to write
> this stats information to a file? As it just provides wait time information.
>
> Based on the performance impact with the additional timing calculations,
> we can decide the view default behavior, Are there any objections to the
> concept?

There have been some other recent threads on extending the wait event
stuff.  If you haven't already read those, you should, because the
issues are closely related.  I think that timing LWLock waits will be
quite expensive.  I believe that what the Postgres Pro folks want to
do is add up the wait times or maybe keep a history of waits (though
maybe I'm wrong about that), but showing them in pg_stat_activity is
another idea.  That's likely to add some synchronization overhead
which might be even greater in this case than for a feature that just
publishes accumulated times, but maybe it's all a drop in the bucket
compared to the cost of calling gettimeofday() in the first place.

Personally, my preferred solution is still to have a background worker
that samples the published wait events and rolls up statistics, but
I'm not sure I've convinced anyone else.  It could report the number
of seconds since it detected a wait event other than the current one,
which is not precisely the same thing as tracking the length of the
current wait but it's pretty close.  I don't know for sure what's best
here - I think some experimentation and dialog is needed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Collective typos in contrib/postgres_fdw (Was: Re: [HACKERS] Incorrect comment in contrib/postgres_fdw/deparse.c)

2016-08-24 Thread Robert Haas
On Wed, Aug 24, 2016 at 2:41 AM, Etsuro Fujita
 wrote:
> On 2016/04/04 20:11, Etsuro Fujita wrote:
>> I found an incorrect comment in contrib/postgres_fdw/deparse.c: s/FOR
>> SELECT or FOR SHARE/FOR UPDATE or FOR SHARE/  Attached is a patch to fix
>> that comment.
>
> Other than this, I ran into some typos in contrib/postgres_fdw, while
> working on join pushdown improvements.  Please find attached an updated
> version of the patch.

Thanks, committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump with tables created in schemas created by extensions

2016-08-24 Thread Martín Marqués
2016-08-24 11:15 GMT-03:00 Stephen Frost :
> Michael,
>
> * Michael Paquier (michael.paqu...@gmail.com) wrote:
>> The patch attached includes all those tests and they are failing. We
>> are going to need a patch able to pass all that, and even for master
>> this is going to need more thoughts, and let's focus on HEAD/9.6
>> first.
>
> Are you sure you have the tests correct..?   At least for testagg(), it
> looks like you're testing for:
>
> GRANT ALL ON FUNCTION test_agg(int2) TO regress_dump_test_role;
>
> but what's in the dump is (equivalently):
>
> GRANT ALL ON FUNCTION test_agg(smallint) TO regress_dump_test_role;

Yes, that was the problem there.

> I've not looked into all the failures, but at least this one seems like
> an issue in the test, not an issue in pg_dump.

I see the other 12 failures regarding the CREATE INDEX that Michael
reported, but can't quite find where they originate (or actually
where the problem is)

-- 
Martín Marquéshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Stopping logical replication protocol

2016-08-24 Thread Vladimir Gordiychuk
Hi. A few months have already passed, but the patch is still in the
needs-review state. Can I help to speed up the review, or should I wait for
the commitfest? I plan to complete the changes in the pgjdbc driver in PR
https://github.com/pgjdbc/pgjdbc/pull/550, but that PR is blocked by the
current inability to stop logical replication.

2016-05-11 7:25 GMT+03:00 Craig Ringer :

> On 11 May 2016 at 06:47, Vladimir Gordiychuk  wrote:
>
>> Same thread, I just think these are two somewhat separate changes. One is
>>> just in the walsender and allows return to command mode during waiting for
>>> WAL. The other is more intrusive into the reorder buffer etc and allows
>>> aborting decoding during commit processing. So two separate patches make
>>> sense here IMO, one on top of the other.
>>
>>
>> About the second part of the patch: what is the reason to decode and send the
>> whole transaction? Why can't we process logical decoding via WalSndLoop LSN by
>> LSN, as it works in physical replication? For example, if a transaction spans
>> more than one LSN, we could first decode and send "begin" and "part of the
>> data from the current LSN", then return to WalSndLoop, and on the next
>> iteration send "another part of the data" and "commit". I haven't researched
>> this approach, because I think it would mean much bigger changes than a
>> callback that stops sending.
>>
>
> There are two parts to that. First, why do we reorder at all, accumulating
> a whole transaction in a reorder buffer until we see a commit then sending
> it all at once? Second, when sending, why don't we return control to the
> walsender between messages?
>
> For the first: reordering xacts server-side lets the client not worry
> about replay order. It just applies them as it receives them. It means the
> server can omit uncommitted transactions from the stream entirely and
> clients can be kept simple(r). IIRC there are also some issues around
> relcache invalidation handling and time travel that make it desirable to
> wait until commit before building a snapshot and decoding, but I haven't
> dug into the details. Andres is the person who knows that area best.
>
> As for why we don't return control to the walsender between change
> callbacks when processing a reorder buffer at commit time, I'm not really
> sure but suspect it's mostly down to easy API and development. If control
> returned to the walsender between each change we'd need an async api for
> the reorder buffer where you test to see if there's more unprocessed work
> and call back into the reorder buffer again if there is. So the reorder
> buffer has to keep state for the progress of replaying a commit in a
> separate struct, handle repeated calls to process work, etc. Also, since
> many individual changes are very small that could lead to a fair bit of
> overhead; it'd probably want to process a batch of changes then return.
> Which makes it even more complicated.
>
> If it returned control to the caller between changes then each caller
> would also need to have the logic to test for more work and call back into
> the reorder buffer. Both the walsender and SQL interface would need it. The
> way it is, the loop is just in one place.
>
> It probably makes more sense to have a callback that can test state and
> abort processing, like you've introduced. The callback could probably even
> periodically check to see if there's client input to process and consume it.
>
> --
>  Craig Ringer   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>


[HACKERS] Re: [COMMITTERS] pgsql: Modify BufferGetPage() to prepare for "snapshot too old" feature

2016-08-24 Thread Kevin Grittner
On Wed, Aug 24, 2016 at 1:00 PM, Kevin Grittner  wrote:
> On Wed, Aug 24, 2016 at 12:40 PM, Alvaro Herrera  
> wrote:

>> #include catalog/catalog.h in storage/bufmgr.h.

>> Can we get it removed?
>
> Will do that now.

Done.  Back-patched to 9.6 (although I see I forgot to mention that
in the commit message).

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Pluggable storage

2016-08-24 Thread Alvaro Herrera
Alvaro Herrera wrote:
> Many have expressed their interest in this topic, but I haven't seen any
> design of how it should work.  Here's my attempt; I've been playing with
> this for some time now and I think what I propose here is a good initial
> plan.

I regret to announce that I'll have to stay away from this topic for a
little while, as I have another project taking priority.  I expect to
return to this shortly thereafter, hopefully in time to get it done for
pg10.

If anyone is interested in helping with the (currently not compilable)
patch I have, please mail me privately and we can discuss.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Write Ahead Logging for Hash Indexes

2016-08-24 Thread Alvaro Herrera
Amit Kapila wrote:
> $SUBJECT will make hash indexes reliable and usable on standby.

Nice work.

Can you split the new xlog-related stuff to a new file, say hash_xlog.h,
instead of cramming it in hash.h?  Removing the existing #include
"xlogreader.h" from hash.h would be nice.  I volunteer for pushing any
preliminary header cleanup commits.

I think it's silly that access/hash.h is the go-to header for hash
usage, including hash_any().  Not this patch's fault, of course, just
venting.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [COMMITTERS] pgsql: Modify BufferGetPage() to prepare for "snapshot too old" feature

2016-08-24 Thread Kevin Grittner
On Wed, Aug 24, 2016 at 12:40 PM, Alvaro Herrera
 wrote:
> Kevin Grittner wrote:
>> Modify BufferGetPage() to prepare for "snapshot too old" feature
>
> I just noticed that this commit added a line to #include catalog/catalog.h
> in storage/bufmgr.h.  I can't find any reason for it doing so, and I
> think it's a bad line to have there.  Can we get it removed?

Will do that now.  It was initially added because
IsCatalogRelation() was referenced in storage/bufmgr.h, but for
various reasons it seemed better to move that, and I missed the
include.  Thanks for spotting it.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in to_timestamp().

2016-08-24 Thread Artur Zakirov
Sorry, I did not get the last two mails from Amul (I don't know why), so I am
replying to another mail.



Documented as a working case, but unfortunately it does not work:

postgres=# SELECT to_timestamp('2000JUN', ' MON');
ERROR:  invalid value "---" for "MON"
DETAIL:  The given value did not match any of the allowed values for this field.


Indeed! Fixed it. Now this query executes without error. Added this case 
to tests.
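
(The format strings above appear to have been eaten by the list archive;
assuming the documented example format 'YYYY MON', the now-working case is:)

SELECT to_timestamp('2000JUN', 'YYYY MON');
-- now returns 2000-06-01 00:00:00 in the session time zone instead of erroring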



The NODE_TYPE_CHAR assumption in the else block seems incorrect. What if we
have a space after a double quote?  It will again produce incorrect output
without any error; see below:

postgres=# SELECT to_timestamp('Year: 1976, Month: May, Day: 16',
postgres(# '"Year:" , "Month:" FMMonth, "Day:"   DD');
to_timestamp
--
0006-05-16 00:00:00-07:52:58
(1 row)

I guess we might need NODE_TYPE_SEPARATOR and NODE_TYPE_SPACE checks as well?


Agree. Fixed and added to tests.


Unnecessary hunk?
We should not touch any code unless it is necessary to implement the proposed
feature; otherwise it adds unnecessary diff to the patch and eventually extra
burden in reviewing the code. Similarly, the hunk in the patch @ lines #313-410
has nothing to do with the to_timestamp behaviour improvement, IIUC.

If you think this changes need to be in, please submit separate cleanup-patch.


Fixed it. It was a typo.
About lines #313-410: they are necessary to avoid bugs. I wrote about it in
https://www.postgresql.org/message-id/b2a39359-3282-b402-f4a3-057aae500ee7%40postgrespro.ru
:



- now DCH_cache_getnew() is called after parse_format(). Because now
parse_format() can raise an error and in the next attempt
DCH_cache_search() could return a broken cache entry.


For example, you can test incorrect inputs for to_timestamp(). Try to
execute such a query several times.


--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 6355300..0fe50e1 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -6146,9 +6146,12 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
  
   
to_timestamp and to_date
-   skip multiple blank spaces in the input string unless the
+   skip multiple blank spaces and printable non letter and non digit
+   characters in the input string and in the formatting string unless the
FX option is used. For example,
-   to_timestamp('2000JUN', ' MON') works, but
+   to_timestamp('2000JUN', ' MON'),
+   to_timestamp('2000JUN', ' MON')
+   and to_timestamp('2000JUN', 'MON') work, but
to_timestamp('2000JUN', 'FX MON') returns an error
because to_timestamp expects one space only.
FX must be specified as the first item in
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index bbd97dc..a3dbcaf 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -169,6 +169,8 @@ struct FormatNode
 #define NODE_TYPE_END		1
 #define NODE_TYPE_ACTION	2
 #define NODE_TYPE_CHAR		3
+#define NODE_TYPE_SEPARATOR	4
+#define NODE_TYPE_SPACE		5
 
 #define SUFFTYPE_PREFIX		1
 #define SUFFTYPE_POSTFIX	2
@@ -947,6 +949,7 @@ typedef struct NUMProc
 static const KeyWord *index_seq_search(char *str, const KeyWord *kw,
  const int *index);
 static const KeySuffix *suff_search(char *str, const KeySuffix *suf, int type);
+static bool is_char_separator(char *str);
 static void NUMDesc_prepare(NUMDesc *num, FormatNode *n);
 static void parse_format(FormatNode *node, char *str, const KeyWord *kw,
 			 const KeySuffix *suf, const int *index, int ver, NUMDesc *Num);
@@ -963,7 +966,6 @@ static void dump_node(FormatNode *node, int max);
 static const char *get_th(char *num, int type);
 static char *str_numth(char *dest, char *num, int type);
 static int	adjust_partial_year_to_2020(int year);
-static int	strspace_len(char *str);
 static void from_char_set_mode(TmFromChar *tmfc, const FromCharDateMode mode);
 static void from_char_set_int(int *dest, const int value, const FormatNode *node);
 static int	from_char_parse_int_len(int *dest, char **src, const int len, FormatNode *node);
@@ -1036,6 +1038,17 @@ suff_search(char *str, const KeySuffix *suf, int type)
 	return NULL;
 }
 
+static bool
+is_char_separator(char *str)
+{
+	return ((pg_mblen(str) == 1) &&
+			/* printable character, but not letter and digit */
+			((*str >= '!' && *str <= '/') ||
+			 (*str >= ':' && *str <= '@') ||
+			 (*str >= '[' && *str <= '`') ||
+			 (*str >= '{' && *str <= '~')));
+}
+
 /* --
  * Prepare NUMDesc (number description struct) via FormatNode struct
  * --
@@ -1237,9 +1250,10 @@ parse_format(FormatNode *node, char *str, const KeyWord *kw,
 {
 	const KeySuffix *s;
 	FormatNode *n;
+	bool		in_text = false,
+				in_backslash = false;
 	int			node_set = 0,
-				suffix,
-				last = 0;
+				suffix;
 
 #ifdef DEBUG_TO_FROM_CHAR
 	

[HACKERS] Re: [COMMITTERS] pgsql: Modify BufferGetPage() to prepare for "snapshot too old" feature

2016-08-24 Thread Alvaro Herrera
Kevin Grittner wrote:
> Modify BufferGetPage() to prepare for "snapshot too old" feature

I just noticed that this commit added a line to #include catalog/catalog.h
in storage/bufmgr.h.  I can't find any reason for it doing so, and I
think it's a bad line to have there.  Can we get it removed?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SP-GiST support for inet datatypes

2016-08-24 Thread Emre Hasegeli
> Aside from the disturbing frequency of 100-to-1 split ratios, it also
> looks like the inclusion of the masklen bit is hardly pulling its weight,
> though that might be an artifact of this data set.

I was expecting this.  Including the masklen bit in the decision was something
we could easily do.  It doesn't make the index more complicated; if anything,
it makes it simpler.  I think it would be useful when host addresses with
masklen are indexed.  The IRRExplorer dataset is just networks.
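
A quick, made-up way to generate such a host-address data set for testing:

CREATE TABLE hosts AS
SELECT ('10.' || trunc(random() * 256)::text || '.' ||
        trunc(random() * 256)::text || '.' ||
        trunc(random() * 256)::text || '/24')::inet AS addr
FROM generate_series(1, 1000000) i;
CREATE INDEX hosts_idx ON hosts USING spgist (addr);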

> I think that it'd be worth investigating changing to a split rule that
> uses the next two or three or four bits of the address, not just the
> single next bit, to index into more than four subnodes.  It's pretty clear
> how to adjust the insertion functions to support that, and a crude hack in
> that line suggested that the index does get noticeably smaller.  However,
> I didn't quite grok how to adjust the search (consistent) functions.
> Obviously we could apply the same rules in a loop, considering each
> successive address bit, but there may be a better way.

I have experimented with such designs before posting this patch, but
couldn't make any of them work.  It gets very complicated when more
than one bit is used.  When only 2 bits are used, there would be 12
child nodes:

* network bits 00
* network bits 01
* network bits 10
* network bits 11
* network bit 0 and host bit 0
* network bit 0 and host bit 1
* network bit 1 and host bit 0
* network bit 1 and host bit 1
* host bits 00
* host bits 01
* host bits 10
* host bits 11

Another design would be a prefix tree.  We could split by using a
byte, and store that byte as the label.  Then there wouldn't be a static
number of nodes.  We would need to iterate through the labels and
re-execute the condition on all of them.  Surely this would perform
better for some working sets, but I don't think for all of them.  I will
experiment with this if I get the chance.

I believe the current index is useful as it is.  The wasted space
depends on the distribution of the keys:

> # create table a3 as select (trunc(random() * 256)::text || '.' || 
> trunc(random() * 8)::text || '.0.0')::inet from generate_series(1, 100) 
> as i;
> SELECT 100
> # create table a4 as select (trunc(random() * 256)::text || '.' || 
> trunc(random() * 16)::text || '.0.0')::inet from generate_series(1, 100) 
> as i;
> SELECT 100
> # create table a5 as select (trunc(random() * 256)::text || '.' || 
> trunc(random() * 32)::text || '.0.0')::inet from generate_series(1, 100) 
> as i;
> SELECT 100
> # create table a6 as select (trunc(random() * 256)::text || '.' || 
> trunc(random() * 64)::text || '.0.0')::inet from generate_series(1, 100) 
> as i;
> SELECT 100
> # create index on a3 using spgist (inet);
> CREATE INDEX
> # create index on a4 using spgist (inet);
> CREATE INDEX
> # create index on a5 using spgist (inet);
> CREATE INDEX
> # create index on a6 using spgist (inet);
> CREATE INDEX
> # \di+
>                      List of relations
>  Schema |    Name     | Type  |  Owner   | Table | Size  | Description
> --------+-------------+-------+----------+-------+-------+-------------
>  public | a3_inet_idx | index | hasegeli | a3    | 42 MB |
>  public | a4_inet_idx | index | hasegeli | a4    | 46 MB |
>  public | a5_inet_idx | index | hasegeli | a5    | 55 MB |
>  public | a6_inet_idx | index | hasegeli | a6    | 56 MB |
> (4 rows)

It also gets better with the number of keys:

> # create table a6 as select (trunc(random() * 256)::text || '.' || 
> trunc(random() * 64)::text || '.0.0')::inet from generate_series(1, 100) 
> as i;
> SELECT 100
> # create table b6 as select (trunc(random() * 256)::text || '.' || 
> trunc(random() * 64)::text || '.0.0')::inet from generate_series(1, 200) 
> as i;
> SELECT 200
> # create table c6 as select (trunc(random() * 256)::text || '.' || 
> trunc(random() * 64)::text || '.0.0')::inet from generate_series(1, 400) 
> as i;
> SELECT 400
> # create table d6 as select (trunc(random() * 256)::text || '.' || 
> trunc(random() * 64)::text || '.0.0')::inet from generate_series(1, 800) 
> as i;
> SELECT 800
> # create index on a6 using spgist (inet);
> CREATE INDEX
> # create index on b6 using spgist (inet);
> CREATE INDEX
> # create index on c6 using spgist (inet);
> CREATE INDEX
> # create index on d6 using spgist (inet);
> CREATE INDEX
> # \di+
>                      List of relations
>  Schema |    Name     | Type  |  Owner   | Table | Size   | Description
> --------+-------------+-------+----------+-------+--------+-------------
>  public | a6_inet_idx | index | hasegeli | a6    | 56 MB  |
>  public | b6_inet_idx | index | hasegeli | b6    | 109 MB |
>  public | c6_inet_idx | index | hasegeli | c6    | 181 MB |
>  public | d6_inet_idx | index | hasegeli | d6    | 336 MB |
> (4 rows)

Thank you for committing this.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Write Ahead Logging for Hash Indexes

2016-08-24 Thread Jeff Janes
On Tue, Aug 23, 2016 at 10:05 PM, Amit Kapila 
wrote:

> On Wed, Aug 24, 2016 at 2:37 AM, Jeff Janes  wrote:
>
> >
> > After an intentionally created crash, I get an Assert triggering:
> >
> > TRAP: FailedAssertion("!(((freep)[(bitmapbit)/32] &
> > (1<<((bitmapbit)%32", File: "hashovfl.c", Line: 553)
> >
> > freep[0] is zero and bitmapbit is 16.
> >
>
> What is happening here is that when it tries to clear the bitmap bit,
> it expects it to be set.  Now, I think the reason why it didn't
> find the bit set could be that after the new overflow page was added
> and the bit corresponding to it was set, you might have crashed the
> system, and replay would not have set the bit.  Then, while freeing
> the overflow page, it can hit the Assert as you mentioned.  I think
> the problem here could be that I am using REGBUF_STANDARD to log the
> bitmap page updates, which seems to be causing the issue.  As the
> bitmap page doesn't follow the standard page layout, the actual
> contents would have been omitted from the full-page image, and then
> during replay the bit would not have been set, because the page
> doesn't need REDO.  I think the fix here is to use REGBUF_NO_IMAGE,
> as we do for vm buffers.
>
> If you can send me the detailed steps for how you have produced the
> problem, then I can verify after fixing whether you are seeing the
> same problem or something else.
>


The test is rather awkward; it might be easier to just have me test it.
But I've attached it anyway.

There is a patch that needs to be applied and compiled (alongside your
patches, of course) to inject the crashes, a Perl script which creates
the schema and does the updates, and a shell script which sets up the
cluster with the appropriate parameters and then calls the Perl script in
a loop.

The top of the shell script has some hard-coded paths to the binaries and
to the test data directory (which is automatically deleted).

I run it like "sh do.sh >& do.err &"

It gives two different types of assertion failures:

$ fgrep TRAP: do.err |sort|uniq -c

 21 TRAP: FailedAssertion("!(((freep)[(bitmapbit)/32] &
(1<<((bitmapbit)%32", File: "hashovfl.c", Line: 553)
 32 TRAP: FailedAssertion("!(RefCountErrors == 0)", File: "bufmgr.c",
Line: 2506)

The second one is related to the intentional crashes, and so is not
relevant to you.

Cheers,

Jeff


count.pl
Description: Binary data


crash_REL10.patch
Description: Binary data


do.sh
Description: Bourne shell script

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] "Some tests to cover hash_index"

2016-08-24 Thread Robert Haas
On Wed, Aug 24, 2016 at 2:34 AM, Amit Kapila  wrote:
> On Wed, Aug 24, 2016 at 11:38 AM, Ashutosh Sharma  
> wrote:
>>> Well, that change should be part of Amit's patch to add WAL logging,
>>> not this patch, whose mission is just to improve test coverage.
>>
>> I have just removed the warning message from expected output file as i
>> have performed the testing on Amit's latest patch that removes this
>> warning message from the  hash index code.
>>
>
> I think you are missing what Robert wants to point out.  You don't
> need to remove the warning message when you are adding new test cases
> on top of Mithun's patch even if those new tests help to cover the
> code path which is required for the wal-hash-index patch.

Right.  The point is, if somebody applies this patch on top of master,
the regression tests will now fail because of that missing line.  That
means nobody is going to commit this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Optimization for lazy_scan_heap

2016-08-24 Thread Anastasia Lubennikova
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   not tested
Spec compliant:   not tested
Documentation:not tested

Hi,
I haven't tested the performance yet, but the patch itself looks pretty good
and is a reasonable improvement.
I have a question about removing the comment. It seems to be a really tricky
moment: how do we know that an all-frozen block hasn't changed since the
moment we checked it?

-				 * Tricky, tricky.  If this is in aggressive vacuum, the page
-				 * must have been all-frozen at the time we checked whether it
-				 * was skippable, but it might not be any more.  We must be
-				 * careful to count it as a skipped all-frozen page in that
-				 * case, or else we'll think we can't update relfrozenxid and
-				 * relminmxid.  If it's not an aggressive vacuum, we don't
-				 * know whether it was all-frozen, so we have to recheck; but
-				 * in this case an approximate answer is OK.
+				 * We know that there are n_skipped pages by the
+				 * visibilitymap scan we did just before.
 				 */

I'm going to test the performance this week.
I wonder if you could send a test script or describe the steps to test it?
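
In the meantime, a crude sketch of the kind of steps I would start from
(table name and sizes are placeholders):

CREATE TABLE vac_test AS
SELECT i, repeat('x', 100) AS filler
FROM generate_series(1, 10000000) i;
DELETE FROM vac_test WHERE i % 10 = 0;
VACUUM (VERBOSE) vac_test;           -- non-aggressive case, time with \timing
VACUUM (FREEZE, VERBOSE) vac_test;   -- aggressive case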

The new status of this patch is: Waiting on Author

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-24 Thread Andres Freund


On August 24, 2016 9:32:48 AM PDT, Tomas Vondra  
wrote:
>
>
>On 08/24/2016 12:20 AM, Andres Freund wrote:
>> On 2016-08-23 19:18:04 -0300, Claudio Freire wrote:
>>> On Tue, Aug 23, 2016 at 7:11 PM, Tomas Vondra
>>>  wrote:
>>>> Could someone please explain how the unlogged tables are supposed to fix the
>>>> catalog bloat problem, as stated in the initial patch submission? We'd still
>>>> need to insert/delete the catalog rows when creating/dropping the temporary
>>>> tables, causing the bloat. Or is there something I'm missing?
>>
>> Beats me.
>>
>
>Are you puzzled just like me, or are you puzzled why I'm puzzled?

Like you. I don't think this addresses the problem to a significant enough 
degree to care.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-24 Thread Tomas Vondra



On 08/24/2016 12:20 AM, Andres Freund wrote:

On 2016-08-23 19:18:04 -0300, Claudio Freire wrote:

On Tue, Aug 23, 2016 at 7:11 PM, Tomas Vondra
 wrote:

Could someone please explain how the unlogged tables are supposed to fix the
catalog bloat problem, as stated in the initial patch submission? We'd still
need to insert/delete the catalog rows when creating/dropping the temporary
tables, causing the bloat. Or is there something I'm missing?


Beats me.



Are you puzzled just like me, or are you puzzled why I'm puzzled?



--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Strange result with LATERAL query

2016-08-24 Thread Tom Lane
Andrew Gierth  writes:
> "Tom" == Tom Lane  writes:
>  Tom> I think it's a bit too stupid as-is, though.  We don't need to
>  Tom> recalculate for Params in aggdirectargs, do we?

> In theory we would need to.

How come?  Those are only passed to the final function, no?  They
shouldn't affect what is in the hash table.

> But in practice we don't, because we don't
> allow ordered aggs in AGG_HASHED mode anyway.

True, it's moot at the moment.

> We could skip filling in aggParam at all if not in AGG_HASHED mode I
> guess.

Yeah, I'm planning to make it do that.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Strange result with LATERAL query

2016-08-24 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> Hm, I was just working on inserting something of the sort into
 Tom> ExecInitAgg.  But I guess we could do it in the planner too.  Will
 Tom> run with your approach.

 Tom> I think it's a bit too stupid as-is, though.  We don't need to
 Tom> recalculate for Params in aggdirectargs, do we?

In theory we would need to. But in practice we don't, because we don't
allow ordered aggs in AGG_HASHED mode anyway.

We could skip filling in aggParam at all if not in AGG_HASHED mode I
guess.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-24 Thread Robert Haas
On Tue, Aug 23, 2016 at 6:11 PM, Tomas Vondra
 wrote:
> Could someone please explain how the unlogged tables are supposed to fix the
> catalog bloat problem, as stated in the initial patch submission? We'd still
> need to insert/delete the catalog rows when creating/dropping the temporary
> tables, causing the bloat. Or is there something I'm missing?

No, not really.  Jim just asked if the idea of partitioning the
columns was completely dead in the water, and I said, no, you could
theoretically salvage it.  Whether that does you much good is another
question.

IMV, the point here is that you MUST have globally visible dependency
entries for this to work sanely.  If they're not in a catalog, they
have to be someplace else, and backend-private memory isn't good
enough, because that's not globally visible.  Until we've got a
strategy for that problem, this whole effort is going nowhere - even
though in other respects it may be a terrific idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Oddity in EXPLAIN for foreign/custom join pushdown plans

2016-08-24 Thread Robert Haas
On Tue, Aug 23, 2016 at 11:18 PM, Etsuro Fujita
 wrote:
>> Yes, it seems what we are doing now is not consistent with what
>> happens for plain tables; that should probably be fixed.
>
> OK, I think we should fix the issue (that postgres_fdw produces incorrect
> aliases for joining relations shown in the Relations line in EXPLAIN for a
> join pushdown query like the above) in advance of the 9.6 release, so I'll
> add this to the 9.6 open items.

Isn't it a bit late for that?

I'm not eager to have 9.6 get further delayed while we work on this
issue, and I definitely don't believe that this is a sufficiently
important issue to justify reverting join pushdown in its entirety.
We're talking about a minor detail of the EXPLAIN output that we'd
like to fix, but for which we have no consensus on exactly how to fix
it, and the fix may involve some redesign.  That does not seem like a
good thing to do the week before rc1.  Let's just leave well enough
alone and work on it for v10.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Strange result with LATERAL query

2016-08-24 Thread Tom Lane
Andrew Gierth  writes:
> How about:

Hm, I was just working on inserting something of the sort into
ExecInitAgg.  But I guess we could do it in the planner too.
Will run with your approach.

I think it's a bit too stupid as-is, though.  We don't need to
recalculate for Params in aggdirectargs, do we?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Strange result with LATERAL query

2016-08-24 Thread Andrew Gierth
> "Pavel" == Pavel Stehule  writes:

 Pavel> The result should not depend on GUC - hashagg on/off changing
 Pavel> output - it is error.

I don't think anyone's suggesting leaving it unfixed, just whether the
fix should introduce unnecessary rescans of the aggregate input.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Strange result with LATERAL query

2016-08-24 Thread Andrew Gierth
> "Andrew" == Andrew Gierth  writes:
> "Tom" == Tom Lane  writes:

 Tom> I'm not sure if it's worth trying to distinguish whether the Param
 Tom> is inside any aggregate calls or not.

How about:

-- 
Andrew (irc:RhodiumToad)

diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 1ec2515..4a9fefb 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -3426,10 +3426,11 @@ ExecReScanAgg(AggState *node)
 
 		/*
 		 * If we do have the hash table and the subplan does not have any
-		 * parameter changes, then we can just rescan the existing hash table;
-		 * no need to build it again.
+		 * parameter changes, we might still need to rebuild the hash if any of
+		 * the parameter changes affect the input to aggregate functions.
 		 */
-		if (outerPlan->chgParam == NULL)
+		if (outerPlan->chgParam == NULL
+			&& !bms_overlap(node->ss.ps.chgParam, aggnode->aggParam))
 		{
			ResetTupleHashIterator(node->hashtable, &node->hashiter);
 			return;
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index c7a0644..7b5eb86 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -871,6 +871,7 @@ _copyAgg(const Agg *from)
 	COPY_SCALAR_FIELD(aggstrategy);
 	COPY_SCALAR_FIELD(aggsplit);
 	COPY_SCALAR_FIELD(numCols);
+	COPY_BITMAPSET_FIELD(aggParam);
 	if (from->numCols > 0)
 	{
 		COPY_POINTER_FIELD(grpColIdx, from->numCols * sizeof(AttrNumber));
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 1fab807..5e48edd 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -706,6 +706,7 @@ _outAgg(StringInfo str, const Agg *node)
 	WRITE_ENUM_FIELD(aggstrategy, AggStrategy);
 	WRITE_ENUM_FIELD(aggsplit, AggSplit);
 	WRITE_INT_FIELD(numCols);
+	WRITE_BITMAPSET_FIELD(aggParam);
 
 	appendStringInfoString(str, " :grpColIdx");
 	for (i = 0; i < node->numCols; i++)
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index c83063e..67dcf17 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -2004,6 +2004,7 @@ _readAgg(void)
 	READ_ENUM_FIELD(aggstrategy, AggStrategy);
 	READ_ENUM_FIELD(aggsplit, AggSplit);
 	READ_INT_FIELD(numCols);
+	READ_BITMAPSET_FIELD(aggParam);
 	READ_ATTRNUMBER_ARRAY(grpColIdx, local_node->numCols);
 	READ_OID_ARRAY(grpOperators, local_node->numCols);
 	READ_LONG_FIELD(numGroups);
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index a46cc10..2e56484 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -82,6 +82,7 @@ static Bitmapset *finalize_plan(PlannerInfo *root,
 			  Bitmapset *valid_params,
 			  Bitmapset *scan_params);
 static bool finalize_primnode(Node *node, finalize_primnode_context *context);
+static bool finalize_agg_primnode(Node *node, finalize_primnode_context *context);
 
 
 /*
@@ -2659,8 +2660,30 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 							  &context);
 			break;
 
-		case T_Hash:
 		case T_Agg:
+			{
+				Agg		   *agg = (Agg *) plan;
+				finalize_primnode_context aggcontext;
+
+				/*
+				 * We need to know which params are referenced in inputs to
+				 * aggregate calls, so that we know whether we need to rebuild
+				 * the hashtable in the AGG_HASHED case. Rescan the tlist and
+				 * qual for this purpose.
+				 */
+
+				aggcontext = context;
+				aggcontext.paramids = NULL;
+
+				finalize_agg_primnode((Node *) agg->plan.targetlist, &aggcontext);
+				finalize_agg_primnode((Node *) agg->plan.qual, &aggcontext);
+
+				/* remember results for execution */
+				agg->aggParam = aggcontext.paramids;
+			}
+			break;
+
+		case T_Hash:
 		case T_Material:
 		case T_Sort:
 		case T_Unique:
@@ -2812,6 +2835,24 @@ finalize_primnode(Node *node, finalize_primnode_context *context)
 }
 
 /*
+ * finalize_agg_primnode: add IDs of all PARAM_EXEC params appearing inside
+ * Aggref nodes in the given expression tree to the result set.
+ */
+static bool
+finalize_agg_primnode(Node *node, finalize_primnode_context *context)
+{
+	if (node == NULL)
+		return false;
+	if (IsA(node, Aggref))
+	{
+		finalize_primnode(node, context);
+		return false;			/* no more to do here */
+	}
+	return expression_tree_walker(node, finalize_agg_primnode,
+								  (void *) context);
+}
+
+/*
  * SS_make_initplan_output_param - make a Param for an initPlan's output
  *
  * The plan is expected to return a scalar value of the given type/collation.
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 369179f..3d5e4df 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -712,6 +712,7 @@ typedef struct Agg
 	AggStrategy aggstrategy;	/* basic strategy, see nodes.h */
 	AggSplit	aggsplit;		/* agg-splitting mode, see nodes.h */
 	int			numCols;		/* number of grouping columns */
+	Bitmapset  *aggParam;		/* params 

Re: [HACKERS] Strange result with LATERAL query

2016-08-24 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> I'm not sure if it's worth trying to distinguish whether the Param
 Tom> is inside any aggregate calls or not.  The existing code gets the
 Tom> right answer for

 Tom> select array(select x+sum(y) from generate_series(1,3) y group by y) 
 Tom>   from generate_series(1,3) x;

 Tom> and we'd be losing some efficiency for cases like that if we fix
 Tom> it as above.  But is it worth the trouble?

The loss of efficiency could be significant, since it's forcing a rescan
of what could be an expensive plan.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Strange result with LATERAL query

2016-08-24 Thread Pavel Stehule
2016-08-24 17:08 GMT+02:00 Tom Lane :

> Andrew Gierth  writes:
> > Something is wrong with the way chgParam is being handled in Agg nodes.
> > The code in ExecReScanAgg seems to assume that if the lefttree doesn't
> > have any parameter changes then it suffices to re-project the data from
> > the existing hashtable; but of course this is nonsense if the parameter
> > is in an input to an aggregate function.
>
> It looks like it's sufficient to do this:
>
> diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/
> nodeAgg.c
> index 1ec2515..f468fad 100644
> *** a/src/backend/executor/nodeAgg.c
> --- b/src/backend/executor/nodeAgg.c
> *** ExecReScanAgg(AggState *node)
> *** 3425,3435 
> return;
>
> /*
> !* If we do have the hash table and the subplan does not
> have any
> !* parameter changes, then we can just rescan the existing
> hash table;
> !* no need to build it again.
>  */
> !   if (outerPlan->chgParam == NULL)
> {
> ResetTupleHashIterator(node->hashtable,
> >hashiter);
> return;
> --- 3425,3436 
> return;
>
> /*
> !* If we do have the hash table and there are no relevant
> parameter
> !* changes, then we can just rescan the existing hash
> table; no need
> !* to build it again.
>  */
> !   if (node->ss.ps.chgParam == NULL &&
> !   outerPlan->chgParam == NULL)
> {
> ResetTupleHashIterator(node->hashtable,
> >hashiter);
> return;
>
>
> I'm not sure if it's worth trying to distinguish whether the Param is
> inside any aggregate calls or not.  The existing code gets the right
> answer for
>
> select array(select x+sum(y) from generate_series(1,3) y group by y)
>   from generate_series(1,3) x;
>
> and we'd be losing some efficiency for cases like that if we fix
> it as above.  But is it worth the trouble?
>
>
The result should not depend on a GUC: hashagg on/off changing the output
is an error.

Regards

Pavel


> regards, tom lane
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


Re: [HACKERS] Strange result with LATERAL query

2016-08-24 Thread Tom Lane
Andrew Gierth  writes:
> Something is wrong with the way chgParam is being handled in Agg nodes.
> The code in ExecReScanAgg seems to assume that if the lefttree doesn't
> have any parameter changes then it suffices to re-project the data from
> the existing hashtable; but of course this is nonsense if the parameter
> is in an input to an aggregate function.

It looks like it's sufficient to do this:

diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 1ec2515..f468fad 100644
*** a/src/backend/executor/nodeAgg.c
--- b/src/backend/executor/nodeAgg.c
*** ExecReScanAgg(AggState *node)
*** 3425,3435 ****
  			return;
  
  		/*
! 		 * If we do have the hash table and the subplan does not have any
! 		 * parameter changes, then we can just rescan the existing hash table;
! 		 * no need to build it again.
  		 */
! 		if (outerPlan->chgParam == NULL)
  		{
  			ResetTupleHashIterator(node->hashtable, &node->hashiter);
  			return;
--- 3425,3436 ----
  			return;
  
  		/*
! 		 * If we do have the hash table and there are no relevant parameter
! 		 * changes, then we can just rescan the existing hash table; no need
! 		 * to build it again.
  		 */
! 		if (node->ss.ps.chgParam == NULL &&
! 			outerPlan->chgParam == NULL)
  		{
  			ResetTupleHashIterator(node->hashtable, &node->hashiter);
  			return;


I'm not sure if it's worth trying to distinguish whether the Param is
inside any aggregate calls or not.  The existing code gets the right
answer for

select array(select x+sum(y) from generate_series(1,3) y group by y) 
  from generate_series(1,3) x;

and we'd be losing some efficiency for cases like that if we fix
it as above.  But is it worth the trouble?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Alter or rename enum value

2016-08-24 Thread Emre Hasegeli
> That would require changing it from an AlterEnumStmt to a RenameStmt
> instead.  Those look to me like they're for renaming SQL objects,
> i.e. things addressed by identifiers, whereas enum labels are just
> strings.  Looking at the implementation of a few of the functions called
> by ExecRenameStmt(), they have very little in common with
> RenameEnumLabel() logic-wise.

Yes, you are right that this is not an identifier like the others.  On the
other hand, all RENAME xxx TO yyy statements are RenameStmt at the
moment.  I think we had better leave the decision to the committer.

>> What is "nbr"?  Why not calling them something like "i" and "val"?
>
> This is the same naming as used in AddEnumLabel(), which I used as a
> guide.

I see.  Judging from there, it should be a shortcut for "neighbour".  We
had better use something else.

>> Maybe we better release them before reporting error, too.  I would
>> release the list after the loop, close the heap before ereport().
>
> As Tom said, this gets released automatically on error, and is again
> similar to how AddEnumLabel() does it.

I still don't see any reason not to ReleaseCatCacheList() after the
loop instead of writing it 3 times.

> Here is an updated patch.

I don't know if it is used by the committer or not, but "existance"
is a typo in the commit message.

Other than these, it looks good to me.  I am marking it as Ready for Committer.
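
For context, a minimal sketch of the syntax under review, as I understand it
from the patch (type and label names are made up):

CREATE TYPE colors AS ENUM ('red', 'grene', 'blue');
-- rename the misspelled label; this is the operation the patch implements
ALTER TYPE colors RENAME VALUE 'grene' TO 'green';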


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Showing parallel status in \df+

2016-08-24 Thread Tom Lane
I wrote:
> Peter Eisentraut  writes:
>> What does it do if you are displaying more than one function?

> It prints more than one footer.  It's very much like the way that, say,
> rules are printed for tables by \d.

Or to be concrete: instead of


regression=# \df+ foo*
                                                                         List of functions
 Schema |  Name   | Result data type | Argument data types |  Type  | Volatility | Parallel |  Owner   | Security | Access privileges | Language |                             Source code                             | Description 
--------+---------+------------------+---------------------+--------+------------+----------+----------+----------+-------------------+----------+---------------------------------------------------------------------+-------------
 public | foo1    | integer          | integer             | normal | volatile   | unsafe   | postgres | invoker  |                   | plpgsql  |                                                                    +| 
        |         |                  |                     |        |            |          |          |          |                   |          | begin                                                              +| 
        |         |                  |                     |        |            |          |          |          |                   |          |   return $1 + 1;                                                   +| 
        |         |                  |                     |        |            |          |          |          |                   |          | end                                                                +| 
        |         |                  |                     |        |            |          |          |          |                   |          |                                                                     | 
 public | foo2    | integer          | integer             | normal | volatile   | unsafe   | postgres | invoker  |                   | sql      |  select $1 + 2                                                      | 
 public | footest | void             |                     | normal | volatile   | unsafe   | postgres | invoker  |                   | plpgsql  |                                                                    +| 
        |         |                  |                     |        |            |          |          |          |                   |          | -- override the global                                             +| 
        |         |                  |                     |        |            |          |          |          |                   |          | #print_strict_params on                                            +| 
        |         |                  |                     |        |            |          |          |          |                   |          | declare                                                            +| 
        |         |                  |                     |        |            |          |          |          |                   |          | x record;                                                          +| 
        |         |                  |                     |        |            |          |          |          |                   |          | p1 int := 2;                                                       +| 
        |         |                  |                     |        |            |          |          |          |                   |          | p3 text := 'foo';                                                  +| 
        |         |                  |                     |        |            |          |          |          |                   |          | begin                                                              +| 
        |         |                  |                     |        |            |          |          |          |                   |          |   -- too many rows                                                 +| 
        |         |                  |                     |        |            |          |          |          |                   |          |   select * from foo where f1 > p1 or f1::text = p3  into strict x; +| 
        |         |                  |                     |        |            |          |          |          |                   |          |   raise notice 'x.f1 = %, x.f2 = %', x.f1, x.f2;                   +| 
        |         |                  |                     |        |            |          |          |          |                   |          | end                                                                 | 
(3 rows)


you get


regression=# \df+ foo*
   
List of functions
 Schema |  Name   | 

Re: [HACKERS] pg_dump with tables created in schemas created by extensions

2016-08-24 Thread Stephen Frost
Michael,

* Michael Paquier (michael.paqu...@gmail.com) wrote:
> The patch attached includes all those tests and they are failing. We
> are going to need a patch able to pass all that, and even for master
> this is going to need more thoughts, and let's focus on HEAD/9.6
> first.

Are you sure you have the tests correct..?   At least for testagg(), it
looks like you're testing for:

GRANT ALL ON FUNCTION test_agg(int2) TO regress_dump_test_role;

but what's in the dump is (equivalently):

GRANT ALL ON FUNCTION test_agg(smallint) TO regress_dump_test_role;

I've not looked into all the failures, but at least this one seems like
an issue in the test, not an issue in pg_dump.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] SP-GiST support for inet datatypes

2016-08-24 Thread Tom Lane
I wrote:
> I've pushed this patch with mostly (not entirely) cosmetic adjustments.
> I still think it'd be worth looking into why the produced index is larger
> than the GiST equivalent, and for that matter exactly why the GiST
> equivalent is so much slower to search.

I spent some time poking at this, and noticed that although the picksplit
rule is guaranteed to produce a non-degenerate split unless the inputs
are all identical, it's entirely capable of producing a very bad split.
Here's some instrumentation printout showing the numbers of items assigned
to the four subnodes, for the test case we've been working with:

 58 inet picksplit (227 tuples): 0 0 127 100
 65 inet picksplit (227 tuples): 0 0 223 4
 69 inet picksplit (225 tuples): 2 0 126 97
 69 inet picksplit (227 tuples): 1 0 226 0
 72 inet picksplit (227 tuples): 0 0 224 3
 72 inet picksplit (227 tuples): 1 0 132 94
 82 inet picksplit (226 tuples): 1 0 127 98
 86 inet picksplit (227 tuples): 0 0 130 97
 95 inet picksplit (227 tuples): 0 0 2 225
 99 inet picksplit (227 tuples): 0 0 225 2
117 inet picksplit (227 tuples): 2 0 126 99
118 inet picksplit (227 tuples): 0 0 128 99
137 inet picksplit (227 tuples): 0 0 226 1
151 inet picksplit (227 tuples): 0 0 1 226
270 inet picksplit (227 tuples): 1 0 127 99
499 inet picksplit (122 tuples): 0 0 64 58

(This is from "sort | uniq -c" output, so the first column is the number
of occurrences of identical split counts.)

Aside from the disturbing frequency of 100-to-1 split ratios, it also
looks like the inclusion of the masklen bit is hardly pulling its weight,
though that might be an artifact of this data set.

The bad splits seem to be a direct contributor to the index's relatively
poor space efficiency; they lead to SPGiST having to move nearly all of
a long leaf chain to another page, and then soon having to split it again,
resulting in another mostly-empty page, lather rinse repeat.  They can't
be very helpful for search speed either.

I think that it'd be worth investigating changing to a split rule that
uses the next two or three or four bits of the address, not just the
single next bit, to index into more than four subnodes.  It's pretty clear
how to adjust the insertion functions to support that, and a crude hack in
that line suggested that the index does get noticeably smaller.  However,
I didn't quite grok how to adjust the search (consistent) functions.
Obviously we could apply the same rules in a loop, considering each
successive address bit, but there may be a better way.

Even though I already committed the code, we're a very long way from
v10 release, so I see no reason not to consider incompatible changes
in the way this opclass works.

I also noticed that the index build wastes some space because SPGIST
remembers only one partly-full page within each of its page categories.
A prototype patch to enlarge the size of that cache helped a good bit
too.  I'll clean that up and post it later, but I was hoping you'd
work on improving this opclass's split rules.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Showing parallel status in \df+

2016-08-24 Thread Tom Lane
Peter Eisentraut  writes:
> On 8/22/16 1:52 PM, Pavel Stehule wrote:
>> If I understand to purpose of this patch - it is compromise - PL source
>> is removed from table, but it is printed in result.

> What does it do if you are displaying more than one function?

It prints more than one footer.  It's very much like the way that, say,
rules are printed for tables by \d.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dump/restore doesn't preserve row ordering?

2016-08-24 Thread Kevin Grittner
On Tue, Aug 23, 2016 at 8:43 PM, Tom Lane  wrote:

> It's interesting that nobody has complained about this behavior.

We have been known to emphasize that unless you have an ORDER BY
clause at the outermost level of a query, there is no guarantee
about the order of rows returned.  In general, even with autovacuum
off and no DML running, the same query can return rows in a
different order because of synchronized scans -- so I would tend to
view the lack of complaints as a sign of the maturity and awareness
of the user base.

Of course, it can still be an inconvenience for testing, but that's
a different issue.
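
For example, with a pgbench_accounts table:

    SELECT aid, abalance FROM pgbench_accounts;              -- order unspecified
    SELECT aid, abalance FROM pgbench_accounts ORDER BY aid; -- order guaranteed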

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Strange result with LATERAL query

2016-08-24 Thread Andrew Gierth
> "Andrew" == Andrew Gierth  writes:
> "Jeevan" == Jeevan Chalke  writes:

 Jeevan> Hi,
 Jeevan> While playing with LATERAL along with some aggregates in
 Jeevan> sub-query, I have observed somewhat unusual behavior.

 Andrew> Simpler example not needing LATERAL:

 Andrew> select array(select sum(x+y) from generate_series(1,3) y group by y)
 Andrew>   from generate_series(1,3) x;

Oh, and another way to verify that it's a bug and not expected behavior
is that it goes away with set enable_hashagg=false;

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Strange result with LATERAL query

2016-08-24 Thread Andrew Gierth
> "Jeevan" == Jeevan Chalke  writes:

 Jeevan> Hi,
 Jeevan> While playing with LATERAL along with some aggregates in
 Jeevan> sub-query, I have observed somewhat unusual behavior.

Simpler example not needing LATERAL:

select array(select sum(x+y) from generate_series(1,3) y group by y)
  from generate_series(1,3) x;
 ?column? 
----------
 {2,4,3}
 {2,4,3}
 {2,4,3}
(3 rows)

This is broken all the way back, it's not a recent bug.

Something is wrong with the way chgParam is being handled in Agg nodes.
The code in ExecReScanAgg seems to assume that if the lefttree doesn't
have any parameter changes then it suffices to re-project the data from
the existing hashtable; but of course this is nonsense if the parameter
is in an input to an aggregate function.
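
Indeed, with hashagg disabled each outer row gets its own result; the group
ordering then comes from the sort, so one should see something like:

    set enable_hashagg = false;
    select array(select sum(x+y) from generate_series(1,3) y group by y)
      from generate_series(1,3) x;
     ?column?
    ----------
     {2,3,4}
     {3,4,5}
     {4,5,6}
    (3 rows)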

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Showing parallel status in \df+

2016-08-24 Thread Peter Eisentraut
On 8/22/16 1:52 PM, Pavel Stehule wrote:
> If I understand to purpose of this patch - it is compromise - PL source
> is removed from table, but it is printed in result.

What does it do if you are displaying more than one function?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Change the default of update_process_title to off

2016-08-24 Thread Thomas Munro
On Wed, Aug 24, 2016 at 2:35 PM, Tsunakawa, Takayuki
 wrote:
> From: Peter Geoghegan [mailto:p...@heroku.com]
>> On Tue, Aug 23, 2016 at 1:44 PM, Bruce Momjian  wrote:
>> >> [Windows]
>> >> #clients      on     off
>> >> 12         29793   38169
>> >> 24         31587   87237
>> >> 48         32588   83335
>> >> 96         34261   67668
>> >
>> > This ranges from a 28% to a 97% speed improvement on Windows!  Those
>> > are not typos!  This is a game-changer for use of Postgres on Windows
>> > for certain workloads!
>>
>> While I don't care all that much about performance on windows, it is a little
>> sad that it took this long to fix something so simple. Consider this 
>> exchange,
>> as a further example of our lack of concern here:
>>
>> https://www.postgresql.org/message-id/30619.1428157...@sss.pgh.pa.us
>
> Probably, the useful Windows Performance Toolkit, which is a counterpart of 
> perf on Linux, was not available before.  Maybe we can dig deeper into 
> performance problems with it now.
>
> As a similar topic, I wonder whether the following still holds true, after 
> many improvements on shared buffer lock contention.
>
> https://www.postgresql.org/docs/devel/static/runtime-config-resource.html
>
> "The useful range for shared_buffers on Windows systems is generally 
> from 64MB to 512MB."

I don't use Windows, but I have heard recently that this is still true
from someone who was testing with pgbench.  He reported a dip in the
curve above 512MB.

Another database vendor recommends granting SeLockMemoryPrivilege so
that it can use large pages on Windows when using several GB of buffer
pool.  I wonder if that might help Postgres on Windows.  This could be
useful as a starting point to test that theory:

https://www.postgresql.org/message-id/CAEepm%3D075-bgHi_VDt4SCAmt%2Bo_%2B1XaRap2zh7XwfZvT294oHA%40mail.gmail.com

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How to do failover in pglogical replication?

2016-08-24 Thread roshan_myrepublic
Hi Craig,


I am trying to set up pglogical replication. I have a table with 4 rows
on the provider server:
=========================================================================================================
 employee_id |     visitor_email     | vistor_id |        date         |         message
-------------+-----------------------+-----------+---------------------+--------------------------
           1 | ros...@gmail.com      |         1 | 2016-08-24 00:00:00 | This is the first test.
           2 | ros...@myrepublic.net |         2 | 2016-08-24 00:00:00 | This is the second test.
           3 | ros...@myrepublic.net |         3 | 2016-08-24 00:00:00 | This is the third test.
           4 | ros...@myrepublic.net |         4 | 2016-08-24 00:00:00 | This is the fourth test.
=========================================================================================================

After creating the above-mentioned table, I created a replication set on
the provider, then configured the subscriber node and the subscription.
Ideally, I should see the 4 rows from the provider table in my subscriber,
but I am only seeing the table structure, not the data.  If I add a new
row to the provider table as follows:

INSERT INTO employees (employee_id, visitor_email, date, message) VALUES
(4, 'ros...@myrepublic.net', current_date, 'This is the fourth test.');

Now, I am able to see the last added row in my subscriber table. All the
original four rows are still missing. What am I
doing wrong here?

Does it mean that only data created after the replication set was created
will be replicated? And how do I get the whole existing content of a table
into a replication set?
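
I did notice that pglogical.replication_set_add_table() takes a
synchronize_data flag -- assuming the pglogical 1.x SQL interface, is
something like this the intended way to copy the pre-existing rows?

    SELECT pglogical.replication_set_add_table(
        set_name := 'default',
        relation := 'employees'::regclass,
        synchronize_data := true);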

Any help would be much appreciated.

Regards,
Muhammed Roshan

On Thu, Aug 18, 2016 at 1:01 PM, Craig Ringer-3 [via PostgreSQL] <
ml-node+s1045698n5916916...@n5.nabble.com> wrote:

> On 17 August 2016 at 18:21, roshan_myrepublic <[hidden email]
> > wrote:
>
>> Hi,
>>
>> I am currently exploring pglogical replication for my db servers. I would
>> like to know how can I automatically failover from Provider Node to
>> Subscriber Node, if the Provider node goes down for some reasons. How can
>> I
>> redirect all the traffic to SubscriberNode automatically ? In the normal
>> replication, we use recovery_file and triggers to get this job done. Do we
>> have any similar alternative for pglogical replications as well?
>
>
>  There is not, as yet, any integration into tooling like repmgr. You'll
> want some fairly simple scripts to manage failover, likely:
>
> * Update pgbouncer / haproxy / whatever to redirect connections
> * Drop the subscription on the replica
> * STONITH to make sure the master is really down
> * Clone and start a new master
>
> There's some work on automating this through repmgr, but at this time
> pglogical isn't really focused on failover deployments as its main use
> case. The limitations in PostgreSQL's logical decoding and replication when
> it comes to handling of big xacts, sequences, etc mean it's still better
> suited to data movement/integration etc than HA.
>
> --
>  Craig Ringer   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>
>




--
View this message in context: 
http://postgresql.nabble.com/How-to-do-failover-in-pglogical-replication-tp5916769p5917300.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

Re: [HACKERS] [PATCH] Alter or rename enum value

2016-08-24 Thread Dagfinn Ilmari Mannsåker
Emre Hasegeli  writes:

>> Here is v4, which changes the command from ALTER VALUE to RENAME VALUE,
>> for consistency with RENAME ATTRIBUTE.
>
> It looks like we always use "ALTER ... RENAME ... old_name TO
> new_name" syntax, so it is better that way.  I have noticed that all
> the other RENAMEs go through ExecRenameStmt().  We better do the same.

That would require changing it from an AlterEnumStmt to a RenameStmt
instead.  Those look to me like they're for renaming SQL objects,
i.e. things addressed by identifiers, whereas enum labels are just
strings.  Looking at the implementation of a few of the functions called
by ExecRenameStmt(), they have very little in common with
RenameEnumLabel() logic-wise.

>> +int nbr_index;
>> +Form_pg_enum nbr_en;
>
> What is "nbr"?  Why not calling them something like "i" and "val"?

This is the same naming as used in AddEnumLabel(), which I used as a
guide.

>> +   /* Locate the element to rename and check if the new label is already in
>
> The project style for multi-line commands is to leave the first line
> of the comment block empty.
>
>> +if (strcmp(NameStr(nbr_en->enumlabel), oldVal) == 0)
>
> I found namestrcmp() for this.

Thanks, fixed.  This again came from using AddEnumLabel() as an example,
which should probably be fixed separately.

>
>> +}
>> +if (!old_tup)
>
> I would leave a space in between.
>
>> +ReleaseCatCacheList(list);
>> +heap_close(pg_enum, RowExclusiveLock);
>
> Maybe we better release them before reporting error, too.  I would
> release the list after the loop, close the heap before ereport().

As Tom said, this gets released automatically on error, and is again
similar to how AddEnumLabel() does it.

Here is an updated patch.
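
For reference, the command the patch adds looks like this (a sketch against
a made-up enum):

    CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');
    ALTER TYPE mood RENAME VALUE 'sad' TO 'unhappy';
    ALTER TYPE mood RENAME VALUE IF EXISTS 'glum' TO 'meh' IF NOT EXISTS;  -- notice, not an error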

>From 542b20b3a0f82d07203035aa853bae2ddccb6af3 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= 
Date: Thu, 24 Mar 2016 17:50:58 +
Subject: [PATCH] =?UTF-8?q?Add=20ALTER=20TYPE=20=E2=80=A6=20RENAME=20VALUE?=
 =?UTF-8?q?=20=E2=80=A6=20TO=20=E2=80=A6?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Unlike adding values to an enum, altering an existing value doesn't
affect the OID values or ordering, so can be done in a transaction.

IF EXISTS and IF NOT EXISTS are supported for ignoring the non-existence
of the old value or the existence of the new one, respectively.
---
 doc/src/sgml/ref/alter_type.sgml   |  24 +++-
 src/backend/catalog/pg_enum.c  | 117 +
 src/backend/commands/typecmds.c|  52 ++---
 src/backend/parser/gram.y  |  18 ++
 src/include/catalog/pg_enum.h  |   3 +
 src/include/nodes/parsenodes.h |   2 +
 src/test/regress/expected/enum.out |  74 +++
 src/test/regress/sql/enum.sql  |  35 +++
 8 files changed, 301 insertions(+), 24 deletions(-)

diff --git a/doc/src/sgml/ref/alter_type.sgml b/doc/src/sgml/ref/alter_type.sgml
index 9789881..9f0ca8f 100644
--- a/doc/src/sgml/ref/alter_type.sgml
+++ b/doc/src/sgml/ref/alter_type.sgml
@@ -29,6 +29,7 @@ ALTER TYPE name RENAME ATTRIBUTE name RENAME TO new_name
 ALTER TYPE name SET SCHEMA new_schema
 ALTER TYPE name ADD VALUE [ IF NOT EXISTS ] new_enum_value [ { BEFORE | AFTER } existing_enum_value ]
+ALTER TYPE name RENAME VALUE [ IF EXISTS ] existing_enum_value TO new_enum_value [ IF NOT EXISTS ]
 
 where action is one of:
 
@@ -124,6 +125,23 @@ ALTER TYPE name ADD VALUE [ IF NOT

 

+RENAME VALUE [ IF EXISTS ] TO [ IF NOT EXISTS ]
+
+ 
+  This form renames a value in an enum type.  The value's place in the
+  enum's ordering is not affected.
+ 
+ 
+  If IF EXISTS or IF NOT EXISTS is
+  specified, it is not an error if the type doesn't contain the old value
+  or already contains the new value, respectively: a notice is issued but
+  no other action is taken.  Otherwise, an error will occur if the old
+  value is not already present or the new value is.
+ 
+
+   
+
+   
 CASCADE
 
  
@@ -241,7 +259,8 @@ ALTER TYPE name ADD VALUE [ IF NOT
   new_enum_value
   

-The new value to be added to an enum type's list of values.
+The new value to be added to an enum type's list of values,
+or to rename an existing one to.
 Like all enum literals, it needs to be quoted.

   
@@ -252,7 +271,8 @@ ALTER TYPE name ADD VALUE [ IF NOT
   

 The existing enum value that the new value should be added immediately
-before or after in the enum type's sort ordering.
+before or after in the enum type's sort ordering,
+or renamed from.
 Like all enum literals, it needs to be quoted.

   
diff --git a/src/backend/catalog/pg_enum.c b/src/backend/catalog/pg_enum.c
index af89daa..1920138 100644
--- a/src/backend/catalog/pg_enum.c

Re: [HACKERS] Proposal for CSN based snapshots

2016-08-24 Thread Alexander Korotkov
On Wed, Aug 24, 2016 at 11:54 AM, Heikki Linnakangas 
wrote:

> On 08/23/2016 06:18 PM, Heikki Linnakangas wrote:
>
>> On 08/22/2016 08:38 PM, Andres Freund wrote:
>>
>>> On 2016-08-22 20:32:42 +0300, Heikki Linnakangas wrote:
>>>
 I
 remember seeing ProcArrayLock contention very visible earlier, but I
 can't
 hit that now. I suspect you'd still see contention on bigger hardware,
 though, my laptop has only 4 cores. I'll have to find a real server for
 the
 next round of testing.

>>>
>>> Yea, I think that's true. I can just about see ProcArrayLock contention
>>> on my more powerful laptop, to see it really bad you need bigger
>>> hardware / higher concurrency.
>>>
>>
>> As soon as I sent my previous post, Vladimir Borodin kindly offered
>> access to a 32-core server for performance testing. Thanks Vladimir!
>>
>> I installed Greg Smith's pgbench-tools kit on that server, and ran some
>> tests. I'm seeing some benefit on "pgbench -N" workload, but only after
>> modifying the test script to use "-M prepared", and using Unix domain
>> sockets instead of TCP to connect. Apparently those things add enough
>> overhead to mask out the little difference.
>>
>> Attached is a graph with the results. Full results are available at
>> https://hlinnaka.iki.fi/temp/csn-4-results/. In short, the patch
>> improved throughput, measured in TPS, with >= 32 or so clients. The
>> biggest difference was with 44 clients, which saw about 5% improvement.
>>
>> So, not phenomenal, but it's something. I suspect that with more cores,
>> the difference would become more clear.
>>
>> As if on cue, Alexander Korotkov just offered access to a 72-core
>> system :-). Thanks! I'll run the same tests on that.
>>
>
> And here are the results on the 72 core machine (thanks again,
> Alexander!). The test setup was the same as on the 32-core machine, except
> that I ran it with more clients since the system has more CPU cores. In
> summary, in the best case, the patch increases throughput by about 10%.
> That peak is with 64 clients. Interestingly, as the number of clients
> increases further, the gain evaporates, and the CSN version actually
> performs worse than unpatched master. I don't know why that is. One theory is
> that by eliminating one bottleneck, we're now hitting another bottleneck
> which doesn't degrade as gracefully when there's contention.
>

Did you try to identify this second bottleneck with perf or something?
It would be nice to also run pgbench -S.  Also, it would be nice to check
something like 10% of writes, 90% of reads (which is quite typical workload
in real life I believe).
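
With 9.6's pgbench such a mix is easy to get from the built-in scripts and
their weights, e.g.:

    pgbench -M prepared -c 64 -j 64 -T 300 -b tpcb-like@1 -b select-only@9 bench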

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [RFC] Change the default of update_process_title to off

2016-08-24 Thread Magnus Hagander
On Wed, Aug 24, 2016 at 4:35 AM, Tsunakawa, Takayuki <
tsunakawa.ta...@jp.fujitsu.com> wrote:

> From: Peter Geoghegan [mailto:p...@heroku.com]
> > On Tue, Aug 23, 2016 at 1:44 PM, Bruce Momjian  wrote:
> > >> [Windows]
> > >> #clients      on     off
> > >> 12         29793   38169
> > >> 24         31587   87237
> > >> 48         32588   83335
> > >> 96         34261   67668
> > >
> > > This ranges from a 28% to a 97% speed improvement on Windows!  Those
> > > are not typos!  This is a game-changer for use of Postgres on Windows
> > > for certain workloads!
> >
> > While I don't care all that much about performance on windows, it is a
> little
> > sad that it took this long to fix something so simple. Consider this
> exchange,
> > as a further example of our lack of concern here:
> >
> > https://www.postgresql.org/message-id/30619.1428157...@sss.pgh.pa.us
>
> Probably, the useful Windows Performance Toolkit, which is a counterpart
> of perf on Linux, was not available before.  Maybe we can dig deeper into
> performance problems with it now.
>
> As a similar topic, I wonder whether the following still holds true, after
> many improvements on shared buffer lock contention.
>
> https://www.postgresql.org/docs/devel/static/runtime-config-resource.html
>
> "The useful range for shared_buffers on Windows systems is
> generally from 64MB to 512MB."
>
>
Yes, that may very much be out of date as well. A good set of benchmarks
around that would definitely be welcome.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Proposal for CSN based snapshots

2016-08-24 Thread Heikki Linnakangas

On 08/23/2016 06:18 PM, Heikki Linnakangas wrote:

On 08/22/2016 08:38 PM, Andres Freund wrote:

On 2016-08-22 20:32:42 +0300, Heikki Linnakangas wrote:

I
remember seeing ProcArrayLock contention very visible earlier, but I can't
hit that now. I suspect you'd still see contention on bigger hardware,
though, my laptop has only 4 cores. I'll have to find a real server for the
next round of testing.


Yea, I think that's true. I can just about see ProcArrayLock contention
on my more powerful laptop, to see it really bad you need bigger
hardware / higher concurrency.


As soon as I sent my previous post, Vladimir Borodin kindly offered
access to a 32-core server for performance testing. Thanks Vladimir!

I installed Greg Smith's pgbench-tools kit on that server, and ran some
tests. I'm seeing some benefit on "pgbench -N" workload, but only after
modifying the test script to use "-M prepared", and using Unix domain
sockets instead of TCP to connect. Apparently those things add enough
overhead to mask out the little difference.

Attached is a graph with the results. Full results are available at
https://hlinnaka.iki.fi/temp/csn-4-results/. In short, the patch
improved throughput, measured in TPS, with >= 32 or so clients. The
biggest difference was with 44 clients, which saw about 5% improvement.

So, not phenomenal, but it's something. I suspect that with more cores,
the difference would become more clear.

As if on cue, Alexander Korotkov just offered access to a 72-core 
system :-). Thanks! I'll run the same tests on that.


And here are the results on the 72 core machine (thanks again, 
Alexander!). The test setup was the same as on the 32-core machine, 
except that I ran it with more clients since the system has more CPU 
cores. In summary, in the best case, the patch increases throughput by 
about 10%. That peak is with 64 clients. Interestingly, as the number of 
clients increases further, the gain evaporates, and the CSN version 
actually performs worse than unpatched master. I don't know why that is. 
One theory is that by eliminating one bottleneck, we're now hitting another 
bottleneck which doesn't degrade as gracefully when there's contention.


Full results are available at 
https://hlinnaka.iki.fi/temp/csn-4-72core-results/.


- Heikki


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-08-24 Thread Simon Riggs
On 22 August 2016 at 16:56, Simon Riggs  wrote:
> On 22 August 2016 at 13:44, Kuntal Ghosh  wrote:
>
>> Please let me know your thoughts on this.
>
> Do the regression tests pass with this option enabled?

Hi,

I'd like to be a reviewer on this. Please can you add this onto the CF
app so we can track the review?

Please supply details of the testing and test coverage.

Thanks

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw : altering foreign table not invalidating prepare statement execution plan.

2016-08-24 Thread Etsuro Fujita

On 2016/04/04 23:24, Tom Lane wrote:

Amit Langote  writes:

On 2016/04/04 15:17, Rajkumar Raghuwanshi wrote:

* .Observation: *Prepare statement execution plan is not getting changed
even after altering foreign table to point to new schema.



I wonder if performing relcache invalidation upon ATExecGenericOptions()
is the correct solution for this problem.  The attached patch fixes the
issue for me.



A forced relcache inval will certainly fix it, but I'm not sure if that's
the best (or only) place to put it.



A related issue, now that I've seen this example, is that altering
FDW-level or server-level options won't cause a replan either.  I'm
not sure there's any very good fix for that.  Surely we don't want
to try to identify all tables belonging to the FDW or server and
issue relcache invals on all of them.


While improving join pushdown in postgres_fdw, I happened to notice that  
the fetch_size option in 9.6 has the same issue.  A forced replan  
discussed above would fix that issue, but I'm not sure that that's a  
good idea, because the fetch_size option, which postgres_fdw gets at  
GetForeignRelSize, is not used for anything in creating a plan.  So, as  
far as the fetch_size option is concerned, a better solution is (1) to
consult that option at execution time, probably at BeginForeignScan, not
at planning time such as GetForeignRelSize, and (2) to not cause that
replan when altering that option.  Maybe I'm missing something, though.
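
For what it's worth, the option can be adjusted at either level after the
fact, e.g. (server and table names invented):

    ALTER SERVER loopback OPTIONS (ADD fetch_size '1000');
    ALTER FOREIGN TABLE remote_tab OPTIONS (ADD fetch_size '500');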


Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pg_stat_lwlock wait time view

2016-08-24 Thread Haribabu Kommi
There was some discussion earlier about adding a pg_stat_lwlock view in [1].
The main objection I observed to that patch was that it shows LWLock
information to users who don't understand what these locks are used for.

Currently the LWLock information is already available to users as part of
the wait_event information in pg_stat_activity, and the LWLocks used in
PostgreSQL are listed in the documentation along with their purpose.

So I feel it may be worth adding this view to find out the wait times of
the LWLocks. This information can be useful for finding bottlenecks around
LWLocks in production environments. But adding the timing calculations may
cause a performance problem. Is there any need to write these stats to
file? The view just provides wait time information.
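
The shape I have in mind is roughly the following (all names hypothetical):

    SELECT name, calls, total_wait_time
      FROM pg_stat_lwlock
     ORDER BY total_wait_time DESC
     LIMIT 10;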

Based on the performance impact of the additional timing calculations,
we can decide the view's default behavior. Are there any objections to
the concept?


[1] - https://www.postgresql.org/message-id/4fe9a6f5.2080...@uptime.jp

Regards,
Hari Babu
Fujitsu Australia


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New SQL counter statistics view (pg_stat_sql)

2016-08-24 Thread Haribabu Kommi
On Tue, Aug 23, 2016 at 3:59 AM, Andres Freund  wrote:
>> Haribabu's categorization
>> scheme seems to need some work, but the idea of categorizing
>> statements by type and counting executions per type seems very
>> reasonable.
>
> I'd consider instead using something like COALESCE(commandType,
> nodeTag(Query->utilityStmt)) as the categories. Not sure if I'd even
> pivot that.

Do you mean using select, insert, update, delete and utility (CommandType)
as the categories for displaying the counters? I think I am not getting
your point correctly.

Apart from the above, here is the list of command tags that are generated
in the code. I took only the first word of each command tag, just to see
how many categories are present. The number in parentheses indicates how
many operations share that first word -- like create table, create
function, etc.

insert
delete
update
select (6)
transaction (10)
declare
close (2)
move
fetch
create (37)
drop (36)
Alter (56)
import
truncate
comment
security
copy
grant
revoke
notify
listen
unlisten
load
cluster
vacuum
analyze
explain
refresh
set (2)
reset
show
discard (4)
reassign
lock
checkpoint
reindex
prepare
execute
deallocate

We can also pick the main categories from the above list, and the rest
will fall under "other".
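
To make that concrete, such a view might be queried like this (names are
hypothetical):

    SELECT tag, count FROM pg_stat_sql ORDER BY count DESC;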


Regards,
Hari Babu
Fujitsu Australia


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] standalone backend PANICs during recovery

2016-08-24 Thread Bernd Helmle


--On 20. August 2016 12:41:48 -0400 Tom Lane  wrote:

> So at this point I'm pretty baffled as to what the actual use-case is
> here.  I am tempted to say that a standalone backend should refuse to
> start at all if a recovery.conf file is present.  If we do want to
> allow the case, we need some careful thought about what it should do.

Well, the "use case" was a crashed hot standby at a customer site (it kept
PANICing after restarting), where I decided to have a look through the
recovery process with gdb. The exact PANIC was

2016-03-29 15:12:40 CEST [16097]: [26-1] user=,db=,xid=0,vxid=1/0 FATAL:
unexpected GIN leaf action: 0

I had the idea that using a single backend would be a quick and dirty way
to look at it. I was surprised that this time it PANIC'ed differently.

That said, I'm okay if --single is not intended to bring up a hot standby.
There are many other ways to debug such problems.
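
For the record, the standalone invocation was along these lines (path
elided):

    postgres --single -D /path/to/standby/data postgres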

-- 
Thanks

Bernd


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump with tables created in schemas created by extensions

2016-08-24 Thread Michael Paquier
On Wed, Aug 24, 2016 at 9:07 AM, Michael Paquier
 wrote:
> On Wed, Aug 24, 2016 at 5:34 AM, Martín Marqués  
> wrote:
>> Hi,
>>
>> 2016-08-23 16:46 GMT-03:00 Martín Marqués :
>>>
>>> I will add tests for sequence and functions as you mention and test again.
>>>
>>> Then I'll check if other tests should be added as well.
>>
>> I found quite some other objects we should be checking as well, but
>> this will add some duplication to the tests, as I'd just copy (with
>> minor changes) what's in src/bin/pg_dump/t/002_pg_dump.pl
>>
>> I can't think of a way to avoid this duplication, not that it really
>> hurts. We would have to make sure that any new objects added to one
>> test, if needed, are added to the other (that's a bit cumbersome).
>>
>> Other things to check:
>>
>> CREATE AGGREGATE
>> CREATE FUNCTION

Agreed on those two.

>> CREATE DOMAIN
>> CREATE TYPE

Those two overlap, so adding just a type would be fine.

>> CREATE MATERIALIZED VIEW

This overlaps with normal relations.

>> CREATE POLICY

Policies are not schema-qualified.

>> Maybe even CREATE INDEX over a table created in the schema.

Yes, we can do some fancy things here... But see below, as pg_dump
is broken even in this case...

>> Also, ACLs have to be tested against objects in the schema.
>>
>> I hope I didn't miss anything there.

So I have reviewed the patch for master, did some wordsmithing and
added more tests. Particularly, I have added tests for a couple of
objects created in the extension, as well as objects that are not part
of the extension but make use of the schema of the extension. Even
with that, I am still seeing two problems:
- ACLs assigned to an aggregate in an extension are not getting dumped
in a binary upgrade, and I think that they should be. The aggregate
definition gets correctly handled though.
- if an index is defined on a table that is part of an extension, it will not
be dumped. We can fix the problem of this thread and the one I just
found separately though.
The patch attached includes all those tests and they are failing. We
are going to need a patch able to pass all that, and even for master
this is going to need more thought; let's focus on HEAD/9.6
first.
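
To spell out the index problem, the pattern that currently gets lost in a
dump is like this (extension and object names invented for illustration):

    CREATE EXTENSION test_ext;   -- its script creates table test_ext_tab
    CREATE INDEX test_ext_idx ON test_ext_tab (id);  -- user-created, outside the extension
    -- pg_dump should emit this CREATE INDEX, but currently does not
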
-- 
Michael


pgdump-extension-v2.patch
Description: invalid/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Collective typos in contrib/postgres_fdw (Was: Re: [HACKERS] Incorrect comment in contrib/postgres_fdw/deparse.c)

2016-08-24 Thread Etsuro Fujita

On 2016/04/04 20:11, Etsuro Fujita wrote:

I found an incorrect comment in contrib/postgres_fdw/deparse.c: s/FOR
SELECT or FOR SHARE/FOR UPDATE or FOR SHARE/  Attached is a patch to fix
that comment.


Other than this, I ran into some typos in contrib/postgres_fdw, while  
working on join pushdown improvements.  Please find attached an updated  
version of the patch.


Best regards,
Etsuro Fujita


comment-typos-in-contrib-postgres-fdw.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: function xmltable

2016-08-24 Thread Pavel Stehule
2016-08-23 21:00 GMT+02:00 Pavel Stehule :

> Hi
>
> 2016-08-19 10:58 GMT+02:00 Pavel Stehule :
>
>> Hi
>>
>> I am sending implementation of xmltable function. The code should to have
>> near to final quality and it is available for testing.
>>
>> I invite any help with documentation and testing.
>>
>
> new update - the work with nodes is much more correct now.
>

next update

fix memory leak

Pavel


>
> Regards
>
> Pavel
>
>
>> Regards
>>
>> Pavel
>>
>
>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 169a385..a6334b6 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -10099,6 +10099,47 @@ SELECT xmlroot(xmlparse(document 'abc'),
 

 
+   
+xmltable
+
+   
+xmltable
+   
+
+
+xmltable(xmlnamespaces(namespace uri AS namespace name|DEFAULT namespace uri , ...) rowexpr PASSING BY REF xml BY REF COLUMNS name type PATH columnexpr DEFAULT expr NOT NULL|NULL , ...)
+
+
+
+  The xmltable produces table based on passed XML value.
+
+
+
+
+
+   
+

 xmlagg
 
diff --git a/src/backend/executor/execQual.c b/src/backend/executor/execQual.c
index 743e7d6..7abc367 100644
--- a/src/backend/executor/execQual.c
+++ b/src/backend/executor/execQual.c
@@ -189,6 +189,9 @@ static Datum ExecEvalCurrentOfExpr(ExprState *exprstate, ExprContext *econtext,
 static Datum ExecEvalGroupingFuncExpr(GroupingFuncExprState *gstate,
 		 ExprContext *econtext,
 		 bool *isNull, ExprDoneCond *isDone);
+static Datum ExecEvalTableExpr(TableExprState *tstate,
+		ExprContext *econtext,
+		bool *isnull, ExprDoneCond *isDone);
 
 
 /* 
@@ -4500,6 +4503,218 @@ ExecEvalCurrentOfExpr(ExprState *exprstate, ExprContext *econtext,
 	return 0;	/* keep compiler quiet */
 }
 
+/* 
+ *		ExecEvalTableExpr
+ *
+ * 
+ */
+
+#define XMLTABLE_DEFAULT_NAMESPACE			"pgdefxmlns"
+
+static Datum
+execEvalTableExpr(TableExprState *tstate,
+		ExprContext *econtext,
+		bool *isNull, ExprDoneCond *isDone)
+{
+	TupleDesc tupdesc;
+	HeapTuple		tuple;
+	HeapTupleHeader		result;
+	int	i;
+	Datum		*values;
+	bool		*nulls;
+	Datum	value;
+	bool	isnull;
+	xmltype		   *xmlval;
+	text		   *row_path;
+
+	tupdesc = tstate->tupdesc;
+
+	if (tstate->firstRow)
+	{
+		ListCell	*ns;
+
+		/* Evaluate expression first */
+		value = ExecEvalExpr(tstate->expr, econtext, &isnull, NULL);
+		if (isnull)
+		{
+			*isDone = ExprSingleResult;
+			*isNull = true;
+			return (Datum) 0;
+		}
+		xmlval = DatumGetXmlP(value);
+
+		value = ExecEvalExpr(tstate->row_path_expr, econtext, &isnull, NULL);
+		if (isnull)
+ereport(ERROR,
+		(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
+	  errmsg("row query must not be null")));
+		row_path = DatumGetTextP(value);
+
+		Assert(tstate->xmltableCxt == NULL);
+		tstate->xmltableCxt = initXmlTableContext(xmlval,
+	tstate->used_dns ?
+		XMLTABLE_DEFAULT_NAMESPACE : NULL,
+	tstate->ncols,
+			  tstate->in_functions,
+			  tstate->typioparams,
+	econtext->ecxt_per_query_memory);
+
+		foreach(ns, tstate->namespaces)
+		{
+			Node	   *n = (Node *) lfirst(ns);
+			ExprState  *expr;
+			char	   *ns_name;
+			char	   *ns_uri;
+
+			if (IsA(n, NamedArgExpr))
+			{
+NamedArgExpr *na = (NamedArgExpr *) n;
+
+expr = (ExprState *) na->arg;
+ns_name = na->name;
+			}
+			else
+			{
+expr = (ExprState *) n;
+ns_name = NULL;
+			}
+
+			value = ExecEvalExpr((ExprState *) expr, econtext, &isnull, NULL);
+			if (isnull)
+ereport(ERROR,
+		(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
+	  errmsg("namespace uri must not be null")));
+			ns_uri = text_to_cstring(DatumGetTextP(value));
+
+			XmlTableSetNs(tstate->xmltableCxt, ns_name, ns_uri);
+		}
+
+		XmlTableSetRowPath(tstate->xmltableCxt, row_path);
+
+		/* Path caclulation */
+		for (i = 0; i < tstate->ncols; i++)
+		{
+			char *col_path;
+
+			if (tstate->col_path_expr[i] != NULL)
+			{
+				value = ExecEvalExpr(tstate->col_path_expr[i], econtext, &isnull, NULL);
+if (isnull)
+	ereport(ERROR,
+			(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
+		  errmsg("column path for column \"%s\" must not be null",
+			  NameStr(tupdesc->attrs[i]->attname))));
+col_path = text_to_cstring(DatumGetTextP(value));
+			}
+			else
+col_path = NameStr(tupdesc->attrs[i]->attname);
+
+			XmlTableSetColumnPath(tstate->xmltableCxt, i,
+			tupdesc->attrs[i]->atttypid, col_path);
+		}
+		tstate->firstRow = false;
+	}
+
+	values = tstate->values;
+	nulls = tstate->nulls;
+
+	if (XmlTableFetchRow(tstate->xmltableCxt))
+	{
+		if (tstate->ncols > 0)
+		{
+			for (i = 0; i < tstate->ncols; i++)
+			{
+if (i != tstate->for_ordinality_col - 1)
+{
+	values[i] = 

Re: [HACKERS] Write Ahead Logging for Hash Indexes

2016-08-24 Thread Amit Kapila
On Wed, Aug 24, 2016 at 11:44 AM, Mark Kirkwood
 wrote:
> On 24/08/16 17:01, Mark Kirkwood wrote:
>>
>>
>> ...actually I was wrong there, only 2 of them were the same. So I've
>> attached a new log of bt's of them all.
>>
>>
>>
>
> And I can reproduce with only 1 session, figured that might be a helpful
> piece of the puzzle (trace attached).
>

Thanks.

I think I know the problem here.  Basically _hash_freeovflpage() is
trying to take a lock on the buffer previous to the overflow page to
update the links, and it is quite possible that the same buffer is
already locked for moving the tuples while squeezing the bucket.  I am
working on a fix.

Coincidentally, Ashutosh Sharma, a colleague of mine who was also testing
this patch, found the same issue with the attached SQL script.  So we
might be able to incorporate a test case in the regression suite as
well after the fix.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


test_hash.sql
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] "Some tests to cover hash_index"

2016-08-24 Thread Amit Kapila
On Wed, Aug 24, 2016 at 11:38 AM, Ashutosh Sharma  wrote:
> Hi,
>
>> Well, that change should be part of Amit's patch to add WAL logging,
>> not this patch, whose mission is just to improve test coverage.
>
> I have just removed the warning message from expected output file as i
> have performed the testing on Amit's latest patch that removes this
> warning message from the  hash index code.
>

I think you are missing what Robert wants to point out.  You don't
need to remove the warning message when you are adding new test cases
on top of Mithun's patch, even if those new tests help to cover the
code path that is required for the wal-hash-index patch.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench - minor doc improvements

2016-08-24 Thread Fabien COELHO



> I looked at this, but I don't really find any of these changes to be
> improvements.  They just make it harder to read IMO.


I marked the patch as rejected in CF 2016-09.

--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Write Ahead Logging for Hash Indexes

2016-08-24 Thread Mark Kirkwood

On 24/08/16 17:01, Mark Kirkwood wrote:


...actually I was wrong there, only 2 of them were the same. So I've 
attached a new log of bt's of them all.






And I can reproduce with only 1 session, figured that might be a helpful 
piece of the puzzle (trace attached).



$ pgbench -c 1 -T600 bench

bench=# SELECT pid, state, now()-xact_start AS wait, wait_event_type, wait_event, query
        FROM pg_stat_activity WHERE datname='bench' ORDER BY wait DESC;
  pid  | state  |      wait       | wait_event_type |   wait_event   |                            query
-------+--------+-----------------+-----------------+----------------+--------------------------------------------------------------
 17266 | active | 00:45:23.027555 | LWLockTranche   | buffer_content | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (352, 76, 1305123, -1604, CURRENT_TIMESTAMP);
(1 row)

$ gdb -p 17266 
(gdb) bt
#0  0x7f7b4d533387 in semop () at ../sysdeps/unix/syscall-template.S:84
#1  0x00660331 in PGSemaphoreLock (sema=sema@entry=0x7f7b4cad1690)
at pg_sema.c:387
#2  0x006bde9b in LWLockAcquire (lock=0x7f7b44583fe4, mode=LW_EXCLUSIVE)
at lwlock.c:1288
#3  0x0049a625 in _hash_getbuf_with_strategy 
(rel=rel@entry=0x7f7b4e0c1908, 
blkno=blkno@entry=12, access=access@entry=2, flags=flags@entry=1, 
bstrategy=bstrategy@entry=0x0) at hashpage.c:252
#4  0x00498040 in _hash_freeovflpage (rel=rel@entry=0x7f7b4e0c1908, 
bucketbuf=bucketbuf@entry=12796, ovflbuf=ovflbuf@entry=236, 
wbuf=wbuf@entry=13166, 
itups=itups@entry=0x7ffc693fc800, 
itup_offsets=itup_offsets@entry=0x13067d0, 
tups_size=0x7ffc693fd4c0, nitups=34, bstrategy=0x0) at hashovfl.c:517
#5  0x00499985 in _hash_squeezebucket (rel=rel@entry=0x7f7b4e0c1908, 
bucket=bucket@entry=7, bucket_blkno=bucket_blkno@entry=10, 
bucket_buf=bucket_buf@entry=12796, bstrategy=bstrategy@entry=0x0) at 
hashovfl.c:1010
#6  0x00496caf in hashbucketcleanup (rel=rel@entry=0x7f7b4e0c1908, 
bucket_buf=bucket_buf@entry=12796, bucket_blkno=bucket_blkno@entry=10, 
bstrategy=bstrategy@entry=0x0, maxbucket=22, highmask=31, lowmask=15, 
tuples_removed=0x0, num_index_tuples=0x0, bucket_has_garbage=1 '\001', 
delay=0 '\000', callback=0x0, callback_state=0x0) at hash.c:937
#7  0x0049b353 in _hash_expandtable (rel=rel@entry=0x7f7b4e0c1908, 
metabuf=metabuf@entry=6179) at hashpage.c:785
#8  0x00497bae in _hash_doinsert (rel=rel@entry=0x7f7b4e0c1908, 
itup=itup@entry=0x13060b8) at hashinsert.c:313
#9  0x0049617f in hashinsert (rel=0x7f7b4e0c1908, values=<optimized out>, 
isnull=<optimized out>, ht_ctid=0x1305f7c, heapRel=<optimized out>, 
checkUnique=<optimized out>) at hash.c:247
#10 0x005bf060 in ExecInsertIndexTuples (slot=slot@entry=0x1304a70, 
tupleid=tupleid@entry=0x1305f7c, estate=estate@entry=0x13043b0, 
noDupErr=noDupErr@entry=0 '\000', specConflict=specConflict@entry=0x0, 
arbiterIndexes=arbiterIndexes@entry=0x0) at execIndexing.c:388
#11 0x005dd011 in ExecInsert (canSetTag=1 '\001', estate=0x13043b0, 
onconflict=<optimized out>, arbiterIndexes=0x0, planSlot=0x1304a70, 
slot=0x1304a70, 
mtstate=0x1304600) at nodeModifyTable.c:481
#12 ExecModifyTable (node=node@entry=0x1304600) at nodeModifyTable.c:1496
#13 0x005c3948 in ExecProcNode (node=node@entry=0x1304600) at 
execProcnode.c:396
#14 0x005bfdbf in ExecutePlan (dest=0x12f4bd8, direction=<optimized out>, 
numberTuples=0, sendTuples=<optimized out>, operation=CMD_INSERT, 
use_parallel_mode=<optimized out>, planstate=0x1304600, estate=0x13043b0)
at execMain.c:1567
#15 standard_ExecutorRun (queryDesc=0x1303fa0, direction=<optimized out>, 
count=0)
at execMain.c:338
#16 0x006ce669 in ProcessQuery (plan=<optimized out>, 
sourceText=0x12ccf50 "INSERT INTO pgbench_history (tid, bid, aid, delta, 
mtime) VALUES (352, 76, 1305123, -1604, CURRENT_TIMESTAMP);", params=0x0, 
dest=0x12f4bd8, 
completionTag=0x7ffc69400c30 "") at pquery.c:187
#17 0x006ce89c in PortalRunMulti (portal=portal@entry=0x1271fe0, 
isTopLevel=isTopLevel@entry=1 '\001', 
setHoldSnapshot=setHoldSnapshot@entry=0 '\000', dest=dest@entry=0x12f4bd8, 
---Type <return> to continue, or q <return> to quit---
altdest=altdest@entry=0x12f4bd8, 
completionTag=completionTag@entry=0x7ffc69400c30 "") at pquery.c:1303
#18 0x006cf34e in PortalRun (portal=portal@entry=0x1271fe0, 
count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 
'\001', 
dest=dest@entry=0x12f4bd8, altdest=altdest@entry=0x12f4bd8, 
completionTag=completionTag@entry=0x7ffc69400c30 "") at pquery.c:815
#19 0x006cc45b in exec_simple_query (
query_string=0x12ccf50 "INSERT INTO pgbench_history (tid, bid, aid, delta, 
mtime) VALUES (352, 76, 1305123, -1604, CURRENT_TIMESTAMP);") at postgres.c:1094
#20 PostgresMain (argc=<optimized out>, 

Re: [HACKERS] "Some tests to cover hash_index"

2016-08-24 Thread Ashutosh Sharma
Hi,

> Well, that change should be part of Amit's patch to add WAL logging,
> not this patch, whose mission is just to improve test coverage.

I have just removed the warning message from expected output file as i
have performed the testing on Amit's latest patch that removes this
warning message from the  hash index code.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers