Re: [HACKERS] proposal: lock_time for pg_stat_database

2015-01-16 Thread Pavel Stehule
2015-01-16 20:33 GMT+01:00 Jim Nasby :

> On 1/16/15 12:30 PM, Pavel Stehule wrote:
>
>>
>>
>> 2015-01-16 19:24 GMT+01:00 Pavel Stehule <pavel.steh...@gmail.com>:
>>
>>
>>
>> 2015-01-16 19:06 GMT+01:00 Jim Nasby <jim.na...@bluetreble.com>:
>>
>> On 1/16/15 11:35 AM, Pavel Stehule wrote:
>>
>>
>>
>> 2015-01-16 18:23 GMT+01:00 Jim Nasby <jim.na...@bluetreble.com>:
>>
>>  On 1/16/15 11:00 AM, Pavel Stehule wrote:
>>
>>  Hi all,
>>
>>  some time ago, I proposed a lock time measurement
>> related to a query. The main issue was the method of how to show this information.
>> Today's proposal is a little bit simpler, but still useful. We can show the total
>> lock time per database in the pg_stat_database statistics. A high number can be a
>> signal about lock issues.
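For illustration, from the monitoring side the proposal would amount to something
like the query below; the lock_time (or "waiting lock time") column is hypothetical
and does not exist in pg_stat_database today.

-- Hypothetical usage of the proposed per-database counter; only "deadlocks"
-- exists today, lock_time is the column this proposal would add.
SELECT datname, deadlocks, lock_time
FROM   pg_stat_database
ORDER  BY lock_time DESC;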
>>
>>
>>  Would this not use the existing stats mechanisms? If so,
>> couldn't we do this per table? (I realize that won't handle all cases; we'd
>> still need a "lock_time_other" somewhere).
>>
>>
>>
>> it can use the currently existing stats mechanisms
>>
>> I'm afraid it isn't possible to assign waiting time to a table -
>> because it depends on order
>>
>>
>> Huh? Order of what?
>>
>>
>> when you have a SELECT FROM T1, T2 and T1 is locked for t1, and T2 is
>> locked for t2 -- but if t2 < t1 then t2 is not important -- so what do I have
>> to count as lock time for T1 and T2?
>>
>
> If that select is waiting on a lock on t2, then it's waiting on that lock
> on that table. It doesn't matter who else has the lock.
>
>   Also, what do you mean by 'lock'? Heavyweight? We
>> already have some visibility there. What I wish we had was some way to know
>> if we're spending a lot of time in a particular non-heavy lock. Actually
>> measuring time probably wouldn't make sense but we might be able to count
>> how often we fail initial acquisition or something.
>>
>>
>> now, when I am thinking about it, lock_time is not a good name
>> - maybe "waiting lock time" (the lock time itself is not interesting, waiting
>> is interesting) - it can be divided into some more categories - in GoodData
>> we use heavyweight, pages, and other categories.
>>
>>
>> So do you see this somehow encompassing locks other than
>> heavyweight locks? Because I think that's the biggest need here. Basically,
>> something akin to TRACE_POSTGRESQL_LWLOCK_WAIT_START() that doesn't
>> depend on dtrace.
>>
>>
>> For these global statistics I see a common total waiting
>> time for locks as important - we can use a more detailed granularity but I am
>> not sure whether common statistics are the best tool.
>>
>
> Locks may be global, but what you're waiting for a lock on certainly
> isn't. It's almost always a lock either on a table or a row in a table. Of
> course this does mean you can't just blindly report that you're blocked on
> some XID; that doesn't tell anyone anything.
>
>  My motivation is - look at the statistics -- and I can see ... lots of
>> rollbacks -- an issue, lots of deadlocks -- an issue, lots of waiting time -- an
>> issue too. It is a tool for people without the possibility to use dtrace and
>> similar tools, and for everyday usage - a simple check whether locks are an
>> issue (or whether locking is stable).
>>
>
> Meh. SELECT sum(state_change) FROM pg_stat_activity WHERE waiting is just
> about as useful. Or just turn on lock logging.
>
> If you really want to add it at the database level I'm not opposed (so
> long as it leaves the door open for more granular locking later), but I
> can't really get excited about it either.
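As a point of comparison, the kind of ad-hoc check referred to above might look
like the sketch below; pg_stat_activity.waiting is the 9.4-era boolean column, and
the elapsed time since the last state change is aggregated rather than the raw
timestamps.

-- A rough substitute for a per-database lock_time counter: how many backends
-- are currently waiting on a heavyweight lock, and for how long in total.
SELECT count(*)                  AS waiting_backends,
       sum(now() - state_change) AS total_wait
FROM   pg_stat_activity
WHERE  waiting;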
>
>  and this proposal makes sense only for heavyweight locks - because other
>> locks are everywhere
>>
>
> So what if they're everywhere? Right now if you're spending a lot of time
> waiting for LWLocks you have no way to know what's going on unless you
> happen to have dtrace. Obviously we're not going to something like issue a
> stats update every time we attempt to acquire an LWLock, but that doesn't
> mean we can't keep some counters on the locks and periodically report that.


I have a plan to update the statistics when all necessary locks are acquired -
so it is once per statement - which is a similar load on the stats system as now.

Pavel


>
> --
> Jim Nasby, Data Architect, Blue Treble Consulting
> Data in Trouble? Get it in Treble! http://BlueTreble.com
>


Re: [HACKERS] proposal: row_to_array function

2015-01-16 Thread Pavel Stehule
2015-01-16 22:35 GMT+01:00 Andrew Dunstan :

>
> On 01/16/2015 12:22 PM, Pavel Stehule wrote:
>
>>
>>
>> There two possible transformations:
>>
>> row_to_array --> [[key1, value1],[key2, value2], ...]
>> row_to_row_array --> [(key1, value1), (key2, value2), ... ]
>>
>>
>> If we're going to go that route, I think it makes more sense to
>> create an actual key/value type (ie:
>> http://pgxn.org/dist/pair/doc/pair.html) and return an array of that.
>>
>>
>> ok
>>
>> 
>>
>>
>
> I think we'd possibly be better off with simply returning a flat array,
> [key1, value1, ...]
>
> That's what the hstore(text[]) and json_object(text[]) functions accept,
> along with the 2D variant, if we want a precedent.
>

It can be one of the supported variants. It should not be the only one, because
we cannot simply iterate over it.

The next possibility is to teach FOREACH to take a key and a value in one step.
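For reference, a minimal plpgsql sketch of how the 2D [[key, value], ...] form from
upthread can already be walked with FOREACH ... SLICE; the flat [key1, value1, ...]
form has no equivalent today, which is the iteration problem mentioned above.

DO $$
DECLARE
    pairs text[] := ARRAY[['a', '1'], ['b', '2']];  -- [[key, value], ...]
    kv    text[];
BEGIN
    -- each slice is a one-dimensional {key, value} array
    FOREACH kv SLICE 1 IN ARRAY pairs LOOP
        RAISE NOTICE 'key = %, value = %', kv[1], kv[2];
    END LOOP;
END
$$;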

Regards

Pavel


>
> cheers
>
> andrew
>
>


Re: [HACKERS] Merging postgresql.conf and postgresql.auto.conf

2015-01-16 Thread David Johnston
On Fri, Jan 16, 2015 at 10:19 PM, Amit Kapila 
wrote:

> On Sat, Jan 17, 2015 at 10:41 AM, David Johnston <
> david.g.johns...@gmail.com> wrote:
> > On Fri, Jan 16, 2015 at 10:08 PM, Amit Kapila 
> wrote:
> >>
> >> On Sat, Jan 17, 2015 at 10:02 AM, David G Johnston <
> david.g.johns...@gmail.com> wrote:
> >> > > You're right.
> >> > > pg_setting and SHOW command use value in current session rather than
> >> > > config file.
> >> > > It might break these common infrastructure.
> >> >
> >> > Two changes solve this problem in what seems to be a clean way.
> >> > 1) Upon each parsing of postgresql.conf we store all assigned
> variables
> >> > somewhere
> >> > 2) We display these assignments in a new pg_settings column named
> >> > "system_reset_val"
> >> >
> >> > I would also extend this to include:
> >> > a) upon each parsing of postgresql.auto.conf we store all assigned
> variables
> >> > somewhere (maybe the same place as postgresql.conf and simply label
> the file
> >> > source)
> >>
> >> Do we need to perform this parsing whenever user queries pg_settings?
> >> I think it might lead to extra cycles of reading file when user won't
> even
> >> need it and as the code is shared with SHOW commands that could be
> >> slightly complicated.
> >>
> >
> > There would be no parsing upon reading of pg_settings, only lookups.
> The existing parsing would simply have its values saved to the catalogs
> that will be involved in the underlying pg_setting view query.
> >
> So are you telling that whenever we read, save the settings
> to some catalog (probably a new one)?
>
> Will that completely address the problem specified in this thread,
> as those values could probably be old (when last time server is
> started or at last SIGHUP time values)?
>
>
Cache invalidation is a hard problem to solve :)

I am reasonably content with the following: a user who has just updated their
postgresql.conf file and boots/SIGHUPs the server, only to find that said value
hasn't taken effect, can at least see that their value is indeed sitting in
postgresql.conf but that other parts of the system are preempting it from being
the active value.

David J.


Re: [HACKERS] Merging postgresql.conf and postgresql.auto.conf

2015-01-16 Thread Amit Kapila
On Sat, Jan 17, 2015 at 10:41 AM, David Johnston 
wrote:
> On Fri, Jan 16, 2015 at 10:08 PM, Amit Kapila 
wrote:
>>
>> On Sat, Jan 17, 2015 at 10:02 AM, David G Johnston <
david.g.johns...@gmail.com> wrote:
>> > > You're right.
>> > > pg_setting and SHOW command use value in current session rather than
>> > > config file.
>> > > It might break these common infrastructure.
>> >
>> > Two changes solve this problem in what seems to be a clean way.
>> > 1) Upon each parsing of postgresql.conf we store all assigned variables
>> > somewhere
>> > 2) We display these assignments in a new pg_settings column named
>> > "system_reset_val"
>> >
>> > I would also extend this to include:
>> > a) upon each parsing of postgresql.auto.conf we store all assigned
variables
>> > somewhere (maybe the same place as postgresql.conf and simply label
the file
>> > source)
>>
>> Do we need to perform this parsing whenever user queries pg_settings?
>> I think it might lead to extra cycles of reading file when user won't
even
>> need it and as the code is shared with SHOW commands that could be
>> slightly complicated.
>>
>
> There would be no parsing upon reading of pg_settings, only lookups.  The
existing parsing would simply have its values saved to the catalogs that
will be involved in the underlying pg_setting view query.
>
So are you saying that whenever we read the file, we save the settings
to some catalog (probably a new one)?

Will that completely address the problem specified in this thread,
as those values could be old (from the last time the server was
started, or from the last SIGHUP)?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Merging postgresql.conf and postgresql.auto.conf

2015-01-16 Thread David Johnston
On Fri, Jan 16, 2015 at 10:08 PM, Amit Kapila 
wrote:

> On Sat, Jan 17, 2015 at 10:02 AM, David G Johnston <
> david.g.johns...@gmail.com> wrote:
> > > You're right.
> > > pg_setting and SHOW command use value in current session rather than
> > > config file.
> > > It might break these common infrastructure.
> >
> > Two changes solve this problem in what seems to be a clean way.
> > 1) Upon each parsing of postgresql.conf we store all assigned variables
> > somewhere
> > 2) We display these assignments in a new pg_settings column named
> > "system_reset_val"
> >
> > I would also extend this to include:
> > a) upon each parsing of postgresql.auto.conf we store all assigned
> variables
> > somewhere (maybe the same place as postgresql.conf and simply label the
> file
> > source)
>
> Do we need to perform this parsing whenever user queries pg_settings?
> I think it might lead to extra cycles of reading file when user won't even
> need it and as the code is shared with SHOW commands that could be
> slightly complicated.
>
>
There would be no parsing upon reading of pg_settings, only lookups.  The
existing parsing would simply have its values saved to the catalogs that
will be involved in the underlying pg_setting view query.

David J.


Re: [HACKERS] Merging postgresql.conf and postgresql.auto.conf

2015-01-16 Thread Amit Kapila
On Sat, Jan 17, 2015 at 10:02 AM, David G Johnston <
david.g.johns...@gmail.com> wrote:
> > You're right.
> > pg_setting and SHOW command use value in current session rather than
> > config file.
> > It might break these common infrastructure.
>
> Two changes solve this problem in what seems to be a clean way.
> 1) Upon each parsing of postgresql.conf we store all assigned variables
> somewhere
> 2) We display these assignments in a new pg_settings column named
> "system_reset_val"
>
> I would also extend this to include:
> a) upon each parsing of postgresql.auto.conf we store all assigned
variables
> somewhere (maybe the same place as postgresql.conf and simply label the
file
> source)

Do we need to perform this parsing whenever user queries pg_settings?
I think it might lead to extra cycles of reading file when user won't even
need it and as the code is shared with SHOW commands that could be
slightly complicated.



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Parallel Seq Scan

2015-01-16 Thread Robert Haas
On Fri, Jan 16, 2015 at 11:27 PM, Amit Kapila  wrote:
> On Fri, Jan 16, 2015 at 11:49 PM, Robert Haas  wrote:
>> As mentioned downthread, a far bigger consideration is the I/O pattern
>> we create.  A sequential scan is so-called because it reads the
>> relation sequentially.  If we destroy that property, we will be more
>> than slightly sad.  It might be OK to do sequential scans of, say,
>> each 1GB segment separately, but I'm pretty sure it would be a real
>> bad idea to read 8kB at a time at blocks 0, 64, 128, 1, 65, 129, ...
>>
>> What I'm thinking about is that we might have something like this:
>>
>> struct this_lives_in_dynamic_shared_memory
>> {
>> BlockNumber last_block;
>> Size prefetch_distance;
>> Size prefetch_increment;
>> slock_t mutex;
>> BlockNumber next_prefetch_block;
>> BlockNumber next_scan_block;
>> };
>>
>> Each worker takes the mutex and checks whether next_prefetch_block -
>> next_scan_block < prefetch_distance and also whether
>> next_prefetch_block < last_block.  If both are true, it prefetches
>> some number of additional blocks, as specified by prefetch_increment.
>> Otherwise, it increments next_scan_block and scans the block
>> corresponding to the old value.
>
> Assuming we will increment next_prefetch_block only after prefetching
> blocks (equivalent to prefetch_increment), won't 2 workers be able to
> simultaneously see the same value for next_prefetch_block and try to
> perform prefetch for the same blocks?

The idea is that you can only examine and modify next_prefetch_block
or next_scan_block while holding the mutex.

> What will be value of prefetch_increment?
> Will it be equal to prefetch_distance or prefetch_distance/2 or
> prefetch_distance/4 or .. or will it be totally unrelated to
> prefetch_distance?

I dunno, that might take some experimentation.  prefetch_distance/2
doesn't sound stupid.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Merging postgresql.conf and postgresql.auto.conf

2015-01-16 Thread David G Johnston
Sawada Masahiko wrote
> On Fri, Jan 16, 2015 at 12:54 PM, Amit Kapila <amit.kapila16@...> wrote:
>> On Thu, Jan 15, 2015 at 9:48 PM, Sawada Masahiko <sawada.mshk@...> wrote:
>>> On Thu, Jan 15, 2015 at 2:02 PM, Amit Kapila <amit.kapila16@...> wrote:
>>> >
>>> > One thought I have in this line is that currently there doesn't seem
>>> to
>>> > be
>>> > a way to know if the setting has an entry both in postgresql.conf and
>>> > postgresql.auto.conf, if we can have some way of knowing the same
>>> > (pg_settings?), then it could be convenient for user to decide if the
>>> > value
>>> > in postgresql.auto.conf is useful or not and if it's not useful then
>>> use
>>> > Alter System .. Reset command to remove the same from
>>> > postgresql.auto.conf.
>>>
>>> I think one way is for pg_settings to have the file name of the variables.  But
>>> it would not reflect the current status of postgresql.conf,
>>> so we would need to parse postgresql.conf again at that time.
>>>
>>
>> Yeah that could be a possibility, but I think that will break the
>> existing
>> command('s) as this is the common infrastructure used for SHOW ..
>> commands as well which displays the guc value that is used by
>> current session rather than the value in postgresql.conf.
> 
> You're right.
> pg_settings and the SHOW command use the value in the current session rather
> than the config file.
> It might break this common infrastructure.

Two changes solve this problem in what seems to be a clean way.
1) Upon each parsing of postgresql.conf we store all assigned variables
somewhere
2) We display these assignments in a new pg_settings column named
"system_reset_val"

I would also extend this to include:
a) upon each parsing of postgresql.auto.conf we store all assigned variables
somewhere (maybe the same place as postgresql.conf and simply label the file
source)
b) add an "alter_system_val" field to show that value (or null)
c) add a "db_role_val" to show the current value for the session via
pg_db_role_setting
c.1) add a "db_role_id" to show the named user that is being used for the
db_role_val lookup

The thinking for c.1 is that in situations with role hierarchies and SET
ROLE usage it would be nice to be able to query what the connection role -
the one used during variable lookup - is.

I'm probably going overkill on this, but there are not a lot of different
sources, nor do they change frequently, so extending the pg_settings view to
be more of a one-stop shop for this information seems to make sense.

As it relates back to this thread the desired "merging" ends up being done
inside this view and at least gives the DBA a central location (well, along
with pg_db_role_setting) to go and look at the configuration landscape for
the system.
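A hypothetical query against such an extended pg_settings might look like the
sketch below; the system_reset_val, alter_system_val, db_role_val and db_role_id
columns do not exist today and are named here only to illustrate the proposed
one-stop lookup.

SELECT name,
       setting            AS active_value,
       system_reset_val,  -- as parsed from postgresql.conf
       alter_system_val,  -- as parsed from postgresql.auto.conf
       db_role_val,       -- from pg_db_role_setting, if any
       db_role_id         -- role used for the db/role lookup
FROM   pg_settings
WHERE  name = 'work_mem';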

David J.









Re: [HACKERS] Parallel Seq Scan

2015-01-16 Thread Amit Kapila
On Fri, Jan 16, 2015 at 11:49 PM, Robert Haas  wrote:
>
> As mentioned downthread, a far bigger consideration is the I/O pattern
> we create.  A sequential scan is so-called because it reads the
> relation sequentially.  If we destroy that property, we will be more
> than slightly sad.  It might be OK to do sequential scans of, say,
> each 1GB segment separately, but I'm pretty sure it would be a real
> bad idea to read 8kB at a time at blocks 0, 64, 128, 1, 65, 129, ...
>
> What I'm thinking about is that we might have something like this:
>
> struct this_lives_in_dynamic_shared_memory
> {
> BlockNumber last_block;
> Size prefetch_distance;
> Size prefetch_increment;
> slock_t mutex;
> BlockNumber next_prefetch_block;
> BlockNumber next_scan_block;
> };
>
> Each worker takes the mutex and checks whether next_prefetch_block -
> next_scan_block < prefetch_distance and also whether
> next_prefetch_block < last_block.  If both are true, it prefetches
> some number of additional blocks, as specified by prefetch_increment.
> Otherwise, it increments next_scan_block and scans the block
> corresponding to the old value.
>

Assuming we will increment next_prefetch_block only after prefetching
blocks (equivalent to prefetch_increment), won't 2 workers be able to
simultaneously see the same value for next_prefetch_block and try to
perform prefetch for the same blocks?

What will be value of prefetch_increment?
Will it be equal to prefetch_distance or prefetch_distance/2 or
prefetch_distance/4 or .. or will it be totally unrelated
to prefetch_distance?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Merging postgresql.conf and postgresql.auto.conf

2015-01-16 Thread Amit Kapila
On Fri, Jan 16, 2015 at 9:55 PM, Sawada Masahiko 
wrote:
> On Fri, Jan 16, 2015 at 12:54 PM, Amit Kapila 
wrote:
> >
> > I don't know how appealing it would be to others, but a new view
> > like pg_file_settings which would display the settings in file could
> > be meaningful for your need.
> >
> > Another way is user can do pg_reload_conf() to see the latest
> > values (excluding values for server startup time parameters).
> >
>
> I prefer the former, because the latter would not handle all GUC
> variables, as you said.
> The new view, like pg_file_settings, would have the GUC variable and the config
> file which contains the corresponding GUC variable.
> It would be a helpful view for things like ALTER SYSTEM ... RESET, and that
> command might be a beneficial feature for users who want to manage the
> configuration file manually; I would like to propose them.
>

Okay, but I think it would be better to start a new thread
as this proposal seems to be different than your original
idea.
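For illustration, usage of the pg_file_settings view suggested above might look
roughly like this; the view and its columns are part of the proposal, not
something that exists yet.

-- One row per file that assigns the parameter, so a value present in both
-- postgresql.conf and postgresql.auto.conf would show up twice.
SELECT name, setting, sourcefile
FROM   pg_file_settings
WHERE  name = 'shared_buffers';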

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] n_live_tup smaller than the number of rows in a table

2015-01-16 Thread Lisa Guo
The server did crash yesterday. It has many schemas that have the same table 
(different shards). Operations done on these tables are very similar, but only 
a few of them actually have this problem. Other than this error, we do not see 
other abnormalities on the box.

Lisa

From: Tom Lane <t...@sss.pgh.pa.us>
Date: Friday, January 16, 2015 at 4:23 PM
To: Lisa Guo <l...@fb.com>
Cc: "pgsql-hackers@postgresql.org" <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] n_live_tup smaller than the number of rows in a table

Lisa Guo <l...@fb.com> writes:
> We are seeing a strange behavior where n_live_tup is way smaller than the
> number of rows in a table. The table has > 18m rows, but n_live_tup only has <
> 100K. We tried to do vacuum analyze to clear up any sticky errors, but it
> didn’t correct the problem. We are running Postgres 9.2. Any pointers on how we
> could debug this problem and how to correct the stats?

n_live_tup is a moving average over the last few observations, so in
theory it should get better if you repeat ANALYZE several times.
AFAIR, VACUUM isn't likely to help much.  (VACUUM FREEZE might though.)

It seems odd that you have a value that's so far off ... have you been
using this table in any unusual way?

regards, tom lane



Re: [HACKERS] n_live_tup smaller than the number of rows in a table

2015-01-16 Thread Tom Lane
Lisa Guo  writes:
> We are seeing a strange behavior where n_live_tup is way smaller than the 
> number of rows in a table. The table has > 18m rows, but n_live_tup only has 
> < 100K. We tried to do vacuum analyze to clear up any sticky errors, but it 
> didn’t correct the problem. We are running Postgres 9.2. Any pointers on how 
> we could debug this problem and how to correct the stats?

n_live_tup is a moving average over the last few observations, so in
theory it should get better if you repeat ANALYZE several times.
AFAIR, VACUUM isn't likely to help much.  (VACUUM FREEZE might though.)
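A quick way to act on that advice (the table name shard01.events is made up for
the example):

-- Compare the estimate with reality, then nudge the moving average.
SELECT n_live_tup
FROM   pg_stat_user_tables
WHERE  schemaname = 'shard01' AND relname = 'events';

SELECT count(*) FROM shard01.events;  -- actual row count

ANALYZE shard01.events;               -- repeat a few times and re-check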

It seems odd that you have a value that's so far off ... have you been
using this table in any unusual way?

regards, tom lane




Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-16 Thread Peter Geoghegan
On Fri, Jan 16, 2015 at 6:21 AM, Heikki Linnakangas
 wrote:
> It looks very much like that a page has for some reason been moved to a
> different block number. And that's exactly what Peter found out in his
> investigation too; an index page was mysteriously copied to a different
> block with identical content.

What I found suspicious about that was that the spuriously identical
pages were not physically adjacent, but logically adjacent (i.e. the
bad page was considered the B-Tree right link of the good page by the
good, spuriously-copied-by-bad page). It also seems likely that that
small catalog index on pg_class(oid) was well cached in
shared_buffers. So I agree that it's unlikely that this is actually a
hardware or filesystem problem. Beyond that, if I had to guess, I'd
say that the problem is more likely to be in the B-Tree code than it
is in the buffer manager or whatever (so the "logically adjacent"
thing is probably not an artifact of the order that the pages were
accessed, since it appears there was a downlink to the bad page. This
downlink was not added recently. Also, this logical adjacency is
unlikely to be mere coincidence - Postgres seemed to fairly
consistently break this way).

Does anyone have a better developed sense of where the ultimate
problem here is than I do? I guess I've never thought too much about
how the system fails when a catalog index is this thoroughly broken.

-- 
Peter Geoghegan




[HACKERS] n_live_tup smaller than the number of rows in a table

2015-01-16 Thread Lisa Guo
Hi,

We are seeing a strange behavior where n_live_tup is way smaller than the 
number of rows in a table. The table has > 18m rows, but n_live_tup only has < 
100K. We tried to do vacuum analyze to clear up any sticky errors, but it 
didn’t correct the problem. We are running Postgres 9.2. Any pointers on how we 
could debug this problem and how to correct the stats?

Thanks,
Lisa


Re: [HACKERS] [PATCH] explain sortorder

2015-01-16 Thread Tom Lane
"Timmer, Marius"  writes:
> attached is version 8, fixing remaining issues, adding docs and tests as 
> requested/agreed.

I've committed this with some rather substantial alterations, notably:

* Having get_sortorder_by_keyno dig into the plan state node by itself
seemed like a bad idea; it's certainly at variance with the existing
division of knowledge in this code, and I think it might outright do
the wrong thing if there's a Sort node underneath an Agg or Group node
(since in those cases the child Sort node, not the parent node, would
get passed in).  I fixed it so that show_sort_keys and siblings are
responsible for extracting and passing in the correct data arrays.

* There were some minor bugs in the rules for when to print NULLS
FIRST/LAST too.  I think the way I've got it now is a precise inverse of
what addTargetToSortList() will do.

* The proposed new regression test cases were not portable ("en_US"
isn't guaranteed to exist), and I thought adding a new regression
script file for just one test was a bit excessive.  The DESC and
USING behaviors were already covered by existing test cases so I just
added something to exercise COLLATE and NULLS FIRST in collate.sql.

* I took out the change in perform.sgml.  The change as proposed was
seriously confusing because it injected off-topic discussion into an
example that was meant to be just about the additional information printed
by EXPLAIN ANALYZE.  I'm not really sure that this feature needs any
specific documentation (other than its eventual mention in the release
notes); but if it does, we should probably add a brand new example
someplace before the EXPLAIN ANALYZE subsection.

* Assorted cleanups such as removal of irrelevant whitespace changes.
That sort of thing just makes a reviewer's job harder, so it's best
to avoid it if you can.
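For anyone curious what the committed behaviour looks like in a plan, a sketch
using a hypothetical table t with a text column c (the exact output text may
differ slightly):

EXPLAIN (COSTS OFF)
SELECT * FROM t ORDER BY c COLLATE "C" DESC NULLS FIRST;
--  Sort
--    Sort Key: c COLLATE "C" DESC NULLS FIRST
--    ->  Seq Scan on t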

regards, tom lane




Re: [HACKERS] Fillfactor for GIN indexes

2015-01-16 Thread Michael Paquier
On Sat, Jan 17, 2015 at 2:40 AM, Robert Haas  wrote:
> There's only value in adding a fillfactor parameter to GIN indexes if
> it improves performance.  There are no benchmarks showing it does.
> So, why are we still talking about this?
Indeed. There is no such benchmark as of now, and I am not sure I'll
get the time to work more on that soon, so let's reject the patch for
now. And sorry for the useless noise.
-- 
Michael




Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-16 Thread Peter Geoghegan
On Fri, Jan 16, 2015 at 10:33 AM, Merlin Moncure  wrote:
> ISTM the next step is to bisect the problem down over the weekend in
> order to to narrow the search.  If that doesn't turn up anything
> productive I'll look into taking other steps.

That might be the quickest way to do it, provided you can isolate the
bug fairly reliably. It might be a bit tricky to write a shell script
that assumes a certain amount of time having passed without the bug
tripping indicates that it doesn't exist, and have that work
consistently. I'm slightly concerned that you'll hit other bugs that
have since been fixed, given the large number of possible symptoms
here.

-- 
Peter Geoghegan




Re: [HACKERS] advance local xmin more aggressively

2015-01-16 Thread Heikki Linnakangas

On 01/15/2015 09:57 PM, Robert Haas wrote:
> On Thu, Jan 15, 2015 at 3:08 AM, Michael Paquier wrote:
>> On Mon, Dec 22, 2014 at 7:31 PM, Heikki Linnakangas wrote:
>>> Here's an updated version, rebased over the pairing heap code that I just
>>> committed, and fixing those bugs.
>>
>> So, are we reaching an outcome for the patch happening here?
>
> Well, I still like using the existing ResourceOwner pointers to find
> the snapshots more than introducing a separate data structure just for
> that.  But I'm OK with Heikki committing his version and, if anybody
> notices the new code becoming a hotspot, we can revisit the issue.

Ok, committed.

- Heikki





Re: [HACKERS] Logical Decoding follows timelines

2015-01-16 Thread Andres Freund
On 2015-01-03 12:07:29 +0200, Heikki Linnakangas wrote:
> >@@ -941,6 +936,8 @@ StartLogicalReplication(StartReplicationCmd *cmd)
> >  * Force a disconnect, so that the decoding code doesn't need to care
> >  * about an eventual switch from running in recovery, to running in a
> >  * normal environment. Client code is expected to handle reconnects.
> >+ * This covers the race condition where we are promoted half way
> >+ * through starting up.
> >  */
> > if (am_cascading_walsender && !RecoveryInProgress())
> > {
> 
> We could exit recovery immediately after this check. Why is this check
> needed?

I probably wrote that check and I don't think it really is needed. I
think that's a remnant of what the physical counterpart used to do.

I think this needs slightly more abstraction because the infrastructure
is local to walsender.c - but logical decoding is also possible via
SQL. I'm not yet sure what that should look like. It'd be awesome if in
the course of that we could get rid of the nearly duplicated XLogRead()
:(

Simon, have you checked that this actually correctly follows timelines?
Afaics the patch as is won't allow to start logical decoding on a standby.

To allow logical decoding from clients I (apparently) wrote the
following comment:
	/*
	 * TODO: We got to change that someday soon...
	 *
	 * There's basically three things missing to allow this:
	 * 1) We need to be able to correctly and quickly identify the timeline a
	 *    LSN belongs to
	 * 2) We need to force hot_standby_feedback to be enabled at all times so
	 *    the primary cannot remove rows we need.
	 * 3) support dropping replication slots referring to a database, in
	 *    dbase_redo. There can't be any active ones due to HS recovery
	 *    conflicts, so that should be relatively easy.
	 */
	if (RecoveryInProgress())
		ereport(ERROR,
				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
				 errmsg("logical decoding cannot be used while in recovery")));

You're implementing 1) here. 3) doesn't look very challenging.

But 2) imo is rather more interesting/complex. I guess we'd have to
force that streaming replication is used, that a physical replication
slot is used and that hot_standby_feedback is enabled.
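In practice, requirement 2) boils down to something like the following on current
releases (a sketch, not part of the patch):

-- On the primary: a physical slot keeps the WAL the standby still needs.
SELECT pg_create_physical_replication_slot('standby_slot');

-- On the standby: enable feedback so the primary cannot vacuum away rows the
-- decoder would still need, then reload the configuration.
ALTER SYSTEM SET hot_standby_feedback = on;
SELECT pg_reload_conf();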

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] proposal: row_to_array function

2015-01-16 Thread Andrew Dunstan


On 01/16/2015 12:22 PM, Pavel Stehule wrote:
>
> There two possible transformations:
>
> row_to_array --> [[key1, value1],[key2, value2], ...]
> row_to_row_array --> [(key1, value1), (key2, value2), ... ]
>
> If we're going to go that route, I think it makes more sense to
> create an actual key/value type (ie:
> http://pgxn.org/dist/pair/doc/pair.html) and return an array of that.
>
> ok
>

I think we'd possibly be better off with simply returning a flat array, 
[key1, value1, ...]


That's what the hstore(text[]) and json_object(text[]) functions
accept, along with the 2D variant, if we want a precedent.
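For reference, the existing flat-array precedent behaves like this (both functions
are already available; hstore needs the hstore extension, and the output comments
are illustrative):

-- flat key/value array in, mapping out
SELECT json_object(ARRAY['a', '1', 'b', '2']);  -- {"a" : "1", "b" : "2"}
SELECT hstore(ARRAY['a', '1', 'b', '2']);       -- "a"=>"1", "b"=>"2"

-- the 2D variant is accepted as well
SELECT json_object(ARRAY[['a', '1'], ['b', '2']]);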


cheers

andrew





Re: [HACKERS] proposal: lock_time for pg_stat_database

2015-01-16 Thread Jim Nasby

On 1/16/15 12:30 PM, Pavel Stehule wrote:



2015-01-16 19:24 GMT+01:00 Pavel Stehule <pavel.steh...@gmail.com>:



2015-01-16 19:06 GMT+01:00 Jim Nasby <jim.na...@bluetreble.com>:

On 1/16/15 11:35 AM, Pavel Stehule wrote:



2015-01-16 18:23 GMT+01:00 Jim Nasby mailto:jim.na...@bluetreble.com> >>:

 On 1/16/15 11:00 AM, Pavel Stehule wrote:

 Hi all,

 some time ago, I proposed a lock time measurement related 
to query. A main issue was a method, how to show this information. Today 
proposal is little bit simpler, but still useful. We can show a total lock time 
per database in pg_stat_database statistics. High number can be signal about 
lock issues.


 Would this not use the existing stats mechanisms? If so, couldn't we do 
this per table? (I realize that won't handle all cases; we'd still need a 
"lock_time_other" somewhere).



it can use a current existing stats mechanisms

I afraid so isn't possible to assign waiting time to table - 
because it depends on order


Huh? Order of what?


when you have a SELECT FROM T1, T2 and T1 is locked for t1, and T2 is locked 
for t2 -- but if t2 < t1 then t2 is not important -- so what I have to cont as 
lock time for T1 and T2?


If that select is waiting on a lock on t2, then it's waiting on that lock on 
that table. It doesn't matter who else has the lock.


 Also, what do you mean by 'lock'? Heavyweight? We already have 
some visibility there. What I wish we had was some way to know if we're 
spending a lot of time in a particular non-heavy lock. Actually measuring time 
probably wouldn't make sense but we might be able to count how often we fail 
initial acquisition or something.


now, when I am thinking about it, lock_time is not good name - maybe 
"waiting lock time" (lock time should not be interesting, waiting is 
interesting) - it can be divided to some more categories - in GoodData we use 
Heavyweight, pages, and others categories.


So do you see this somehow encompassing locks other than heavyweight 
locks? Because I think that's the biggest need here. Basically, something akin 
to TRACE_POSTGRESQL_LWLOCK_WAIT_START() that doesn't depend on dtrace.


For these global statistics I see as important a common total waiting time 
for locks - we can use a more detailed granularity but I am not sure, if a 
common statistics are best tool.


Locks may be global, but what you're waiting for a lock on certainly isn't. 
It's almost always a lock either on a table or a row in a table. Of course this 
does mean you can't just blindly report that you're blocked on some XID; that 
doesn't tell anyone anything.


My motivations is - look to statistics -- and I can see ... lot of 
rollbacks -- issue, lot of deadlocks -- issue, lot of waiting time -- issue 
too. It is tool for people without possibility to use dtrace and similar tools 
and for everyday usage - simple check if locks are not a issue (or if locking 
is stable).


Meh. SELECT sum(state_change) FROM pg_stat_activity WHERE waiting is just about 
as useful. Or just turn on lock logging.

If you really want to add it at the database level I'm not opposed (so long as 
it leaves the door open for more granular locking later), but I can't really 
get excited about it either.


and this proposal has sense only for heavyweight locks - because others locks 
are everywhere


So what if they're everywhere? Right now if you're spending a lot of time 
waiting for LWLocks you have no way to know what's going on unless you happen 
to have dtrace. Obviously we're not going to something like issue a stats 
update every time we attempt to acquire an LWLock, but that doesn't mean we 
can't keep some counters on the locks and periodically report that.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-16 Thread Merlin Moncure
On Fri, Jan 16, 2015 at 8:22 AM, Andres Freund  wrote:
> Hi,
>
> On 2015-01-16 08:05:07 -0600, Merlin Moncure wrote:
>> On Thu, Jan 15, 2015 at 5:10 PM, Peter Geoghegan  wrote:
>> > On Thu, Jan 15, 2015 at 3:00 PM, Merlin Moncure  wrote:
>> >> Running this test on another set of hardware to verify -- if this
>> >> turns out to be a false alarm which it may very well be, I can only
>> >> offer my apologies!  I've never had a new drive fail like that, in
>> >> that manner.  I'll burn the other hardware in overnight and report
>> >> back.
>>
>> huh -- well possibly. not.  This is on a virtual machine attached to a
>> SAN.  It ran clean for several (this is 9.4 vanilla, asserts off,
>> checksums on) hours then the starting having issues:
>
> Damn.
>
> Is there any chance you can package this somehow so that others can run
> it locally? It looks hard to find the actual bug here without adding
> instrumentation to to postgres.

FYI, a two hour burn in on my workstation on 9.3 ran with no issues.
An overnight run would probably be required to prove it, ruling out
both hardware and pl/sh.   If proven, it's possible we may be facing a
regression, perhaps a serious one.

ISTM the next step is to bisect the problem down over the weekend in
order to narrow the search.  If that doesn't turn up anything
productive I'll look into taking other steps.

merlin




Re: [HACKERS] proposal: lock_time for pg_stat_database

2015-01-16 Thread Pavel Stehule
2015-01-16 19:24 GMT+01:00 Pavel Stehule :

>
>
> 2015-01-16 19:06 GMT+01:00 Jim Nasby :
>
>> On 1/16/15 11:35 AM, Pavel Stehule wrote:
>>
>>>
>>>
>>> 2015-01-16 18:23 GMT+01:00 Jim Nasby <jim.na...@bluetreble.com>:
>>>
>>> On 1/16/15 11:00 AM, Pavel Stehule wrote:
>>>
>>> Hi all,
>>>
>>> some time ago, I proposed a lock time measurement related to
>>> query. A main issue was a method, how to show this information. Today
>>> proposal is little bit simpler, but still useful. We can show a total lock
>>> time per database in pg_stat_database statistics. High number can be signal
>>> about lock issues.
>>>
>>>
>>> Would this not use the existing stats mechanisms? If so, couldn't we
>>> do this per table? (I realize that won't handle all cases; we'd still need
>>> a "lock_time_other" somewhere).
>>>
>>>
>>>
>>> it can use a current existing stats mechanisms
>>>
>>> I afraid so isn't possible to assign waiting time to table - because it
>>> depends on order
>>>
>>
>> Huh? Order of what?
>>
>
> when you have a SELECT FROM T1, T2 and T1 is locked for t1, and T2 is
> locked for t2 -- but if t2 < t1 then t2 is not important -- so what I have
> to cont as lock time for T1 and T2?
>
> DDL statements are exception - there is almost simple mapping between
> relations and lock time reason.
>
>
>>
>>  Also, what do you mean by 'lock'? Heavyweight? We already have some
>>> visibility there. What I wish we had was some way to know if we're spending
>>> a lot of time in a particular non-heavy lock. Actually measuring time
>>> probably wouldn't make sense but we might be able to count how often we
>>> fail initial acquisition or something.
>>>
>>>
>>> now, when I am thinking about it, lock_time is not good name - maybe
>>> "waiting lock time" (lock time should not be interesting, waiting is
>>> interesting) - it can be divided to some more categories - in GoodData we
>>> use Heavyweight, pages, and others categories.
>>>
>>
>> So do you see this somehow encompassing locks other than heavyweight
>> locks? Because I think that's the biggest need here. Basically, something
>> akin to TRACE_POSTGRESQL_LWLOCK_WAIT_START() that doesn't depend on
>> dtrace.
>
>
> For these global statistics I see as important a common total waiting time
> for locks - we can use a more detailed granularity but I am not sure, if a
> common statistics are best tool.
>
> My motivations is - look to statistics -- and I can see ... lot of
> rollbacks -- issue, lot of deadlocks -- issue, lot of waiting time -- issue
> too. It is tool for people without possibility to use dtrace and similar
> tools and for everyday usage - simple check if locks are not a issue (or if
> locking is stable).
>

and this proposal makes sense only for heavyweight locks - because other
locks are everywhere

>
>
>>
>> --
>> Jim Nasby, Data Architect, Blue Treble Consulting
>> Data in Trouble? Get it in Treble! http://BlueTreble.com
>>
>
>


Re: [HACKERS] infinite loop in _bt_getstackbuf

2015-01-16 Thread Robert Haas
On Thu, Jan 15, 2015 at 5:46 PM, Peter Geoghegan  wrote:
> I think that it might be a good idea to have circular _bt_moveright()
> moves (the direct offender in Merlin's case, which has very similar
> logic to your _bt_getstackbuf() problem case) detected. I'm pretty
> sure that it's exceptional for there to be more than 2 or 3 retries in
> _bt_moveright(). It would probably be fine to consider the possibility
> that we'll never finish once we get past 5 retries or something like
> that. We'd then start keeping track of blocks visited, and raise an
> error when a page was visited a second time.

Yeah, I could go for that.  Possibly somebody might object that it's a
lot of code that will never get tested in normal operation, but as
this problem doesn't seem to be strictly theoretical I'm not sure I
subscribe to that objection.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] proposal: lock_time for pg_stat_database

2015-01-16 Thread Pavel Stehule
2015-01-16 19:06 GMT+01:00 Jim Nasby :

> On 1/16/15 11:35 AM, Pavel Stehule wrote:
>
>>
>>
>> 2015-01-16 18:23 GMT+01:00 Jim Nasby <jim.na...@bluetreble.com>:
>>
>> On 1/16/15 11:00 AM, Pavel Stehule wrote:
>>
>> Hi all,
>>
>> some time ago, I proposed a lock time measurement related to
>> query. A main issue was a method, how to show this information. Today
>> proposal is little bit simpler, but still useful. We can show a total lock
>> time per database in pg_stat_database statistics. High number can be signal
>> about lock issues.
>>
>>
>> Would this not use the existing stats mechanisms? If so, couldn't we
>> do this per table? (I realize that won't handle all cases; we'd still need
>> a "lock_time_other" somewhere).
>>
>>
>>
>> it can use a current existing stats mechanisms
>>
>> I afraid so isn't possible to assign waiting time to table - because it
>> depends on order
>>
>
> Huh? Order of what?
>

when you have a SELECT FROM T1, T2 and T1 is locked for t1, and T2 is
locked for t2 -- but if t2 < t1 then t2 is not important -- so what do I have
to count as lock time for T1 and T2?

DDL statements are an exception - there is an almost simple mapping between
relations and the reason for the lock time.


>
>  Also, what do you mean by 'lock'? Heavyweight? We already have some
>> visibility there. What I wish we had was some way to know if we're spending
>> a lot of time in a particular non-heavy lock. Actually measuring time
>> probably wouldn't make sense but we might be able to count how often we
>> fail initial acquisition or something.
>>
>>
>> now, when I am thinking about it, lock_time is not good name - maybe
>> "waiting lock time" (lock time should not be interesting, waiting is
>> interesting) - it can be divided to some more categories - in GoodData we
>> use Heavyweight, pages, and others categories.
>>
>
> So do you see this somehow encompassing locks other than heavyweight
> locks? Because I think that's the biggest need here. Basically, something
> akin to TRACE_POSTGRESQL_LWLOCK_WAIT_START() that doesn't depend on
> dtrace.


For these global statistics I see a common total waiting time for locks
as important - we can use a more detailed granularity but I am not sure whether
common statistics are the best tool.

My motivation is - look at the statistics -- and I can see ... lots of
rollbacks -- an issue, lots of deadlocks -- an issue, lots of waiting time -- an
issue too. It is a tool for people without the possibility to use dtrace and
similar tools, and for everyday usage - a simple check whether locks are an
issue (or whether locking is stable).


>
> --
> Jim Nasby, Data Architect, Blue Treble Consulting
> Data in Trouble? Get it in Treble! http://BlueTreble.com
>


Re: [HACKERS] Parallel Seq Scan

2015-01-16 Thread Robert Haas
On Wed, Jan 14, 2015 at 9:00 PM, Jim Nasby  wrote:
> Simply doing
> something like blkno % num_workers is going to cause imbalances,

Yes.

> but trying
> to do this on a per-block basis seems like too much overhead.

...but no.  Or at least, I doubt it.  The cost of handing out blocks
one at a time is that, for each block, a worker's got to grab a
spinlock, increment and record the block number counter, and release
the spinlock.  Or, use an atomic add.  Now, it's true that spinlock
cycles and atomic ops can have sometimes impose severe overhead, but
you have to look at it as a percentage of the overall work being done.
In this case, the backend has to read, pin, and lock the page and
process every tuple on the page.  Processing every tuple on the page
may involve de-TOASTing the tuple (leading to many more page
accesses), or evaluating a complex expression, or hitting CLOG to
check visibility, but even if it doesn't, I think the amount of work
that it takes to process all the tuples on the page will be far larger
than the cost of one atomic increment operation per block.

As mentioned downthread, a far bigger consideration is the I/O pattern
we create.  A sequential scan is so-called because it reads the
relation sequentially.  If we destroy that property, we will be more
than slightly sad.  It might be OK to do sequential scans of, say,
each 1GB segment separately, but I'm pretty sure it would be a real
bad idea to read 8kB at a time at blocks 0, 64, 128, 1, 65, 129, ...

What I'm thinking about is that we might have something like this:

struct this_lives_in_dynamic_shared_memory
{
BlockNumber last_block;
Size prefetch_distance;
Size prefetch_increment;
slock_t mutex;
BlockNumber next_prefetch_block;
BlockNumber next_scan_block;
};

Each worker takes the mutex and checks whether next_prefetch_block -
next_scan_block < prefetch_distance and also whether
next_prefetch_block < last_block.  If both are true, it prefetches
some number of additional blocks, as specified by prefetch_increment.
Otherwise, it increments next_scan_block and scans the block
corresponding to the old value.

So in this way, the prefetching runs ahead of the scan by a
configurable amount (prefetch_distance), which should be chosen so
that the prefetches have time to complete before the scan actually
reaches those blocks.  Right now, of course, we rely on the operating
system to prefetch for sequential scans, but I have a strong hunch
that may not work on all systems if there are multiple processes doing
the reads.

Now, what of other strategies like dividing up the relation into 1GB
chunks and reading each one in a separate process?  We could certainly
DO that, but what advantage does it have over this?  The only benefit
I can see is that you avoid accessing a data structure of the type
shown above for every block, but that only matters if that cost is
material, and I tend to think it won't be.  On the flip side, it means
that the granularity for dividing up work between processes is now
very coarse - when there are less than 6GB of data left in a relation,
at most 6 processes can work on it.  That might be OK if the data is
being read in from disk anyway, but it's certainly not the best we can
do when the data is in memory.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] proposal: searching in array function - array_position

2015-01-16 Thread Pavel Stehule
2015-01-16 18:37 GMT+01:00 Jim Nasby :

> On 1/16/15 11:16 AM, Pavel Stehule wrote:
>
>>
>>
>> 2015-01-16 17:57 GMT+01:00 Jim Nasby <jim.na...@bluetreble.com>:
>>
>> On 1/16/15 3:39 AM, Pavel Stehule wrote:
>>
>> I am proposing a simple function, that returns a position of
>> element in array.
>>
>>
>> Yes please!
>>
>> FUNCTION array_position(anyarray, anyelement) RETURNS int
>>
>>
>> That won't work on a multi-dimensional array. Ideally it needs to
>> accept a slice or an element and return the specifier for the slice.
>>
>>
>> It is a question what the result should be - probably there can be a
>> multidimensional variant, where the result will be an array
>>
>> array_position([1,2,3],2) --> 2
>> array_position([[1,2],[2,3],[3,4]], [2,3]) --> 2 /* 2nd parameter should
>> to have N-1 dimension of first parameter */
>>
>
> The problem with that is you can't actually use '2' to get [2,3] back:
>
> select (array[[1,2,3],[4,5,6],[7,8,9]])[1] IS NULL;
>  ?column?
> ----------
>  t
> (1 row)
>

yes, but when you are searching for an array in an array you can use a full slice
selection:

postgres=# select (ARRAY[[1,2],[4,5]])[1][1:2]; -- [1:2] should be a
constant every time in this case -- so it should not be returned
  array
---------
 {{1,2}}
(1 row)




>
> I think the bigger problem here is we need something better than slices
> for handling subsets of arrays. Even if the function returned [2:2] it's
> still going to behave differently than it will in the non-array case
> because you won't be getting the expected number of dimensions back. :(
>

you cannot return a slice and I don't propose it, although we can return
a range type or an array of range types - but still we cannot use a range for
arrays.

>
>  array_position_md([1,2,3],2) --> [2]
>> array_position_md([[1,2],[2,3],[3,4]], 2) --> [2,1]
>>
>> another question is how to handle more than one occurrence of a value -
>> probably two sets of functions - the first returns the first occurrence of the
>> value, the second returns the set of occurrences
>>
>
> Gee, if only we had some way to return multiple elements of something...
> ;P
>
> In other words, I think all of these should actually return an array of
> positions. I think it's OK for someone that only cares about the first
> instance to just do [1].


there can be two functions - "position" returns the first occurrence and
"positions" returns all of them as an array
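A hedged sketch of the proposed behaviour for one-dimensional arrays, written as
an ordinary plpgsql function under a made-up name; the first occurrence is then
simply the first element of the result.

CREATE FUNCTION my_array_positions(arr anyarray, elem anyelement)
RETURNS int[] LANGUAGE plpgsql AS $$
DECLARE
    result int[] := '{}';
    i      int;
BEGIN
    -- scan the first dimension only; NULLs match via IS NOT DISTINCT FROM
    FOR i IN coalesce(array_lower(arr, 1), 1) ..
             coalesce(array_upper(arr, 1), 0) LOOP
        IF arr[i] IS NOT DISTINCT FROM elem THEN
            result := result || i;
        END IF;
    END LOOP;
    RETURN result;
END
$$;

-- SELECT my_array_positions(ARRAY['a','b','a'], 'a');  -->  {1,3}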


>
> --
> Jim Nasby, Data Architect, Blue Treble Consulting
> Data in Trouble? Get it in Treble! http://BlueTreble.com
>


Re: [HACKERS] proposal: lock_time for pg_stat_database

2015-01-16 Thread Jim Nasby

On 1/16/15 11:35 AM, Pavel Stehule wrote:



2015-01-16 18:23 GMT+01:00 Jim Nasby <jim.na...@bluetreble.com>:

On 1/16/15 11:00 AM, Pavel Stehule wrote:

Hi all,

some time ago, I proposed a lock time measurement related to query. A 
main issue was a method, how to show this information. Today proposal is little 
bit simpler, but still useful. We can show a total lock time per database in 
pg_stat_database statistics. High number can be signal about lock issues.


Would this not use the existing stats mechanisms? If so, couldn't we do this per 
table? (I realize that won't handle all cases; we'd still need a 
"lock_time_other" somewhere).



it can use a current existing stats mechanisms

I afraid so isn't possible to assign waiting time to table - because it depends 
on order


Huh? Order of what?


Also, what do you mean by 'lock'? Heavyweight? We already have some 
visibility there. What I wish we had was some way to know if we're spending a 
lot of time in a particular non-heavy lock. Actually measuring time probably 
wouldn't make sense but we might be able to count how often we fail initial 
acquisition or something.


now, when I am thinking about it, lock_time is not good name - maybe "waiting lock 
time" (lock time should not be interesting, waiting is interesting) - it can be 
divided to some more categories - in GoodData we use Heavyweight, pages, and others 
categories.


So do you see this somehow encompassing locks other than heavyweight locks? 
Because I think that's the biggest need here. Basically, something akin to 
TRACE_POSTGRESQL_LWLOCK_WAIT_START() that doesn't depend on dtrace.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] proposal: row_to_array function

2015-01-16 Thread Pavel Stehule
2015-01-16 18:42 GMT+01:00 Jim Nasby :

> On 1/16/15 11:22 AM, Pavel Stehule wrote:
>
>>
>>
>> 2015-01-16 18:03 GMT+01:00 Jim Nasby <jim.na...@bluetreble.com>:
>>
>> On 1/16/15 3:45 AM, Pavel Stehule wrote:
>>
>> I am returning back to processing records in plpgsql.
>>
>> I am thinking so it can be simply processed with transformations
>> to array.
>>
>> Now we have similar functions - hstore(row), row_to_json, ... but
>> using of these functions can be a useless step. Any row variable can be
>> transformed to 2D text array.
>>
>>
>> How is it useless? Why wouldn't you just use JSON and be done with it?
>>
>>
>> We can use a FOREACH IN ARRAY iteration in plpgsql (second variant is a
>> implementation FOREACH for jsonb)
>>
>> so ROW->ARRAY is shorter than ROW->JSON->ARRAY or ROW->HSTORE->ARRAY
>>
>
> I think the real problem here is that we're inventing a bunch of different
> ways to do the same thing: iterate over a set. Instead of doing that,
> should we add the idea of an iterator to the type system? That would make
> sense for arrays, hstore, json and XML.
>

What do you think? How could this be implemented?




>
>  Do you have some use cases you can share?
>>
>>
>> processing of NEW, OLD variables in triggers
>>
>
> Note that last time I checked you couldn't do something like NEW.variable,
> and I don't think you could use EXEC to do it either. So there's more
> needed here than just converting a record to an array.
>
>  There two possible transformations:
>>
>> row_to_array --> [[key1, value1],[key2, value2], ...]
>> row_to_row_array --> [(key1, value1), (key2, value2), ... ]
>>
>>
>> If we're going to go that route, I think it makes more sense to
>> create an actual key/value type (ie: http://pgxn.org/dist/pair/doc/pair.html)
>> and return an array of that.
>>
>>
>> ok
>>
>> --
>> Jim Nasby, Data Architect, Blue Treble Consulting
>> Data in Trouble? Get it in Treble! http://BlueTreble.com
>>
>>
>>
>
> --
> Jim Nasby, Data Architect, Blue Treble Consulting
> Data in Trouble? Get it in Treble! http://BlueTreble.com
>


Re: [HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code

2015-01-16 Thread Andres Freund
Hi,

On 2015-01-05 17:16:43 +0900, Michael Paquier wrote:
> +  xreflabel="restore_command_retry_interval">
> +  restore_command_retry_interval (integer)
> +  restore_command_retry_interval recovery parameter
> +
> +   If restore_command returns nonzero exit status code, retry
> +   command after the interval of time specified by this parameter,
> +   measured in milliseconds if no unit is specified. Default value is
> +   5s.
> +
> +   This command is particularly useful for warm standby configurations
> +   to leverage the amount of retries done to restore a WAL segment file
> +   depending on the activity of a master node.

Don't really like this paragraph. What's "leveraging the amount of
retries done" supposed to mean?

> +# restore_command_retry_interval
> +#
> +# specifies an optional retry interval for restore_command command, if
> +# previous command returned nonzero exit status code. This can be useful
> +# to increase or decrease the number of calls of restore_command.

It's not really the number but the frequency, right?

> +        else if (strcmp(item->name, "restore_command_retry_interval") == 0)
> +        {
> +            const char *hintmsg;
> +
> +            if (!parse_int(item->value, &restore_command_retry_interval,
> +                           GUC_UNIT_MS, &hintmsg))
> +                ereport(ERROR,
> +                        (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +                         errmsg("parameter \"%s\" requires a temporal value",
> +                                "restore_command_retry_interval"),
> +                         hintmsg ? errhint("%s", _(hintmsg)) : 0));

"temporal value" sounds odd to my ears. I'd rather keep in line with
what guc.c outputs in such a case.

> + now = GetCurrentTimestamp();
> + if (!TimestampDifferenceExceeds(last_fail_time, now,
> + restore_command_retry_interval))
>   {
> - pg_usleep(1000000L * (5 - (now - last_fail_time)));
> - now = (pg_time_t) time(NULL);
> + long secs;
> + int microsecs;
> +
> + TimestampDifference(last_fail_time, now, &secs, &microsecs);
> + pg_usleep(restore_command_retry_interval * 1000L - (1000000L * secs + 1L * microsecs));
> + now = GetCurrentTimestamp();
>   }
>   last_fail_time = now;
>   currentSource = XLOG_FROM_ARCHIVE;

Am I missing something or does this handle/affect streaming failures
just the same? That might not be a bad idea, because the current
interval is far too long for e.g. a sync replication setup. But it
certainly makes the name a complete misnomer.

Not this patch's fault, but I'm getting a bit tired of seeing the above open
coded. How about adding a function that does the sleeping based on a
timestamptz and a ms interval?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] proposal: row_to_array function

2015-01-16 Thread Jim Nasby

On 1/16/15 11:22 AM, Pavel Stehule wrote:



2015-01-16 18:03 GMT+01:00 Jim Nasby <jim.na...@bluetreble.com>:

On 1/16/15 3:45 AM, Pavel Stehule wrote:

I am returning back to processing records in plpgsql.

I am thinking so it can be simply processed with transformations to 
array.

Now we have similar functions - hstore(row), row_to_json, ... but using 
of these functions can be a useless step. Any row variable can be transformed 
to 2D text array.


How is it useless? Why wouldn't you just use JSON and be done with it?


We can use a FOREACH IN ARRAY iteration in plpgsql (second variant is a 
implementation FOREACH for jsonb)

so ROW->ARRAY is shorter than ROW->JSON->ARRAY or ROW->HSTORE->ARRAY


I think the real problem here is that we're inventing a bunch of different ways 
to do the same thing: iterate over a set. Instead of doing that, should we add 
the idea of an iterator to the type system? That would make sense for arrays, 
hstore, json and XML.


Do you have some use cases you can share?


processing of NEW, OLD variables in triggers


Note that last time I checked you couldn't do something like NEW.variable, and 
I don't think you could use EXEC to do it either. So there's more needed here 
than just converting a record to an array.


There two possible transformations:

row_to_array --> [[key1, value1],[key2, value2], ...]
row_to_row_array --> [(key1, value1), (key2, value2), ... ]


If we're going to go that route, I think it makes more sense to create an actual 
key/value type (ie: http://pgxn.org/dist/pair/doc/pair.html) and return an array 
of that.


ok

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com





--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Fillfactor for GIN indexes

2015-01-16 Thread Robert Haas
On Thu, Jan 15, 2015 at 7:06 AM, Michael Paquier
 wrote:
> Alexander Korotkov wrote:
>> I'm not sure. On the one hand it's unclear why fillfactor should be
>> different from 9.4.
>> On the other hand it's unclear why it should be different from btree.
>> I propose marking this "ready for committer". So, committer can make a final
>> decision.
> OK let's do so then. My preference is to fully pack the index at
> build. GIN compression has been one of the headlines of 9.4.

I'm struggling to understand why we shouldn't just reject this patch.
On November 27th, Cedric said:

"what are the benefits of this patch ? (maybe you had some test case
or a benchmark ?)"

Nobody replied.  On January 15th, you (Michael) hypothesized that
"this patch has value to control random updates on GIN indexes" but
there seem to be absolutely no test results showing that any such
value exists.

There's only value in adding a fillfactor parameter to GIN indexes if
it improves performance.  There are no benchmarks showing it does.
So, why are we still talking about this?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PATCH] HINT: pg_hba.conf changed since last config reload

2015-01-16 Thread Tom Lane
Andres Freund  writes:
> On 2015-01-16 12:21:13 -0500, Tom Lane wrote:
>> I think people felt that sending that information to the client wouldn't
>> be a good idea security-wise.

> It won't if issued during the right phase of the authentication:

Good point.

> But as I don't think sending logs to the client is a unsurmountable
> problem (due to the above) I don't really care if we use WARNING or LOG.

If we're expecting the DBA to need to read it in the postmaster log,
remember that LOG is actually *more* likely to get to the log than
WARNING is.  Choosing WARNING because you think it sounds more significant
is a mistake.

regards, tom lane




Re: [HACKERS] proposal: searching in array function - array_position

2015-01-16 Thread Jim Nasby

On 1/16/15 11:16 AM, Pavel Stehule wrote:



2015-01-16 17:57 GMT+01:00 Jim Nasby <jim.na...@bluetreble.com>:

On 1/16/15 3:39 AM, Pavel Stehule wrote:

I am proposing a simple function, that returns a position of element in 
array.


Yes please!

FUNCTION array_position(anyarray, anyelement) RETURNS int


That won't work on a multi-dimensional array. Ideally it needs to accept a 
slice or an element and return the specifier for the slice.


It is question, what is a result - probably, there can be a multidimensional 
variant, where result will be a array

array_position([1,2,3],2) --> 2
array_position([[1,2],[2,3],[3,4]], [2,3]) --> 2 /* 2nd parameter should to 
have N-1 dimension of first parameter */


The problem with that is you can't actually use '2' to get [2,3] back:

select (array[[1,2,3],[4,5,6],[7,8,9]])[1] IS NULL;
 ?column?
--
 t
(1 row)

I think the bigger problem here is we need something better than slices for 
handling subsets of arrays. Even if the function returned [2:2] it's still 
going to behave differently than it will in the non-array case because you 
won't be getting the expected number of dimensions back. :(
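
To illustrate, a slice does return the sub-array, but with the original
dimensionality preserved:

select (array[[1,2,3],[4,5,6],[7,8,9]])[2:2];   -- {{4,5,6}}, still two-dimensional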


array_position_md([1,2,3],2) --> [2]
array_position_md([[1,2],[2,3],[3,4]], 2) --> [2,1]

another question is how to solve more than one occurrence on one value - 
probably two sets of functions - first returns first occurrence of value, 
second returns set of occurrence


Gee, if only we had some way to return multiple elements of something... ;P

In other words, I think all of these should actually return an array of 
positions. I think it's OK for someone that only cares about the first instance 
to just do [1].
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] proposal: lock_time for pg_stat_database

2015-01-16 Thread Pavel Stehule
2015-01-16 18:23 GMT+01:00 Jim Nasby :

> On 1/16/15 11:00 AM, Pavel Stehule wrote:
>
>> Hi all,
>>
>> some time ago, I proposed a lock time measurement related to query. A
>> main issue was a method, how to show this information. Today proposal is
>> little bit simpler, but still useful. We can show a total lock time per
>> database in pg_stat_database statistics. High number can be signal about
>> lock issues.
>>
>
> Would this not use the existing stats mechanisms? If so, couldn't we do
> this per table? (I realize that won't handle all cases; we'd still need a
> "lock_time_other" somewhere).
>


It can use the existing stats mechanisms.

I'm afraid it isn't possible to assign waiting time to a table, because it
depends on order.


>
> Also, what do you mean by 'lock'? Heavyweight? We already have some
> visibility there. What I wish we had was some way to know if we're spending
> a lot of time in a particular non-heavy lock. Actually measuring time
> probably wouldn't make sense but we might be able to count how often we
> fail initial acquisition or something.
>

Now that I think about it, lock_time is not a good name - maybe
"waiting lock time" (the lock time itself is not interesting; the waiting
is) - it could be divided into some more categories - in GoodData we
use heavyweight, page, and other categories.

> --
> Jim Nasby, Data Architect, Blue Treble Consulting
> Data in Trouble? Get it in Treble! http://BlueTreble.com
>


Re: [HACKERS] speedup tidbitmap patch: cache page

2015-01-16 Thread Tom Lane
Andres Freund  writes:
> On 2015-01-16 12:15:35 -0500, Tom Lane wrote:
>> It strikes me that this patch leaves some lookups on the table,
>> specifically that it fails to avoid repeated hash_search lookups
>> inside tbm_page_is_lossy() in the situation where we're adding
>> new TIDs to an already-lossified page.  Is it worth adding a few
>> more lines to handle that case as well?

> There was a alternative version (v2.3 in 549950fb.2050...@sigaev.ru) of
> the patch that cached the lossyness of a page, but Teodor/David didn't
> find it to be beneficial in their benchmarking.

> Teodor's argument was basically that it's completely lost in the noise
> due to the much bigger overhead of rechecks.

That's a fair point, but on reflection it seems like a patch that covered
this case too wouldn't actually be any more complicated, so why not?
v2.3 is pretty brute-force and I agree it's not very attractive, but
I was thinking about something like

BlockNumber cached_blkno = InvalidBlockNumber;
PagetableEntry *page = NULL;

inside loop:

/* look up the target page unless we already have it */
if (blk != cached_blkno)
{
if (tbm_page_is_lossy(tbm, blk))
page = NULL;
else
page = tbm_get_pageentry(tbm, blk);
cached_blkno = blk;
}
if (page == NULL)
continue;  /* page is already marked lossy */

The "reset" after calling tbm_lossify() would just need to be
"cached_blkno = InvalidBlockNumber".

regards, tom lane




Re: Partitioning: issues/ideas (Was: Re: [HACKERS] On partitioning)

2015-01-16 Thread Robert Haas
On Wed, Jan 14, 2015 at 9:07 PM, Amit Langote
 wrote:
> * It has been repeatedly pointed out that we may want to decouple
> partitioning from inheritance because implementing partitioning as an
> extension of inheritance mechanism means that we have to keep all the
> existing semantics which might limit what we want to do with the special
> case of it which is partitioning; in other words, we would find
> ourselves in difficult position where we have to inject a special case
> code into a very generalized mechanism that is inheritance.
> Specifically, do we regard a partitions as pg_inherits children of its
> partitioning parent?

I don't think this is totally an all-or-nothing decision.  I think
everyone is agreed that we need to not break things that work today --
e.g. Merge Append.  What that implies for pg_inherits isn't altogether
clear.

> * Syntax: do we want to make it similar to one of the many other
> databases out there? Or we could invent our own?

Well, what I think we don't want is something that is *almost* like
some other database but not quite.  I lean toward inventing our own
since I'm not aware of something that we'd want to copy exactly.
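
Purely as an illustration of the kind of thing being discussed -- not a syntax
anyone here has proposed or agreed on -- a range-partitioned table might look
roughly like:

CREATE TABLE measurement (logdate date, peaktemp int)
    PARTITION BY RANGE (logdate);

CREATE TABLE measurement_y2015 PARTITION OF measurement
    FOR VALUES FROM ('2015-01-01') TO ('2016-01-01');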

> I wonder if we could add a clause like DISTRIBUTED BY to complement
> PARTITION ON that represents a hash distributed/partitioned table (that
> could be a syntax to support sharded tables maybe; we would definitely
> want to move ahead in that direction I guess)

Maybe eventually, but let's not complicate things by worrying too much
about that now.

> * Catalog: We would like to have a catalog structure suitable to
> implement capabilities like multi-column partitioning, sub-partitioning
> (with arbitrary number of levels in the hierarchy). I had suggested
> that we create two new catalogs viz. pg_partitioned_rel,
> pg_partition_def to store metadata about a partition key of a
> partitioned relation and partition bound info of a partition,
> respectively. Also, see the point about on-disk representation of
> partition bounds

I'm not convinced that there is any benefit in spreading this
information across two tables neither of which exist today.  If the
representation of the partitioning scheme is going to be a node tree,
then there's no point in taking what would otherwise have been a List
and storing each element of it in a separate tuple. The overarching
point here is that the system catalog structure should be whatever is
most convenient for the system internals; I'm not sure we understand
what that is yet.

> * It is desirable to treat partitions as pg_class relations with perhaps
> a new relkind(s). We may want to choose an implementation where only the
> lowest level relations in a partitioning hierarchy have storage; those
> at the upper layers are mere placeholder relations though of course with
> associated constraints determined by partitioning criteria (with
> appropriate metadata entered into the additional catalogs).

I think the storage-less parents need a new relkind precisely to
denote that they have no storage; I am not convinced that there's any
reason to change the relkind for the leaf nodes.  But that's been
proposed, so evidently someone thinks there's a reason to do it.

> I am not
> quite sure if each kind of the relations involved in the partitioning
> scheme have separate namespaces and, if they are, how we implement that

I am in favor of having all of the nodes in the hierarchy have names
just as relations do today -- pg_class.relname.  Anything else seems
to me to be complex to implement and of very marginal benefit.  But
again, it's been proposed.

> * In the initial implementation, we could just live with partitioning on
> a set of columns (and not arbitrary expressions of them)

Seems quite fair.

> * We perhaps do not need multi-column LIST partitions as they are not
> very widely used and may complicate the implementation

I agree that the use case is marginal; but I'm not sure it needs to
complicate the implementation much.  Depending on how the
implementation shakes out, prohibiting it might come to seem like more
of a wart than allowing it.

> * There are a number of suggestions about how we represent partition
> bounds (on-disk) - pg_node_tree, RECORD (a composite type or the rowtype
> associated with the relation itself), etc. Important point to consider
> here may be that partition key may contain more than one column

Yep.

> * How we represent partition definition in memory (for a given
> partitioned relation) - important point to remember is that such a
> representation should be efficient to iterate through or
> binary-searchable. Also see the points about tuple-routing and
> partition-pruning

Yep.

> * Overflow/catchall partition: it seems we do not want/need them. It
> might seem desirable for example in cases where a big transaction enters
> a large number of tuples all but one of which find a defined partition;
> we may not want to error out in such case but instead enter that erring
> tuple into the o

Re: [HACKERS] [PATCH] HINT: pg_hba.conf changed since last config reload

2015-01-16 Thread Andres Freund
On 2015-01-16 12:21:13 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > Why don't we just add emit a NOTICE or WARNING in the relevant place
> > saying that pg_hba.conf is outdated? Then the server won't log those if
> > configured appropriately, which doesn't seem like a bad thing. Note that
> > <= ERROR messages aren't sent to the client during authentication.
> 
> I think people felt that sending that information to the client wouldn't
> be a good idea security-wise.

It won't if issued during the right phase of the authentication:
/*
 * client_min_messages is honored only after we complete the
 * authentication handshake.  This is required both for security
 * reasons and because many clients can't handle NOTICE messages
 * during authentication.
 */
if (ClientAuthInProgress)
output_to_client = (elevel >= ERROR);
else
output_to_client = (elevel >= client_min_messages ||
elevel == INFO);
}

Surely deserves a comment on the emitting site.

> But I'd phrase it as "why not just emit a LOG message?".

Well, LOGs can be sent to the client just the same, no? Just requires a
nondefault client_min_messages.

But as I don't think sending logs to the client is an insurmountable
problem (due to the above) I don't really care if we use WARNING or LOG.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] proposal: lock_time for pg_stat_database

2015-01-16 Thread Jim Nasby

On 1/16/15 11:00 AM, Pavel Stehule wrote:

Hi all,

some time ago, I proposed a lock time measurement related to query. A main 
issue was a method, how to show this information. Today proposal is little bit 
simpler, but still useful. We can show a total lock time per database in 
pg_stat_database statistics. High number can be signal about lock issues.


Would this not use the existing stats mechanisms? If so, couldn't we do this per table? 
(I realize that won't handle all cases; we'd still need a "lock_time_other" 
somewhere).

Also, what do you mean by 'lock'? Heavyweight? We already have some visibility 
there. What I wish we had was some way to know if we're spending a lot of time 
in a particular non-heavy lock. Actually measuring time probably wouldn't make 
sense but we might be able to count how often we fail initial acquisition or 
something.
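
For reference, the existing heavyweight-lock visibility is pg_locks, e.g. joined
against pg_stat_activity to see who is currently waiting on what:

SELECT a.pid, a.query, l.locktype, l.mode
  FROM pg_locks l
  JOIN pg_stat_activity a ON a.pid = l.pid
 WHERE NOT l.granted;
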
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] proposal: row_to_array function

2015-01-16 Thread Pavel Stehule
2015-01-16 18:03 GMT+01:00 Jim Nasby :

> On 1/16/15 3:45 AM, Pavel Stehule wrote:
>
>> I am returning back to processing records in plpgsql.
>>
>> I am thinking so it can be simply processed with transformations to array.
>>
>> Now we have similar functions - hstore(row), row_to_json, ... but using
>> of these functions can be a useless step. Any row variable can be
>> transformed to 2D text array.
>>
>
> How is it useless? Why wouldn't you just use JSON and be done with it?
>

We can use a FOREACH IN ARRAY iteration in plpgsql (a second variant would be
an implementation of FOREACH for jsonb),

so ROW->ARRAY is shorter than ROW->JSON->ARRAY or ROW->HSTORE->ARRAY.


>
> Do you have some use cases you can share?
>

processing of NEW, OLD variables in triggers
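
For illustration, a minimal sketch of that use case, assuming the proposed
row_to_array(record) returning a 2D text array (hypothetical -- it does not
exist yet):

CREATE FUNCTION log_columns() RETURNS trigger
LANGUAGE plpgsql AS $$
DECLARE
    kv text[];
BEGIN
    -- row_to_array() is the proposed, hypothetical function
    FOREACH kv SLICE 1 IN ARRAY row_to_array(NEW)
    LOOP
        RAISE NOTICE 'column % = %', kv[1], kv[2];
    END LOOP;
    RETURN NEW;
END;
$$;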


>
>  There two possible transformations:
>>
>> row_to_array --> [[key1, value1],[key2, value2], ...]
>> row_to_row_array --> [(key1, value1), (key2, value2), ... ]
>>
>
> If we're going to go that route, I think it makes more sense to create an
> actual key/value type (ie: http://pgxn.org/dist/pair/doc/pair.html) and
> return an array of that.
>

ok


> --
> Jim Nasby, Data Architect, Blue Treble Consulting
> Data in Trouble? Get it in Treble! http://BlueTreble.com
>


Re: [HACKERS] [PATCH] HINT: pg_hba.conf changed since last config reload

2015-01-16 Thread Tom Lane
Andres Freund  writes:
> Why don't we just add emit a NOTICE or WARNING in the relevant place
> saying that pg_hba.conf is outdated? Then the server won't log those if
> configured appropriately, which doesn't seem like a bad thing. Note that
> <= ERROR messages aren't sent to the client during authentication.

I think people felt that sending that information to the client wouldn't
be a good idea security-wise.  But I'd phrase it as "why not just emit
a LOG message?".  I agree that new logging infrastructure seems like
overkill.

regards, tom lane




Re: [HACKERS] speedup tidbitmap patch: cache page

2015-01-16 Thread Andres Freund
On 2015-01-16 12:15:35 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On 2014-12-25 01:26:53 +1300, David Rowley wrote:
> >> So I think v3 is the one to go with, and I can't see any problems with it,
> >> so I'm marking it as ready for committer.
> 
> > And committed.
> 
> It strikes me that this patch leaves some lookups on the table,
> specifically that it fails to avoid repeated hash_search lookups
> inside tbm_page_is_lossy() in the situation where we're adding
> new TIDs to an already-lossified page.  Is it worth adding a few
> more lines to handle that case as well?

There was an alternative version (v2.3 in 549950fb.2050...@sigaev.ru) of
the patch that cached the lossiness of a page, but Teodor/David didn't
find it to be beneficial in their benchmarking.

Teodor's argument was basically that it's completely lost in the noise
due to the much bigger overhead of rechecks.

I thought it'd be better to get this improvement committed rather than
waiting for someone to find an even bigger improvement for some case.

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] proposal: searching in array function - array_position

2015-01-16 Thread Pavel Stehule
2015-01-16 17:57 GMT+01:00 Jim Nasby :

> On 1/16/15 3:39 AM, Pavel Stehule wrote:
>
>> I am proposing a simple function, that returns a position of element in
>> array.
>>
>
> Yes please!
>
>  FUNCTION array_position(anyarray, anyelement) RETURNS int
>>
>
> That won't work on a multi-dimensional array. Ideally it needs to accept a
> slice or an element and return the specifier for the slice.
>

The question is what the result should be - probably there can be a
multidimensional variant, where the result will be an array

array_position([1,2,3],2) --> 2
array_position([[1,2],[2,3],[3,4]], [2,3]) --> 2 /* the 2nd parameter should
have N-1 dimensions of the first parameter */
array_position_md([1,2,3],2) --> [2]
array_position_md([[1,2],[2,3],[3,4]], 2) --> [2,1]

Another question is how to handle more than one occurrence of a value -
probably two sets of functions - the first returns the first occurrence of
the value, the second returns the set of occurrences.
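
For illustration, the two variants might behave like this (hypothetical
functions; the name array_positions for the second one is only a placeholder):

SELECT array_position(ARRAY[2,4,6,4], 4);    -- 2, first occurrence only
SELECT array_positions(ARRAY[2,4,6,4], 4);   -- {2,4}, all occurrences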


>
> This wouldn't be so bad if we had an easier way to extract subsets of an
> array, but even that is really ugly because whatever you extract keeps the
> original number of dimensions.
>
>  Implementation is simple (plpgsql code)
>>
>
> This would actually be written in C though, yes? Otherwise it's not really
> any better than just doing an extension...
>

Sure, I expect a C implementation

Pavel


> --
> Jim Nasby, Data Architect, Blue Treble Consulting
> Data in Trouble? Get it in Treble! http://BlueTreble.com
>


Re: [HACKERS] speedup tidbitmap patch: cache page

2015-01-16 Thread Tom Lane
Andres Freund  writes:
> On 2014-12-25 01:26:53 +1300, David Rowley wrote:
>> So I think v3 is the one to go with, and I can't see any problems with it,
>> so I'm marking it as ready for committer.

> And committed.

It strikes me that this patch leaves some lookups on the table,
specifically that it fails to avoid repeated hash_search lookups
inside tbm_page_is_lossy() in the situation where we're adding
new TIDs to an already-lossified page.  Is it worth adding a few
more lines to handle that case as well?

regards, tom lane




Re: [HACKERS] pg_rewind in contrib

2015-01-16 Thread Heikki Linnakangas

On 01/16/2015 03:30 AM, Peter Eisentraut wrote:

Here is a random bag of comments for the v5 patch:


Addressed most of your comments, and Michael's. Another version 
attached. More on a few of the comments below.



The option --source-server had me confused at first, because the entry
above under --source-pgdata also talks about a "source server".  Maybe
--source-connection would be clearer?


Hmm. I would rather emphasize that the source is a running server, than 
the connection to the server. I can see the confusion, though. What 
about phrasing it as:


--source-pgdata

  Specifies path to the data directory of the source server, to
  synchronize the target with. When --source-pgdata is used,
  the source server must be cleanly shut down.

The point is that whether you use --source-pgdata or --source-server, 
the source is a PostgreSQL server. Perhaps it should be mentioned here 
that you only need one of --source-pgdata and --source-server, even 
though the synopsis says that too.


Another idea is to rename --source-server to just --source. That would 
make sense if we assume that it's more common to connect to a live server:


pg_rewind --target mypgdata --source "host=otherhost user=dba"

pg_rewind --target mypgdata --source-pgdata /other/pgdata



Reference pages have standardized top-level headers, so "Theory of
operation" should be under something like "Notes".

Similarly for "Restrictions", but that seems important enough to go
into the description.


Oh. That's rather limiting. I'll rename the "Restrictions" to "Notes" - 
that seems to be where we have listed limitations like that in many 
other pages. I also moved Theory of operation as a "How it works" 
subsection under "Notes".


The top-level headers aren't totally standardized in man pages. Googling 
around, a few seem to have a section called "IMPLEMENTATION NOTES", 
which would be a good fit here. We don't have such sections currently, 
but how about it?



There should be an installcheck target.


Doesn't seem appropriate, as there are no regression tests that would 
run against an existing cluster. Or should it just use the installed 
pg_rewind and postgres binary, but create the temporary clusters all the 
same?



RewindTest.pm should be in the t/ directory.


I used a similar layout in src/test/ssl, so that ought to be changed too 
if we're not happy with it.


"make check" will not find the module if I just move it to t/, so that 
would require changes to Makefiles or something. I don't know how to do 
that offhand.



Instead of FILE_TYPE_DIRECTORY etc., why not use S_IFDIR etc.?


I don't see what that would buy us. Most of the code only knows about a 
few S_IF* types, namely regular files, directories and symlinks. Those 
are the types that there are FILE_TYPE_* codes for. If we started using 
the more general S_IF* constants, it would not be clear which types can 
appear in which parts of the code. Also, the compiler would no longer 
warn about omitting one of the types in a "switch(file->type)" statement.



TestLib.pm addition command_is sounds a bit wrong.  It's evidently
modelled after command_like, but that now sounds wrong too.  How about
command_stdout_is?


Works for me. How about also renaming command_like to 
command_stdout_like, and committing that along with the new 
command_stdout_is function as a separate patch, before pg_rewind?



The test suite needs to silence all non-TAP output.  So psql needs to
be run with -q pg_ctl with -s etc.  Any important output needs to be
through diag() or note().

Test cases like

ok 6 - psql -A -t --no-psqlrc -c port=10072 -c SELECT datname FROM pg_database 
exit code 0

should probably get a real name.

The whole structure of the test suite still looks too much like the
old hack.  I'll try to think of other ways to structure it.


Yeah. It currently works with callback functions, like:

test program:
-> call RewindTest::run_rewind_test
 set up both cluster
 -> call before_standby callback
 start standby
 -> call standby_following_master callback
 promote standby
 -> call after_promotion callback
 stop master
 run pg_rewind
 restart master
 -> call after_rewind callback

It might be better to turn the control around, so that all the code 
that's now in the callbacks is in the test program's main flow, and the 
test program calls functions in RewindTest.pm to execute the parts that 
are currently between the callbacks:


test program
  -> call RewindTest::setup_cluster
  do stuff that's now in before_standby callback
  -> call RewindTest::start_standby
  do stuff that's now in standby_following_master callback
  -> call RewindTest::promote_standby
  do stuff that's now in after_promotion callback
  -> call RewindTest::run_pg_rewind
  do stuff that's now in after_rewind callback

- Heikki



pg_rewind-bin-6.patch.gz
Description: application/gzip


Re: [HACKERS] [PATCH] HINT: pg_hba.conf changed since last config reload

2015-01-16 Thread Andres Freund
On 2015-01-16 18:01:24 +0100, Andres Freund wrote:
> Why don't we just add emit a NOTICE or WARNING in the relevant place
> saying that pg_hba.conf is outdated? Then the server won't log those if
> configured appropriately, which doesn't seem like a bad thing. Note that
> <= ERROR messages aren't sent to the client during authentication.

I think a 'WARNING: client authentication failed/succeeded with an
outdated pg_hba.conf in effect' would actually be a good way to present
this. It's not only failed logins where this is relevant.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] proposal: row_to_array function

2015-01-16 Thread Jim Nasby

On 1/16/15 3:45 AM, Pavel Stehule wrote:

I am returning back to processing records in plpgsql.

I am thinking so it can be simply processed with transformations to array.

Now we have similar functions - hstore(row), row_to_json, ... but using of 
these functions can be a useless step. Any row variable can be transformed to 
2D text array.


How is it useless? Why wouldn't you just use JSON and be done with it?

Do you have some use cases you can share?


There two possible transformations:

row_to_array --> [[key1, value1],[key2, value2], ...]
row_to_row_array --> [(key1, value1), (key2, value2), ... ]


If we're going to go that route, I think it makes more sense to create an 
actual key/value type (ie: http://pgxn.org/dist/pair/doc/pair.html) and return 
an array of that.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] [PATCH] HINT: pg_hba.conf changed since last config reload

2015-01-16 Thread Andres Freund
On 2014-12-15 19:38:16 +0300, Alex Shulgin wrote:
> Attached is the modified version of the original patch by Craig,
> addressing the handling of the new hint_log error data field and
> removing the client-side HINT.

I'm not a big fan of this implementation. We're adding a fair bit of
infrastructure (i.e. server-only hints) where in other places we use
NOTICE for similar hints.

Why don't we just add emit a NOTICE or WARNING in the relevant place
saying that pg_hba.conf is outdated? Then the server won't log those if
configured appropriately, which doesn't seem like a bad thing. Note that
<= ERROR messages aren't sent to the client during authentication.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] proposal: lock_time for pg_stat_database

2015-01-16 Thread Pavel Stehule
Hi all,

Some time ago I proposed a lock time measurement related to queries. The main
issue was the method of showing this information. Today's proposal is a little
bit simpler, but still useful: we can show a total lock time per database
in the pg_stat_database statistics. A high number can be a signal of lock issues.
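
For illustration only -- assuming pg_stat_database grew such a counter (a
hypothetical lock_time column with the total time spent waiting on locks) --
checking it could be as simple as:

SELECT datname, lock_time   -- lock_time is the proposed, hypothetical column
  FROM pg_stat_database
 ORDER BY lock_time DESC;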

Comments, ideas, notices?

Regards

Pavel


Re: [HACKERS] Temporal features in PostgreSQL

2015-01-16 Thread pe...@vanroose.be
What's the current status of this topic?
Has someone worked on temporal tables for PostgreSQL since 2012 ?

I'm giving a presentation on Fosdem later this month in Brussels, on the
topic of temporal tables, and would like to give all possibly relevant
information to the audience!

--  Peter Vanroose,
 Leuven, Belgium.







Re: [HACKERS] proposal: searching in array function - array_position

2015-01-16 Thread Jim Nasby

On 1/16/15 3:39 AM, Pavel Stehule wrote:

I am proposing a simple function, that returns a position of element in array.


Yes please!


FUNCTION array_position(anyarray, anyelement) RETURNS int


That won't work on a multi-dimensional array. Ideally it needs to accept a 
slice or an element and return the specifier for the slice.

This wouldn't be so bad if we had an easier way to extract subsets of an array, 
but even that is really ugly because whatever you extract keeps the original 
number of dimensions.


Implementation is simple (plpgsql code)


This would actually be written in C though, yes? Otherwise it's not really any 
better than just doing an extension...
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] [RFC] Incremental backup v3: incremental PoC

2015-01-16 Thread Marco Nenciarini
On 14/01/15 17:22, Gabriele Bartolini wrote:
> 
> My opinion, Marco, is that for version 5 of this patch, you:
> 
> 1) update the information on the wiki (it is outdated - I know you have
> been busy with LSN map optimisation)

Done.

> 2) modify pg_basebackup in order to accept a directory (or tar file) and
> automatically detect the LSN from the backup profile

New version of patch attached. The -I parameter now requires a backup
profile from a previous backup. I've added a sanity check that forbid
incremental file level backups if the base timeline is different from
the current one.

> 3) add the documentation regarding the backup profile and pg_basebackup
> 

Next on my TODO list.

> Once we have all of this, we can continue trying the patch. Some
> unexplored paths are:
> 
> * tablespace usage

I've improved my pg_restorebackup python PoC. It now supports tablespaces.

> * tar format
> * performance impact (in both "read-only" and heavily updated contexts)

From the server point of view, the current code generates a load similar
to normal backup. It only adds an initial scan of any data file to
decide whether it has to send it. Once it finds a single newer page it
immediately stops scanning and starts sending the file. The IO impact
should not be that big due to the filesystem cache, but I agree with you
that it has to be measured.

Regards,
Marco

-- 
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it
From f7cf8b9dd7d32f64a30dafaeeaeb56cbcd2eafff Mon Sep 17 00:00:00 2001
From: Marco Nenciarini 
Date: Tue, 14 Oct 2014 14:31:28 +0100
Subject: [PATCH] File-based incremental backup v5

Add backup profile to pg_basebackup
INCREMENTAL option implementaion
---
 src/backend/access/transam/xlog.c  |   7 +-
 src/backend/access/transam/xlogfuncs.c |   2 +-
 src/backend/replication/basebackup.c   | 335 +++--
 src/backend/replication/repl_gram.y|   6 +
 src/backend/replication/repl_scanner.l |   1 +
 src/bin/pg_basebackup/pg_basebackup.c  | 147 +--
 src/include/access/xlog.h  |   3 +-
 src/include/replication/basebackup.h   |   4 +
 8 files changed, 473 insertions(+), 32 deletions(-)

diff --git a/src/backend/access/transam/xlog.c 
b/src/backend/access/transam/xlog.c
index 629a457..1e50625 100644
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
*** XLogFileNameP(TimeLineID tli, XLogSegNo 
*** 9249,9255 
   * permissions of the calling user!
   */
  XLogRecPtr
! do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
   char **labelfile)
  {
boolexclusive = (labelfile == NULL);
--- 9249,9256 
   * permissions of the calling user!
   */
  XLogRecPtr
! do_pg_start_backup(const char *backupidstr, bool fast,
!  XLogRecPtr incremental_startpoint, 
TimeLineID *starttli_p,
   char **labelfile)
  {
boolexclusive = (labelfile == NULL);
*** do_pg_start_backup(const char *backupids
*** 9468,9473 
--- 9469,9478 
 (uint32) (startpoint >> 32), (uint32) startpoint, 
xlogfilename);
appendStringInfo(&labelfbuf, "CHECKPOINT LOCATION: %X/%X\n",
 (uint32) (checkpointloc >> 32), 
(uint32) checkpointloc);
+   if (incremental_startpoint > 0)
+   appendStringInfo(&labelfbuf, "INCREMENTAL FROM 
LOCATION: %X/%X\n",
+(uint32) 
(incremental_startpoint >> 32),
+(uint32) 
incremental_startpoint);
appendStringInfo(&labelfbuf, "BACKUP METHOD: %s\n",
 exclusive ? "pg_start_backup" 
: "streamed");
appendStringInfo(&labelfbuf, "BACKUP FROM: %s\n",
diff --git a/src/backend/access/transam/xlogfuncs.c 
b/src/backend/access/transam/xlogfuncs.c
index 2179bf7..ace84d8 100644
*** a/src/backend/access/transam/xlogfuncs.c
--- b/src/backend/access/transam/xlogfuncs.c
*** pg_start_backup(PG_FUNCTION_ARGS)
*** 59,65 
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
   errmsg("must be superuser or replication role to run a 
backup")));
  
!   startpoint = do_pg_start_backup(backupidstr, fast, NULL, NULL);
  
PG_RETURN_LSN(startpoint);
  }
--- 59,65 
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
   errmsg("must be superuser or replication role to run a 
backup")));
  
!   startpoint = do_pg_start_backup(backupidstr, fast, 0, NULL, NULL);
  
PG_RETURN_LSN(startpoint);
  }
diff --git a/src/backend/replication/basebackup.c 
b/src/backend/replication/basebackup.c
index 07030a2..05

Re: [HACKERS] speedup tidbitmap patch: cache page

2015-01-16 Thread Andres Freund
On 2014-12-25 01:26:53 +1300, David Rowley wrote:
> So I think v3 is the one to go with, and I can't see any problems with it,
> so I'm marking it as ready for committer.

And committed.

Thanks Teodor and David.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Safe memory allocation functions

2015-01-16 Thread Alvaro Herrera
Andres Freund wrote:
> On 2015-01-16 12:56:18 -0300, Alvaro Herrera wrote:

> > So how about something like
> > 
> > #define ALLOCFLAG_HUGE  0x01
> > #define ALLOCFLAG_NO_ERROR_ON_OOM   0x02
> > void *
> > MemoryContextAllocFlags(MemoryContext context, Size size, int flags);

> I don't know, this seems a bit awkward to use.  Your earlier example with
> the *Huge variant that returns a smaller allocation doesn't really
> convince me - that'd need a separate API anyway.

What example was that?  My thinking was that the mcxt.c function would
return NULL if the request was not satisfied; only the caller would be
entitled to retry with a smaller size.  I was thinking of something like

baseflag = ALLOCFLAG_NO_ERROR_ON_OOM;
reqsz = SomeHugeValue;
while (true)
{
ptr = MemoryContextAllocFlags(cxt, reqsz,
ALLOCFLAG_HUGE | baseflag);
if (ptr != NULL)
break;  /* success */

/* too large, retry with a smaller allocation */
reqsz *= 0.75;

/* if under some limit, have it fail next time */
if (reqsz < SomeHugeValue * 0.1)
baseflag = 0;
}
/* by here, you know ptr points to a memory area of size reqsz, which is
   between SomeHugeValue * 0.1 and SomeHugeValue. */


Were you thinking of something else?

> I definitely do not want to push the nofail stuff via the
> MemoryContextData-> API into aset.c. Imo aset.c should always return
> NULL and then mcxt.c should throw the error if in the normal palloc()
> function.

Sure, that seems reasonable ...

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Merging postgresql.conf and postgresql.auto.conf

2015-01-16 Thread Sawada Masahiko
On Fri, Jan 16, 2015 at 12:54 PM, Amit Kapila  wrote:
> On Thu, Jan 15, 2015 at 9:48 PM, Sawada Masahiko 
> wrote:
>> On Thu, Jan 15, 2015 at 2:02 PM, Amit Kapila 
>> wrote:
>> >
>> > One thought I have in this line is that currently there doesn't seem to
>> > be
>> > a way to know if the setting has an entry both in postgresql.conf and
>> > postgresql.auto.conf, if we can have some way of knowing the same
>> > (pg_settings?), then it could be convenient for user to decide if the
>> > value
>> > in postgresql.auto.conf is useful or not and if it's not useful then use
>> > Alter System .. Reset command to remove the same from
>> > postgresql.auto.conf.
>>
>> I think one way is that pg_settings has file name of variables,  But
>> It would not affect to currently status of postgresql.conf
>> So we would need to parse postgresql.conf again at that time.
>>
>
> Yeah that could be a possibility, but I think that will break the existing
> command('s) as this is the common infrastructure used for SHOW ..
> commands as well which displays the guc value that is used by
> current session rather than the value in postgresql.conf.

You're right.
pg_settings and the SHOW command use the value in the current session rather
than the config file.
It might break this common infrastructure.

>
> I don't know how appealing it would be to others, but a new view
> like pg_file_settings which would display the settings in file could
> be meaningful for your need.
>
> Another way is user can do pg_reload_conf() to see the latest
> values (excluding values for server startup time parameters).
>

I prefer the former, because the latter would not handle all GUC
variables, as you said.
A new view like pg_file_settings would show each GUC variable together with
the config file that sets it.
It would be a helpful view for things like ALTER SYSTEM ... RESET, and that
command might be a beneficial feature for users who want to manage the
configuration file manually; I would like to propose both.
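
As a sketch of the idea (assuming a hypothetical pg_file_settings view with
name, setting and sourcefile columns), finding parameters that are set in more
than one file could look like:

SELECT name, sourcefile, setting      -- view and columns are assumptions here
  FROM pg_file_settings
 WHERE name IN (SELECT name
                  FROM pg_file_settings
                 GROUP BY name
                HAVING count(*) > 1)
 ORDER BY name, sourcefile;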

Regards,

---
Sawada Masahiko




Re: [HACKERS] Safe memory allocation functions

2015-01-16 Thread Andres Freund
On 2015-01-16 12:56:18 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
> > We rely on palloc erroring out on large allocations in a couple places
> > as a crosscheck. I don't think this argument holds much water.
> 
> I don't understand what that has to do with it.  Surely we're not going
> to have palloc_noerror() not error out when presented with a huge
> allocation.

Precisely. That means it *does* error out in a somewhat expected path.

> My point is just that the "noerror" bit in palloc_noerror() means that
> it doesn't fail in OOM, and that there are other causes for it to
> error.

That description pretty much describes why it's a misnomer, no?

> One thought I just had is that we also have MemoryContextAllocHuge; are
> we going to consider a mixture of both things in the future, i.e. allow
> huge allocations to return NULL when OOM?

I definitely think we should. I'd even say that the usecase is larger
for huge allocations. It'd e.g. be rather nice to first try sorting with
the huge 16GB work mem and then try 8GB/4/1GB if that fails.

> It sounds a bit useless
> currently, but it doesn't seem extremely far-fetched that we will need
> further flags in the future.  (Or, perhaps, we will want to have code
> that retries a Huge allocation that returns NULL with a smaller size,
> just in case it does work.)  Maybe what we need is to turn these things
> into flags to a new generic function.  Furthermore, I question whether
> we really need a "palloc" variant -- I mean, can we live with just the
> MemoryContext API instead?  As with the "Huge" variant (which does not
> have a corresponding palloc equivalent), possible use cases seem very
> limited so there's probably not much point in providing a shortcut.

I'm fine with not providing a palloc() equivalent, but I also am fine
with having it.

> So how about something like
> 
> #define ALLOCFLAG_HUGE0x01
> #define ALLOCFLAG_NO_ERROR_ON_OOM 0x02
> void *
> MemoryContextAllocFlags(MemoryContext context, Size size, int flags);
> 
> and perhaps even
> 
> #define MemoryContextAllocHuge(cxt, sz) \
>   MemoryContextAllocFlags(cxt, sz, ALLOCFLAG_HUGE)
> for source-level compatibility.

I don't know, this seems a bit awkward to use.  Your earlier example with
the *Huge variant that returns a smaller allocation doesn't really
convince me - that'd need a separate API anyway.

I definitely do not want to push the nofail stuff via the
MemoryContextData-> API into aset.c. Imo aset.c should always return
NULL and then mcxt.c should throw the error if in the normal palloc()
function.

> (Now we all agree that palloc() itself is a very hot spot and shouldn't
> be touched at all.  I don't think these new functions are used as commonly
> as that, so the fact that they are slightly slower shouldn't be too
> troublesome.)

Yea, the speed of the new functions really shouldn't matter.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] PATCH: Reducing lock strength of trigger and foreign key DDL

2015-01-16 Thread Andres Freund
On 2015-01-16 15:16:20 +0100, Andreas Karlsson wrote:
> Indeed. As Noah and I discussed previously in this thread we would need to
> do quite a bit of refactoring of ruleutils.c to make it fully MVCC.

Right.

> For this reason I opted to only lower the lock levels of ADD and ALTER
> TRIGGER, and not DROP TRIGGER. Neither of those require MVCC of then
> WHEN clause.

I'm unconvinced that this is true. Using a snapshot for part of getting
a definition certainly opens the door for getting strange
results.

Acquiring a lock that prevents schema changes on the table and then
getting the definition using the syscaches guarantees that that
definition is at least self consistent because no further schema changes
are possible and the catalog caches will be up2date.

What you're doing though is doing part of the scan using the
transaction's snapshot (as used by pg_dump that will usually be a
repeatable read snapshot and possibly quite old) and the other using a
fresh catalog snapshot. This can result in rather odd things.

Just consider:
S1: CREATE TABLE flubber(id serial primary key, data text);
S1: CREATE FUNCTION blarg() RETURNS TRIGGER LANGUAGE plpgsql AS $$BEGIN RETURN 
NEW; END;$$;
S1: CREATE TRIGGER flubber_blarg BEFORE INSERT ON flubber FOR EACH ROW WHEN 
(NEW.data IS NOT NULL) EXECUTE PROCEDURE blarg();
S2: BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
S2: SELECT 'dosomethingelse';
S1: ALTER TABLE flubber RENAME TO wasflubber;
S1: ALTER TABLE wasflubber RENAME COLUMN data TO wasdata;
S1: ALTER TRIGGER flubber_blarg ON wasflubber RENAME TO wasflubber_blarg;
S1: ALTER FUNCTION blarg() RENAME TO wasblarg;
S2: SELECT pg_get_triggerdef(oid) FROM pg_trigger;

This will give you: The old trigger name. The new table name. The new
column name. The new function name.

I don't think using a snapshot for tiny parts of these functions
actually buys anything. Now, this isn't a pattern you introduced. But I
think we should think about this for a second before expanding it
further.

Before you argue that this isn't relevant for pg_dump: It is. Precisely
the above can happen - just replace the 'dosomethingelse' with several
LOCK TABLEs as pg_dump does. The first blocks after a snapshot has been
acquired. While waiting, all the ALTERs happen.

Arguably the benefit here is that the window for this issue is becoming
smaller as pg_dump (and hopefully other possible callers) acquire
exclusive locks on the table. I.e. that the lowering of the lock level
doesn't introduce new races. But on the other side of the coin, this
makes the result of pg_get_triggerdef() even *more* inaccurate in many
cases.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Safe memory allocation functions

2015-01-16 Thread Alvaro Herrera
Andres Freund wrote:
> On 2015-01-16 12:09:25 -0300, Alvaro Herrera wrote:
> > Robert Haas wrote:
> > > On Thu, Jan 15, 2015 at 10:57 AM, Alvaro Herrera
> > >  wrote:
> > 
> > > >> I do think that "safe" is the wrong suffix.  Maybe palloc_soft_fail()
> > > >> or palloc_null() or palloc_no_oom() or palloc_unsafe().
> > > >
> > > > I liked palloc_noerror() better myself FWIW.
> > > 
> > > I don't care for noerror() because it probably still will error in
> > > some circumstances; just not for OOM.
> > 
> > Yes, but that seems fine to me.  We have other functions with "noerror"
> > flags, and they can still fail under some circumstances -- just not if
> > the error is the most commonly considered scenario in which they fail.
> 
> We rely on palloc erroring out on large allocations in a couple places
> as a crosscheck. I don't think this argument holds much water.

I don't understand what that has to do with it.  Surely we're not going
to have palloc_noerror() not error out when presented with a huge
allocation.  My point is just that the "noerror" bit in palloc_noerror()
means that it doesn't fail in OOM, and that there are other causes for
it to error.


One thought I just had is that we also have MemoryContextAllocHuge; are
we going to consider a mixture of both things in the future, i.e. allow
huge allocations to return NULL when OOM?  It sounds a bit useless
currently, but it doesn't seem extremely far-fetched that we will need
further flags in the future.  (Or, perhaps, we will want to have code
that retries a Huge allocation that returns NULL with a smaller size,
just in case it does work.)  Maybe what we need is to turn these things
into flags to a new generic function.  Furthermore, I question whether
we really need a "palloc" variant -- I mean, can we live with just the
MemoryContext API instead?  As with the "Huge" variant (which does not
have a corresponding palloc equivalent), possible use cases seem very
limited so there's probably not much point in providing a shortcut.

So how about something like

#define ALLOCFLAG_HUGE  0x01
#define ALLOCFLAG_NO_ERROR_ON_OOM   0x02
void *
MemoryContextAllocFlags(MemoryContext context, Size size, int flags);

and perhaps even

#define MemoryContextAllocHuge(cxt, sz) \
MemoryContextAllocFlags(cxt, sz, ALLOCFLAG_HUGE)
for source-level compatibility.


(Now we all agree that palloc() itself is a very hot spot and shouldn't
be touched at all.  I don't think these new functions are used as commonly
as that, so the fact that they are slightly slower shouldn't be too
troublesome.)

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Safe memory allocation functions

2015-01-16 Thread Andres Freund
On 2015-01-16 12:09:25 -0300, Alvaro Herrera wrote:
> Robert Haas wrote:
> > On Thu, Jan 15, 2015 at 10:57 AM, Alvaro Herrera
> >  wrote:
> 
> > >> I do think that "safe" is the wrong suffix.  Maybe palloc_soft_fail()
> > >> or palloc_null() or palloc_no_oom() or palloc_unsafe().
> > >
> > > I liked palloc_noerror() better myself FWIW.
> > 
> > I don't care for noerror() because it probably still will error in
> > some circumstances; just not for OOM.
> 
> Yes, but that seems fine to me.  We have other functions with "noerror"
> flags, and they can still fail under some circumstances -- just not if
> the error is the most commonly considered scenario in which they fail.

We rely on palloc erroring out on large allocations in a couple places
as a crosscheck. I don't think this argument holds much water.

> The first example I found is LookupAggNameTypeNames(); there are many
> more.  I don't think this causes any confusion in practice.
> 
> Another precendent we have is something like "missing_ok" as a flag name
> in get_object_address() and other places; following that, we could have
> this new function as "palloc_oom_ok" or something like that.  But it
> doesn't seem an improvement to me.  (I'm pretty sure we all agree that
> this must not be a flag to palloc but rather a new function.)
> 
> Of all the ones you proposed above, the one I like the most is
> palloc_no_oom, but IMO palloc_noerror is still better.

Neither seem very accurate. no_oom isn't true because they actually can
cause ooms. _noerror isn't true because they can error out - we
e.g. rely on palloc erroring out when reading toast tuples (to detect
invalid datum lengths) and during parsing of WAL as an additional
defense.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] orangutan seizes up during isolation-check

2015-01-16 Thread Dave Cramer
On 16 January 2015 at 01:33, Noah Misch  wrote:

> On Thu, Jan 15, 2015 at 09:24:01AM -0500, Robert Haas wrote:
> > On Thu, Jan 15, 2015 at 1:04 AM, Noah Misch  wrote:
> > > On Wed, Jan 14, 2015 at 04:48:53PM -0500, Peter Eisentraut wrote:
> > >> What I'm seeing now is that the unaccent regression tests when run
> under
> > >> make check-world abort with
> > >>
> > >> FATAL:  postmaster became multithreaded during startup
> > >> HINT:  Set the LC_ALL environment variable to a valid locale.
> > >
> > > contrib/unaccent/Makefile sets NO_LOCALE=1, so that makes sense.  I
> expect the
> > > patch over here will fix it:
> > >
> http://www.postgresql.org/message-id/20150109063015.ga2491...@tornado.leadboat.com
> >
> > I just hit this same problem; are you going to commit that patch soon?
> >  It's rather annoying to have make check-world fail.
>
> Sure, done.  Dave, orangutan should now be able to pass with --enable-nls.
> Would you restore that option?
>

I can, but is this for HEAD or all versions ?


Re: [HACKERS] Error check always bypassed in tablefunc.c

2015-01-16 Thread Alvaro Herrera
Michael Paquier wrote:

> As mentioned in $subject, commit 08c33c4 of 2003 has made the
> following block of code dead in tablefunc.c:1320 because level is
> incremented to at least 1:
> /* First time through, do a little more setup */
> if (level == 0)
> {

Uh.  This means the error case has no test coverage ...

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] orangutan seizes up during isolation-check

2015-01-16 Thread Noah Misch
On Fri, Jan 16, 2015 at 05:43:44AM -0500, Dave Cramer wrote:
> On 16 January 2015 at 01:33, Noah Misch  wrote:
> > Sure, done.  Dave, orangutan should now be able to pass with --enable-nls.
> > Would you restore that option?
> 
> I can, but is this for HEAD or all versions?

All versions.




[HACKERS] Error check always bypassed in tablefunc.c

2015-01-16 Thread Michael Paquier
Hi all,

As mentioned in $subject, commit 08c33c4 of 2003 has made the
following block of code dead in tablefunc.c:1320 because level is
incremented to at least 1:
/* First time through, do a little more setup */
if (level == 0)
{
	/*
	 * Check that return tupdesc is compatible with the one we got
	 * from the query, but only at level 0 -- no need to check more
	 * than once
	 */

	if (!compatConnectbyTupleDescs(tupdesc, spi_tupdesc))
		ereport(ERROR,
				(errcode(ERRCODE_SYNTAX_ERROR),
				 errmsg("invalid return type"),
				 errdetail("Return and SQL tuple descriptions are "
						   "incompatible.")));
}

A simple fix is to change "level == 0" to "level == 1" so that the check
is actually reached, as in the attached patch. This issue was spotted by
Coverity.
Regards,
-- 
Michael
diff --git a/contrib/tablefunc/tablefunc.c b/contrib/tablefunc/tablefunc.c
index 3388fab..878255e 100644
--- a/contrib/tablefunc/tablefunc.c
+++ b/contrib/tablefunc/tablefunc.c
@@ -1317,12 +1317,12 @@ build_tuplestore_recursively(char *key_fld,
 		StringInfoData chk_current_key;
 
 		/* First time through, do a little more setup */
-		if (level == 0)
+		if (level == 1)
 		{
 			/*
 			 * Check that return tupdesc is compatible with the one we got
-			 * from the query, but only at level 0 -- no need to check more
-			 * than once
+			 * from the query, but only at the first level -- no need to check
+			 * more than once
 			 */
 
 			if (!compatConnectbyTupleDescs(tupdesc, spi_tupdesc))



Re: [HACKERS] Safe memory allocation functions

2015-01-16 Thread Alvaro Herrera
Robert Haas wrote:
> On Thu, Jan 15, 2015 at 10:57 AM, Alvaro Herrera
>  wrote:

> >> I do think that "safe" is the wrong suffix.  Maybe palloc_soft_fail()
> >> or palloc_null() or palloc_no_oom() or palloc_unsafe().
> >
> > I liked palloc_noerror() better myself FWIW.
> 
> I don't care for noerror() because it probably still will error in
> some circumstances; just not for OOM.

Yes, but that seems fine to me.  We have other functions with "noerror"
flags, and they can still fail under some circumstances -- just not if
the error is the most commonly considered scenario in which they fail.
The first example I found is LookupAggNameTypeNames(); there are many
more.  I don't think this causes any confusion in practice.

Another precedent we have is something like "missing_ok" as a flag name
in get_object_address() and other places; following that, we could have
this new function as "palloc_oom_ok" or something like that.  But it
doesn't seem an improvement to me.  (I'm pretty sure we all agree that
this must not be a flag to palloc but rather a new function.)

Of all the ones you proposed above, the one I like the most is
palloc_no_oom, but IMO palloc_noerror is still better.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Safe memory allocation functions

2015-01-16 Thread Robert Haas
On Thu, Jan 15, 2015 at 10:57 AM, Alvaro Herrera
 wrote:
>> Hmm, I understood Tom to be opposing the idea of a palloc variant that
>> returns NULL on failure, and I understand you to be supporting it.
>> But maybe I'm confused.
>
> Your understanding seems correct to me.  I was just saying that your
> description of Tom's argument to dislike the idea seemed at odds with
> what he was actually saying.

OK, that may be.  I'm not sure.

>> Anyway, I support it.  I agree that there are
>> systems (or circumstances?) where malloc is going to succeed and then
>> the world will blow up later on anyway, but I don't think that means
>> that an out-of-memory error is the only sensible response to a palloc
>> failure; returning NULL seems like a sometimes-useful alternative.
>>
>> I do think that "safe" is the wrong suffix.  Maybe palloc_soft_fail()
>> or palloc_null() or palloc_no_oom() or palloc_unsafe().
>
> I liked palloc_noerror() better myself FWIW.

I don't care for noerror() because it probably still will error in
some circumstances; just not for OOM.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-16 Thread Merlin Moncure
On Fri, Jan 16, 2015 at 8:22 AM, Andres Freund  wrote:
> Is there any chance you can package this somehow so that others can run
> it locally? It looks hard to find the actual bug here without adding
> instrumentation to postgres.

That's possible, but it involves a lot of complexity in the setup because
of the source database (SQL Server) dependency.

Thinking outside the box here, I'm going to migrate the source to
postgres.   This will rule out pl/sh, which is the only non-core
dependency, but will take some setup work on my end first.  If I can
still reproduce the error at that point, maybe we can go in this
direction, and it would make local reproduction easier anyway.

>> [cds2 21952 2015-01-15 22:54:51.833 CST 5502]WARNING:  page
>
> This was the first error? None of the 'could not find subXID' errors
> beforehand?

Yep.  No caught exceptions or anything interesting in the log before that.

> Could you add a EmitErrorReport(); before the FlushErrorState() in
> pl_exec.c's exec_stmt_block()?

will do.
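
For reference, the suggested instrumentation would look roughly like this
(paraphrased from memory of pl_exec.c's exception block, so the exact
surroundings are approximate; only the added EmitErrorReport() call matters):

	PG_CATCH();
	{
		ErrorData  *edata;

		/* log the error before the exception handler swallows it */
		EmitErrorReport();

		/* save error info and clear the error state, as before */
		edata = CopyErrorData();
		FlushErrorState();

		/* ... rest of the existing exception handling ... */
	}
	PG_END_TRY();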

merlin




Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-16 Thread Andres Freund
Hi,

On 2015-01-16 08:05:07 -0600, Merlin Moncure wrote:
> On Thu, Jan 15, 2015 at 5:10 PM, Peter Geoghegan  wrote:
> > On Thu, Jan 15, 2015 at 3:00 PM, Merlin Moncure  wrote:
> >> Running this test on another set of hardware to verify -- if this
> >> turns out to be a false alarm which it may very well be, I can only
> >> offer my apologies!  I've never had a new drive fail like that, in
> >> that manner.  I'll burn the other hardware in overnight and report
> >> back.
> 
> huh -- well possibly. not.  This is on a virtual machine attached to a
> SAN.  It ran clean for several (this is 9.4 vanilla, asserts off,
> checksums on) hours, then it started having issues:

Damn.

Is there any chance you can package this somehow so that others can run
it locally? It looks hard to find the actual bug here without adding
instrumentation to postgres.

> [cds2 21952 2015-01-15 22:54:51.833 CST 5502]WARNING:  page
> verification failed, calculated checksum 59143 but expected 59137 at
> character 20
> [cds2 21952 2015-01-15 22:54:51.852 CST 5502]QUERY:
>   DELETE FROM "onesitepmc"."propertyguestcard" t
>   WHERE EXISTS
>   (
> SELECT 1 FROM "propertyguestcard_insert" d
> WHERE (t."prptyid", t."gcardid") = (d."prptyid", d."gcardid")
>   )

> [cds2 21952 2015-01-15 22:54:51.852 CST 5502]CONTEXT:  PL/pgSQL
> function cdsreconciletable(text,text,text,text,boolean) line 197 at
> EXECUTE statement
> SQL statement "SELECT* FROM CDSReconcileTable(
>   t.CDSServer,
>   t.CDSDatabase,
>   t.SchemaName,
>   t.TableName)"
> PL/pgSQL function cdsreconcileruntable(bigint) line 35 at SQL statement


This was the first error? None of the 'could not find subXID' errors
beforehand?


> [cds2 32353 2015-01-16 04:40:57.814 CST 7549]WARNING:  did not find
> subXID 7553 in MyProc
> [cds2 32353 2015-01-16 04:40:57.814 CST 7549]CONTEXT:  PL/pgSQL
> function cdsreconcileruntable(bigint) line 35 during exception cleanup
> [cds2 32353 2015-01-16 04:40:58.018 CST 7549]WARNING:  you don't own a
> lock of type AccessShareLock
> [cds2 32353 2015-01-16 04:40:58.018 CST 7549]CONTEXT:  PL/pgSQL
> function cdsreconcileruntable(bigint) line 35 during exception cleanup
> [cds2 32353 2015-01-16 04:40:58.026 CST 7549]LOG:  could not send data
> to client: Broken pipe
> [cds2 32353 2015-01-16 04:40:58.026 CST 7549]CONTEXT:  PL/pgSQL
> function cdsreconcileruntable(bigint) line 35 during exception cleanup
> [cds2 32353 2015-01-16 04:40:58.026 CST 7549]STATEMENT:  SELECT
> CDSReconcileRunTable(1160)
> [cds2 32353 2015-01-16 04:40:58.026 CST 7549]WARNING:
> ReleaseLockIfHeld: failed??
> [cds2 32353 2015-01-16 04:40:58.026 CST 7549]CONTEXT:  PL/pgSQL
> function cdsreconcileruntable(bigint) line 35 during exception cleanup
> [cds2 32353 2015-01-16 04:40:58.026 CST 7549]WARNING:  you don't own a
> lock of type AccessShareLock
> [cds2 32353 2015-01-16 04:40:58.026 CST 7549]CONTEXT:  PL/pgSQL
> function cdsreconcileruntable(bigint) line 35 during exception cleanup
> [cds2 32353 2015-01-16 04:40:58.026 CST 7549]WARNING:
> ReleaseLockIfHeld: failed??
> [cds2 32353 2015-01-16 04:40:58.026 CST 7549]CONTEXT:  PL/pgSQL
> function cdsreconcileruntable(bigint) line 35 during exception cleanup
> [cds2 32353 2015-01-16 04:40:58.026 CST 7549]WARNING:  you don't own a
> lock of type AccessShareLock
> [cds2 32353 2015-01-16 04:40:58.026 CST 7549]CONTEXT:  PL/pgSQL
> function cdsreconcileruntable(bigint) line 35 during exception cleanup
> [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING:
> ReleaseLockIfHeld: failed??
> [cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT:  PL/pgSQL
> function cdsreconcileruntable(bigint) line 35 during exception cleanup
> [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING:  you don't own a
> lock of type AccessShareLock
> [cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT:  PL/pgSQL
> function cdsreconcileruntable(bigint) line 35 during exception cleanup
> [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING:
> ReleaseLockIfHeld: failed??
> [cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT:  PL/pgSQL
> function cdsreconcileruntable(bigint) line 35 during exception cleanup
> [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING:  you don't own a
> lock of type ShareLock
> [cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT:  PL/pgSQL
> function cdsreconcileruntable(bigint) line 35 during exception cleanup
> [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING:
> ReleaseLockIfHeld: failed??
> [cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT:  PL/pgSQL
> function cdsreconcileruntable(bigint) line 35 during exception cleanup
> [cds2 32353 2015-01-16 04:40:58.027 CST 7549]ERROR:  failed to re-find
> shared lock object
> [cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT:  PL/pgSQL
> function cdsreconcileruntable(bigint) line 35 during exception cleanup
> [cds2 32353 2015-01-16 04:40:58.027 CST 7549]STATEMENT:  SELE

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-16 Thread Heikki Linnakangas

On 01/16/2015 04:05 PM, Merlin Moncure wrote:

On Thu, Jan 15, 2015 at 5:10 PM, Peter Geoghegan  wrote:

On Thu, Jan 15, 2015 at 3:00 PM, Merlin Moncure  wrote:

Running this test on another set of hardware to verify -- if this
turns out to be a false alarm which it may very well be, I can only
offer my apologies!  I've never had a new drive fail like that, in
that manner.  I'll burn the other hardware in overnight and report
back.


huh -- well possibly. not.  This is on a virtual machine attached to a
SAN.  It ran clean for several (this is 9.4 vanilla, asserts off,
checksums on) hours, then it started having issues:

[cds2 21952 2015-01-15 22:54:51.833 CST 5502]WARNING:  page
verification failed, calculated checksum 59143 but expected 59137 at
character 20


The calculated checksum is suspiciously close to the expected one. It 
could be coincidence, but the previous checksum warning you posted was 
also quite close:



[cds2 18347 2015-01-15 15:58:29.955 CST 1779]WARNING:  page
verification failed, calculated checksum 28520 but expected 28541


I believe the checksum algorithm is supposed to mix the bits quite 
thoroughly, so that a difference in a single byte in the input will lead 
to a completely different checksum. However, we add the block number to 
the checksum last:



	/* Mix in the block number to detect transposed pages */
	checksum ^= blkno;

	/*
	 * Reduce to a uint16 (to fit in the pd_checksum field) with an offset of
	 * one. That avoids checksums of zero, which seems like a good idea.
	 */
	return (checksum % 65535) + 1;


It looks very much as if a page has for some reason been moved to a 
different block number. And that's exactly what Peter found out in his 
investigation too; an index page was mysteriously copied to a different 
block with identical content.
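
To see the effect concretely, here is a tiny standalone illustration (not the
real checksum code; the 32-bit "mixed" value is made up): identical page
contents stored at nearby block numbers yield nearby checksums, because the
block number only touches the final step.

#include <stdio.h>
#include <stdint.h>

static unsigned
finalize_checksum(uint32_t mixed, uint32_t blkno)
{
	/* Mix in the block number to detect transposed pages */
	mixed ^= blkno;

	/* Reduce to a uint16 with an offset of one, as in the real code */
	return (unsigned) ((mixed % 65535) + 1);
}

int
main(void)
{
	uint32_t	mixed = 0xDEADBEEF;	/* stands in for the fully mixed page data */
	uint32_t	blkno;

	for (blkno = 40; blkno < 44; blkno++)
		printf("block %u -> checksum %u\n",
			   (unsigned) blkno, finalize_checksum(mixed, blkno));
	return 0;
}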


- Heikki





Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-16 Thread Merlin Moncure
On Fri, Jan 16, 2015 at 8:05 AM, Merlin Moncure  wrote:
> On Thu, Jan 15, 2015 at 5:10 PM, Peter Geoghegan  wrote:
>> On Thu, Jan 15, 2015 at 3:00 PM, Merlin Moncure  wrote:
>>> Running this test on another set of hardware to verify -- if this
>>> turns out to be a false alarm which it may very well be, I can only
>>> offer my apologies!  I've never had a new drive fail like that, in
>>> that manner.  I'll burn the other hardware in overnight and report
>>> back.
>
> huh -- well possibly. not.  This is on a virtual machine attached to a
> SAN.  It ran clean for several (this is 9.4 vanilla, asserts off,
> checksums on) hours, then it started having issues:

I'm going to bisect this down...FYI.  I'll start with the major
releases first.   This is not going to be a fast process...I'm out of
pocket for the weekend and have a lot of other stuff going on.

merlin




Re: [HACKERS] PATCH: Reducing lock strength of trigger and foreign key DDL

2015-01-16 Thread Andreas Karlsson

On 01/16/2015 03:01 PM, Andres Freund wrote:

Hi,


/*
-* Grab an exclusive lock on the pk table, so that someone doesn't 
delete
-* rows out from under us. (Although a lesser lock would do for that
-* purpose, we'll need exclusive lock anyway to add triggers to the pk
-* table; trying to start with a lesser lock will just create a risk of
-* deadlock.)
+* Grab ShareRowExclusiveLock on the pk table, so that someone doesn't
+* delete rows out from under us. Note that this does not create risks
+* of deadlocks as triggers add added to the pk table using the same
+* lock.
 */


"add added" doesn't look intended. The rest of the sentence doesn't look
entirely right either.


It was intended to be "are added", but the sentence is pretty awful 
anyway. I am not sure the sentence is really necessary at all.



/*
 * Triggers must be on tables or views, and there are additional
@@ -526,8 +526,7 @@ CreateTrigger(CreateTrigStmt *stmt, const char *queryString,
 * can skip this for internally generated triggers, since the name
 * modification above should be sufficient.
 *
-* NOTE that this is cool only because we have AccessExclusiveLock on 
the
-* relation, so the trigger set won't be changing underneath us.
+* NOTE that this is cool only because of the unique contraint.


I fail to see what the unique constraint has to do with this? The
previous comment refers to the fact that the AccessExclusiveLock is what
prevents a race where another transaction concurrently adds a trigger
with the same name. Yes, the unique index would, as noted earlier in
the comment, catch the error. But that's not the point of the
check. Unless I miss something the comment is just as true if you
replace the access exclusive lock with share row exclusive, as it's also
self-conflicting.


Yeah, this must have been a leftover from the version where I only 
grabbed a ShareLock. The comment should be restored.



Hm. Pushing the snapshot is supposed to make this fully MVCC? I don't
think that's actually sufficient because of the deparsing of the WHEN
clause and of the function name.


Indeed. As Noah and I discussed previously in this thread, we would need 
to do quite a bit of refactoring of ruleutils.c to make it fully MVCC. 
For this reason I opted to only lower the lock levels of ADD and ALTER 
TRIGGER, and not DROP TRIGGER. Neither of those requires MVCC of the 
WHEN clause.


Andreas






Re: [HACKERS] Safe memory allocation functions

2015-01-16 Thread Andres Freund
On 2015-01-16 23:06:12 +0900, Michael Paquier wrote:
>  /*
> + * Wrappers for allocation functions.
> + */
> +static void *set_alloc_internal(MemoryContext context,
> + Size size, bool 
> noerror);
> +static void *set_realloc_internal(MemoryContext context, void *pointer,
> +   Size size, 
> bool noerror);
> +
> +/*
>   * These functions implement the MemoryContext API for AllocSet contexts.
>   */
>  static void *AllocSetAlloc(MemoryContext context, Size size);
> +static void *AllocSetAllocNoError(MemoryContext context, Size size);
>  static void AllocSetFree(MemoryContext context, void *pointer);
>  static void *AllocSetRealloc(MemoryContext context, void *pointer, Size 
> size);
> +static void *AllocSetReallocNoError(MemoryContext context,
> +  void *pointer, 
> Size size);
>  static void AllocSetInit(MemoryContext context);
>  static void AllocSetReset(MemoryContext context);
>  static void AllocSetDelete(MemoryContext context);
> @@ -264,8 +275,10 @@ static void AllocSetCheck(MemoryContext context);
>   */
>  static MemoryContextMethods AllocSetMethods = {
>   AllocSetAlloc,
> + AllocSetAllocNoError,
>   AllocSetFree,
>   AllocSetRealloc,
> + AllocSetReallocNoError,
>   AllocSetInit,
>   AllocSetReset,
>   AllocSetDelete,
> @@ -517,140 +530,16 @@ AllocSetContextCreate(MemoryContext parent,
>  }

Wouldn't it make more sense to change the MemoryContext API to return
NULLs in case of allocation failure and do the error checking in the
mcxt.c callers?
> +/* wrapper routines for allocation */
> +static void* palloc_internal(Size size, bool noerror);
> +static void* repalloc_internal(void *pointer, Size size, bool noerror);
> +
>  /*
>   * You should not do memory allocations within a critical section, because
>   * an out-of-memory error will be escalated to a PANIC. To enforce that
> @@ -684,8 +688,8 @@ MemoryContextAllocZeroAligned(MemoryContext context, Size 
> size)
>   return ret;
>  }
>  
> -void *
> -palloc(Size size)
> +static void*
> +palloc_internal(Size size, bool noerror)
>  {
>   /* duplicates MemoryContextAlloc to avoid increased overhead */
>   void   *ret;
> @@ -698,31 +702,85 @@ palloc(Size size)
>  
>   CurrentMemoryContext->isReset = false;
>  
> - ret = (*CurrentMemoryContext->methods->alloc) (CurrentMemoryContext, 
> size);
> + if (noerror)
> + ret = (*CurrentMemoryContext->methods->alloc_noerror)
> + (CurrentMemoryContext, size);
> + else
> + ret = (*CurrentMemoryContext->methods->alloc)
> + (CurrentMemoryContext, size);
>   VALGRIND_MEMPOOL_ALLOC(CurrentMemoryContext, ret, size);
>  
>   return ret;
>  }

I'd be rather surprised if these branches won't show up in
profiles. This is really rather hot code. At the very least this helper
function should be inlined. Also, calling the valgrind function on an
allocation failure surely isn't correct.
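
For what it's worth, the alternative shape would look roughly like this (a
simplified sketch, not the patch as posted; the real mcxt.c wrappers carry
more bookkeeping): the context method returns NULL on failure, the plain
palloc() raises the error, and the _noerror variant simply skips the ereport,
so neither a second set of methods nor a noerror branch is needed in the hot
wrapper.

void *
palloc(Size size)
{
	void	   *ret;

	ret = (*CurrentMemoryContext->methods->alloc) (CurrentMemoryContext, size);
	if (ret == NULL)
		ereport(ERROR,
				(errcode(ERRCODE_OUT_OF_MEMORY),
				 errmsg("out of memory")));
	return ret;
}

void *
palloc_noerror(Size size)
{
	/* identical, minus the ereport on allocation failure */
	return (*CurrentMemoryContext->methods->alloc) (CurrentMemoryContext, size);
}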

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Safe memory allocation functions

2015-01-16 Thread Andres Freund
On 2015-01-16 08:47:10 +0900, Michael Paquier wrote:
> On Fri, Jan 16, 2015 at 12:57 AM, Alvaro Herrera
>  wrote:
> >> I do think that "safe" is the wrong suffix.  Maybe palloc_soft_fail()
> >> or palloc_null() or palloc_no_oom() or palloc_unsafe().
> >
> > I liked palloc_noerror() better myself FWIW.
> Voting for palloc_noerror() as well.

I don't like that name. It very well can error out, e.g. because of the
allocation size, and we definitely do not want to ignore that case. How
about palloc_try()?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Safe memory allocation functions

2015-01-16 Thread Michael Paquier
On Fri, Jan 16, 2015 at 8:47 AM, Michael Paquier
 wrote:
> Voting for palloc_noerror() as well.
And here is an updated patch using this naming, added to the next CF as well.
-- 
Michael
From b636c809c2f2cb4177bedc2e5a4883a79b61fbc6 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 13 Jan 2015 15:40:38 +0900
Subject: [PATCH] Add memory allocation APIs able to return NULL instead of
 ERROR

The following functions are added to the existing set for frontend and
backend:
- palloc_noerror
- palloc0_noerror
- repalloc_noerror
---
 src/backend/utils/mmgr/aset.c| 529 +++
 src/backend/utils/mmgr/mcxt.c| 124 +
 src/common/fe_memutils.c |  72 --
 src/include/common/fe_memutils.h |   3 +
 src/include/nodes/memnodes.h |   2 +
 src/include/utils/palloc.h   |   3 +
 6 files changed, 451 insertions(+), 282 deletions(-)

diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index 85b3c9a..974e018 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -243,11 +243,22 @@ typedef struct AllocChunkData
 	((AllocPointer)(((char *)(chk)) + ALLOC_CHUNKHDRSZ))
 
 /*
+ * Wrappers for allocation functions.
+ */
+static void *set_alloc_internal(MemoryContext context,
+Size size, bool noerror);
+static void *set_realloc_internal(MemoryContext context, void *pointer,
+  Size size, bool noerror);
+
+/*
  * These functions implement the MemoryContext API for AllocSet contexts.
  */
 static void *AllocSetAlloc(MemoryContext context, Size size);
+static void *AllocSetAllocNoError(MemoryContext context, Size size);
 static void AllocSetFree(MemoryContext context, void *pointer);
 static void *AllocSetRealloc(MemoryContext context, void *pointer, Size size);
+static void *AllocSetReallocNoError(MemoryContext context,
+ void *pointer, Size size);
 static void AllocSetInit(MemoryContext context);
 static void AllocSetReset(MemoryContext context);
 static void AllocSetDelete(MemoryContext context);
@@ -264,8 +275,10 @@ static void AllocSetCheck(MemoryContext context);
  */
 static MemoryContextMethods AllocSetMethods = {
 	AllocSetAlloc,
+	AllocSetAllocNoError,
 	AllocSetFree,
 	AllocSetRealloc,
+	AllocSetReallocNoError,
 	AllocSetInit,
 	AllocSetReset,
 	AllocSetDelete,
@@ -517,140 +530,16 @@ AllocSetContextCreate(MemoryContext parent,
 }
 
 /*
- * AllocSetInit
- *		Context-type-specific initialization routine.
- *
- * This is called by MemoryContextCreate() after setting up the
- * generic MemoryContext fields and before linking the new context
- * into the context tree.  We must do whatever is needed to make the
- * new context minimally valid for deletion.  We must *not* risk
- * failure --- thus, for example, allocating more memory is not cool.
- * (AllocSetContextCreate can allocate memory when it gets control
- * back, however.)
- */
-static void
-AllocSetInit(MemoryContext context)
-{
-	/*
-	 * Since MemoryContextCreate already zeroed the context node, we don't
-	 * have to do anything here: it's already OK.
-	 */
-}
-
-/*
- * AllocSetReset
- *		Frees all memory which is allocated in the given set.
- *
- * Actually, this routine has some discretion about what to do.
- * It should mark all allocated chunks freed, but it need not necessarily
- * give back all the resources the set owns.  Our actual implementation is
- * that we hang onto any "keeper" block specified for the set.  In this way,
- * we don't thrash malloc() when a context is repeatedly reset after small
- * allocations, which is typical behavior for per-tuple contexts.
- */
-static void
-AllocSetReset(MemoryContext context)
-{
-	AllocSet	set = (AllocSet) context;
-	AllocBlock	block;
-
-	AssertArg(AllocSetIsValid(set));
-
-#ifdef MEMORY_CONTEXT_CHECKING
-	/* Check for corruption and leaks before freeing */
-	AllocSetCheck(context);
-#endif
-
-	/* Clear chunk freelists */
-	MemSetAligned(set->freelist, 0, sizeof(set->freelist));
-
-	block = set->blocks;
-
-	/* New blocks list is either empty or just the keeper block */
-	set->blocks = set->keeper;
-
-	while (block != NULL)
-	{
-		AllocBlock	next = block->next;
-
-		if (block == set->keeper)
-		{
-			/* Reset the block, but don't return it to malloc */
-			char	   *datastart = ((char *) block) + ALLOC_BLOCKHDRSZ;
-
-#ifdef CLOBBER_FREED_MEMORY
-			wipe_mem(datastart, block->freeptr - datastart);
-#else
-			/* wipe_mem() would have done this */
-			VALGRIND_MAKE_MEM_NOACCESS(datastart, block->freeptr - datastart);
-#endif
-			block->freeptr = datastart;
-			block->next = NULL;
-		}
-		else
-		{
-			/* Normal case, release the block */
-#ifdef CLOBBER_FREED_MEMORY
-			wipe_mem(block, block->freeptr - ((char *) block));
-#endif
-			free(block);
-		}
-		block = next;
-	}
-
-	/* Reset block size allocation sequence, too */
-	set->nextBlockSize = set->initBlockSize;
-}
-
-/*
- * AllocSetDelete
- *		Frees all memory which is allocated in the given set,
- *		in preparation for dele

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-16 Thread Merlin Moncure
On Thu, Jan 15, 2015 at 5:10 PM, Peter Geoghegan  wrote:
> On Thu, Jan 15, 2015 at 3:00 PM, Merlin Moncure  wrote:
>> Running this test on another set of hardware to verify -- if this
>> turns out to be a false alarm which it may very well be, I can only
>> offer my apologies!  I've never had a new drive fail like that, in
>> that manner.  I'll burn the other hardware in overnight and report
>> back.

huh -- well possibly. not.  This is on a virtual machine attached to a
SAN.  It ran clean for several (this is 9.4 vanilla, asserts off,
checksums on) hours, then it started having issues:

[cds2 21952 2015-01-15 22:54:51.833 CST 5502]WARNING:  page
verification failed, calculated checksum 59143 but expected 59137 at
character 20
[cds2 21952 2015-01-15 22:54:51.852 CST 5502]QUERY:
  DELETE FROM "onesitepmc"."propertyguestcard" t
  WHERE EXISTS
  (
SELECT 1 FROM "propertyguestcard_insert" d
WHERE (t."prptyid", t."gcardid") = (d."prptyid", d."gcardid")
  )

[cds2 21952 2015-01-15 22:54:51.852 CST 5502]CONTEXT:  PL/pgSQL
function cdsreconciletable(text,text,text,text,boolean) line 197 at
EXECUTE statement
SQL statement "SELECT* FROM CDSReconcileTable(
  t.CDSServer,
  t.CDSDatabase,
  t.SchemaName,
  t.TableName)"
PL/pgSQL function cdsreconcileruntable(bigint) line 35 at SQL statement

After that, several hours of clean running, followed by:

[cds2 32353 2015-01-16 04:40:57.814 CST 7549]WARNING:  did not find
subXID 7553 in MyProc
[cds2 32353 2015-01-16 04:40:57.814 CST 7549]CONTEXT:  PL/pgSQL
function cdsreconcileruntable(bigint) line 35 during exception cleanup
[cds2 32353 2015-01-16 04:40:58.018 CST 7549]WARNING:  you don't own a
lock of type AccessShareLock
[cds2 32353 2015-01-16 04:40:58.018 CST 7549]CONTEXT:  PL/pgSQL
function cdsreconcileruntable(bigint) line 35 during exception cleanup
[cds2 32353 2015-01-16 04:40:58.026 CST 7549]LOG:  could not send data
to client: Broken pipe
[cds2 32353 2015-01-16 04:40:58.026 CST 7549]CONTEXT:  PL/pgSQL
function cdsreconcileruntable(bigint) line 35 during exception cleanup
[cds2 32353 2015-01-16 04:40:58.026 CST 7549]STATEMENT:  SELECT
CDSReconcileRunTable(1160)
[cds2 32353 2015-01-16 04:40:58.026 CST 7549]WARNING:
ReleaseLockIfHeld: failed??
[cds2 32353 2015-01-16 04:40:58.026 CST 7549]CONTEXT:  PL/pgSQL
function cdsreconcileruntable(bigint) line 35 during exception cleanup
[cds2 32353 2015-01-16 04:40:58.026 CST 7549]WARNING:  you don't own a
lock of type AccessShareLock
[cds2 32353 2015-01-16 04:40:58.026 CST 7549]CONTEXT:  PL/pgSQL
function cdsreconcileruntable(bigint) line 35 during exception cleanup
[cds2 32353 2015-01-16 04:40:58.026 CST 7549]WARNING:
ReleaseLockIfHeld: failed??
[cds2 32353 2015-01-16 04:40:58.026 CST 7549]CONTEXT:  PL/pgSQL
function cdsreconcileruntable(bigint) line 35 during exception cleanup
[cds2 32353 2015-01-16 04:40:58.026 CST 7549]WARNING:  you don't own a
lock of type AccessShareLock
[cds2 32353 2015-01-16 04:40:58.026 CST 7549]CONTEXT:  PL/pgSQL
function cdsreconcileruntable(bigint) line 35 during exception cleanup
[cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING:
ReleaseLockIfHeld: failed??
[cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT:  PL/pgSQL
function cdsreconcileruntable(bigint) line 35 during exception cleanup
[cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING:  you don't own a
lock of type AccessShareLock
[cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT:  PL/pgSQL
function cdsreconcileruntable(bigint) line 35 during exception cleanup
[cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING:
ReleaseLockIfHeld: failed??
[cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT:  PL/pgSQL
function cdsreconcileruntable(bigint) line 35 during exception cleanup
[cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING:  you don't own a
lock of type ShareLock
[cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT:  PL/pgSQL
function cdsreconcileruntable(bigint) line 35 during exception cleanup
[cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING:
ReleaseLockIfHeld: failed??
[cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT:  PL/pgSQL
function cdsreconcileruntable(bigint) line 35 during exception cleanup
[cds2 32353 2015-01-16 04:40:58.027 CST 7549]ERROR:  failed to re-find
shared lock object
[cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT:  PL/pgSQL
function cdsreconcileruntable(bigint) line 35 during exception cleanup
[cds2 32353 2015-01-16 04:40:58.027 CST 7549]STATEMENT:  SELECT
CDSReconcileRunTable(1160)
[cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING:
AbortSubTransaction while in ABORT state
[cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING:  did not find
subXID 7553 in MyProc
[cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING:  you don't own a
lock of type RowExclusiveLock
[cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING:
ReleaseLockIfHeld: failed??
[cds2 32353 2015-01-16 04:40:58.027 

Re: [HACKERS] PATCH: Reducing lock strength of trigger and foreign key DDL

2015-01-16 Thread Andres Freund
Hi,

>   /*
> -  * Grab an exclusive lock on the pk table, so that someone doesn't 
> delete
> -  * rows out from under us. (Although a lesser lock would do for that
> -  * purpose, we'll need exclusive lock anyway to add triggers to the pk
> -  * table; trying to start with a lesser lock will just create a risk of
> -  * deadlock.)
> +  * Grab ShareRowExclusiveLock on the pk table, so that someone doesn't
> +  * delete rows out from under us. Note that this does not create risks
> +  * of deadlocks as triggers add added to the pk table using the same
> +  * lock.
>*/

"add added" doesn't look intended. The rest of the sentence doesn't look
entirely right either.

>   /*
>* Triggers must be on tables or views, and there are additional
> @@ -526,8 +526,7 @@ CreateTrigger(CreateTrigStmt *stmt, const char 
> *queryString,
>* can skip this for internally generated triggers, since the name
>* modification above should be sufficient.
>*
> -  * NOTE that this is cool only because we have AccessExclusiveLock on 
> the
> -  * relation, so the trigger set won't be changing underneath us.
> +  * NOTE that this is cool only because of the unique contraint.

I fail to see what the unique constraint has to do with this? The
previous comment refers to the fact that the AccessExclusiveLock is what
prevents a race where another transaction concurrently adds a trigger
with the same name. Yes, the unique index would, as noted earlier in
the comment, catch the error. But that's not the point of the
check. Unless I miss something the comment is just as true if you
replace the access exclusive lock with share row exclusive, as it's also
self-conflicting.

> @@ -1272,8 +1271,7 @@ renametrig(RenameStmt *stmt)
>* on tgrelid/tgname would complain anyway) and to ensure a trigger does
>* exist with oldname.
>*
> -  * NOTE that this is cool only because we have AccessExclusiveLock on 
> the
> -  * relation, so the trigger set won't be changing underneath us.
> +  * NOTE that this is cool only because there is a unique constraint.
>*/

Same as above.

>   tgrel = heap_open(TriggerRelationId, RowExclusiveLock);
>  
> diff --git a/src/backend/utils/adt/ruleutils.c 
> b/src/backend/utils/adt/ruleutils.c
> index dd748ac..8eeccf2 100644
> --- a/src/backend/utils/adt/ruleutils.c
> +++ b/src/backend/utils/adt/ruleutils.c
> @@ -699,7 +699,8 @@ pg_get_triggerdef_worker(Oid trigid, bool pretty)
>   HeapTuple   ht_trig;
>   Form_pg_trigger trigrec;
>   StringInfoData buf;
> - Relationtgrel;
> + Snapshotsnapshot = RegisterSnapshot(GetTransactionSnapshot());
> + Relationtgrel = heap_open(TriggerRelationId, AccessShareLock);
>   ScanKeyData skey[1];
>   SysScanDesc tgscan;
>   int findx = 0;
> @@ -710,18 +711,18 @@ pg_get_triggerdef_worker(Oid trigid, bool pretty)
>   /*
>* Fetch the pg_trigger tuple by the Oid of the trigger
>*/
> - tgrel = heap_open(TriggerRelationId, AccessShareLock);
> -
>   ScanKeyInit(&skey[0],
>   ObjectIdAttributeNumber,
>   BTEqualStrategyNumber, F_OIDEQ,
>   ObjectIdGetDatum(trigid));
>  
>   tgscan = systable_beginscan(tgrel, TriggerOidIndexId, true,
> - NULL, 1, skey);
> + snapshot, 1, 
> skey);
>  
>   ht_trig = systable_getnext(tgscan);
>  
> + UnregisterSnapshot(snapshot);
> +
>   if (!HeapTupleIsValid(ht_trig))
>   elog(ERROR, "could not find tuple for trigger %u", trigid);
>

Hm. Pushing the snapshot is supposed to make this fully MVCC? I don't
think that's actually sufficient because of the deparsing of the WHEN
clause and of the function name.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Bug in pg_dump

2015-01-16 Thread Gilles Darold
On 16/01/2015 01:06, Jim Nasby wrote:
> On 1/15/15 5:26 AM, Gilles Darold wrote:
>> Hello,
>>
>> There's a long-pending issue with pg_dump and extensions that have
>> member tables with foreign keys. This was previously reported in this
>> thread
>> http://www.postgresql.org/message-id/ca+tgmoyvzkadmgh_8el7uvm472geru0b4pnnfjqye6ss1k9...@mail.gmail.com
>> and discussed by Robert. All PostgreSQL users that use the PostGIS
>> extension postgis_topology are facing the issue because the two
>> member tables (topology and layer) are linked by foreign keys.
>>
>> If you dump a database with this extension and try to import it you
>> will experience this error:
>>
>> pg_restore: [archiver (db)] Error while PROCESSING TOC:
>> pg_restore: [archiver (db)] Error from TOC entry 3345; 0
>> 157059176 TABLE DATA layer gilles
>> pg_restore: [archiver (db)] COPY failed for table "layer": ERROR:
>> insert or update on table "layer" violates foreign key constraint
>> "layer_topology_id_fkey"
>> DETAIL:  Key (topology_id)=(1) is not present in table "topology".
>> WARNING: errors ignored on restore: 1
>>
>>
>> The problem is that, whatever export type you choose (plain/custom
>> and full-export/data-only), the data of tables "topology" and "layer"
>> is always exported in alphabetical order. I think this is a bug
>> because outside of extensions, in a data-only export, pg_dump is able
>> to resolve foreign key dependencies and dump table data in the right
>> order, but not for an extension's member tables. The default is
>> alphabetical order, but that should not be the case for an extension's
>> members because the constraints are recreated by the CREATE EXTENSION
>> command. I hope I am clear enough.
>>
>> Here we have three solutions:
>>
>>  1/ Inform developers of extensions to take care to alphabetical
>> order when they have member tables using foreign keys.
>>  2/ Inform DBAs that they have to restore the failing table
>> independently. The use case above can be resumed using the following
>> command:
>>
>>   pg_restore -h localhost -n topology -t layer -Fc -d
>> testdb_empty testdump.dump
>>
>>  3/ Inform DBAs that they have to restore the schema first then
>> the data only using --disable-triggers
>
> I don't like 1-3, and I doubt anyone else does...
>
>>  4/ Patch pg_dump to solve this issue.
>
> 5. Disable FK's during load.
> This is really a bigger item than just extensions. It would have the
> nice benefit of doing a wholesale FK validation instead of firing
> per-row triggers, but it would leave the database in a weird state if
> a restore failed...

I think this is another problem. Here we just need to apply the same
logic to an extension's member tables as to normal tables. I guess
this is what this patch tries to solve.

>
>> I attach a patch that solves the issue in pg_dump, let me know if it
>> might be included in Commit Fest or if the three other solutions are
>> a better choice. I also join a sample extension (test_fk_in_ext) to
>> be able to reproduce the issue and test the patch. Note that it might
>> exists a simpler solution than the one I used in this patch, if this
>> is the case please point me on the right way, I will be pleased to
>> rewrite and send an other patch.
>
> The only problem I see with this approach is circular FK's:
>
> decibel@decina.local=# create table a(a_id serial primary key, b_id int);
> CREATE TABLE
> decibel@decina.local=# create table b(b_id serial primary key, a_id
> int references a);
> CREATE TABLE
> decibel@decina.local=# alter table a add foreign key(b_id) references b;
> ALTER TABLE
> decibel@decina.local=#
>
> That's esoteric enough that I think it's OK not to directly support
> them, but pg_dump shouldn't puke on them (and really should throw a
> warning). Though it looks like it doesn't handle that in the data-only
> case anyway...

The patch takes care of circular references, and you will be warned if
pg_dump finds one among the extension members. That was not the case before.
If you try to dump a database with the postgis extension you will be
warned about the FK defined on the edge_data table.
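
For readers unfamiliar with the mechanism: ordering the member tables for a
data dump is essentially a topological sort over the FK edges, and a cycle is
exactly the case that can only be reported with a warning. A minimal
standalone sketch (not the patch's code; the table names and dependency
matrix here are made up):

#include <stdio.h>

#define NTABLES 3

int
main(void)
{
	const char *name[NTABLES] = {"layer", "topology", "other"};
	/* dep[i][j] != 0 means table i references table j and must be dumped later */
	int			dep[NTABLES][NTABLES] = {{0, 1, 0}, {0, 0, 0}, {0, 0, 0}};
	int			done[NTABLES] = {0, 0, 0};
	int			emitted = 0;
	int			progress = 1;

	while (progress)
	{
		progress = 0;
		for (int i = 0; i < NTABLES; i++)
		{
			int			blocked = 0;

			if (done[i])
				continue;
			for (int j = 0; j < NTABLES; j++)
				if (dep[i][j] && !done[j])
					blocked = 1;
			if (!blocked)
			{
				printf("dump data of %s\n", name[i]);
				done[i] = 1;
				emitted++;
				progress = 1;
			}
		}
	}
	if (emitted < NTABLES)
		printf("WARNING: circular foreign keys, keeping the default order\n");
	return 0;
}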


-- 
Gilles Darold
Consultant PostgreSQL
http://dalibo.com - http://dalibo.org






Re: [HACKERS] Turning recovery.conf into GUCs

2015-01-16 Thread Andres Freund
On 2015-01-16 21:50:16 +0900, Michael Paquier wrote:
> On Fri, Jan 16, 2015 at 9:45 PM, Andres Freund  wrote:
> > On 2015-01-16 21:43:43 +0900, Michael Paquier wrote:
> >> On Wed, Jan 7, 2015 at 6:22 AM, Peter Eisentraut  wrote:
> >> > I have written similar logic, and while it's not pleasant, it's doable.
> >> >  This issue would really only go away if you don't use a file to signal
> >> > recovery at all, which you have argued for, but which is really a
> >> > separate and more difficult problem.
> >> Moving this patch to the next CF and marking it as returned with
> >> feedback for current CF as there is visibly no consensus reached.
> >
> > I don't think that's a good idea. If we defer this another couple months
> > we'll *never* reach anything coming close to consensus.

> What makes you think that the situation could suddenly move in one
> direction rather than another?

That we have to fix this.

I see absolutely no advantage in declaring the discussion closed for
now. That doesn't exactly increase the chance of this ever succeeding.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] parallel mode and parallel contexts

2015-01-16 Thread Andres Freund
On 2015-01-05 11:27:57 -0500, Robert Haas wrote:
> On Sat, Jan 3, 2015 at 7:31 PM, Andres Freund  wrote:
> > * I wonder if parallel contexts shouldn't be tracked via resowners
> 
> That is a good question.  I confess that I'm somewhat fuzzy about
> which things should be tracked via the resowner mechanism vs. which
> things should have their own bespoke bookkeeping.  However, the
> AtEOXact_Parallel() stuff happens well before ResourceOwnerRelease(),
> which makes merging them seem not particularly clean.

I'm not sure either. But I think the current location is wrong anyway -
calling AtEOXact_Parallel() before user-defined queries are run via
AfterTriggerFireDeferred() seems wrong.

> > * combocid.c should error out in parallel mode, as insurance
> 
> Eh, which function?  HeapTupleHeaderGetCmin(),
> HeapTupleHeaderGetCmax(), and AtEOXact_ComboCid() are intended to work
> in parallel mode. HeapTupleHeaderAdjustCmax() could
> Assert(!IsInParallelMode()) but the only calls are in heap_update()
> and heap_delete() which already have error checks, so putting yet
> another elog() there seems like overkill.

To me it seems like a good idea, but whatever.

> > * I doubt it's a good idea to allow heap_insert, heap_inplace_update,
> >   index_insert. I'm not convinced that those will be handled correct and
> >   relaxing restrictions is easier than adding them.
> 
> I'm fine with adding checks to heap_insert() and
> heap_inplace_update().  I'm not sure we really need to add anything to
> index_insert(); how are we going to get there without hitting some
> other prohibited operation first?

I'm not sure. But it's not that hard to imagine that somebody will start
adding codepaths that insert into indexes first. Think upsert.

> > * I'd much rather separate HandleParallelMessageInterrupt() in one
> >   variant that just tells the machinery there's a interrupt (called from
> >   signal handlers) and one that actually handles them. We shouldn't even
> >   consider adding any new code that does allocations, errors and such in
> >   signal handlers. That's just a *bad* idea.
> 
> That's a nice idea, but I didn't invent the way this crap works today.
> ImmediateInterruptOK gets set to true while performing a lock wait,
> and we need to be able to respond to errors while in that state.  I
> think there's got to be a better way to handle that than force every
> asynchronous operation to have to cope with the fact that
> ImmediateInterruptOK may be set or not set, but as of today that's
> what you have to do.

I personally think it's absolutely unacceptable to add any more of
that. That the current mess works is more luck than anything else - and
I'm pretty sure there are several bugs in it. But since I realize I can't
force you to develop an alternative solution, I tried to implement enough
infrastructure to deal with it without too much work.

As far as I can see this could relatively easily be implemented on top of
the removal of ImmediateInterruptOK in the patch series I posted
yesterday? AFAICS you just need to remove most of
HandleParallelMessageInterrupt() and replace it with a single SetLatch().

> > * I'm not a fan of the shm_toc stuff... Verbose and hard to read. I'd
> >   much rather place a struct there and be careful about not using
> >   pointers. That also would obliviate the need to reserve some ids.
> 
> I don't see how that's going to work with variable-size data
> structures, and a bunch of the things that we need to serialize are
> variable-size.

Meh. Just appending the variable data to the end of the structure and
calculating offsets works just fine.
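
For illustration, something of this shape is what's meant (names invented
here, not taken from the patch; offsets rather than pointers because the
block lives in shared memory):

typedef struct ParallelFixedHeader
{
	Size		total_size;		/* size of the whole shared block */
	Size		guc_state_off;	/* offset of serialized GUC state */
	Size		guc_state_len;
	Size		snapshot_off;	/* offset of serialized snapshot */
	Size		snapshot_len;
	/* variable-length data follows the struct */
} ParallelFixedHeader;

#define PARALLEL_FIXED_PTR(hdr, off) \
	(((char *) (hdr)) + (off))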

> Moreover, I'm not really interested in rehashing this
> design again.  I know it's not your favorite; you've said that before.

Well, if I keep having to read code using it, I'll keep being annoyed by
it. I guess we both have to live with that.


Greetings,

Andres Freund




Re: [HACKERS] Turning recovery.conf into GUCs

2015-01-16 Thread Michael Paquier
On Fri, Jan 16, 2015 at 9:45 PM, Andres Freund  wrote:
> On 2015-01-16 21:43:43 +0900, Michael Paquier wrote:
>> On Wed, Jan 7, 2015 at 6:22 AM, Peter Eisentraut  wrote:
>> > I have written similar logic, and while it's not pleasant, it's doable.
>> >  This issue would really only go away if you don't use a file to signal
>> > recovery at all, which you have argued for, but which is really a
>> > separate and more difficult problem.
>> Moving this patch to the next CF and marking it as returned with
>> feedback for current CF as there is visibly no consensus reached.
>
> I don't think that's a good idea. If we defer this another couple months
> we'll *never* reach anything coming close to consensus.
What makes you think that the situation could suddenly move in one
direction rather than another?
(FWIW, my vote goes to the all GUC approach with standby.enabled.)
-- 
Michael




Re: [HACKERS] Turning recovery.conf into GUCs

2015-01-16 Thread Andres Freund
On 2015-01-16 21:43:43 +0900, Michael Paquier wrote:
> On Wed, Jan 7, 2015 at 6:22 AM, Peter Eisentraut  wrote:
> > I have written similar logic, and while it's not pleasant, it's doable.
> >  This issue would really only go away if you don't use a file to signal
> > recovery at all, which you have argued for, but which is really a
> > separate and more difficult problem.
> Moving this patch to the next CF and marking it as returned with
> feedback for current CF as there is visibly no consensus reached.

I don't think that's a good idea. If we defer this another couple months
we'll *never* reach anything coming close to consensus.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Turning recovery.conf into GUCs

2015-01-16 Thread Michael Paquier
On Wed, Jan 7, 2015 at 6:22 AM, Peter Eisentraut  wrote:
> I have written similar logic, and while it's not pleasant, it's doable.
>  This issue would really only go away if you don't use a file to signal
> recovery at all, which you have argued for, but which is really a
> separate and more difficult problem.
Moving this patch to the next CF and marking it as returned with
feedback for current CF as there is visibly no consensus reached.
-- 
Michael




Re: [HACKERS] 9.5: Better memory accounting, towards memory-bounded HashAgg

2015-01-16 Thread Michael Paquier
On Thu, Jan 8, 2015 at 4:07 AM, Tomas Vondra
 wrote:
> which is IMHO obviously wrong, because that accounts only for the
> hashtable itself.
> In those cases the patch actually does no memory accounting and as
> hashcontext has no child contexts, there's no accounting overhead.
> [blah]

> So the performance drops 2x. With more groups, the performance impact is
> even worse.
Hmm. It seems that this patch is not there yet, so I am marking it as
returned with feedback, especially knowing that the patch is incorrect
in its use of hashcontext, as mentioned by Tomas.
-- 
Michael




Re: [HACKERS] segmentation fault in execTuples.c#ExecStoreVirtualTuple

2015-01-16 Thread Manuel Kniep
On 16. Januar 2015 at 00:57:14, Tom Lane (t...@sss.pgh.pa.us) wrote:
> I wrote:
> > Manuel Kniep writes:
> >> OK, after lots of testing I could create a test case,
> >> which can be found here
> >> https://gist.github.com/rapimo/3c8c1b35270e5854c524
> >> it's written in Ruby and depends on the gems activerecord, pg and parallel
>  
> > Hm. I don't see a segfault from this. I do see the CREATE TEMP TABLE
> > command failing with "ctid is NULL", which probably shouldn't be happening
> > ... but no segfault.
>  
> The reason turns out to be that this is a dangling-pointer bug, and I was
> using a memory-clobber-enabled build so it was pretty predictable what the
> pointer would be pointing at. I've got no doubt that hard-to-reproduce
> misbehavior, including segfaults, would ensue without CLOBBER_FREED_MEMORY
> turned on.
>  
> You need this patch:
> http://git.postgresql.org/gitweb/?p=postgresql.git;a=patch;h=34668c8eca065d745bf1166a92c9efc588e7aee2
>   

Thanks a lot.
We applied the patch to our servers and everything looks great so far.

cheers Manuel




Re: [HACKERS] Escaping from blocked send() reprised.

2015-01-16 Thread Andres Freund
Hi Heikki,

On 2014-09-02 21:22:29 +0300, Heikki Linnakangas wrote:
> On 08/28/2014 03:47 PM, Kyotaro HORIGUCHI wrote:
> >   To make the code mentioned above (Patch 0002) tidy, rewrite the
> >   socket emulation code for win32 backends so that each socket
> >   can have its own non-blocking state. (patch 0001)
> 
> The first patch that makes non-blocking sockets behave more sanely on
> Windows seems like a good idea, independently of the second patch. I'm
> looking at the first patch now, I'll make a separate post about the second
> patch.

> On Windows, the backend has an emulation layer for POSIX signals, which uses
> threads and Windows events. The reason win32/socket.c always uses
> non-blocking mode internally is that it needs to wait for the socket to
> become readable/writeable, and for the signal-emulation event, at the same
> time. So no, we can't remove it.
> 
> The approach taken in the first patch seems sensible. I changed it to not
> use FD_SET, though. A custom array seems better, that way we don't need the
> pgwin32_nonblockset_init() call, we can just initialize the variable.
> It's a little bit more code, but it's well-contained in win32/socket.c.
> Please take a look, to double-check that I didn't screw up.

Heikki, what's your plan about this patch? Do you plan to commit it?

Greetings,

Andres Freund




Re: [HACKERS] More Norwegian trouble

2015-01-16 Thread Heikki Linnakangas

On 01/16/2015 09:13 AM, Noah Misch wrote:

On Thu, Jan 08, 2015 at 04:37:37PM +0200, Heikki Linnakangas wrote:

setlocale(LC_COLLATE, NULL) -> "Norwegian (Bokmål)_Norway"

but:

setlocale(LC_COLLATE, "norwegian-bokmal_Norway") -> "Norwegian_Norway")



Apparently the behavior changed when I upgraded the toolchain. IIRC, I used
to use "Microsoft Windows SDK 7.1", with "Microsoft Visual C++ Compilers
2010 Standard Edition" that came with it. I'm now using "Microsoft Visual
Studio Community Edition 2013 Update 4", with "Microsoft Visual C++
Compilers 2010 SP Standard". I don't know what part of the upgrade broke
this. Could also have been something else; I don't keep track of my build
environment that carefully.


MSVCR110 (Visual Studio 2012) locale handling departed significantly from that
of its predecessors; see comments at IsoLocaleName().


Now, what should we do about this? I'd like to know if others are seeing
this, with whatever compiler versions you are using.


VS2012 x64 behaves roughly as you describe:

setlocale(LC_COLLATE, NULL)                        -> "Norwegian (Bokmål)_Norway"
setlocale(LC_COLLATE, "norwegian-bokmal_Norway")   -> "Norwegian_Norway.1252"
setlocale(LC_COLLATE, "Norwegian_Norway")          -> "Norwegian_Norway.1252"
setlocale(LC_COLLATE, "Norwegian (Bokmål)_Norway") -> "Norwegian (Bokmål)_Norway"

I see the traditional behavior with 64-bit MinGW-w64 (MSVCRT):

setlocale(LC_COLLATE, NULL)  -> "Norwegian (Bokmål)_Norway"
setlocale(LC_COLLATE, "norwegian-bokmal_Norway") -> "Norwegian (Bokmål)_Norway"
setlocale(LC_COLLATE, "Norwegian_Norway")-> "Norwegian (Bokmål)_Norway"


In particular, I wonder
if the builds included in the EnterpriseDB installers are experiencing this.


I strongly suspect those builds use VS2012 for some of the newer branches, so
they will be affected.


Perhaps the nicest fix would be to change the mapping code to map the
problematic locale name to "Norwegian_Norway" instead of "norwegian-bokmal".
That's assuming that it is in fact the same locale, and that it's accepted
on all supported Windows versions.


I bet it is always accepted and always refers to the same locale.  IIRC,
interpretation of these names falls entirely within the CRT.  Windows system
libraries have no concept of these naming schemes.


OK, thanks for checking. I've committed a fix along those lines, mapping 
"Norwegian (Bokmål)_Norway" to "Norwegian_Norway". The 
"norwegian-bokmal" alias isn't used for anything anymore.


- Heikki




[HACKERS] proposal: row_to_array function

2015-01-16 Thread Pavel Stehule
Hi

I am returning to the topic of processing records in plpgsql.

I think a record can be processed simply via a transformation to an array.

We already have similar functions - hstore(row), row_to_json, ... - but going
through these functions can be a needless extra step. Any row variable can be
transformed to a 2D text array.

There are two possible transformations:

row_to_array --> [[key1, value1],[key2, value2], ...]
row_to_row_array --> [(key1, value1), (key2, value2), ... ]

Both transformations can be simply implemented.

Comments, notices?

Regards

Pavel


[HACKERS] proposal: searching in array function - array_position

2015-01-16 Thread Pavel Stehule
Hi

I am proposing a simple function that returns the position of an
element in an array.

FUNCTION array_position(anyarray, anyelement) RETURNS int

A sample implementation in plpgsql:

CREATE OR REPLACE FUNCTION array_position(anyarray, anyelement)
RETURNS int AS $$
DECLARE i int;
BEGIN
  FOR i IN array_lower($1, 1) .. array_upper($1, 1)
  LOOP
    IF $1[i] = $2 THEN
      RETURN i;
    END IF;
  END LOOP;
  RETURN NULL;
END;
$$ LANGUAGE plpgsql IMMUTABLE STRICT;

Possible benefits:

1. speed in plpgsql applications
2. reduced length of SQL functions

Ideas, comments, notices?

Regards

Pavel


[HACKERS] hamerkop is stuck

2015-01-16 Thread Noah Misch
Buildfarm member hamerkop stopped reporting in after commit f6dc6dd.  After
commit 8d9cb0b, it resumed reporting in for 9.3 and earlier branches.  It is
still silent for HEAD and REL9_4_STABLE.  It may have stale processes or stale
lock files requiring manual cleanup.  Would you check it?

Thanks,
nm




Re: [HACKERS] Async execution of postgres_fdw.

2015-01-16 Thread Kyotaro HORIGUCHI
I revised the patch so that async scans are done more aggressively, and
measured execution time for two very simple cases.

As a result, a simple seq scan gained about 5%, and a hash join of two
foreign tables gained about 150% (2.4 times faster).

While measuring the performance, I noticed that in many cases, such as hash
joins or sorted joins, each scan in a query runs to completion on its own
rather than alternating with the others. So I modified the patch so that
async fetches are done more aggressively. The new v4 patch is attached; the
following numbers are based on it.


The first test is a simple seq scan.

> CREATE TABLE lt1 (a int, b timestamp, c text);
> CREATE SERVER sv1 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'localhost');
> CREATE USER MAPPING FOR PUBLIC SERVER sv1;
> CREATE FOREIGN TABLE ft1 (a int, b timestamp, c text) SERVER sv1 OPTIONS (table_name 'lt1');
> INSERT INTO lt1 (SELECT a, now(), repeat('x', 128) FROM generate_series(0, 99) a);

For this case, I took the average execution time of 10 runs of the following
query, for both the master head and the patched version. The fetch size is
100.

> postgres=# EXPLAIN (ANALYZE ON, COSTS OFF) SELECT * FROM ft1;
>                             QUERY PLAN
> ---------------------------------------------------------------------
>  Foreign Scan on ft1  (actual time=0.795..4175.706 rows=100 loops=1)
>  Planning time: 0.060 ms
>  Execution time: 4276.043 ms

  master head  : avg = 4256.621,  std dev = 17.099
  patched pgfdw: avg = 4036.463,  std dev =  2.608

The patched version is faster by about 5%. This should be purely the result
of asynchronous fetching, not including the effect of starting remote
execution early in ExecInit.

Interestingly, as fetch_count gets larger, the gain grows even though fewer
queries are sent.

  master head  : avg = 2622.759,  std dev = 38.379
  patched pgfdw: avg = 2277.622,  std dev = 27.269

About 15% gain. And for 1,

  master head  : avg = 2000.980,  std dev =  6.434
  patched pgfdw: avg = 1616.793,  std dev = 13.192

A 19% gain. It is natural that execution time decreases as the fetch size
increases, but I haven't found the reason why the patch's gain also
increases.

==

The second case is a simple join of two foreign tables sharing
one connection.

The master head runs this query in about 16 seconds with almost
no fluctuation among multiple tries.

> =# EXPLAIN (ANALYZE ON, COSTS OFF) SELECT x.a, x.c, y.c
>    FROM ft1 AS x JOIN ft1 AS y on x.a = y.a;
>                             QUERY PLAN
> ---------------------------------------------------------------------------
>  Hash Join (actual time=7541.831..15924.631 rows=100 loops=1)
>    Hash Cond: (x.a = y.a)
>    ->  Foreign Scan on ft1 x (actual time=1.176..6553.480 rows=100 loops=1)
>    ->  Hash (actual time=7539.761..7539.761 rows=100 loops=1)
>          Buckets: 32768  Batches: 64  Memory Usage: 2829kB
>          ->  Foreign Scan on ft1 y (actual time=1.067..6529.165 rows=100 loops=1)
>  Planning time: 0.223 ms
>  Execution time: 15973.916 ms

But the v4 patch mysteriously accelerates this query, to about 6.5 seconds.

> =# EXPLAIN (ANALYZE ON, COSTS OFF) SELECT x.a, x.c, y.c
>    FROM ft1 AS x JOIN ft1 AS y on x.a = y.a;
>                             QUERY PLAN
> ---------------------------------------------------------------------------
>  Hash Join (actual time=2556.977..5812.937 rows=100 loops=1)
>    Hash Cond: (x.a = y.a)
>    ->  Foreign Scan on ft1 x (actual time=32.689..1936.565 rows=100 loops=1)
>    ->  Hash (actual time=2523.810..2523.810 rows=100 loops=1)
>          Buckets: 32768  Batches: 64  Memory Usage: 2829kB
>          ->  Foreign Scan on ft1 y (actual time=50.345..1928.411 rows=100 loops=1)
>  Planning time: 0.220 ms
>  Execution time: 6512.043 ms

The result data does not seem to be broken. I don't know the reason yet, but
I'll investigate.


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From edba0530fb6a9c5a4e6def055757d6d60bce9171 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Tue, 13 Jan 2015 19:20:35 +0900
Subject: [PATCH] Asynchronous execution of postgres_fdw v4

This is the modified version of Asynchronous execution of
postgres_fdw.

- Do async fetch more aggressively than v3.

- No additional tests yet :(
---
 contrib/postgres_fdw/Makefile   |   2 +-
 contrib/postgres_fdw/PgFdwConn.c| 200 +
 contrib/postgres_fdw/PgFdwConn.h|  61 
 contrib/postgres_fdw/connection.c   |  82 ++-
 contrib/postgres_fdw/postgres_fdw.c | 283 +++-
 contrib/postgres_fdw/postgres_fdw.h |  15 +-
 6 files changed, 527 insertions(+), 116 deletions(-)
 create mode 100644 contrib/postgres_fdw/PgFdwConn.c
 create mode 100644 contrib/postgres_fdw/PgFdwConn.h

diff --git a/contrib/postgres_fdw/Makefile b/contrib/postgres_fdw/Makefile
index d2b98e1..d0913e2 100644
--- a/contrib/postgres_fdw/Makefile
+++ b/contrib

Re: [HACKERS] VODKA?

2015-01-16 Thread Oleg Bartunov
On Thu, Jan 8, 2015 at 11:20 PM, Jim Nasby  wrote:

> On 1/7/15, 3:26 PM, Arthur Silva wrote:
>
>>
>> On Jan 6, 2015 7:14 PM, "Josh Berkus" <j...@agliodbs.com> wrote:
>>  >
>>  > Oleg, Teodor:
>>  >
>>  > I take it VODKA is sliding to version 9.6?
>>
>> This is kinda off, but I was wondering if anyone ever considered running
>> a crowd-funding campaign for this sort of feature (aka potentially very
>> popular).
>>
>
> I don't know if Teodor or Oleg are in a position to accept funding, but it
> is an interesting idea. Perhaps it would be useful to try this with a
> different feature.
>

We are on our way to forming a Postgres-centric company, and we'll continue
our development there. We certainly don't need funding for VODKA, our national
product :)  I thought about crowd-funding, but didn't find any good precedents
for products similar to Postgres. So we decided to go the traditional way -
build our own company and get money from industry. In the short term this
delays our development, since there is a lot of non-development business to
handle, but in the long term we'll have more Russian contributors to the
Postgres project.  Postgres has long been a popular database in Russia, but
now it is drawing much more interest from business and government, so I'm
quite optimistic.

For many years I have wanted to see a well-visible place where Postgres
developers (companies or individuals) could share their plans and interests,
with time/money estimates. Then companies or individuals could communicate
directly with those developers.


-- 
> Jim Nasby, Data Architect, Blue Treble Consulting
> Data in Trouble? Get it in Treble! http://BlueTreble.com
>