Re: [HACKERS] snapshot too old, configured by time

2016-05-02 Thread Bruce Momjian
On Mon, May  2, 2016 at 03:50:36PM -0500, Kevin Grittner wrote:
> >> Also, it seems we have similar behavior already in applying WAL on the
> >> standby --- we delay WAL replay when there is a long-running
> >> transaction.  Once the time expires, we apply the WAL.  Do we cancel the
> >> long-running transaction at that time, or wait for the long-running
> >> transaction to touch some WAL we just applied?  If the former, does
> >> Kevin's new code allow us to do the latter?
> >
> > Is this a TODO item?
> 
> I'm not aware of any TODO items existing or needed here.  The
> feature operates by adjusting the xmin used by vacuum and pruning,
> and leaving all the other mechanisms functioning as they were.
> That looked to me like it should interact with replication streams
> correctly.  If someone sees something that needs adjustment please
> speak up Real Soon Now.

My question is whether this method could also be used to avoid read-only
query cancellation when we force replay of a conflicting WAL record.  Could we
wait for the read-only query to try to _access_ some old data before
cancelling it?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +




Re: [HACKERS] [BUGS] Breakage with VACUUM ANALYSE + partitions

2016-05-02 Thread Fabien COELHO


Hello,

>> I'm unsure about switching enum to #define, could be an enum still with
>> explicit values set, something like:

> An enum doesn't have a benefit for a bitmask imo - you can't "legally"
> use it as a type for functions accepting the bitmask.

I do not understand.  I suggested using an enum to enumerate the bitmask
constants; ISTM that this does not preclude using it as a bitmask as you
do, it is just a replacement for the #defines.  The type constraint on the
enum does not disallow bitmasked values; I checked with both gcc & clang.
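
For illustration, a minimal standalone C sketch of the point under
discussion (names invented, not taken from md.c): the enum supplies the
bit constants, while the parameter receiving OR'd combinations stays a
plain int, since the combined value need not be one of the enumerators.

#include <stdio.h>

/* Illustrative only -- not the actual md.c flag names. */
typedef enum
{
	DEMO_RETURN_NULL  = 1 << 0,
	DEMO_CREATE       = 1 << 1,
	DEMO_OPEN_DELETED = 1 << 2
} demo_behavior;

/*
 * The parameter is an int, not demo_behavior: the OR of two enumerators
 * (here 3) is not itself an enumerator, which is the objection to using
 * the enum as the parameter type.
 */
static void
open_segment(int flags)
{
	if (flags & DEMO_RETURN_NULL)
		printf("may return NULL\n");
	if (flags & DEMO_CREATE)
		printf("may create segments\n");
}

int
main(void)
{
	/* the constants themselves can still live in an enum */
	open_segment(DEMO_RETURN_NULL | DEMO_CREATE);
	return 0;
}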



>> I'm fuzzy about the _OPEN_DELETED part because it is an oxymoron. Is it
>> RECREATE really?

> No. The relevant explanation is at the top of the file:


[...]


> *   -- Optionally, any number of inactive segments of size 0 blocks.
> *   Inactive segments are those that once contained data but are currently
> *   not needed because of an mdtruncate() operation.  The reason for leaving
> *   them present at size zero, rather than unlinking them, is that other
> *   backends and/or the checkpointer might be holding open file references to
> *   such segments.  If the relation expands again after mdtruncate(), such
> *   that a deactivated segment becomes active again, it is important that
> *   such file references still be valid --- else data might get written
> *   out to an unlinked old copy of a segment file that will eventually
> *   disappear.


Ok.

Then should it be _OPEN_INACTIVE[TED] or _OPEN_TRUNCATED rather than 
_OPEN_DELETED, which is contradictory?


--
Fabien.




Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-05-02 Thread Bruce Momjian
On Mon, May  2, 2016 at 07:21:21AM -0700, Andres Freund wrote:
> On 2016-05-02 09:03:19 -0400, Robert Haas wrote:
> > On Fri, Apr 29, 2016 at 6:08 PM, Kevin Grittner  wrote:
> > > Now to continue with the performance benchmarks.  I'm pretty sure
> > > we've fixed the problems when the feature is disabled
> > > (old_snapshot_threshold = -1), and there are several suggestions
> > > for improving performance while it is on that need to be compared
> > > and benchmarked.
> > 
> > If anyone thinks that the issue with the feature disabled is NOT
> > fixed, please speak up!  I'm moving the corresponding open item to
> > CLOSE_WAIT status, meaning that it will be closed if nobody shows up
> > to say that there is still an issue.
> 
> Well, I don't agree that the feature is in a releasable state. The
> data structure is pretty much non-scalable, and maintained on the wrong
> side (every read, instead of once in writing xacts). There's no
> proposal actually addressing the scalability issues.

I also strongly question whether we should revert this feature and try
again in 9.7.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +




[HACKERS] Re: [COMMITTERS] pgsql: Fix assorted inconsistencies in GIN opclass support function dec

2016-05-02 Thread Robert Haas
On Mon, May 2, 2016 at 5:15 PM, Andres Freund  wrote:
> On 2016-05-02 17:12:41 -0400, Robert Haas wrote:
>> On Mon, May 2, 2016 at 5:09 PM, Tom Lane  wrote:
>> > Alexander Korotkov  writes:
>> > But this is likely moot anyway, because of the need to bump all the
>> > contrib modules' versions in order to install parallel-safety labels on
>> > their functions.  (I wonder why that isn't on the open-items list.)
>>
>> Because it was argued by Noah that this was 9.7 work.
>
> Thinking about this again, I think that's not a good approach: It'll
> mean we'll have to do very similar testing in 9.7 again. I'd rather label
> as much as possible in one release.

Yeah, I somewhat agree, but there's also no patch for this, yet.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] pgindent fixups

2016-05-02 Thread Robert Haas
I spent some time going through the output of a trial pgindent run
today.  Some questions/problems:

1. Is pgindent supposed to touch DATA() lines?  Because it does.

2. CustomPathMethods is not in the buildfarm's typedefs.list.  Why not?

I'm attaching a patch that fixes up a few other problems that I found,
which I'll commit if there are not objections.  In detail:

- In contrib/pageinspect/heapfuncs.c, it separates the declaration of
bits_len from the initialization to avoid an awkward line-wrap.

- In src/backend/executor/execParallel.c, it dodges two cases where
pgindent does stupid things with offsetof.  Apparently, pgindent
thinks that you should write "offsetof(a, b) +c" rather than
"offsetof(a, b) + c".  In one case, I talked it out of it by putting
the + at the end of the first line rather than the start of the
continuation line.  The other statement was all on one line so I
changed it to say "c + offsetof(a, b)" instead.

- In nodeAgg.c, to_ts_any.c, and tsvector_op.c, I moved end-of-line
comments to their own separate lines, because they were getting broken
up into multiple lines in ways that seemed awkward.  In tsginidx.c, I
left a similar comment inline but fiddled the whitespace and comment
text to avoid getting a line break in mid-comment.

- In spell.c, I added markers around a comment to prevent
pgindent from messing with the whitespace (entab still adjusts it, but
that should look the same if you have your tab stops set right).
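
Returning to the offsetof item above: a hedged sketch of the two shapes
described (hypothetical struct, not the actual execParallel.c code).

#include <stddef.h>

typedef struct Demo
{
	int			nitems;
	int			data[1];	/* variable-length in spirit */
} Demo;

static size_t
demo_size(int nitems)
{
	/* '+' at the end of the line keeps pgindent from producing "+c" */
	size_t		a = offsetof(Demo, data) +
		nitems * sizeof(int);

	/* alternatively, put the offsetof() term last */
	size_t		b = nitems * sizeof(int) + offsetof(Demo, data);

	return (a == b) ? a : 0;	/* both compute the same value */
}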

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgindent-cleanup.patch



Re: [HACKERS] About subxact and xact nesting level...

2016-05-02 Thread dandl
> ow...@postgresql.org] On Behalf Of Tom Lane
> > Are there specific requirements or things to do/avoid in order to use
> > subtransactions (at the PL API level)?
> 
> There isn't, currently, any very hard-and-fast rule about what APIs
> extensions should use or not use.  My advice is to borrow freely from
> existing PLs, particularly pl/pgsql.  You might have to change your code
in
> future PG major versions, but that could happen anyway.

I guess you're right. It's not that big, and most of the interesting stuff
seems to be in just a couple of files. And after all, that should be the
gold standard for a PL!

Regards
David M Bennett FACS

Andl - A New Database Language - andl.org






Re: [HACKERS] About subxact and xact nesting level...

2016-05-02 Thread Tom Lane
"dandl"  writes:
> Are there specific requirements or things to do/avoid in order to use
> subtransactions (at the PL API level)?

There isn't, currently, any very hard-and-fast rule about what APIs
extensions should use or not use.  My advice is to borrow freely
from existing PLs, particularly pl/pgsql.  You might have to change
your code in future PG major versions, but that could happen anyway.

regards, tom lane




Re: [HACKERS] Fix for OpenSSL error queue bug

2016-05-02 Thread Tom Lane
Peter Eisentraut  writes:
> On 04/27/2016 11:04 PM, Tom Lane wrote:
>> Actually, git_changelog can merge identically-messaged commits despite
>> intervening commits.  It's set up to not merge commits more than 24 hours
>> apart, though.  We could loosen that requirement but I'm afraid it would
>> make things confusing to merge far-apart commits.

> ISTM that git_changelog should be looking at the AuthorDate instead of
> the CommitDate.  Then it would work correctly for backpatches done using
> cherry-pick.

Meh.  That would also make it noticeably *less* accurate for some other
scenarios.  It actually did look at AuthorDate to start with, and we
changed it because we didn't like the results; cf 7299778a9.

Also, IME, backpatching with cherry-pick fails often enough that designing
one's process around it is just asking for pain.  Certainly, many fewer
than half of my own back-patches manage to use cherry-pick.

regards, tom lane




Re: [HACKERS] Fix for OpenSSL error queue bug

2016-05-02 Thread Peter Eisentraut
On 04/27/2016 11:04 PM, Tom Lane wrote:
>> There is no value in providing exact message matches when the backpatch
>> occurs even after one other commit in the master branch.
> 
> Actually, git_changelog can merge identically-messaged commits despite
> intervening commits.  It's set up to not merge commits more than 24 hours
> apart, though.  We could loosen that requirement but I'm afraid it would
> make things confusing to merge far-apart commits.

ISTM that git_changelog should be looking at the AuthorDate instead of
the CommitDate.  Then it would work correctly for backpatches done using
cherry-pick.





Re: [HACKERS] About subxact and xact nesting level...

2016-05-02 Thread dandl
> From: Tom Lane [mailto:t...@sss.pgh.pa.us]
> >> The file xact.c contains references to sub-transactions (subxact) and
> >> transaction nesting level, but no obvious documentation about what
> >> these correspond to in SQL.
> 
> > Subtransactions are used to implement SAVEPOINT, and also BEGIN blocks
> > with EXCEPTION clauses in plpgsql.
> 
> Yeah.  The implementation is based on nested subtransactions, and that
> concept also applies pretty directly to, eg, BEGIN/EXCEPT blocks in
> plpgsql.  But what's exposed to SQL is SAVEPOINT/RELEASE SAVEPOINT/
> ROLLBACK TO SAVEPOINT, because those operations are what the standard
> specifies.  If you hold your head at the correct angle you can see those
> as nested subtransactions, but it's not exactly obvious --- mainly
> because RELEASE and ROLLBACK can exit multiple levels of nested
> subtransaction in one command.

Yes, that answers the question.

What now concerns me is that access to this capability seems to require
calling three 'internal' functions, for which it's hard to determine
the prerequisites. The SPI interface is well-documented and the conversion
functions I'm using are not stateful and look pretty safe. I've got the
process model figured out well enough, but the transaction model is not so
easy. State looks important; so does memory management.

Are there specific requirements or things to do/avoid in order to use
subtransactions (at the PL API level)?

Regards
David M Bennett FACS

Andl - A New Database Language - andl.org


Re: [HACKERS] Reviewing freeze map code

2016-05-02 Thread Andres Freund
Hi,

some of the review items here are mere matters of style/preference. Feel
entirely free to discard them, but I thought if I'm going through the
change anyway...


On 2016-05-02 14:48:18 -0700, Andres Freund wrote:
> a892234 Change the format of the VM fork to add a second bit per page.

TL;DR: fairly minor stuff.


+ * heap_tuple_needs_eventual_freeze
+ *
+ * Check to see whether any of the XID fields of a tuple (xmin, xmax, xvac)
+ * will eventually require freezing.  Similar to heap_tuple_needs_freeze,
+ * but there's no cutoff, since we're trying to figure out whether freezing
+ * will ever be needed, not whether it's needed now.
+ */
+bool
+heap_tuple_needs_eventual_freeze(HeapTupleHeader tuple)

Wouldn't redefining this to heap_tuple_is_frozen() and then inverting the
checks be easier to understand?


+   /*
+* If xmax is a valid xact or multixact, this tuple is also not frozen.
+*/
+   if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
+   {
+   MultiXactId multi;
+
+   multi = HeapTupleHeaderGetRawXmax(tuple);
+   if (MultiXactIdIsValid(multi))
+   return true;
+   }

Hm. What's the test inside the if() for? There shouldn't be any case
where xmax is invalid if HEAP_XMAX_IS_MULTI is set.   Now there's a
check like that outside of this commit, but it seems strange to me
(Alvaro, perhaps you could comment on this?).


+ *
+ * Clearing both visibility map bits is not separately WAL-logged.  The callers
  * must make sure that whenever a bit is cleared, the bit is cleared on WAL
  * replay of the updating operation as well.

I think including "both" here makes things less clear, because it
differentiates clearing one bit from clearing both. There's no practical
difference atm, but still.

  *
  * VACUUM will normally skip pages for which the visibility map bit is set;
  * such pages can't contain any dead tuples and therefore don't need vacuuming.
- * The visibility map is not used for anti-wraparound vacuums, because
- * an anti-wraparound vacuum needs to freeze tuples and observe the latest xid
- * present in the table, even on pages that don't have any dead tuples.
  *

I think the remaining sentence isn't entirely accurate: there's now more
than one bit, and they're different with regard to scan_all/!scan_all
vacuums (or will be - maybe this is updated further in a later commit? But
if so, that sentence shouldn't yet be removed...).


-
-/* Number of heap blocks we can represent in one byte. */
-#define HEAPBLOCKS_PER_BYTE 8
-

Hm, why was this moved to the header? Sounds like something the outside
shouldn't care about.


#define HEAPBLK_TO_MAPBIT(x) (((x) % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK)

Hm. This isn't really a mapping to an individual bit anymore - but I
don't really have a better name in mind. Maybe TO_OFFSET?


+static const uint8 number_of_ones_for_visible[256] = {
...
+};
+static const uint8 number_of_ones_for_frozen[256] = {
...
 };

Did somebody verify the new contents are correct?
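
For what it's worth, tables like these can be checked mechanically by
regenerating them from the bit definitions and diffing against the
committed arrays.  A sketch, assuming two bits per heap block with the
all-visible bit in the low position of each pair (verify against the
actual VISIBILITYMAP_* defines before trusting the output):

#include <stdio.h>

#define BITS_PER_HEAPBLOCK	2	/* assumed */
#define HEAPBLOCKS_PER_BYTE	4	/* assumed */

int
main(void)
{
	for (int byte = 0; byte < 256; byte++)
	{
		int			visible = 0;
		int			frozen = 0;

		for (int i = 0; i < HEAPBLOCKS_PER_BYTE; i++)
		{
			int			bits = (byte >> (i * BITS_PER_HEAPBLOCK)) & 0x03;

			visible += (bits & 0x01) != 0;	/* assumed all-visible bit */
			frozen += (bits & 0x02) != 0;	/* assumed all-frozen bit */
		}
		printf("%3d: ones_for_visible=%d ones_for_frozen=%d\n",
			   byte, visible, frozen);
	}
	return 0;
}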


/*
- * visibilitymap_clear - clear a bit in visibility map
+ * visibilitymap_clear - clear all bits in visibility map
  *

This seems rather easy to misunderstand, as this really only clears all
the bits for one page, not actually all the bits.



  * the bit for heapBlk, or InvalidBuffer. The caller is responsible for
- * releasing *buf after it's done testing and setting bits.
+ * releasing *buf after it's done testing and setting bits, and must pass flags
+ * for which it needs to check the value in visibility map.
  *
  * NOTE: This function is typically called without a lock on the heap page,
  * so somebody else could change the bit just after we look at it.  In fact,
@@ -327,17 +351,16 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,

I'm not seeing what flags the above comment change is referring to?


 	/*
-	 * A single-bit read is atomic.  There could be memory-ordering effects
+	 * A single byte read is atomic.  There could be memory-ordering effects
 	 * here, but for performance reasons we make it the caller's job to worry
 	 * about that.
 	 */
-	result = (map[mapByte] & (1 << mapBit)) ? true : false;
-
-	return result;
+	return ((map[mapByte] >> mapBit) & VISIBILITYMAP_VALID_BITS);
 }

Not a new issue, and *very* likely to be irrelevant in practice (given
the value is only referenced once): But there's really no guarantee
map[mapByte] is only read once here.
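
If that ever needed nailing down, reading the byte through a
volatile-qualified lvalue forces exactly one load.  A minimal sketch
(illustrative, not the committed code; the VISIBILITYMAP_VALID_BITS value
is assumed):

#include <stddef.h>
#include <stdint.h>

#define VISIBILITYMAP_VALID_BITS 0x03	/* assumed value */

static uint8_t
vm_get_status_read_once(const uint8_t *map, size_t mapByte, int mapOffset)
{
	/* the volatile access pins this to a single load from memory */
	uint8_t		mapbyte = *((volatile const uint8_t *) &map[mapByte]);

	return (mapbyte >> mapOffset) & VISIBILITYMAP_VALID_BITS;
}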


-BlockNumber
-visibilitymap_count(Relation rel)
+void
+visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen)

Not really a new issue again: The parameter types (previously return
type) to this function seem wrong to me.



@@ -1934,5 +1992,14 @@ heap_page_is_all_visible(Relation rel, Buffer buf, TransactionId *visibility_cut
 	}
+	/*
+	 * We don't bother clearing *all_frozen when the page is discovered

Re: [HACKERS] Is pg_control file crashsafe?

2016-05-02 Thread Andres Freund
Hi,

On 2016-04-28 21:58:00 +0000, Alex Ignatov wrote:
> We have some issue with truncated pg_control file on Windows after
> power failure.  My question is: 1) Is pg_control protected from, say, a
> power crash or partial write?

It should be.  I think to make progress on this thread we're going to
need a bit more detail about the exact corruption.  Did the length of
the file change?  Did the checksum fail?  Did you just observe too-old
contents?

Greetings,

Andres Freund




Re: [HACKERS] Is pg_control file crashsafe?

2016-05-02 Thread Tom Lane
Alex Ignatov  writes:
> I think that rename can help a little bit.  At least on some filesystems
> it is an atomic operation.

Writing a single sector ought to be atomic too.  I'm very skeptical that
it'll be an improvement to just move the risk from one filesystem
operation to another; especially not to one where there's not even a
terribly portable way to request fsync.

regards, tom lane




Re: [HACKERS] [BUGS] Breakage with VACUUM ANALYSE + partitions

2016-05-02 Thread Andres Freund
Hi,

On 2016-05-03 00:05:35 +0200, Fabien COELHO wrote:
> Maybe consider checking for the exclusivity explicitly?

I thought about it, and decided it's not worth it.  Requiring one of
those to be specified seems stringent enough.

> I'm unsure about switching enum to #define, could be an enum still with
> explicit values set, something like:
> 
>   enum {
> EXTENSION_RETURN_NULL = (1 << 0),
> ...
>   } extension_behavior;

An enum doesn't have a benefit for a bitmask imo - you can't "legally"
use it as a type for functions accepting the bitmask.

> I'm fuzzy about the _OPEN_DELETED part because it is an oxymoron. Is it
> RECREATE really?

No. The relevant explanation is at the top of the file:
 *	On disk, a relation must consist of consecutively numbered segment
 *	files in the pattern
 *		-- Zero or more full segments of exactly RELSEG_SIZE blocks each
 *		-- Exactly one partial segment of size 0 <= size < RELSEG_SIZE blocks
 *		-- Optionally, any number of inactive segments of size 0 blocks.
 *	The full and partial segments are collectively the "active" segments.
 *	Inactive segments are those that once contained data but are currently
 *	not needed because of an mdtruncate() operation.  The reason for leaving
 *	them present at size zero, rather than unlinking them, is that other
 *	backends and/or the checkpointer might be holding open file references to
 *	such segments.  If the relation expands again after mdtruncate(), such
 *	that a deactivated segment becomes active again, it is important that
 *	such file references still be valid --- else data might get written
 *	out to an unlinked old copy of a segment file that will eventually
 *	disappear.

- Andres




Re: [HACKERS] [BUGS] Breakage with VACUUM ANALYSE + partitions

2016-05-02 Thread Fabien COELHO


Hello Andres,


>>> I'm not sure this is the best way to go about this.  I can see valid
>>> arguments for *always* using _mdfd_openseg() in mdsync(); and I'm
>>> wondering whether we shouldn't make EXTENSION_* into a bitmask
>>> (extend,extend_recovery,return_null,open_deleted).

>> I thought about that when I looked at the previous fix, but it seemed that
>> not all combinations made sense.

> Sure, but that's nothing unusual.  Here's an attempt at doing so - not
> fully polished, just as a discussion point. I think it looks better.
> Fabien, Robert, what do you think?


My 0,02€.

Not tested, just a few comments on the patch from someone who does not
understand this API in depth... Nevertheless:


I agree that it looks better than "EXTENSION_REALLY_RETURNS_NULL", which
I did not like much.


There are 3 possible behaviors on extension, but coding them as bits does
not make their exclusivity clear, and mixing plain numbers & bits does not
seem advisable either.


Maybe consider checking for the exclusivity explicitly?

  EXTENSION_BEHAVIORS = (EXTENSION_RETURN_NULL | ..._FAIL | ..._CREATE);

And then the Assert can check for the exclusivity:

  int behavior = option & EXTENSION_BEHAVIORS;
  Assert( (behavior == EXTENSION_RETURN_NULL) ||
  (behavior == ..._FAIL) ||
  (behavior == ..._CREATE));
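
Spelled out as compilable C, a sketch assuming the flag names from the
patch, with the standard assert() standing in for PostgreSQL's Assert:

#include <assert.h>

#define EXTENSION_FAIL			(1 << 0)
#define EXTENSION_RETURN_NULL	(1 << 1)
#define EXTENSION_CREATE		(1 << 2)

#define EXTENSION_BEHAVIORS \
	(EXTENSION_FAIL | EXTENSION_RETURN_NULL | EXTENSION_CREATE)

static void
check_exclusive_behavior(int flags)
{
	int			behavior = flags & EXTENSION_BEHAVIORS;

	/* exactly one of the mutually exclusive behaviors may be set */
	assert(behavior == EXTENSION_FAIL ||
		   behavior == EXTENSION_RETURN_NULL ||
		   behavior == EXTENSION_CREATE);
}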

I'm unsure about switching enum to #define, could be an enum still with 
explicit values set, something like:


  enum {
EXTENSION_RETURN_NULL = (1 << 0),
...
  } extension_behavior;

I'm fuzzy about the _OPEN_DELETED part because it is an oxymoron. Is it 
RECREATE really?


--
Fabien.


Re: [HACKERS] Naming of new tsvector functions

2016-05-02 Thread David Fetter
On Mon, May 02, 2016 at 01:58:11PM -0400, Tom Lane wrote:
> I wrote:
> > I think we'd be better off to rename these to tsvector_delete()
> > and tsvector_filter() while we still can.
> 
> ... although I now notice that hstore already exposes a function
> named delete(), so that ship may have sailed already.  But I'm more
> troubled by filter() anyhow, since that keyword can appear in
> expressions --- it seems much more likely that that would pose a
> parsing conflict after future SQL extensions.

I suspect that steering that ship would be a good idea starting with
deprecation of the old name in 9.6, etc.  hs_filter(), perhaps?

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] Is pg_control file crashsafe?

2016-05-02 Thread Alex Ignatov



On 01.05.2016 0:55, Bruce Momjian wrote:

> On Thu, Apr 28, 2016 at 09:58:00PM +0000, Alex Ignatov wrote:
>> Hello everyone!
>> We have some issue with truncated pg_control file on Windows after power
>> failure.
>> My questions are:
>> 1) Is pg_control protected from, say, a power crash or partial write?
>> 2) How does PG update pg_control?  By writing into it, or by writing into
>> some temp file and renaming it to pg_control to be atomic?
>
> We write pg_controldata in one write() OS call:
>
>     if (write(fd, buffer, PG_CONTROL_SIZE) != PG_CONTROL_SIZE)
>
>> 3) Can PG have multiple pg_control copies to be more fault tolerant?
>>
>> PS During some experiments we found that at present there is no method to
>> do crash recovery with a "restored" version of pg_control (based on some
>> manipulations with pg_resetxlog).  Only by using pg_resetxlog and setting
>> its parameters to values taken from the WAL file (pg_xlogdump) could we at
>> least start PG and see that the PG state was as of the last checkpoint.
>> But we have no real confidence that PG is in a consistent state (the docs
>> on pg_resetxlog warn about this too).
>
> We have talked about improving the reliability of pg_control, but
> failures are so rare we have never done anything to improve it.  I know
> Tatsuo has talked about making pg_control more reliable, so I am CC'ing
> him.


Oh! Good. Thank you!
It is rare, but as we saw it is now our reality too.  One of our customers
hit this issue just last week =)


I think that rename can help a little bit.  At least on some filesystems
it is an atomic operation.
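
For concreteness, a minimal POSIX sketch of the write-temp-then-rename
pattern being suggested (illustrative only -- this is not what PostgreSQL
does for pg_control, and as noted elsewhere in the thread the rename
approach has its own portability problems around fsync):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static int
atomic_replace(const char *path, const void *buf, size_t len)
{
	char		tmp[1024];
	int			fd;

	snprintf(tmp, sizeof(tmp), "%s.tmp", path);
	fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;
	if (write(fd, buf, len) != (ssize_t) len || fsync(fd) != 0)
	{
		close(fd);
		return -1;
	}
	if (close(fd) != 0)
		return -1;
	/* rename() atomically replaces the target on POSIX filesystems */
	return rename(tmp, path) == 0 ? 0 : -1;
}

A durable version would also fsync the containing directory after the
rename, which is the part with no terribly portable interface.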


--
Alex Ignatov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company





[HACKERS] Reviewing freeze map code

2016-05-02 Thread Andres Freund
Hi,

The freeze map changes, besides being very important, seem to be one of
the patches with a high risk profile in 9.6.  Robert had asked whether
I'd take a look.  I thought it'd be a good idea to review that while
running tests for
http://www.postgresql.org/message-id/CAMkU=1w85Dqt766AUrCnyqCXfZ+rsk1witAc_=v5+pce93s...@mail.gmail.com

For starters, I'm just going through the commits. It seems the relevant
pieces are:

a892234 Change the format of the VM fork to add a second bit per page.
77a1d1e Department of second thoughts: remove PD_ALL_FROZEN.
fd31cd2 Don't vacuum all-frozen pages.
7087166 pg_upgrade: Convert old visibility map format to new format.
ba0a198 Add pg_visibility contrib module.

did I miss anything important?

Greetings,

Andres Freund




Re: [HACKERS] pg_upgrade and toasted pg_largeobject

2016-05-02 Thread Tom Lane
Alvaro Herrera  writes:
> Robert Haas wrote:
>> On Mon, May 2, 2016 at 12:30 PM, Alvaro Herrera
>>  wrote:
>>> A customer of ours was unable to pg_upgrade a database, with this error:
>>>   old and new databases "postgres" have a mismatched number of relations
>>> After some research, it turned out that pg_largeobject had acquired a
>>> toast table.

>> I think that if you use -O, and it breaks something, you get to keep
>> both pieces.

> I'm happy with the solution that pg_upgrade has a step in the check
> stage that says "catalog XYZ has a toast table but shouldn't, aborting
> the upgrade".  (Well, not _happy_, but at least it's a lot easier to
> diagnose).

I agree with Robert that this is not something we should expend a huge
amount of effort on, but I also agree that that error message is
inadequate if the situation is not a nobody-could-ever-hit-this case.

I think though that you're defining the problem too narrowly by putting
forward a solution that would result in an error message like that.
If we're going to do anything here at all, I think the code should emit
a list of the names of relations that it was unable to match up, rather
than trying (and likely failing) to be smart about why.  To take just
one reason why, the issue might be that there were too many rels in
the new installation, not too few.

The matching probably does need to be smart about toast tables to the
extent of regarding them as "toast table belonging to relation FOO",
since just printing their actual names would be unhelpful in most cases.

regards, tom lane




Re: [HACKERS] old_snapshot_threshold's interaction with hash index

2016-05-02 Thread Kevin Grittner
On Sun, May 1, 2016 at 1:43 AM, Amit Kapila  wrote:
> On Sun, May 1, 2016 at 12:05 PM, Amit Kapila  wrote:
>>
>> Currently we do the test for old snapshot (TestForOldSnapshot) for hash
>> indexes while scanning them.  Does this test make any sense for hash
>> indexes considering LSN on hash index will always be zero (as hash indexes
>> are not WAL-logged)?  It seems to me that PageLSN check in
>> TestForOldSnapshot() will always return false which means that the error
>> "snapshot too old" won't be generated for hash indexes.
>>
>> Am I missing something here?  If not, then I think we need a way to
>> prohibit pruning for hash indexes based on old_snapshot_threshold?
>
> What I mean to say here is: prohibit pruning a relation which has a hash
> index, based on old_snapshot_threshold.

Good spot; added to the open issues page.

Thanks!

-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] snapshot too old, configured by time

2016-05-02 Thread Kevin Grittner
On Sun, May 1, 2016 at 11:54 PM, Bruce Momjian  wrote:
> On Sat, Apr 23, 2016 at 10:20:19AM -0400, Bruce Momjian wrote:
>> On Sat, Apr 23, 2016 at 12:48:08PM +0530, Amit Kapila wrote:
>>> On Sat, Apr 23, 2016 at 8:34 AM, Bruce Momjian  wrote:

>>>> I kind of agreed with Tom about just aborting transactions that held
>>>> snapshots for too long, and liked the idea this could be set per
>>>> session, but the idea that we abort only if a backend actually touches
>>>> the old data is very nice.  I can see why the patch author worked hard
>>>> to do that.
>
> As I understand it, a transaction trying to access a shared buffer
> aborts if there was a cleanup on the page that removed rows it might be
> interested in.  How does this handle cases where vacuum removes _pages_
> from the table?

(1)  When the "snapshot too old" feature is enabled
(old_snapshot_threshold >= 0) relations are not truncated, so pages
cannot be removed that way.  This mainly protects against a seq
scan having the problem you describe.

(2)  Other than a seq scan, you could miss a page when descending
through an index or following sibling pointers within an index.  In
either case you can't remove a page without modifying the page
pointing to it to no longer do so, so the modified LSN on the
parent or sibling will trigger the error.

Note that a question has recently been raised regarding hash
indexes (which should perhaps be generalized to any non-WAL-logged
index on a permanent table).  Since that is a correctness issue,
versus a performance issue affecting only how many people will find
the feature useful, I will add it to the release blockers list and
prioritize it ahead of the issues that merely affect the feature's
usefulness.

> Does vacuum avoid this when there are running transactions?

I'm not sure I understand the question.

>> Also, it seems we have similar behavior already in applying WAL on the
>> standby --- we delay WAL replay when there is a long-running
>> transaction.  Once the time expires, we apply the WAL.  Do we cancel the
>> long-running transaction at that time, or wait for the long-running
>> transaction to touch some WAL we just applied?  If the former, does
>> Kevin's new code allow us to do the latter?
>
> Is this a TODO item?

I'm not aware of any TODO items existing or needed here.  The
feature operates by adjusting the xmin used by vacuum and pruning,
and leaving all the other mechanisms functioning as they were.
That looked to me like it should interact with replication streams
correctly.  If someone sees something that needs adjustment please
speak up Real Soon Now.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [BUGS] Breakage with VACUUM ANALYSE + partitions

2016-05-02 Thread Andres Freund
On 2016-05-02 22:00:14 +0200, Fabien COELHO wrote:
> I'm wondering that if _mdfd_openseg may return NULL, then ISTM that
> "opened_directly" should logically be false, because it was not opened?
> 
> If so, maybe consider something like:
> 
> 	opened_directly = (seg != NULL);

Hm, don't care either way. Seems just as valid to understand it as the
attempt to directly open the segment.


> Also, I do not understand why this issue is raised by the flushing patch, it
> seems rather independent.

It's not the flushing itself, it's 72a98a639574d2e25ed94652848555900c81a799


> >I'm not sure this is the best way to go about this.  I can see valid
> >arguments for *always* using _mdfd_openseg() in mdsync(); and I'm
> >wondering whether we shouldn't make EXTENSION_* into a bitmask
> >(extend,extend_recovery,return_null,open_deleted).
> 
> I thought about that when I looked at the previous fix, but it seemed that
> not all combinations made sense.

Sure, but that's nothing unusual.  Here's an attempt at doing so - not
fully polished, just as a discussion point. I think it looks better.
Fabien, Robert, what do you think?

Greetings,

Andres Freund
diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index 2981b41..f0c557f 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -163,23 +163,27 @@ static CycleCtr mdsync_cycle_ctr = 0;
 static CycleCtr mdckpt_cycle_ctr = 0;
 
 
-typedef enum	/* behavior for mdopen & _mdfd_getseg */
-{
-	/* ereport if segment not present, create in recovery */
-	EXTENSION_FAIL,
-	/* return NULL if not present, create in recovery */
-	EXTENSION_RETURN_NULL,
-	/* return NULL if not present */
-	EXTENSION_REALLY_RETURN_NULL,
-	/* create new segments as needed */
-	EXTENSION_CREATE
-} ExtensionBehavior;
+/*** behavior for mdopen & _mdfd_getseg ***/
+/* ereport if segment not present */
+#define EXTENSION_FAIL				(1 << 0)
+/* return NULL if segment not present */
+#define EXTENSION_RETURN_NULL		(1 << 1)
+/* create new segments as needed */
+#define EXTENSION_CREATE			(1 << 2)
+/* create new segments if needed during recovery */
+#define EXTENSION_CREATE_RECOVERY	(1 << 3)
+/*
+ * Allow opening deleted segments. Note that this breaks mdnblocks() and
+ * related functionality - which currently is ok, because this is only
+ * required in the checkpointer which never uses mdnblocks().
+ */
+#define EXTENSION_OPEN_DELETED		(1 << 4)
+
 
 /* local routines */
 static void mdunlinkfork(RelFileNodeBackend rnode, ForkNumber forkNum,
 			 bool isRedo);
-static MdfdVec *mdopen(SMgrRelation reln, ForkNumber forknum,
-	   ExtensionBehavior behavior);
+static MdfdVec *mdopen(SMgrRelation reln, ForkNumber forknum, int behavior);
 static void register_dirty_segment(SMgrRelation reln, ForkNumber forknum,
 	   MdfdVec *seg);
 static void register_unlink(RelFileNodeBackend rnode);
@@ -189,7 +193,7 @@ static char *_mdfd_segpath(SMgrRelation reln, ForkNumber forknum,
 static MdfdVec *_mdfd_openseg(SMgrRelation reln, ForkNumber forkno,
 			  BlockNumber segno, int oflags);
 static MdfdVec *_mdfd_getseg(SMgrRelation reln, ForkNumber forkno,
-			 BlockNumber blkno, bool skipFsync, ExtensionBehavior behavior);
+			 BlockNumber blkno, bool skipFsync, int behavior);
 static BlockNumber _mdnblocks(SMgrRelation reln, ForkNumber forknum,
 		   MdfdVec *seg);
 
@@ -570,7 +574,7 @@ mdextend(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
  * invent one out of whole cloth.
  */
 static MdfdVec *
-mdopen(SMgrRelation reln, ForkNumber forknum, ExtensionBehavior behavior)
+mdopen(SMgrRelation reln, ForkNumber forknum, int behavior)
 {
	MdfdVec    *mdfd;
 	char	   *path;
@@ -596,8 +600,7 @@ mdopen(SMgrRelation reln, ForkNumber forknum, ExtensionBehavior behavior)
 			fd = PathNameOpenFile(path, O_RDWR | O_CREAT | O_EXCL | PG_BINARY, 0600);
 		if (fd < 0)
 		{
-			if ((behavior == EXTENSION_RETURN_NULL ||
-				 behavior == EXTENSION_REALLY_RETURN_NULL) &&
+			if ((behavior & EXTENSION_RETURN_NULL) &&
 FILE_POSSIBLY_DELETED(errno))
 			{
 pfree(path);
@@ -691,7 +694,7 @@ mdwriteback(SMgrRelation reln, ForkNumber forknum,
 	segnum_end;
 
 		v = _mdfd_getseg(reln, forknum, blocknum, false,
-		 EXTENSION_REALLY_RETURN_NULL);
+		 EXTENSION_RETURN_NULL);
 
 		/*
 		 * We might be flushing buffers of already removed relations, that's
@@ -737,7 +740,8 @@ mdread(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
 		reln->smgr_rnode.node.relNode,
 		reln->smgr_rnode.backend);
 
-	v = _mdfd_getseg(reln, forknum, blocknum, false, EXTENSION_FAIL);
+	v = _mdfd_getseg(reln, forknum, blocknum, false,
+	 EXTENSION_FAIL | EXTENSION_CREATE_RECOVERY);
 
 	seekpos = (off_t) BLCKSZ *(blocknum % ((BlockNumber) RELSEG_SIZE));
 
@@ -812,7 +816,8 @@ mdwrite(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
 		 reln->smgr_rnode.node.relNode,
 		 reln->smgr_rnode.backend);
 
-	v = _mdfd_getseg(reln, forknum, 

[HACKERS] Pg_stop_backup process does not run - Backup Intervals

2016-05-02 Thread Rodrigo Cavalcante
Hi,

On alternate days my backup fails: pg_stop_backup() does not run or never
returns.

PostgreSQL version: 9.1.6

The backup script is as follows:

PGDATA=/dados
SAVE_BASE_DIR=/backup/diario
backup="'bkpfull'"
data=$(date +'%d%m%y')

WAL_DIR=/backup/archive

mv $WAL_DIR/* $WAL_DIR/wal_anterior

psql -U postgres -h localhost \
  -c 'select pg_start_backup('$backup',true);' template1 && \
tar -czvf $SAVE_BASE_DIR/$data.tar.gz $PGDATA && \
psql -U postgres -h localhost -c 'select pg_stop_backup();' template1







Re: [HACKERS] pg_upgrade and toasted pg_largeobject

2016-05-02 Thread Alvaro Herrera
Robert Haas wrote:
> On Mon, May 2, 2016 at 12:30 PM, Alvaro Herrera
>  wrote:
> > A customer of ours was unable to pg_upgrade a database, with this error:
> >
> >   old and new databases "postgres" have a mismatched number of relations
> >   Failure, exiting
> >
> > After some research, it turned out that pg_largeobject had acquired a
> > toast table.  After some more research, we determined that it was
> > because right after initdb of the old database (months or years prior)
> > they moved pg_largeobject to another, slower tablespace, because for
> > their case it is very bulky and not used as much as the other data.
> > (This requires restarting postmaster with the -O parameter).

> I think that if you use -O, and it breaks something, you get to keep
> both pieces.  pg_largeobject is a big problem, and we should replace
> it with something better.  And maybe in the meantime we should support
> moving it to a different tablespace.  But if it's not officially
> supported and you do it anyway, I don't think it's pg_upgrade's job to
> cope.

I'm happy with the solution that pg_upgrade has a step in the check
stage that says "catalog XYZ has a toast table but shouldn't, aborting
the upgrade".  (Well, not _happy_, but at least it's a lot easier to
diagnose).

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] pg_upgrade and toasted pg_largeobject

2016-05-02 Thread Robert Haas
On Mon, May 2, 2016 at 12:30 PM, Alvaro Herrera
 wrote:
> A customer of ours was unable to pg_upgrade a database, with this error:
>
>   old and new databases "postgres" have a mismatched number of relations
>   Failure, exiting
>
> After some research, it turned out that pg_largeobject had acquired a
> toast table.  After some more research, we determined that it was
> because right after initdb of the old database (months or years prior)
> they moved pg_largeobject to another, slower tablespace, because for
> their case it is very bulky and not used as much as the other data.
> (This requires restarting postmaster with the -O parameter).
>
> While I understand that manual system catalog modifications are frowned
> upon, it seems to me that we should handle this better.  The failure is
> very confusing and hard to diagnose, and hard to fix also.

I think that if you use -O, and it breaks something, you get to keep
both pieces.  pg_largeobject is a big problem, and we should replace
it with something better.  And maybe in the meantime we should support
moving it to a different tablespace.  But if it's not officially
supported and you do it anyway, I don't think it's pg_upgrade's job to
cope.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [BUGS] Breakage with VACUUM ANALYSE + partitions

2016-05-02 Thread Fabien COELHO


Hello Andres,


> Sure, attached.


For what it is worth, it works for me on head.

I'm wondering: if _mdfd_openseg may return NULL, then ISTM that
"opened_directly" should logically be false, because nothing was opened?


If so, maybe consider something like:

	opened_directly = (seg != NULL);

Now it does not change the result, because the later code seems guarded
against a NULL seg, but it does not look right to have a boolean saying
that a segment was opened when that is not actually the case.


Given the comments, I understand that the truncation implementation is a
shortcut with a sting, as a lot of functions must then take into account
that something unusual may have happened somewhere and deal with it...


Also, I do not understand why this issue is raised by the flushing patch, 
it seems rather independent.



> I'm not sure this is the best way to go about this.  I can see valid
> arguments for *always* using _mdfd_openseg() in mdsync(); and I'm
> wondering whether we shouldn't make EXTENSION_* into a bitmask
> (extend,extend_recovery,return_null,open_deleted).


I thought about that when I looked at the previous fix, but it seemed that 
not all combinations made sense.


--
Fabien.




Re: [HACKERS] More inaccurate results from numeric pow()

2016-05-02 Thread Dean Rasheed
On 2 May 2016 at 19:40, Robert Haas  wrote:
> On Mon, May 2, 2016 at 1:02 PM, Dean Rasheed  wrote:
>> Doing some more testing of the numeric code patched in [1] I noticed
>> another case where the result is inaccurate -- computing 0.12 ^
>> -2345.6 gives a very large number containing 2162 digits, but only the
>> first 2006 are correct, while the last 156 digits are wrong.
>
> Just out of curiosity, how can you tell?  Where do you get alternate
> output to compare against?
>

The easiest way is to use bc, although it's pretty slow for this kind of thing.


> Also, I wonder what we think the contract with the user is in cases
> like this.

My expectation is that the numeric computations should generally
produce results that are correct in all, or nearly all, digits
returned.  This is commonly expressed in terms of ULPs --
https://en.wikipedia.org/wiki/Unit_in_the_last_place.  An error of a
few ULPs may be OK, but I don't think hundreds of ULPs is OK.
Actually I think this particular issue is mostly of academic interest,
but we went to some effort to get accurate results in the previous
patch, and this is just closing a loophole to hopefully complete that
work.
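
As a concrete illustration of the unit in question (for binary float8
rather than numeric, but the idea carries over), a small C sketch using
nextafter() from the standard math library:

#include <math.h>
#include <stdio.h>

int
main(void)
{
	double		x = 1.0;
	double		next = nextafter(x, 2.0);

	/* one ULP at 1.0 is 2^-52 for an IEEE double */
	printf("1 ULP at 1.0 = %.17g\n", next - x);
	return 0;
}

(Compile with -lm.)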


> Surely, if we were dealing with floating point numbers,
> nobody would expect a calculation like this to be accurate beyond the
> first n digits, where n is surely much less than 2006.  I like the
> fact that numeric has a lot more precision than any built-in floating
> point type, but does it have to get every digit in front of the
> decimal point exactly right no matter how many there are?
>

I would say it should come close. Otherwise we're just returning a lot
of noise. Note that there is a limit to how many digits we will ever
return, so this is manageable.


> rhaas=# select tan(pi()::numeric/2), tan(pi()/2);
>        tan       |         tan
> -----------------+----------------------
>  618986325617924 | 1.63312393531954e+16
> (1 row)
>

That doesn't prove anything, since we haven't implemented tan(numeric)
(and I don't plan to), so the above is actually

select tan((pi()::numeric/2)::float8), tan(pi()/2);

and it's not surprising that the 2 results are wildly different (and
infinitely far away from the correct answer). It's using tan(float8)
in both cases, just with slightly different float8 inputs. There's not
really any reason to expect particularly accurate results from float8
functions in cases like this.

Regards,
Dean




Re: [HACKERS] Rename max_parallel_degree?

2016-05-02 Thread David G. Johnston
On Mon, May 2, 2016 at 12:14 PM, Robert Haas  wrote:

> On Mon, May 2, 2016 at 2:44 PM, David G. Johnston
>  wrote:
> > Does this apply to the extent that a value of 1 is likely worse than 0
> since
> > the leader is now tasked with accumulating but there is only one process
> > actually working to provide the leader data?
>
> I don't know what that means, but it doesn't work like that.  If the
> worker can't generate data fast enough, the leader will also run the
> parallel portion of the plan.  So 1 is unlikely to be worse than 0; in
> fact it's often a lot better.
>
> > I'm inclined to accept max_parallel_workers where a value of 0 means no
> > parallelism and the non-zero counts indicate the number of workers in
> > addition to the required leader.
>
> That's how it works now.
>

Ok, that's basically what I was trying to figure out: whether the leader
will take on the role of "worker" at lower levels of "parallelism", since
if it were dedicated to only aggregating data from other workers it would
be a net loss if only one other worker existed.


>
> > Though that does suggest "additional" as a valid option.  Something like
> > "max_additional_workers".  Just how overloaded is the term "worker".  If
> > worker is understood to mean "a process which implements execution of
> [part
> > of] a query plan" the word additional leaves no ambiguity as to where the
> > leader is accounted for.
> >
> > It does significantly reduce grep-ability though :(
> >
> > max_additional_parallel_workers...
>
> I don't think that it's likely to be very clear what "additional"
> refers to in this context.
>

I'm not sure how "what" could be interpreted as anything other than
"parallel worker"... which I presume is recognized by the user; otherwise
"max_parallel_workers" itself exhibits the same problem.

I could see how code usage could be annoying, having to always "+1" the
value to get the total number of potential workers, but the UI, for me,
removes any question of whether the leader is counted in the set of
"parallel workers".

David J.


Re: [HACKERS] Timeline following for logical slots

2016-05-02 Thread Alvaro Herrera
Andres Freund wrote:

> -   /* oldest LSN that the client has acked receipt for */
> +   /*
> +* Oldest LSN that the client has acked receipt for.  This is used as the
> +* start_lsn point in case the client doesn't specify one, and also as a
> +* safety measure to back off in case the client specifies a start_lsn
> +* that's further in the future than this value.
> +*/
> XLogRecPtr  confirmed_flush;
> 
> This is the wrong way round. confirmed_flush is used if the client's
> start_lsn is further in the *past* than this value.

Bah.  Funnily enough I got this one right in the comment for
CreateDecodingContext.

Thanks for reading through the commit.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Rename max_parallel_degree?

2016-05-02 Thread Robert Haas
On Mon, May 2, 2016 at 3:18 PM, Christoph Berg  wrote:
> Re: Robert Haas 2016-05-02 
> 
>> max_parallel_degree -> max_parallel_workers
>> parallel_degree -> parallel_workers
>>
>> I would prefer to keep it as "degree".  It's a reasonable term of art,
>> and it also improves grep-ability.  But I'm willing to go do the above
>> renaming if there is a clear consensus behind it.  Alternatively, I'm
>> willing to make it 1-based rather than 0-based if there is a clear
>> consensus on that option, though unsurprisingly I prefer it the way it
>> is now.  Do we have such a consensus?
>
> Fwiw the one thing I remember from when I read first about the feature
> was a big "wtf if I set that to 1, I'll actually get 2 processes?". So
> +1 on doing *something* about it.

To be clear, you'll get 1 new process in addition to the 1 that is
already running.  But vote noted.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Timeline following for logical slots

2016-05-02 Thread Andres Freund
On 2016-05-02 16:12:53 -0300, Alvaro Herrera wrote:
> I pushed a fix to some comments, including the ones being discussed in
> this subthread, which should hopefully close things here.
> 
> I'm now going to go over Craig's pg_recvlogical changes and the proposed
> for that problem.


-   /* oldest LSN that the client has acked receipt for */
+   /*
+* Oldest LSN that the client has acked receipt for.  This is used as the
+* start_lsn point in case the client doesn't specify one, and also as a
+* safety measure to back off in case the client specifies a start_lsn
+* that's further in the future than this value.
+*/
XLogRecPtr  confirmed_flush;

This is the wrong way round. confirmed_flush is used if the client's
start_lsn is further in the *past* than this value.




Re: [HACKERS] Rename max_parallel_degree?

2016-05-02 Thread Christoph Berg
Re: Robert Haas 2016-05-02 

> max_parallel_degree -> max_parallel_workers
> parallel_degree -> parallel_workers
> 
> I would prefer to keep it as "degree".  It's a reasonable term of art,
> and it also improves grep-ability.  But I'm willing to go do the above
> renaming if there is a clear consensus behind it.  Alternatively, I'm
> willing to make it 1-based rather than 0-based if there is a clear
> consensus on that option, though unsurprisingly I prefer it the way it
> is now.  Do we have such a consensus?

Fwiw the one thing I remember from when I first read about the feature
was a big "wtf, if I set that to 1, I'll actually get 2 processes?".  So
+1 on doing *something* about it.

Christoph




Re: [HACKERS] pg9.6 segfault using simple query (related to use fk for join estimates)

2016-05-02 Thread Robert Haas
On Sat, Apr 30, 2016 at 1:35 PM, Tom Lane  wrote:
> Julien Rouhaud  writes:
>> On 29/04/2016 18:05, Tom Lane wrote:
>>> Julien Rouhaud  writes:
>>>> The segfault is caused by quals_match_foreign_key() calling get_leftop()
>>>> and get_rightop() on a ScalarArrayOpExpr node.
>>>>
>>>> I'm not sure that assuming this compatibility is the right way to fix
>>>> this though.
>
>>> It certainly isn't.
>
>> Agreed. New attached patch handles explicitly each node tag.
>
> No, this is completely nuts.  The problem is that quals_match_foreign_key
> is simply assuming that every clause in the list it's given is an OpExpr,
> which is quite wrong.  It should just ignore non-OpExpr quals, since it
> cannot do anything useful with them anyway.  There's a comment claiming
> that non-OpExpr quals were already rejected:
>
>  * Here since 'usefulquals' only contains bitmap indexes for quals
>  * of type "var op var" we can safely skip checking this.
>
> but that doesn't appear to have anything to do with current reality.
>
> While this in itself is about a two-line fix, after reviewing
> 137805f89acb3611 I'm pretty unhappy that it got committed at all.
> I think this obvious bug is a good sign that it wasn't ready.
> Other unfinished aspects like invention of an undocumented GUC
> don't leave a good impression either.
>
> Moreover, it looks to me like this will add quite a lot of overhead,
> probably far more than is justified, because clauselist_join_selectivity
> is basically O(N^2) in the relation-footprint of the current join --- and
> not with a real small multiplier, either, as the functions it calls
> contain about four levels of nested loops themselves.  Maybe that's
> unmeasurable on trivial test cases but it's going to be disastrous in
> large queries, or for relations with large numbers of foreign keys.
>
> I think this should be reverted and pushed out to 9.7 development.
> It needs a significant amount of rewriting to fix the performance
> issue, and now's not the time to be doing that.

If this gets reverted, then probably
015e88942aa50f0d419ddac00e63bb06d6e62e86 should also be reverted.

We need to make some decisions here PDQ.  I haven't had time to look
at this issue in any technical detail yet.  Simon, anyone else, please
weigh in.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Rename max_parallel_degree?

2016-05-02 Thread Robert Haas
On Mon, May 2, 2016 at 2:44 PM, David G. Johnston
 wrote:
> Does this apply to the extent that a value of 1 is likely worse than 0 since
> the leader is now tasked with accumulating but there is only one process
> actually working to provide the leader data?

I don't know what that means, but it doesn't work like that.  If the
worker can't generate data fast enough, the leader will also run the
parallel portion of the plan.  So 1 is unlikely to be worse than 0; in
fact it's often a lot better.

> I'm inclined to accept max_parallel_workers where a value of 0 means no
> parallelism and the non-zero counts indicate the number of workers in
> addition to the required leader.

That's how it works now.

> Though that does suggest "additional" as a valid option.  Something like
> "max_additional_workers".  Just how overloaded is the term "worker".  If
> worker is understood to mean "a process which implements execution of [part
> of] a query plan" the word additional leaves no ambiguity as to where the
> leader is accounted for.
>
> It does significantly reduce grep-ability though :(
>
> max_additional_parallel_workers...

I don't think that it's likely to be very clear what "additional"
refers to in this context.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Timeline following for logical slots

2016-05-02 Thread Alvaro Herrera
I pushed a fix to some comments, including the ones being discussed in
this subthread, which should hopefully close things here.

I'm now going to go over Craig's pg_recvlogical changes and the proposed
fix for that problem.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] max_worker_processes missing some documentation

2016-05-02 Thread Julien Rouhaud
I noticed that postgresql.conf.sample doesn't say that changing
max_worker_processes requires a restart.

Patch attached.
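
(For context: max_worker_processes is declared PGC_POSTMASTER in guc.c,
which is what makes it restart-only; the sample file is just missing the
usual annotation.  Paraphrasing the guc.c entry from memory -- the hooks
and limits may differ slightly:

    {
        {"max_worker_processes", PGC_POSTMASTER, RESOURCES_ASYNCHRONOUS,
            gettext_noop("Maximum number of concurrent worker processes."),
            NULL,
        },
        &max_worker_processes,
        8, 1, MAX_BACKENDS,
        NULL, NULL, NULL
    },

so the sample-file comment should match the other PGC_POSTMASTER GUCs.)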

-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index e6c6591..2d2db7e 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -166,7 +166,7 @@
 # - Asynchronous Behavior -
 
 #effective_io_concurrency = 1		# 1-1000; 0 disables prefetching
-#max_worker_processes = 8
+#max_worker_processes = 8		# (change requires restart)
 #max_parallel_degree = 2		# max number of worker processes per node
 #old_snapshot_threshold = -1		# 1min-60d; -1 disables; 0 is immediate
 	# (change requires restart)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rename max_parallel_degree?

2016-05-02 Thread David G. Johnston
On Sun, Apr 24, 2016 at 8:01 PM, Robert Haas  wrote:

>
> Of course, we could make this value 1-based rather than 0-based, as
> Peter Geoghegan suggested a while back.  But as I think I said at the
> time, I think that's more misleading than helpful.  The leader
> participates in the parallel plan, but typically does far less of the
> work beneath the Gather node than the other nodes involved in the
> query, often almost none.  In short, the leader is special.
> Pretending that it's just another process involved in the parallel
> group isn't doing anyone a favor.
>

Does this apply to the extent that a value of 1 is likely worse than 0
since the leader is now tasked with accumulating but there is only one
process actually working to provide the leader data?

I'm inclined to accept max_parallel_workers where a value of 0 means no
parallelism and the non-zero counts indicate the number of workers in
addition to the required leader.

Though that does suggest "additional" as a valid option.  Something like
"max_additional_workers".  Just how overloaded is the term "worker".  If
worker is understood to mean "a process which implements execution of [part
of] a query plan" the word additional leaves no ambiguity as to where the
leader is accounted for.

It does significantly reduce grep-ability though :(

max_additional_parallel_workers...

David J.


Re: [HACKERS] More inaccurate results from numeric pow()

2016-05-02 Thread Robert Haas
On Mon, May 2, 2016 at 1:02 PM, Dean Rasheed  wrote:
> Doing some more testing of the numeric code patched in [1] I noticed
> another case where the result is inaccurate -- computing 0.12 ^
> -2345.6 gives a very large number containing 2162 digits, but only the
> first 2006 are correct, while the last 156 digits are wrong.

Just out of curiosity, how can you tell?  Where do you get alternate
output to compare against?

Also, I wonder what we think the contract with the user is in cases
like this.  Surely, if we were dealing with floating point numbers,
nobody would expect a calculation like this to be accurate beyond the
first n digits, where n is surely much less than 2006.  I like the
fact that numeric has a lot more precision than any built-in floating
point type, but does it have to get every digit in front of the
decimal point exactly right no matter how many there are?

rhaas=# select tan(pi()::numeric/2), tan(pi()/2);
       tan       |         tan
-----------------+----------------------
 618986325617924 | 1.63312393531954e+16
(1 row)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Naming of new tsvector functions

2016-05-02 Thread Robert Haas
On Mon, May 2, 2016 at 1:58 PM, Tom Lane  wrote:
> I wrote:
>> I think we'd be better off to rename these to tsvector_delete() and
>> tsvector_filter() while we still can.
>
> ... although I now notice that hstore already exposes a function named
> delete(), so that ship may have sailed already.  But I'm more troubled
> by filter() anyhow, since that keyword can appear in expressions ---
> it seems much more likely that that would pose a parsing conflict
> after future SQL extensions.

But not everybody has hstore installed, so even if that's a problem it
won't be a problem for everybody, all the time.  +1 for renaming them
both.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rename max_parallel_degree?

2016-05-02 Thread Robert Haas
On Tue, Apr 26, 2016 at 11:49 AM, Robert Haas  wrote:
> On Tue, Apr 26, 2016 at 11:44 AM, Tom Lane  wrote:
>> Robert Haas  writes:
>>> I still think
>>> max_parallel_workers is confusingly similar to max_worker_processes,
>>> but nothing's going to make everyone completely happy here.
>>
>> Well, what was suggested upthread was to change all of these to follow
>> the pattern max_foo_workers or max_foo_worker_processes, where foo would
>> (hopefully) clarify the scope in which the limitation applies.
>
> Well, I don't like max_node_parallel_degree much.  We don't call it
> max_node_work_mem.  And node is not exactly a term that's going to be
> more familiar to the average PostgreSQL user than parallel degree is
> to (apparently) the average PostgreSQL developer.  I think at some
> point adding noise words hurts more than it helps, and you've just got
> to ask people to RTFM if they really want to understand.

If we're going to change this before beta1, we need to do it soon.
IMHO, the only reasonably sane idea that's been proposed thus far is
to rename as follows:

max_parallel_degree -> max_parallel_workers
parallel_degree -> parallel_workers

I would prefer to keep it as "degree".  It's a reasonable term of art,
and it also improves grep-ability.  But I'm willing to go do the above
renaming if there is a clear consensus behind it.  Alternatively, I'm
willing to make it 1-based rather than 0-based if there is a clear
consensus on that option, though unsurprisingly I prefer it the way it
is now.  Do we have such a consensus?

To summarize the positions as I understand them:

Magnus seems OK with the way things are.
Peter wants to change either the fact that it is 0-based or the fact
that it is called degree, but is OK with either.
Tom doesn't like "degree" and also thinks anything called degree
should be 1-based, but it sounds like he's more interested in changing
the first thing than the second one.
Bruce and JD seemed to like degree -> workers.
JD also suggested another option that nobody else endorsed.
Alvaro suggested another option that nobody else endorsed.

Does anyone else want to vote?  Does anyone who previously cast a vote
wish to change it or clarify their position?  I think I am reading
that degree -> workers will please most people (just not me).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Naming of new tsvector functions

2016-05-02 Thread Tom Lane
I wrote:
> I think we'd be better off to rename these to tsvector_delete() and
> tsvector_filter() while we still can.

... although I now notice that hstore already exposes a function named
delete(), so that ship may have sailed already.  But I'm more troubled
by filter() anyhow, since that keyword can appear in expressions ---
it seems much more likely that that would pose a parsing conflict
after future SQL extensions.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] More inaccurate results from numeric pow()

2016-05-02 Thread Tom Lane
Dean Rasheed  writes:
> In fact it's possible to predict exactly how large we need to allow
> "val" to become, since the final result is computed using exp_var(),
> which accepts inputs up to 6000, so the result weight "val" can be up
> to around log10(exp(6000)) ~= 2606 before the final result causes an
> overflow.

> The obvious fix would be to modify the clamping limits. I think a
> better answer though is to replace the clamping code with an overflow
> test, immediately throwing an error if "val" is outside the allowed
> range, per the attached patch.

I don't much care for the hardwired magic number here, especially since
exp_var() does not have its limit expressed as "6000" but as
"NUMERIC_MAX_RESULT_SCALE * 3".  I think you should rephrase the limit
to use that expression, and also add something like this in exp_var():

val = numericvar_to_double_no_overflow();

/* Guard against overflow */
+   /* If you change this limit, see also power_var()'s limit */
if (Abs(val) >= NUMERIC_MAX_RESULT_SCALE * 3)
ereport(ERROR,
(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
 errmsg("value overflows numeric format")));

Seems like a reasonable idea otherwise.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Naming of new tsvector functions

2016-05-02 Thread Joshua D. Drake

On 05/02/2016 10:27 AM, Tom Lane wrote:

I noticed that 6943a946c introduces some new functions named delete()
and filter().  This does not seem like a terribly bright idea to me.
They may not be formally ambiguous with the corresponding keywords,
but it's not very hard to imagine how small typos could lead to
the parser taking the unintended interpretation and then producing
totally confusing error messages.  It's even less hard to imagine
this choice preventing us from introducing some new syntax in future
(for instance, DELETE ... RETURNING ... as a subquery-in-FROM) because
it *would* be formally ambiguous.

I think we'd be better off to rename these to tsvector_delete() and
tsvector_filter() while we still can.


Or ts_filter/ts_delete? But no objection.

JD



regards, tom lane





--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Naming of new tsvector functions

2016-05-02 Thread Tom Lane
I noticed that 6943a946c introduces some new functions named delete()
and filter().  This does not seem like a terribly bright idea to me.
They may not be formally ambiguous with the corresponding keywords,
but it's not very hard to imagine how small typos could lead to
the parser taking the unintended interpretation and then producing
totally confusing error messages.  It's even less hard to imagine
this choice preventing us from introducing some new syntax in future
(for instance, DELETE ... RETURNING ... as a subquery-in-FROM) because
it *would* be formally ambiguous.

I think we'd be better off to rename these to tsvector_delete() and
tsvector_filter() while we still can.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] More inaccurate results from numeric pow()

2016-05-02 Thread Dean Rasheed
Doing some more testing of the numeric code patched in [1] I noticed
another case where the result is inaccurate -- computing 0.12 ^
-2345.6 gives a very large number containing 2162 digits, but only the
first 2006 are correct, while the last 156 digits are wrong.

The reason is this code in power_var():

/* limit to something that won't cause integer overflow */
val = Max(val, -NUMERIC_MAX_RESULT_SCALE);
val = Min(val, NUMERIC_MAX_RESULT_SCALE);

where "val" is the approximate decimal result weight. Here
NUMERIC_MAX_RESULT_SCALE is 2000, so it's clamping the estimated
result weight to 2000, and therefore reducing the rscale in the
subsequent calculations, causing the loss of precision at around 2000
digits.

In fact it's possible to predict exactly how large we need to allow
"val" to become, since the final result is computed using exp_var(),
which accepts inputs up to 6000, so the result weight "val" can be up
to around log10(exp(6000)) ~= 2606 before the final result causes an
overflow.

The obvious fix would be to modify the clamping limits. I think a
better answer though is to replace the clamping code with an overflow
test, immediately throwing an error if "val" is outside the allowed
range, per the attached patch.

This has the advantage that it avoids some expensive computations in
the case where the result will end up overflowing, but more
importantly it means that power_var() isn't so critically dependent on
the limits of exp_var() -- if someone in the future increased the
limits of exp_var() without touching power_var(), and power_var()
clamped to the old range, the problem would resurface. But by doing an
overflow test in power_var() instead of clamping "val", it would
either compute an accurate result or throw an overflow error early
on. There should be no possibility of it returning an inaccurate
result.

Regards,
Dean

[1] 
http://www.postgresql.org/message-id/CAEZATCV7w+8iB=07dj8q0zihxqt1semcqutek+4_rogc_zq...@mail.gmail.com
diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
new file mode 100644
index 3ba373a..b4a2829
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -8008,9 +8008,16 @@ power_var(NumericVar *base, NumericVar *
 
 	val *= 0.434294481903252;	/* approximate decimal result weight */
 
-	/* limit to something that won't cause integer overflow */
-	val = Max(val, -NUMERIC_MAX_RESULT_SCALE);
-	val = Min(val, NUMERIC_MAX_RESULT_SCALE);
+	/*
+	 * Apply a crude overflow test so we can exit early if the result is sure
+	 * to overflow.  exp_var() supports inputs up to 6000, so we know that the
+	 * result will overflow if the approximate decimal result weight exceeds
+	 * log10(exp(6000)) ~= 2610.
+	 */
+	if (Abs(val) > 2610)
+		ereport(ERROR,
+(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ errmsg("value overflows numeric format")));
 
 	/* choose the result scale */
 	rscale = NUMERIC_MIN_SIG_DIGITS - (int) val;
diff --git a/src/test/regress/expected/numeric_big.out b/src/test/regress/expected/numeric_big.out
new file mode 100644
index 1b3608a..8bb09c9
--- a/src/test/regress/expected/numeric_big.out
+++ b/src/test/regress/expected/numeric_big.out
@@ -1075,6 +1075,20 @@ SELECT b, p, bc_result, b^p AS power, b^
  27.234987 |  20.230957 |   108142427112079606637962972621.121293 |   108142427112079606637962972621.121293 |0.00
 (41 rows)
 
+-- Inputs close to overflow
+--
+-- bc(1) results computed with a scale of 2700 and truncated to 4 decimal
+-- places.
+WITH t(b, p, bc_result) AS (VALUES
+(0.12, -2829.8369, 

Re: [HACKERS] [BUGS] Breakage with VACUUM ANALYSE + partitions

2016-05-02 Thread Andres Freund
On 2016-05-02 12:44:53 -0400, Robert Haas wrote:
> On Mon, May 2, 2016 at 12:41 PM, Andres Freund  wrote:
> > On 2016-05-02 12:29:45 -0400, Robert Haas wrote:
> >> On Fri, Apr 29, 2016 at 7:58 PM, Andres Freund  wrote:
> >> > Basically the reason for the problem is that mdsync() needs to access
> >> > "formally non-existant segments" (as in ones where previous segments are
> >> > < RELSEG_SIZE), because we queue (and the might be preexistant) fsync
> >> > requests via register_dirty_segment() in mdtruncate().
> >>
> >> Shouldn't we just throw those flush requests away?
> >
> > Well, we explicitly make them for truncations (register_dirty_segment()
> > calls in mdtruncate()).  There's no comment as to why - I suspect the
> > idea is that you want to make sure the truncation sticks in case of
> > crash?
> 
> I dunno, I don't understand this well enough yet.
> 
> > FWIW, falling back to _mdfd_openseg() fixes the issue.
> 
> Can you post a patch?

Sure, attached.


I'm not sure this is the best way to go about this.  I can see valid
arguments for *always* using _mdfd_openseg() in mdsync(); and I'm
wondering whether we shouldn't make EXTENSION_* into a bitmask
(extend,extend_recovery,return_null,open_deleted).
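
I.e. something along these lines (a sketch; bit assignments arbitrary):

    #define EXTENSION_EXTEND           (1 << 0)   /* extend relation as needed */
    #define EXTENSION_EXTEND_RECOVERY  (1 << 1)   /* ... but only during recovery */
    #define EXTENSION_RETURN_NULL      (1 << 2)   /* return NULL instead of ERROR */
    #define EXTENSION_OPEN_DELETED     (1 << 3)   /* allow opening truncated segments */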

Andres
diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index 2981b41..3774fb0 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -1159,6 +1159,7 @@ mdsync(void)
 			while ((segno = bms_first_member(requests)) >= 0)
 			{
 int			failures;
+bool		opened_directly = false;
 
 /*
  * If fsync is off then we don't have to bother opening the
@@ -1223,6 +1224,23 @@ mdsync(void)
 
 	INSTR_TIME_SET_CURRENT(sync_start);
 
+	/*
+	 * _mdfd_getseg() will only open segments which aren't
+	 * preceded by non-truncated segments (c.f. notes about
+	 * RELSEG_SIZE at the top of this file). But there are
+	 * some cases, e.g. mdtruncate, where truncated segments
+	 * are to be fsynced: Thus open those explicitly here.  We
+	 * avoid always using _mdfd_openseg() because using
+	 * _mdfd_getseg() avoids some open()/close() calls if the
+	 * underlying files are already open.
+	 */
+	if (seg == NULL)
+	{
+		seg = _mdfd_openseg(reln, forknum, segno,
+			EXTENSION_RETURN_NULL);
+		opened_directly = true;
+	}
+
 	if (seg != NULL &&
 		FileSync(seg->mdfd_vfd) >= 0)
 	{
@@ -1241,6 +1259,13 @@ mdsync(void)
  FilePathName(seg->mdfd_vfd),
  (double) elapsed / 1000);
 
+		/* free resources if explicitly opened above */
+		if (opened_directly)
+		{
+			if (seg->mdfd_vfd >= 0)
+FileClose(seg->mdfd_vfd);
+			pfree(seg);
+		}
 		break;	/* out of retry loop */
 	}
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rename max_parallel_degree?

2016-05-02 Thread Robert Haas
On Sat, Apr 30, 2016 at 4:54 AM, David Rowley
 wrote:
>> Right, but they're probably not doing the SAME work.  You can look at
>> EXPLAIN (ANALYZE, VERBOSE, BUFFERS) to see.  Of course, all the work
>> above the Gather node is being done by the leader, but the stuff below
>> the Gather node often has a bit of participation from the leader, but
>> is mostly the workers.
>
> Robert, I'd imagine that most of the tests that led you to think what you
> do would have come from testing parallel seq scan, where perhaps
> Daniel's come from testing something like parallel aggregates, or at
> least something that gives the workers a decent amount of work per
> tuple returned.

Sure, if you're doing parallel aggregate with a small number of output
groups, the work distribution will be much more nearly equal.  I
wasn't trying to dispute that.  There are many cases where it doesn't
work out that way, though.

> So I'd say this very much depends on how busy the main process is
> pulling rows from each worker.

Absolutely.

> It would also be quite nice if we could see at a glance how much the
> main process did, without having to go subtracting what all the
> workers managed to do.

I agree, but that looked considerably more invasive, because of the
way that the worker results get aggregated into the leader's result.
There may be a clever way to avoid that, but I didn't see it.  Or we
may want to do it anyway despite it requiring more invasive changes.
I just wasn't in a hurry to rush into all that while trying to get
parallel query v1 out the door.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] Breakage with VACUUM ANALYSE + partitions

2016-05-02 Thread Robert Haas
On Mon, May 2, 2016 at 12:41 PM, Andres Freund  wrote:
> On 2016-05-02 12:29:45 -0400, Robert Haas wrote:
>> On Fri, Apr 29, 2016 at 7:58 PM, Andres Freund  wrote:
>> > Basically the reason for the problem is that mdsync() needs to access
>> > "formally non-existant segments" (as in ones where previous segments are
>> > < RELSEG_SIZE), because we queue (and the might be preexistant) fsync
>> > requests via register_dirty_segment() in mdtruncate().
>>
>> Shouldn't we just throw those flush requests away?
>
> Well, we explicitly make them for truncations (register_dirty_segment()
> calls in mdtruncate()).  There's no comment as to why - I suspect the
> idea is that you want to make sure the truncation sticks in case of
> crash?

I dunno, I don't understand this well enough yet.

> FWIW, falling back to _mdfd_openseg() fixes the issue.

Can you post a patch?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] Breakage with VACUUM ANALYSE + partitions

2016-05-02 Thread Andres Freund
On 2016-05-02 12:29:45 -0400, Robert Haas wrote:
> On Fri, Apr 29, 2016 at 7:58 PM, Andres Freund  wrote:
> > Basically the reason for the problem is that mdsync() needs to access
> > "formally non-existant segments" (as in ones where previous segments are
> > < RELSEG_SIZE), because we queue (and the might be preexistant) fsync
> > requests via register_dirty_segment() in mdtruncate().
> 
> Shouldn't we just throw those flush requests away?

Well, we explicitly make them for truncations (register_dirty_segment()
calls in mdtruncate()).  There's no comment as to why - I suspect the
idea is that you want to make sure the truncation sticks in case of
crash?

FWIW, falling back to _mdfd_openseg() fixes the issue.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pg_upgrade and toasted pg_largeobject

2016-05-02 Thread Alvaro Herrera
Hi,

A customer of ours was unable to pg_upgrade a database, with this error:

  old and new databases "postgres" have a mismatched number of relations
  Failure, exiting

After some research, it turned out that pg_largeobject had acquired a
toast table.  After some more research, we determined that it was
because right after initdb of the old database (months or years prior)
they moved pg_largeobject to another, slower tablespace, because for
their case it is very bulky and not used as much as the other data.
(This requires restarting postmaster with the -O parameter).

While I understand that manual system catalog modifications are frowned
upon, it seems to me that we should handle this better.  The failure is
very confusing and hard to diagnose, and hard to fix also.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] Breakage with VACUUM ANALYSE + partitions

2016-05-02 Thread Robert Haas
On Fri, Apr 29, 2016 at 7:58 PM, Andres Freund  wrote:
> On 2016-04-28 17:41:29 +0100, Thom Brown wrote:
>> I've noticed another breakage, which I can reproduce consistently.
>
>> 2016-04-28 17:36:08 BST [18108]: [47-1] user=,db=,client= DEBUG:  could not
>> fsync file "base/24581/24594.1" but retrying: No such file or directory
>> 2016-04-28 17:36:08 BST [18108]: [48-1] user=,db=,client= ERROR:  could not
>> fsync file "base/24581/24594.1": No such file or directory
>> 2016-04-28 17:36:08 BST [18605]: [17-1]
>> user=thom,db=postgres,client=[local] ERROR:  checkpoint request failed
>> 2016-04-28 17:36:08 BST [18605]: [18-1]
>> user=thom,db=postgres,client=[local] HINT:  Consult recent messages in the
>> server log for details.
>
> Yuck. md.c is so crummy :(
>
> Basically the reason for the problem is that mdsync() needs to access
> "formally non-existant segments" (as in ones where previous segments are
> < RELSEG_SIZE), because we queue (and the might be preexistant) fsync
> requests via register_dirty_segment() in mdtruncate().

Shouldn't we just throw those flush requests away?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Failed assertion in BecomeLockGroupLeader

2016-05-02 Thread Robert Haas
On Fri, Apr 29, 2016 at 9:45 AM, Tom Lane  wrote:
>> It will
>> be helpful if you can find the offending query or plan corresponding to it?
>
> I presume the lack of debug_query_string data is because nothing is
> bothering to set debug_query_string in a worker process.  Should that be
> remedied?  At the very least set it to "worker process", but it might be
> worth copying over the full query from the parent side.

I agree.  I thought about doing that at one point, but I didn't quite
have the cycles and I wasn't sure how important it would be.  The fact
that we're already hitting cases like this before we've even gone to
beta suggests that it's pretty important.  I think it's worth the
extra cycles, even in non-cassert builds.  Compared to the overhead of
cajoling the postmaster to fork a new process, the cost of this should
be trivial.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add jsonb_compact(...) for whitespace-free jsonb to text

2016-05-02 Thread Oleksandr Shulgin
On Mon, May 2, 2016 at 4:04 PM, Andrew Dunstan  wrote:

>
> On 05/02/2016 04:56 AM, Shulgin, Oleksandr wrote:
>
>> On Sun, May 1, 2016 at 3:22 AM, Andrew Dunstan wrote:
>>
>> On 04/29/2016 06:11 PM, Merlin Moncure wrote:
>>
>> This is a simple matter of removing spaces in the occasional C string
>> literal in the serialization routines and adding a json_pretty
>> function.
>>
>>
>> I spent a few hours on this. See
>> 
>> for WIP - there are three commits. No regression tests yet for the
>> two new functions (json_squash and json_pretty); otherwise fairly
>> complete. Removing whitespace generation was pretty simple for
>> both json and jsonb.
>>
>>
>> Looks good, thank you!
>>
>> It would make sense IMO to rename FormatState's `indent' field as
>> `pretty': it's being used to add whitespace between the punctuation, not
>> only at start of a line.  I'd also move the "if (indent)" check out of
>> add_indent(): just don't call it if no indent is needed.
>>
>> I'll try to play with the patch to produce some regression tests for the
>> new functions.
>>
>>
> It was done the way it was to be as consistent as possible with how it's
> done for jsonb (cf. jsonb.c:JsonbToCStringWorker and jsonb.c:add_indent).
>

Ah, I see.

Simply taking regression tests for jsonb_pretty() and using them against
json_pretty() revealed a bug with extra indent being added before every
array/object start.  Attached patch fixes that and adds the regression
tests.

For json_squash() I've taken the same three test values, for lack of a
better idea at the moment.  At least we are testing key order stability and
that no whitespace is spit out.

Regards,
--
Alex
diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c
new file mode 100644
index d82c9c8..77d8325
*** a/src/backend/utils/adt/jsonfuncs.c
--- b/src/backend/utils/adt/jsonfuncs.c
*** fmt_object_field_start(void *state, char
*** 3394,3402 
  
escape_json(_state->strval, fname);
  
!   appendStringInfoCharMacro(_state->strval, ':');
!   if (_state->indent)
!   appendStringInfoCharMacro(_state->strval, ' ');
  
_state->last_was_key = true;
  }
--- 3394,3400 
  
escape_json(_state->strval, fname);
  
!   appendBinaryStringInfo(_state->strval, ": ", _state->indent ? 2 : 1);
  
_state->last_was_key = true;
  }
*** fmt_array_element_start(void *state, boo
*** 3409,3417 
if (!_state->first)
appendStringInfoCharMacro(_state->strval, ',');
_state->first = false;
! 
!   add_indent(_state->strval, _state->indent, _state->lex->lex_level);
! 
  }
  
  static void
--- 3407,3413 
if (!_state->first)
appendStringInfoCharMacro(_state->strval, ',');
_state->first = false;
!   _state->last_was_key = false;
  }
  
  static void
*** fmt_scalar(void *state, char *token, Jso
*** 3419,3424 
--- 3415,3424 
  {
FormatState *_state = (FormatState *) state;
  
+   if (_state->lex->lex_level > 0)
+   add_indent(_state->strval, _state->indent && !_state->last_was_key,
+  _state->lex->lex_level);
+ 
if (tokentype == JSON_TOKEN_STRING)
escape_json(_state->strval, token);
else
diff --git a/src/test/regress/expected/json.out b/src/test/regress/expected/json.out
new file mode 100644
index 0c45c64..7af4022
*** a/src/test/regress/expected/json.out
--- b/src/test/regress/expected/json.out
*** select json_strip_nulls('{"a": {"b": nul
*** 1658,1660 
--- 1658,1736 
   {"a":{},"d":{}}
  (1 row)
  
+ select json_pretty('{"a": "test", "b": [1, 2, 3], "c": "test3", "d":{"dd": 
"test4", "dd2":{"ddd": "test5"}}}');
+ json_pretty 
+ 
+  { +
+  "a": "test",  +
+  "b": [+
+  1,+
+  2,+
+  3 +
+  ],+
+  "c": "test3", +
+  "d": {+
+  "dd": "test4",+
+  "dd2": {  +
+  "ddd": "test5"+
+  } +
+  } +
+  }
+ (1 row)
+ 
+ select json_pretty('[{"f1":1,"f2":null},2,null,[[{"x":true},6,7],8],3]');
+ json_pretty
+ ---
+  [+
+  {+
+  "f1": 1, +
+  "f2": null   +
+  },   +
+  2,   +
+  null,+
+  [+
+  [+
+  {+
+  "x": true+
+  },   +
+ 

Re: [HACKERS] 9.6 and fsync=off

2016-05-02 Thread Robert Haas
On Mon, May 2, 2016 at 12:04 PM, Tom Lane  wrote:
> Andres Freund  writes:
>> On 2016-05-02 10:07:50 -0400, Robert Haas wrote:
>>> - If that flag is set on a subsequent startup, say:
>>> WARNING: Recovery was previously performed with fsync=off; this
>>> cluster may be irretrievably corrupted.
>
>> Well, the problem with that is that postgres crashes are actually
>> harmless with regard to fsync=on/off. It's just OS crashes that are a
>> problem. So it seems quite likely that the false-positive rate here
>> would be high enough to make people ignore it.
>
> That's a pretty good point.  Also, as sketched, I believe this would
> start bleating after a crash recovery performed because a backend
> died --- which is a case where we know for certain there was no OS
> crash.  So this idea needs some more thought.

That's true.  I think we could arrange to ignore postmaster-initiated
crash-and-restart cycles in deciding whether to set the
flag.  Now, somebody could still do an immediate shutdown, or the
postmaster could go boom, but I don't think those are common enough
scenarios to justify not tracking this.  If you are using fsync=off
and running an immediate shutdown and then setting fsync=on and
restarting the server ... yeah, that could hypothetically be safe.
But I think you are playing with fire.  If you are using fsync=off for
the initial data load, it's not too much to ask that you shut the
cluster down cleanly when you are done.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.6 and fsync=off

2016-05-02 Thread Tom Lane
Andres Freund  writes:
> On 2016-05-02 10:07:50 -0400, Robert Haas wrote:
>> - If that flag is set on a subsequent startup, say:
>> WARNING: Recovery was previously performed with fsync=off; this
>> cluster may be irretrievably corrupted.

> Well, the problem with that is that postgres crashes are actually
> harmless with regard to fsync=on/off. It's just OS crashes that are a
> problem. So it seems quite likely that the false-positive rate here
> would be high enough to make people ignore it.

That's a pretty good point.  Also, as sketched, I believe this would
start bleating after a crash recovery performed because a backend
died --- which is a case where we know for certain there was no OS
crash.  So this idea needs some more thought.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.6 and fsync=off

2016-05-02 Thread Andres Freund
Hi,

On 2016-05-02 10:07:50 -0400, Robert Haas wrote:
> I also think that it would be a swell idea to detect whether a system
> has ever crashed with fsync=off, and do something about that, like
> maybe bleat on every subsequent startup for the lifetime of the
> cluster.  I think Andres may have even proposed a patch for this sort
> of thing before, although I don't remember for sure and I think he and
> I disagreed on the details.  Sketch:

Hm,  I can't remember doing that.


> - Keep a copy of the fsync status in pg_control.
> - If we ever enter recovery while it's turned off, say:
> WARNING: Entering recovery with fsync=off; this cluster may be
> irretrievably corrupted.
> ...and also set a separate flag indicating that we've done at least
> one recovery with fsync=off.
> - If that flag is set on a subsequent startup, say:
> WARNING: Recovery was previously performed with fsync=off; this
> cluster may be irretrievably corrupted.

Well, the problem with that is that postgres crashes are actually
harmless with regard to fsync=on/off. It's just OS crashes that are a
problem. So it seems quite likely that the false-positive rate here
would be high enough to make people ignore it.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-05-02 Thread Andres Freund
On 2016-05-02 18:15:40 +0300, Ants Aasma wrote:
> On Mon, May 2, 2016 at 5:21 PM, Andres Freund  wrote:
> > On 2016-05-02 09:03:19 -0400, Robert Haas wrote:
> >> On Fri, Apr 29, 2016 at 6:08 PM, Kevin Grittner  wrote:
> >> > Now to continue with the performance benchmarks.  I'm pretty sure
> >> > we've fixed the problems when the feature is disabled
> >> > (old_snapshot_threshold = -1), and there are several suggestions
> >> > for improving performance while it is on that need to be compared
> >> > and benchmarked.
> >>
> >> If anyone thinks that the issue with the feature disabled is NOT
> >> fixed, please speak up!  I'm moving the corresponding open item to
> >> CLOSE_WAIT status, meaning that it will be closed if nobody shows up
> >> to say that there is still an issue.
> >
> > Well, I don't agree that the feature is in a releasable state. The
> > datastructure is pretty much non-scalable, and maintained on the wrong
> > side (every read, instead of once in writing xacts). There's no
> > proposal actually addressing the scalability issues.
> 
> Unless I'm missing something fundamental the feature only requires
> tracking an upper bound on xmin observed by snapshots between clock
> ticks.

I'm not saying that there's no datastructure that can make the whole
thing efficient - just that the current datastructure doesn't look viable
and that I've not seen that point addressed seriously.


> The simplest way to do this would be a periodic process that
> increments a clock counter (32bit counter would be plenty) and then
> calculates xmin for the preceding range. With this scheme
> GetSnapshotData would need two atomic fetches to get current LSN and
> the timestamp. Test for old snapshot can also run completely lock free
> with a single atomic fetch of threshold timestamp. The negative side
> is that we need to have a process running that runs the clock ticks
> and the ticks may sometimes be late. Giving something like autovacuum
> launcher this task doesn't seem too bad and the consequence of falling
> behind is just delayed timing out of old snapshots.

That'd be one way, yes. I suspect it'd even be sufficient to move
maintaining the map to around GetNewTransactionId(), so that only
writers pay the overhead. Given that writes are obviously slower than
reads, that might make the problem disappear for a long while.

> As far as I can see this approach would get rid of any scalability
> issues, but it is a pretty significant change and requires 64bit
> atomic reads to get rid of contention on xlog insert lock.

Yea.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-05-02 Thread Ants Aasma
On Mon, May 2, 2016 at 5:21 PM, Andres Freund  wrote:
> On 2016-05-02 09:03:19 -0400, Robert Haas wrote:
>> On Fri, Apr 29, 2016 at 6:08 PM, Kevin Grittner  wrote:
>> > Now to continue with the performance benchmarks.  I'm pretty sure
>> > we've fixed the problems when the feature is disabled
>> > (old_snapshot_threshold = -1), and there are several suggestions
>> > for improving performance while it is on that need to be compared
>> > and benchmarked.
>>
>> If anyone thinks that the issue with the feature disabled is NOT
>> fixed, please speak up!  I'm moving the corresponding open item to
>> CLOSE_WAIT status, meaning that it will be closed if nobody shows up
>> to say that there is still an issue.
>
> Well, I don't agree that the feature is in a releasable state. The
> datastructure is pretty much non-scalable, and maintained on the wrong
> side (every read, instead of once in writing xacts). There's no
> proposal actually addressing the scalability issues.

Unless I'm missing something fundamental the feature only requires
tracking an upper bound on xmin observed by snapshots between clock
ticks. The simplest way to do this would be a periodic process that
increments a clock counter (32bit counter would be plenty) and then
calculates xmin for the preceding range. With this scheme
GetSnapshotData would need two atomic fetches to get current LSN and
the timestamp. Test for old snapshot can also run completely lock free
with a single atomic fetch of threshold timestamp. The negative side
is that we need to have a process running that runs the clock ticks
and the ticks may sometimes be late. Giving something like autovacuum
launcher this task doesn't seem too bad and the consequence of falling
behind is just delayed timing out of old snapshots.

As far as I can see this approach would get rid of any scalability
issues, but it is a pretty significant change and requires 64bit
atomic reads to get rid of contention on xlog insert lock.
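
In rough pseudo-C, the fast paths would then be (struct and field names
purely hypothetical):

    typedef struct SnapshotClock
    {
        pg_atomic_uint32 tick;       /* advanced periodically by the tick process */
        pg_atomic_uint64 threshold;  /* oldest timestamp still considered fresh */
    } SnapshotClock;

    /* in GetSnapshotData(): two lock-free atomic reads */
    uint32 tick = pg_atomic_read_u32(&snapClock->tick);
    uint64 threshold = pg_atomic_read_u64(&snapClock->threshold);

    /* "snapshot too old" test: a single lock-free atomic read */
    if (snapshot_ts < pg_atomic_read_u64(&snapClock->threshold))
        /* ... report snapshot too old ... */ ;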

Regards,
Ants Aasma


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Accidentally parallel unsafe functions

2016-05-02 Thread Robert Haas
On Fri, Apr 29, 2016 at 6:06 PM, Andreas Karlsson  wrote:
> I am currently looking into adding the correct parallel options to all
> functions in the extensions and I noticed that some built-in functions seems
> to have been marked as unsafe by accident. The main culprit is
> system_views.sql which redefines these functions and removes the parallel
> safe flag.
>
> I think this counts as a 9.6 bug unlike my work on adding the flags to all
> extensions which is for 9.7.
>
> I have attached a patch which marks them and all conversion functions as
> parallel safe. I also added the flag to ts_debug() since I was already
> editing system_views.sql; feel free to ignore that one if you like.
>
> Affected functions:
>
> - json_populate_record()
> - json_populate_recordset()
> - jsonb_insert()
> - jsonb_set()
> - make_interval()
> - parse_ident()
> - Loads of conversion functions

Hmm.  The new pg_start_backup() is not parallel-safe.  It's
parallel-restricted, because it relies on backend-private state.  I'll
go fix that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.6 and fsync=off

2016-05-02 Thread Tom Lane
Craig Ringer  writes:
> On 2 May 2016 at 22:07, Robert Haas  wrote:
>> I also think that it would be a swell idea to detect whether a system
>> has ever crashed with fsync=off, and do something about that, like
>> maybe bleat on every subsequent startup for the lifetime of the
>> cluster.

> Yes. Very, very yes.

+1 for tracking this in pg_control (maybe even with a counter, not
just a flag).  I'm less convinced that we need to bleat on every
subsequent startup though --- that seems like just nagging.
Having the info available from pg_controldata seems sufficient for
forensics.

The timestamp ideas aren't bad either.

BTW, how would this work in a standby server?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-05-02 Thread Robert Haas
On Mon, May 2, 2016 at 10:21 AM, Andres Freund  wrote:
> On 2016-05-02 09:03:19 -0400, Robert Haas wrote:
>> On Fri, Apr 29, 2016 at 6:08 PM, Kevin Grittner  wrote:
>> > Now to continue with the performance benchmarks.  I'm pretty sure
>> > we've fixed the problems when the feature is disabled
>> > (old_snapshot_threshold = -1), and there are several suggestions
>> > for improving performance while it is on that need to be compared
>> > and benchmarked.
>>
>> If anyone thinks that the issue with the feature disabled is NOT
>> fixed, please speak up!  I'm moving the corresponding open item to
>> CLOSE_WAIT status, meaning that it will be closed if nobody shows up
>> to say that there is still an issue.
>
> Well, I don't agree that the feature is in a releasable state. The
> datastructure is pretty much non-scalable, and maintained on the wrong
> side (every read, instead of once in writing xacts). There's no
> proposal actually addressing the scalability issues.

You are certainly welcome to add a new open item to cover those
complaints.  But I do not want to blur together the discussion of
whether the feature is well-designed with the question of whether it
regresses performance when it is turned off.  Those are severable
issues, meriting separate discussion (and probably separate threads).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.6 and fsync=off

2016-05-02 Thread Craig Ringer
On 2 May 2016 at 22:07, Robert Haas  wrote:


>
> I also think that it would be a swell idea to detect whether a system
> has ever crashed with fsync=off, and do something about that, like
> maybe bleat on every subsequent startup for the lifetime of the
> cluster.


Yes. Very, very yes.

That would've made my life considerably easier on a few occasions now.

It shouldn't take much more than a new pg_control field and a test during
recovery.

This should go on the TODO list, but since that's sometimes where ideas
go to die, I'm going to see if I can hack this out soon as well.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Refactor pg_dump as a library?

2016-05-02 Thread Tom Lane
Robert Haas  writes:
> On Mon, May 2, 2016 at 10:00 AM, Tom Lane  wrote:
>> Robert Haas  writes:
>>> On Mon, Apr 18, 2016 at 11:04 AM, Tom Lane  wrote:
 The problem with that approach is that then you are talking about building
 duplicate copies of entire layers of the system.

>>> Urgh.  Does ruleutils.c really depend on everything in namespace.c?

>> Indirectly, probably most of it.  For example, it uses format_type_be()
>> which depends on TypeIsVisible(), and it uses func_get_detail()
>> which depends on FuncnameGetCandidates().  And it's those intermediate
>> functions that are really bloating the depends-on footprint.
>> But really the killer point here is that it uses SPI in some places.
>> I've always wondered whether that was a good design choice, but right
>> now that implicates just about the whole backend.

> Ouch.

> Well, I think the first thing to do here might be to reconsider
> whether the footprint could be cut down.  Removing the dependency on
> SPI seems like a good idea even if we do nothing else.  Nailing the
> catalogs to a snapshot isn't crazy - the logical decoding stuff does
> it already - but having such a wide dependency footprint does not seem
> especially good.

Meh.  I'd be the first to say that probably a lot of that is because it
was convenient; but I don't see any way to avoid it without duplicating
vast quantities of code, which does not sound like a great idea from a
maintainability standpoint.  And have we mentioned the fact that some of
this code looks directly at the catalogs because there's no suitable
syscache?

Really I think the idea of fixing this with an alternate syscache is
a nonstarter.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-05-02 Thread Andres Freund
On 2016-05-02 09:03:19 -0400, Robert Haas wrote:
> On Fri, Apr 29, 2016 at 6:08 PM, Kevin Grittner  wrote:
> > Now to continue with the performance benchmarks.  I'm pretty sure
> > we've fixed the problems when the feature is disabled
> > (old_snapshot_threshold = -1), and there are several suggestions
> > for improving performance while it is on that need to be compared
> > and benchmarked.
> 
> If anyone thinks that the issue with the feature disabled is NOT
> fixed, please speak up!  I'm moving the corresponding open item to
> CLOSE_WAIT status, meaning that it will be closed if nobody shows up
> to say that there is still an issue.

Well, I don't agree that the feature is in a releasable state. The
datastructure is pretty much non-scalable, and maintained on the wrong
side (every read, instead of once in writing xacts). There's no
proposal actually addressing the scalability issues.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] About subxact and xact nesting level...

2016-05-02 Thread dandl
> From: Thomas Munro [mailto:thomas.mu...@enterprisedb.com]

> > The file xact.c contains references to sub-transactions (subxact) and
> > transaction nesting level, but no obvious documentation about what
> > these correspond to in SQL. A search shows that plpython supports
> > something called “proper sub transactions”. There are random mentions
> > of subtransactions in the release notes, but nothing substantive that
> > I can find, and nothing about transaction nesting.
> >
> > Any pointers to docs or help to understand much appreciated.
> 
> Subtransactions are used to implement SAVEPOINT, and also BEGIN blocks with
> EXCEPTION clauses in plpgsql.
> 
> http://www.postgresql.org/docs/9.5/static/sql-savepoint.html
> http://www.postgresql.org/docs/9.5/static/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING

Thanks. I guess that explains the nesting level too. It seems there is an 
internal API based on:
* BeginInternalSubTransaction
* RollbackAndReleaseCurrentSubTransaction
* ReleaseCurrentSubTransaction

This looks like something I shall need to use. I have the plandl language 
handler all working, and understanding the transaction environment is turning 
out to be a major challenge. 
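
For reference, the calling pattern the in-core PLs use around that API
looks roughly like this (a sketch, with details trimmed):

    MemoryContext oldcontext = CurrentMemoryContext;
    ResourceOwner oldowner = CurrentResourceOwner;

    BeginInternalSubTransaction(NULL);
    /* keep working in the caller's context, not the subxact's */
    MemoryContextSwitchTo(oldcontext);

    PG_TRY();
    {
        /* ... run code that may throw an error ... */
        ReleaseCurrentSubTransaction();
        MemoryContextSwitchTo(oldcontext);
        CurrentResourceOwner = oldowner;
    }
    PG_CATCH();
    {
        MemoryContextSwitchTo(oldcontext);
        FlushErrorState();
        RollbackAndReleaseCurrentSubTransaction();
        MemoryContextSwitchTo(oldcontext);
        CurrentResourceOwner = oldowner;
    }
    PG_END_TRY();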

Regards
David M Bennett FACS

Andl - A New Database Language - andl.org







-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Refactor pg_dump as a library?

2016-05-02 Thread Robert Haas
On Mon, May 2, 2016 at 10:00 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Mon, Apr 18, 2016 at 11:04 AM, Tom Lane  wrote:
>>> The problem with that approach is that then you are talking about building
>>> duplicate copies of entire layers of the system.  For example, namespace.c
>>> would have to be duplicated into one copy that uses syscache and one that
>>> uses this not-quite-cache.  If it were *only* syscache.c that had to be
>>> duplicated, probably this would work, but ruleutils.c depends on an awful
>>> lot of code above that level.  Indeed, if it did not, the idea of
>>> reimplementing it on the client side wouldn't be so unattractive.
>
>> Urgh.  Does ruleutils.c really depend on everything in namespace.c?
>
> Indirectly, probably most of it.  For example, it uses format_type_be()
> which depends on TypeIsVisible(), and it uses func_get_detail()
> which depends on FuncnameGetCandidates().  And it's those intermediate
> functions that are really bloating the depends-on footprint.  As things
> stand, ruleutils depends on significant fractions of backend/catalog/
> and backend/parser/, all of which would have to be rewritten if you'd
> like to make it use some alternate catalog-access infrastructure.
>
> But really the killer point here is that it uses SPI in some places.
> I've always wondered whether that was a good design choice, but right
> now that implicates just about the whole backend.

Ouch.

Well, I think the first thing to do here might be to reconsider
whether the footprint could be cut down.  Removing the dependency on
SPI seems like a good idea even if we do nothing else.  Nailing the
catalogs to a snapshot isn't crazy - the logical decoding stuff does
it already - but having such a wide dependency footprint does not seem
especially good.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.6 and fsync=off

2016-05-02 Thread Robert Haas
On Fri, Apr 29, 2016 at 9:49 AM, Tom Lane  wrote:
> Abhijit Menon-Sen  writes:
>> Do you want a patch along those lines now, or is it too late?
>
> We're certainly not going to consider fooling with this in 9.6.
> The situation for manual fsync-twiddling is no worse than it was in
> any prior release, and we are long past feature freeze.
>
> If you want to put it on your to-do queue for 9.7, feel free.

Agreed.

I also think that it would be a swell idea to detect whether a system
has ever crashed with fsync=off, and do something about that, like
maybe bleat on every subsequent startup for the lifetime of the
cluster.  I think Andres may have even proposed a patch for this sort
of thing before, although I don't remember for sure and I think he and
I disagreed on the details.  Sketch:

- Keep a copy of the fsync status in pg_control.
- If we ever enter recovery while it's turned off, say:
WARNING: Entering recovery with fsync=off; this cluster may be
irretrievably corrupted.
...and also set a separate flag indicating that we've done at least
one recovery with fsync=off.
- If that flag is set on a subsequent startup, say:
WARNING: Recovery was previously performed with fsync=off; this
cluster may be irretrievably corrupted.
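
In code, that sketch would amount to something like this (the pg_control
field name here is made up):

    /* when beginning WAL replay */
    if (!enableFsync)
    {
        ereport(WARNING,
                (errmsg("entering recovery with fsync=off; this cluster "
                        "may be irretrievably corrupted")));
        ControlFile->ran_recovery_with_fsync_off = true;
        UpdateControlFile();
    }

    /* at every subsequent startup */
    if (ControlFile->ran_recovery_with_fsync_off)
        ereport(WARNING,
                (errmsg("recovery was previously performed with fsync=off; "
                        "this cluster may be irretrievably corrupted")));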

While I'm kvetching, it might also be a good idea to have a timestamp
in pg_control indicating the date and time at which pg_resetxlog was
last run (and maybe the cluster creation time, too).  I run across way
too many clusters where the customer can't convincingly vouch for the
proposition that nothing evil has been done, and having some forensic
evidence available would make it easier to figure out where the blame
lies.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add jsonb_compact(...) for whitespace-free jsonb to text

2016-05-02 Thread Andrew Dunstan



On 05/02/2016 04:56 AM, Shulgin, Oleksandr wrote:
On Sun, May 1, 2016 at 3:22 AM, Andrew Dunstan wrote:



On 04/29/2016 06:11 PM, Merlin Moncure wrote:

This is a simple matter of removing spaces in the occasional C string
literal in the serialization routines and adding a json_pretty
function.


I spent a few hours on this. See

for WIP - there are three commits. No regression tests yet for the
two new functions (json_squash and json_pretty); otherwise fairly
complete. Removing whitespace generation was pretty simple for
both json and jsonb.


Looks good, thank you!

It would make sense IMO to rename FormatState's `indent' field as 
`pretty': it's being used to add whitespace between the punctuation, 
not only at start of a line.  I'd also move the "if (indent)" check 
out of add_indent(): just don't call it if no indent is needed.


I'll try to play with the patch to produce some regression tests for 
the new functions.






It was done the way it was to be as consistent as possible with how it's 
done for jsonb (cf. jsonb.c:JsonbToCStringWorker and jsonb.c:add_indent).


cheers

andrew





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Subtle bug in autoconf flex version test

2016-05-02 Thread Tom Lane
Daniel Gustafsson  writes:
>> On 02 May 2016, at 15:38, Tom Lane  wrote:
>> Hm, is that a popular flex version?  I wonder whether we will get
>> complaints if we start warning about it.

> Sorry, I missed half the sentence there.  What I meant was that I can trigger
> the warning synthetically by changing the version number just to test the
> warning; before the fix, any version is happily accepted.

Ah, I see.  I was wondering where you found a flex reporting such a
number; I was guessing it was an unreleased devel version ...

Will commit the fix in a bit.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Refactor pg_dump as a library?

2016-05-02 Thread Tom Lane
Robert Haas  writes:
> On Mon, Apr 18, 2016 at 11:04 AM, Tom Lane  wrote:
>> The problem with that approach is that then you are talking about building
>> duplicate copies of entire layers of the system.  For example, namespace.c
>> would have to be duplicated into one copy that uses syscache and one that
>> uses this not-quite-cache.  If it were *only* syscache.c that had to be
>> duplicated, probably this would work, but ruleutils.c depends on an awful
>> lot of code above that level.  Indeed, if it did not, the idea of
>> reimplementing it on the client side wouldn't be so unattractive.

> Urgh.  Does ruleutils.c really depend on everything in namespace.c?

Indirectly, probably most of it.  For example, it uses format_type_be()
which depends on TypeIsVisible(), and it uses func_get_detail()
which depends on FuncnameGetCandidates().  And it's those intermediate
functions that are really bloating the depends-on footprint.  As things
stand, ruleutils depends on significant fractions of backend/catalog/
and backend/parser/, all of which would have to be rewritten if you'd
like to make it use some alternate catalog-access infrastructure.

But really the killer point here is that it uses SPI in some places.
I've always wondered whether that was a good design choice, but right
now that implicates just about the whole backend.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Timeline following for logical slots

2016-05-02 Thread Alvaro Herrera
Robert Haas wrote:
> On Tue, Apr 26, 2016 at 4:37 PM, Alvaro Herrera
>  wrote:

> > I failed to meet this deadline, for which I apologize.  This week is
> > going to be hectic around here, so my new deadline is to get these two
> > patches applied on Friday 29th.  Ok?
> 
> Not that you don't know this already, but you have also failed to meet
> this deadline.  Also, this open item has now been promoted to the much
> sought-after status of "oldest as-yet-unaddressed PostgreSQL 9.6 open
> item".

Yay.  I thought I was going to have Friday completely available, but
that ended up not happening.

I'm working on this exclusively now until I get the item closed.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Subtle bug in autoconf flex version test

2016-05-02 Thread Daniel Gustafsson
> On 02 May 2016, at 15:38, Tom Lane  wrote:
> 
> Daniel Gustafsson  writes:
>> The PGAC_PATH_FLEX version test in config/programs.m4 tests the major
>> and minor versions with = rather than == which unless I’m missing
>> something is performing assignment rather than testing equality?
> 
> Huh.  That's been broken since forever ... thanks for noticing!
> 
>> The attached diff makes the test trigger the expected warning on
>> major/minor version on my OS X box (awk version 20070501).
> 
> Hm, is that a popular flex version?  I wonder whether we will get
> complaints if we start warning about it.

Sorry, I missed half the sentence there.  What I meant was that I can trigger
the warning synthetically by changing the version number just to test the
warning; before the fix, any version is happily accepted.

cheers ./daniel



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Timeline following for logical slots

2016-05-02 Thread Robert Haas
On Tue, Apr 26, 2016 at 4:37 PM, Alvaro Herrera
 wrote:
> Alvaro Herrera wrote:
>> Noah Misch wrote:
>>
>> > The above-described topic is currently a PostgreSQL 9.6 open item.
>> > Alvaro, since you committed the patch believed to have created it,
>> > you own this open item.  If that responsibility lies elsewhere,
>> > please let us know whose responsibility it is to fix this.  Since
>> > new open items may be discovered at any time and I want to plan to
>> > have them all fixed well in advance of the ship date, I will
>> > appreciate your efforts toward speedy resolution.  Please present,
>> > within 72 hours, a plan to fix the defect within seven days of this
>> > message.  Thanks.
>>
>> I plan to get Craig's patch applied on Monday 25th, giving me some more
>> time for study.
>
> I failed to meet this deadline, for which I apologize.  This week is
> going to be hectic around here, so my new deadline is to get these two
> patches applied on Friday 29th.  Ok?

Not that you don't know this already, but you have also failed to meet
this deadline.  Also, this open item has now been promoted to the much
sought-after status of "oldest as-yet-unaddressed PostgreSQL 9.6 open
item".

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Subtle bug in autoconf flex version test

2016-05-02 Thread Tom Lane
Daniel Gustafsson  writes:
> The PGAC_PATH_FLEX version test in config/programs.m4 tests the major
> and minor versions with = rather than == which unless I’m missing
> something is performing assignment rather than testing equality?

Huh.  That's been broken since forever ... thanks for noticing!

> The attached diff makes the test trigger the expected warning on
> major/minor version on my OS X box (awk version 20070501).

Hm, is that a popular flex version?  I wonder whether we will get
complaints if we start warning about it.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Detrimental performance impact of ringbuffers on performance

2016-05-02 Thread Robert Haas
On Fri, Apr 29, 2016 at 7:08 AM, Bruce Momjian  wrote:
> On Wed, Apr  6, 2016 at 12:57:16PM +0200, Andres Freund wrote:
>> While benchmarking on hydra
>> (c.f. 
>> http://archives.postgresql.org/message-id/20160406104352.5bn3ehkcsceja65c%40alap3.anarazel.de),
>> which has quite slow IO, I was once more annoyed by how incredibly long
>> the vacuum at the the end of a pgbench -i takes.
>>
>> The issue is that, even for an entirely shared_buffers resident scale,
>> essentially no data is cached in shared buffers. The COPY to load data
>> uses a 16MB ringbuffer. Then vacuum uses a 256KB ringbuffer. Which means
>> that copy immediately writes and evicts all data. Then vacuum reads &
>> writes the data in small chunks; again evicting nearly all buffers. Then
>> the creation of the ringbuffer has to read that data *again*.
>>
>> That's fairly idiotic.
>>
>> While it's not easy to fix this in the general case, we introduced those
>> ringbuffers for a reason after all, I think we at least should add a
>> special case for loads where shared_buffers isn't fully used yet.  Why
>> not skip using buffers from the ringbuffer if there's buffers on the
>> freelist? If we add buffers gathered from there to the ringlist, we
>> should have few cases that regress.
>>
>> Additionally, maybe we ought to increase the ringbuffer sizes again one
>> of these days? 256kb for VACUUM is pretty damn low.
>
> Is this a TODO?

I think we are in agreement that some changes may be needed, but I
don't think we necessarily know what the changes are.  So you could
say something like "improve VACUUM ring buffer logic", for example,
but I think something specific like "increase size of the VACUUM ring
buffer" will just encourage someone to do it as a beginner project,
which it really isn't.  Maybe others disagree, but I don't think this
is a slam-dunk where we can just change the behavior in 10 minutes and
expect to have winners but no losers.
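
For reference, the ring sizes under discussion are hard-coded in
GetAccessStrategy() in src/backend/storage/buffer/freelist.c;
paraphrasing the 9.6-era code:

    switch (btype)
    {
        case BAS_NORMAL:
            return NULL;                    /* no ring: normal eviction */
        case BAS_BULKREAD:
            ring_size = 256 * 1024 / BLCKSZ;        /* 256 kB */
            break;
        case BAS_BULKWRITE:
            ring_size = 16 * 1024 * 1024 / BLCKSZ;  /* 16 MB, used by COPY */
            break;
        case BAS_VACUUM:
            ring_size = 256 * 1024 / BLCKSZ;        /* the 256 kB at issue */
            break;
    }

So the mechanical change is trivial; the open question is what the sizes
(or the sizing policy) ought to be.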

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_xlog -> pg_xjournal?

2016-05-02 Thread Robert Haas
On Thu, Apr 28, 2016 at 11:46 PM, Craig Ringer  wrote:
> On 29 April 2016 at 10:12, Bruce Momjian  wrote:
>> My larger question was, was 9.6 an ideal time to do this, and if so, why
>> did this issue not get done.  If 9.6 wasn't in some way ideal, we can do
>> it in 9.7.
>
> Doing it at the very beginning of the release cycle seems like a good idea.

Yeah.  If we do this, it's going to affect quite a few bits and
pieces that know about pg_xlog, not to mention (I think) lots of
third-party tools.  So any patch doing this needs to be very carefully
reviewed before it goes into core, and on a timeline that gives
outside-of-core stuff a chance to react to it.

> I just helped another person yesterday who deleted their pg_xlog.

The biggest reason I've seen pg_xlog get deleted is not because it's
called pg_xlog, but because it's located someplace other than under
the data directory and referenced via a symlink.  Renaming it might
still make it less likely for people to get trigger happy, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Refactor pg_dump as a library?

2016-05-02 Thread Robert Haas
On Mon, Apr 18, 2016 at 11:04 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> I think that we could have an alternate set of functions which have
>> the same interface as the syscache functions but using the transaction
>> snapshot and don't actually cache anything, and it would be fine for
>> what the pg_dump support functions need.
>
> The problem with that approach is that then you are talking about building
> duplicate copies of entire layers of the system.  For example, namespace.c
> would have to be duplicated into one copy that uses syscache and one that
> uses this not-quite-cache.  If it were *only* syscache.c that had to be
> duplicated, probably this would work, but ruleutils.c depends on an awful
> lot of code above that level.  Indeed, if it did not, the idea of
> reimplementing it on the client side wouldn't be so unattractive.

Urgh.  Does ruleutils.c really depend on everything in namespace.c?
When I last looked at this I had the idea that at least a good chunk
of ruleutils.c had only limited outside dependencies that might not be
too tough to manage.  However, if that's not true, then I agree that's
a problem for this approach.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Refactor pg_dump as a library?

2016-05-02 Thread Robert Haas
On Fri, Apr 29, 2016 at 5:02 PM, Bruce Momjian  wrote:
>> > I think pg_dump is reasonably proof against DDL on tables.  It is not
>> > at all proof against DDL on other sorts of objects, such as functions,
>> > because of the fact that the syscache will follow catalog updates that
>> > occur after pg_dump's transaction snapshot.
>>
>> Hmm, OK.  I'll need to go look at that.
>>
>> I thought that the backend running the pg_dump would fill its syscache
>> when it took all the locks and then not update them during the actual
>> dump.  If that's not the case then it's a bit scary, yes.
>>
>> It seems to make a good case for physical backups vs. logical.
>
> I think another issue is that the pg_dump backend gets cache
> invalidations from other backends that cause it to reload the cache with
> new contents, so even if you pre-loaded the cache at snapshot time, you
> would still need to ignore cache invalidations from other backends.

If you temporarily nailed the backend's snapshot for syscache lookups
to some specific MVCC snapshot, as Tom was suggesting, then it
wouldn't matter if it processed invalidations, because reload would
pull the same entries as before back into the cache.  So I don't think
you'd *have* to do this, but you might want to do it as an
optimization.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Use %u to print user mapping's umid and userid

2016-05-02 Thread Robert Haas
On Thu, Apr 28, 2016 at 7:59 AM, Etsuro Fujita
 wrote:
> On 2016/03/14 17:56, Ashutosh Bapat wrote:
>> On Mon, Mar 14, 2016 at 1:29 PM, Etsuro Fujita
>> wrote:
>
>>  /*
>>   * Build the fdw_private list that will be available to the
>> executor.
>>   * Items in the list must match order in enum
>> FdwScanPrivateIndex.
>>   */
>>  fdw_private = list_make4(makeString(sql.data),
>>   retrieved_attrs,
>>   makeInteger(fpinfo->fetch_size),
>>   makeInteger(foreignrel->umid));
>>
>> I don't think it's correct to use makeInteger for the foreignrel's
>> umid.
>
>
>> As long as we are using the makeInteger() and intVal() pair to set and
>> extract the values, it should be fine.
>
> Yeah, but my concern about this is eg, print plan if debugging (ie,
> debug_print_plan=on); the umid OID will be printed with the %ld specifier,
> so on some platforms, the OID might be printed wrongly.  Maybe I'm missing
> something, though.

That seems like a legitimate, if minor, complaint.
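
To spell the width concern out (an illustrative fragment only; the OID
value is made up):

    /*
     * Oid is an unsigned 32-bit type, but makeInteger() stores a long and
     * outfuncs.c prints Integer values with %ld, so an OID above 2^31-1
     * can come out negative wherever long is only 32 bits wide.
     */
    Oid     umid = 3000000000U;             /* a perfectly legal OID */
    Value  *v = makeInteger((long) umid);   /* as fdw_private stores it */

    elog(LOG, "%ld", intVal(v));            /* may print -1294967296 */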

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-05-02 Thread Robert Haas
On Fri, Apr 29, 2016 at 6:08 PM, Kevin Grittner  wrote:
> Now to continue with the performance benchmarks.  I'm pretty sure
> we've fixed the problems when the feature is disabled
> (old_snapshot_threshold = -1), and there are several suggestions
> for improving performance while it is on that need to be compared
> and benchmarked.

If anyone thinks that the issue with the feature disabled is NOT
fixed, please speak up!  I'm moving the corresponding open item to
CLOSE_WAIT status, meaning that it will be closed if nobody shows up
to say that there is still an issue.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [COMMITTERS] pgsql: Fix planner crash from pfree'ing a partial path that a GatherPat

2016-05-02 Thread Robert Haas
On Sat, Apr 30, 2016 at 12:29 PM, Tom Lane  wrote:
> Fix planner crash from pfree'ing a partial path that a GatherPath uses.
>
> We mustn't run generate_gather_paths() during add_paths_to_joinrel(),
> because that function can be invoked multiple times for the same target
> joinrel.  Not only is it wasteful to build GatherPaths repeatedly, but
> a later add_partial_path() could delete the partial path that a previously
> created GatherPath depends on.  Instead establish the convention that we
> do generate_gather_paths() for a rel only just before set_cheapest().
>
> The code was accidentally not broken for baserels, because as of today there
> never is more than one partial path for a baserel.  But that assumption
> obviously has a pretty short half-life, so move the generate_gather_paths()
> calls for those cases as well.
>
> Also add some generic comments explaining how and why this all works.
>
> Per fuzz testing by Andreas Seltenreich.
>
> Report: <871t5pgwdt@credativ.de>

Thanks!

The preposition at the end of this sentence broke my internal English parser:

+ * This must not be called until after we're done creating all partial paths
+ * for the specified relation.  (Otherwise, add_partial_path might delete a
+ * path that some GatherPath has a reference to.)

I eventually figured out what you were saying there, and it does make sense.
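
For anyone following along, the convention the commit establishes boils
down to this shape (a sketch, not the committed hunk):

    /*
     * All partial paths for rel must be in place before this point;
     * once Gather paths have been built on top of them, no further
     * add_partial_path() call may delete a partial path out from under
     * a GatherPath.
     */
    generate_gather_paths(root, rel);
    set_cheapest(rel);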

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql :: support for \ev viewname and \sv viewname

2016-05-02 Thread Dean Rasheed
On 27 April 2016 at 08:36, Dean Rasheed  wrote:
> Here is a rough patch based on the way pg_dump handles this. It still
> needs a bit of polishing -- in particular I think fmtReloptionsArray()
> (copied from pg_dump) should probably be moved to string_utils.c so
> that it can be shared between pg_dump and psql. Also, I'm not sure
> that's the best name for it -- I think appendReloptionsArray() is a
> more accurate description of what it does.
>

Here are updated patches doing that --- the first moves
fmtReloptionsArray() from pg_dump.c to fe_utils/string_utils.c so that
it can be shared by pg_dump and psql, and renames it to
appendReloptionsArray(). The second patch fixes the actual psql bug.

Regards,
Dean
From a01c897bf2945f8ebef39bf3c58bb14e1739abf9 Mon Sep 17 00:00:00 2001
From: Dean Rasheed 
Date: Mon, 2 May 2016 11:27:15 +0100
Subject: [PATCH 1/2] Move and rename fmtReloptionsArray().

Move fmtReloptionsArray() from pg_dump.c to string_utils.c so that it
is available to other frontend code. In particular psql's \ev and \sv
commands need it to handle view reloptions. Also rename the function
to appendReloptionsArray(), which is a more accurate description of
what it does.
---
 src/bin/pg_dump/pg_backup.h |  7 
 src/bin/pg_dump/pg_dump.c   | 80 +++--
 src/fe_utils/string_utils.c | 72 +
 src/include/fe_utils/string_utils.h |  3 ++
 4 files changed, 88 insertions(+), 74 deletions(-)

diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index 83f6029..6b162a4 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -288,4 +288,11 @@ extern int	archprintf(Archive *AH, const char *fmt,...) pg_attribute_printf(2, 3
 #define appendStringLiteralAH(buf,str,AH) \
 	appendStringLiteral(buf, str, (AH)->encoding, (AH)->std_strings)
 
+#define appendReloptionsArrayAH(buf,reloptions,prefix,AH) \
+	do { \
+		if (!appendReloptionsArray(buf, reloptions, prefix, \
+   (AH)->encoding, (AH)->std_strings)) \
+			write_msg(NULL, "WARNING: could not parse reloptions array\n"); \
+	} while (0)
+
 #endif   /* PG_BACKUP_H */
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index d3f5157..3005bf5 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -261,8 +261,6 @@ static void binary_upgrade_extension_member(PQExpBuffer upgrade_buffer,
 static const char *getAttrName(int attrnum, TableInfo *tblInfo);
 static const char *fmtCopyColumnList(const TableInfo *ti, PQExpBuffer buffer);
 static bool nonemptyReloptions(const char *reloptions);
-static void fmtReloptionsArray(Archive *fout, PQExpBuffer buffer,
-   const char *reloptions, const char *prefix);
 static char *get_synchronized_snapshot(Archive *fout);
 static PGresult *ExecuteSqlQueryForSingleRow(Archive *fout, char *query);
 static void setupDumpWorker(Archive *AHX);
@@ -15046,7 +15044,7 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo)
 		if (nonemptyReloptions(tbinfo->reloptions))
 		{
 			appendPQExpBufferStr(q, " WITH (");
-			fmtReloptionsArray(fout, q, tbinfo->reloptions, "");
+			appendReloptionsArrayAH(q, tbinfo->reloptions, "", fout);
 			appendPQExpBufferChar(q, ')');
 		}
 		result = createViewAsClause(fout, tbinfo);
@@ -15301,13 +15299,14 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo)
 			if (nonemptyReloptions(tbinfo->reloptions))
 			{
 addcomma = true;
-fmtReloptionsArray(fout, q, tbinfo->reloptions, "");
+appendReloptionsArrayAH(q, tbinfo->reloptions, "", fout);
 			}
 			if (nonemptyReloptions(tbinfo->toast_reloptions))
 			{
 if (addcomma)
 	appendPQExpBufferStr(q, ", ");
-fmtReloptionsArray(fout, q, tbinfo->toast_reloptions, "toast.");
+appendReloptionsArrayAH(q, tbinfo->toast_reloptions, "toast.",
+		fout);
 			}
 			appendPQExpBufferChar(q, ')');
 		}
@@ -15908,7 +15907,7 @@ dumpConstraint(Archive *fout, ConstraintInfo *coninfo)
 			if (nonemptyReloptions(indxinfo->indreloptions))
 			{
 appendPQExpBufferStr(q, " WITH (");
-fmtReloptionsArray(fout, q, indxinfo->indreloptions, "");
+appendReloptionsArrayAH(q, indxinfo->indreloptions, "", fout);
 appendPQExpBufferChar(q, ')');
 			}
 
@@ -16809,7 +16808,7 @@ dumpRule(Archive *fout, RuleInfo *rinfo)
 	{
 		appendPQExpBuffer(cmd, "ALTER VIEW %s SET (",
 		  fmtId(tbinfo->dobj.name));
-		fmtReloptionsArray(fout, cmd, rinfo->reloptions, "");
+		appendReloptionsArrayAH(cmd, rinfo->reloptions, "", fout);
 		appendPQExpBufferStr(cmd, ");\n");
 	}
 
@@ -17704,73 +17703,6 @@ nonemptyReloptions(const char *reloptions)
 }
 
 /*
- * Format a reloptions array and append it to the given buffer.
- *
- * "prefix" is prepended to the option names; typically it's "" or "toast.".
- *
- * Note: this logic should generally match the backend's flatten_reloptions()
- * (in adt/ruleutils.c).
- */
-static void
-fmtReloptionsArray(Archive 

Re: [HACKERS] WIP: Covering + unique indexes.

2016-05-02 Thread Robert Haas
On Wed, Apr 27, 2016 at 5:47 PM, David Steele  wrote:
> On 4/27/16 5:08 PM, Peter Geoghegan wrote:
>> So, in case it needs to be
>> said, I'll say it: You've chosen a very ambitious set of projects to
>> work on, by any standard. I think it's a good thing that you've been
>> ambitious, and I don't suggest changing that, since I think that you
>> have commensurate skill. But, in order to be successful in these
>> projects, patience and resolve are very important.
>
> +1.
>
> This is very exciting work and I look forward to seeing it continue.
> The patch was perhaps not a good fit for the last CF of 9.6 but that
> doesn't mean it can't have a bright future.

+1.  Totally agreed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump dump catalog ACLs

2016-05-02 Thread Robert Haas
On Fri, Apr 22, 2016 at 3:30 AM, Peter Geoghegan  wrote:
> On Fri, Apr 22, 2016 at 12:25 AM, Noah Misch  wrote:
>> Folks run clusters with ~1000 databases; we previously accepted at least one
>> complex performance improvement[1] based on that use case.  On the faster of
>> the two machines I tested, the present thread's commits slowed "pg_dumpall
>> --schema-only --binary-upgrade" by 1-2s per database.  That doubles pg_dump
>> runtime against the installcheck regression database.  A run against a 
>> cluster
>> of one hundred empty databases slowed fifteen-fold, from 8.6s to 131s.
>> "pg_upgrade -j50" probably will keep things tolerable for the 1000-database
>> case, but the performance regression remains jarring.  I think we should not
>> release 9.6 with pg_dump performance as it stands today.
>
> As someone that is responsible for many such clusters, I strongly agree.

Stephen: This is a CRITICAL ISSUE.  Unless I'm missing something, this
hasn't gone anywhere in well over a week, and we're wrapping beta next
Monday.  Please fix it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] what does function EmitWarningsOnPlaceholders?

2016-05-02 Thread Pavel Stehule
2016-05-02 13:54 GMT+02:00 Craig Ringer :

> On 2 May 2016 at 18:51, Pavel Stehule  wrote:
>
>> Hi
>>
>> what is the sense of this function? There is no comment.
>>
>>
> What it does IIRC is complain about defined GUCs nobody registered an
> interest in. So you can say
>
> EmitWarningsOnPlaceholders("bdr")
>
> and any GUC with prefix "bdr." will result in an "unrecognized
> configuration parameter" warning in the log.
>

thank you for info

Regards

Pavel


>
>
> --
>  Craig Ringer   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>


Re: [HACKERS] Processes and caches in postgresql

2016-05-02 Thread Robert Haas
On Wed, Apr 27, 2016 at 7:07 AM, Anastasia Lubennikova
 wrote:
> Hi, hackers.
> There's a couple of questions about processes.
>
> I found the EXEC_BACKEND flag while reading the code.
> As I understood, it exists because we have to emulate fork() on WIN32.
> And also it allows to debug the same behavior on Linux.
> Is it right? Are there any other use cases?

Right.  Or, more precisely, we can't really emulate fork(), so we have
to make do with what's available on Windows, which is basically
exec().
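
Very roughly -- this is not the actual postmaster code, and the helper
names here are invented -- the difference in shape is:

    #ifdef EXEC_BACKEND
        /* no fork(): relaunch our own binary and pass state explicitly */
        save_backend_state(param_file);             /* hypothetical helper */
        start_subprocess(my_exec_path, "--fork",    /* internal switch */
                         param_file);               /* hypothetical helper */
    #else
        /* fork(): the child inherits the parent's memory for free */
        if (fork() == 0)
            run_backend();                          /* hypothetical helper */
    #endif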

> Another question is about "--fork" argument (see code below).
> I didn't find it in the documentation, so I'm a bit confused.
> I wonder if it exists only for internal purposes?

Correct.  It exists for internal purposes only - definitely not intended
for users to invoke directly.

> And the last, but not least. Do we have any
> presentations/articles/READMEs/whatever
> about caches (src/backend/utils/cache/*) in postgresql?
> I found nothing, besides comments in the code.

Nothing specific comes to mind.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] what does function EmitWarningsOnPlaceholders?

2016-05-02 Thread Craig Ringer
On 2 May 2016 at 18:51, Pavel Stehule  wrote:

> Hi
>
> what is the sense of this function? There is no comment.
>
>
What it does IIRC is complain about defined GUCs nobody registered an
interest in. So you can say

EmitWarningsOnPlaceholders("bdr")

and any GUC with prefix "bdr." will result in an "unrecognized
configuration parameter" warning in the log.
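
A minimal _PG_init() sketch (the "bdr.do_stuff" setting is made up for
illustration; DefineCustomBoolVariable() and EmitWarningsOnPlaceholders()
are the real functions):

    static bool bdr_do_stuff = false;

    void
    _PG_init(void)
    {
        /* Register the settings this module actually knows about. */
        DefineCustomBoolVariable("bdr.do_stuff",
                                 "Example boolean setting.",
                                 NULL,
                                 &bdr_do_stuff,
                                 false,
                                 PGC_SIGHUP,
                                 0,
                                 NULL, NULL, NULL);

        /* Then warn about any other "bdr." settings in the config. */
        EmitWarningsOnPlaceholders("bdr");
    }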


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


[HACKERS] Dirtying replication slots when confirm_lsn is updated (trivial, 9.7)

2016-05-02 Thread Craig Ringer
Hi all

Currently replication slots are not dirtied when the client sends
confirmation of replay. So when we checkpoint we don't bother writing out
the updated slot state unless the restart_lsn has also changed as a result
of the replay confirmation.

That's pretty fuss free for the walsender interface, but for the SQL
interface it means that pg_logical_slot_get_changes() will repeat changes
you've already seen after a restart. Even if you did a clean shutdown.

I think that's a bit ugly, and I propose a trivial change to dirty the
replication slot from pg_logical_slot_get_changes_guts(...) if called in
get mode not peek mode. That way it'll get written out next time we
checkpoint the slot state.

The current behaviour isn't a data loss risk, and apps have to be prepared
for this to happen anyway on unclean exit. So this is mostly
convenience/cosmetic; I'll add it to the first 9.7 CF.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From d31ab118ee6689503bac1a3d308677a9742180b5 Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Mon, 2 May 2016 19:50:22 +0800
Subject: [PATCH] Dirty replication slots from SQL decoding interface

---
 src/backend/replication/logical/logicalfuncs.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/backend/replication/logical/logicalfuncs.c b/src/backend/replication/logical/logicalfuncs.c
index 84b4d57..8b32f02 100644
--- a/src/backend/replication/logical/logicalfuncs.c
+++ b/src/backend/replication/logical/logicalfuncs.c
@@ -328,7 +328,10 @@ pg_logical_slot_get_changes_guts(FunctionCallInfo fcinfo, bool confirm, bool bin
 		 * business..)
 		 */
 		if (ctx->reader->EndRecPtr != InvalidXLogRecPtr && confirm)
+		{
 			LogicalConfirmReceivedLocation(ctx->reader->EndRecPtr);
+			ReplicationSlotMarkDirty();
+		}
 
 		/* free context, call shutdown callback */
 		FreeDecodingContext(ctx);
-- 
2.5.5


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] what does function EmitWarningsOnPlaceholders?

2016-05-02 Thread Pavel Stehule
Hi

what is the sense of this function? There is no comment.

Regards

Pavel


Re: [HACKERS] Add jsonb_compact(...) for whitespace-free jsonb to text

2016-05-02 Thread Shulgin, Oleksandr
On Sun, May 1, 2016 at 3:22 AM, Andrew Dunstan  wrote:

>
> On 04/29/2016 06:11 PM, Merlin Moncure wrote:
>
>> This is a simple matter of removing spaces in the occasional C string
>> literal in the serialization routines and adding a json_pretty
>> function.
>>
>
> I spent a few hours on this. See
> <https://bitbucket.org/adunstan/pgdevel/commits/branch/jsonformat> for
> WIP - there are three commits. No regression tests yet for the two new
> functions (json_squash and json_pretty). Otherwise fairly complete.
> Removing whitespace generation was pretty simple for both json and jsonb.
>

Looks good, thank you!

It would make sense IMO to rename FormatState's `indent' field as `pretty':
it's being used to add whitespace between the punctuation, not only at
start of a line.  I'd also move the "if (indent)" check out of
add_indent(): just don't call it if no indent is needed.

I'll try to play with the patch to produce some regression tests for the
new functions.

--
Alex


[HACKERS] Subtle bug in autoconf flex version test

2016-05-02 Thread Daniel Gustafsson
The PGAC_PATH_FLEX version test in config/programs.m4 tests the major and minor
versions with = rather than == which unless I’m missing something is performing
assignment rather than testing equality?  The attached diff makes the test
trigger the expected warning on major/minor version on my OS X box (awk version
20070501).
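
For anyone who hasn't tripped over this before: awk behaves like C here,
where = assigns and the condition takes the assigned value, so the
"test" is true for any nonzero right-hand side.  In C terms:

    bool version_ok = false;
    int  major = 2;

    if (major = 9)      /* assignment: condition is 9, so always true */
        version_ok = true;      /* the bug: every version passes */

    if (major == 9)     /* equality test: what was intended */
        version_ok = true;      /* taken only when major really is 9 */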

cheers ./daniel



flex_version.diff
Description: Binary data




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unused macros in contrib code

2016-05-02 Thread Heikki Linnakangas

On 28/04/16 16:06, Daniel Gustafsson wrote:

While doing some archaeology I came across a couple of macros in pgstattuple
and pageinspect contrib code which seem to have been unused for some time.
CHECK_PAGE_OFFSET_RANGE is unused in both modules, and commit 6405842, which
introduced contrib/pageinspect, moved the code that made
CHECK_RELATION_BLOCK_RANGE in pgstattuple unused.

For cleanliness' sake it seems reasonable to remove these; patch attached,
which passes make check (regular and in contrib).


Committed, thanks!

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers