Re: [HACKERS] Checksums by default?

2017-01-22 Thread Tomas Vondra

On 01/23/2017 08:30 AM, Amit Kapila wrote:
> On Sun, Jan 22, 2017 at 3:43 PM, Tomas Vondra wrote:
>>
>> That being said, I'm ready to do some benchmarking on this, so that we have
>> at least some numbers to argue about. Can we agree on a set of workloads
>> that we want to benchmark in the first round?
>
> I think if we can get data for a pgbench read-write workload when the data
> doesn't fit in shared buffers but fits in RAM, that can give us some
> indication.  We can try varying the ratio of shared buffers w.r.t. the
> data size.  This should exercise the checksum code both when buffers are
> evicted and at the next read.  I think it also makes sense to check the
> WAL data size for each of those runs.



Yes, I'm thinking that's pretty much the worst case for an OLTP-like 
workload, because it has to evict buffers from shared buffers, 
generating a continuous stream of writes. Doing that on good storage 
(e.g. a PCIe SSD, or possibly tmpfs) will further limit the storage 
overhead, making the time spent computing checksums much more 
significant. Does that make sense?


What about other types of workload? I think we should not look just at 
write-heavy workloads - I wonder what the overhead of verifying the 
checksums is in read-only workloads (again, with data that fits into RAM).
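
For reference, the read-side cost in question is essentially one checksum
recomputation per page read from disk. A minimal sketch against the
storage/checksum.h API follows; the wrapper function is only illustrative,
the real check lives in PageIsVerified() in bufpage.c:

#include "postgres.h"
#include "storage/bufpage.h"
#include "storage/checksum.h"

static bool
page_checksum_matches(Page page, BlockNumber blkno)
{
    PageHeader  phdr = (PageHeader) page;

    /* Freshly initialized pages carry no checksum yet. */
    if (PageIsNew(page))
        return true;

    /* Recompute over the whole page and compare with the stored value. */
    return phdr->pd_checksum == pg_checksum_page((char *) page, blkno);
}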


What about large data loads simulating OLAP, and exports (e.g. pg_dump)?

That leaves us with 4 workload types, I guess:

1) read-write OLTP (shared buffers < data < RAM)
2) read-only OLTP (shared buffers < data < RAM)
3) large data loads (COPY)
4) large data exports (pg_dump)

Anything else?

The other question is of course hardware - IIRC there are differences 
between CPUs. I do have a new E5-2620 v4, but perhaps it'd be good to 
also do some testing on a Power machine, or an older Intel CPU.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Checksums by default?

2017-01-22 Thread Amit Kapila
On Sun, Jan 22, 2017 at 3:43 PM, Tomas Vondra wrote:
>
> That being said, I'm ready to do some benchmarking on this, so that we have
> at least some numbers to argue about. Can we agree on a set of workloads
> that we want to benchmark in the first round?
>

I think if we can get data for a pgbench read-write workload when the data
doesn't fit in shared buffers but fits in RAM, that can give us some
indication.  We can try varying the ratio of shared buffers w.r.t. the
data size.  This should exercise the checksum code both when buffers are
evicted and at the next read.  I think it also makes sense to check the
WAL data size for each of those runs.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] macaddr 64 bit (EUI-64) datatype support

2017-01-22 Thread Kuntal Ghosh
On Thu, Jan 19, 2017 at 7:46 AM, Haribabu Kommi wrote:
>
> Updated patch attached.
>
I've looked at the updated patch. There are still some changes that
need to be made:

1. Adding macaddr8 support for btree_gist, along with test cases.
2. The implementation of macaddr8 support for btree_gin is incomplete. You
need to modify the contrib/btree_gin/*.sql files as well. Also, test cases
need to be added.

Other than that, I've tested the basic implementation of the feature.
It looks good.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Checksums by default?

2017-01-22 Thread Vladimir Borodin

> On 21 Jan 2017, at 18:18, Petr Jelinek wrote:
> 
> On 21/01/17 11:39, Magnus Hagander wrote:
>> Is it time to enable checksums by default, and give initdb a switch to
>> turn it off instead?

+1

> 
> I'd like to see a benchmark first, both in terms of CPU and in terms of
> produced WAL (= network traffic), given that it turns on logging of hint bits.

Here are the results of my testing for 9.4 in December 2014. The benchmark was 
done on a production-like use case where all the data fits in memory (~ 20 GB) 
but doesn't fit into shared_buffers (it was intentionally small - 128 MB on 
the stress stand), so that shared_blk_read / shared_blk_hit = 1/4. The 
workload was typical OLTP but with mostly write queries (80% writes, 20% reads).

Here is the number of WAL segments written during the test:

Defaults        263
wal_log_hints   410
checksums       367

I couldn't find the answer to why WAL write amplification is even worse for 
wal_log_hints than for checksums, but I checked several times and it was 
always reproducible.

As for CPU, I couldn't see any overhead [1]. And perf top showed me less than 
2% spent calculating CRCs.

For all new DBs we now enable checksums at initdb, and several dozen of our 
shards use checksums now. I don't see any performance difference for them 
compared with non-checksummed clusters. And we have already had one case where 
we caught data corruption thanks to checksums.

[1] https://yadi.sk/i/VAiWjv6t3AQCs2?lang=en

--
May the force be with you…
https://simply.name





Re: [HACKERS] Assignment of valid collation for SET operations on queries with UNKNOWN types.

2017-01-22 Thread Michael Paquier
On Mon, Jan 23, 2017 at 7:53 AM, Tom Lane  wrote:
>> 2. I didn't do anything about docs, either, though maybe no change
>> is needed.  One user-visible change from this is that queries should
>> never return any "unknown"-type columns to clients anymore.  But I
>> think that is not documented now, so maybe there's nothing to change.
>
> The only thing I could find in the SGML docs that directly addresses
> unknown-type columns was a remark in the CREATE VIEW man page, which
> I've updated.  I also added a section to typeconv.sgml to document
> the behavior.

This looks good.

>> 3. We need to look at whether pg_upgrade is affected.  I think we
>> might be able to let it just ignore the issue for views, since they'd
>> get properly updated during the dump and reload anyway.  But if someone
>> had an actual table (or matview) with an "unknown" column, that would
>> be a pg_upgrade hazard, because "unknown" doesn't store like "text".
>
> I tested and found that simple views with unknown columns seem to update
> sanely if we just let pg_upgrade dump and restore them.  So I suggest we
> allow that to happen.  There might be cases where dependent views behave
> unexpectedly after such a conversion, but I think that would be rare
> enough that it's not worth forcing users to fix these views by hand before
> updating.  However, tables with unknown columns would fail the upgrade
> (since we'd reject the CREATE TABLE command) while matviews would be
> accepted but then the DDL wouldn't match the physical data storage.
> So I added code to pg_upgrade to check for those cases and refuse to
> proceed.  This is almost a straight copy-and-paste of the existing
> pg_upgrade code for checking for "line" columns.
>
> I think this is committable now; the other loose ends can be dealt
> with in follow-on patches.  Does anyone want to review it?

I have spent a couple of hours looking at it, and it looks in pretty
good shape. The concept of doing the checks in the parser is far
cleaner than what was proposed upthread to tweak the column lists, and
more generic.

One thing though: even with this patch, it is still possible to define
a domain with unknown as underlying type and have a table grab it:
create domain name as unknown;
create table foo_name (a name);
I think that this case should be restricted as well, and pg_upgrade
had better complain with a lookup at typbasetype in pg_type.
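
A sketch of what that lookup could be, modeled on the existing "line"-column
check; executeQueryOrDie() is pg_upgrade's real helper, but the function and
query below are illustrative only (note that this simple form catches only
domains directly over unknown, not domains over such domains):

/* Hypothetical addition to pg_upgrade's check phase. */
#include "pg_upgrade.h"

static bool
db_has_unknown_domain_columns(PGconn *conn)
{
    PGresult   *res;
    bool        found;

    res = executeQueryOrDie(conn,
            "SELECT n.nspname, c.relname, a.attname "
            "FROM pg_catalog.pg_class c, "
            "     pg_catalog.pg_namespace n, "
            "     pg_catalog.pg_attribute a, "
            "     pg_catalog.pg_type t "
            "WHERE c.oid = a.attrelid AND "
            "      NOT a.attisdropped AND "
            "      a.atttypid = t.oid AND "
            "      t.typtype = 'd' AND "   /* a domain... */
            "      t.typbasetype = 'pg_catalog.unknown'::pg_catalog.regtype AND "
            "      c.relkind IN ('r', 'm') AND "   /* ...used by a table or matview */
            "      c.relnamespace = n.oid AND "
            "      n.nspname NOT IN ('pg_catalog', 'information_schema')");

    found = (PQntuples(res) != 0);
    PQclear(res);
    return found;
}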
-- 
Michael




Re: [HACKERS] [PATCH] Fix minor race in commit_ts SLRU truncation vs lookups

2017-01-22 Thread Craig Ringer
On 23 January 2017 at 12:34, Craig Ringer  wrote:
> On 20 January 2017 at 21:40, Alvaro Herrera  wrote:
>
>> One option would be to add another limit Xid which advances before the
>> truncation but which is not used for other decisions other than limiting
>> what can users consult.
>
> This could be useful for other things, but it's probably heavier than needed.
>
> What I've done in the latest revision of the txid_status() patch is
> simply to advance OldestXid _before_ truncating the clog. The rest of
> the xid info is advanced after. Currently this is incorporated into
> the txid_status patch, but can be separated if desired.
>
> Relevant commit message portion:
>
> There was previously no way to look up an arbitrary xid without
> running the risk of having clog truncated out from under you. This
> hasn't been a problem because anything looking up xids in clog knows
> they're protected by datminxid, but that's not the case for arbitrary
> user-supplied XIDs. clog was truncated before we advance oldestXid so
> taking XidGenLock was insufficient. There's no way to look up an
> SLRU with soft failure. To address this, increase oldestXid under
> XidGenLock before we truncate clog rather than after, so concurrent
> access is safe.
>
> Note that while oldestXid is advanced before clog truncation, the xid
> limits are advanced _after_ it. If we advanced the xid limits before
> truncation too, we'd theoretically run the risk of allocating an xid
> from the clog section we're about to truncate, which would be no fun.
> (In practice it can't really happen since we only use 1/2 the
> available space at a time).
>
> Moving the lower bound up, truncating, and moving the upper bound up
> is the way to go IMO.

Patch with explanation in commit msg:
https://www.postgresql.org/message-id/camsr+yhodeduua5vw1_rjps_assvemejn_34rqd3pzhf5of...@mail.gmail.com

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree

2017-01-22 Thread Jeff Davis
On Sat, Jan 21, 2017 at 4:25 AM, Andrew Borodin  wrote:
> Hello Jeff!
>
>> Review of the code itself:
> What do you think - is there anything else to improve in this patch, or
> can we mark it as 'Ready for committer'?
>
> I've checked the concurrency algorithm against the original work of Lehman
> and Yao on B-link trees. To me everything looks fine (safe, deadlock
> free, and no increased probability of livelock due to the
> LockBufferForCleanup call).

Hi,

I looked at the patch some more.

I have some reservations about the complexity and novelty of the
approach. I don't think I've seen an approach quite like this one
before. It seems like the main reason you are using this approach is
because the btree structure was too simple (no left links); is that
right? My gut is telling me we should either leave it simple, or use a
real concurrent btree implementation.

One idea I had that might be simpler is to use a two-stage page
delete. The first stage would remove the link from the parent and mark
the page deleted, but leave the right link intact and prevent
recycling. The second stage would follow the chain of right links
along each level, removing the right links to deleted pages and
freeing the page to be recycled.

Another idea is to partition the posting tree so that the root page
actually points to K posting trees. Scans would need to merge them,
but it would allow inserts to progress in one tree while the other is
being vacuumed. I think this would give good concurrency even for K=2.
I just had this idea now, so I didn't think it through very well.

What do you think?

Regards,
 Jeff Davis




Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

2017-01-22 Thread Craig Ringer
On 28 December 2016 at 18:00, Craig Ringer  wrote:
> On 23 December 2016 at 18:00, Craig Ringer  wrote:
>
>> I'll have to follow up with a patch soon, as it's Toddler o'Clock.
>
> Here we go.
>
> This patch advances oldestXid, under XidGenLock, _before_ truncating clog.
>
> txid_status() holds XidGenLock from when it tests oldestXid until it's
> done looking up clog, thus eliminating the race.
>
> CLOG_TRUNCATE records now contain the oldestXid, so they can advance
> oldestXid on a standby, or when we've truncated clog since the most
> recent checkpoint on the master during recovery. It's advanced under
> XidGenLock during redo to protect against this race on standby.
>
> As outlined in my prior mail I think this is the right approach. I
> don't like taking XidGenLock twice, but we don't advance datfrozenxid
> much so it's not a big concern. While a separate ClogTruncationLock
> could be added like in my earlier patch, oldestXid is currently under
> XidGenLock and I'd rather not change that.
>
> The biggest change here is that oldestXid is advanced separately to
> the vac limits in the rest of ShmemVariableCache. As far as I can tell
> we don't prevent two manual VACUUMs on different DBs from trying to
> concurrently run vac_truncate_clog, so this has to be safe against two
> invocations racing each other. Rather than try to lock out such
> concurrency, the patch ensures that oldestXid can never go backwards.
> It doesn't really matter if the vac limits go backwards, it's no worse
> than what can already happen in the current code.
>
> We cannot advance the vacuum limits before we truncate the clog away,
> in case someone tries to access a very new xid (if we're near
> wraparound)
>
> I'm pretty sure that commit timestamps suffer from the same flaw as
> Robert identified upthread with clog. This patch fixes the clog race,
> but not the similar one in commit timestamps. Unlike the clog race
> with txid_status(), the commit timestamps one is already potentially
> user-visible since we allow arbitrary xids to be looked up for commit
> timestamps. I'll address that separately.

Rebased patch attached. I've split the clog changes out from
txid_status() itself.

There is relevant discussion on the commit timestamp truncation fix
thread where the similar fix for commit_ts got committed.

https://www.postgresql.org/message-id/flat/979ff13d-0b8e-4937-01e8-2925c0adc306%402ndquadrant.com#979ff13d-0b8e-4937-01e8-2925c0adc...@2ndquadrant.com

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From f8e5e89145fee7afa9c629c80a3d578fc31e21b4 Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Mon, 23 Jan 2017 13:25:30 +0800
Subject: [PATCH 1/2] Fix race between clog truncation and lookup

There was previously no way to look up an arbitrary xid without
running the risk of having clog truncated out from under you.

This hasn't been a problem because anything looking up xids in clog knows
they're protected by datminxid, but that's not the case for arbitrary
user-supplied XIDs. clog was truncated before we advanced oldestXid under
XidGenLock, so holding XidGenLock during a clog lookup was insufficient to
prevent the race. There's no way to look up an SLRU with soft failure;
attempting a lookup produces an I/O error. There's also no safe way to trap
and swallow the SLRU lookup error, due mainly to locking issues.

To address this, increase oldestXid under XidGenLock before we truncate clog
rather than after, so concurrent lookups of arbitrary XIDs are safe if they are
done under XidGenLock. The rest of the xid limits are still advanced after clog
truncation to ensure there's no chance of a new xid trying to access an
about-to-be-truncated clog page. In practice this can't happen anyway since we
use only half the xid space at any time, but additional guards against future
change are warranted for something this crucial.

This race also exists in a worse form on standby servers. On a standby we only
advance oldestXid when we replay the next checkpoint, so there's a much larger
window between clog truncation and subsequent updating of the limit. Fix this
by recording the oldest xid in clog truncation records and applying the
oldestXid under XidGenLock before replaying the clog truncation.

Note that there's no need to take XidGenLock for normal clog lookups protected
by datfrozenxid, only when accepting arbitrary XIDs that might not be protected
by vacuum thresholds.
---
 src/backend/access/rmgrdesc/clogdesc.c | 12 +--
 src/backend/access/transam/clog.c  | 33 +
 src/backend/access/transam/varsup.c| 38 --
 src/backend/access/transam/xlog.c  | 17 +++
 src/backend/commands/vacuum.c  | 13 
 src/include/access/clog.h  |  5 +
 src/include/access/transam.h   |  2 ++
 

Re: [HACKERS] Parallel Index Scans

2017-01-22 Thread Amit Kapila
On Fri, Jan 20, 2017 at 7:29 AM, Haribabu Kommi wrote:
>
> On Thu, Jan 19, 2017 at 1:18 AM, Amit Kapila wrote:
>> >
>> > +extern BlockNumber _bt_parallel_seize(IndexScanDesc scan, bool
>> > *status);
>> > +extern void _bt_parallel_release(IndexScanDesc scan, BlockNumber
>> > scan_page);
>> >
>> > Any better names for the above functions, as these functions will
>> > provide/set the next page that needs to be read.
>> >
>>
>> These functions also set the state of the scan.  IIRC, these names were
>> agreed between Robert and Rahila as well (suggested offlist by
>> Robert).  I am open to changing them if you or others have any better
>> suggestions.
>
>
> I didn't find any better names other than the following,
>
> _bt_get_next_parallel_page
> _bt_set_next_parallel_page
>

I am not sure using *_next_* here will convey the message, because for
backward scans we set the last page.  I am open to changing the names
of the functions if the committer and/or others prefer the names
suggested by you.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] increasing the default WAL segment size

2017-01-22 Thread Beena Emerson
Hello,

Please find attached an updated WIP patch. I have incorporated almost all
the comments. It is to be applied on top of Robert's patches. I will post
performance results later on.

1. Shift (>>) and AND (&) operations: The assign hook of wal_segment_size
sets WalModMask and WalShiftBit. All the modulo and division operations
using XLogSegSize have been replaced with these. However, there are many
preprocessor macros in xlog_internal.h that divide by XLogSegSize. I have
not changed these because it would mean reassigning WalShiftBit along with
XLogSegSize in every module that uses those macros, which does not seem
like a good idea. It also means the shift operator can be used in only a
couple of places (see the sketch after this list).

2. pg_standby: it deals with WAL files, so I have used the file size to set
XLogSegSize (similar to pg_xlogdump).  Also, the macro
MaxSegmentsPerLogFile, which uses XLOG_SEG_SIZE, is now defined in
SetWALFileNameForCleanup where it is used. Since XLOG_SEG_SIZE is no longer
preset, the code which throws a message if the file size is greater than
XLOG_SEG_SIZE had to be removed.

3. XLOGChooseNumBuffers: This function, called during the creation of the
shared memory segment, requires XLogSegSize, which is set from the
ControlFile. Hence the ControlFile is temporarily read in XLOGShmemSize
before invoking XLOGChooseNumBuffers. The ControlFile is read again and
stored in shared memory later on.

4. IsValidXLogSegSize: This is a macro to verify XLogSegSize. It is used in
initdb, pg_xlogdump and pg_standby.

5. Macro for power of 2: There were a couple of ideas for making it
centralised. For now, I have just defined it in xlog_internal.h.

6. The CRC is used to verify the ControlFile before all of its contents are
read as is, so I do not see the need to re-verify xlog_seg_size.

7. min/max_wal_size still take values in KB units and internally store them
as a segment count, though the calculation is now shifted to their
respective assign hooks because GUC_UNIT_XSEGS had to be removed.

8. Declaring XLogSegSize: There are two internal variables for the same
parameter. In the original code, XLOG_SEG_SIZE is defined in the
auto-generated file src/include/pg_config.h, and xlog_internal.h defines:

#define XLogSegSize ((uint32) XLOG_SEG_SIZE)

To avoid renaming all parts of the code, I made the following change in
xlog_internal.h:

+ extern uint32 XLogSegSize;

+#define XLOG_SEG_SIZE XLogSegSize

Would it be better to just use one variable, XLogSegSize, everywhere? A few
external modules could be using XLOG_SEG_SIZE. Thoughts?

9. Documentation will be added in the next version of the patch.
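
To illustrate points 1 and 5, the mask/shift computation would look roughly
like this (the hook body and the power-of-2 test are sketches; WalModMask
and WalShiftBit are the names used above):

#include "postgres.h"

/* One way to write the power-of-2 test from point 5 (sketch). */
#define IsPowerOf2(x)   ((x) > 0 && (((x) & ((x) - 1)) == 0))

static uint32 WalModMask;   /* wal_segment_size - 1 */
static int    WalShiftBit;  /* log2(wal_segment_size) */

/* Illustrative assign hook for wal_segment_size (GucIntAssignHook shape). */
static void
assign_wal_segment_size(int newval, void *extra)
{
    uint32      size = (uint32) newval;

    Assert(IsPowerOf2(size));
    WalModMask = size - 1;
    WalShiftBit = 0;
    while (size > 1)
    {
        size >>= 1;
        WalShiftBit++;
    }
}

/*
 * With those set, the old arithmetic
 *     offset = lsn % XLogSegSize;    segno = lsn / XLogSegSize;
 * becomes
 *     offset = lsn & WalModMask;     segno = lsn >> WalShiftBit;
 */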
-- 

Beena Emerson

On Sat, Jan 21, 2017 at 5:30 AM, Michael Paquier wrote:

> On Sat, Jan 21, 2017 at 4:50 AM, Robert Haas wrote:
> > On Fri, Jan 20, 2017 at 2:34 AM, Michael Paquier wrote:
> >> On Fri, Jan 20, 2017 at 12:49 AM, Robert Haas wrote:
> >>> On Wed, Jan 18, 2017 at 12:42 PM, Robert Haas wrote:
>  Problems 2-4 actually have to do with a DestReceiver of type
>  DestRemote really, really wanting to have an associated Portal and
>  database connection, so one approach is to create a stripped-down
>  DestReceiver that doesn't care about those things and then passing
>  that to GetPGVariable.
> >>>
> >>> I tried that and it worked out pretty well, so I'm inclined to go with
> >>> this approach.  Proposed patches attached.  0001 adds the new
> >>> DestReceiver type, and 0002 is a revised patch to implement the SHOW
> >>> command itself.
> >>>
> >>> Thoughts, comments?
> >>
> >> This looks like a sensible approach to me. DestRemoteSimple could be
> >> useful for background workers that are not connected to a database as
> >> well. Isn't there a problem with PGC_REAL parameters?
> >
> > No, because the output of SHOW is always of type text, regardless of
> > the type of the GUC.
>
> Thinking about that overnight, that looks pretty nice. What would be
> nicer though would be to add INT8OID and BYTEAOID to the list, and
> convert the other replication commands as well. In the end, I think
> that we should finish by being able to remove all pq_* routine
> dependencies from walsender.c, saving quite a few lines.
> --
> Michael
>



-- 
Thank you,

Beena Emerson

Have a Great Day!


initdb-walsegsize-v2_WIP.patch
Description: Binary data



Re: [HACKERS] Parallel Index Scans

2017-01-22 Thread Amit Kapila
On Sat, Jan 21, 2017 at 12:23 PM, Amit Kapila  wrote:
> On Sat, Jan 21, 2017 at 1:15 AM, Robert Haas  wrote:
>>
>> I think it's a good idea to put all three of those functions together
>> in the listing, similar to what we did in
>> 69d34408e5e7adcef8ef2f4e9c4f2919637e9a06 for FDWs.  After all they are
>> closely related in purpose, and it may be easiest to understand if
>> they are next to each other in the listing.  I suggest that we move
>> them to the end in IndexAmRoutine similar to the way FdwRoutine was
>> done; in other words, my idea is to make the structure consistent with
>> the way that I revised the documentation instead of making the
>> documentation consistent with the order you picked for the structure
>> members.  What I like about that is that it gives a good opportunity
>> to include some general remarks on parallel index scans in a central
>> place, as I did in that patch.  Also, it makes it easier for people
>> who care about parallel index scans to find all of the related things
>> (since they are together) and for people who don't care about them to
>> ignore it all (for the same reason).  What do you think about that
>> approach?
>>
>
> Sounds sensible.  Updated patch based on that approach is attached.
>

In spite of being careful, I missed reorganizing the functions in
genam.h, which I have done in the attached patch.

>  I
> will rebase the remaining work based on this patch and send them
> separately.
>

Rebased patches are attached.  I have fixed a few review comments in
these patches.
parallel_index_scan_v6 - Changed the function btparallelrescan so that
it always expects a valid parallel scan descriptor.
parallel_index_opt_exec_support_v6 - Removed the usage of
GatherSupportsBackwardScan.  Expanded the comments in
ExecReScanIndexScan.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


parallel-generic-index-scan.2.patch
Description: Binary data


parallel_index_scan_v6.patch
Description: Binary data


parallel_index_opt_exec_support_v6.patch
Description: Binary data



Re: [HACKERS] Declarative partitioning - another take

2017-01-22 Thread Amit Langote
On 2017/01/21 6:29, Robert Haas wrote:
> On Fri, Jan 20, 2017 at 1:15 AM, Andres Freund  wrote:
>> On 2017-01-19 14:18:23 -0500, Robert Haas wrote:
>>> Committed.
>>
>> One of the patches around this topic committed recently seems to cause
>> valgrind failures like
>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink=2017-01-19%2008%3A40%3A02
>> :
> 
> Tom Lane independently reported this same issue.  I believe I've fixed
> it.  Sorry for not noticing and crediting this report also.

Thanks Robert for looking at this and committing the fix.

Regards,
Amit






Re: [HACKERS] Valgrind-detected bug in partitioning code

2017-01-22 Thread Amit Langote
On 2017/01/21 9:01, Tom Lane wrote:
> Robert Haas  writes:
>> The difference is that those other equalBLAH functions call a
>> carefully limited amount of code whereas, in looking over the
>> backtrace you sent, I realized that equalPartitionDescs is calling
>> partition_bounds_equal which does this:
>>cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[j],
>>  key->partcollation[j],
>>  b1->datums[i][j],
>>  b2->datums[i][j]))
> 
> Ah, gotcha.
> 
>> That's of course opening up a much bigger can of worms.  But apart
>> from the fact that it's unsafe, I think it's also wrong, as I said
>> upthread.  I think calling datumIsEqual() there should be better all
>> around.  Do you think that's unsafe here?
> 
> That sounds like a plausible solution.  It is safe in the sense of
> being a bounded amount of code.  It would return "false" in various
> interesting cases like toast pointer versus detoasted equivalent,
> but I think that would be fine in this application.

Sorry for jumping in late.  The attached patch replaces the call to the
partitioning-specific comparison function with a call to datumIsEqual().
I wonder if it is safe to assume that datumIsEqual() would return true for
a datum and a copy of it made using datumCopy().  The latter is used to copy
a single datum from a bound's Const node (which is what is stored in the
catalog for every bound).

> It would probably be a good idea to add something to datumIsEqual's
> comment to the effect that trying to make it smarter would be a bad idea,
> because some callers rely on it being stupid.

I assume "making datumIsEqual() smarter" here means making it account for
toasting of varlena datums, which is not a good idea because some of its
callers may be working in the context of an aborted transaction.  I tried
to update the header comment along these lines, though please feel free to
rewrite it.

Thanks,
Amit
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index ad95b1bc55..8c9a373f1f 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -639,12 +639,14 @@ partition_bounds_equal(PartitionKey key,
 	continue;
 			}
 
-			/* Compare the actual values */
-			cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[j],
-													 key->partcollation[j],
-													 b1->datums[i][j],
-													 b2->datums[i][j]));
-			if (cmpval != 0)
+			/*
+			 * Compare the actual values; note that we do not invoke the
+			 * partitioning specific comparison operator here, because we
+			 * could be in an aborted transaction.
+			 */
+			if (!datumIsEqual(b1->datums[i][j], b2->datums[i][j],
+			  key->parttypbyval[j],
+			  key->parttyplen[j]))
 return false;
 		}
 
diff --git a/src/backend/utils/adt/datum.c b/src/backend/utils/adt/datum.c
index 535e4277cc..071a7d4db1 100644
--- a/src/backend/utils/adt/datum.c
+++ b/src/backend/utils/adt/datum.c
@@ -209,6 +209,10 @@ datumTransfer(Datum value, bool typByVal, int typLen)
  * of say the representation of zero in one's complement arithmetic).
  * Also, it will probably not give the answer you want if either
  * datum has been "toasted".
+ *
+ * Do not try to make this any smarter than it currently is with respect
+ * to "toasted" datums, because some of the callers could be working in the
+ * context of an aborted transaction.
  *-
  */
 bool



Re: [HACKERS] Logical replication launcher's bgworker enabled by default, and max_logical_replication_workers

2017-01-22 Thread Craig Ringer
On 23 January 2017 at 13:19, Jaime Casanova wrote:
> On 22 January 2017 at 23:37, Michael Paquier wrote:
>> Hi all,
>>
>> When spawning a new instance, I found the following thing, which is
>> surprising at first sight:
>> postgres: bgworker: logical replication launcher
>>
>
> This is because the downstream needs it
> https://www.postgresql.org/message-id/CAMsr%2BYHH2XRUeqWT6pn_X0tFpP5ci7Fsrsn67TDXbFLeMknhBA%40mail.gmail.com

... and the launcher is responsible for launching workers for downstreams.

We could probably have the postmaster check whether any logical
replication downstreams exist anywhere and avoid starting the
launcher, but that means the postmaster has to start poking in the
logical replication catalog tables. That seems unnecessarily risky.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Logical replication launcher's bgworker enabled by default, and max_logical_replication_workers

2017-01-22 Thread Jaime Casanova
On 22 January 2017 at 23:37, Michael Paquier  wrote:
> Hi all,
>
> When spawning a new instance, I found the following thing, which is
> surprising at first sight:
> postgres: bgworker: logical replication launcher
>

This is because the downstream needs it
https://www.postgresql.org/message-id/CAMsr%2BYHH2XRUeqWT6pn_X0tFpP5ci7Fsrsn67TDXbFLeMknhBA%40mail.gmail.com

> In the same range of thoughts, it is also surprising to have the
> default value of max_logical_replication_workers set to 4; I would
> have thought that a feature that has created more than 13k lines of
> code diffs would be disabled by default.
>

+1, we should set that to 0 before release

-- 
Jaime Casanova  www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal

2017-01-22 Thread Michael Paquier
On Mon, Jan 23, 2017 at 2:06 PM, Venkata B Nagothi  wrote:
>
> On Sat, Jan 14, 2017 at 5:10 AM, Vladimir Rusinov wrote:
>>
>> Attached are two new versions of the patch: one keeps aliases, one doesn't.
>
>
> Both patches (with and without aliases) fail to apply to the latest
> master. Below is the error -
>
> Perhaps you used the wrong -p or --strip option?
> The text leading up to this was:
> --
> |diff --git a/src/include/access/xlog_fn.h b/src/include/access/xlog_fn.h
> |index a013d047a2..a46b2f995c 100644
> |--- a/src/include/access/xlog_fn.h
> |+++ b/src/include/access/xlog_fn.h
> --
> File to patch:

xlog_fn.h is not necessary anymore as all the declarations of
functions defined in pg_proc.h are now generated automatically, so the
conflict is simple enough to resolve.
-- 
Michael




Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal

2017-01-22 Thread Venkata B Nagothi
On Sat, Jan 14, 2017 at 5:10 AM, Vladimir Rusinov wrote:

> Attached are two new versions of the patch: one keeps aliases, one doesn't.
>

Both patches (with and without aliases) fail to apply to the latest
master. Below is the error -

Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--
|diff --git a/src/include/access/xlog_fn.h b/src/include/access/xlog_fn.h
|index a013d047a2..a46b2f995c 100644
|--- a/src/include/access/xlog_fn.h
|+++ b/src/include/access/xlog_fn.h
--
File to patch:


> Also, remove the stray reference to an xlog function in one of the tests.
>
> I've lost count of the votes. Should we create a form to work out which one
> of the patches should be committed?
>
> If we decide to go the extension way, perhaps it can be maintained outside
> of core Postgres?
>

My take on having aliases for the functions:

In my opinion as a DBA, I agree with having no aliases. Having functions
doing the same thing under two different names could be confusing. I have
been doing quite a number of PostgreSQL upgrades over the past few years,
and I do not see the function name changes as a major issue or road-blocker
in the upgrade exercise. If the function name has changed, it has changed.
It would be a serious thing to consider during an upgrade if there were a
change in the functionality of a particular function, as that needs testing.
Changing the function names in scripts and custom tools has to be done and
might take a bit of extra time.

IMHO, it is not a good idea to keep aliases, as this would make DBAs lazier
about changing the function names promptly in automated scripts/jobs/tools
during upgrades. Especially in bigger, more complex environments where
database environments are handled by multiple groups of DBAs, it is quite
possible that both *xlog* functions and *wal* functions would end up being
used in scripts, all working fine until the *xlog* function names are
completely removed, at which point some of the jobs/scripts will start
failing.

I would vote for "no-aliases".

Regards,
Venkata Balaji N

Database Consultant


Re: [HACKERS] Commit fest 2017-01 will begin soon!

2017-01-22 Thread Stephen Frost
Michael, all,

* Michael Paquier (michael.paqu...@gmail.com) wrote:
> There is one week left for this commit fest, and now is the time to
> push for things that have a chance to get in. As of now, there is a
> total of 25 patches waiting for some love from a committer.

Many thanks for your work as CFM this month!

> In the category of Clients, there are some as well:
> - pgbench - more operators and functions.

I don't mind working to review these, but my recollection is that
there are actually still open questions about just what operators, what
functions, and how we want to handle them.  In other words, this
particular patch, imv, could really use a 'summary' email, most likely
from the author, covering the various opinions raised and the different
view-points presented.  Otherwise, I'm not sure it's really going to
make much more progress.

> - pgpassfile connection option.
> - pg_dump, pg_dumpall and data durability.

I'm interested in these.  I had thought the data durability one had
another committer already interested and involved, but if not, I may get
to it.

> Now for Miscellaneous:
> - Rename pg_switch_xlog to pg_switch_wal; the name is misleading here,
> as this is basically renaming all the system functions containing the
> term "xlog" to use "wal" instead.

I'm a definite +1 on that happening but, for as much as it's marked
'ready for committer', it's entirely without consensus.  I'm not sure
how to resolve that - we could ask some committee to investigate it
further and provide a final determination (core? some other set of
folks?), or simply tally the votes cast and go with the result, or
change the options (and then figure out how to decide between whatever
the new options are..).

> For Monitoring & Control:
> - amcheck (B-Tree integrity checking tool). This patch is in such a
> state for a rather long time...

I'm a big +1 on having this included.  In my view, this is similar to
the 'enable checksums by default' question, but without nearly *all* of
the downside.  There's no performance impact, there's no usability
issue; the only concerns that I see off-hand are serious bugs that
screw things up and the possibility that it throws an error when nothing
is wrong, neither of which looks very likely considering it's been used
extensively by a number of people, and quite a bit at Heroku, as I
understand it.  At this point, I tend to doubt that we're going to see
such errors without the broader community embracing and using it, which
means, to me at least, that it's ready to go into core/contrib.

> For server features:
> - Forbid use of LF and CR characters in database and role names.

I'll try and take a look at this.  In other words, for me, this comes
before the pgpassfile/pg_dump data durability stuff.  I have a couple
other things to clear off my plate, but I expect to get those taken care
of and still have time to handle this too.

> As of the current score of the CF, for a total of 155 patches, 52 have
> been committed, 3 rejected and 7 marked as returned with feedback. It
> may look low, but from experience with those numbers it is actually not
> that bad.

We've certainly had some really big patches land during this CF.  I'm
hopeful that we won't have any "really big" patches landing for the last
CF, or PG10 is going to be difficult to wrestle out the door.

Thanks again, Michael!

Stephen




Re: [HACKERS] Parallel bitmap heap scan

2017-01-22 Thread Dilip Kumar
On Thu, Jan 19, 2017 at 12:26 AM, Robert Haas  wrote:
>> Patches 0001 and 0003 required rebasing on the latest head. 0002 is
>> still the same.
>
> I've committed the first half of 0001.
Thanks.  0001 and 0003 required rebasing after this commit.



-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


0001-opt-parallelcost-refactoring-v12.patch
Description: Binary data


0002-hash-support-alloc-free-v12.patch
Description: Binary data


0003-parallel-bitmap-heap-scan-v12.patch
Description: Binary data



[HACKERS] Logical replication launcher's bgworker enabled by default, and max_logical_replication_workers

2017-01-22 Thread Michael Paquier
Hi all,

When spawning a new instance, I found the following thing, which is
surprising at first sight:
postgres: bgworker: logical replication launcher

There is perhaps no problem with keeping that enabled by default until
release 10 wraps, to give it some buildfarm coverage similarly to what
was done last year for parallel query, but what surprises me is that
this gets started even if wal_level is *not* logical. I think that
something like the attached patch is needed, so that
ApplyLauncherRegister() is a no-op if wal_level < logical.
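
The shape of the proposed guard is simple; sketched below (the attached
patch is authoritative, and the exact placement here is an assumption):

#include "postgres.h"
#include "access/xlog.h"    /* wal_level, WAL_LEVEL_LOGICAL */
#include "replication/logicallauncher.h"

void
ApplyLauncherRegister(void)
{
    /* Proposed: do nothing unless logical decoding is possible. */
    if (wal_level < WAL_LEVEL_LOGICAL)
        return;

    /* ... existing BackgroundWorker setup and registration follows ... */
}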

In the same range of thoughts, it is also surprising to have the
default value of max_logical_replication_workers set to 4; I would
have thought that a feature that has created more than 13k lines of
code diffs would be disabled by default.

Thanks,
-- 
Michael


logirep-bgworker.patch
Description: application/download



Re: [HACKERS] [PATCH] Fix minor race in commit_ts SLRU truncation vs lookups

2017-01-22 Thread Craig Ringer
On 20 January 2017 at 21:40, Alvaro Herrera  wrote:

> One option would be to add another limit Xid which advances before the
> truncation but which is not used for other decisions other than limiting
> what can users consult.

This could be useful for other things, but it's probably heavier than needed.

What I've done in the latest revision of the txid_status() patch is
simply to advance OldestXid _before_ truncating the clog. The rest of
the xid info is advanced after. Currently this is incorporated into
the txid_status patch, but can be separated if desired.

Relevant commit message portion:

There was previously no way to look up an arbitrary xid without
running the risk of having clog truncated out from under you. This
hasn't been a problem because anything looking up xids in clog knows
they're protected by datminxid, but that's not the case for arbitrary
user-supplied XIDs. clog was truncated before we advance oldestXid so
taking XidGenLock was insufficient. There's no way to look up an
SLRU with soft failure. To address this, increase oldestXid under XidGenLock
before we truncate clog rather than after, so concurrent access is safe.

Note that while oldestXid is advanced before clog truncation, the xid
limits are advanced _after_ it. If we advanced the xid limits before
truncation too, we'd theoretically run the risk of allocating an xid
from the clog section we're about to truncate, which would be no fun.
(In practice it can't really happen since we only use 1/2 the
available space at a time).

Moving the lower bound up, truncating, and moving the upper bound up
is the way to go IMO.
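
In code terms, the ordering looks roughly like this (a sketch using the
real lock and entry points; the wrapper function itself is illustrative,
not the actual patch):

#include "postgres.h"
#include "access/clog.h"
#include "access/transam.h"
#include "storage/lwlock.h"

/* Illustrative wrapper; frozenXID is the new datfrozenxid-based cutoff. */
static void
truncate_clog_safely(TransactionId frozenXID, Oid oldest_datoid)
{
    /*
     * 1. Move the lower bound up first, under XidGenLock, so that lookups
     * of arbitrary xids done under the same lock can no longer be pointed
     * at soon-to-be-truncated pages.  Never move it backwards.
     */
    LWLockAcquire(XidGenLock, LW_EXCLUSIVE);
    if (TransactionIdPrecedes(ShmemVariableCache->oldestXid, frozenXID))
        ShmemVariableCache->oldestXid = frozenXID;
    LWLockRelease(XidGenLock);

    /* 2. Now the old clog segments can actually be removed. */
    TruncateCLOG(frozenXID);

    /*
     * 3. Only afterwards advance the wraparound/vacuum limits, so no new
     * xid can be handed out in the range just truncated away.
     */
    SetTransactionIdLimit(frozenXID, oldest_datoid);
}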

>  Another option is not to implement direct reads
> from the clog.

I think there's a pretty decent argument for having clog lookups;
txid_status(...) serves as a useful halfway position between accepting
indeterminate commit status on connection loss and using full 2PC.

> Yet another option is that before we add such interface
> somebody produces proof that the problem does not, in fact, exist.

It does exist.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Commit fest 2017-01 will begin soon!

2017-01-22 Thread Tsunakawa, Takayuki
> From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Michael Paquier
> - Cascading standby cannot catch up and get stuck emitting the same message
> repeatedly, a 9.3-only bug fix.

Thank you for your hard work as CFM.  A trivial note: this is for 9.2 
only, not 9.3.

Regards
Takayuki Tsunakawa




Re: [HACKERS] [PATCH] Fix minor race in commit_ts SLRU truncation vs lookups

2017-01-22 Thread Craig Ringer
On 20 January 2017 at 05:32, Peter Eisentraut wrote:
> On 1/19/17 10:06 AM, Alvaro Herrera wrote:
>>> Also, I wonder whether we should not in vacuum.c change the order of the
>>> calls of SetTransactionIdLimit() and SetMultiXactIdLimit() as well, just
>>> to keep everything consistent.
>>
>> I am wary of doing that.  The current coding is well battle-tested by
>> now, but doing things in the opposite order, not at all.  Pending some
>> testing to show that there is no problem with a change, I would leave
>> things as they are.
>
> I appreciate this hesitation.
>
> The issue the patch under discussion is addressing is that we have a
> user-space interface to read information about commit timestamps.  So if
> we truncate away old information before we update the mark where the
> good information starts, then we get the race.  We don't have a
> user-space interface for reading, say, the clog, but it doesn't seem
> implausible for one to exist at some point.  How would this code have
> to be structured then?

I have a patch in the current commitfest that exposes just such a user
interface, txid_status() .

See https://commitfest.postgresql.org/12/730/ .

Robert was about to commit when he identified this race in commit
status lookup, which led me to identify the same race addressed here
for commit timestamps.

>> What I fear is: what
>> happens if a concurrent checkpoint reads the values between the two
>> operations, and a crash occurs?  I think that the checkpoint might save
>> the updated values, so after crash recovery the truncate would not be
>> executed, possibly leaving files around.  Leaving files around might be
>> dangerous for multixacts at least (it's probably harmless for xids).
>
> Why is leaving files around dangerous?

As far as I can tell, leaving the files around as such isn't dangerous.

The problem arises if we wrap around and start treating old SLRU
contents as new again without first truncating them away.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Fix a comment in freelist.c

2017-01-22 Thread Tatsuo Ishii
> I've found a mistake in a comment on StrategyNotifyBgWriter
> in freelist.c. bgwriterLatch was replaced by bgwprocno in
> the following commit, but the old name remains in the comment.
> 
> commit d72731a70450b5e7084991b9caa15cb58a2820df
> Author: Andres Freund 
> Date:   Thu Dec 25 18:24:20 2014 +0100
> 
> Lockless StrategyGetBuffer clock sweep hot path.

Looks good to me. I will commit/push the patch if there's no
objection.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] Commit fest 2017-01 will begin soon!

2017-01-22 Thread Michael Paquier
On Wed, Jan 4, 2017 at 4:02 PM, Michael Paquier
 wrote:
> Here are now the current stats of the patches:
> - Needs review: 82.
> - Waiting on Author: 18.
> - Ready for Committer: 18.
> Meaning that some effort is required from committers and reviewers to
> get into a better balance. For the authors of the patches waiting for
> their input, please avoid letting your patch in a latent state for too
> long.
>
> And of course, please double-check that the status of your patch on
> the CF app (https://commitfest.postgresql.org/12/) is set according to
> the status of the thread.
>
> Let's have a productive CF for Postgres 10! And the show begins..
> Thanks.

There is one week left for this commit fest, and now is the time to
push for things that have a chance to get in. As of now, there is a
total of 25 patches waiting for some love from a committer.

In the category of Bug Fixes, there are the following things that
deserve attention:
- pgwin32_is_service not checking if SECURITY_SERVICE_SID is disabled,
Windows-only patch, and a bug fix.
- Cascading standby cannot catch up and get stuck emitting the same
message repeatedly, a 9.3-only bug fix.
- Potential data loss of 2PC files, a bug fix to address the fact that
pg_twophase is never fsync()'d.
All those patches apply on multiple branches.

In the category of Clients, there are some as well:
- pgbench - more operators and functions.
- pgpassfile connection option.
- pg_dump, pg_dumpall and data durability.

Now for Miscellaneous:
- slab-like memory allocators for reorderbuffer
- Rename pg_switch_xlog to pg_switch_wal; the name is misleading here,
as this is basically renaming all the system functions containing the
term "xlog" to use "wal" instead.
- Add GUCs for predicate lock promotion thresholds
- An isolation test for SERIALIZABLE READ ONLY DEFERRABLE

For Monitoring & Control:
- amcheck (B-Tree integrity checking tool). This patch is in such a
state for a rather long time...

For Performance:
- Group mode for CLOG updates.
- Vacuum: allow usage of more than 1GB of work mem.
- Parallel Index Scans.
- Unique Joins

For Replication & Recovery:
- Write Ahead Logging for Hash Indexes.
- WAL consistency check facility
- Reset slot xmin when HS feedback is turned off while standby is shut down

For server features:
- Forbid use of LF and CR characters in database and role names.
- XMLTABLE function
- background sessions
- pg_background contrib module proposal
- A contrib extension to help prevent common errors in DML
- Add pg_current_logfile() function to obtain the current log file
used by the syslogger.
- pg_xlogdump follow into the future when specifying filename. Magnus,
are you going to take care of that?

As of the current score of the CF, for a total of 155 patches, 52 have
been committed, 3 rejected and 7 marked as returned with feedback. It
may look low, but from experience with those numbers it is actually not
that bad.

There is also a very high number of patches waiting for input from their
authors, so please be sure to update your patch status correctly, or
send a new version addressing the reviewers' comments.
-- 
Michael




Re: [HACKERS] pg_dump, pg_dumpall and data durability

2017-01-22 Thread Michael Paquier
On Tue, Nov 29, 2016 at 1:30 PM, Michael Paquier wrote:
> On Mon, Nov 14, 2016 at 6:07 PM, Michael Paquier wrote:
>> On Mon, Nov 14, 2016 at 6:04 PM, Albe Laurenz wrote:
>>> Michael Paquier wrote:
 Meh. I forgot docs and --help output updates.
>>>
>>> Looks good, except that you left the "N" option in the getopt_long
>>> call for pg_dumpall.  I fixed that in the attached patch.
>>
>> No, v5 has removed it, but it does not matter much now...
>>
>>> I'll mark the patch "ready for committer".
>>
>> Thanks!
>
> Moved to CF 2017-01.

Moved to CF 2017-03 with the same status, ready for committer. Perhaps
there is some interest in this feature? v5 of the patch still applies,
with a couple of hunks, so v6 is attached.
-- 
Michael


pgdump-sync-v6.patch
Description: application/download



Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol

2017-01-22 Thread Michael Paquier
On Wed, Jan 18, 2017 at 2:46 PM, Michael Paquier wrote:
> FWIW, this patch is in a "waiting on author" state, and that's right.
> As the discussion on SASLprepare() and the decisions regarding the way
> to implement it, or at least have it, are still pending, I am not
> planning to move on with any implementation until we have a plan for
> what to do. Just using libidn (LGPL) for a first shot is rather
> painless but... I am not alone here.

With decisions on this matter pending, I am marking this patch as
"returned with feedback". If there is a consensus on what to do, I'll
be happy to do the implementation with the last CF in March in sight.
If not, that would mean this feature will not be part of PG 10.
-- 
Michael




[HACKERS] Enabling replication connections by default in pg_hba.conf

2017-01-22 Thread Michael Paquier
Hi all,

As wal_level = replica has now become the default for Postgres 10,
could we consider making replication connections enabled by default in
pg_hba.conf as well? This requires just uncommenting a couple of lines
in pg_hba.conf.sample.
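
For reference, the lines in question look roughly like this in an installed
cluster (the user name is substituted by initdb):

# Allow replication connections from localhost, by a user with the
# replication privilege.
#local   replication     postgres                                trust
#host    replication     postgres        127.0.0.1/32            trust
#host    replication     postgres        ::1/128                 trust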
Thanks,
-- 
Michael


pghba_rep_default.patch
Description: application/download



Re: [HACKERS] Checksums by default?

2017-01-22 Thread Tsunakawa, Takayuki
> From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Magnus Hagander
> Is it time to enable checksums by default, and give initdb a switch to turn
> it off instead?
> 
> I keep running into situations where people haven't enabled it, because
> (a) they didn't know about it, or (b) their packaging system ran initdb
> for them so they didn't even know they could. And of course they usually
> figure this out once the db has enough data and traffic that the only way
> to fix it is to set up something like slony/bucardo/pglogical and a whole
> new server to deal with it.. (Which is something that would also be good
> to fix -- but having the default changed would be useful as well)

+10
I was wondering why the community had decided to turn it off by default.  IIRC, 
the reason was that the performance overhead was 20-30% when the entire data 
directory was placed on tmpfs, but that's not as important as having data 
protection by default.

Regards
Takayuki Tsunakawa





Re: [HACKERS] Logical Replication WIP

2017-01-22 Thread Petr Jelinek
On 20/01/17 22:30, Petr Jelinek wrote:
> Since it's not exactly straightforward to find when these need to be
> initialized based on commands, I decided to move the initialization code
> to exec_replication_command(), since that's always called before anything
> else, which makes it much less error prone (patch 0003).
> 
> The 0003 should be backpatched all the way to 9.4, where multiple
> commands started using those buffers.
> 

Actually, there is a better place: WalSndInit().

Just to make it easier for PeterE (or whichever committer picks this up),
I have attached all the logical replication follow-up fix/polish patches:

0001 - Changes libpqrcv_connect to use the async libpq API so that it
won't get stuck forever if the connection attempt hangs. This is a
preexisting bug that also affects the walreceiver, but it's less visible
there as there is no SQL interface to initiate a connection.

0002 - Closes the replication connection when CREATE SUBSCRIPTION gets
canceled (otherwise the walsender on the other side may stay in an
idle-in-transaction state).

0003 - Fixes buffer initialization in the walsender, which I found when
testing the above two. This one should be back-patched to 9.4 since it
has been broken since then.

0004 - Fixes the foreign key issue reported by Thom Brown and also adds
tests for FK and trigger handling.

0005 - Adds support for renaming publications and subscriptions.

All rebased on top of current master (90992e0).

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
From e2c30b258e97fbba51c8d0f6e12ac557cafb129a Mon Sep 17 00:00:00 2001
From: Petr Jelinek 
Date: Fri, 20 Jan 2017 21:20:56 +0100
Subject: [PATCH 1/5] Use asynchronous connect API in libpqwalreceiver

---
 src/backend/postmaster/pgstat.c|  4 +-
 .../libpqwalreceiver/libpqwalreceiver.c| 58 +-
 src/include/pgstat.h   |  2 +-
 3 files changed, 59 insertions(+), 5 deletions(-)

diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 7176cf1..19ad6b5 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3340,8 +3340,8 @@ pgstat_get_wait_client(WaitEventClient w)
 		case WAIT_EVENT_WAL_RECEIVER_WAIT_START:
 			event_name = "WalReceiverWaitStart";
 			break;
-		case WAIT_EVENT_LIBPQWALRECEIVER_READ:
-			event_name = "LibPQWalReceiverRead";
+		case WAIT_EVENT_LIBPQWALRECEIVER:
+			event_name = "LibPQWalReceiver";
 			break;
 		case WAIT_EVENT_WAL_SENDER_WAIT_WAL:
 			event_name = "WalSenderWaitForWAL";
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 7df3698..8dc51d4 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -112,6 +112,7 @@ libpqrcv_connect(const char *conninfo, bool logical, const char *appname,
  char **err)
 {
 	WalReceiverConn *conn;
+	PostgresPollingStatusType status;
 	const char *keys[5];
 	const char *vals[5];
 	int			i = 0;
@@ -138,7 +139,60 @@ libpqrcv_connect(const char *conninfo, bool logical, const char *appname,
 	vals[i] = NULL;
 
 	conn = palloc0(sizeof(WalReceiverConn));
-	conn->streamConn = PQconnectdbParams(keys, vals, /* expand_dbname = */ true);
+	conn->streamConn = PQconnectStartParams(keys, vals,
+			/* expand_dbname = */ true);
+	/* Check for conn status. */
+	if (PQstatus(conn->streamConn) == CONNECTION_BAD)
+	{
+		*err = pstrdup(PQerrorMessage(conn->streamConn));
+		return NULL;
+	}
+
+	/* Poll connection. */
+	do
+	{
+		int			rc;
+		int			extra_flag;
+
+		/* Determine which function we should use for polling. */
+		status = PQconnectPoll(conn->streamConn);
+
+		/* Next action based upon status value. */
+		switch (status)
+		{
+			case PGRES_POLLING_READING:
+			case PGRES_POLLING_WRITING:
+				extra_flag = (status == PGRES_POLLING_READING) ?
+					WL_SOCKET_READABLE :
+					WL_SOCKET_WRITEABLE;
+
+				ResetLatch(&MyProc->procLatch);
+				rc = WaitLatchOrSocket(&MyProc->procLatch,
+									   WL_POSTMASTER_DEATH |
+									   WL_LATCH_SET | extra_flag,
+									   PQsocket(conn->streamConn),
+									   0,
+									   WAIT_EVENT_LIBPQWALRECEIVER);
+				if (rc & WL_POSTMASTER_DEATH)
+					exit(1);
+
+				/* Interrupted. */
+				if (rc & WL_LATCH_SET)
+				{
+					CHECK_FOR_INTERRUPTS();
+					break;
+				}
+				break;
+
+			/* Either can continue polling or finished one way or another. */
+			default:
+				break;
+		}
+
+		/* Loop until we have OK or FAILED status. */
+	} while (status != PGRES_POLLING_OK && status != PGRES_POLLING_FAILED);
+
+	/* Check the status. */
 	if (PQstatus(conn->streamConn) != CONNECTION_OK)
 	{
 		*err = pstrdup(PQerrorMessage(conn->streamConn));
@@ -508,7 +562,7 @@ libpqrcv_PQexec(PGconn *streamConn, const char *query)
    WL_LATCH_SET,
    PQsocket(streamConn),
    0,
-	

\if, \elseif, \else, \endif (was Re: [HACKERS] PSQL commands: \quit_if, \quit_unless)

2017-01-22 Thread Corey Huinker
>
>
> Fabien is pressed for time, so I've been speaking with him out-of-thread
> about how I should go about implementing it.
>
> The v1 patch will be \if <expr>, \elseif <expr>, \else, \endif, where
> <expr> will be naively evaluated via ParseVariableBool().
>
> \ifs and \endifs must be in the same "file" (each MainLoop will start a
> new if-stack). This is partly for sanity (you can see the pairings unless
> the programmer is off in \gset meta-land), partly for ease of design (data
> structures live in MainLoop), but mostly because it would be an absolute
> requirement if we ever got around to doing \while.
>
> I hope to have something ready for the next commitfest.
>
> As for the fate of \quit_if, I can see it both ways. On the one hand,
> it's super-simple, already written, and handy.
>
> On the other hand, it's easily replaced by
>
> \if 
> \q
> \endif
>
>
> So I'll leave that as a separate reviewable patch.
>
> As for loops, I don't think anyone was pushing for implementing \while
> now, only to have a decision about what it would look like and how it would
> work. There's a whole lot of recording infrastructure (the input could be a
> stream) needed to make it happen. Moreover, I think \gexec scratched a
> lot of the itches that would have been solved via a psql looping structure.
>


And here's the patch. I've changed the subject line and will be submitting
a new entry to the commitfest.
diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 9915731..2c3fccd 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -2007,6 +2007,83 @@ hello 10
 
   
 
+  
+\if expr
+\elseif expr
+\else
+\endif
+
+
+This group of commands implements nestable conditional blocks, like
+this:
+
+
+\if true
+\if 1
+\if yes
+\if on
+\echo 'all true'
+all true
+\endif
+\endif
+\endif
+\endif
+\if false
+\elseif 0
+\elseif no
+\elseif off
+\else
+\echo 'all false'
+all false
+\endif
+\if true
+\else
+\echo 'should not print #1'
+\endif
+\if false
+\elseif true
+\else
+\echo 'should not print #2'
+\endif
+
+
+Conditional blocks must begin with an \if and end
+with an \endif, and the pairs must be found in
+the same source file. If an EOF is reached on the main file or an
+\include-ed file before all
+\if-\endif pairs are matched, then
+psql will raise an error.
+
+
+The \if and \elseif commands
+read the rest of the line and evaluate it as a boolean expression.
+Currently, expressions are limited to the same values allowed for
+other option booleans (true, false, 1, 0, on, off, yes, no, etc).
+
+
+Queries within a false branch of a conditional block will not be
+sent to the server.
+
+
+Non-conditional \-commands within a false branch
+of a conditional block will not be evaluated for correctness. The
+command will be ignored along with all remaining input to the end
+of the line.
+
+
+Expressions on \if and \elseif
+commands within a false branch of a conditional block will not be
+evaluated.
+
+
+A conditional block can contain at most one \else command.
+
+
+The \elseif command cannot follow the
+\else command.
+
+
+  
 
   
 \ir or \include_relative 
filename
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 4139b77..d4e0bb8 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -49,6 +49,7 @@
 #include "psqlscanslash.h"
 #include "settings.h"
 #include "variables.h"
+#include "fe_utils/psqlscan_int.h"
 
 /*
  * Editable database object types.
@@ -132,7 +133,7 @@ HandleSlashCmds(PsqlScanState scan_state,
status = PSQL_CMD_ERROR;
}
 
-   if (status != PSQL_CMD_ERROR)
+   if ((status != PSQL_CMD_ERROR) && psqlscan_branch_active(scan_state))
{
/* eat any remaining arguments after a valid command */
/* note we suppress evaluation of backticks here */
@@ -194,6 +195,32 @@ read_connect_arg(PsqlScanState scan_state)
return result;
 }
 
+/*
+ * Read and interpret argument as a boolean expression.
+ */
+static bool
+read_boolean_expression(PsqlScanState scan_state, const char *action)
+{
+	bool		result = false;
+	char	   *expr = psql_scan_slash_option(scan_state,
+											  OT_NORMAL, NULL, false);
+
+	if (expr)
+	{
+		if (ParseVariableBool(expr, action))
+			result = true;
+		free(expr);
+	}
+	return result;
+}
+
+static bool
+is_branching_command(const char *cmd)
+{
+ 

Re: [HACKERS] [WIP]Vertical Clustered Index (columnar store extension)

2017-01-22 Thread Jim Nasby

On 1/16/17 10:09 PM, Haribabu Kommi wrote:

Yes, that's correct. Currently, with this approach it is not possible to
ditch the heap completely. This approach is useful for cases where the
user wants to store only some columns as part of the clustered index.


Ahh, that's unfortunate. Billion row+ tables are becoming rather common, 
and that 24GB of overhead starts becoming very painful. It's actually a 
lot worse considering there will be at least one index on the table, so 
100GB+ of overhead isn't that uncommon.



Another complication is that one of the big advantages of a CSTORE
is allowing analysis to be done efficiently on a column-by-column
(as opposed to row-by-row) basis. Does your patch by chance provide
that?

Not the base patch that I shared, but the follow-on patches provide
column-by-column data access using the custom plan methods.


Great, that's something else that a column store really needs to be 
successful. Something else I suspect is necessary is a faster/better way 
to eliminate chunks of rows from scans.


Just as an example, with my simple array-based approach, you can store a 
range type along with each array that contains the min and max values 
for the array. That means any query that wants values between 50 and 100 
can include a clause that filters on range types that overlap with 
[50,100]. That can be indexed very efficiently and is fast to run checks 
against.
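
To make that concrete, here is a minimal sketch in plain SQL (table and 
column names are invented for illustration, not taken from any patch):

  CREATE TABLE measurement_chunks (
      chunk_id   bigserial PRIMARY KEY,
      vals       int[]     NOT NULL,  -- one chunk of column values
      vals_range int4range NOT NULL   -- covers min(vals)..max(vals)
  );
  CREATE INDEX ON measurement_chunks USING gist (vals_range);

  -- Chunks whose min/max range does not overlap [50,100] are never unnested:
  SELECT c.chunk_id, v
  FROM   measurement_chunks c, LATERAL unnest(c.vals) AS v
  WHERE  c.vals_range && int4range(50, 100, '[]')
  AND    v BETWEEN 50 AND 100;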



Generally speaking, I do think the idea of adding support for this
as an "index" is a really good starting point, since that part of


... as discussed elsewhere in the thread, adding a bunch of hooks is 
probably not a good way to do this. :/



That would be a great way to gain knowledge on what users would want
to see in a column store, something else I suspect we need. It would
also be far less code than what you or Alvaro are proposing. When it
comes to large changes that don't have crystal-clear requirements, I
think that's really important.

The main use case of this patch is to support mixed-load environments,
where both OLTP and OLAP queries are possible. The advantage of the
proposed patch design is that it provides good performance for OLAP
queries without affecting OLTP.


Yeah, that's a big part of what I was envisioning with my array-based 
approach. In simple terms, there would be a regular row-based table, and 
an array-based table, with a view that allows seamless querying into 
both (re-presenting the array-storage on a per-row basis). There would 
be a periodic process that moves entire sets of rows from the row 
storage into the array storage.
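
As a rough sketch of that layout (names invented; multi-argument unnest 
keeps the two arrays aligned per element):

  CREATE TABLE events_row (id bigint, val int);
  CREATE TABLE events_arr (ids bigint[], vals int[]);

  CREATE VIEW events AS
      SELECT id, val FROM events_row
      UNION ALL
      SELECT u.id, u.val
      FROM events_arr, LATERAL unnest(ids, vals) AS u(id, val);

The mover would then be a periodic transaction that aggregates a batch of 
events_row rows into a single events_arr row and deletes the originals.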


If you updated or deleted a row that was part of an array, the contents 
of the entire array could be moved back into row-based storage. After a 
period of time, rows would get moved back into array storage. Or the 
array could be modified in place, but you need to be very careful about 
bloating the array storage if you do that.


The big missing piece here is getting the planner to intelligently 
handle a mixed row/column store. As I mentioned, you can easily add 
range type fields to greatly increase performance, but they won't do any 
good unless the appropriate filters get added. It's not THAT hard to do 
that by hand, but it'd be great if there was a more automated method. 
Such a method might also be very useful for transforming expressions 
like date_part('quarter', ...) into something that could use existing 
indexes.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Assignment of valid collation for SET operations on queries with UNKNOWN types.

2017-01-22 Thread Tom Lane
I wrote:
> I spent some time fooling with this today and came up with the attached
> patch.  I think this is basically the direction we should go in, but
> there are various loose ends:

Here's an updated patch that's rebased against today's HEAD and addresses
most of the loose ends:

> 1. I adjusted a couple of existing regression tests whose results are
> affected, but didn't do anything towards adding new tests showing the
> desirable results of this change (as per the examples in this thread).

Added some regression test cases.  These are mostly designed to prove
that coercion to text occurs where expected, but I did include a couple
of queries that would have failed outright before.

> 2. I didn't do anything about docs, either, though maybe no change
> is needed.  One user-visible change from this is that queries should
> never return any "unknown"-type columns to clients anymore.  But I
> think that is not documented now, so maybe there's nothing to change.

The only thing I could find in the SGML docs that directly addresses
unknown-type columns was a remark in the CREATE VIEW man page, which
I've updated.  I also added a section to typeconv.sgml to document
the behavior.

> 3. We need to look at whether pg_upgrade is affected.  I think we
> might be able to let it just ignore the issue for views, since they'd
> get properly updated during the dump and reload anyway.  But if someone
> had an actual table (or matview) with an "unknown" column, that would
> be a pg_upgrade hazard, because "unknown" doesn't store like "text".

I tested and found that simple views with unknown columns seem to update
sanely if we just let pg_upgrade dump and restore them.  So I suggest we
allow that to happen.  There might be cases where dependent views behave
unexpectedly after such a conversion, but I think that would be rare
enough that it's not worth forcing users to fix these views by hand before
updating.  However, tables with unknown columns would fail the upgrade
(since we'd reject the CREATE TABLE command) while matviews would be
accepted but then the DDL wouldn't match the physical data storage.
So I added code to pg_upgrade to check for those cases and refuse to
proceed.  This is almost a straight copy-and-paste of the existing
pg_upgrade code for checking for "line" columns.


I think this is committable now; the other loose ends can be dealt
with in follow-on patches.  Does anyone want to review it?

regards, tom lane

diff --git a/doc/src/sgml/ref/create_view.sgml b/doc/src/sgml/ref/create_view.sgml
index 8641e19..a83d956 100644
*** a/doc/src/sgml/ref/create_view.sgml
--- b/doc/src/sgml/ref/create_view.sgml
*** CREATE VIEW [ schema . ]
*** 251,259 
  
  CREATE VIEW vista AS SELECT 'Hello World';
  
! is bad form in two ways: the column name defaults to ?column?,
! and the column data type defaults to unknown.  If you want a
! string literal in a view's result, use something like:
  
  CREATE VIEW vista AS SELECT text 'Hello World' AS hello;
  
--- 251,260 
  
  CREATE VIEW vista AS SELECT 'Hello World';
  
! is bad form because the column name defaults to ?column?;
! also, the column data type defaults to text, which might not
! be what you wanted.  Better style for a string literal in a view's
! result is something like:
  
  CREATE VIEW vista AS SELECT text 'Hello World' AS hello;
  
diff --git a/doc/src/sgml/typeconv.sgml b/doc/src/sgml/typeconv.sgml
index c031c01..63d41f0 100644
*** a/doc/src/sgml/typeconv.sgml
--- b/doc/src/sgml/typeconv.sgml
*** domain's base type for all subsequent st
*** 984,990 
  
  If all inputs are of type unknown, resolve as type
  text (the preferred type of the string category).
! Otherwise, unknown inputs are ignored.
  
  
  
--- 984,991 
  
  If all inputs are of type unknown, resolve as type
  text (the preferred type of the string category).
! Otherwise, unknown inputs are ignored for the purposes
! of the remaining rules.
  
  
  
*** but integer can be implicitly c
*** 1076,1081 
--- 1077,1129 
  result type is resolved as real.
  
  
+ 
+ 
+ 
+ SELECT Output Columns
+ 
+ 
+  SELECT
+  determination of result type
+ 
+ 
+ 
+ The rules given in the preceding sections will result in assignment
+ of non-unknown data types to all expressions in a SQL query,
+ except for unspecified-type literals that appear as simple output
+ columns of a SELECT command.  For example, in
+ 
+ 
+ SELECT 'Hello World';
+ 
+ 
+ there is nothing to identify what type the string literal should be
+ taken as.  In this situation PostgreSQL will fall back
+ to resolving the literal's type as text.
+ 
+ 
+ 
+ When the SELECT is one arm of a UNION
+ (or INTERSECT or EXCEPT) construct, or when it
+ appears within INSERT ... SELECT, this rule is not applied
+ since rules given in preceding sections take precedence.  The type of an
+ unspecified-type literal can be taken from the other 

Re: [HACKERS] Logical replication failing when foreign key present

2017-01-22 Thread Petr Jelinek
On 22/01/17 18:50, Thom Brown wrote:
> Hi,
> 
> There's an issue which I haven't seen documented as expected
> behaviour, where replicating data to a table which has a foreign key
> results in a replication failure.  This produces the following log
> entries:
> 
> LOG:  starting logical replication worker for subscription "contacts_sub"
> LOG:  logical replication apply for subscription "contacts_sub" has started
> ERROR:  AfterTriggerSaveEvent() called outside of query
> LOG:  worker process: logical replication worker for subscription
> 16408 (PID 19201) exited with exit code 1
> 
> 

Hi, thanks for report.

Looks like I missed AfterTriggerBeginQuery/AfterTriggerEndQuery when
moving the executor stuff around. Attached should fix it.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
From b6d5a1830da5412520cb50287ee71ea312f689d6 Mon Sep 17 00:00:00 2001
From: Petr Jelinek 
Date: Sun, 22 Jan 2017 23:16:57 +0100
Subject: [PATCH] Fix after trigger execution in logical replication

---
 src/backend/replication/logical/worker.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 7d86736..6229bef 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -176,6 +176,9 @@ create_estate_for_relation(LogicalRepRelMapEntry *rel)
 	if (resultRelInfo->ri_TrigDesc)
 		estate->es_trig_tuple_slot = ExecInitExtraTupleSlot(estate);
 
+	/* Prepare to catch AFTER triggers. */
+	AfterTriggerBeginQuery();
+
 	return estate;
 }
 
@@ -536,6 +539,10 @@ apply_handle_insert(StringInfo s)
 	/* Cleanup. */
 	ExecCloseIndices(estate->es_result_relation_info);
 	PopActiveSnapshot();
+
+	/* Handle queued AFTER triggers. */
+	AfterTriggerEndQuery(estate);
+
 	ExecResetTupleTable(estate->es_tupleTable, false);
 	FreeExecutorState(estate);
 
@@ -676,6 +683,10 @@ apply_handle_update(StringInfo s)
 	/* Cleanup. */
 	ExecCloseIndices(estate->es_result_relation_info);
 	PopActiveSnapshot();
+
+	/* Handle queued AFTER triggers. */
+	AfterTriggerEndQuery(estate);
+
 	EvalPlanQualEnd(&epqstate);
 	ExecResetTupleTable(estate->es_tupleTable, false);
 	FreeExecutorState(estate);
@@ -763,6 +774,10 @@ apply_handle_delete(StringInfo s)
 	/* Cleanup. */
 	ExecCloseIndices(estate->es_result_relation_info);
 	PopActiveSnapshot();
+
+	/* Handle queued AFTER triggers. */
+	AfterTriggerEndQuery(estate);
+
 	EvalPlanQualEnd(&epqstate);
 	ExecResetTupleTable(estate->es_tupleTable, false);
 	FreeExecutorState(estate);
-- 
2.7.4


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Online enabling of page level checksums

2017-01-22 Thread Jim Nasby

On 1/22/17 5:13 AM, Magnus Hagander wrote:

If the system is interrupted before the background worker is done, it
starts over from the beginning. Previously touched blocks will be read
and verified, but not written (because their checksum is already
correct). This will take time, but not re-generate the WAL.


Another option would be to store a watermark of the largest block ID 
that had been verified for each relation/fork. That would be used to 
determine if checksums could be trusted, and more usefully would allow 
you to start where you left off if the verification process got interrupted.
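
The bookkeeping could be as simple as a per-relation/fork high-water mark; 
a hypothetical layout, just to illustrate the idea:

  CREATE TABLE checksum_watermark (
      relid               oid    NOT NULL,
      forknum             int    NOT NULL,
      last_verified_block bigint NOT NULL,
      PRIMARY KEY (relid, forknum)
  );

Blocks up to last_verified_block are known to carry valid checksums, and a 
restarted worker can resume from there instead of from block zero.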



I think the actual functions and background worker could go in an
extension that's installed and loaded only by those who need it. But the
core functionality of being able to have "checksum in progress" would
have to be in the core codebase.


If it was in contrib I think that'd be fine.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Protect syscache from bloating with negative cache entries

2017-01-22 Thread Tom Lane
Jim Nasby  writes:
>> Ahh, I hadn't considered that. So one idea would be to only track
>> negative entries on caches where we know they're actually useful. That
>> might make the performance hit of some of the other ideas more
>> tolerable. Presumably you're much less likely to pollute the namespace
>> cache than some of the others.

> Ok, after reading the code I see I only partly understood what you were 
> saying. In any case, it might still be useful to do some testing with 
> CATCACHE_STATS defined to see if there's caches that don't accumulate a 
> lot of negative entries.

There definitely are, according to my testing, but by the same token
it's not clear that a shutoff check would save anything.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Protect syscache from bloating with negative cache entries

2017-01-22 Thread Jim Nasby

On 1/22/17 4:41 PM, Jim Nasby wrote:

On 1/21/17 8:54 PM, Tom Lane wrote:

Jim Nasby  writes:

The other (possibly naive) question I have is how useful negative
entries really are? Will Postgres regularly incur negative lookups, or
will these only happen due to user activity?

It varies depending on the particular syscache, but in at least some
of them, negative cache entries are critical for performance.
See for example RelnameGetRelid(), which basically does a RELNAMENSP
cache lookup for each schema down the search path until it finds a
match.


Ahh, I hadn't considered that. So one idea would be to only track
negative entries on caches where we know they're actually useful. That
might make the performance hit of some of the other ideas more
tolerable. Presumably you're much less likely to pollute the namespace
cache than some of the others.


Ok, after reading the code I see I only partly understood what you were 
saying. In any case, it might still be useful to do some testing with 
CATCACHE_STATS defined to see if there's caches that don't accumulate a 
lot of negative entries.


Attached is a patch that tries to document some of this.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)
diff --git a/src/include/utils/catcache.h b/src/include/utils/catcache.h
index 253c7b53ed..d718d82d80 100644
--- a/src/include/utils/catcache.h
+++ b/src/include/utils/catcache.h
@@ -101,7 +101,12 @@ typedef struct catctup
 *
 	 * A negative cache entry is an assertion that there is no tuple matching
 	 * a particular key.  This is just as useful as a normal entry so far as
-	 * avoiding catalog searches is concerned.  Management of positive and
+	 * avoiding catalog searches is concerned.  In particular, negative
+	 * entries are critical for the performance of some caches: current code
+	 * produces a negative RELNAMENSP entry for every unqualified table
+	 * lookup with the default search_path, because the implicit pg_catalog
+	 * schema is probed first. This can be observed by defining CATCACHE_STATS
+	 * and checking the log at backend exit.  Management of positive and
 	 * negative entries is identical.
 	 */
 	int			refcount;		/* number of active references */

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Protect syscache from bloating with negative cache entries

2017-01-22 Thread Jim Nasby

On 1/21/17 8:54 PM, Tom Lane wrote:

Jim Nasby  writes:

The other (possibly naive) question I have is how useful negative
entries really are? Will Postgres regularly incur negative lookups, or
will these only happen due to user activity?

It varies depending on the particular syscache, but in at least some
of them, negative cache entries are critical for performance.
See for example RelnameGetRelid(), which basically does a RELNAMENSP
cache lookup for each schema down the search path until it finds a
match.


Ahh, I hadn't considered that. So one idea would be to only track 
negative entries on caches where we know they're actually useful. That 
might make the performance hit of some of the other ideas more 
tolerable. Presumably you're much less likely to pollute the namespace 
cache than some of the others.
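
To make that concrete with a hypothetical session (schema and table names 
invented): with search_path = myschema, public and a table that actually 
lives in public, every unqualified lookup probes myschema first:

  SET search_path = myschema, public;
  SELECT count(*) FROM events;  -- miss in myschema caches a negative
                                -- RELNAMENSP entry; the hit in public
                                -- caches a positive one

Without the negative entry, every later lookup of events would repeat the 
failed catalog probe of myschema.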

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Too many autovacuum workers spawned during forced auto-vacuum

2017-01-22 Thread Jim Nasby

On 1/20/17 12:40 AM, Amit Khandekar wrote:

My impression was that the postmaster is supposed to do just the minimal
work of starting the autovacuum launcher if it isn't running already, and
the work of ensuring everything keeps going is the job of the autovacuum
launcher.


There's already a ton of logic in the launcher... ISTM it'd be nice to 
not start adding additional logic to the postmaster. If we had a generic 
need for rate limiting launching of things maybe it wouldn't be that 
bad, but AFAIK we don't.



That limits us to launching the
autovacuum launcher at most six times a minute when autovacuum = off.
You could argue that defeats the point of the SendPostmasterSignal in
SetTransactionIdLimit, but I don't think so.  If vacuuming the oldest
database took less than 10 seconds, then we won't vacuum the
next-oldest database until we hit the next 64kB transaction ID
boundary, but that can only cause a problem if we've got so many
databases that we don't get to them all before we run out of
transaction IDs, which is almost unthinkable.  If you had ten
million tiny databases that all crossed the threshold at the same
instant, it would take you 640 million transaction IDs to visit them
all.  If you also had autovacuum_freeze_max_age set very close to the
upper limit for that variable, you could conceivably have the system
shut down before all of those databases were reached.  But that's a
pretty artificial scenario.  If someone has that scenario, perhaps
they should consider more sensible configuration choices.

Yeah this logic makes sense ...


I'm not sure that's true in the case of a significant number of 
databases and a very high XID rate, but I might be missing something. In 
any case I agree it's not worth worrying about. If you've disabled 
autovac you're already running with scissors.



But I guess, from looking at the code, it seems that care was taken to
ensure that with autovacuum off we clean up all databases as fast as
possible, with multiple workers cleaning up multiple tables in parallel.

Instead of the autovacuum launcher and worker together making sure that
the cycle of iterations keeps running, I was thinking the autovacuum
launcher itself should make sure it does not spawn another worker on the
same database if the previous one did nothing. But that seemed pretty
invasive.


IMHO we really need some more sophistication in scheduling for both 
launcher and worker. Somewhere on my TODO is allowing the worker to call 
a user defined SELECT to get a prioritized list, but since the launcher 
doesn't connect to a database that wouldn't work. What we could do 
rather simply is honor adl_next_worker in the logic that looks for 
freeze, something like the attached.


On another note, does anyone else find the database selection logic 
rather difficult to trace through? The logic is kinda spread throughout 
several functions. The naming of rebuild_database_list() and 
get_database_list() is rather confusing too.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 264298e8a9..34969b058f 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -1066,6 +1066,43 @@ db_comparator(const void *a, const void *b)
 }
 
 /*
+ * Returns true if database has been processed recently (less than
+ * autovacuum_naptime seconds ago).
+ */
+static bool
+recent_activity(Oid datid)
+{
+	TimestampTz current_time = GetCurrentTimestamp();
+	dlist_iter	iter;
+
+	dlist_reverse_foreach(iter, &DatabaseList)
+	{
+		avl_dbase  *dbp = dlist_container(avl_dbase, adl_node, iter.cur);
+
+		if (dbp->adl_datid == datid)
+		{
+			/*
+			 * Skip this database if its next_worker value falls between
+			 * the current time and the current time plus naptime.
+			 */
+			if (!TimestampDifferenceExceeds(dbp->adl_next_worker,
+											current_time, 0) &&
+				!TimestampDifferenceExceeds(current_time,
+											dbp->adl_next_worker,
+											autovacuum_naptime * 1000))
+				return true;
+			return false;
+		}
+	}
+
+	/*
+	 * Didn't find the database on the internal list, so it must be new.  Skip
+	 * it for now.
+	 */
+	return true;
+}
+
+/*
  * do_start_worker
  *
  * Bare-bones procedure for starting an autovacuum worker from the launcher.
@@ -1162,24 +1199,33 @@ do_start_worker(void)
foreach(cell, 

Re: [HACKERS] PATCH: recursive json_populate_record()

2017-01-22 Thread Tom Lane
Nikita Glukhov  writes:
> On 01/08/2017 09:52 PM, Tom Lane wrote:
>> ... attndims is really syntactic sugar only and doesn't affect anything
>> meaningful semantically.

> I have fixed the first patch: when the number of dimensions is unknown
> we determine it simply by the number of successive opening brackets at
> the start of a json array.  But I'm still using for verification non-zero
> ndims values that can be get from composite type attribute (attndims) or
> from domain array type (typndims) through specially added function 
> get_type_ndims().

Apparently I didn't make my point forcefully enough.  attndims is not
semantically significant in Postgres; the fact that we even store it
is just a historical artifact, I think.  There might be an argument
to start enforcing it, but we'd have to do that across the board, and
I think most likely any such proposal would fail because of backwards
compatibility concerns.  (There would also be a lot of discussion as
to whether, say, "int foo[3]" shouldn't enforce that the array length
is 3, not just that it be one-dimensional.)  In the meantime, we are
*not* going to have attndims be semantically significant in just this one
function.  That would not satisfy the principle of least astonishment.
Therefore, please remove this patch's dependence on attndims.

I looked through the patch briefly and have some other comments:

1. It's pretty bizarre that you need dummy forward struct declarations
for ColumnIOData and RecordIOData.  The reason evidently is that somebody
ignored project style and put a bunch of typedefs after the local function
declarations in jsonfuncs.c.  Project style of course is to put the
typedefs first, precisely because they may be needed in the local function
declarations.  I will go move those declarations right now, as a single-
purpose patch, so that you have something a bit cleaner to modify.

2. The business with isstring, saved_scalar_is_string, etc makes me itch.
I'm having a hard time explaining exactly why, but it just seems like a
single-purpose kluge.  Maybe it would seem less so if you saved the
original JsonTokenType instead.  But also I'm wondering why the ultimate
consumers are concerned with string-ness as such.  It seems like mainly
what they're trying to do is forbid converting a string into a non-scalar
Postgres type, but (a) why, and (b) if you are going to restrict it,
wouldn't it be more sensible to frame it as you can't convert any JSON
scalar to a non-scalar type?  As to (a), it seems like you're just
introducing unnecessary compatibility breakage if you insist on that.
As to (b), really it seems like an appropriate restriction of that sort
would be like "if target type is composite, source must be a JSON object,
and if target type is array, source must be a JSON array".  Personally
I think what you ought to do here is not change the semantics of cases
that are allowed today, but only change the semantics in the cases of
JSON object being assigned to PG composite and JSON array being assigned
to PG array; both of those fail today, so there's nobody depending on the
existing behavior.  But if you reject string-to-composite cases then you
will be breaking existing apps, and I doubt people will thank you for it.
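
For the record, here is one string-to-composite call that works today and 
would presumably break under that restriction (types invented for the 
example):

  CREATE TYPE complex AS (r float8, i float8);
  CREATE TYPE wrapper AS (c complex);

  -- the JSON string "(1,2)" is fed to complex's input function:
  SELECT (w.c).r, (w.c).i
  FROM   json_populate_record(NULL::wrapper, '{"c": "(1,2)"}') AS w;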

3. I think you'd be better off declaring ColumnIOData.type_category as an
actual enum type, not "char" with some specific values documented only
by a comment.  Also, could you fold the domain case in as a fourth
type_category?  Also, why does ColumnIOData have both an ndims field and
another one buried in io.array.ndims?

4. populate_array_report_expected_array() violates our error message
guidelines, first by using elog rather than ereport for a user-facing
error, and second by assembling the message from pieces, which would
make it untranslatable even if it were being reported through ereport.
I'm not sure if this needs to be fixed though; will the function even
still be needed once you remove the attndims dependency?  Even if there
are still callers, you wouldn't necessarily need such a function if
you scale back your ambitions a bit as to the amount of detail required.
I'm not sure you really need to report what you think the dimensions
are when complaining that an array is nonrectangular.

5. The business with having some arguments that are only for json and
others that are only for jsonb, eg in populate_scalar(), also makes me
itch.  I wonder whether this wouldn't all get a lot simpler and more
readable if we gave up on trying to share code between the two cases.
In populate_scalar(), for instance, the amount of code actually shared
between the two cases amounts to a whole two lines, which are dwarfed
by the amount of crud needed to deal with trying to serve both cases
in the one function.  There are places where there's more sharable
code than that, but it seems like a better design might be to refactor
the sharable code out into subroutines called by separate functions for
json and 

[HACKERS] Logical replication failing when foreign key present

2017-01-22 Thread Thom Brown
Hi,

There's an issue which I haven't seen documented as expected
behaviour, where replicating data to a table which has a foreign key
results in a replication failure.  This produces the following log
entries:

LOG:  starting logical replication worker for subscription "contacts_sub"
LOG:  logical replication apply for subscription "contacts_sub" has started
ERROR:  AfterTriggerSaveEvent() called outside of query
LOG:  worker process: logical replication worker for subscription
16408 (PID 19201) exited with exit code 1


Reproducible test case:

On both instances:

  CREATE TABLE b (bid int PRIMARY KEY);
  CREATE TABLE a (id int PRIMARY KEY, bid int REFERENCES b (bid));
  INSERT INTO b (bid) VALUES (1);

First instance:
  CREATE PUBLICATION a_pub FOR TABLE a;

Second instance:
  CREATE SUBSCRIPTION a_sub CONNECTION 'host=127.0.0.1 port=5530
user=thom dbname=postgres' PUBLICATION a_pub;

First instance:
 INSERT INTO a (id, bid) VALUES (1,1);


Regards

Thom


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] new autovacuum criterion for visible pages

2017-01-22 Thread Stephen Frost
Amit,

* Amit Kapila (amit.kapil...@gmail.com) wrote:
> On Sun, Jan 22, 2017 at 3:27 AM, Stephen Frost  wrote:
> > * Simon Riggs (si...@2ndquadrant.com) wrote:
> >> On 12 August 2016 at 01:01, Tom Lane  wrote:
> >> > Michael Paquier  writes:
> >> >> In short, autovacuum will need to scan by itself the VM of each
> >> >> relation and decide based on that.
> >> >
> >> > That seems like a worthwhile approach to pursue.  The VM is supposed to 
> >> > be
> >> > small, and if you're worried it isn't, you could sample a few pages of 
> >> > it.
> >> > I do not think any of the ideas proposed so far for tracking the
> >> > visibility percentage on-the-fly are very tenable.
> >>
> >> Sounds good, but we can't scan the VM for every table, every minute.
> >> We need to record something that will tell us how many VM bits have
> >> been cleared, which will then allow autovac to do a simple SELECT to
> >> decide what needs vacuuming.
> >>
> >> Vik's proposal to keep track of the rows inserted seems like the best
> >> approach to this issue.
> >
> > I tend to agree with Simon on this.  I'm also worried that an approach
> > which was based off of a metric like "% of table not all-visible" might
> > result in VACUUM running over and over on a table because it isn't able
> > to actually make any progress towards improving that percentage.  We'd
> > have to have some kind of "cool-off" period or something.
> >
> > Tracking INSERTs and then kicking off a VACUUM based on them seems to
> > address that in a natural way and also seems like something that users
> > would generally understand as it's very similar to what we do for
> > UPDATEs and DELETEs.
> >
> > Tracking the INSERTs as a reason to VACUUM is also very natural when you
> > consider the need to update BRIN indexes.
> 
> Another possible advantage of tracking INSERTs is for hash indexes
> where after split we need to remove tuples from buckets that underwent
> split recently.

That's a good point also, and for clearing GIN pending lists too, now
that I think about it.
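
For instance, both of these currently have to ride on VACUUM or be run by 
hand (index names invented):

  SELECT brin_summarize_new_values('some_brin_index'::regclass);
  SELECT gin_clean_pending_list('some_gin_index'::regclass);

An INSERT-driven autovacuum trigger would cover both automatically on 
insert-only tables.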

We really need to get this fixed for PG10.

Thanks!

Stephen




[HACKERS] Online enabling of page level checksums

2017-01-22 Thread Magnus Hagander
So, that post I made about checksums certainly spurred a lot of discussion
:) One summary is that life would be a lot easier if we could turn
checksums on (and off) without re-initdbing.  I'm breaking out this
question into this thread to talk about it separately.


I've been toying with a branch to work on this, but haven't had time to
get it even to a compiling state. But instead of waiting until I have some
code to show, let me outline the idea I had.

My general idea is this:

Take one bit in the checksum version field and make it mean "in progress".
That means that checksums can now be "on", "off", or "in progress".

When checksums are "in progress", PostgreSQL will compute and write
checksums whenever it writes out a buffer, but it will *not* verify
checksums on read.

This state would be set by calling a function (or an external command with
the system shut down if need be - I can accept a restart for this, but I'd
rather avoid it if possible).
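
At the SQL level the flow could look something like this (the function 
name is invented for illustration, nothing like it exists yet):

  SELECT pg_enable_data_checksums();  -- flip the state to "in progress"
  SHOW data_checksums;                -- "in progress" until the worker
                                      -- finishes, then "on"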

This function would also launch a background worker. This worker would
enumerate the entire database block by block: read a block, verify that its
checksum is set and correct, and if so ignore it (because any further
updates will keep it correct while we're in the "in progress" state). If not,
mark it as dirty and write it out through the regular means, which will
include computing and writing the checksum since we're "in progress", with
something similar to the vacuum cost delay controlling how quickly it writes.

Yes, this means the entire db will end up in the transaction log since
everything is rewritten. That's not great, but for a lot of people that
will be a trade they're willing to make since it's a one-time thing. Yes,
this background process might take days or weeks - that's OK as long as it
happens online.

Once the background worker is done, it flips the checksum state to "on",
and the system starts verifying checksums as well.

If the system is interrupted before the background worker is done, it
starts over from the beginning. Previously touched blocks will be read and
verified, but not written (because their checksum is already correct). This
will take time, but not re-generate the WAL.

I think the actual functions and background worker could go in an extension
that's installed and loaded only by those who need it. But the core
functionality of being able to have "checksum in progress" would have to be
in the core codebase.

So, is there something obviously missing in this plan? Or just the code to
do it :)

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Checksums by default?

2017-01-22 Thread Magnus Hagander
On Sat, Jan 21, 2017 at 8:16 PM, Ants Aasma  wrote:

> On Sat, Jan 21, 2017 at 7:39 PM, Petr Jelinek
>  wrote:
> > So in summary "postgresql.conf options are easy to change" while "initdb
> > options are hard to change", I can see this argument used both for
> > enabling or disabling checksums by default. As I said I would be less
> > worried if it was easy to turn off, but we are not there afaik. And even
> > then I'd still want benchmark first.
>
> Adding support for disabling checksums is almost trivial as it only
> requires flipping a value in the control file. And I have somewhere
> sitting around a similarly simple tool for turning on checksums while
> the database is offline. FWIW, based on customers and fellow
> conference goers I have talked to most would gladly take the
> performance hit, but not the downtime to turn it on on an existing
> database.
>

This is exactly the scenario I've been exposed to over and over again. If
it can be turned on/off online, then the default matters a lot less. But it
has to be online.

Yes, you can set up a replica (which today requires third party products
like slony, bucardo or pglogical -- at least we'll hopefully have pglogical
fully in 10, but it's still a very expensive way to fix the problem).

If we can make it cheap and easy to turn them off, that makes a change of
the default a lot cheaper. Did you have a tool for that sitting around as
well?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Checksums by default?

2017-01-22 Thread Tomas Vondra

On 01/21/2017 04:18 PM, Petr Jelinek wrote:

On 21/01/17 11:39, Magnus Hagander wrote:

Is it time to enable checksums by default, and give initdb a switch to
turn it off instead?


I'd like to see benchmark first, both in terms of CPU and in terms of
produced WAL (=network traffic) given that it turns on logging of hint bits.



... and those hint bits may easily trigger full-page writes, resulting 
in significant write amplification.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Checksums by default?

2017-01-22 Thread Tomas Vondra

On 01/21/2017 05:53 PM, Stephen Frost wrote:

* Tom Lane (t...@sss.pgh.pa.us) wrote:

Stephen Frost  writes:

* Petr Jelinek (petr.jeli...@2ndquadrant.com) wrote:

The change of wal_level was supported by benchmark, I think it's
reasonable to ask for this to be as well.



No, it wasn't, it was that people felt the cases where changing
wal_level would seriously hurt performance didn't out-weigh the value of
making the change to the default.


It was "supported" in the sense that somebody took the trouble to
measure the impact, so that we had some facts on which to base the
value judgment that the cost was acceptable. In the case of
checksums, you seem to be in a hurry to arrive at a conclusion
without any supporting evidence.




Exactly.


No, no one measured the impact in the cases where wal_level=minimal
makes a big difference, that I saw, at least.



We already knew we can construct data-loading workloads relying on the 
wal_level=minimal optimization and demonstrating pretty arbitrary 
benefits, so there was no point in benchmarking them. That does not mean 
those cases were not considered, though.


>

Further info with links to what was done are in my reply to Petr.

As for checksums, I do see value in them and I'm pretty sure that
the author of that particular feature did as well, or we wouldn't
even have it as an option. You seem to be of the opinion that we
might as well just rip all of that code and work out as being
useless.



I do see value in them too, and if turning them off again were as 
simple as reverting back to wal_level=minimal, I would be strongly in 
favor of enabling them by default (after a bit of benchmarking, 
similar to what we did in the wal_level case).


The fact that disabling them requires initdb makes the reasoning much 
trickier, though.


That being said, I'm ready to do some benchmarking on this, so that we 
have at least some numbers to argue about. Can we agree on a set of 
workloads that we want to benchmark in the first round?


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Checksums by default?

2017-01-22 Thread Tomas Vondra

On 01/21/2017 05:51 PM, Stephen Frost wrote:

* Petr Jelinek (petr.jeli...@2ndquadrant.com) wrote:

On 21/01/17 17:31, Stephen Frost wrote:

This is just changing the *default*, not requiring checksums to always
be enabled.  We do not hold the same standards for our defaults as we do
for always-enabled code, for clear reasons- not every situation is the
same and that's why we have defaults that people can change.


I can buy that. If it's possible to turn checksums off without
recreating data directory then I think it would be okay to have default on.


I'm glad to hear that.


The change of wal_level was supported by benchmark, I think it's
reasonable to ask for this to be as well.


No, it wasn't, it was that people felt the cases where changing
wal_level would seriously hurt performance didn't out-weigh the value of
making the change to the default.


Really?


Yes.


https://www.postgresql.org/message-id/d34ce5b5-131f-66ce-f7c5-eb406dbe0...@2ndquadrant.com


From the above link:


So while it'd be trivial to construct workloads demonstrating the
optimizations in wal_level=minimal (e.g. initial loads doing
CREATE TABLE + COPY + CREATE INDEX in a single transaction), but
that would be mostly irrelevant I guess.



Instead, I've decided to run regular pgbench TPC-B-like workload on
a bunch of different scales, and measure throughput + some xlog
stats with each of the three wal_level options.


In other words, there was no performance testing of the cases where
wal_level=minimal (the old default) optimizations would have been
compared against wal_level > minimal.

I'm quite sure that the performance numbers for the CREATE TABLE +
COPY case with wal_level=minimal would have been *far* better than
for wal_level > minimal.

That case was entirely punted on as "mostly irrelevant" even though
there are known production environments where those optimizations
make a huge difference. Those are OLAP cases though, and not nearly
enough folks around here seem to care one bit about them, which I
continue to be disappointed by.



You make it look as if we swept that case under the carpet, despite 
there being quite a bit of relevant discussion in that thread.


We might argue about how many deployments benefit from the 
wal_level=minimal optimization (I'm sure some of our customers benefit 
from it too), and whether that makes it a 'common workload' or not.


It's trivial to construct a workload demonstrating a pretty arbitrary 
performance advantage for the wal_level=minimal case, hence there was no 
point in wasting time demonstrating it, which makes that case rather 
irrelevant for the benchmarking.


Moreover, there are quite a few differences between enabling checksums 
by default, and switching to wal_level=minimal:


- There are reasonable workarounds that give you wal_level=minimal back 
(UNLOGGED tables). And those work even when you actually need a standby, 
which is pretty common these days. With checksums there's nothing like that.


- You can switch between wal_level values by merely restarting the 
cluster, while checksums may only be enabled/disabled by initdb.



regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Checksums by default?

2017-01-22 Thread Tomas Vondra

On 01/21/2017 05:35 PM, Tom Lane wrote:

Stephen Frost  writes:

* Tom Lane (t...@sss.pgh.pa.us) wrote:

Have we seen *even one* report of checksums catching problems in
auseful way?



This isn't the right question.


I disagree. If they aren't doing something useful for people who
have turned them on, what's the reason to think they'd do something
useful for the rest?



I believe Stephen is right. The fact that you don't see something, e.g. 
reports about checksums catching something in production deployments, 
proves nothing, because of the "survivorship bias" identified by Abraham 
Wald during WWII [1]. Not seeing bombers with bullet holes in the engines 
does not mean you don't need to armor the engines. Quite the opposite.


[1] 
https://medium.com/@penguinpress/an-excerpt-from-how-not-to-be-wrong-by-jordan-ellenberg-664e708cfc3d#.j9d9c35mb


Applied to checksums, we're quite unlikely to see reports about data 
corruption caught by checksums because "ERROR: invalid page in block X" 
is such a clear sign of data corruption that people don't even ask us 
about that. Combine that with the fact that most people are running with 
defaults (i.e. no checksums) and that data corruption is a rare event by 
nature, and we're bound to have no such reports.


What we got, however, are reports about strange errors from instances 
without checksums enabled, that were either determined to be data 
corruption, or disappeared after dump/restore or reindexing. It's hard 
to say for sure whether those were cases of data corruption (where 
checksums might have helped) or some other bug (resulting in a corrupted 
page with the checksum computed on the corrupted page).


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Add function to import operating system collations

2017-01-22 Thread Michael Paquier
On Sun, Jan 22, 2017 at 5:28 AM, Tom Lane  wrote:
> Peter Eisentraut  writes:
>> On 1/21/17 12:50 PM, Tom Lane wrote:
>>> I have to question the decision to make "no locales" a hard error.
>>> What's the point of that?  In fact, should we even be bothering with
>>> a warning, considering how often initdb runs unattended these days?
>
>> Hmm, it was a warning in initdb, so making it an error now is probably a
>> mistake.  We should change it back to a warning at least.
>
> I'd drop it altogether I think ... it was useful debug back in the day
> but now I doubt it is worth much.
>
>> Also, if we add ICU initialization to this, then it's not clear how we
>> would report if one provider provided zero locales but another did
>> provide some.
>
> Would it help to redefine the function as returning the number of locale
> entries it successfully added?

It would be nice at least to avoid an error. Still, even if we decide
to keep it an error, I can add a couple of locales on hamster and
dangomushi and we are good to go in the buildfarm. Regarding the
warning, I have found it useful a couple of times on ArchLinux, where
no locales are enabled by default.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers