Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-08-28 Thread Craig Ringer
On 29 Aug 2016 12:10 PM, "Jim Nasby"  wrote:
>
> On 8/26/16 4:08 PM, Andres Freund wrote:
>>
>> Splitting of ephemeral data seems to have a benefit, the rest seems more
>> like rather noisy busywork to me.
>
>
> People accidentally blowing away pg_clog or pg_xlog is a pretty common
occurrence, and I don't think there's all that many tools that reference
them. I think it's well worth renaming them.

Yeah. I've seen it in BIG production users who really should have known
better.

People won't see a README in amongst 5000 xlog segments while freaking out
about the server being down.

I don't care if it comes as part of some greater reorg or not, but I'll be
really annoyed if scope creep ends up killing the original proposal to
just rename these dirs. I think that a simple rename should be done first.
Then if some greater reorg is to be done it can be done shortly after. The
only people that'll be upset are folks tracking early 10.0 dev, and they'll
be aware it's coming.


Re: [HACKERS] patch proposal

2016-08-28 Thread Venkata B Nagothi
On Fri, Aug 26, 2016 at 10:58 PM, Stephen Frost  wrote:

> * Venkata B Nagothi (nag1...@gmail.com) wrote:
> > On Fri, Aug 26, 2016 at 12:30 PM, Stephen Frost 
> wrote:
> > > * Venkata B Nagothi (nag1...@gmail.com) wrote:
> > > > This behaviour will be similar to that of
> recovery_target="immediate" and
> > > > can be aliased.
> > >
> > > I don't believe we're really going at this the right way.  Clearly,
> > > there will be cases where we'd like promotion at the end of the WAL
> > > stream (as we currently have) even if the recovery point is not found,
> > > but if the new option's "promote" is the same as "immediate" then we
> > > don't have that.
> >
> > My apologies for the confusion. Correction - I meant, "promote" option
> > would promote the database once the consistent point is reached at the
> > end-of-the-WAL.
>
> "consistent point" and "end-of-the-WAL" are not the same.
>
> > > recovery_target = immediate, other
> > >
> > > recovery_action_target_found = promote, pause, shutdown
> >
> > This is currently supported by the existing parameter called
> > "recovery_target_action"
>
> Right, sure, we could possibly use that, or create an alias, etc.
>
> > > recovery_action_target_not_found = promote, pause, shutdown
> >
> > This is exactly what the newly proposed parameter will do.
>
> Then it isn't the same as 'recovery_target = immediate'.
>
> > > One question to ask is if we need to support an option for xid and time
> > > related to when we realize that we won't find the recovery target.  If
> > > consistency is reached at a time which is later than the recovery
> target
> > > for time, what then?  Do we go through the rest of the WAL and perform
> > > the "not found" action at the end of the WAL stream?  If we use that
> > > approach, then at least all of the recovery target types are handled
> the
> > > same, but I can definitely see cases where an administrator might
> prefer
> > > an "error" option.
> >
> > Currently, this situation is handled by recovery_target_inclusive
> > parameter.
>
> No, that's not the same.
>
> > Administrator can choose to stop the recovery at any consistent
> > point before or after the specified recovery target. Is this what you
> were
> > referring to ?
>
> Not quite.
>
> > I might need a bit of clarification here, the parameter I am proposing
> > would be effective only if the specified recovery target is not reached
> and
> > may not be effective if the recovery goes beyond recovery target point.
>
> Ok, let's consider this scenario:
>
> Backup started at: 4/22D3B8E0, time: 2016-08-26 08:29:08
> Backup completed (consistency) at: 4/22D3B8EF, 2016-08-26 08:30:00
> recovery_target is not set
> recovery_target_time = 2016-08-26 08:29:30
> recovery_target_inclusive = false
>
> In such a case, we will necessarily commit transactions which happened
> between 8:29:30 and 8:30:00 and only stop (I believe..) once we've
> reached consistency.  We could possibly detect that case and throw an
> error instead.  The same could happen with xid.
>

Yes, currently PG just recovers until consistency is reached. It makes
sense to throw an error instead.


> Working through more scenarios would be useful, I believe.  For example,
> what if the xid or time specified happened before the backup started?
>
> Basically, what I was trying to get at is that we might be able to
> detect that we'll never find a given recovery target point without
> actually replaying all of WAL and wondering if we should provide options
> to control what to do in such a case.
>

Ah, yes, I see the point now and I agree. Thanks for the clarification.

Yes, good idea; it is important to ensure PG errors out and warns the
administrator with an appropriate message, indicating that "a much earlier
backup must be used", if the specified recovery target xid, name or time
is earlier than the backup time.
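The sanity check being discussed can be sketched as a simple comparison. This is a hypothetical helper, not actual recovery code; the real check would compare against the backup's starting checkpoint rather than a bare timestamp:

```c
#include <stdbool.h>
#include <time.h>

/*
 * Hypothetical sketch of the proposed sanity check: if the requested
 * recovery_target_time lies before the moment the base backup started,
 * no amount of WAL replay can ever reach it, so recovery should error
 * out early and hint that a much earlier backup is required.
 */
static bool
recovery_target_reachable(time_t backup_start_time, time_t target_time)
{
    return target_time >= backup_start_time;
}
```

The equivalent checks for xid and named restore points would use the wraparound-aware XID comparison and the restore-point list, respectively.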

Regards,

Venkata B N
Fujitsu Australia


Re: [HACKERS] WAL consistency check facility

2016-08-28 Thread Kuntal Ghosh
Thank you. I've updated it accordingly.

On Sun, Aug 28, 2016 at 11:20 AM, Peter Geoghegan  wrote:
> On Sat, Aug 27, 2016 at 9:47 PM, Amit Kapila  wrote:
>> Right, I think there is no need to mask all the flags.  However apart
>> from BTP_HAS_GARBAGE, it seems we should mask BTP_SPLIT_END as that is
>> just used to save some processing for vacuum and won't be set after
>> crash recovery or on standby after WAL replay.
>
> Right you are -- while BTP_INCOMPLETE_SPLIT is set during recovery,
> BTP_SPLIT_END is not. Still, most of the btpo_flags flags that are
> masked in the patch shouldn't be.
>
>
> --
> Peter Geoghegan
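The masking Amit and Peter agree on amounts to clearing only the two hint-like bits before comparing pages. A self-contained sketch follows; the flag values mirror src/include/access/nbtree.h as of 9.6 and should be treated as assumptions here:

```c
#include <stdint.h>

/* B-tree page flag bits, mirroring nbtree.h (values assumed from 9.6) */
#define BTP_LEAF             (1 << 0)
#define BTP_ROOT             (1 << 1)
#define BTP_DELETED          (1 << 2)
#define BTP_SPLIT_END        (1 << 5)   /* rightmost page of split group */
#define BTP_HAS_GARBAGE      (1 << 6)   /* page has LP_DEAD tuples */
#define BTP_INCOMPLETE_SPLIT (1 << 7)   /* right sibling's downlink missing */

/*
 * Clear only the flags that may legitimately differ between a page
 * written by normal operation and the same page produced by WAL replay:
 * BTP_HAS_GARBAGE and BTP_SPLIT_END are not set during recovery, while
 * flags like BTP_INCOMPLETE_SPLIT are, and must therefore be preserved
 * for the consistency comparison.
 */
static inline uint16_t
mask_btpo_flags(uint16_t flags)
{
    return flags & (uint16_t) ~(BTP_HAS_GARBAGE | BTP_SPLIT_END);
}
```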



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-08-28 Thread Jim Nasby

On 8/26/16 4:08 PM, Andres Freund wrote:

Splitting of ephemeral data seems to have a benefit, the rest seems more
like rather noisy busywork to me.


People accidentally blowing away pg_clog or pg_xlog is a pretty common 
occurrence, and I don't think there's all that many tools that reference 
them. I think it's well worth renaming them.


I actually like the idea of re-organizing everything too. It would be 
easy to include a tool that puts symlinks in place to all the original 
locations if you need that.
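Such a compatibility shim could be as small as this. A hypothetical sketch only; the directory names are illustrative, not a committed layout:

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical compatibility helper: after a data-directory
 * reorganization, recreate an old directory name as a symlink to the
 * new location so that existing tools keep working.  Using a relative
 * target keeps the link valid even if the data directory is moved.
 */
static int
make_compat_link(const char *datadir, const char *oldname, const char *newname)
{
    char linkpath[4096];

    if (snprintf(linkpath, sizeof(linkpath), "%s/%s", datadir, oldname)
        >= (int) sizeof(linkpath))
        return -1;              /* path too long */
    return symlink(newname, linkpath);
}
```

For example, make_compat_link(datadir, "pg_xlog", "pg_wal") would let tools that still open pg_xlog keep working after a rename.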

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461




Re: [HACKERS] Why is a newly created index contains the invalid LSN?

2016-08-28 Thread Jim Nasby

On 8/26/16 4:17 PM, Andres Freund wrote:

On 2016-08-26 18:46:42 +0300, Yury Zhuravlev wrote:

Thanks all.
Now I understand that the LSN is strongly connected with WAL.
However, how difficult would it be to put the last system LSN instead of 0?
It's not so important, but it would make use of LSNs more consistent.


Maybe explain why you're interested in page lsns, that'd perhaps allow
us to give more meaningful feedback.


Yeah, especially since you mentioned this being for backups. I suspect 
you *want* those WAL records marked with 0, because that tells you that 
you can't rely on WAL when you back that data up.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461




Re: [HACKERS] PostgreSQL Version 10, missing minor version

2016-08-28 Thread Andres Freund
On 2016-08-29 11:41:00 +0800, Craig Ringer wrote:
> On 29 August 2016 at 02:52, Tom Lane  wrote:
> > "Regina Obe"  writes:
> >> The routine in PostGIS to parse out the version number from pg_config is
> >> breaking in the 10 cycle
> >
> > TBH, I wonder why you are doing that in the first place; it does not
> > seem like the most reliable source of version information.  If you
> > need to do compile-time tests, PG_CATALOG_VERSION is considered the
> > best thing to look at, or VERSION_NUM in Makefiles.
> 
> This is my cue to pop up and say that if you're looking at the startup
> message you have to use the version string, despite it not being the
> most reliable source of information, because we don't send
> server_version_num  ;)
> 
> Patch attached. Yes, I know PostGIS doesn't use it, but it makes no
> sense to tell people not to parse the server version out in some
> situations then force them to in others.

If they're using pg_config atm, that seems unlikely to be related. That
sounds more like a build time issue - there won't be a running server.




Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

2016-08-28 Thread Andres Freund
Hi,

On 2016-08-29 11:25:39 +0800, Craig Ringer wrote:
>   ERROR:  could not access status of transaction 778793573
>   DETAIL:  could not open file "pg_clog/02E6": No such file or directory
> 
> What I'd really like is to be able to ask transam.c to handle the
> xid_in_recent_past logic, treating an attempt to read an xid from
> beyond the clog truncation threshold as a soft error indicating
> unknown xact state. But that involves delving into slru.c, and I
> really, really don't want to touch that for what should be a simple
> and pretty noninvasive utility function.

Can't you "just" check this against ShmemVariableCache->oldestXid while
holding appropriate locks?
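The comparison such a check needs has to be wraparound-aware. A self-contained sketch of the modulo-2^32 ordering that transam.c uses (a simplification that ignores the special-casing of permanent XIDs):

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/*
 * Circular XID comparison in the style of TransactionIdPrecedes():
 * id1 precedes id2 iff the signed 32-bit difference is negative, which
 * stays correct across the 2^32 wraparound as long as the two XIDs are
 * less than 2^31 apart.
 */
static bool
xid_precedes(TransactionId id1, TransactionId id2)
{
    return (int32_t) (id1 - id2) < 0;
}

/*
 * Sketch of the proposed check: clog for xid has been truncated away
 * if xid precedes the cluster-wide oldestXid (which the real code
 * would read from ShmemVariableCache under the appropriate lock).
 */
static bool
clog_truncated_for(TransactionId xid, TransactionId oldest_xid)
{
    return xid_precedes(xid, oldest_xid);
}
```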


> A PG_TRY to trap ERRCODE_UNDEFINED_FILE seems like it'd be sufficient,
> except for two issues:

It seems like a bad idea to PG_CATCH and not re-throw an error. That
generally is quite error prone. At the very least locking and such gets
a lot more complicated (as you noticed below).

> * TransactionIdGetStatus() releases the CLogControlLock taken by
> SimpleLruReadPage_ReadOnly() on normal exit but not after an exception
> thrown from SlruReportIOError(). It seems appropriate for
> SimpleLruReadPage() to release the LWLock before calling
> SlruReportIOError(), so I've done that as a separate patch (now 0001).

We normally prefer to handle this via the "bulk" releases in the error
handlers. It's otherwise hard to write code that handles these
situations reliably. It's different for spinlocks, but those normally
protect far smaller regions of code.


Andres




Re: [HACKERS] PostgreSQL Version 10, missing minor version

2016-08-28 Thread Craig Ringer
On 29 August 2016 at 02:52, Tom Lane  wrote:
> "Regina Obe"  writes:
>> The routine in PostGIS to parse out the version number from pg_config is
>> breaking in the 10 cycle
>
> TBH, I wonder why you are doing that in the first place; it does not
> seem like the most reliable source of version information.  If you
> need to do compile-time tests, PG_CATALOG_VERSION is considered the
> best thing to look at, or VERSION_NUM in Makefiles.

This is my cue to pop up and say that if you're looking at the startup
message you have to use the version string, despite it not being the
most reliable source of information, because we don't send
server_version_num  ;)

Patch attached. Yes, I know PostGIS doesn't use it, but it makes no
sense to tell people not to parse the server version out in some
situations then force them to in others.
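On the client side, the fallback this patch enables might look like the following. A hedged sketch of a helper a driver could carry, not libpq code:

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Derive a PG_VERSION_NUM-style integer for a server: prefer the
 * server_version_num GUC when the server reported it, otherwise fall
 * back to parsing server_version.  Handles pre-10 three-part numbering
 * ("9.6.3" -> 90603) as well as 10+ two-part numbering ("10.1" ->
 * 100001, "10devel" -> 100000).
 */
static int
server_version_as_num(const char *version_num, const char *version_str)
{
    int major = 0, minor = 0, rev = 0;

    if (version_num != NULL)
        return atoi(version_num);

    if (sscanf(version_str, "%d.%d.%d", &major, &minor, &rev) >= 2
        && major < 10)
        return major * 10000 + minor * 100 + rev;

    /* 10 and later: the second component is the minor release */
    minor = 0;
    sscanf(version_str, "%d.%d", &major, &minor);
    return major * 10000 + minor;
}
```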

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 9e6fce06b07ca8e272a6125c9a75ac2ba7714d03 Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Mon, 29 Aug 2016 11:31:52 +0800
Subject: [PATCH] Report server_version_num alongside server_version in startup
 packet

We expose PG_VERSION_NUM in Makefiles and headers and in pg_settings,
but the equivalent server_version_num isn't sent in the startup packet
so clients must rely on parsing the server_version. Make
server_version_num GUC_REPORT so clients can use server_version_num if
present and fall back to server_version for older PostgreSQL versions.
---
 src/backend/utils/misc/guc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index c5178f7..36e3604 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -2767,7 +2767,7 @@ static struct config_int ConfigureNamesInt[] =
 		{"server_version_num", PGC_INTERNAL, PRESET_OPTIONS,
 			gettext_noop("Shows the server version as an integer."),
 			NULL,
-			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | GUC_REPORT
 		},
 		&server_version_num,
 		PG_VERSION_NUM, PG_VERSION_NUM, PG_VERSION_NUM,
-- 
2.5.5




Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

2016-08-28 Thread Craig Ringer
On 24 August 2016 at 03:10, Robert Haas  wrote:
>
> On Tue, Aug 23, 2016 at 12:59 PM, Craig Ringer  wrote:
> > Also fine by me. You're right, keep it simple. It means the potential set of
> > values isn't discoverable the same way, but ... meh. Using it usefully means
> > reading the docs anyway.
> >
> > The remaining 2 patches of interest are attached - txid_status() and
> > txid_convert_if_recent(). Thanks for committing txid_current_if_assigned().
> >
> > Now I'd best stop pretending I'm in a sensible timezone.
>
> I reviewed this version some more and found some more problems.


Thanks. It took me a few days to prep a new patch as I found another
issue in the process. Updated patch attached.

The updated series starts (0001) with a change to slru.c to release
the control lock when throwing an exception so that we don't deadlock
with ourselves when re-entering slru.c; explanation below.

Then there's the txid_status (0002) patch with fixes, then
txid_convert_if_recent(0003).

I omitted txid_incinerate(); I have an updated version that sets xact
status to aborted for burned xacts instead of leaving them at 0
(in-progress), but haven't had time to finish it so it doesn't try to
blindly set the status of xacts on pages where it didn't hold
XidGenLock when the page was added to the clog.

> +uint32 xid_epoch = (uint32)(xid_with_epoch >>32);
> +TransactionId xid = (TransactionId)(xid_with_epoch);
>
> I think this is not project style.  In particular, I think that the
> first one needs a space after the cast and another space before the
> 32; and I think the second one has an unnecessary set of parentheses
> and needs a space added.


OK, no problems. I didn't realise spacing around casts was specified.

>
> +/*
> + * Underlying implementation of txid_status, which is mapped to an enum in
> + * system_views.sql.
> + */
>
> Not any more...


That's why I shouldn't revise a patch at 1am ;)

>
> +if (TransactionIdIsCurrentTransactionId(xid) ||
> TransactionIdIsInProgress(xid))
> +status = gettext_noop("in progress");
> +else if (TransactionIdDidCommit(xid))
> +status = gettext_noop("committed");
> +else if (TransactionIdDidAbort(xid))
> +status = gettext_noop("aborted");
> +else
> +/* shouldn't happen */
> +ereport(ERROR,
> +(errmsg_internal("unable to determine commit status of
> xid "UINT64_FORMAT, xid)));
>
> Maybe I'm all wet here, but it seems like there might be a problem
> here if the XID is older than the CLOG truncation point but less than
> one epoch old. get_xid_in_recent_past only guarantees that the
> transaction is less than one epoch old, not that we still have CLOG
> data for it.


Good point. The call would then fail with something like

  ERROR:  could not access status of transaction 778793573
  DETAIL:  could not open file "pg_clog/02E6": No such file or directory

This probably didn't come up in my wraparound testing because I'm
aggressively forcing wraparound by writing a lot of clog very quickly
under XidGenLock, and because I'm mostly looking at xacts that are
either recent or past the xid boundary. I've added better coverage
for xacts around 2^30 behind the nextXid to the wraparound tests;
can't add them to txid.sql since the xid never gets that far in normal
regression testing.

What I'd really like is to be able to ask transam.c to handle the
xid_in_recent_past logic, treating an attempt to read an xid from
beyond the clog truncation threshold as a soft error indicating
unknown xact state. But that involves delving into slru.c, and I
really, really don't want to touch that for what should be a simple
and pretty noninvasive utility function.

A PG_TRY to trap ERRCODE_UNDEFINED_FILE seems like it'd be sufficient,
except for two issues:

* I see no accepted way to access the errcode etc from within the
PG_CATCH block, though. PG_RE_THROW() can do it because pg_re_throw()
is in elog.c. I couldn't find any existing code that seems to check
details about an error thrown in a PG_TRY block, only SPI calls. I
don't want to ignore all types of errors and potentially hide
problems, so I just used geterrcode() - which is meant for errcontext
callbacks - and changed the comment to say it can be used in PG_CATCH
too. I don't see why it shouldn't be.
We should probably have some sort of PG_CATCH_INFO(varname) that
exposes the top ErrorData, but that's not needed for this patch so I
left it alone.

* TransactionIdGetStatus() releases the CLogControlLock taken by
SimpleLruReadPage_ReadOnly() on normal exit but not after an exception
thrown from SlruReportIOError(). It seems appropriate for
SimpleLruReadPage() to release the LWLock before calling
SlruReportIOError(), so I've done that as a separate patch (now 0001).


I also removed the TransactionIdInProgress check in txid_status and
just assume it's in progress if it isn't our xid, committed or
aborted. 

[HACKERS] VACUUM's ancillary tasks

2016-08-28 Thread Masahiko Sawada
On Monday, 29 August 2016, Andres Freund wrote:

> Hi,
>
> On 2016-08-29 03:26:06 +0200, Vik Fearing wrote:
> > The attached two patches scratch two itches I've been having for a
> > while.  I'm attaching them together because the second depends on the
> first.
> >
> > Both deal with the fact that [auto]vacuum has taken on more roles than
> > its original purpose.
> >
> >
> > Patch One: autovacuum insert-heavy tables
> >
> > If you have a table that mostly receives INSERTs, it will never get
> > vacuumed because there are no (or few) dead rows.  I have added an
> > "inserts_since_vacuum" field to PgStat_StatTabEntry which works exactly
> > the same way as "changes_since_analyze" does.
> >
> > The reason such a table needs to be vacuumed is currently twofold: the
> > visibility map is not updated, slowing down index-only scans; and BRIN
> > indexes are not maintained, rendering them basically useless.
>
> It might be worthwhile to look at
> http://archives.postgresql.org/message-id/CAMkU%3D1zGu5Oshfz
> xKBqDmxxKcoDJu4pJux8UAo5h7k%2BGA_jS3Q%40mail.gmail.com
> there's definitely some overlap.
>
>
> > Patch Two: autovacuum after table rewrites
> >
> > This patch addresses the absurdity that a standard VACUUM is required
> > after a VACUUM FULL because the visibility map gets blown away.  This is
> > also the case for CLUSTER and some versions of ALTER TABLE that rewrite
> > the table.
>
> I think this should rather fixed by maintaining the VM during
> cluster.


>
+1


> IIRC there was an attempt late in the 9.5 cycle, but Bruce
> (IIRC) ran out of steam. And nobody picked it up again ... :(

It may be worth looking at
https://www.postgresql.org/message-id/CAMkU%3D1w3osJJ2FneELhhNRLxfZitDgp9FPHee08NT2FQFmz_pQ%40mail.gmail.com

I've updated this patch to apply to current HEAD, and I can propose it for pg10.

Regards,

--
Masahiko Sawada


-- 
Regards,

--
Masahiko Sawada


Re: [HACKERS] VACUUM's ancillary tasks

2016-08-28 Thread Andres Freund
Hi,

On 2016-08-29 03:26:06 +0200, Vik Fearing wrote:
> The attached two patches scratch two itches I've been having for a
> while.  I'm attaching them together because the second depends on the first.
> 
> Both deal with the fact that [auto]vacuum has taken on more roles than
> its original purpose.
> 
> 
> Patch One: autovacuum insert-heavy tables
> 
> If you have a table that mostly receives INSERTs, it will never get
> vacuumed because there are no (or few) dead rows.  I have added an
> "inserts_since_vacuum" field to PgStat_StatTabEntry which works exactly
> the same way as "changes_since_analyze" does.
> 
> The reason such a table needs to be vacuumed is currently twofold: the
> visibility map is not updated, slowing down index-only scans; and BRIN
> indexes are not maintained, rendering them basically useless.

It might be worthwhile to look at
http://archives.postgresql.org/message-id/CAMkU%3D1zGu5OshfzxKBqDmxxKcoDJu4pJux8UAo5h7k%2BGA_jS3Q%40mail.gmail.com
there's definitely some overlap.


> Patch Two: autovacuum after table rewrites
> 
> This patch addresses the absurdity that a standard VACUUM is required
> after a VACUUM FULL because the visibility map gets blown away.  This is
> also the case for CLUSTER and some versions of ALTER TABLE that rewrite
> the table.

I think this should rather fixed by maintaining the VM during
cluster. IIRC there was an attempt late in the 9.5 cycle, but Bruce
(IIRC) ran out of steam. And nobody picked it up again ... :(

Greetings,

Andres Freund




[HACKERS] PassDownLimitBound for ForeignScan/CustomScan

2016-08-28 Thread Kouhei Kaigai
Hello,

The attached patch adds an optional callback to support a special optimization
when a ForeignScan/CustomScan is located under a Limit node in the plan tree.

Our Sort node wisely switches its behavior when it knows in advance the exact
number of rows to be produced, because all the Sort node has to return is the
top-k rows when it is located under a Limit node. That is a much lighter
workload than sorting the entire input when nrows is not small.
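The benefit of knowing the bound can be sketched outside the executor: a bounded sort only ever keeps k rows in memory. This is an O(n·k) insertion sketch for clarity; the real Sort node's bounded mode uses a binary heap:

```c
#include <stddef.h>

/*
 * Maintain the k smallest values seen so far, in ascending order, the
 * way a Sort node under "LIMIT k" can: memory stays O(k) instead of
 * O(n), and values that cannot be among the top k are discarded on
 * arrival.
 */
static void
top_k_insert(int *topk, size_t *count, size_t k, int value)
{
    size_t i;

    if (*count >= k)
    {
        if (value >= topk[k - 1])
            return;             /* cannot be among the top k */
        i = k - 1;              /* evict the current maximum */
    }
    else
        i = (*count)++;

    /* shift larger entries right and insert in sorted position */
    while (i > 0 && topk[i - 1] > value)
    {
        topk[i] = topk[i - 1];
        i--;
    }
    topk[i] = value;
}
```

The same reasoning is what makes the working set fit in fast (e.g. L1-grade) memory when k is small.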

In my case, this information is very useful because the GPU can complete its
sorting operations mostly in L1-grade memory if we know in advance that the
top-k value is small enough to fit in the fast memory.

It is probably also valuable for Fujita-san's case, because this information
allows attaching a "LIMIT k" clause to the remote query of postgres_fdw. That
will reduce the amount of network traffic and remote CPU consumption once we
have support for sort pushdown.

One thing we need to pay attention to is cost estimation at the planner stage.
In the existing code, only create_ordered_paths() and create_merge_append_path()
consider the limit clause for the cost estimation of sorting. They use the
'limit_tuples' field of PlannerInfo; an extension can reference that structure
when it adds a ForeignPath/CustomPath, so I think we don't need a special
enhancement at the planner stage.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 



pgsql-v10-fdw-css-limit-bound.v1.patch
Description: pgsql-v10-fdw-css-limit-bound.v1.patch



[HACKERS] Sample configuration files

2016-08-28 Thread Vik Fearing
We have sample configuration files for postgresql.conf and
recovery.conf, but we do not have them for contrib modules.  This patch
attempts to add them.

Although the patch puts the sample configuration files in the tree, it
doesn't try to install them.  That's partly because I think it would
need an extension version bump and that doesn't seem worth it.

Also, while writing this patch, it crossed my mind that the plpgsql
variables should probably be in the main postgresql.conf file because we
install it by default.  I will happily write that patch if desired,
independently of this patch.

Current as of 26fa446, added to next commitfest.
-- 
Vik Fearing  +33 6 46 75 15 36
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


sample_confs_v01.patch
Description: invalid/octet-stream



[HACKERS] VACUUM's ancillary tasks

2016-08-28 Thread Vik Fearing
The attached two patches scratch two itches I've been having for a
while.  I'm attaching them together because the second depends on the first.

Both deal with the fact that [auto]vacuum has taken on more roles than
its original purpose.


Patch One: autovacuum insert-heavy tables

If you have a table that mostly receives INSERTs, it will never get
vacuumed because there are no (or few) dead rows.  I have added an
"inserts_since_vacuum" field to PgStat_StatTabEntry which works exactly
the same way as "changes_since_analyze" does.

The reason such a table needs to be vacuumed is currently twofold: the
visibility map is not updated, slowing down index-only scans; and BRIN
indexes are not maintained, rendering them basically useless.


Patch Two: autovacuum after table rewrites

This patch addresses the absurdity that a standard VACUUM is required
after a VACUUM FULL because the visibility map gets blown away.  This is
also the case for CLUSTER and some versions of ALTER TABLE that rewrite
the table.

I thought about having those commands do the same work themselves, but
it seems better to have them simply trigger autovacuum than
quadruplicate the work.  I do this by having them fill in the
"inserts_since_vacuum" field added in Patch One with the number of rows
rewritten.  This assumes that autovacuum_vacuum_scale_factor is < 1.0
which hopefully is a safe assumption.
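The dependency on the scale factor falls out of the trigger arithmetic. A sketch mirroring autovacuum's threshold test, with the proposed counter standing in for today's dead-tuple count:

```c
#include <stdbool.h>

/*
 * Mirror of autovacuum's trigger test (autovacuum.c):
 *   vacthresh = vac_base_thresh + vac_scale_factor * reltuples
 * "counter" stands in for n_dead_tuples today, or for the proposed
 * inserts_since_vacuum counter.
 */
static bool
needs_autovacuum(double reltuples, double counter,
                 int base_thresh, double scale_factor)
{
    return counter > base_thresh + scale_factor * reltuples;
}
```

Setting the counter to the number of rewritten rows (equal to reltuples) fires the trigger exactly when scale_factor < 1.0; at 1.0 or above the base threshold keeps it from ever firing, which is the assumption the patch leans on.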

While doing this, I noticed that ALTER TABLE should also re-analyze the
table for obvious reasons, so I have that one set
"changes_since_analyze" to the number of rows rewritten as well.


I have not included any kind of test suite here because I don't really
have any ideas how to go about it in a sane way.  Suggestions welcome.

Attention reviewer: Please note that some of the documentation in the
first patch gets removed by the second patch, in case they both don't
get committed.


I have added this to the imminent commitfest.  These patches are rebased
as of 26fa446.
-- 
Vik Fearing  +33 6 46 75 15 36
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


autovac_01_v01.patch
Description: invalid/octet-stream


autovac_02_v01.patch
Description: invalid/octet-stream



[HACKERS] comment fix for CUSTOMPATH_* flags

2016-08-28 Thread Kouhei Kaigai
Hello,

I noticed that the source code comment around the CustomPath structure says
"see above" for the definition of the CUSTOMPATH_* flags. That was originally
right, but the definition was moved to nodes/extensible.h during later
development, so there is no longer anything above. The attached patch corrects
the comment to point to the right location.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 




pgsql-v9.6-custom-flags-comments-fixup.patch
Description: pgsql-v9.6-custom-flags-comments-fixup.patch



[HACKERS] Reminder: 9.6rc1 wraps tomorrow

2016-08-28 Thread Tom Lane
Just making sure everyone's on the same page.

regards, tom lane




Re: [HACKERS] distinct estimate of a hard-coded VALUES list

2016-08-28 Thread Jeff Janes
On Mon, Aug 22, 2016 at 10:19 AM, Robert Haas  wrote:

> On Sat, Aug 20, 2016 at 4:58 PM, Tom Lane  wrote:
> > Jeff Janes  writes:
> >> On Thu, Aug 18, 2016 at 2:25 PM, Tom Lane  wrote:
> >>> It does know it, what it doesn't know is how many duplicates there are.
> >
> >> Does it know whether the count comes from a parsed query-string
> list/array,
> >> rather than being an estimate from something else?  If it came from a
> join,
> >> I can see why it would be dangerous to assume they are mostly distinct.
> >> But if someone throws 6000 things into a query string and only 200
> distinct
> >> values among them, they have no one to blame but themselves when it
> makes
> >> bad choices off of that.
> >
> > I am not exactly sold on this assumption that applications have
> > de-duplicated the contents of a VALUES or IN list.  They haven't been
> > asked to do that in the past, so why do you think they are doing it?
>


I think most people wouldn't go out of their way to de-duplicate simply
because the way they get the list is inherently deduplicated in the first
place, or at least is not going to routinely have massive redundancy.  If I
thought it was likely to have massive redundancy, then I would add code to
deduplicate just because I don't want to lug redundant data around inside
my own app server code, regardless of what it does to PostgreSQL's planner.

We can be sure that someone somewhere is sending huge lists with only a
couple hundred distinct values among them, but I don't think we should let
that block changes.

I think the more serious source of regressions will be cases where the
current bad distinct estimate is cancelling some other bad estimate
elsewhere in the planner, yielding an efficient plan by accident.
Improving the estimate could push the plan over to a bad choice, leading to
a large regression upon upgrading.  For that reason, I think it is better
not to try to get your patch (which I tested and looks good to me, except
that it still isn't picking the best plan for unrelated reasons) into 9.6
after beta4 but just put it into 10.0.  I don't know when most people with
stringent SLA start validating the performance of new versions, but I
suspect some are already doing so before beta4 and it wouldn't be nice to
change this on them.


>
> It's hard to know, but my intuition is that most people would
> deduplicate.  I mean, nobody is going to want to their query generator
> to send X IN (1, 1, ) to the server if it
> could have just sent X IN (1).
>

Yes, I agree with that.  But note that this isn't currently relevant to X
IN (1) anyway.  'X IN (list)' gets planned as if it were "X=ANY(ARRAY...)",
which goes through a different machinery than "X=ANY(VALUES...)"

There is a 9-year-old to-do item which proposes (if I understand it
correctly) to change this so that "X=ANY(ARRAY...)" goes through the same
plan as "X=ANY(VALUES...)"   A problem with doing that is currently people
can, in lieu of planner hints, re-write their queries between the two ways
of expressing them, in case one way gives a better plan.  If they were
always planned identically, it would take away that safety hatch.

The most conservative thing, given the current state, would be to fix
"X=ANY(ARRAY...)" so that it builds a hash table out of the ARRAY, and then
probes that table for each X, rather than iterating over the ARRAY for each
X, which is what it currently does.  That way you would get the benefit of
using a hash, without the risk of tipping over into some unsuitable join
order in the process.  It should have no user-facing downsides, just the
ugliness of having two ways to implement fundamentally the same thing.

So basically this line of a plan:

   Filter: (aid = ANY ('{...}')

Currently glosses over a nested loop over the array.  We could make it
gloss over a hash probe into the "array" instead.

Cheers,

Jeff


Re: [HACKERS] src/include/catalog/pg_foreign_table.h still refers genbki.sh

2016-08-28 Thread Tom Lane
Tomas Vondra  writes:
> I've noticed the comment in src/include/catalog/pg_foreign_table.h still 
> talks about genbki.sh, but AFAIK it was replaced with genbki.pl.

Hmm, somebody must have copied-and-pasted this header comment from some
old catalog include file.  Fixed, thanks.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RLS related docs

2016-08-28 Thread Joe Conway
On 05/30/2016 01:56 PM, Joe Conway wrote:
> On 05/26/2016 12:26 AM, Dean Rasheed wrote:
>> On 25 May 2016 at 02:04, Joe Conway  wrote:
>>> Please see attached two proposed patches for the docs related to RLS:
>>>
>>> 1) Correction to pg_restore
>>> 2) Additional mentions that "COPY FROM" does not allow RLS to be enabled
>>>
>>> Comments?
>>>
>>
>> The pg_restore change looks good -- that was clearly wrong.
>>
>> Also, +1 for the new note in pg_dump.
> 
> Great, thanks!
> 
>> For COPY, I think perhaps it would be more logical to put the new note
>> immediately after the third note which describes the privileges
>> required, since it's kind of related, and then we can talk about the
>> RLS policies required, e.g.:
>>
>> If row-level security is enabled for the table, COPY table TO is
>> internally converted to COPY (SELECT * FROM table) TO, and the
>> relevant security policies are applied. Currently, COPY FROM is not
>> supported for tables with row-level security.
> 
> This sounds better than what I had, so I will do it that way.


Apologies for the delay, but new patch attached. Assuming no more
comments, will commit this, backpatched to 9.5, in a day or two.

Thanks,

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 07e2f45..af15fd1 100644
*** a/doc/src/sgml/ref/copy.sgml
--- b/doc/src/sgml/ref/copy.sgml
*** COPY coun
*** 419,424 
--- 419,434 
 
  
 
+ If row-level security is enabled for the table, COPY
+ table TO is
+ internally converted to COPY (SELECT * FROM
+ table) TO ...,
+ and the relevant security policies are applied. Currently,
+ COPY FROM is not supported for tables with row-level
+ security. Use equivalent INSERT statements instead.
+
+ 
+
  Files named in a COPY command are read or written
  directly by the server, not by the client application. Therefore,
  they must reside on or be accessible to the database server machine,
diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index be1b684..4fa925c 100644
*** a/doc/src/sgml/ref/pg_dump.sgml
--- b/doc/src/sgml/ref/pg_dump.sgml
*** PostgreSQL documentation
*** 699,704 
--- 699,709 
  to dump the parts of the contents of the table that they have access to.
 
  
+
+ Note that if you use this option currently, you probably also want
+ the dump be in INSERT format, as the
+ COPY FROM during restore does not support row security.
+

   
  
diff --git a/doc/src/sgml/ref/pg_restore.sgml b/doc/src/sgml/ref/pg_restore.sgml
index c906919..ef5bab4 100644
*** a/doc/src/sgml/ref/pg_restore.sgml
--- b/doc/src/sgml/ref/pg_restore.sgml
***
*** 527,533 
  
 
  Note that this option currently also requires the dump be in INSERT
! format, as COPY TO does not support row security.
 

   
--- 527,533 
  
 
  Note that this option currently also requires the dump be in INSERT
! format, as COPY FROM does not support row security.
 

   




Re: [HACKERS] PostgreSQL Version 10, missing minor version

2016-08-28 Thread Tom Lane
"Regina Obe"  writes:
> The routine in PostGIS to parse out the version number from pg_config is
> breaking in the 10 cycle

TBH, I wonder why you are doing that in the first place; it does not
seem like the most reliable source of version information.  If you
need to do compile-time tests, PG_CATALOG_VERSION is considered the
best thing to look at, or VERSION_NUM in Makefiles.

However, if you're dead set on getting a version number out of a string
representation, you'll need to make changes similar to commit
69dc5ae408f68c302029a6b43912a2cc16b1256c.

regards, tom lane




Re: [HACKERS] PostgreSQL Version 10, missing minor version

2016-08-28 Thread Joe Conway
On 08/28/2016 09:55 AM, Regina Obe wrote:
> The routine in PostGIS to parse out the version number from pg_config is
> breaking in the 10 cycle.
> 
> Issue seems to be because there is no minor specified.
> 
> e.g.
> 
pg_config --version
> 
> returns:
> 
> PostgreSQL 10devel
> 
> Instead of expected
> 
> PostgreSQL 10.0devel
> 
> Is this the way it's going to be or will there be a .0 tacked at the end
> before release?

Given the version numbering change, postgres version 10 is equivalent to
version 9.6 (i.e. the "major" version number), and 10.0 is equivalent to
9.6.0 (i.e. the "major.minor" version). So I suspect that given...

  pg_config --version
  PostgreSQL 9.5.3

  pg_config --version
  PostgreSQL 9.6beta4

... you will see Postgres 10 go through the stages...

  pg_config --version
  PostgreSQL 10devel

  pg_config --version
  PostgreSQL 10beta1

  pg_config --version
  PostgreSQL 10.0

HTH,

Joe




-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





[HACKERS] PostgreSQL Version 10, missing minor version

2016-08-28 Thread Regina Obe
The routine in PostGIS to parse out the version number from pg_config is
breaking in the 10 cycle.

Issue seems to be because there is no minor specified.

e.g.

pg_config --version

returns:

PostgreSQL 10devel

Instead of expected

PostgreSQL 10.0devel

Is this the way it's going to be or will there be a .0 tacked at the end
before release?

Thanks,
Regina





Re: [HACKERS] Bogus sizing parameters in some AllocSetContextCreate calls

2016-08-28 Thread Tomas Vondra

On 08/28/2016 04:59 PM, Tom Lane wrote:

Tomas Vondra  writes:

I'm however pretty sure this will have absolutely no impact on profiles.


Yeah, I do not believe that either.  Also, I think that the intent of the
existing code is to defend against bad parameters even in non-assert
builds.  Which might argue for turning these into test-and-elog rather
than asserts.



I agree. There's a fair number of contrib modules, and when they compile 
just fine, people are unlikely to test them with assert-enabled builds. 
So they would not notice anything, yet their modules would be broken once 
we removed the code that silently fixes the values for them.



--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Bogus sizing parameters in some AllocSetContextCreate calls

2016-08-28 Thread Tom Lane
Tomas Vondra  writes:
> I'm however pretty sure this will have absolutely no impact on profiles. 

Yeah, I do not believe that either.  Also, I think that the intent of the
existing code is to defend against bad parameters even in non-assert
builds.  Which might argue for turning these into test-and-elog rather
than asserts.

regards, tom lane




Re: [HACKERS] Bogus sizing parameters in some AllocSetContextCreate calls

2016-08-28 Thread Tomas Vondra

On 08/28/2016 04:41 PM, Tom Lane wrote:

Robert Haas  writes:

Also, I think we ought to replace this code in aset.c:



initBlockSize = MAXALIGN(initBlockSize);
if (initBlockSize < 1024)
initBlockSize = 1024;
maxBlockSize = MAXALIGN(maxBlockSize);



With this:



Assert(initBlockSize >= 1024 && initBlockSize == MAXALIGN(initBlockSize));
Assert(maxBlockSize == MAXALIGN(maxBlockSize));


Good idea --- if we'd had it that way, these errors would never have
gotten committed in the first place.  I'm for doing that only in HEAD
though.

regards, tom lane




Then maybe also add

Assert(initBlockSize <= maxBlockSize);

And perhaps also an assert making sure the minContextSize value makes 
sense with respect to the min/max block sizes?


I'm however pretty sure this will have absolutely no impact on profiles. 
It might save a few cycles, but this only runs when creating the memory 
context and all profiles I've seen are about palloc/pfree.


It might matter when creating many memory contexts, but then you 
probably have other things to worry about.


regards


--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] Bogus sizing parameters in some AllocSetContextCreate calls

2016-08-28 Thread Tom Lane
Robert Haas  writes:
> Also, I think we ought to replace this code in aset.c:

> initBlockSize = MAXALIGN(initBlockSize);
> if (initBlockSize < 1024)
> initBlockSize = 1024;
> maxBlockSize = MAXALIGN(maxBlockSize);

> With this:

> Assert(initBlockSize >= 1024 && initBlockSize == MAXALIGN(initBlockSize));
> Assert(maxBlockSize == MAXALIGN(maxBlockSize));

Good idea --- if we'd had it that way, these errors would never have
gotten committed in the first place.  I'm for doing that only in HEAD
though.

regards, tom lane




Re: [HACKERS] increasing the default WAL segment size

2016-08-28 Thread Robert Haas
On Fri, Aug 26, 2016 at 4:45 AM, Eduardo Morras  wrote:
> From my ignorance, could block size affect this WAL size increase?
>
> In the past (I haven't tried it with versions >9) you could change the
> disk block size from 8KB to 16KB or 32KB (or 4KB) by modifying BLCKSZ 8192
> in src/include/pg_config.h and recompiling. (There are some mails from
> 1999-2002 about this topic.)

Yeah, I think that's still supposed to work although I don't know
whether anyone has tried it lately.  It is a separate topic from this
issue, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] increasing the default WAL segment size

2016-08-28 Thread Robert Haas
On Fri, Aug 26, 2016 at 12:39 AM, Andres Freund  wrote:
> Maybe I'm missing something here - but why would we need to do any of
> that? The WAL already isn't compatible between versions, and we don't
> reuse the old server's WAL anyway? Isn't all that's needed relaxing some
> error check?

Yeah.  If this change is made in a new major version - and how else
would we do it? - it doesn't introduce any incompatibility that
wouldn't be present already.  pg_upgrade doesn't (and can't) migrate
WAL.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Checksum error and VACUUM FULL

2016-08-28 Thread Robert Haas
On Thu, Aug 25, 2016 at 5:45 AM, Tatsuki Kadomoto <
tatsuki.kadom...@proceranetworks.com> wrote:

> I hit an incorrect checksum error on "global/pg_filenode.map" right at
> the moment "VACUUM FULL" was executed, and the session was aborted.
>
> Aug 16 20:51:19 postgres[22329]: [2-1] FATAL:  relation mapping file 
> "global/pg_filenode.map" contains incorrect checksum
>
> Aug 16 20:51:19 postgres[22329]: [2-2] STATEMENT:  SELECT 
> id,readbm,writebm,survbm,timeout FROM Users WHERE username='packetlogicd' AND 
> password=md5('x')
>
> I'm reading the comment in src/backend/utils/cache/relmapper.c .
>
> ===
> Therefore mapped catalogs can only be relocated by operations such as
> VACUUM FULL and CLUSTER, which make no transactionally-significant changes:
> it must be safe for the new file to replace the old, even if the
> transaction itself aborts.
> ===
>
> Does this comment mean it's expected to get this kind of checksum error if
> the timing is really bad?
>

I don't think so.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Bogus sizing parameters in some AllocSetContextCreate calls

2016-08-28 Thread Robert Haas
On Sat, Aug 27, 2016 at 2:08 PM, Tom Lane  wrote:
> The standard calling pattern for AllocSetContextCreate is
>
> cxt = AllocSetContextCreate(parent, name,
> ALLOCSET_DEFAULT_MINSIZE,
> ALLOCSET_DEFAULT_INITSIZE,
> ALLOCSET_DEFAULT_MAXSIZE);
>
> or for some contexts you might s/DEFAULT/SMALL/g if you expect the context
> to never contain very much data.  I happened to notice that there are a
> few calls that get this pattern wrong.  After a bit of grepping, I found:
>
> hba.c lines 389, 2196:
>ALLOCSET_DEFAULT_MINSIZE,
>ALLOCSET_DEFAULT_MINSIZE,
>ALLOCSET_DEFAULT_MAXSIZE);
>
> guc-file.l line 146:
>ALLOCSET_DEFAULT_MINSIZE,
>ALLOCSET_DEFAULT_MINSIZE,
>ALLOCSET_DEFAULT_MAXSIZE);
>
> brin.c line 857:
>
> ALLOCSET_SMALL_INITSIZE,
> ALLOCSET_SMALL_MINSIZE,
> ALLOCSET_SMALL_MAXSIZE);
>
> autovacuum.c line 2184:
>   ALLOCSET_DEFAULT_INITSIZE,
>   ALLOCSET_DEFAULT_MINSIZE,
>   ALLOCSET_DEFAULT_MAXSIZE);
>
> typcache.c lines 757, 842:
>
> ALLOCSET_SMALL_INITSIZE,
> ALLOCSET_SMALL_MINSIZE,
> ALLOCSET_SMALL_MAXSIZE);
>
> In all of these cases, we're passing zero as the initBlockSize, which is
> invalid, but AllocSetContextCreate silently forces it up to 1K.  So
> there's no failure but there may be some inefficiency due to small block
> sizes of early allocation blocks.  I seriously doubt that's intentional;
> in some of these cases it might be sane to use the SMALL parameters
> instead, but this isn't a good way to get that effect.  The last four
> cases are also passing a nonzero value as the minContextSize, forcing
> the context to allocate at least that much instantly and hold onto it
> over resets.  That's inefficient as well, though probably only minorly so.

I noticed this a while back while playing with my alternate allocator,
but wasn't sure how much of it was intentional.  Anyway, +1 for doing
something to clean this up.  Your proposal sounds OK, but maybe
AllocSetContextCreateDefault() and AllocSetContextCreateSmall() would
be nice and easier to remember.

Also, I think we ought to replace this code in aset.c:

initBlockSize = MAXALIGN(initBlockSize);
if (initBlockSize < 1024)
initBlockSize = 1024;
maxBlockSize = MAXALIGN(maxBlockSize);

With this:

Assert(initBlockSize >= 1024 && initBlockSize == MAXALIGN(initBlockSize));
Assert(maxBlockSize == MAXALIGN(maxBlockSize));

This might save a few cycles which wouldn't be unwelcome given that
AllocSetContextCreate does show up in profiles, but more importantly I
think it's completely inappropriate for this function to paper over
whatever errors may be made by callers.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] WIP: Covering + unique indexes.

2016-08-28 Thread Amit Kapila
On Mon, Aug 15, 2016 at 8:15 PM, Anastasia Lubennikova
 wrote:
@@ -590,7 +622,14 @@ _bt_buildadd(BTWriteState *wstate, BTPageState
*state, IndexTuple itup)
  if (last_off == P_HIKEY)
  {
  Assert(state->btps_minkey == NULL);
- state->btps_minkey = CopyIndexTuple(itup);
+ /*
+ * Truncate the tuple that we're going to insert
+ * into the parent page as a downlink
+ */

+ if (indnkeyatts != indnatts && P_ISLEAF(pageop))
+ state->btps_minkey = index_truncate_tuple(wstate->index, itup);
+ else
+ state->btps_minkey = CopyIndexTuple(itup);

It seems that the above code always ensures that, for leaf pages, the high
key is a truncated tuple.  What is less clear is, if that is true, why you
need to re-ensure it for the old page in the code below:

@@ -510,6 +513,8 @@ _bt_buildadd(BTWriteState *wstate, BTPageState
*state, IndexTuple itup)
{
..
+ if (indnkeyatts != indnatts && P_ISLEAF(opageop))
+ {
+ /*
+ * It's essential to truncate High key here.
+ * The purpose is not just to save more space on this particular page,
+ * but to keep whole b-tree structure consistent. Subsequent insertions
+ * assume that hikey is already truncated, and so they should not
+ * worry about it, when copying the high key into the parent page
+ * as a downlink.
+ * NOTE It is not crucial for reliability at present,
+ * but maybe it will be that in the future.
+ */
+ keytup = index_truncate_tuple(wstate->index, oitup);
+
+ /*  delete "wrong" high key, insert keytup as P_HIKEY. */
+ PageIndexTupleDelete(opage, P_HIKEY);
+
+ if (!_bt_pgaddtup(opage, IndexTupleSize(keytup), keytup, P_HIKEY))
+ elog(ERROR, "failed to rewrite compressed item in index \"%s\"",
+ RelationGetRelationName(wstate->index));
+ }
+
..
..

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

