Re: [HACKERS] Surprising behaviour of \set AUTOCOMMIT ON

2016-08-05 Thread Amit Kapila
On Thu, Aug 4, 2016 at 7:46 PM, Pavel Stehule  wrote:

> 2016-08-04 15:37 GMT+02:00 Amit Kapila :
>>
>> > I dislike automatic commit or rollback here.
>> >
>>
>> What problem do you see with it, if we do so and perhaps mention the
>> same in the docs as well?  Anyway, I think we should make the behaviour
>> of ecpg and psql the same.
>
>
> Implicit COMMIT can be dangerous
>

Not when the user has specifically requested autocommit mode 'on'.
I think what would be more meaningful here is that after "\set
AUTOCOMMIT on", when the first command is committed, it should commit
the previously executed, still-uncommitted commands as well.
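
To make the surprising case concrete, here is a minimal psql sketch
(assuming a pre-existing table t; this is psql's current behaviour, where
turning AUTOCOMMIT on does not close an already-open transaction):

\set AUTOCOMMIT off
INSERT INTO t VALUES (1);   -- psql issues an implicit BEGIN; INSERT is pending
\set AUTOCOMMIT on          -- the open transaction is not committed here
INSERT INTO t VALUES (2);   -- still runs inside the same open transaction
ROLLBACK;                   -- discards both inserts, surprising the user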

>>
>> Not sure what benefit we will get by raising warning.  I think it is
>> better to choose one behaviour (automatic commit or leave the
>> transaction open as is currently being done in psql) and make it
>> consistent across all clients.
>
>
> I am not sure about the value of ecpg for this case. It is used by 0.0001%
> of users. Probably nobody in the Czech Republic knows this client.
>

Sure, but that doesn't give us license to be inconsistent in
behaviour across different clients.

> Warnings force the user to make a decision
>

They could be annoying as well, especially if that happens in scripts.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Heap WARM Tuples - Design Draft

2016-08-05 Thread Pavan Deolasee
On Sat, Aug 6, 2016 at 8:34 AM, Bruce Momjian  wrote:

> On Fri, Aug  5, 2016 at 09:40:35PM -0400, Bruce Momjian wrote:
> > So to summarize again:
> >
> > o  chains start out as HOT
> > o  they become WARM when some indexes change and others don't
> > o  for multiple index changes, we have to check all indexes
> >for key/ctid matches
> > o  for single index changes, we can fail HOT and create a new
> >non-HOT/WARM tuple if there are too many index matches
> > o  99% of index checks will not find a key/ctid match
>
> I think a WARM chain whose head ctid isn't LP_REDIRECT wasn't
> pruned, so here is an updated list:
>
> o  chains start out as HOT
> o  they become WARM when some indexes change and others don't
> o  if WARM chain head is not LP_REDIRECT, check existing chain for key
>matches
> o  if WARM chain head is LP_REDIRECT:
> o  for single index changes, we can fail HOT and create a new
>non-HOT/WARM tuple if there are too many index matches
> o  for multiple index changes, we have to check all indexes
>for key/ctid matches
> o  99% of index checks will not find a key/ctid match
>
> So, we are only checking the index if the WARM chain was pruned, and we
> can bail out if there is only one index changed.  This is looking more
> doable.


The duplicate tuples problem that we are focusing on happens when an index
already has two or more index tuples pointing to the same root tuple/lp. When
it's time to insert the third index tuple, we must not insert a duplicate (key,
CTID) tuple. I have a design where we can track which columns (we are
interested only in the columns that indexes use) were ever changed in
the WARM chain. We allow one change for every index column, but the second
change will require a duplicate lookup. This is still quite an improvement
(cold updates may already be reduced by more than 50%), but someone
can argue that this does not handle the case where the same index column is
repeatedly updated.
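
A hedged sketch of that limitation, using a hypothetical schema (not taken
from the patch itself):

CREATE TABLE t (a int, b int);
CREATE INDEX t_b_idx ON t (b);
INSERT INTO t VALUES (1, 10);
UPDATE t SET b = 20 WHERE a = 1;  -- first change to indexed column b:
                                  -- allowed as WARM without a lookup
UPDATE t SET b = 30 WHERE a = 1;  -- second change to b in the same chain:
                                  -- needs the duplicate (key, CTID) lookup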

If we can find an efficient way to convert WARM chains back to HOT,
which can happen as soon as the old index tuple retires, the system can
attain a stable state, if not for all then for many use cases.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] "Some tests to cover hash_index"

2016-08-05 Thread Amit Kapila
On Thu, Aug 4, 2016 at 7:24 PM, Mithun Cy 
wrote:

> I am attaching the patch to improve some coverage of hash index code [1].
> I have added some basic tests, which mainly cover overflow pages. It takes
> 2 seconds of extra time on my machine in the parallel schedule.
>
>                                    Hit   Total   Coverage
> old tests       Line coverage      780    1478     52.7 %
>                 Function coverage   63      85     74.1 %
> with new tests  Line coverage     1181    1478     79.9 %
>                 Function coverage   78      85     91.8 %
>
>
>

I think the code coverage improvement for hash index with these tests is
quite good; however, the time for the tests seems to be slightly on the
higher side.  Does anybody have a better suggestion for these tests?

diff --git a/src/test/regress/sql/concurrent_hash_index.sql
b/src/test/regress/sql/concurrent_hash_index.sql
I wonder why you have included a new file for these tests; why can't
these be added to the existing hash_index.sql?

+--
+-- Cause some overflow insert and splits.
+--
+CREATE TABLE con_hash_index_table (keycol INT);
+CREATE INDEX con_hash_index on con_hash_index_table USING HASH (keycol);

The relation name con_hash_index* chosen in the above tests doesn't seem
appropriate; how about hash_split_heap* or something like that?
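
For instance, a hedged sketch of the kind of statement that exercises
overflow pages (many duplicates of a single hash key land in one bucket),
using the suggested naming:

CREATE TABLE hash_split_heap (keycol INT);
CREATE INDEX hash_split_index ON hash_split_heap USING HASH (keycol);
-- many rows with the same key overflow the bucket's primary page:
INSERT INTO hash_split_heap SELECT 1 FROM generate_series(1, 70000);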

Please register your patch in the latest CF (https://commitfest.postgresql.org/10/).

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Heap WARM Tuples - Design Draft

2016-08-05 Thread Bruce Momjian
On Fri, Aug  5, 2016 at 09:40:35PM -0400, Bruce Momjian wrote:
> So to summarize again:
> 
> o  chains start out as HOT
> o  they become WARM when some indexes change and others don't
> o  for multiple index changes, we have to check all indexes
>for key/ctid matches
> o  for single index changes, we can fail HOT and create a new
>non-HOT/WARM tuple if there are too many index matches
> o  99% of index checks will not find a key/ctid match

I think a WARM chain whose head ctid isn't LP_REDIRECT wasn't
pruned, so here is an updated list:

o  chains start out as HOT
o  they become WARM when some indexes change and others don't
o  if WARM chain head is not LP_REDIRECT, check existing chain for key
   matches
o  if WARM chain head is LP_REDIRECT:
o  for single index changes, we can fail HOT and create a new
   non-HOT/WARM tuple if there are too many index matches
o  for multiple index changes, we have to check all indexes
   for key/ctid matches
o  99% of index checks will not find a key/ctid match

So, we are only checking the index if the WARM chain was pruned, and we
can bail out if there is only one index changed.  This is looking more
doable.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +




Re: [HACKERS] pg_ctl promote wait

2016-08-05 Thread Peter Eisentraut
On 8/5/16 12:14 AM, Michael Paquier wrote:
> In do_stop, this patch makes the wait happen for a maximum of
> wait_seconds * 2, once when getting the control file information, and
> once when waiting for the server to shut down.

That's not how I read it.  get_controlfile() will decrease the
wait_seconds argument by how much wait time it has used.  The wait for
shutdown will then only use as many seconds as are left.

> This is not a good
> idea, and the idea of putting a wait argument in get_controlfile does
> not seem like a good interface to me. I'd rather see get_controlfile be
> extended with a flag saying no_error_on_failure and keep the wait
> logic within pg_ctl.

I guess we could write a wrapper function in pg_ctl that encapsulated
the wait logic.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Heap WARM Tuples - Design Draft

2016-08-05 Thread Bruce Momjian
On Fri, Aug  5, 2016 at 09:02:39PM -0400, Bruce Momjian wrote:
> Yes, it seems we either find the entry fast and avoid the index
> addition, or don't find it quickly and use a non-HOT, non-WARM update.
> The problem is that you have to do this for multiple indexes, and if you
> quickly find you need to add an entry to the first index, when you get
> to the second one you can't easily bail out and go with a non-HOT,
> non-WARM update.  I suppose we could bail out of a long index search if
> there is only one index with a changed key.
> 
> Here's how I understand it --- if you are looking for a key that has
> only a few index entries, it will be fast to find if the key/ctid is
> listed.  If the index has many index entries for the key, it will be
> expensive to find if there is a matching key/ctid, but a read-only-query
> index lookup for that key will be expensive too, whether you use the
> bitmap scan or not.  And, effectively, if we bail out and decide to go
> with a non-HOT, non-WARM update, we are making the index even bigger.

So to summarize again:

o  chains start out as HOT
o  they become WARM when some indexes change and others don't
o  for multiple index changes, we have to check all indexes
   for key/ctid matches
o  for single index changes, we can fail HOT and create a new
   non-HOT/WARM tuple if there are too many index matches
o  99% of index checks will not find a key/ctid match

I am not sure how to optimize an index non-match.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +




Re: [HACKERS] Heap WARM Tuples - Design Draft

2016-08-05 Thread Bruce Momjian
On Fri, Aug  5, 2016 at 09:21:49PM -0300, Claudio Freire wrote:
> On Fri, Aug 5, 2016 at 9:07 PM, Bruce Momjian  wrote:
> > On Fri, Aug  5, 2016 at 07:51:05PM -0300, Claudio Freire wrote:
> >> > This does create more HOT chains where the root ctid cannot be removed,
> >> > but it does avoid the index key/ctid check because any entry in the HOT
> >> > chain has identical keys.  What this would not handle is when an entire
> >> > HOT chain is pruned, as the keys would be gone.
> >>
> >> I believe the only solution is to use bitmap index scans with WARM indexes.
> >
> > Let me back up and explain the benefits we get from allowing HOT chains
> > to be WARM:
> >
> > *  no index entries for WARM indexes whose values don't change
> > *  improved pruning because the HOT/WARM chains can be longer because we
> >don't have to break chains for index changes
> >
> > While I realize bitmap indexes would allow us to remove duplicate index
> > entries, it removes one of the two benefits of WARM, and it causes every
> > index read to be expensive --- I can't see how this would be a win over
> > doing the index check on writes.
> 
> But the index check could be prohibitively expensive.

True, but I am afraid the bitmap handling on reads will be worse.

> So we're back to bailing out?
> 
> When an update comes and we're considering WARM, we need to query the
> indexes, each one, and if one index cannot quickly verify the
> existence of an entry for the root tid for the key, bail out from
> WARM.

Yes, it seems we either find the entry fast and avoid the index
addition, or don't find it quickly and use a non-HOT, non-WARM update.
The problem is that you have to do this for multiple indexes, and if you
quickly find you need to add an entry to the first index, when you get
to the second one you can't easily bail out and go with a non-HOT,
non-WARM update.  I suppose we could bail out of a long index search if
there is only one index with a changed key.

Here's how I understand it --- if you are looking for a key that has
only a few index entries, it will be fast to find if the key/ctid is
listed.  If the index has many index entries for the key, it will be
expensive to find if there is a matching key/ctid, but a read-only-query
index lookup for that key will be expensive too, whether you use the
bitmap scan or not.  And, effectively, if we bail out and decide to go
with a non-HOT, non-WARM update, we are making the index even bigger.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +




Re: [HACKERS] Heap WARM Tuples - Design Draft

2016-08-05 Thread Claudio Freire
On Fri, Aug 5, 2016 at 9:21 PM, Claudio Freire  wrote:
> On Fri, Aug 5, 2016 at 9:07 PM, Bruce Momjian  wrote:
>> On Fri, Aug  5, 2016 at 07:51:05PM -0300, Claudio Freire wrote:
>>> > This does create more HOT chains where the root ctid cannot be removed,
>>> > but it does avoid the index key/ctid check because any entry in the HOT
>>> > chain has identical keys.  What this would not handle is when an entire
>>> > HOT chain is pruned, as the keys would be gone.
>>>
>>> I believe the only solution is to use bitmap index scans with WARM indexes.
>>
>> Let me back up and explain the benefits we get from allowing HOT chains
>> to be WARM:
>>
>> *  no index entries for WARM indexes whose values don't change
>> *  improved pruning because the HOT/WARM chains can be longer because we
>>don't have to break chains for index changes
>>
>> While I realize bitmap indexes would allow us to remove duplicate index
>> entries, it removes one of the two benefits of WARM, and it causes every
>> index read to be expensive --- I can't see how this would be a win over
>> doing the index check on writes.
>
> But the index check could be prohibitively expensive.

Well... it could be made efficient for the case of nbtree with a format change.

If nbtree sorted by tid as well, for instance.

But an upgrade there would involve a reindex before WARM can be applied.




Re: [HACKERS] Heap WARM Tuples - Design Draft

2016-08-05 Thread Claudio Freire
On Fri, Aug 5, 2016 at 9:07 PM, Bruce Momjian  wrote:
> On Fri, Aug  5, 2016 at 07:51:05PM -0300, Claudio Freire wrote:
>> > This does create more HOT chains where the root ctid cannot be removed,
>> > but it does avoid the index key/ctid check because any entry in the HOT
>> > chain has identical keys.  What this would not handle is when an entire
>> > HOT chain is pruned, as the keys would be gone.
>>
>> I believe the only solution is to use bitmap index scans with WARM indexes.
>
> Let me back up and explain the benefits we get from allowing HOT chains
> to be WARM:
>
> *  no index entries for WARM indexes whose values don't change
> *  improved pruning because the HOT/WARM chains can be longer because we
>don't have to break chains for index changes
>
> While I realize bitmap indexes would allow us to remove duplicate index
> entries, it removes one of the two benefits of WARM, and it causes every
> index read to be expensive --- I can't see how this would be a win over
> doing the index check on writes.

But the index check could be prohibitively expensive.

So we're back to bailing out?

When an update comes and we're considering WARM, we need to query the
indexes, each one, and if one index cannot quickly verify the
existence of an entry for the root tid for the key, bail out from
WARM.

That may prevent the benefits of WARM in a lot of real world cases.
You just need one less-than-unique index to disable WARM for all
updates, and IIRC those are common. You'd pay the CPU price of WARM
without any of the benefits.

I don't think I have a single table in my databases that doesn't exhibit
this behavior.
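
A hedged illustration of that shape (hypothetical schema): with a
low-cardinality index, verifying whether a (key, root ctid) pair is already
present can mean wading through a long run of equal keys:

CREATE TABLE events (id bigint, status text);
CREATE INDEX events_status_idx ON events (status);  -- few distinct values
-- a million rows share status = 'done', so probing the index for
-- ('done', <root ctid>) may touch many leaf entries before answering:
INSERT INTO events SELECT g, 'done' FROM generate_series(1, 1000000) g;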




Re: [HACKERS] Slowness of extended protocol

2016-08-05 Thread Shay Rojansky
>
> > I really don't get what's problematic with posting a message on a mailing
> > list about a potential performance issue, to try to get people's
> reactions,
> > without diving into profiling right away. I'm not a PostgreSQL developer,
> > have other urgent things to do and don't even spend most of my
> programming
> > time in C.
>
> There's absolutely nothing wrong with that.  I find your questions
> helpful and interesting and I hope you will keep asking them.  I think
> that they are a valuable contribution to the discussion on this list.
>

Thanks for the positive feedback, Robert.

I'm glad reducing the overhead of out-of-line parameters seems like an
important goal. FWIW, if, as Vladimir seems to suggest, it's possible to
bring the overhead of the v3 extended protocol down to somewhere near the
simple protocol, that would obviously be a better solution - it would mean
things work faster here and now, rather than waiting for the v4 protocol. I
have no idea if that's possible though; I'll see if I can spend some time
on understanding where the slowdown comes from.


Re: [HACKERS] Heap WARM Tuples - Design Draft

2016-08-05 Thread Bruce Momjian
On Fri, Aug  5, 2016 at 07:51:05PM -0300, Claudio Freire wrote:
> > This does create more HOT chains where the root ctid cannot be removed,
> > but it does avoid the index key/ctid check because any entry in the HOT
> > chain has identical keys.  What this would not handle is when an entire
> > HOT chain is pruned, as the keys would be gone.
> 
> I believe the only solution is to use bitmap index scans with WARM indexes.

Let me back up and explain the benefits we get from allowing HOT chains
to be WARM:

*  no index entries for WARM indexes whose values don't change
*  improved pruning because the HOT/WARM chains can be longer, since we
   don't have to break chains for index changes

While I realize bitmap indexes would allow us to remove duplicate index
entries, it removes one of the two benefits of WARM, and it causes every
index read to be expensive --- I can't see how this would be a win over
doing the index check on writes.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +




Re: [HACKERS] New version numbering practices

2016-08-05 Thread Tom Lane
Peter Eisentraut  writes:
> One hiccup I found is that server_version_num is not sent to clients.
> Instead, libpq assembles the numeric version number itself from the
> string version, and it will fail if it sees only one number (e.g.,
> 10devel).  It will then set the version number to 0 for "unknown".

Per discussion, I applied and back-patched the libpq/fe-exec.c part
of this, so that back-branch clients will have a decent chance of
understanding the new server_version format by the time it hits the field.
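
For reference, a hedged illustration of the two forms involved (example
values from a 10devel server):

SHOW server_version;      -- string form, e.g. '10devel'
SHOW server_version_num;  -- numeric form, e.g. '100000'
-- libpq's PQserverVersion() is derived by parsing the string form, which
-- is why a one-part version like '10devel' confused unpatched clients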

In a quick look around, it seemed like we might also want to fix and
back-patch the server version printing logic in psql's
connection_warnings() function.  However, that would involve touching
a translatable string, so I thought it best not to do it just before
the back-branch releases.

regards, tom lane




Re: [HACKERS] Heap WARM Tuples - Design Draft

2016-08-05 Thread Claudio Freire
On Fri, Aug 5, 2016 at 4:26 PM, Bruce Momjian  wrote:
> On Fri, Aug  5, 2016 at 11:30:26PM +0530, Pavan Deolasee wrote:
>> On Fri, Aug 5, 2016 at 11:26 PM, Bruce Momjian  wrote:
>>
>> On Fri, Aug  5, 2016 at 02:43:37PM -0300, Claudio Freire wrote:
>> > But doing the WARM chain backtracking and checking for previous
>> > versions with appropriate keys should be enough to support this use
>> > case too, it just needs to be well optimized to avoid seriously
>> > impacting performance on the average case.
>>
>> Yes, that was an idea I had too --- if the in-page HOT chain already has
>> the key, we know it is already in the index and we can skip the index
>> addition.
>>
>> I just don't know how you would do that without delaying/not-doing HOT chain
>> prune. As soon as root and intermediate HOT tuples are gone, all information
>> is lost. You may delay that, but that will defeat the whole purpose. If
>> chains are not pruned in time, you may not find any free space in the page
>> and end up doing a cold update anyway. But maybe I am missing something and
>> Claudio has a different idea.
>
> Yes, pruning would be a problem.  :-(
>
> A check only needs to happen when the indexed key changes, right?  So we
> are comparing the cost of adding an index key vs. the cost of trying to
> find a matching key/ctid in the index.  Seems the later is cheaper, but
> it is not obvious.
>
> I think I see Claudio's idea now --- from his diagram, you can have WARM
> chains span multiple HOT chains.  What he is doing is creating a new HOT
> chain every time _any_ key changes, and he knows the entire HOT chain has
> identical values for all indexes.  He moves from one HOT chain to another
> during an index scan by checking if the index value is the same in the
> old and new HOT chain (that is, the same WARM chain).
>
> This does create more HOT chains where the root ctid cannot be removed,
> but it does avoid the index key/ctid check because any entry in the HOT
> chain has identical keys.  What this would not handle is when an entire
> HOT chain is pruned, as the keys would be gone.

I believe the only solution is to use bitmap index scans with WARM indexes.

Doing so simplifies things considerably.

For one, it isolates the new functionality and makes it opt-in.

Then, WARM indexes can always point to the root of the HOT chain, and
be ignored for satisfies HOT. The planner will only consider bitmap
index scans with WARM indexes, and always issue a recheck.

The bitmap is needed to remove duplicate item pointers, and the
recheck, as usual, to filter out unwanted versions.

On update, one only needs a pointer to the HOT head, which would
arguably be at hand, or at worst reachable by backtracking t_ctid
pointers. No key checks of any kind, and no weird flags required.

I don't see another solution to the information loss when pruning
whole HOT chains.

Suppose we start with this situation:

lp:  1  2h 3h 4w 5h 6h 7h
k1:  a  a  a  b  b  a  a
k2:  c  c  c  c  c  c  c

So we have 2 HOT chains, 1-3, 4-7, two indexes, one never updated, the
other with an update and then returning to the same key.

The first index will have two index entries, one for key a, another
for b, both pointing at 1.

a -> (p,1)
b -> (p,1)

Now versions 1-3 are dead, so we want to prune them.

We end up with a redirect on the LP for 1 -> 4, and LPs 2 and 3 unused
because they were HOT:

r4  u  u
lp:  1  2  3  4w 5h 6  7h
k1:  _  _  _  b  b  a  a
k2:  _  _  _  c  c  c  c

Now suppose versions 4 and 5 die. So we end up with:

r6  u  u  u  u
lp:  1  2  3  4  5  6  7h
k1:  _  _  _  _  _  a  a
k2:  _  _  _  _  _  c  c

We just forgot that "b" was in k1, yet we still have an index entry
pointing to the chain.

Should an update come:

r6  u  u  u  u
lp:  1  2  3  4  5  6  7h 8?
k1:  _  _  _  _  _  a  a  b
k2:  _  _  _  _  _  c  c  c

Is an index entry b->(p,1) needed for 8?

According to the logic it is: k1 would need a new entry b -> (p,1),
but the entry is already there.  It's just not verifiable in reasonable
time.

So a simpler solution is to always add such entries, and let the
bitmap index scan filter duplicates.




Re: [HACKERS] System load consideration before spawning parallel workers

2016-08-05 Thread Jim Nasby

On 8/1/16 1:08 AM, Haribabu Kommi wrote:

There are some utilities and functions available to calculate the
current system load; based on the available resources and system load,
the module can decide the number of parallel workers that can start. In my
observation, adding this calculation will add some overhead for simple
queries. For this reason, I feel this can be a hook function that only
the users who want it need to load.


I think we need to provide more tools to allow users to control system
behavior on a more dynamic basis. How many workers to launch is a good
example. There are more reasons than just CPU that parallel workers can
help (IO being an obvious one, but possibly other things like GPU).
Another example is allowing users to alter the selection process used by
autovac workers.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461




Re: [HACKERS] multivariate statistics (v19)

2016-08-05 Thread Michael Paquier
On Sat, Aug 6, 2016 at 2:38 AM, Tomas Vondra
 wrote:
> On 08/05/2016 06:24 AM, Michael Paquier wrote:
>>
>> On Wed, Aug 3, 2016 at 10:58 AM, Tomas Vondra
>>  wrote:
>>>
>>> Attached is v19 of the "multivariate stats" patch series - essentially
>>> v18
>>> rebased on top of current master. Aside from a few bug fixes, the main
>>> improvement is addition of SGML docs demonstrating the statistics in a
>>> way
>>> similar to the current "Row Estimation Examples" (and the docs are
>>> actually
>>> in the same section). I've tried to keep the right amount of technical
>>> detail (and pointing to the right README for additional details), but
>>> this
>>> may need improvements. I have not written docs explaining how statistics
>>> may
>>> be combined yet (more about this later).
>>
>>
>> What we have here is quite something:
>> $ git diff master --stat | tail -n1
>>  77 files changed, 12809 insertions(+), 65 deletions(-)
>> I will try to get familiar on the topic and added myself as a reviewer
>> of this patch. Hopefully I'll get feedback soon.
>
>
> Yes, it's a large patch. Although 25% of the insertions are SGML docs,
> regression tests and READMEs, and a large part of the remaining ~9k
> insertions are comments. But it may still be overwhelming, no doubt about that.
>
> FWIW, if someone is interested in the patch but is unsure where to start,
> I'm ready to help with that as much as possible. For example if you happen
> to go to PostgresOpen, feel free to drag me to a corner and ask me as many
> questions as you want ...

Sure. Only PGconf SV is on my schedule this year.
-- 
Michael




Re: [HACKERS] Heap WARM Tuples - Design Draft

2016-08-05 Thread Bruce Momjian
On Fri, Aug  5, 2016 at 03:26:15PM -0400, Bruce Momjian wrote:
> > I just don't know how you would do that without delaying/not-doing HOT chain
> > prune. As soon as root and intermediate HOT tuples are gone, all information
> > is lost. You may delay that, but that will defeat the whole purpose. If
> > chains are not pruned in time, you may not find any free space in the page
> > and end up doing a cold update anyway. But maybe I am missing something and
> > Claudio has a different idea.
> 
> Yes, pruning would be a problem.  :-(
> 
> A check only needs to happen when the indexed key changes, right?  So we
> are comparing the cost of adding an index key vs. the cost of trying to
> find a matching key/ctid in the index.  Seems the later is cheaper, but
> it is not obvious.

Here is an illustration of the issue:

CREATE TABLE test (col1 INTEGER, col2 INTEGER, col3 INTEGER);

-- index first two columns
CREATE INDEX i_test1 ON test (col1);
CREATE INDEX i_test2 ON test (col2);

INSERT INTO test VALUES (1, 7, 20);

-- create a HOT chain
UPDATE test SET col3 = 30;

-- change the HOT chain to a WARM chain, changes i_test2 but not i_test1
UPDATE test SET col2 = 8;

-- we should avoid adding a col2=7 i_test2 index tuple
-- because we already have one;  how do we know that?
UPDATE test SET col2 = 7;

-- we should see only one col2=7 i_test2 index tuple
SELECT * FROM test WHERE col2 = 7;
 col1 | col2 | col3
------+------+------
    1 |    7 |   30

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +




Re: [HACKERS] Bogus ANALYZE results for an otherwise-unique column with many nulls

2016-08-05 Thread Tom Lane
Andrew Gierth  writes:
> Objection withdrawn.

OK, thanks.  What shall we do about Andreas' request to back-patch this?
I'm personally willing to do it, but there is the old bugaboo of "maybe
it will destabilize a plan that someone is happy with".

regards, tom lane




Re: [HACKERS] truncate trigger for foreign data wrappers

2016-08-05 Thread Andres Freund
On 2016-08-05 16:44:20 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2016-08-05 16:35:02 -0400, Tom Lane wrote:
> >> In particular, it seems to me that rather than implement just this,
> >> we really ought to provide an API that lets FDWs actually implement
> >> TRUNCATE if they feel like it.  Having the trigger and not TRUNCATE
> >> capability itself just screams "half baked", wouldn't you say?
> 
> > Both are fine with me. I do object to the position that we need an answer
> > for all utility commands - that seems like too high a hurdle to pass -
> > but implementing truncate for FDWs directly sounds good.
> 
> To clarify: I was certainly not suggesting that we need to implement
> more than that in the first go.  I was just saying that some sort of
> long-term roadmap about utility commands for FDWs would be a good idea.

Well, my problem with that is that I don't see TRUNCATE as being in the
same camp as most other utility commands; thus I'm not sure there's
really a coherent view for it and the rest. In the end it's just an
optimized DELETE, and we didn't say that other DML needs to provide a
view for utility commands either.

- Andres




Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-08-05 Thread Peter Geoghegan
On Fri, Aug 5, 2016 at 9:06 AM, Robert Haas  wrote:
> To some extent, sure, absolutely.  But it's our job as developers to
> try to foresee and minimize those cases.  When Noah was at
> EnterpriseDB a few years ago and we were talking about parallel
> internal sort, Noah started by doing a survey of the literature and
> identified parallel quicksort as the algorithm that seemed best for
> our use case.  Of course, every time quicksort partitions the input,
> you get two smaller sorting problems, so it's easy to see how to use 2
> CPUs after the initial partitioning step has been completed and 4 CPUs
> after each of those partitions has been partitioned again, and so on.
> However, that turns out not to be good enough because the first
> partitioning step can consume a significant percentage of the total
> runtime - so if you only start parallelizing after that, you're
> leaving too much on the table.  To avoid that, the algorithm he was
> looking at had a (complicated) way of parallelizing the first
> partitioning step; then you can, it seems, do the full sort in
> parallel.
>
> There are some somewhat outdated and perhaps naive ideas about this
> that we wrote up here:
>
> https://wiki.postgresql.org/wiki/Parallel_Sort

I'm familiar with that effort. I think that when researching topics like
sorting, it can sometimes be a mistake to consider only the approaches
specifically recommended by the database research community. A lot of
the techniques we've benefited from within tuplesort.c have been a
matter of addressing memory latency as a bottleneck; techniques that
are fairly simple and not worth writing a general-interest paper on.
Also, things like abbreviated keys are beneficial in large part
because people tend to follow first normal form, and therefore an
abbreviated key can contain a fair amount of entropy most of the time.
Similarly, radix sort seems really cool, but our requirements around
generality seem to make it impractical.

> Anyway, you're proposing an algorithm that can't be fully
> parallelized.  Maybe that's OK.  But I'm a little worried about it.
> I'd feel more confident if we knew that the merge could be done in
> parallel and were just leaving that to a later development stage; or
> if we picked an algorithm like the one above that doesn't leave a
> major chunk of the work unparallelizable.

I might be able to resurrect the parallel merge stuff, just to guide
reviewer intuition on how much that can help or hurt. I can probably
repurpose it to show you the mixed picture on how effective it is. I
think it might help more with collatable text that doesn't have
abbreviated keys, for example, because you can use more of the
machine's memory bandwidth for longer. But for integers, it can hurt.
(That's my recollection; I prototyped parallel merge a couple of
months ago now.)

>> Isn't that what you
>> generally see with queries that show off the parallel join capability?
>
> For nested loop joins, no.  The whole join operation can be done in
> parallel.

Sure, I know, but I'm suggesting that laws-of-physics problems may
still be more significant than implementation deficiencies, even
though those deficiencies still need to be stamped out. Linear
scalability is really quite rare for most database workloads.

> Obviously, parallel query is subject to a long list of annoying
> restrictions at this point.  On queries that don't hit any of those
> restrictions we can get 4-5x speedup with a leader and 4 workers.  As
> we expand the range of plan types that we can construct, I think we'll
> see those kinds of speedups for a broader range of queries.  (The
> question of exactly why we top out with as few workers as currently
> seems to be the case needs more investigation, too; maybe contention
> effects?)

You're probably bottlenecked on memory bandwidth. Note that I showed
improvements with 8 workers, not 4. Four workers are slower than 8, but
not by that much.

>> https://www.mssqltips.com/sqlservertip/3100/reduce-time-for-sql-server-index-rebuilds-and-update-statistics/
>
> I do agree that it is important not to have unrealistic expectations.

Great. My ambition for this patch is that it puts parallel CREATE INDEX
on a competitive footing against the implementations featured in other
major systems. I don't think we need to do everything at once, but I
have no intention of pushing forward with something that doesn't do
respectably there. I also want to avoid partitioning in the first
version of this, and probably in any version that backs CREATE INDEX.
I've only made minimal changes to the tuplesort.h interface here to
support parallelism. That flexibility counts for a lot, IMV.

>> As I've said, there is probably a good argument to be made for
>> partitioning to increase parallelism. But, that involves risks around
>> the partitioning being driven by statistics or a cost model

> Yes.  Rushabh is working on that, and Finalize GroupAggregate ->
> Gather Merge -> Partial 

Re: [HACKERS] truncate trigger for foreign data wrappers

2016-08-05 Thread Tom Lane
Andres Freund  writes:
> On 2016-08-05 16:35:02 -0400, Tom Lane wrote:
>> In particular, it seems to me that rather than implement just this,
>> we really ought to provide an API that lets FDWs actually implement
>> TRUNCATE if they feel like it.  Having the trigger and not TRUNCATE
>> capability itself just screams "half baked", wouldn't you say?

> Both are fine with me. I do object to the position that we need an answer
> for all utility commands - that seems like too high a hurdle to pass -
> but implementing truncate for FDWs directly sounds good.

To clarify: I was certainly not suggesting that we need to implement
more than that in the first go.  I was just saying that some sort of
long-term roadmap about utility commands for FDWs would be a good idea.

regards, tom lane




Re: [HACKERS] truncate trigger for foreign data wrappers

2016-08-05 Thread Andres Freund
On 2016-08-05 16:35:02 -0400, Tom Lane wrote:
> In particular, it seems to me that rather than implement just this,
> we really ought to provide an API that lets FDWs actually implement
> TRUNCATE if they feel like it.  Having the trigger and not TRUNCATE
> capability itself just screams "half baked", wouldn't you say?

Both are fine with me. I do object to the position that we need an answer
for all utility commands - that seems like too high a hurdle to pass -
but implementing truncate for FDWs directly sounds good.




Re: [HACKERS] truncate trigger for foreign data wrappers

2016-08-05 Thread Tom Lane
Andres Freund  writes:
> On 2016-08-05 14:05:02 -0400, Robert Haas wrote:
>> I agree, but I still think it's weird if foreign tables support
>> TRUNCATE itself but not triggers on TRUNCATE.

> You mean the other way round?  To me this seems very comparable to
> INSTEAD triggers, but ...

While I have no particular objection to allowing TRUNCATE triggers on
FDWs, I concur with Robert's feeling that we ought to sketch out a plan
for what utility commands on foreign tables should do in general, before
we go introducing individual new features.  It would be annoying if we
stuck this in with little forethought and then found out it was
inconsistent with other stuff people want to add later.

In particular, it seems to me that rather than implement just this,
we really ought to provide an API that lets FDWs actually implement
TRUNCATE if they feel like it.  Having the trigger and not TRUNCATE
capability itself just screams "half baked", wouldn't you say?

regards, tom lane




[HACKERS] Add hint for people who place EXECUTE USING arguments in parentheses (in plpgsql)

2016-08-05 Thread David G. Johnston
Basically,

diff --git a/src/backend/parser/parse_param.c
b/src/backend/parser/parse_param.c
index b402843..97064fc 100644
--- a/src/backend/parser/parse_param.c
+++ b/src/backend/parser/parse_param.c
@@ -108,6 +108,9 @@ fixed_paramref_hook(ParseState *pstate, ParamRef *pref)
  ereport(ERROR,
  (errcode(ERRCODE_UNDEFINED_PARAMETER),
  errmsg("there is no parameter $%d", paramno),
+ pstate->numParams == 1 && < It is of pseudo-type record >
+ ? errhint("%s", _("did you incorrectly enclose USING arguments in parentheses?"))
+ : 0,
  parser_errposition(pstate, pref->location)));

  param = makeNode(Param);


I didn't spend too much time trying to figure out how to test that a
parameter is composite/record typed...

I think the chance of a false positive on SQL EXECUTE is going to be very
small... but I chose here mostly out of ease - a fix in pl/pgsql would be
more correct.
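
A hedged sketch of the mistake the hint targets (hypothetical snippet; the
error text is what current servers report):

DO $$
DECLARE r int;
BEGIN
  -- right: two separate parameters
  EXECUTE 'SELECT $1 + $2' INTO r USING 1, 2;
  -- wrong: the parentheses turn the two values into ONE record parameter;
  -- the query's $2 is then unbound: ERROR: there is no parameter $2
  EXECUTE 'SELECT $1 + $2' INTO r USING (1, 2);
END $$;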

David J.


Re: [HACKERS] Notice lock waits

2016-08-05 Thread Jeff Janes
On Fri, Aug 5, 2016 at 12:17 PM, Tom Lane  wrote:
> Jeff Janes  writes:
>> I have it PGC_SUSET because it does send some tiny amount of
>> information about the blocking process (the PID) to the blocked
>> process.  That is probably too paranoid, because the PID can be seen
>> by anyone in the pg_locks table anyway.
>
> Why not just leave out the PID?  I think it's often far too simplistic
> to blame a lock wait on a single other process, anyway.

It actually wasn't including the PID anyway, as the
errdetail_log_plural was not getting passed to the client.

So I changed it to PGC_USERSET, didn't attempt to include details that
won't be sent anyway (although it would be nice for a superuser to be
able to see the statement text of the blocker, but that is a bigger
issue than I am willing to deal with here), and removed a memory
leak/bug I introduced by foolishly trying to use 'continue' to avoid
introducing yet another layer of nesting.
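
For reference, a hedged sketch of how the proposed GUC would be used (the
GUC name is from the patch under discussion; the NOTICE text is
illustrative):

-- e.g. in ~/.psqlrc, for interactive sessions only:
SET notice_lock_waits = on;
-- once a lock wait exceeds deadlock_timeout, the session would see
-- something like:
--   NOTICE:  process 12345 still waiting for ShareLock on transaction 678
--            after 1000.001 ms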

Cheers,

Jeff


notice_lock_waits-V02.patch



Re: [HACKERS] Heap WARM Tuples - Design Draft

2016-08-05 Thread Bruce Momjian
On Fri, Aug  5, 2016 at 11:30:26PM +0530, Pavan Deolasee wrote:
> On Fri, Aug 5, 2016 at 11:26 PM, Bruce Momjian  wrote:
> 
> On Fri, Aug  5, 2016 at 02:43:37PM -0300, Claudio Freire wrote:
> > But doing the WARM chain backtracking and checking for previous
> > versions with appropriate keys should be enough to support this use
> > case too, it just needs to be well optimized to avoid seriously
> > impacting performance on the average case.
> 
> Yes, that was an idea I had too --- if the in-page HOT chain already has
> the key, we know it is already in the index and we can skip the index
> addition.
> 
> I just don't know how you would do that without delaying/not-doing HOT chain
> prune. As soon as root and intermediate HOT tuples are gone, all information
> is lost. You may delay that, but that will defeat the whole purpose. If
> chains are not pruned in time, you may not find any free space in the page
> and end up doing a cold update anyway. But maybe I am missing something and
> Claudio has a different idea.

Yes, pruning would be a problem.  :-(

A check only needs to happen when the indexed key changes, right?  So we
are comparing the cost of adding an index key vs. the cost of trying to
find a matching key/ctid in the index.  Seems the later is cheaper, but
it is not obvious.

I think I see Claudio's idea now --- from his diagram, you can have WARM
chains span multiple HOT chains.  What he is doing is creating a new HOT
chain every time _any_ key changes, and he knows the entire HOT chain has
identical values for all indexes.  He moves from one HOT chain to another
during an index scan by checking if the index value is the same in the
old and new HOT chain (that is, the same WARM chain).

This does create more HOT chains where the root ctid cannot be removed,
but it does avoid the index key/ctid check because any entry in the HOT
chain has identical keys.  What this would not handle is when an entire
HOT chain is pruned, as the keys would be gone.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +




Re: [HACKERS] Notice lock waits

2016-08-05 Thread Tom Lane
Jeff Janes  writes:
> I have it PGC_SUSET because it does send some tiny amount of
> information about the blocking process (the PID) to the blocked
> process.  That is probably too paranoid, because the PID can be seen
> by anyone in the pg_locks table anyway.

Why not just leave out the PID?  I think it's often far too simplistic
to blame a lock wait on a single other process, anyway.

regards, tom lane




Re: [HACKERS] Notice lock waits

2016-08-05 Thread Julien Rouhaud
On 05/08/2016 19:00, Jeff Janes wrote:
> One time too many, I ran some minor change using psql on a production
> server and wondered why it was taking so much longer than it did
> on the test server, only to discover, after messing around with
> opening new windows and running queries against pg_stat_activity and
> pg_locks and so on, that it was waiting for a lock.
> 
> So I created a new guc, notice_lock_waits, which acts like
> log_lock_waits but sends the message as NOTICE so it will show up on
> interactive connections like psql.
> 
> I turn it on in my .psqlrc, as it doesn't make much sense for me to
> turn it on in non-interactive sessions.
> 
> A general facility for promoting selected LOG messages to NOTICE would
> be nice, but I don't know how to design or implement that.  This is
> much easier, and I find it quite useful.
> 
> I have it PGC_SUSET because it does send some tiny amount of
> information about the blocking process (the PID) to the blocked
> process.  That is probably too paranoid, because the PID can be seen
> by anyone in the pg_locks table anyway.
> 
> Do you think this is useful and generally implemented in the correct
> way?  If so, I'll try to write some sgml documentation for it.
> 

I really like the idea.

I'm not really sure about the current implementation.  Unless I'm wrong,
disabling log_lock_waits would also disable notice_lock_waits, even if
it's on.

Maybe add a new value for log_lock_waits, like "interactive"? If switching
this GUC from bool to enum is not acceptable, or letting anyone see the
blocking PID is an issue, then maybe add a new GUC that says to also
send a NOTICE?

-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org




Re: [HACKERS] pg_size_pretty, SHOW, and spaces

2016-08-05 Thread Bruce Momjian
On Fri, Aug  5, 2016 at 02:07:18PM -0400, Robert Haas wrote:
> On Fri, Aug 5, 2016 at 11:06 AM, Bruce Momjian  wrote:
> > On Fri, Aug  5, 2016 at 10:57:35AM -0400, Peter Eisentraut wrote:
> >> On 8/2/16 12:51 PM, Bruce Momjian wrote:
> >> > Yes, that's a strong argument for using a space.  I have adjusted the
> >> > patch to use spaces in all reasonable places.  Patch attached, which I
> >> > have gzipped because it was 133 KB.  (Ah, see what I did there?)  :-)
> >> >
> >> > I am thinking of leaving the 9.6 docs alone as I have already made them
> >> > consistent (no space) with minimal changes.  We can make it consistent
> >> > the other way in PG 10.
> >>
> >> I don't think anyone wanted to *remove* the spaces in the documentation.
> >>  I think this change makes the documentation harder to read.
> >
> > Well, we had spaces in only a few places in the docs, and as I said, it
> > is not consistent.  Do you want those few put back for 9.6?
> 
> +1 for that.  I can't see how it's good for 10 to be one way, 9.6 to
> be the opposite way, and 9.5 and prior to be someplace in the middle.
> That seems like a back-patching mess.

OK, done.
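
(For context, the run-time function itself already puts a space between
number and unit:)

SELECT pg_size_pretty(136192::bigint);  -- returns '133 kB', with a space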

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +




Re: [HACKERS] [sqlsmith] FailedAssertion("!(k == indices_count)", File: "tsvector_op.c", Line: 511)

2016-08-05 Thread Tom Lane
Thomas Munro  writes:
> The assertion in tsvector_delete_by_indices fails because its counting
> algorithm doesn't expect indices_to_delete to contain multiple
> references to the same index.  Maybe that could be fixed by
> uniquifying in tsvector_delete_arr before calling it, but since
> tsvector_delete_by_indices already qsorts its input, it should be able
> to handle duplicates cheaply.

I poked at this and realized that that's not sufficient.  If there are
duplicates in indices_to_delete, then the initial estimate

tsout->size = tsv->size - indices_count;

is wrong because indices_count is an overestimate of how many lexemes
will be removed.  And because the calculation "dataout = STRPTR(tsout)"
depends on tsout->size, we can't just wait till later to get it right.

We could possibly initialize tsout->size = tsv->size (the maximum
possible value), thereby ensuring that the WordEntry array doesn't
overlap the dataout area; compute the correct tsout->size in the loop;
and then memmove the data area into place to collapse out wasted space.
But I think it might be simpler and better-performant just to de-dup the
indices_to_delete array after qsort'ing it; that would certainly win
for the case of indices_count == 1.

The other problems I noted with failure to delete items seem to stem
from the fact that tsvector_delete_arr relies on tsvector_bsearch to
find items, but the input tsvector is not sorted (never mind de'duped)
by array_to_tsvector.  This seems like simple brain fade in
array_to_tsvector, as AFAICS that's a required property of tsvectors.
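
A hedged sketch of inputs that exercise both problems (using the 9.6-era
functions named above):

-- duplicates in the deletion array hit the counting/size-estimate issue
-- in tsvector_delete_arr / tsvector_delete_by_indices:
SELECT ts_delete('a:1 b:2 c:3'::tsvector, ARRAY['b', 'b']);

-- array_to_tsvector neglects to sort and de-duplicate its input, breaking
-- the binary searches that the deletion code relies on:
SELECT array_to_tsvector(ARRAY['b', 'a', 'b']);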

regards, tom lane




Re: [HACKERS] pg_replication_origin_xact_reset() and its argument variables

2016-08-05 Thread Fujii Masao
On Tue, Aug 2, 2016 at 2:59 AM, Fujii Masao  wrote:
> On Tue, Aug 2, 2016 at 2:48 AM, Andres Freund  wrote:
>> Hi Fujii,
>>
>> On 2016-07-28 16:44:37 +0900, Fujii Masao wrote:
>>> On Sat, Jul 2, 2016 at 7:01 AM, Tom Lane  wrote:
>>> > Andres Freund  writes:
>>> >> On 2016-06-30 10:14:04 -0400, Tom Lane wrote:
>>> >>> Fujii Masao  writes:
>>>  As far as I read the code of the function, those arguments don't seem to
>>>  be necessary. So I'm afraid that the pg_proc entry for the function might
>>>  be incorrect and those two arguments should be removed from the definition.
>>> >
>>> >>> Sure looks that way from here.  Copy-and-paste from the previous
>>> >>> line in pg_proc.h, perhaps?
>>> >
>>> >> Yes, that's clearly wrong.
>>>
>>> Attached patch (pg_replication_origin_xact_reset_9.6.patch) fixes this.
>>> We need to apply this at least before RC1 of PostgreSQL 9.6 is released,
>>> because the patch requires a catalog version change.
>>>
>>> >> Damn.  Can't fix that for 9.5 anymore. The
>>> >> function isn't all that important (especially not from SQL), but still,
>>> >> that's annoying.  I'm inclined to just remove the args in 9.6. We could
>>> >> also add a note to the 9.5 docs, noting that the arguments are there
>>> >> in error?
>>>
>>> What about the attached patch (pg_replication_origin_xact_reset_9.5.patch)?
>>
>> except for the strictness remark in the other email,
>
> Yes, you're right. My careless mistake... :(
>
>> these look sane to
>> me.  Do you want to push them?  I'll do so by Wednesday otherwise, to
>> leave some room before the next RC.
>
> Could you do that if possible?

Pushed, since right now I have time to do that. Anyway, thanks!

Regards,

-- 
Fujii Masao




Re: [HACKERS] truncate trigger for foreign data wrappers

2016-08-05 Thread Andres Freund
On 2016-08-05 14:05:02 -0400, Robert Haas wrote:
> On Fri, Aug 5, 2016 at 2:04 PM, Andres Freund  wrote:
> > On 2016-08-05 13:32:18 -0400, Robert Haas wrote:
>> I think if we're going to add support for utility commands on foreign
> >> tables, we ought to think about all of the different utility commands
> >> that someone might want and what exactly we want the behavior to be.
> >
> >> For example, consider CLUSTER or CREATE INDEX or VACUUM or ANALYZE.
> >> We might interpret TRUNCATE or CLUSTER as a request to dispatch the
> >> same request for the remote side, but ANALYZE can't mean that: it has
> >> to mean gather local statistics.  And what if the other side is not PG
> >> and supports other operations that we don't have, like OPTIMIZE TABLE
> >> or DISENGAGE FTL?
> >
> > That's not really comparable imo - we don't have triggers for those
> > locally either. For better or worse we've decided that TRUNCATE is more
> > like DML than DDL.
> 
> I agree, but I still think it's weird if foreign tables support
> TRUNCATE itself but not triggers on TRUNCATE.

You mean the other way round?  To me this seems very comparable to
INSTEAD triggers, but ...




Re: [HACKERS] pg_size_pretty, SHOW, and spaces

2016-08-05 Thread Robert Haas
On Fri, Aug 5, 2016 at 11:06 AM, Bruce Momjian  wrote:
> On Fri, Aug  5, 2016 at 10:57:35AM -0400, Peter Eisentraut wrote:
>> On 8/2/16 12:51 PM, Bruce Momjian wrote:
>> > Yes, that's a strong argument for using a space.  I have adjusted the
>> > patch to use spaces in all reasonable places.  Patch attached, which I
>> > have gzipped because it was 133 KB.  (Ah, see what I did there?)  :-)
>> >
>> > I am thinking of leaving the 9.6 docs alone as I have already made them
>> > consistent (no space) with minimal changes.  We can make it consistent
>> > the other way in PG 10.
>>
>> I don't think anyone wanted to *remove* the spaces in the documentation.
>>  I think this change makes the documentation harder to read.
>
> Well, we had spaces in only a few places in the docs, and as I said, it
> is not consistent.  Do you want those few put back for 9.6?

+1 for that.  I can't see how it's good for 10 to be one way, 9.6 to
be the opposite way, and 9.5 and prior to be someplace in the middle.
That seems like a back-patching mess.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] truncate trigger for foreign data wrappers

2016-08-05 Thread Robert Haas
On Fri, Aug 5, 2016 at 2:04 PM, Andres Freund  wrote:
> On 2016-08-05 13:32:18 -0400, Robert Haas wrote:
>> I think if we're going to add support for utility commands on foreign
>> tables, we ought to think about all of the different utility commands
>> that someone might want and what exactly we want the behavior to be.
>
>> For example, consider CLUSTER or CREATE INDEX or VACUUM or ANALYZE.
>> We might interpret TRUNCATE or CLUSTER as a request to dispatch the
>> same request for the remote side, but ANALYZE can't mean that: it has
>> to mean gather local statistics.  And what if the other side is not PG
>> and supports other operations that we don't have, like OPTIMIZE TABLE
>> or DISENGAGE FTL?
>
> That's not really comparable imo - we don't have triggers for those
> locally either. For better or worse we've decided that TRUNCATE is more
> like DML than DDL.

I agree, but I still think it's weird if foreign tables support
TRUNCATE itself but not triggers on TRUNCATE.  And I don't want to
start supporting TRUNCATE itself without a bit more thought about
where we go from there.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] truncate trigger for foreign data wrappers

2016-08-05 Thread Andres Freund
On 2016-08-05 13:32:18 -0400, Robert Haas wrote:
> I think if we're going to add support for utility commands on foreign
> tables, we ought to think about all of the different utility commands
> that someone might want and what exactly we want the behavior to be.

> For example, consider CLUSTER or CREATE INDEX or VACUUM or ANALYZE.
> We might interpret TRUNCATE or CLUSTER as a request to dispatch the
> same request for the remote side, but ANALYZE can't mean that: it has
> to mean gather local statistics.  And what if the other side is not PG
> and supports other operations that we don't have, like OPTIMIZE TABLE
> or DISENGAGE FTL?

That's not really comparable imo - we don't have triggers for those
locally either. For better or worse we've decided that TRUNCATE is more
like DML than DDL.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Heap WARM Tuples - Design Draft

2016-08-05 Thread Pavan Deolasee
On Fri, Aug 5, 2016 at 11:26 PM, Bruce Momjian  wrote:

> On Fri, Aug  5, 2016 at 02:43:37PM -0300, Claudio Freire wrote:
> > But doing the WARM chain backtracking and checking for previous
> > versions with appropriate keys should be enough to support this use
> > case too, it just needs to be well optimized to avoid seriously
> > impacting performance on the average case.
>
> Yes, that was an idea I had too --- if the in-page HOT chain already has
> the key, we know it is already in the index and we can skip the index
> addition.
>
>
I just don't know how you would do that without delaying/not-doing HOT
chain pruning. As soon as the root and intermediate HOT tuples are gone, all
that information is lost. You may delay pruning, but that will defeat the
whole purpose. If chains are not pruned in time, you may not find any free
space in the page and end up doing a cold update anyway. But maybe I am
missing something and Claudio has a different idea.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Heap WARM Tuples - Design Draft

2016-08-05 Thread Bruce Momjian
On Fri, Aug  5, 2016 at 02:43:37PM -0300, Claudio Freire wrote:
> But doing the WARM chain backtracking and checking for previous
> versions with appropriate keys should be enough to support this use
> case too, it just needs to be well optimized to avoid seriously
> impacting performance on the average case.

Yes, that was an idea I had too --- if the in-page HOT chain already has
the key, we know it is already in the index and we can skip the index
addition.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Refactoring of heapam code.

2016-08-05 Thread Alvaro Herrera
Anastasia Lubennikova wrote:
> Working on page compression and some other issues related to
> access methods, I found out that the code related to heap
> looks too complicated.  Much more complicated than it should be.
> Since I got into this area anyway, I want to suggest a set of improvements.

Hm.  I'm hacking on heapam.c pretty heavily ATM.  Your first patch
causes no problem I think, or if it does it's probably pretty minor, so
that's okay.  I'm unsure about the second one -- I think it's okay too,
because it mostly seems to be about moving stuff from heapam.c to hio.c
and shuffling some code around that I don't think I'm modifying.

> Also I think that it's quite strange to have a function heap_create() that
> is called from index_create().  It has absolutely nothing to do with the
> heap access method.

Agreed.  But changing its name while keeping it in heapam.c does not
really improve things enough.  I'd rather have it moved elsewhere that's
not specific to "heaps" (somewhere in access/common perhaps).  However,
renaming the functions would break third-party code for no good reason.
I propose that we only rename things if we also break it for other
reasons (say because the API changes in a fundamental way).

> One more thing that requires refactoring is the "RELKIND_RELATION" name.
> We have a type of relation called "relation"...

I don't see us fixing this bit, really.  Historical baggage and all
that.  For example, a lot of SQL queries use "WHERE relkind = 'r'".  If
we change the name, we should probably also change the relkind constant,
and that would break the queries.  If we change the name and do not
change the constant, then we have to have a comment "we call them
RELKIND_TABLE but the char is r because it was called RELATION
previously", which isn't any better.

> So there is a couple of patches. They do not cover all mentioned problems,
> but I'd like to get a feedback before continuing.

I agree that we could improve things in this general area, but I do not
endorse any particular changes you propose in these patches; I haven't
reviewed your patches.

-- 
Álvaro Herrera  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Heap WARM Tuples - Design Draft

2016-08-05 Thread Claudio Freire
On Fri, Aug 5, 2016 at 2:26 PM, Pavan Deolasee  wrote:
> On Fri, Aug 5, 2016 at 8:55 PM, Claudio Freire 
> wrote:
>>
>> On Fri, Aug 5, 2016 at 1:27 AM, Pavan Deolasee 
>> wrote:
>> >
>> >
>> > I don't see why it is hard.  Trying to find the index entry for the old
>> > update row seems odd, and updating an index row seems odd, but skipping
>> > an index addition for the new row seems simple.  Is the problem
>> > concurrent activity?  I assumed we already had that ability to add to HOT
>> > chains because we have to lock the update chain.
>>
>>
>> Think of an index over a 1TB table whose keys have only 2 values: true
>> and false.
>>
>> Sure, it's a horrible index. But I've seen things like that, and I've
>> seen cases when they're useful too.
>>
>> So, conceptually, for each key you have circa N/2 tids on the index.
>> nbtree finds the leftmost valid insert point comparing keys, it
>> doesn't care about tids, so to find the index entries that point to
>> the page where the new tuple is, you'd have to scan the N/2 tids in
>> the index, an extremely expensive operation.
>>
>
> Well, it's always going to be extremely hard to solve for all use cases.
> This is one such extreme case and we should just give up and do a cold update.
>
> I think we can look at the index type (unique vs non-unique) along with
> table statistics to find what fraction of column values are distinct and
> then estimate whether its worthwhile to look for duplicate (key, CTID) or
> just do a cold update. In addition put some cap of how hard we try once we
> decide to check for duplicates and give up after we cross that threshold.

I don't think bailing out in this case is necessary, and it could be
more common than you'd think. It doesn't need to be this extreme to
cause the issue, it only needs equal-key runs that span more than an
index page, and that could be far more common than the extreme example
I gave.

But doing the WARM chain backtracking and checking for previous
versions with appropriate keys should be enough to support this use
case too, it just needs to be well optimized to avoid seriously
impacting performance on the average case.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bogus ANALYZE results for an otherwise-unique column with many nulls

2016-08-05 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> Also, the way that the value is calculated in the
 Tom> samples-not-all-distinct case corresponds to the way I have it in
 Tom> the patch.

Ahh, gotcha.  You're referring to this:

/*
 * If we estimated the number of distinct values at more than 10% of
 * the total row count (a very arbitrary limit), then assume that
 * stadistinct should scale with the row count rather than be a fixed
 * value.
 */
if (stats->stadistinct > 0.1 * totalrows)
    stats->stadistinct = -(stats->stadistinct / totalrows);

where "totalrows" includes nulls obviously. So this expects negative
stadistinct to be scaled by the total table size, and the all-distinct
case should do the same.
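
(Worked example: with totalrows = 1,000,000 and an estimate of 200,000
distinct values, 200,000 exceeds the 10% limit of 100,000, so stadistinct
is stored as -(200000/1000000) = -0.2, i.e. "20% of the rows are distinct"
at whatever size the table grows to.)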

Objection withdrawn.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [sqlsmith] FailedAssertion("!(k == indices_count)", File: "tsvector_op.c", Line: 511)

2016-08-05 Thread Tom Lane
Robert Haas  writes:
> Action within 72 hours now seems inadequate; we are scheduled to wrap
> rc1 on Monday.  We need someone to either fix these bugs very very
> soon, or decide to ship beta4 instead of rc1 (uggh), or decide it's OK
> to ship rc1 with these known defects, or postpone the planned release.

Given the time of year, I'd not be surprised if Oleg and Teodor are on
vacation.  In view of the time pressure, I'll take a whack at fixing this.
I think that Thomas Munro's suggestion is good as far as fixing the Assert
failure is concerned.  I do not know where the other problems are, but
maybe I can find them ...

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] truncate trigger for foreign data wrappers

2016-08-05 Thread Robert Haas
On Fri, Aug 5, 2016 at 10:39 AM, Andres Freund  wrote:
> On 2016-08-05 10:33:49 -0400, Tom Lane wrote:
>> Murat Tuncer  writes:
>> > I recently hit a roadblock when I tried to create a truncate trigger on
>> > a foreign table.  The trigger.c::CreateTrigger() function has an explicit
>> > check to block truncate triggers on foreign tables.
>>
>> That's good, because we don't implement TRUNCATE on foreign tables: there
>> is nothing in the FDW API that would support it.  Not much point in
>> declaring a trigger for an event that can never happen.
>
> Well, allowing BEFORE triggers to return NULL or something, preventing
> the execution of the rest, would be such an implementation, and also
> independently useful.

I guess.

I think if we're going to add support for utility commands on foreign
tables, we ought to think about all of the different utility commands
that someone might want and what exactly we want the behavior to be.
For example, consider CLUSTER or CREATE INDEX or VACUUM or ANALYZE.
We might interpret TRUNCATE or CLUSTER as a request to dispatch the
same request for the remote side, but ANALYZE can't mean that: it has
to mean gather local statistics.  And what if the other side is not PG
and supports other operations that we don't have, like OPTIMIZE TABLE
or DISENGAGE FTL?

That isn't, strictly speaking, a reason to reject a patch that just
allows TRUNCATE triggers on FDWs to "return NULL or something", but I
can't get very excited about such a patch because I think the utility
is fairly marginal.  I'd rather have a little more of a plan than that
before we go start tinkering.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] multivariate statistics (v19)

2016-08-05 Thread Tomas Vondra

On 08/05/2016 06:24 AM, Michael Paquier wrote:

On Wed, Aug 3, 2016 at 10:58 AM, Tomas Vondra
 wrote:

Attached is v19 of the "multivariate stats" patch series - essentially v18
rebased on top of current master. Aside from a few bug fixes, the main
improvement is addition of SGML docs demonstrating the statistics in a way
similar to the current "Row Estimation Examples" (and the docs are actually
in the same section). I've tried to keep the right amount of technical
detail (and pointing to the right README for additional details), but this
may need improvements. I have not written docs explaining how statistics may
be combined yet (more about this later).


What we have here is quite something:
$ git diff master --stat | tail -n1
 77 files changed, 12809 insertions(+), 65 deletions(-)
I will try to get familiar with the topic and have added myself as a reviewer
of this patch. Hopefully I'll have feedback soon.


Yes, it's a large patch. Although 25% of the insertions are SGML docs, 
regression tests and READMEs, and a large part of the remaining ~9k
insertions are comments. But it may still be overwhelming, no doubt 
about that.


FWIW, if someone is interested in the patch but is unsure where to 
start, I'm ready to help with that as much as possible. For example if 
you happen to go to PostgresOpen, feel free to drag me to a corner and 
ask me as many questions as you want ...


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hash index with larger hashes

2016-08-05 Thread Robert Haas
On Fri, Aug 5, 2016 at 10:39 AM, Kenneth Marshall  wrote:
> I have been following the recent discussions on increasing the
> size of the hash function used in Postgres and the work to
> provide WAL and performance improvements for hash indexes.
> I know it was mentioned when we moved to the new hashing
> functions, but the existing functions do provide an additional
> 32-bits of hash. We currently do not use them, but they are
> already calculated.
>
> It had occurred to me that one way to decrease the space used
> to store the hash value would be to include information about
> the page number to determine the actual value. For example,
> a hash index of 65k pages (540mb) would get two additional
> bytes of hash with no associated storage cost. Also, if you
> subdivided the hash page into say 128 sub-pages you would
> get the extra 2 bytes of hash in a 4mb index. In this way,
> the bigger the hash index is, the more bits you get for free.
>
> Just wanted to throw it out there.

I'm not sure I understand exactly what you are proposing here.
Suppose we have 64 bits of hashcode and (1 << N) buckets.  Currently,
we store hashcode bits 0..31 on each item.  Maybe what you're saying
is that we could instead store bits B..(31+B) on each item - that is,
cram in as many extra bits on each individual item as log2(nbuckets).
The problem with that is that it would make it very hard to split
buckets.
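
To illustrate the scheme under discussion, here is a hypothetical sketch
(not PostgreSQL code) of the bit-slicing idea and of why splitting breaks it:

#include <stdint.h>

/*
 * With (1 << log2_nbuckets) buckets, hash bits 0..log2_nbuckets-1 are
 * implied by which bucket an item lives in, so the 32 stored bits can
 * be taken from further up the 64-bit hashcode.
 */
static uint32_t
bits_to_store(uint64_t hashcode, int log2_nbuckets)
{
    return (uint32_t) (hashcode >> log2_nbuckets);
}

/*
 * The catch: when a bucket splits, log2_nbuckets grows by one, so every
 * stored value would have to shift up by a bit -- and the new top bit
 * of each item's hashcode was never stored anywhere.
 */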

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Heap WARM Tuples - Design Draft

2016-08-05 Thread Pavan Deolasee
On Fri, Aug 5, 2016 at 8:55 PM, Claudio Freire 
wrote:

> On Fri, Aug 5, 2016 at 1:27 AM, Pavan Deolasee 
> wrote:
> >
> >
> > I don't see why it is hard.  Trying to find the index entry for the old
> > update row seems odd, and updating an index row seems odd, but skipping
> > an index addition for the new row seems simple.  Is the problem
> > concurrent activity?  I assumed we already had that ability to add to HOT
> > chains because we have to lock the update chain.
>
>
> Think of an index over a 1TB table whose keys have only 2 values: true
> and false.
>
> Sure, it's a horrible index. But I've seen things like that, and I've
> seen cases when they're useful too.
>
> So, conceptually, for each key you have circa N/2 tids on the index.
> nbtree finds the leftmost valid insert point comparing keys, it
> doesn't care about tids, so to find the index entries that point to
> the page where the new tuple is, you'd have to scan the N/2 tids in
> the index, an extremely expensive operation.
>
>
Well, it's always going to be extremely hard to solve for all use cases.
This is one such extreme case and we should just give up and do a cold
update.

I think we can look at the index type (unique vs non-unique) along with
table statistics to find what fraction of column values are distinct and
then estimate whether it's worthwhile to look for duplicate (key, CTID) or
just do a cold update. In addition, put some cap on how hard we try once we
decide to check for duplicates and give up after we cross that threshold.
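
A rough sketch of that heuristic (hypothetical names and cap, not from
any patch):

#include <stdbool.h>

#define WARM_DUPCHECK_CAP 16    /* arbitrary cap on duplicate lookups */

/*
 * Attempt a WARM update only when the expected number of (key, CTID)
 * candidates that must be scanned for duplicates stays under the cap.
 */
static bool
warm_update_worthwhile(bool index_is_unique, double reltuples,
                       double n_distinct)
{
    double rows_per_key;

    if (index_is_unique)
        return true;            /* at most one matching entry to check */

    rows_per_key = (n_distinct > 0) ? reltuples / n_distinct : reltuples;
    return rows_per_key <= WARM_DUPCHECK_CAP;
}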

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] money type overflow checks

2016-08-05 Thread Tom Lane
Peter Eisentraut  writes:
> The input function of the money type has no overflow checks:

Ugh.

> (Is checking for < 0 a valid overflow check?

No, I don't think it's sufficient after a multiplication by 10.  That
would be enough to shift some bits clear out of the word, but there's
no certainty that the new sign bit would be 1.

The scheme used in scanint8 is safe.  But I think it was written that way
mainly to avoid hard-wired assumptions about how wide int64 is, a
consideration that's a mite obsolete now.  You could possibly avoid the
cost of a division by relying on comparisons to PG_INT64_MAX/10.
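
For illustration, a per-digit check along those lines (a sketch, not a
committed fix; INT64_MAX stands in for PG_INT64_MAX):

#include <stdbool.h>
#include <stdint.h>

/*
 * Reject a digit if appending it would overflow.  INT64_MAX / 10 and
 * INT64_MAX % 10 are compile-time constants, so no runtime division
 * is paid per digit.
 */
static bool
append_digit(int64_t *value, int digit)
{
    if (*value > INT64_MAX / 10 ||
        (*value == INT64_MAX / 10 && digit > INT64_MAX % 10))
        return false;           /* would overflow */
    *value = *value * 10 + digit;
    return true;
}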

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical Replication WIP

2016-08-05 Thread Simon Riggs
On 5 August 2016 at 16:22, Andres Freund  wrote:
> On 2016-08-05 17:00:13 +0200, Petr Jelinek wrote:
>> as promised here is WIP version of logical replication patch.
>
> Yay!

Yay2

> I'm about to head out for a week of, desperately needed, holidays, but
> after that I plan to spend a fair amount of time helping to review
> etc. this.

Have a good one.

-- 
Simon Riggs  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Notice lock waits

2016-08-05 Thread Jeff Janes
One time too many, I ran some minor change using psql on a production
server and wondered why it was taking so much longer than it did
on the test server, only to discover, after messing around with
opening new windows and running queries against pg_stat_activity and
pg_locks and so on, that it was waiting for a lock.

So I created a new guc, notice_lock_waits, which acts like
log_lock_waits but sends the message as NOTICE so it will show up on
interactive connections like psql.

I turn it on in my .psqlrc, as it doesn't make much sense for me to
turn it on in non-interactive sessions.

A general facility for promoting selected LOG messages to NOTICE would
be nice, but I don't know how to design or implement that.  This is
much easier, and I find it quite useful.

I have it PGC_SUSET because it does send some tiny amount of
information about the blocking process (the PID) to the blocked
process.  That is probably too paranoid, because the PID can be seen
by anyone in the pg_locks table anyway.

Do you think this is useful and generally implemented in the correct
way?  If so, I'll try to write some sgml documentation for it.

Cheers,

Jeff


notice_lock_waits-V01.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-05 Thread Robert Haas
On Thu, Aug 4, 2016 at 8:14 AM, Aleksander Alekseev
 wrote:
> Thanks everyone for your remarks and comments!
>
> Here is an improved version of the patch.
>
> Main changes:
> * Patch passes `make installcheck`
> * Code is fully commented, also no more TODO's
>
> I wish I had sent this version of the patch last time. Now I realize it was
> really hard to read and understand. Hope I managed to correct this
> flaw. If you believe that some parts of the code are still poorly
> commented or could be improved in any other way please let me know.
>
> And as usual, any other comments, remarks or questions are highly
> appreciated!

Three general comments:

1. It's always seemed to me that a huge problem with anything of this
sort is dependencies.  For example, suppose I create a fast temporary
table and then I create a functional index on the fast temporary table
that uses some SQL function defined in pg_proc.  Then, another user
drops the function.  Then, I try to use the index.  With regular
temporary tables, dependencies protect us here, but if there are no
catalog entries, I wonder how this can ever be made safe.  Similar
problems exist for triggers, constraints, RLS policies, and attribute
defaults.

2. This inserts additional code in a bunch of really low-level places
like heap_hot_search_buffer, heap_update, heap_delete, etc.  I think
what you've basically done here is create a new, in-memory heap AM and
then, because we don't have an API for adding new storage managers,
you've bolted it onto the existing heapam layer.  That's certainly a
reasonable approach for creating a PoC, but I think we actually need a
real API here.  Otherwise, when the next person comes along and wants
to add a third heap implementation, they've got to modify all of these
same places again.  I don't think this code is reasonably maintainable
in this form.

3. Currently, temporary tables are parallel-restricted: a query that
references temporary tables can use parallelism, but access to the
temporary tables is only possible from within the leader.  I suspect,
although I'm not entirely sure, that lifting this restriction would be
easier with our current temporary table implementation than with this
one, because the current temporary table implementation mostly relies
on a set of buffers that could be moved from backend-private memory to
DSM.  On a quick look, this implementation uses a bunch of new data
structures that are heavy on pointers, so that gets quite a bit more
complicated to make parallel-safe (unless we adopt threads instead of
processes!).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Change the default of update_process_title to off

2016-08-05 Thread Jeff Janes
On Fri, Aug 5, 2016 at 3:25 AM, Tsunakawa, Takayuki
 wrote:
>> From: Tom Lane [mailto:t...@sss.pgh.pa.us]
>> Yeah, I think I agree.  It would be bad to disable it by default on Unix,
>> because ps(1) is a very standard tool there, but the same argument doesn't
>> hold for Windows.
>
> It seems that we could reach a consensus.  The patch is attached.  I'll add 
> this to the next CommitFest.
>
>> Another route to a solution would be to find a cheaper way to update the
>> process title on Windows ... has anyone looked for alternatives?
>
> I couldn't find an alternative solution after asking some Windows support 
> staff.
>
> Regards
> Takayuki Tsunakawa

Shouldn't we change the code which initdb uses to create the example
postgresql.conf, so that it shows the commented-out value as 'off'
when initdb is run on Windows?

Cheers,

Jeff


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]

2016-08-05 Thread Anastasia Lubennikova

30.07.2016 14:00, Andrew Borodin:

Here is a BRIN-compatible version of the patch. Now the function
PageIndexTupleOverwrite consumes the tuple size as a parameter; this
behavior is consistent with PageAddItem.
Also, I had to remove the assertion that the item has storage in the loop
that adjusts line pointers. It would be great if someone could check
that place (the commented-out Assert(ItemIdHasStorage(ii)); in
PageIndexTupleOverwrite). I suspect that there might be a case when
line pointers should not be adjusted.


Hi, I reviewed the code and I have a couple of questions about the
following code:

/* may have negative size here if new tuple is larger */
size_diff = oldsize - alignednewsize;
offset = ItemIdGetOffset(tupid);

if (offset < phdr->pd_upper || (offset + size_diff) > phdr->pd_special ||
    offset != MAXALIGN(offset) || size_diff != MAXALIGN(size_diff))
    ereport(ERROR,
            (errcode(ERRCODE_DATA_CORRUPTED),
             errmsg("corrupted item offset: offset = %u, size = %u",
                    offset, (unsigned int) size_diff)));

First of all, shouldn't we use MAXALIGN(oldsize) instead of oldsize?
Although I'm quite sure that it was already aligned somewhere before.

I doubt that the check (size_diff != MAXALIGN(size_diff)) is necessary.
I'd rather add  the check: (offset+size_diff < pd_lower).

Besides that moment, the patch seems pretty good.

BTW, I'm very surprised that it improves performance so much,
and the size is also reduced significantly: 89MB against 289MB without the patch.
Could you explain in detail why that happens?

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Column COMMENTs in CREATE TABLE?

2016-08-05 Thread Peter Eisentraut
On 8/5/16 11:58 AM, David Fetter wrote:
> For what it's worth, I tend to put the function body last.  That's
> just my taste, though.  Would it be hard to keep the ability to
> permute the stuff after
> 
> CREATE FUNCTION (args)
> RETURNS [SETOF] type
> 
> as we have it now?

I don't think anybody is suggesting to change that.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [sqlsmith] FailedAssertion("!(k == indices_count)", File: "tsvector_op.c", Line: 511)

2016-08-05 Thread Robert Haas
On Thu, Aug 4, 2016 at 12:14 AM, Noah Misch  wrote:
> On Wed, Aug 03, 2016 at 05:52:44PM -0400, Tom Lane wrote:
>> I wrote:
>> > I'm thinking there are two distinct bugs here.
>>
>> Actually, make that three bugs.  I was so focused on the crashing
>> that I failed to notice that ts_delete wasn't producing sane answers
>> even when it didn't crash:
>
> [Action required within 72 hours.  This is a generic notification.]
>
> The above-described topic is currently a PostgreSQL 9.6 open item.  Teodor,
> since you committed the patch believed to have created it, you own this open
> item.  If some other commit is more relevant or if this does not belong as a
> 9.6 open item, please let us know.  Otherwise, please observe the policy on
> open item ownership[1] and send a status update within 72 hours of this
> message.  Include a date for your subsequent status update.  Testers may
> discover new open items at any time, and I want to plan to get them all fixed
> in advance of shipping 9.6rc1 next week.  Consequently, I will appreciate your
> efforts toward speedy resolution.  Thanks.

Action within 72 hours now seems inadequate; we are scheduled to wrap
rc1 on Monday.  We need someone to either fix these bugs very very
soon, or decide to ship beta4 instead of rc1 (uggh), or decide it's OK
to ship rc1 with these known defects, or postpone the planned release.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-08-05 Thread Robert Haas
On Wed, Aug 3, 2016 at 5:13 PM, Peter Geoghegan  wrote:
> On Wed, Aug 3, 2016 at 11:42 AM, Robert Haas  wrote:
>> I'm not going to say it's bad to be able to do things 2-2.5x faster,
>> but linear scalability this ain't - particularly because your 2.58x
>> faster case is using up to 7 or 8 times as much memory.  The
>> single-process case would be faster in that case, too: you could
>> quicksort.
>
> [ lengthy counter-argument ]

None of this convinces me that testing this in a way that is not
"apples to apples" is a good idea, nor will any other argument.

>> I also think that Amdahl's law is going to pinch pretty severely here.
>
> Doesn't that almost always happen, though?

To some extent, sure, absolutely.  But it's our job as developers to
try to foresee and minimize those cases.  When Noah was at
EnterpriseDB a few years ago and we were talking about parallel
internal sort, Noah started by doing a survey of the literature and
identified parallel quicksort as the algorithm that seemed best for
our use case.  Of course, every time quicksort partitions the input,
you get two smaller sorting problems, so it's easy to see how to use 2
CPUs after the initial partitioning step has been completed and 4 CPUs
after each of those partitions has been partitioned again, and so on.
However, that turns out not to be good enough because the first
partitioning step can consume a significant percentage of the total
runtime - so if you only start parallelizing after that, you're
leaving too much on the table.  To avoid that, the algorithm he was
looking at had a (complicated) way of parallelizing the first
partitioning step; then you can, it seems, do the full sort in
parallel.
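
A toy sketch of that shape (plain pthreads, nothing like the real
proposal): each partition step yields two independent halves, so workers
can be added as the recursion deepens -- but the first partition pass is
still serial, which is exactly the problem described above.

#include <pthread.h>

typedef struct { int *a; int lo, hi, depth; } SortTask;

static void *psort(void *arg);

static int
partition(int *a, int lo, int hi)
{
    int pivot = a[hi], i = lo, t;

    for (int j = lo; j < hi; j++)
        if (a[j] < pivot) { t = a[i]; a[i] = a[j]; a[j] = t; i++; }
    t = a[i]; a[i] = a[hi]; a[hi] = t;
    return i;
}

static void *
psort(void *arg)
{
    SortTask *task = arg;

    if (task->lo < task->hi)
    {
        int mid = partition(task->a, task->lo, task->hi); /* serial step */
        SortTask left = {task->a, task->lo, mid - 1, task->depth + 1};
        SortTask right = {task->a, mid + 1, task->hi, task->depth + 1};

        if (task->depth < 3)    /* spawn up to 2^3 concurrent tasks */
        {
            pthread_t th;

            pthread_create(&th, NULL, psort, &left);
            psort(&right);
            pthread_join(th, NULL);
        }
        else
        {
            psort(&left);
            psort(&right);
        }
    }
    return NULL;
}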

There are some somewhat outdated and perhaps naive ideas about this
that we wrote up here:

https://wiki.postgresql.org/wiki/Parallel_Sort

Anyway, you're proposing an algorithm that can't be fully
parallelized.  Maybe that's OK.  But I'm a little worried about it.
I'd feel more confident if we knew that the merge could be done in
parallel and were just leaving that to a later development stage; or
if we picked an algorithm like the one above that doesn't leave a
major chunk of the work unparallelizable.
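
To put rough numbers on the Amdahl's-law pinch (an illustration, not a
measurement): if the unparallelizable merge were 25% of single-process
runtime, the best possible speedup with n workers is 1 / (0.25 + 0.75/n),
which is about 2.9x at n = 8 and can never exceed 4x no matter how many
workers are added.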

> Isn't that what you
> generally see with queries that show off the parallel join capability?

For nested loop joins, no.  The whole join operation can be done in
parallel.  For hash joins, yes: building the hash table once per
worker can run afoul of Amdahl's law in a big way.  That's why Thomas
Munro is working on fixing it:

https://wiki.postgresql.org/wiki/EnterpriseDB_database_server_roadmap

Obviously, parallel query is subject to a long list of annoying
restrictions at this point.  On queries that don't hit any of those
restrictions we can get 4-5x speedup with a leader and 4 workers.  As
we expand the range of plan types that we can construct, I think we'll
see those kinds of speedups for a broader range of queries.  (The
question of exactly why we top out with as few workers as currently
seems to be the case needs more investigation, too; maybe contention
effects?)

>> If the final merge phase is a significant percentage of the total
>> runtime, picking an algorithm that can't parallelize the final merge
>> is going to limit the speedups to small multiples.  That's an OK place
>> to be as a result of not having done all the work yet, but you don't
>> want to get locked into it.  If we're going to have a substantial
>> portion of the work that can never be parallelized, maybe we've picked
>> the wrong algorithm.
>
> I suggest that this work be compared to something with similar
> constraints. I used Google to try to get some indication of how much
> of a difference parallel CREATE INDEX makes in other major database
> systems. This is all I could find:
>
> https://www.mssqltips.com/sqlservertip/3100/reduce-time-for-sql-server-index-rebuilds-and-update-statistics/

I do agree that it is important not to have unrealistic expectations.

> As I've said, there is probably a good argument to be made for
> partitioning to increase parallelism. But, that involves risks around
> the partitioning being driven by statistics or a cost model, and I
> don't think you'd be too on board with the idea of every CREATE INDEX
> after bulk loading needing an ANALYZE first. I tend to think of that
> as more of a parallel query thing, because you can often push down a
> lot more there, dynamic sampling might be possible, and there isn't a
> need to push all the tuples through one point in the end. Nothing I've
> done here precludes your idea of a sort-order-preserving gather node.
> I think that we may well need both.

Yes.  Rushabh is working on that, and Finalize GroupAggregate ->
Gather Merge -> Partial GroupAggregate -> Sort -> whatever is looking
pretty sweet.

>> The work on making the logtape infrastructure parallel-aware seems
>> very interesting and potentially useful for other things.  Sadly, I
>> don't 

[HACKERS] money type overflow checks

2016-08-05 Thread Peter Eisentraut
The input function of the money type has no overflow checks:

=> select '12345678901234567890'::money;
            money
-----------------------------
 -$13,639,628,150,831,692.72
(1 row)

The tests in the regression test file money.sql are bogus because they
only test the overflow checks of the bigint type before the cast.

Here is a patch that adds appropriate checks and tests.  We could
probably remove the bogus tests.

(Is checking for < 0 a valid overflow check?  We save the sign until the
very end, so it ought to work.  The code in int8.c works differently there.)

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 6fcce1d6c3685cfb5bacdb89981d4f6a3911dded Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 5 Aug 2016 11:50:53 -0400
Subject: [PATCH] Add overflow checks to money type input function

---
 src/backend/utils/adt/cash.c| 18 ++
 src/test/regress/expected/money.out | 47 +
 src/test/regress/sql/money.sql  | 11 +
 3 files changed, 76 insertions(+)

diff --git a/src/backend/utils/adt/cash.c b/src/backend/utils/adt/cash.c
index b336185..0c06f71 100644
--- a/src/backend/utils/adt/cash.c
+++ b/src/backend/utils/adt/cash.c
@@ -197,6 +197,12 @@ cash_in(PG_FUNCTION_ARGS)
 		{
 			value = (value * 10) + (*s - '0');
 
+			if (value < 0)
+ereport(ERROR,
+		(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+		 errmsg("value \"%s\" is out of range for type money",
+str)));
+
 			if (seen_dot)
 dec++;
 		}
@@ -216,10 +222,22 @@ cash_in(PG_FUNCTION_ARGS)
 	if (isdigit((unsigned char) *s) && *s >= '5')
 		value++;
 
+	if (value < 0)
+		ereport(ERROR,
+(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ errmsg("value \"%s\" is out of range for type money",
+		str)));
+
 	/* adjust for less than required decimal places */
 	for (; dec < fpoint; dec++)
 		value *= 10;
 
+	if (value < 0)
+		ereport(ERROR,
+(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ errmsg("value \"%s\" is out of range for type money",
+		str)));
+
 	/*
 	 * should only be trailing digits followed by whitespace, right paren,
 	 * trailing sign, and/or trailing currency symbol
diff --git a/src/test/regress/expected/money.out b/src/test/regress/expected/money.out
index 538235c..206f5c4 100644
--- a/src/test/regress/expected/money.out
+++ b/src/test/regress/expected/money.out
@@ -185,6 +185,53 @@ SELECT * FROM money_data;
  $123.46
 (1 row)
 
+-- input checks
+SELECT '1234567890'::money;
+   money   
+---
+ $1,234,567,890.00
+(1 row)
+
+SELECT '12345678901234567'::money;
+   money
+
+ $12,345,678,901,234,567.00
+(1 row)
+
+SELECT '123456789012345678'::money;
+ERROR:  value "123456789012345678" is out of range for type money
+LINE 1: SELECT '123456789012345678'::money;
+   ^
+SELECT '9223372036854775807'::money;
+ERROR:  value "9223372036854775807" is out of range for type money
+LINE 1: SELECT '9223372036854775807'::money;
+   ^
+SELECT '-12345'::money;
+money
+-
+ -$12,345.00
+(1 row)
+
+SELECT '-1234567890'::money;
+   money
+
+ -$1,234,567,890.00
+(1 row)
+
+SELECT '-12345678901234567'::money;
+money
+-
+ -$12,345,678,901,234,567.00
+(1 row)
+
+SELECT '-123456789012345678'::money;
+ERROR:  value "-123456789012345678" is out of range for type money
+LINE 1: SELECT '-123456789012345678'::money;
+   ^
+SELECT '-9223372036854775808'::money;
+ERROR:  value "-9223372036854775808" is out of range for type money
+LINE 1: SELECT '-9223372036854775808'::money;
+   ^
 -- Cast int4/int8 to money
 SELECT 1234567890::money;
money   
diff --git a/src/test/regress/sql/money.sql b/src/test/regress/sql/money.sql
index 09b9476..d07a616 100644
--- a/src/test/regress/sql/money.sql
+++ b/src/test/regress/sql/money.sql
@@ -57,6 +57,17 @@ CREATE TABLE money_data (m money);
 INSERT INTO money_data VALUES ('$123.459');
 SELECT * FROM money_data;
 
+-- input checks
+SELECT '1234567890'::money;
+SELECT '12345678901234567'::money;
+SELECT '123456789012345678'::money;
+SELECT '9223372036854775807'::money;
+SELECT '-12345'::money;
+SELECT '-1234567890'::money;
+SELECT '-12345678901234567'::money;
+SELECT '-123456789012345678'::money;
+SELECT '-9223372036854775808'::money;
+
 -- Cast int4/int8 to money
 SELECT 1234567890::money;
 SELECT 12345678901234567::money;
-- 
2.9.2


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Prevent "snapshot too old" from trying to return pruned TOAST tu

2016-08-05 Thread Robert Haas
On Fri, Aug 5, 2016 at 11:05 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Thu, Aug 4, 2016 at 10:58 PM, Robert Haas  wrote:
>>> What is the reason?  We refuse to separate frontend and backend
>>> headers in any sort of principled way?
>
>> That was poorly phrased.  I'll try again: I can't see any reason for
>> using a macro here except that it allows frontend code to compile this
>> without breaking.
>
> Well, the alternative would be to put "#ifndef FRONTEND" around the
> static-inline function.  That's not very pretty either, and it's
> inconsistent with the existing precedent (ie, InitDirtySnapshot).
> Also, it presumes that a non-backend includer actually has defined
> FRONTEND; that seems to be the case for pg_xlogdump but I do not
> think we do that everywhere.

That may not be pretty, but it'd be a lot more transparent.  If I see
#ifndef FRONTEND, I think, oh, that's protecting some stuff that
shouldn't be included in front-end compiles.  If I see a macro, I don't
necessarily think of the fact that it may be a way of preventing
that code from being compiled in front-end compiles.
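
For illustration, the two styles at issue, using a hypothetical function
(not the actual snapmgr.h code):

#include <stdint.h>

typedef struct FooSnapshot { uint64_t foo_lsn; } FooSnapshot;

#ifdef USE_MACRO_STYLE
/* Style 1: a macro compiles in both frontend and backend code, but
 * nothing in its definition says that this is *why* it is a macro. */
#define InitFooSnapshot(snap, lsn)  ((snap)->foo_lsn = (uint64_t) (lsn))
#else
/* Style 2: a static inline guarded by #ifndef FRONTEND makes the intent
 * explicit, but assumes frontend includers actually define FRONTEND. */
#ifndef FRONTEND
static inline void
InitFooSnapshot(FooSnapshot *snap, uint64_t lsn)
{
    snap->foo_lsn = lsn;
}
#endif
#endif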

>> Here's a patch.  Is this what you want?
>
> OK by me.

OK, committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Column COMMENTs in CREATE TABLE?

2016-08-05 Thread David Fetter
On Fri, Aug 05, 2016 at 10:14:21AM -0400, Peter Eisentraut wrote:
> On 7/3/16 11:41 AM, Tom Lane wrote:
> > I can see the reasoning for
> > allowing COMMENT in a table column definition, but the argument for
> > allowing it in simpler CREATEs seems tissue-thin:
> > 
> > CREATE FUNCTION foo(int) RETURNS ... ;
> > COMMENT ON FUNCTION foo(int) IS 'blah';
> > 
> > vs
> > 
> > CREATE FUNCTION foo(int) RETURNS ...
> > WITH (COMMENT 'blah');
> > 
> > Not much of a keystroke savings, nor is the comment noticeably
> > "closer" to its object than before.
> 
> I had actually been thinking about a similar proposal, but specifically
> for CREATE FUNCTION.  But the syntax would have to put it above the
> function body, not below it.  I think the CREATE FUNCTION syntax could
> actually handle that.

For what it's worth, I tend to put the function body last.  That's
just my taste, though.  Would it be hard to keep the ability to
permute the stuff after

CREATE FUNCTION (args)
RETURNS [SETOF] type

as we have it now?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] regression test for extended query protocol

2016-08-05 Thread Robert Haas
On Fri, Aug 5, 2016 at 10:11 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> I think it would be an interesting project for someone to try to
>> figure out how to make 'make check-extended-query-protocol' or similar
>> work.
>
> Seems like it would not be that hard to put some kind of option in psql
> to issue queries with PQexecParams rather than plain PQexec.

Yes, we did that.

> However, since
> that wouldn't exercise sending out-of-line parameters, it's not clear
> to me that it'd be a very interesting test.

Well, you exercise copyfuncs.c quite a bit more, if nothing else.
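
To make the distinction concrete, a minimal libpq sketch (assuming an
open connection and a hypothetical table t(id int)): PQexec inlines the
value into the SQL text, while PQexecParams ships it out-of-line as a
bind parameter over the extended protocol.

#include <libpq-fe.h>

static void
demo(PGconn *conn)
{
    /* simple protocol: the parameter is part of the query string */
    PGresult *r1 = PQexec(conn, "SELECT * FROM t WHERE id = 42");

    /* extended protocol: the value travels as an out-of-line parameter */
    const char *const values[1] = {"42"};
    PGresult *r2 = PQexecParams(conn,
                                "SELECT * FROM t WHERE id = $1",
                                1,      /* nParams */
                                NULL,   /* paramTypes: let server infer */
                                values,
                                NULL,   /* paramLengths: text format */
                                NULL,   /* paramFormats: all text */
                                0);     /* text-format result */

    PQclear(r1);
    PQclear(r2);
}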

> I still suspect that doing this in core is mostly going to be duplicative
> effort, and we'd be better off making use of existing JDBC tests.

That's possible; I don't know much about the JDBC tests, so it's hard
for me to say how that would compare in terms of coverage to my
proposal.  Perhaps both things are worth doing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Heap WARM Tuples - Design Draft

2016-08-05 Thread Bruce Momjian
On Fri, Aug  5, 2016 at 12:25:39PM -0300, Claudio Freire wrote:
> > 2. The existence of index pointers to intermediate tuples will lead to index
> > bloat. This is the same problem that we found in Simon's original proposal
> > (discussed internally). None of the intermediate line pointers can be
> > vacuumed until the entire chain becomes DEAD. Even if a duplicate index
> > key is inserted for every index, knowing that and reclaiming the index
> > pointers to the original root line pointer will be difficult.
> 
> I don't see the difference it makes for bloat between storing the root
> tid and the intermediate tid, but I haven't yet figured out how
> vacuuming would work so maybe I have to think about that to realize
> the difference.

Think of page pruning --- we can't remove a ctid that an index points
to.  The more ctids you point to in a HOT chain, the fewer ctids you can
remove --- that's why we want to point only to the head of the HOT/WARM
chain.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Heap WARM Tuples - Design Draft

2016-08-05 Thread Claudio Freire
On Fri, Aug 5, 2016 at 1:27 AM, Pavan Deolasee  wrote:
>
>
> On Fri, Aug 5, 2016 at 8:23 AM, Claudio Freire 
> wrote:
>>
>> On Thu, Aug 4, 2016 at 11:15 PM, Bruce Momjian  wrote:
>>
>> >
>> > OK, that's a lot of text, and I am confused.  Please tell me the
>> > advantage of having an index point to a string of HOT chains, rather
>> > than a single one?  Is the advantage that the index points into the
>> > middle of the HOT chain rather than at the beginning?  I can see some
>> > value to that, but having more ctid HOT headers that can't be removed
>> > except by VACUUM seems bad, plus the need for index rechecks as you
>> > cross to the next HOT chain seems bad.
>> >
>> > The value of WARM is to avoid index bloat --- right now we traverse the
>> > HOT chain on a single page just fine with no complaints about speed so I
>> > don't see a value to optimizing that traversal, and I think the key
>> > rechecks and ctid bloat will make it all much slower.
>> >
>> > It also seems much more complicated.
>>
>> The point is avoiding duplicate rows in the output of index scans.
>>
>> I don't think you can avoid it simply by applying index condition
>> rechecks as the original proposal implies, in this case:
>>
>> CREATE TABLE t (id integer not null primary key, someid integer, dat
>> integer);
>> CREATE INDEX i1 ON t (someid);
>>
>> INSERT INTO t (id, someid, dat) VALUES (1, 2, 100);
>> UPDATE t SET dat = dat + 1 where id = 1;
>> UPDATE t SET dat = dat + 1, id = 2 where id = 1;
>> UPDATE t SET dat = dat + 1, id = 3, someid = 3 where id = 2;
>> UPDATE t SET dat = dat + 1, id = 1, someid = 2 where id = 3;
>> SELECT * FROM t WHERE someid = 2;
>>
>> That, I believe, will cause the problematic chains where the condition
>> recheck passes both times the last visible tuple is visited during the
>> scan. It will return more than one tuple when in reality there is only
>> one.
>
>
> Hmm. That seems like a real problem. The only way to address that is to
> ensure that for a given WARM chain and a given index key, there exists
> only a single index entry. There can and will be multiple entries if the
> index key changes, but if the index key changes to the original value (or
> any other value already in the WARM chain) again, we must not add another
> index entry. Now that does not seem like a very common scenario and skipping
> a duplicate index entry does have its own benefit, so maybe it's not a
> terrible idea to try that. You're right, it may be expensive to search for
> an existing matching index key for the same root line pointer. But if we
> could somehow short-circuit the more common case, it might still be worth
> trying.

I think it can be done. I could try to write a POC and see how it works.

> The other idea you mentioned is to index intermediate tuples but somehow
> restrict WARM chain following knowing that the subsequent tuples are
> reachable via different index tuple. I see two major problems with that
> approach:
>
> 1. When HOT or WARM chains are pruned and dead tuples removed, we may lose
> the knowledge about key changes between intermediate updates. Without that
> its seems difficult to know when to stop while following chain starting from
> the old index tuple.
>
> 2. The existence of index pointers to intermediate tuples will lead to index
> bloat. This is the same problem that we found in Simon's original proposal
> (discussed internally). None of the intermediate line pointers can be
> vacuumed until the entire chain becomes DEAD. Even if a duplicate index
> key is inserted for every index, knowing that and reclaiming the index
> pointers to the original root line pointer will be difficult.

I don't see the difference it makes for bloat between storing the root
tid and the intermediate tid, but I haven't yet figured out how
vacuuming would work so maybe I have to think about that to realize
the difference.

>> Not to mention the cost of scanning the chain of N tuples N times,
>> making it quadratic - bound by the number of tuples that can fit a
>> page, but still high.
>>
>> Solving that, I believe, will remove all the simplicity of pointing to
>> root tuples.
>>
>
> You're right. But having all indexes point to the root line pointer makes
> things simpler to manage and less buggy. So if we can somehow solve the
> duplicate tuples problem, even at additional cost at update time, it might
> still be worthwhile. I would suggest that we think more; we may find
> some solution to the problem.

Will try to go down that road and see what can be optimized.

>> I don't really see how you'd do that on yours. You seem to assume
>> finding a particular key-item pointer pair in an index would be
>> efficient, but IMHO that's not the case.
>
>
> That's true. But we could somehow short-circuit the more common pattern,
> which might still be worthwhile. For corner cases, we can fall back to a
> non-HOT update and keep things simple. It 

Re: [HACKERS] improved DefElem list processing

2016-08-05 Thread Peter Eisentraut
On 8/4/16 2:21 PM, Tom Lane wrote:
> Forgot to mention: seems like you should have added a location
> argument to makeDefElem.

I was hesitating to do that lest it break extensions or something, but I
guess we break bigger things than that all the time.  I'll change it.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] improved DefElem list processing

2016-08-05 Thread Peter Eisentraut
On 8/4/16 2:08 PM, Tom Lane wrote:
> +1 on the general idea, but I wonder if it's a problem that you replaced
> an O(N) check with an O(N^2) check.  I don't think there are any commands
> that would be likely to have so many options that this would become a
> serious issue, but ...

I'll run some tests.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] handling unconvertible error messages

2016-08-05 Thread Peter Eisentraut
On 8/4/16 9:42 AM, Tom Lane wrote:
> I'm inclined to think that we should reset the message locale
> to C as soon as we've forked away from the postmaster, and leave it
> that way until we've absorbed settings from the startup packet.
> Sending messages of this sort in English isn't great, but it's better
> than sending completely-unreadable ones.  Or is that just my
> English-centricity showing?

Well, most of the time this all works; only if there are different
client and server settings might you have problems.  We wouldn't want to
partially disable the NLS feature for the normal case.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] handling unconvertible error messages

2016-08-05 Thread Peter Eisentraut
On 8/4/16 2:45 AM, Victor Wagner wrote:
> 4. At session startup, try to reinitialize the LC_MESSAGES locale
> category with the combination
> of the server (or better, client-sent) language and region and the
> client-supplied encoding, and if that fails, use untranslated error
> messages. Obviously, an attempt to set the locale to ru_RU.ISO8859-1 would
> fail. So, if a client asked a server with a ru_RU.UTF-8 default locale to
> use LATIN1 encoding, the server would fall back to untranslated messages.

I think this is basically my solution (1), with the same problems.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical Replication WIP

2016-08-05 Thread Andres Freund
On 2016-08-05 17:00:13 +0200, Petr Jelinek wrote:
> as promised here is WIP version of logical replication patch.

Yay!

I'm about to head out for a week of, desperately needed, holidays, but
after that I plan to spend a fair amount of time helping to review
etc. this.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_size_pretty, SHOW, and spaces

2016-08-05 Thread Bruce Momjian
On Fri, Aug  5, 2016 at 10:57:35AM -0400, Peter Eisentraut wrote:
> On 8/2/16 12:51 PM, Bruce Momjian wrote:
> > Yes, that's a strong argument for using a space.  I have adjusted the
> > patch to use spaces in all reasonable places.  Patch attached, which I
> > have gzipped because it was 133 KB.  (Ah, see what I did there?)  :-)
> > 
> > I am thinking of leaving the 9.6 docs alone as I have already made them
> > consistent (no space) with minimal changes.  We can make it consistent
> > the other way in PG 10.
> 
> I don't think anyone wanted to *remove* the spaces in the documentation.
>  I think this change makes the documentation harder to read.

Well, we had spaces in only a few places in the docs, and as I said, it
is not consistent.  Do you want those few put back for 9.6?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Prevent "snapshot too old" from trying to return pruned TOAST tu

2016-08-05 Thread Tom Lane
Robert Haas  writes:
> On Thu, Aug 4, 2016 at 10:58 PM, Robert Haas  wrote:
>> What is the reason?  We refuse to separate frontend and backend
>> headers in any sort of principled way?

> That was poorly phrased.  I'll try again: I can't see any reason for
> using a macro here except that it allows frontend code to compile this
> without breaking.

Well, the alternative would be to put "#ifndef FRONTEND" around the
static-inline function.  That's not very pretty either, and it's
inconsistent with the existing precedent (ie, InitDirtySnapshot).
Also, it presumes that a non-backend includer actually has defined
FRONTEND; that seems to be the case for pg_xlogdump but I do not
think we do that everywhere.

> Here's a patch.  Is this what you want?

OK by me.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_size_pretty, SHOW, and spaces

2016-08-05 Thread Peter Eisentraut
On 8/2/16 12:51 PM, Bruce Momjian wrote:
> Yes, that's a strong argument for using a space.  I have adjusted the
> patch to use spaces in all reasonable places.  Patch attached, which I
> have gzipped because it was 133 KB.  (Ah, see what I did there?)  :-)
> 
> I am thinking of leaving the 9.6 docs alone as I have already made them
> consistent (no space) with minimal changes.  We can make it consistent
> the other way in PG 10.

I don't think anyone wanted to *remove* the spaces in the documentation.
I think this change makes the documentation harder to read.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Hash index with larger hashes

2016-08-05 Thread Kenneth Marshall
Hello Developers,

I have been following the recent discussions on increasing the
size of the hash function used in Postgres and the work to
provide WAL and performance improvements for hash indexes. 
I know it was mentioned when we moved to the new hashing
functions, but the existing functions do provide an additional
32-bits of hash. We currently do not use them, but they are
already calculated.

It had occurred to me that one way to decrease the space used
to store the hash value would be to include information about
the page number to determine the actual value. For example,
a hash index of 65k pages (540mb) would get two additional
bytes of hash with no associated storage cost. Also, if you
subdivided the hash page into say 128 sub-pages you would
get the extra 2 bytes of hash in a 4mb index. In this way,
the bigger the hash index is, the more bits you get for free.

Just wanted to throw it out there.

Regards,
Ken


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] truncate trigger for foreign data wrappers

2016-08-05 Thread Andres Freund
On 2016-08-05 10:33:49 -0400, Tom Lane wrote:
> Murat Tuncer  writes:
> > I recently hit a roadblock when I tried to create a truncate trigger on
> > a foreign table.  The trigger.c::CreateTrigger() function has an explicit
> > check to block truncate triggers on foreign tables.
> 
> That's good, because we don't implement TRUNCATE on foreign tables: there
> is nothing in the FDW API that would support it.  Not much point in
> declaring a trigger for an event that can never happen.

Well, allowing BEFORE triggers to return NULL or something, preventing
the execution of the rest, would be such an implementation, and also
independently useful.




Re: [HACKERS] truncate trigger for foreign data wrappers

2016-08-05 Thread Tom Lane
Murat Tuncer  writes:
> I recently hit a roadblock when I tried to create a truncate trigger on
> a foreign table. The trigger.c::CreateTrigger() function has an explicit
> check to block truncate triggers on foreign tables.

That's good, because we don't implement TRUNCATE on foreign tables: there
is nothing in the FDW API that would support it.  Not much point in
declaring a trigger for an event that can never happen.

regards, tom lane




Re: [HACKERS] Bogus ANALYZE results for an otherwise-unique column with many nulls

2016-08-05 Thread Tom Lane
Andrew Gierth  writes:
> Hm. I am wrong about this, since it's the fact that consumers are taking
> stanullfrac into account that makes the value wrong in the first place.

Also, the way that the value is calculated in the samples-not-all-distinct
case corresponds to the way I have it in the patch.  What you want to do
would correspond to leaving these edge cases alone and changing all the
other ANALYZE cases instead (*plus* changing the consumers).  I find that
a tad scary.

> But I think the fix is still wrong, because it changes the meaning of
> ALTER TABLE ... ALTER col SET (n_distinct=...)  in a non-useful way; it
> is no longer possible to nail down a useful negative n_distinct value if
> the null fraction of the column is variable.

I think that argument is bogus.  If we change the way that
get_variable_numdistinct (and other consumers) use the value, that will
break all existing custom settings of n_distinct, because they will no
longer mean what they did before.  There have been exactly zero field
complaints that people could not get the results they wanted, so I do
not think that's justified.

In short, what you want to do constitutes a redefinition of stadistinct,
while my patch doesn't.  That is far more invasive and I fear it will
break things that are not broken today.

regards, tom lane




Re: [HACKERS] Column COMMENTs in CREATE TABLE?

2016-08-05 Thread Peter Eisentraut
On 7/3/16 11:41 AM, Tom Lane wrote:
> I can see the reasoning for
> allowing COMMENT in a table column definition, but the argument for
> allowing it in simpler CREATEs seems tissue-thin:
> 
>   CREATE FUNCTION foo(int) RETURNS ... ;
>   COMMENT ON FUNCTION foo(int) IS 'blah';
> 
> vs
> 
>   CREATE FUNCTION foo(int) RETURNS ...
>   WITH (COMMENT 'blah');
> 
> Not much of a keystroke savings, nor is the comment noticeably
> "closer" to its object than before.

I had actually been thinking about a similar proposal, but specifically
for CREATE FUNCTION.  But the syntax would have to put it above the
function body, not below it.  I think the CREATE FUNCTION syntax could
actually handle that.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] regression test for extended query protocol

2016-08-05 Thread Tom Lane
Robert Haas  writes:
> I think it would be an interesting project for someone to try to
> figure out how to make 'make check-extended-query-protocol' or similar
> work.

Seems like it would not be that hard to put some kind of option in psql
to issue queries with PQexecParams not plain PQexec.  However, since
that wouldn't exercise sending out-of-line parameters, it's not clear
to me that it'd be a very interesting test.
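For context, the client-side difference is just which libpq entry point
psql would call; a minimal sketch (hypothetical helper, error handling
omitted).  For existing regression queries psql would presumably pass
zero parameters, which is why no out-of-line parameters get exercised:

#include <libpq-fe.h>

static void
run_both(PGconn *conn)
{
    const char *params[1] = {"42"};
    PGresult   *res;

    /* simple protocol: the whole query goes in one 'Q' message */
    res = PQexec(conn, "SELECT 42");
    PQclear(res);

    /* extended protocol: Parse/Bind/Execute, parameter sent out-of-line */
    res = PQexecParams(conn, "SELECT $1::int",
                       1,          /* number of parameters */
                       NULL,       /* let the server infer the type */
                       params,     /* parameter values, text format */
                       NULL, NULL, /* lengths/formats: not needed for text */
                       0);         /* ask for text results */
    PQclear(res);
}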

I still suspect that doing this in core is mostly going to be duplicative
effort, and we'd be better off making use of existing JDBC tests.

regards, tom lane




Re: [HACKERS] Reviewing freeze map code

2016-08-05 Thread Robert Haas
On Thu, Aug 4, 2016 at 3:24 AM, Andres Freund  wrote:
> Hi,
>
> On 2016-08-02 10:55:18 -0400, Noah Misch wrote:
>> [Action required within 72 hours.  This is a generic notification.]
>>
>> The above-described topic is currently a PostgreSQL 9.6 open item.  Andres,
>> since you committed the patch believed to have created it, you own this open
>> item.
>
> Well kinda (it was a partial fix for something not originally by me),
> but I'll deal with it. Reading now, will commit tomorrow.

Thanks.  I kept meaning to get to this one, and failing to do so.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Slowness of extended protocol

2016-08-05 Thread Robert Haas
On Tue, Aug 2, 2016 at 2:00 PM, Shay Rojansky  wrote:
>> Shay, why don't you use a profiler? Seriously.
>> I'm afraid "iterate the per-message loop in PostgresMain five times not
>> once" /"just discussing what may or may not be a problem..."  is just
>> hand-waving.
>>
>> Come on, it is not that hard.
>
> I really don't get what's problematic with posting a message on a mailing
> list about a potential performance issue, to try to get people's reactions,
> without diving into profiling right away. I'm not a PostgreSQL developer,
> have other urgent things to do and don't even spend most of my programming
> time in C.

There's absolutely nothing wrong with that.  I find your questions
helpful and interesting and I hope you will keep asking them.  I think
that they are a valuable contribution to the discussion on this list.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [RFC] Change the default of update_process_title to off

2016-08-05 Thread Bruce Momjian
On Fri, Aug  5, 2016 at 01:12:39PM +0200, David Rowley wrote:
> On 5 August 2016 at 12:25, Tsunakawa, Takayuki
>  wrote:
> > It seems that we could reach a consensus.  The patch is attached.  I'll add 
> > this to the next CommitFest.
> 
> 
> + The default is off on Windows
> + because the overhead is significant, and on on other platforms.

I think "on on" is odd.  I think you want "and on for other platforms."

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +




Re: [HACKERS] Slowness of extended protocol

2016-08-05 Thread Robert Haas
On Wed, Aug 3, 2016 at 7:35 PM, Bruce Momjian  wrote:
> On Wed, Aug  3, 2016 at 10:02:39AM -0400, Tom Lane wrote:
>> Bruce Momjian  writes:
>> > On Sun, Jul 31, 2016 at 05:57:12PM -0400, Tom Lane wrote:
>> >> In hindsight it seems clear that what a lot of apps want out of extended
>> >> protocol is only the ability to send parameter values out-of-line instead
>> >> of having to quote/escape them into SQL literals.  Maybe an idea for the
>> >> fabled V4 protocol update is some compromise query type that corresponds
>> >> precisely to PQexecParams's feature set: you can send parameter values
>> >> out-of-line, and you can specify text or binary results, but there's no
>> >> notion of any persistent state being created and no feedback about
>> >> parameter data types.
>>
>> > Do you want this on the TODO list?
>>
>> I didn't hear anyone say it was a silly idea, so sure.
>
> Done:
>
> Create a more efficient way to handle out-of-line parameters

FWIW, I agree with this idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Heap WARM Tuples - Design Draft

2016-08-05 Thread Bruce Momjian
On Fri, Aug  5, 2016 at 09:57:19AM +0530, Pavan Deolasee wrote:
> Hmm. That seems like a real problem. The only way to address that is to
> ensure that for a given WARM chain and given index key, there must exist
> only a single index entry. There can and will be multiple entries if the
> index key changes, but if the index key changes to the original value (or
> any other value already in the WARM chain) again, we must not add another
> index entry. Now that does not seem like a very common scenario, and
> skipping a duplicate index entry does have its own benefit, so maybe it's
> not a terrible idea to try that. You're right, it may be expensive to
> search for an existing matching index key for the same root line pointer.
> But if we could somehow short-circuit the more common case, it might
> still be worth trying.

I assumed that behavior was part of the original design;  I stated
earlier:

Also, what happens when a tuple chain goes off-page, then returns to the
original page?  That is a new HOT chain given our current code, but
would the new tuple addition realize it needs to create a new index
tuple?  The new tuple code needs to check the index's root HOT tid for a
non-match.

We have to store the _head_ of the HOT/WARM chain in the index, not only
for pruning, but so we can know if this HOT/WARM chain is already in the
index and skip the index addition.

I think modifying an index tuple is expensive, but checking an index and
avoiding an addition should be quick.

> The other idea you mentioned is to index intermediate tuples but somehow
> restrict WARM chain following, knowing that the subsequent tuples are
> reachable via a different index tuple. I see two major problems with that
> approach:

I don't see much value of doing that, and a lot of complexity.

> 
> 1. When HOT or WARM chains are pruned and dead tuples removed, we may
> lose the knowledge about key changes between intermediate updates.
> Without that it seems difficult to know when to stop while following the
> chain starting from the old index tuple.
> 
> 2. The existence of index pointers to intermediate tuples will lead to
> index bloat. This is the same problem that we found in Simon's original
> proposal (discussed internally). None of the intermediate line pointers
> can be vacuumed until the entire chain becomes DEAD. Even if a duplicate
> index key is inserted for every index, knowing that and reclaiming the
> index pointers to the original root line pointer will be difficult.
>  
> 
> 
> Not to mention the cost of scanning the chain of N tuples N times,
> making it quadratic - bound by the number of tuples that can fit a
> page, but still high.
> 
> Solving that, I believe, will remove all the simplicity of pointing to
> root tuples.
> 
> 
> 
> You're right. But having all indexes point to the root line pointer makes
> things simpler to manage and less buggy. So if we can somehow solve the
> duplicate tuples problem, even at additional cost at update time, it might
> still be worth it. I would suggest that we think about it more; we may
> find some solution to the problem.

Why is that hard?  It seems simple to me: we just try to do the index
insert, and skip it if an entry for our key and ctid already exists.
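As a toy model of that check (illustrative only; a real index AM would
do this with an index probe, and none of these names come from the
patch):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct
{
    int64_t  key;       /* indexed key value */
    uint64_t root_tid;  /* root line pointer of the HOT/WARM chain */
} ToyIndexEntry;

/*
 * Return true only if no entry with this (key, root ctid) pair exists
 * yet, i.e. the insert would not create a duplicate.
 */
static bool
warm_insert_needed(const ToyIndexEntry *entries, size_t n,
                   int64_t key, uint64_t root_tid)
{
    for (size_t i = 0; i < n; i++)
    {
        if (entries[i].key == key && entries[i].root_tid == root_tid)
            return false;   /* chain already reachable via this key */
    }
    return true;
}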

> I don't really see how you'd do that on yours. You seem to assume
> finding a particular key-item pointer pair in an index would be
> efficient, but IMHO that's not the case.
> 
> 
> That's true. But if we could somehow short-circuit the more common
> pattern, that might still be worth it. For corner cases, we can fall back
> to a non-HOT update and keep things simple. It will still be a huge
> improvement over what we have currently.

I don't see why it is hard.  Trying to find the index entry for the old
update row seems odd, and updating an index row seems odd, but skipping
an index addition for the new row seems simple.  Is the problem
concurrent activity?  I assumed we already had that ability when adding
to HOT chains, because we have to lock the update chain.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +




Re: [HACKERS] Refactoring of heapam code.

2016-08-05 Thread Anastasia Lubennikova

05.08.2016 16:30, Kevin Grittner:

On Fri, Aug 5, 2016 at 2:54 AM, Anastasia Lubennikova
 wrote:


They can be logically separated into three categories:
"primary storage" - r, S, t, v. They store data and visibility information.
The only implementation is heapam.c
"secondary index" - i. They store some data and pointers to primary storage.
Various existing AMs and managed by AM API.
"virtual relations" - c, f, m. They have no physical storage, only entries
in caches and catalogs.

Materialized views (relkind == "m") have physical storage, and may have indexes.


Good point. I confused the letters for views (virtual relations) and
materialized views (primary storage).
It should be:

"primary storage" - r, S, t, m. They store data and visibility information.
The only implementation is heapam.c
"secondary index" - i. They store some data and pointers to primary storage.
Various existing AMs and managed by AM API.
"virtual relations" - c, f, v. They have no physical storage, only entries
in caches and catalogs.


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Cache Hash Index meta page.

2016-08-05 Thread Amit Kapila
On Thu, Aug 4, 2016 at 3:36 AM, Tom Lane  wrote:
> Jeff Janes  writes:
>> On Fri, Jul 22, 2016 at 3:02 AM, Mithun Cy  
>> wrote:
>>> I have created a patch to cache the meta page of Hash index in
>>> backend-private memory. This is to save reading the meta page buffer every
>>> time when we want to find the bucket page. In “_hash_first” call, we try to
>>> read meta page buffer twice just to make sure bucket is not split after we
>>> found bucket page. With this patch meta page buffer read is not done, if the
>>> bucket is not split after caching the meta page.
>
> Is this really safe?  The metapage caching in btree is all right because
> the algorithm is guaranteed to work even if it starts with a stale idea of
> where the root page is.  I do not think the hash code is equally robust
> about stale data in its metapage.
>

I think stale data in the metapage could only cause a problem if it
leads to a wrong calculation of the bucket based on the hashkey.  I
think that shouldn't happen.  It seems to me that the safety comes from
the fact that the fields required to calculate the bucket
(lowmask/highmask) won't be changed more than once without splitting
the current bucket (which we are going to scan).  Do you see a problem
in the hashkey-to-bucket mapping (_hash_hashkey2bucket) if the
lowmask/highmask are changed by one additional table half, or do you
have something else in mind?
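For reference, the mapping under discussion works roughly like this (a
simplified restatement of _hash_hashkey2bucket(), not the verbatim
source):

#include <stdint.h>

typedef uint32_t Bucket;

/*
 * Mask the hash key down to the current table size; if that lands
 * beyond the last bucket created so far, fall back to the previous
 * table half.  A stale (smaller) maxbucket therefore routes the key
 * to the bucket that existed before the split.
 */
static Bucket
hashkey2bucket(uint32_t hashkey, uint32_t maxbucket,
               uint32_t highmask, uint32_t lowmask)
{
    Bucket bucket = hashkey & highmask;

    if (bucket > maxbucket)
        bucket = bucket & lowmask;

    return bucket;
}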

>
>> What happens on a system which has gone through pg_upgrade?
>
> That being one reason why.  It might be okay if we add another hasho_flag
> bit saying that hasho_prevblkno really contains a maxbucket number, and
> then add tests for that bit everyplace that hasho_prevblkno is referenced.
>

Good idea.

- if (retry)
+ if (opaque->hasho_prevblkno <=  metap->hashm_maxbucket)

This code seems to be problematic with respect to upgrades, because
hasho_prevblkno will be initialized to 0xFFFFFFFF (InvalidBlockNumber)
without the patch.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Refactoring of heapam code.

2016-08-05 Thread Kevin Grittner
On Fri, Aug 5, 2016 at 2:54 AM, Anastasia Lubennikova
 wrote:

> They can be logically separated into three categories:
> "primary storage" - r, S, t, v. They store data and visibility information.
> The only implementation is heapam.c
> "secondary index" - i. They store some data and pointers to primary storage.
> Various existing AMs and managed by AM API.
> "virtual relations" - c, f, m. They have no physical storage, only entries
> in caches and catalogs.

Materialized views (relkind == "m") have physical storage, and may have indexes.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] regression test for extended query protocol

2016-08-05 Thread Robert Haas
On Tue, Aug 2, 2016 at 10:33 PM, Tatsuo Ishii  wrote:
> In my understanding, we don't have any regression test for protocol
> level prepared query (we do have SQL level prepared query tests,
> though). Shouldn't we add those tests to the regression test suites?

A few years ago, EDB had a bug that only manifested itself when using
the extended query protocol.  We hacked together something that could
run our entire regression test suite (roughly equivalent to 'make
check', but an order of magnitude larger) under the extended query
protocol and found a few more bugs of the same general flavor.  There
were a few things that fell over that, I think, were not bugs, but it
mostly worked.

I think it would be an interesting project for someone to try to
figure out how to make 'make check-extended-query-protocol' or similar
work.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Oddity in EXPLAIN for foreign/custom join pushdown plans

2016-08-05 Thread Robert Haas
On Tue, Jul 26, 2016 at 11:20 PM, Etsuro Fujita
 wrote:
> I noticed that currently the core doesn't show any information on the target
> relations involved in a foreign/custom join in EXPLAIN, by itself.

I think that's a feature, not a bug.

> postgres_fdw shows the target relations in the Relations line, as shown
> above, but I think that the core should show such information independently
> of FDWs; in the above example replace "Foreign Scan" with "Foreign Join on
> public.ft1 t1, public.ft2 t2".

I disagree with that.  Currently, when we say that something is a join
(Merge Join, Hash Join, Nested Loop) we mean that the executor is
performing a join, but that's not the case here.  The executor is
performing a scan.  The remote side, we suppose, is performing a join
for us, but we are not performing a join: we are performing a scan.
So I think the fact that it shows up in the plan as "Foreign Scan" is
exactly right.  We are scanning some foreign thing, and that thing may
internally be doing other stuff, like joins and aggregates, but all
we're doing is scanning it.

Also, I don't really see the point of moving this from postgres_fdw to
core.  If, at some point in time, there are many FDWs that implement
sophisticated pushdown operations and we figure out that they are all
duplicating the code to do the EXPLAIN printout, and they're all
printing basically the same thing, perhaps not in an entirely
consistent way, then we could try to unify all of them into one
implementation in core.  But that's certainly not where we are right
now.  I don't see any harm at all in leaving this under the control of
the FDW, and in fact, I think it's better.  Neither the postgres_fdw
format nor what you want to replace it with are so unambiguously
awesome that some other FDW author might not come up with something
better.

I think we should leave this the way it is.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Declarative partitioning

2016-08-05 Thread Ashutosh Bapat
>
> >
> > Consider lists ('e', 'i', 'f'), ('h', 'd','m') and ('l', 'b', 'a') for a
> > list partitioned table. I am suggesting that we arrange them as
> > ('a','b','l'), ('d', 'h', 'm') and ('e', 'f', 'i'). If the given row
> > (either for comparison or for inserting) has value 'c', we will search
> > for it in ('a','b','l') but will eliminate the other two partitions,
> > since the second partition's lowest value is higher than 'c' and the
> > lowest values of the rest of the partitions are higher than that of the
> > second partition. Without this order among the partitions, we will have
> > to compare the lowest values of all the partitions.
>
> I would think that for that case what we'd want to do is create two
> lists.  One looks like this:
>
> [ 'a', 'b', 'd', 'e', 'f', 'h', 'i', 'l', 'm' ]
>
> The other looks like this:
>
> [3, 3, 2, 1, 1, 2, 1, 1, 3, 2]
>

small correction; there's an extra 1 here. Every partition in the example
has only three values.


>
> Given a particular value, bsearch the first list.  If the value is not
> found, it's not in any partition.  If it is found, then go look up the
> same array index in the second list; that's the containing partition.
>

+1, if we could do it. It will need a change in the way Amit's patch stores
the partitioning scheme in PartitionDesc.

This way, specifications {('e', 'i', 'f'), ('h', 'd','m') and ('l', 'b',
'a')} and {('h', 'd','m'), ('e', 'i', 'f') and ('l', 'b', 'a')}, which are
essentially the same, are represented in different ways. That makes
matching partitions for partition-wise join a bit tedious. We have to make
sure that the first array matches for both the joining relations, and then
make sure that all the values belonging to the same partition in one table
also belong to the same partition in the other table; that needs some more
complex logic for matching subsets of lists for partition-wise join.

At least for straightforward partitioned table matching, it helps to have
both these arrays look the same independent of the user specification. From
that point of view, the partitions should be ordered by their lowest or
highest list values, and the second array holds indexes into that ordered
set. For both the specifications above, the lists will look like

[ 'a', 'b', 'd', 'e', 'f', 'h', 'i', 'l', 'm' ]
[1, 1, 2, 3, 3, 2, 3, 1, 2]
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] handling unconvertible error messages

2016-08-05 Thread Victor Wagner
On Thu, 04 Aug 2016 14:25:52 -0400
Tom Lane  wrote:

> Victor Wagner  writes:

> > Really, if this response is sent after the backend has been forked,
> > the problem can probably be fixed in a better way - the StartupMessage
> > contains information about the desired client encoding, so that
> > information should just be processed earlier than any other
> > information from the message that can cause errors (such as the
> > database name).
> 
> I think that's wishful thinking.  There will *always* be errors that
> come out before we can examine the contents of the startup message.
> Moreover, until we've done authentication, we should be very wary of
> applying client-specified settings at all: they might be malicious.

I think that this case can be an exception to the rule "don't apply
settings from an untrusted source".

Let's consider possible threat model:

1. We parse the StartupMessage before authentication anyway. There is
nothing we can do about that, so the parser should be robust enough to
handle untrusted input. As far as I can see from a quick glance, it is.

2. When the encoding name is parsed, it is used to search the array of
supported encodings. No possible attack here - either it is valid or not.

3. As far as I know, we don't allow the client to change the language,
only the encoding, so it is not even possible for an attacker to make
messages in the log unreadable for the system administrator.

So, if we fixed the problem reported by Peter Eisentraut at the
beginning of this thread and fell back to untranslated messages
whenever the client-requested encoding cannot represent messages in
the server's default language, this solution would be no worse than
yours.

There would be a fallback to the C locale in any case of doubt, but
whenever the NLS messages can be made readable, they would be readable.


Really, there is at least one case when the fallback to the C locale
should happen unconditionally - a CancelRequest. In this case the
client cannot send an encoding, so the C locale should be used.

As far as I understand, that is not the case with SSLRequest. Although
it, like CancelRequest, doesn't contain encoding information, errors in
the subsequent SSL negotiation would be reported by the client-side SSL
library, not by the server.





Re: [HACKERS] Declarative partitioning

2016-08-05 Thread Robert Haas
On Fri, Aug 5, 2016 at 8:10 AM, Ashutosh Bapat
 wrote:
> On Fri, Aug 5, 2016 at 5:21 PM, Robert Haas  wrote:
>> On Fri, Aug 5, 2016 at 6:53 AM, Ashutosh Bapat
>>  wrote:
>> > The lists for list partitioned tables are stored as they are
>> > specified by the user. While searching for a partition to route a
>> > tuple to, we compare it with every list value of every partition. We
>> > might do something better, similar to what's been done for range
>> > partitions. The list of values for a given partition can be stored
>> > in ascending/descending sorted order. Thus a binary search can be
>> > used to check whether a given row's partition key column has the
>> > same value as one in the list. The partitions can then be stored in
>> > the ascending/descending order of the least/greatest values of the
>> > corresponding partitions.
>>
>> +1.  Here as with range partitions, we must be sure to know which
>> opclass should be used for the comparisons.
>>
>> > We might be able to eliminate the search in a given partition if its
>> > lowest value is higher than the given value or its highest value is
>> > lower than the given value.
>>
>> I don't think I understand this part.
>
> Consider lists ('e', 'i', 'f'), ('h', 'd','m') and ('l', 'b', 'a') for a
> list partitioned table. I am suggesting that we arrange them as
> ('a','b','l'), ('d', 'h', 'm') and ('e', 'f', 'i'). If the given row
> (either for comparison or for inserting) has value 'c', we will search for
> it in ('a','b','l') but will eliminate the other two partitions, since the
> second partition's lowest value is higher than 'c' and the lowest values
> of the rest of the partitions are higher than that of the second
> partition. Without this order among the partitions, we will have to
> compare the lowest values of all the partitions.

I would think that for that case what we'd want to do is create two
lists.  One looks like this:

[ 'a', 'b', 'd', 'e', 'f', 'h', 'i', 'l', 'm' ]

The other looks like this:

[3, 3, 2, 1, 1, 2, 1, 1, 3, 2]

Given a particular value, bsearch the first list.  If the value is not
found, it's not in any partition.  If it is found, then go look up the
same array index in the second list; that's the containing partition.
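A minimal sketch of that lookup in C, using the example values (and the
second list with the extra '1' removed, per the correction elsewhere in
the thread):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* all list values across partitions, in sorted order */
static const char *values[] =
    {"a", "b", "d", "e", "f", "h", "i", "l", "m"};
/* values[i] belongs to partitions[i] */
static const int partitions[] = {3, 3, 2, 1, 1, 2, 1, 3, 2};

static int
cmp(const void *key, const void *elem)
{
    return strcmp((const char *) key, *(const char *const *) elem);
}

/* return the containing partition, or 0 if the value routes nowhere */
static int
find_partition(const char *val)
{
    const char **hit = bsearch(val, values,
                               sizeof(values) / sizeof(values[0]),
                               sizeof(values[0]), cmp);

    return hit ? partitions[hit - values] : 0;
}

int
main(void)
{
    printf("%d %d\n", find_partition("h"), find_partition("c")); /* 2 0 */
    return 0;
}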

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Declarative partitioning

2016-08-05 Thread Ashutosh Bapat
On Fri, Aug 5, 2016 at 5:21 PM, Robert Haas  wrote:

> On Fri, Aug 5, 2016 at 6:53 AM, Ashutosh Bapat
>  wrote:
> > The lists for list partitioned tables are stored as they are specified
> > by the user. While searching for a partition to route a tuple to, we
> > compare it with every list value of every partition. We might do
> > something better, similar to what's been done for range partitions. The
> > list of values for a given partition can be stored in
> > ascending/descending sorted order. Thus a binary search can be used to
> > check whether a given row's partition key column has the same value as
> > one in the list. The partitions can then be stored in the
> > ascending/descending order of the least/greatest values of the
> > corresponding partitions.
>
> +1.  Here as with range partitions, we must be sure to know which
> opclass should be used for the comparisons.
>
> > We might be able to eliminate the search in a given partition if its
> > lowest value is higher than the given value or its highest value is
> > lower than the given value.
>
> I don't think I understand this part.
>

Consider lists ('e', 'i', 'f'), ('h', 'd','m') and ('l', 'b', 'a') for a
list partitioned table. I am suggesting that we arrange them as
('a','b','l'), ('d', 'h', 'm') and ('e', 'f', 'i'). If the given row
(either for comparison or for inserting) has value 'c', we will search for
it in ('a','b','l') but will eliminate the other two partitions, since the
second partition's lowest value is higher than 'c' and the lowest values of
the rest of the partitions are higher than that of the second partition.
Without this order among the partitions, we will have to compare the lowest
values of all the partitions.
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] Declarative partitioning

2016-08-05 Thread Robert Haas
On Fri, Aug 5, 2016 at 6:53 AM, Ashutosh Bapat
 wrote:
> The lists for list partitioned tables are stored as they are specified by
> the user. While searching for a partition to route a tuple to, we compare
> it with every list value of every partition. We might do something better,
> similar to what's been done for range partitions. The list of values for a
> given partition can be stored in ascending/descending sorted order. Thus a
> binary search can be used to check whether a given row's partition key
> column has the same value as one in the list. The partitions can then be
> stored in the ascending/descending order of the least/greatest values of
> the corresponding partitions.

+1.  Here as with range partitions, we must be sure to know which
opclass should be used for the comparisons.

> We might be able to eliminate the search in a given partition if its
> lowest value is higher than the given value or its highest value is
> lower than the given value.

I don't think I understand this part.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Hash Indexes

2016-08-05 Thread Amit Kapila
On Thu, Aug 4, 2016 at 8:02 PM, Mithun Cy  wrote:
> I did some basic testing of same. In that I found one issue with cursor.
>

Thanks for the testing.  The reason for the failure was that the patch
didn't take into account the fact that for scrollable cursors, a scan
can reacquire the lock and pin on the bucket buffer multiple times.  I
have fixed it such that we release the pin on bucket buffers only after
we scan the last overflow page in the bucket.  The attached patch fixes
the issue for me; let me know if you still see it.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


concurrent_hash_index_v4.patch
Description: Binary data



Re: [HACKERS] [RFC] Change the default of update_process_title to off

2016-08-05 Thread David Rowley
On 5 August 2016 at 12:25, Tsunakawa, Takayuki
 wrote:
> It seems that we could reach a consensus.  The patch is attached.  I'll add 
> this to the next CommitFest.


+ The default is off on Windows
+ because the overhead is significant, and on on other platforms.

"than on other platforms"

But perhaps it's better written like:

+ This value defaults to "off" on Windows platforms due to the
platform's significant overhead for updating the process title.


+1 for this patch from me. I'd hate to think someone would quickly
dismiss Postgres on Windows due to some low TPS caused by updating the
process title.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Declarative partitioning

2016-08-05 Thread Ashutosh Bapat
The lists for list partitioned tables are stored as they are specified by
the user. While searching for a partition to route a tuple to, we compare
it with every list value of every partition. We might do something better,
similar to what's been done for range partitions. The list of values for a
given partition can be stored in ascending/descending sorted order. Thus a
binary search can be used to check whether a given row's partition key
column has the same value as one in the list. The partitions can then be
stored in the ascending/descending order of the least/greatest values of
the corresponding partitions. We might be able to eliminate the search in
a given partition if its lowest value is higher than the given value or
its highest value is lower than the given value.

On Thu, Jul 21, 2016 at 10:10 AM, Amit Langote <
langote_amit...@lab.ntt.co.jp> wrote:

> On 2016/07/19 22:53, Ashutosh Bapat wrote:
> > I am seeing following warning with this set of patches.
> > gram.y:4734:24: warning: assignment from incompatible pointer type
> [enabled
> > by default]
>
>
> Thanks, will fix.  Was a copy-paste error.
>
> Thanks,
> Amit
>
>
>


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] Bogus ANALYZE results for an otherwise-unique column with many nulls

2016-08-05 Thread Andrew Gierth
> "Andrew" == Andrew Gierth  writes:
> "Tom" == Tom Lane  writes:

 Tom> What I did in the patch is to scale the formerly fixed "-1.0"
 Tom> stadistinct estimate to discount the fraction of nulls we found.

 Andrew> This seems quite dubious to me. stadistinct representing only
 Andrew> the non-null values seems to me to be substantially more useful
 Andrew> and less confusing; it should be up to consumers to take
 Andrew> stanullfrac into account (in general they already do) since in
 Andrew> many cases we explicitly do _not_ want to count nulls.

Hm. I am wrong about this, since it's the fact that consumers are taking
stanullfrac into account that makes the value wrong in the first place.
For example, if a million-row table has stanullfrac=0.9 and
stadistinct=-1, then get_variable_numdistinct is returning 1 million,
and (for example) var_eq_non_const divides 0.1 by that to give a
selectivity of 1 in 10 million, which is obviously wrong.
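Spelling that arithmetic out (a simplified sketch of the estimates, not
the actual selfuncs.c code):

#include <stdio.h>

int
main(void)
{
    double ntuples = 1e6;       /* rows in the table */
    double stanullfrac = 0.9;   /* 90% of the rows are NULL */
    double stadistinct = -1.0;  /* "all values distinct" */

    /* negative stadistinct gets scaled by the total row count ... */
    double ndistinct = -stadistinct * ntuples;           /* 1,000,000 */
    /* ... and the consumer then discounts the nulls as well */
    double sel = (1.0 - stanullfrac) / ndistinct;        /* 1e-7 */

    /* what we wanted: 100,000 distinct non-null values */
    double ndistinct_nonnull = -stadistinct * ntuples * (1.0 - stanullfrac);
    double sel_right = (1.0 - stanullfrac) / ndistinct_nonnull;  /* 1e-6 */

    printf("%g vs %g\n", sel, sel_right);
    return 0;
}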

But I think the fix is still wrong, because it changes the meaning of
ALTER TABLE ... ALTER col SET (n_distinct=...)  in a non-useful way; it
is no longer possible to nail down a useful negative n_distinct value if
the null fraction of the column is variable. Would it not make more
sense to do the adjustment in get_variable_numdistinct, instead?

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] PostgreSQL 10 Roadmaps

2016-08-05 Thread Simon Riggs
On 5 August 2016 at 04:40, Etsuro Fujita  wrote:
> On 2016/08/02 23:54, Simon Riggs wrote:
>>
>> https://wiki.postgresql.org/wiki/PostgreSQL10_Roadmap
>
>
> Thank you for creating the wiki page!
>
> I added a link to the NTT roadmap page on the links page.  We hope that we
> can collaborate on the projects with people working at other companies or
> with individual contributors in the same development areas.

Very good, thanks

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [RFC] Change the default of update_process_title to off

2016-08-05 Thread Tsunakawa, Takayuki
> From: Tom Lane [mailto:t...@sss.pgh.pa.us]
> Yeah, I think I agree.  It would be bad to disable it by default on Unix,
> because ps(1) is a very standard tool there, but the same argument doesn't
> hold for Windows.

It seems that we could reach a consensus.  The patch is attached.  I'll add 
this to the next CommitFest.

> Another route to a solution would be to find a cheaper way to update the
> process title on Windows ... has anyone looked for alternatives?

I couldn't find an alternative solution after asking some Windows support staff.

Regards
Takayuki Tsunakawa




update_process_title_off_on_win.patch
Description: update_process_title_off_on_win.patch



Re: [HACKERS] Bogus ANALYZE results for an otherwise-unique column with many nulls

2016-08-05 Thread Andreas Joseph Krogh
On Friday, 5 August 2016 at 01:01:06, Tom Lane wrote:
I wrote:
 > Looking around, there are a couple of places outside commands/analyze.c
 > that are making the same mistake, so this patch isn't complete, but it
 > illustrates what needs to be done.

 Here's a more complete patch.

 regards, tom lane
 
(Sorry for HTML-email, but we live in 2016, and I think it improves 
readability)
 
Great!
 
I have tested it (with PG-9.6 from HEAD, 
e7caacf733f3ee77a555aa715ab6fbf4434e6b52) and it sure looks like it fixes the 
problem for my query.
 
The query:
 
explain analyze
SELECT log.relation_id as company_id, sum(log.duration) AS duration
FROM onp_crm_activity_log log
JOIN onp_crm_person logfor
  ON logfor.onp_user_id = log.logged_for AND logfor.is_resource = FALSE
WHERE 1 = 1
  -- Filter out already invoiced
  AND NOT EXISTS(
      SELECT *
      FROM onp_crm_calendarentry_invoice_membership cem
      JOIN onp_crm_invoice_line il ON cem.invoice_line_id = il.id
      JOIN onp_crm_invoice inv ON il.invoice_id = inv.entity_id
      WHERE cem.calendar_entry_id = log.id
        AND NOT EXISTS(
            SELECT *
            FROM onp_crm_invoice creditnote
            WHERE il.invoice_id = creditnote.credit_against
        )
  )
GROUP BY log.relation_id;
 
Gave the following explain-plan (before the patch), with enable_nestloop=off 
(needed to prevent nest-loop-anti-join which caused this query to execute in 
~26 min.):
 
QUERY PLAN
------------------------------------------------------------------------
HashAggregate  (cost=25201.62..25220.06 rows=1475 width=36) (actual time=381.191..381.281 rows=404 loops=1)
  Group Key: log.relation_id
  ->  Hash Anti Join  (cost=4782.34..25148.28 rows=10667 width=12) (actual time=103.683..361.409 rows=148330 loops=1)
        Hash Cond: (log.id = cem.calendar_entry_id)
        ->  Hash Join  (cost=81.46..20312.73 rows=10667 width=20) (actual time=0.100..156.432 rows=380617 loops=1)
              Hash Cond: (log.logged_for = logfor.onp_user_id)
              ->  Seq Scan on onp_crm_activity_log log  (cost=0.00..18696.71 rows=380771 width=24) (actual time=0.005..49.195 rows=380774 loops=1)
              ->  Hash  (cost=25.02..25.02 rows=4515 width=8) (actual time=0.078..0.078 rows=119 loops=1)
                    Buckets: 8192  Batches: 1  Memory Usage: 69kB
                    ->  Index Scan using onp_crm_person_onp_id_idx on onp_crm_person logfor  (cost=0.14..25.02 rows=4515 width=8) (actual time=0.005..0.063 rows=119 loops=1)
                          Filter: (NOT is_resource)
                          Rows Removed by Filter: 8
        ->  Hash  (cost=4700.87..4700.87 rows=1 width=4) (actual time=103.575..103.575 rows=232412 loops=1)
              Buckets: 262144 (originally 1024)  Batches: 1 (originally 1)  Memory Usage: 10219kB
              ->  Hash Join  (cost=474.41..4700.87 rows=1 width=4) (actual time=9.724..76.015 rows=232412 loops=1)
                    Hash Cond: (cem.invoice_line_id = il.id)
                    ->  Seq Scan on onp_crm_calendarentry_invoice_membership cem  (cost=0.00..3354.28 rows=232528 width=8) (actual time=0.004..17.626 rows=232528 loops=1)
                    ->  Hash  (cost=474.40..474.40 rows=1 width=4) (actual time=9.710..9.710 rows=11535 loops=1)
                          Buckets: 16384 (originally 1024)  Batches: 1 (originally 1)  Memory Usage: 534kB
                          ->  Merge Join  (cost=415.33..474.40 rows=1 width=4) (actual time=4.149..8.467 rows=11535 loops=1)
                                Merge Cond: (il.invoice_id = inv.entity_id)
                                ->  Sort  (cost=415.05..415.06 rows=1 width=8) (actual time=4.138..4.701 rows=11535 loops=1)
                                      Sort Key: il.invoice_id
                                      Sort Method: quicksort  Memory: 925kB
                                      ->  Hash Anti Join  (cost=77.13..415.04 rows=1 width=8) (actual time=0.257..2.716 rows=11535 loops=1)
                                            Hash Cond: (il.invoice_id = creditnote.credit_against)
                                            ->  Seq Scan on onp_crm_invoice_line il  (cost=0.00..294.30 rows=11630 width=8) (actual time=0.003..0.995 rows=11630 loops=1)
                                            ->  Hash  (cost=50.38..50.38 rows=2140 width=4) (actual time=0.247..0.247 rows=37 loops=1)
                                                  Buckets: 4096  Batches: 1  Memory Usage: 34kB
                                                  ->  Index Only Scan using origo_invoice_credit_against_idx on onp_crm_invoice creditnote  (cost=0.28..50.38 rows=2140 width=4) (actual time=0.013..0.182 rows=2140 loops=1)
                                                        Heap Fetches: 0
                                ->  Index Only Scan using origo_invoice_id_status_sent_idx on onp_crm_invoice inv  (cost=0.28..53.98 rows=2140 width=8) (actual time=0.008..1.274 rows=11576 loops=1)
                                      Heap Fetches: 0
Planning time: 0.955 ms
Execution time: 381.884 ms
(35 rows)
 
AFAIU, this is the problematic estimate:

->  Hash Anti Join  (cost=77.13..415.04 rows=1 width=8) (actual time=0.257..2.716 rows=11535 loops=1)

Right?
 
Now (after the patch, and without needing to disable nestloop) it gives the
following explain-plan:

QUERY PLAN
------------------------------------------------------------------------
HashAggregate  (cost=44409.89..44428.47 rows=1486 width=36) (actual time=366.502..366.594 rows=404 loops=1)
  Group Key: log.relation_id
  ->  Hash Anti Join  (cost=10157.30..43650.11

[HACKERS] truncate trigger for foreign data wrappers

2016-08-05 Thread Murat Tuncer
Hello

I recently hit a roadblock when I tried to create a truncate trigger on
a foreign table. The trigger.c::CreateTrigger() function has an explicit
check to block truncate triggers on foreign tables.

However, trigger.c::ExecuteTruncate() does not seem to have any problems
issuing before/after triggers to fdws.

Just tapping into the utility hook to catch the truncate statement did not
work when the fdw is inside an inheritance hierarchy.

Is there a way to enable truncate triggers for foreign tables in the long
run? Or a way to somehow detect that my fdw table is getting a truncate
request?

thanks

-- 

*Murat Tuncer*Software Engineer | Citus Data
mtun...@citusdata.com


Re: [HACKERS] Refactoring of heapam code.

2016-08-05 Thread Thom Brown
On 5 August 2016 at 08:54, Anastasia Lubennikova
 wrote:
> Working on page compression and some other issues related to
> access methods, I found out that the code related to heap
> looks too complicated. Much more complicated, than it should be.
> Since I anyway got into this area, I want to suggest a set of improvements.
>
> There is a number of problems I see in the existing code:
> Problem 1. Heap != Relation.
>
> File heapam.c has a note:
>  *  This file contains the heap_ routines which implement
>  *  the POSTGRES heap access method used for all POSTGRES
>  *  relations.
> This statement is wrong, since we also have index relations,
> that are implemented using other access methods.
>
> Also I think that it's quite strange to have a function heap_create(), that
> is called
> from index_create(). It has absolutely nothing to do with heap access
> method.
>
> And so on, and so on.
>
> One more thing that requires refactoring is "RELKIND_RELATION" name.
> We have a type of relation called "relation"...
>
> Problem 2.
> Some functions are shared between heap and indexes access methods.
> Maybe someday it used to be handy, but now, I think, it's time to separate
> them, because IndexTuples and HeapTuples have really little in common.
>
> Problem 3. The word "heap" is used in many different contexts.
> Heap - a storage where we have tuples and visibility information.
> Generally speaking, it can be implemented over any data structure,
> not just a plain table as it is done now.
>
> Heap - an access method, that provides a set of routines for plain table
> storage, such as insert, delete, update, gettuple, vacuum and so on.
> We use this access method not only for user's tables.
>
> There are many types of relations (pg_class.h):
> #define  RELKIND_RELATION'r'/* ordinary table */
> #define  RELKIND_INDEX   'i'/* secondary index */
> #define  RELKIND_SEQUENCE'S'/* sequence object */
> #define  RELKIND_TOASTVALUE  't'/* for out-of-line
> values */
> #define  RELKIND_VIEW'v'/* view */
> #define  RELKIND_COMPOSITE_TYPE  'c'/* composite type */
> #define  RELKIND_FOREIGN_TABLE   'f'/* foreign table */
> #define  RELKIND_MATVIEW 'm'/* materialized view */
>
> They can be logically separated into three categories:
> "primary storage" - r, S, t, v. They store data and visibility information.
> The only implementation is heapam.c
> "secondary index" - i. They store some data and pointers to primary storage.
> Various existing AMs and managed by AM API.
> "virtual relations" - c, f, m. They have no physical storage, only entries
> in caches and catalogs.
>
> Now, for some reasons, we sometimes use name "heap" for both
> "primary storage" and "virual relations". See src/backend/catalog/heap.c.
> But some functions work only with "primary storage". See pgstat_relation().
> And we constantly have to check relkind everywhere.
> I'd like it would be clear from the code interface and naming.
>
> So there is a couple of patches. They do not cover all mentioned problems,
> but I'd like to get a feedback before continuing.
>
> Patch 1:
> There is a macro "PageGetItem" in bufpage.h that is used to retrieve an item
> from the given page. It is used for both heap and index tuples.
> It doesn't seems a problem, until you are going to find anything in this
> code.
>
> The first patch "PageGetItem_refactoring.patch" is intended to fix it.
> It is just renaming:
> (IndexTuple) PageGetItem ---> PageGetItemIndex
> (HeapTupleHeader) PageGetItem ---> PageGetItemHeap
> Other types of tuples (BrinTuple, SpGistInnerTuple, SpGistLeafTuple,
> SpGistDeadTuple)
> still use PageGetItem.
>
> I also changed functions that do not access tuple's data, but only need
> HeapTupleHeader fields. They use a macro PageGetItemHeapHeaderOnly.
>
> I do not insist on these particular names, by the way.
>
> Patch 2.
> heapam.c, hio.c and src/backend/catalog/heap.c have a lot of confusing names
> and outdated comments. The patch is intended to improve it.
> Fun fact: I found several "check it later" comments unchanged since
>  "Postgres95 1.01 Distribution - Virgin Sources" commit.
>
> I wonder if we can find a better name for relation_drop_with_catalog()
> since, again, it's not about all kinds of relations. We use another
> function to drop index relations.
>
> I hope these changes will be useful for the future development.
> Suggested patches are mostly about renaming and rearrangement of functions
> between files, so, if nobody has conceptual objections, all I need from
> reviewers
> is an attentive look to find typos, grammar mistakes and overlooked areas.

Could you add this to the commitfest?

Thom



Re: [HACKERS] Bogus ANALYZE results for an otherwise-unique column with many nulls

2016-08-05 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> What I did in the patch is to scale the formerly fixed "-1.0"
 Tom> stadistinct estimate to discount the fraction of nulls we found.

This seems quite dubious to me. stadistinct representing only the
non-null values seems to me to be substantially more useful and less
confusing; it should be up to consumers to take stanullfrac into account
(in general they already do) since in many cases we explicitly do _not_
want to count nulls.

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] Oddity in EXPLAIN for foreign/custom join pushdown plans

2016-08-05 Thread Etsuro Fujita

On 2016/08/02 21:35, Etsuro Fujita wrote:

I removed the Relations line.  Here is an updated version of the patch.


I revised code and comments a bit.  Attached is an updated version of  
the patch.


Best regards,
Etsuro Fujita


explain-for-foreign-join-pushdown-v2.patch
Description: binary/octet-stream



Re: [HACKERS] [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)

2016-08-05 Thread Kyotaro HORIGUCHI
Hello,

While the exact cause of the situation is not known yet, we have
apparently forgotten that pg_stop_backup can be called simultaneously
with pg_start_backup. Given the definitions of the two functions and
what they do, it seems to me that pg_stop_backup should not be allowed
to run before pg_start_backup has ended, and enforcing that should fix
the assertion failure.

One solution is an exclusiveBackupStarted flag (I'd like to rename the
variable) in shared memory, as the patch does.

Another place where we can get the same effect is the file system.
We could create 'backup_label.tmp' very early in pg_start_backup
and rename it to backup_label at the end of the function; exclusive
pg_stop_backup needs that file anyway. A defect of that approach
would be a leftover backup_label.tmp file after a crash during
pg_start_backup. Renaming the .tmp file is atomic enough for this
usage, but I'm not sure that holds on a network file system.
exclusiveBackup is also used for this kind of exclusion, and it too
could be replaced with the .tmp file.

But after some thought, I found that it would require a bunch of
error-handling code for the file operations and would become too
complex, so I abandoned it.


At Fri, 5 Aug 2016 12:16:13 +0900, Michael Paquier  
wrote in 
> On Thu, Aug 4, 2016 at 4:38 PM, Michael Paquier
>  wrote:
> > Andreas, with the patch attached is the assertion still triggered?
> >
> > Thoughts from others are welcome as well.
> 
> Self review:
>   * of streaming base backups currently in progress. forcePageWrites is 
> set
>   * to true when either of these is non-zero. lastBackupStart is the 
> latest
>   * checkpoint redo location used as a starting point for an online 
> backup.
> + * exclusiveBackupStarted is true while pg_start_backup() is being called
> + * during an exclusive backup.
>   */
>  boolexclusiveBackup;
>  intnonExclusiveBackups;
>  XLogRecPtrlastBackupStart;
> +boolexclusiveBackupStarted;
> It would be better to just use one variable that uses an enum to track
> the following states of an exclusive backup: none, starting and
> in-progress.

That is what I thought when I looked at the first patch. In addition,
I don't think the name exclusiveBackupStarted is appropriate.
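
The enum suggested above would presumably look something like this
(names are illustrative only, not taken from any posted patch):

typedef enum ExclusiveBackupState
{
    EXCLUSIVE_BACKUP_NONE,          /* no exclusive backup in progress */
    EXCLUSIVE_BACKUP_STARTING,      /* pg_start_backup() still running */
    EXCLUSIVE_BACKUP_IN_PROGRESS    /* pg_start_backup() has finished */
} ExclusiveBackupState;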

One more question is the necessity of the WALInsertLock at the end
of pg_start_backup. No other backend can reach there concurrently,
and possible latency from cache coherency won't matter for this
purpose, I suppose. What do you think it is needed for?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





