Re: [HACKERS] PinBuffer() no longer makes use of strategy

2017-02-04 Thread Alexander Korotkov
On Sat, Feb 4, 2017 at 4:33 AM, Andres Freund <and...@anarazel.de> wrote:

> On 2017-02-03 19:13:45 -0600, Jim Nasby wrote:
> > No, I noticed it while reading code. Removing that does mean that if
> > any non-default strategy (in any backend) hits that buffer again then
> > the buffer will almost certainly migrate into the main buffer pool the
> > next time one of the rings hits that buffer
>
> Well, as long as the buffer is used from the ring, BufferAlloc() will
> reset the usagecount when rechristening the buffer. So unless anything
> happens in between the buffer being remapped last and remapped next,
> it'll be reused. Right?
>
> The only case where I can see the old logic mattering positively is for
> synchronized seqscans.  For pretty much else that logic seems worse,
> because it essentially prevents any buffers ever staying in s_b when
> only ringbuffer accesses are performed.
>
> I'm tempted to put the old logic back, but more because this likely was
> unintentional, not because I think it's clearly better.
>

+1
Yes, it was an unintentional change.  So we should put the old logic back
unless we have proof that this change makes it better.
A patch is attached.  I tried to add some comments, but they are probably
not sufficient.
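
For reference, the old logic being restored looked roughly like this in the
pre-atomics PinBuffer() (a sketch of the 9.5-era code; the patch
re-expresses it on top of the new lock-free pin/unpin):

/*
 * With a buffer-ring strategy, the usage count is only ever raised from
 * 0 to 1, never incremented further, so ring-only accesses alone can't
 * make a buffer stick in shared_buffers.
 */
if (strategy == NULL)
{
    if (buf->usage_count < BM_MAX_USAGE_COUNT)
        buf->usage_count++;
}
else
{
    if (buf->usage_count == 0)
        buf->usage_count = 1;
}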

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


put-buffer-usagecount-logic-back.patch
Description: Binary data



Re: [HACKERS] LWLock optimization for multicore Power machines

2017-02-03 Thread Alexander Korotkov
On Fri, Feb 3, 2017 at 8:01 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> Unfortunately, I have no big enough Power machine at hand to reproduce
> those results.  Actually, I have no Power machine at hand at all.  So,
> lwlock-power-2.patch was written "blindly".  I would very much appreciate
> it if someone would help me with testing and benchmarking.
>

UPD: It appears that Postgres Pro now has access to a big Power machine.
So, I can do the testing/benchmarking myself.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


[HACKERS] LWLock optimization for multicore Power machines

2017-02-03 Thread Alexander Korotkov
Hi everybody!

During the FOSDEM/PGDay 2017 developer meeting I said that I have a special
assembly optimization for multicore Power machines.  From the answers of
other hackers I realized the following.

   1. There are some big Power machines with PostgreSQL in production use.
   Not as many as Intel, but a fair number of them.
   2. The community could be interested in a special assembly optimization
   for Power machines despite the cost of maintaining it.

Power processors use a specific implementation of atomic operations.  This
implementation is a kind of optimistic locking: the 'lwarx' instruction
takes a reservation on a memory location, but that reservation can be
broken before the matching 'stwcx' completes, and then we have to retry.
So, for instance, a CAS operation on a Power processor is a loop, and a
loop of CAS operations becomes a two-level nested loop.  Benchmarks showed
that this becomes a real problem for LWLockAttemptLock().  However, one can
actually put arbitrary logic between 'lwarx' and 'stwcx' and make it a
single loop.  The downside is that this logic has to be implemented in
assembly.  See [1] for the experiment details.

The results in [1] have a lot of junk which isn't relevant anymore.  This
is why I drew a separate graph.

power8-lwlock-asm-ro.png – results of a read-only pgbench test on an IBM
E880, which has 32 physical cores and 256 virtual threads via SMT.  The
curves have the following meaning:
 * 9.5: unpatched PostgreSQL 9.5
 * pinunpin-cas: PostgreSQL 9.5 + earlier version of 48354581
 * pinunpin-lwlock-asm: PostgreSQL 9.5 + earlier version of 48354581 +
LWLock implementation in assembly.

lwlock-power-1.patch – the patch with the assembly implementation of LWLock
which I used at that time, rebased onto the current master.

Using assembly directly in lwlock.c looks rough.  This is why I refactored
it by introducing a new atomic operation, pg_atomic_fetch_mask_add_u32 (see
lwlock-power-2.patch).  It checks that all masked bits are clear and then
adds to the variable.  This atomic has a special assembly implementation
for Power, and a generic implementation for other platforms based on a loop
of CAS.  We would probably have other implementations for other
architectures in the future.  This level of abstraction is the best I
managed to invent.
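
To give an idea of the semantics, the generic fallback could look roughly
like the following sketch built from the existing pg_atomic_* primitives
(the actual code is in lwlock-power-2.patch and may differ in details):

/*
 * Sketch: if all bits in 'mask' are clear, atomically add 'increment'
 * and return the old value; otherwise return the current value
 * unchanged.  On Power, this whole check-and-add becomes a single
 * lwarx/stwcx loop instead of a loop around CAS.
 */
static inline uint32
pg_atomic_fetch_mask_add_u32(volatile pg_atomic_uint32 *ptr,
                             uint32 mask, uint32 increment)
{
    uint32      old = pg_atomic_read_u32(ptr);

    while ((old & mask) == 0)
    {
        /* on failure, 'old' is refreshed with the current value */
        if (pg_atomic_compare_exchange_u32(ptr, &old, old + increment))
            break;
    }
    return old;
}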

Unfortunately, I have no big enough Power machine at hand to reproduce
those results.  Actually, I have no Power machine at hand at all.  So,
lwlock-power-2.patch was written "blindly".  I would very much appreciate
it if someone would help me with testing and benchmarking.

1. https://www.postgresql.org/message-id/CAPpHfdsogj38HTDhNMLE56uJy9N8-=gya2nnuwbpujgp2n1...@mail.gmail.com

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


lwlock-power-1.patch
Description: Binary data


lwlock-power-2.patch
Description: Binary data



Re: [HACKERS] Should we cacheline align PGXACT?

2017-01-30 Thread Alexander Korotkov
On Sat, Aug 20, 2016 at 9:38 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> Robert Haas <robertmh...@gmail.com> writes:
> > Wow, nice results.  My intuition on why PGXACT helped in the first
> > place was that it minimized the number of cache lines that had to be
> > touched to take a snapshot.  Padding obviously would somewhat increase
> > that again, so I can't quite understand why it seems to be helping...
> > any idea?
>
> That's an interesting point.  I wonder whether this whole thing will be
> useless or even counterproductive after (if) Heikki's CSN-snapshot patch
> gets in.  I would certainly not mind reverting the PGXACT/PGPROC
> separation if it proves no longer helpful after that.
>

Assuming we won't realistically have the CSN-snapshot patch committed to
10, I think we should consider the PGXACT cache line alignment patch for
10.  A revision of the PGXACT alignment patch is attached.  Now it doesn't
introduce a new data structure for alignment, but uses manually calculated
padding.  I added a static assertion that PGXACT is exactly
PG_CACHE_LINE_SIZE bytes; the struct still has plenty of room for new
fields before PG_CACHE_LINE_SIZE would be exceeded.
Read-only pgbench results on a 72-physical-core Intel server are attached.
They are quite similar to the results I posted before, but it's curious
that the performance degradation of master at high concurrency became
larger.
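
For illustration, the manually calculated padding plus the assertion could
look roughly like this (a sketch: the field list matches today's PGXACT,
but the exact padding expression in the patch may differ):

typedef struct PGXACT
{
    TransactionId xid;          /* current top-level XID, or Invalid */
    TransactionId xmin;         /* minimal running XID as of snapshot */
    uint8       vacuumFlags;    /* vacuum-related flags */
    bool        overflowed;     /* subxid cache overflowed? */
    bool        delayChkpt;     /* delay checkpoint? */
    uint8       nxids;          /* number of cached subxids */

    /* fields above take 12 bytes; pad out to exactly one cache line */
    char        pad[PG_CACHE_LINE_SIZE - 12];
} PGXACT;

/* placed where a statement is allowed, e.g. in InitProcGlobal() */
StaticAssertStmt(sizeof(PGXACT) == PG_CACHE_LINE_SIZE,
                 "PGXACT must be exactly one cache line");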

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


pgxact-align-2.patch
Description: Binary data



[HACKERS] GSoC 2017

2017-01-10 Thread Alexander Korotkov
Hi all!

In 2016 the PostgreSQL project wasn't accepted into the GSoC program.  In
my understanding, the reasons for that are the following.

1. We made a last-minute submission of our application to GSoC.
2. In 2016 the GSoC application form for mentoring organizations was
changed.  In particular, it required more detailed information about the
possible projects.

As a result, we didn't manage to make a good enough application that time.
Thus, our application was declined.  See [1] and [2] for details.

I think that the right way to manage this in 2017 would be to start
collecting the required information in advance.  According to the GSoC 2017
timeline [3], mentoring organizations can submit their applications from
January 19 to February 9.  Thus, now is a good time to start collecting
project ideas and to make a call for mentors.  Also, we need to decide who
would be our admin this year.

In sum, we have the following questions:
1. What project ideas do we have?
2. Who is going to be a mentor this year?
3. Who is going to be the project admin this year?

BTW, I'm ready to be a mentor this year.  I'm also open to being an admin
if needed.

[1]
https://www.postgresql.org/message-id/flat/CAA-aLv4p1jfuMpsRaY2jDUQqypkEXUxeb7z8Mp-0mW6M03St7A%40mail.gmail.com
[2]
https://www.postgresql.org/message-id/flat/CALxAEPuGpAjBSN-PTuxHfuLLqDS47BEbO_ZYxUYQR3ud1nwbww%40mail.gmail.com
[3] https://developers.google.com/open-source/gsoc/timeline

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Declarative partitioning - another take

2016-12-08 Thread Alexander Korotkov
On Thu, Dec 8, 2016 at 7:29 PM, Robert Haas <robertmh...@gmail.com> wrote:

> On Thu, Dec 8, 2016 at 11:13 AM, Dmitry Ivanov <d.iva...@postgrespro.ru>
> wrote:
> > We (PostgresPro) have been working on pg_pathman for quite a while, and
> > since it's obviously going to become the thing of the past, it would be a
> > wasted effort if we didn't try to participate.
> >
> > For starters, I'd love to work on both plan-time & run-time partition
> > pruning. I created a custom node for run-time partition elimination, so I
> > think I'm capable of developing something similar.

> That would be fantastic.  I and my colleagues at EnterpriseDB can
> surely help review;


Great!  And it is very cool that we already have the basic infrastructure
committed.  Thanks a lot to you and everybody involved.


> of course, maybe you and some of your colleagues
> would like to help review our patches, too.


We understand that our reviewing performance has not been sufficient.  We
will try to do better during the next commitfest.


> Do you think this is
> likely to be something where you can get something done quickly, with
> the hope of getting it into v10?


Yes, because we have a set of features already implemented in pg_pathman.
In particular, we have the following features from your list, and some
more.

- more efficient plan-time partition pruning (constraint exclusion is too
slow)
- run-time partition pruning
- insert (and eventually update) tuple routing for foreign partitions
- hash partitioning
- not scanning the parent

> Time is growing short, but it would
> be great to polish this a little more before we ship it.
>

Yes.  Getting at least some of these features committed to v10 would be
great and would improve partitioning usability a lot.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


[HACKERS] PgConf.Russia 2017 Call for Papers

2016-12-08 Thread Alexander Korotkov
Hi!

We are now accepting talk proposals for PgConf.Russia 2017.

The Russian PostgreSQL conference will take place on 15-17 March 2017 in
Moscow, at the Digital October venue, the same as for PgConf.Russia 2015.
The audience includes a wide range of application and system developers,
database architects and administrators.

 * 15 (Wed) tutorials
 * 16 & 17 (Thu-Fri) talks - the main part of the conference

Please note that PgConf.Russia 2017 takes place in MARCH.

See https://pgconf.ru/en

The major topics for the talks are:

 * PostgreSQL scalability, performance and security.
 * PostgreSQL core development. Internals. Current and future projects.
 * Live experience of using PostgreSQL in industrial-grade systems.
Integration, migration, application development.
 * Cluster. Scalable and fail-safe PostgreSQL-based systems.
 * PostgreSQL ecosystem. Related and supplementary software.
 * PostgreSQL community in Russia and abroad.

If you wish to report a practical case, or would like to give an online
demo, or have proposals expanding the named topics, please state this in
your talk proposal.  We will be glad to make the conference program more
interesting.

The typical duration of a talk is 45 minutes; half-talks of 22 minutes are
also possible.

The application deadline is December 15, 2016.  The preliminary program
will be published by December 20.

Talk proposal submission form is available at
https://pgconf.ru/en/2017/register_speaker

The confirmed speakers will get free admission to the conference (all
days), with travel to Moscow and accommodation booked and paid for by the
organizers.

We will be glad to see you in Moscow!

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] UNDO and in-place update

2016-12-02 Thread Alexander Korotkov
On Thu, Dec 1, 2016 at 6:25 PM, Robert Haas <robertmh...@gmail.com> wrote:

> There's a couple of possible designs here, but there is the
> possibility for extra hops in some cases.  But there are things we can
> do to mitigate that.
>
> 1. If each tuple has a definitely-all-visible bit, you can check that
> first; if it's set, the tuple is definitely all-visible you don't need
> to do anything further.
>
> 2. If we still maintain a visibility map, you can check that next.  If
> the bit is set for the target page, then the tuple is definitely
> all-visible and you don't need to do anything further.
>
> 3. Otherwise, you need to look at the heap page.  But if the heap
> page's UNDO pointer is invalid, then the whole page is all-visible, so
> you're done.  (You can also set the visibility map bit and/or the
> index tuple's definitely-all-visible bit to avoid having to recheck
> the heap page in the future.)
>
> 4. Each tuple will have a bit indicating whether it's been recently
> modified; that is, whether the transaction whose UNDO log is
> referenced by the page's UNDO pointer did anything to that tuple; or
> if the page points to a TPD entry, whether any of the transactions in
> the TPD modified that tuple.  If that bit is not set for that
> particular tuple, it's all-visible and you're done.
>
> 5. Otherwise you have to look at the TPD entry (if any) and UNDO logs.
>
> If we put in the visibility information in the index as you are
> proposing, then we can always resolve the visibility of the index
> tuple at step 1; we just test xmin and xmax against the snapshot.  For
> index-only scans, that's great, because we never need to consult the
> heap at all.  For regular index scans, it's not nearly as great,
> because we're still going to need to consult the TPD and UNDO logs to
> determine which version of the tuple is visible to that snapshot.  You
> can avoid looking at the heap in the case where no version of the
> tuple is visible to the scan, but that's probably mostly going to
> happen only in cases where the transaction is still in flight so in
> most cases the heap will be in shared_buffers anyway and you won't
> save very much.
>

The idea of storing just one visibility bit in an index tuple is a subject
of serious doubts for me.

1. When definitely-all-visible isn't set, we have to recheck while
scanning the heap, right?  But our current recheck (i.e. rechecking the
scan quals) is not enough.  Imagine you have a range index scan (val >=
100 AND val < 200), and the val field was updated from val = 150 to
val = 160 in some row.  Then you might get this row duplicated in the
output.  Dropping index scans and sticking only to bitmap index scans
doesn't seem fair.  Thus, we need to introduce another kind of recheck:
that heapTuple.indexKey = indexTuple.indexKey.
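
To make the hazard concrete, a hypothetical reproduction (assuming the
one-visibility-bit design with only a scan-qual recheck):

CREATE TABLE t (id int PRIMARY KEY, val int);
CREATE INDEX t_val_idx ON t (val);
INSERT INTO t VALUES (1, 150);
UPDATE t SET val = 160 WHERE id = 1;

-- Both index entries (150 and 160) satisfy the qual and lead to the same
-- visible heap row; a scan-qual recheck alone passes both times, so the
-- row could be returned twice:
SELECT * FROM t WHERE val >= 100 AND val < 200;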

2. Another question is how this could work while the index key is being
intensively updated.  There is a risk that we would have to traverse the
same UNDO chains multiple times.  In the worst case, we would have to
spend time quadratic in the number of row versions created since our
snapshot was taken.  We might try to mitigate this by caching a TID =>
heap tuple map for our snapshot.  But I don't like this approach: with a
large enough index scan, we could either run out of cache or consume too
much memory for that cache.

So, I think storing visibility information in index tuples is great for
regular index scans too.  At least, it allows us to avoid problem #2.
That is great by itself.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] UNDO and in-place update

2016-11-29 Thread Alexander Korotkov
On Tue, Nov 29, 2016 at 8:21 AM, Amit Kapila <amit.kapil...@gmail.com>
wrote:

> On Mon, Nov 28, 2016 at 11:01 PM, Robert Haas <robertmh...@gmail.com>
> wrote:
> > On Sun, Nov 27, 2016 at 10:44 PM, Amit Kapila <amit.kapil...@gmail.com>
> wrote:
> >> On Mon, Nov 28, 2016 at 4:50 AM, Robert Haas <robertmh...@gmail.com>
> wrote:
> >>> Well, my original email did contain a discussion of the need for
> >>> delete-marking.  I said that you couldn't do in-place updates when
> >>> indexed columns were modified unless the index AMs had support for
> >>> delete-marking, which is the same point you are making here.
> >>
> >> Sorry, I had not read that part earlier, but now that I read it, I
> >> think there is a slight difference in what I am saying.   I thought
> >> along with delete-marking, we might need transaction visibility
> >> information in the index as well.
> >
> > I think we need to avoid putting the visibility information in the
> > index because that will make the index much bigger.
> >
>
> I agree that there will be an increase in index size, but it shouldn't
> be much if we have transaction information (and that too only active
> transactions) at page level instead of at tuple level.  I think the
> number of concurrent writes on the index will be lesser as compared to
> the heap.  There are a couple of benefits of having visibility
> information in the index.
>
> a. Heap and index could be independently cleaned up and in most cases
> by foreground transactions only.  The work for vacuum will be quite
> less as compared to now.  I think this will help us in avoiding the
> bloat to a great degree.
>
> b. Improved index-only scans, because now we don't need visibility map
> of the heap to check tuple visibility.
>
> c. Some speed improvements for index scans can also be expected
> because with this we don't need to perform heap fetch for invisible
> index tuples.
>

+1
I think once we're considering marking deleted index tuples, we should
provide an option of visibility-aware indexes.
Probably it shouldn't be the only option for an UNDO-based table engine.
But it definitely should be one of the options.

d. This is the way to eventually have index-organized tables.  Once an
index is visibility-aware, it becomes possible to store data there without
a heap, but with snapshots and transactions.  Also, it would be possible
to achieve more unification between heap and index access methods.
Imagine that a heap is a TID => tuple map and an index is an index_key =>
tuple map.  I think the reason why we distinguish between the heap access
method (which is hardcoded) and index access methods is that the heap is
visibility-aware and indexes are not.  Once they are both
visibility-aware, there is not much difference between them: they are just
different kinds of maps, and they could implement the same interface.
Imagine you could select between a heap-organized table and an
index-organized table just by choosing its primary access method.  If you
select heap as the primary access method, indexes would refer to the TID.
If you select a btree on column id as the primary access method, indexes
would refer to the id column.  That would be great extensibility.  Way
better than what we have now.
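
Purely to illustrate the "different kinds of maps" idea, a common interface
might be sketched like this (every name here is hypothetical; nothing of
the sort exists today):

typedef struct PrimaryAccessMethod
{
    /* insert a tuple version; returns its locator (heap TID or PK value) */
    Datum       (*insert_tuple) (Relation rel, HeapTuple tup);

    /* fetch the tuple version visible to the snapshot, by locator */
    HeapTuple   (*fetch_tuple) (Relation rel, Datum locator,
                                Snapshot snapshot);

    /* mark the tuple version at 'locator' as deleted */
    void        (*delete_tuple) (Relation rel, Datum locator);
} PrimaryAccessMethod;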

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Indirect indexes

2016-10-19 Thread Alexander Korotkov
On Wed, Oct 19, 2016 at 12:53 PM, Simon Riggs <si...@2ndquadrant.com> wrote:

> On 18 October 2016 at 23:46, Alexander Korotkov
> <a.korot...@postgrespro.ru> wrote:
>
> > Then vacuum removes (0;1) from the heap and the reference to (0;1)
> > from tbl_pk_idx.  But how will it remove the (1, 1) tuple from
> > tbl_val_indirect_idx?  Thus, before vacuuming tbl_val_indirect_idx we
> > should know not only the values of id which are being removed, but the
> > actual (id, val) pairs which are being removed.  Should we collect
> > those pairs while scanning the heap?  But we should also take into
> > account that multiple heap tuples might have the same (id, val) pair
> > (assuming there could be other columns being updated).  Therefore, we
> > should track when the last instance of a particular (id, val) pair was
> > deleted from the heap.  That would be a very huge change to vacuum,
> > maybe even requiring a far more complex vacuum algorithm written from
> > scratch.  Perhaps you see a better solution to this problem.
>
> The best way to sum up the problem is to consider how we deal with
> repeated updates to a single tuple that flip the value from A to B
> then back to A then to B then A etc.. Any value in the index can point
> to multiple versions of the same tuple and multiple index values can
> point to the same tuple (PK value). This problem behaviour was already
> known to me from Claudio's earlier analysis of WARM (thanks Claudio).
>

Thank you for pointing that out.  I didn't follow the details of the WARM
discussion.

> Yes, VACUUMing that is likely to be a complex issue, as you say. At
> the moment I don't have a plan for that, but am not worried.
>

AFAICS, the main goal of indirect indexes is to reduce their maintenance
cost.  Indirect indexes are much easier to maintain during UPDATEs, and
this is good.  But it's harder to VACUUM them.  So, we need to figure out
how much the maintenance cost would really be reduced for indirect
indexes.  This is why I think digging into the VACUUM problems is
justified for now.

> Indirect indexes produce less index entries in general than current,
> so the problem is by-design much smaller than current situation.
> Indirect indexes can support killed-tuple interface, so scanning the
> index by users will result in natural index maintenance, further
> reducing the problem.
>

That makes sense.  But that is not necessarily true for every workload.
For instance, the keys which are frequently updated are not necessarily
the same as the keys which are frequently selected.  Thus, there is still
some risk of bloat.

> So there will be a much reduced need for bulk maintenance. Bulk
> maintenance of the index, when needed, can be performed by scanning
> the whole table via the index, after the PK index has been vacuumed.
>

That's possible, but such a vacuum is going to be very I/O-intensive when
the heap doesn't fit in cache.  It's even possible that rebuilding the
index would be cheaper.


> That can be optimized using an index-only scan of the PK to avoid
> touching the heap, which should be effective since the VM has been so
> recently refreshed.


But we can't determine which indirect index keys still persist in the heap
using an index-only scan of the PK, because the PK index doesn't contain
those keys.  So, we still need to scan the heap for that.


> For correctness it would require the index blocks
> to be locked against write while checking for removal, so bulk
> collection of values to optimize the underlying index doesn't seem
> useful. The index scan could also be further optimized by introducing
> a visibility map for the index, which is something that would also
> optimize normal index VACUUMs as well, but that is a later project and
> not something for 10.x
>

A visibility map for indexes sounds interesting.  And that means including
visibility information in the index.  It's an important property of the
current MVCC implementation of PostgreSQL that while updating a heap
tuple, we don't have to find the locations of the old index tuples
referring to it; we only have to insert new index tuples.  Finding the
locations of old index tuples, even merely to update an index visibility
map, would be a substantial change.

> At this stage, the discussion should be 1) can it work? 2) do we want
> it?


I think that we definitely need indirect indexes, and they might work.
The question is the design.  PostgreSQL MVCC is designed so that indexes
contain no visibility information.  So, currently we're discussing an
approach which implies extending this design to indirect indexes.  The
downsides of this approach are (at least): 1) we should always recheck
results obtained from the index, 2) VACUUM becomes very difficult.

There is also an alternative approach: include visibility information in
the indirect index.  In this approach we should include the fields
required for visibility (xmin, xmax, etc.) in the indirect index tuple and
keep them up to date.

Re: [HACKERS] Indirect indexes

2016-10-19 Thread Alexander Korotkov
On Wed, Oct 19, 2016 at 3:52 PM, Robert Haas <robertmh...@gmail.com> wrote:

> The VACUUM problems seem fairly serious.  It's true that these indexes
> will be less subject to bloat, because they only need updating when
> the PK or the indexed columns change, not when other indexed columns
> change.  On the other hand, there's nothing to prevent a PK from being
> recycled for an unrelated tuple.  We can guarantee that a TID won't be
> recycled until all index references to the TID are gone, but there's
> no such guarantee for a PK.  AFAICT, that would mean that an indirect
> index would have to be viewed as unreliable: after looking up the PK,
> you'd *always* have to recheck that it actually matched the index
> qual.
>

AFAICS, even without considering VACUUM, indirect indexes would always be
used with recheck, as long as they don't contain visibility information.
When an indirectly indexed column is updated, the indirect index would
refer to the same PK with different index keys.
There is no direct link between an indirect index tuple and a heap tuple,
only a logical link through the PK.  Thus, you would have to recheck
anyway.

Another approach would be to include visibility information in the
indirect indexes themselves.  In this case, the index should support
snapshots by itself and not produce false positives.
This approach would require far more intrusive changes in the index AMs.
We would probably not be able to reuse the same index AM and would have to
make a new one.  But it seems like the better design to me.
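
To sketch what that could mean for the tuple format (a purely hypothetical
layout, not part of the proposal):

typedef struct IndirectIndexTupleData
{
    TransactionId xmin;     /* inserting transaction */
    TransactionId xmax;     /* deleting transaction, if any */

    /*
     * ... followed by the indexed key columns and the PK value, which is
     * stored in place of the usual heap TID.  With xmin/xmax here, the
     * index can evaluate snapshots itself and return no false positives.
     */
} IndirectIndexTupleData;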

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Indirect indexes

2016-10-18 Thread Alexander Korotkov
On Wed, Oct 19, 2016 at 12:21 AM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Tue, Oct 18, 2016 at 9:28 PM, Alvaro Herrera <alvhe...@2ndquadrant.com>
> wrote:
>
>> Vacuuming presents an additional challenge: in order to remove index
>> items from an indirect index, it's critical to scan the PK index first
>> and collect the PK values that are being removed.  Then scan the
>> indirect index and remove any items that match the PK items removed.
>> This is a bit problematic because of the additional memory needed to
>> store the array of PK values.  I haven't implemented this yet.
>>
>
> Imagine another situation: the PK column was not updated, but an
> indirectly indexed column was updated.
> Thus, for a single heap tuple we would have a single PK tuple and two
> indirect index tuples (correct me if I'm wrong).
> How are we going to delete the old indirect index tuple?
>

Let me explain it in more detail.

There is a table with two columns and an indirect index on it.

CREATE TABLE tbl (id integer primary key, val integer);
CREATE INDIRECT INDEX tbl_val_indirect_idx ON tbl (val);

Then do insert and update.

INSERT INTO tbl VALUES (1, 1);
UPDATE tbl SET val = 2 WHERE id = 1;

Then the heap would contain two tuples.

 ctid  | id | val
-------+----+-----
 (0;1) |  1 |   1
 (0;2) |  1 |   2

tbl_pk_idx would contain another two tuples

 id | item_pointer
----+--------------
  1 | (0;1)
  1 | (0;2)

And tbl_val_indirect_idx would also have two tuples

 val | id
-----+----
   1 |  1
   2 |  1

Then vacuum removes (0;1) from the heap and the reference to (0;1) from
tbl_pk_idx.  But how will it remove the (1, 1) tuple from
tbl_val_indirect_idx?  Thus, before vacuuming tbl_val_indirect_idx we
should know not only the values of id which are being removed, but the
actual (id, val) pairs which are being removed.  Should we collect those
pairs while scanning the heap?  But we should also take into account that
multiple heap tuples might have the same (id, val) pair (assuming there
could be other columns being updated).  Therefore, we should track when
the last instance of a particular (id, val) pair was deleted from the
heap.  That would be a very huge change to vacuum, maybe even requiring a
far more complex vacuum algorithm written from scratch.  Perhaps you see a
better solution to this problem.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Indirect indexes

2016-10-18 Thread Alexander Korotkov
Hi, Alvaro!

Thank you for your proposal.  One question about vacuum concerns me most.

On Tue, Oct 18, 2016 at 9:28 PM, Alvaro Herrera <alvhe...@2ndquadrant.com>
wrote:

> Vacuuming presents an additional challenge: in order to remove index
> items from an indirect index, it's critical to scan the PK index first
> and collect the PK values that are being removed.  Then scan the
> indirect index and remove any items that match the PK items removed.
> This is a bit problematic because of the additional memory needed to
> store the array of PK values.  I haven't implemented this yet.
>

Imagine another situation: the PK column was not updated, but an
indirectly indexed column was updated.
Thus, for a single heap tuple we would have a single PK tuple and two
indirect index tuples (correct me if I'm wrong).
How are we going to delete the old indirect index tuple?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] PoC: Partial sort

2016-09-13 Thread Alexander Korotkov
On Fri, Apr 8, 2016 at 10:09 PM, Peter Geoghegan <p...@heroku.com> wrote:

> On Wed, Mar 30, 2016 at 8:02 AM, Alexander Korotkov
> <aekorot...@gmail.com> wrote:
> > Hmm... I don't completely agree with that.  In typical usage partial
> > sort should definitely use quicksort.  However, the fallback to other
> > sort methods is very useful.  The decision to use partial sort is made
> > by the planner.  But the planner makes mistakes.  For example, our
> > HashAggregate is purely in-memory.  In the case of a planner mistake
> > it causes OOM.  I have met such situations in production more than
> > once.  This is why I'd like partial sort to have graceful degradation
> > for such cases.
>
> I think that this should be moved to the next CF, unless a committer
> wants to pick it up today.
>

The patch has been rebased onto the current master.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


partial-sort-basic-9.patch
Description: Binary data



Re: [HACKERS] Proposal for CSN based snapshots

2016-08-24 Thread Alexander Korotkov
On Wed, Aug 24, 2016 at 11:54 AM, Heikki Linnakangas <hlinn...@iki.fi>
wrote:

> On 08/23/2016 06:18 PM, Heikki Linnakangas wrote:
>
>> On 08/22/2016 08:38 PM, Andres Freund wrote:
>>
>>> On 2016-08-22 20:32:42 +0300, Heikki Linnakangas wrote:
>>>
>>>> I
>>>> remember seeing ProcArrayLock contention very visible earlier, but I
>>>> can't
>>>> hit that now. I suspect you'd still see contention on bigger hardware,
>>>> though, my laptop has oly 4 cores. I'll have to find a real server for
>>>> the
>>>> next round of testing.
>>>>
>>>
>>> Yea, I think that's true. I can just about see ProcArrayLock contention
>>> on my more powerful laptop, to see it really bad you need bigger
>>> hardware / higher concurrency.
>>>
>>
>> As soon as I sent my previous post, Vladimir Borodin kindly offered
>> access to a 32-core server for performance testing. Thanks Vladimir!
>>
>> I installed Greg Smith's pgbench-tools kit on that server, and ran some
>> tests. I'm seeing some benefit on "pgbench -N" workload, but only after
>> modifying the test script to use "-M prepared", and using Unix domain
>> sockets instead of TCP to connect. Apparently those things add enough
>> overhead to mask out the little difference.
>>
>> Attached is a graph with the results. Full results are available at
>> https://hlinnaka.iki.fi/temp/csn-4-results/. In short, the patch
>> improved throughput, measured in TPS, with >= 32 or so clients. The
>> biggest difference was with 44 clients, which saw about 5% improvement.
>>
>> So, not phenomenal, but it's something. I suspect that with more cores,
>> the difference would become more clear.
>>
>> Like on a cue, Alexander Korotkov just offered access to a 72-core
>> system :-). Thanks! I'll run the same tests on that.
>>
>
> And here are the results on the 72 core machine (thanks again,
> Alexander!). The test setup was the same as on the 32-core machine, except
> that I ran it with more clients since the system has more CPU cores. In
> summary, in the best case, the patch increases throughput by about 10%.
> That peak is with 64 clients. Interestingly, as the number of clients
> increases further, the gain evaporates, and the CSN version actually
> performs worse than unpatched master. I don't know why that is. One theory
> that by eliminating one bottleneck, we're now hitting another bottleneck
> which doesn't degrade as gracefully when there's contention.
>

Did you try to identify this second bottleneck with perf or something
similar?
It would be nice to also run pgbench -S.  Also, it would be nice to check
something like 10% writes, 90% reads (which is quite a typical workload
in real life, I believe).
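
For instance, such a mixed workload could be approximated with the
built-in scripts and script weights (an illustrative invocation; weights
are available in pgbench since 9.6):

pgbench -M prepared -c 64 -j 64 -T 300 \
        -b select-only@9 -b simple-update@1 bench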

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Block level parallel vacuum WIP

2016-08-23 Thread Alexander Korotkov
On Tue, Aug 23, 2016 at 3:32 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> Claudio Freire <klaussfre...@gmail.com> writes:
> > Not only that, but from your description (I haven't read the patch,
> > sorry), you'd be scanning the whole index multiple times (one per
> > worker).
>
> What about pointing each worker at a separate index?  Obviously the
> degree of concurrency during index cleanup is then limited by the
> number of indexes, but that doesn't seem like a fatal problem.
>

+1
We could eventually need some effective way of parallelizing the vacuum of
a single index.
But pointing each worker at a separate index seems fair enough for the
majority of cases.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Tracking wait event for latches

2016-08-22 Thread Alexander Korotkov
On Mon, Aug 22, 2016 at 12:09 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> I took a look at your patch.  A couple of notes from me.
>
> const char *
>> GetEventIdentifier(uint16 eventId)
>> {
>> const char *res;
>> switch (eventId)
>> {
>> case EVENT_ARCHIVER_MAIN:
>> res = "ArchiverMain";
>> break;
>> ... long long list of events ...
>> case EVENT_WAL_SENDER_WRITE_DATA:
>> res = "WalSenderWriteData";
>> break;
>> default:
>> res = "???";
>> }
>> return res;
>> }
>
>
> Would it be better to use an array here?
>
> typedef enum EventIdentifier
>> {
>
>
> EventIdentifier seems like too general a name to me, doesn't it?  Could
> we name it WaitEventIdentifier?  Or WaitEventId for short?
>

I'm also not sure about the handling of the secure_read() and
secure_write() functions.
In the current patch we're tracking the latch event wait inside them.  But
we set up the latch only for blocking sockets, and can do so multiple
times in a loop.
Would it be better to make separate wait events NETWORK_READ and
NETWORK_WRITE and set them up for the whole time spent in secure_read()
and secure_write()?
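
In other words, something along these lines (a sketch; the event name and
the reporting calls are placeholders for whatever the patch provides):

/* report one wait event for the whole call, instead of one latch wait
 * per iteration of the retry loop inside secure_read() */
pgstat_report_wait_start(WAIT_EVENT_NETWORK_READ);
n = secure_read(port, buf, len);
pgstat_report_wait_end();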

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Tracking wait event for latches

2016-08-22 Thread Alexander Korotkov
Hi, Michael!

On Thu, Aug 4, 2016 at 8:26 AM, Michael Paquier <michael.paqu...@gmail.com>
wrote:

> On Tue, Aug 2, 2016 at 10:31 PM, Michael Paquier
> <michael.paqu...@gmail.com> wrote:
> > Attached is an updated patch.
>
> Updated version for 2 minor issues:
> 1) s/stram/stream/
> 2) Docs used incorrect number
>

I took a look at your patch.  A couple of notes from me.

const char *
> GetEventIdentifier(uint16 eventId)
> {
> const char *res;
> switch (eventId)
> {
> case EVENT_ARCHIVER_MAIN:
> res = "ArchiverMain";
> break;
> ... long long list of events ...
> case EVENT_WAL_SENDER_WRITE_DATA:
> res = "WalSenderWriteData";
> break;
> default:
> res = "???";
> }
> return res;
> }


Would it be better to use an array here?
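
For example, something like this sketch (it assumes the event ids form a
dense zero-based enum, and avoids C99 designated initializers since we
stick to C89):

static const char *const WaitEventNames[] = {
    "ArchiverMain",             /* EVENT_ARCHIVER_MAIN */
    /* ... one entry per event, in enum order ... */
    "WalSenderWriteData"        /* EVENT_WAL_SENDER_WRITE_DATA */
};

const char *
GetEventIdentifier(uint16 eventId)
{
    if (eventId >= lengthof(WaitEventNames))
        return "???";
    return WaitEventNames[eventId];
}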

typedef enum EventIdentifier
> {


EventIdentifier seems like too general a name to me, doesn't it?  Could we
name it WaitEventIdentifier?  Or WaitEventId for short?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Should we cacheline align PGXACT?

2016-08-19 Thread Alexander Korotkov
On Fri, Aug 19, 2016 at 3:19 PM, Amit Kapila <amit.kapil...@gmail.com>
wrote:

> On Fri, Aug 19, 2016 at 11:42 AM, Alexander Korotkov
> <a.korot...@postgrespro.ru> wrote:
> > Hackers,
> >
> > originally this idea was proposed by Andres Freund while experimenting
> with
> > lockfree Pin/UnpinBuffer [1].
> > The patch is attached as well as results of pgbench -S on 72-cores
> machine.
> > As before it shows huge benefit in this case.
> > For sure, we should validate that it doesn't cause performance
> regression in
> > other cases.  At least we should test read-write and smaller machines.
> > Any other ideas?
> >
>
> may be test on Power m/c as well.
>

Good idea.  I don't have any such machine at hand now.  Do you have one?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Should we cacheline align PGXACT?

2016-08-19 Thread Alexander Korotkov
On Fri, Aug 19, 2016 at 4:46 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> Alexander Korotkov <a.korot...@postgrespro.ru> writes:
> > originally this idea was proposed by Andres Freund while experimenting
> with
> > lockfree Pin/UnpinBuffer [1].
> > The patch is attached as well as results of pgbench -S on 72-cores
> > machine.  As before it shows huge benefit in this case.
>
> That's one mighty ugly patch.  Can't you do it without needing to
> introduce the additional layer of struct nesting?
>

That's worrying me too.
We could use an anonymous struct, but that seems to be prohibited in C89,
which we stick to.
Another idea which comes to my mind is to manually calculate the size of
the padding and insert it directly into the PGXACT struct.  But that seems
rather ugly too.  However, it would be an ugly definition, not an ugly
usage...
Do you have better ideas?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


[HACKERS] Should we cacheline align PGXACT?

2016-08-19 Thread Alexander Korotkov
Hackers,

originally this idea was proposed by Andres Freund while experimenting
with lockfree Pin/UnpinBuffer [1].
The patch is attached, as well as results of pgbench -S on a 72-core
machine.  As before, it shows a huge benefit in this case.
For sure, we should validate that it doesn't cause performance regressions
in other cases.  At least we should test read-write workloads and smaller
machines.  Any other ideas?

1.
https://www.postgresql.org/message-id/20160411214029.ce3fw6zxim5k6...@alap3.anarazel.de

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


pgxact-align-1.patch
Description: Binary data



Re: [HACKERS] Pluggable storage

2016-08-18 Thread Alexander Korotkov
On Thu, Aug 18, 2016 at 10:58 AM, Simon Riggs <si...@2ndquadrant.com> wrote:

> On 16 August 2016 at 19:46, Andres Freund <and...@anarazel.de> wrote:
> > On 2016-08-15 12:02:18 -0400, Robert Haas wrote:
> >> Thanks for taking a stab at this.  I'd like to throw out a few concerns.
> >>
> >> One, I'm worried that adding an additional layer of pointer-jumping is
> >> going to slow things down and make Andres' work to speed up the
> >> executor more difficult.  I don't know that there is a problem there,
> >> and if there is a problem I don't know what to do about it, but I
> >> think it's something we need to consider.
> >
> > I'm quite concerned about that as well.
>
> This objection would apply to all other proposals as well, FDW etc..
>
> Do you see some way to add flexibility yet without adding a branch
> point in the code?
>

It's impossible without a branch point in the code.  The question is where
this branch should be located.
In particular, we can put this branch point into the planner by defining
distinct executor nodes for each pluggable storage.  In this case, each
storage would have its own optimized executor nodes.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Index Onlys Scan for expressions

2016-08-16 Thread Alexander Korotkov
On Tue, Aug 16, 2016 at 9:09 AM, Oleg Bartunov <obartu...@gmail.com> wrote:

> On Tue, Aug 16, 2016 at 1:03 AM, Ildar Musin <i.mu...@postgrespro.ru>
> wrote:
> > Hi, hackers!
> >
> > There is a known issue that index only scan (IOS) can only work with
> simple
> > index keys based on single attributes and doesn't work with index
> > expressions. In this patch I propose a solution that adds support of IOS
> for
> > index expressions. Here's an example:
> >
> > create table abc(a int, b int, c int);
> > create index on abc ((a * 1000 + b), c);
> >
> > with t1 as (select generate_series(1, 1000) as x),
> >  t2 as (select generate_series(0, 999) as x)
> > insert into abc(a, b, c)
> > select t1.x, t2.x, t2.x from t1, t2;
> > vacuum analyze;
> >
> > Explain results with the patch:
> >
> > explain (analyze, buffers) select a * 1000 + b + c from abc where a *
> > 1000 + b = 1001;
> >                                QUERY PLAN
> > ---------------------------------------------------------------------
> >  Index Only Scan using abc_expr_c_idx on abc  (cost=0.42..4.45 rows=1
> >  width=4) (actual time=0.032..0.033 rows=1 loops=1)
> >    Index Cond: ((((a * 1000) + b)) = 1001)
> >    Heap Fetches: 0
> >    Buffers: shared hit=4
> >  Planning time: 0.184 ms
> >  Execution time: 0.077 ms
> > (6 rows)
> >
> > Before the patch it was:
> >
> > explain (analyze, buffers) select a * 1000 + b + c from abc where a *
> > 1000 + b = 1001;
> >                                 QUERY PLAN
> > ----------------------------------------------------------------------
> >  Index Scan using abc_expr_c_idx on abc  (cost=0.42..8.45 rows=1
> >  width=4) (actual time=0.039..0.041 rows=1 loops=1)
> >    Index Cond: (((a * 1000) + b) = 1001)
> >    Buffers: shared hit=4
> >  Planning time: 0.177 ms
> >  Execution time: 0.088 ms
> > (5 rows)
> >
> > This solution has limitations though: the restriction or the target
> > expression tree (or its part) must match exactly the index. E.g. this
> > expression will pass the check:
> >
> > select a * 1000 + b + 100 from ...
> >
> > but this will fail:
> >
> > select 100 + a * 1000 + b from ...
> >
> > because the parser groups it as:
> >
> > (100 + a * 1000) + b
> >
> > In this form it won't match any index key. Another case is when we create
> > index on (a+b) and then make query like 'select b+a ...' or '... where
> b+a =
> > smth' -- it won't match. This applies to regular index scan too.
> Probably it
> > worth to discuss the way to normalize index expressions and clauses and
> work
> > out more convenient way to match them.
>
> pg_operator.oprcommutative ?


Do you mean pg_operator.oprcom?
In these examples we're also lacking a pg_operator.oprassociative...
Another problem is the computational complexity of such transformations.
AFAIR, a few patches making the optimizer smarter with expressions were
already rejected for this reason.
Also, even if we had such transformations of expressions, floating-point
types would still be problematic, because (a + b) + c = a + (b + c)
is not exactly true for them due to rounding error.
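
A quick illustration of that last point:

-- float8 addition is not associative, so (a + b) + c cannot be safely
-- rewritten as a + (b + c) when matching index expressions:
SELECT (1e20::float8 + (-1e20)::float8) + 1::float8 AS left_assoc,
       1e20::float8 + ((-1e20)::float8 + 1::float8) AS right_assoc;
-- left_assoc = 1, right_assoc = 0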

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Pluggable storage

2016-08-15 Thread Alexander Korotkov
On Sat, Aug 13, 2016 at 2:15 AM, Alvaro Herrera <alvhe...@2ndquadrant.com>
wrote:

> Many have expressed their interest in this topic, but I haven't seen any
> design of how it should work.


That's it.  My design presented at PGCon was very sketchy, and I haven't
delivered any prototype yet.


> Here's my attempt; I've been playing with
> this for some time now and I think what I propose here is a good initial
> plan.


Great!  It's nice to see you're working in this direction!


> This will allow us to write permanent table storage that works
> differently than heapam.c.  At this stage, I haven't throught through
> whether this is going to allow extensions to define new storage modules;
> I am focusing on AMs that can coexist with heapam in core.
>

So, as I understand it, you're proposing extensibility to replace the heap
with something different but compatible with the heap (with executor
nodes, index access methods and so on).
That's good, but what could the alternative storage access methods be?
AFAICS, we can't fit here, for instance, another MVCC implementation (undo
log), an in-memory storage, or a columnar storage.  However, it seems that
we would be able to make a compressed heap or an alternative HOT/WARM
tuple mechanism.  Correct me if I'm wrong.

ISTM you're proposing something quite orthogonal to my view of pluggable
storage engines.  My idea, in general, was to extend the FDW mechanism to
let it be something manageable inside the database (support for VACUUM,
defining indexes and so on).  But the imperative was that it should have
its own executor nodes, and it doesn't have to be compatible with existing
index access methods.

Therefore, I think my design presented at PGCon and your current proposal
are orthogonal features which could coexist.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Relocation of tuple between release and re-acquire of tuple lock

2016-08-12 Thread Alexander Korotkov
On Fri, Aug 12, 2016 at 3:15 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> I'm now exploring the code working with heap tuples.  The following code
> in heap_update() caught my eye.
>
>> if (DoesMultiXactIdConflict((MultiXactId) xwait, infomask,
>>                             *lockmode))
>> {
>>     LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
>>     /* acquire tuple lock, if necessary */
>>     heap_acquire_tuplock(relation, &(oldtup.t_self), *lockmode,
>>                          LockWaitBlock, &have_tuple_lock);
>>     /* wait for multixact */
>>     MultiXactIdWait((MultiXactId) xwait, mxact_status, infomask,
>>                     relation, &oldtup.t_self, XLTW_Update,
>>                     &remain);
>>     checked_lockers = true;
>>     locker_remains = remain != 0;
>>     LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
>>
>>     /*
>>      * If xwait had just locked the tuple then some other xact
>>      * could update this tuple before we get to this point.  Check
>>      * for xmax change, and start over if so.
>>      */
>>     if (xmax_infomask_changed(oldtup.t_data->t_infomask,
>>                               infomask) ||
>>         !TransactionIdEquals(HeapTupleHeaderGetRawXmax(oldtup.t_data),
>>                              xwait))
>>         goto l2;
>> }
>
>
> Is it safe to rely on the same oldtup.t_data pointer after the release
> and re-acquisition of the buffer content lock?  Could the heap tuple be
> relocated between the lock release and re-acquisition?  I know that we
> still hold a buffer pin, and vacuum would wait for the pin to be
> released.  But other heap operations could still call heap_page_prune(),
> which can correspondingly relocate tuples.
> Probably I'm missing something...
>

Please forget it.  heap_page_prune_opt() does:


> /* OK, try to get exclusive buffer lock */
> if (!ConditionalLockBufferForCleanup(buffer))
> return;


Nobody repairs buffer fragmentation while a pin is held.  Everything is
all right.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


[HACKERS] Relocation of tuple between release and re-acquire of tuple lock

2016-08-12 Thread Alexander Korotkov
Hackers,

I'm now exploring the code working with heap tuples.  The following code
in heap_update() caught my eye.

> if (DoesMultiXactIdConflict((MultiXactId) xwait, infomask,
>                             *lockmode))
> {
>     LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
>     /* acquire tuple lock, if necessary */
>     heap_acquire_tuplock(relation, &(oldtup.t_self), *lockmode,
>                          LockWaitBlock, &have_tuple_lock);
>     /* wait for multixact */
>     MultiXactIdWait((MultiXactId) xwait, mxact_status, infomask,
>                     relation, &oldtup.t_self, XLTW_Update,
>                     &remain);
>     checked_lockers = true;
>     locker_remains = remain != 0;
>     LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
>
>     /*
>      * If xwait had just locked the tuple then some other xact
>      * could update this tuple before we get to this point.  Check
>      * for xmax change, and start over if so.
>      */
>     if (xmax_infomask_changed(oldtup.t_data->t_infomask,
>                               infomask) ||
>         !TransactionIdEquals(HeapTupleHeaderGetRawXmax(oldtup.t_data),
>                              xwait))
>         goto l2;
> }


Is it safe to rely on the same oldtup.t_data pointer after the release and
re-acquisition of the buffer content lock?  Could the heap tuple be
relocated between the lock release and re-acquisition?  I know that we
still hold a buffer pin, and vacuum would wait for the pin to be released.
But other heap operations could still call heap_page_prune(), which can
correspondingly relocate tuples.
Probably I'm missing something...

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Proposal for CSN based snapshots

2016-08-10 Thread Alexander Korotkov
On Wed, Aug 10, 2016 at 8:26 PM, Greg Stark <st...@mit.edu> wrote:

> On Wed, Aug 10, 2016 at 5:54 PM, Alexander Korotkov
> <a.korot...@postgrespro.ru> wrote:
> > Oh, I see that I underestimated the complexity of async commit...  :)
>
> Indeed. I think Tom's attitude was right even if the specific
> conclusion was wrong. While I don't think removing async commit is
> viable I think it would be a laudable goal if we can remove some of
> the complication in it. I normally describe async commit as "just like
> a normal commit but don't block on the commit" but it's actually a bit
> more complicated.
>
> AIUI the additional complexity is that while async commits are visible
> to everyone immediately other non-async transactions can be committed
> but then be in limbo for a while before they are visible to others. So
> other sessions will see the async commit "jump ahead" of any non-async
> transactions even if those other transactions were committed first.
> Any standbys will see the non-async transaction in the logs before the
> async transaction and in a crash it's possible to lose the async
> transaction even though it was visible but not the sync transaction
> that wasn't.
>
> Complexity like this makes it hard to implement other features such as
> CSNs. IIRC this already bit hot standby as well. I think it would be a
> big improvement if we had a clear, well defined commit order that was
> easy to explain and easy to reason about when new changes are being
> made.
>

Heikki, Ants, Greg, thank you for the explanation.  You have restored
order in my personal world.
Now I see that introducing a separate CSN sequence, not equal to the LSN,
makes sense.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Proposal for CSN based snapshots

2016-08-10 Thread Alexander Korotkov
On Wed, Aug 10, 2016 at 6:09 PM, Heikki Linnakangas <hlinn...@iki.fi> wrote:

> On 08/10/2016 05:51 PM, Tom Lane wrote:
>
>> Heikki Linnakangas <hlinn...@iki.fi> writes:
>>
>>> On 08/10/2016 05:09 PM, Tom Lane wrote:
>>>
>>>> Uh, what?  That's not the semantics we have today, and I don't see why
>>>> it's necessary or a good idea.  Once the commit is in the WAL stream,
>>>> any action taken on the basis of seeing the commit must be later in
>>>> the WAL stream.  So what's the problem?
>>>>
>>>
>> I was talking about synchronous commits in the above. A synchronous
>>> commit is not made visible to other transactions, until the commit WAL
>>> record is flushed to disk.
>>>
>>
>> [ thinks for a bit... ]  Oh, OK, that's because we don't treat a
>> transaction as committed until its clog bit is set *and* it's not
>> marked as running in the PGPROC array.  And sync transactions will
>> flush WAL in between.
>>
>
> Right.
>
> Still, having to invent CSNs seems like a huge loss for this design.
>> Personally I'd give up async commit first.  If we had only sync commit,
>> the rule could be "xact LSN less than snapshot threshold and less than
>> WAL flush position", and we'd not need CSNs.  I know some people like
>> async commit, but it's there because it was easy and cheap in our old
>> design, not because it's the world's greatest feature and worth giving
>> up performance for.
>>
>
> I don't think that's a very popular opinion (I disagree, for one).
> Asynchronous commits are a huge performance boost for some applications.
> The alternative is fsync=off, and I don't want to see more people doing
> that. SSDs have made the penalty of an fsync much smaller, but it's still
> there.
>
> Hmm. There's one more possible way this could all work. Let's have CSN ==
> LSN, also for asynchronous commits. A snapshot is the current insert
> position, but also make note of the current flush position, when you take a
> snapshot. Now, when you use the snapshot, if you ever see an XID that
> committed between the snapshot's insert position and the flush position,
> wait for the WAL to be flushed up to the snapshot's insert position at that
> point. With that scheme, an asynchronous commit could return to the
> application without waiting for a flush, but if someone actually looks at
> the changes the transaction made, then that transaction would have to wait.
> Furthermore, we could probably skip that waiting too, if the reading
> transaction is also using synchronous_commit=off.
>
> That's slightly different from the current behaviour. A transaction that
> runs with synchronous_commit=on, and reads data that was modified by an
> asynchronous transaction, would take a hit. But I think that would be
> acceptable.


Oh, I see that I underestimated the complexity of async commit...  :)

Do I understand correctly that async commit now works as follows?
1) An async transaction confirms the commit before flushing WAL.
2) Other transactions see the effect of the async transaction only once
its WAL is flushed.
3) In the session which just committed the async transaction, the effect
of this transaction is visible immediately (before the WAL is flushed).
Is that true?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Wait events monitoring future development

2016-08-10 Thread Alexander Korotkov
On Tue, Aug 9, 2016 at 5:37 AM, Bruce Momjian <br...@momjian.us> wrote:

> On Tue, Aug  9, 2016 at 02:06:40AM +, Tsunakawa, Takayuki wrote:
> > I hope wait event monitoring will be on by default even if the overhead
> is not
> > almost zero, because the data needs to be readily available for faster
> > troubleshooting.  IMO, the benefit would be worth even 10% overhead.  If
> you
> > disable it by default because of overhead, how can we convince users to
> enable
> > it in production systems to solve some performance problem?  I’m afraid
> severe
> > users would say “we can’t change any setting that might cause more
> trouble, so
> > investigate the cause with existing information.”
>
> If you want to know why people are against enabling this monitoring by
> default, above is the reason.  What percentage of people do you think
> would be willing to take a 10% performance penalty for monitoring like
> this?  I would bet very few, but the argument above doesn't seem to
> address the fact it is a small percentage.
>

Just two notes from me:

1) The 10% overhead from monitoring wait events is so far just a guess,
without any proof.
2) We already have functionality which trades insight into the database
for far larger overhead: auto_explain.log_analyze = true can slow queries
down *several times over*.  Do you think we should remove it?
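
For comparison, that existing functionality is enabled with settings like
these (an illustrative postgresql.conf fragment):

shared_preload_libraries = 'auto_explain'
auto_explain.log_min_duration = '100ms'
auto_explain.log_analyze = on
auto_explain.log_timing = on        # per-node timing is the expensive part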

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Wait events monitoring future development

2016-08-10 Thread Alexander Korotkov
On Tue, Aug 9, 2016 at 12:47 AM, Ilya Kosmodemiansky <
ilya.kosmodemian...@postgresql-consulting.com> wrote:

> On Mon, Aug 8, 2016 at 7:03 PM, Bruce Momjian <br...@momjian.us> wrote:
> > It seems asking users to run pg_test_timing before deploying to check
> > the overhead would be sufficient.
>
> I'am not sure. Time measurement for waits is slightly more complicated
> than a time measurement for explain analyze: a good workload plus
> using gettimeofday in a straightforward manner can cause huge
> overhead.


What makes you think so?  Both my thoughts and observations are the
opposite: it's way easier to get huge overhead from EXPLAIN ANALYZE than
from measuring wait events.  The current wait events are themselves quite
heavyweight, involving syscalls, context switches and so on.  In contrast,
EXPLAIN ANALYZE calls gettimeofday for very cheap operations like
transferring a tuple from one executor node to another.


> Thats why a proper testing is important - if we can see a
> significant performance drop if we have for example large
> shared_buffers with the same concurrency,  that shows gettimeofday is
> too expensive to use. Am I correct, that we do not have such accurate
> tests now?
>

Do you think that large shared buffers are a kind of stress test for wait
events monitoring?  If so, why?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Proposal for CSN based snapshots

2016-08-10 Thread Alexander Korotkov
On Tue, Aug 9, 2016 at 3:16 PM, Heikki Linnakangas <hlinn...@iki.fi> wrote:

> (Reviving an old thread)
>
> I spent some time dusting off this old patch, to implement CSN snapshots.
> Attached is a new patch, rebased over current master, and with tons of
> comments etc. cleaned up. There's more work to be done here, I'm posting
> this to let people know I'm working on this again. And to have a backup on
> the 'net :-).
>

Great!  It's very nice to see that you're working on this patch.
After PGCon I had rebased your patch onto master, but you have already done
much more.


> I switched to using a separate counter for CSNs. CSN is no longer the same
> as the commit WAL record's LSN. While I liked the conceptual simplicity of
> CSN == LSN a lot, and the fact that the standby would see the same commit
> order as master, I couldn't figure out how to make async commits to work.
>

I didn't get the async commit problem at first glance.  AFAICS, the only
difference between a sync commit and an async one is that an async commit
doesn't wait for the WAL to be flushed.  But an async commit still receives
an LSN.
Could you describe the problem in more detail?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Proposal for CSN based snapshots

2016-08-10 Thread Alexander Korotkov
On Wed, Aug 10, 2016 at 2:10 PM, Heikki Linnakangas <hlinn...@iki.fi> wrote:

> Yeah, if the csnlog access turns out to be too expensive, we could do
> something like this. In theory, you can always convert a CSN snapshot into
> an old-style list of XIDs, by scanning the csnlog between the xmin and
> xmax. That could be expensive if the distance between xmin and xmax is
> large, of course. But as you said, you can have various hybrid forms, where
> you use a list of XIDs of some range as a cache, for example.
>
> I'm hopeful that we can simply make the csnlog access fast enough, though.
> Looking up an XID in a sorted array is O(log n), while looking up an XID in
> the csnlog is O(1). That ignores all locking and different constant
> factors, of course, but it's not a given that accessing the csnlog has to
> be slower than a binary search of an XID array.


FYI, I'm still a fan of the idea of rewriting the XID with the CSN in the
tuple, in the same way we write hint bits now.
I'm going to build a prototype of this approach, enough for performance
measurements.
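
A minimal sketch of the idea, under loud assumptions: HEAP_XMIN_IS_CSN and
CSNLogGetCSN() are hypothetical, and the CSN is assumed to fit into the
32-bit xmin slot; only the hint-bit machinery this mimics is real.

    static void
    HeapTupleSetXminCSN(HeapTupleHeader tup)
    {
        TransactionId xmin = HeapTupleHeaderGetRawXmin(tup);
        uint32        csn = CSNLogGetCSN(xmin);     /* hypothetical */

        if (csn == 0)
            return;                 /* not committed yet: keep the XID */

        /* Overwrite the stored XID with the CSN, like setting a hint bit:
         * later visibility checks then skip the csnlog lookup entirely. */
        HeapTupleHeaderSetXmin(tup, (TransactionId) csn);
        tup->t_infomask |= HEAP_XMIN_IS_CSN;        /* hypothetical bit */
    }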

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


[HACKERS] Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]

2016-07-03 Thread Alexander Korotkov
Hi!

On Sun, Jul 3, 2016 at 12:24 PM, Andrew Borodin <boro...@octonica.com>
wrote:

> I think there is some room for improving GiST inserts. Following is
> the description of what I think is performance problem.
>
> Function gistplacetopage in file /src/backend/access/gist/gist.c is
> responsible for updating or appending new tuple on GiST page.
> Currently, after checking necessity of page split due to overflow, it
> essentially executes following:
>
>if (OffsetNumberIsValid(oldoffnum))
>PageIndexTupleDelete(page, oldoffnum);
>gistfillbuffer(page, itup, ntup, InvalidOffsetNumber);
>
> That is: remove old tuple if it’s there, then place updated tuple at the
> end.
>
> Half of the old data have to be shifted my memmove inside
> PageIndexTupleDelete() call, half of the linp-s have to be corrected.
>
> If the updated tuple has same size as already residing on page we can
> just overwrite it. Attached patch demonstrates that concept. Attached
> test.sql inserts million rows into GiST index based on cube extension.
> My machine is Hyper-V VM running Ubuntu on i5-2500 CPU with SSD
> storage. Before patch, insert part of test is executed on average
> within 159 second, after patch application: insert part is executed
> within 77 seconds on average. That is almost twice faster (for
> CPU\Mem-bounded inserts, disk-constrained test will show no
> improvement). But it works only for fixed-size tuple inserts.
>

Very promising results!

I know that code in patch is far from beautiful: it operates with
> three different levels of abstraction within 5 lines of code. Those
> are low level memmove(), system-wide PageAddItem() and GiST private
> gistfillBuffer().
>
> By the way PageAddItem() have overwrite flag, but it only works with
> unused ItemId’s. Marking old ItemId as unused before PageAddItem()
> didn’t work for me. Unfortunately bufpage.c routines do not contain
> one for updating(replacing with new) tuple on page. It is important
> for me because I’m working on advanced GiST page layout (
>
> https://www.postgresql.org/message-id/CAJEAwVE0rrr%2BOBT-P0gDCtXbVDkBBG_WcXwCBK%3DGHo4fewu3Yg%40mail.gmail.com
> ), current approach is to use skip-tuples which can be used to skip
> many flowing tuples with one key check. Obviously, this design cares
> about tuples order. And update in a fashion “place updated tuple at
> the end” won’t work for me.
>
> So, I think it would be better to implement PageReplaceItem()
> functionality to make code better, to make existing GiST inserts
> faster and to enable new advanced page layouts in indices.
>

+1 for PageReplaceItem()
Even if the item is not the same size, we can move the tail of the page
once instead of twice.
I think you should implement the PageReplaceItem() version and add it to
the commitfest.
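
To illustrate, here is a sketch of the proposed helper for the same-size
case only (PageReplaceItem() does not exist in bufpage.c today; the page
macros used are the real ones):

    static bool
    PageReplaceItem(Page page, OffsetNumber offnum, Item item, Size size)
    {
        ItemId      itemId = PageGetItemId(page, offnum);

        if (MAXALIGN(ItemIdGetLength(itemId)) != MAXALIGN(size))
            return false;           /* caller falls back to delete + add */

        /* Same aligned size: overwrite in place, no memmove of the tail. */
        memcpy(PageGetItem(page, itemId), item, size);
        ItemIdSetNormal(itemId, ItemIdGetOffset(itemId), size);
        return true;
    }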

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Why we don't have checksums on clog files

2016-06-07 Thread Alexander Korotkov
On Tue, Jun 7, 2016 at 5:41 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:

> On Mon, Jun 6, 2016 at 8:43 PM, Alex Ignatov <a.igna...@postgrespro.ru>
> wrote:
>
>> Why we don't have checksums on clog files.
>>
>> We have checksum on pg_control, optional checksumming on data files, some
>> form of checksumming on wal's. But why we don't have any checksumming on
>> clogs. Corruptions on clogs lead to transaction visisbility problems and
>> database consistency violation.
>>
>> Can anybody explain this situation with clogs?
>>
>>
> I think it will be better if users can have an option to checksum clog
> pages.  However, I think that will need a change in page (CLOG-page) format
> which might not be a trivial work to accomplish.
>

I think we should think not only about CLOG, but about all SLRU pages.
For data pages we have to reinitialize the database cluster to enable
checksums.  I think we can also require reinitialization to enable
checksums on SLRU pages.  Thus, the major problem would be that the number
of items fitting on a page would depend on whether checksums are enabled.
However, that shouldn't be too hard.
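
To illustrate the capacity dependency: the first two macros below are the
real ones from clog.c, while the checksum-aware variant is purely
hypothetical.

    /* Today, in clog.c: */
    #define CLOG_XACTS_PER_BYTE 4
    #define CLOG_XACTS_PER_PAGE (BLCKSZ * CLOG_XACTS_PER_BYTE)

    /* Hypothetical layout reserving per-page checksum space, making the
     * page capacity depend on an initdb-time choice: */
    #define SLRU_CHECKSUM_SPACE     MAXALIGN(sizeof(uint16))
    #define CLOG_XACTS_PER_CSUM_PAGE \
        ((BLCKSZ - SLRU_CHECKSUM_SPACE) * CLOG_XACTS_PER_BYTE)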

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Is the unfair lwlock behavior intended?

2016-05-24 Thread Alexander Korotkov
Hi!

On Tue, May 24, 2016 at 9:03 AM, Tsunakawa, Takayuki <
tsunakawa.ta...@jp.fujitsu.com> wrote:

> I encountered a strange behavior of lightweight lock in PostgreSQL 9.2.
> That appears to apply to 9.6, too, as far as I examine the code.  Could you
> tell me if the behavior is intended or needs fix?
>
> Simply put, the unfair behavior is that waiters for exclusive mode are
> overtaken by share-mode lockers who arrive later.
>
>
> PROBLEM
> 
>
> Under a heavy read/write workload on a big machine with dozens of CPUs and
> hundreds of GBs of RAM, psql sometimes took more than 30 seconds to connect
> to the database (and actually, it failed to connect due to our
> connect_timeout setting.)  The backend corresponding to the psql was
> waiting to acquire exclusive mode lock on ProcArrayLock.  Some other
> backends took more than 10 seconds to commit their transactions, waiting
> for exclusive mode lock on ProcArrayLock.
>
> At that time, many backend processes (I forgot the number) were acquiring
> and releasing share mode lock on ProcArrayLock, most of which were from
> TransactionIsInProgress().
>
>
> CAUSE
> 
>
> Going into the 9.2 code, I realized that those who request share mode
> don't pay attention to the wait queue.  That is, if some processes hold
> share mode lock and someone is waiting for exclusive mode in the wait
> queue, other processes who come later can get share mode overtaking those
> who are already waiting.  If many processes repeatedly request share mode,
> the waiters can't get exclusive mode for a long time.
>
> Is this intentional, or should we make the later share-lockers if someone
> is in the wait queue?
>

I've already observed such behavior, see [1].  I think there is currently
no consensus on how to fix it.  For instance, Andres expressed the opinion
that this shouldn't be fixed on the LWLock side [2].
FYI, I'm planning to pick up work on the CSN patch [3] for 10.0.  CSN
should fix various scalability issues including high ProcArrayLock
contention.

References.

   1.
   
http://www.postgresql.org/message-id/CAPpHfdsytkTFMy3N-zfSo+kAuUx=u-7jg6q2byb6fpuw2cd...@mail.gmail.com
   2.
   
http://www.postgresql.org/message-id/20151211130413.go14...@awork2.anarazel.de
   3.
   
http://www.postgresql.org/message-id/CA+CSw_tEpJ=md1zgxPkjH6CWDnTDft4gBi=+p9snoc+wy3p...@mail.gmail.com

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Autovacuum to prevent wraparound tries to consume xid

2016-05-22 Thread Alexander Korotkov
On Sun, May 22, 2016 at 12:39 PM, Amit Kapila <amit.kapil...@gmail.com>
wrote:

> On Mon, Mar 28, 2016 at 4:35 PM, Alexander Korotkov <
> a.korot...@postgrespro.ru> wrote:
>
>> Hackers,
>>
>> one our customer meet near xid wraparound situation.  xid counter
>> reached xidStopLimit value.  So, no transactions could be executed in
>> normal mode.  But what I noticed is strange behaviour of autovacuum to
>> prevent wraparound.  It vacuums tables, updates pg_class and pg_database,
>> but then falls with "database is not accepting commands to avoid wraparound
>> data loss in database" message.  We end up with situation that according to
>> pg_database  maximum age of database was less than 200 mln., but
>> transactions couldn't be executed, because ShmemVariableCache wasn't
>> updated (checked by gdb).
>>
>> I've reproduced this situation on my laptop as following:
>>
>> 1) Connect gdb, do "set ShmemVariableCache->nextXid =
>> ShmemVariableCache->xidStopLimit"
>> 2) Stop postgres
>> 3) Make some fake clog: "dd bs=1m if=/dev/zero
>> of=/usr/local/pgsql/data/pg_clog/07FF count=1024"
>> 4) Start postgres
>>
>> Then I found the same situation as in customer database.  Autovacuum to
>> prevent wraparound regularly produced following messages in the log:
>>
>> ERROR:  database is not accepting commands to avoid wraparound data loss
>> in database "template1"
>> HINT:  Stop the postmaster and vacuum that database in single-user mode.
>> You might also need to commit or roll back old prepared transactions.
>>
>> Finally all databases was frozen
>>
>> # SELECT datname, age(datfrozenxid) FROM pg_database;
>>   datname  │   age
>> ───┼──
>>  template1 │0
>>  template0 │0
>>  postgres  │ 5000
>> (3 rows)
>>
>> but no transactions could be executed (ShmemVariableCache wasn't updated).
>>
>> After some debugging I found that vac_truncate_clog consumes xid just to
>> produce warning.  I wrote simple patch which replaces
>> GetCurrentTransactionId() with ShmemVariableCache->nextXid.  That
>> completely fixes this situation for me: ShmemVariableCache was successfully
>> updated.
>>
>
> As per your latest patch, you are using ReadNewTransactionId() to get the
> nextXid which then is used to check if any database's frozenxid is already
> wrapped.  Now, isn't the value of nextXID in your patch same as
> lastSaneFrozenXid in most cases (I mean there is a small window where some
> new transaction might have started due to which the value of
> ShmemVariableCache->nextXid has been advanced)? So isn't relying on
> lastSaneFrozenXid check sufficient?
>

Hmm... So, this code already contains a comparison with lastSaneFrozenXid.
Thus, the current code compares against both lastSaneFrozenXid and myXID.
I found no comment clarifying why this should be so.  In my opinion we can
just remove myXID along with its checks.  Git shows that Tom Lane committed
the lastSaneFrozenXid and lastSaneMinMulti checks in addition to the myXID
check in 78db307b.

Tom, what do you think?  Could we remove myXID from vac_truncate_clog()?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Autovacuum to prevent wraparound tries to consume xid

2016-05-19 Thread Alexander Korotkov
On Mon, Mar 28, 2016 at 2:05 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> After some debugging I found that vac_truncate_clog consumes xid just to
> produce warning.  I wrote simple patch which replaces
> GetCurrentTransactionId() with ShmemVariableCache->nextXid.  That
> completely fixes this situation for me: ShmemVariableCache was successfully
> updated.
>

I found that reading ShmemVariableCache->nextXid directly is not correct;
it's better to use ReadNewTransactionId() instead.  A fixed version of the
patch is attached.
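
The essence of the fix, sketched in simplified form (the real
vac_truncate_clog() scans pg_database; dbform stands for the current tuple
of that scan, and the WARNING shown is the existing one):

    /* Read the next XID without consuming one, unlike
     * GetCurrentTransactionId(). */
    TransactionId nextXID = ReadNewTransactionId();

    if (TransactionIdPrecedes(nextXID, dbform->datfrozenxid))
        frozenAlreadyWrapped = true;

    ...

    if (frozenAlreadyWrapped)
        ereport(WARNING,
                (errmsg("some databases have not been vacuumed in over 2 billion transactions"),
                 errdetail("You might have already suffered transaction-wraparound data loss.")));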

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


fix_vac_truncate_clog_xid_consume_2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN logging GIN_SEGMENT_UNMODIFIED actions?

2016-05-10 Thread Alexander Korotkov
Hi!

On Mon, May 9, 2016 at 10:46 PM, Andres Freund <and...@anarazel.de> wrote:

> trying to debug something I saw the following in pg_xlogdump output:
>
> rmgr: Gin len (rec/tot):  0/   274, tx:  0, lsn:
> 1C/DF28AEB0, prev 1C/DF289858, desc: VACUUM_DATA_LEAF_PAGE  3 segments: 5
> unknown action 0 ???, blkref #0: rel 1663/16384/16435 blk 310982
>
> note the "segments: 5 unknown action 0 ???" bit.  That doesn't seem
> right, given:
> #define GIN_SEGMENT_UNMODIFIED  0   /* no action (not used in
> WAL records) */
>

I've checked the GIN code.  I have no idea how such a WAL record could be
generated...
The only idea I have is to add a check that we're inserting a valid WAL
record immediately before XLogInsert().
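
Something along these lines, for example in the vacuum-leaf-page path
('actions' and 'nactions' are stand-in names, not the actual gindatapage.c
variables; the GIN_SEGMENT_* constants and the record type are real):

    #ifdef USE_ASSERT_CHECKING
        {
            int         i;

            /* Fail at insert time instead of later in pg_xlogdump/redo. */
            for (i = 0; i < nactions; i++)
                Assert(actions[i] != GIN_SEGMENT_UNMODIFIED &&
                       actions[i] <= GIN_SEGMENT_ADDITEMS);
        }
    #endif
        recptr = XLogInsert(RM_GIN_ID, XLOG_GIN_VACUUM_DATA_LEAF_PAGE);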

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] atomic pin/unpin causing errors

2016-05-04 Thread Alexander Korotkov
On Wed, May 4, 2016 at 2:05 AM, Andres Freund <and...@anarazel.de> wrote:

> On 2016-04-29 10:38:55 -0700, Jeff Janes wrote:
> > I've bisected the errors I was seeing, discussed in
> >
> http://www.postgresql.org/message-id/CAMkU=1xqehc0ok4d+tkjfq1nvuho37pyrkhjp6q8oxifmx7...@mail.gmail.com
> >
> > It look like they first appear in:
> >
> > commit 48354581a49c30f5757c203415aa8412d85b0f70
> > Author: Andres Freund <and...@anarazel.de>
> > Date:   Sun Apr 10 20:12:32 2016 -0700
> >
> > Allow Pin/UnpinBuffer to operate in a lockfree manner.
> >
> >
> > I get the errors:
> >
> > ERROR:  attempted to delete invisible tuple
> > STATEMENT:  update foo set count=count+1,text_array=$1 where text_array
> @> $2
> >
> > And also:
> >
> > ERROR:  unexpected chunk number 1 (expected 2) for toast value
> > 85223889 in pg_toast_16424
> > STATEMENT:  update foo set count=count+1 where text_array @> $1
>
> Hm. I appear to have trouble reproducing this issue (continuing to try)
> on master as of 8826d8507.  Is there any chance you could package up a
> data directory after the issue hit?
>

FWIW, I'm also trying to reproduce it on a big x86 machine at 9888b34f.
I'll report the results when I get any.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] snapshot too old, configured by time

2016-04-25 Thread Alexander Korotkov
On Sat, Apr 23, 2016 at 5:20 PM, Bruce Momjian <br...@momjian.us> wrote:

> On Sat, Apr 23, 2016 at 12:48:08PM +0530, Amit Kapila wrote:
> > On Sat, Apr 23, 2016 at 8:34 AM, Bruce Momjian <br...@momjian.us> wrote:
> > >
> > > I kind of agreed with Tom about just aborting transactions that held
> > > snapshots for too long, and liked the idea this could be set per
> > > session, but the idea that we abort only if a backend actually touches
> > > the old data is very nice.  I can see why the patch author worked hard
> > > to do that.
> > >
> > > How does/did Oracle handle this?
> > >
> >
> > IIRC then Oracle gives this error when the space in undo tablespace (aka
> > rollback segment) is low.  When the rollback segment gets full, it
> overwrites
> > the changed data which might be required by some old snapshot and when
> that old
> > snapshot statement tries to access the data (which is already
> overwritten), it
> > gets "snapshot too old" error.  Assuming there is enough space in
> rollback
> > segment, Oracle seems to provide a way via Alter System set
> undo_retention =
> > .
> >
> > Now, if the above understanding of mine is correct, then I think the
> current
> > implementation done by Kevin is closer to what Oracle provides.
>
> But does the rollback only happen if the long-running Oracle transaction
> tries to _access_ specific data that was in the undo segment, or _any_
> data that potentially could have been in the undo segment?  If the
> later, it seems Kevin's approach is better because you would have to
> actually need to access old data that was there to be canceled, not just
> any data that could have been overwritten based on the xid.
>

I'm not sure that we should rely that much on Oracle's behavior.  It has a
very different MVCC model.
Thus we can't apply the same features one-by-one: they would have different
pros and cons for us.

Also, it seems we have similar behavior already in applying WAL on the
> standby --- we delay WAL replay when there is a long-running
> transaction.  Once the time expires, we apply the WAL.  Do we cancel the
> long-running transaction at that time, or wait for the long-running
> transaction to touch some WAL we just applied?  If the former, does
> Kevin's new code allow us to do the later?


That makes sense to me.  If we could improve read-only queries on slaves
this way, Kevin's new code would become much more justified.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-25 Thread Alexander Korotkov
On Sun, Apr 17, 2016 at 7:32 PM, Amit Kapila <amit.kapil...@gmail.com>
wrote:

> On Thu, Apr 14, 2016 at 8:05 AM, Andres Freund <and...@anarazel.de> wrote:
> >
> > On 2016-04-14 07:59:07 +0530, Amit Kapila wrote:
> > > What you want to see by prewarming?
> >
> > Prewarming appears to greatly reduce the per-run variance on that
> > machine, making it a lot easier to get meaningful results.
> >
>
> I think you are referring the tests done by Robert on power-8 m/c, but the
> performance results I have reported were on intel x86.  In last two days, I
> have spent quite some effort to do the performance testing of this patch
> with pre-warming by using the same query [1] as used by Robert in his
> tests.  The tests are done such that first it start server, pre-warms the
> relations, ran read-only test, stop server, again repeat this for next test.
>

What did you include in a single run: a test of a single version (HEAD or
patch) or tests of both of them?


> I have observed that the variance in run-to-run performance still occurs
> especially at higher client count (128).  Below are results for 128 client
> count both when the tests ran first with patch and then with HEAD and vice
> versa.
>
> Test-1
> --
> client count - 128 (basically -c 128 -j 128)
>
> first tests ran with patch and then with HEAD
>
> Patch_ver/Runs HEAD (commit -70715e6a) Patch
> Run-1 156748 174640
> Run-2 151352 150115
> Run-3 177940 165269
>
>
> Test-2
> --
> client count - 128 (basically -c 128 -j 128)
>
> first tests ran with HEAD and then with patch
>
> Patch_ver/Runs HEAD (commit -70715e6a) Patch
> Run-1 173063 151282
> Run-2 173187 140676
> Run-3 177046 166726
>
> I think this patch (padding pgxact) certainly is beneficial as reported
> above thread. At very high client count some variation in performance is
> observed with and without patch, but I feel in general it is a win.
>

So, what hardware did you use for these tests: power-8 or x86?  How long
was a single run?
The per-run variation seems quite high.  It also seems to depend on which
version runs first.  But that could be a coincidence.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Ordering in guc.c vs. config.sgml vs. postgresql.sample.conf

2016-04-25 Thread Alexander Korotkov
On Sun, Apr 24, 2016 at 9:58 PM, Andres Freund <and...@anarazel.de> wrote:

> While working on
>
> http://www.postgresql.org/message-id/cabuevezwma9y+bp4fi4fe4hmpfzmjozomulvtbhhpwtcujr...@mail.gmail.com
> I once more taken aback by the total lack of consistency between the
> three files in $subject. Some of the inconsistency of guc.c vs. the rest
> comes from having separate lists for different datatypes - hard to avoid
> - but even disregarding that, there seems to be little to no
> consistency.
>
> How about we try to order them the same? That's obviously not a 9.6
> topic at this point, but it'd probably be good to that early in 9.7.
>

+1

Also, what do you think about a validation script which checks consistency
among them?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Declarative partitioning

2016-04-19 Thread Alexander Korotkov
On Tue, Apr 19, 2016 at 4:57 PM, Ildar Musin <i.mu...@postgrespro.ru> wrote:

> On 15.04.2016 07:35, Amit Langote wrote:
>
>> Thanks a lot for the comments. The patch set changed quite a bit since
>> the last version. Once the CF entry was marked returned with feedback on
>> March 22, I held off sending the new version at all. Perhaps, it would have
>> been OK. Anyway here it is, if you are interested. I will create an entry
>> in CF 2016-09 for the same. Also, see below replies to you individual
>> comments.
>>
>
> Thanks for your new patch! I've tried it and discovered some strange
> behavior for partitioning by composite key. Here is an example of my setup:
>
> create table test(a int, b int) partition by range (a, b);
> create table test_1 partition of test for values start (0, 0) end (100,
> 100);
> create table test_2 partition of test for values start (100, 100) end
> (200, 200);
> create table test_3 partition of test for values start (200, 200) end
> (300, 300);
>
> It's alright so far. But if we try to insert record in which attribute 'a'
> belongs to one partition and attribute 'b' belongs to another then record
> will be inserted in the first one:
>
> insert into test(a, b) values (150, 50);
>
> select tableoid::regclass, * from test;
>  tableoid |  a  | b
> --+-+
>  test_2   | 150 | 50
> (1 row)


That's how composite keys work.  The first subkey is checked; if it's
equal, then the second subkey is checked, and so on.

# SELECT (100, 100) < (150, 50), (150, 50) < (200, 200);
 ?column? | ?column?
----------+----------
 t        | t
(1 row)

Another question is whether that is what users expect.  From the syntax
side it looks very much like defining box-like regions over the two keys,
which could be a replacement for subpartitioning.  But that isn't so.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-14 Thread Alexander Korotkov
On Thu, Apr 14, 2016 at 5:35 AM, Andres Freund <and...@anarazel.de> wrote:

> On 2016-04-14 07:59:07 +0530, Amit Kapila wrote:
> > What you want to see by prewarming?
>
> Prewarming appears to greatly reduce the per-run variance on that
> machine, making it a lot easier to get meaningful results.  Thus it'd
> make it easier to compare pre/post padding numbers.
>
> > Will it have safe effect, if the tests are run for 10 or 15 mins
> > rather than 5 mins?
>
> s/safe/same/? If so, no, I doesn't look that way. The order of buffers
> appears to play a large role; and it won't change in a memory resident
> workload over one run.
>

I've run a read-only benchmark of pad_pgxact_v1.patch on a 4x18 Intel
machine.  The results are as follows.

clients no-pad  pad
1   12997   13381
2   26728   25645
4   52539   51738
8   103785  102337
10  132606  126174
20  255844  252143
30  371359  357629
40  450429  443053
50  497705  488250
60  564385  564877
70  718860  725149
80  934170  931218
90  1152961 1146498
100 1240055 1300369
110 1207727 1375706
120 1167681 1417032
130 1120891 1448408
140 1085904 1449027
150 1022160 1437545
160 976487  1441720
170 978120  1435848
180 953843  1414925

The snapshot_too_old patch was reverted in both cases.
At high client counts padding gives a very significant benefit.  However,
at low client counts there is a small regression.  I think this regression
could be caused by the alignment of other data structures.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-04-14 Thread Alexander Korotkov
On Thu, Apr 14, 2016 at 12:23 AM, Andres Freund <and...@anarazel.de> wrote:

> On 2016-04-13 16:05:25 -0500, Kevin Grittner wrote:
> > OK, thanks.  I can't think of anything else to ask for at this
> > point.  If you feel that you have enough to press for some
> > particular course of action, go for it.
>
> I think we, at the very least, need a clear proposal how to resolve the
> scalability issue around OldSnapshotTimeMapLock in 9.6.  Personally I
> think we shouldn't release with such a large regression due to a
> performance oriented feature; but if we do, we need to be confident that
> we can easily resolve it for 9.7. In contrast to the spinlock issue I
> don't see an easy way unfortunately. Without such a plan it seems too
> likely to go unfixed for a long time otherwise.
>
>
> > Personally, I want to do some more investigation on those big
> > machines.
>
> Sounds good, especially around the regression with the feature disabled.
>

I've also run a read-only test on the 4x18 Intel machine, comparing master
against master with snapshot_too_old reverted.  In particular, I've
reverted the following commits:
8b65cf4c5edabdcae45ceaef7b9ac236879aae50
848ef42bb8c7909c9d7baa38178d4a209906e7c1
80647bf65a03e232c995c0826ef394dad8d685fe
a6f6b78196a701702ec4ff6df56c346bdcf9abd2
2201d801b03c2d1b0bce4d6580b718dc34d38b3e

I've obtained the following results.

clients master  sto-reverted
1   13918   12997
2   26143   26728
4   50521   52539
8   104330  103785
10  129067  132606
20  255561  255844
30  368472  371359
40  86  450429
50  489950  497705
60  563606  564385
70  710579  718860
80  916480  934170
90  1089917 1152961
100 1201337 1240055
110 1147208 1207727
120 1116256 1167681
130 1066475 1120891
140 1040379 1085904
150 974064  1022160
160 938396  976487
170 953636  978120
180 920772  953843

We can see a small but definite regression caused by the snapshot-too-old
feature.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-13 Thread Alexander Korotkov
On Tue, Apr 12, 2016 at 5:12 PM, Amit Kapila <amit.kapil...@gmail.com>
wrote:

> On Tue, Apr 12, 2016 at 3:48 PM, Alexander Korotkov <
> a.korot...@postgrespro.ru> wrote:
>
>> On Tue, Apr 12, 2016 at 12:40 AM, Andres Freund <and...@anarazel.de>
>> wrote:
>>
>>> I did get access to the machine (thanks!). My testing shows that
>>> performance is sensitive to various parameters influencing memory
>>> allocation. E.g. twiddling with max_connections changes
>>> performance. With max_connections=400 and the previous patches applied I
>>> get ~122 tps, with 402 ~162 tps.  This sorta confirms that we're
>>> dealing with an alignment/sharing related issue.
>>>
>>> Padding PGXACT to a full cache-line seems to take care of the largest
>>> part of the performance irregularity. I looked at perf profiles and saw
>>> that most cache misses stem from there, and that the percentage (not
>>> absolute amount!) changes between fast/slow settings.
>>>
>>> To me it makes intuitive sense why you'd want PGXACTs to be on separate
>>> cachelines - they're constantly dirtied via SnapshotResetXmin(). Indeed
>>> making it immediately return propels performance up to 172, without
>>> other changes. Additionally cacheline-padding PGXACT speeds things up to
>>> 175 tps.
>>>
>>
>> It seems like padding PGXACT to a full cache-line is a great
>> improvement.  We have not so many PGXACTs to care about bytes wasted to
>> padding.
>>
>
> Yes, it seems generally it is a good idea, but not sure if it is a
> complete fix for variation in performance we are seeing when we change
> shared memory structures.  Andres suggested me on IM to take performance
> data on x86 m/c by padding PGXACT and the data for the same is as below:
>
> median of 3, 5-min runs
>
> Client_Count/Patch_ver 8 64 128
> HEAD 59708 329560 173655
> PATCH 61480 379798 157580
>
> Here, at 128 client-count the performance with patch still seems to have
> variation.  The highest tps with patch (170363) is close to HEAD (175718).
> This could be run-to-run variation, but I think it indicates that there are
> more places where we might need such padding or may be optimize them, so
> that they are aligned.
>
> I can do some more experiments on similar lines, but I am out on vacation
> and might not be able to access the m/c for 3-4 days.
>

Could you share details of the hardware you used?  I could try to find
something similar to reproduce this.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Some other things about contrib/bloom and generic_xlog.c

2016-04-12 Thread Alexander Korotkov
On Tue, Apr 12, 2016 at 6:42 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> Alexander Korotkov <a.korot...@postgrespro.ru> writes:
> > I agree with both of these ideas. Patch is attached.
>
> Pushed with minor cleanup.
>

Great, thanks!

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Some other things about contrib/bloom and generic_xlog.c

2016-04-12 Thread Alexander Korotkov
On Tue, Apr 12, 2016 at 6:34 PM, Teodor Sigaev <teo...@sigaev.ru> wrote:

> I agree with both of these ideas. Patch is attached.
>>
>
> Hmm, you accidentally forget to change flag type of GenericXLogRegister in
> contrib/bloom signature.


Fixed.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


generic-xlog-register-2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Some other things about contrib/bloom and generic_xlog.c

2016-04-12 Thread Alexander Korotkov
On Tue, Apr 12, 2016 at 3:33 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> ... BTW, with respect to the documentation angle, it seems to me
> that it'd be better if GenericXLogRegister were renamed to
> GenericXLogRegisterBuffer, or perhaps GenericXLogRegisterPage.
> I think this would make the documentation clearer, and it would
> also make it easier to add other sorts of Register actions later,
> if we ever think of some (which seems not unlikely, really).
>
> Another thing to think about is whether we're going to regret
> hard-wiring the third argument as a boolean.  Should we consider
> making it a bitmask of flags, instead?  It's not terribly hard
> to think of other flags we might want there in future; for example
> maybe something to tell GenericXLogFinish whether it's worth trying
> to identify data movement on the page rather than just doing the
> byte-by-byte delta calculation.


I agree with both of these ideas. Patch is attached.
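
With the patch, a caller would look roughly as below; GENERIC_XLOG_FULL_IMAGE
is the first flag defined for the new bitmask (a sketch of the intended
usage, not quoted from the patch):

    GenericXLogState *state = GenericXLogStart(index);
    Page        page;

    /* renamed call; the third argument is now a bitmask of flags */
    page = GenericXLogRegisterBuffer(state, buffer, GENERIC_XLOG_FULL_IMAGE);

    /* ... modify the returned working copy of the page ... */

    GenericXLogFinish(state);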

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


generic-xlog-register.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Some other things about contrib/bloom and generic_xlog.c

2016-04-12 Thread Alexander Korotkov
On Sun, Apr 10, 2016 at 7:49 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> 1. It doesn't seem like generic_xlog.c has thought very carefully about
> the semantics of the "hole" between pd_lower and pd_upper.  The mainline
> XLOG code goes to some lengths to ensure that the hole stays all-zeroes;
> for example RestoreBlockImage() explicitly zeroes the hole when restoring
> from a full-page image that has a hole.  But generic_xlog.c's redo routine
> does not do anything comparable, nor does GenericXLogFinish make any
> effort to ensure that the "hole" is all-zeroes after normal application of
> a generic update.  The reason this is of interest is that it means the
> contents of the "hole" could diverge between master and slave, or differ
> between the original state of a database and what it is after a crash and
> recovery.  That would at least complicate forensic comparisons of pages,
> and I think it might also break checksumming.  We thought that this was
> important enough to take the trouble of explicitly zeroing holes during
> mainline XLOG replay.  Shouldn't generic_xlog.c take the same trouble?
>

The attached patch is intended to fix this.  It zeroes the "hole" in both
GenericXLogFinish() and generic_redo().
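
The core of it is just the following (a sketch of the idea rather than the
patch text itself):

    /* Zero the pd_lower..pd_upper "hole" so its contents cannot diverge
     * between master and standby, or across crash recovery. */
    PageHeader  phdr = (PageHeader) page;

    if (phdr->pd_lower < phdr->pd_upper)
        memset((char *) page + phdr->pd_lower, 0,
               phdr->pd_upper - phdr->pd_lower);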

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


generic-xlog-zero-gap.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Some other things about contrib/bloom and generic_xlog.c

2016-04-12 Thread Alexander Korotkov
On Tue, Apr 12, 2016 at 3:08 AM, Michael Paquier <michael.paqu...@gmail.com>
wrote:

> What I thought we should be able to do with that should not be only
> limited to indexes, aka to:
> - Be able to register and log full pages
> - Be able to log data associated to a block
> - Be able to have record data
> - Use at recovery custom routines to apply those WAL records
> The first 3 ones are done via the set of routines generating WAL
> records for the rmgr RM_GENERIC_ID. The 4th one needs a hook in
> generic_redo. Looking at the thread where this patch has been debated
> I am not seeing a discussion regarding that.
>

I've designed generic xlog under the following restrictions:
 - We don't want users to register their own redo functions, since that
could affect reliability. Unfortunately, I can't find the original
discussion now.
 - The API should be as simple as possible for extension authors.

However, I think we should add some way to have custom redo functions one
day.  If we add pluggable storage engines (I hope one day we will), then we
would face some engines which don't use our buffer manager.  For such
pluggable storage engines, the current generic WAL wouldn't work, but
user-defined redo functions would.

For now, my view is that we will need a kind of two-phase recovery in order
to provide user-defined redo functions:
1) In the first phase, all built-in objects are recovered.  After this
phase, we can use the system catalog.
2) In the second phase, user-defined redo functions are called on the basis
of a consistent system catalog.

I think we can implement such two-phase recovery even now on the basis of
generic logical messages.  We can create a logical slot.  When the
extension is loaded, it starts the second phase of recovery by reading
generic logical messages from this logical slot.  The logical slot position
should also be advanced on checkpoint.
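
The building block for this already exists in 9.6: LogLogicalMessage() from
replication/message.h.  A sketch of how an extension might emit its redo
payload ("my_ext" and the wrapper function are made-up names):

    #include "replication/message.h"

    XLogRecPtr
    my_ext_emit_redo(const char *payload, size_t len)
    {
        /* transactional = false: decodable regardless of the surrounding
         * transaction's fate, which suits redo-style payloads */
        return LogLogicalMessage("my_ext", payload, len, false);
    }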

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-12 Thread Alexander Korotkov
On Tue, Apr 12, 2016 at 12:40 AM, Andres Freund <and...@anarazel.de> wrote:

> I did get access to the machine (thanks!). My testing shows that
> performance is sensitive to various parameters influencing memory
> allocation. E.g. twiddling with max_connections changes
> performance. With max_connections=400 and the previous patches applied I
> get ~122 tps, with 402 ~162 tps.  This sorta confirms that we're
> dealing with an alignment/sharing related issue.
>
> Padding PGXACT to a full cache-line seems to take care of the largest
> part of the performance irregularity. I looked at perf profiles and saw
> that most cache misses stem from there, and that the percentage (not
> absolute amount!) changes between fast/slow settings.
>
> To me it makes intuitive sense why you'd want PGXACTs to be on separate
> cachelines - they're constantly dirtied via SnapshotResetXmin(). Indeed
> making it immediately return propels performance up to 172, without
> other changes. Additionally cacheline-padding PGXACT speeds things up to
> 175 tps.
>

It seems like padding PGXACT to a full cache line is a great improvement.
We don't have so many PGXACTs that we need to care about the bytes wasted
on padding.  But could it have another negative side effect?
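
For illustration, the padding could be as simple as the union sketch below
(assuming PG_CACHE_LINE_SIZE from pg_config_manual.h, following the
LWLockPadded pattern; the actual patch may lay it out differently):

    /* One PGXACT per cache line, so SnapshotResetXmin() dirtying one
     * backend's entry doesn't invalidate its neighbors' cache lines. */
    typedef union PaddedPGXACT
    {
        PGXACT      pgxact;
        char        pad[PG_CACHE_LINE_SIZE];
    } PaddedPGXACT;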

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-11 Thread Alexander Korotkov
On Mon, Apr 11, 2016 at 5:04 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Mon, Apr 11, 2016 at 8:10 AM, Andres Freund <and...@anarazel.de> wrote:
>
>> Could you retry after applying the attached series of patches?
>>
>
> Yes, I will try with these patches and snapshot too old reverted.
>

I've run the same benchmark with 279d86af and 848ef42b reverted.  I've
tested all 3 patches from you applied and, for comparison, the 3 patches
plus clog buffers reverted back to 32.

clients patches patches + clog_32
1 12594   12556
2 26705   26258
4 50985   53254
8103234  104416
10   135321  130893
20   268675  267648
30   370437  409710
40   486512  482382
50   539910  525667
60   616401  672230
70   667864  660853
80   924606  737768
90  1217435  799581
100 1326054  863066
110 1446380  980206
120 1484920 1000963
130 1512440 1058852
140 1536181 1088958
150 1504750 1134354
160 1461513 1132173
170 1453943 1158656
180 1426288 1120511

I can hardly understand how clog buffers influence a read-only benchmark.
It's even harder for me to understand why the influence of clog buffers
changes direction after applying your patches.  But the results are as
shown above.  And I've rechecked some values manually to verify that there
is no error.  I would be very thankful for any explanation.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [COMMITTERS] pgsql: Add the "snapshot too old" feature

2016-04-11 Thread Alexander Korotkov
On Mon, Apr 11, 2016 at 4:27 PM, Alvaro Herrera <alvhe...@2ndquadrant.com>
wrote:

> Alexander Korotkov wrote:
> > Kevin,
> >
> > This commit makes me very uneasy.  I didn't much care about this patch
> > mainly because I didn't imagine its consequences. Now, I see following:
> >
> > 1) We uglify buffer manager interface a lot.  This patch adds 3 more
> > arguments to BufferGetPage().  It looks weird.  Should we try to find
> less
> > invasive way for doing this?
>
> Kevin's patch was much less invasive originally.  It became invasive in
> the course of later review -- there was fear that future code would
> "forget" the snapshot check accidentally, which would have disastrous
> effects (data becomes invisible without notice).
>

OK, I will read that thread and try to verify these thoughts.


> > 2) Did you ever try to examine performance regression?  I tried simple
> read
> > only test on 4x18 Intel server.
> > pgbench -s 1000 -c 140 -j 100 -M prepared -S -T 300 -P 1 postgres (data
> > fits to shared_buffers)
> >
> > master-   193 740.3 TPS
> > snapshot too old reverted - 1 065 732.6 TPS
> >
> > So, for read-only benchmark this patch introduces more than 5 times
> > regression on big machine.
>
> Wow, this is terrible, but the patch is supposed to have no impact when
> the feature is not in use.  Maybe there's some simple oversight that can
> be fixed.


Perf shows that 50% of the time is spent in s_lock(), called from
GetXLogInsertRecPtr(), called from GetSnapshotData().  I think at the very
least we should surround that code with an "if" checking that "snapshot too
old" is enabled.
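
That is, something along these lines (a sketch; old_snapshot_threshold is
the feature's real GUC, with -1 meaning disabled):

    /* Take the XLOG insert position -- and its spinlock -- only when
     * the feature is actually turned on. */
    if (old_snapshot_threshold >= 0)
        snapshot->lsn = GetXLogInsertRecPtr();
    else
        snapshot->lsn = InvalidXLogRecPtr;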

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Some other things about contrib/bloom and generic_xlog.c

2016-04-11 Thread Alexander Korotkov
Hi, Tom!

On Sun, Apr 10, 2016 at 7:49 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> 1. It doesn't seem like generic_xlog.c has thought very carefully about
> the semantics of the "hole" between pd_lower and pd_upper.  The mainline
> XLOG code goes to some lengths to ensure that the hole stays all-zeroes;
> for example RestoreBlockImage() explicitly zeroes the hole when restoring
> from a full-page image that has a hole.  But generic_xlog.c's redo routine
> does not do anything comparable, nor does GenericXLogFinish make any
> effort to ensure that the "hole" is all-zeroes after normal application of
> a generic update.  The reason this is of interest is that it means the
> contents of the "hole" could diverge between master and slave, or differ
> between the original state of a database and what it is after a crash and
> recovery.  That would at least complicate forensic comparisons of pages,
> and I think it might also break checksumming.  We thought that this was
> important enough to take the trouble of explicitly zeroing holes during
> mainline XLOG replay.  Shouldn't generic_xlog.c take the same trouble?
>
> 2. Unless I'm missing something, contrib/bloom is making no effort
> to ensure that bloom index pages can be distinguished from other pages
> by pg_filedump.  That's fine if your expectation is that bloom will always
> be a toy with no use in production; but otherwise, not so much.  You need
> to make sure that the last two bytes of the page special space contain a
> uniquely identifiable code; compare spgist_page_id in SPGiST indexes.
>

Thank you for spotting these issues.
I'm going to prepare patches to fix both of them.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-11 Thread Alexander Korotkov
On Mon, Apr 11, 2016 at 8:10 AM, Andres Freund <and...@anarazel.de> wrote:

> On 2016-04-10 09:03:37 +0300, Alexander Korotkov wrote:
> > On Sun, Apr 10, 2016 at 8:36 AM, Alexander Korotkov <
> > a.korot...@postgrespro.ru> wrote:
> >
> > > On Sat, Apr 9, 2016 at 10:49 PM, Andres Freund <and...@anarazel.de>
> wrote:
> > >
> > >>
> > >>
> > >> On April 9, 2016 12:43:03 PM PDT, Andres Freund <and...@anarazel.de>
> > >> wrote:
> > >> >On 2016-04-09 22:38:31 +0300, Alexander Korotkov wrote:
> > >> >> There are results with 5364b357 reverted.
> > >> >
> > >> >Crazy that this has such a negative impact. Amit, can you reproduce
> > >> >that? Alexander, I guess for r/w workload 5364b357 is a benefit on
> that
> > >> >machine as well?
> > >>
> > >> How sure are you about these measurements?
> > >
> > >
> > > I'm pretty sure.  I've retried it multiple times by hand before re-run
> the
> > > script.
> > >
> > >
> > >> Because there really shouldn't be clog lookups one a steady state is
> > >> reached...
> > >>
> > >
> > > Hm... I'm also surprised. There shouldn't be clog lookups once hint
> bits
> > > are set.
> > >
> >
> > I also tried to run perf top during pgbench and get some interesting
> > results.
> >
> > Without 5364b357:
> >5,69%  postgres [.] GetSnapshotData
> >4,47%  postgres [.] LWLockAttemptLock
> >3,81%  postgres [.] _bt_compare
> >3,42%  postgres [.] hash_search_with_hash_value
> >3,08%  postgres [.] LWLockRelease
> >2,49%  postgres [.] PinBuffer.isra.3
> >1,58%  postgres [.] AllocSetAlloc
> >1,17%  [kernel] [k] __schedule
> >1,15%  postgres [.] PostgresMain
> >1,13%  libc-2.17.so [.] vfprintf
> >1,01%  libc-2.17.so [.] __memcpy_ssse3_back
> >
> > With 5364b357:
> >   18,54%  postgres [.] GetSnapshotData
> >3,45%  postgres [.] LWLockRelease
> >3,27%  postgres [.] LWLockAttemptLock
> >3,21%  postgres [.] _bt_compare
> >2,93%  postgres [.] hash_search_with_hash_value
> >    2,00%  postgres [.] PinBuffer.isra.3
> >1,32%  postgres [.] AllocSetAlloc
> >1,10%  libc-2.17.so [.] vfprintf
> >
> > Very surprising.  It appears that after 5364b357, GetSnapshotData
> consumes
> > more time.  But I can't see anything depending on clog buffers
> > in GetSnapshotData code...
>
> Could you retry after applying the attached series of patches?
>

Yes, I will try with these patches and snapshot too old reverted.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-11 Thread Alexander Korotkov
On Sun, Apr 10, 2016 at 2:24 PM, Amit Kapila <amit.kapil...@gmail.com>
wrote:

> On Sun, Apr 10, 2016 at 11:33 AM, Alexander Korotkov <
> a.korot...@postgrespro.ru> wrote:
>
>> On Sun, Apr 10, 2016 at 8:36 AM, Alexander Korotkov <
>> a.korot...@postgrespro.ru> wrote:
>>
>>> On Sat, Apr 9, 2016 at 10:49 PM, Andres Freund <and...@anarazel.de>
>>> wrote:
>>>
>>>>
>>>>
>>>> On April 9, 2016 12:43:03 PM PDT, Andres Freund <and...@anarazel.de>
>>>> wrote:
>>>> >On 2016-04-09 22:38:31 +0300, Alexander Korotkov wrote:
>>>> >> There are results with 5364b357 reverted.
>>>> >
>>>> >Crazy that this has such a negative impact. Amit, can you reproduce
>>>> >that? Alexander, I guess for r/w workload 5364b357 is a benefit on that
>>>> >machine as well?
>>>>
>>>> How sure are you about these measurements?
>>>
>>>
>>> I'm pretty sure.  I've retried it multiple times by hand before re-run
>>> the script.
>>>
>>>
>>>> Because there really shouldn't be clog lookups one a steady state is
>>>> reached...
>>>>
>>>
>>> Hm... I'm also surprised. There shouldn't be clog lookups once hint bits
>>> are set.
>>>
>>
>> I also tried to run perf top during pgbench and get some interesting
>> results.
>>
>> Without 5364b357:
>>5,69%  postgres [.] GetSnapshotData
>>4,47%  postgres [.] LWLockAttemptLock
>>3,81%  postgres [.] _bt_compare
>>3,42%  postgres [.] hash_search_with_hash_value
>>3,08%  postgres [.] LWLockRelease
>>2,49%  postgres [.] PinBuffer.isra.3
>>1,58%  postgres [.] AllocSetAlloc
>>1,17%  [kernel] [k] __schedule
>>1,15%  postgres [.] PostgresMain
>>1,13%  libc-2.17.so [.] vfprintf
>>1,01%  libc-2.17.so [.] __memcpy_ssse3_back
>>
>> With 5364b357:
>>   18,54%  postgres [.] GetSnapshotData
>>3,45%  postgres [.] LWLockRelease
>>3,27%  postgres [.] LWLockAttemptLock
>>3,21%  postgres [.] _bt_compare
>>2,93%  postgres [.] hash_search_with_hash_value
>>2,00%  postgres [.] PinBuffer.isra.3
>>1,32%  postgres [.] AllocSetAlloc
>>1,10%  libc-2.17.so [.] vfprintf
>>
>> Very surprising.  It appears that after 5364b357, GetSnapshotData
>> consumes more time.  But I can't see anything depending on clog buffers
>> in GetSnapshotData code...
>>
>
> There is a related fact presented by Mithun C Y as well [1] which suggests
> that Andres's idea of reducing the cost of snapshot shows noticeable gain
> after increasing the clog buffers.  If you read that thread you will notice
> that initially we didn't notice much gain by that idea, but with increased
> clog buffers, it started showing noticeable gain.  If by any chance, you
> can apply that patch and see the results (latest patch is at [2]).
>
>
> [1] -
> http://www.postgresql.org/message-id/CAD__Ouic1Tvnwqm6Wf6j7Cz1Kk1DQgmy0isC7=ogx+3jtfg...@mail.gmail.com
>
> [2] -
> http://www.postgresql.org/message-id/cad__ouiwei5she2wwqck36ac9qmhvjuqg3cfpn+ofcmb7rd...@mail.gmail.com
>

I took a look at this thread but I still don't get why the number of clog
buffers affects a read-only benchmark.
Could you please explain it to me in more detail?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [COMMITTERS] pgsql: Add the "snapshot too old" feature

2016-04-11 Thread Alexander Korotkov
Kevin,

This commit makes me very uneasy.  I didn't pay much attention to this
patch, mainly because I didn't imagine its consequences.  Now, I see the
following:

1) We uglify the buffer manager interface a lot.  This patch adds 3 more
arguments to BufferGetPage().  It looks weird.  Should we try to find a
less invasive way of doing this?
2) Did you ever try to examine the performance regression?  I tried a
simple read-only test on a 4x18 Intel server.
pgbench -s 1000 -c 140 -j 100 -M prepared -S -T 300 -P 1 postgres (data
fits in shared_buffers)

master                    -   193 740.3 TPS
snapshot too old reverted - 1 065 732.6 TPS

So, for a read-only benchmark this patch introduces a more-than-5x
regression on a big machine.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [COMMITTERS] pgsql: Bloom index contrib module

2016-04-10 Thread Alexander Korotkov
On Sun, Apr 10, 2016 at 9:01 AM, Noah Misch <n...@leadboat.com> wrote:

> On Sat, Apr 09, 2016 at 10:08:01PM -0400, Tom Lane wrote:
> > I wrote:
> > > I was depressed, though not entirely surprised, to find that you get
> > > exactly that same line-count coverage if the table size is cut back
> > > to ONE row.
> >
> > Oh, I found the flaw in my testing: there are two INSERTs in the test
> > script and I was changing only one of them.  After correcting that,
> > the results behave a little more sanely:
> >
> >Line Coverage   Functions
> > 1 row: 70.4 % 349 / 496   93.1 %  27 / 29
> > 10 row:73.6 % 365 / 496   93.1 %  27 / 29
> > 100 rows:  73.6 % 365 / 496   93.1 %  27 / 29
> > 1000 rows: 75.4 % 374 / 496   93.1 %  27 / 29
> >
> > Still, we've reached the most coverage this test can give us at 1000
> > rows, which still means it's wasting the last 99% of its runtime.
>
> If dropping the row count to 1000 shaves >500ms on your primary machine, +1
> for committing such a row count change.  This is exactly what I meant by
> "someone identifies a way to realize similar coverage with lower duration."
> Thanks for contributing this study.


+1, row count reduction is a good way to reduce regression test time.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-10 Thread Alexander Korotkov
On Sun, Apr 10, 2016 at 8:36 AM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Sat, Apr 9, 2016 at 10:49 PM, Andres Freund <and...@anarazel.de> wrote:
>
>>
>>
>> On April 9, 2016 12:43:03 PM PDT, Andres Freund <and...@anarazel.de>
>> wrote:
>> >On 2016-04-09 22:38:31 +0300, Alexander Korotkov wrote:
>> >> There are results with 5364b357 reverted.
>> >
>> >Crazy that this has such a negative impact. Amit, can you reproduce
>> >that? Alexander, I guess for r/w workload 5364b357 is a benefit on that
>> >machine as well?
>>
>> How sure are you about these measurements?
>
>
> I'm pretty sure.  I've retried it multiple times by hand before re-run the
> script.
>
>
>> Because there really shouldn't be clog lookups one a steady state is
>> reached...
>>
>
> Hm... I'm also surprised. There shouldn't be clog lookups once hint bits
> are set.
>

I also tried running perf top during pgbench and got some interesting
results.

Without 5364b357:
   5,69%  postgres [.] GetSnapshotData
   4,47%  postgres [.] LWLockAttemptLock
   3,81%  postgres [.] _bt_compare
   3,42%  postgres [.] hash_search_with_hash_value
   3,08%  postgres [.] LWLockRelease
   2,49%  postgres [.] PinBuffer.isra.3
   1,58%  postgres [.] AllocSetAlloc
   1,17%  [kernel] [k] __schedule
   1,15%  postgres [.] PostgresMain
   1,13%  libc-2.17.so [.] vfprintf
   1,01%  libc-2.17.so [.] __memcpy_ssse3_back

With 5364b357:
  18,54%  postgres [.] GetSnapshotData
   3,45%  postgres [.] LWLockRelease
   3,27%  postgres [.] LWLockAttemptLock
   3,21%  postgres [.] _bt_compare
   2,93%  postgres [.] hash_search_with_hash_value
   2,00%  postgres [.] PinBuffer.isra.3
   1,32%  postgres [.] AllocSetAlloc
   1,10%  libc-2.17.so [.] vfprintf

Very surprising.  It appears that after 5364b357, GetSnapshotData consumes
more time.  But I can't see anything depending on clog buffers in the
GetSnapshotData code...

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-09 Thread Alexander Korotkov
On Sun, Apr 10, 2016 at 7:26 AM, Amit Kapila <amit.kapil...@gmail.com>
wrote:

> On Sun, Apr 10, 2016 at 1:13 AM, Andres Freund <and...@anarazel.de> wrote:
>
>> On 2016-04-09 22:38:31 +0300, Alexander Korotkov wrote:
>> > There are results with 5364b357 reverted.
>>
>>
> What exactly is this test?
> I think assuming it is a read-only -M prepared pgbench run where data fits
> in shared buffers.  However if you can share exact details, then I can try
> the similar test.
>

Yes, the test is:

pgbench -s 1000 -c $clients -j 100 -M prepared -S -T 300
(shared_buffers=24GB)


>
>> Crazy that this has such a negative impact. Amit, can you reproduce
>> that?
>
>
> I will try it.
>

Good.


>
>
>> Alexander, I guess for r/w workload 5364b357 is a benefit on that
>> machine as well?
>>
>
> I also think so. Alexander, if try read-write workload with unlogged
> tables, then we should see an improvement.
>

I'll try a read-write workload.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-09 Thread Alexander Korotkov
On Sat, Apr 9, 2016 at 10:49 PM, Andres Freund <and...@anarazel.de> wrote:

>
>
> On April 9, 2016 12:43:03 PM PDT, Andres Freund <and...@anarazel.de>
> wrote:
> >On 2016-04-09 22:38:31 +0300, Alexander Korotkov wrote:
> >> There are results with 5364b357 reverted.
> >
> >Crazy that this has such a negative impact. Amit, can you reproduce
> >that? Alexander, I guess for r/w workload 5364b357 is a benefit on that
> >machine as well?
>
> How sure are you about these measurements?


I'm pretty sure.  I've retried it multiple times by hand before re-running
the script.


> Because there really shouldn't be clog lookups one a steady state is
> reached...
>

Hm... I'm also surprised. There shouldn't be clog lookups once hint bits
are set.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-09 Thread Alexander Korotkov
On Sat, Apr 9, 2016 at 11:24 AM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Fri, Apr 8, 2016 at 10:19 PM, Alexander Korotkov <
> a.korot...@postgrespro.ru> wrote:
>
>> On Fri, Apr 8, 2016 at 7:39 PM, Andres Freund <and...@anarazel.de> wrote:
>>
>>> As you can see in
>>>
>>
>>> http://archives.postgresql.org/message-id/CA%2BTgmoaeRbN%3DZ4oWENLvgGLeHEvGZ_S_Z3KGrdScyKiSvNt3oA%40mail.gmail.com
>>> I'm planning to apply this sometime this weekend, after running some
>>> tests and going over the patch again.
>>>
>>> Any chance you could have a look over this?
>>>
>>
>> I took a look at this.  Changes you made look good for me.
>> I also run test on 4x18 Intel server.
>>
>
> On the top of current master results are following:
>
> clientsTPS
> 1 12562
> 2 25604
> 4 52661
> 8103209
> 10   128599
> 20   256872
> 30   365718
> 40   432749
> 50   513528
> 60   684943
> 70   696050
> 80   923350
> 90  1119776
> 100 1208027
> 110 1229429
> 120 1163356
> 130 1107924
> 140 1084344
> 150 1014064
> 160  961730
> 170  980743
> 180  968419
>
> The results are quite discouraging because previously we had about 1.5M
> TPS in the peak while we have only about 1.2M now.  I found that it's not
> related to the changes you made in the patch, but it's related to 5364b357
> "Increase maximum number of clog buffers".  I'm making same benchmark with
> 5364b357 reverted.
>

Here are the results with 5364b357 reverted.

clientsTPS
1  12980
2  27105
4  51969
8 105507
   10 132811
   20 256888
   30 368573
   40 467605
   50 544231
   60 590898
   70 799094
   80 967569
   901211662
  1001352427
  1101432561
  1201480324
  1301486624
  1401492092
  1501461681
  1601426733
  1701409081
  1801366199

It's much closer to what we had before.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-09 Thread Alexander Korotkov
On Fri, Apr 8, 2016 at 10:19 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Fri, Apr 8, 2016 at 7:39 PM, Andres Freund <and...@anarazel.de> wrote:
>
>> As you can see in
>>
>
>> http://archives.postgresql.org/message-id/CA%2BTgmoaeRbN%3DZ4oWENLvgGLeHEvGZ_S_Z3KGrdScyKiSvNt3oA%40mail.gmail.com
>> I'm planning to apply this sometime this weekend, after running some
>> tests and going over the patch again.
>>
>> Any chance you could have a look over this?
>>
>
> I took a look at this.  Changes you made look good for me.
> I also run test on 4x18 Intel server.
>

On top of the current master, the results are as follows:

clientsTPS
1 12562
2 25604
4 52661
8103209
10   128599
20   256872
30   365718
40   432749
50   513528
60   684943
70   696050
80   923350
90  1119776
100 1208027
110 1229429
120 1163356
130 1107924
140 1084344
150 1014064
160  961730
170  980743
180  968419

The results are quite discouraging, because previously we had about 1.5M TPS
at the peak, while we have only about 1.2M now.  I found that it's not
related to the changes you made in the patch; it's related to 5364b357
"Increase maximum number of clog buffers".  I'm running the same benchmark
with 5364b357 reverted.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] 2016-03 Commitfest

2016-04-08 Thread Alexander Korotkov
On Fri, Apr 8, 2016 at 6:52 PM, Robert Haas <robertmh...@gmail.com> wrote:

> On Fri, Apr 8, 2016 at 11:00 AM, Andres Freund <and...@anarazel.de> wrote:
> > On 2016-04-08 10:56:28 -0400, Robert Haas wrote:
> >> On Fri, Apr 8, 2016 at 10:06 AM, David Steele <da...@pgmasters.net>
> wrote:
> >> > On 4/8/16 10:00 AM, Tom Lane wrote:
> >> >> David Steele <da...@pgmasters.net> writes:
> >> >>> So the commitfest is 84% complete with less than twelve hours to go.
> >> >>
> >> >> Have we set a particular time-of-day for closing the CF, and if so
> >> >> what is it exactly?
> >> >
> >> > From the referenced email:
> >> >
> >> > "Accordingly, the release management has decided that all
> >> > feature patches destined for PostgreSQL 9.6 must be committed no later
> >> > than April 8, 2016.  Any patch not committed prior to 2016-04-09
> >> > 00:00:00 GMT may not be committed to PostgreSQL 9.6 unless (a) it is a
> >> > bug fix, (b) it represents essential cleanup of a previously-committed
> >> > patch, or (c) the release management team has approved an extension to
> >> > the deadline for that particular patch."
> >>
> >> IOW, the deadline is 8pm US/Eastern time, or about 9 hours from now.
> >>
> >> Let's try not to introduce more bugs in the next 9 hours than we have
> >> in the preceding 9 months.
> >
> > I've finished polishing the Pin/Unpin patch. But the final polishing
> > happened on an intercontinental flight, after days spent preparing my
> > move to SF. I'd be glad if you would allow me to look over the patch
> > again, before pushing it sometime this weekend; this stuff is subtle,
> > and I'm not exactly my best right now.
>
> In view of these circumstances, the RMT has voted to extend the
> deadline for this particular patch by 2.5 days; that is, this patch
> may be committed with RMT approval no later than 2016-04-11 12:00:00
> GMT, which I believe is approximately 4am Monday morning where you
> are.
>

Good to hear.  Wise decision, because we really need this patch in 9.6.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-08 Thread Alexander Korotkov
On Fri, Apr 8, 2016 at 7:39 PM, Andres Freund <and...@anarazel.de> wrote:

> On 2016-04-07 16:50:44 +0300, Alexander Korotkov wrote:
> > On Thu, Apr 7, 2016 at 4:41 PM, Andres Freund <and...@anarazel.de>
> wrote:
> >
> > > On 2016-03-31 20:21:02 +0300, Alexander Korotkov wrote:
> > > > ! BEGIN_BUFSTATE_CAS_LOOP(bufHdr);
> > > >
> > > > ! Assert(BUF_STATE_GET_REFCOUNT(state) > 0);
> > > > ! wasDirty = (state & BM_DIRTY) ? true : false;
> > > > ! state |= BM_DIRTY | BM_JUST_DIRTIED;
> > > > ! if (state == oldstate)
> > > > ! break;
> > >
> > > I'm doubtful that this early exit is entirely safe. None of the
> > > preceding operations imply a memory barrier. The buffer could
> previously
> > > have been marked dirty, but cleaned since. It's pretty critical that we
> > > re-set the dirty bit (there's no danger of loosing it with a barrier,
> > > because we hold an exclusive content lock).
> > >
> >
> > Oh, I get it.
> >
> >
> > > Practically the risk seems fairly low, because acquiring the exclusive
> > > content lock will have implied a barrier. But it seems unlikely to have
> > > a measurable performance effect to me, so I'd rather not add the early
> > > exit.
> > >
> >
> > Ok, let's just remove it.
>
> Here's my updated version of the patch. I've updated this on an
> intercontinental flight, after a otherwise hectic week (moving from SF
> to Berlin); so I'm planning to look over this once more before pushing (.
>

Ok.

> I've decided that the cas-loop macros are too obfuscating for my
> taste. To avoid duplicating the wait part I've introduced
> WaitBufHdrUnlocked().
>

That's OK with me.  The CAS-loop macros look cute, but they're too magical.


> As you can see in
>
> http://archives.postgresql.org/message-id/CA%2BTgmoaeRbN%3DZ4oWENLvgGLeHEvGZ_S_Z3KGrdScyKiSvNt3oA%40mail.gmail.com
> I'm planning to apply this sometime this weekend, after running some
> tests and going over the patch again.
>
> Any chance you could have a look over this?
>

I took a look at this.  The changes you made look good to me.
I also ran a test on the 4x18 Intel server.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-07 Thread Alexander Korotkov
On Thu, Apr 7, 2016 at 4:41 PM, Andres Freund <and...@anarazel.de> wrote:

> On 2016-03-31 20:21:02 +0300, Alexander Korotkov wrote:
> > ! BEGIN_BUFSTATE_CAS_LOOP(bufHdr);
> >
> > ! Assert(BUF_STATE_GET_REFCOUNT(state) > 0);
> > ! wasDirty = (state & BM_DIRTY) ? true : false;
> > ! state |= BM_DIRTY | BM_JUST_DIRTIED;
> > ! if (state == oldstate)
> > ! break;
>
> I'm doubtful that this early exit is entirely safe. None of the
> preceding operations imply a memory barrier. The buffer could previously
> have been marked dirty, but cleaned since. It's pretty critical that we
> re-set the dirty bit (there's no danger of loosing it with a barrier,
> because we hold an exclusive content lock).
>

Oh, I get it.


> Practically the risk seems fairly low, because acquiring the exclusive
> content lock will have implied a barrier. But it seems unlikely to have
> a measurable performance effect to me, so I'd rather not add the early
> exit.
>

Ok, let's just remove it.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-06 Thread Alexander Korotkov
On Tue, Apr 5, 2016 at 5:45 PM, Andres Freund <and...@anarazel.de> wrote:

> On 2016-04-05 17:36:49 +0300, Alexander Korotkov wrote:
> > Could the reason be that we're increasing concurrency for LWLock state
> > atomic variable by placing queue spinlock there?
>
> Don't think so, it's the same cache-line either way.
>

Yes, it's very unlikely.

> > But I wonder why this could happen during "pgbench -S", because it doesn't
> > seem to have high traffic of exclusive LWLocks.
>
> Yea, that confuses me too. I suspect there's some mis-aligned
> datastructures somewhere. It's hard to investigate such things without
> access to hardware.
>

But it's quite easy to check whether it is an alignment issue: we can try
your patch, but without removing the mutex from the LWLock struct.  If it's
an alignment issue, then TPS should become stable again.
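
A sketch of what that test could look like (the field layout here is
illustrative, not the actual struct):

    /*
     * Keep the old spinlock purely as padding, so that sizeof(LWLock) and
     * the offsets of the other fields match the pre-patch layout, while all
     * logic still goes through the atomic state.
     */
    typedef struct LWLock
    {
        slock_t          mutex_unused;  /* never taken; padding for the test */
        uint16           tranche;       /* tranche ID */
        pg_atomic_uint32 state;         /* lock state, incl. queue-lock bit */
        dlist_head       waiters;       /* list of waiting PGPROCs */
    } LWLock;

If TPS stabilizes with this layout, the fluctuations were an alignment
artifact rather than contention on the queue lock.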

> (FWIW, I'm working on getting pinunpin committed)
>

Sounds good.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-05 Thread Alexander Korotkov
On Tue, Apr 5, 2016 at 10:26 AM, Dilip Kumar <dilipbal...@gmail.com> wrote:

>
> On Mon, Apr 4, 2016 at 2:28 PM, Andres Freund <and...@anarazel.de> wrote:
>
>> Hm, interesting. I suspect that's because of the missing backoff in my
>> experimental patch. If you apply the attached patch ontop of that
>> (requires infrastructure from pinunpin), how does performance develop?
>>
>
> I have applied this patch also, but the results are still the same: around
> 550,000 with 64 threads and 650,000 with 128 clients, with a lot of
> fluctuations.
>
> 128 client (head + 0001-WIP-Avoid-the-use-of-a-separate-spinlock-to-protect
> + pinunpin-cas-9 + backoff)
>
> run1 645769
> run2 643161
> run3 *285546*
> run4 *289421*
> run5 630772
> run6 *284363*
>

Could the reason be that we're increasing concurrency for the LWLock state
atomic variable by placing the queue spinlock there?
But I wonder why this could happen during "pgbench -S", because it doesn't
seem to have high traffic of exclusive LWLocks.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] WIP: Access method extendability

2016-04-01 Thread Alexander Korotkov
Hi!

On Fri, Apr 1, 2016 at 12:45 PM, Aleksander Alekseev <
a.aleks...@postgrespro.ru> wrote:

> Code looks much better now, thanks. Still I believe it could be improved.
>
> I don't think that using srand() / rand() in signValue procedure the
> way you did is such a good idea. You create a side affect (changing
> current randseed) which could cause problems in some cases. And there
> is no real need for that. For instance you could use following formula
> instead:
>
> hash(attno || hashVal || j)
>

I've discussed this with Teodor privately.  An extra hash calculation could
cause a performance regression.  He proposed to use our own random generator
instead.  This is implemented in the attached version of the patch.
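
A minimal sketch of the idea (names and LCG constants here are illustrative,
not the committed code): keep the PRNG state private to the bloom code, so
that seeding it per value doesn't clobber the backend-wide rand() seed.

    static uint32 bloom_prng_state;

    static void
    bloom_srand(uint32 seed)
    {
        bloom_prng_state = seed;
    }

    static uint32
    bloom_rand(void)
    {
        /* classic linear congruential step */
        bloom_prng_state = bloom_prng_state * 1103515245 + 12345;
        return (bloom_prng_state >> 16) & 0x7fff;
    }

    /* spread nbits bits of one value's hash over a sign_bits-wide signature */
    static void
    sign_value_sketch(uint8 *sign, int sign_bits, uint32 value_hash, int nbits)
    {
        int         j;

        bloom_srand(value_hash);
        for (j = 0; j < nbits; j++)
        {
            int         bit = (int) (bloom_rand() % (uint32) sign_bits);

            sign[bit / 8] |= (uint8) (1 << (bit % 8));
        }
    }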


> And a few more things.
>
> > + memset(opaque, 0, sizeof(BloomPageOpaqueData));
> > + opaque->maxoff = 0;
>


> This looks a bit redundant.
>

Fixed.


> > + for (my $i = 1; $i <= 10; $i++)
>
> More idiomatic Perl would be `for my $i (1..10)`.
>

Fixed.


> > + UnlockReleaseBuffer(buffer);
> > + ReleaseBuffer(metaBuffer);
> > + goto away;
>
> In general I don't have anything against goto. But are you sure that
> using it here is really justified?


Fixed, with a small amount of code duplication, which seems better than goto.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


0003-bloom-contrib.16.patch
Description: Binary data



Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-31 Thread Alexander Korotkov
On Thu, Mar 31, 2016 at 8:21 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

>
> I think these changes worth running benchmark again.  I'm going to run it
> on 4x18 Intel.
>

The results are as follows.

clients master  v3  v5  v9
1   11671   12507   12679   12408
2   24650   26005   25010   25495
4   49631   48863   49811   50150
8   96790   96441   99946   98383
10  121275  119928  124100  124180
20  243066  243365  246432  248723
30  359616  342241  357310  378881
40  431375  415310  441619  425653
50  489991  489896  500590  502549
60  538057  636473  554069  685010
70  588659  714426  738535  719383
80  405008  923039  902632  909126
90  295443  1181247 1155918 1179163
100 258695  1323125 1325019 1351578
110 238842  1393767 1410274 1421531
120 226018  1432504 1474982 1497122
130 215102  1465459 1503241 1521619
140 206415  1470454 1505380 1541816
150 197850  1475479 1519908 1515017
160 190935  1420915 1484868 1523150
170 185835  1438965 1453128 1499913
180 182519  1416252 1453098 1472945

It appears that the atomic OR for LockBufHdr() gives a small but measurable
effect.  Great idea, Andres!

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-31 Thread Alexander Korotkov
On Thu, Mar 31, 2016 at 7:14 PM, Andres Freund <and...@anarazel.de> wrote:

> > +/*
> > + * The following two macros are aimed to simplify buffer state modification
> > + * in CAS loop.  It's assumed that variable "uint32 state" is defined outside
> > + * of this loop.  It should be used as following:
> > + *
> > + * BEGIN_BUFSTATE_CAS_LOOP(bufHdr);
> > + * modifications of state variable;
> > + * END_BUFSTATE_CAS_LOOP(bufHdr);
> > + *
> > + * For local buffers, these macros shouldn't be used.  Since there is
> > + * no concurrency, local buffer state could be changed directly by atomic
> > + * read/write operations.
> > + */
> > +#define BEGIN_BUFSTATE_CAS_LOOP(bufHdr) \
> > + do { \
> > + SpinDelayStatus delayStatus = init_spin_delay((Pointer)(bufHdr)); \
>
> > + uint32  oldstate; \
> > + oldstate = pg_atomic_read_u32(&(bufHdr)->state); \
> > + for (;;) { \
> > + while (oldstate & BM_LOCKED) \
> > + { \
> > + make_spin_delay(&delayStatus); \
> > + oldstate = pg_atomic_read_u32(&(bufHdr)->state); \
> > + } \
> > + state = oldstate
> > +
> > +#define END_BUFSTATE_CAS_LOOP(bufHdr) \
> > + if (pg_atomic_compare_exchange_u32(&(bufHdr)->state, &oldstate, state)) \
> > + break; \
> > + } \
> > + finish_spin_delay(&delayStatus); \
> > + } while (0)
>
> Hm. Not sure if that's not too much magic. Will think about it.
>

I'm not sure either...


> >   /*
> > +  * Lock buffer header - set BM_LOCKED in buffer state.
> > +  */
> > + uint32
> > + LockBufHdr(BufferDesc *desc)
> > + {
> > + uint32  state;
> > +
> > + BEGIN_BUFSTATE_CAS_LOOP(desc);
> > + state |= BM_LOCKED;
> > + END_BUFSTATE_CAS_LOOP(desc);
> > +
> > + return state;
> > + }
> > +
>
> Hm. It seems a bit over the top to do the full round here. How about
>
> uint32 oldstate;
> while (true)
> {
> oldstate = pg_atomic_fetch_or(..., BM_LOCKED);
> if (!(oldstate & BM_LOCKED))
> break;
> perform_spin_delay();
> }
> return oldstate | BM_LOCKED;
>
> Especially if we implement atomic xor on x86, that'd likely be a good
> bit more efficient.
>

Nice idea.  Done.


> >  typedef struct BufferDesc
> >  {
> >   BufferTag   tag;/* ID of page contained in
> buffer */
> > - BufFlagsflags;  /* see bit definitions
> above */
> > - uint8   usage_count;/* usage counter for clock sweep
> code */
> > - slock_t buf_hdr_lock;   /* protects a subset of fields,
> see above */
> > - unsignedrefcount;   /* # of backends holding
> pins on buffer */
> > - int wait_backend_pid;   /* backend
> PID of pin-count waiter */
> >
> > + /* state of the tag, containing flags, refcount and usagecount */
> > + pg_atomic_uint32 state;
> > +
> > + int wait_backend_pid;   /* backend
> PID of pin-count waiter */
> >   int buf_id; /* buffer's index
> number (from 0) */
> >   int freeNext;   /* link in
> freelist chain */
>
> Hm. Won't matter on most platforms, but it seems like a good idea to
> move to an 8 byte aligned boundary. Move buf_id up?
>

I think so.  Done.


> > +/*
> > + * Support for spin delay which could be useful in other places where
> > + * spinlock-like procedures take place.
> > + */
> > +typedef struct
> > +{
> > + int spins;
> > + int delays;
> > + int     cur_delay;
> > + Pointer ptr;
> > + const char *file;
> > + int line;
> > +} SpinDelayStatus;
> > +
> > +#define init_spin_delay(ptr) {0, 0, 0, (ptr), __FILE__, __LINE__}
> > +
> > +void make_spin_delay(SpinDelayStatus *status);
> > +void finish_spin_delay(SpinDelayStatus *status);
> > +
> >  #endif/* S_LOCK_H */
>
> s/make_spin_delay/perform_spin_delay/? The former sounds like it's
> allocating something or such.
>

Done.

I think these changes are worth running the benchmark again.  I'm going to
run it on the 4x18 Intel server.


--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


pinunpin-cas-9.patch
Description: Binary data



Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-31 Thread Alexander Korotkov
Hi!

On Thu, Mar 31, 2016 at 4:59 PM, Amit Kapila <amit.kapil...@gmail.com>
wrote:

> On Tue, Mar 29, 2016 at 10:52 PM, Alexander Korotkov <
> a.korot...@postgrespro.ru> wrote:
>
>> Hi, Andres!
>>
>> Please, find next revision of patch in attachment.
>>
>>
> Couple of minor comments:
>
> +  * The following two macroses
>
> is macroses right word to be used here?
>
> +  * of this loop.  It should be used as fullowing:
>
> /fullowing/following
>
> +  * For local buffers usage of these macros shouldn't be used.
>
> isn't it better to write it as
>
> For local buffers, these macros shouldn't be used.
>
>
>   static int ts_ckpt_progress_comparator(Datum a, Datum b, void *arg);
>
> -
>
> Spurious line deletion.
>

All of the above is fixed.

+  * Since buffers are pinned/unpinned very frequently, this functions tries
> +  * to pin buffer as cheap as possible.
>
> /this functions tries
>
> which functions are you referring here? Comment seems to be slightly
> unclear.
>

I meant just PinBuffer() there.  UnpinBuffer() has another comment in the
body.  Fixed.


> ! if (XLogHintBitIsNeeded() && (pg_atomic_read_u32(&bufHdr->state) &
> BM_PERMANENT))
>
> Is there a reason that you have kept macro's to read refcount and
> usagecount, but not for flags?
>

We could switch the flag handling to GET/SET macros too.  But I guess such a
change would make review more complicated, because it would touch some places
which are otherwise unchanged.  I think that could be done in a separate
refactoring patch.

Apart from this, I have verified that patch compiles on Windows and passed
> regressions (make check)!
>

Thank you!  I didn't manage to try this on Windows.


> Nice work!
>

Thank you!

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


pinunpin-cas-8.patch
Description: Binary data



Re: [HACKERS] WIP: Access method extendability

2016-03-31 Thread Alexander Korotkov
Hi!

New revision of patches is attached.

The changes are as follows:
1) The generic xlog API was changed: now there are no static variables;
GenericXLogStart() returns a palloc'd struct.
2) Generic xlog uses elog instead of ereport, since it reports internal
errors which shouldn't happen normally.
3) Error messages don't contain the name of the function.
4) The bloom contrib was pgindented.
5) More comments for bloom.
6) One more assert was added to bloom.
7) makeDefaultBloomOptions was renamed to adjustBloomOptions.  Now it only
modifies its parameter; palloc is done outside of this function.
8) BloomBuildState is explicitly zeroed.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


0002-generic-xlog.15.patch
Description: Binary data


0003-bloom-contrib.15.patch
Description: Binary data



Re: [HACKERS] PoC: Partial sort

2016-03-30 Thread Alexander Korotkov
function.


> >> Optimizer
> >> -
> >>
>
> >> * cost_sort() needs way way more comments.  Doesn't even mention
> >> indexes. Not worth commenting further on until I know what it's
> >> *supposed* to do, though.
> >
> >
> > I've added some comments.
>
> Looking at cost_sort() now, it's a bit clearer. I think that you
> should make sure that everything is costed as a quicksort, though, if
> you accept that we should try and make every small sort done by the
> partial sort a quicksort. Which I think is a good idea. The common
> case is that groups are small, but the qsort() insertion sort will be
> very very fast for that case.
>

I'm not sure that partial sort should estimate the sorting of one group any
differently than a simple sort does.  I see the following reasons:
1) I'd like partial sort to be able to use other sorting methods, to provide
graceful degradation in the case of planner mistakes, as I pointed out above.
2) The planner should avoid partial sort plans in some cases.  That should
happen because of the higher cost of these plans, including the high cost of
the partial sort itself.  Estimating the other sort methods looks like a good
way of reporting such high costs.


> This looks like an old change you missed:
>
> - * compare_path_fractional_costs
> + * compare_fractional_path_costs
>

I think this is rather a typo fix, because the comment currently doesn't
match the function name.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


partial-sort-basic-8.patch
Description: Binary data



Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-30 Thread Alexander Korotkov
On Wed, Mar 30, 2016 at 10:16 AM, Andres Freund <and...@anarazel.de> wrote:

> On 2016-03-30 07:13:16 +0530, Dilip Kumar wrote:
> > On Tue, Mar 29, 2016 at 10:43 PM, Andres Freund <and...@anarazel.de>
> wrote:
> >
> > > My gut feeling is that we should do both 1) and 2).
> > >
> > > Dilip, could you test performance of reducing ppc's spinlock to 1 byte?
> > > Cross-compiling suggest that doing so "just works".  I.e. replace the
> > > #if defined(__ppc__) typedef from an int to a char.
> > >
> >
> > I set that, but after that it hangs, even Initdb hangs..
>
> Yea, as Tom pointed out that's not going to work.  I'll try to write a
> patch for approach 1).
>

Great!  Do you need any improvements for pinunpin-cas-7.patch from me?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] WIP: Access method extendability

2016-03-30 Thread Alexander Korotkov
On Tue, Mar 29, 2016 at 8:34 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Tue, Mar 29, 2016 at 8:29 PM, Teodor Sigaev <teo...@sigaev.ru> wrote:
>
>> I incorporated your changes and did some additional refinements on top of
>>> them
>>> still.
>>>
>>> Attached is delta against v12, that should cause less issues when
>>> merging for
>>> Teodor.
>>>
>>
>> But last version is 13th?
>>
>> BTW, it would be cool to add docs in VII. Internals chapter, description
>> should be similar to index's interface description, i.e. how to use generic
>> WAL interface.
>
>
> We already have generic WAL interface description in comments to
> generic_xlog.c.  I'm not fan of duplicating things.  What about moving
> interface description from comments to docs completely?
>

I heard no objections.  Here is a revision of the patch where the generic
WAL interface description was moved to the documentation.  This description
contains improvements by Petr Jelinek, Alvaro Herrera and Markus Nullmeier
(who sent them to me privately).

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


0002-generic-xlog.14.patch
Description: Binary data



Re: [HACKERS] SQL access to access method details

2016-03-30 Thread Alexander Korotkov
On Mon, Mar 28, 2016 at 11:02 PM, Jim Nasby <jim.na...@bluetreble.com>
wrote:

> While working on a tool to capture catalog/stats info and looking at cross
> version compatibility, I noticed that the pg_am changes removed SQL access
> to a bunch of AM info. [1] indicates that's part of the purpose of the
> patch; are we sure no tools are using this info?


The idea is that this kind of information is an internal API and shouldn't be
exposed at the SQL level.  I'm not quite sure there is no tool using this
info.  Do you know of any such tools?


> Unlike previous catalog compatibility breaks, this one completely removes
> that information, so if someone was using it they're now completely hosed.
>
> [1] http://www.postgresql.org/message-id/55fec1ab.8010...@2ndquadrant.com


Do you have any example of this information being used?  If so, we can think
about providing some way of exposing it.  If not, the default assumption is
that this information shouldn't be exposed.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] WIP: Access method extendability

2016-03-29 Thread Alexander Korotkov
On Tue, Mar 29, 2016 at 8:29 PM, Teodor Sigaev <teo...@sigaev.ru> wrote:

> I incorporated your changes and did some additional refinements on top of
>> them
>> still.
>>
>> Attached is delta against v12, that should cause less issues when merging
>> for
>> Teodor.
>>
>
> But last version is 13th?
>
> BTW, it would be cool to add docs in VII. Internals chapter, description
> should be similar to index's interface description, i.e. how to use generic
> WAL interface.


We already have a generic WAL interface description in the comments to
generic_xlog.c.  I'm not a fan of duplicating things.  What about moving the
interface description from the comments to the docs completely?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] PoC: Partial sort

2016-03-29 Thread Alexander Korotkov
On Tue, Mar 29, 2016 at 4:56 PM, David Steele <da...@pgmasters.net> wrote:

> On 3/23/16 8:39 PM, Peter Geoghegan wrote:
>
> This looks like an old change you missed:
>>
>> - * compare_path_fractional_costs
>> + * compare_fractional_path_costs
>>
>> All in all, this looks significantly better. Thanks for your work on
>> this. Sorry for the delay in my response, and that my review was
>> relatively rushed, but I'm rather busy at the moment with fighting
>> fires.
>>
>
> It looks like a new patch is required before this can be marked "ready for
> committer".  Will you have that ready soon?
>

Yes, that's it.  I'm working on it now.  I'm going to post it by tomorrow.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] WIP: Access method extendability

2016-03-29 Thread Alexander Korotkov
On Tue, Mar 29, 2016 at 6:20 PM, David Steele <da...@pgmasters.net> wrote:

> On 3/28/16 1:26 PM, Alvaro Herrera wrote:
>
>> Alexander Korotkov wrote:
>>
>>> On Thu, Mar 24, 2016 at 6:19 PM, Alvaro Herrera <
>>> alvhe...@2ndquadrant.com>
>>> wrote:
>>>
>>> .. Oh crap.  I just noticed I forgot to update a comment in pg_dump's
>>>> getAccessMethods.  And we're missing psql tab-complete support for the
>>>> new commands.
>>>>
>>>
>>> Attached patches fix both these issues.
>>>
>>
>> I see Teodor just pushed these two fixes.  Thanks!
>>
>
> Does that mean this patch should be closed or is there more remaining to
> commit?
>

There are still two pending patches:
 * Generic WAL records
 * Bloom filter contrib

Teodor is going to commit them.  But he is waiting for Petr Jelinek to
review the English of the first one.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-28 Thread Alexander Korotkov
On Sun, Mar 27, 2016 at 4:31 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Sun, Mar 27, 2016 at 3:10 PM, Andres Freund <and...@anarazel.de> wrote:
>
>> On 2016-03-27 12:38:25 +0300, Alexander Korotkov wrote:
>> > On Sat, Mar 26, 2016 at 1:26 AM, Alexander Korotkov <
>> > a.korot...@postgrespro.ru> wrote:
>> >
>> > > Thank you very much for testing!
>> > > I also got access to 4 x 18 Intel server with 144 threads. I'm going
>> to
>> > > post results of tests on this server in next Monday.
>> > >
>> >
>> > I've run pgbench tests on this machine: pgbench -s 1000 -c $clients -j
>> 100
>> > -M prepared -T 300.
>> > See results in the table and chart.
>> >
>> > clients master  v3  v5
>> > 1   11671   12507   12679
>> > 2   24650   26005   25010
>> > 4   49631   48863   49811
>> > 8   96790   96441   99946
>> > 10  121275  119928  124100
>> > 20  243066  243365  246432
>> > 30  359616  342241  357310
>> > 40  431375  415310  441619
>> > 50  489991  489896  500590
>> > 60  538057  636473  554069
>> > 70  588659  714426  738535
>> > 80  405008  923039  902632
>> > 90  295443  1181247 1155918
>> > 100 258695  1323125 1325019
>> > 110 238842  1393767 1410274
>> > 120 226018  1432504 1474982
>> > 130 215102  1465459 1503241
>> > 140 206415  1470454 1505380
>> > 150 197850  1475479 1519908
>> > 160 190935  1420915 1484868
>> > 170 185835  1438965 1453128
>> > 180 182519  1416252 1453098
>> >
>> > My conclusions are following:
>> > 1) We don't observe any regression in v5 in comparison to master.
>> > 2) v5 in most of cases slightly outperforms v3.
>>
>> What commit did you base these tests on? I guess something recent, after
>> 98a64d0bd?
>>
>
> Yes, more recent than 98a64d0bd. It was based on 676265eb7b.
>
>
>> > I'm going to do some code cleanup of v5 in Monday
>>
>> Ok, I'll try to do a review and possibly commit after that.
>>
>
> Sounds good.
>

Here is the next revision of the patch.  It contains mostly cosmetic changes.
Comments are adjusted to reflect the changes in behaviour.
I also changed the atomic AND/OR for local buffers to a read/write pair,
which should be cheaper, I suppose.  However, I don't insist on it, and it
could be reverted.
The patch is ready for your review.  I'm especially interested in what you
think about the way I abstracted the exponential backoff of the spinlock.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


pinunpin-cas-6.patch
Description: Binary data



Re: [HACKERS] WIP: Access method extendability

2016-03-28 Thread Alexander Korotkov
Hi, Petr!

On Thu, Mar 17, 2016 at 10:56 AM, Petr Jelinek <p...@2ndquadrant.com> wrote:

> On 16/03/16 15:31, Teodor Sigaev wrote:
>
>> Good catch, thanks! Tests were added.
>>>
>>
>> I don't see any objection, is consensus reached? I'm close to comiit
>> that...
>>
>>
> I did only cursory review on the bloom contrib module and don't really
> have complaints there, but I know you can review that one. I'd like the
> English of the generic_xlog.c description improved but I won't get to it
> before weekend.


What are your plans regarding the English of generic_xlog.c?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


[HACKERS] Autovacuum to prevent wraparound tries to consume xid

2016-03-28 Thread Alexander Korotkov
Hackers,

One of our customers hit a near-xid-wraparound situation: the xid counter
reached the xidStopLimit value, so no transactions could be executed in
normal mode.  What I noticed is strange behaviour of the autovacuum launched
to prevent wraparound.  It vacuums tables and updates pg_class and
pg_database, but then fails with the "database is not accepting commands to
avoid wraparound data loss in database" message.  We ended up in a situation
where, according to pg_database, the maximum age of any database was less
than 200 million, but transactions couldn't be executed, because
ShmemVariableCache wasn't updated (checked with gdb).

I've reproduced this situation on my laptop as follows:

1) Connect gdb, do "set ShmemVariableCache->nextXid =
ShmemVariableCache->xidStopLimit"
2) Stop postgres
3) Make some fake clog: "dd bs=1m if=/dev/zero
of=/usr/local/pgsql/data/pg_clog/07FF count=1024"
4) Start postgres

Then I observed the same situation as in the customer's database.  The
autovacuum launched to prevent wraparound regularly produced the following
messages in the log:

ERROR:  database is not accepting commands to avoid wraparound data loss in
database "template1"
HINT:  Stop the postmaster and vacuum that database in single-user mode.
You might also need to commit or roll back old prepared transactions.

Finally, all databases were frozen:

# SELECT datname, age(datfrozenxid) FROM pg_database;
  datname  │   age
───┼──
 template1 │0
 template0 │0
 postgres  │ 5000
(3 rows)

but no transactions could be executed (ShmemVariableCache wasn't updated).

After some debugging, I found that vac_truncate_clog consumes an xid just to
produce a warning.  I wrote a simple patch which replaces
GetCurrentTransactionId() with a read of ShmemVariableCache->nextXid.  That
completely fixes the situation for me: ShmemVariableCache was successfully
updated.
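
The shape of the fix, as a sketch (the attached patch is the authoritative
version; here I use ReadNewTransactionId(), which reads
ShmemVariableCache->nextXid under XidGenLock):

    /*
     * GetCurrentTransactionId() allocates a fresh xid, which ERRORs out once
     * xidStopLimit is reached.  The warning's age computation only needs to
     * *read* the counter, consuming nothing.
     */
    TransactionId curXid = ReadNewTransactionId();  /* was GetCurrentTransactionId() */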

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


fix_vac_truncate_clog_xid_consume.patch
Description: Binary data



Re: [HACKERS] [PATCH] Phrase search ported to 9.6

2016-03-28 Thread Alexander Korotkov
Hi!

On Sat, Mar 26, 2016 at 12:02 AM, David Steele <da...@pgmasters.net> wrote:

> On 3/25/16 3:54 PM, Artur Zakirov wrote:
> > On 25.03.2016 21:42, Dmitry Ivanov wrote:
> >> Sorry for the delay, I desperately needed some time to finish a bunch of
> >> dangling tasks.
> >>
> >> I've added some new comments and clarified the ones that were obscure.
> >> Moreover, I felt an urge to recheck most parts of the code since
> >> apparently
> >> nobody (besides myself) has gone so far yet.
> >>
> >> On 25.03.16 18:42 MSK, David Steele wrote:
> >>> Time is short and it's not encouraging that you say there is "still
> much
> >> work to be done".
> >>
> >> Indeed, I was inaccurate. I am more than interested in the positive
> >> outcome.
> >>
> >
> > I tested the patch and take a look on it. All regression tests passed.
> > The code looks good and the patch introduce a great functionality.
> >
> > I think the patch can be marked as "Ready for Commiter". But I do not
> > feel the force to do that myself.
> >
> > Also I agree with you about tsvector_setweight(). There is not a problem
> > with it because this weights are immutable and so there is not benefits
> > from new function.
>
> Ideally Alexander can also look at it.  If not, then you should mark it
> "ready for committer" since I doubt any more reviewers will sign on this
> late in the CF.
>

I've checked the latest version of the patch.  It applies cleanly on master,
builds without warnings, and regression tests pass.
The issue I reported about tsquery textual representation is fixed.
I'm going to mark this "Ready for Committer".

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-27 Thread Alexander Korotkov
On Sun, Mar 27, 2016 at 3:10 PM, Andres Freund <and...@anarazel.de> wrote:

> On 2016-03-27 12:38:25 +0300, Alexander Korotkov wrote:
> > On Sat, Mar 26, 2016 at 1:26 AM, Alexander Korotkov <
> > a.korot...@postgrespro.ru> wrote:
> >
> > > Thank you very much for testing!
> > > I also got access to 4 x 18 Intel server with 144 threads. I'm going to
> > > post results of tests on this server in next Monday.
> > >
> >
> > I've run pgbench tests on this machine: pgbench -s 1000 -c $clients -j
> 100
> > -M prepared -T 300.
> > See results in the table and chart.
> >
> > clients master  v3  v5
> > 1   11671   12507   12679
> > 2   24650   26005   25010
> > 4   49631   48863   49811
> > 8   96790   96441   99946
> > 10  121275  119928  124100
> > 20  243066  243365  246432
> > 30  359616  342241  357310
> > 40  431375  415310  441619
> > 50  489991  489896  500590
> > 60  538057  636473  554069
> > 70  588659  714426  738535
> > 80  405008  923039  902632
> > 90  295443  1181247 1155918
> > 100 258695  1323125 1325019
> > 110 238842  1393767 1410274
> > 120 226018  1432504 1474982
> > 130 215102  1465459 1503241
> > 140 206415  1470454 1505380
> > 150 197850  1475479 1519908
> > 160 190935  1420915 1484868
> > 170 185835  1438965 1453128
> > 180 182519  1416252 1453098
> >
> > My conclusions are following:
> > 1) We don't observe any regression in v5 in comparison to master.
> > 2) v5 in most of cases slightly outperforms v3.
>
> What commit did you base these tests on? I guess something recent, after
> 98a64d0bd?
>

Yes, more recent than 98a64d0bd. It was based on 676265eb7b.


> > I'm going to do some code cleanup of v5 in Monday
>
> Ok, I'll try to do a review and possibly commit after that.
>

Sounds good.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-27 Thread Alexander Korotkov
On Sat, Mar 26, 2016 at 1:26 AM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> Thank you very much for testing!
> I also got access to 4 x 18 Intel server with 144 threads. I'm going to
> post results of tests on this server in next Monday.
>

I've run pgbench tests on this machine: pgbench -s 1000 -c $clients -j 100
-M prepared -T 300.
See results in the table and chart.

clients master  v3  v5
1   11671   12507   12679
2   24650   26005   25010
4   49631   48863   49811
8   96790   96441   99946
10  121275  119928  124100
20  243066  243365  246432
30  359616  342241  357310
40  431375  415310  441619
50  489991  489896  500590
60  538057  636473  554069
70  588659  714426  738535
80  405008  923039  902632
90  295443  1181247 1155918
100 258695  1323125 1325019
110 238842  1393767 1410274
120 226018  1432504 1474982
130 215102  1465459 1503241
140 206415  1470454 1505380
150 197850  1475479 1519908
160 190935  1420915 1484868
170 185835  1438965 1453128
180 182519  1416252 1453098

My conclusions are as follows:
1) We don't observe any regression in v5 in comparison to master.
2) v5 in most cases slightly outperforms v3.

I'm going to do some code cleanup of v5 on Monday.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-25 Thread Alexander Korotkov
Hi, Dilip!

On Fri, Mar 25, 2016 at 8:32 PM, Dilip Kumar <dilipbal...@gmail.com> wrote:

> On Fri, Mar 25, 2016 at 8:09 PM, Alexander Korotkov <
> a.korot...@postgrespro.ru> wrote:
>
>> Could anybody run benchmarks?  Feature freeze is soon, but it would be
>> *very nice* to fit it into the 9.6 release cycle, because it greatly improves
>> scalability on large machines.  Without this patch, PostgreSQL 9.6 will be
>> significantly behind competitors like MySQL 5.7.
>
>
> I have run the performance tests and here are the results.  With the latest
> patch I did not see any regression at lower client counts (median of 3 readings).
>
> scale factor 1000 shared buffer 8GB readonly
> *Client Base patch*
> 1 12957 13068
> 2 24931 25816
> 4 46311 48767
> 32 300921 310062
> 64 387623 493843
> 128 249635 583513
> scale factor 300 shared buffer 8GB readonly
> *Client Base patch*
> 1 14537 14586  --> the one-thread number looks a little low; generally I get
> ~18000 (will recheck).
> 2 34703 33929  --> maybe run-to-run variance (once I get time, will
> recheck)
> 4 67744 69069
> 32 312575 336012
> 64 213312 539056
> 128 190139 380122
>
> *Summary:*
>
> Actually with 64 clients we have seen ~470,000 TPS on head too, by
> reverting commit 6150a1b0.
> See this thread: (
> http://www.postgresql.org/message-id/caa4ek1+zeb8pmwwktf+3brs0pt4ux6rs6aom0uip8c6shjw...@mail.gmail.com
> )
>
> I haven't tested this patch by reverting commit 6150a1b0, so I'm not sure
> whether this patch can give even better performance?
>
> It also points to the case that Andres mentioned in this thread.
>
>
> http://www.postgresql.org/message-id/20160226191158.3vidtk3ktcmhi...@alap3.anarazel.de
>

Thank you very much for testing!
I also got access to a 4 x 18 Intel server with 144 threads.  I'm going to
post the results of tests on this server next Monday.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [PATCH] Phrase search ported to 9.6

2016-03-25 Thread Alexander Korotkov
On Fri, Mar 25, 2016 at 6:42 PM, David Steele <da...@pgmasters.net> wrote:

> On 3/16/16 12:38 PM, Dmitry Ivanov wrote:
>
> I've made an attempt to fix some of the issues you've listed, although
>> there's
>> still much work to be done. I'll add some comments later.
>>
>
> Do you know when you'll have a chance to respond to reviews and provide a
> new patch?
>
> Time is short and it's not encouraging that you say there is "still much
> work to be done".  Perhaps it would be best to mark this "returned with
> feedback"?
>

Dmitry will provide a new version of the patch today.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-25 Thread Alexander Korotkov
On Tue, Mar 22, 2016 at 1:08 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Tue, Mar 22, 2016 at 7:57 AM, Dilip Kumar <dilipbal...@gmail.com>
> wrote:
>
>>
>> On Tue, Mar 22, 2016 at 12:31 PM, Dilip Kumar <dilipbal...@gmail.com>
>> wrote:
>>
>>> ! pg_atomic_write_u32(&bufHdr->state, state);
>>>   } while (!StartBufferIO(bufHdr, true));
>>>
>>> Better write some comment about clearing BM_LOCKED from the state
>>> directly, without needing to call UnlockBufHdr explicitly;
>>> otherwise it's confusing.
>>>
>>
>> Few more comments..
>>
>> *** 828,837 
>> */
>>do
>>{
>> !  LockBufHdr(bufHdr);
>> *!  Assert(bufHdr->flags & BM_VALID);*
>> !  bufHdr->flags &= ~BM_VALID;
>> !  UnlockBufHdr(bufHdr);
>>} while (!StartBufferIO(bufHdr, true));
>>}
>>}
>> --- 826,834 
>> */
>>do
>>{
>> !  uint32 state = LockBufHdr(bufHdr);
>> !  state &= ~(BM_VALID | BM_LOCKED);
> !  pg_atomic_write_u32(&bufHdr->state, state);
>>} while (!StartBufferIO(bufHdr, true));
>>
>> 1. Previously there was an Assert; any reason why we removed it?
>>  Assert(bufHdr->flags & BM_VALID);
>>
>
> It was missed.  In the attached patch I've put it back.
>
> 2. And also if we don't need assert then can't we directly clear BM_VALID
>> flag from state variable (if not locked) like pin/unpin buffer does,
>> without taking buffer header lock?
>>
>
> In this version of the patch it could also be done as a loop of CAS
> operations.  But I don't intend to do so, because it would significantly
> complicate the code.  It's not yet clear that traffic in this place is high
> enough to justify such optimizations.
> The v4 patch implements a slightly different approach.  Could you please
> test it?  We need to check whether this approach is worth putting more
> effort into, or throw it away otherwise.
>

Could anybody run benchmarks?  Feature freeze is soon, but it would be
*very nice* to fit it into the 9.6 release cycle, because it greatly improves
scalability on large machines.  Without this patch, PostgreSQL 9.6 will be
significantly behind competitors like MySQL 5.7.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] WIP: Access method extendability

2016-03-25 Thread Alexander Korotkov
On Thu, Mar 24, 2016 at 6:19 PM, Alvaro Herrera <alvhe...@2ndquadrant.com>
wrote:

> .. Oh crap.  I just noticed I forgot to update a comment in pg_dump's
> getAccessMethods.  And we're missing psql tab-complete support for the
> new commands.


The attached patches fix both of these issues.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


fix-pg_dump-comment.patch
Description: Binary data


fix-am-tab-complete.patch
Description: Binary data



Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

2016-03-24 Thread Alexander Korotkov
On Thu, Mar 24, 2016 at 5:17 PM, Robert Haas <robertmh...@gmail.com> wrote:

> On Fri, Mar 18, 2016 at 1:19 PM, Anastasia Lubennikova
> <a.lubennik...@postgrespro.ru> wrote:
> > Please, find the new version of the patch attached. Now it has WAL
> > functionality.
> >
> > Detailed description of the feature you can find in README draft
> > https://goo.gl/50O8Q0
> >
> > This patch is pretty complicated, so I ask everyone, who interested in
> this
> > feature,
> > to help with reviewing and testing it. I will be grateful for any
> feedback.
> > But please, don't complain about code style, it is still work in
> progress.
> >
> > Next things I'm going to do:
> > 1. More debugging and testing. I'm going to attach in next message
> couple of
> > sql scripts for testing.
> > 2. Fix NULLs processing
> > 3. Add a flag into pg_index, that allows to enable/disable compression
> for
> > each particular index.
> > 4. Recheck locking considerations. I tried to write code as less
> invasive as
> > possible, but we need to make sure that algorithm is still correct.
> > 5. Change BTMaxItemSize
> > 6. Bring back microvacuum functionality.
>
> I really like this idea, and the performance results seem impressive,
> but I think we should push this out to 9.7.  A btree patch that didn't
> have WAL support until two and a half weeks into the final CommitFest
> just doesn't seem to me like a good candidate.  First, as a general
> matter, if a patch isn't code-complete at the start of a CommitFest,
> it's reasonable to say that it should be reviewed but not necessarily
> committed in that CommitFest.  This patch has had some review, but I'm
> not sure how deep that review is, and I think it's had no code review
> at all of the WAL logging changes, which were submitted only a week
> ago, well after the CF deadline.  Second, the btree AM is a
> particularly poor place to introduce possibly destabilizing changes.
> Everybody depends on it, all the time, for everything.  And despite
> new tools like amcheck, it's not a particularly easy thing to debug.
>

It's all true.  But:
1) It's a great feature many users dream about.
2) The patch is not very big.
3) The patch doesn't introduce significant infrastructural changes.  It just
changes some well-isolated places.

Let's give it a chance.  I've signed up as an additional reviewer, and I'll
do my best to spot all possible issues in this patch.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-03-24 Thread Alexander Korotkov
Hi!

Since the patch exposing current wait event information in PGPROC was
committed, it has become possible to collect wait event statistics using
sampling.  Although I'm not a fan of this approach, it is still useful and
definitely better than nothing.
In PostgresPro, we actually already had it.  Now it's too late to include
something new in 9.6.  This is why I've reworked it and published it on
github as an extension for 9.6: https://github.com/postgrespro/pg_wait_sampling/
Hopefully, it can be considered as a contrib module for 9.7.
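
Once installed, usage looks roughly like this (a sketch based on the
extension's README; treat the view and column names as assumptions, not
something discussed in this thread):

    CREATE EXTENSION pg_wait_sampling;

    -- cumulative sampled counts of wait events across all backends
    SELECT event_type, event, count
    FROM pg_wait_sampling_profile
    ORDER BY count DESC
    LIMIT 10;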

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] WIP: Access method extendability

2016-03-23 Thread Alexander Korotkov
On Tue, Mar 22, 2016 at 11:53 PM, Alvaro Herrera <alvhe...@2ndquadrant.com>
wrote:

> Alexander Korotkov wrote:
> > On Tue, Mar 22, 2016 at 1:26 AM, Alvaro Herrera <
> alvhe...@2ndquadrant.com>
> > wrote:
>
> > > I noticed this state of affairs because I started reading the complete
> > > thread in order to verify whether a consensus had been reached
> regarding
> > > the move of IndexQualInfo and GenericCosts to selfuncs.h.  But I only
> > > see that Jim Nasby mentioned it and Alexander said "yes they move";
> > > nothing else.  The reason for raising this is that, according to
> > > Alexander, new index AMs will want to use those structs; but current
> > > index AMs only include index_selfuncs.h, and none of them includes
> > > selfuncs.h, so it seems completely wrong to be importing all the
> planner
> > > knowledge embodied in selfuncs.h into extension index AMs.
> > >
> > > One observation is that the bloom AM patch (0003 of this series) uses
> > > GenericCosts but not IndexQualInfo.  But btcostestimate() and
> > > gincostestimate() both use IndexQualInfo, so other AMs might well want
> > > to use it too.
> > >
> > > We could move GenericCosts and the prototype for genericcostestimate()
> > > to index_selfuncs.h; that much is pretty reasonable.  But I'm not sure
> > > what to do about IndexQualInfo.  We could just add forward struct
> > > declarations for RestrictInfo and Node to index_selfuncs.h.  But then
> > > the extension code is going to have to import the includes were those
> > > are defined anyway.  Certainly it seems that deconstruct_indexquals()
> > > (which returns a list of IndexQualInfo) is going to be needed by
> > > extension index AMs, at least ...
> >
> > Thank you for highlighting these issues.
>
> After thinking some more about this, I think it's all right to move both
> IndexQualInfo and GenericCosts to selfuncs.h.  The separation that we
> want is this: the selectivity functions themselves need access to a lot
> of planner knowledge, and so selfuncs.h knows a lot about planner.  But
> the code that _calls_ the selectivity function doesn't; and
> index_selfuncs.h is about the latter.  Properly architected extension
> index AMs are going to have their selectivity functions in a separate .c
> file which knows a lot about planner, and which includes selfuncs.h (and
> maybe index_selfuncs.h if it wants to piggyback on the existing
> selecticity functions, but that's unlikely); only the prototype for the
> selfunc is going to be needed elsewhere, and the rest of the index code
> is not going to know anything about planner stuff so it will not need to
> include selfuncs.h nor index_selfuncs.h.


Right, the purposes of the headers are:
 * selfuncs.h – for those who are going to estimate selectivity,
 * index_selfuncs.h – for those who are going to call the selectivity
estimation functions of built-in AMs.

From this point of view, we should keep both IndexQualInfo and GenericCosts
in selfuncs.h.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] improving GROUP BY estimation

2016-03-22 Thread Alexander Korotkov
Hi, Tomas!

On Mon, Mar 21, 2016 at 2:39 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Fri, Mar 18, 2016 at 1:20 PM, Dean Rasheed <dean.a.rash...@gmail.com>
> wrote:
>
>> Probably a better URL to give is
>> http://www.adellera.it/investigations/distinct_balls/ which has a link
>> to the PDF version of the paper and also some supporting material.
>>
>> However, while that paper is in general very clear, I don't think it
>> gives a very clear explanation of that particular formula, and it
>> doesn't explain what it represents. It merely says that if "i" can be
>> ignored "for some reason (e.g. i << Nr)", then that formula is an
>> approximation to the exact "without replacement" formula, which is the
>> subject of that paper.
>>
>> But actually, that formula is the exact formula for the expected
>> number of distinct values when selecting *with replacement* from a
>> collection where each of the distinct values occurs an equal number of
>> times. So I think we should say that.
>>
>> Perhaps something along the lines of:
>>
>> /*
>>  * Update the estimate based on the restriction selectivity.
>>  *
>>  * This uses the formula for the expected number of distinct
>> values
>>  * when selecting with replacement from a collection where
>> each of
>>  * the distinct values occurs an equal number of times (a
>> uniform
>>  * distribution of values). This is a very close
>> approximation to
>>  * the more correct method of selecting without replacement
>> when
>>  * the number of distinct values is quite large --- for
>> example,
>>  * see http://www.adellera.it/investigations/distinct_balls/.
>>  * Additionally, it also turns out to work very well even
>> when the
>>  * number of distinct values is small.
>>  */
>>
>
> +1
> Thank you for work on this patch. The formula you propose and explanation
> look great!
>

I think you should send a revision of the patch including the comments
proposed by Dean Rasheed.
I'm switching the patch status to Waiting on Author in the commitfest.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] pg_dump dump catalog ACLs

2016-03-22 Thread Alexander Korotkov
Hi, Stephen!

I'm signed up to review this patch.  At first the patch didn't apply cleanly;
it had a conflict at the end of system_views.sql, but that was very easy to
fix.

On Mon, Mar 14, 2016 at 7:00 PM, Stephen Frost <sfr...@snowman.net> wrote:

> I've not included it in this patch, but my thinking here would be to add
> a check to the GRANT functions which checks 'if (creating_extension)'
> and records/updates the entries in pg_init_privs for those objects.
> This is similar to how we handle dependencies for extensions currently.
> That way, extensions don't have to do anything and if the user changes
> the permissions on an extension's object, the permissions for that
> object would then be dumped out.
>
> This still requires more testing, documentation, and I'll see about
> making it work completely for extensions also, but wanted to provide an
> update and solicit for review/comments.
>
> The patches have been broken up in what I hope is a logical way for
> those who wish to review/comment:
>
> #1 - Add the pg_init_privs catalog
> #2 - Change pg_dump to use a bitmap instead of boolean for 'dump'
> #3 - Split 'dump' into 'dump' and 'dump_contains'
> #4 - Make pg_dump include changed ACLs in dump of 'pg_catalog'
>

+  "CASE WHEN pip.initprivs IS NOT DISTINCT FROM n.nspacl "
+  "THEN false ELSE true END as dumpacl "

Why not just write "pip.initprivs IS DISTINCT FROM n.nspacl AS dumpacl"?
I think there are a few places where the CASE could be eliminated.

+ appendPQExpBuffer(query, "SELECT tableoid, oid, proname AS aggname, "
+  "pronamespace AS aggnamespace, "
+  "pronargs, proargtypes, "
+  "(%s proowner) AS rolname, "
+  "proacl AS aggacl "
+  "FROM pg_proc p "
+  "WHERE proisagg AND ("
+  "pronamespace != "
+  "(SELECT oid FROM pg_namespace "
+  "WHERE nspname = 'pg_catalog') OR "
+  "EXISTS (SELECT * FROM pg_init_privs pip "
+  "WHERE p.oid = pip.objoid AND pip.classoid = "
+  "(SELECT oid FROM pg_class "
+  "WHERE relname = 'pg_proc') "
+  "AND p.proacl <> pip.initprivs)",
+  username_subquery);

Why do we compare ACLs using <> for functions, but IS DISTINCT FROM for other
objects?  Is that intentional?  Given that most proacl values are NULL, when
you change one, the ACL wouldn't be dumped, since NULL <> value yields NULL.
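
A quick illustration of the NULL hazard (using text operands for brevity):

    SELECT NULL::text <> 'x';                 -- NULL, so a WHERE clause filters the row out
    SELECT NULL::text IS DISTINCT FROM 'x';   -- true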

In general, these checks using subqueries look complicated to me, hard to
validate, and in need of a lot of testing.  Could we implement a function
"bool need_acl_dump(oid objoid, oid classoid, int objsubid, proacl acl)" and
call it?


> #5 - Remove 'if (!superuser()) ereport()' calls and REVOKE EXECUTE
>

The other things in the patches look good to me, except that they need tests
and documentation.
I'm marking this Waiting on Author.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-22 Thread Alexander Korotkov
On Tue, Mar 22, 2016 at 7:57 AM, Dilip Kumar <dilipbal...@gmail.com> wrote:

>
> On Tue, Mar 22, 2016 at 12:31 PM, Dilip Kumar <dilipbal...@gmail.com>
> wrote:
>
>> ! pg_atomic_write_u32(&bufHdr->state, state);
>>   } while (!StartBufferIO(bufHdr, true));
>>
>> Better to write a comment about clearing BM_LOCKED from the state
>> directly, so that we need not call UnlockBufHdr explicitly;
>> otherwise it's confusing.
>>
>
> Few more comments..
>
> *** 828,837 ****
> */
>do
>{
> !  LockBufHdr(bufHdr);
> !  Assert(bufHdr->flags & BM_VALID);
> !  bufHdr->flags &= ~BM_VALID;
> !  UnlockBufHdr(bufHdr);
>} while (!StartBufferIO(bufHdr, true));
>}
>}
> --- 826,834 ----
> */
>do
>{
> !  uint32 state = LockBufHdr(bufHdr);
> !  state &= ~(BM_VALID | BM_LOCKED);
> !  pg_atomic_write_u32(&bufHdr->state, state);
>} while (!StartBufferIO(bufHdr, true));
>
> 1. Previously there was an Assert; any reason why we removed it?
>  Assert(bufHdr->flags & BM_VALID);
>

It was missed.  In the attached patch I've put it back.
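
For reference, a minimal sketch of the hunk with the assertion restored
(assuming the v5 layout, where the flags live in the atomic state word):

    do
    {
        uint32 state = LockBufHdr(bufHdr);

        Assert(state & BM_VALID);
        /* clearing BM_LOCKED in the same write doubles as UnlockBufHdr() */
        state &= ~(BM_VALID | BM_LOCKED);
        pg_atomic_write_u32(&bufHdr->state, state);
    } while (!StartBufferIO(bufHdr, true));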

> 2. Also, if we don't need the assert, can't we directly clear the BM_VALID
> flag from the state variable (if not locked), like pin/unpin buffer does,
> without taking the buffer header lock?
>

In this version of the patch it could also be done as a loop of CAS
operations.  But I don't intend to do that, because it would significantly
complicate the code, and it's not yet clear that traffic in this place is
high enough to justify such optimizations.
Since v4, the patch implements a slightly different approach.  Could you
please test it?  We need to check whether this approach is worth putting
more effort into, or throw it away otherwise.
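
For completeness, a minimal sketch of what the CAS-loop variant would look
like (not part of the posted patch; it assumes the pg_atomic_* API and the
BM_* flag layout used by pinunpin-cas-5.patch):

    uint32 old_state = pg_atomic_read_u32(&bufHdr->state);

    for (;;)
    {
        uint32 new_state;

        if (old_state & BM_LOCKED)
        {
            /* header spinlock is held by someone else; re-read and retry */
            old_state = pg_atomic_read_u32(&bufHdr->state);
            continue;
        }
        new_state = old_state & ~BM_VALID;
        if (pg_atomic_compare_exchange_u32(&bufHdr->state,
                                           &old_state, new_state))
            break;
        /* CAS failed; old_state was refreshed with the current value */
    }

Correct handling of a concurrent header lock is exactly what makes this
more complicated than the LockBufHdr() version above.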

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


pinunpin-cas-5.patch
Description: Binary data



Re: [HACKERS] WIP: Access method extendability

2016-03-22 Thread Alexander Korotkov
On Tue, Mar 22, 2016 at 1:26 AM, Alvaro Herrera <alvhe...@2ndquadrant.com>
wrote:

> So.  Before this version of the patch was posted in Nov 4th 2015, both
> Tom and Heikki had said essentially "CREATE ACCESS METHOD is worthless,
> let's pursue this stuff without those commands".
> http://www.postgresql.org/message-id/54730dfd.2060...@vmware.com (Nov
> 2014)
>

Yes, that's it.  In that email Heikki asserts that INSERTs into pg_am done
by an extension would survive pg_upgrade, because pg_dump dumps the CREATE
EXTENSION command, which executes the extension script.
In my response I noted that this isn't correct: pg_dump in binary upgrade
mode works in such a way that we would lose the records in pg_am.  I
haven't received any answer to that.
http://www.postgresql.org/message-id/capphfdsbkntlchypngr0ffu9wlhh__nukdp13qlqukgtnv_...@mail.gmail.com
And, as Petr Jelinek mentioned, there is also the problem of dependencies
in DROP EXTENSION.


> http://www.postgresql.org/message-id/26822.1414516...@sss.pgh.pa.us (Oct
> 2014)
>

> And then Alexander posted this version, without any discussion that
> evidenced that those old objections were overridden.  What happened
> here?  Did somebody discuss this in unarchived meetings?  If so, it
> would be good to know about that in this thread.


After that, Tom Lane committed a very large patch reworking the AM API.
One of the aims of that patch was support for the CREATE ACCESS METHOD
command.
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=65c5fcd353a859da9e61bfb2b92a99f12937de3b
I initially positioned this patch as refactoring in support of the CREATE
ACCESS METHOD command.
http://www.postgresql.org/message-id/capphfdtlisxmxk2b4tw+4+no_1-t0rao5foyszho6+sn2om...@mail.gmail.com
During the discussion Tom mentioned the future CREATE ACCESS METHOD command.
http://www.postgresql.org/message-id/16164.1439222...@sss.pgh.pa.us
I don't think Tom would have put so much effort in this direction if he
remained opposed to this command.

> I noticed this state of affairs because I started reading the complete
> thread in order to verify whether a consensus had been reached regarding
> the move of IndexQualInfo and GenericCosts to selfuncs.h.  But I only
> see that Jim Nasby mentioned it and Alexander said "yes they move";
> nothing else.  The reason for raising this is that, according to
> Alexander, new index AMs will want to use those structs; but current
> index AMs only include index_selfuncs.h, and none of them includes
> selfuncs.h, so it seems completely wrong to be importing all the planner
> knowledge embodied in selfuncs.h into extension index AMs.
>
> One observation is that the bloom AM patch (0003 of this series) uses
> GenericCosts but not IndexQualInfo.  But btcostestimate() and
> gincostestimate() both use IndexQualInfo, so other AMs might well want
> to use it too.
>
> We could move GenericCosts and the prototype for genericcostestimate()
> to index_selfuncs.h; that much is pretty reasonable.  But I'm not sure
> what to do about IndexQualInfo.  We could just add forward struct
> declarations for RestrictInfo and Node to index_selfuncs.h.  But then
> the extension code is going to have to import the includes were those
> are defined anyway.  Certainly it seems that deconstruct_indexquals()
> (which returns a list of IndexQualInfo) is going to be needed by
> extension index AMs, at least ...
>

Thank you for highlighting these issues.


> I've made a few edits to the patch already, but I need to do some more
> testing.


Did you already fix the issues above or do you need me to fix them?


> Particularly I would like to understand the relcache issues to
> figure out whether the current one is right.


Good point.  I'll also try to find relcache issues.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] dealing with extension dependencies that aren't quite 'e'

2016-03-21 Thread Alexander Korotkov
On Mon, Mar 21, 2016 at 2:19 PM, Abhijit Menon-Sen <a...@2ndquadrant.com>
wrote:

> At 2016-03-21 13:04:33 +0300, a.korot...@postgrespro.ru wrote:
> >
> > I'm not sure why we want to make new dependency type by ALTER FUNCTION
> > command, not ALTER EXTENSION?
>
> It's a matter of semantics. It means something very different than what
> an 'e' dependency means. The extension doesn't own the function (and so
> pg_dump shouldn't ignore it), but the function depends on the extension
> (and so dropping the extension should drop it).
>
> > The argument could be that 'x' dependency type would be used for other
> > objects not extensions.
>
> I suppose this is possible, but yes, I agree with you that it's not
> clear how or why this would be useful.
>
> > So, I would prefer this patch to extend ALTER EXTENSION command while
> > it's aimed to this particular problem.
>
> OK, so that's what the patch does, and it's certainly the simplest
> approach for reasons discussed earlier (though perhaps not as clear
> semantically as the ALTER FUNCTION approach). But:
>
> > I even think we could extend existent grammar rule
> >
> > | ALTER EXTENSION name add_drop FUNCTION
> function_with_argtypes
> > > *** AlterExtensionContentsStmt:
> > > *** 3982,3987 
> > > --- 3987,3993 
> > > n->objtype = OBJECT_FUNCTION;
> > > n->objname = $6->funcname;
> > > n->objargs = $6->funcargs;
> > > +   n->deptype = 'e';
> > > $$ = (Node *)n;
> > > }
>
> How exactly do you propose to do this, i.e., what would the final
> command to declare an 'x' dependency look like?
>

I propose something like this:

extension_dependency_type:
        DEPENDENT       { $$ = 'x'; }
        | /*EMPTY*/     { $$ = 'e'; }
    ;

...

    | ALTER EXTENSION name add_drop extension_dependency_type FUNCTION function_with_argtypes
        {
            AlterExtensionContentsStmt *n = makeNode(AlterExtensionContentsStmt);
            n->extname = $3;
            n->action = $4;
            n->objtype = OBJECT_FUNCTION;
            n->objname = $7->funcname;
            n->objargs = $7->funcargs;
            n->deptype = $5;
            $$ = (Node *) n;
        }

I didn't try it.  It probably causes grammar conflicts; in that case I
don't insist on it.
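
If the grammar worked out, usage would look something like this (the
extension and function names are hypothetical):

-- ordinary membership, 'e' dependency: pg_dump skips the function,
-- DROP EXTENSION drops it
ALTER EXTENSION myext ADD FUNCTION my_member_func(integer);

-- proposed 'x' dependency: pg_dump still dumps the function,
-- but DROP EXTENSION drops it as well
ALTER EXTENSION myext ADD DEPENDENT FUNCTION my_helper_func(integer);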

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [PATCH] Phrase search ported to 9.6

2016-03-21 Thread Alexander Korotkov
Hi!

I see that the patch changes existing regression tests in tsearch2.out.

*** a/contrib/tsearch2/expected/tsearch2.out
--- b/contrib/tsearch2/expected/tsearch2.out
*** SELECT '(!1|2)&3'::tsquery;
*** 278,292 ****
  (1 row)

  SELECT '1|(2|(4|(5|6)))'::tsquery;
!                  tsquery
! -----------------------------------------
!  '1' | ( '2' | ( '4' | ( '5' | '6' ) ) )
  (1 row)

  SELECT '1|2|4|5|6'::tsquery;
!                  tsquery
! -----------------------------------------
!  ( ( ( '1' | '2' ) | '4' ) | '5' ) | '6'
  (1 row)

  SELECT '1&(2&(4&(5&6)))'::tsquery;
--- 278,292 ----
  (1 row)

  SELECT '1|(2|(4|(5|6)))'::tsquery;
!           tsquery
! -----------------------------
!  '1' | '2' | '4' | '5' | '6'
  (1 row)

  SELECT '1|2|4|5|6'::tsquery;
!           tsquery
! -----------------------------
!  '1' | '2' | '4' | '5' | '6'
  (1 row)


This change looks like an improvement; without the redundant braces,
tsquery readability is much better.

*** select rewrite('moscow & hotel', 'select
*** 461,469 ****
  (1 row)

  select rewrite('bar &  new & qq & foo & york', 'select keyword, sample from test_tsquery'::text );
!                                       rewrite
! -------------------------------------------------------------------------------------
!  'citi' & 'foo' & ( 'bar' | 'qq' ) & ( 'nyc' | ( 'big' & 'appl' | 'new' & 'york' ) )
  (1 row)

  select rewrite( ARRAY['moscow', keyword, sample] ) from test_tsquery;
--- 461,469 ----
  (1 row)

  select rewrite('bar &  new & qq & foo & york', 'select keyword, sample from test_tsquery'::text );
!                                       rewrite
! -------------------------------------------------------------------------------------
!  ( 'nyc' | 'big' & 'appl' | 'new' & 'york' ) & 'citi' & 'foo' & ( 'bar' | 'qq' )
  (1 row)

  select rewrite( ARRAY['moscow', keyword, sample] ) from test_tsquery;

However, such reorderings are not obviously an improvement and need some
motivation.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

