Re: Getting rid of "tuple concurrently updated" elog()s with concurrent DDLs (at least ALTER TABLE)

2017-12-25 Thread Andres Freund


On December 26, 2017 5:55:03 AM GMT+01:00, Michael Paquier wrote:
>Hi all,
>
>Triggering "tuple concurrently updated" from heapam.c, which is an
>elog(),
>is not that difficult for some DDL commands. For example with ALTER
>ROLE,
>just create a role:
>psql --no-psqlrc -c 'create role popo'
>
>And then run that in two sessions and the error will show up, triggered
>from CatalogTupleUpdate():
>while true; do psql --no-psqlrc -c "alter role popo password 'foo'";
>done
>
>In this case, upgrading the lock taken on pg_authid from
>RowExclusiveLock
>to ShareRowExclusiveLock prevents concurrent sessions from updating each
>other,
>which is what the attached patch does.
>
>I think that we should get rid of all those errors; for many
>applications
>doing concurrent DDLs, getting this error is surprising, and it could be
>mistaken for a sign of a corrupted instance. I am just touching the tip
>of the iceberg here, as
>I have the gut feeling that there are other objects where it is
>possible to
>trigger the failure, and an analysis effort is going to be costly. So
>I'd
>like to first take the temperature on such an analysis.
>
>So, opinions?

You're proposing to lock the entire relation against many forms of concurrent 
DDL, just to get rid of that error? That seems unacceptable.
Isn't the canonical way to solve this to take object locks? 
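
For illustration only (a hypothetical sketch, not a patch from this thread), the object-lock approach would lock the role itself rather than all of pg_authid:

    /*
     * Hypothetical sketch: serialize concurrent ALTER ROLE on the same role
     * by locking the role object (pg_authid is a shared catalog, hence
     * LockSharedObject) instead of strengthening the lock on the catalog.
     * "roleid" is assumed to have been looked up already.
     */
    LockSharedObject(AuthIdRelationId, roleid, 0, AccessExclusiveLock);

    /* ... then fetch and CatalogTupleUpdate() the pg_authid tuple as before ... */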

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Deadlock in multiple CIC.

2017-12-25 Thread Jeff Janes
Commit c3d09b3bd23f5f6 fixed it so that concurrent CIC would not deadlock (or at least
not as reliably as before) by dropping its own snapshot before waiting for
all the other ones to go away.

With commit 8aa3e47510b969354ea02a, concurrent CREATE INDEX CONCURRENTLY on
different tables in the same database started deadlocking against each
other again quite reliably.

I think the solution is simply to drop the catalog snapshot as well, as in
the attached.
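
For reference, the conditional form only drops the catalog snapshot when it
is the last snapshot the backend still holds, so it stops advertising an
xmin while we wait; roughly (paraphrasing the snapmgr.c helper):

void
InvalidateCatalogSnapshotConditionally(void)
{
    /* drop the catalog snapshot if it is the only snapshot still held */
    if (CatalogSnapshot &&
        ActiveSnapshot == NULL &&
        pairingheap_is_singular(&RegisteredSnapshots))
        InvalidateCatalogSnapshot();
}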

reported by Jeremy Finzel here:
https://www.postgresql.org/message-id/flat/CAMa1XUhHjCv8Qkx0WOr1Mpm_R4qxN26EibwCrj0Oor2YBUFUTg%40mail.gmail.com#cama1xuhhjcv8qkx0wor1mpm_r4qxn26eibwcrj0oor2ybuf...@mail.gmail.com


Cheers,

Jeff
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 97091dd..d4ac2e1 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -861,6 +861,7 @@ DefineIndex(Oid relationId,
 
 	PopActiveSnapshot();
 	UnregisterSnapshot(snapshot);
+	InvalidateCatalogSnapshotConditionally();
 
 	/*
 	 * The index is now valid in the sense that it contains all currently


Getting rid of "tuple concurrently updated" elog()s with concurrent DDLs (at least ALTER TABLE)

2017-12-25 Thread Michael Paquier
Hi all,

Triggering "tuple concurrently updated" from heapam.c, which is an elog(),
is not that difficult for some DDL commands. For example with ALTER ROLE,
just create a role:
psql --no-psqlrc -c 'create role popo'

And then run that in two sessions and the error will show up, triggered
from CatalogTupleUpdate():
while true; do psql --no-psqlrc -c "alter role popo password 'foo'"; done

In this case, upgrading the lock taken on pg_authid from RowExclusiveLock
to ShareRowExclusiveLock prevents concurrent sessions from updating each other,
which is what the attached patch does.

I think that we should get rid of all those errors; for many applications
doing concurrent DDLs, getting this error is surprising, and it could be mistaken
for a sign of a corrupted instance. I am just touching the tip of the iceberg here, as
I have the gut feeling that there are other objects where it is possible to
trigger the failure, and an analysis effort is going to be costly. So I'd
like to first take the temperature on such an analysis.

So, opinions?
--
Michael
diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml
index 24613e3c75..a93b31ae6d 100644
--- a/doc/src/sgml/mvcc.sgml
+++ b/doc/src/sgml/mvcc.sgml
@@ -971,6 +971,7 @@ ERROR:  could not serialize access due to read/write 
dependencies among transact
 
 
  Acquired by CREATE COLLATION,
+ ALTER ROLE,
  CREATE TRIGGER, and many forms of
  ALTER TABLE (see ).
 
diff --git a/src/backend/commands/user.c b/src/backend/commands/user.c
index f2941352d7..d2e300d949 100644
--- a/src/backend/commands/user.c
+++ b/src/backend/commands/user.c
@@ -671,9 +671,11 @@ AlterRole(AlterRoleStmt *stmt)
bypassrls = intVal(dbypassRLS->arg);
 
/*
-* Scan the pg_authid relation to be certain the user exists.
+* Scan the pg_authid relation to be certain the user exists. Take
+* ShareRowExclusiveLock to prevent other sessions from updating this
+* tuple in parallel with another ALTER ROLE command.
 */
-   pg_authid_rel = heap_open(AuthIdRelationId, RowExclusiveLock);
+   pg_authid_rel = heap_open(AuthIdRelationId, ShareRowExclusiveLock);
pg_authid_dsc = RelationGetDescr(pg_authid_rel);
 
tuple = get_rolespec_tuple(stmt->role);


signature.asc
Description: PGP signature


Re: AS OF queries

2017-12-25 Thread Masahiko Sawada
On Thu, Dec 21, 2017 at 3:57 AM, Alvaro Hernandez  wrote:
>
>
> On 20/12/17 14:48, Konstantin Knizhnik wrote:
>
>
>
> On 20.12.2017 16:12, Laurenz Albe wrote:
>
> Konstantin Knizhnik wrote:
>
> I wonder if Postgres community is interested in supporting time travel
> queries in PostgreSQL (something like AS OF queries in Oracle:
> https://docs.oracle.com/cd/B14117_01/appdev.101/b10795/adfns_fl.htm).
> As far as I know something similar is now developed for MariaDB.
>
> I think that would be a good thing to have that could make
> the DBA's work easier - all the requests to restore a table
> to the state from an hour ago.
>
>
> Please notice that it is necessary to configure postgres in proper way in
> order to be able to perform time travels.
>
>
> This makes sense. BTW, I believe this feature would be an amazing
> addition to PostgreSQL.
>
>
> If you do not disable autovacuum, then old versions will just be cleaned up.
> If transaction commit timestamps are not tracked, then it is not possible to
> locate required timeline.
>
> So DBA should make a decision in advance whether this feature is needed or
> not.
> It is not a proper instrument for restoring/auditing an existing database which
> was not configured to keep all versions.
>
> May be it is better to add special configuration parameter for this feature
> which should implicitly toggle
> autovacuum and track_commit_timestamp parameters).
>
>
> Downthread a "moving xid horizon" is proposed. I believe this is not too
> user friendly. I'd rather use a timestamp horizon (e.g. "up to 2 days ago").
> Given that the commit timestamp is tracked, I don't think this is an issue.
> This is the same as the undo_retention in Oracle, which is expressed in
> seconds.

I agree, but since we cannot keep the same xid across xid wraparound, we
would have to remove old tuples even if we're still within the time
interval.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Replication status in logical replication

2017-12-25 Thread Masahiko Sawada
On Tue, Dec 26, 2017 at 1:10 AM, Petr Jelinek wrote:
> On 21/11/17 22:06, Masahiko Sawada wrote:
>>
>> After investigation, I found out that my previous patch was wrong
>> direction. I should have changed XLogSendLogical() so that we can
>> check the read LSN and set WalSndCaughtUp = true even after read a
>> record without wait. Attached updated patch passed 'make check-world'.
>> Please review it.
>>
>
> Hi,
>
> This version looks good to me and seems to be in line with what we do in
> physical replication.
>
> Marking as ready for committer.
>

Thank you for reviewing this patch! I think this patch can be
back-patched to 9.4 as Simon mentioned.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: How to Works with Centos

2017-12-25 Thread Michael Paquier
On Mon, Dec 25, 2017 at 06:48:09PM -0500, Jaime Casanova wrote:
> so you have two options:
> 
> 1) use the packages from yum.postgresql.org for a supported version
> 2) get commercial support for your out-of-community-support version
> 
> but even if you do 2, that would be a preparatory step looking
> forward to upgrading to a newer version

You need to think long-term here. The product that you are developing
and/or maintaining will need to stay around for a couple of years as
well, and during those years you should keep up with the community support
window of 5 years for a major version of PostgreSQL. That's what I do
for the stuff I work on, and the outcome is much better in the end,
as there is normally no need to wind up with a private fork of an
out-of-support version.
--
Michael


signature.asc
Description: PGP signature


Re: How to Works with Centos

2017-12-25 Thread Jaime Casanova
On 25 December 2017 at 09:39, Benyamin Guedj  wrote:
>
> Upon doing so, our DevOps team in response insisted (and still insists) that
> we keep using version 9.2 as it is part of the Centos 7 distribution, and
> they believe that version to be “best practice”, even though PostgreSQL 9.2
> is no longer supported.
>
> My question is:
>
> Is working with the default distribution’s version (9.2) really the “best
> practice”, even though it is no longer supported?
>

Clearly not; our versioning page says
(https://www.postgresql.org/support/versioning/):
"""
The PostgreSQL project aims to fully support a major release for five
years. After its end-of-life (EOL) month ends, a major version
receives one final minor release. After that final minor release, bug
fixing ceases for that major version.
"""

so, if bug fixing ceases for an unsupported version, it's clearly not
"best practice" to continue using it.

so you have two options:

1) use the packages from yum.postgresql.org for a supported version
2) get commercial support for your out-of-community-support version

but even if you do 2, that would be a preparatory step looking
forward to upgrading to a newer version

-- 
Jaime Casanova  www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



How to Works with Centos

2017-12-25 Thread Benyamin Guedj
Hello,



The company I’m working for develops a product which uses Centos 6/7
(different versions of the product) and also uses Vertica and PostgreSQL.

During the course of the development of the latest version of our product,
we ran into problems that led us to contact Vertica’s R team, which in
turn suggested we upgrade our database to the latest version as the current
one is not supported.

When the product was first developed we used PostgreSQL 9.2, and after
upgrading to Centos 7 we considered upgrading our PostgreSQL version as
well.



Trying to learn from the aforementioned incident, we floated the idea of
upgrading our PostgreSQL version to the latest one in order to get the
performance improvements, latest features, bug fixes and *support*.

Upon doing so, our DevOps team in response insisted (and still insists)
that we keep using version 9.2 as it is part of the Centos 7 distribution,
and they believe that version to be “best practice”, even though PostgreSQL
9.2 is *no longer supported*.



My question is:

Is working with the default distribution’s version (9.2) really the “best
practice”, even though it is no longer supported?

I have looked for documentation regarding the matter but could not find any
answers. I would greatly appreciate your assistance with the matter.



Thanks in advance,

Benjamin


Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2017-12-25 Thread Erik Rijkers
That indeed fixed the problem: running that same pgbench test, I see no 
crashes anymore (on any of 3 different machines, and with several 
pgbench parameters).


Thank you,

Erik Rijkers



Re: Unique indexes & constraints on partitioned tables

2017-12-25 Thread Alvaro Herrera
Amit Langote wrote:

> Have you considered what happens when ON CONFLICT code tries to depend on
> such an index (a partitioned unique index)?

Not yet, but it was on my list of things to fix.  Thanks for working on
it -- I'll be reviewing this soon.

> +create table parted_conflict_test_2 partition of parted_conflict_test for 
> values in (2);
> +create unique index on parted_conflict_test (a);
> +insert into parted_conflict_test values (1, 'a') on conflict (a) do nothing;
> +insert into parted_conflict_test values (1, 'b') on conflict (a) do update 
> set b = excluded.b;
> +insert into parted_conflict_test values (1, 'b') on conflict (a) do update 
> set b = excluded.b where parted_conflict_test.b = 'a';
> +select * from parted_conflict_test;
> + a | b 
> +---+---
> + 1 | b
> +(1 row)

sweet.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2017-12-25 Thread Tomas Vondra


On 12/24/2017 10:00 AM, Erik Rijkers wrote:
>
> logical replication of 2 instances is OK but 3 and up fail with:
>
> TRAP: FailedAssertion("!(last_lsn < change->lsn)", File:
> "reorderbuffer.c", Line: 1773)
>
> I can cobble up a script but I hope you have enough from the assertion
> to see what's going wrong...

 The assertion says that the iterator produces changes in order that
 does
 not correlate with LSN. But I have a hard time understanding how that
 could happen, particularly because according to the line number this
 happens in ReorderBufferCommit(), i.e. the current (non-streaming)
 case.

 So instructions to reproduce the issue would be very helpful.
>>>
>>> Using:
>>>
>>> 0001-Introduce-logical_work_mem-to-limit-ReorderBuffer-v2.patch
>>> 0002-Issue-XLOG_XACT_ASSIGNMENT-with-wal_level-logical-v2.patch
>>> 0003-Issue-individual-invalidations-with-wal_level-log-v2.patch
>>> 0004-Extend-the-output-plugin-API-with-stream-methods-v2.patch
>>> 0005-Implement-streaming-mode-in-ReorderBuffer-v2.patch
>>> 0006-Add-support-for-streaming-to-built-in-replication-v2.patch
>>>
>>> As you expected the problem is the same with these new patches.
>>>
>>> I have now tested more, and seen that it not always fails.  I guess that
>>> it here fails 3 times out of 4.  But the laptop I'm using at the moment
>>> is old and slow -- it may well be a factor as we've seen before [1].
>>>
>>> Attached is the bash that I put together.  I tested with
>>> NUM_INSTANCES=2, which yields success, and NUM_INSTANCES=3, which fails
>>> often.  This same program run with HEAD never seems to fail (I tried a
>>> few dozen times).
>>>
>>
>> Thanks. Unfortunately I still can't reproduce the issue. I even tried
>> running it in valgrind, to see if there are some memory access issues
>> (which should also slow it down significantly).
> 
> One wonders again if 2ndquadrant shouldn't invest in some old hardware ;)
> 

Well, I've done tests on various machines, including some really slow
ones, and I still haven't managed to reproduce the failures using your
script. So I don't think that would really help. But I have reproduced
it by using a custom stress test script.

Turns out the asserts are overly strict - instead of

  Assert(prev_lsn < current_lsn);

it should have been

  Assert(prev_lsn <= current_lsn);

because some XLOG records may contain multiple rows (e.g. MULTI_INSERT).
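
In other words, the per-transaction ordering check must tolerate equal LSNs;
a sketch of the relaxed check (names follow the patch, shown only as an
illustration, not the actual code):

static void
AssertChangeLsnOrder(ReorderBufferTXN *txn)
{
#ifdef USE_ASSERT_CHECKING
    dlist_iter  iter;
    XLogRecPtr  prev_lsn = txn->first_lsn;

    dlist_foreach(iter, &txn->changes)
    {
        ReorderBufferChange *cur_change;

        cur_change = dlist_container(ReorderBufferChange, node, iter.cur);

        Assert(txn->first_lsn <= cur_change->lsn);
        Assert(prev_lsn <= cur_change->lsn);    /* "<=", not "<" */

        prev_lsn = cur_change->lsn;
    }
#endif
}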

The attached v3 fixes this issue, and also a couple of other thinkos:

1) The AssertChangeLsnOrder assert check was somewhat broken.

2) We've been sending aborts for all subtransactions, even those not yet
streamed. So downstream got confused and fell over because of an assert.

3) The streamed transactions were written to /tmp, with filenames based on
the subscription OID and the XID of the toplevel transaction. That's fine, as
long as there's just a single replica running - if there are more, the
filenames will clash, causing really strange failures. So move the files
to base/pgsql_tmp where regular temporary files are written. I'm not
claiming this is perfect, perhaps we need to invent another location.

FWIW I believe the relation sync cache is somewhat broken by the
streaming. I thought resetting it would be good enough, but it's more
complicated (and trickier) than that. I'm aware of it, and I'll look
into that next - but probably not before 2018.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


0001-Introduce-logical_work_mem-to-limit-ReorderBuffer-v3.patch.gz
Description: application/gzip


0002-Issue-XLOG_XACT_ASSIGNMENT-with-wal_level-logical-v3.patch.gz
Description: application/gzip


0003-Issue-individual-invalidations-with-wal_level-log-v3.patch.gz
Description: application/gzip


0004-Extend-the-output-plugin-API-with-stream-methods-v3.patch.gz
Description: application/gzip


0005-Implement-streaming-mode-in-ReorderBuffer-v3.patch.gz
Description: application/gzip


0006-Add-support-for-streaming-to-built-in-replication-v3.patch.gz
Description: application/gzip


Re: General purpose hashing func in pgbench

2017-12-25 Thread Fabien COELHO


Hello,


>> However, the key can be used if controlled so that different values do
>> not have the same randomization in different part of the script, so as
>> to avoid using the same patterns for different things if not desirable.

> The original murmur algorithm accepts a seed as a parameter, which can be used
> for this purpose. I used the value itself as a seed in the patch, but I can
> turn it into a function argument.


Hmmm. Possibly. However it needs a default, something like

  x = hash(v[, key])

with key some pseudo-random value?


>> For the pgbench pseudo-random permutation I'm looking at, the key can
>> be specified to control whether the same pattern or not should be used.

> Just curious, which algorithm are you intending to choose?


I have found nothing convincing, so I'm inventing.

Most "permutation" functions are really cryptographic cyphers which are 
quite expensive, and require powers of two, which is not what is needed. 
ISTM that there are some constructs to deal with arbitrary sizes based on 
cryptographic functions, but that would make it too expensive for the 
purpose.


Currently I use the key as a seed for a cheap PRNG, two rounds of
overlapping xor (from the PRNG output, to deal with non-power-of-two sizes) and
prime-multiply-modulo to scatter the value. See attached work in progress.


If you have a pointer for a good and cheap generic (any size) keyed
permutation, feel free to share it.

--
Fabien.// $Id: test_hash_2.c 1049 2017-12-24 17:52:23Z fabien $

/*
 * Pseudo-Random Permutation (PRP): the usual approach is a 4 stage
 * (Luby-Rackoff) Feistel network based on some "good" hash function.
 * However this construction only works for even powers of 2.
 * Unbalance is possible as well, but it must still be a power of 2 size.
 * TOO BAD.
 */

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <assert.h>
#include <inttypes.h> // PRI?64
#include <math.h>     // pow

// #define DEBUG

#define SEED  999484257552941047LL

typedef enum { false, true } bool;

/* overall parameters
 */
#define N_MASKS 6
#define PRP_ROUNDS 8

// NOT GOOD WITH POWER OF 2
#define ROUNDS 2

static int64_t primes[PRP_ROUNDS] = {
  // a few 25-27 bits primes
  33303727LL,
  49979693LL,
  67867979LL,
  86028157LL,
  104395303LL,
  122949829LL,
  141650963LL,
  160481219LL
};

/* compute largest hierarchical mask to hold 0 .. size-2
 * YES, size-2: If size is a power of 2, we want the half mask
 */
static int64_t compute_mask(const int64_t size)
{
  // 8-bit shortcut up to 2**32
  int64_t m =
    size > 0xffffffffLL ? 0xffffffffLL:
    size > 0xffffffLL ? 0xffffffLL:
    size > 0xffffLL ? 0xffffLL:
    size > 0xffLL ? 0xffLL: 0LL;

  // remaining bits
  do {
m = (m << 1) | 1LL;
  } while (size-1 > m);

  m >>= 1;

  return m;
}

/* Donald Knuth linear congruential generator */
#define DK_MUL 6364136223846793005LL
#define DK_INC 1442695040888963407LL

/* do not use all small bits */
#define PRP_SHIFT 13

/*
 * Apply pseudo-random permutation with a seed.
 *
 * The key idea behind "pseudo-random"... is that it is not random at all,
 * kind of an oxymoron. So it really means "steered-up permutation", whatever
 * good it may bring.
 *
 * Result in [0, size) is a permutation for inputs in the same set,
 * at least if size is under 112.9E9 (otherwise it is unclear because
 * of int64_t overflow in the multiply-modulo phase).
 *
 * The result are not fully convincing on very small sizes, but this
 * is not the typical use case.
 *
 * Function performance is reasonable.
 *
 * THIS FUNCTION IS ** NOT ** CRYPTOGRAPHICALLY SECURE AT ALL.
 * PLEASE DO NOT USE FOR SUCH PURPOSE.
 */
static int64_t
pseudorandom_perm(const int64_t data, const int64_t size, const int64_t iseed)
{
  assert(size > 0);

  uint64_t seed = iseed;
  uint64_t v = (uint64_t) data % size;

  assert(0 <= v && v < size);

  int64_t mask = compute_mask(size);

  // for small seeds:
  // seed = seed * DK_MUL + DK_INC;

  // fprintf(stderr, "mask=%ld (0x%lx) size=%ld\n",
  // mask, (uint64_t) mask, size);
  assert(mask < size && size <= 2 * mask + 2);

  for (int i = 0, p = 0; i < ROUNDS; i++, p++)
  {
// first "half" whitening
if (v <= mask)
  v ^= (seed >> PRP_SHIFT) & mask;
seed = seed * DK_MUL + DK_INC;
// assert(0 <= v && v < size);

// and second overlapping "half" whitening and reversing
if (size-v-1 <= mask)
  v = size - mask + ((size - v - 1) ^ ((seed >> PRP_SHIFT) & mask)) - 1;
seed = seed * DK_MUL + DK_INC;
// assert(0 <= v && v < size);

// given the prime sizes, at most 2 primes are skipped for a given size
while (size % primes[p] == 0)
  p++;

// scatter
v = (primes[p] * v + (seed >> PRP_SHIFT)) % size;
seed = seed * DK_MUL + DK_INC;
// assert(0 <= v && v < size);

// bit rotate? bit permute? others?
  }

  /*
  // first "half" whitening
  if (v <= mask)
v ^= (seed >> PRP_SHIFT) & mask;

  seed = seed * DK_MUL + DK_INC;
  // assert(0 <= v && v < size);

  // and second overlapping "half" whitening and 

Re: [HACKERS] Replication status in logical replication

2017-12-25 Thread Petr Jelinek
On 21/11/17 22:06, Masahiko Sawada wrote:
> 
> After investigation, I found out that my previous patch was wrong
> direction. I should have changed XLogSendLogical() so that we can
> check the read LSN and set WalSndCaughtUp = true even after read a
> record without wait. Attached updated patch passed 'make check-world'.
> Please review it.
> 

Hi,

This version looks good to me and seems to be in line with what we do in
physical replication.

Marking as ready for committer.
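
The change amounts to roughly this shape in walsender.c's XLogSendLogical()
(an illustrative sketch pieced together from the description quoted above,
not the actual patch; names follow walsender.c):

static void
XLogSendLogical(void)
{
    XLogRecord *record;
    char       *errm;
    XLogRecPtr  flushPtr = GetFlushRecPtr();    /* current flush position */

    record = XLogReadRecord(logical_decoding_ctx->reader, logical_startptr, &errm);
    logical_startptr = InvalidXLogRecPtr;

    if (errm != NULL)
        elog(ERROR, "%s", errm);

    if (record != NULL)
    {
        LogicalDecodingProcessRecord(logical_decoding_ctx,
                                     logical_decoding_ctx->reader);
        sentPtr = logical_decoding_ctx->reader->EndRecPtr;
    }

    /*
     * Mark the walsender as caught up as soon as everything flushed so far
     * has been read, even if the last read did not have to wait.
     */
    if (logical_decoding_ctx->reader->EndRecPtr >= flushPtr)
        WalSndCaughtUp = true;
}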

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: compress method for spgist - 2

2017-12-25 Thread Teodor Sigaev




Now, this patch is ready for committer from my point of view.


Thank you, pushed

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/



Re: BUGFIX: standby disconnect can corrupt serialized reorder buffers

2017-12-25 Thread Petr Jelinek
Hi,

thanks for writing the patch.

On 05/12/17 06:58, Craig Ringer wrote:
> Hi all
> 
> [...]
>> The cause appears to be that walsender.c's ProcessRepliesIfAny writes a
> LOG for unexpected EOF then calls proc_exit(0). But  serialized txn
> cleanup is done by 
> ReorderBufferRestoreCleanup, as called by ReorderBufferCleanupTXN, which
> is invoked from the PG_CATCH() in ReorderBufferCommit() and on various
> normal exits. It won't get called if we proc_exit() without an ERROR, so
> we leave stale data lying around.
> 
> It's not a problem on crash restart because StartupReorderBuffer already
> does the required delete. 
> 
> ReorderBufferSerializeTXN, which spills the txns to disk, doesn't appear
> to have any guard to ensure that the segment files don't already exist
> when it goes to serialize a snapshot. Adding one there would probably be
> expensive; we don't know the last lsn of the txn yet, so to be really
> safe we'd have to do a directory listing and scan for any txn-$OURXID-*
> entries.
> 
> So to fix, I suggest that we should do a
> slot-specific StartupReorderBuffer-style deletion when we start a new
> decoding session on a slot, per attached patch.
> 
> It might be nice to also add a hook on proc exit, so we don't have stale
> buffers lying around until next decoding session, but I didn't want to
> add new global state to reorderbuffer.c so I've left that for now.


Hmm, can't we simply call the new cleanup function in
ReplicationSlotRelease()? That's called during process exit, and there we
know about the slot, so no extra global variables are needed.
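
Something along these lines, as a minimal sketch (the serialized-txn cleanup
helper name is hypothetical):

void
ReplicationSlotRelease(void)
{
    ReplicationSlot *slot = MyReplicationSlot;

    Assert(slot != NULL && slot->active_pid != 0);

    /* clean up this slot's serialized reorderbuffer files, also on exit */
    if (SlotIsLogical(slot))
        ReorderBufferCleanupSerializedTXNs(NameStr(slot->data.name));

    /* ... existing release logic unchanged ... */
}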

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: General purpose hashing func in pgbench

2017-12-25 Thread Ildar Musin

25/12/2017 17:12, Fabien COELHO пишет:
>
> However, the key can be used if controlled so that different values do
> not have the same randomization in different part of the script, so as
> to avoid using the same patterns for different things if not desirable.
The original murmur algorithm accepts a seed as a parameter, which can be used
for this purpose. I used the value itself as a seed in the patch, but I can
turn it into a function argument.
>
> For the pgbench pseudo-random permutation I'm looking at, the key can
> be specified to control whether the same pattern or not should be used.
>
Just curious, which algorithm are you intending to choose?

-- 
Ildar Musin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company 




Re: genomic locus

2017-12-25 Thread Teodor Sigaev



I think I can wrangle this type into GiST just by tweaking consistent(), 
union(), and picksplit(), if I manage to express my needs in C without breaking 
too many things. My first attempt segfaulted.
Actually, consistent() can determine the actual query data type by the strategy
number. See the examples in ltree and intarray.



If all goes to plan, I will end up with an index tree partitioned by contig at 
the top level and geometrically down from there. That will be as close as I can 
get to an array of contig-specific indices, without having to store data in 
separate tables.


What do you think of that?


I have some doubt that you can distinguish the root page, but it is possible to
distinguish leaf pages; intarray and tsearch do that.


Reading your plan, I got an idea for GIN: the key for GIN is a pair of (contig,
one genome position). So, any search for an intersect operation will actually be
a range search from (contig, start) to (contig, end).
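
Purely as an illustration of that key layout (hypothetical names, not working
GIN support code): keys sort by contig first and position second, so an
overlap query degenerates into one contiguous key range per contig.

typedef struct LocusGinKey
{
    int32       position;                       /* one genome position */
    char        contig[FLEXIBLE_ARRAY_MEMBER];  /* contig name, NUL-terminated */
} LocusGinKey;

static int
locus_gin_key_cmp(const LocusGinKey *a, const LocusGinKey *b)
{
    int         cmp = strcmp(a->contig, b->contig);

    if (cmp != 0)
        return cmp;
    return (a->position > b->position) - (a->position < b->position);
}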





I have a low-level technical question. Because I can’t anticipate the maximum 
length of contig names (and do not want to waste space), I have made the new 
locus type a varlena, like this:


#include "utils/varlena.h"

typedef struct LOCUS
{
   int32 l_len_; /* varlena header (do not touch directly!) */
   int32 start;
   int32 end;
   char  contig[FLEXIBLE_ARRAY_MEMBER];
} LOCUS;

#define LOCUS_SIZE(str) (offsetof(LOCUS, contig) + sizeof(str))

sizeof? or strlen ?



That flexible array member messes with me every time I need to copy it while 
deriving a new locus object from an existing one (or from a pair). What I ended 
up doing is this:


   LOCUS  *l = PG_GETARG_LOCUS_P(0);
   LOCUS  *new_locus;
   char   *contig;
   int    size;
   new_locus = (LOCUS *) palloc0(sizeof(*new_locus));
   contig = pstrdup(l->contig); // need this to determine the length of contig 
l->contig should be null-terminated for pstrdup, but if so, you don't need to 
pstrdup() it - you could use l->contig directly below. BTW, LOCUS_SIZE should 
add 1 byte for '\0' character in this case.




name at runtime
   size = LOCUS_SIZE(contig);
   SET_VARSIZE(new_locus, size);
   strcpy(new_locus->contig, contig);

Is there a more direct way to clone a varlena structure (possibly assigning an 
differently-sized contig to it)? One that is also memory-safe?

Store length of contig in LOCUS struct.
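
For the cloning question, a minimal sketch of a direct copy (assuming, as
noted above, that the stored contig name is NUL-terminated; locus_clone and
new_contig are illustrative names, not part of your code):

static LOCUS *
locus_clone(const LOCUS *src, const char *new_contig)
{
    Size        len = offsetof(LOCUS, contig) + strlen(new_contig) + 1;
    LOCUS      *dst = (LOCUS *) palloc0(len);

    SET_VARSIZE(dst, len);
    dst->start = src->start;
    dst->end = src->end;
    strcpy(dst->contig, new_contig);

    return dst;
}

To clone without renaming, just pass src->contig as new_contig.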


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/



Re: [HACKERS] Flexible configuration for full-text search

2017-12-25 Thread Aleksandr Parfenov
Hi Arthur,

Thank you for the review.

On Thu, 21 Dec 2017 17:46:42 +0300
Arthur Zakirov  wrote:

> I noticed that there are typos in the documentation. And I think it
> is necessary to keep information about previous sintax. The syntax
> will be supported anyway. For example, information about TSL_FILTER
> was removed. And it will be good to add more examples of the new
> syntax.

In the current version of the patch, configurations written in the old
syntax are rewritten into the same configuration in the new syntax.
Since the new syntax doesn't support TSL_FILTER, it was removed from the
documentation. It is possible to store configurations written in the old
syntax in a special way and simulate the TSL_FILTER behavior for them.
But that would lead to maintaining two different behaviors of the FTS,
depending on the version of the syntax used during configuration. Do you
think we should keep both behaviors?

> The result of ts_debug() function was changed. Is it possible to keep
> the old ts_debug() result? To be specific, 'dictionaries' column is
> text now, not array, as I understood. It will be good to keep the
> result for the sake of backward compatibility.

The type of the 'dictionaries' and 'dictionary' columns was changed to
text because, after the patch, the configuration may not be a plain array
of dictionaries but a complex expression tree. In the column
'dictionaries' the result is a textual representation of the
configuration, the same as the \dF+ description of the configuration.

I decided to rename the newly added column to 'configuration' and keep
the column 'dictionaries' with an array of all dictionaries used in the
configuration (no matter how). Also, I fixed a bug in the 'command'
output of ts_debug in some cases.

Additionally, I added some examples to the documentation regarding
multilingual search and the combination of exact and linguistic-aware
search, and fixed typos.

diff --git a/doc/src/sgml/ref/alter_tsconfig.sgml b/doc/src/sgml/ref/alter_tsconfig.sgml
index ebe0b94b27..a1f483e10b 100644
--- a/doc/src/sgml/ref/alter_tsconfig.sgml
+++ b/doc/src/sgml/ref/alter_tsconfig.sgml
@@ -21,8 +21,12 @@ PostgreSQL documentation
 
  
 
+ALTER TEXT SEARCH CONFIGURATION name
+ADD MAPPING FOR token_type [, ... ] WITH config
 ALTER TEXT SEARCH CONFIGURATION name
 ADD MAPPING FOR token_type [, ... ] WITH dictionary_name [, ... ]
+ALTER TEXT SEARCH CONFIGURATION name
+ALTER MAPPING FOR token_type [, ... ] WITH config
 ALTER TEXT SEARCH CONFIGURATION name
 ALTER MAPPING FOR token_type [, ... ] WITH dictionary_name [, ... ]
 ALTER TEXT SEARCH CONFIGURATION name
@@ -88,6 +92,17 @@ ALTER TEXT SEARCH CONFIGURATION name SET SCHEMA 

 
+   
+config
+
+ 
+  The dictionaries tree expression. The dictionary expression
+  is a triple of condition/command/else that defines the way to process
+  the text. The ELSE part is optional.
+ 
+
+   
+

 old_dictionary
 
@@ -133,7 +148,7 @@ ALTER TEXT SEARCH CONFIGURATION name SET SCHEMA 
 

- 
+  
 
   
The ADD MAPPING FOR form installs a list of dictionaries to be
@@ -154,6 +169,53 @@ ALTER TEXT SEARCH CONFIGURATION name SET SCHEMA 
 
+ 
+  Dictionaries Map Config
+
+  
+   Format
+   
+Formally config is one of:
+   
+   
+* dictionary_name
+
+* config { UNION | INTERSECT | EXCEPT | MAP } config
+
+* CASE config
+WHEN [ NO ] MATCH THEN { KEEP | config }
+[ ELSE config ]
+  END
+   
+  
+
+  
+   Description
+   
+config can be used
+in three different formats. The most simple format is name of dictionary to
+use for tokens processing.
+   
+   
+In order to use more than one dictionary
+simultaneously, the user should interconnect dictionaries with operators. The operators
+UNION, EXCEPT and
+INTERSECT have the same meaning as in operations on sets.
+The special operator MAP takes the output of the left subexpression
+and uses it as the input to the right subexpression.
+   
+   
+The third format of config is similar to
+CASE/WHEN/THEN/ELSE structure. It consists of three
+replaceable parts. The first one is the configuration used to construct the lexeme set
+for matching the condition. If the condition is triggered, the command is executed.
+Use the command KEEP to avoid repeating the same
+configuration in the condition and command parts. However, the command may differ from
+the condition. The ELSE branch is executed otherwise.
+   
+  
+ 
+
  
   Examples
 
@@ -167,6 +229,34 @@ ALTER TEXT SEARCH CONFIGURATION name SET SCHEMA 
+
+  
+   Next example shows how to analyse documents in both English and German languages.
+   english_hunspell and german_hunspell
+   return result only if a word is recognized. Otherwise, stemmer dictionaries
+   are used to process a token.
+  
+
+
+ALTER TEXT SEARCH CONFIGURATION my_config
+  ALTER MAPPING FOR asciiword, word WITH
+   CASE english_hunspell WHEN MATCH THEN KEEP ELSE english_stem END
+UNION
+   CASE 

Re: General purpose hashing func in pgbench

2017-12-25 Thread Ildar Musin
Hello Fabien,


24/12/2017 11:12, Fabien COELHO пишет:
>
> Yep. The ugliness is significantly linked to the choice of name. With
> MM2_MUL and MM2_ROT ISTM that it is more readable:
>
>>     k *= MM2_MUL;
>>     k ^= k >> MM2_ROT;
>>     k *= MM2_MUL;
>>     result ^= k;
>>     result *= MM2_MUL;
Ok, will do.
>
>> [...] So I'd better leave it the way it is. Actually I was thinking
>> to do the same to fnv1a too : )
>
> I think that the implementation style should be homogeneous, so I'd
> suggest at least to stick to one style.
>
> I noticed from the source of all human knowledge (aka Wikipedia :-)
> that there seems to be a murmur3 successor. Have you considered it?
> One good reason to skip it would be that the implementation is long
> and complex. I'm not sure about a 8-byte input simplified version.
Murmur2 naturally supports 8-byte data. Murmur3 has 32- and 128-bit
versions. So to handle int64 I could
1) split the input value into two halves and somehow combine the results of
the 32-bit version, or
2) use the 128-bit version and discard the higher bytes.

Btw, postgres core already has a 32-bit murmur3 implementation, but it
only uses the finalization part of the algorithm (see murmurhash32). As my
colleague Yura Sokolov told me in a recent conversation, it alone
provides pretty good randomization. I haven't tried it yet though.
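
For reference, that finalization step is just a handful of integer
operations; a standalone sketch of murmur3's 32-bit finalizer ("fmix32"),
which is the part murmurhash32 reuses:

#include <stdint.h>

/* murmur3 32-bit finalization step: mixes the bits of a single word */
static inline uint32_t
fmix32(uint32_t h)
{
    h ^= h >> 16;
    h *= 0x85ebca6b;
    h ^= h >> 13;
    h *= 0xc2b2ae35;
    h ^= h >> 16;
    return h;
}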

>
> Just a question: Have you looked at SipHash24?
>
> https://en.wikipedia.org/wiki/SipHash
>
> The interesting point is that it can use a key and seems somehow
> cryptographically secure, for a similar cost. However the how to
> decide for/control the key is unclear.
>
Not yet. As far as I can understand from the wiki, its main feature is to
prevent attacks with crafted input data. How can it be useful in
benchmarking, unless it shows superior performance and randomization?

-- 
Ildar Musin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company 




Re: Unique indexes & constraints on partitioned tables

2017-12-25 Thread Amit Langote
Hi Alvaro,

On 2017/12/23 6:29, Alvaro Herrera wrote:
> Hello,
> 
> I'm giving this patch its own thread for mental sanity, but this is
> essentially what already posted in [1], plus some doc fixes.  This patch
> depends on the main "local partitioned indexes" in that thread, last
> version of which is at [2].

Thanks for working on this.

Have you considered what happens when ON CONFLICT code tries to depend on
such an index (a partitioned unique index)?  It seems we'll need some new
code in the executor for the same.  I tried the following after applying
your patch:

create table p (a char) partition by list (a);
create table pa partition of p for values in ('a');;
create table pb partition of p for values in ('b');
create unique index on p (a);

insert into p values ('a', 1);
INSERT 0 1

insert into p values ('a', 1);
ERROR:  duplicate key value violates unique constraint "pa_a_idx"
DETAIL:  Key (a)=(a) already exists.

insert into p values ('a', 1) on conflict do nothing;
INSERT 0 0

Fine so far... but

insert into p values ('a', 1) on conflict (a) do nothing;
ERROR:  unexpected failure to find arbiter index

or

insert into p values ('a', 1) on conflict (a) do update set b = excluded.b;
ERROR:  unexpected failure to find arbiter index

I mentioned this case at [1] and had a WIP patch to address that.  Please
find it attached here.  It is to be applied on top of both of your patches.

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/a26f7823-6c7d-3f41-c5fb-7d50dd2f4848%40lab.ntt.co.jp
>From 490a8302b71cbe78e7028879b0dffa2c43d03f33 Mon Sep 17 00:00:00 2001
From: amit 
Date: Mon, 25 Dec 2017 18:45:33 +0900
Subject: [PATCH 3/3] Teach executor to handle ON CONFLICT (key) on partitioned
 tables

---
 src/backend/executor/execIndexing.c   | 14 +---
 src/backend/executor/nodeModifyTable.c| 32 +++
 src/test/regress/expected/insert_conflict.out | 27 +-
 src/test/regress/sql/insert_conflict.sql  | 18 ++-
 4 files changed, 77 insertions(+), 14 deletions(-)

diff --git a/src/backend/executor/execIndexing.c 
b/src/backend/executor/execIndexing.c
index 89e189fa71..f76a2ede76 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -531,10 +531,18 @@ ExecCheckIndexConstraints(TupleTableSlot *slot,
if (!indexInfo->ii_ReadyForInserts)
continue;
 
-   /* When specific arbiter indexes requested, only examine them */
+   /*
+* When specific arbiter indexes requested, only examine them.  
If
+* this is a partition (after a tuple is routed to it from the
+* parent into which the original tuple has been inserted), we 
must
+* check the parent index id, instead of our own id, because 
that's
+* the one that appears in the arbiterIndexes list.
+*/
if (arbiterIndexes != NIL &&
-   !list_member_oid(arbiterIndexes,
-
indexRelation->rd_index->indexrelid))
+   !(list_member_oid(arbiterIndexes,
+ 
indexRelation->rd_index->indexrelid) ||
+ list_member_oid(arbiterIndexes,
+ 
indexRelation->rd_index->indparentidx)))
continue;
 
if (!indexRelation->rd_index->indimmediate)
diff --git a/src/backend/executor/nodeModifyTable.c 
b/src/backend/executor/nodeModifyTable.c
index afb83ed3ae..94f819006a 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -2186,6 +2186,38 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, 
int eflags)
 
resultRelInfo->ri_onConflictSetWhere = qualexpr;
}
+
+   /* Build the above information for each leaf partition rel */
+   for (i = 0; i < mtstate->mt_num_partitions; i++)
+   {
+   Relationpartrel;
+   List   *leaf_oc_set;
+
+   resultRelInfo = mtstate->mt_partitions[i];
+   partrel = resultRelInfo->ri_RelationDesc;
+
+   /* varno = node->nominalRelation */
+   leaf_oc_set = 
map_partition_varattnos(node->onConflictSet,
+   
node->nominalRelation,
+   
partrel, rel, NULL);
+   resultRelInfo->ri_onConflictSetProj =
+   ExecBuildProjectionInfo(leaf_oc_set, 
econtext,
+   

Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-12-25 Thread Amit Langote
On 2017/12/25 13:52, Amit Langote wrote:
> Hi Alvaro,
> 
> On 2017/12/23 0:10, Alvaro Herrera wrote:
>> I believe these are all fixed by the attached delta patch.
> 
> @@ -1676,7 +1694,12 @@ psql_completion(const char *text, int start, int end)
> "UNION SELECT 'ALL IN TABLESPACE'");
>  /* ALTER INDEX  */
>  else if (Matches3("ALTER", "INDEX", MatchAny))
> -COMPLETE_WITH_LIST5("ALTER COLUMN", "OWNER TO", "RENAME TO",
> "SET", "RESET");
> +COMPLETE_WITH_LIST7("ALTER COLUMN", "OWNER TO", "RENAME TO", "SET",
> +"RESET", "ATTACH PARTITION");
> 
> This should be COMPLETE_WITH_LIST6().

I noticed a few things I thought I'd report.  I was using the latest patch
(v8 + the delta patch).

1. The following crashes because of an Assert in ATExecCmd():

ALTER TABLE an_index DETACH PARTITION ...

It seems ATPrepCmd() should not pass ATT_INDEX to ATSimplePermissions()
for the ATDetachPartition sub-command type (a sketch of that fix follows
after point 2 below).

2. Something strange about how dependencies are managed:

create table p (a char) partition by list (a);
create table pa partition of p for values in ('a');;
create table pb partition of p for values in ('b');
create index on p (a);

\d pa
   Table "public.pa"
 Column | Type | Collation | Nullable | Default
|---+--+---+--+-
 a  | character(1) |   |  |
Partition of: p FOR VALUES IN ('a')
Indexes:
"pa_a_idx" btree (a)

drop table pa;
ERROR:  cannot drop table pa because other objects depend on it
DETAIL:  index p_a_idx depends on table pa
HINT:  Use DROP ... CASCADE to drop the dependent objects too.

set client_min_messages to debug2;

drop table pa;
DEBUG:  drop auto-cascades to index pa_a_idx
DEBUG:  drop auto-cascades to index pb_a_idx
DEBUG:  drop auto-cascades to type pa
DEBUG:  drop auto-cascades to type pa[]
ERROR:  cannot drop table pa because other objects depend on it
DETAIL:  index p_a_idx depends on table pa
HINT:  Use DROP ... CASCADE to drop the dependent objects too.
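
Regarding point 1 above, a hypothetical sketch of the suggested fix in
tablecmds.c's ATPrepCmd() (illustrative only, not a patch):

        case AT_DetachPartition:
            /* only tables can have partitions detached, so no ATT_INDEX here */
            ATSimplePermissions(rel, ATT_TABLE);
            /* No command-specific prep needed */
            pass = AT_PASS_MISC;
            break;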

Thanks,
Amit




Re: AS OF queries

2017-12-25 Thread Konstantin Knizhnik



On 25.12.2017 06:26, Craig Ringer wrote:
On 24 December 2017 at 04:53, konstantin knizhnik wrote:




But what if I just forbid to change recent_global_xmin?
If it is stalled at FirstNormalTransactionId and never changed?
Will it protect all versions from being deleted?


That's totally impractical, you'd have unbounded bloat and a 
nonfunctional system in no time.


You'd need a mechanism - akin to what we have with replication slots - 
to set a threshold for age.


Well, there are systems with "never delete" and "append only" semantics.
For example, I have participated in the SciDB project: a database for
scientific applications.

One of the key requirements for scientific research is reproducibility.
From the database point of view it means that we need to store all raw
data and never delete it.
If you performed some measurements and drew some conclusions based on
these results, then everybody should be able to repeat it, even if you
later find some errors in the input data and make corrections or just
add more data.
So one of the SciDB requirements was to store all versions. A delete
operation should just mark data as deleted (although later we had to
add a true delete :)


But I agree with you: in most cases a more flexible policy for managing
versions is needed.

I am not sure that it should be similar to a logical replication slot.
There the semantics are quite clear: we preserve WAL segments until they
are replicated to the subscribers.
With time travel the situation is less clear-cut: we may specify some
age threshold - keep data, for example, for one year.
But what if somebody later wants to access older data? At that point it
is already lost...


It seems to me that the version pinning policy mostly depends on the
source of the data.
If it has "append only" semantics (like raw scientific data, trading
data, measurements from IoT sensors, ...),
then it will be desirable to keep all versions forever.
If we speak about OLTP tables (like accounts in pgbench), then maybe
time travel is not the proper mechanism for such data at all.


I think that in addition to logged/unlogged tables it would be useful to
support historical/non-historical tables. A historical table should
support time travel, while a non-historical one (the default) acts like
a normal table. It is already possible in Postgres to disable autovacuum
for particular tables.
But unfortunately the snapshot trick (no matter how we set up the oldest
xmin horizon) affects all tables.
There is a similar (but not the same) problem with logical replication:
assume that we need to replicate only one small table. But we have to
pin in WAL all updates of another huge table which is not involved in
logical replication at all.





> Then there's another issue that logical replication has had to deal
> with -- catalog changes. You can't start looking at tuples that have a
> different structure than the current catalog unless you can figure out
> how to use the logical replication infrastructure to use the old
> catalogs. That's a huge problem to bite off and probably can just be
> left for another day if you can find a way to reliably detect the
> problem and raise an error if the schema is inconsistent.


Yes, catalog changes are another problem for time travel.
I do not know any suitable way to handle several different catalog
snapshots in one query.


I doubt it's practical unless you can extract it to subplans that can 
be materialized separately. Even then, UDTs, rowtype results, etc...


Well, I am really not sure about users' demands for time travel. This is
one of the reasons for initiating this discussion on -hackers... Maybe it
is not the best place for such a discussion, because there are mostly
Postgres developers and not users...
At least, from the experience of a few SciDB customers, I can tell that
we didn't have problems with schema evolution: mostly the schema is
simple, static and well defined.
There were problems with incorrect imports of data (this is why we had
to add a real delete), and with splitting data into chunks
(partitioning), ...



The question is how we should handle such catalog changes if they
happen. Ideally we should not allow moving back beyond this
point.
Unfortunately it is not so easy to implement.


I think you can learn a lot from studying logical decoding here.


Working with multimaster and shardman I had to learn a lot about
logical replication.
It is a really powerful and flexible mechanism... with a lot of
limitations and problems: lack of catalog replication, inefficient bulk
insert, various race conditions, ...
But I think that time travel and logical replication really serve
different goals, so they require different approaches.





--
 Craig Ringer http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Konstantin Knizhnik
Postgres Professional: