Re: [HACKERS] [parallel query] random server crash while running tpc-h query on power2
On Sat, Aug 13, 2016 at 11:10 AM, Rushabh Lathia wrote:
> Hi All,
>
> Recently, while running tpc-h queries on the postgresql master branch, I noticed
> random server crashes. Most of the time the crash comes while running tpc-h query
> number 3 (but it's very random).
>
> Here it is clear that worker_instrument is either corrupted or uninitialized, and
> that is the reason it ends up with a server crash.
>
> With a bit more debugging and a look at the git history, I found that the issue
> started appearing with commit af33039317ddc4a0e38a02e2255c2bf453115fd2.
> gather_readnext() calls ExecShutdownGatherWorkers() when nreaders == 0.
> ExecShutdownGatherWorkers() calls ExecParallelFinish(), which collects the
> instrumentation before marking ParallelExecutorInfo as finished.
> ExecParallelRetrieveInstrumentation() does the allocation of
> planstate->worker_instrument.
>
> With commit af33039317 we now call gather_readnext() with the per-tuple context,
> but with nreaders == 0 ExecShutdownGatherWorkers() ends up allocating
> planstate->worker_instrument in the per-tuple context - which is wrong.
>
> Now the fix can be:
>
> 1) Avoid calling ExecShutdownGatherWorkers() from gather_readnext() and let
> ExecEndGather() do those things.

I don't think we can wait till ExecEndGather() to collect statistics, as we need them before that for the explain path. However, we do call ExecShutdownNode() from ExecutePlan() when there are no more tuples, which can take care of ensuring the shutdown of the Gather node. I think the advantage of calling it in gather_readnext() is that it allows resources to be released early and populates the instrumentation/statistics as early as possible.

> But with this change, gather_readnext() and gather_getnext() depend on the
> planstate->reader structure to continue reading tuples. Now we can either change
> those conditions to depend on planstate->nreaders, or just pfree(planstate->reader)
> in gather_readnext() instead of calling ExecShutdownGatherWorkers().
>
> Attaching a patch which fixes the issue with approach 1).

AFAICS, your patch seems to be the right fix for this issue, unless we need the instrumentation information during execution (other than for explain) for some purpose.

-- 
With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] new autovacuum criterion for visible pages
On Thu, Aug 11, 2016 at 9:29 PM, Jeff Janes wrote:
> On Thu, Aug 11, 2016 at 8:32 AM, Amit Kapila wrote:
>> On Thu, Aug 11, 2016 at 2:09 AM, Jeff Janes wrote:
>>> I wanted to create a new relopt named something like
>>> autovacuum_vacuum_pagevisible_factor which would cause autovacuum to
>>> vacuum a table once less than a certain fraction of the relation's
>>> pages are marked allvisible.
>>
>> Why would it be more convenient for a user to set such a parameter as
>> compared to the existing parameters (autovacuum_vacuum_threshold +
>> autovacuum_vacuum_scale_factor)?
>
> Insertions and HOT-updates clear vm bits but don't increment the
> counters that those existing parameters are compared to.
>
> Also, the relationship between the number of updated/deleted rows and the
> number of hint-bits cleared can be hard to predict due to possible
> clustering of the updates into the same blocks. So it would be hard
> to know what to set the values to.

Okay. What I was slightly worried about was how many users can understand *pagevisible_* parameters as compared to what we have now (number of updated/deleted tuples). However, if we have some mechanism where autovacuum can be triggered automatically based on page visibility, then I think that would be quite beneficial (not sure if such a mechanism is feasible).

-- 
With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
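To make the comparison concrete, a sketch of how the proposal would sit next to the existing knobs (the pagevisible relopt is only proposed in this thread and does not exist; 'orders' is just an example table):

    -- Existing per-table thresholds, driven by counts of updated/deleted tuples:
    ALTER TABLE orders SET (autovacuum_vacuum_threshold = 1000,
                            autovacuum_vacuum_scale_factor = 0.05);

    -- Hypothetical relopt from this thread: vacuum once fewer than 90% of the
    -- table's pages are marked all-visible:
    ALTER TABLE orders SET (autovacuum_vacuum_pagevisible_factor = 0.9);

    -- The statistic it would be compared against is already tracked:
    SELECT relpages, relallvisible FROM pg_class WHERE relname = 'orders';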
Re: [HACKERS] handling unconvertible error messages
Victor>We don't have 190 message catalog translations in the PostgreSQL.
Victor>So problem with encoding for messages is quite limited.

Even though the number of translations is limited, there's a problem when trying to tell one "one-byte" encoding from another "one-byte" one. It would be so much better if ServerErrorMessages included the encoding right in the message itself.

For pgjdbc, I've implemented a workaround that relies on the following:

1) It knows what "FATAL" looks like in several translations, and it knows the encodings often used for those translations. For instance, for Russian it tries the CP1251, KOI8, and ALT encodings. It converts "ВАЖНО" (Russian for FATAL) using those three encodings and searches for that byte sequence in the error message. If there's a match, then the encoding is identified.

2) Unfortunately, that does not help for Japanese, as FATAL there is translated as "FATAL" (i.e. left untranslated). So I hard-coded several typical words like "database", "user", "role" (see [1]), and if those byte sequences are present, the message is assumed to be in Japanese. It would be great if someone could review those as I do not speak Japanese.

3) Then it tries different LATIN encodings.

Here's the commit: https://github.com/pgjdbc/pgjdbc/commit/ec5fb4f5a66b6598aea1c7ab8df3126ee77d15e2

Kyotaro> Is there any source to know the compatibility for any combination
Kyotaro> of language vs encoding? Maybe we need a ground for the list.

I use "locale -a" for that. For instance, for Japanese it prints the following on my machine (OS X 10.11.6):

    locale -a | grep ja
    ja_JP
    ja_JP.eucJP
    ja_JP.SJIS
    ja_JP.UTF-8

[1]: https://github.com/pgjdbc/pgjdbc/commit/ec5fb4f5a66b6598aea1c7ab8df3126ee77d15e2#diff-57ed15f90f50144391f1c134bf08a45cR47

Vladimir
Re: [HACKERS] handling unconvertible error messages
On Sat, 13 Aug 2016 09:24:47 + Vladimir Sitnikov wrote:
> Victor>We don't have 190 message catalog translations in the
> Victor>PostgreSQL. So problem with encoding for messages is quite
> Victor>limited.
>
> Even though the number of translations is limited, there's a problem
> when trying to tell one "one-byte" encoding from another "one-byte"
> one. It would be so much better if ServerErrorMessages included the
> encoding right in the message itself.

I think it is better to avoid such a problem and fix the system so the server would never send a message in an encoding different from the client's. It is not the client's job to convert encodings.

In most cases the server does know which encoding the client requests from the very first protocol message (if it is a startup message). So the server can easily tell whether it is able to convert NLS messages into the client's desired encoding, and if not, fall back to untranslated messages.

-- 
Victor Wagner

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Slowness of extended protocol
Shay>I don't know much about the Java world, but both pgbouncer and pgpool (the major pools?)

In the Java world, https://github.com/brettwooldridge/HikariCP is a very good connection pool. Neither pgbouncer nor pgpool is required. The architecture is: application <=> HikariCP <=> pgjdbc <=> PostgreSQL

The idea is that HikariCP pools pgjdbc connections, and server-prepared statements are per-pgjdbc connection, so there is no reason to "discard all" or "deallocate all" or whatever.

Shay>send DISCARD ALL by default. That is a fact, and it has nothing to do with any bugs or issues pgbouncer may have.

That is configurable. If pgbouncer/pgpool defaults to the "wrong configuration", why should we (driver developers, backend developers) try to accommodate that?

Shay>What? Do you mean you do implicit savepoints and autorollback too?

I mean that. On top of that it enables opt-in psql-like ON_ERROR_ROLLBACK functionality, so it can automatically roll back the latest statement if it failed. For instance, that might simplify migration from other DBs that have the same "rollback just one statement on error" semantics by default.

Shay>How does the driver decide when to do a savepoint?

By default it would set a savepoint when there's an open transaction and the query to be executed has been previously described. In other words, the default mode is to protect the user from the "cached plan must not change result type" error.

Shay>Is it on every single command?

Exactly (modulo the above). If the user manually sets "autosave=always", every command would get prepended with a savepoint, and rolled back on failure.

Shay>If you do a savepoint on every single command, that surely would impact performance even without extra roundtrips...?

My voltmeter tells me that the overhead is just 3us for a simple "SELECT" statement (see https://github.com/pgjdbc/pgjdbc/pull/477#issuecomment-239579547 ). I think it is acceptable to have it enabled by default; however, it would be nice if the database did not require a savepoint dance to heal its "not implemented" "cached plan must not change result type" error.

Shay>I'm not aware of other drivers that implicitly prepare statements,
Shay>and definitely of no drivers that internally create savepoints and
Shay>roll them back without explicit user APIs

Glad to hear that. Are you aware of other drivers that translate "insert into table(a,b,c) values(?,?,?)" =into=> an "insert into table(a,b,c) values(?,?,?),(?,?,?),...,(?,?,?)" statement on the fly?

That gives a 2-3x performance boost (and that includes client result processing as well!) for batched inserts, since "bind/exec" message processing is not that fast. That is why I say that investing in "bind/exec performance" makes more sense than investing in "improved execution of non-prepared statements".

Shay>you're doing something very different - not necessarily wrong - and not
Shay>attempt to impose your ideas on everyone as if it's the only true way
Shay>to write a db driver.

1) Feel free to pick the ideas you like.

2) I don't say "it is the only true way". I would change my mind if someone suggested a better solution. Everybody makes mistakes, and I have no problem with admitting that "I made a mistake" and moving forward. They say: "Don't cling to a mistake just because you spent a lot of time making it."

However, no-one has suggested a case where an application issues lots of unique SQL statements. The suggested alternative, "a new message for non-prepared extended query", might shave 5-10us per query for those who are too lazy to implement a query cache.
However, that is just 5-10ms per 1000 queries. Would that be noticeable by the end-users? I don't think so. Having a query cache can easily shave 5-10+ms for each query; that is why I suggest driver developers implement a query cache first, and only then ask for new performance-related messages.

3) For "performance related" issues, a business case and a voltmeter are required to prove there's an issue.

Shay>But the second query, which should be invalidated, has already been
Shay>sent to the server (because of batching), and boom

-- Doctor, it hurts me when I do that
-- Don't do that

When doing batched SQL, some of the queries might fail with "duplicate key" or "constraint violation". There's already an API that covers those kinds of cases and reports which statements did succeed (if any). In the case you described (one query in a batch somehow invalidates the subsequent one) it would just resort to generic error handling.

Shay>When would you send this ValidatePreparedStatement?
Shay>Before each execution as a separate roundtrip?
Shay>That would kill performance.

Why do you think the performance would degrade? Once again: the current problem is that the client thinks it knows "which columns will be returned by a particular server-prepared statement", but the table might get changed behind the scenes (e.g. an online schema upgrade), so the error occurs. That "return type" is already validated by the database (the time is al
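A minimal SQL-level sketch of the "savepoint dance" under discussion - what a driver in autosave mode effectively does around a statement inside an open transaction (the table name is just an example):

    BEGIN;

    SAVEPOINT autosave;            -- set implicitly by the driver before the statement
    SELECT a, b FROM t;            -- server-prepared statement
    -- If this fails with "cached plan must not change result type", the driver issues
    --   ROLLBACK TO SAVEPOINT autosave;
    -- re-describes/re-prepares the statement and retries it, leaving the rest of the
    -- transaction intact; on success it simply continues (or releases the savepoint).

    COMMIT;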
Re: [HACKERS] Slowness of extended protocol
On 11 August 2016 at 10:18, Shay Rojansky wrote:
> On Thu, Aug 11, 2016 at 1:22 PM, Vladimir Sitnikov <sitnikov.vladi...@gmail.com> wrote:
>> 2) The driver can use savepoints and autorollback to the good "right
>> before failure" state in case of a known failure. Here's the
>> implementation: https://github.com/pgjdbc/pgjdbc/pull/477
>> As far as I can remember, performance overheads are close to zero (no
>> extra roundtrips to create a savepoint)
>
> What? Do you mean you do implicit savepoints and autorollback too? How
> does the driver decide when to do a savepoint? Is it on every single
> command? If not, can commands get lost when an error is raised and you
> automatically roll back? If you do a savepoint on every single command,
> that surely would impact performance even without extra roundtrips...?
>
> You seem to have a very "unique" idea of what a database driver should do
> under-the-hood for its users. At the very least I can say that your concept
> is very far from almost any database driver I've seen up to now (PostgreSQL
> JDBC, psycopg, Npgsql, libpq...). I'm not aware of other drivers that
> implicitly prepare statements, and definitely of no drivers that internally
> create savepoints and roll them back without explicit user APIs. At the very
> least you should be aware (and also clearly admit!) that you're doing
> something very different - not necessarily wrong - and not attempt to
> impose your ideas on everyone as if it's the only true way to write a db
> driver.

A number of other drivers default to this behaviour, including at least MS-SQL and Oracle. psqlODBC also supports this behaviour with its statement rollback mode. And obviously PostgreSQL JDBC, which Vladimir is referring to.

Dave Cramer
da...@postgresintl.com
www.postgresintl.com
Re: [HACKERS] handling unconvertible error messages
Victor>It is not a client job to convert encodings.

Of course. However, there is a vast number of old PG versions deployed in the wild that send wrong data to clients. This makes for a bad user experience, so it is worth doing two things:

1) Implement a proper solution in future PostgreSQL versions (e.g. include the encoding name right in the error message).

2) Implement a workaround in current drivers, so clients get proper error messages even when connecting to an unpatched server.

Vladimir
Re: [HACKERS] No longer possible to query catalogs for index capabilities?
Latest patch. Names and scopes are as per discussion. New files for code and regression test. Docs included. -- Andrew (irc:RhodiumToad) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 7830334..4552a74 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -16290,6 +16290,18 @@ SELECT pg_type_is_visible('myschema.widget'::regtype); +pg_indexam_has_property + + + +pg_index_column_has_property + + + +pg_index_has_property + + + pg_options_to_table @@ -16477,6 +16489,21 @@ SELECT pg_type_is_visible('myschema.widget'::regtype); number of columns, pretty-printing is implied + pg_indexam_has_property(am_oid, prop_name) + boolean + Test whether an index access method has a specified property + + + pg_index_column_has_property(index_oid, column_no, prop_name) + boolean + Test whether an index column has a specified property + + + pg_index_has_property(index_oid, prop_name) + boolean + Test whether the access method for the specified index has a specified property + + pg_options_to_table(reloptions) setof record get the set of storage option name/value pairs @@ -16620,6 +16647,141 @@ SELECT pg_type_is_visible('myschema.widget'::regtype); + pg_indexam_has_property, + pg_index_has_property, and + pg_index_column_has_property return whether the + specified access method, index, or index column possesses the named + property. NULL is returned if the property name is not + known; true if the property is present, + false if it is not. Refer to +for column properties, +for index properties, and +for access method properties. + + + + Index Column Properties + + + NameDescription + + + + asc + Does the column sort in ascending order on a forward scan? + + + + desc + Does the column sort in descending order on a forward scan? + + + + nulls_first + Does the column sort with nulls first on a forward scan? + + + + nulls_last + Does the column sort with nulls last on a forward scan? + + + + orderable + Does the column possess any ordering properties such + as ASC or DESC + + + distance_orderable + Can the column be returned in order by a "distance" operator, + for example ORDER BY col <-> constant + + + returnable + Can the column value be returned in an index-only scan? + + + + search_array + Does the column support array queries with ANY + natively in the index AM? + + + search_nulls + Does the column support IS NULL or + IS NOT NULL conditions in the index? + + + + + + + + Index Properties + + + NameDescription + + + + clusterable + Can this index be used in a CLUSTER operation? + + + + backward_scan + Can this index be scanned in the reverse direction? + + + + index_scan + Does this index support plain (non-bitmap) scans? + + + + bitmap_scan + Does this index support bitmap scans? + + + + + + + + Index Access Method Properties + + + NameDescription + + + + can_order + Does this access method support ASC, + DESC and related keywords on columns in + CREATE INDEX? + + + + can_unique + Does this access method support + CREATE UNIQUE INDEX? + + + + can_multi_col + Does this access method support multiple columns? + + + + can_exclude + Does this access method support exclusion constraints? 
+ + + + + + + pg_options_to_table returns the set of storage option name/value pairs (option_name/option_value) when passed diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c index e8034b9..fece954 100644 --- a/src/backend/access/gist/gist.c +++ b/src/backend/access/gist/gist.c @@ -16,11 +16,14 @@ #include "access/gist_private.h" #include "access/gistscan.h" +#include "access/htup_details.h" #include "catalog/pg_collation.h" +#include "catalog/pg_opclass.h" #include "miscadmin.h" #include "utils/index_selfuncs.h" #include "utils/memutils.h" #include "utils/rel.h" +#include "utils/syscache.h" /* non-export function prototypes */ @@ -87,6 +90,7 @@ gisthandler(PG_FUNCTION_ARGS) amroutine->amendscan = gistendscan; amroutine->ammarkpos = NULL; amroutine->amrestrpos = NULL; + amroutine->amproperty = gistproperty; PG_RETURN_POINTER(amroutine); } @@ -1490,6 +1494,98 @@ freeGISTstate(GISTSTATE *giststate) MemoryContextDelete(giststate->scanCxt); } + +/* + * gistproperty() -
Re: [HACKERS] Logical Replication WIP
On 08/05/2016 11:00 AM, Petr Jelinek wrote:
> Hi, as promised here is WIP version of logical replication patch.

Thanks for keeping on with this. This is important work.

> Feedback is welcome.

+ Publication
+
+A Publication object can be defined on any master node, owned by one
+user. A Publication is a set of changes generated from a group of
+tables, and might also be described as a Change Set or Replication Set.
+Each Publication exists in only one database.

'A publication object can be defined on *any master node*'. I found this confusing the first time I read it because I thought it was circular (what makes a node a 'master' node? Having a publication object published from it?). On reflection I realized that you mean 'any *physical replication master*'. I think this might be better worded as 'A publication object can be defined on any node other than a standby node'. I think referring to 'master' in the context of logical replication might confuse people. I am raising this in the context of the larger terminology that we want to use and the potential confusion with the terminology we use for physical replication. I like the publication / subscription terminology you've gone with.

+Publications are different from table schema and do not affect
+how the table is accessed. Each table can be added to multiple
+Publications if needed. Publications may include both tables
+and materialized views. Objects must be added explicitly, except
+when a Publication is created for "ALL TABLES". There is no
+default name for a Publication which specifies all tables.
+
+The Publication is different from table schema, it does not affect
+how the table is accessed and each table can be added to multiple

Those two paragraphs seem to start the same way. I get the feeling that there is some point you're trying to express that I'm not catching onto. Of course a publication is different from a table's schema, or different from a function. The definition of publication you have on the CREATE PUBLICATION page seems better and should be repeated here (A publication is essentially a group of tables intended for managing logical replication. See Section 30.1 for details about how publications fit into logical replication setup.)

+Conflicts happen when the replicated changes is breaking any
+specified constraints (with the exception of foreign keys which are
+not checked). Currently conflicts are not resolved automatically and
+cause replication to be stopped with an error until the conflict is
+manually resolved.

What options are there for manually resolving conflicts? Is the only option to change the data on the subscriber to avoid the conflict? I assume there isn't a way to flag a particular row coming from the publisher and say "ignore it". I don't think this is something we need to support for the first version.

+ Architecture
+
+Logical replication starts by copying a snapshot of the data on
+the Provider database. Once that is done, the changes on Provider

I notice the use of 'Provider' above - do you intend to update that to 'Publisher', or does provider mean something different? If we like the 'publication' terminology then I think 'publishers' should publish them, not providers.
I'm trying to test a basic subscription. I did the following:

cluster 1:
create database test1;
create table a(id serial8 primary key, b text);
create publication testpub1;
alter publication testpub1 add table a;
insert into a(b) values ('1');

cluster 2:
create database test1;
create table a(id serial8 primary key, b text);
create subscription testsub2 publication testpub1 connection 'host=localhost port=5440 dbname=test1';
NOTICE: created replication slot "testsub2" on provider
NOTICE: synchronized table states
CREATE SUBSCRIPTION

This resulted in:

LOG: logical decoding found consistent point at 0/15625E0
DETAIL: There are no running transactions.
LOG: exported logical decoding snapshot: "0494-1" with 0 transaction IDs
LOG: logical replication apply for subscription testsub2 started
LOG: starting logical decoding for slot "testsub2"
DETAIL: streaming transactions committing after 0/1562618, reading WAL from 0/15625E0
LOG: logical decoding found consistent point at 0/15625E0
DETAIL: There are no running transactions.
LOG: logical replication sync for subscription testsub2, table a started
LOG: logical decoding found consistent point at 0/1562640
DETAIL: There are no running transactions.
LOG: exported logical decoding snapshot: "0495-1" with 0 transaction IDs
LOG: logical replication synchronization worker finished processing

The initial sync completed okay, then I did

insert into a(b) values ('2');

but the second insert never replicated. I had the following output:

LOG: terminating walsender process due to replication timeout

On cluster 1 I do select
Re: [HACKERS] max_parallel_degree > 0 for 9.6 beta
Michael Paquier wrote:
> Being cautious pays more in the long term, so seeing the number of
> bugs that showed up I'd rather vote for having it disabled by default
> in 9.6 stable, and enabled on master to aim at enabling it in 10.0.

I too prefer to keep it turned off in 9.6 and consider enabling it by default on a future release (10 is probably good). Interested users can carefully test the feature without endangering other unsuspecting users.

I agree with the idea of keeping it enabled in master, so that it'll get a modicum of testing there by hackers, too.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] handling unconvertible error messages
Victor Wagner writes: > I think it is better to avoid such a problem and fix system so server > would never send a message in the encoding, different from client one. Don't hold your breath waiting for that to happen. Quite aside from the question of whether we want to trust GUC settings from the startup packet before we've authenticated the user, there's a small problem that the server *can't* translate *any* encoding until it's successfully connected to a database and is able to read the pg_conversion catalog. We might be able to institute some rule like "examine the startup packet and see if it specifies a client_encoding different from what we inherited from the postmaster. If not, continue with current behavior (send messages localized per postmaster's settings). If so, fall back to English messages/C locale until startup is complete." This would preserve current functionality in cases where it actually, er, functions, while giving something at least passable in the cases that are broken today. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
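For reference, the settings involved once a session is fully up (the proposal above is about what to do before they can take effect):

    SHOW client_encoding;    -- what the client asked for in the startup packet
    SHOW lc_messages;        -- message language inherited from the postmaster
    SET lc_messages = 'C';   -- the "English messages" fallback described above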
Re: [HACKERS] handling unconvertible error messages
Tom> while giving something at least passable in the cases that are broken today.

Would you mind adding an explicit "encoding" field to the error message? At least it would give a clear explanation of how to parse that message without resorting to a guessing dance. The biggest problem is that the client has no idea how to parse backend error messages.

If implementing client_encoding properly is too hard at this point in time, then I would rather have the "encoding" field right in the startup error message. That "encoding" field would enable sending properly localized messages in the future if a "pre-connect client_encoding" were implemented somehow.

Vladimir
Re: [HACKERS] handling unconvertible error messages
On Sat, 13 Aug 2016 12:02:30 -0400 Tom Lane wrote:
> Victor Wagner writes:
> > I think it is better to avoid such a problem and fix system so
> > server would never send a message in the encoding, different from
> > client one.
>
> Don't hold your breath waiting for that to happen.
>
> Quite aside from the question of whether we want to trust GUC settings
> from the startup packet before we've authenticated the user, there's a

What's wrong with trusting this particular setting? I cannot think of any meaningful exploit. Really, there are lots of http servers out there which typically accept connections from anywhere (which is seldom the case for postgresql servers) and trust the Accept-Charset and Accept-Language client headers without any authentication. There can be attacks that exploit errors in the message parsing, but the startup message is parsed before authentication anyway.

> small problem that the server *can't* translate *any* encoding until
> it's successfully connected to a database and is able to read the
> pg_conversion catalog.
>
> We might be able to institute some rule like "examine the startup
> packet and see if it specifies a client_encoding different from what
> we inherited from the postmaster. If not, continue with current
> behavior (send messages localized per postmaster's settings). If so,
> fall back to English messages/C locale until startup is complete."
> This would preserve current functionality in cases where it actually,
> er, functions, while giving something at least passable in the cases
> that are broken today.

I think that we can do a bit more than this. We use the GNU gettext library to provide message translation. That library is able to perform a limited set of encoding conversions itself. So we can have a two-stage fallback here:

1. If the encoding is different from, but compatible with, the language inherited from the postmaster, ask gettext via the bind_textdomain_encoding function to provide messages in this encoding.

2. If that is not possible, fall back to English messages, which are compatible with any supported encoding. The same goes for a session which doesn't specify an encoding at all (i.e. CancelMessage).

> regards, tom lane

-- 
Victor Wagner

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] No longer possible to query catalogs for index capabilities?
Andrew Gierth writes: > Latest patch. > Names and scopes are as per discussion. New files for code and > regression test. Docs included. I'm starting to review this now. If anyone has objections to the property names/behaviors shown in Andrew's previous message, speak up now ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Slowness of extended protocol
Vladimir wrote:

> Shay>I don't know much about the Java world, but both pgbouncer and pgpool
> (the major pools?)
>
> In the Java world, https://github.com/brettwooldridge/HikariCP is a very good
> connection pool. Neither pgbouncer nor pgpool is required.
> The architecture is: application <=> HikariCP <=> pgjdbc <=> PostgreSQL
>
> The idea is that HikariCP pools pgjdbc connections, and server-prepared
> statements are per-pgjdbc connection, so there is no reason to "discard
> all" or "deallocate all" or whatever.
>
> Shay>send DISCARD ALL by default. That is a fact, and it has nothing to do
> with any bugs or issues pgbouncer may have.
>
> That is configurable. If pgbouncer/pgpool defaults to the "wrong
> configuration", why should we (driver developers, backend developers) try
> to accommodate that?

In a nutshell, that sentence represents much of the problem I have with this conversation: you consider that anything that's different from your approach is simply "wrong".

> Shay>If you do a savepoint on every single command, that surely would impact
> performance even without extra roundtrips...?
>
> My voltmeter tells me that the overhead is just 3us for a simple "SELECT"
> statement (see https://github.com/pgjdbc/pgjdbc/pull/477#issuecomment-239579547 ).
> I think it is acceptable to have it enabled by default; however, it would
> be nice if the database did not require a savepoint dance to heal its "not
> implemented" "cached plan must not change result type" error.

That's interesting (I don't mean that ironically). Aside from the simple client-observed speed which you're measuring, are you sure there aren't "hidden" costs on the PostgreSQL side for generating so many implicit savepoints? I don't know anything about PostgreSQL transaction/savepoint internals, but adding savepoints to each and every statement in a transaction seems like it would have some impact. It would be great to have the confirmation of someone who really knows the internals.

> Shay>I'm not aware of other drivers that implicitly prepare statements,
> Shay>and definitely of no drivers that internally create savepoints and
> Shay>roll them back without explicit user APIs
>
> Glad to hear that.
> Are you aware of other drivers that translate "insert into table(a,b,c)
> values(?,?,?)" =into=> an "insert into table(a,b,c) values(?,?,?),
> (?,?,?),...,(?,?,?)" statement on the fly?
>
> That gives a 2-3x performance boost (and that includes client result processing
> as well!) for batched inserts, since "bind/exec" message processing is not
> that fast. That is why I say that investing in "bind/exec performance"
> makes more sense than investing in "improved execution of non-prepared
> statements".

I'm not aware of any, but I also don't consider that part of the driver's job. There's nothing your driver is doing that the application developer can't do themselves - so your driver isn't faster than other drivers. It's faster only when used by lazy programmers.

What you're doing is optimizing developer code, with the assumption that developers can't be trusted to code efficiently - they're going to write bad SQL and forget to prepare their statements. That's your approach, and that's fine. The other approach, which is also fine, is that each software component has its own role, and that clearly-defined boundaries of responsibility should exist - that writing SQL is the developer's job, and that delivering it to the database is the driver's job.
To me this separation of layers/roles is a key part of software engineering: it results in simpler, leaner components which interact in predictable ways, and when there's a problem it's easier to know exactly where it originated. To be honest, the mere idea of having an SQL parser inside my driver makes me shiver. The HikariCP page you sent (thanks, I didn't know it) contains a great Dijkstra quote that summarizes what I think about this - "Simplicity is prerequisite for reliability." IMHO you're going the opposite way by implicitly rewriting SQL, preparing statements and creating savepoints. But at the end of the day it's two different philosophies. All I ask is that you respect the one that isn't yours, which means accepting the optimization of non-prepared statements. There's no mutual exclusion.

> 2) I don't say "it is the only true way". I would change my mind if
> someone suggested a better solution. Everybody makes mistakes, and I have
> no problem with admitting that "I made a mistake" and moving forward.
> They say: "Don't cling to a mistake just because you spent a lot of time
> making it."

:) So your way isn't the only true way, but others are just clinging to their mistakes... :)

> However, no-one has suggested a case where an application issues lots of unique
> SQL statements.

We did, you just dismissed or ignored them.

> 3) For "performance related" issues, a business case and a voltmeter are
> required to prove there's an issue.
>
> Shay>But the second query, which should be invalidated
[HACKERS] Undiagnosed bug in Bloom index
I am getting corrupted Bloom indexes in which a tuple in the table heap is not in the index.

I see it as early as commit a9284849b48b, with commit e13ac5586c49c cherry-picked onto it. I don't see it before a9284849b48b because the test case seg faults before anything interesting can happen. I think this is an ab initio bug, either in the bloom contrib or in the core index am. I see it as recently as 371b572, which is as new as I have tested.

The problem is that an update which should properly update exactly one row instead updates zero rows. It takes 5 to 16 hours to reproduce when run as 8 clients on 8 cores. I suspect it is some kind of race condition, and testing with more clients on more cores would make it happen faster. If you inject crash/recovery cycles into the system, it seems to happen sooner. But crash/recovery cycles are not necessary.

If you use the attached do_nocrash.sh script, the error will generate a message like:

child abnormal exit update did not update 1 row: key 6838 updated 0E0 at count.pl line 189.\n at count.pl line 197.

(And I've added code so that once this is detected, the script will soon terminate.)

If you want to run do_nocrash.sh, change the first few lines to hardcode the correct path for the binaries and the temp data directory (which will be ruthlessly deleted). It will run on an unpatched server, since crash injection is turned off. If you want to make it fork more clients, change the 8 in 'perl count.pl 8 0|| on_error;'

I have preserved a large tarball (215M) of a corrupt data directory. It was run with the a928484 compilation with e13ac5586 cherry-picked, and is at https://drive.google.com/open?id=0Bzqrh1SO9FcEci1FQTkwZW9ZU1U. Or, if you can tell me how to look for myself (pageinspect doesn't offer much for Bloom). With that tarball, the first query, which uses the index, returns nothing, while the second, which forces a seq scan, returns a row:

select * from foo where bloom = md5('6838');
select * from foo where bloom||'' = md5('6838');

The machinery posted here is probably much more elaborate than necessary to detect the problem. You could probably detect it with pgbench -N, except that that doesn't check the results to make sure the expected number of rows were actually selected/updated.

Cheers, Jeff

count.pl Description: Binary data
do_nocrash.sh Description: Bourne shell script

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Slowness of extended protocol
Shay>To be honest, the mere idea of having an SQL parser inside my driver makes me shiver.

Same for me. However, I cannot wait for the PostgreSQL 18 that does not need client-side parsing.

Shay>We did, you just dismissed or ignored them

Please prove me wrong, but I did provide a justified answer both to yours: https://www.postgresql.org/message-id/CAB%3DJe-FHSwrbJiTcTDeT4J3y_%2BWvN1d%2BS%2B26aesr85swocb7EA%40mail.gmail.com (starting with "Why the connection is reset") and to Robert's examples: https://www.postgresql.org/message-id/CAB%3DJe-GSAs_340dqdrJoTtP6KO6xxN067CtB6Y0ea5c8LRHC9Q%40mail.gmail.com

Shay>There's nothing your driver is doing that the application developer can't do themselves -
Shay>so your driver isn't faster than other drivers. It's faster only when used by lazy programmers.

I'm afraid you do not get the point. ORMs like Hibernate, EclipseLink, etc. send regular "insert ... values" statements via the batch API. For the developer, the only way to make use of "multivalues" is to implement either an "ORM fix", or "the driver fix", or a "postgresql fix". So the feature has very little to do with laziness of the programmers. The application developer just does not have full control of each SQL statement when working through an ORM.

Do you suggest "stop using ORMs"? Do you suggest fixing all the ORMs so each uses the optimal insert statement for each DB? Do you suggest fixing postgresql?

Once again, the "multivalues rewrite at the pgjdbc level" enables the feature transparently for all the users. If PostgreSQL 10/11 improves bind/exec performance, we could even drop that rewrite at the pgjdbc level and revert to the regular flow. That would again be transparent to the application.

Shay>are you sure there aren't "hidden" costs on the PostgreSQL side for generating so many implicit savepoints?

Technically speaking, I use the same savepoint name through bind/exec messages.

Shay>What you're doing is optimizing developer code, with the assumption that developers can't be trusted to code efficiently - they're going to write bad SQL and forget to prepare their statements

Please, be careful. "you are completely wrong here" he-he. Well, you list the wrong assumption. Why do you think my main assumption is "developers can't be trusted"? The proper assumption is: I follow the Java database API specification, and I optimize pgjdbc for the common use case (e.g. ORM or ORM-like).

For instance, if a Java application wants to use bind values (e.g. to prevent security issues), then the only way is to go through java.sql.PreparedStatement. Here's the documentation: https://docs.oracle.com/javase/8/docs/api/java/sql/Connection.html#prepareStatement-java.lang.String-

Here's a quote:

Javadoc> *Note:* This method is optimized for handling parametric SQL statements that benefit from precompilation. If the driver supports precompilation, the method prepareStatement will send the statement to the database for precompilation. Some drivers may not support precompilation. In this case, the statement may not be sent to the database until the PreparedStatement object is executed. This has no direct effect on users; however, it does affect which methods throw certain SQLException objects.

The most important part is "if the driver supports precompilation...". There's no API to enable/disable precompilation at all. So, when using Java, there is no such thing as "statement.enableServerPrepare=true". It is expected that the "driver" will "optimize" the handling somehow in the best possible way.
It is the Java API specification that enables me (as a driver developer) to be flexible and leverage database features so the end user gets the best experience.

Vladimir
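To make the rewrite concrete in SQL terms (a sketch; table and column names are illustrative), the driver-level batch fold amounts to:

    -- What an ORM hands the driver, executed N times through the batch API:
    insert into t(a, b, c) values (?, ?, ?);

    -- What the driver can send instead, folding the whole batch into a single
    -- statement with one bind/exec:
    insert into t(a, b, c) values (?, ?, ?), (?, ?, ?), (?, ?, ?);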
Re: [HACKERS] Add hint for function named "is"
On 8/12/16 1:40 PM, Tom Lane wrote:
> What this is telling us is that given input like, say,
> SELECT 'foo'::character varying
> Bison is no longer sure whether "varying" is meant as a type name modifier
> or a ColLabel. And indeed there is *no* principled answer to that that
> doesn't involve giving up the ability for "varying" to be a ColLabel.
> Just promoting it to a fully reserved word (which it is not today) wouldn't
> be enough, because right now even fully reserved words can be ColLabels.

FWIW, I've always disliked how some types can contain spaces without being quoted. AFAIK nothing else in the system allows that, and I don't see why character varying and timestamp with* should get a special pass. I doubt we could get rid of this in CREATE TABLE, but I wonder how many people actually cast using the unquoted form.

-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532) mobile: 512-569-9461

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
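For reference, the unquoted multi-word spellings in question, next to the forms most code uses in practice (all of these are accepted today):

    SELECT 'foo'::character varying;           -- SQL-standard spelling, unquoted spaces
    SELECT 'foo'::varchar;                     -- the usual abbreviation
    SELECT CAST('foo' AS character varying);   -- standard CAST syntax

    SELECT now()::timestamp with time zone;    -- another multi-word type name
    SELECT now()::timestamptz;                 -- its common abbreviation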
Re: [HACKERS] Add hint for function named "is"
Jim Nasby writes: > FWIW, I've always disliked how some types could contains spaces without > being quoted. AFAIK nothing else in the system allows that, and I don't > see why character varying and timestamp with* should get a special pass. Because the SQL standard says so. If we were going to start ignoring SQL-standard spellings, this area would not be very high on my list, actually. Their insistence on inventing crazy new special syntaxes for things that could be normal function calls is what bugs me ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] condition variables
On Fri, Aug 12, 2016 at 9:47 AM, Robert Haas wrote: > [condition-variable-v1.patch] Don't you need to set proc->cvSleeping = false in ConditionVariableSignal? -- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] No longer possible to query catalogs for index capabilities?
Andrew Gierth writes: > Latest patch. > Names and scopes are as per discussion. New files for code and > regression test. Docs included. Pushed with (mostly) cosmetic adjustments. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
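For anyone wanting to try the new functions, a quick sketch based on the signatures documented in Andrew's patch upthread (the index name is just an example; unknown property names return NULL rather than raising an error):

    -- access method capabilities
    SELECT pg_indexam_has_property(a.oid, 'can_order')
      FROM pg_am a WHERE a.amname = 'btree';

    -- whole-index properties
    SELECT pg_index_has_property('pgbench_accounts_pkey'::regclass, 'clusterable');

    -- per-column properties (columns are numbered from 1)
    SELECT pg_index_column_has_property('pgbench_accounts_pkey'::regclass, 1, 'asc');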
Re: [HACKERS] Slowness of extended protocol
> Shay>I don't know much about the Java world, but both pgbouncer and pgpool
> (the major pools?)
>
> In the Java world, https://github.com/brettwooldridge/HikariCP is a very good
> connection pool. Neither pgbouncer nor pgpool is required.
> The architecture is: application <=> HikariCP <=> pgjdbc <=> PostgreSQL
>
> The idea is that HikariCP pools pgjdbc connections, and server-prepared
> statements are per-pgjdbc connection, so there is no reason to "discard
> all" or "deallocate all" or whatever.

Interesting. What would happen if a user changes some GUC parameters? Does a subsequent session accidentally inherit the changed GUC parameters?

> Shay>send DISCARD ALL by default. That is a fact, and it has nothing to do
> with any bugs or issues pgbouncer may have.

That is correct for pgpool as well.

> That is configurable. If pgbouncer/pgpool defaults to the "wrong
> configuration", why should we (driver developers, backend developers) try
> to accommodate that?

There's nothing wrong with DISCARD ALL. It's just not suitable for your program (and HikariCP?). I don't know about pgbouncer, but pgpool was created as a general-purpose connection pooler, which means it must behave as similarly as possible to the PostgreSQL backend from the frontend's point of view. "DISCARD ALL" is needed to simulate the behavior of the backend: it discards all properties set by a frontend, including prepared statements, when a session terminates. "DISCARD ALL" is perfect for this goal.

Best regards,
-- 
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
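For reference, DISCARD ALL is documented as being (approximately) equivalent to running the following, which is why it matches what the backend itself forgets when a session ends:

    SET SESSION AUTHORIZATION DEFAULT;
    RESET ALL;
    DEALLOCATE ALL;        -- this is what drops driver-side server-prepared statements
    CLOSE ALL;
    UNLISTEN *;
    SELECT pg_advisory_unlock_all();
    DISCARD PLANS;
    DISCARD SEQUENCES;
    DISCARD TEMP;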
[HACKERS] Reminder: upcoming 9.6 branch split and rc1 release
Just to make sure that everyone's on the same page: the schedule the release team has agreed to is that we'll branch off REL9_6_STABLE on Aug 15 (Monday!) and issue 9.6rc1 the week of Aug 29. Barring scary bugs emerging, this should keep us on track for 9.6.0 release in late September. I will take care of making the branch sometime Monday afternoon. I plan to do a pgindent run right before the branch, as per regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] to_date_valid()
On 27.07.2016 05:00, Joshua D. Drake wrote: On 07/26/2016 06:25 PM, Peter Eisentraut wrote: On 7/5/16 4:24 AM, Albe Laurenz wrote: But notwithstanding your feeling that you would like your application to break if it makes use of this behaviour, it is a change that might make some people pretty unhappy - nobody can tell how many. What is the use of the existing behavior? You get back an arbitrary implementation dependent value. We don't even guarantee what the value will be. If we changed it to return a different implementation dependent value, would users get upset? No they would not get upset because they wouldn't know. Can we just do the right thing? Attached is a patch to "do the right thing". The verification is in "to_date()" now, the extra function is removed. Regression tests are updated - two or three of them returned a wrong date before, and still passed. They fail now. Documentation is also updated. Regards, -- Andreas 'ads' Scherbaum German PostgreSQL User Group European PostgreSQL User Group - Board of Directors Volunteer Regional Contact, Germany - PostgreSQL Project to_date_valid.patch.gz Description: application/gzip -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
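A concrete example of the behavior the patch changes (a sketch; exact output can vary by version):

    -- Today: out-of-range input is silently adjusted rather than rejected
    SELECT to_date('2016-02-30', 'YYYY-MM-DD');   -- returns 2016-03-01, no error

    -- A plain cast already validates the value
    SELECT '2016-02-30'::date;                    -- ERROR: date/time field value out of range

    -- With the patch, to_date() raises an error for such input as well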
[HACKERS] WIP: Barriers
Hi hackers, I would like to propose "barriers" for Postgres processes. A barrier is a very simple mechanism for coordinating parallel computation, as found in many threading libraries. First, you initialise a Barrier object somewhere in shared memory, most likely in the DSM segment used by parallel query, by calling BarrierInit(&barrier, nworkers). Then workers can call BarrierWait(&barrier) when they want to block until all workers arrive at the barrier. When the final worker arrives, BarrierWait returns in all workers, releasing them to continue their work. One arbitrary worker receives a different return value as a way of "electing" it to perform serial phases of computation. For parallel phases of computation, the return value can be ignored. For example, there may be preparation, merging, or post-processing phases which must be done by just one worker, interspersed with phases where all workers do something. My use case for this is coordinating the phases of parallel hash joins, but I strongly suspect there are other cases. Parallel sort springs to mind, which is why I wanted to post this separately and earlier than my larger patch series, to get feedback from people working on other parallel features. A problem that I'm still grappling with is how to deal with workers that fail to launch. What I'm proposing so far is based on static worker sets, where you can only give the number of workers at initialisation time, just like pthread_barrier_init. Some other libraries allow for adjustable worker sets, and I'm wondering if a parallel leader might need to be able to adjust the barrier when it hears of a worker not starting. More on that soon. Please see the attached WIP patch. I had an earlier version with its own waitlists and signalling machinery etc, but I've now rebased it to depend on Robert Haas's proposed condition variables, making this code much shorter and sweeter. So it depends on his condition-variable-vX.patch[1], which in turn depends on my lwlocks-in-dsm-vX.patch[2] (for proclist). When Michaël Paquier's work on naming wait points[3] lands, I plan to include event IDs as an extra argument to BarrierWait which will be passed though so as to show up in pg_stat_activity. Then you'll be able to see where workers are waiting for each other! For now I didn't want to tangle this up with yet another patch. I thought about using a different name to avoid colliding with barrier.h and overloading the term: there are of course also compiler barriers and memory barriers. But then I realised that that header was basically vacant real estate, and 'barrier' is the super-well established standard term for this parallel computing primitive. I'd be grateful for any thoughts, feedback, flames etc. [1] https://www.postgresql.org/message-id/flat/CA%2BTgmoaj2aPti0yho7FeEf2qt-JgQPRWb0gci_o1Hfr%3DC56Xng%40mail.gmail.com [2] https://www.postgresql.org/message-id/flat/CAEepm%3D0Vvr9zgwHt67RwuTfwMEby1GiGptBk3xFPDbbgEtZgMg%40mail.gmail.com [3] https://www.postgresql.org/message-id/flat/cab7npqtghfouhag1ejrvskn8-e5fpqvhm7al0tafsdzjqg_...@mail.gmail.com -- Thomas Munro http://www.enterprisedb.com barrier-v1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Slowness of extended protocol
Tatsuo>Interesting. What would happen if a user changes some of GUC parameters? Subsequent session accidentally inherits the changed GUC parameter?

Yes, that is what happens. The idea is not to mess with gucs.

Tatsuo>There's nothing wrong with DISCARD ALL
Tatsuo>"DISCARD ALL" is perfect for this goal.

It looks like you mean: "let's reset the connection state just in case". I see where it might help: e.g. when providing database access to random people who might screw up a connection in every possible way.

Just in case: do you think one should reset CPU caches, OS filesystem caches, and DNS caches, bounce the application, and bounce the OS in addition to "discard all"? Why reset only "connection properties" when we can reset everything to its pristine state? Just in case: PostgreSQL does not execute "discard all" on its own.

If you mean "pgpool is exactly like reconnecting to postgresql, but faster since the connection is already established", then OK, that might work in some cases (e.g. when response times/throughput are not important), but why force a "you must always start from scratch" execution model?

Developers are not that dumb, and they can understand that "changing gucs at random is bad". When a connection pool is dedicated to a particular application (or a set of applications), then it makes sense to reuse the statement cache for performance reasons. Of course, it requires some discipline like "don't mess with gucs", however that is acceptable and easily understandable by developers.

My application cannot afford re-parsing hot SQL statements, as it hurts performance. That is very visible in the end-to-end performance tests, so "discard all" is not used, and developers know not to mess with gucs.

Vladimir
Re: [HACKERS] Undiagnosed bug in Bloom index
Jeff Janes writes:
> I am getting corrupted Bloom indexes in which a tuple in the table
> heap is not in the index.

Hmm. I can trivially reproduce a problem, but I'm not entirely sure whether it matches yours. Same basic test case as the bloom regression test:

regression=# CREATE TABLE tst ( i int4, t text );
CREATE TABLE
regression=# CREATE INDEX bloomidx ON tst USING bloom (i, t) WITH (col1 = 3);
CREATE INDEX
regression=# INSERT INTO tst SELECT i%10, substr(md5(i::text), 1, 1) FROM generate_series(1,2000) i;
INSERT 0 2000
regression=# vacuum verbose tst;
...
INFO: index "bloomidx" now contains 2000 row versions in 5 pages
...
regression=# delete from tst;
DELETE 2000
regression=# vacuum verbose tst;
...
INFO: index "bloomidx" now contains 0 row versions in 5 pages
DETAIL: 2000 index row versions were removed.
...
regression=# INSERT INTO tst SELECT i%10, substr(md5(i::text), 1, 1) FROM generate_series(1,2000) i;
INSERT 0 2000
regression=# vacuum verbose tst;
...
INFO: index "bloomidx" now contains 1490 row versions in 5 pages
...

Ooops. (Note: this is done with some fixes already in place to make blvacuum.c return correct tuple counts for VACUUM VERBOSE; right now it tends to double-count during a VACUUM.)

The problem seems to be that (1) blvacuum marks all the index pages as BLOOM_DELETED, but doesn't bother to clear the notFullPage array on the metapage; (2) blinsert uses a page from notFullPage, failing to notice that it's inserting data into a BLOOM_DELETED page; (3) once we ask the FSM for a page, it returns a BLOOM_DELETED page that we've already put tuples into, which we happily blow away by reinit'ing the page.

A race-condition variant of this could be that after an autovacuum has marked a page BLOOM_DELETED, but before it's reached the point of updating the metapage, blinsert could stick data into the deleted page. That would make it possible to reach the problem without requiring the extreme edge case that blvacuum finds no partly-full pages to put into the metapage. If this does explain your problem, it's probably that variant.

Will push a fix in a bit.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Slowness of extended protocol
> Tatsuo>Interesting. What would happen if a user changes some of GUC
> parameters? Subsequent session accidentally inherits the changed GUC
> parameter?
>
> Yes, that is what happens.

Ouch.

> The idea is not to mess with gucs.
>
> Tatsuo>There's nothing wrong with DISCARD ALL
> Tatsuo>"DISCARD ALL" is perfect for this goal.
>
> It looks like you mean: "let's reset the connection state just in case".
> I see where it might help: e.g. when providing database access to random
> people who might screw up a connection in every possible way.

Yes, that's what clients of pgpool are expecting. Clients do not want to change their applications which were working with PostgreSQL without pgpool.

> Just in case: do you think one should reset CPU caches, OS filesystem
> caches, and DNS caches, bounce the application, and bounce the OS in addition
> to "discard all"?

Apparently not, because that is different from what PostgreSQL does when a backend exits.

> Why reset only "connection properties" when we can reset everything to its
> pristine state?

You can propose that kind of variant of the DISCARD command. PostgreSQL is an open source project.

> Just in case: PostgreSQL does not execute "discard all" on its own.
> If you mean "pgpool is exactly like reconnecting to postgresql, but faster
> since the connection is already established", then OK, that might work in some
> cases (e.g. when response times/throughput are not important), but why
> force a "you must always start from scratch" execution model?

I'm not doing that. I do not ask all the people to use pgpool. People can choose whatever tools they think suitable for their purpose.

> Developers are not that dumb, and they can understand that "changing gucs
> at random is bad".
>
> When a connection pool is dedicated to a particular application (or a set
> of applications), then it makes sense to reuse the statement cache for
> performance reasons.
> Of course, it requires some discipline like "don't mess with gucs", however
> that is acceptable and easily understandable by developers.

I'm not against such a developer's way. If you like it, you can do it. Nobody is stopping you. I just want to say that not all clients want it that way.

Best regards,
-- 
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Surprising behaviour of \set AUTOCOMMIT ON
On Sat, Aug 13, 2016 at 1:05 AM, David G. Johnston wrote:
> I'll admit I haven't tried to find fault with the idea (or discover better
> alternatives) nor how it would look in implementation. As a user, though,
> it would make sense if the system behaved in this way. That only AUTOCOMMIT
> needs this capability at the moment doesn't bother me. I'm also fine with
> making it an error and moving on - but if you want to accommodate both
> possibilities this seems like a cleaner solution than yet another
> environment variable that a user would need to consider.

I agree with your broad point that we should consider user convenience, but in this case I am not sure it will be convenient for a user to make the behaviour dependent on the value of ON_ERROR_STOP. I have checked this behaviour in one of the top commercial databases, and it just commits the previous transaction after the user turns Autocommit on and the first command completes. I am not saying that we should blindly follow that behaviour, but just want to indicate that it should be okay for users if we don't try to define multiple behaviours here based on the value of ON_ERROR_STOP.

-- 
With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
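The scenario under discussion, sketched as a psql session (the table name is illustrative):

    \set AUTOCOMMIT off
    INSERT INTO t VALUES (1);   -- psql implicitly opens a transaction here
    \set AUTOCOMMIT on
    INSERT INTO t VALUES (2);   -- still runs inside the transaction opened above
    -- the question is whether switching AUTOCOMMIT to on should commit,
    -- raise an error, or keep silently leaving the transaction open as today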
Re: [HACKERS] WIP: Barriers
Hi Thomas,

Barriers are really a very simple and convenient mechanism for process synchronization. But they are actually a special case of semaphores: given a semaphore primitive, it is trivial to implement a barrier. We have semaphores in Postgres, but... they cannot be used by extensions: there is a fixed number of semaphores, allocated based on the maximal number of connections, and there is no mechanism for requesting additional semaphores.

Robert has recently proposed condition variables, which are also very useful. Right now we have spinlocks, LW-locks and latches. From my point of view that is not enough. While creating various extensions for Postgres I have always felt the lack of such synchronization primitives as events (condition variables) and semaphores. Events and semaphores are similar, and it is possible to implement either of them based on the other. But from the user's point of view they have different semantics and use cases, so it is better to provide both of them.

I wonder if we should provide a system-abstraction layer (SAL) for Postgres, where we can isolate all system-dependent code and which will be available for core developers as well as for developers of extensions? The obvious candidates for SAL are:

1. Synchronization primitives (locks, events, semaphores, mutexes, spinlocks, latches, barriers)
2. Shared memory
3. File access
4. Network sockets
5. Process control (fork, ...)

Certainly it requires a lot of refactoring, but it will make the Postgres code much more elegant and easier to read and maintain. Also, it is not necessary to do all these changes in one step: here we do not need atomic transactions:) We can start, for example, with synchronization primitives, as in any case a lot of changes are proposed there.

Parallel execution is one of the most promising approaches to improving Postgres performance. I do not mean just parallel execution of a single query. Parallel vacuum, parallel index creation, parallel sort, ... And to implement all this stuff we definitely need convenient and efficient synchronization primitives. The set of such primitives can be discussed. IMHO it should include RW-locks (the current LW-locks), mutexes (spinlock + some mechanism to wait), events (condition variables), semaphores and barriers (based on semaphores). Latches can be left for backward compatibility or be replaced with events. I wonder if somebody has measured how many times slower latches (signal+socket) are than posix semaphores or condition variables?

On Aug 14, 2016, at 2:18 AM, Thomas Munro wrote:
> Hi hackers,
>
> I would like to propose "barriers" for Postgres processes. A barrier
> is a very simple mechanism for coordinating parallel computation, as
> found in many threading libraries.
>
> First, you initialise a Barrier object somewhere in shared memory,
> most likely in the DSM segment used by parallel query, by calling
> BarrierInit(&barrier, nworkers). Then workers can call
> BarrierWait(&barrier) when they want to block until all workers arrive
> at the barrier. When the final worker arrives, BarrierWait returns in
> all workers, releasing them to continue their work. One arbitrary
> worker receives a different return value as a way of "electing" it to
> perform serial phases of computation. For parallel phases of
> computation, the return value can be ignored. For example, there may
> be preparation, merging, or post-processing phases which must be done
> by just one worker, interspersed with phases where all workers do
> something.
> > My use case for this is coordinating the phases of parallel hash > joins, but I strongly suspect there are other cases. Parallel sort > springs to mind, which is why I wanted to post this separately and > earlier than my larger patch series, to get feedback from people > working on other parallel features. > > A problem that I'm still grappling with is how to deal with workers > that fail to launch. What I'm proposing so far is based on static > worker sets, where you can only give the number of workers at > initialisation time, just like pthread_barrier_init. Some other > libraries allow for adjustable worker sets, and I'm wondering if a > parallel leader might need to be able to adjust the barrier when it > hears of a worker not starting. More on that soon. > > Please see the attached WIP patch. I had an earlier version with its > own waitlists and signalling machinery etc, but I've now rebased it to > depend on Robert Haas's proposed condition variables, making this code > much shorter and sweeter. So it depends on his > condition-variable-vX.patch[1], which in turn depends on my > lwlocks-in-dsm-vX.patch[2] (for proclist). > > When Michaël Paquier's work on naming wait points[3] lands, I plan to > include event IDs as an extra argument to BarrierWait which will be > passed though so as to show up in pg_stat_activity. Then you'll be > able to see where workers are waiting for each other! For now I > didn't want to tangle this up