Re: [HACKERS] [PATCH] libpq: Allow specifying multiple host names to try to connect to
On Thu, Aug 6, 2015 at 03:15 AM, Michael Paquier wrote:
> On Wed, Aug 5, 2015 at 11:53 PM, Bruce Momjian wrote:
>> On Wed, Jul 8, 2015 at 12:24:37PM -0400, Robbie Harwood wrote:
>>> > You update the documentation just for psql but your change effects any
>>> > libpq application if we go forward with this patch we should update the
>>> > documentation for libpq as well.
>>> >
>>> > This approach seems to work with the url style of conninfo.
>>> >
>>> > For example
>>> >   postgres://some-down-host.info,some-other-host.org:5435/test1
>>> > seems to work as expected but I don't like that syntax I would rather see
>>> >   postgres://some-down-host.info:5435/test1,postgres://some-other-host.org:5435/test1
>>> >
>>> > This would be a more invasive change but I think the syntax is more
>>> > usable.
>>>
>>> I agree with this; it seems to me that it's more powerful to be able to
>>> specify complete urls for when they may differ.
>>>
>>> For the non-url case though, I don't see a clean way of doing this. We
>>> could always, e.g., locally bind port specification to the closest host
>>> specification, but that seems nasty, and is still less powerful than
>>> passing urls (or we could just do the same for all parameters, but
>>> that's just a mess).
>>>
>>> Might it be reasonable to only allow the multi-host syntax in the
>>> url-style and not otherwise?
>>
>> First, I agree this is a very useful feature that we want. Many NoSQL
>> databases are promoting multi-host client libraries as HA, which is kind
>> of humorous, and also makes sense because many NoSQL solutions are
>> multi-host.
>>
>> I can see this feature benefitting us for clients to auto-failover
>> without requiring a pooler or virtual IP reassignment, and also useful
>> for read-only connections that want to connect to a read-only slave, but
>> don't care which one. The idea of randomly selecting a host from the
>> list might be a future feature.
>
> Yep. The JDBC driver is doing it as well.
I added a similar feature to the JDBC driver. Given a list of hostname/port
combinations to connect to, it currently supports the following tuning
parameters:

  targetServerType=any|master|slave|preferSlave
  loadBalanceHosts=false|true

For example, in a two-node master/replica setup one would open write
connections with host1,host2 & targetServerType=master and read-only
connections with host1,host2 & targetServerType=preferSlave.

>> I realize this is libpq-feature-creep, but considering the complexities
>> of a pooler and virtual IP address reassignment, I think adding this
>> The fact that other DBs are doing it, including I think
>> VMWare's libpq, supports the idea of adding this simple specification.

Because the feature at its simplest is a for loop in libpq, I would not
consider it much of a feature creep, especially since my original patch
showed the loop has been hidden in libpq for a long time already; it just
needed a special DNS record for the postgresql hosts that returned records
for all hosts.

Even when there are poolers in front of postgres, they can be set up in a
much simpler and more reliable non-cluster mode when libpq can be given
multiple pooler addresses to connect to.

-Mikko

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
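The host-list handling described above boils down to splitting the
comma-separated list and keeping a per-entry port. A minimal sketch of that
parsing step (my own hypothetical helper, not actual driver code):

```java
import java.util.ArrayList;
import java.util.List;

public class MultiHost {
    /** Split "host1,host2:5433" into host/port pairs; entries without an
     *  explicit ":port" fall back to the default port. Hypothetical
     *  illustration, not code from pgjdbc or libpq. */
    static List<String[]> parseHosts(String spec, int defaultPort) {
        List<String[]> targets = new ArrayList<>();
        for (String entry : spec.split(",")) {
            int colon = entry.indexOf(':');
            String host = colon < 0 ? entry : entry.substring(0, colon);
            int port = colon < 0 ? defaultPort
                    : Integer.parseInt(entry.substring(colon + 1));
            targets.add(new String[] { host, Integer.toString(port) });
        }
        return targets;
    }

    public static void main(String[] args) {
        for (String[] t : parseHosts("host1,host2:5433", 5432))
            System.out.println(t[0] + ":" + t[1]);
        // prints host1:5432 then host2:5433
    }
}
```

A client would then attempt a connection against each pair in order
(honouring targetServerType) until one succeeds.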
[HACKERS] [PATCH] libpq: Allow specifying multiple host names to try to connect to
Hi,

I would like to allow specifying multiple host names for libpq to try
connecting to. This is currently only supported if the host name resolves
to multiple addresses. Having support for it without a complex DNS setup
would be much easier.

Example:

  psql -h dbslave,dbmaster -p 5432 dbname
  psql 'postgresql://dbslave,dbmaster:5432/dbname'

Here the idea is that without the added complexity of pgbouncer or a
similar tool I can get any libpq client to try connecting to multiple
nodes until one answers. I added similar functionality to the jdbc driver
a few years ago.

Because libpq almost supported the feature already, the patch is very
simple: I just split the given host name, do a dns lookup on each part
separately, and link the results. If you configure a port that does not
exist you can see libpq trying to connect to multiple hosts:

  psql -h 127.0.0.2,127.0.0.3, -p 
  psql: could not connect to server: Connection refused
        Is the server running on host "127.0.0.2,127.0.0.3" (127.0.0.2) and accepting
        TCP/IP connections on port ?
  could not connect to server: Connection refused
        Is the server running on host "127.0.0.2,127.0.0.3" (127.0.0.3) and accepting
        TCP/IP connections on port ?

A further improvement would be to add a connection parameter to limit
connections only to the master (writable) or to a slave (read only).

-Mikko

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 62a3b21..e79f96c 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -232,6 +232,12 @@ EOF
       with a slash, it is used as the directory for the Unix-domain socket.
+
+      It is possible to provide multiple host names to connect to. The first host name that accepts
+      the connection will be used. To explicitly specify multiple host names they must be separated
+      by comma.
+      An alternative is to use dns record for the host name that resolves to multiple addresses.
+
diff --git a/src/backend/libpq/ip.c b/src/backend/libpq/ip.c
index db939b5..90d3c1e 100644
--- a/src/backend/libpq/ip.c
+++ b/src/backend/libpq/ip.c
@@ -56,6 +56,9 @@ static int getnameinfo_unix(const struct sockaddr_un * sa,
 			int salen, int flags);
 #endif
 
+/* Marker for multiple dns resolver results linked together */
+#define AI_PG_LINKED (1<<30)
+
 /*
  * pg_getaddrinfo_all - get address info for Unix, IPv4 and IPv6 sockets
@@ -65,6 +68,9 @@ pg_getaddrinfo_all(const char *hostname, const char *servname,
 			const struct addrinfo * hintp, struct addrinfo ** result)
 {
 	int			rc;
+	char	   *hostnames;
+	char	   *host_start;
+	struct addrinfo *addr;
 
 	/* not all versions of getaddrinfo() zero *result on failure */
 	*result = NULL;
@@ -73,10 +79,46 @@ pg_getaddrinfo_all(const char *hostname, const char *servname,
 	if (hintp->ai_family == AF_UNIX)
 		return getaddrinfo_unix(servname, hintp, result);
 #endif
-	/* NULL has special meaning to getaddrinfo(). */
-	rc = getaddrinfo((!hostname || hostname[0] == '\0') ? NULL : hostname,
-					 servname, hintp, result);
+	if (!hostname || hostname[0] == '\0')
+		return getaddrinfo(NULL, servname, hintp, result);
+
+	hostnames = strdup(hostname);
+	if (!hostnames)
+		return EAI_MEMORY;
+
+	host_start = hostnames;
+
+	/* dns lookups in reverse order to make result links in correct order */
+	do
+	{
+		host_start = strrchr(hostnames, ',');
+		if (host_start)
+		{
+			*host_start = '\0';
+			host_start++;
+		}
+		else
+			host_start = hostnames;
+
+		addr = NULL;
+		rc = getaddrinfo(host_start, servname, hintp, &addr);
+		if (rc || !addr)
+		{
+			if (addr)
+				freeaddrinfo(addr);
+			break;
+		}
+		if (*result)
+		{
+			(*result)->ai_flags |= AI_PG_LINKED;
+			addr->ai_next = *result;
+		}
+		*result = addr;
+
+	} while (host_start > hostnames);
+
+	free(hostnames);
 
 	return rc;
 }
@@ -94,6 +136,9 @@ pg_getaddrinfo_all(const char *hostname, const char *servname,
 void
 pg_freeaddrinfo_all(int hint_ai_family, struct addrinfo * ai)
 {
+	struct addrinfo * last_addrinfo;
+	struct addrinfo * ai_ptr;
+
 #ifdef HAVE_UNIX_SOCKETS
 	if (hint_ai_family == AF_UNIX)
 	{
@@ -109,10 +154,24 @@ pg_freeaddrinfo_all(int hint_ai_family, struct addrinfo * ai)
 	}
 	else
 #endif   /* HAVE_UNIX_SOCKETS */
+	/* struct was built by getaddrinfo() */
+	if (ai != NULL)
 	{
-		/* struct was built by getaddrinfo() */
-		if (ai != NULL)
-			freeaddrinfo(ai);
+		while (true)
+		{
+			/* unlink extra addrinfo structures before freeing */
+			last_addrinfo = NULL;
+			for (ai_ptr = ai; ai_ptr->ai_next; ai_ptr = ai_ptr->ai_next)
+			{
+				if (ai_ptr->ai_next->ai_flags & AI_PG_LINKED)
+					last_addrinfo = ai_ptr;
+			}
+			if (!last_addrinfo)
+				break;
+			freeaddrinfo(last_addrinfo->ai_next);
+			last_addrinfo->ai_next = NULL;
+		}
+		freeaddrinfo(ai);
 	}
 }
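The behaviour the patch adds — resolve each comma-separated name and chain
the result lists in order — can be demonstrated with a small standalone
toy. This is not the patch itself: it appends forward instead of iterating
in reverse, uses AI_NUMERICHOST so no real DNS is needed, and skips the
AI_PG_LINKED unlink-before-free dance (so it deliberately leaks the chain):

```c
#include <assert.h>
#include <netdb.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Toy multi-host resolver: resolve each comma-separated host and chain
 * the addrinfo lists together in order. */
static int getaddrinfo_multi(const char *hosts, const char *port,
                             struct addrinfo **result)
{
    struct addrinfo hints, *tail = NULL, *addr;
    char *copy = strdup(hosts), *save = NULL, *host;
    int rc = 0;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_NUMERICHOST;    /* keep the demo offline-friendly */

    *result = NULL;
    for (host = strtok_r(copy, ",", &save); host;
         host = strtok_r(NULL, ",", &save))
    {
        rc = getaddrinfo(host, port, &hints, &addr);
        if (rc)
            break;
        if (tail)
            tail->ai_next = addr;       /* link onto the previous list */
        else
            *result = addr;
        for (tail = addr; tail->ai_next; tail = tail->ai_next)
            ;
    }
    free(copy);
    return rc;
}

int main(void)
{
    struct addrinfo *ai;
    char buf[INET_ADDRSTRLEN];

    assert(getaddrinfo_multi("127.0.0.2,127.0.0.3", "5432", &ai) == 0);
    for (struct addrinfo *p = ai; p; p = p->ai_next) {
        struct sockaddr_in *sin = (struct sockaddr_in *) p->ai_addr;
        printf("%s\n", inet_ntop(AF_INET, &sin->sin_addr, buf, sizeof(buf)));
    }
    return 0;
}
```

A real implementation must remember which sub-lists came from separate
getaddrinfo() calls and free each one individually, which is exactly what
the AI_PG_LINKED flag in the patch is for.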
Re: [JDBC] [HACKERS] Pipelining executions to postgresql server
Kevin Wooten wrote:
>> On Nov 4, 2014, at 12:55 AM, Craig Ringer wrote:
>>
>> I have to say I like some aspects of how pgjdbc-ng is being done - in
>> particular use of Maven and being built around non-blocking I/O.
>>
>> OTOH, the use of Google Guava I find pretty inexplicable in a JDBC
>> driver, and since it hard-depends on JDK 7 I don't understand why Netty
>> was used instead of the async / non-blocking features of the core JVM.
>> That may simply be my inexperience with writing non-blocking socket I/O
>> code on Java though.
>
> Java6 has been EOL since 2011 and Java7 is EOL in less than 6 months.

I think that depending on Java 7, an old version that soon should not even
be used in production (without paying for extra support), can hardly be too
hard a requirement.

> It confuses me as to why you consider using stable, well implemented, well
> tested and well cared for libraries as inexplicable.
> Just because we are building a “driver” means we have to write every line
> of code ourselves? No thanks.

Embedding parts of other projects into the code base during the build, with
renamed packages, is nowadays common practice in java projects: spring does
it, elasticsearch embeds the whole of netty and more, and even the jre
embeds for example xerces and asm. It might not be the optimal solution,
but it is still definitely better than writing everything from scratch or
copy-pasting code from other projects.

If pgjdbc-ng provides both a thin maven version (with external versioned
dependencies) and a fat jar with the external dependencies repackaged
inside, users can choose either the old way (just drop a jdbc jar into the
project) or the new way, where their chosen build tool automatically
manages the dependencies.

-Mikko
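The embed-with-renamed-packages practice described above is typically done
with the maven-shade-plugin's relocation feature. A sketch of what
relocating an embedded Netty could look like (the coordinates and shaded
package name here are illustrative, not taken from pgjdbc-ng's actual
build):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <!-- rewrite io.netty.* classes (and references to them)
               into a private namespace inside the fat jar -->
          <relocation>
            <pattern>io.netty</pattern>
            <shadedPattern>com.example.shaded.io.netty</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

The thin artifact is then the plain jar with declared dependencies, and the
shaded artifact is what drop-a-jar-in users would pick up.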
Re: [HACKERS] Pipelining executions to postgresql server
> Craig Ringer wrote:
> On 11/02/2014 09:27 PM, Mikko Tiihonen wrote:
>> Is the following summary correct:
>> - the network protocol supports pipelining
>
> Yes.
>
> All you have to do is *not* send a Sync message and be aware that the
> server will discard all input until the next Sync, so pipelining +
> autocommit doesn't make a ton of sense for error handling reasons.

I do not quite grasp why not sending Sync is so important.

My proof-of-concept setup was for queries with autocommit enabled. When
looking with wireshark I could observe that the client sent 3-10 P/B/D/E/S
messages to the server before the server started sending the corresponding
1/2/T/D/C/Z replies for each request. Every 10 requests the test
application waited for all the replies to arrive, so as not to overflow the
network buffers (which is known to cause deadlocks with the current pg jdbc
driver).

If I want separate error handling for each execution, shouldn't I be
sending a separate Sync for each pipelined operation?

>> - the server handles operations in order, starting the processing of next
>>   operation only after fully processing the previous one
>>   - thus pipelining is invisible to the server
>
> As far as I know, yes. The server just doesn't care.
>
>> - libpq driver does not support pipelining, but that is due to internal
>>   limitations
>
> Yep.
>
>> - if proper error handling is done by the client then there is no reason
>>   why pipelining could not be supported by any pg client
>
> Indeed, and most should support it. Sending batches of related queries
> would make things a LOT faster.
>
> PgJDBC's batch support is currently write-oriented. There is no
> fundamental reason it can't be expanded for reads. I've already written
> a patch to do just that for the case of returning generated keys.
>
> https://github.com/ringerc/pgjdbc/tree/batch-returning-support
>
> and just need to rebase it so I can send a pull for upstream PgJDBC.
> It's already linked in the issues documenting the limitations in batch
> support.

Your code looked good. Returning inserts are an important thing to
optimize.

> If you want to have more general support for batches that return rowsets
> there's no fundamental technical reason why it can't be added. It just
> requires some tedious refactoring of the driver to either:
>
> - Sync and wait before it fills its *send* buffer, rather than trying
>   to manage its receive buffer (the server send buffer), so it can
>   reliably avoid deadlocks; or
>
> - Do async I/O in a select()-like loop over a protocol state machine,
>   so it can simultaneously read and write on the wire.

I also think async I/O is the way to go. Luckily that has already been done
in pgjdbc-ng (https://github.com/impossibl/pgjdbc-ng), built on top of the
netty java NIO library. It has quite good feature parity with the original
pgjdbc driver. I'll try next if I can enable pipelining with it, now that I
have tried the proof of concept with the original pgjdbc driver.

> I might need to do some of that myself soon, but it's a big (and
> therefore error-prone) job I've so far avoided by making smaller, more
> targeted changes.
>
> For JDBC the JDBC batch interface is the right place to do this, and you
> should not IMO attempt to add pipelining outside that interface.
> (Multiple open resultsets from portals, yes, but not pipelining of
> queries).

I do not think the JDBC batch interface even allows doing updates to
multiple tables when using prepared statements?

> --
> Craig Ringer http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
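The deadlock-avoidance pattern discussed above — send ahead of reads, but
drain replies before the buffers can fill — can be sketched with a fake
wire. The queue below stands in for the socket, and `window` plays the role
of the every-10-requests drain in the proof of concept (a toy model only; a
real driver would be doing blocking or non-blocking socket I/O):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

public class PipelineDemo {
    // Fake wire: each send immediately queues the reply the server
    // would eventually produce, in order.
    static final ArrayDeque<String> wire = new ArrayDeque<>();

    static void send(String query) { wire.add("result:" + query); }
    static String recv()           { return wire.remove(); }

    /** Pipeline queries, never leaving more than `window` replies
     *  outstanding, so neither side's buffers fill up and deadlock. */
    static List<String> run(List<String> queries, int window) {
        List<String> results = new ArrayList<>();
        int outstanding = 0;
        for (String q : queries) {
            send(q);
            if (++outstanding == window)
                while (outstanding > 0) { results.add(recv()); outstanding--; }
        }
        while (outstanding > 0) { results.add(recv()); outstanding--; }
        return results;
    }

    public static void main(String[] args) {
        List<String> queries = new ArrayList<>();
        for (int i = 0; i < 25; i++) queries.add("q" + i);
        List<String> results = run(queries, 10);
        System.out.println(results.size() + " " + results.get(24));
        // prints: 25 result:q24
    }
}
```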
Re: [HACKERS] Pipelining executions to postgresql server
>>> From: Andres Freund
>>> On 2014-11-01 14:04:05 +0000, Mikko Tiihonen wrote:
>>>> I created a proof of concept patch for postgresql JDBC driver that
>>>> allows the caller to do pipelining of requests within a
>>>> transaction. The pipelining here means same as for HTTP: the client
>>>> can send the next execution already before waiting for the response of
>>>> the previous request to be fully processed.
>>>
>>> Slightly confused here. To my knowledge the jdbc driver already employs
>>> some pipelining? There's some conditions where it's disabled (IIRC
>>> RETURNING for DML is one of them), but otherwise it's available.
>>>
>>> I'm very far from a pgjdbc expert, but that's what I gathered from the
>>> code when investigating issues a while back and from my colleague Craig.
>>
>> Most DB interfaces make the server operations look synchronous.
>
> You IIRC can use jdbc's batch interface.

Yes, there is a limited batch interface for inserts and updates. But for
example when using prepared statements you can only do batches of the same
statement (with different parameters, of course).

>> I should have searched earlier for a better reference to libpq. I am
>> planning on adding support for something similar to
>> http://www.postgresql.org/docs/9.3/static/libpq-async.html
>> More specifically, operations like:
>>   int PQsendQuery(PGconn *conn, const char *command);
>>   PGresult *PQgetResult(PGconn *conn);
>
> The network protocol allows for pipelining, yes. But libpq unfortunately
> doesn't.
> You should read the protocol definition.

I have studied the protocol; that is why I concluded that it would be
possible to add support for pipelining for clients.

>> It should be, if the server started to process (read in) the next query
>> only after it has sent the previous response out.
>
> There's complexities around error handling and such making it slightly
> more complex.

Are you referring to some complexities on the server side related to error
handling, or on the client side?

Is the following summary correct:
- the network protocol supports pipelining
- the server handles operations in order, starting the processing of the
  next operation only after fully processing the previous one
  - thus pipelining is invisible to the server
- the libpq driver does not support pipelining, but that is due to internal
  limitations
- if proper error handling is done by the client then there is no reason
  why pipelining could not be supported by any pg client

-Mikko
Re: [HACKERS] Pipelining executions to postgresql server
> From: Andres Freund
> On 2014-11-01 14:04:05 +0000, Mikko Tiihonen wrote:
>> I created a proof of concept patch for postgresql JDBC driver that
>> allows the caller to do pipelining of requests within a
>> transaction. The pipelining here means same as for HTTP: the client
>> can send the next execution already before waiting for the response of
>> the previous request to be fully processed.
>
> Slightly confused here. To my knowledge the jdbc driver already employs
> some pipelining? There's some conditions where it's disabled (IIRC
> RETURNING for DML is one of them), but otherwise it's available.
>
> I'm very far from a pgjdbc expert, but that's what I gathered from the
> code when investigating issues a while back and from my colleague Craig.

Most DB interfaces make the server operations look synchronous. For JDBC
the only standard interface is similar to libpq's:

  PGresult *PQexec(PGconn *conn, const char *command);

I should have searched earlier for a better reference to libpq. I am
planning on adding support for something similar to
http://www.postgresql.org/docs/9.3/static/libpq-async.html
More specifically, operations like:

  int PQsendQuery(PGconn *conn, const char *command);
  PGresult *PQgetResult(PGconn *conn);

The Java API will of course be custom to the postgresql jdbc driver, since
there is no official java api for async db operations. The above I can do
purely on the jdbc driver side.

But my question was about pipelining. In libpq terms: is it safe to do
multiple PQsendQuery operations before invoking PQgetResult as many times?
It should be, if the server starts to process (read in) the next query only
after it has sent the previous response out.

And do any pg connection poolers support having multiple executions
on-the-fly at the same time for the same connection?

-Mikko
[HACKERS] Pipelining executions to postgresql server
Hi,

I created a proof of concept patch for the postgresql JDBC driver that
allows the caller to do pipelining of requests within a transaction. The
pipelining here means the same as for HTTP: the client can send the next
execution already before waiting for the response of the previous request
to be fully processed. The goal is to reduce the effects of latency between
server and client.

In my test against a localhost postgresql, a jdbc test that queries a
single value from the database 200 times got more than a 20% speed-up from
pipelining. The pipelining version processes the responses every 10
queries. With actual latency between the server and client, larger
speed-ups are of course possible. I think pipelining + jsonb support would
make postgresql an even more competitive key/value and document store.

Example use cases:
1) one insert into the shopping cart table, plus 3 inserts into the
   shopping cart rows table, in one pipeline
   - only one round-trip penalty instead of 4
2) query the shopping cart row and query the shopping cart rows in one
   pipeline
   - only one round-trip penalty instead of 2

The only alternative way to reduce the round-trip latency is to make every
operation a single round trip, and that can only be done with pl functions
and by passing in more complex objects, for example the whole shopping cart
with rows as json data.

What kind of problems could pipelining cause (assuming we limit it to
rather simple use cases only)?
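The arithmetic behind these speed-ups can be sketched with a toy cost
model: each query costs some fixed processing time, and each pipelined
batch costs one network round trip instead of one per query. The numbers
below are illustrative assumptions of mine, not measurements from the
patch:

```java
public class LatencyModel {
    /** Toy model: total time = per-query work + one round trip per
     *  batch of `batch` pipelined queries. */
    static double totalMillis(int queries, double workMs, double rttMs,
                              int batch) {
        return queries * workMs + rttMs * Math.ceil((double) queries / batch);
    }

    public static void main(String[] args) {
        // Localhost-ish guesses: 0.1 ms work per query, 0.025 ms rtt.
        // Pipelining in batches of 10 trims mostly the rtt component,
        // giving a modest win of roughly the 20% order observed.
        System.out.println(totalMillis(200, 0.1, 0.025, 1) + " -> "
                + totalMillis(200, 0.1, 0.025, 10));

        // With 10 ms of real network latency the win is dramatic.
        System.out.println(totalMillis(200, 0.1, 10, 1) + " -> "
                + totalMillis(200, 0.1, 10, 10));
    }
}
```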
-Mikko

From c662b8865c58cba714655148401ac86a21c10f3c Mon Sep 17 00:00:00 2001
From: Mikko Tiihonen
Date: Sat, 1 Nov 2014 15:43:19 +0200
Subject: [PATCH] Example pipelined single-shot query

---
 org/postgresql/core/QueryExecutor.java           |  13 +++
 org/postgresql/core/v2/QueryExecutorImpl.java    |   5 +
 org/postgresql/core/v3/QueryExecutorImpl.java    |  41 +++
 org/postgresql/jdbc2/AbstractJdbc2Statement.java |  79 +
 .../test/jdbc2/PipelineExecutionTest.java        | 129 +
 5 files changed, 267 insertions(+)
 create mode 100644 org/postgresql/test/jdbc2/PipelineExecutionTest.java

diff --git a/org/postgresql/core/QueryExecutor.java b/org/postgresql/core/QueryExecutor.java
index e80a23c..b8e46a6 100644
--- a/org/postgresql/core/QueryExecutor.java
+++ b/org/postgresql/core/QueryExecutor.java
@@ -8,6 +8,7 @@
  */
 package org.postgresql.core;
 
+import java.io.IOException;
 import java.sql.SQLException;
 
 import org.postgresql.copy.CopyOperation;
@@ -101,6 +102,16 @@ public interface QueryExecutor {
     static int QUERY_NO_BINARY_TRANSFER = 256;
 
     /**
+     * Flag for pipeline executions with responses read later.
+     */
+    static int QUERY_PIPELINE = 512;
+
+    /**
+     * Flag for pipeline executions with responses read later.
+     */
+    static int QUERY_DEQUEUE_PIPELINE = 1024;
+
+    /**
      * Execute a Query, passing results to a provided ResultHandler.
      *
      * @param query the query to execute; must be a query returned from
@@ -125,6 +136,8 @@ public interface QueryExecutor {
                  int flags)
         throws SQLException;
 
+    void processPipelinedResult(ResultHandler handler) throws SQLException;
+
     /**
      * Execute several Query, passing results to a provided ResultHandler.
      *
diff --git a/org/postgresql/core/v2/QueryExecutorImpl.java b/org/postgresql/core/v2/QueryExecutorImpl.java
index 33c0048..5a6f607 100644
--- a/org/postgresql/core/v2/QueryExecutorImpl.java
+++ b/org/postgresql/core/v2/QueryExecutorImpl.java
@@ -616,4 +616,9 @@ public class QueryExecutorImpl implements QueryExecutor {
     public CopyOperation startCopy(String sql, boolean suppressBegin) throws SQLException {
         throw new PSQLException(GT.tr("Copy not implemented for protocol version 2"), PSQLState.NOT_IMPLEMENTED);
     }
+
+    @Override
+    public void processPipelinedResult(ResultHandler handler) throws SQLException {
+        throw new PSQLException(GT.tr("Copy not implemented for protocol version 2"), PSQLState.NOT_IMPLEMENTED);
+    }
 }
diff --git a/org/postgresql/core/v3/QueryExecutorImpl.java b/org/postgresql/core/v3/QueryExecutorImpl.java
index 966a6f6..7297764 100644
--- a/org/postgresql/core/v3/QueryExecutorImpl.java
+++ b/org/postgresql/core/v3/QueryExecutorImpl.java
@@ -11,6 +11,7 @@ package org.postgresql.core.v3;
 import org.postgresql.core.*;
 
 import java.util.ArrayList;
+import java.util.LinkedList;
 import java.util.List;
 import java.util.HashMap;
 import java.util.Properties;
@@ -1713,7 +1714,33 @@ public class QueryExecutorImpl implements QueryExecutor {
         }
     }
 
+    public void processPipelinedResult(ResultHandler handler) throws SQLException {
+        ResultHandlerHolder holder;
+        while ((holder = pipelineResultHandlers.remove(0)) != null) {
+            try {
+                processResults(holder.handler, holder.flags & (~QUERY_PIPELINE) | QUERY_DEQUEUE_PIPELINE);
+            } catch (IOException e) {
+
[HACKERS] Documenting the Frontend/Backend Protocol update criteria
Hi,

Currently the criteria for updating the F/B protocol are undefined. This
makes it hard to update the protocol going forward. It also makes it hard
to write library/driver/application implementations that will be more
future proof against future server versions.

Ideally the documentation for 9.4 (backport?) would say what kinds of
things are allowed to change within the v3 protocol, and thus imply what
kinds of changes need a new v4 protocol. Is there some wishlist page of
items to do differently for v4 already?

I did find the following text in the documentation: "binary
representations for complex data types might change across server
versions". But having more specific rules would help, especially since the
text seems to be there just to scare: so far changes have been strongly
discouraged.

An example to consider: some binary formats have flags (arrays) or a
version (jsonb) field. We should explicitly say that clients must reject
any unknown bits/versions that they do not know about. This guarantees they
detect small format updates instead of silently accepting them and possibly
returning corrupt data.

My motivation: two years ago I accidentally opened a discussion on how to
do updates to the binary encoding of data in the protocol [1]. I would like
to reopen the discussion now, since the jsonb 'binary' encoding is just a
version '1' + text json. The result of the last discussion was that having
a version or flags as part of the binary format is not enough, since
drivers/libraries (fixable) and applications (unfixable) depend on the
current encoding. And if we add a new bit to the flags or bump the version
number, we will break backward compatibility.

To summarize the previous discussion:
- there are currently no written rules for modifying the binary encoding
  formats
- the bytea modification was done with a GUC, but a GUC was seen as a bad
  solution in general
- my proposal to add a minor format version number was not good either,
  since any per-session state would be problematic for connection poolers

[1]: http://grokbase.com/t/postgresql/pgsql-hackers/11bwhv1esa/add-minor-version-to-v3-protocol-to-allow-changes-without-breaking-backwards-compatibility
Re: [HACKERS] Use gcc built-in atomic inc/dec in lock.c
On 12/14/2012 05:55 PM, Merlin Moncure wrote:
> On Fri, Dec 14, 2012 at 9:33 AM, Mikko Tiihonen wrote:
>> On 12/13/2012 12:19 AM, Peter Geoghegan wrote:
>>> On 12 December 2012 22:11, Mikko Tiihonen wrote:
>>>> noticed a "XXX: It might be worth considering using an atomic
>>>> fetch-and-add instruction here, on architectures where that is
>>>> supported." in lock.c. Here is my first try at using it.
>>>
>>> That's interesting, but I have to wonder if there is any evidence that
>>> this *is* actually helpful to performance.
>>
>> One of my open questions listed in the original email was a request for
>> help on creating a test case that exercises the code path enough so that
>> any improvements can be measured.
>>
>> But apart from performance I think there are two other aspects to
>> consider:
>> 1) Code clarity: I think the lock.c code is easier to understand after
>>    the patch
>> 2) Future possibilities: having atomic_inc/dec generally available
>>    allows other performance critical parts of postgres to take advantage
>>    of them in the future
>
> This was actually attempted a little while back; a spinlock was replaced
> with a few atomic increment and decrement calls for managing the refcount
> and other things on the freelist. It helped or hurt depending on
> contention but the net effect was negative. On reflection I think that
> was because the assembly 'lock' instructions are really expensive
> relative to the others: so it's not safe to assume that say 2-3 gcc
> primitive increment calls are cheaper than a spinlock.

The spinlock uses one 'lock' instruction when taken, and no lock
instructions when released. Thus I think replacing one spinlock-protected
add/sub with an atomic 'lock' add/sub should not perform worse. But if you
replace "mutex-lock, add, add, unlock" with "atomic add, atomic add" you
already have more hw-level synchronization and thus risk lower performance
if they are on separate cache lines. This of course limits the use cases of
the atomic operations.
-Mikko
Re: [HACKERS] Use gcc built-in atomic inc/dec in lock.c
On 12/13/2012 12:19 AM, Peter Geoghegan wrote:
> On 12 December 2012 22:11, Mikko Tiihonen wrote:
>> noticed a "XXX: It might be worth considering using an atomic
>> fetch-and-add instruction here, on architectures where that is
>> supported." in lock.c. Here is my first try at using it.
>
> That's interesting, but I have to wonder if there is any evidence that
> this *is* actually helpful to performance.

One of my open questions listed in the original email was a request for
help on creating a test case that exercises the code path enough so that
any improvements can be measured.

But apart from performance I think there are two other aspects to consider:
1) Code clarity: I think the lock.c code is easier to understand after the
   patch
2) Future possibilities: having atomic_inc/dec generally available allows
   other performance critical parts of postgres to take advantage of them
   in the future

-Mikko
[HACKERS] Use gcc built-in atomic inc/dec in lock.c
     : 8b 44 ab 04          	mov    0x4(%rbx,%rbp,4),%eax
 e49: 83 e8 01             	sub    $0x1,%eax
 e4c: 89 44 ab 04          	mov    %eax,0x4(%rbx,%rbp,4)
 e50: c6 03 00             	movb   $0x0,(%rbx)
 e53: 48 8b 5c 24 08       	mov    0x8(%rsp),%rbx
 e58: 48 8b 6c 24 10       	mov    0x10(%rsp),%rbp
 e5d: 48 83 c4 18          	add    $0x18,%rsp
 e61: f3 c3                	repz retq
 e63: 0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
 e68: ba 41 00 00 00       	mov    $0x41,%edx
 e6d: be 00 00 00 00       	mov    $0x0,%esi
 e72: 48 89 df             	mov    %rbx,%rdi
 e75: e8 00 00 00 00       	callq  e7a
 e7a: eb c9                	jmp    e45
 e7c: 0f 1f 40 00          	nopl   0x0(%rax)

Code with gcc-built-ins:

0da0 :
 da0: 48 8b 05 00 00 00 00 	mov    0x0(%rip),%rax        # da7
 da7: 48 85 c0             	test   %rax,%rax
 daa: 74 2a                	je     dd6
 dac: 8b 50 28             	mov    0x28(%rax),%edx
 daf: c6 40 40 00          	movb   $0x0,0x40(%rax)
 db3: 48 c7 05 00 00 00 00 	movq   $0x0,0x0(%rip)        # dbe
 dba: 00 00 00 00
 dbe: 48 89 d0             	mov    %rdx,%rax
 dc1: 48 8b 15 00 00 00 00 	mov    0x0(%rip),%rdx        # dc8
 dc8: 25 ff 03 00 00       	and    $0x3ff,%eax
 dcd: 48 c1 e0 02          	shl    $0x2,%rax
 dd1: f0 83 2c 02 01       	lock subl $0x1,(%rdx,%rax,1)
 dd6: f3 c3                	repz retq
 dd8: 0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
 ddf: 00

From 958b04eb08603167dee2fe6684f9430f5b578f28 Mon Sep 17 00:00:00 2001
From: Mikko Tiihonen
Date: Wed, 12 Dec 2012 20:02:49 +0200
Subject: [PATCH] Use gcc built-in atomic add/sub instructions, if available

diff --git a/configure.in b/configure.in
index 2dee4b3..dec2785 100644
--- a/configure.in
+++ b/configure.in
@@ -1451,6 +1451,28 @@
 if test x"$pgac_cv_gcc_int_atomics" = x"yes"; then
   AC_DEFINE(HAVE_GCC_INT_ATOMICS, 1, [Define to 1 if you have __sync_lock_test_and_set(int *) and friends.])
 fi
 
+#TODO, check for __atomic_is_lock_free (sizeof(int), 0) too
+
+AC_CACHE_CHECK([for builtin atomic functions], pgac_cv_gcc_int_atomic_add,
+[AC_TRY_LINK([],
+  [int counter = 0;
+   __atomic_add_fetch(&counter, 1, __ATOMIC_SEQ_CST);],
+  [pgac_cv_gcc_int_atomic_add="yes"],
+  [pgac_cv_gcc_int_atomic_add="no"])])
+if test x"$pgac_cv_gcc_int_atomic_add" = x"yes"; then
+  AC_DEFINE(HAVE_GCC_INT_ATOMIC_ADD, 1, [Define to 1 if you have __atomic_add_fetch(int *, int, int) and friends.])
+fi
+
+AC_CACHE_CHECK([for builtin sync functions], pgac_cv_gcc_int_sync_add,
+[AC_TRY_LINK([],
+  [int counter = 0;
+   __sync_add_and_fetch(&counter, 1);],
+  [pgac_cv_gcc_int_sync_add="yes"],
+  [pgac_cv_gcc_int_sync_add="no"])])
+if test x"$pgac_cv_gcc_int_sync_add" = x"yes"; then
+  AC_DEFINE(HAVE_GCC_INT_SYNC_ADD, 1, [Define to 1 if you have __sync_add_and_fetch(int *, int) and friends.])
+fi
+
 
 #
 # Pthreads
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index ec4da20..c8c0b91 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -40,7 +40,7 @@
 #include "pgstat.h"
 #include "storage/proc.h"
 #include "storage/sinvaladt.h"
-#include "storage/spin.h"
+#include "storage/atomics.h"
 #include "storage/standby.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
@@ -234,7 +234,12 @@ static PROCLOCK *FastPathGetRelationLockEntry(LOCALLOCK *locallock);
 
 typedef struct
 {
+#ifdef MUTEXLESS_ATOMIC_INC
+#define FAST_PATH_MUTEX(data) NULL
+#else
+#define FAST_PATH_MUTEX(data) &(data)->mutex
 	slock_t		mutex;
+#endif
 	uint32		count[FAST_PATH_STRONG_LOCK_HASH_PARTITIONS];
 } FastPathStrongRelationLockData;
 
@@ -427,8 +432,10 @@ InitLocks(void)
 	FastPathStrongRelationLocks =
 		ShmemInitStruct("Fast Path Strong Relation Lock Data",
 						sizeof(FastPathStrongRelationLockData), &found);
+#ifndef MUTEXLESS_ATOMIC_INC
 	if (!found)
 		SpinLockInit(&FastPathStrongRelationLocks->mutex);
+#endif
 
 	/*
 	 * Allocate non-shared hash table for LOCALLOCK structs. This stores lock
@@ -1207,11 +1214,8 @@ RemoveLocalLock(LOCALLOCK *locallock)
 		fasthashcode = FastPathStrongLockHashPartition(locallock->hashcode);
 
-		SpinLockAcquire(&FastPathStrongRelationLocks->mutex);
-		Assert(FastPathStrongRelationLocks->count[fasthashcode] > 0);
-		FastPathStrongRelationLocks->count[fasthashcode]--;
 		locallock->holdsStrongLockCount = FALSE;
-		SpinLockRelease(&FastPathStro
Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements
On 01/25/2012 06:40 PM, Tom Lane wrote:
> Marko Kreen writes:
>> On Wed, Jan 25, 2012 at 10:23:14AM -0500, Tom Lane wrote:
>>> Huh? How can that work? If we decide to change the representation of
>>> some other "well known type", say numeric, how do we decide whether a
>>> client setting that bit is expecting that change or not?
>> It sets that bit *and* version code - which means that it is
>> up-to-date with all "well-known" type formats in that version.
> Then why bother with the bit in the format code? If you've already done
> some other negotiation to establish what datatype formats you will
> accept, this doesn't seem to be adding any value.
>> Basically, I see 2 scenarios here:
>> 1) Client knows the result types and can set the text/bin/version code
>>    safely, without further restrictions.
>> 2) There is a generic framework that does not know query contents but
>>    can be expected to track Postgres versions closely. Such a framework
>>    cannot say "binary" for results safely, but *could* do it for some
>>    well-defined subset of types.
> The hole in approach (2) is that it supposes that the client side knows
> the specific datatypes in a query result in advance. While this is
> sometimes workable for application-level code that knows what query
> it's issuing, it's really entirely untenable for a framework or
> library. The only way that a framework can deal with arbitrary queries
> is to introduce an extra round trip (Describe step) to see what
> datatypes the query will produce so it can decide what format codes to
> issue ... and that will pretty much eat up any time savings you might
> get from a more efficient representation.

This is pretty much what the jdbc driver already does, since it does not have 100% coverage of even the current binary formats. The first time it executes a query it requests text encoding, but caches the Describe results. The next time it sets the binary bits on all return columns that it knows how to decode.
> You really want to do the negotiation once, at connection setup, and
> then be able to process queries without client-side prechecking of what
> data types will be sent back.

I think my original minor_version patch tried to do that. It introduced a per-connection setting for the version. The server GUC_REPORTed the maximum supported minor_version, but defaulted to the baseline wire format. The jdbc client could then bump minor_version to a higher supported value (with an error if the value is larger than what the server advertised). A way was provided for applications using the jdbc driver to override the requested minor_version in the rare event that something broke (rare, because the jdbc driver generally does not expose the wire encoding to applications). If pgbouncer and other pooling solutions reset minor_version to 0, it should work there as well.

Scenarios where the other end is too old to know about minor_version:
* Vserver >> Vlibpq => client does nothing -> baseline version used
* Vlibpq >> Vserver => no supported_minor_version in GUC_REPORT -> baseline version used

Normal 9.2+ scenarios:
* Vserver > Vlibpq => libpq sets minor_version to the largest value it supports -> libpq's requested version used
* Vlibpq > Vserver => libpq notices that the server's supported value is lower than its own, so it sets minor_version to the server-supported value -> server version used

For a driver that exposes the wire format to applications by default, such as the Perl driver, I can envision the driver adding a new API that applications must use to explicitly bump minor_version up, instead of defaulting to the largest supported value as jdbc/libpq do.

The reason I proposed an incrementing minor_version instead of bit flags for new encodings is that it takes less space and is easier to document and understand, which makes exposing it to applications possible. But how do we handle postgres extensions that change their wire format? Maybe we do need "oid:minor_version,oid:ver,oid:ver" as the negotiated version variable?
-Mikko

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements
Previous title was: Add minor version to v3 protocol to allow changes without breaking backwards compatibility

On 01/20/2012 04:45 AM, Noah Misch wrote:
> On Thu, Jan 19, 2012 at 02:00:20PM -0500, Robert Haas wrote:
>> On Thu, Jan 19, 2012 at 10:37 AM, Noah Misch wrote:
>>> I agree with Merlin; the frontend/backend protocol is logically
>>> distinct from the binary send/recv formats of data types. For one key
>>> point, the latter is not exclusively core-defined; third-party
>>> extensions change their send/recv formats on a different schedule.
>>> They can add myext.binary_format_version GUCs of their own to cope in
>>> a similar way.
>> I agree. It occurs to me that we recently changed the default *text*
>> output format for bytea for reasons not dissimilar to those
>> contemplated here. Presumably, that's a much more disruptive change,
>> and yet we've had minimal complaints because anyone who gets bitten
>> can easily set bytea_output='escape' and the problem goes away. The
>> same thing seems like it would work here, only the number of people
>> needing to change the parameter will probably be even smaller, because
>> fewer people use binary than text.
>>
>> Having said that, if we're to follow the precedent set by
>> bytea_output, maybe we ought to just add
>> binary_array_format={huge,ittybitty} and be done with it, rather than
>> inventing a minor protocol version GUC for something that isn't really
>> a protocol version change at all. We could introduce a
>> differently-named general mechanism, but I guess I'm not seeing the
>> point of that either. Just because someone has a
>> backward-compatibility issue with one change of this type doesn't mean
>> they have a similar issue with all of them. So I think adding a
>> special-purpose GUC is more logical and more parallel to what we've
>> done in the past, and it doesn't saddle us with having to be certain
>> that we've designed the mechanism generally enough to handle all the
>> cases that may come later.
> That makes sense.
> An attraction of a single binary format version was avoiding the "Is
> this worth a GUC?" conversation for each change. However, adding a GUC
> should be no more notable than bumping a binary format version.

I see the main difference between a GUC per feature and a minor version being that with versioned changes old clients keep working, because they have to explicitly request a specific version, whereas with separate GUC variables each feature is enabled by default and users have to either keep up with new client versions or figure out how to explicitly disable the changes.

However, due to popular vote I removed the minor version proposal for now. Here is a second version of the patch. The binary array encoding changes stay the same, but all the code around them was rewritten.

Changes from previous versions, based on received comments:
* removed the minor protocol version concept
* introduced a new GUC variable array_output, modeled on the existing
  bytea_output type, with values "full" (the old behavior) and
  "smallfixed" (the new default)
* added documentation for the new GUC variable
* used constants for the array flags variable values

-Mikko

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
new file mode 100644
index e55b503..179a081
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
*** COPY postgres_log FROM '/full/path/to/lo
*** 4888,4893 ****
--- 4888,4910 ----
+       array_output (enum)
+
+       array_output configuration parameter
+
+       Sets the output format for binary encoding of values of
+       type array. Valid values are
+       smallfixed (the default)
+       and full (the traditional PostgreSQL
+       format). The array type always
+       accepts both formats on input, regardless of this setting.
+
+
+
+       xmlbinary (enum)
diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c
new file mode 100644
index 5582a06..6192a2e
*** a/src/backend/utils/adt/arrayfuncs.c
--- b/src/backend/utils/adt/arrayfuncs.c
***************
*** 30,41 ****
--- 30,45 ----
   * GUC parameter
   */
  bool		Array_nulls = true;
+ int		array_output = ARRAY_OUTPUT_SMALLFIXED;
 
  /*
   * Local definitions
   */
  #define ASSGN "="
 
+ #define FLAG_HASNULLS 1
+ #define FLAG_FIXEDLEN 2
+
  typedef enum
  {
  	ARRAY_NO_LEVEL,
*** static void ReadArrayBinary(StringInfo b
*** 86,92 ****
  			FmgrInfo *receiveproc, Oid typioparam, int32 typmod,
  			int typlen, bool typbyval, char typalign,
  			Datum *values, bool *nulls,
! 			bool *hasnulls, int32 *nbytes);
  static void CopyArrayEls(ArrayType *array,
  			Datum *values, bool *nulls, int nitems,
  			int typlen, bool typbyval, char typalign,
--- 90,96 ----
  			FmgrInfo *receiveproc, Oid typioparam, int32 typmod,
  			int typlen, bool typbyval, char typalign,
  			Datum *values, bool *nulls,
! 			bool *hasnulls, int32 *nbytes, bool fixedlen);
  static void CopyArrayEls
Re: [HACKERS] Add minor version to v3 protocol to allow changes without breaking backwards compatibility
On 11/30/2011 06:52 PM, Merlin Moncure wrote:
> On Mon, Nov 28, 2011 at 9:18 AM, Mikko Tiihonen wrote:
>> Hi,
>> As discussed a few days ago, it would be beneficial if we could change
>> the v3 backend<->client protocol without breaking backwards
>> compatibility.
>> Here is a working patch that exports a supported_binary_minor constant
>> and a binary_minor session variable that can be used by clients to
>> enable newer features. I also added an example usage where the array
>> encoding for constant-size elements is optimized if the binary_minor
>> version is new enough.
>> I have coded the client side support for binary_minor for the jdbc
>> driver and tested that it worked. But let's first discuss if this is
>> an acceptable path forward.
> Regarding your TODO in the code comments about interactions with
> replication: I think it should be removed. WAL streaming depends on
> more things being identical than what is guaranteed here, which is
> basically the protocol + data formats.

OK. I'll remove the comments about replication.

> I think all references to 'protocol' should be removed; binary wire
> formats != protocol: the protocol could bump to v4 but the wire formats
> (by happenstance) could all still continue to work -- therefore the
> version isn't minor (it's not dependent on the protocol version but
> only on itself). Therefore, I don't much like the name
> 'supported_binary_minor'. How about binary_format_version?

I was thinking that it would be possible to use the new minor version to also introduce new protocol messages, such as streaming of large data into the database without knowing its size beforehand.

> Also, is a non-granular approach really buying us anything here? ISTM
> *something* is likely to change format on most releases of the server,
> so I'm wondering what's the point (if you don't happen to be on the
> same x.x release of the server, you might as well assume not to match,
> or at least 'go on your own risk'). The value added to the client
> version query is quite low.
You have a very good point about changes in every new postgres version. Either the text or the binary encoding is likely to change for some types in each new release. There needs to be a decision on an official policy about breaking backwards compatibility of postgresql clients. Is it:

A) Every x.y postgres release is free to break any part of the
   * protocol
   * text encoding
   * binary encoding
   as long as it is documented.
   -> All client libraries should be coded so that they refuse to
   support version x.y+1 or newer (they might have an option to override
   this, but there are no guarantees that it would work).
   Good: no random bugs when using an old client library
   Bad: initial complaints from users before they understand that this
   is the best option for everyone

B) Every x.y postgres release must guarantee that no client-visible
   part of the protocol, text encoding or binary encoding changes from
   the previous release in the v3 protocol. If any changes are needed, a
   new protocol version must be created.
   -> Very high barrier for new features.
   Good: can upgrade the server without updating clients
   Bad: new features are introduced only rarely, after enough have
   accumulated
   Bad: many feature/change patches will rot while waiting for the
   upcoming new version

C) There is an effort to avoid incompatible changes. Some changes are
   blocked because it is detected that they can break backwards
   compatibility, while others are let through (often with some
   compatibility option on the server side to fall back to the previous
   behavior, f.ex. bytea hex encoding).
   -> As far as I understand, this is the current situation.
   Good: has worked so far
   Bad: accumulates compatibility flags in the server

D) My proposed compromise, where there is one minor_version setting and
   code in the server to support all the different minor versions. The
   client requests the minor version, so all releases default to the
   backwards-compatible version.
   When the combinations start to create too much maintenance overhead,
   a new clean v4 version of the protocol is specified.
   Good: keeps full backwards compatibility
   Good: allows introducing changes at any time
   Bad: accumulates conditional code in the server and clients until a
   new version of the protocol is released

I'd like the official policy to be A; otherwise I'll push for D.

-Mikko
[HACKERS] Add minor version to v3 protocol to allow changes without breaking backwards compatibility
Hi,

As discussed a few days ago, it would be beneficial if we could change the v3 backend<->client protocol without breaking backwards compatibility.

Here is a working patch that exports a supported_binary_minor constant and a binary_minor session variable that can be used by clients to enable newer features. I also added an example usage where the array encoding for constant-size elements is optimized if the binary_minor version is new enough.

I have coded the client side support for binary_minor for the jdbc driver and tested that it works. But let's first discuss if this is an acceptable path forward.

On 11/25/2011 02:20 AM, Oliver Jowett wrote:
> Re list vs. always-incrementing minor version, you could just use an
> integer and set bits to represent features, which would keep it simple
> but also let clients be more selective about which features they
> implement (you could support feature 21 and 23 without supporting 22)

I decided not to use feature flags because when features start to depend on each other, we need multiple negotiation round trips until the final feature set can be known. If, in your example above, feature 23 depends on the server side on feature 22, but the client only requests 21,23, the server must inform the client that the combination 21,23 is reduced to 21. And if the client cannot support 21 without 23, the final feature set will contain neither 21 nor 23.
-Mikko

commit d7ef5f1aef0941ec4b931f24745b65d77e7511e4
Author: Mikko Tiihonen
Date:   Sun Nov 27 14:12:59 2011 +0200

    Define binary_minor variable to control binary protocol minor version

diff --git a/src/backend/utils/adt/version.c b/src/backend/utils/adt/version.c
index 6ea5bd2..33ed4d3 100644
--- a/src/backend/utils/adt/version.c
+++ b/src/backend/utils/adt/version.c
@@ -16,6 +16,7 @@
 
 #include "utils/builtins.h"
 
+int binary_minor = 0;
 
 Datum
 pgsql_version(PG_FUNCTION_ARGS)
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index da7b6d4..67e80f1 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -64,6 +64,7 @@
 #include "storage/predicate.h"
 #include "tcop/tcopprot.h"
 #include "tsearch/ts_cache.h"
+#include "utils/binaryminorversion.h"
 #include "utils/builtins.h"
 #include "utils/bytea.h"
 #include "utils/guc_tables.h"
@@ -2053,6 +2054,18 @@ static struct config_int ConfigureNamesInt[] =
 	},
 
 	{
+		{"binary_minor", PGC_USERSET, CLIENT_CONN_LOCALE,
+			gettext_noop("Sets the binary protocol minor version that controls enabling"
+						 " of newer features."),
+			gettext_noop("The default value is 0 which uses the binary protocol features"
+						 " as specified in postgres 9.1 release.")
+		},
+		&binary_minor,
+		0, 0, SUPPORTED_BINARY_MINOR_VERSION,
+		NULL, NULL, NULL
+	},
+
+	{
 		{"log_min_duration_statement", PGC_SUSET, LOGGING_WHEN,
 			gettext_noop("Sets the minimum execution time above which "
 						 "statements will be logged."),
diff --git a/src/include/utils/binaryminorversion.h b/src/include/utils/binaryminorversion.h
new file mode 100644
index 000..40ba970
--- /dev/null
+++ b/src/include/utils/binaryminorversion.h
@@ -0,0 +1,32 @@
+/*-
+ *
+ * binaryminorversion.h
+ *	  "Binary protocol encoding minor version number" for PostgreSQL.
+ *
+ * The binary minor version number is used to flag incompatible changes in
+ * the PostgreSQL v3 binary protocol.
+ * Whenever anyone changes the
+ * format of the binary protocol, or modifies the standard types in such
+ * a way that an updated client wouldn't work with an old database
+ * (or vice versa), the binary protocol encoding version number
+ * should be changed.
+ *
+ * The point of this feature is to provide a way to allow introducing
+ * small new features to the binary encoding of types or the actual
+ * v3 protocol while still allowing full backward compatibility with
+ * old clients. The client can be an application using postgres,
+ * any tool using COPY BINARY or another postgres backend using
+ * replication (TODO: does this affect replication?).
+ *
+ * Portions Copyright (c) 1996-2011, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/utils/binaryminorversion.h
+ *
+ *-
+ */
+#ifndef BINARYMINORVERSION_H
+#define BINARYMINORVERSION_H
+
+#define SUPPORTED_BINARY_MINOR_VERSION 1
+
+#endif
diff --g
Re: [HACKERS] [JDBC] Optimize postgres protocol for fixed size arrays
On 11/24/2011 02:36 AM, Kevin Grittner wrote:
> Oliver Jowett wrote:
>> Can we get a mechanism for minor protocol changes in this future
>> version? Something as simple as exchanging a list of protocol features
>> during the initial handshake (then use only features that are present
>> on both sides) would be enough. The difficulty of making any protocol
>> changes at the moment is a big stumbling block.
> I've been thinking the same thing. Any new protocol should include a
> way for each side to publish a list of what it can accept from the
> other during initial handshaking.
>> (You could probably retrofit that to the current protocol version)
> Perhaps. It would be great if both sides could recognize the case where
> the "feature negotiation" was absent and use a default feature list for
> the protocol available on the other end.

What about a handshake protocol based on a simple binary-protocol minor version instead of features? We keep the v3 protocol as is, but can add cumulative, conditionally enabled features when we bump the minor version.

The handshake requires that the server sends a parameter back with its highest supported minor version:

FE=> StartupPacket
<=BE ParameterStatus(binary_minor = 23)

And the client can send any number between 1 and the server's binary_minor back to enable newer protocol versions and/or limit what the server sends:

FE=> Execute(SET binary_minor = 20)

To keep full backwards compatibility:
1) if the backend does not send a binary_minor parameter on connection,
   the highest supported minor version is assumed to be 0 (current
   format)
2) the backend assumes the active minor version is 0 unless SET
   binary_minor is received

I think bumping a minor version is better than feature flags because:
1) the handshake is simpler and faster
2) coding is easier, as all previous features are known to be supported
   and active when implementing feature+1

I'm not exactly sure about the COPY BINARY feature Tom mentioned.
But probably we could prefix the data with the int4 containing the minor version?

-Mikko
[HACKERS] Optimize postgres protocol for fixed size arrays
Hi,

During conversion of the jdbc driver to use binary encoding when receiving array objects from postgres, it was noticed that, for example for int[] arrays, the binary encoding is normally 30% to 200% larger in bytes than the text encoding. This was of concern to some users with slower links to the database. Even though the encoded size was larger, the binary encoding was still consistently faster (with a 50% speed-up for float[]).

Here is a patch that adds a new flag to the protocol that is set when all elements of the array are of the same fixed size. When the bit is set, the 4-byte length is sent only once, not for each element. Another restriction is that the flag can only be set when there are no NULLs in the array.

The postgres part is my first try at the problem, and I really need some feedback on the detection of fixed-size elements. I just made a guess and noticed that typlen != 0 seemed to work for the basic types I used for testing.

First the postgres patch:

diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c
index bfb6065..970272f 100644
--- a/src/backend/utils/adt/arrayfuncs.c
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -86,7 +86,7 @@ static void ReadArrayBinary(StringInfo buf, int nitems,
 			FmgrInfo *receiveproc, Oid typioparam, int32 typmod,
 			int typlen, bool typbyval, char typalign,
 			Datum *values, bool *nulls,
-			bool *hasnulls, int32 *nbytes);
+			bool *hasnulls, int32 *nbytes, bool fixedlen);
 static void CopyArrayEls(ArrayType *array,
 			Datum *values, bool *nulls, int nitems,
 			int typlen, bool typbyval, char typalign,
@@ -1242,7 +1242,7 @@ array_recv(PG_FUNCTION_ARGS)
 						ndim, MAXDIM)));
 
 	flags = pq_getmsgint(buf, 4);
-	if (flags != 0 && flags != 1)
+	if ((flags & ~3) != 0)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
 				 errmsg("invalid array flags")));
@@ -1327,7 +1327,7 @@ array_recv(PG_FUNCTION_ARGS)
 					&my_extra->proc, typioparam, typmod,
 					typlen, typbyval, typalign,
 					dataPtr, nullsPtr,
-					&hasnulls, &nbytes);
+					&hasnulls, &nbytes, (flags & 2) != 0);
 	if (hasnulls)
 	{
 		dataoffset = ARR_OVERHEAD_WITHNULLS(ndim, nitems);
@@ -1390,7 +1390,8 @@ ReadArrayBinary(StringInfo buf,
 				Datum *values, bool *nulls,
 				bool *hasnulls,
-				int32 *nbytes)
+				int32 *nbytes,
+				bool fixedlen)
 {
 	int			i;
 	bool		hasnull;
@@ -1403,7 +1404,7 @@ ReadArrayBinary(StringInfo buf,
 		char		csave;
 
 		/* Get and check the item length */
-		itemlen = pq_getmsgint(buf, 4);
+		itemlen = fixedlen ? typlen : pq_getmsgint(buf, 4);
 		if (itemlen < -1 || itemlen > (buf->len - buf->cursor))
 			ereport(ERROR,
 					(errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
@@ -1496,6 +1497,7 @@ array_send(PG_FUNCTION_ARGS)
 	int			bitmask;
 	int			nitems,
 				i;
+	int			flags;
 	int			ndim,
 			   *dim;
 	StringInfoData buf;
@@ -1535,6 +1537,8 @@ array_send(PG_FUNCTION_ARGS)
 	typbyval = my_extra->typbyval;
 	typalign = my_extra->typalign;
 
+	flags = ARR_HASNULL(v) ? 1 : (typlen > 0 ? 2 : 0);
+
 	ndim = ARR_NDIM(v);
 	dim = ARR_DIMS(v);
 	nitems = ArrayGetNItems(ndim, dim);
@@ -1543,7 +1547,7 @@ array_send(PG_FUNCTION_ARGS)
 	/* Send the array header information */
 	pq_sendint(&buf, ndim, 4);
-	pq_sendint(&buf, ARR_HASNULL(v) ? 1 : 0, 4);
+	pq_sendint(&buf, flags, 4);
 	pq_sendint(&buf, element_type, sizeof(Oid));
 	for (i = 0; i < ndim; i++)
 	{
@@ -1571,7 +1575,8 @@ array_send(PG_FUNCTION_ARGS)
 		itemvalue = fetch_att(p, typbyval, typlen);
 
 		outputbytes = SendFunctionCall(&my_extra->proc, itemvalue);
-		pq_sendint(&buf, VARSIZE(outputbytes) - VARHDRSZ, 4);
+		if ((flags & 2) == 0)
+			pq_sendint(&buf, VARSIZE(outputbytes) - VARHDRSZ, 4);
 		pq_sendbytes(&buf, VARDATA(outputbytes),