Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-09-05 Thread Victor Wagner
On Tue, 6 Sep 2016 07:58:28 +0530
Mithun Cy  wrote:

> 
> Now if only one host is in connection string and it ask for read_write
> connection(connect to master) I mean read_only is set 0 explicitly.
> With above logic we will allow it to connect to standby?. I still
> think psql connection to standby should be handled by changing the
> default value of readonly to 1 (which means connect to any). 

It would definitely be more logical, but I haven't found easy way to do
it. May be I should return to this place and rethink. I don't like and
idea to explicitely ignore connection string parameter due to some
condition. 

Really, I think, test for replication connection can be either
removed from this part of code now.  


> Thinking
> further probably replacing readonly parameter with
> targetServerType=any|master (with default being any) should clear
> some confusions and bring consistency since same is used in JDBC
> multi host connection string.

It seems that in this context change readonly=0|1 to
targetServerType=any|master  makes sense.


 
> 2)
> @@ -1460,33 +1538,80 @@ connectDBStart(PGconn *conn)
>portstr,
>(int) (UNIXSOCK_PATH_BUFLEN - 1));
>   conn->options_valid = false;
> + free(nodes->port);
> 
> nodes->port was not allocated at this point.
> 

I'll recheck it.

-- 
   Victor Wagner 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-09-05 Thread Mithun Cy
On Mon, Sep 5, 2016 at 4:33 PM, Aleksander Alekseev <
a.aleks...@postgrespro.ru> wrote:
>After a brief examination I've found following ways to improve the patch.
Adding to above comments.

1)
+ /*
+ * consult connection options and check if RO connection is OK
+ * RO connection is OK if readonly connection is explicitely
+ * requested or if replication option is set, or just one
+ * host is specified in the connect string.
+ */
+ if ((conn->pghost==NULL || strchr(conn->pghost,',')==NULL) ||
+ ((conn->read_only && conn->read_only[0] > '0'))
+ ||(conn->replication && conn->replication[0])
+ )
+ {

Now if only one host is in connection string and it ask for read_write
connection(connect to master) I mean read_only is set 0 explicitly. With
above logic we will allow it to connect to standby?. I still think psql
connection to standby should be handled by changing the default value of
readonly to 1 (which means connect to any). Thinking further probably
replacing readonly parameter with targetServerType=any|master (with default
being any) should clear some confusions and bring consistency since same is
used in JDBC multi host connection string.

2)
@@ -1460,33 +1538,80 @@ connectDBStart(PGconn *conn)
   portstr,
   (int) (UNIXSOCK_PATH_BUFLEN - 1));
  conn->options_valid = false;
+ free(nodes->port);

nodes->port was not allocated at this point.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-09-05 Thread Aleksander Alekseev
Hello, Victor.

> As community shows an interest with this patch, and it has been
> included in the current commitfest, I've to submit it.

Naturally we show interest in this patch. It's a great feature that
would be very nice to have!

> This version of patch includes
> 
> 1. Most important - a tap test suite (really small enough and has a
> room for improvements).
> 
> 2. Compatibility with older versions - now readonly check is skipped if
> only one hostname is specified in the connect string
> 
> 3. pg_host and pg_port variables now show host name and port library
> was connected to.
> 
> 4. Your fix with compilation of ecpg tests.
> 
> Patch is attached.

After a brief examination I've found following ways to improve the
patch.

1) It looks like code is not properly formatted.

2)

> +  readonly
> +  
> +  
> +  If this parameter is 0 (the default), upon successful connection
> +  library checks if host is in recovery state, and if it is so, 
> +  tries next host in the connect string. If this parameter is
> +  non-zero, connection to warm standby nodes are considered
> +  successful.

IIRC the are "warm" and "hot" standby's in PostgreSQL terminology. The
difference is that you can't execute any queries on a "warm" standby. So
the description is probably wrong. I suggest to fix it to just "...
connection to standby nodes are...".

3)

> + falover_timeout
> + 
> + 
> + Maximum time to cycilically retry all the hosts in commect string.
> + (as decimal integer number of seconds). If not specified, then
> + hosts are tried just once.
> + 

Typos: cycilically, commect

4) 

>  static int 
>  connectDBStart(PGconn *conn)
>  {
> +   struct nodeinfo
> +   {
> +   char   *host;
> +   char   *port;
> +   };

Not sure if all compilers we support can parse this. I suggest to
declare nodinfo structure outside of procedure, just to be safe.

5)

> +   nodes = calloc(sizeof(struct nodeinfo), 2); 
> +   nodes->host = strdup(conn->pghostaddr);
> hint.ai_family = AF_UNSPEC;

> +   /* Parse comma-separated list of host-port pairs into
> +  function-local array of records */
> +   nodes = malloc(sizeof(struct nodeinfo) * 4); 


> /* pghostaddr and pghost are NULL, so use Unix domain
>  * socket */
> -   node = NULL;
> +   nodes = calloc(sizeof(struct nodeinfo), 2); 

> +   sprintf(port,"%d", htons(((struct sockaddr_in
> *)ai->ai_addr)->sin_port));
> +   return strdup(port);

You should check return values of malloc, calloc and strdup procedures,
or use safe pg_* equivalents.

6) 

> +   for (p = addrs; p->ai_next != NULL; p =
>   p->ai_next);
> +   p->ai_next = this_node_addrs;

Looks a bit scary. I suggest to add an empty scope ( {} ) and a comment
to make our intention clear here.

7) Code compiles with following warnings (CLang 3.8):

> 1 warning generated.
> fe-connect.c:5239:39: warning: if statement has empty body
> [-Wempty-body]
>
> errorMessage,
> false, true));
>   
> ^
> fe-connect.c:5239:39: note: put the semicolon on a separate line to
> silence this warning
> 1 warning generated.

8) get_next_element procedure implementation is way too smart (read
"complicated"). You could probably just store current list length and
generate a random number between 0 and length-1.

-- 
Best regards,
Aleksander Alekseev


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-09-04 Thread Victor Wagner
On Mon, 5 Sep 2016 07:59:02 +0530
Mithun Cy  wrote:

> On Aug 31, 2016 1:44 PM, "Victor Wagner"  wrote:
> > Thanks, I've added this to 7-th (yet unpublished here) version of my
> > patch.
> Hi victor, just wanted know what your plan for your patch 07. Would
> you like to submit it to the community. I have just signed up as a
> reviewer for your patch.

As community shows an interest with this patch, and it has been
included in the current commitfest, I've to submit it.

This version of patch includes

1. Most important - a tap test suite (really small enough and has a
room for improvements).

2. Compatibility with older versions - now readonly check is skipped if
only one hostname is specified in the connect string

3. pg_host and pg_port variables now show host name and port library
was connected to.

4. Your fix with compilation of ecpg tests.

Patch is attached.



-- 
   Victor Wagner 
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index f22e3da..835061d 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -792,7 +792,7 @@ host=localhost port=5432 dbname=mydb connect_timeout=10

The general form for a connection URI is:
 
-postgresql://[user[:password]@][netloc][:port][/dbname][?param1=value1...]
+postgresql://[user[:password]@][netloc][:port][,netloc[:port]...][/dbname][?param1=value1...]
 

 
@@ -809,6 +809,7 @@ postgresql://localhost/mydb
 postgresql://user@localhost
 postgresql://user:secret@localhost
 postgresql://other@localhost/otherdb?connect_timeout=10application_name=myapp
+postgresql://node1,node2:5433,node3:4432,node4/mydb?hostorder=randomreadonly=1
 
 Components of the hierarchical part of the URI can also
 be given as parameters.  For example:
@@ -831,7 +832,9 @@ postgresql:///mydb?host=localhostport=5433

 For improved compatibility with JDBC connection URIs,
 instances of parameter ssl=true are translated into
-sslmode=require.
+sslmode=require and
+loadBalanceHosts=true  into
+hostorder=random.

 

@@ -841,6 +844,10 @@ postgresql:///mydb?host=localhostport=5433
 postgresql://[2001:db8::1234]/database
 

+   
+   There can be serveral host specifications, optionally accompanied
+   with port, separated by comma.
+   
 

 The host component is interpreted as described for the parameter PostgreSQL was built). On machines without
 Unix-domain sockets, the default is to connect to localhost.

+   
+   There can be more than one host parameter in
+   the connect string. In this case these hosts would be considered
+   alternate entries into same database and if connect to first one
+   fails, library would try to connect second etc. This can be used
+   for high availability cluster or for load balancing. See
+parameter.
+   
+   
+   Network host name can be accompanied with port number, separated by
+   colon. If so, this port number is used only when connected to
+   this host. If there is no port number, port specified in the
+parameter would be used.
+   
   
  
 
@@ -942,8 +963,44 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname


   
-
+
+  
+  hostorder
+  
+  
+  Specifies how to choose host from list of alternate hosts,
+  specified in the  parameter.
+  
+  
+  If value of this argument is sequential (the
+  default) library connects to the hosts in order they specified,
+  and tries to connect second one only if connection to the first
+  fails.
+  
+  
+  If value is random host to connect is randomly
+  picked from the list. It allows to balance load between several
+  cluster nodes. However, currently PostgreSQL doesn't support
+  multimaster clusters. So, without use of third-party products,
+  only read-only connections can take advantage from the
+  load-balancing. See 
+  
+  
+  
+  
+  readonly
+  
+  
+  If this parameter is 0 (the default), upon successful connection
+  library checks if host is in recovery state, and if it is so,
+  tries next host in the connect string. If this parameter is
+  non-zero, connection to warm standby nodes are considered
+  successful.
+  
+  
+  
   
+
port


@@ -985,7 +1042,6 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
   
   
  
-
  
   connect_timeout
   
@@ -996,7 +1052,27 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
   
   
  
-
+ 
+ falover_timeout
+ 
+ 
+ Maximum time to cycilically retry all the hosts in commect string.
+ (as decimal integer number of seconds). If not specified, then
+ hosts are tried just once.
+ 
+ 
+ If we have replicating cluster, and master node 

Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-09-04 Thread Mithun Cy
On Aug 31, 2016 1:44 PM, "Victor Wagner"  wrote:
> Thanks, I've added this to 7-th (yet unpublished here) version of my
> patch.
Hi victor, just wanted know what your plan for your patch 07. Would you
like to submit it to the community. I have just signed up as a reviewer for
your patch.


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-08-31 Thread Victor Wagner
On Tue, 30 Aug 2016 14:54:57 +0530
Mithun Cy  wrote:

> On Fri, Aug 26, 2016 at 10:10 AM, Mithun Cy
>  wrote:
> >  
> > >rpath,'/home/mithun/edbsrc/patch6bin/lib',--enable-new-dtags
> > >-lecpg  
> > -lpgtypes -lpq -lpgcommon -lpgport -lz -lrt -lcrypt -ldl -lm   -o
> > test1  
> > >../../../../../src/interfaces/libpq/libpq.so: undefined reference
> > >to  
> > `pg_usleep'
> >  
> 
> As similar to psql I have added -lpq for compilation. So now ecpg
> tests compiles and make check-world has passed.

Thanks, I've added this to 7-th (yet unpublished here) version of my
patch, available on my site.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-08-31 Thread Victor Wagner
On Fri, 26 Aug 2016 10:10:33 +0530
Mithun Cy  wrote:

> On Thu, Mar 17, 2016 at 4:47 AM, David Steele 
> wrote:
> >Since there has been no response from the author I have marked this
> >patch  
> "returned with feedback".  Please feel free >to resubmit for 9.7!
> I have started to work on this patch, and tried to fix some of the
> issues discussed above. The most recent patch 06 has fixed many
> issues which was raised previously which include indefinite looping,
> crashes. And, some of the issues which remain pending are.
> 
> 1. Connection status functions PQport, PQhost do not give
> corresponding values of established connection. -- Have attached the
> patch for same.
> 
> 2. Default value of readonly parameter is 0, which means should
> connect to master only. So absence of parameter in connection string
> in simple cases like "./psql -p PORT database" fails to connect to
> hot standby server.  I think since if readonly=1 means connect to
> any. and readonly=0 means connect to master only, we should change
> the default value  to 1, to handle the cases when parameter is not
> specified.
> 
> JFYI Interestingly PostgreSql JDBC driver have following options
> targetServerType=any|master|slave|preferSlave for same purpose.
> 
> Also did a little bit of patch clean-up.
> 
> Also found some minor issues related to build which I am working on.
> 1. make check-world @src/interfaces/ecpg/test/connect fails with
> following error:
> gcc -Wall -Wmissing-prototypes -Wpointer-arith
> -Wdeclaration-after-statement -Wendif-labels
> -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
> -fwrapv -fexcess-precision=standard -g -O0 -pthread -D_REENTRANT
> -D_THREAD_SAFE -D_POSIX_PTHREAD_SEMANTICS test1.o -L../../ecpglib
> -L../../pgtypeslib -L../../../../../src/interfaces/libpq
> -L../../../../../src/port -L../../../../../src/common -Wl,--as-needed
> -Wl,-rpath,'/home/mithun/edbsrc/patch6bin/lib',--enable-new-dtags
> -lecpg -lpgtypes -lpq -lpgcommon -lpgport -lz -lrt -lcrypt -ldl -lm
> -o test1 ../../../../../src/interfaces/libpq/libpq.so: undefined
> reference to `pg_usleep'
> collect2: error: ld returned 1 exit status
> make[3]: *** [test1] Error 1
> make[3]: Leaving directory `/home/mithun/mynewpost/p1/
> src/interfaces/ecpg/test/connect'
> 
> patch has used pg_usleep() which is in L../../../../../src/port  I
> think dependency is not captured rightly.
> 
> Thanks and Regards
> Mithun C Y
> EnterpriseDB: http://www.enterprisedb.com



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-08-30 Thread Mithun Cy
On Fri, Aug 26, 2016 at 10:10 AM, Mithun Cy 
wrote:
>
> >rpath,'/home/mithun/edbsrc/patch6bin/lib',--enable-new-dtags  -lecpg
> -lpgtypes -lpq -lpgcommon -lpgport -lz -lrt -lcrypt -ldl -lm   -o test1
> >../../../../../src/interfaces/libpq/libpq.so: undefined reference to
> `pg_usleep'
>

As similar to psql I have added -lpq for compilation. So now ecpg tests
compiles and make check-world has passed.
-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


libpq-failover-ecpg-make-01.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-08-25 Thread Victor Wagner
On Fri, 26 Aug 2016 10:10:33 +0530
Mithun Cy  wrote:

> On Thu, Mar 17, 2016 at 4:47 AM, David Steele 
> wrote:
> >Since there has been no response from the author I have marked this
> >patch
> "returned with feedback".  Please feel free >to resubmit for 9.7!
> I have started to work on this patch, and tried to fix some of the
> issues discussed above. The most recent patch 06 has fixed many
> issues which was raised previously which include indefinite looping,
> crashes. And, some of the issues which remain pending are.

Although I have not resubmutted my patch yet, I've continue to work on
it, so you can see my last version on 

https://www.wagner.pp.ru/fossil/pgfailover.

It seems that I've already fixed some of issues you mentioned below.
Also I've added testsuite, which is incomplete.

> JFYI Interestingly PostgreSql JDBC driver have following options
> targetServerType=any|master|slave|preferSlave for same purpose.

Probably we should havbe a comptibility alias.


-- 
   Victor Wagner 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-08-25 Thread Mithun Cy
On Thu, Mar 17, 2016 at 4:47 AM, David Steele  wrote:
>Since there has been no response from the author I have marked this patch
"returned with feedback".  Please feel free >to resubmit for 9.7!
I have started to work on this patch, and tried to fix some of the issues
discussed above. The most recent patch 06 has fixed many issues which was
raised previously which include indefinite looping, crashes. And, some of
the issues which remain pending are.

1. Connection status functions PQport, PQhost do not give corresponding
values of established connection. -- Have attached the patch for same.

2. Default value of readonly parameter is 0, which means should connect to
master only. So absence of parameter in connection string in simple cases
like "./psql -p PORT database" fails to connect to hot standby server.  I
think since if readonly=1 means connect to any. and readonly=0 means
connect to master only, we should change the default value  to 1, to handle
the cases when parameter is not specified.

JFYI Interestingly PostgreSql JDBC driver have following options
targetServerType=any|master|slave|preferSlave for same purpose.

Also did a little bit of patch clean-up.

Also found some minor issues related to build which I am working on.
1. make check-world @src/interfaces/ecpg/test/connect fails with following
error:
gcc -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute
-Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard
-g -O0 -pthread -D_REENTRANT -D_THREAD_SAFE -D_POSIX_PTHREAD_SEMANTICS
test1.o -L../../ecpglib -L../../pgtypeslib
-L../../../../../src/interfaces/libpq
-L../../../../../src/port -L../../../../../src/common -Wl,--as-needed
-Wl,-rpath,'/home/mithun/edbsrc/patch6bin/lib',--enable-new-dtags  -lecpg
-lpgtypes -lpq -lpgcommon -lpgport -lz -lrt -lcrypt -ldl -lm   -o test1
../../../../../src/interfaces/libpq/libpq.so: undefined reference to
`pg_usleep'
collect2: error: ld returned 1 exit status
make[3]: *** [test1] Error 1
make[3]: Leaving directory `/home/mithun/mynewpost/p1/
src/interfaces/ecpg/test/connect'

patch has used pg_usleep() which is in L../../../../../src/port  I think
dependency is not captured rightly.

Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index f22e3da..835061d 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -792,7 +792,7 @@ host=localhost port=5432 dbname=mydb connect_timeout=10

The general form for a connection URI is:
 
-postgresql://[user[:password]@][netloc][:port][/dbname][?param1=value1...]
+postgresql://[user[:password]@][netloc][:port][,netloc[:port]...][/dbname][?param1=value1...]
 

 
@@ -809,6 +809,7 @@ postgresql://localhost/mydb
 postgresql://user@localhost
 postgresql://user:secret@localhost
 postgresql://other@localhost/otherdb?connect_timeout=10application_name=myapp
+postgresql://node1,node2:5433,node3:4432,node4/mydb?hostorder=randomreadonly=1
 
 Components of the hierarchical part of the URI can also
 be given as parameters.  For example:
@@ -831,7 +832,9 @@ postgresql:///mydb?host=localhostport=5433

 For improved compatibility with JDBC connection URIs,
 instances of parameter ssl=true are translated into
-sslmode=require.
+sslmode=require and
+loadBalanceHosts=true  into
+hostorder=random.

 

@@ -841,6 +844,10 @@ postgresql:///mydb?host=localhostport=5433
 postgresql://[2001:db8::1234]/database
 

+   
+   There can be serveral host specifications, optionally accompanied
+   with port, separated by comma.
+   
 

 The host component is interpreted as described for the parameter PostgreSQL was built). On machines without
 Unix-domain sockets, the default is to connect to localhost.

+   
+   There can be more than one host parameter in
+   the connect string. In this case these hosts would be considered
+   alternate entries into same database and if connect to first one
+   fails, library would try to connect second etc. This can be used
+   for high availability cluster or for load balancing. See
+parameter.
+   
+   
+   Network host name can be accompanied with port number, separated by
+   colon. If so, this port number is used only when connected to
+   this host. If there is no port number, port specified in the
+parameter would be used.
+   
   
  
 
@@ -942,8 +963,44 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname


   
-
+
+  
+  hostorder
+  
+  
+  Specifies how to choose host from list of alternate hosts,
+  specified in the  parameter.
+  
+  
+  If value of this argument is sequential (the
+  default) library connects to the hosts in order they specified,
+  and tries to connect second one only if 

Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-03-18 Thread David Steele

On 3/10/16 7:34 AM, Simon Riggs wrote:

On 9 March 2016 at 23:11, David Steele > wrote:

There hasn't been any movement on this patch in a while.  Will you
have a new tests ready for review soon?


I see the value in this feature, but the patch itself needs work and
probably some slimming down/reality and a good spellcheck.

If someone takes this on soon it can go into 9.6, otherwise I vote to
reject this early to avoid wasting review time.


Since there has been no response from the author I have marked this 
patch "returned with feedback".  Please feel free to resubmit for 9.7!


--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-03-10 Thread Simon Riggs
On 9 March 2016 at 23:11, David Steele  wrote:


> There hasn't been any movement on this patch in a while.  Will you have a
> new tests ready for review soon?
>

I see the value in this feature, but the patch itself needs work and
probably some slimming down/reality and a good spellcheck.

If someone takes this on soon it can go into 9.6, otherwise I vote to
reject this early to avoid wasting review time.

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-03-09 Thread David Steele

Hi Victor,

On 2/1/16 5:05 PM, Alvaro Herrera wrote:

Victor Wagner wrote:

On Fri, 22 Jan 2016 16:36:15 -0300
Alvaro Herrera  wrote:


You're editing the expected file for the libpq-regress thingy, but you
haven't added any new lines to test the new capability.  I think it'd
be good to add some there.  (I already said this earlier in the
thread; is there any reason you ignored it the first time?)

I seriously doubt that this program can be used to test new
capabilities.

All it does, it calls PQconninfoParse and than examines some fields of
PGconn structure.

Ah, you're right, I didn't remember that.


If I add some new uris, than only thing I can test is that comma is
properly copied from the URI to this field. And may be that some syntax
errors are properly detected.

Yeah, we should do that.


So, I think that new functionality need other approach for testing.
There should be test of real connection to real temporary cluster.
Probably, based on Perl TAP framework which is actively used in the
Postgres recently.

Yes, agreed.  So please have a look at that one and share your opinion
about it.  It'd be useful.

Meanwhile I'm moving the patch to the next commitfest.


There hasn't been any movement on this patch in a while.  Will you have 
a new tests ready for review soon?


Thanks,

--
-David
da...@pgmasters.net



Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-02-01 Thread Alvaro Herrera
Victor Wagner wrote:
> On Fri, 22 Jan 2016 16:36:15 -0300
> Alvaro Herrera  wrote:
> 
> > You're editing the expected file for the libpq-regress thingy, but you
> > haven't added any new lines to test the new capability.  I think it'd
> > be good to add some there.  (I already said this earlier in the
> > thread; is there any reason you ignored it the first time?)
> 
> I seriously doubt that this program can be used to test new
> capabilities.
> 
> All it does, it calls PQconninfoParse and than examines some fields of
> PGconn structure.

Ah, you're right, I didn't remember that.

> If I add some new uris, than only thing I can test is that comma is
> properly copied from the URI to this field. And may be that some syntax
> errors are properly detected.

Yeah, we should do that.

> So, I think that new functionality need other approach for testing.
> There should be test of real connection to real temporary cluster.
> Probably, based on Perl TAP framework which is actively used in the
> Postgres recently.

Yes, agreed.  So please have a look at that one and share your opinion
about it.  It'd be useful.

Meanwhile I'm moving the patch to the next commitfest.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-01-24 Thread Victor Wagner
On Fri, 22 Jan 2016 16:36:15 -0300
Alvaro Herrera  wrote:

> You're editing the expected file for the libpq-regress thingy, but you
> haven't added any new lines to test the new capability.  I think it'd
> be good to add some there.  (I already said this earlier in the
> thread; is there any reason you ignored it the first time?)

I seriously doubt that this program can be used to test new
capabilities.

All it does, it calls PQconninfoParse and than examines some fields of
PGconn structure.

The only reason I've to modify expected output is that I changed usage
of one of these field, and keep there comma-separated list of host:port
pairs instead of just hostname.

Thus contents this field after parsing of some existing URIs is
changed, while semantic of URIs is same.

If I add some new uris, than only thing I can test is that comma is
properly copied from the URI to this field. And may be that some syntax
errors are properly detected.

 
> If the test program requires improvement to handle the new stuff,
> let's do that.

The only improvement I can think of, is to examine list of the addrinfo
structures into which host list is eventually parsed. But it is quite
problematic, because it depends on many factors which are outside of
our control. 

It stores addresses resolved via OS's name resolver.

For example, if we specify 'localhost' there it can be parsed into one
or to records, depending on presence of IPv6 support.

If we use some other hostname here, we'll rely on internet connectivity
and DNS system. And we cannot ensure that any name to IP mapping will
persist for a long enough time. 

So, I think that new functionality need other approach for testing.
There should be test of real connection to real temporary cluster.
Probably, based on Perl TAP framework which is actively used in the
Postgres recently.
> 



-- 
   Victor Wagner 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-01-24 Thread Thom Brown
On 23 January 2016 at 03:32, Thom Brown  wrote:
> On 22 January 2016 at 19:30, Victor Wagner  wrote:
>> On Tue, 19 Jan 2016 14:34:54 +
>> Thom Brown  wrote:
>>
>>>
>>> The segfault issue I originally reported now appears to be resolved.
>>>
>>> But now I have another one:
>>>
>>> psql
>>> 'postgresql://thom@127.0.0.1:5530,127.0.0.1:5531,127.0.0.1:5532,127.0.0.1:5533/postgres?hostorder=random=1_timeout=5'
>>> -c 'show port'
>>
>> Here is new version of the patch. Now I've reworked hostorder=random and
>> it seems to work as well as sequential. failover_timeout works too.
>> I've also found a case when attempt to connect fail when receiving
>> FATAL message from server which is not properly up yet. So, it is fixed
>> too.
>>
>> Addititonally, error messages from all failed connect attempts are not
>> accumulated now. Only last one is returned.
>
> I can't connect to a standby with the patch applied:
>
> thom@swift:~/Development/test$ psql -p 5531 postgres
> psql: thom@swift:~/Development/test$
>
> No error message, nothing in the logs.  I find this is the case with
> any standby, but doesn't affect primaries.
>
> So this has broken existing functionality somewhere.

Okay, I've tested this again with additional logging.  Again, I'm just
running "psql -p 5531 postgres", which connects to a standby.  This
immediately exits psql, and the logs show:

2016-01-24 15:04:59.879 GMT - [unknown] - [unknown] LOG:  0:
connection received: host=[local]
2016-01-24 15:04:59.879 GMT - [unknown] - [unknown] LOCATION:
BackendInitialize, postmaster.c:4081
2016-01-24 15:04:59.880 GMT - thom - postgres LOG:  0: connection
authorized: user=thom database=postgres
2016-01-24 15:04:59.880 GMT - thom - postgres LOCATION:
PerformAuthentication, postinit.c:272
2016-01-24 15:04:59.886 GMT - thom - postgres LOG:  0: statement:
SELECT pg_catalog.pg_is_in_recovery()
2016-01-24 15:04:59.886 GMT - thom - postgres LOCATION:
exec_simple_query, postgres.c:935
2016-01-24 15:04:59.886 GMT - thom - postgres LOG:  0: duration: 0.583 ms
2016-01-24 15:04:59.886 GMT - thom - postgres LOCATION:
exec_simple_query, postgres.c:1164
2016-01-24 15:04:59.886 GMT - thom - postgres LOG:  0:
disconnection: session time: 0:00:00.007 user=thom database=postgres
host=[local]
2016-01-24 15:04:59.886 GMT - thom - postgres LOCATION:
log_disconnections, postgres.c:4458

This shouldn't be checking whether it's a standby.  I also noticed that with:

psql 
'postgresql://thom@127.0.0.1:5530,127.0.0.1:5531,127.0.0.1:5532,127.0.0.1:5533,127.0.0.1:5534,127.0.0.1:5535/postgres?hostorder=random=1'
-c 'show port'

The standby at port 5533 shows in the logs that it's checking whether
it's a standby when it happens to hit it.  Shouldn't this be
unnecessary if we've set "readonly" to 1?  The result of the query
doesn't appear to be useful for anything.

Another thing I've noticed is that the PORT variable (shown by \set)
always shows PGPORT, but I expect it to equal the port of whichever
host we successfully connected to.

Thom


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-01-24 Thread Thom Brown
On 24 January 2016 at 15:30, Thom Brown  wrote:
> On 23 January 2016 at 03:32, Thom Brown  wrote:
>> On 22 January 2016 at 19:30, Victor Wagner  wrote:
>>> On Tue, 19 Jan 2016 14:34:54 +
>>> Thom Brown  wrote:
>>>

 The segfault issue I originally reported now appears to be resolved.

 But now I have another one:

 psql
 'postgresql://thom@127.0.0.1:5530,127.0.0.1:5531,127.0.0.1:5532,127.0.0.1:5533/postgres?hostorder=random=1_timeout=5'
 -c 'show port'
>>>
>>> Here is new version of the patch. Now I've reworked hostorder=random and
>>> it seems to work as well as sequential. failover_timeout works too.
>>> I've also found a case when attempt to connect fail when receiving
>>> FATAL message from server which is not properly up yet. So, it is fixed
>>> too.
>>>
>>> Addititonally, error messages from all failed connect attempts are not
>>> accumulated now. Only last one is returned.
>>
>> I can't connect to a standby with the patch applied:
>>
>> thom@swift:~/Development/test$ psql -p 5531 postgres
>> psql: thom@swift:~/Development/test$
>>
>> No error message, nothing in the logs.  I find this is the case with
>> any standby, but doesn't affect primaries.
>>
>> So this has broken existing functionality somewhere.
>
> Okay, I've tested this again with additional logging.  Again, I'm just
> running "psql -p 5531 postgres", which connects to a standby.  This
> immediately exits psql, and the logs show:
>
> 2016-01-24 15:04:59.879 GMT - [unknown] - [unknown] LOG:  0:
> connection received: host=[local]
> 2016-01-24 15:04:59.879 GMT - [unknown] - [unknown] LOCATION:
> BackendInitialize, postmaster.c:4081
> 2016-01-24 15:04:59.880 GMT - thom - postgres LOG:  0: connection
> authorized: user=thom database=postgres
> 2016-01-24 15:04:59.880 GMT - thom - postgres LOCATION:
> PerformAuthentication, postinit.c:272
> 2016-01-24 15:04:59.886 GMT - thom - postgres LOG:  0: statement:
> SELECT pg_catalog.pg_is_in_recovery()
> 2016-01-24 15:04:59.886 GMT - thom - postgres LOCATION:
> exec_simple_query, postgres.c:935
> 2016-01-24 15:04:59.886 GMT - thom - postgres LOG:  0: duration: 0.583 ms
> 2016-01-24 15:04:59.886 GMT - thom - postgres LOCATION:
> exec_simple_query, postgres.c:1164
> 2016-01-24 15:04:59.886 GMT - thom - postgres LOG:  0:
> disconnection: session time: 0:00:00.007 user=thom database=postgres
> host=[local]
> 2016-01-24 15:04:59.886 GMT - thom - postgres LOCATION:
> log_disconnections, postgres.c:4458
>
> This shouldn't be checking whether it's a standby.  I also noticed that with:
>
> psql 
> 'postgresql://thom@127.0.0.1:5530,127.0.0.1:5531,127.0.0.1:5532,127.0.0.1:5533,127.0.0.1:5534,127.0.0.1:5535/postgres?hostorder=random=1'
> -c 'show port'
>
> The standby at port 5533 shows in the logs that it's checking whether
> it's a standby when it happens to hit it.  Shouldn't this be
> unnecessary if we've set "readonly" to 1?  The result of the query
> doesn't appear to be useful for anything.
>
> Another thing I've noticed is that the PORT variable (shown by \set)
> always shows PGPORT, but I expect it to equal the port of whichever
> host we successfully connected to.

Actually, the same goes for the HOST variable, which is currently
showing the list of hosts:ports.

Output of \set variables without patch:

HOST = '127.0.0.1'
PORT = 
'5530,127.0.0.1:5531,127.0.0.1:5532,127.0.0.1:5533,127.0.0.1:5534,127.0.0.1:5535'

And with patch:

HOST = 
'127.0.0.1:5530,127.0.0.1:5531,127.0.0.1:5532,127.0.0.1:5533,127.0.0.1:5534,127.0.0.1:5535'
PORT = '5488'

They're both wrong, but I'm hoping we can just show the right information here.

Thom


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-01-24 Thread Victor Wagner
On Sun, 24 Jan 2016 15:30:22 +
Thom Brown  wrote:

в
> Okay, I've tested this again with additional logging.  Again, I'm just
> running "psql -p 5531 postgres", which connects to a standby.  This
> immediately exits psql, and the logs show:

> 2016-01-24 15:04:59.886 GMT - thom - postgres LOG:  0: statement:
> SELECT pg_catalog.pg_is_in_recovery()
> This shouldn't be checking whether it's a standby.  I also noticed
> that with:

This is, of course, incompatibility with previous behavior. Probably,
I should modify this patch, so it would imply readonly flag if only one
host/port pair is specified in the command line.

Now it does check for standby regardless of number of hosts specified.



-- 
   Victor Wagner 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-01-24 Thread Victor Wagner
On Sun, 24 Jan 2016 15:58:10 +
Thom Brown  wrote:


> 
> Output of \set variables without patch:
> 
> HOST = '127.0.0.1'
> PORT =
> '5530,127.0.0.1:5531,127.0.0.1:5532,127.0.0.1:5533,127.0.0.1:5534,127.0.0.1:5535'
> 
> And with patch:
> 
> HOST =
> '127.0.0.1:5530,127.0.0.1:5531,127.0.0.1:5532,127.0.0.1:5533,127.0.0.1:5534,127.0.0.1:5535'
> PORT = '5488'
> 
> They're both wrong, but I'm hoping we can just show the right
> information here.

I think we should show right information here, but it is not so simple.

Problem is that I never keep symbolic representation of individual
host/port pair. And when we connect successfully, we have only struct
sockaddr representation of the it, which contain right IP
address, but doesn't contain symbolic host name.

Moreover, one hostname from connect string can produce more than one
addrinfo structures. For example, on the machines with IPv6 support,
'localhost' hostname is resolved into both 127.0.0.1 IPv4 address and
[::1] IPv6, and produces two records. 

So would do any name, which have both A and  records in DNS. And
nothing prevent domain administrator to put more than one A record for
same hostname into DNS zone.


So, it is just same information which can be retrieved from the backend
via 

select inet_client_addr();
select inet_client_port();

What is really interesting for HOST and PORT variable - it is the name
of host and port number used to make actual connection, as they were
specified in the connect string or service file.


> 
> Thom



-- 
   Victor Wagner 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-01-24 Thread Thom Brown
On 24 January 2016 at 19:53, Victor Wagner  wrote:
> On Sun, 24 Jan 2016 15:58:10 +
> Thom Brown  wrote:
>
>
>>
>> Output of \set variables without patch:
>>
>> HOST = '127.0.0.1'
>> PORT =
>> '5530,127.0.0.1:5531,127.0.0.1:5532,127.0.0.1:5533,127.0.0.1:5534,127.0.0.1:5535'
>>
>> And with patch:
>>
>> HOST =
>> '127.0.0.1:5530,127.0.0.1:5531,127.0.0.1:5532,127.0.0.1:5533,127.0.0.1:5534,127.0.0.1:5535'
>> PORT = '5488'
>>
>> They're both wrong, but I'm hoping we can just show the right
>> information here.
>
> I think we should show right information here, but it is not so simple.
>
> Problem is that I never keep symbolic representation of individual
> host/port pair. And when we connect successfully, we have only struct
> sockaddr representation of the it, which contain right IP
> address, but doesn't contain symbolic host name.
>
> Moreover, one hostname from connect string can produce more than one
> addrinfo structures. For example, on the machines with IPv6 support,
> 'localhost' hostname is resolved into both 127.0.0.1 IPv4 address and
> [::1] IPv6, and produces two records.
>
> So would do any name, which have both A and  records in DNS. And
> nothing prevent domain administrator to put more than one A record for
> same hostname into DNS zone.
>
>
> So, it is just same information which can be retrieved from the backend
> via
>
> select inet_client_addr();
> select inet_client_port();

I think you mean:

select inet_server_addr();
select inet_server_port();

> What is really interesting for HOST and PORT variable - it is the name
> of host and port number used to make actual connection, as they were
> specified in the connect string or service file.

And this is probably not the correct thing for it to report.  The
documentation says "The database server host you are currently
connected to." and "The database server port to which you are
currently connected.", so yeah, I'd expect to see those set to
whatever those 2 functions resolve to.  That being said, if one
connects via a domain socket, those appear to come back blank with
those functions, yet HOST and PORT report the correct information in
those cases (without passing in multiple hosts).  Is that a
pre-existing bug?

Thom


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-01-24 Thread Thom Brown
On 24 January 2016 at 20:11, Thom Brown  wrote:
> On 24 January 2016 at 19:53, Victor Wagner  wrote:
>> On Sun, 24 Jan 2016 15:58:10 +
>> Thom Brown  wrote:
>>
>>
>>>
>>> Output of \set variables without patch:
>>>
>>> HOST = '127.0.0.1'
>>> PORT =
>>> '5530,127.0.0.1:5531,127.0.0.1:5532,127.0.0.1:5533,127.0.0.1:5534,127.0.0.1:5535'
>>>
>>> And with patch:
>>>
>>> HOST =
>>> '127.0.0.1:5530,127.0.0.1:5531,127.0.0.1:5532,127.0.0.1:5533,127.0.0.1:5534,127.0.0.1:5535'
>>> PORT = '5488'
>>>
>>> They're both wrong, but I'm hoping we can just show the right
>>> information here.
>>
>> I think we should show right information here, but it is not so simple.
>>
>> Problem is that I never keep symbolic representation of individual
>> host/port pair. And when we connect successfully, we have only struct
>> sockaddr representation of the it, which contain right IP
>> address, but doesn't contain symbolic host name.
>>
>> Moreover, one hostname from connect string can produce more than one
>> addrinfo structures. For example, on the machines with IPv6 support,
>> 'localhost' hostname is resolved into both 127.0.0.1 IPv4 address and
>> [::1] IPv6, and produces two records.
>>
>> So would do any name, which have both A and  records in DNS. And
>> nothing prevent domain administrator to put more than one A record for
>> same hostname into DNS zone.
>>
>>
>> So, it is just same information which can be retrieved from the backend
>> via
>>
>> select inet_client_addr();
>> select inet_client_port();
>
> I think you mean:
>
> select inet_server_addr();
> select inet_server_port();
>
>> What is really interesting for HOST and PORT variable - it is the name
>> of host and port number used to make actual connection, as they were
>> specified in the connect string or service file.
>
> And this is probably not the correct thing for it to report.  The
> documentation says "The database server host you are currently
> connected to." and "The database server port to which you are
> currently connected.", so yeah, I'd expect to see those set to
> whatever those 2 functions resolve to.  That being said, if one
> connects via a domain socket, those appear to come back blank with
> those functions, yet HOST and PORT report the correct information in
> those cases (without passing in multiple hosts).  Is that a
> pre-existing bug?

I've just checked, and can see that this doesn't appear to be a bug.
As per network.c:

/*
 * IP address that the server accepted the connection on (NULL if Unix socket)
 */

and

/*
 * port that the server accepted the connection on (NULL if Unix socket)
 */

Thom


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-01-22 Thread Alvaro Herrera
You're editing the expected file for the libpq-regress thingy, but you
haven't added any new lines to test the new capability.  I think it'd be
good to add some there.  (I already said this earlier in the thread; is
there any reason you ignored it the first time?)

If the test program requires improvement to handle the new stuff, let's
do that.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-01-22 Thread Victor Wagner
On Tue, 19 Jan 2016 14:34:54 +
Thom Brown  wrote:

> 
> The segfault issue I originally reported now appears to be resolved.
> 
> But now I have another one:
> 
> psql
> 'postgresql://thom@127.0.0.1:5530,127.0.0.1:5531,127.0.0.1:5532,127.0.0.1:5533/postgres?hostorder=random=1_timeout=5'
> -c 'show port'

Here is new version of the patch. Now I've reworked hostorder=random and
it seems to work as well as sequential. failover_timeout works too.
I've also found a case when attempt to connect fail when receiving
FATAL message from server which is not properly up yet. So, it is fixed
too.

Addititonally, error messages from all failed connect attempts are not
accumulated now. Only last one is returned.





-- 
   Victor Wagner 
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 2328d8f..3c7bd15 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -792,7 +792,7 @@ host=localhost port=5432 dbname=mydb connect_timeout=10

The general form for a connection URI is:
 
-postgresql://[user[:password]@][netloc][:port][/dbname][?param1=value1...]
+postgresql://[user[:password]@][netloc][:port][,netloc[:port]...][/dbname][?param1=value1...]
 

 
@@ -809,6 +809,7 @@ postgresql://localhost/mydb
 postgresql://user@localhost
 postgresql://user:secret@localhost
 postgresql://other@localhost/otherdb?connect_timeout=10application_name=myapp
+postgresql://node1,node2:5433,node3:4432,node4/mydb?hostorder=randomreadonly=1
 
 Components of the hierarchical part of the URI can also
 be given as parameters.  For example:
@@ -831,7 +832,9 @@ postgresql:///mydb?host=localhostport=5433

 For improved compatibility with JDBC connection URIs,
 instances of parameter ssl=true are translated into
-sslmode=require.
+sslmode=require and
+loadBalanceHosts=true  into
+hostorder=random.

 

@@ -841,6 +844,10 @@ postgresql:///mydb?host=localhostport=5433
 postgresql://[2001:db8::1234]/database
 

+   
+   There can be serveral host specifications, optionally accompanied
+   with port, separated by comma.
+   
 

 The host component is interpreted as described for the parameter PostgreSQL was built). On machines without
 Unix-domain sockets, the default is to connect to localhost.

+   
+   There can be more than one host parameter in
+   the connect string. In this case these hosts would be considered
+   alternate entries into same database and if connect to first one
+   fails, library would try to connect second etc. This can be used
+   for high availability cluster or for load balancing. See
+parameter.
+   
+   
+   Network host name can be accompanied with port number, separated by
+   colon. If so, this port number is used only when connected to
+   this host. If there is no port number, port specified in the
+parameter would be used.
+   
   
  
 
@@ -942,8 +963,44 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname


   
-
+
+  
+  hostorder
+  
+  
+  Specifies how to choose host from list of alternate hosts,
+  specified in the  parameter.
+  
+  
+  If value of this argument is sequential (the
+  default) library connects to the hosts in order they specified,
+  and tries to connect second one only if connection to the first
+  fails.
+  
+  
+  If value is random host to connect is randomly
+  picked from the list. It allows to balance load between several
+  cluster nodes. However, currently PostgreSQL doesn't support
+  multimaster clusters. So, without use of third-party products,
+  only read-only connections can take advantage from the
+  load-balancing. See 
+  
+  
+  
+  
+  readonly
+  
+  
+  If this parameter is 0 (the default), upon successful connection
+  library checks if host is in recovery state, and if it is so,
+  tries next host in the connect string. If this parameter is
+  non-zero, connection to warm standby nodes are considered
+  successful.
+  
+  
+  
   
+
port


@@ -985,7 +1042,6 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
   
   
  
-
  
   connect_timeout
   
@@ -996,7 +1052,27 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
   
   
  
-
+ 
+ falover_timeout
+ 
+ 
+ Maximum time to cycilically retry all the hosts in commect string.
+ (as decimal integer number of seconds). If not specified, then
+ hosts are tried just once.
+ 
+ 
+ If we have replicating cluster, and master node fails, it might
+ take some time to promote one of standby nodes to the new master.
+ So clients which notice that connect to the master fails, can
+ already give up attempt to reestablish a 

Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-01-22 Thread Thom Brown
On 22 January 2016 at 19:30, Victor Wagner  wrote:
> On Tue, 19 Jan 2016 14:34:54 +
> Thom Brown  wrote:
>
>>
>> The segfault issue I originally reported now appears to be resolved.
>>
>> But now I have another one:
>>
>> psql
>> 'postgresql://thom@127.0.0.1:5530,127.0.0.1:5531,127.0.0.1:5532,127.0.0.1:5533/postgres?hostorder=random=1_timeout=5'
>> -c 'show port'
>
> Here is new version of the patch. Now I've reworked hostorder=random and
> it seems to work as well as sequential. failover_timeout works too.
> I've also found a case when attempt to connect fail when receiving
> FATAL message from server which is not properly up yet. So, it is fixed
> too.
>
> Addititonally, error messages from all failed connect attempts are not
> accumulated now. Only last one is returned.

I can't connect to a standby with the patch applied:

thom@swift:~/Development/test$ psql -p 5531 postgres
psql: thom@swift:~/Development/test$

No error message, nothing in the logs.  I find this is the case with
any standby, but doesn't affect primaries.

So this has broken existing functionality somewhere.

Thom


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-01-22 Thread Thom Brown
On 23 January 2016 at 03:32, Thom Brown  wrote:
> On 22 January 2016 at 19:30, Victor Wagner  wrote:
>> On Tue, 19 Jan 2016 14:34:54 +
>> Thom Brown  wrote:
>>
>>>
>>> The segfault issue I originally reported now appears to be resolved.
>>>
>>> But now I have another one:
>>>
>>> psql
>>> 'postgresql://thom@127.0.0.1:5530,127.0.0.1:5531,127.0.0.1:5532,127.0.0.1:5533/postgres?hostorder=random=1_timeout=5'
>>> -c 'show port'
>>
>> Here is new version of the patch. Now I've reworked hostorder=random and
>> it seems to work as well as sequential. failover_timeout works too.
>> I've also found a case when attempt to connect fail when receiving
>> FATAL message from server which is not properly up yet. So, it is fixed
>> too.
>>
>> Addititonally, error messages from all failed connect attempts are not
>> accumulated now. Only last one is returned.
>
> I can't connect to a standby with the patch applied:
>
> thom@swift:~/Development/test$ psql -p 5531 postgres
> psql: thom@swift:~/Development/test$
>
> No error message, nothing in the logs.  I find this is the case with
> any standby, but doesn't affect primaries.
>
> So this has broken existing functionality somewhere.

Okay, I've tested my original complaints, and they appear to be
resolved.  The timeout works fine, the sequential and random orders
behave as expected, and the readonly setting is also working.

Thom


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-01-21 Thread Victor Wagner
On Tue, 19 Jan 2016 14:34:54 +
Thom Brown  wrote:

> The segfault issue I originally reported now appears to be resolved.
> 
> But now I have another one:
> 
> psql
> 'postgresql://thom@127.0.0.1:5530,127.0.0.1:5531,127.0.0.1:5532,127.0.0.1:5533/postgres?hostorder=random=1_timeout=5'
> -c 'show port'
> 
> Segmentation fault

This segfault appears to be a simple programming error. 
which comes out only when  we need to retry address list (which happens
when no node is available at the time of first scan)

There is code which prevents random hostorder to connect same node
twice a row

while (current != NULL)
{
if (current == conn->addr_cur)
{
current = current->ai_next;
}
count++;
if ((rand()&0x) < 0x1 / count)
{
choice = current;
}
current = current->ai_next;
}


There should be continue;  after first current = current->ai_next;
 
> This is where no nodes are available.  If I remove hostorder=random,
> or replace it with hostorder=sequential, it doesn't segfault.
> However, it then tries to connect to PGHOST on PGPORT repeatedly, even
> if I bring up one of the nodes it should be looking for.  Not only
> this, but it seems to do it forever if failover_timeout is greater
> than 0.

But this turns out to be more complicated matter, so I'm not ready to
provide next version of patch yet.







-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-01-19 Thread Thom Brown
On 21 December 2015 at 14:50, Victor Wagner  wrote:
> On Mon, 21 Dec 2015 17:18:37 +0300
> Teodor Sigaev  wrote:
>
>> Sorry, but there is something wrong with your patch:
>> % patch -p1 -C < ~/Downloads/libpq-failover-5.patch
>
> Really, somehow broken version of the patch got into message.
>
> Here is correct one.

The segfault issue I originally reported now appears to be resolved.

But now I have another one:

psql 
'postgresql://thom@127.0.0.1:5530,127.0.0.1:5531,127.0.0.1:5532,127.0.0.1:5533/postgres?hostorder=random=1_timeout=5'
-c 'show port'

Segmentation fault

This is where no nodes are available.  If I remove hostorder=random,
or replace it with hostorder=sequential, it doesn't segfault.
However, it then tries to connect to PGHOST on PGPORT repeatedly, even
if I bring up one of the nodes it should be looking for.  Not only
this, but it seems to do it forever if failover_timeout is greater
than 0.

Thom


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2015-12-21 Thread Alvaro Herrera
Victor Wagner wrote:
> On Mon, 21 Dec 2015 17:18:37 +0300
> Teodor Sigaev  wrote:
> 
> > Sorry, but there is something wrong with your patch:
> > % patch -p1 -C < ~/Downloads/libpq-failover-5.patch
> 
> Really, somehow broken version of the patch got into message.

It would be good to add a few test cases of this new functionality to
libpq's regress.in file.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2015-12-21 Thread Teodor Sigaev

Sorry, but there is something wrong with your patch:
% patch -p1 -C < ~/Downloads/libpq-failover-5.patch

--
|diff --git a/src/interfaces/libpq/fe-connect.c 
b/src/interfaces/libpq/fe-connect.c
...
Hunk #18 succeeded at 2804.
patch:  malformed patch at line 666: <<< BEGIN MERGE CONFLICT: local 
copy shown first <<<


Victor Wagner wrote:

On Mon, 07 Dec 2015 15:26:33 -0500
Korry Douglas  wrote:



The problem seems to be in PQconnectPoll() in the case for
CONNECTION_AUTH_OK, specifically this code:

/* We can release the address list now. */
pg_freeaddrinfo_all(conn->addrlist_family, conn->addrlist);
conn->addrlist = NULL;
conn->addr_cur = NULL;
That frees up the list of alternative host addresses.  The state
machine then progresses to CONNECTION_CHECK_RO (which invokes
pg_is_in_recovery()), then CONNECTION_CHECK_RW (waiting for the


Thank you for pointing to this problem. I've overlooked it. Probably
I should improve my testing scenario.

I', attaching new version of the patch, which, hopefully, handles
address list freeing correctly.









--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2015-12-21 Thread Victor Wagner
On Mon, 21 Dec 2015 17:18:37 +0300
Teodor Sigaev  wrote:

> Sorry, but there is something wrong with your patch:
> % patch -p1 -C < ~/Downloads/libpq-failover-5.patch

Really, somehow broken version of the patch got into message.

Here is correct one.

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 9c0e4c8..162bd4f 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -792,7 +792,7 @@ host=localhost port=5432 dbname=mydb connect_timeout=10

The general form for a connection URI is:
 
-postgresql://[user[:password]@][netloc][:port][/dbname][?param1=value1...]
+postgresql://[user[:password]@][netloc][:port][,netloc[:port]...][/dbname][?param1=value1...]
 

 
@@ -809,6 +809,7 @@ postgresql://localhost/mydb
 postgresql://user@localhost
 postgresql://user:secret@localhost
 postgresql://other@localhost/otherdb?connect_timeout=10application_name=myapp
+postgresql://node1,node2:5433,node3:4432,node4/mydb?hostorder=randomreadonly=1
 
 Components of the hierarchical part of the URI can also
 be given as parameters.  For example:
@@ -831,7 +832,9 @@ postgresql:///mydb?host=localhostport=5433

 For improved compatibility with JDBC connection URIs,
 instances of parameter ssl=true are translated into
-sslmode=require.
+sslmode=require and
+loadBalanceHosts=true  into
+hostorder=random.

 

@@ -841,6 +844,10 @@ postgresql:///mydb?host=localhostport=5433
 postgresql://[2001:db8::1234]/database
 

+   
+   There can be serveral host specifications, optionally accompanied
+   with port, separated by comma.
+   
 

 The host component is interpreted as described for the parameter PostgreSQL was built). On machines without
 Unix-domain sockets, the default is to connect to localhost.

+   
+   There can be more than one host parameter in
+   the connect string. In this case these hosts would be considered
+   alternate entries into same database and if connect to first one
+   fails, library would try to connect second etc. This can be used
+   for high availability cluster or for load balancing. See
+parameter.
+   
+   
+   Network host name can be accompanied with port number, separated by
+   colon. If so, this port number is used only when connected to
+   this host. If there is no port number, port specified in the
+parameter would be used.
+   
   
  
 
@@ -942,8 +963,44 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname


   
-
+
+  
+  hostorder
+  
+  
+  Specifies how to choose host from list of alternate hosts,
+  specified in the  parameter.
+  
+  
+  If value of this argument is sequential (the
+  default) library connects to the hosts in order they specified,
+  and tries to connect second one only if connection to the first
+  fails.
+  
+  
+  If value is random host to connect is randomly
+  picked from the list. It allows to balance load between several
+  cluster nodes. However, currently PostgreSQL doesn't support
+  multimaster clusters. So, without use of third-party products,
+  only read-only connections can take advantage from the
+  load-balancing. See 
+  
+  
+  
+  
+  readonly
+  
+  
+  If this parameter is 0 (the default), upon successful connection
+  library checks if host is in recovery state, and if it is so,
+  tries next host in the connect string. If this parameter is
+  non-zero, connection to warm standby nodes are considered
+  successful.
+  
+  
+  
   
+
port


@@ -985,7 +1042,6 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
   
   
  
-
  
   connect_timeout
   
@@ -996,7 +1052,27 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
   
   
  
-
+ 
+ falover_timeout
+ 
+ 
+ Maximum time to cycilically retry all the hosts in commect string.
+ (as decimal integer number of seconds). If not specified, then
+ hosts are tried just once.
+ 
+ 
+ If we have replicating cluster, and master node fails, it might
+ take some time to promote one of standby nodes to the new master.
+ So clients which notice that connect to the master fails, can
+ already give up attempt to reestablish a connection when new master
+ became available. 
+ 
+ 
+ Setting this parameter to reasonable time makes library try to
+ reconnect all the host in cyclically until new master appears.
+ 
+ 
+ 
  
   client_encoding
   
@@ -7165,6 +7241,18 @@ user=admin
An example file is provided at
share/pg_service.conf.sample.
   
+  
+  If more than one host option present in the section of service file, it
+  is interpeted as alternate servers for failover or load-balancing. See
+   option 

Re: [HACKERS] Patch: Implement failover on libpq connect level.

2015-12-21 Thread Michael Paquier
On Mon, Dec 21, 2015 at 11:50 PM, Victor Wagner  wrote:
> On Mon, 21 Dec 2015 17:18:37 +0300
> Teodor Sigaev  wrote:
>
>> Sorry, but there is something wrong with your patch:
>> % patch -p1 -C < ~/Downloads/libpq-failover-5.patch
>
> Really, somehow broken version of the patch got into message.
>
> Here is correct one.

Patch is moved to next CF as you are still working on it.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2015-12-09 Thread Victor Wagner
On Mon, 07 Dec 2015 15:26:33 -0500
Korry Douglas  wrote:


> The problem seems to be in PQconnectPoll() in the case for 
> CONNECTION_AUTH_OK, specifically this code:
> 
>/* We can release the address list now. */
>pg_freeaddrinfo_all(conn->addrlist_family, conn->addrlist);
>conn->addrlist = NULL;
>conn->addr_cur = NULL;
> That frees up the list of alternative host addresses.  The state
> machine then progresses to CONNECTION_CHECK_RO (which invokes 
> pg_is_in_recovery()), then CONNECTION_CHECK_RW (waiting for the

Thank you for pointing to this problem. I've overlooked it. Probably
I should improve my testing scenario.

I', attaching new version of the patch, which, hopefully, handles
address list freeing correctly.




-- 
   Victor Wagner 
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 9c0e4c8..162bd4f 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -792,7 +792,7 @@ host=localhost port=5432 dbname=mydb connect_timeout=10

The general form for a connection URI is:
 
-postgresql://[user[:password]@][netloc][:port][/dbname][?param1=value1...]
+postgresql://[user[:password]@][netloc][:port][,netloc[:port]...][/dbname][?param1=value1...]
 

 
@@ -809,6 +809,7 @@ postgresql://localhost/mydb
 postgresql://user@localhost
 postgresql://user:secret@localhost
 postgresql://other@localhost/otherdb?connect_timeout=10application_name=myapp
+postgresql://node1,node2:5433,node3:4432,node4/mydb?hostorder=randomreadonly=1
 
 Components of the hierarchical part of the URI can also
 be given as parameters.  For example:
@@ -831,7 +832,9 @@ postgresql:///mydb?host=localhostport=5433

 For improved compatibility with JDBC connection URIs,
 instances of parameter ssl=true are translated into
-sslmode=require.
+sslmode=require and
+loadBalanceHosts=true  into
+hostorder=random.

 

@@ -841,6 +844,10 @@ postgresql:///mydb?host=localhostport=5433
 postgresql://[2001:db8::1234]/database
 

+   
+   There can be serveral host specifications, optionally accompanied
+   with port, separated by comma.
+   
 

 The host component is interpreted as described for the parameter PostgreSQL was built). On machines without
 Unix-domain sockets, the default is to connect to localhost.

+   
+   There can be more than one host parameter in
+   the connect string. In this case these hosts would be considered
+   alternate entries into same database and if connect to first one
+   fails, library would try to connect second etc. This can be used
+   for high availability cluster or for load balancing. See
+parameter.
+   
+   
+   Network host name can be accompanied with port number, separated by
+   colon. If so, this port number is used only when connected to
+   this host. If there is no port number, port specified in the
+parameter would be used.
+   
   
  
 
@@ -942,8 +963,44 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname


   
-
+
+  
+  hostorder
+  
+  
+  Specifies how to choose host from list of alternate hosts,
+  specified in the  parameter.
+  
+  
+  If value of this argument is sequential (the
+  default) library connects to the hosts in order they specified,
+  and tries to connect second one only if connection to the first
+  fails.
+  
+  
+  If value is random host to connect is randomly
+  picked from the list. It allows to balance load between several
+  cluster nodes. However, currently PostgreSQL doesn't support
+  multimaster clusters. So, without use of third-party products,
+  only read-only connections can take advantage from the
+  load-balancing. See 
+  
+  
+  
+  
+  readonly
+  
+  
+  If this parameter is 0 (the default), upon successful connection
+  library checks if host is in recovery state, and if it is so,
+  tries next host in the connect string. If this parameter is
+  non-zero, connection to warm standby nodes are considered
+  successful.
+  
+  
+  
   
+
port


@@ -985,7 +1042,6 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
   
   
  
-
  
   connect_timeout
   
@@ -996,7 +1052,27 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
   
   
  
-
+ 
+ falover_timeout
+ 
+ 
+ Maximum time to cycilically retry all the hosts in commect string.
+ (as decimal integer number of seconds). If not specified, then
+ hosts are tried just once.
+ 
+ 
+ If we have replicating cluster, and master node fails, it might
+ take some time to promote one of standby nodes to the new master.
+ So clients which notice that connect to the master fails, 

Re: [HACKERS] Patch: Implement failover on libpq connect level.

2015-12-07 Thread Korry Douglas



I've tried to deal with some of these problems.

My patch have support for following things:

1. Check whether database instance is in the recovery/standby mode and
try to find another one if so.
2. Let cluster management software to have some time to promote one of
the standbys to master. I.e. there can be failover timeout specified to
allow retry after some time if no working master found.

Really there is room for some improvements in handling of connect
timeout (which became much more important thing when ability to try
next host appears). Now it is handled only by blocking-mode connect
functions, not by async state machine. But I decided to publish patch
without these improvements to get feedback from community.

A bit of testing on this turns up a problem.

Consider a connection string that specifies two hosts and a read/write 
connection:


  postgresql://korry@127.0.0.1:5301,127.0.0.1:5300/edb?readonly=0

If the first host is a healthy standby (meaning that I can connect to it 
but pg_is_in_recovery() returns 't'), the state machine will never move 
on to the second host.


The problem seems to be in PQconnectPoll() in the case for 
CONNECTION_AUTH_OK, specifically this code:


  /* We can release the address list now. */
  pg_freeaddrinfo_all(conn->addrlist_family, conn->addrlist);
  conn->addrlist = NULL;
  conn->addr_cur = NULL;

That frees up the list of alternative host addresses.  The state machine 
then progresses to CONNECTION_CHECK_RO (which invokes 
pg_is_in_recovery()), then CONNECTION_CHECK_RW (waiting for the response 
from the server).  Since we just connected to a standby replica, 
pg_is_in_recovery() returns 't' and the state changes to 
CONNECTION_NEEDED.  The next call to try_next_address() will fail to 
find a next address because we freed the list in the case for 
CONNECTION_AUTH_OK.


A related issue:  when the above sequence occurs, no error message is 
returned (because the case for CONNECTION_NEEDED thinks "an appropriate 
error message is already set up").


In short, if you successfully connect to standby replica (and specify 
readonly=0), the remaining hosts are ignored, even though one of those 
hosts is a master.


And one comment about the code itself - in connectDBStart(), you've 
added quite a bit of code to parse multiple hosts/ports. I would 
recommend adding a comment that shows the expected format, and then 
choosing better variable names (better than 'p', 'q', and 'r'); perhaps 
the variable names could refer to components of the connection string 
that you are parsing (like 'host', 'port', 'delimiter', ...).  That 
would make the code much easier to read/maintain.


Thanks.


-- Korry



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2015-12-07 Thread Korry Douglas







I've tried to deal with some of these problems.

My patch have support for following things:

1. Check whether database instance is in the recovery/standby mode and
try to find another one if so.
2. Let cluster management software to have some time to promote one of
the standbys to master. I.e. there can be failover timeout specified to
allow retry after some time if no working master found.

Really there is room for some improvements in handling of connect
timeout (which became much more important thing when ability to try
next host appears). Now it is handled only by blocking-mode connect
functions, not by async state machine. But I decided to publish patch
without these improvements to get feedback from community.

A bit of testing on this turns up a problem.

Consider a connection string that specifies two hosts and a read/write 
connection:


  postgresql://korry@127.0.0.1:5301,127.0.0.1:5300/edb?readonly=0

If the first host is a healthy standby (meaning that I can connect to 
it but pg_is_in_recovery() returns 't'), the state machine will never 
move on to the second host.


The problem seems to be in PQconnectPoll() in the case for 
CONNECTION_AUTH_OK, specifically this code:


  /* We can release the address list now. */
  pg_freeaddrinfo_all(conn->addrlist_family, conn->addrlist);
  conn->addrlist = NULL;
  conn->addr_cur = NULL;

That frees up the list of alternative host addresses.  The state 
machine then progresses to CONNECTION_CHECK_RO (which invokes 
pg_is_in_recovery()), then CONNECTION_CHECK_RW (waiting for the 
response from the server).  Since we just connected to a standby 
replica, pg_is_in_recovery() returns 't' and the state changes to 
CONNECTION_NEEDED.  The next call to try_next_address() will fail to 
find a next address because we freed the list in the case for 
CONNECTION_AUTH_OK.


A related issue:  when the above sequence occurs, no error message is 
returned (because the case for CONNECTION_NEEDED thinks "an 
appropriate error message is already set up").


In short, if you successfully connect to standby replica (and specify 
readonly=0), the remaining hosts are ignored, even though one of those 
hosts is a master.


A follow-up - the conn->addrlist is also freed when the case for 
CONNECTION_CHECK_RW decides that conn->status != CONNECTION_OK and calls 
closePGConn().



-- Korry


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2015-11-18 Thread Robert Haas
On Tue, Nov 17, 2015 at 3:20 PM, Peter Eisentraut  wrote:
> On 11/6/15 3:59 PM, Robert Haas wrote:
>> So, I really wonder why we're not happy with the ability to substitute
>> out just the host and IP.
>
> One of my concerns is that the proposed URLs are not valid URLs anymore
> that can be parsed or composed with a URL library, which would be sad.

I agree that's sad.  But compatibility with JDBC is happy, which is
something to think about, at least.

> The other issue is that I think it's taking the implementation down the
> wrong path.

Well, I don't object to the idea of having some kind of a function
that can take a set of connection strings or URLs and try them all.
But then it's not really a connection string any more.  I mean, with
the proposal where we allow the host option to be multiply specified,
you can just plug this feature into any utility whatever that
understands connection strings, and regardless of whether you are
using the classic connection string format or the URL format, it will
work.  And it's a pretty simple extension of what we support already.

psql 'host=foo host=bar failovertime=1'

If we do what you're proposing, then I'm really not clear on how this
works.  Certainly, there can be new libpq functions that take a list
of connection strings or URLs as an argument... but what would this
look like if you wanted to invoke psql this way?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2015-11-17 Thread Peter Eisentraut
On 11/6/15 3:59 PM, Robert Haas wrote:
> So, I really wonder why we're not happy with the ability to substitute
> out just the host and IP.

One of my concerns is that the proposed URLs are not valid URLs anymore
that can be parsed or composed with a URL library, which would be sad.

The other issue is that I think it's taking the implementation down the
wrong path.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2015-11-17 Thread Peter Eisentraut
On 11/5/15 10:12 AM, Victor Wagner wrote:
> 2. You are suggesting that it should be possible to specify all options 
> separately for each of the fallback hosts. 

I'm not necessarily suggesting that.  I think it could very well be

conn="postgresql://foo postgresql://bar order=random"



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2015-11-09 Thread Victor Wagner
On Mon, 9 Nov 2015 15:14:02 +0800
Craig Ringer  wrote:


> 
> I'd rather not see convoluted and complex connstrings, per my prior
> post. The JDBC extended URL style seems good enough.

It is what my patch implements.
 
> IMO the biggest problem is that client-side failover is way more
> complex than "got -ECONNREFUSED, try next host". I think we're all
> focusing on not being able to twiddle arbitrary settings while
> ignoring the real problem with client failover, i.e. making it
> actually work in the real world.
> 
I've tried to deal with some of these problems. 

My patch have support for following things:

1. Check whether database instance is in the recovery/standby mode and
try to find another one if so.
2. Let cluster management software to have some time to promote one of
the standbys to master. I.e. there can be failover timeout specified to
allow retry after some time if no working master found.

Really there is room for some improvements in handling of connect
timeout (which became much more important thing when ability to try
next host appears). Now it is handled only by blocking-mode connect
functions, not by async state machine. But I decided to publish patch
without these improvements to get feedback from community.




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2015-11-08 Thread Craig Ringer
On 7 November 2015 at 04:59, Robert Haas  wrote:

> So, I really wonder why we're not happy with the ability to substitute
> out just the host and IP.

I tend to agree. That solves 95% of the problem and doesn't foreclose
solving the other 5% some other way if someone really cares later.

I'd rather not see convoluted and complex connstrings, per my prior
post. The JDBC extended URL style seems good enough.

IMO the biggest problem is that client-side failover is way more
complex than "got -ECONNREFUSED, try next host". I think we're all
focusing on not being able to twiddle arbitrary settings while
ignoring the real problem with client failover, i.e. making it
actually work in the real world.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2015-11-06 Thread Alvaro Herrera
Craig Ringer wrote:
> On 6 November 2015 at 13:34, Robert Haas  wrote:
> 
> >> But some options control how
> >> next host should be choosen (i.e. use random order for load-balancing
> >> or sequential order for high availability), so they should be specified
> >> only once per connect string.
> >
> > But this seems like a point worthy of consideration.
> 
> This makes me think that trying to wedge this into the current API
> using a funky connection string format might be a mistake.
> 
> Lots of these issues would go away if you could provide more than just
> a connstring.

Yes, I agree.  I wonder if the failover aspect couldn't be better
covered by something more powerful than a single URI, such as the
service file format.  Maybe just allow the contents of a service file to
be passed as a "connection string", so that the application/environment
can continue to maintain the connection info as a string somewhere
instead of having to have an actual file.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2015-11-06 Thread Robert Haas
On Fri, Nov 6, 2015 at 10:38 AM, Alvaro Herrera
 wrote:
> Craig Ringer wrote:
>> On 6 November 2015 at 13:34, Robert Haas  wrote:
>>
>> >> But some options control how
>> >> next host should be choosen (i.e. use random order for load-balancing
>> >> or sequential order for high availability), so they should be specified
>> >> only once per connect string.
>> >
>> > But this seems like a point worthy of consideration.
>>
>> This makes me think that trying to wedge this into the current API
>> using a funky connection string format might be a mistake.
>>
>> Lots of these issues would go away if you could provide more than just
>> a connstring.
>
> Yes, I agree.  I wonder if the failover aspect couldn't be better
> covered by something more powerful than a single URI, such as the
> service file format.  Maybe just allow the contents of a service file to
> be passed as a "connection string", so that the application/environment
> can continue to maintain the connection info as a string somewhere
> instead of having to have an actual file.

This gets pretty far from the concept of a connection string, which is
supposed to be a sort of universal format for telling anything
PostgreSQL how to find the database server.  For example, psql accepts
a connection string, but you wouldn't want to have to pass the entire
contents of a file, newlines and all, as a command-line argument.
Now, we could design some kind of new connection string format that
would work, like:

psql 'alternatives failovertime=1 (host=192.168.10.49 user=rhaas)
(host=192.168.10.52 user=bob)'

There are a couple of problems with this.  One, it can't be parsed
anything like a normal connection string.  Lots of software will have
to be rewritten to use whatever parser we come up with for the new
format.  Two, it's nothing like the existing JDBC syntax that already
does more or less what people want here.  Three, I think that
implementing it is likely to be an extraordinarily large project.  The
whole data structure that libpq uses to store a parsed connection
string probably needs to change in fairly major ways.

So, I really wonder why we're not happy with the ability to substitute
out just the host and IP.

http://www.postgresql.org/message-id/4dfa5a86.9030...@acm.org
https://jdbc.postgresql.org/documentation/94/connect.html (bottom of page)

Yes, it's possible that some people might need to change some other
parameter when the host or IP changes.  But those people aren't
prohibited from writing their own function that tries to connect to
multiple PostgreSQL servers one after the other using any system
they'd like.  I don't think we really need to cater to every possible
use case in core.  The JDBC feature has been out for several years and
people seem to like it OK - the fact that we can think of things it
doesn't do doesn't make it a bad idea.  I'd rather have a feature that
can do 95% of what people want in the real world next month than 99%
of what people want in 2019, and I'm starting to think we're setting
the bar awfully high.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2015-11-05 Thread Craig Ringer
On 6 November 2015 at 13:34, Robert Haas  wrote:

>> But some options control how
>> next host should be choosen (i.e. use random order for load-balancing
>> or sequential order for high availability), so they should be specified
>> only once per connect string.
>
> But this seems like a point worthy of consideration.

This makes me think that trying to wedge this into the current API
using a funky connection string format might be a mistake.

Lots of these issues would go away if you could provide more than just
a connstring.

>> Third category of options are specified per-cluster much more often
>> than per node. But are quite often changed from compiled in default.
>
> This, too.

Like this. If you have a global set of connection options, then
per-connection options, it's a lot simpler.

I guess that can be hacked in with a more dramatic change in the
connstring format, otherwise, incorporating subsections or something.
I'm not keen on doing anything like that, but there are all sorts of
options...

"global:[user=fred port=5432] host1:[host=somehost user=user1]
host2:[host=localhost]"


(puts head in brown paper bag)

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2015-11-05 Thread Robert Haas
On Thu, Nov 5, 2015 at 10:12 AM, Victor Wagner  wrote:
> There are some drawbacks with this approach
>
> 1. You are suggesting usage of two nested loops, instead of one straight
> loop - outer loop over the URLs in the connect string, inner loop over
> the IP addresses each URL resolves into. (This inner loop already
> presents there for several versions of the postgres).

This isn't really a problem.

> 2. You are suggesting that it should be possible to specify all options
> separately for each of the fallback hosts.
>
> But some options control how
> next host should be choosen (i.e. use random order for load-balancing
> or sequential order for high availability), so they should be specified
> only once per connect string.

But this seems like a point worthy of consideration.

> Third category of options are specified per-cluster much more often
> than per node. But are quite often changed from compiled in default.
> Typically there are authentication credentials (even you use
> trigger-based replication, user information would probably be
> replicated), database name (if you use WAL-level replication), client
> encoding (which depends on more on the client than on server), etc.
>
> It would be pain if we would need specify them for each host separately.
> Syntax, implemented in the my patch allows even to specify
> nonstandard-port once for all hosts (using port= parameter in the query
> string) and override it for particular host in the list using colon
> syntax.

This, too.

> 3. Last, but not least. There is more than one postgresql client
> implementation. Apart of libpq we have jdbc, native GO language client
> etc. And I think that maintaining compatibility of the connect string
> between them is good thing. So, I've modelled syntax of multihost
> feature in the libpq URL-style syntax after jdbc syntax.

Also this.

I'm not sure what the right decision is here - I can see both sides of
it to some degree.  But I think your points deserve to be taken
seriously.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2015-11-05 Thread Victor Wagner
On 2015.11.02 at 12:15:48 -0500, Peter Eisentraut wrote:

>> That's true, but doesn't allowing every parameter to be multiply
>> specified greatly increase the implementation complexity for a pretty
>> marginal benefit?  

>Well, the way I would have approached is that after all the parsing you
>end up with a list of PQconninfoOption arrays and you you loop over that
>list and call connectDBStart() until you have a connection that
>satisfies you.

>I see that the patch doesn't do that and it looks quite messy as a
>result.

There are some drawbacks with this approach

1. You are suggesting usage of two nested loops, instead of one straight
loop - outer loop over the URLs in the connect string, inner loop over
the IP addresses each URL resolves into. (This inner loop already
presents there for several versions of the postgres).

2. You are suggesting that it should be possible to specify all options 
separately for each of the fallback hosts. 

But some options control how
next host should be choosen (i.e. use random order for load-balancing
or sequential order for high availability), so they should be specified
only once per connect string.

Some other options are
connection level rather then node-level (i.e. if you are using
replication mode of the connection, there is no sense to specify it for
one host and not specify for other). These options should be better
specified such way, that it would be not possible to use conflicting
values.

Third category of options are specified per-cluster much more often
than per node. But are quite often changed from compiled in default.
Typically there are authentication credentials (even you use
trigger-based replication, user information would probably be
replicated), database name (if you use WAL-level replication), client
encoding (which depends on more on the client than on server), etc.

It would be pain if we would need specify them for each host separately.
Syntax, implemented in the my patch allows even to specify
nonstandard-port once for all hosts (using port= parameter in the query
string) and override it for particular host in the list using colon
syntax.

3. Last, but not least. There is more than one postgresql client
implementation. Apart of libpq we have jdbc, native GO language client
etc. And I think that maintaining compatibility of the connect string
between them is good thing. So, I've modelled syntax of multihost
feature in the libpq URL-style syntax after jdbc syntax.
-- 
   Victor Wagner 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2015-11-03 Thread Robert Haas
On Mon, Nov 2, 2015 at 12:15 PM, Peter Eisentraut  wrote:
> On 10/30/15 9:26 AM, Robert Haas wrote:
>> That's true, but doesn't allowing every parameter to be multiply
>> specified greatly increase the implementation complexity for a pretty
>> marginal benefit?
>
> Well, the way I would have approached is that after all the parsing you
> end up with a list of PQconninfoOption arrays and you you loop over that
> list and call connectDBStart() until you have a connection that
> satisfies you.
>
> I see that the patch doesn't do that and it looks quite messy as a result.

Hmm, I see.  That approach certainly seems worth considering.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2015-11-02 Thread Peter Eisentraut
On 10/30/15 9:26 AM, Robert Haas wrote:
> That's true, but doesn't allowing every parameter to be multiply
> specified greatly increase the implementation complexity for a pretty
> marginal benefit?

Well, the way I would have approached is that after all the parsing you
end up with a list of PQconninfoOption arrays and you you loop over that
list and call connectDBStart() until you have a connection that
satisfies you.

I see that the patch doesn't do that and it looks quite messy as a result.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2015-10-30 Thread Victor Wagner
On Fri, 30 Oct 2015 14:26:45 +0100
Robert Haas  wrote:

> On Thu, Oct 29, 2015 at 8:29 PM, Peter Eisentraut 
> wrote:

> 
> That's true, but doesn't allowing every parameter to be multiply
> specified greatly increase the implementation complexity for a pretty
> marginal benefit?  I think host and IP would hit 98% of the use cases
> here.

As far as I can tell from the experience of writing this patch, it
would greatly increase complexity.

If there should be only need to have multiple hosts, I could almost
completely incapsulate changes into DNS resolving code (which already
allows to handle several addresses). Need to support different port for
each host already required change of internal storage, and as a
consequence changes in the regression test suite
(src/interfaces/libpq/test/regress.out)

But both host and port are used in the same place - in the connect
system call. If we add possibility to different values per host for some
parameter, such as database name, which should be used significantly
later, i.e. during sending of first protocol message, size of patch
would grow may be twice.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2015-10-30 Thread Robert Haas
On Thu, Oct 29, 2015 at 8:29 PM, Peter Eisentraut  wrote:
> On 10/28/15 4:18 AM, Victor Wagner wrote:
>> On Mon, 26 Oct 2015 16:25:57 -0400
>> Peter Eisentraut  wrote:
>>
>>> Also, this assumes that all the components other than host and port
>>> are the same.  Earlier there was a discussion about why the ports
>>> would ever need to be different.  Well, why can't the database names
>>> be different? I could have use for that.
>>
>> Because of way postgresql replication is implemented.
>
> There are multiple types of PostgreSQL replication, and there will be
> others in the future.

That's true, but doesn't allowing every parameter to be multiply
specified greatly increase the implementation complexity for a pretty
marginal benefit?  I think host and IP would hit 98% of the use cases
here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2015-10-30 Thread Christopher Browne
On 30 October 2015 at 09:26, Robert Haas  wrote:
>
> On Thu, Oct 29, 2015 at 8:29 PM, Peter Eisentraut  wrote:
> > On 10/28/15 4:18 AM, Victor Wagner wrote:
> >> On Mon, 26 Oct 2015 16:25:57 -0400
> >> Peter Eisentraut  wrote:
> >>
> >>> Also, this assumes that all the components other than host and port
> >>> are the same.  Earlier there was a discussion about why the ports
> >>> would ever need to be different.  Well, why can't the database names
> >>> be different? I could have use for that.
> >>
> >> Because of way postgresql replication is implemented.
> >
> > There are multiple types of PostgreSQL replication, and there will be
> > others in the future.
>
> That's true, but doesn't allowing every parameter to be multiply
> specified greatly increase the implementation complexity for a pretty
> marginal benefit?  I think host and IP would hit 98% of the use cases
> here.

I think it makes the feature WORSE.  I am getting more and more convinced
that the Correct Solution is for this feature to be handled by submitting
multiple URIs, and my argument isn't even based on any aspects of
implementation complexity.

Take as example the case where I have two database servers I want to
be considered.

a) Database with URI
   postgresql://cbbro...@server-a.example.info:5432/my-first-database

b) Database with URL
   postgresql://rob...@server-b.example.info:7032/my-second-database

With all the "per-variable multiplicities", this would turn into a
combinatorial explosion of combinations, some 2^4, or 16 possible servers,
some of which likely don't exist, and others of which aren't proper (e.g. -
trying to connect to database a) using robert's credentials).

Possibly some of those combinations are outright improper.

I'm not going to claim it's terribly wise to be doing the cross-credential
bit; I'd think it likely to be rather dumb for the application to use one
user when connecting to one database and another when connecting to
another.  But the notion of it being nondeterministic is just awful.

I head back to... "the way OpenLDAP does this"...
  They let you specify multiple LDAP servers...
ldap://server1.example.info:389 ldap://server2.example.info:389
where URIs are separated by whitespace.

Seems like Best Goodness for Postgres to do something analogous...
   postgresql://cbbro...@server-a.example.info:5432/my-first-database
postgresql://rob...@server-b.example.info:7032/my-second-database

Is it a bit long?  Sure.  But it is unambiguous, and requires *only*
whitespace parsing to separate the URIs.  I'd think it fine for Postgres to
support this via multiple "-d" options; that would be about as good.

That can cover 100% of cases, and I don't see it needing to interact
specially with odd bits such as "was that a replication slot?"  You can
always choose to shoot yourself in the foot, but this doesn't spin the
roulette for you...
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2015-10-30 Thread David Fetter
On Fri, Oct 30, 2015 at 11:29:09AM -0400, Christopher Browne wrote:
> On 30 October 2015 at 09:26, Robert Haas  wrote:
> >
> > On Thu, Oct 29, 2015 at 8:29 PM, Peter Eisentraut  wrote:
> > > On 10/28/15 4:18 AM, Victor Wagner wrote:
> > >> On Mon, 26 Oct 2015 16:25:57 -0400
> > >> Peter Eisentraut  wrote:
> > >>
> > >>> Also, this assumes that all the components other than host and port
> > >>> are the same.  Earlier there was a discussion about why the ports
> > >>> would ever need to be different.  Well, why can't the database names
> > >>> be different? I could have use for that.
> > >>
> > >> Because of way postgresql replication is implemented.
> > >
> > > There are multiple types of PostgreSQL replication, and there will be
> > > others in the future.
> >
> > That's true, but doesn't allowing every parameter to be multiply
> > specified greatly increase the implementation complexity for a pretty
> > marginal benefit?  I think host and IP would hit 98% of the use cases
> > here.
> 
> I think it makes the feature WORSE.  I am getting more and more convinced
> that the Correct Solution is for this feature to be handled by submitting
> multiple URIs, and my argument isn't even based on any aspects of
> implementation complexity.
> 
> Take as example the case where I have two database servers I want to
> be considered.
> 
> a) Database with URI
>postgresql://cbbro...@server-a.example.info:5432/my-first-database
> 
> b) Database with URL
>postgresql://rob...@server-b.example.info:7032/my-second-database
> 
> With all the "per-variable multiplicities", this would turn into a
> combinatorial explosion of combinations, some 2^4, or 16 possible servers,
> some of which likely don't exist, and others of which aren't proper (e.g. -
> trying to connect to database a) using robert's credentials).
> 
> Possibly some of those combinations are outright improper.

Let's say that the chances of their all both existing and being proper
are close enough to zero not to matter.

> Seems like Best Goodness for Postgres to do something analogous...
>postgresql://cbbro...@server-a.example.info:5432/my-first-database
> postgresql://rob...@server-b.example.info:7032/my-second-database

Yes.

> Is it a bit long?  Sure.  But it is unambiguous, and requires *only*
> whitespace parsing to separate the URIs.  I'd think it fine for Postgres to
> support this via multiple "-d" options; that would be about as good.

Indeed.  Is there any risk of losing ordering information in the
standard command line processing libraries?

> That can cover 100% of cases, and I don't see it needing to interact
> specially with odd bits such as "was that a replication slot?"  You can
> always choose to shoot yourself in the foot, but this doesn't spin the
> roulette for you...

An evocative image, if not a pretty one.

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2015-10-30 Thread Josh Berkus
On 10/30/2015 08:29 AM, Christopher Browne wrote:
> I think it makes the feature WORSE.  I am getting more and more convinced
> that the Correct Solution is for this feature to be handled by submitting
> multiple URIs, and my argument isn't even based on any aspects of
> implementation complexity.

+1

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2015-10-29 Thread Peter Eisentraut
On 10/28/15 4:18 AM, Victor Wagner wrote:
> On Mon, 26 Oct 2015 16:25:57 -0400
> Peter Eisentraut  wrote:
> 
>> Also, this assumes that all the components other than host and port
>> are the same.  Earlier there was a discussion about why the ports
>> would ever need to be different.  Well, why can't the database names
>> be different? I could have use for that.
> 
> Because of way postgresql replication is implemented.

There are multiple types of PostgreSQL replication, and there will be
others in the future.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2015-10-28 Thread Victor Wagner
On Mon, 26 Oct 2015 16:25:57 -0400
Peter Eisentraut  wrote:

> Also, this assumes that all the components other than host and port
> are the same.  Earlier there was a discussion about why the ports
> would ever need to be different.  Well, why can't the database names
> be different? I could have use for that.

Because of way postgresql replication is implemented.

This multihost feature is for clusters. Either for automatic switch
from one cluster node to another when master node fails and one of the
standbys get promoted to master, or for load balancing between hot
standby nodes.

Postgresql allows different replicas listen on different ports, because
each replica has its own postgresql conf, but repication works on the
node level, not on the database level, so all the databases from master
node would be replicated with their original names.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2015-10-27 Thread Shulgin, Oleksandr
On Mon, Oct 26, 2015 at 10:02 PM, Christopher Browne 
wrote:

>
> On 26 October 2015 at 16:25, Peter Eisentraut  wrote:
>
>> On 10/14/15 6:41 AM, Victor Wagner wrote:
>> > 1. It is allowed to specify several hosts in the connect string, either
>> > in URL-style (separated by comma) or in param=value form (several host
>> > parameters).
>>
>> I'm not fond of having URLs that are not valid URLs according to the
>> applicable standards.  Because then they can't be parsed or composed by
>> standard libraries.
>>
>> Also, this assumes that all the components other than host and port are
>> the same.  Earlier there was a discussion about why the ports would ever
>> need to be different.  Well, why can't the database names be different?
>>  I could have use for that.
>>
>> I think you should just accept multiple URLs.
>>
>
> I'd give a "+1" on this...
>
> As an area of new behaviour, I don't see a big problem with declining to
> support every wee bit of libpq configuration, and instead requiring the
> use of URLs.
>
> Trying to put "multiplicities" into each parameter (and then considering
> it at the pg_service level, too) is WAY more complicated, and for a
> feature where it seems to me that it is pretty reasonable to have a
> series of fully qualified URLs.
>
> Specifying several URLs should be easier to understand, easier to
> test, easier to code, and easier to keep from blowing up badly.
>

Setting aside all other concerns, have a +1 from me on that too.

--
Alex


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2015-10-27 Thread Josh Berkus
On 10/27/2015 01:42 AM, Shulgin, Oleksandr wrote:
> On Mon, Oct 26, 2015 at 10:02 PM, Christopher Browne  > wrote:
> 
> 
> On 26 October 2015 at 16:25, Peter Eisentraut  > wrote:
> 
> On 10/14/15 6:41 AM, Victor Wagner wrote:
> > 1. It is allowed to specify several hosts in the connect string, 
> either
> > in URL-style (separated by comma) or in param=value form (several 
> host
> > parameters).
> 
> I'm not fond of having URLs that are not valid URLs according to the
> applicable standards.  Because then they can't be parsed or
> composed by
> standard libraries.
> 
> Also, this assumes that all the components other than host and
> port are
> the same.  Earlier there was a discussion about why the ports
> would ever
> need to be different.  Well, why can't the database names be
> different?
>  I could have use for that.
> 
> I think you should just accept multiple URLs.
> 
> 
> I'd give a "+1" on this...
> 
> As an area of new behaviour, I don't see a big problem with declining to
> support every wee bit of libpq configuration, and instead requiring the
> use of URLs.
> 
> Trying to put "multiplicities" into each parameter (and then considering
> it at the pg_service level, too) is WAY more complicated, and for a
> feature where it seems to me that it is pretty reasonable to have a
> series of fully qualified URLs.
> 
> Specifying several URLs should be easier to understand, easier to
> test, easier to code, and easier to keep from blowing up badly.
> 
> 
> Setting aside all other concerns, have a +1 from me on that too.

I'm good with this.  +1


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2015-10-26 Thread Christopher Browne
On 26 October 2015 at 16:25, Peter Eisentraut  wrote:

> On 10/14/15 6:41 AM, Victor Wagner wrote:
> > 1. It is allowed to specify several hosts in the connect string, either
> > in URL-style (separated by comma) or in param=value form (several host
> > parameters).
>
> I'm not fond of having URLs that are not valid URLs according to the
> applicable standards.  Because then they can't be parsed or composed by
> standard libraries.
>
> Also, this assumes that all the components other than host and port are
> the same.  Earlier there was a discussion about why the ports would ever
> need to be different.  Well, why can't the database names be different?
>  I could have use for that.
>
> I think you should just accept multiple URLs.
>

I'd give a "+1" on this...

As an area of new behaviour, I don't see a big problem with declining to
support every wee bit of libpq configuration, and instead requiring the
use of URLs.

Trying to put "multiplicities" into each parameter (and then considering
it at the pg_service level, too) is WAY more complicated, and for a
feature where it seems to me that it is pretty reasonable to have a
series of fully qualified URLs.

Specifying several URLs should be easier to understand, easier to
test, easier to code, and easier to keep from blowing up badly.

I'll observe that this is the way that OpenLDAP supports specifying
multiple clients, so this technique is familiar in other contexts
where people are trying to accomplish the same kind of thing.
Sample docs, perhaps not authoritative, but useful enough...

-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2015-10-26 Thread Peter Eisentraut
On 10/14/15 6:41 AM, Victor Wagner wrote:
> 1. It is allowed to specify several hosts in the connect string, either
> in URL-style (separated by comma) or in param=value form (several host
> parameters).

I'm not fond of having URLs that are not valid URLs according to the
applicable standards.  Because then they can't be parsed or composed by
standard libraries.

Also, this assumes that all the components other than host and port are
the same.  Earlier there was a discussion about why the ports would ever
need to be different.  Well, why can't the database names be different?
 I could have use for that.

I think you should just accept multiple URLs.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2015-10-15 Thread Victor Wagner
On 2015.10.14 at 14:47:46 +0200, Shulgin, Oleksandr wrote:

> 
> So what exactly would happen with this command: PGHOST=host1 psql -h host2
> 
> Or this one: PGHOST=host1 psql host=host2

The right thing - value of the PGHOST environment variable is ignored,
and explicitely specified host is used.

> What about service files?

Oops, I've missed something here. Multiple host lines in the service
file do not produce list of alternate hosts.

I'll address this case in the next version of patch.

However, comma-separated list of host in the one host string in the
service file works.


> 
>  trying postgresql://[::1]:12345/db
> -dbname='db' host='::1' port='12345' (inet)
> +dbname='db' host='[::1]:12345' (inet)
> 
> Such a change doesn't look really nice to me.

Really, any alternative would be worse. Only alternative I can imagine
is to replace pghost field in the PGConn structure by array or linked
list of structures with host and port fields.

Now I decided to keep such array local to connectDBStart function, and
store list of host-port pair as string. 

It takes some CPU cycles to reparse this list each time when
reconnection is needed, but this is negligible compared with DNS
resolution of each host name. And we cannot avoid repeated DNS
resolution, because we have to take into account possibility of
DNS-based failover.


> > 2. Url with colon  but no port number after the host no more considered
> > valid.
> >
> 
> We could live with that, but is there a good reason for backward
> compatibility break in this case?

Because we can. Really, I don't see any useful semantic of host with
colon at the end. Null host name with colon and port have useful
semantic (if there is just one host in the  connect string), so it kept
as it was.

In the old version of connection-string parsing code stripping lone
colon of the host name was done by generic code (because port number
have gone to another variable anyway). Now handling of lone
colon is special case, because we just check syntax of host-port pair
list. 

And I have to choose how to handle it - either complain or silently
strip it away. Complaining was similier. We compute port number and
check if it is between 1 and 65535. Zero-length port number is not
integer between 1 and 65535.


> --
> Alex

-- 
   Victor Wagner 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2015-10-15 Thread Victor Wagner
On 2015.10.15 at 17:13:46 +0800, Craig Ringer wrote:

> On 14 October 2015 at 18:41, Victor Wagner  wrote:
> 
> > 5. Added new parameter readonly=boolean. If this parameter is false (the
> > default) upon successful connection check is performed that backend is
> > not in the recovery state. If so, connection is not considered usable
> > and next host is tried.
> 
> What constitutes "failed" as far as this is concerned?

failed means "select pg_is_in_recovery() returns true". 
The only reason of adding this parameter is to state that we 
can use  connection to warm-backup slave (or if we cannot).

> Like the PgJDBC approach I wonder how much this'll handle in practice
> and how it'll go with less clear-cut failures like disk-full on a
> replica that's a member of the failover pool, long delays before
> no-route-to-host errors, dns problems, etc.

Really I don't think that disk-full and other such node errors are
problems of client library. It is cluster management software which
should check for these errors and shut down erroneous nodes or at least
disable their listening for connection.

As for long-timeouts on network failures, they can be handled by
existing connect_timeout parameter. 

May be it is worth effort to rewrite this parameter handling. Now it
only used in blocking functions which call connectDBComplete
(PQconnectdb[Params], PQreset and PQsetdbLogin). May be it should be hidden into
PQconnectPoll state machine, so event-driven applications could get its
handling for free.

Really, my initial proposal included hostorder=parallel especially for
this situation. But I've decided not to implement it right now, because
it requires changes in the libpq public API.
 
> Had any chance to simulate network failures?

Testing failover code is really tricky thing. All testing I've done yet
was in the mode "Shudown one of the nodes and see what happens".
It is not too hard to simulate network failures using iptables, but
requires considerable planning of the test scenario.


-- 
   Victor Wagner 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2015-10-15 Thread Craig Ringer
On 14 October 2015 at 18:41, Victor Wagner  wrote:

> 5. Added new parameter readonly=boolean. If this parameter is false (the
> default) upon successful connection check is performed that backend is
> not in the recovery state. If so, connection is not considered usable
> and next host is tried.

What constitutes "failed" as far as this is concerned?

Like the PgJDBC approach I wonder how much this'll handle in practice
and how it'll go with less clear-cut failures like disk-full on a
replica that's a member of the failover pool, long delays before
no-route-to-host errors, dns problems, etc.

Had any chance to simulate network failures?

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Patch: Implement failover on libpq connect level.

2015-10-14 Thread Victor Wagner
On 2015.08.18 at 07:18:50 +0300, Victor Wagner wrote:

> Rationale
> =
> 
> Since introduction of the WAL-based replication into the PostgreSQL, it is
> possible to create high-availability and load-balancing clusters.
> 
> However, there is no support for failover in the client libraries. So, only
> way to provide transparent for client application failover is IP address
> migration. This approach has some limitation, i.e. it requires that
> master and backup servers reside in the same subnet or may not be
> feasible for other reasons.
> 
> Commercial RDBMS, such as Oracle, employ more flexible approach. They
> allow to specify multiple servers in the connect string, so if primary
> server is not available, client library tries to connect to other ones.
> 
> This approach allows to use geographically distributed failover clusters
> and also is a cheap way to implement load-balancing (which is not
> possible with IP address migration).
> 

Attached patch which implements client library failover and
loadbalancing as was described in the proposal
<20150818041850.ga5...@wagner.pp.ru>.

This patch implements following fuctionality:

1. It is allowed to specify several hosts in the connect string, either
in URL-style (separated by comma) or in param=value form (several host
parameters).

2. Each host parameter can be accompanied with port specifier separated
by colon. Port, specified such way takes precedence of port parameter or
default port for this particular host only.

3. There is hostorder parameter with two possible valies 'sequential'
and 'random' (default sequential). 'parallel' hostorder described in the
proposal is not yet implemented in this version of patch.

4. In the URL-style connect string parameter loadBalanceHosts=true is
considered equal to 'hostorder=random' for compatibility with jdbc.

5. Added new parameter readonly=boolean. If this parameter is false (the
default) upon successful connection check is performed that backend is
not in the recovery state. If so, connection is not considered usable
and next host is tried.

6. Added new parameter falover_timeout. If no usable host is found and
this parameter is specified, hosts are retried cyclically until this
timeout expires. It gives some time for claster management software to
promote one of standbys to master if master fails. By default there is
no retries.

Some implementation notes:

1. Field  pghost in the PGconn structure now stores comma separated list
of hosts, which is parsed in the connectDBStart. So, expected results of 
some tests in src/interfaces/libpq/test are changed. 

2. Url with colon  but no port number after the host no more considered
valid.

3. With hostorder=random we have to seed libc random number gernerator.
Some care was taken to not to lose entropy if generator was
initialized by app before connection, and to ensure different random
values if several connections are made from same application in one
second (even in single thread). RNG is seeded by xor of current time,
random value from this rng before seeding (if it was seeded earlier, it
keeps entropy) and address of the connection structure. May be it worth
effort adding thread id or pid, but there is no portable way to doing
so, so it would need testing on all supported platform.



diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 0ee018e..79a99ec 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -792,7 +792,7 @@ host=localhost port=5432 dbname=mydb connect_timeout=10

The general form for a connection URI is:
 
-postgresql://[user[:password]@][netloc][:port][/dbname][?param1=value1...]
+postgresql://[user[:password]@][netloc][:port][,netloc[:port]...][/dbname][?param1=value1...]
 

 
@@ -809,6 +809,7 @@ postgresql://localhost/mydb
 postgresql://user@localhost
 postgresql://user:secret@localhost
 postgresql://other@localhost/otherdb?connect_timeout=10application_name=myapp
+postgresql://node1,node2:5433,node3:4432,node4/mydb?hostorder=randomreadonly=1
 
 Components of the hierarchical part of the URI can also
 be given as parameters.  For example:
@@ -831,7 +832,9 @@ postgresql:///mydb?host=localhostport=5433

 For improved compatibility with JDBC connection URIs,
 instances of parameter ssl=true are translated into
-sslmode=require.
+sslmode=require and
+loadBalanceHosts=true  into
+hostorder=random.

 

@@ -841,6 +844,10 @@ postgresql:///mydb?host=localhostport=5433
 postgresql://[2001:db8::1234]/database
 

+   
+   There can be serveral host specifications, optionally accompanied
+   with port, separated by comma.
+   
 

 The host component is interpreted as described for the parameter PostgreSQL was built). On machines without
 Unix-domain sockets, the default is to connect to localhost.

+   
+   There can be more than one host parameter in
+   the connect string. In this case these hosts would be considered

Re: [HACKERS] Patch: Implement failover on libpq connect level.

2015-10-14 Thread Shulgin, Oleksandr
On Wed, Oct 14, 2015 at 12:41 PM, Victor Wagner  wrote:

> On 2015.08.18 at 07:18:50 +0300, Victor Wagner wrote:
>
> > Rationale
> > =
> >
> > Since introduction of the WAL-based replication into the PostgreSQL, it
> is
> > possible to create high-availability and load-balancing clusters.
> >
> > However, there is no support for failover in the client libraries. So,
> only
> > way to provide transparent for client application failover is IP address
> > migration. This approach has some limitation, i.e. it requires that
> > master and backup servers reside in the same subnet or may not be
> > feasible for other reasons.
> >
> > Commercial RDBMS, such as Oracle, employ more flexible approach. They
> > allow to specify multiple servers in the connect string, so if primary
> > server is not available, client library tries to connect to other ones.
> >
> > This approach allows to use geographically distributed failover clusters
> > and also is a cheap way to implement load-balancing (which is not
> > possible with IP address migration).
> >
>
> Attached patch which implements client library failover and
> loadbalancing as was described in the proposal
> <20150818041850.ga5...@wagner.pp.ru>.
>
> This patch implements following fuctionality:
>
> 1. It is allowed to specify several hosts in the connect string, either
> in URL-style (separated by comma) or in param=value form (several host
> parameters).
>

So what exactly would happen with this command: PGHOST=host1 psql -h host2

Or this one: PGHOST=host1 psql host=host2

What about service files?

2. Each host parameter can be accompanied with port specifier separated
> by colon. Port, specified such way takes precedence of port parameter or
> default port for this particular host only.
>
> 3. There is hostorder parameter with two possible valies 'sequential'
> and 'random' (default sequential). 'parallel' hostorder described in the
> proposal is not yet implemented in this version of patch.
>
> 4. In the URL-style connect string parameter loadBalanceHosts=true is
> considered equal to 'hostorder=random' for compatibility with jdbc.
>
> 5. Added new parameter readonly=boolean. If this parameter is false (the
> default) upon successful connection check is performed that backend is
> not in the recovery state. If so, connection is not considered usable
> and next host is tried.
>
> 6. Added new parameter falover_timeout. If no usable host is found and
> this parameter is specified, hosts are retried cyclically until this
> timeout expires. It gives some time for claster management software to
> promote one of standbys to master if master fails. By default there is
> no retries.
>
> Some implementation notes:
>
> 1. Field  pghost in the PGconn structure now stores comma separated list
> of hosts, which is parsed in the connectDBStart. So, expected results of
> some tests in src/interfaces/libpq/test are changed.
>

 trying postgresql://[::1]:12345/db
-dbname='db' host='::1' port='12345' (inet)
+dbname='db' host='[::1]:12345' (inet)

Such a change doesn't look really nice to me.


> 2. Url with colon  but no port number after the host no more considered
> valid.
>

We could live with that, but is there a good reason for backward
compatibility break in this case?

--
Alex


<    1   2