Re: [HACKERS] Fixed redundant i18n strings in json

2014-08-07 Thread Michael Paquier
On Thu, Aug 7, 2014 at 8:20 AM, Jeff Janes jeff.ja...@gmail.com wrote:
 It needs to go into 9_4_STABLE as well.
It is worth noting that the buildfarm is completely red because
this patch was not back-patched to REL9_4_STABLE:
http://buildfarm.postgresql.org/cgi-bin/show_status.pl
Regards,
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgresql.auto.conf and reload

2014-08-07 Thread Fujii Masao
On Thu, Aug 7, 2014 at 12:36 PM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Wed, Aug 6, 2014 at 1:32 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Tue, Jul 29, 2014 at 9:26 PM, Amit Kapila amit.kapil...@gmail.com
 wrote:
 
  The reason is that during startup DataDir is not set by the time server
  processes config file due to which we process .auto.conf file in second
  pass.  I think ideally it should ignore the invalid setting in such a
  case
  as it does during reload, however currently there is no good way to
  process .auto.conf  incase DataDir is not set, so we handle it
  separately
  in second pass.

 What about picking up only data_directory at the first pass?

 I think that will workout and for that I think we might need to add
 few checks during ProcessConfigFile.  Do you want to address it
 during parsing of config file or during setting of values?

I prefer during parsing. The attached patch does that. In this patch,
the first call of ProcessConfigFile() picks up only data_directory. One
concern of this patch is that the logic is a bit ugly. Do you have any
other better idea?

Regards,

-- 
Fujii Masao
*** a/src/backend/utils/misc/guc-file.l
--- b/src/backend/utils/misc/guc-file.l
***
*** 648,653  ParseConfigFp(FILE *fp, const char *config_file, int depth, int elevel,
--- 648,672 
  		else
  		{
  			/*
+ 			 * Pick up only the data_directory if DataDir is not set, which
+ 			 * means that the configuration file is read for the first time and
+ 			 * PG_AUTOCONF_FILENAME file cannot be read yet. In this case,
+ 			 * we should not pick up all the settings except the data_directory
+ 			 * from postgresql.conf yet because they might be overwritten
+ 			 * with the settings in PG_AUTOCONF_FILENAME file which will be
+ 			 * read later. OTOH, since it's ensured that data_directory doesn't
+ 			 * exist in PG_AUTOCONF_FILENAME file, it will never be overwritten
+ 			 * later. Now only the data_directory needs to be picked up to
+ 			 * read PG_AUTOCONF_FILENAME file later.
+ 			 */
+ 			if (DataDir == NULL && strcmp(opt_name, "data_directory") != 0)
+ 			{
+ if (token == 0)
+ 	break;
+ continue;
+ 			}
+ 
+ 			/*
  			 * ordinary variable, append to list.  For multiple items of
  			 * same parameter, retain only which comes later.
  			 */
*** a/src/backend/utils/misc/guc.c
--- b/src/backend/utils/misc/guc.c
***
*** 4342,4347  SelectConfigFiles(const char *userDoption, const char *progname)
--- 4342,4354 
  		return false;
  	}
  
+ 	/*
+ 	 * Read the configuration file for the first time. This time only
+ 	 * data_directory parameter is picked up to determine the data directory
+ 	 * so that we can read PG_AUTOCONF_FILENAME file next time. Then don't
+ 	 * forget to read the configuration file again later to pick up all the
+ 	 * parameters.
+ 	 */
  	ProcessConfigFile(PGC_POSTMASTER);
  
  	/*

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgresql.auto.conf and reload

2014-08-07 Thread Fujii Masao
On Thu, Aug 7, 2014 at 12:28 PM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Wed, Aug 6, 2014 at 11:39 AM, Fujii Masao masao.fu...@gmail.com wrote:

 On Tue, Aug 5, 2014 at 12:49 PM, Fujii Masao masao.fu...@gmail.com
 wrote:
  On Mon, Aug 4, 2014 at 11:52 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Fujii Masao masao.fu...@gmail.com writes:
  The patch chooses the last settings for every parameters and ignores
  the
  former settings. But I don't think that every parameters need to be
  processed
  this way. That is, we can change the patch so that only PGC_POSTMASTER
  parameters are processed that way. The invalid settings in the
  parameters
  except PGC_POSTMASTER can be checked by pg_ctl reload as they are now.
  Also this approach can reduce the number of comparison to choose the
  last setting, i.e., n in O(n^2) is the number of uncommented
  *PGC_POSTMASTER*
  parameters (not every parameters). Thought?
 
  I don't find that to be a particularly good idea.  In the first place,
  it introduces extra complication and a surprising difference in the
  behavior of different GUCs.  In the second place, I thought part of the
  point of this patch was to suppress log messages complaining about
  invalid values that then weren't actually used for anything.  That
  issue
  exists just as much for non-POSTMASTER variables.  (IOW, "value cannot
  be changed now" is not the only log message we're trying to suppress.)
 
  Yep, sounds reasonable. This makes me think that we can suppress
  such invalid message of the parameters which are actually not used
  at not only conf file reload but also *postmaster startup*. That's
  another
  story, though. Anyway, barring any objection, I will commit Amit's
  patch.

 Applied the slightly-modified version!

 Thanks.  There is a commitfest entry [1] for this patch, do you
 want something more to be addressed or shall we mark that as
 committed?

 [1]
 https://commitfest.postgresql.org/action/patch_view?id=1500

Yeah, let's mark this as committed because your patch has been committed and
the originally-reported problem has been fixed. We are now discussing another
patch, but I will add that as a new CF entry.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] posix_fadvise() and pg_receivexlog

2014-08-07 Thread Mitsumasa KONDO
Hi,

2014-08-07 13:47 GMT+09:00 Fujii Masao masao.fu...@gmail.com:

 On Thu, Aug 7, 2014 at 3:59 AM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:
  On 08/06/2014 08:39 PM, Fujii Masao wrote:
  The WAL files that pg_receivexlog writes will not be re-read soon
  basically,
  so we can advise the OS to release any cached pages when WAL file is
  closed. I feel inclined to change pg_receivexlog that way. Thought?
 
 
  -1. The OS should be smart enough to not thrash the cache by files that
 are
  written sequentially and never read.

The OS's buffer strategy is optimized for the general case. Have you forgotten
the discussion with the OS hackers over the last half year?


 Yep, the OS should be so smart, but I'm not sure if it actually is. Maybe
 not,
 so I was thinking that posix_fadvise is called when the server closes WAL
 file.

That's right.


  If we go down this path, we'd need to
  sprinkle posix_fadvises into many, many places.

Why aim to be perfect from the beginning?
It is the same as the history of Postgres; your concern doesn't make sense.


  Anyway, who are we to say that they won't be re-read soon? You might e.g
  have a secondary backup site where you copy the files received by
  pg_receivexlog, as soon as they're completed.

 So whether posix_fadvise is called or not needs to be exposed as an
 user-configurable option. We would need to measure how useful exposing
 that is, though.

By the way, does the pg_receivexlog process call fsync() on every WAL commit?
If yes, I think we need a no-fsync or less-frequent-fsync option for better
performance; that is common in NoSQL storage.
If no, we need an fsync() option to get more reliability and data
integrity.


Best regards,
--
Mitsumasa KONDO


Re: [HACKERS] A worst case for qsort

2014-08-07 Thread Fabien COELHO



IMHO, while worst case performance is a very useful tool for analyzing
algorithms (particularly their worst case time complexity), a worst
case should be put in its practical context. For example, if we had
reason to be concerned about *adversarial* inputs, I think that there
is a good chance that our qsort() actually would be problematic to the
point of driving us to prefer some generally slower alternative.


ISTM that you raise two distinct questions wrt PostgreSQL, namely whether 
the worst case performance is really an issue:

 (1) in general
 (2) wrt adversarial inputs

The answer could be (1) mostly no and (2) maybe yes.

It suggests that where qsort is used, the administrator wary of (2) could 
be allowed to use an alternate implementation, maybe some merge sort, say 
by tweaking a configuration option in postgresql.conf.


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A worst case for qsort

2014-08-07 Thread Fabien COELHO


Hello John,


[...]
In fact, the mentioned paper says this about the subject: "Moreover, if 
worst-case performance is important, Quicksort is the wrong algorithm."


I fully agree with this conclusion.

--
Fabien


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] posix_fadvise() and pg_receivexlog

2014-08-07 Thread Heikki Linnakangas

On 08/07/2014 10:10 AM, Mitsumasa KONDO wrote:

2014-08-07 13:47 GMT+09:00 Fujii Masao masao.fu...@gmail.com:


On Thu, Aug 7, 2014 at 3:59 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

On 08/06/2014 08:39 PM, Fujii Masao wrote:

The WAL files that pg_receivexlog writes will not be re-read soon
basically,
so we can advise the OS to release any cached pages when WAL file is
closed. I feel inclined to change pg_receivexlog that way. Thought?



-1. The OS should be smart enough to not thrash the cache by files that

are

written sequentially and never read.



OS's buffer strategy is optimized for general situation. Do you forget OS
hackers discussion last a half of year?


Yep, the OS should be so smart, but I'm not sure if it actually is. Maybe
not,
so I was thinking that posix_fadvise is called when the server closes WAL
file.


That's right.


Well, I'd like to hear someone from the field complaining that 
pg_receivexlog is thrashing the cache and thus reducing the performance 
of some other process. Or at least a synthetic test case that 
demonstrates that happening.



By the way, does pg_receivexlog process have fsync() in every WAL commit?


It fsyncs each file after finishing writing it, i.e. each WAL file is 
fsync'd once.



If yes, I think that we need no or less fsync() option for the better
performance. It is general in NOSQL storages.
If no, we need fsync() option for more getting reliability and data
integrarity.


Hmm. An fsync=off style option might make sense, although I doubt the 
one fsync at end of file is causing a performance problem for anyone in 
practice. Haven't heard any complaints, anyway.


An option to fsync after every commit record might make sense if you use 
pg_receivexlog with synchronous replication. Doing that would require 
parsing the WAL, though, to see where the commit records are. But then 
again, the fsync's wouldn't need to correspond to commit records. We 
could fsync just before we go to sleep to wait for more WAL to be received.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-08-07 Thread Peter Geoghegan
On Wed, Aug 6, 2014 at 10:36 PM, Peter Geoghegan p...@heroku.com wrote:
 This *almost* applies to patched Postgres if you pick a benchmark that
 is very sympathetic to my patch. To my surprise, work_mem = '10MB'
 (which results in an external tape sort) is sometimes snapping at the
 heels of a work_mem = '5GB' setting (which results in an in-memory
 quicksort).

Note that this was with a default temp_tablespaces setting that wrote
temp files on my home partition/SSD. With a /dev/shm/ temp tablespace,
tape sort edges ahead, and has a couple of hundred milliseconds on
quicksort for this test case. It's actually faster.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_receivexlog add synchronous mode

2014-08-07 Thread furuyao
 Okay, applied the patch.
 
 I heavily modified your patch based on the master that the refactoring
 patch has been applied. Attached is that modified version. Could you
 review that?

Thank you for the patch.
I did a review of the patch. 

No problem in the patch. 

The behavior after ProcessXLogDataMsg returns true was changed by the patch.
Previously control moved to the while(1) loop; now the while(r != 0) loop
is continued instead.
Since still_sending is false, the skip processing is performed, but the result
of the operation does not change.

Regards,

--
Furuya Osamu

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enhancing pgbench parameter checking

2014-08-07 Thread Tatsuo Ishii
Fabien,

 I have not tested, but the patch looks ok in principle.

Thanks for the review. I have registered it to Aug Commit fest.
https://commitfest.postgresql.org/action/patch_view?id=1532

 I'm not sure of the variable name is_non_init_parameter_set. I would 
 suggest benchmarking_option_set?

Ok, I will replace the variable name as you suggested.

 Also, to be consistent, maybe it should check that no initialization-specific 
 option are set when benchmarking.

Good suggestion. Here is the v2 patch.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index c0e5e24..67d7960 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -2520,6 +2520,9 @@ main(int argc, char **argv)
 	char	   *filename = NULL;
 	bool		scale_given = false;
 
+	bool		benchmarking_option_set = false;
+	bool		initializing_option_set = false;
+
 	CState	   *state;			/* status of clients */
 	TState	   *threads;		/* array of thread */
 
@@ -2599,12 +2602,14 @@ main(int argc, char **argv)
 break;
 			case 'S':
 ttype = 1;
+benchmarking_option_set = true;
 break;
 			case 'N':
 ttype = 2;
+benchmarking_option_set = true;
 break;
 			case 'c':
-nclients = atoi(optarg);
+benchmarking_option_set = true;
 if (nclients <= 0 || nclients > MAXCLIENTS)
 {
 	fprintf(stderr, "invalid number of clients: %d\n", nclients);
@@ -2629,6 +2634,7 @@ main(int argc, char **argv)
 #endif   /* HAVE_GETRLIMIT */
 break;
 			case 'j':			/* jobs */
+benchmarking_option_set = true;
 nthreads = atoi(optarg);
 if (nthreads <= 0)
 {
@@ -2637,9 +2643,11 @@ main(int argc, char **argv)
 }
 break;
 			case 'C':
+benchmarking_option_set = true;
 is_connect = true;
 break;
 			case 'r':
+benchmarking_option_set = true;
 is_latencies = true;
 break;
 			case 's':
@@ -2652,6 +2660,7 @@ main(int argc, char **argv)
 }
 break;
 			case 't':
+benchmarking_option_set = true;
 if (duration > 0)
 {
 	fprintf(stderr, "specify either a number of transactions (-t) or a duration (-T), not both.\n");
@@ -2665,6 +2674,7 @@ main(int argc, char **argv)
 }
 break;
 			case 'T':
+benchmarking_option_set = true;
 if (nxacts > 0)
 {
 	fprintf(stderr, "specify either a number of transactions (-t) or a duration (-T), not both.\n");
@@ -2681,12 +2691,14 @@ main(int argc, char **argv)
 login = pg_strdup(optarg);
 break;
 			case 'l':
+benchmarking_option_set = true;
 use_log = true;
 break;
 			case 'q':
 use_quiet = true;
 break;
 			case 'f':
+benchmarking_option_set = true;
 ttype = 3;
 filename = pg_strdup(optarg);
 if (process_file(filename) == false || *sql_files[num_files - 1] == NULL)
@@ -2696,6 +2708,8 @@ main(int argc, char **argv)
 {
 	char	   *p;
 
+	benchmarking_option_set = true;
+
 	if ((p = strchr(optarg, '=')) == NULL || p == optarg || *(p + 1) == '\0')
 	{
 		fprintf(stderr, "invalid variable definition: %s\n", optarg);
@@ -2708,6 +2722,7 @@ main(int argc, char **argv)
 }
 break;
 			case 'F':
+initializing_option_set = true;
 fillfactor = atoi(optarg);
 if ((fillfactor < 10) || (fillfactor > 100))
 {
@@ -2716,6 +2731,7 @@ main(int argc, char **argv)
 }
 break;
 			case 'M':
+benchmarking_option_set = true;
 if (num_files > 0)
 {
 	fprintf(stderr, "query mode (-M) should be specifiled before transaction scripts (-f)\n");
@@ -2731,6 +2747,7 @@ main(int argc, char **argv)
 }
 break;
 			case 'P':
+benchmarking_option_set = true;
 progress = atoi(optarg);
 if (progress <= 0)
 {
@@ -2745,6 +2762,8 @@ main(int argc, char **argv)
 	/* get a double from the beginning of option value */
 	double		throttle_value = atof(optarg);
 
+	benchmarking_option_set = true;
+
 	if (throttle_value <= 0.0)
 	{
 		fprintf(stderr, "invalid rate limit: %s\n", optarg);
@@ -2756,14 +2775,19 @@ main(int argc, char **argv)
 break;
 			case 0:
 /* This covers long options which take no argument. */
+if (foreign_keys || unlogged_tables)
+	initializing_option_set = true;
 break;
 			case 2:/* tablespace */
+initializing_option_set = true;
 tablespace = pg_strdup(optarg);
 break;
 			case 3:/* index-tablespace */
+initializing_option_set = true;
 index_tablespace = pg_strdup(optarg);
 break;
 			case 4:
+benchmarking_option_set = true;
 sample_rate = atof(optarg);
 if (sample_rate <= 0.0 || sample_rate > 1.0)
 {
@@ -2776,6 +2800,7 @@ main(int argc, char **argv)
 fprintf(stderr, "--aggregate-interval is not currently supported on Windows");
 exit(1);
 #else
+benchmarking_option_set = true;
 agg_interval = atoi(optarg);
 if (agg_interval <= 0)
 {
@@ -2808,9 +2833,23 @@ main(int 

Re: [HACKERS] Proposal: Incremental Backup

2014-08-07 Thread Simon Riggs
On 6 August 2014 17:27, Bruce Momjian br...@momjian.us wrote:
 On Wed, Aug  6, 2014 at 01:15:32PM -0300, Claudio Freire wrote:
 On Wed, Aug 6, 2014 at 12:20 PM, Bruce Momjian br...@momjian.us wrote:
 
  Well, for file-level backups we have:
 
  1) use file modtime (possibly inaccurate)
  2) use file modtime and checksums (heavy read load)
 
  For block-level backups we have:
 
  3) accumulate block numbers as WAL is written
  4) read previous WAL at incremental backup time
  5) read data page LSNs (high read load)
 
  The question is which of these do we want to implement?  #1 is very easy
  to implement, but incremental _file_ backups are larger than block-level
  backups.  If we have #5, would we ever want #2?  If we have #3, would we
  ever want #4 or #5?

 You may want to implement both #3 and #2. #3 would need a config
 switch to enable updating the bitmap. That would make it optional to
 incur the I/O cost of updating the bitmap. When the bitmap isn't
 there, the backup would use #2. Slow, but effective. If slowness is a
 problem for you, you enable the bitmap and do #3.

 Sounds reasonable IMO, and it means you can start by implementing #2.

 Well, Robert Haas had the idea of a separate process that accumulates
 the changed WAL block numbers, making it low overhead.  I question
 whether we need #2 just to handle cases where they didn't enable #3
 accounting earlier.  If that is the case, just do a full backup and
 enable #3.

Well, there is a huge difference between file-level and block-level backup.

Designing, writing and verifying block-level backup to the point that
it is acceptable is a huge effort. (Plus, I don't think accumulating
block numbers as they are written will be low overhead. Perhaps
there was a misunderstanding there and what is being suggested is to
accumulate file names that change as they are written, since we
already do that in the checkpointer process, which would be an option
between 2 and 3 on the above list).

What is being proposed here is file-level incremental backup that
works in a general way for various backup management tools. It's the
80/20 first step on the road. We get most of the benefit, it can be
delivered in this release as robust, verifiable code. Plus, that is
all we have budget for, a fairly critical consideration.

Big features need to be designed incrementally across multiple
releases, delivering incremental benefit (or at least that is what I
have learned). Yes, working block-level backup would be wonderful, but
if we hold out for that as the first step then we'll get nothing
anytime soon.

I would also point out that the more specific we make our backup
solution the less likely it is to integrate with external backup
providers. Oracle's RMAN requires specific support in external
software. 10 years after Postgres PITR we still see many vendors
showing "PostgreSQL Backup Supported" as meaning pg_dump only.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostrgeSQL vs oracle doing 1 million sqrts am I doing it wrong?

2014-08-07 Thread Craig Ringer
On 08/05/2014 10:44 PM, Shaun Thomas wrote:
 On 08/05/2014 12:56 AM, Mark Kirkwood wrote:
 
 The moral of the story for this case is that mapping Oracle to Postgres
 datatypes can require some careful thought. Using 'native' types (like
 integer, float8 etc) will generally give vastly quicker performance.
 
 We've seen a lot of this ourselves. Oracle's NUMERIC is a native type,
 whereas ours is emulated.

I'm not sure what you mean by native vs emulated here.

PostgreSQL's NUMERIC is binary-coded decimal with mathematical
operations performed in software.

According to the docs, my impression is that Oracle's NUMBER is stored
more like a decfloat:

http://docs.oracle.com/cd/B28359_01/server.111/b28318/datatype.htm#i22289

but my Oracle expertise is admittedly lacking.


New Intel hardware supports IEEE 754:2008 decimal floating point in
hardware, and I'm quite interested in implementing DECFLOAT(n) for
PostgreSQL to take advantage of that.

A DECFLOAT type would also be more compatible with things like the C#
Decimal type than our current NUMERIC is.

 At least you used INT though. I've seen too many Oracle shops using
 NUMERIC in PostgreSQL because it's there, and suffering major
 performance hits because of it.

In retrospect it might be a bit of a loss that the numeric type format
doesn't reserve a couple of bits for short-value flags, so we could
store and work with native integers for common values.

There's NumericShort and NumericLong, but no NumericNative or
NumericInt32 or whatever. OTOH, by the time you handle alignment and
padding requirements and the cost of deciding which numeric format the
input is, it might not've been much faster. Presumably it was looked at
during the introduction of NumericShort.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Wraparound limits

2014-08-07 Thread Teodor Sigaev

Hi!

I have a questions about setting transaction's wraparound limits. Function 
SetTransactionIdLimit() in access/transam/varsup.c:


1)
xidWrapLimit = oldest_datfrozenxid + (MaxTransactionId >> 1);
if (xidWrapLimit < FirstNormalTransactionId)
xidWrapLimit += FirstNormalTransactionId;

Isn't it a problem if oldest_datfrozenxid > MaxTransactionId/2?

2)
xidStopLimit = xidWrapLimit - 100;
if (xidStopLimit < FirstNormalTransactionId)
xidStopLimit -= FirstNormalTransactionId;

xidWarnLimit = xidStopLimit - 1000;
if (xidWarnLimit < FirstNormalTransactionId)
xidWarnLimit -= FirstNormalTransactionId;

Why does it use '-' instead of '+' if variable < FirstNormalTransactionId? In 
this case it is easy to get xidStopLimit > xidWrapLimit or xidWarnLimit > 
xidStopLimit...



Thank you.

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [GSoC] kmedoids status report

2014-08-07 Thread Maxence Ahlouche
Hi!

Here is a report of what has been discussed yesterday on IRC.

The kmedoids module now seems to work correctly on basic datasets. I've
also implemented its variants with different seeding methods: random
initial medoids, and initial medoids distributed among the points (similar
to kmeans++ [0]).

The next steps are:

   - Making better tests (1-2d)
   - Writing the documentation (1d)
   - Adapting my code to GP and HAWQ -- btw, are default parameters now
   available in GP and HAWQ? (1-2d)
   - Refactoring kmedoids and kmeans, as there is code duplication between
   those two.
   For this step, I don't know if I'll have time to create a clustering
   module, and make kmeans and kmedoids submodules of it. If yes, then it's
   perfect; otherwise, I'll just rename the common functions in kmeans, and
   have kmedoids call them from there.

Hai also helped me set up (once more) the VM where GreenPlum and HAWQ are
installed, so that I can test my code on these DBMS.

As a reminder, I'm supposed to stop coding next Monday, and then the last
week is dedicated to documentation, tests, refactoring and polishing.

Regards,

Maxence

[0] https://en.wikipedia.org/wiki/K-means%2B%2B


Re: [HACKERS] Reporting the commit LSN at commit time

2014-08-07 Thread Ants Aasma
On Thu, Aug 7, 2014 at 4:15 AM, Craig Ringer cr...@2ndquadrant.com wrote:
 I'm thinking about adding a new message type in the protocol that gets
 sent immediately before CommandComplete, containing the LSN of the
 commit. Clients would need to enable the sending of this message with a
 GUC that they set when they connect, so it doesn't confuse clients that
 aren't expecting it or aware of it.

 Is this something you can see being useful for other non-BDR purposes?

I have been thinking about something similar.

For regular streaming replication you could keep track of the commit
LSN on a per-client basis and automatically redirect read queries to a
standby if the standby apply location is larger than the commit LSN of
this particular client. This can be done mostly transparently to the
application without running into the issue that recent modifications
disappear for a while after commit.
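
For illustration, here is a hypothetical client-side sketch of that check
(the helper name is made up; it assumes the client saved the commit LSN the
primary reported, and it uses the 9.4-era functions
pg_last_xlog_replay_location() and pg_xlog_location_diff()):

#include <string.h>
#include <stdbool.h>
#include <libpq-fe.h>

/* Return true if the standby has replayed at least up to commit_lsn. */
static bool
standby_caught_up(PGconn *standby, const char *commit_lsn)
{
    const char *params[1] = { commit_lsn };
    PGresult   *res;
    bool        caught_up;

    res = PQexecParams(standby,
                       "SELECT pg_xlog_location_diff("
                       "       pg_last_xlog_replay_location(), $1) >= 0",
                       1, NULL, params, NULL, NULL, 0);
    caught_up = (PQresultStatus(res) == PGRES_TUPLES_OK &&
                 PQntuples(res) == 1 &&
                 strcmp(PQgetvalue(res, 0, 0), "t") == 0);
    PQclear(res);
    return caught_up;       /* if false, route the query back to the primary */
}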

Regards,
Ants Aasma
-- 
Cybertec Schönig  Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] posix_fadvise() and pg_receivexlog

2014-08-07 Thread Fujii Masao
On Thu, Aug 7, 2014 at 5:02 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 On 08/07/2014 10:10 AM, Mitsumasa KONDO wrote:

 2014-08-07 13:47 GMT+09:00 Fujii Masao masao.fu...@gmail.com:

 On Thu, Aug 7, 2014 at 3:59 AM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:

 On 08/06/2014 08:39 PM, Fujii Masao wrote:

 The WAL files that pg_receivexlog writes will not be re-read soon
 basically,
 so we can advise the OS to release any cached pages when WAL file is
 closed. I feel inclined to change pg_receivexlog that way. Thought?



 -1. The OS should be smart enough to not thrash the cache by files that

 are

 written sequentially and never read.


 OS's buffer strategy is optimized for general situation. Do you forget OS
 hackers discussion last a half of year?

 Yep, the OS should be so smart, but I'm not sure if it actually is. Maybe
 not,
 so I was thinking that posix_fadvise is called when the server closes WAL
 file.


 That's right.


 Well, I'd like to hear someone from the field complaining that
 pg_receivexlog is thrashing the cache and thus reducing the performance of
 some other process. Or a least a synthetic test case that demonstrates that
 happening.

Yeah, I will test that by measuring the performance of a PostgreSQL server
running on the same machine as pg_receivexlog. We can just
compare that performance with normal pg_receivexlog and with
the patched one (i.e., with posix_fadvise called).
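
For reference, a minimal sketch of the kind of change being compared here,
not the actual pg_receivexlog code (the helper name is made up): after the
segment is durably written, hint to the kernel that its pages will not be
re-read soon.

#define _XOPEN_SOURCE 600       /* for posix_fadvise */
#include <fcntl.h>
#include <unistd.h>

/* Close a finished WAL segment, advising the kernel to drop its cached pages. */
static int
close_walfile_with_fadvise(int fd)
{
    if (fsync(fd) != 0)         /* keep the existing durability step */
        return -1;
#ifdef POSIX_FADV_DONTNEED
    /* Purely a hint; the result is ignored because failure is harmless. */
    (void) posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
#endif
    return close(fd);
}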



 By the way, does pg_receivexlog process have fsync() in every WAL commit?


 It fsync's each file after finishing to write it. Ie. each WAL file is
 fsync'd once.


 If yes, I think that we need no or less fsync() option for the better
 performance. It is general in NOSQL storages.
 If no, we need fsync() option for more getting reliability and data
 integrarity.


 Hmm. An fsync=off style option might make sense, although I doubt the one
 fsync at end of file is causing a performance problem for anyone in
 practice. Haven't heard any complaints, anyway.

 An option to fsync after every commit record might make sense if you use
 pg_receivexlog with synchronous replication. Doing that would require
 parsing the WAL, though, to see where the commit records are. But then
 again, the fsync's wouldn't need to correspond to commit records. We could
 fsync just before we go to sleep to wait for more WAL to be received.

That's what Furuya-san proposed in last CommitFest.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Incremental Backup

2014-08-07 Thread Fujii Masao
On Thu, Aug 7, 2014 at 12:20 AM, Bruce Momjian br...@momjian.us wrote:
 On Wed, Aug  6, 2014 at 06:48:55AM +0100, Simon Riggs wrote:
 On 6 August 2014 03:16, Bruce Momjian br...@momjian.us wrote:
  On Wed, Aug  6, 2014 at 09:17:35AM +0900, Michael Paquier wrote:
  On Wed, Aug 6, 2014 at 9:04 AM, Simon Riggs si...@2ndquadrant.com wrote:
  
   On 5 August 2014 22:38, Claudio Freire klaussfre...@gmail.com wrote:
   Thinking some more, there seems like this whole store-multiple-LSNs
   thing is too much. We can still do block-level incrementals just by
   using a single LSN as the reference point. We'd still need a complex
   file format and a complex file reconstruction program, so I think that
   is still next release. We can call that INCREMENTAL BLOCK LEVEL.
 
  Yes, that's the approach taken by pg_rman for its block-level
  incremental backup. Btw, I don't think that the CPU cost to scan all
  the relation files added to the one to rebuild the backups is worth
  doing it on large instances. File-level backup would cover most of the
 
  Well, if you scan the WAL files from the previous backup, that will tell
  you what pages that need incremental backup.

 That would require you to store that WAL, which is something we hope
 to avoid. Plus if you did store it, you'd need to retrieve it from
 long term storage, which is what we hope to avoid.

 Well, for file-level backups we have:

 1) use file modtime (possibly inaccurate)
 2) use file modtime and checksums (heavy read load)

 For block-level backups we have:

 3) accumulate block numbers as WAL is written
 4) read previous WAL at incremental backup time
 5) read data page LSNs (high read load)

 The question is which of these do we want to implement?

There is some data which doesn't have an LSN, for example postgresql.conf.
When such data has been modified since the last backup, does it also need to
be included in the incremental backup? Probably yes. So implementing only
block-level backup is not a complete solution; it needs file-level backup as
an infrastructure for such data. This makes me think that it's more reasonable
to implement file-level backup first.
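
As a simple illustration of the file-level approach (option #1 in the list
above), the per-file test can be little more than a stat() call. This sketch
assumes a hypothetical walker that visits every file under the data
directory; mtime granularity and clock skew are the "possibly inaccurate"
caveats already noted.

#include <stdbool.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <time.h>

/* Include the file if it may have changed since the previous backup started. */
static bool
file_needs_backup(const char *path, time_t prev_backup_start)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return true;            /* unreadable or vanished: copy it to be safe */
    return st.st_mtime >= prev_backup_start;
}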

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Incremental Backup

2014-08-07 Thread Michael Paquier
On Thu, Aug 7, 2014 at 8:11 PM, Fujii Masao masao.fu...@gmail.com wrote:
 There are some data which don't have LSN, for example, postgresql.conf.
 When such data has been modified since last backup, they also need to
 be included in incremental backup? Probably yes.
Definitely yes. That's also the case for paths like pg_clog,
pg_subtrans and pg_twophase.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] delta relations in AFTER triggers

2014-08-07 Thread Amit Khandekar
On 21 June 2014 23:36, Kevin Grittner kgri...@ymail.com wrote:
 Kevin Grittner kgri...@ymail.com wrote:
 I didn't change the tuplestores to TID because it seemed to me that
 it would preclude using transition relations with FDW triggers, and
 it seemed bad not to support that.  Does anyone see a way around
 that, or feel that it's OK to not support FDW triggers in this
 regard?

I think it is ok to use tuplestores for now, but as mentioned by you
somewhere else in the thread, later on we should change this to using
tids as an optimization.


 Does this look good otherwise, as far as it goes?

I didn't yet extensively go through the patch, but before that, just a
few quick comments:

I see that the tuplestores for transition tables are stored per query depth.
If the DML involves a table that has multiple child tables, it seems as though
a single tuplestore would be shared among all these tables. That means that if
we define such triggers using the transition table clause for all the child
tables, then the trigger function for a child table will see tuples from other
child tables as well. Is that true? If it is, it does not make sense.
For the FDW tuplestore, this issue does not arise because the DML won't involve
multiple target tables, I suppose.

---

I tried to google some SQLs that use REFERENCING clause with triggers.
It looks like in some database systems, even the WHEN clause of CREATE TRIGGER
can refer to a transition table, just like how it refers to NEW and
OLD row variables.

For e.g. :
CREATE TRIGGER notify_dept
  AFTER UPDATE ON weather
  REFERENCING NEW_TABLE AS N_TABLE
  NEW AS N_ROW
  FOR EACH ROW
  WHEN ((SELECT AVG (temperature) FROM N_TABLE) > 10)
  BEGIN
notify_department(N_ROW.temperature, N_ROW.city);
  END

Above, it is used to get an aggregate value of all the changed rows. I think
we do not currently support aggregate expressions in the where clause, but with
transition tables, it makes more sense to support it later if not now.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_shmem_allocations view

2014-08-07 Thread Michael Paquier
On Thu, May 8, 2014 at 10:28 PM, Andres Freund and...@2ndquadrant.com wrote:
 Well, we have to live with it for now :)
I just had a look at the first patch and got some comments:
1) Instead of using an assertion here, wouldn't it be better to error
out if name is NULL, and truncate the name if it is longer than
SHMEM_INDEX_KEYSIZE - 1 (including '\0')?
scanstr in scansup.c?
Assert(IsUnderPostmaster);
+   Assert(name != NULL && strlen(name) > 0 &&
+  strlen(name) <= SHMEM_INDEX_KEYSIZE - 1);
2) The addition of a field to track the size of a dsm should be
explicitly mentioned, this is useful for the 2nd patch.
3) The refactoring done in dsm_create to find an unused slot should be
done as a separate patch for clarity.
4) Using '\0' here would be more appropriate:
+   item->name[SHMEM_INDEX_KEYSIZE - 1] = 0;

Regards,
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Wraparound limits

2014-08-07 Thread Heikki Linnakangas

On 08/07/2014 01:34 PM, Teodor Sigaev wrote:

Hi!

I have a questions about setting transaction's wraparound limits. Function
SetTransactionIdLimit() in access/transam/varsup.c:

1)
  xidWrapLimit = oldest_datfrozenxid + (MaxTransactionId >> 1);
  if (xidWrapLimit < FirstNormalTransactionId)
  xidWrapLimit += FirstNormalTransactionId;

Isn't it a problem if oldest_datfrozenxid > MaxTransactionId/2?


Don't think so. What problem do you see?


2)
  xidStopLimit = xidWrapLimit - 100;
  if (xidStopLimit < FirstNormalTransactionId)
  xidStopLimit -= FirstNormalTransactionId;

  xidWarnLimit = xidStopLimit - 1000;
  if (xidWarnLimit < FirstNormalTransactionId)
  xidWarnLimit -= FirstNormalTransactionId;

Why does it use '-' instead of '+' if variable < FirstNormalTransactionId? In
this case it is easy to get xidStopLimit > xidWrapLimit or xidWarnLimit >
xidStopLimit...


Remember that the limits are compared with XIDs using the wrap-around aware 
functions TransactionIdPrecedes and TransactionIdFollows, not regular < 
and >. The < checks above are just to check if the XID hit one of the 
special TransactionIds, and if so, increase/decrease it to get back to 
the normal range.
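
For reference, here is a self-contained sketch of that circular comparison,
paraphrased from TransactionIdPrecedes() in access/transam/transam.c: special
XIDs below FirstNormalTransactionId never wrap and compare with plain <, while
normal XIDs compare by signed 32-bit difference.

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define FirstNormalTransactionId   ((TransactionId) 3)
#define TransactionIdIsNormal(xid) ((xid) >= FirstNormalTransactionId)

/* Is id1 logically earlier than id2, modulo the 2^32 XID space? */
static bool
xid_precedes(TransactionId id1, TransactionId id2)
{
    if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
        return id1 < id2;               /* special XIDs never wrap */

    return (int32_t) (id1 - id2) < 0;   /* wrap-around aware comparison */
}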


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] HEAD crashes with assertion and LWLOCK_STATS enabled

2014-08-07 Thread Fujii Masao
On Fri, May 23, 2014 at 9:38 PM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, May 20, 2014 at 4:02 AM, Yuto HAYAMIZU y.hayam...@gmail.com wrote:
 The failing assertion is for prohibiting memory allocation in a critical 
 section, which is introduced by commit 4a170ee9 on 2014-04-04.

This issue is still in open item list for 9.4. But the commit which had
caused this issue was reverted by d27d493. So I removed this from the
Open Item list.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partitioning performance: cache stringToNode() of pg_constraint.ccbin

2014-08-07 Thread Robert Haas
On Wed, Aug 6, 2014 at 9:35 PM, Bruce Momjian br...@momjian.us wrote:
 On Sun, Jan 12, 2014 at 12:53:40PM -0500, Noah Misch wrote:
 On Sat, Jan 11, 2014 at 02:10:01PM -0500, Bruce Momjian wrote:
  On Mon, Jun  3, 2013 at 03:07:27PM -0400, Noah Misch wrote:
   A colleague, Korry Douglas, observed a table partitioning scenario where
   deserializing pg_constraint.ccbin is a hot spot.  The following test 
   case, a
   simplification of a typical partitioning setup, spends 28% of its time in
   stringToNode() and callees thereof:
 
  Noah, what is the status on this?

 Further study revealed a defect in the patch's memory management, and I have
 not gotten around to correcting that.

 I talked to Noah and he can't continue on this item.  Can someone else
 work on it?

Well, it would be helpful if he could describe the defect he found, so
that the next person doesn't have to guess.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fixed redundant i18n strings in json

2014-08-07 Thread Robert Haas
On Wed, Aug 6, 2014 at 7:20 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Wed, Aug 6, 2014 at 8:35 AM, Robert Haas robertmh...@gmail.com wrote:

 On Tue, Aug 5, 2014 at 6:05 PM, Jeff Janes jeff.ja...@gmail.com wrote:
  I think you missed one of the regression tests, see attached

 Woops.  Thanks, committed.

 Thanks.

 It needs to go into 9_4_STABLE as well.

Sheesh, where is my brain?  Done, thanks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minmax indexes

2014-08-07 Thread Robert Haas
On Wed, Aug 6, 2014 at 4:06 PM, Nicolas Barbier
nicolas.barb...@gmail.com wrote:
 2014-08-06 Claudio Freire klaussfre...@gmail.com:

 So, I like blockfilter a lot. I change my vote to blockfilter ;)

 +1 for blockfilter, because it stresses the fact that the physical
 arrangement of rows in blocks matters for this index.

I don't like that quite as well as summary, but I'd prefer either to
the current naming.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Append to a GUC parameter ?

2014-08-07 Thread Alvaro Herrera
Fabrízio de Royes Mello wrote:
 On Tue, Aug 5, 2014 at 10:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 
  Josh Berkus j...@agliodbs.com writes:
   BTW, while there's unlikely to be a good reason to put search_path in
   pg.conf with appends, there are a LOT of reasons to want to be able to
   append to it during a session.
 
  [shrug...]  You can do that today with current_setting()/set_config().
 
 With a very simple statement you can do that:

Of course, this doesn't solve the requirement that started this thread,
which is about having includeable pg.conf fragments to enable
extensions.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minmax indexes

2014-08-07 Thread Simon Riggs
On 7 August 2014 14:53, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Aug 6, 2014 at 4:06 PM, Nicolas Barbier
 nicolas.barb...@gmail.com wrote:
 2014-08-06 Claudio Freire klaussfre...@gmail.com:

 So, I like blockfilter a lot. I change my vote to blockfilter ;)

 +1 for blockfilter, because it stresses the fact that the physical
 arrangement of rows in blocks matters for this index.

 I don't like that quite as well as summary, but I'd prefer either to
 the current naming.

Yes, summary index isn't good. I'm not sure where the block or the
filter part comes in though, so -1 to block filter, not least
because it doesn't have a good abbreviation (bfin??).

A better description would be block range index since we are
indexing a range of blocks (not just one block). Perhaps a better one
would be simply range index, which we could abbreviate to RIN or
BRIN.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minmax indexes

2014-08-07 Thread Claudio Freire
On Thu, Aug 7, 2014 at 11:16 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 7 August 2014 14:53, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Aug 6, 2014 at 4:06 PM, Nicolas Barbier
 nicolas.barb...@gmail.com wrote:
 2014-08-06 Claudio Freire klaussfre...@gmail.com:

 So, I like blockfilter a lot. I change my vote to blockfilter ;)

 +1 for blockfilter, because it stresses the fact that the physical
 arrangement of rows in blocks matters for this index.

 I don't like that quite as well as summary, but I'd prefer either to
 the current naming.

 Yes, summary index isn't good. I'm not sure where the block or the
 filter part comes in though, so -1 to block filter, not least
 because it doesn't have a good abbreviation (bfin??).

Block filter would refer to the index property that selects blocks,
not tuples, and it does so through a filter function (for min-max,
it's a range check, but for other opclasses it could be anything).


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] delta relations in AFTER triggers

2014-08-07 Thread Kevin Grittner
Thanks for your review and comments, Amit!

Amit Khandekar amit.khande...@enterprisedb.com wrote:
 On 21 June 2014 23:36, Kevin Grittner kgri...@ymail.com wrote:
 Kevin Grittner kgri...@ymail.com wrote:
 I didn't change the tuplestores to TID because it seemed to me that
 it would preclude using transition relations with FDW triggers, and
 it seemed bad not to support that.  Does anyone see a way around
 that, or feel that it's OK to not support FDW triggers in this
 regard?

 I think it is ok to use tuplestores for now, but as mentioned by you
 somewhere else in the thread, later on we should change this to using
 tids as an optimization.

Well, the optimization would probably be to use a tuplestore of 
tids referencing modified tuples in the base table, rather than a 
tuplestore of the data itself.  But I think we're in agreement.

 Does this look good otherwise, as far as it goes?

 I didn't yet extensively go through the patch, but before that, just a
 few quick comments:

 I see that the tupelstores for transition tables are stored per query
 depth. If the
 DML involves a table that has multiple child tables, it seems as though
 a single tuplestore would be shared among all these tables. That means
 if we define
 such triggers using transition table clause for all the child tables, then
 the trigger function for a child table will see tuples from other child tables
 as well. Is that true ?

I don't think so.  I will make a note of the concern to confirm by testing.

 If it is, it does not make sense.

I agree.

 I tried to google some SQLs that use REFERENCING clause with triggers.
 It looks like in some database systems, even the WHEN clause of CREATE TRIGGER
 can refer to a transition table, just like how it refers to NEW and
 OLD row variables.

 For e.g. :
 CREATE TRIGGER notify_dept
   AFTER UPDATE ON weather
   REFERENCING NEW_TABLE AS N_TABLE
   NEW AS N_ROW
   FOR EACH ROW
    WHEN ((SELECT AVG (temperature) FROM N_TABLE) > 10)
   BEGIN
 notify_department(N_ROW.temperature, N_ROW.city);
   END

 Above, it is used to get an aggregate value of all the changed rows. I think
 we do not currently support aggregate expressions in the where clause, but 
 with
 transition tables, it makes more sense to support it later if not now.

Interesting point; I had not thought about that.  Will see if I can 
include support for that in the patch for the next CF; failing 
that, I will at least be careful to not paint myself into a corner 
where it is unduly hard to do later.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] replication commands and log_statements

2014-08-07 Thread Fujii Masao
On Wed, Jul 2, 2014 at 4:26 AM, Abhijit Menon-Sen a...@2ndquadrant.com wrote:
 Hi.

 Do we have any consensus about what to do with these two patches?

 1. Introduce a log_replication_command setting.
 2. Change log_statement to be a list of tokens.

 If I understand correctly, there weren't any strong objections to the
 former, but the situation is less clear when it comes to the second.

On second thought, I'd like to propose a third option. This is the same idea
as Andres suggested upthread. That is, we log replication commands only
when log_statement is set to all. No new parameter is introduced, and
log_statement is not redefined as a list.

Firstly, since log_statement=all means that all statements are logged, it's
basically natural and intuitive to log even replication commands in that
setting. Secondly, any statements other than DDL and data-modifying statements,
e.g., VACUUM, CHECKPOINT, BEGIN, etc., are logged only when log_statement=all.
So even now, users who want to control the logging of DDL and maintenance
commands orthogonally cannot do so. Therefore, even if the logging of
replication commands, which are neither DDL nor data-modifying statements,
also cannot be controlled orthogonally, I don't think users will be very
surprised. Of course, I have no objection to addressing the issue by, e.g.,
enabling such orthogonal logging control in the future.

Thoughts? What about introducing that simple and not-so-confusing change first?

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostrgeSQL vs oracle doing 1 million sqrts am I doing it wrong?

2014-08-07 Thread Merlin Moncure
On Thu, Aug 7, 2014 at 5:12 AM, Craig Ringer cr...@2ndquadrant.com wrote:
 New Intel hardware supports IEEE 754:2008 decimal floating point in
 hardware, and I'm quite interested in implementing DECFLOAT(n) for
 PostgreSQL to take advantage of that.

+1

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minmax indexes

2014-08-07 Thread Alvaro Herrera
Simon Riggs wrote:
 On 7 August 2014 14:53, Robert Haas robertmh...@gmail.com wrote:
  On Wed, Aug 6, 2014 at 4:06 PM, Nicolas Barbier
  nicolas.barb...@gmail.com wrote:
  2014-08-06 Claudio Freire klaussfre...@gmail.com:
 
  So, I like blockfilter a lot. I change my vote to blockfilter ;)
 
  +1 for blockfilter, because it stresses the fact that the physical
  arrangement of rows in blocks matters for this index.
 
  I don't like that quite as well as summary, but I'd prefer either to
  the current naming.
 
 Yes, summary index isn't good. I'm not sure where the block or the
 filter part comes in though, so -1 to block filter, not least
 because it doesn't have a good abbreviation (bfin??).

I was thinking just blockfilter (I did show a sample command).
Claudio explained the name downthread; personally, of all the options
suggested thus far, it's the one I like the most (including minmax).

At this point, the naming issue is what is keeping me from committing
this patch, so the quicker we can solve it, the merrier.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minmax indexes

2014-08-07 Thread Robert Haas
On Thu, Aug 7, 2014 at 10:16 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 7 August 2014 14:53, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Aug 6, 2014 at 4:06 PM, Nicolas Barbier
 nicolas.barb...@gmail.com wrote:
 2014-08-06 Claudio Freire klaussfre...@gmail.com:

 So, I like blockfilter a lot. I change my vote to blockfilter ;)

 +1 for blockfilter, because it stresses the fact that the physical
 arrangement of rows in blocks matters for this index.

 I don't like that quite as well as summary, but I'd prefer either to
 the current naming.

 Yes, summary index isn't good. I'm not sure where the block or the
 filter part comes in though, so -1 to block filter, not least
 because it doesn't have a good abbreviation (bfin??).

 A better description would be block range index since we are
 indexing a range of blocks (not just one block). Perhaps a better one
 would be simply range index, which we could abbreviate to RIN or
 BRIN.

range index might get confused with range types; block range index
seems better.  I like summary, but I'm fine with block range index or
block filter index, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A worst case for qsort

2014-08-07 Thread Robert Haas
On Tue, Aug 5, 2014 at 8:15 PM, Peter Geoghegan p...@heroku.com wrote:
 The adversarial method works for almost any polymorphic program
 recognizable as quicksort. The subject quicksort may copy values at
 will, or work with lists rather than arrays. It may even pick the
 pivot at random. The quicksort will be vulnerable provided only that
 it satisfies some mild assumptions that are met by every
 implementation I have seen.

 IMHO, while worst case performance is a very useful tool for analyzing
 algorithms (particularly their worst case time complexity), a worst
 case should be put in its practical context. For example, if we had
 reason to be concerned about *adversarial* inputs, I think that there
 is a good chance that our qsort() actually would be problematic to the
 point of driving us to prefer some generally slower alternative.

I completely agree with this, and I think everyone else does, too.
Where we don't all necessarily agree is which worst cases are
realistic and which worst cases are simply pathological cases with
which we need not be concerned in practice.  For example, when I
proposed the patch to use MVCC catalog snapshots, Andres invented a
test case which I thought was far more brutal than anything likely to
be encountered in the real world.  But Andres didn't agree; he thought
the test case was realistic.  So, I worked on the patch some more and
came up with a solution that performs acceptably even if those brutal
conditions are encountered in the world.  As a result, the patch as
committed is better than the one originally submitted.  I could
certainly have spent more time debating whether that effort was
worthwhile, and maybe I would have won the argument, but it was a
better of use of that time to instead try to improve the patch.

So here.  You may not agree that the mitigation strategies for which
others are asking for are worthwhile, but you can't expect everyone
else to agree with your assessment of which cases are likely to occur
in practice.  The case of a cohort of strings to be sorted which share
a long fixed prefix and have different stuff at the end does not seem
particularly pathological to me.  It doesn't, in other words, require
an adversary: some real-world data sets will look like that.  I will
forbear repeating examples I've given before, but I've seen that kind
of thing more than once in real data sets that people (well, me,
anyway) actually wanted to put into a PostgreSQL database.  So I'm
personally of the opinion that the time you've put into trying to
protect against those cases is worthwhile.  I understand that you may
disagree with that, and that's fine: we're not all going to agree on
everything.
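
For what it's worth, a tiny generator for that kind of input, purely
illustrative (the path layout is made up): a million keys that share a long
fixed prefix and differ only at the end.

#include <stdio.h>

/* Emit strings sharing a long fixed prefix, differing only in the tail. */
int
main(void)
{
    for (int i = 0; i < 1000000; i++)
        printf("/var/data/app/eventlog/2014/08/07/partition-%07d.csv\n", i);
    return 0;
}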

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enhancing pgbench parameter checking

2014-08-07 Thread Fabien COELHO


Hello Tatsuo-san,


Thanks for the review. I have registered it to Aug Commit fest.
https://commitfest.postgresql.org/action/patch_view?id=1532

I'm not sure of the variable name is_non_init_parameter_set. I would 
suggest benchmarking_option_set?


Ok, I will replace the variable name as you suggested.


Also, to be consistent, maybe it should check that no initialization-specific 
option are set when benchmarking.


Good suggesition. Here is the v2 patch.


I applied it without problem and tested it.


* It seems that -c is ignored, the atoi() line has been removed.

* Option -q is initialization-specific, but it is not detected as such like the 
others, although there is a specific check later. I think that it would 
be better to use the systematic approach, and to remove the specific 
check.


* I would name the second boolean initialization_option_set, as it is 
described like that in the documentation.



* I would suggest the following error messages:
 "some options cannot be used in initialization (-i) mode\n" and
 "some options cannot be used in benchmarking mode\n".
Although these messages are rough, I think that they are enough and avoid 
running something unexpected, which is your purpose.



Find attached a patch which adds these changes to your current version.

--
Fabien.

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 67d7960..2f7d80e 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -2521,7 +2521,7 @@ main(int argc, char **argv)
 	bool		scale_given = false;
 
 	bool		benchmarking_option_set = false;
-	bool		initializing_option_set = false;
+	bool		initialization_option_set = false;
 
 	CState	   *state;			/* status of clients */
 	TState	   *threads;		/* array of thread */
@@ -2610,6 +2610,7 @@ main(int argc, char **argv)
 break;
 			case 'c':
 benchmarking_option_set = true;
+nclients = atoi(optarg);
 if (nclients <= 0 || nclients > MAXCLIENTS)
 {
 	fprintf(stderr, "invalid number of clients: %d\n", nclients);
@@ -2695,6 +2696,7 @@ main(int argc, char **argv)
 use_log = true;
 break;
 			case 'q':
+initialization_option_set = true;
 use_quiet = true;
 break;
 			case 'f':
@@ -2722,7 +2724,7 @@ main(int argc, char **argv)
 }
 break;
 			case 'F':
-initializing_option_set = true;
+initialization_option_set = true;
 fillfactor = atoi(optarg);
 if ((fillfactor < 10) || (fillfactor > 100))
 {
@@ -2776,14 +2778,14 @@ main(int argc, char **argv)
 			case 0:
 /* This covers long options which take no argument. */
 if (foreign_keys || unlogged_tables)
-	initializing_option_set = true;
+	initialization_option_set = true;
 break;
 			case 2:/* tablespace */
-initializing_option_set = true;
+initialization_option_set = true;
 tablespace = pg_strdup(optarg);
 break;
 			case 3:/* index-tablespace */
-initializing_option_set = true;
+initialization_option_set = true;
 index_tablespace = pg_strdup(optarg);
 break;
 			case 4:
@@ -2835,7 +2837,7 @@ main(int argc, char **argv)
 	{
 		if (benchmarking_option_set)
 		{
-			fprintf(stderr, "some parameters cannot be used in initializing mode\n");
+			fprintf(stderr, "some options cannot be used in initialization (-i) mode\n");
 			exit(1);
 		}
 
@@ -2844,9 +2846,9 @@ main(int argc, char **argv)
 	}
 	else
 	{
-		if (initializing_option_set)
+		if (initialization_option_set)
 		{
-			fprintf(stderr, "some parameters cannot be used in benchmarking mode\n");
+			fprintf(stderr, "some options cannot be used in benchmarking mode\n");
 			exit(1);
 		}
 	}
@@ -2868,13 +2870,6 @@ main(int argc, char **argv)
 		exit(1);
 	}
 
-	/* -q may be used only with -i */
-	if (use_quiet && !is_init_mode)
-	{
-		fprintf(stderr, "quiet-logging is allowed only in initialization mode (-i)\n");
-		exit(1);
-	}
-
 	/* --sampling-rate may must not be used with --aggregate-interval */
 	if (sample_rate > 0.0 && agg_interval > 0)
 	{

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Incremental Backup

2014-08-07 Thread Bruce Momjian
On Thu, Aug  7, 2014 at 08:35:53PM +0900, Michael Paquier wrote:
 On Thu, Aug 7, 2014 at 8:11 PM, Fujii Masao masao.fu...@gmail.com wrote:
  There are some data which don't have LSN, for example, postgresql.conf.
  When such data has been modified since last backup, they also need to
  be included in incremental backup? Probably yes.
 Definitely yes. That's as well the case of paths like pg_clog,
 pg_subtrans and pg_twophase.

I assumed these would be unconditionally backed up during an incremental
backup because they are relatively small and you don't want to make a mistake.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Incremental Backup

2014-08-07 Thread Bruce Momjian
On Thu, Aug  7, 2014 at 11:03:40AM +0100, Simon Riggs wrote:
 Well, there is a huge difference between file-level and block-level backup.
 
 Designing, writing and verifying block-level backup to the point that
 it is acceptable is a huge effort. (Plus, I don't think accumulating
 block numbers as they are written will be low overhead. Perhaps
 there was a misunderstanding there and what is being suggested is to
 accumulate file names that change as they are written, since we
 already do that in the checkpointer process, which would be an option
 between 2 and 3 on the above list).
 
 What is being proposed here is file-level incremental backup that
 works in a general way for various backup management tools. It's the
 80/20 first step on the road. We get most of the benefit, it can be
 delivered in this release as robust, verifiable code. Plus, that is
 all we have budget for, a fairly critical consideration.
 
 Big features need to be designed incrementally across multiple
 releases, delivering incremental benefit (or at least that is what I
 have learned). Yes, working block-level backup would be wonderful, but
 if we hold out for that as the first step then we'll get nothing
 anytime soon.

That is fine.  I just wanted to point out that as features are added,
file-level incremental backups might not be useful.  In fact, I think
there are a lot of users for which file-level incremental backups will
never be useful, i.e. you have to have a lot of frozen/static data for
file-level incremental backups to be useful.  

I am a little worried that many users will not realize this until they
try it and are disappointed, e.g. "Why is PG writing to my static data
so often?" --- then we get beaten up about our hint bits and freezing
behavior.  :-(

I am just trying to set realistic expectations.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minmax indexes

2014-08-07 Thread Oleg Bartunov
+1 for BRIN !

On Thu, Aug 7, 2014 at 6:16 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On 7 August 2014 14:53, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Aug 6, 2014 at 4:06 PM, Nicolas Barbier
 nicolas.barb...@gmail.com wrote:
 2014-08-06 Claudio Freire klaussfre...@gmail.com:

 So, I like blockfilter a lot. I change my vote to blockfilter ;)

 +1 for blockfilter, because it stresses the fact that the physical
 arrangement of rows in blocks matters for this index.

 I don't like that quite as well as summary, but I'd prefer either to
 the current naming.

 Yes, summary index isn't good. I'm not sure where the block or the
 filter part comes in though, so -1 to block filter, not least
 because it doesn't have a good abbreviation (bfin??).

 A better description would be block range index since we are
 indexing a range of blocks (not just one block). Perhaps a better one
 would be simply range index, which we could abbreviate to RIN or
 BRIN.

 --
  Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reporting the commit LSN at commit time

2014-08-07 Thread Robert Haas
On Wed, Aug 6, 2014 at 9:15 PM, Craig Ringer cr...@2ndquadrant.com wrote:
 To support transparent client-side failover in BDR, it's necessary to
 know what the LSN of a node was at the time a transaction committed and
 keep track of that in the client/proxy.

I doubt whether it makes sense to do this without a broader
understanding of how the client-side failover mechanism would work.
If we're going to add something like this, it should include libpq
support for actually doing something useful with it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Incremental Backup

2014-08-07 Thread Marco Nenciarini
Il 07/08/14 17:29, Bruce Momjian ha scritto:
 I am a little worried that many users will not realize this until they
 try it and are disappointed, e.g. Why is PG writing to my static data
 so often? --- then we get beaten up about our hint bits and freezing
 behavior.  :-(
 
 I am just trying to set realistic expectations.
 

Our experience is that for big databases (size over about 50GB) the
file-level approach is often enough to halve the size of the backup.

Users which run Postgres as Data Warehouse surely will benefit from it,
so we could present it as a DWH oriented feature.

Regards,
Marco

-- 
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it



signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] Proposal: Incremental Backup

2014-08-07 Thread Marco Nenciarini
Il 07/08/14 17:25, Bruce Momjian ha scritto:
 On Thu, Aug  7, 2014 at 08:35:53PM +0900, Michael Paquier wrote:
 On Thu, Aug 7, 2014 at 8:11 PM, Fujii Masao masao.fu...@gmail.com wrote:
 There are some data which don't have LSN, for example, postgresql.conf.
 When such data has been modified since last backup, they also need to
 be included in incremental backup? Probably yes.
 Definitely yes. That's as well the case of paths like pg_clog,
 pg_subtrans and pg_twophase.
 
 I assumed these would be unconditionally backed up during an incremental
 backup because they relatively small and you don't want to make a mistake.
 

You could decide to always copy files which don't have an LSN, but you
don't know what the user could put inside PGDATA. I would avoid any
assumption on files which are not owned by Postgres.

With the current full backup procedure they are backed up, so I think
that having them backed up with an rsync-like algorithm is what a user
would expect for an incremental backup.

Regards,
Marco

-- 
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it



signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] replication commands and log_statements

2014-08-07 Thread Abhijit Menon-Sen
At 2014-08-07 23:22:43 +0900, masao.fu...@gmail.com wrote:

 That is, we log replication commands only when log_statement is set to
 all. Neither new parameter is introduced nor log_statement is
 redefined as a list.

That sounds good to me.

-- Abhijit


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A worst case for qsort

2014-08-07 Thread John Cochran
On Thu, Aug 7, 2014 at 11:07 AM, Robert Haas robertmh...@gmail.com wrote:

 On Tue, Aug 5, 2014 at 8:15 PM, Peter Geoghegan p...@heroku.com wrote:
  The adversarial method works for almost any polymorphic program
  recognizable as quicksort. The subject quicksort may copy values at
  will, or work with lists rather than arrays. It may even pick the
  pivot at random. The quicksort will be vulnerable provided only that
  it satisfies some mild assumptions that are met by every
  implementation I have seen.
 
  IMHO, while worst case performance is a very useful tool for analyzing
  algorithms (particularly their worst case time complexity), a worst
  case should be put in its practical context. For example, if we had
  reason to be concerned about *adversarial* inputs, I think that there
  is a good chance that our qsort() actually would be problematic to the
  point of driving us to prefer some generally slower alternative.


If one is concerned about adversarial inputs, then as mentioned earlier,
two main steps need to be taken.
1. Don't permit potential adversaries to define the comparison function used
by qsort().
2. Perform random selection of pivot to prevent precomputed lists of data
that invoke quadratic behavior in qsort.

And before the argument that a random pivot will be less efficient than the
median of medians that's currently being used, you need to look at what is
currently being used.
1. If a small partition is being sorted, the element at n/2 is selected
as the pivot with n being the size of the partition.
2. If a medium partition is being sorted, then pivot selected is the
median of the elements at 0, n/2, and n.
3. And finally, if a large partition is being sorted, potential pivots
are selected from 0, n/8, n/4, 3n/8, n/2, 5n/8, 3n/4, 7n/8, and n.  Of
those 9 candidates, the median of medians is selected as the pivot.

There is nothing in the world that would prevent the following.
1. Short partition - select a pivot at random.
2. Medium partition - select the median of 3 randomly selected candidates.
3. Large partition - select the median of medians of 9 randomly selected
candidates.

Using the above random pivot selection methods, the performance of a
hardened qsort should be close to that of an unhardened qsort. The only
real difference is the cost of generating random numbers to select the
candidates for the median tests. However, if a malicious compare function
is permitted to be defined, it would still be vulnerable to that attack,
but it would be pretty much immune to a precomputed data attack.
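
A minimal sketch of the randomized median-of-3 selection described above
(purely illustrative: median3_random is a made-up helper for an int array,
and rand() stands in for whatever generator a hardened qsort would really
use):

#include <stdlib.h>

static size_t
median3_random(const int *a, size_t n)
{
	size_t	i = (size_t) rand() % n;
	size_t	j = (size_t) rand() % n;
	size_t	k = (size_t) rand() % n;

	/* return the index holding the median of a[i], a[j], a[k] */
	if (a[i] < a[j])
	{
		if (a[j] < a[k])
			return j;			/* a[i] < a[j] < a[k] */
		return (a[i] < a[k]) ? k : i;
	}
	else
	{
		if (a[i] < a[k])
			return i;			/* a[j] <= a[i] < a[k] */
		return (a[j] < a[k]) ? k : j;
	}
}

The large-partition case would simply do the same thing over three groups of
three random candidates and then take the median of those three medians.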

Or perhaps it just might be simpler to detect an attack and fall back to a
sort algorithm that doesn't have quadratic behavior.

What I mean by that is the qsort in "Engineering a Sort Function" performs
approximately 1.1 n ln n comparisons. Suppose that upon starting, qsort
computes an upper bound on the comparisons it's willing to perform, say 3 n ln
n or so (1.386 n log n is the estimate for a randomly selected pivot). Then,
if the upper bound is reached, the qsort function backs out leaving the
array in a partially sorted order, and then calls a slower sort that
doesn't exhibit quadratic behavior to actually finish the sort.

The result of doing that would be most, if not all, sorts being handled by
qsort in a timely fashion. But if bad luck or an adversary strikes, the
qsort bails out after things look bad and passes the task to a different
sort. I'll be the first to admit that the hybrid approach would be slower
in those bad cases than it would be if the alternate sort was selected at
the beginning, but it would be faster than letting qsort continue with
quadratic behavior.
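
To make the bail-out idea concrete, here is a self-contained sketch (not the
PostgreSQL qsort; the names and the plain middle-element pivot are only for
illustration): quicksort runs against a comparison budget of roughly 3 n ln n,
and once the budget is exhausted the remaining range is finished with
heapsort, which has no quadratic worst case.

#include <math.h>
#include <stddef.h>

static long budget;			/* comparisons still allowed */

static void
swap_int(int *a, int *b)
{
	int		t = *a;

	*a = *b;
	*b = t;
}

static void
sift_down(int *a, size_t root, size_t end)
{
	while (2 * root + 1 < end)
	{
		size_t	child = 2 * root + 1;

		if (child + 1 < end && a[child] < a[child + 1])
			child++;
		if (a[root] >= a[child])
			return;
		swap_int(&a[root], &a[child]);
		root = child;
	}
}

static void
heapsort_range(int *a, size_t n)
{
	if (n < 2)
		return;
	for (size_t i = n / 2; i-- > 0;)
		sift_down(a, i, n);
	for (size_t end = n - 1; end > 0; end--)
	{
		swap_int(&a[0], &a[end]);
		sift_down(a, 0, end);
	}
}

static void
quicksort_budgeted(int *a, long n)
{
	while (n > 1)
	{
		if (budget <= 0)
		{
			/* things look bad: cut our losses and switch algorithms */
			heapsort_range(a, (size_t) n);
			return;
		}

		/* Hoare partition around the middle element */
		int		pivot = a[(n - 1) / 2];
		long	i = -1;
		long	j = n;

		for (;;)
		{
			do { i++; budget--; } while (a[i] < pivot);
			do { j--; budget--; } while (a[j] > pivot);
			if (i >= j)
				break;
			swap_int(&a[i], &a[j]);
		}
		quicksort_budgeted(a, j + 1);	/* left part */
		a += j + 1;						/* iterate on the right part */
		n -= j + 1;
	}
}

void
hybrid_sort(int *a, size_t n)
{
	if (n < 2)
		return;
	budget = (long) (3.0 * (double) n * log((double) n));
	quicksort_budgeted(a, (long) n);
}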

-- 

There are two ways of constructing a software design: One way is to make it
so simple that there are obviously no deficiencies and the other way is to
make it so complicated that there are no obvious deficiencies. — C.A.R.
Hoare


Re: [HACKERS] A worst case for qsort

2014-08-07 Thread Peter Geoghegan
On Thu, Aug 7, 2014 at 8:07 AM, Robert Haas robertmh...@gmail.com wrote:
 So here.  You may not agree that the mitigation strategies for which
 others are asking for are worthwhile, but you can't expect everyone
 else to agree with your assessment of which cases are likely to occur
 in practice.  The case of a cohort of strings to be sorted which share
 a long fixed prefix and have different stuff at the end does not seem
 particularly pathological to me.  It doesn't, in other words, require
 an adversary: some real-world data sets will look like that.  I will
 forebear repeating examples I've given before, but I've seen that kind
 of thing more than once in real data sets that people (well, me,
 anyway) actually wanted to put into a PostgreSQL database.  So I'm
 personally of the opinion that the time you've put into trying to
 protect against those cases is worthwhile.  I understand that you may
 disagree with that, and that's fine: we're not all going to agree on
 everything.

I actually agree with you here. Sorting text is important, so we
should spend a lot of time avoiding regressions. Your example is
reasonable - I believe that people do that to a non-trivial degree.
The fact is, I probably can greatly ameliorate that case. However, to
give an example of a case I have less sympathy for, I'll name the case
you mention, *plus* the strings are already in logical order, and so
our qsort() can (dubiously) take advantage of that, so that there is a
1:1 ratio of wasted strxfrm() calls and strcoll() tie-breaker calls.
There might be other cases like that that crop up.

I also thought the paper was pretty cool, and thought it might be
interesting because of the fact that it was authored by McIlroy, and
he refers to weaknesses in his and our very own implementation
specifically.
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Incremental Backup

2014-08-07 Thread Gabriele Bartolini
Hi Marco,

 With the current full backup procedure they are backed up, so I think
 that having them backed up with a rsync-like algorithm is what an user
 would expect for an incremental backup.

Exactly. I think a simple, flexible and robust method for file based
incremental backup is all we need. I am confident it could be done for
9.5.

I would like to quote every single word Simon said. Block level
incremental backup (with Robert's proposal) is definitely the ultimate
goal for effective and efficient physical backups. I see file level
incremental backup as a very good compromise, a sort of intermediate
release which could nonetheless produce a lot of benefits to our user
base, for years to come too.

Thanks,
Gabriele


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minmax indexes

2014-08-07 Thread Nicolas Barbier
2014-08-07 Oleg Bartunov obartu...@gmail.com:

 +1 for BRIN !

+1, rolls off the tongue smoothly and captures the essence :-).

Nicolas

-- 
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minmax indexes

2014-08-07 Thread Petr Jelinek

On 07/08/14 16:16, Simon Riggs wrote:


A better description would be block range index since we are
indexing a range of blocks (not just one block). Perhaps a better one
would be simply range index, which we could abbreviate to RIN or
BRIN.



+1 for block range index (BRIN)

--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A worst case for qsort

2014-08-07 Thread Robert Haas
On Thu, Aug 7, 2014 at 1:16 PM, Peter Geoghegan p...@heroku.com wrote:
 On Thu, Aug 7, 2014 at 8:07 AM, Robert Haas robertmh...@gmail.com wrote:
 So here.  You may not agree that the mitigation strategies for which
 others are asking for are worthwhile, but you can't expect everyone
 else to agree with your assessment of which cases are likely to occur
 in practice.  The case of a cohort of strings to be sorted which share
 a long fixed prefix and have different stuff at the end does not seem
 particularly pathological to me.  It doesn't, in other words, require
 an adversary: some real-world data sets will look like that.  I will
 forebear repeating examples I've given before, but I've seen that kind
 of thing more than once in real data sets that people (well, me,
 anyway) actually wanted to put into a PostgreSQL database.  So I'm
 personally of the opinion that the time you've put into trying to
 protect against those cases is worthwhile.  I understand that you may
 disagree with that, and that's fine: we're not all going to agree on
 everything.

 I actually agree with you here.

/me pops cork.

 Sorting text is important, so we
 should spend a lot of time avoiding regressions. Your example is
 reasonable - I believe that people do that to a non-trivial degree.
 The fact is, I probably can greatly ameliorate that case. However, to
 give an example of a case I have less sympathy for, I'll name the case
 you mention, *plus* the strings are already in logical order, and so
 our qsort() can (dubiously) take advantage of that, so that there is a
 1:1 ratio of wasted strxfrm() calls and strcoll() tie-breaker calls.
 There might be other cases like that that crop up.

I think that's actually not a very unrealistic case at all.  In
general, I think that if a particular data distribution is a
reasonable scenario, that data distribution plus it's already sorted
is also reasonable.  Data gets presorted in all kinds of ways:
sometimes it gets loaded in sorted (or nearly-sorted) order from
another data source; sometimes we do an index scan that produces data
in order by column A and then later sort by A, B; sometimes somebody
clusters the table.  Respecting the right of other people to have
different opinions, I don't think I'll ever be prepared to concede the
presorted case as either uncommon or unimportant.

That's not to prejudge anything that may or may not be in your patch,
which I have not studied in enormous detail.  It's just what I think
about the subject in general.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Pg_upgrade and toast tables bug discovered

2014-08-07 Thread Bruce Momjian
On Tue, Aug  5, 2014 at 07:31:21PM -0400, Bruce Momjian wrote:
 On Thu, Jul 10, 2014 at 06:38:26PM -0400, Bruce Momjian wrote:
  On Thu, Jul 10, 2014 at 06:17:14PM -0400, Bruce Momjian wrote:
   Well, we are going to need to call internal C functions, often bypassing
   their typical call sites and the assumption about locking, etc.  Perhaps
   this could be done from a plpgsql function.  We could add and drop a
   dummy column to force TOAST table creation --- we would then only need a
   way to detect if a function _needs_ a TOAST table, which was skipped in
   binary upgrade mode previously.
   
   That might be a minimalistic approach.
  
  I have thought some more on this.  I thought I would need to open
  pg_class in C and do complex backend stuff, but I now realize I can do
  it from libpq, and just call ALTER TABLE and I think that always
  auto-checks if a TOAST table is needed.  All I have to do is query
  pg_class from libpq, then construct ALTER TABLE commands for each item,
  and it will optionally create the TOAST table if needed.  I just have to
  use a no-op ALTER TABLE command, like SET STATISTICS.
 
 Attached is a completed patch which handles oid conflicts in pg_class
 and pg_type for TOAST tables that were not needed in the old cluster but
 are needed in the new one.  I was able to recreate a failure case and
 this fixed it.
 
 The patch needs to be backpatched because I am getting more-frequent bug
 reports from users using pg_upgrade to leave now-end-of-life'ed 8.4. 
 There is not a good work-around for pg_upgrade failures without this
 fix, but at least pg_upgrade throws an error.

Patch applied through 9.3, with an additional Assert check. 9.2 code was
different enough that there was too high a risk for backpatching.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A worst case for qsort

2014-08-07 Thread Peter Geoghegan
On Thu, Aug 7, 2014 at 11:33 AM, Robert Haas robertmh...@gmail.com wrote:
 I think that's actually not a very unrealistic case at all.  In
 general, I think that if a particular data distribution is a
 reasonable scenario, that data distribution plus it's already sorted
 is also reasonable.  Data gets presorted in all kinds of ways:
 sometimes it gets loaded in sorted (or nearly-sorted) order from
 another data source; sometimes we do an index scan that produces data
 in order by column A and then later sort by A, B; sometimes somebody
 clusters the table.  Respecting the right of other people to have
 different opinions, I don't think I'll ever be prepared to concede the
 presorted case as either uncommon or unimportant.

I think that pre-sorted, all-unique text datums, that have all
differences beyond the first 8 bytes, that the user happens to
actually want to sort are fairly rare. All I'm saying is: if that's a
case that has to lose out more than your more limited case in order to
get a large number of representative cases 3 - 7 times faster, and
marginal cases a good bit faster, that's probably worth it. I am
willing to fix any case that I can - as you say, it's often easier to
write the code than to argue - but let's have a sense of proportion
about these things. Even your quite reasonable case is going to lose
out a bit, because what I've done is fundamentally about deciding at
intervals during copying tuples if it's worth it. If it isn't, then
clearly that whole process went to waste, and we've merely cut our
losses. We could add a GUC to control it as suggested by Simon, or
even put something in attoptions per attribute to indicate that the
optimization should be avoided, but that's something that we've
historically resisted doing.

 That's not to prejudge anything that may or may not be in your patch,
 which I have not studied in enormous detail.  It's just what I think
 about the subject in general.

Fair enough. At the risk of sounding trite, I'll add: everything is a
trade-off.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-08-07 Thread Robert Haas
On Wed, Aug 6, 2014 at 7:18 PM, Peter Geoghegan p...@heroku.com wrote:
 On Wed, Aug 6, 2014 at 1:11 PM, Robert Haas robertmh...@gmail.com wrote:
 I've committed the patch I posted yesterday.  I did not see a good
 reason to meld that together in a single commit with the first of the
 patches you posted; I'll leave you to revise that patch to conform
 with this approach.

 Okay. Attached is the same patch set, rebased on top on your commit
 with appropriate amendments.

Two things:

+* result.  Also, since strxfrm()/strcoll() require
NULL-terminated inputs,

In my original patch, I wrote NUL, as in the NUL character.  You've
changed it to NULL, but the original was correct.  NULL is a pointer
value that is not relevant here; the character with value 0 is NUL.


-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-08-07 Thread Robert Haas
On Thu, Aug 7, 2014 at 3:15 PM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Aug 6, 2014 at 7:18 PM, Peter Geoghegan p...@heroku.com wrote:
 On Wed, Aug 6, 2014 at 1:11 PM, Robert Haas robertmh...@gmail.com wrote:
 I've committed the patch I posted yesterday.  I did not see a good
 reason to meld that together in a single commit with the first of the
 patches you posted; I'll leave you to revise that patch to conform
 with this approach.

 Okay. Attached is the same patch set, rebased on top on your commit
 with appropriate amendments.

 Two things:

 +* result.  Also, since strxfrm()/strcoll() require
 NULL-terminated inputs,

 In my original patch, I wrote NUL, as in the NUL character.  You've
 changed it to NULL, but the original was correct.  NULL is a pointer
 value that is not relevant here; the character with value 0 is NUL.

Gah.  Hit send too soon.  Also, as much as I'd prefer to avoid
relitigating the absolutely stupid debate about how to expand the
buffers, this is no good:

+   tss->buflen1 = Max(len1 + 1, tss->buflen1 * 2);

If the first expansion is for a string >512MB and the second string is
longer than the first, this will exceed MaxAllocSize and error out.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fixed redundant i18n strings in json

2014-08-07 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Sat, Aug 2, 2014 at 9:15 AM, Daniele Varrazzo
 daniele.varra...@gmail.com wrote:
 I'd definitely replace /arg/argument/. Furthermore I'd avoid the form
 argument 1: something is wrong: the string is likely to end up in
 sentences with other colons so I'd rather use something is wrong at
 argument 1.
 
 Is the patch attached better?

 Looks OK to me.  I thought someone else might comment, but since no
 one has, committed.

It looks to me like this is still wrong:

if (nargs % 2 != 0)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-errmsg("invalid number or arguments: object must be matched 
key value pairs")));
+errmsg("invalid number or arguments"),
+errhint("Object must be matched key value pairs.")));

Surely that was meant to read "invalid number OF arguments".  The errhint
is only charitably described as English, as well.  I'd suggest something
like "Arguments of json_build_object() must be pairs of keys and values."
--- but maybe someone else can phrase that better.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql: show only failed queries

2014-08-07 Thread Pavel Stehule
2014-08-07 7:10 GMT+02:00 Fujii Masao masao.fu...@gmail.com:

 On Thu, Aug 7, 2014 at 6:26 AM, Pavel Stehule pavel.steh...@gmail.com
 wrote:
  Hello
 
  updated version patch in attachment

 Thanks! But ISTM you forgot to attached the patch.


grr .. I am sorry



  +/* all psql known variables are included in list by default */
  +for (known_varname = known_varnames; *known_varname;
 known_varname++)
  +varnames[nvars++] = pg_strdup(*known_varname);
 
  Don't we need to append both prefix and suffix to even known variables?
 
 
  ??? I am not sure if I understand well - probably system read only
  variables as DBNAME, USER, VERSION should not be there

 I had that question because complete_from_variables() is also called by the
 tab-completion of \echo : and it shows half-baked variables list. So
 I thought probably we need to change complete_from_variables() more to
 fix the problem.


I understand now.

I fixed it.

I have a question. \echo probably should not show empty known variables.

The data for autocomplete for \echo should be the same as the result of \set.




  +else if (strcmp(prev2_wd, \\set) == 0)
  +{
  +if (strcmp(prev_wd, AUTOCOMMIT) == 0)
 
  ISTM that some psql variables like IGNOREEOF are not there. Why not?
 
 
  yes, there are not complete for DBNAME, ENCODING, FETCH_COUNT,
 HISTCONTROL,
  HISTFILE, HISTSIZE, HOST, IGNOREEOFF, PROMPT*,USER,  VERSION
 
  There are more reasons:
 
  * paremeter is not a enum (string, number or both): FETCH_COUNT, PROMPT,
  HISTSIZE, ..
 
  * variable is pseudo read only  - change has not any effect: DBNAME,
  ENCODING, VERSION

 So HISTCONTROL should be there because it doesn't have such reasons at all?


yes

Pavel


 Regards,

 --
 Fujii Masao

commit bacfdc0e03219265aeee34b78f7ec9d272d49f72
Author: Pavel Stehule pavel.steh...@gooddata.com
Date:   Wed Aug 6 23:05:42 2014 +0200

warnings and typo fixed

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 24e60b7..b94ed63 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -3608,6 +3608,79 @@ psql_completion(const char *text, int start, int end)
 	{
 		matches = complete_from_variables(text, "", "");
 	}
+	else if (strcmp(prev2_wd, "\\set") == 0)
+	{
+		if (strcmp(prev_wd, "AUTOCOMMIT") == 0)
+		{
+			static const char *const my_list[] =
+			{"on", "off", "interactive", NULL};
+
+			COMPLETE_WITH_LIST_CS(my_list);
+		}
+		else if (strcmp(prev_wd, "COMP_KEYWORD_CASE") == 0)
+		{
+			static const char *const my_list[] =
+			{"lower", "upper", "preserve-lower", "preserve-upper", NULL};
+
+			COMPLETE_WITH_LIST_CS(my_list);
+		}
+		else if (strcmp(prev_wd, "ECHO") == 0)
+		{
+			static const char *const my_list[] =
+			{"none", "errors", "queries", "all", NULL};
+
+			COMPLETE_WITH_LIST_CS(my_list);
+		}
+		else if (strcmp(prev_wd, "ECHO_HIDDEN") == 0)
+		{
+			static const char *const my_list[] =
+			{"noexec", "off", "on", NULL};
+
+			COMPLETE_WITH_LIST_CS(my_list);
+		}
+		else if (strcmp(prev_wd, "ON_ERROR_ROLLBACK") == 0)
+		{
+			static const char *const my_list[] =
+			{"on", "off", "interactive", NULL};
+
+			COMPLETE_WITH_LIST_CS(my_list);
+		}
+		else if (strcmp(prev_wd, "ON_ERROR_STOP") == 0)
+		{
+			static const char *const my_list[] =
+			{"on", "off", NULL};
+
+			COMPLETE_WITH_LIST_CS(my_list);
+		}
+		else if (strcmp(prev_wd, "QUIET") == 0)
+		{
+			static const char *const my_list[] =
+			{"on", "off", NULL};
+
+			COMPLETE_WITH_LIST_CS(my_list);
+		}
+		else if (strcmp(prev_wd, "SINGLELINE") == 0)
+		{
+			static const char *const my_list[] =
+			{"on", "off", NULL};
+
+			COMPLETE_WITH_LIST_CS(my_list);
+		}
+		else if (strcmp(prev_wd, "SINGLESTEP") == 0)
+		{
+			static const char *const my_list[] =
+			{"on", "off", NULL};
+
+			COMPLETE_WITH_LIST_CS(my_list);
+		}
+		else if (strcmp(prev_wd, "VERBOSITY") == 0)
+		{
+			static const char *const my_list[] =
+			{"default", "verbose", "terse", NULL};
+
+			COMPLETE_WITH_LIST_CS(my_list);
+		}
+	}
 	else if (strcmp(prev_wd, "\\sf") == 0 || strcmp(prev_wd, "\\sf+") == 0)
 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_functions, NULL);
 	else if (strcmp(prev_wd, "\\cd") == 0 ||
@@ -4062,28 +4135,61 @@ complete_from_variables(const char *text, const char *prefix, const char *suffix
 {
 	char	  **matches;
 	char	  **varnames;
+	static const char **known_varname;
 	int			nvars = 0;
 	int			maxvars = 100;
 	int			i;
 	struct _variable *ptr;
 
+	static const char *known_varnames[] = {
+		"AUTOCOMMIT", "COMP_KEYWORD_CASE", "DBNAME", "ECHO", "ECHO_HIDDEN", "ENCODING",
+		"FETCH_COUNT", "HISTCONTROL", "HISTFILE", "HISTSIZE", "HOST", "IGNOREEOFF",
+		"LASTOID", "ON_ERROR_ROLLBACK", "ON_ERROR_STOP", "PORT", "PROMPT1", "PROMPT2",
+		"PROMPT3", "QUIET", "SINGLELINE", "SINGLESTEP", "USER", "VERBOSITY",
+		NULL
+	};
+
 	varnames = (char **) pg_malloc((maxvars + 1) * sizeof(char *));
 
+	/* all psql known variables are included in list by default */
+	for (known_varname = known_varnames; *known_varname; known_varname++)
+		varnames[nvars++] = psprintf("%s%s%s", prefix, *known_varname, suffix);
+
 	for (ptr = 

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-08-07 Thread Peter Geoghegan
On Thu, Aug 7, 2014 at 12:15 PM, Robert Haas robertmh...@gmail.com wrote:
 In my original patch, I wrote NUL, as in the NUL character.  You've
 changed it to NULL, but the original was correct.  NULL is a pointer
 value that is not relevant here; the character with value 0 is NUL.

NULL-terminated string seems like acceptable usage (e.g. [1]), but
I'll try to use the term NUL in reference to '\0' in the future to
avoid confusion.

[1] https://en.wikipedia.org/wiki/Null-terminated_string
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reporting the commit LSN at commit time

2014-08-07 Thread Tom Lane
Craig Ringer cr...@2ndquadrant.com writes:
 Hi all
 To support transparent client-side failover in BDR, it's necessary to
 know what the LSN of a node was at the time a transaction committed and
 keep track of that in the client/proxy.

 I'm thinking about adding a new message type in the protocol that gets
 sent immediately before CommandComplete, containing the LSN of the
 commit. Clients would need to enable the sending of this message with a
 GUC that they set when they connect, so it doesn't confuse clients that
 aren't expecting it or aware of it.

FWIW, I think it's a seriously bad idea to expose LSNs in the protocol
at all.   What happens five years from now when we switch to some other
implementation that doesn't have LSNs?

I don't mind if you expose some other way to inquire about LSNs, but
let's *not* embed it into the wire protocol.  Not even as an option.

This position also obviates the need to complain about having a GUC
that changes the protocol-level behavior, which is also a seriously
bad idea.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-08-07 Thread Peter Geoghegan
On Thu, Aug 7, 2014 at 12:17 PM, Robert Haas robertmh...@gmail.com wrote:
 Gah.  Hit send to soon.  Also, as much as I'd prefer to avoid
 relitigating the absolutely stupid debate about how to expand the
 buffers, this is no good:

 +   tss->buflen1 = Max(len1 + 1, tss->buflen1 * 2);

 If the first expansion is for a string 512MB and the second string is
 longer than the first, this will exceed MaxAllocSize and error out.

Fair point. I think this problem is already present in a few existing
places, but it shouldn't be. I suggest this remediation:

 +   tss->buflen1 = Max(len1 + 1, Min(tss->buflen1 * 2, (int) 
 MaxAllocSize));

I too would very much prefer to not repeat that debate.  :-)
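
For what it's worth, a tiny standalone sketch of why the clamp matters
(MAX_ALLOC stands in for MaxAllocSize and the lengths are only examples):

#include <stdio.h>

#define MAX_ALLOC	0x3fffffffUL	/* ~1GB - 1, like MaxAllocSize */
#define MAX(a, b)	((a) > (b) ? (a) : (b))
#define MIN(a, b)	((a) < (b) ? (a) : (b))

int
main(void)
{
	unsigned long buflen = 600UL * 1024 * 1024;	/* after the first expansion */
	unsigned long len2 = 700UL * 1024 * 1024;	/* second, longer string */

	unsigned long naive = MAX(len2 + 1, buflen * 2);	/* 1.2GB: would error out */
	unsigned long clamped = MAX(len2 + 1, MIN(buflen * 2, MAX_ALLOC));

	printf("naive: %lu, clamped: %lu, limit: %lu\n", naive, clamped, MAX_ALLOC);
	return 0;
}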
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Quick doc fix

2014-08-07 Thread Guillaume Lelarge
Hi,

Still translating the 9.4 manual, and found another typo. Patch attached.

Thanks.


-- 
Guillaume.
  http://blog.guillaume.lelarge.info
  http://www.dalibo.com
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index d3fcb82..cf174f0 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -528,7 +528,7 @@
 the <structfield>relfrozenxid</> column of a table's
 <structname>pg_class</> row contains the freeze cutoff XID that was used
 by the last whole-table <command>VACUUM</> for that table.  All rows
-inserted by transactions with XIDs XIDs older than this cutoff XID are
+inserted by transactions with XIDs older than this cutoff XID are
 guaranteed to have been frozen.  Similarly,
 the <structfield>datfrozenxid</> column of a database's
 <structname>pg_database</> row is a lower bound on the unfrozen XIDs

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A worst case for qsort

2014-08-07 Thread Peter Geoghegan
On Thu, Aug 7, 2014 at 12:06 PM, Peter Geoghegan p...@heroku.com wrote:
 I think that pre-sorted, all-unique text datums, that have all
 differences beyond the first 8 bytes, that the user happens to
 actually want to sort are fairly rare.

Actually, you could use that case to justify not doing strxfrm()
transformation of very large lists in the more general case, where
strxfrm() blobs aren't truncated/abbreviated, but our qsort() is used,
which goes against the strong recommendation of, just to pick an
example at random, the glibc documentation. It states: "Here is an
example of how you can use strxfrm when you plan to do many
comparisons. It does the same thing as the previous example, but much
faster, because it has to transform each string only once, no matter
how many times it is compared with other strings. Even the time needed
to allocate and free storage is much less than the time we save, when
there are many strings." [1]

Sure, that would still be better than the equivalent abbreviated keys
case, since no strcoll() tie-breakers are required, but it would still
be bad, and all because of our pre-check for sorted input.

[1] https://www.gnu.org/software/libc/manual/html_node/Collation-Functions.html
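
A minimal sketch of that transform-once pattern, tied to the kind of URL
data discussed in this thread (error handling omitted; a real caller must
check strxfrm()'s return value and grow the buffers as needed):

#include <locale.h>
#include <stdio.h>
#include <string.h>

int
main(void)
{
	const char *a = "http://www.example.com/aardvark";
	const char *b = "http://www.example.com/zebra";
	char		xa[512];
	char		xb[512];

	setlocale(LC_COLLATE, "");	/* collate per the environment's locale */

	/* transform each string once ... */
	strxfrm(xa, a, sizeof(xa));
	strxfrm(xb, b, sizeof(xb));

	/*
	 * ... then every later comparison is a plain strcmp() on the blobs,
	 * which orders them the same way strcoll() orders the originals.
	 */
	printf("strcoll sign: %d, strcmp-on-blobs sign: %d\n",
		   (strcoll(a, b) > 0) - (strcoll(a, b) < 0),
		   (strcmp(xa, xb) > 0) - (strcmp(xa, xb) < 0));
	return 0;
}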
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A worst case for qsort

2014-08-07 Thread Rod Taylor
On Thu, Aug 7, 2014 at 3:06 PM, Peter Geoghegan p...@heroku.com wrote:

 I think that pre-sorted, all-unique text datums, that have all
 differences beyond the first 8 bytes, that the user happens to
 actually want to sort are fairly rare.


While I'm sure it's not common, I've seen a couple of ten-million tuple
tables having a URL column as primary key where 98% of the entries begin
with 'http://www.'

So, that exact scenario is out there.


Re: [HACKERS] Fixed redundant i18n strings in json

2014-08-07 Thread David G Johnston
Tom Lane-2 wrote
 Robert Haas robertmh...@gmail.com writes:
 On Sat, Aug 2, 2014 at 9:15 AM, Daniele Varrazzo
 daniele.varra...@gmail.com wrote:
 I'd definitely replace /arg/argument/. Furthermore I'd avoid the form
 argument 1: something is wrong: the string is likely to end up in
 sentences with other colons so I'd rather use something is wrong at
 argument 1.
 
 Is the patch attached better?
 
 Looks OK to me.  I thought someone else might comment, but since no
 one has, committed.
 
 It looks to me like this is still wrong:
 
 if (nargs % 2 != 0)
 ereport(ERROR,
 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 -errmsg(invalid number or arguments: object must be
 matched key value pairs)));
 +errmsg(invalid number or arguments),
 +errhint(Object must be matched key value pairs.)));
 
 Surely that was meant to read invalid number OF arguments.  The errhint
 is only charitably described as English, as well.  I'd suggest something
 like Arguments of json_build_object() must be pairs of keys and values.
 --- but maybe someone else can phrase that better.

The user documentation is worth emulating here:

http://www.postgresql.org/docs/9.4/interactive/functions-json.html


errmsg("argument count must be divisible by 2")
errhint("The argument list consists of alternating names and values")

Note that I s/keys/names/ to match said documentation.

David J.





--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Fixed-redundant-i18n-strings-in-json-tp5813330p5814122.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A worst case for qsort

2014-08-07 Thread Peter Geoghegan
On Thu, Aug 7, 2014 at 2:23 PM, Rod Taylor rod.tay...@gmail.com wrote:
 While I'm sure it's not common, I've seen a couple of ten-million tuple
 tables having a URL column as primary key where 98% of the entries begin
 with 'http://www.'

 So, that exact scenario is out there.

Sure, that scenario is relatively common. My point was that that
scenario, alongside a perfect logical/physical heap correlation, and
alongside a frequent need to sort is uncommon. If you're able to get
away with a bubble sort best case there, then the sort is going to
be extremely fast relative to any other database system anyway, and
you don't have too much to complain about. It's exactly the same
wasted cost as in the general case (where the tuples aren't in order),
except that it looks more expensive next to your unrealistically fast
sort that you were very lucky to be able to get away with.

I think the special pre-check for already sorted tuples within qsort()
is at best only justified by scenarios like where a serial primary key
index is reindexed. That's about it.

You can totally alter the outcome of this case by inserting a single
tuple, so the top-level pre-check fails.
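
A quick way to see that, sketched in SQL (table and names are made up, and an
in-memory sort is assumed; the point is only that one physically out-of-place
row defeats the top-level already-sorted check):

CREATE TABLE urls (url text);
INSERT INTO urls
	SELECT 'http://www.example.com/page' || i
	FROM generate_series(1, 100000) i
	ORDER BY 1;						-- loaded in already-sorted order
INSERT INTO urls VALUES ('http://www.aaa.example.com/');	-- one extra row

SELECT * FROM urls ORDER BY url;	-- no longer a perfectly presorted input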

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A worst case for qsort

2014-08-07 Thread Rod Taylor
Sigh. Found another example.

A table with 15 million entries and a unique key on filesystem location for
things users created via a web interface.

Entries all begin with /usr/home/ ...

This one is frequently sorted as batch operations against the files are
performed in alphabetical order to reduce conflict issues that a random
ordering may cause between jobs.

regards,

Rod




On Thu, Aug 7, 2014 at 5:23 PM, Rod Taylor rod.tay...@gmail.com wrote:


 On Thu, Aug 7, 2014 at 3:06 PM, Peter Geoghegan p...@heroku.com wrote:

 I think that pre-sorted, all-unique text datums, that have all
 differences beyond the first 8 bytes, that the user happens to
 actually want to sort are fairly rare.


 While I'm sure it's not common, I've seen a couple of ten-million tuple
 tables having a URL column as primary key where 98% of the entries begin
 with 'http://www.'

 So, that exact scenario is out there.



Re: [HACKERS] A worst case for qsort

2014-08-07 Thread Peter Geoghegan
On Thu, Aug 7, 2014 at 2:34 PM, Rod Taylor rod.tay...@gmail.com wrote:
 This one is frequently sorted as batch operations against the files are
 performed in alphabetical order to reduce conflict issues that a random
 ordering may cause between jobs.

Sure. There are cases out there. But, again, I have a hard time
imagining why you'd expect those to be pre-sorted in practice, and
particularly why you'd feel justified in expecting that to sort much
faster than equivalent though slightly imperfectly correlated data.
Without that, the fmgr-elision aspect of sort support appears to offer
enough for us to still win on balance [1], assuming 9.4 is our basis
of comparison.

[1] 
http://www.postgresql.org/message-id/cam3swzqhjxiyrsqbs5w3u-vtj_jt2hp8o02big5wyb4m9lp...@mail.gmail.com
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Append to a GUC parameter ?

2014-08-07 Thread Jerry Sievers
Alvaro Herrera alvhe...@2ndquadrant.com writes:

 Fabrízio de Royes Mello wrote:

 On Tue, Aug 5, 2014 at 10:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 
  Josh Berkus j...@agliodbs.com writes:
   BTW, while there's unlikely to be a good reason to put search_path in
   pg.conf with appends, there are a LOT of reasons to want to be able to
   append to it during a session.
 
  [shrug...]  You can do that today with current_setting()/set_config().
 
 With a very simple statement you can do that:

 Of course, this doesn't solve the requirement that started this thread,
 which is about having includeable pg.conf fragments to enable
 extensions.

ISTM the idea of a token in the value string that would expand to an
existing setting should be general purpose enough to allow for prepend/append
and not require adding a new operator such as += or whatever.

I say this without knowing just exactly what the implementation effort
is but just to reiterate the original intent.

I think someone already suggest this upthread.

shared_preload_libraries = '%,more_libs'
shared_preload_libraries = 'more_libs,%'

At conclusion of file processing, stripping off an unnecessary delimiter
at beginning or end of string would be a nice aesthetic feature and/or
might be required depending on whether an empty list value is legal.

Thanks

 -- 
 Álvaro Herrerahttp://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consult...@comcast.net
p: 312.241.7800


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Hokey wrong versions of libpq in apt.postgresql.org

2014-08-07 Thread Joshua D. Drake


Hello,

I know this has been brought up before:

http://www.postgresql.org/message-id/20140724080902.ga28...@msg.df7cb.de

But this is just plain wrong. I don't care that the FAQ (on the wiki) 
says we are doing it wrong for good reasons. When I (or anyone else) 
pulls postgresql-$version-dev, I want the libpq for my version. I do not 
want 9.3.


Yes, it should (because of protocol compatibility) work but it doesn't 
always (as stated in that email and in a similar problem we just ran into).


There can be unintended circumstances on machines when you mix and match 
like that. Can we please do some proper packaging on this?


Sincerely,

Joshua D. Drake
--
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, @cmdpromptinc
If we send our children to Caesar for their education, we should
 not be surprised when they come back as Romans.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostrgeSQL vs oracle doing 1 million sqrts am I doing it wrong?

2014-08-07 Thread Tom Lane
James Cloos cl...@jhcloos.com writes:
 ST == Shaun Thomas stho...@optionshouse.com writes:
 ST That said, the documentation here says FLOAT4 is an alias for REAL,
 ST so it's somewhat nonintuitive for FLOAT4 to be so much slower than
 ST FLOAT8, which is an alias for DOUBLE PRECISION.

 There are some versions of glibc where doing certain math on double is
 faster than doing it on float, depending on how things are compiled.
 Maybe this is one of them?

No, it isn't.  The problem here is that the result of SQRT() is
float8 (a/k/a double precision) while the variable that it is to
be assigned to is float4 (a/k/a real).  As was noted upthread,
changing the variable's declared type to eliminate the run-time
type coercion removes just about all the discrepancy between PG
and Oracle runtimes.  The original comparison is not apples-to-apples
because the Oracle coding required no type coercions.  (Or at least,
so I assume; I'm not too familiar with Oracle's math functions.)

plpgsql is not efficient at all about coercions performed as a side
effect of assignments; if memory serves, it always handles them by
converting to text and back.  So basically the added cost here came
from float8out() and float4in().  There has been some talk of trying
to do such coercions via SQL casts, but nothing's been done for fear
of compatibility problems.
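
For illustration, a pair of functions in the spirit of the benchmark being
discussed (the names are invented here); the only difference is the declared
type of the variable receiving sqrt()'s float8 result:

CREATE FUNCTION slow_sqrts() RETURNS void AS $$
DECLARE
	v float4;	-- float8 -> float4 coercion on every assignment
BEGIN
	FOR i IN 1..1000000 LOOP
		v := sqrt(i);
	END LOOP;
END;
$$ LANGUAGE plpgsql;

CREATE FUNCTION fast_sqrts() RETURNS void AS $$
DECLARE
	v float8;	-- matches sqrt()'s result type, so no coercion
BEGIN
	FOR i IN 1..1000000 LOOP
		v := sqrt(i);
	END LOOP;
END;
$$ LANGUAGE plpgsql;

Timing the two with \timing should show the gap described above.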

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostrgeSQL vs oracle doing 1 million sqrts am I doing it wrong?

2014-08-07 Thread Josh Berkus
On 08/07/2014 04:48 PM, Tom Lane wrote:
 plpgsql is not efficient at all about coercions performed as a side
 effect of assignments; if memory serves, it always handles them by
 converting to text and back.  So basically the added cost here came
 from float8out() and float4in().  There has been some talk of trying
 to do such coercions via SQL casts, but nothing's been done for fear
 of compatibility problems.

Yeah, that's a weeks-long project for someone.  And would require a lot
of tests ...

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Quick doc fix

2014-08-07 Thread Tom Lane
Guillaume Lelarge guilla...@lelarge.info writes:
 Still translating the 9.4 manual, and found another typo. Patch attached.

Applied, thanks!

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minmax indexes

2014-08-07 Thread Josh Berkus
On 08/07/2014 08:38 AM, Oleg Bartunov wrote:
 +1 for BRIN !
 
 On Thu, Aug 7, 2014 at 6:16 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On 7 August 2014 14:53, Robert Haas robertmh...@gmail.com wrote:
 A better description would be block range index since we are
 indexing a range of blocks (not just one block). Perhaps a better one
 would be simply range index, which we could abbreviate to RIN or
 BRIN.

How about Block Range Dynamic indexes?

Or Range Usage Metadata indexes?

You see what I'm getting at:

BRanDy

RUM

... to keep with our new indexes naming scheme ...


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reporting the commit LSN at commit time

2014-08-07 Thread Craig Ringer
On 08/08/2014 03:54 AM, Tom Lane wrote:
 Craig Ringer cr...@2ndquadrant.com writes:
 Hi all
 To support transparent client-side failover in BDR, it's necessary to
 know what the LSN of a node was at the time a transaction committed and
 keep track of that in the client/proxy.
 
 I'm thinking about adding a new message type in the protocol that gets
 sent immediately before CommandComplete, containing the LSN of the
 commit. Clients would need to enable the sending of this message with a
 GUC that they set when they connect, so it doesn't confuse clients that
 aren't expecting it or aware of it.
 
 FWIW, I think it's a seriously bad idea to expose LSNs in the protocol
 at all.   What happens five years from now when we switch to some other
 implementation that doesn't have LSNs?

Everyone who's relying on them already via pg_stat_replication, etc, breaks.

They're _already_ exposed to users. That ship has sailed.

That's not to say I'm stuck to LSNs as the way to do this, just that I
don't think that particular argument is relevant.

 I don't mind if you expose some other way to inquire about LSNs, but
 let's *not* embed it into the wire protocol.  Not even as an option.

Well, the underlying need here is to have the client able to keep track
of what point in the server's time-line it must see on a replica before
it proceeds to use that replica.

So if the client does some work on server A, then for some reason needs
to / must use server B, it can ask server B "are you replayed up to the
last transaction I performed on server A yet?" and wait until it is.

That's useful for streaming replication (to permit consistent queries
against read replicas) but it's much more so for BDR, where it's
necessary to avoid a variety of multi-master replication anomalies and
conflicts.

I considered LSNs to be the logical mechanism for this as they're
already user-visible, exposed in pg_stat_replication, they can already
be used for just this purpose by hand (just with an extra round-trip), etc.
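
For the streaming-replication case that already looks something like this
(the LSN literal below is only a placeholder for the value captured from
server A):

-- on server A, right after the commit (the extra round-trip):
SELECT pg_current_xlog_location();

-- on server B, poll until this returns true:
SELECT pg_xlog_location_diff(pg_last_xlog_replay_location(),
                             '0/3000060') >= 0 AS caught_up;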

An obvious alternative is to merge the commit timestamp work, then
expose the timestamp of the last commit replayed in pg_stat_replication.
Then all the client needs to keep track of is the server time of the
last commit.

 This position also obviates the need to complain about having a GUC
 that changes the protocol-level behavior, which is also a seriously
 bad idea.

Well, I'd prefer to be able to negotiate with the server and establish
requirements during the protocol handshake.

As far as I know there isn't an explicit protocol negotiation with
capabilities fields (just a plain version field), but we do have the
startup packet's 'options' field. So I was thinking that requesting the
setting of a PGC_BACKEND GUC in the startup packet would be a logical
way for the client to request use of a protocol extension.
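
For example, a client can already set a GUC at connection start through
libpq's options parameter, which travels in the startup packet; a new
commit-reporting GUC could be requested the same way ("geqo" below is just a
stand-in existing setting):

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
	PGconn	   *conn = PQconnectdb("dbname=postgres options='-c geqo=off'");

	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
		PQfinish(conn);
		return 1;
	}
	PQfinish(conn);
	return 0;
}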

Looking at ProcessStartupPacket(...) in postmaster.c I see that there's
room for special-casing options. Do you think it would be more
appropriate to add a new connection option that's sent by a client to
request reporting of commit timestamps / LSNs / whatever by the server
at commit time?

If not, do you have an alternative suggestion? I can't imagine that
extending the CommandComplete message is a desirable option.

It seems like it'd be useful to expose this as a read-only GUC anyway,
so I don't really see why a PGC_BACKEND GUC isn't exactly the right
thing to use for this, but I'm happy to listen to suggestions.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minmax indexes

2014-08-07 Thread Michael Paquier
On Fri, Aug 8, 2014 at 9:47 AM, Josh Berkus j...@agliodbs.com wrote:
 On 08/07/2014 08:38 AM, Oleg Bartunov wrote:
 +1 for BRIN !

 On Thu, Aug 7, 2014 at 6:16 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On 7 August 2014 14:53, Robert Haas robertmh...@gmail.com wrote:
 A better description would be block range index since we are
 indexing a range of blocks (not just one block). Perhaps a better one
 would be simply range index, which we could abbreviate to RIN or
 BRIN.

 How about Block Range Dynamic indexes?

 Or Range Usage Metadata indexes?

 You see what I'm getting at:

 BRanDy

 RUM

 ... to keep with our new indexes naming scheme ...
Not the best fit for kids, fine for grad students.

BRIN seems to be a perfect consensus, so +1 for it.
-- 
Michael




Re: [HACKERS] Fixed redundant i18n strings in json

2014-08-07 Thread Tom Lane
David G Johnston david.g.johns...@gmail.com writes:
 Tom Lane-2 wrote
 Surely that was meant to read invalid number OF arguments.  The errhint
 is only charitably described as English, as well.  I'd suggest something
 like Arguments of json_build_object() must be pairs of keys and values.
 --- but maybe someone else can phrase that better.

 The user documentation is worth emulating here:
 http://www.postgresql.org/docs/9.4/interactive/functions-json.html

 errmsg("argument count must be divisible by 2")
 errhint("The argument list consists of alternating names and values")

Seems reasonable to me.
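
(For illustration, the call shape these messages are meant to cover is an
odd number of arguments, e.g.

SELECT json_build_object('a', 1, 'b');

which trips the error whose exact wording we're settling here.)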

 Note that I s/keys/names/ to match said documentation.

Hm.  The docs aren't too consistent either: there are several other nearby
places that say keys.  Notably, the functions json[b]_object_keys() have
that usage embedded in their names, where we can't readily change it.

I'm inclined to think we should s/names/keys/ in the docs instead.
Thoughts?

regards, tom lane




Re: [HACKERS] Minmax indexes

2014-08-07 Thread Peter Geoghegan
On Thu, Aug 7, 2014 at 7:58 AM, Robert Haas robertmh...@gmail.com wrote:
 range index might get confused with range types; block range index
 seems better.  I like summary, but I'm fine with block range index or
 block filter index, too.

+1


-- 
Peter Geoghegan




Re: [HACKERS] Reporting the commit LSN at commit time

2014-08-07 Thread Tom Lane
Craig Ringer cr...@2ndquadrant.com writes:
 On 08/08/2014 03:54 AM, Tom Lane wrote:
 FWIW, I think it's a seriously bad idea to expose LSNs in the protocol
 at all.   What happens five years from now when we switch to some other
 implementation that doesn't have LSNs?

 Everyone who's relying on them already via pg_stat_replication, etc, breaks.
 They're _already_ exposed to users. That ship has sailed.

They're exposed to replication tools, yeah, but embedding them in the
wire protocol would be moving the goalposts a long way past that.  As an
example of something that doubtless seemed like a good idea at the time,
consider the business about how an INSERT command completion tag includes
the OID of the inserted row.  We're stuck with that obsolete idea
*forever* because it's embedded in the protocol for all clients.
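
(For anyone who hasn't looked at it in a while, that tag is still visible
from any client.  A quick illustration, with a made-up OID:

CREATE TABLE oid_demo (x int) WITH OIDS;
INSERT INTO oid_demo VALUES (1);
-- psql reports a tag along the lines of "INSERT 16412 1", where the
-- middle number is the new row's OID; it is 0 for tables without OIDs.)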

 This position also obviates the need to complain about having a GUC
 that changes the protocol-level behavior, which is also a seriously
 bad idea.

 Well, I'd prefer to be able to negotiate with the server and establish
 requirements during the protocol handshake.

Sure, but a GUC is not that.  The problem with a GUC for changing
wire-level behavior is that it might be set by code far removed from
the wire, possibly breaking lower code levels that expected different
behavior.  The multitude of ways that we offer for setting GUCs is
an active evil in this particular context.

regards, tom lane




Re: [HACKERS] Reporting the commit LSN at commit time

2014-08-07 Thread Craig Ringer
On 08/07/2014 11:42 PM, Robert Haas wrote:
 I doubt whether it makes sense to do this without a broader
 understanding of how the client-side failover mechanism would work.
 If we're going to add something like this, it should include libpq
 support for actually doing something useful with it.

I'm currently interested in targeting PgBouncer and PgJDBC, not libpq,
though I can see that exposing helpers for it in libpq could be useful.

The goal here is to permit a client to safely switch from one server to
another - either in a multimaster async replication system like BDR, or
routing read-only queries to hot standbys with streaming replication -
and know for sure that its last commit is visible on the server it is
now connected to.

For hot standby that means it can avoid running queries that won't see
the latest work it did if the standby is lagging - deciding to run them
on the upstream instead, or wait, as appropriate.

For BDR it'll permit the client to safely perform transparent failover
to another node and resume write operations without risking conflicts
with its own prior transactions. (I wrote some explanations about this
on -general in the thread here:
http://www.postgresql.org/message-id/84184aef-887d-49df-8f47-6377b1d6e...@gmail.com
).


Broadly, what I'm thinking of is:

* Whenever a client issues a transaction that gets a txid assigned, and
that tx commits, the server reports the LSN that includes the commit.

* The client keeps track of which server it is connected to using the
server's (sysid, timelineid, databaseoid) or a similar identifier -
probably specific to the replication protocol in use, unless something
generic proves practical.

* When the client has to switch to a new server or chooses to do so, it
checks pg_stat_replication or pg_replication_slots, finds the server it
was previously connected to, and checks to see if the new server has
replayed up to the last write transaction this client performed on the
previous server. If not, it can make a policy-driven decision: wait
until replay catchup, wait for a while then bail out, etc.

This is admittedly all a bit hand-wavey. I'm looking at ways to do it,
not a firm implementation plan.
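
As a very rough sketch of the "wait until replay catchup" step for the
physical-replication case (a hypothetical helper created on the standby;
the name and polling interval are made up):

CREATE OR REPLACE FUNCTION wait_for_replay(target pg_lsn, timeout interval)
RETURNS boolean LANGUAGE plpgsql AS $$
DECLARE
    deadline timestamptz := clock_timestamp() + timeout;
BEGIN
    -- Poll until replay reaches the LSN the client remembered from its last
    -- commit on the old server, or give up at the deadline and let the
    -- caller apply whatever policy it likes (wait longer, bail out, ...).
    WHILE pg_last_xlog_replay_location() < target LOOP
        IF clock_timestamp() > deadline THEN
            RETURN false;
        END IF;
        PERFORM pg_sleep(0.05);
    END LOOP;
    RETURN true;
END;
$$;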

Notably, the LSN (and timelineID) aren't the only way to keep track of
the replay progress of a server and check it from another server. If the
commit timestamps work is merged and the timestamp of the last replayed
commit record is exposed in pg_replication_slots, the client could use
the server-reported commit timestamp to the same effect.


In the above you'll note that the client has to make some choices. The
client might be picking different servers for failover, read load
spreading, or other things I haven't thought of. It might be retaining
the old connection and making new ones it wants to be consistent up to a
certain point on the old connection (read spreading), or it might be
dropping the old connection and making a new one (failover). If the new
server to connect to isn't caught up yet it might want to wait
indefinitely, wait a short while, or bail out immediately and try a
different server. There's a lot of client/application specific policy
going to be involved here, so I'm not sure it makes sense to try to make
it transparent in libpq. I can see it being useful to expose some tools
in libpq for it, without a doubt, so clients can do these sorts of
things usefully.

(There's also another whole new question: how do you pick which
alternative server to connect to? But that's not really within the scope
of this.)

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Fixed redundant i18n strings in json

2014-08-07 Thread David Johnston
On Thu, Aug 7, 2014 at 5:53 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 David G Johnston david.g.johns...@gmail.com writes:
  Tom Lane-2 wrote
  Surely that was meant to read invalid number OF arguments.  The
 errhint
  is only charitably described as English, as well.  I'd suggest something
  like Arguments of json_build_object() must be pairs of keys and
 values.
  --- but maybe someone else can phrase that better.

  The user documentation is worth emulating here:
  http://www.postgresql.org/docs/9.4/interactive/functions-json.html

  errmsg("argument count must be divisible by 2")
  errhint("The argument list consists of alternating names and values")

 Seems reasonable to me.

  Note that I s/keys/names/ to match said documentation.

 Hm.  The docs aren't too consistent either: there are several other nearby
 places that say keys.  Notably, the functions json[b]_object_keys() have
 that usage embedded in their names, where we can't readily change it.

 I'm inclined to think we should s/names/keys/ in the docs instead.
 Thoughts?


Agreed - have the docs match the common API term usage in our
implementation.

Not sure it's worth a thorough hunt, but at least fix them as they are
noticed.

David J.


Re: [HACKERS] Reporting the commit LSN at commit time

2014-08-07 Thread Craig Ringer
On 08/08/2014 09:02 AM, Tom Lane wrote:
 Craig Ringer cr...@2ndquadrant.com writes:
 On 08/08/2014 03:54 AM, Tom Lane wrote:
 FWIW, I think it's a seriously bad idea to expose LSNs in the protocol
 at all.   What happens five years from now when we switch to some other
 implementation that doesn't have LSNs?
 
 Everyone who's relying on them already via pg_stat_replication, etc, breaks.
 They're _already_ exposed to users. That ship has sailed.
 
 They're exposed to replication tools, yeah, but embedding them in the
 wire protocol would be moving the goalposts a long way past that.  As an
 example of something that doubtless seemed like a good idea at the time,
 consider the business about how an INSERT command completion tag includes
 the OID of the inserted row.  We're stuck with that obsolete idea
 *forever* because it's embedded in the protocol for all clients.

That makes sense.


 Well, I'd prefer to be able to negotiate with the server and establish
 requirements during the protocol handshake.
 
 Sure, but a GUC is not that.  The problem with a GUC for changing
 wire-level behavior is that it might be set by code far removed from
 the wire, possibly breaking lower code levels that expected different
 behavior.  The multitude of ways that we offer for setting GUCs is
 an active evil in this particular context.

I'd value your input and suggestions then.

I thought this is what PGC_BACKEND GUCs were for - set them in the
startup packet and you can't mess with them afterwards. I can see that
the ability of someone to cause that to be set in (e.g.) PGOPTIONS could
be a problem though.

AFAIK we don't _have_ a fancy negotiation system in the protocol, with
back-and-forth exchanges of capabilities information.

So is the appropriate thing to do here to set a non-GUC 'options' field
in the startup packet?

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Minmax indexes

2014-08-07 Thread Josh Berkus
On 08/07/2014 05:52 PM, Michael Paquier wrote:
 On Fri, Aug 8, 2014 at 9:47 AM, Josh Berkus j...@agliodbs.com wrote:
 On 08/07/2014 08:38 AM, Oleg Bartunov wrote:
 +1 for BRIN !

 On Thu, Aug 7, 2014 at 6:16 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On 7 August 2014 14:53, Robert Haas robertmh...@gmail.com wrote:
 A better description would be block range index since we are
 indexing a range of blocks (not just one block). Perhaps a better one
 would be simply range index, which we could abbreviate to RIN or
 BRIN.

 How about Block Range Dynamic indexes?

 Or Range Usage Metadata indexes?

 You see what I'm getting at:

 BRanDy

 RUM

 ... to keep with our new indexes naming scheme ...
 Not the best fit for kids, fine for grad students.

But, it goes perfectly with our GIN and VODKA indexes.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Reporting the commit LSN at commit time

2014-08-07 Thread Fujii Masao
On Fri, Aug 8, 2014 at 9:50 AM, Craig Ringer cr...@2ndquadrant.com wrote:
 On 08/08/2014 03:54 AM, Tom Lane wrote:
 Craig Ringer cr...@2ndquadrant.com writes:
 Hi all
 To support transparent client-side failover in BDR, it's necessary to
 know what the LSN of a node was at the time a transaction committed and
 keep track of that in the client/proxy.

 I'm thinking about adding a new message type in the protocol that gets
 sent immediately before CommandComplete, containing the LSN of the
 commit. Clients would need to enable the sending of this message with a
 GUC that they set when they connect, so it doesn't confuse clients that
 aren't expecting it or aware of it.

 FWIW, I think it's a seriously bad idea to expose LSNs in the protocol
 at all.   What happens five years from now when we switch to some other
 implementation that doesn't have LSNs?

 Everyone who's relying on them already via pg_stat_replication, etc, breaks.

 They're _already_ exposed to users. That ship has sailed.

 That's not to say I'm stuck to LSNs as the way to do this, just that I
 don't think that particular argument is relevant.

 I don't mind if you expose some other way to inquire about LSNs, but
 let's *not* embed it into the wire protocol.  Not even as an option.

 Well, the underlying need here is to have the client able to keep track
 of what point in the server's time-line it must see on a replica before
 it proceeds to use that replica.

 So if the client does some work on server A, then for some reason needs
 to / must use server B, it can ask server B "are you replayed up to the
 last transaction I performed on server A yet?" and wait until it is.

ISTM that the proper solution to that problem is the introduction of a
new synchronous replication mode which makes the transaction wait for
its WAL to be replayed by the standby. If this mode is used, a client
doesn't need to track the LSN of each transaction and check whether
the committed transaction has already been replayed by the standby or not.

Regards,

-- 
Fujii Masao




Re: [HACKERS] Reporting the commit LSN at commit time

2014-08-07 Thread Craig Ringer
On 08/08/2014 10:58 AM, Fujii Masao wrote:
 ISTM that the proper solution to that problem is the introduction of
 new synchronous replication mode which makes the transaction wait for
 its WAL to be replayed by the standby. If this mode is used, a client
 doesn't need to track the LSN of each transaction and check whether
 the committed transaction has already replayed by the standby or not.

I'm not convinced of that.

That pushes the penalty onto the writer - which now has to wait until
replicas catch up. It has to pay this for every commit, even if actually
failing over to another node is unlikely.

It'd be better to just enable sync rep instead, or it would if we had
all-nodes sync rep.

IMO any waiting needs to be done on the other side, i.e. "wait until I
am caught up before proceeding" rather than "wait for the other end to
catch up before returning".

Doing it the way you describe would make it nearly useless for enabling
client-side failover in BDR, where half the point is that it can deal
with high latency or intermittently available links to downstream replicas.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Reporting the commit LSN at commit time

2014-08-07 Thread Michael Paquier
On Fri, Aug 8, 2014 at 11:58 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Fri, Aug 8, 2014 at 9:50 AM, Craig Ringer cr...@2ndquadrant.com wrote:
 ISTM that the proper solution to that problem is the introduction of
 new synchronous replication mode which makes the transaction wait for
 its WAL to be replayed by the standby. If this mode is used, a client
 doesn't need to track the LSN of each transaction and check whether
 the committed transaction has already replayed by the standby or not.
Don't you need to combine that with the possibility to wait for N
targets instead of 1 in synchronous_standby_names? You may want to be
sure that the commit is done on a series of standbys before
performing any further operations after this transaction commits.
-- 
Michael




[HACKERS] jsonb format is pessimal for toast compression

2014-08-07 Thread Tom Lane
I looked into the issue reported in bug #11109.  The problem appears to be
that jsonb's on-disk format is designed in such a way that the leading
portion of any JSON array or object will be fairly incompressible, because
it consists mostly of a strictly-increasing series of integer offsets.
This interacts poorly with the code in pglz_compress() that gives up if
it's found nothing compressible in the first first_success_by bytes of a
value-to-be-compressed.  (first_success_by is 1024 in the default set of
compression parameters.)

As an example, here's gdb's report of the bitwise representation of the
example JSON value in the bug thread:

0x2ab85ac:  0x2005  0x0004  0x50003098  0x309f
0x2ab85bc:  0x30ae  0x30b8  0x30cf  0x30da
0x2ab85cc:  0x30df  0x30ee  0x3105  0x6b6e756a
0x2ab85dc:  0x40de  0x0034  0x0068  0x009c
0x2ab85ec:  0x00d0  0x0104  0x0138  0x016c
0x2ab85fc:  0x01a0  0x01d4  0x0208  0x023c
0x2ab860c:  0x0270  0x02a4  0x02d8  0x030c
0x2ab861c:  0x0340  0x0374  0x03a8  0x03dc
0x2ab862c:  0x0410  0x0444  0x0478  0x04ac
0x2ab863c:  0x04e0  0x0514  0x0548  0x057c
0x2ab864c:  0x05b0  0x05e4  0x0618  0x064c
0x2ab865c:  0x0680  0x06b4  0x06e8  0x071c
0x2ab866c:  0x0750  0x0784  0x07b8  0x07ec
0x2ab867c:  0x0820  0x0854  0x0888  0x08bc
0x2ab868c:  0x08f0  0x0924  0x0958  0x098c
0x2ab869c:  0x09c0  0x09f4  0x0a28  0x0a5c
0x2ab86ac:  0x0a90  0x0ac4  0x0af8  0x0b2c
0x2ab86bc:  0x0b60  0x0b94  0x0bc8  0x0bfc
0x2ab86cc:  0x0c30  0x0c64  0x0c98  0x0ccc
0x2ab86dc:  0x0d00  0x0d34  0x0d68  0x0d9c
0x2ab86ec:  0x0dd0  0x0e04  0x0e38  0x0e6c
0x2ab86fc:  0x0ea0  0x0ed4  0x0f08  0x0f3c
0x2ab870c:  0x0f70  0x0fa4  0x0fd8  0x100c
0x2ab871c:  0x1040  0x1074  0x10a8  0x10dc
0x2ab872c:  0x1110  0x1144  0x1178  0x11ac
0x2ab873c:  0x11e0  0x1214  0x1248  0x127c
0x2ab874c:  0x12b0  0x12e4  0x1318  0x134c
0x2ab875c:  0x1380  0x13b4  0x13e8  0x141c
0x2ab876c:  0x1450  0x1484  0x14b8  0x14ec
0x2ab877c:  0x1520  0x1554  0x1588  0x15bc
0x2ab878c:  0x15f0  0x1624  0x1658  0x168c
0x2ab879c:  0x16c0  0x16f4  0x1728  0x175c
0x2ab87ac:  0x1790  0x17c4  0x17f8  0x182c
0x2ab87bc:  0x1860  0x1894  0x18c8  0x18fc
0x2ab87cc:  0x1930  0x1964  0x1998  0x19cc
0x2ab87dc:  0x1a00  0x1a34  0x1a68  0x1a9c
0x2ab87ec:  0x1ad0  0x1b04  0x1b38  0x1b6c
0x2ab87fc:  0x1ba0  0x1bd4  0x1c08  0x1c3c
0x2ab880c:  0x1c70  0x1ca4  0x1cd8  0x1d0c
0x2ab881c:  0x1d40  0x1d74  0x1da8  0x1ddc
0x2ab882c:  0x1e10  0x1e44  0x1e78  0x1eac
0x2ab883c:  0x1ee0  0x1f14  0x1f48  0x1f7c
0x2ab884c:  0x1fb0  0x1fe4  0x2018  0x204c
0x2ab885c:  0x2080  0x20b4  0x20e8  0x211c
0x2ab886c:  0x2150  0x2184  0x21b8  0x21ec
0x2ab887c:  0x2220  0x2254  0x2288  0x22bc
0x2ab888c:  0x22f0  0x2324  0x2358  0x238c
0x2ab889c:  0x23c0  0x23f4  0x2428  0x245c
0x2ab88ac:  0x2490  0x24c4  0x24f8  0x252c
0x2ab88bc:  0x2560  0x2594  0x25c8  0x25fc
0x2ab88cc:  0x2630  0x2664  0x2698  0x26cc
0x2ab88dc:  0x2700  0x2734  0x2768  0x279c
0x2ab88ec:  0x27d0  0x2804  0x2838  0x286c
0x2ab88fc:  0x28a0  0x28d4  0x2908  0x293c
0x2ab890c:  0x2970  0x29a4  0x29d8  0x2a0c
0x2ab891c:  0x2a40  0x2a74  0x2aa8  0x2adc
0x2ab892c:  0x2b10  0x2b44  0x2b78  0x2bac
0x2ab893c:  0x2be0  0x2c14  0x2c48  0x2c7c
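
The same effect can be observed from SQL without gdb.  A rough
reproduction sketch (not the dataset from the bug report; exact numbers
vary with the build):

-- Build one wide object and compare on-disk sizes of json vs. jsonb.
-- The json (text) version compresses nicely; the jsonb version tends to
-- stay near its raw size because pglz gives up within the first kilobyte,
-- which is occupied almost entirely by the increasing offset array.
CREATE TABLE jsonb_compress_test (jb jsonb, js json);

INSERT INTO jsonb_compress_test
SELECT v::jsonb, v::json
FROM (SELECT '{' || string_agg(format('"k%s": %s', i, i), ', ') || '}' AS v
      FROM generate_series(1, 10000) i) s;

SELECT pg_column_size(jb) AS jsonb_bytes, pg_column_size(js) AS json_bytes
FROM jsonb_compress_test;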

Re: [HACKERS] Reporting the commit LSN at commit time

2014-08-07 Thread Craig Ringer
On 08/08/2014 09:51 AM, Tom Lane wrote:

 AFAIK we don't _have_ a fancy negotiation system in the protocol, with
 back-and-forth exchanges of capabilities information.
 
 Maybe it's time to invent that.  It would be positively foolish to
 create any such behavior without a protocol version bump anyway.

I was hoping it'd be easier to sneak a new message type in without a
full protocol bump. As you can imagine that makes it a ... rather larger
job.

Still, if it's unsafe to do it that way...

 Although as I said, I don't think embedding knowledge of LSNs at the
 protocol level is a good thing to begin with. 

As I said upthread, it need not necessarily be an LSN. A commit
timestamp would do the job too, if information about the last-replayed
commit timestamp was accessible on the downstream node.

It needs to be a sequence identifier that can be matched against
pg_replication_slots / pg_stat_replication or passed to a function on
the downstream end to say "wait until we're replayed to this point".

For streaming replication there's only one upstream, so there's no need
to identify it. For BDR you'd also have to identify the upstream node of
interest - probably by slot ID, or by (sysid, tlid, dboid) tuple.

In the end, it can be an opaque magic cookie. It really doesn't matter,
so long as what the client receives is a value it can pass to another Pg
instance and say "wait until you've replayed up to this, please" or
"have you replayed up to this yet?".

 Is it really necessary that this information be pushed to the client
on every commit, as opposed to the client asking for it occasionally?

I think so, yes, though I'd be glad to be proved wrong.

For the purpose of transparent failover (BDR) at least, the server
currently being written to can go away at any moment, and you should
know exactly what you're up to in order to make it safe to continue on
another server.



Consider, for a multi-master configuration where two servers replicate
to each other:

On a connection to server1:

INSERT INTO bird(id, parrot)
VALUES (1, 'African Grey');

[client grabs magic cookie for server replay state]

INSERT INTO bird(id, parrot)
VALUES (2, 'Lorikkeet');

[server1 sends changes to server2, which is behind on replay
 and still working on catching up]

[server1 dies abruptly]

[client drops connection to server1, connects to server2]

-- Correct spelling
UPDATE bird
SET parrot = 'Lorikeet'
WHERE id = 2;


If the INSERT from server1 hasn't replayed on server2 yet this will
fail. Other anomalies can be worse and cause lost updates, etc.

To protect against this the client needs a way to wait, after connecting
to server2, until it's caught up with the state of server1. That's what
I'm talking about here. In this case, if you used a periodic progress
indicator requested by the client, you'd still get the same error,
because you'd wait until the first INSERT but not the second.

So yes, the client needs this info at every commit.

That means that enabling client-side fail-over won't be free, especially
for lots of small transactions. It'll be cheaper if Pg can push the info
with the commit confirmation instead of the client having to request it
afterwards though.


(Note that the client risks waiting forever if server1 didn't send the
required commits before it died, but that's where application policy
decisions come in).



-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Introducing coarse grain parallelism by postgres_fdw.

2014-08-07 Thread Kyotaro HORIGUCHI
Hi, thank you for the comment.

 Hi Kyotaro,
 I looked at the patches and felt that the approach taken here is too
 intrusive, considering that the feature is only for foreign scans.

I agree with you that it's only for foreign scans, but I regard it
as an example of parallel execution planning.

 There are quite a few members added to the generic Path, Plan structures,
 whose use is induced only through foreign scans. Each path now stores
 two sets of costs, one with parallelism and one without. The parallel
 values will make sense only when there is a foreign scan, which uses
 parallelism, in the plan tree. So, those costs are maintained unnecessarily
 or the memory for those members is wasted in most of the cases, where the
 tables involved are not foreign. Also, not many foreign tables will be able
 to use the parallelism, e.g. file_fdw. Although, that's my opinion; I would
 like to hear from others.

I intended to discuss what estimation and planning for parallel
execution (not limited to foreign scans) would look like. A
background worker would be able to take on executing some portion
of the path tree in 'parallel'. The postgres_fdw changes in this
patch are simply one case of planning for parallel execution.
Although, as you see, it only chooses whether to go parallel for a
path that was constructed without regard to parallel execution,
since considering all the possible alternative parallel paths
would cost too much.

Limiting this discussion to parallel scans, the overall gain from
multiple simultaneous scans distributed in a path/plan tree isn't
known until cost counting has been done up to the root node (more
precisely, their common parent). This patch naively bucket-brigades
the parallel cost up to the root node, but there should be a
smarter way to shortcut that, for example simply picking up the
parallelizable nodes by scanning the completed path/plan tree,
calculating the probably-eliminable costs from them, and then
subtracting that from, or comparing it to, the total (non-parallel)
cost. That might be more acceptable to everyone than the current
implementation.

 Instead, an FDW which can use parallelism can add two paths one with and
 one without parallelism with appropriate costs and let the logic choosing
 the cheapest path take care of the actual choice. In fact, I thought,
 parallelism would be always faster than the non-parallel one, except when
 the foreign server is too much loaded. But we won't be able to check that
 anyway. Can you point out a case where the parallelism may not win over
 serial execution?

It always wins against serial execution if parallel execution can
be launched at no extra cost. But in practice it consumes extra
resources, so I thought parallel execution should be curbed when
the gain is small. That is what the two GUCs added by this patch
and choose_parallel_scans() do, although in a non-automated way.
The overloading issue is not confined to parallel execution, but it
will surely be more severe here since it is less visible to and
controllable by users. In the end, however, it would have to come
down to manual configuration anyway.

 BTW, the name "parallelism" seems to be misleading here. All it will be able
 to do is fire the queries (or data fetch requests) asynchronously. So, we
 might want to change the naming appropriately.

That is right as far as what I actually did to postgres_fdw is
concerned. But unless we allow all the intermediate tuples from
child execution nodes running in parallel to pile up in memory
without restriction, I suppose every 'parallel' execution is a kind
of this 'asynchronous' startup/fetching. As for postgres_fdw, it
would look more like 'parallel' (and perhaps be more efficient) if
it processed queries using libpq's single-row mode instead of a
cursor, but similar processing takes place at the system-call level
even in that case.


Well, I will try to make a version that does not carry parallel
costs in the plan/path structs and that uses single-row mode for
postgres_fdw. I hope that moves things forward.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




[HACKERS] Specifying the unit in storage parameter

2014-08-07 Thread Fujii Masao
Hi,

We can specify the unit when setting autovacuum_vacuum_cost_delay
GUC as follows.

ALTER SYSTEM SET autovacuum_vacuum_cost_delay TO '80ms';

OTOH we cannot specify the unit when setting autovacuum_vacuum_cost_delay
as storage parameter as follows.

CREATE TABLE test (col1 int) WITH (autovacuum_vacuum_cost_delay = '80ms');
ERROR:  invalid value for integer option "autovacuum_vacuum_cost_delay": 80ms

This is not user-friendly. I'd like to propose the attached patch, which
introduces the infrastructure allowing us to specify the unit when
setting an INTEGER storage parameter like autovacuum_vacuum_cost_delay.
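
With the patch, the intent is that, for example, the following would be
accepted (a sketch of the intended behaviour, not output from a test run):

CREATE TABLE test (col1 int) WITH (autovacuum_vacuum_cost_delay = '80ms');
ALTER TABLE test SET (autovacuum_vacuum_cost_delay = '100ms');
-- A bare integer keeps its current meaning, i.e. the parameter's native
-- unit (milliseconds here):
ALTER TABLE test SET (autovacuum_vacuum_cost_delay = 80);
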
Comment? Review?

Regards,

-- 
Fujii Masao
*** a/src/backend/access/common/reloptions.c
--- b/src/backend/access/common/reloptions.c
***
*** 105,111  static relopt_int intRelOpts[] =
  			Packs btree index pages only to this percentage,
  			RELOPT_KIND_BTREE
  		},
! 		BTREE_DEFAULT_FILLFACTOR, BTREE_MIN_FILLFACTOR, 100
  	},
  	{
  		{
--- 105,111 
  			Packs btree index pages only to this percentage,
  			RELOPT_KIND_BTREE
  		},
! 		BTREE_DEFAULT_FILLFACTOR, BTREE_MIN_FILLFACTOR, 100, 0
  	},
  	{
  		{
***
*** 113,119  static relopt_int intRelOpts[] =
  			Packs hash index pages only to this percentage,
  			RELOPT_KIND_HASH
  		},
! 		HASH_DEFAULT_FILLFACTOR, HASH_MIN_FILLFACTOR, 100
  	},
  	{
  		{
--- 113,119 
  			Packs hash index pages only to this percentage,
  			RELOPT_KIND_HASH
  		},
! 		HASH_DEFAULT_FILLFACTOR, HASH_MIN_FILLFACTOR, 100, 0
  	},
  	{
  		{
***
*** 121,127  static relopt_int intRelOpts[] =
  			Packs gist index pages only to this percentage,
  			RELOPT_KIND_GIST
  		},
! 		GIST_DEFAULT_FILLFACTOR, GIST_MIN_FILLFACTOR, 100
  	},
  	{
  		{
--- 121,127 
  			Packs gist index pages only to this percentage,
  			RELOPT_KIND_GIST
  		},
! 		GIST_DEFAULT_FILLFACTOR, GIST_MIN_FILLFACTOR, 100, 0
  	},
  	{
  		{
***
*** 129,135  static relopt_int intRelOpts[] =
  			Packs spgist index pages only to this percentage,
  			RELOPT_KIND_SPGIST
  		},
! 		SPGIST_DEFAULT_FILLFACTOR, SPGIST_MIN_FILLFACTOR, 100
  	},
  	{
  		{
--- 129,135 
  			Packs spgist index pages only to this percentage,
  			RELOPT_KIND_SPGIST
  		},
! 		SPGIST_DEFAULT_FILLFACTOR, SPGIST_MIN_FILLFACTOR, 100, 0
  	},
  	{
  		{
***
*** 137,143  static relopt_int intRelOpts[] =
  			Minimum number of tuple updates or deletes prior to vacuum,
  			RELOPT_KIND_HEAP | RELOPT_KIND_TOAST
  		},
! 		-1, 0, INT_MAX
  	},
  	{
  		{
--- 137,143 
  			Minimum number of tuple updates or deletes prior to vacuum,
  			RELOPT_KIND_HEAP | RELOPT_KIND_TOAST
  		},
! 		-1, 0, INT_MAX, 0
  	},
  	{
  		{
***
*** 145,151  static relopt_int intRelOpts[] =
  			Minimum number of tuple inserts, updates or deletes prior to analyze,
  			RELOPT_KIND_HEAP
  		},
! 		-1, 0, INT_MAX
  	},
  	{
  		{
--- 145,151 
  			Minimum number of tuple inserts, updates or deletes prior to analyze,
  			RELOPT_KIND_HEAP
  		},
! 		-1, 0, INT_MAX, 0
  	},
  	{
  		{
***
*** 153,159  static relopt_int intRelOpts[] =
  			Vacuum cost delay in milliseconds, for autovacuum,
  			RELOPT_KIND_HEAP | RELOPT_KIND_TOAST
  		},
! 		-1, 0, 100
  	},
  	{
  		{
--- 153,159 
  			Vacuum cost delay in milliseconds, for autovacuum,
  			RELOPT_KIND_HEAP | RELOPT_KIND_TOAST
  		},
! 		-1, 0, 100, GUC_UNIT_MS
  	},
  	{
  		{
***
*** 161,167  static relopt_int intRelOpts[] =
  			Vacuum cost amount available before napping, for autovacuum,
  			RELOPT_KIND_HEAP | RELOPT_KIND_TOAST
  		},
! 		-1, 1, 1
  	},
  	{
  		{
--- 161,167 
  			Vacuum cost amount available before napping, for autovacuum,
  			RELOPT_KIND_HEAP | RELOPT_KIND_TOAST
  		},
! 		-1, 1, 1, 0
  	},
  	{
  		{
***
*** 169,175  static relopt_int intRelOpts[] =
  			Minimum age at which VACUUM should freeze a table row, for autovacuum,
  			RELOPT_KIND_HEAP | RELOPT_KIND_TOAST
  		},
! 		-1, 0, 10
  	},
  	{
  		{
--- 169,175 
  			Minimum age at which VACUUM should freeze a table row, for autovacuum,
  			RELOPT_KIND_HEAP | RELOPT_KIND_TOAST
  		},
! 		-1, 0, 10, 0
  	},
  	{
  		{
***
*** 177,183  static relopt_int intRelOpts[] =
  			Minimum multixact age at which VACUUM should freeze a row multixact's, for autovacuum,
  			RELOPT_KIND_HEAP | RELOPT_KIND_TOAST
  		},
! 		-1, 0, 10
  	},
  	{
  		{
--- 177,183 
  			Minimum multixact age at which VACUUM should freeze a row multixact's, for autovacuum,
  			RELOPT_KIND_HEAP | RELOPT_KIND_TOAST
  		},
! 		-1, 0, 10, 0
  	},
  	{
  		{
***
*** 185,191  static relopt_int intRelOpts[] =
  			Age at which to autovacuum a table to prevent transaction ID wraparound,
  			RELOPT_KIND_HEAP | RELOPT_KIND_TOAST
  		},
! 		-1, 1, 20
  	},
  	{
  		{
--- 185,191 
  			Age at which 

Re: [HACKERS] Use unique index for longer pathkeys.

2014-08-07 Thread Kyotaro HORIGUCHI
Hello, 

  Although, yes, you're right, irrespective of the common
  something, and even if the dropped index was i_t1_pkey_2, which
  is on t1(a, b), the result tuples are sorted as expected only by
  the pathkey (t.a = t1.a, t1.b). It is because both t and t1 are
  still unique so the joined tuples are also unique, and the unique
  key of the result tuples is the merged pkey (t.a, t1.a, t1.b),
  which can be transformed to (t.a, t1.b) using the equiality
  between t.a and t1.a. And considering the inner relation (t1) is
  already sorted by (a, b), the sort itself could be elimited from
  the plan.
 
 I think if we could eliminate t1.c,t1.d considering indexes on
 individual relations (may be by using technique I have mentioned
 upthread or some other way), then the current code itself will
 eliminate the ORDER BY clause.  I have tried that by using a query
 without having t1.c,t1.d in ORDER BY clause
 
 explain (costs off, analyze off) select * from t,t1 where t.a=t1.a order by
 t1.a,t1.b;
 QUERY PLAN
 --
  Merge Join
Merge Cond: (t1.a = t.a)
-  Index Scan using i_t1_pkey_2 on t1
-  Index Scan using i_t_pkey on t
 (4 rows)

Yeah, that elimination seems quite fascinating to me :)

 Okay, I think you want to handle the elimination of ORDER BY clauses
 at a much broader level which might handle most of simple cases as
 well.  However I think eliminating clauses as mentioned above is itself
 a good optimization for many cases and more so if that can be done in
 a much simpler way.

Yes, if it can be done in a much simpler way.  I guess it could be
approached from the opposite side, that is, from the equivalence
classes; anyway, I'll try it.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] Specifying the unit in storage parameter

2014-08-07 Thread Alvaro Herrera
Fujii Masao wrote:
 Hi,
 
 We can specify the unit when setting autovacuum_vacuum_cost_delay
 GUC as follows.
 
 ALTER SYSTEM SET autovacuum_vacuum_cost_delay TO '80ms';
 
 OTOH we cannot specify the unit when setting autovacuum_vacuum_cost_delay
 as storage parameter as follows.
 
 CREATE TABLE test (col1 int) WITH (autovacuum_vacuum_cost_delay = '80ms');
 ERROR:  invalid value for integer option
 autovacuum_vacuum_cost_delay: 80ms
 
 This is not user-friendly.

No disagreement here.

 I'd like to propose the attached patch which
 introduces the infrastructure which allows us to specify the unit when
 setting INTEGER storage parameter like autovacuum_vacuum_cost_delay.
 Comment? Review?

Hm, what's with the parse_int signature change and the hintmsg thing?
Is it just me, or is the patch incomplete?


-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-07 Thread Larry White
Apologies if this is a ridiculous suggestion, but I believe that swapping
out the compression algorithm (for Snappy, for example) has been discussed
in the past. I wonder if that algorithm is sufficiently different that it
would produce a better result, and if that might not be preferable to some
of the other options.


On Thu, Aug 7, 2014 at 11:17 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 I looked into the issue reported in bug #11109.  The problem appears to be
 that jsonb's on-disk format is designed in such a way that the leading
 portion of any JSON array or object will be fairly incompressible, because
 it consists mostly of a strictly-increasing series of integer offsets.
 This interacts poorly with the code in pglz_compress() that gives up if
 it's found nothing compressible in the first first_success_by bytes of a
 value-to-be-compressed.  (first_success_by is 1024 in the default set of
 compression parameters.)

 As an example, here's gdb's report of the bitwise representation of the
 example JSON value in the bug thread:

 0x2ab85ac:  0x2005  0x0004  0x50003098  0x309f
 0x2ab85bc:  0x30ae  0x30b8  0x30cf  0x30da
 0x2ab85cc:  0x30df  0x30ee  0x3105  0x6b6e756a
 0x2ab85dc:  0x40de  0x0034  0x0068  0x009c
 0x2ab85ec:  0x00d0  0x0104  0x0138  0x016c
 0x2ab85fc:  0x01a0  0x01d4  0x0208  0x023c
 0x2ab860c:  0x0270  0x02a4  0x02d8  0x030c
 0x2ab861c:  0x0340  0x0374  0x03a8  0x03dc
 0x2ab862c:  0x0410  0x0444  0x0478  0x04ac
 0x2ab863c:  0x04e0  0x0514  0x0548  0x057c
 0x2ab864c:  0x05b0  0x05e4  0x0618  0x064c
 0x2ab865c:  0x0680  0x06b4  0x06e8  0x071c
 0x2ab866c:  0x0750  0x0784  0x07b8  0x07ec
 0x2ab867c:  0x0820  0x0854  0x0888  0x08bc
 0x2ab868c:  0x08f0  0x0924  0x0958  0x098c
 0x2ab869c:  0x09c0  0x09f4  0x0a28  0x0a5c
 0x2ab86ac:  0x0a90  0x0ac4  0x0af8  0x0b2c
 0x2ab86bc:  0x0b60  0x0b94  0x0bc8  0x0bfc
 0x2ab86cc:  0x0c30  0x0c64  0x0c98  0x0ccc
 0x2ab86dc:  0x0d00  0x0d34  0x0d68  0x0d9c
 0x2ab86ec:  0x0dd0  0x0e04  0x0e38  0x0e6c
 0x2ab86fc:  0x0ea0  0x0ed4  0x0f08  0x0f3c
 0x2ab870c:  0x0f70  0x0fa4  0x0fd8  0x100c
 0x2ab871c:  0x1040  0x1074  0x10a8  0x10dc
 0x2ab872c:  0x1110  0x1144  0x1178  0x11ac
 0x2ab873c:  0x11e0  0x1214  0x1248  0x127c
 0x2ab874c:  0x12b0  0x12e4  0x1318  0x134c
 0x2ab875c:  0x1380  0x13b4  0x13e8  0x141c
 0x2ab876c:  0x1450  0x1484  0x14b8  0x14ec
 0x2ab877c:  0x1520  0x1554  0x1588  0x15bc
 0x2ab878c:  0x15f0  0x1624  0x1658  0x168c
 0x2ab879c:  0x16c0  0x16f4  0x1728  0x175c
 0x2ab87ac:  0x1790  0x17c4  0x17f8  0x182c
 0x2ab87bc:  0x1860  0x1894  0x18c8  0x18fc
 0x2ab87cc:  0x1930  0x1964  0x1998  0x19cc
 0x2ab87dc:  0x1a00  0x1a34  0x1a68  0x1a9c
 0x2ab87ec:  0x1ad0  0x1b04  0x1b38  0x1b6c
 0x2ab87fc:  0x1ba0  0x1bd4  0x1c08  0x1c3c
 0x2ab880c:  0x1c70  0x1ca4  0x1cd8  0x1d0c
 0x2ab881c:  0x1d40  0x1d74  0x1da8  0x1ddc
 0x2ab882c:  0x1e10  0x1e44  0x1e78  0x1eac
 0x2ab883c:  0x1ee0  0x1f14  0x1f48  0x1f7c
 0x2ab884c:  0x1fb0  0x1fe4  0x2018  0x204c
 0x2ab885c:  0x2080  0x20b4  0x20e8  0x211c
 0x2ab886c:  0x2150  0x2184  0x21b8  0x21ec
 0x2ab887c:  0x2220  0x2254  0x2288  0x22bc
 0x2ab888c:  0x22f0  0x2324  0x2358  0x238c
 0x2ab889c:  0x23c0  0x23f4  0x2428  0x245c
 0x2ab88ac:  0x2490  0x24c4  0x24f8  0x252c
 0x2ab88bc:  0x2560  0x2594  0x25c8  0x25fc
 0x2ab88cc:  0x2630  0x2664  0x2698  0x26cc
 0x2ab88dc:  0x2700  0x2734  0x2768  

Re: [HACKERS] postgresql.auto.conf and reload

2014-08-07 Thread Amit Kapila
On Thu, Aug 7, 2014 at 12:36 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Aug 7, 2014 at 12:36 PM, Amit Kapila amit.kapil...@gmail.com
wrote:
  On Wed, Aug 6, 2014 at 1:32 PM, Fujii Masao masao.fu...@gmail.com
wrote:

  What about picking up only data_directory at the first pass?
 
  I think that will workout and for that I think we might need to add
  few checks during ProcessConfigFile.  Do you want to address it
  during parsing of config file or during setting of values?

 I prefer during parsing. The attached patch does that. In this patch,
 the first call of ProcessConfigFile() picks up only data_directory. One
 concern of this patch is that the logic is a bit ugly.

I think handling 'data_directory' in ParseConfigFp() looks a bit
out of place, as this API is used to parse other config files as
well, like recovery.conf.  I agree that for all other paths DataDir
might already be set, so this new path will never be executed;
still, from a code-maintenance point of view it would be better if
we could find a way to handle it in the place where we process
the server's main config file (postgresql.conf).


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Specifying the unit in storage parameter

2014-08-07 Thread Fujii Masao
On Fri, Aug 8, 2014 at 12:56 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Fujii Masao wrote:
 Hi,

 We can specify the unit when setting autovacuum_vacuum_cost_delay
 GUC as follows.

 ALTER SYSTEM SET autovacuum_vacuum_cost_delay TO '80ms';

 OTOH we cannot specify the unit when setting autovacuum_vacuum_cost_delay
 as storage parameter as follows.

 CREATE TABLE test (col1 int) WITH (autovacuum_vacuum_cost_delay = 
 '80ms');
 ERROR:  invalid value for integer option
 autovacuum_vacuum_cost_delay: 80ms

 This is not user-friendly.

 No disagreement here.

 I'd like to propose the attached patch which
 introduces the infrastructure which allows us to specify the unit when
 setting INTEGER storage parameter like autovacuum_vacuum_cost_delay.
 Comment? Review?

 Hm, what's with the parse_int signature change and the hintmsg thing?
 Is it just me or the patch is incomplete?

Sorry, I probably failed to see your point. You mean that the signature
of parse_int needs to be changed so that it includes hintmsg as an
argument? If so, there is no problem: both the signature and the
function body of parse_int already include hintmsg as an argument.
Am I missing something?

Regards,

-- 
Fujii Masao




Re: [HACKERS] Specifying the unit in storage parameter

2014-08-07 Thread Alvaro Herrera
Fujii Masao wrote:
 On Fri, Aug 8, 2014 at 12:56 PM, Alvaro Herrera
 alvhe...@2ndquadrant.com wrote:

  Hm, what's with the parse_int signature change and the hintmsg thing?
  Is it just me or the patch is incomplete?
 
 Sorry, probably I failed to see your point. You mean that the signature
 of parse_int needs to be changed so that it includes the hintmsg as the
 argument? If yes, there is no problem. Both signature and function body
 of parse_int has already included the hingmsg as the argument so far.
 Am I missing something?

I just mean that the parse_int function body is not touched by your
patch, unless I am failing to see something.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Hokey wrong versions of libpq in apt.postgresql.org

2014-08-07 Thread Stephen Frost
JD,

* Joshua D. Drake (j...@commandprompt.com) wrote:
 But this is just plain wrong. I don't care that the FAQ (on the
 wiki) says we are doing it wrong for good reasons. When I (or anyone
 else) pulls postgresql-$version-dev, I want the libpq for my
 version. I do not want 9.3.

No, it isn't wrong.  If you'd like the specific version, follow what's
in the FAQ to get the specific version.  Otherwise, we're going to
follow the Debian guidelines which are quite sensible and more-or-less
say make new builds go against the latest version.

 There can be unintended circumstances on machines when you mix and
 match like that. Can we please do some proper packaging on this?

This *is* the proper packaging.

If you want the specific version, update your deb line.  Don't complain
because you used the Debian repo that follows the Debian guidelines and
didn't like what you got - that's not going to change.

Thanks,

Stephen




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-07 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
 I looked into the issue reported in bug #11109.  The problem appears to be
 that jsonb's on-disk format is designed in such a way that the leading
 portion of any JSON array or object will be fairly incompressible, because
 it consists mostly of a strictly-increasing series of integer offsets.
 This interacts poorly with the code in pglz_compress() that gives up if
 it's found nothing compressible in the first first_success_by bytes of a
 value-to-be-compressed.  (first_success_by is 1024 in the default set of
 compression parameters.)

I haven't looked at this in any detail, so take this with a grain of
salt, but what about teaching pglz_compress about using an offset
farther into the data, if the incoming data is quite a bit larger than
1k?  This is just a test to see if it's worthwhile to keep going, no?  I
wonder if this might even be able to be provided as a type-specific
option, to avoid changing the behavior for types other than jsonb in
this regard.

(I'm imagining a boolean saying "pick a random sample", or perhaps a
function which can be called that'll return "here's where you wanna test
if this thing is gonna compress at all".)

I'm rather disinclined to change the on-disk format because of this
specific test, that feels a bit like the tail wagging the dog to me,
especially as I do hope that some day we'll figure out a way to use a
better compression algorithm than pglz.

Thanks,

Stephen




Re: [HACKERS] Enhancing pgbench parameter checking

2014-08-07 Thread Tatsuo Ishii
Fabien,

 Hello Tatsuo-san,
 
 Thanks for the review. I have registered it to Aug Commit fest.
 https://commitfest.postgresql.org/action/patch_view?id=1532

 I'm not sure of the variable name is_non_init_parameter_set. I would
 suggest benchmarking_option_set?

 Ok, I will replace the variable name as you suggested.

 Also, to be consistent, maybe it should check that no
 initialization-specific option are set when benchmarking.

 Good suggesition. Here is the v2 patch.
 
 I applied it without problem and tested it.
 
 
 * It seems that -c is ignored, the atoi() line has been removed.
 
 * Option -q is initialization specific, but not detected as such like
 * the other, although there is a specific detection later. I think that
 * it would be better to use the systematic approach, and to remove the
 * specific check.
 
 * I would name the second boolean initialization_option_set, as it is
 * describe like that in the documentation.
 
 
 * I would suggest the following error messages:
  "some options cannot be used in initialization (-i) mode\n" and
  "some options cannot be used in benchmarking mode\n".
 Although these messages are rough, I think that they are enough and
 avoid running something unexpected, which is your purpose.
 
 
 Find attached a patch which adds these changes to your current
 version.

Thank you for the review and patch. Looks good. I changed the status
to "Ready for Committer". I will wait for a few days and if there's
no objection, I will commit the patch.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp




Re: [HACKERS] Specifying the unit in storage parameter

2014-08-07 Thread Fujii Masao
On Fri, Aug 8, 2014 at 2:12 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
 Fujii Masao wrote:
 On Fri, Aug 8, 2014 at 12:56 PM, Alvaro Herrera
 alvhe...@2ndquadrant.com wrote:

  Hm, what's with the parse_int signature change and the hintmsg thing?
  Is it just me or the patch is incomplete?

 Sorry, probably I failed to see your point. You mean that the signature
 of parse_int needs to be changed so that it includes the hintmsg as the
 argument? If yes, there is no problem. Both signature and function body
 of parse_int has already included the hingmsg as the argument so far.
 Am I missing something?

 I just mean that the parse_int function body is not touched by your
 patch, unless I am failing to see something.

Yes, my patch doesn't change the parse_int function at all, because I didn't
think such a change was required for this purpose (i.e., just allowing us
to specify the unit when setting storage parameters). But perhaps you have
found a reason why it needs to be changed?

Regards,

-- 
Fujii Masao

