Re: [HACKERS] postgresql.auto.conf and reload

2014-08-06 Thread Fujii Masao
On Tue, Aug 5, 2014 at 12:49 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Mon, Aug 4, 2014 at 11:52 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Fujii Masao masao.fu...@gmail.com writes:
 The patch chooses the last setting for every parameter and ignores the
 former settings. But I don't think that every parameter needs to be processed
 this way. That is, we can change the patch so that only PGC_POSTMASTER
 parameters are processed that way. The invalid settings in the parameters
 other than PGC_POSTMASTER can be checked by pg_ctl reload as they are now.
 Also this approach can reduce the number of comparisons needed to choose the
 last setting, i.e., the n in O(n^2) becomes the number of uncommented
 *PGC_POSTMASTER* parameters (not every parameter). Thoughts?

 I don't find that to be a particularly good idea.  In the first place,
 it introduces extra complication and a surprising difference in the
 behavior of different GUCs.  In the second place, I thought part of the
 point of this patch was to suppress log messages complaining about
 invalid values that then weren't actually used for anything.  That issue
 exists just as much for non-POSTMASTER variables.  (IOW, "value cannot
 be changed now" is not the only log message we're trying to suppress.)

 Yep, sounds reasonable. This makes me think that we could suppress
 such invalid-setting messages for parameters that are not actually used,
 not only at conf file reload but also at *postmaster startup*. That's another
 story, though. Anyway, barring any objection, I will commit Amit's patch.

Applied the slightly-modified version!

Regards,

-- 
Fujii Masao




Re: [HACKERS] add modulo (%) operator to pgbench

2014-08-06 Thread Fabien COELHO


Hello Robert,


The issue is that there are 3 definitions of modulo, two of which are fine
(Knuth floored division and Euclidean), and the last one much less useful.
Alas, C (%) & SQL (MOD) chose the bad definition :-( I really need one of
the other two. The attached patch adds all versions, with % and mod
consistent with their unfortunate C and SQL counterparts, and fmod and
emod being the sane ones.


Three different modulo operators seems like a lot for a language that
doesn't even have a real expression syntax, but I'll yield to whatever
the consensus is on this one.


I agree that it is overkill.

In fact there is a link: if there were a real expression syntax, the
remainder's sign could be fixed afterwards, so the standard C/SQL version
would do. Since that is not available, the modulo operator itself must be right.


If there is only one modulo added, I would rather have the Knuth version.
However, I was afraid that someone would object if % did not return the
same result as the C/PostgreSQL versions (even if I think that nearly
nobody has a clue about what % returns when the arguments are negative), hence
the three modulo versions to counter this potential criticism.
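
As a standalone illustration (not part of any of the attached patches),
here is how the three definitions disagree once an operand is negative:

#include <stdio.h>

/* truncated (C/SQL), floored (Knuth) and Euclidean remainders of a by b */
int
main(void)
{
	int		a = -7, b = 3;
	int		t = a % b;			/* truncated toward zero: -1 */
	int		f = t;
	int		e = t;

	if (f != 0 && (f < 0) != (b < 0))
		f += b;					/* floored: sign follows the divisor -> 2 */
	if (e < 0)
		e += (b < 0 ? -b : b);	/* Euclidean: always non-negative -> 2 */

	printf("truncated=%d floored=%d euclidean=%d\n", t, f, e);
	return 0;
}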


But I would prefer just one version, with the Knuth (or Euclidean)
definition.

--
Fabien.




Re: [HACKERS] add modulo (%) operator to pgbench

2014-08-06 Thread Fabien COELHO


Hello Alvaro,


I wonder if it would be necessary to offer the division operator
semantics corresponding to whatever additional modulo operator we choose
to offer.  That is, if we add emod, do we need ediv as well?


It would make sense; however, I do not need it, and I'm not sure of a use
case where it would be needed, so I do not think that it is necessary.
If it turns out to be, it could be added quite easily at that point.


--
Fabien.




Re: [HACKERS] pg_receivexlog add synchronous mode

2014-08-06 Thread Fujii Masao
On Wed, Aug 6, 2014 at 2:34 PM,  furu...@pm.nttdata.co.jp wrote:
  I have improved the patch by making the following changes:
 
  1. Since stream_stop() was redundant, the stream_stop() call at the
 time of WAL file closing was deleted.
 
  2. Changed the timing of the flush check, for readability of the source
 code: the flush is now checked before the feedback-message decision
 (the continue statement) rather than immediately after a message is
 received.
 
  Thanks for the updated version of the patch!
 
  While reviewing the patch, I found that HandleCopyStream() is still
  long, which decreases the readability of the source code.
  So I feel inclined to refactor HandleCopyStream() further for better
  readability. What about the attached refactoring patch?

 Sorry, I forgot to attached the patch in previous email. So attached.

 Thank you for the refactoring patch.
 I did a review of the patch.

 -   break;  /* ignore the rest of 
 this XLogData packet */

 +   return true;/* ignore the rest of this 
 XLogData packet */

 The break statement at the close of the WAL file has become "return true".
 That behaves like a continue statement. Is that acceptable?

Sorry I failed to see your point.

In the original code, when we reach the end of the WAL file and it's the
streaming stop point, we break out of the inner loop in the code block for
processing the XLogData packet. Then we go back to the top of the outer loop in
HandleCopyStream. ISTM that the refactored code works the same way.
Anyway, could you elaborate on the problem?

 The walreceiver divides this between XLogWalRcvProcessMsg and XLogWalRcvWrite;
 isn't that division necessary here too?

Not necessary, but I have no objection to the idea.

Regards,

-- 
Fujii Masao




Re: [HACKERS] missing PG_RETURN_UINT16

2014-08-06 Thread Fujii Masao
On Wed, Aug 6, 2014 at 4:47 AM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Aug 4, 2014 at 11:35 AM, Manuel Kniep m.kn...@web.de wrote:
 I’m missing the PG_RETURN_UINT16 macro in fmgr.h.
 Since we already have the PG_GETARG_UINT16 macro,
 I guess it makes sense to have it.

 here is the trivial patch for it.

 I see no reason not to add this.  Anyone else want to object?

+1 to add that.

What about backpatching this to 9.4? This is a very simple change and there
seems to be no reason to wait until 9.5 for it.
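
For reference, the addition is presumably just a one-liner mirroring the
existing PG_RETURN_INT16 in fmgr.h; a sketch, since the patch itself is not
quoted here:

/* fmgr.h already has, for example: */
#define PG_RETURN_INT16(x)	 return Int16GetDatum(x)
/* so the proposed counterpart to PG_GETARG_UINT16 would presumably be: */
#define PG_RETURN_UINT16(x)	 return UInt16GetDatum(x)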

Regards,

-- 
Fujii Masao




Re: [HACKERS] A worst case for qsort

2014-08-06 Thread Fabien COELHO


For example, if we had reason to be concerned about *adversarial* 
inputs, I think that there is a good chance that our qsort() actually 
would be problematic to the point of driving us to prefer some generally 
slower alternative.


That is an interesting point.

Indeed, a database in general often stores user-supplied data, which may
happen to be sorted for presentation purposes in an interface. The same thing
occurred with hashtable algorithms and was/is a way to mount DoS attacks on
web applications. I'm not sure whether the qsort version discussed here
would be applied to user-supplied data, though. If so, adding some randomness
to the decision process would suffice to counter the adversarial-input
argument you raised.


--
Fabien.




Re: [HACKERS] A worst case for qsort

2014-08-06 Thread Peter Geoghegan
On Tue, Aug 5, 2014 at 11:49 PM, Fabien COELHO coe...@cri.ensmp.fr wrote:
 If so, adding some randomness in the decision process would suffice to
 counter the adversarial input argument you raised.


This is specifically addressed by the paper. Indeed, randomly choosing
a pivot is a common strategy. It won't fix the problem.

-- 
Peter Geoghegan




Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-08-06 Thread Pavan Deolasee
On Tue, Aug 5, 2014 at 6:25 PM, Fujii Masao masao.fu...@gmail.com wrote:



 This gradual approach looks good to me. And, if the additional compression
 algorithm like lz4 is always better than pglz for every scenarios, we can
 just
 change the code so that the additional algorithm is always used. Which
 would
 make the code simpler.


Right.


  3. Compressing one block vs all blocks:
  Andres suggested that compressing all backup blocks in one go may give us a
  better compression ratio. This is worth trying. I'm wondering what would be
  the best way to do so with minimal changes to the xlog insertion code.
  Today, we add more rdata items for backup block header(s) and backup blocks
  themselves (if there is a hole then 2 per backup block) beyond what the
  caller has supplied. If we have to compress all the backup blocks together,
  then one approach is to copy the backup block headers and the blocks to a
  temp buffer, compress that, and replace the rdata entries added previously
  with a single rdata.

 Basically sounds reasonable. But, how does this logic work if there are
 multiple rdata and only some of them are backup blocks?


My idea is to just make a pass over the rdata entries past the
rdt_lastnormal element after processing the backup blocks and making
additional entries in the chain. These additional rdata entries correspond
to the backup blocks and their headers. So we can copy the rdata->data of
these elements into a temp buffer and compress the entire thing in one go. We
can then replace the rdata chain past rdt_lastnormal with a single
rdata whose data points to the compressed data. Recovery code just needs
to decompress this data if the record header indicates that the backup data is
compressed. Sure, the exact mechanism to indicate whether the data is compressed
(and by which algorithm) can be worked out.


  If a hole is not copied to that temp buffer, ISTM that we should
  change the backup block header so that it contains the info for a
  hole, e.g., the location where the hole starts. No?


AFAICS it's not required if we compress the stream of BkpBlock and the block
data. The current mechanism of constructing the additional rdata chain
items takes care of the hole anyway.

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


Re: [HACKERS] postgresql.auto.conf and reload

2014-08-06 Thread Fujii Masao
On Tue, Jul 29, 2014 at 9:26 PM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Mon, Jul 28, 2014 at 11:33 PM, Fujii Masao masao.fu...@gmail.com wrote:
 There is another side effect of this patch. With the patch, when reloading
 the configuration file, the server cannot warn about an invalid setting value
 if it's not the last setting of the parameter. This may cause a problematic
 situation as follows.

 1. ALTER SYSTEM SET work_mem TO '1024kB';
 2. Reload the configuration file --- success
 3. Then, a user accidentally adds work_mem = '2048KB' into postgresql.conf.
  The setting value '2048KB' is invalid; the unit should be 'kB'
  instead of 'KB'.
 4. Reload the configuration file --- success
  The invalid setting is ignored because the setting of work_mem in
  postgresql.auto.conf is preferred. So a user cannot notice that
  postgresql.conf has an invalid setting.
 5. Failover on a shared-disk HA configuration happens, then PostgreSQL
 fails to start up because of the invalid setting. When PostgreSQL starts
 up, the last setting is preferred, but all the settings are checked.

 The reason is that during startup DataDir is not set by the time the server
 processes the config file, due to which we process the .auto.conf file in a
 second pass.  I think ideally it should ignore the invalid setting in such a
 case, as it does during reload; however, currently there is no good way to
 process .auto.conf in case DataDir is not set, so we handle it separately
 in a second pass.

What about picking up only data_directory at the first pass?

Regards,

-- 
Fujii Masao




Re: [HACKERS] pg_receivexlog add synchronous mode

2014-08-06 Thread furuyao
  -   break;  /* ignore
 the rest of this XLogData packet */
 
  +   return true;/* ignore the rest of
 this XLogData packet */
 
  The break statement at the close of the WAL file has become "return true".
  That behaves like a continue statement. Is that acceptable?
 
 Sorry I failed to see your point.
 
 In the original code, when we reach the end of the WAL file and it's the
 streaming stop point, we break out of the inner loop in the code block for
 processing the XLogData packet. Then we go back to the top of the outer loop
 in HandleCopyStream. ISTM that the refactored code works the same
 way.
 Anyway, could you elaborate on the problem?

I'm sorry. I was confusing this with the patch that I am going to submit.
In that patch it is necessary to exit the loop, because a loop was added to
receive messages continuously.
The refactoring patch is fine as presented.

Regards,

--
Furuya Osamu



[HACKERS] Enhancing pgbench parameter checking

2014-08-06 Thread Tatsuo Ishii
Hi,

It is pretty annoying that pgbench does not check for parameters that
should not be used with -i. I often type something like:

pgbench -c 10 -T 300 -S -i test

and accidentally initialize the pgbench database. This is pretty
uncomfortable if the database is huge, since initializing a huge database
takes a long time. Why don't we check for this case? Included is a patch to
enhance the behavior of pgbench in this regard, IMO. Here is a sample
session after patching:

$ ./pgbench -c 10 -T 300 -S -i test
some parameters cannot be used in initialize mode

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index c0e5e24..d7a3f57 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -2520,6 +2520,8 @@ main(int argc, char **argv)
 	char	   *filename = NULL;
 	bool		scale_given = false;
 
+	bool		is_non_init_param_set = false;
+
 	CState	   *state;			/* status of clients */
 	TState	   *threads;		/* array of thread */
 
@@ -2599,12 +2601,14 @@ main(int argc, char **argv)
 break;
 			case 'S':
 ttype = 1;
+is_non_init_param_set = true;
 break;
 			case 'N':
 ttype = 2;
+is_non_init_param_set = true;
 break;
 			case 'c':
-nclients = atoi(optarg);
+is_non_init_param_set = true;
 if (nclients <= 0 || nclients > MAXCLIENTS)
 {
 	fprintf(stderr, "invalid number of clients: %d\n", nclients);
@@ -2629,6 +2633,7 @@ main(int argc, char **argv)
 #endif   /* HAVE_GETRLIMIT */
 break;
 			case 'j':			/* jobs */
+is_non_init_param_set = true;
 nthreads = atoi(optarg);
 if (nthreads <= 0)
 {
@@ -2637,9 +2642,11 @@ main(int argc, char **argv)
 }
 break;
 			case 'C':
+is_non_init_param_set = true;
 is_connect = true;
 break;
 			case 'r':
+is_non_init_param_set = true;
 is_latencies = true;
 break;
 			case 's':
@@ -2652,6 +2659,7 @@ main(int argc, char **argv)
 }
 break;
 			case 't':
+is_non_init_param_set = true;
 if (duration > 0)
 {
 	fprintf(stderr, "specify either a number of transactions (-t) or a duration (-T), not both.\n");
@@ -2665,6 +2673,7 @@ main(int argc, char **argv)
 }
 break;
 			case 'T':
+is_non_init_param_set = true;
 if (nxacts > 0)
 {
 	fprintf(stderr, "specify either a number of transactions (-t) or a duration (-T), not both.\n");
@@ -2681,12 +2690,14 @@ main(int argc, char **argv)
 login = pg_strdup(optarg);
 break;
 			case 'l':
+is_non_init_param_set = true;
 use_log = true;
 break;
 			case 'q':
 use_quiet = true;
 break;
 			case 'f':
+is_non_init_param_set = true;
 ttype = 3;
 filename = pg_strdup(optarg);
 if (process_file(filename) == false || *sql_files[num_files - 1] == NULL)
@@ -2696,6 +2707,8 @@ main(int argc, char **argv)
 {
 	char	   *p;
 
+	is_non_init_param_set = true;
+
 	if ((p = strchr(optarg, '=')) == NULL || p == optarg || *(p + 1) == '\0')
 	{
 		fprintf(stderr, "invalid variable definition: %s\n", optarg);
@@ -2716,6 +2729,7 @@ main(int argc, char **argv)
 }
 break;
 			case 'M':
+is_non_init_param_set = true;
 if (num_files > 0)
 {
 	fprintf(stderr, "query mode (-M) should be specified before transaction scripts (-f)\n");
@@ -2731,6 +2745,7 @@ main(int argc, char **argv)
 }
 break;
 			case 'P':
+is_non_init_param_set = true;
 progress = atoi(optarg);
 if (progress <= 0)
 {
@@ -2745,6 +2760,8 @@ main(int argc, char **argv)
 	/* get a double from the beginning of option value */
 	double		throttle_value = atof(optarg);
 
+	is_non_init_param_set = true;
+
 	if (throttle_value <= 0.0)
 	{
 		fprintf(stderr, "invalid rate limit: %s\n", optarg);
@@ -2764,6 +2781,7 @@ main(int argc, char **argv)
 index_tablespace = pg_strdup(optarg);
 break;
 			case 4:
+is_non_init_param_set = true;
 sample_rate = atof(optarg);
 if (sample_rate <= 0.0 || sample_rate > 1.0)
 {
@@ -2776,6 +2794,7 @@ main(int argc, char **argv)
 fprintf(stderr, "--aggregate-interval is not currently supported on Windows");
 exit(1);
 #else
+is_non_init_param_set = true;
 agg_interval = atoi(optarg);
 if (agg_interval <= 0)
 {
@@ -2808,6 +2827,12 @@ main(int argc, char **argv)
 
 	if (is_init_mode)
 	{
+		if (is_non_init_param_set)
+		{
+			fprintf(stderr, "some parameters cannot be used in initialize mode\n");
+			exit(1);
+		}
+
 		init(is_no_vacuum);
 		exit(0);
 	}



Re: [HACKERS] Scaling shared buffer eviction

2014-08-06 Thread Amit Kapila
On Tue, Aug 5, 2014 at 9:21 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Jun 5, 2014 at 4:43 AM, Amit Kapila amit.kapil...@gmail.com
wrote:
 This essentially removes BgWriterDelay, but it's still mentioned in
BgBufferSync().  Looking further, I see that with the patch applied,
BgBufferSync() is still present in the source code but is no longer called
from anywhere.  Please don't submit patches that render things unused
without actually removing them; it makes it much harder to see what you've
changed.  I realize you probably left it that way for testing purposes, but
you need to clean such things up before submitting.  Likewise, if you've
rendered GUCs or statistics counters unused, you need to rip them out, so
that the scope of the changes you've made is clear to reviewers.

I have kept it just for the reason that, if the basic approach
sounds reasonable and is accepted, then I will clean it up.  Sorry for
the inconvenience; I didn't realize that it could be annoying for
a reviewer. I will remove all such code from the patch in the next version.


 A comparison of BgBufferSync() with
BgBufferSyncAndMoveBuffersToFreelist() reveals that you've removed at least
one behavior that some people (at least, me) will care about, which is the
guarantee that the background writer will scan the entire buffer pool at
least every couple of minutes.

Okay, I will take care of this based on the conclusion of
the other points in this mail.

This is important because it guarantees that dirty data doesn't sit in
memory forever.  When the system becomes busy again after a long idle
period, users will expect the system to have used the idle time to flush
dirty buffers to disk.  This also improves data recovery prospects if, for
example, somebody loses their pg_xlog directory - there may be dirty
buffers whose contents are lost, of course, but they won't be months old.

 b.  New stats for the number of buffers on the freelist have been added; some
  old ones like maxwritten_clean can be removed, as the new logic for
  syncing buffers and moving them to the free list doesn't use them.
  However, I think it's better to remove them once the new logic is
  accepted.  Added some new logs for info related to the free list under
  BGW_DEBUG.


 If I'm reading this right, the new statistic is an incrementing counter
where, every time you update it, you add the number of buffers currently on
the freelist.  That makes no sense.

I think that using 'number of buffers currently on the freelist' and
'number of recently allocated buffers' for consecutive cycles,
we can figure out approximately how many buffer allocations
need the clock sweep, assuming the low and high threshold water
marks are fixed.  However, there can be cases where it is not
easy to estimate that number.

 I think what you should be counting is the number of allocations that are
being satisfied from the free-list.  Then, by comparing the rate at which
that value is incrementing to the rate at which buffers_alloc is
incrementing, somebody can figure out what percentage of allocations are
requiring a clock-sweep run.  Actually, I think it's better to flip it
around: count the number of allocations that require an individual backend
to run the clock sweep (vs. being satisfied from the free-list); call it,
say, buffers_backend_clocksweep.  We can then try to tune the patch to make
that number as small as possible under varying workloads.

This can give us a clear idea for tuning the patch; however, we need
to maintain 3 counters for it in the code (recent_alloc, needed for the
current bgwriter logic, and the other 2 suggested by you).  Do you
want to retain such counters in the code, or are they just a kind of debug
info for the patch?



 d.  Autotune the low and high threshold for freelist for various
  configurations.

 I think we need to come up with some kind of formula here rather than
just a list of hard-coded constants.

That was my initial intention as well, and I have tried approaches based
on the number of shared buffers, like keeping the threshold values as a
percentage of shared buffers, but nothing could satisfy different
kinds of workloads.  The current values I have chosen are based
on experiments for various workloads at different thresholds.  I have
shown the lwlock_stats data for various loads based on the current
thresholds upthread.  Another way could be to make them config
knobs and use the values given by the user in case they are provided,
else go with the fixed values.

There are other instances in code as well (one of them I remember
offhand is in pglz_compress) where we use fixed values based on
different sizes.

And it definitely needs some comments explaining the logic behind the
choices.

Agreed, I shall improve them in the next version of the patch.


 Aside from those specific remarks, I think the elephant in the room is
the question of whether it really makes sense to have one process which is
responsible both for populating the free list and for writing buffers to
disk.  One problem, which I alluded to above under point (1), is 

Re: [HACKERS] inherit support for foreign tables

2014-08-06 Thread Etsuro Fujita
(2014/07/01 11:10), Etsuro Fujita wrote:
 (2014/06/30 22:48), Tom Lane wrote:
 Etsuro Fujita fujita.ets...@lab.ntt.co.jp writes:
 Done.  I think this is because create_foreignscan_plan() makes reference
 to attr_needed, which isn't computed for inheritance children.

 I wonder whether it isn't time to change that.  It was coded like that
 originally only because calculating the values would've been a waste of
 cycles at the time.  But this is at least the third place where it'd be
 useful to have attr_needed for child rels.
 
 +1 for calculating attr_needed for child rels.  (I was wondering too.)
 
 I'll create a separate patch for it.

Attached is a WIP patch for that.  The following functions have been
changed to refer to attr_needed:

* check_index_only()
* remove_unused_subquery_outputs()
* check_selective_binary_conversion()

I'll add this to the upcoming commitfest.  If anyone has any time to
glance at it before then, that would be a great help.

Thanks,

Best regards,
Etsuro Fujita
*** a/contrib/file_fdw/file_fdw.c
--- b/contrib/file_fdw/file_fdw.c
***
*** 799,806  check_selective_binary_conversion(RelOptInfo *baserel,
  	}
  
  	/* Collect all the attributes needed for joins or final output. */
! 	pull_varattnos((Node *) baserel->reltargetlist, baserel->relid,
! 				   &attrs_used);
  
  	/* Add all the attributes used by restriction clauses. */
  	foreach(lc, baserel->baserestrictinfo)
--- 799,811 
  	}
  
  	/* Collect all the attributes needed for joins or final output. */
! 	for (i = baserel->min_attr; i <= baserel->max_attr; i++)
! 	{
! 		if (!bms_is_empty(baserel->attr_needed[i - baserel->min_attr]))
! 			attrs_used =
! 				bms_add_member(attrs_used,
! 							   i - FirstLowInvalidHeapAttributeNumber);
! 	}
  
  	/* Add all the attributes used by restriction clauses. */
  	foreach(lc, baserel->baserestrictinfo)
*** a/src/backend/optimizer/path/allpaths.c
--- b/src/backend/optimizer/path/allpaths.c
***
*** 577,588  set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
  		childrel->has_eclass_joins = rel->has_eclass_joins;
  
  		/*
! 		 * Note: we could compute appropriate attr_needed data for the child's
! 		 * variables, by transforming the parent's attr_needed through the
! 		 * translated_vars mapping.  However, currently there's no need
! 		 * because attr_needed is only examined for base relations not
! 		 * otherrels.  So we just leave the child's attr_needed empty.
  		 */
  
  		/*
  		 * Compute the child's size.
--- 577,585 
  		childrel->has_eclass_joins = rel->has_eclass_joins;
  
  		/*
! 		 * Compute the child's attr_needed.
  		 */
+ 		adjust_appendrel_attr_needed(rel, childrel, childRTE, appinfo);
  
  		/*
  		 * Compute the child's size.
***
*** 2173,2178  remove_unused_subquery_outputs(Query *subquery, RelOptInfo *rel)
--- 2170,2176 
  {
  	Bitmapset  *attrs_used = NULL;
  	ListCell   *lc;
+ 	int			i;
  
  	/*
  	 * Do nothing if subquery has UNION/INTERSECT/EXCEPT: in principle we
***
*** 2193,2204  remove_unused_subquery_outputs(Query *subquery, RelOptInfo *rel)
  	 * Collect a bitmap of all the output column numbers used by the upper
  	 * query.
  	 *
! 	 * Add all the attributes needed for joins or final output.  Note: we must
! 	 * look at reltargetlist, not the attr_needed data, because attr_needed
! 	 * isn't computed for inheritance child rels, cf set_append_rel_size().
! 	 * (XXX might be worth changing that sometime.)
  	 */
! 	pull_varattnos((Node *) rel->reltargetlist, rel->relid, &attrs_used);
  
  	/* Add all the attributes used by un-pushed-down restriction clauses. */
  	foreach(lc, rel->baserestrictinfo)
--- 2191,2205 
  	 * Collect a bitmap of all the output column numbers used by the upper
  	 * query.
  	 *
! 	 * Add all the attributes needed for joins or final output.
  	 */
! 	for (i = rel->min_attr; i <= rel->max_attr; i++)
! 	{
! 		if (!bms_is_empty(rel->attr_needed[i - rel->min_attr]))
! 			attrs_used =
! 				bms_add_member(attrs_used,
! 							   i - FirstLowInvalidHeapAttributeNumber);
! 	}
  
  	/* Add all the attributes used by un-pushed-down restriction clauses. */
  	foreach(lc, rel->baserestrictinfo)
*** a/src/backend/optimizer/path/indxpath.c
--- b/src/backend/optimizer/path/indxpath.c
***
*** 1768,1779  check_index_only(RelOptInfo *rel, IndexOptInfo *index)
  	 * way out.
  	 */
  
! 	/*
! 	 * Add all the attributes needed for joins or final output.  Note: we must
! 	 * look at reltargetlist, not the attr_needed data, because attr_needed
! 	 * isn't computed for inheritance child rels.
! 	 */
! 	pull_varattnos((Node *) rel->reltargetlist, rel->relid, &attrs_used);
  
  	/* Add all the attributes used by restriction clauses. */
  	foreach(lc, rel->baserestrictinfo)
--- 1768,1781 
  	 * way out.
  	 */
  
! 	/* Collect all the attributes needed for joins or final output. */
! 	for (i = rel->min_attr; i <= rel->max_attr; i++)
! 	{
! 		if (!bms_is_empty(rel->attr_needed[i 

Re: [HACKERS] Introducing coarse grain parallelism by postgres_fdw.

2014-08-06 Thread Robert Haas
On Tue, Aug 5, 2014 at 7:05 AM, Ashutosh Bapat
ashutosh.ba...@enterprisedb.com wrote:
 There are quite a few members added to the generic Path, Plan structures,
 whose use is induced only through foreign scans. Each path now stores two
 sets of costs, one with parallelism and one without. The parallel values
 will make sense only when there is a foreign scan, which uses parallelism,
 in the plan tree. So, those costs are maintained unnecessarily or the memory
 for those members is wasted in most of the cases, where the tables involved
 are not foreign.

Yeah, I don't think that's going to be acceptable.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pg_receivexlog add synchronous mode

2014-08-06 Thread Fujii Masao
On Wed, Aug 6, 2014 at 5:10 PM,  furu...@pm.nttdata.co.jp wrote:
  -   break;  /* ignore
 the rest of this XLogData packet */
 
  +   return true;/* ignore the rest of
 this XLogData packet */
 
  The break statement at the close of the WAL file has become "return true".
  That behaves like a continue statement. Is that acceptable?

 Sorry I failed to see your point.

  In the original code, when we reach the end of the WAL file and it's the
  streaming stop point, we break out of the inner loop in the code block for
  processing the XLogData packet. Then we go back to the top of the outer loop
  in HandleCopyStream. ISTM that the refactored code works the same
  way.
  Anyway, could you elaborate on the problem?

 I'm sorry. I was confusing this with the patch that I am going to submit.
 In that patch it is necessary to exit the loop, because a loop was added to
 receive messages continuously.
 The refactoring patch is fine as presented.

Okay, applied the patch.

I heavily modified your patch, based on master with the refactoring
patch applied. Attached is the modified version. Could you
review it?

Regards,

-- 
Fujii Masao
diff --git a/doc/src/sgml/ref/pg_receivexlog.sgml b/doc/src/sgml/ref/pg_receivexlog.sgml
index 7c50b01..c15776f 100644
--- a/doc/src/sgml/ref/pg_receivexlog.sgml
+++ b/doc/src/sgml/ref/pg_receivexlog.sgml
@@ -106,6 +106,21 @@ PostgreSQL documentation
 </varlistentry>
 
 <varlistentry>
+   <term><option>-F <replaceable class="parameter">interval</replaceable></option></term>
+   <term><option>--fsync-interval=<replaceable class="parameter">interval</replaceable></option></term>
+   <listitem>
+    <para>
+     Specifies the maximum time to issue sync commands to ensure the
+     received WAL file is safely flushed to disk, in seconds. The default
+     value is zero, which disables issuing fsyncs except when WAL file is
+     closed. If <literal>-1</literal> is specified, WAL file is flushed as
+     soon as possible, that is, as soon as there are WAL data which has
+     not been flushed yet.
+    </para>
+   </listitem>
+  </varlistentry>
+
+ <varlistentry>
   <term><option>-v</option></term>
   <term><option>--verbose</option></term>
   <listitem>
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 5df2eb8..0b02c4c 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -371,7 +371,7 @@ LogStreamerMain(logstreamer_param *param)
 	if (!ReceiveXlogStream(param->bgconn, param->startptr, param->timeline,
 		   param->sysidentifier, param->xlogdir,
 		   reached_end_position, standby_message_timeout,
-		   NULL))
+		   NULL, 0))
 
 		/*
 		 * Any errors will already have been reported in the function process,
diff --git a/src/bin/pg_basebackup/pg_receivexlog.c b/src/bin/pg_basebackup/pg_receivexlog.c
index 9640838..0b7af54 100644
--- a/src/bin/pg_basebackup/pg_receivexlog.c
+++ b/src/bin/pg_basebackup/pg_receivexlog.c
@@ -36,6 +36,7 @@ static char *basedir = NULL;
 static int	verbose = 0;
 static int	noloop = 0;
 static int	standby_message_timeout = 10 * 1000;		/* 10 sec = default */
+static int	fsync_interval = 0; /* 0 = default */
 static volatile bool time_to_abort = false;
 
 
@@ -62,6 +63,8 @@ usage(void)
 	printf(_("\nOptions:\n"));
 	printf(_("  -D, --directory=DIR    receive transaction log files into this directory\n"));
 	printf(_("  -n, --no-loop          do not loop on connection lost\n"));
+	printf(_("  -F  --fsync-interval=INTERVAL\n"
+			 "                         frequency of syncs to transaction log files (in seconds)\n"));
 	printf(_("  -v, --verbose          output verbose messages\n"));
 	printf(_("  -V, --version          output version information, then exit\n"));
 	printf(_("  -?, --help             show this help, then exit\n"));
@@ -330,7 +333,8 @@ StreamLog(void)
 starttli);
 
 	ReceiveXlogStream(conn, startpos, starttli, NULL, basedir,
-	  stop_streaming, standby_message_timeout, ".partial");
+	  stop_streaming, standby_message_timeout, ".partial",
+	  fsync_interval);
 
 	PQfinish(conn);
 }
@@ -360,6 +364,7 @@ main(int argc, char **argv)
 		{"port", required_argument, NULL, 'p'},
 		{"username", required_argument, NULL, 'U'},
 		{"no-loop", no_argument, NULL, 'n'},
+		{"fsync-interval", required_argument, NULL, 'F'},
 		{"no-password", no_argument, NULL, 'w'},
 		{"password", no_argument, NULL, 'W'},
 		{"status-interval", required_argument, NULL, 's'},
@@ -389,7 +394,7 @@ main(int argc, char **argv)
 		}
 	}
 
-	while ((c = getopt_long(argc, argv, "D:d:h:p:U:s:nwWv",
+	while ((c = getopt_long(argc, argv, "D:d:h:p:U:s:nF:wWv",
 			long_options, option_index)) != -1)
 	{
 		switch (c)
@@ -436,6 +441,15 @@ main(int argc, char **argv)
 			case 'n':
 noloop = 1;
 break;
+		case 'F':
+			fsync_interval = atoi(optarg) * 1000;
+			if (fsync_interval < -1000)
+			{
+fprintf(stderr, _(%s: 

[HACKERS] pgcrypto: PGP signatures

2014-08-06 Thread Marko Tiikkaja

Hi hackers,

Attached is a patch to add support for PGP signatures in encrypted 
messages into pgcrypto.


Currently, the list of limitations is the following:

- It only knows how to generate one signature per message.  I don't 
see that as a problem.
- If a message has been signed with multiple keys which have the 
same keyid as the one specified to verify the message, an error is 
returned.  Naively, it seems that we should try all of them and return 
OK if even one of them matches, but that seems icky.
- Only RSA signatures are supported.  It wouldn't be too hard for 
someone familiar with DSA to add it in, but I'm not volunteering to do 
it.  Personally I think supporting RSA is better than no support at all.


As per usual, I'll also add this to the upcoming commitfest.  Any 
feedback appreciated before that, of course.



.marko
*** a/contrib/pgcrypto/Makefile
--- b/contrib/pgcrypto/Makefile
***
*** 20,39  SRCS = pgcrypto.c px.c px-hmac.c px-crypt.c \
mbuf.c pgp.c pgp-armor.c pgp-cfb.c pgp-compress.c \
pgp-decrypt.c pgp-encrypt.c pgp-info.c pgp-mpi.c \
pgp-pubdec.c pgp-pubenc.c pgp-pubkey.c pgp-s2k.c \
!   pgp-pgsql.c
  
  MODULE_big= pgcrypto
  OBJS  = $(SRCS:.c=.o) $(WIN32RES)
  
  EXTENSION = pgcrypto
! DATA = pgcrypto--1.1.sql pgcrypto--1.0--1.1.sql pgcrypto--unpackaged--1.0.sql
  PGFILEDESC = pgcrypto - cryptographic functions
  
  REGRESS = init md5 sha1 hmac-md5 hmac-sha1 blowfish rijndael \
$(CF_TESTS) \
crypt-des crypt-md5 crypt-blowfish crypt-xdes \
pgp-armor pgp-decrypt pgp-encrypt $(CF_PGP_TESTS) \
!   pgp-pubkey-decrypt pgp-pubkey-encrypt pgp-info
  
  EXTRA_CLEAN = gen-rtab
  
--- 20,39 
mbuf.c pgp.c pgp-armor.c pgp-cfb.c pgp-compress.c \
pgp-decrypt.c pgp-encrypt.c pgp-info.c pgp-mpi.c \
pgp-pubdec.c pgp-pubenc.c pgp-pubkey.c pgp-s2k.c \
!   pgp-sig.c pgp-pgsql.c
  
  MODULE_big= pgcrypto
  OBJS  = $(SRCS:.c=.o) $(WIN32RES)
  
  EXTENSION = pgcrypto
! DATA = pgcrypto--1.2.sql pgcrypto--1.0--1.1.sql pgcrypto--1.1--1.2.sql 
pgcrypto--unpackaged--1.0.sql
  PGFILEDESC = pgcrypto - cryptographic functions
  
  REGRESS = init md5 sha1 hmac-md5 hmac-sha1 blowfish rijndael \
$(CF_TESTS) \
crypt-des crypt-md5 crypt-blowfish crypt-xdes \
pgp-armor pgp-decrypt pgp-encrypt $(CF_PGP_TESTS) \
!   pgp-pubkey-decrypt pgp-pubkey-encrypt pgp-info pgp-sign
  
  EXTRA_CLEAN = gen-rtab
  
*** a/contrib/pgcrypto/expected/pgp-encrypt.out
--- b/contrib/pgcrypto/expected/pgp-encrypt.out
***
*** 16,22  select pgp_sym_decrypt(pgp_sym_encrypt('Secret.', 'key'),
expect-sess-key=0,
expect-s2k-mode=3,
expect-s2k-digest-algo=sha1,
!   expect-compress-algo=0
');
   pgp_sym_decrypt 
  -
--- 16,23 
expect-sess-key=0,
expect-s2k-mode=3,
expect-s2k-digest-algo=sha1,
!   expect-compress-algo=0,
!   expect-digest-algo=sha512
');
   pgp_sym_decrypt 
  -
***
*** 30,38  select pgp_sym_decrypt(pgp_sym_encrypt('Secret.', 'key'),
expect-sess-key=1,
expect-s2k-mode=0,
expect-s2k-digest-algo=md5,
!   expect-compress-algo=1
');
  NOTICE:  pgp_decrypt: unexpected cipher_algo: expected 4 got 7
  NOTICE:  pgp_decrypt: unexpected s2k_mode: expected 0 got 3
  NOTICE:  pgp_decrypt: unexpected s2k_digest_algo: expected 1 got 2
  NOTICE:  pgp_decrypt: unexpected use_sess_key: expected 1 got 0
--- 31,41 
expect-sess-key=1,
expect-s2k-mode=0,
expect-s2k-digest-algo=md5,
!   expect-compress-algo=1,
!   expect-digest-algo=md5
');
  NOTICE:  pgp_decrypt: unexpected cipher_algo: expected 4 got 7
+ NOTICE:  pgp_decrypt: unexpected digest_algo: expected 1 got 10
  NOTICE:  pgp_decrypt: unexpected s2k_mode: expected 0 got 3
  NOTICE:  pgp_decrypt: unexpected s2k_digest_algo: expected 1 got 2
  NOTICE:  pgp_decrypt: unexpected use_sess_key: expected 1 got 0
*** a/contrib/pgcrypto/expected/pgp-info.out
--- b/contrib/pgcrypto/expected/pgp-info.out
***
*** 76,78  from encdata order by id;
--- 76,151 
   FD0206C409B74875
  (4 rows)
  
+ -- pgp_main_key_id
+ select pgp_main_key_id(dearmor(pubkey)) from keytbl where id=1;
+  pgp_main_key_id  
+ --
+  1C29BC0D18177364
+ (1 row)
+ 
+ select pgp_main_key_id(dearmor(pubkey)) from keytbl where id=2;
+  pgp_main_key_id  
+ --
+  48E9CD56FEA668DB
+ (1 row)
+ 
+ select pgp_main_key_id(dearmor(pubkey)) from keytbl where id=3;
+  pgp_main_key_id  
+ --
+  63F875F63F6774A0
+ (1 row)
+ 
+ select 

Re: [HACKERS] A worst case for qsort

2014-08-06 Thread Fabien COELHO



If so, adding some randomness in the decision process would suffice to
counter the adversarial input argument you raised.


This is specifically addressed by the paper. Indeed, randomly choosing
a pivot is a common strategy. It won't fix the problem.


Too bad. I must admit that I do not see how to build a test case which
would trigger worst-case behavior against a qsort which chooses the
pivot randomly, but I have not read the paper, and possibly there is an
element of context which is eluding me.
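
For what it's worth, the usual construction here (my paraphrase, from memory,
of McIlroy's "A Killer Adversary for Quicksort" -- assuming that is the paper
meant; this is not code from the thread) sidesteps pivot choice entirely by
deciding element values lazily inside the comparator, so most quicksorts go
quadratic no matter how the pivot is picked:

#include <stdio.h>
#include <stdlib.h>

static int *val;		/* val[i] is element i's value once "frozen" */
static int	gas;		/* sentinel: value not decided yet, sorts high */
static int	nsolid;		/* how many values have been frozen so far */
static int	candidate;	/* last not-yet-frozen element seen in a comparison */
static long ncmp;		/* comparison counter */

static int
cmp(const void *pa, const void *pb)
{
	int		a = *(const int *) pa;	/* the array holds indices 0..n-1 */
	int		b = *(const int *) pb;

	ncmp++;
	if (val[a] == gas && val[b] == gas)
		val[a == candidate ? a : b] = nsolid++;		/* freeze one of them */
	if (val[a] == gas)
		candidate = a;
	else if (val[b] == gas)
		candidate = b;
	return val[a] - val[b];
}

int
main(void)
{
	int		n = 20000;
	int	   *ptr = malloc(n * sizeof(int));
	int		i;

	val = malloc(n * sizeof(int));
	gas = n;				/* larger than any frozen value */
	for (i = 0; i < n; i++)
	{
		ptr[i] = i;
		val[i] = gas;
	}
	qsort(ptr, n, sizeof(int), cmp);
	printf("%ld comparisons for n = %d\n", ncmp, n);
	return 0;
}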


--
Fabien.




Re: [HACKERS] Enhancing pgbench parameter checking

2014-08-06 Thread Fabien COELHO


Included is the patch to enhance the behavior of pgbench in this regard 
IMO. Here is a sample session after patching:


$ ./pgbench -c 10 -T 300 -S -i test
some parameters cannot be used in initialize mode


I have not tested, but the patch looks ok in principle.

I'm not sure about the variable name is_non_init_parameter_set. I would
suggest benchmarking_option_set?


Also, to be consistent, maybe it should check that no
initialization-specific options are set when benchmarking.


--
Fabien.




Re: [HACKERS] add modulo (%) operator to pgbench

2014-08-06 Thread Fabien COELHO



Three different modulo operators seems like a lot for a language that
doesn't even have a real expression syntax, but I'll yield to whatever
the consensus is on this one.


Here is a third, simpler patch which only implements Knuth's modulo,
where the remainder has the same sign as the divisor.


I would prefer this version 3 (one simple modulo based on the Knuth
definition) or, if there is a problem, version 2 (all 3 modulos). Version 1,
which provides a modulo compatible with C & SQL, is really useless to me.


--
Fabien.diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index c0e5e24..3f53e2c 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -1574,6 +1574,22 @@ top:
 	}
 	snprintf(res, sizeof(res), INT64_FORMAT, ope1 / ope2);
 }
+else if (strcmp(argv[3], "%") == 0)
+{
+	int64_t remainder;
+	if (ope2 == 0)
+	{
+		fprintf(stderr, "%s: division by zero in modulo\n", argv[0]);
+		st-ecnt++;
+		return true;
+	}
+	/* divisor signed remainder */
+	remainder = ope1 % ope2;
+	if ((ope2 > 0 && remainder < 0) ||
+		(ope2 < 0 && remainder > 0))
+		remainder += ope2;
+	snprintf(res, sizeof(res), INT64_FORMAT, remainder);
+}
 else
 {
 	fprintf(stderr, "%s: unsupported operator %s\n", argv[0], argv[3]);
diff --git a/doc/src/sgml/pgbench.sgml b/doc/src/sgml/pgbench.sgml
index b7d88f3..d722f31 100644
--- a/doc/src/sgml/pgbench.sgml
+++ b/doc/src/sgml/pgbench.sgml
@@ -735,7 +735,7 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
   Each <replaceable>operand</> is either an integer constant or a
   <literal>:</><replaceable>variablename</> reference to a variable
   having an integer value.  The <replaceable>operator</> can be
-  <literal>+</>, <literal>-</>, <literal>*</>, or <literal>/</>.
+  <literal>+</>, <literal>-</>, <literal>*</>, <literal>%</> (divisor-signed version) or <literal>/</>.
  /para
 
  para



Re: [HACKERS] Scaling shared buffer eviction

2014-08-06 Thread Robert Haas
On Wed, Aug 6, 2014 at 6:12 AM, Amit Kapila amit.kapil...@gmail.com wrote:
 If I'm reading this right, the new statistic is an incrementing counter
 where, every time you update it, you add the number of buffers currently on
 the freelist.  That makes no sense.

 I think that using 'number of buffers currently on the freelist' and
 'number of recently allocated buffers' for consecutive cycles,
 we can figure out approximately how many buffer allocations
 need the clock sweep, assuming the low and high threshold water
 marks are fixed.  However, there can be cases where it is not
 easy to estimate that number.

Counters should be designed in such a way that you can read one, and then
read it again later, and make sense of it - you should not need to
read the counter on *consecutive* cycles to interpret it.

 I think what you should be counting is the number of allocations that are
 being satisfied from the free-list.  Then, by comparing the rate at which
 that value is incrementing to the rate at which buffers_alloc is
 incrementing, somebody can figure out what percentage of allocations are
 requiring a clock-sweep run.  Actually, I think it's better to flip it
 around: count the number of allocations that require an individual backend
 to run the clock sweep (vs. being satisfied from the free-list); call it,
 say, buffers_backend_clocksweep.  We can then try to tune the patch to make
 that number as small as possible under varying workloads.

 This can give us a clear idea for tuning the patch; however, we need
 to maintain 3 counters for it in the code (recent_alloc, needed for the
 current bgwriter logic, and the other 2 suggested by you).  Do you
 want to retain such counters in the code, or are they just a kind of debug
 info for the patch?

I only mean to propose one new counter, and I'd imagine including that
in the final patch.  We already have a counter of total buffer
allocations; that's buffers_alloc.  I'm proposing to add an additional
counter for the number of those allocations not satisfied from the
free list, with a name like buffers_alloc_clocksweep (I said
buffers_backend_clocksweep above, but that's probably not best, as the
existing buffers_backend counts buffer *writes*, not allocations).  I
think we would definitely want to retain this counter in the final
patch, as an additional column in pg_stat_bgwriter.

 d.  Autotune the low and high threshold for freelist for various
  configurations.

 I think we need to come up with some kind of formula here rather than just
 a list of hard-coded constants.

 That was my initial intention as well, and I have tried approaches based
 on the number of shared buffers, like keeping the threshold values as a
 percentage of shared buffers, but nothing could satisfy different
 kinds of workloads.  The current values I have chosen are based
 on experiments for various workloads at different thresholds.  I have
 shown the lwlock_stats data for various loads based on the current
 thresholds upthread.  Another way could be to make them config
 knobs and use the values given by the user in case they are provided,
 else go with the fixed values.

How did you go about determining the optimal value for a particular workload?

When the list is kept short, it's less likely that a value on the list
will be referenced or dirtied again before the page is actually
recycled.  That's clearly good.  But when the list is long, it's less
likely to become completely empty and thereby force individual
backends to run the clock-sweep.  My suspicion is that, when the
number of buffers is small, the impact of the list being too short
isn't likely to be very significant, because running the clock-sweep
isn't all that expensive anyway - even if you have to scan through the
entire buffer pool multiple times, there aren't that many buffers.
But when the number of buffers is large, those repeated scans can
cause a major performance hit, so having an adequate pool of free
buffers becomes much more important.

I think your list of high-watermarks is far too generous for low
buffer counts.  With more than 100k shared buffers, you've got a
high-watermark of 2k buffers, which means that 2% or less of the
buffers will be on the freelist, which seems a little on the high side
to me, but probably in the ballpark of what we should be aiming for.
But at 10001 shared buffers, you can have 1000 of them on the
freelist, which is 10% of the buffer pool; that seems high.  At 101
shared buffers, 75% of the buffers in the system can be on the
freelist; that seems ridiculous.  The chances of a buffer still being
unused by the time it reaches the head of the freelist seem very
small.

Based on your existing list of thresholds, and taking the above into
account, I'd suggest something like this: let the high-watermark for
the freelist be 0.5% of the total number of buffers, with a maximum of
2000 and a minimum of 5.  Let the low-watermark be 20% of the
high-watermark.  That might not be best, but I think some kind of
formula like that can likely be made to work.  I would 
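
(The formula suggested above works out to something like this -- a sketch of
the arithmetic only, not code from the patch; NBuffers stands for the
shared_buffers count:)

/* Hypothetical helper illustrating the suggested freelist watermarks. */
static void
compute_freelist_watermarks(int NBuffers, int *high_water, int *low_water)
{
	int		high = NBuffers / 200;	/* 0.5% of shared buffers */

	if (high > 2000)
		high = 2000;				/* maximum of 2000 */
	if (high < 5)
		high = 5;					/* minimum of 5 */

	*high_water = high;
	*low_water = high / 5;			/* low-watermark = 20% of the high one */
}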

Re: [HACKERS] Proposal: Incremental Backup

2014-08-06 Thread Bruce Momjian
On Wed, Aug  6, 2014 at 06:48:55AM +0100, Simon Riggs wrote:
 On 6 August 2014 03:16, Bruce Momjian br...@momjian.us wrote:
  On Wed, Aug  6, 2014 at 09:17:35AM +0900, Michael Paquier wrote:
  On Wed, Aug 6, 2014 at 9:04 AM, Simon Riggs si...@2ndquadrant.com wrote:
  
   On 5 August 2014 22:38, Claudio Freire klaussfre...@gmail.com wrote:
   Thinking some more, there seems like this whole store-multiple-LSNs
   thing is too much. We can still do block-level incrementals just by
   using a single LSN as the reference point. We'd still need a complex
   file format and a complex file reconstruction program, so I think that
   is still next release. We can call that INCREMENTAL BLOCK LEVEL.
 
  Yes, that's the approach taken by pg_rman for its block-level
  incremental backup. Btw, I don't think that the CPU cost to scan all
  the relation files added to the one to rebuild the backups is worth
  doing it on large instances. File-level backup would cover most of the
 
  Well, if you scan the WAL files from the previous backup, that will tell
  you what pages need an incremental backup.
 
 That would require you to store that WAL, which is something we hope
 to avoid. Plus if you did store it, you'd need to retrieve it from
 long term storage, which is what we hope to avoid.

Well, for file-level backups we have:

1) use file modtime (possibly inaccurate)
2) use file modtime and checksums (heavy read load)

For block-level backups we have:

3) accumulate block numbers as WAL is written
4) read previous WAL at incremental backup time
5) read data page LSNs (high read load)

The question is which of these do we want to implement?  #1 is very easy
to implement, but incremental _file_ backups are larger than block-level
backups.  If we have #5, would we ever want #2?  If we have #3, would we
ever want #4 or #5?
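
For instance, option #1 boils down to comparing each relation file's
modification time against the start time of the previous backup (a hedged
sketch, not from any proposed patch; mtime granularity and clock skew are
exactly the inaccuracies noted above):

#include <stdbool.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical helper: should this file go into an incremental backup
 * taken relative to a backup that started at last_backup_start? */
static bool
file_needs_backup(const char *path, time_t last_backup_start)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return true;		/* cannot stat it: copy it to be safe */
	return st.st_mtime >= last_backup_start;
}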

  I am thinking we need a wiki page to outline all these options.
 
 There is a Wiki page.

I would like to see that wiki page have a more open approach to
implementations.

I do think this is a very important topic for us.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Append to a GUC parameter ?

2014-08-06 Thread Bruce Momjian
On Wed, Aug  6, 2014 at 12:12:29AM -0300, Fabrízio de Royes Mello wrote:
 
 On Tue, Aug 5, 2014 at 10:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 
 Josh Berkus j...@agliodbs.com writes:
  BTW, while there's unlikely to be a good reason to put search_path in
  pg.conf with appends, there are a LOT of reasons to want to be able to
  append to it during a session.
 
 [shrug...]  You can do that today with current_setting()/set_config().
 
 
 
 +1
 
 With a very simple statement you can do that:
 
 fabrizio=# SELECT current_setting('search_path');
  current_setting
 -----------------
  $user,public
 (1 row)
 
 fabrizio=# SELECT set_config('search_path', current_setting('search_path')||',
 another_schema', false);
           set_config
 ------------------------------
  $user,public, another_schema
 (1 row)

Yes, that is very nice, but it only works for session-level changes,
i.e. you can't use that in postgresql.conf or via ALTER USER/DATABASE.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Fixed redundant i18n strings in json

2014-08-06 Thread Robert Haas
On Tue, Aug 5, 2014 at 6:05 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 I think you missed one of the regression tests, see attached

Woops.  Thanks, committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] 9.5: Better memory accounting, towards memory-bounded HashAgg

2014-08-06 Thread Robert Haas
On Sat, Aug 2, 2014 at 4:40 PM, Jeff Davis pg...@j-davis.com wrote:
 Attached is a patch that explicitly tracks allocated memory (the blocks,
 not the chunks) for each memory context, as well as its children.

 This is a prerequisite for memory-bounded HashAgg, which I intend to
 submit for the next CF. Hashjoin tracks the tuple sizes that it adds to
 the hash table, which is a good estimate for Hashjoin. But I don't think
 it's as easy for Hashagg, for which we need to track transition values,
 etc. (also, for HashAgg, I expect that the overhead will be more
 significant than for Hashjoin). If we track the space used by the memory
 contexts directly, it's easier and more accurate.

 I did some simple pgbench select-only tests, and I didn't see any TPS
 difference.

I was curious whether a performance difference would show up when
sorting, so I tried it out.  I set up a test with pgbench -i 300.  I
then repeatedly restarted the database, and after each restart, did
this:

time psql -c 'set trace_sort=on; reindex index pgbench_accounts_pkey;'

I alternated runs between master and master with this patch, and got
the following results:

master:
LOG:  internal sort ended, 1723933 KB used: CPU 2.58s/11.54u sec
elapsed 16.88 sec
LOG:  internal sort ended, 1723933 KB used: CPU 2.50s/12.37u sec
elapsed 17.60 sec
LOG:  internal sort ended, 1723933 KB used: CPU 2.14s/11.28u sec
elapsed 16.11 sec

memory-accounting:
LOG:  internal sort ended, 1723933 KB used: CPU 2.57s/11.97u sec
elapsed 17.39 sec
LOG:  internal sort ended, 1723933 KB used: CPU 2.30s/12.57u sec
elapsed 17.68 sec
LOG:  internal sort ended, 1723933 KB used: CPU 2.54s/11.99u sec
elapsed 17.25 sec

Comparing the median times, that's about a 3% regression.  For this
particular case, we might be able to recapture that by replacing the
bespoke memory-tracking logic in tuplesort.c with use of this new
facility.  I'm not sure whether there are other cases that we might
also want to test; I think stuff that runs all on the server side is
likely to show up problems more clearly than pgbench.  Maybe a
PL/pgsql loop that does something allocation-intensive on each
iteration, for example, like parsing a big JSON document.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] text search: restricting the number of parsed words in headline generation

2014-08-06 Thread Bruce Momjian

FYI, I have kept this email from 2011 about poor performance of parsed
words in headline generation.  If someone wants to research it, please
do so:

http://www.postgresql.org/message-id/1314117620.3700.12.camel@dragflick

---

On Tue, Aug 23, 2011 at 10:31:42PM -0400, Tom Lane wrote:
 Sushant Sinha sushant...@gmail.com writes:
  Doesn't this force the headline to be taken from the first N words of
  the document, independent of where the match was?  That seems rather
  unworkable, or at least unhelpful.
 
  In headline generation function, we don't have any index or knowledge of
  where the match is. We discover the matches by first tokenizing and then
  comparing the matches with the query tokens. So it is hard to do
  anything better than first N words.
 
 After looking at the code in wparser_def.c a bit more, I wonder whether
 this patch is doing what you think it is.  Did you do any profiling to
 confirm that tokenization is where the cost is?  Because it looks to me
 like the match searching in hlCover() is at least O(N^2) in the number
 of tokens in the document, which means it's probably the dominant cost
 for any long document.  I suspect that your patch helps not so much
 because it saves tokenization costs as because it bounds the amount of
 effort spent in hlCover().
 
 I haven't tried to do anything about this, but I wonder whether it
 wouldn't be possible to eliminate the quadratic blowup by saving more
 state across the repeated calls to hlCover().  At the very least, it
 shouldn't be necessary to find the last query-token occurrence in the
 document from scratch on each and every call.
 
 Actually, this code seems probably flat-out wrong: won't every
 successful call of hlCover() on a given document return exactly the same
 q value (end position), namely the last token occurrence in the
 document?  How is that helpful?
 
   regards, tom lane
 

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Minmax indexes

2014-08-06 Thread Robert Haas
On Tue, Aug 5, 2014 at 7:55 PM, Josh Berkus j...@agliodbs.com wrote:
 On 08/05/2014 04:41 PM, Alvaro Herrera wrote:
 I have chosen to keep the name minmax, even if the opclasses now let
 one implement completely different things on top of it such as geometry
 bounding boxes and bloom filters (aka bitmap indexes).  I don't see a
 need for a rename: essentially, in PR we can just say we have these
 neat minmax indexes that other databases also have, but instead of just
 being used for integer data, they can also be used for geometry, GIS and
 bitmap indexes, so as always we're more powerful than everyone else when
 implementing new database features.

 Plus we haven't come up with a better name ...

Several good suggestions have been made, like summarizing or
summary indexes and compressed range indexes.  I still really
dislike the present name - you might think this is a type of index
that has something to do with optimizing min and max, but what it
really is is a kind of small index for a big table.  The current name
couldn't make that less clear.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] add modulo (%) operator to pgbench

2014-08-06 Thread Robert Haas
On Tue, Aug 5, 2014 at 5:53 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
 Robert Haas wrote:
 On Mon, Aug 4, 2014 at 5:20 AM, Fabien COELHO coe...@cri.ensmp.fr wrote:
  This patch is pretty trivial.
  Another slightly less trivial but more useful version.
 
  The issue is that there are 3 definitions of modulo, two of which are fine
  (Knuth floored division and Euclidian), and the last one much less useful.
  Alas, C (%) & SQL (MOD) choose the bad definition:-( I really need any of
  the other two. The attached patch adds all versions, with % and mod
  consistent with their C and SQL unfortunate counterparts, and fmod and
  emod the sane ones.

 Three different modulo operators seems like a lot for a language that
 doesn't even have a real expression syntax, but I'll yield to whatever
 the consensus is on this one.

 I wonder if it would be necessary to offer the division operator
 semantics corresponding to whatever additional modulo operator we choose
 to offer.  That is, if we add emod, do we need ediv as well?

Maybe we ought to break down and support a real expression syntax.
Sounds like that would be better all around.
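
For reference, the three definitions only differ when an argument is negative.
A standalone C sketch (not pgbench code) deriving the floored and Euclidean
variants from C's truncated %:

#include <stdio.h>

/* C / SQL style: remainder takes the sign of the dividend. */
static long
cmod(long a, long b)
{
    return a % b;
}

/* Knuth floored division: remainder takes the sign of the divisor. */
static long
fmod_(long a, long b)
{
    long    r = a % b;

    return (r != 0 && (r < 0) != (b < 0)) ? r + b : r;
}

/* Euclidean: remainder is always non-negative. */
static long
emod(long a, long b)
{
    long    r = a % b;

    return (r < 0) ? r + (b < 0 ? -b : b) : r;
}

int
main(void)
{
    printf("%ld %ld %ld\n", cmod(-7, 3), fmod_(-7, 3), emod(-7, 3));    /* -1 2 2 */
    printf("%ld %ld %ld\n", cmod(7, -3), fmod_(7, -3), emod(7, -3));    /*  1 -2 1 */
    return 0;
}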

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Incremental Backup

2014-08-06 Thread Claudio Freire
On Wed, Aug 6, 2014 at 12:20 PM, Bruce Momjian br...@momjian.us wrote:

 Well, for file-level backups we have:

 1) use file modtime (possibly inaccurate)
 2) use file modtime and checksums (heavy read load)

 For block-level backups we have:

 3) accumulate block numbers as WAL is written
 4) read previous WAL at incremental backup time
 5) read data page LSNs (high read load)

 The question is which of these do we want to implement?  #1 is very easy
 to implement, but incremental _file_ backups are larger than block-level
 backups.  If we have #5, would we ever want #2?  If we have #3, would we
 ever want #4 or #5?

You may want to implement both #3 and #2. #3 would need a config
switch to enable updating the bitmap. That would make it optional to
incur the I/O cost of updating the bitmap. When the bitmap isn't
there, the backup would use #2. Slow, but effective. If slowness is a
problem for you, you enable the bitmap and do #3.

Sounds reasonable IMO, and it means you can start by implementing #2.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Questions on dynamic execution and sqlca

2014-08-06 Thread Bill Epstein


I'm very new to Postgres, but have plenty of experience developing stored
procs in Oracle.

I'm going to be creating Postgres stored procedures (functions actually,
since I discovered that in postgres, everything is a function) to do a
variety of batch-type processing.  These functions may or may not be called
by the .Net application that is being developed.  To support both my
Postgres function development and run-time monitoring, I wanted to develop
generic logging functions that would be called by other Postgres functions
to be developed in order to help trace through code and collect error
information.

The attached create_log_utilities.sql holds plsql for creating two logging
functions (one for logging status messages, and one for logging errors).
In the log_msg function, the various sets of EXEC and EXECUTE statements
are from my experimenting with dynamically generating SQL.  If I could get
it working, the intent is to be able to add a LogTableName as an input
parameter, thereby allowing individual developers to utilize their own
version of the log table (w/ the same columns).  I've been able to do this
sort of thing w/ Oracle before.

I've tried a variety of ways based on the on-line docs I've seen, but I
always get a syntax error on EXEC when I use only the line EXEC  statement
(is there a setting I need to set in order to be able to include EXEC
directives?).  The closest I've come is the currently uncommented prepared
statement - it compiles, but I get the following error messages:

   INFO:  INSERT INTO UTILITY.BPC_AUDIT (COMPONENT, ACTIVITY, AUDIT_LEVEL,
   AUDIT_TIME, NOTE, SQL) VALUES ('Overpayment','Create
   TLI','LOG','2014-08-06 10:44:23.933','Created TLI','INSERT INTO
   TLIA...')
   CONTEXT:  SQL statement SELECT utility.LOG_MSG (p_log_yn, p_component,
   p_function, p_note, p_sql)
   PL/pgSQL function utility.logging_test() line 24 at PERFORM
   ERROR:  INSERT has more expressions than target columns
   LINE 3:  VALUES ($1, $2, $3, $4, $5, $6)
^
   QUERY:  PREPARE myinsert7 (text, text, text, timestamp, text, text) AS
INSERT INTO UTILITY.BPC_AUDIT (COMPONENT, ACTIVITY,
   AUDIT_LEVEL, NOTE, SQL)
VALUES ($1, $2, $3, $4, $5, $6)
   CONTEXT:  PL/pgSQL function utility.log_msg
   (character,text,text,text,text) line 48 at SQL statement
   SQL statement SELECT utility.LOG_MSG (p_log_yn, p_component,
   p_function, p_note, p_sql)
   PL/pgSQL function utility.logging_test() line 24 at PERFORM
   ** Error **

   ERROR: INSERT has more expressions than target columns
   SQL state: 42601
   Context: PL/pgSQL function utility.log_msg
   (character,text,text,text,text) line 48 at SQL statement
   SQL statement SELECT utility.LOG_MSG (p_log_yn, p_component,
   p_function, p_note, p_sql)
   PL/pgSQL function utility.logging_test() line 24 at PERFORM


In the other function (log_error ), the problem I'm having is that I'm
trying to pull out the sqlca error code and description (as I've done in
the past w/ Oracle), in order to write that information in my log table.
The intent is that this function will only be called from within an
EXCEPTION block (as I do in my logging_test  function - I purposely run a
bad query to trigger it).

To exercise the code, I'm just executing select utility.logging_test(); in
a query window.

A few other items I could use clarification on:
- What's the difference between hitting the Execute Query and Execute
PGScript buttons?  Both seem to compile the functions.

- What are the differences among PL/SQL,  PL/PGSQL and pgScript.

- I installed Postgres 9.3.4 and  I'm using PEM v4.0.2.  When I click on
the icon to Execute arbitrary SQL queries, I notice that the icons on the
window that opens are different from the pgAdmin PostgreSQL Tools window
that opens if I double-click on one of my .sql files.  Is there a
difference in these tools?


Attached are the relevant scripts:
(See attached file: create_bpc_audit.sql) - Create the log table
(See attached file: create_log_utilities.sql)- Code to create the two
logging functions
(See attached file: test_log_utilities.sql)- Code to exercise the msg and
error logging functions


Thanks.
Bill

_
William Epstein
Consulting I/T Specialist
AIS ADM Information Management
US Federal
Office/Fax:  301-240-3887, Tie Line:  372-3887
International Business Machines (IBM) Corporation
Global Business Services (GBS)

create_bpc_audit.sql
Description: Binary data


create_log_utilities.sql
Description: Binary data


test_log_utilities.sql
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A worst case for qsort

2014-08-06 Thread John Cochran
I just browsed the paper linked by Peter and it looks like the attack has
to be active against a currently executing qsort. In the paper, what
happens is the comparison function is supplied by the attacker and
effectively lies about the result of a comparison. It keeps the lies
consistent in a very specific manner so that eventually qsort returns its
input in a properly sorted fashion.

So it seems to me that the vulnerability only exists if an attacker-supplied
comparison function is permitted. For all other cases, assuming that only
vetted comparison functions are permitted, the selection of a random pivot
would make an attack on qsort using specially tailored input data
improbable.


Re: [HACKERS] Questions on dynamic execution and sqlca

2014-08-06 Thread Stephen Frost
Bill,

* Bill Epstein (epste...@us.ibm.com) wrote:
[...]

  This should really go to the -general mailing list.  The -hackers
  mailing list is for discussion regarding developing the PostgreSQL
  server itself.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Proposal: Incremental Backup

2014-08-06 Thread Bruce Momjian
On Wed, Aug  6, 2014 at 01:15:32PM -0300, Claudio Freire wrote:
 On Wed, Aug 6, 2014 at 12:20 PM, Bruce Momjian br...@momjian.us wrote:
 
  Well, for file-level backups we have:
 
  1) use file modtime (possibly inaccurate)
  2) use file modtime and checksums (heavy read load)
 
  For block-level backups we have:
 
  3) accumulate block numbers as WAL is written
  4) read previous WAL at incremental backup time
  5) read data page LSNs (high read load)
 
  The question is which of these do we want to implement?  #1 is very easy
  to implement, but incremental _file_ backups are larger than block-level
  backups.  If we have #5, would we ever want #2?  If we have #3, would we
  ever want #4 or #5?
 
 You may want to implement both #3 and #2. #3 would need a config
 switch to enable updating the bitmap. That would make it optional to
 incur the I/O cost of updating the bitmap. When the bitmap isn't
 there, the backup would use #2. Slow, but effective. If slowness is a
 problem for you, you enable the bitmap and do #3.
 
 Sounds reasonable IMO, and it means you can start by implementing #2.

Well, Robert Haas had the idea of a separate process that accumulates
the changed WAL block numbers, making it low overhead.  I question
whether we need #2 just to handle cases where they didn't enable #3
accounting earlier.  If that is the case, just do a full backup and
enable #3.
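
To make the bitmap idea concrete, here is a hypothetical sketch (all names and
sizes invented, not a proposed design): one bit per block of a 1 GB relation
segment, set whenever WAL touches that block, so an incremental backup can
later copy only the marked blocks.

#define BLOCKS_PER_SEGMENT  131072      /* example: 1 GB segment of 8 kB blocks */

typedef struct
{
    unsigned char bits[BLOCKS_PER_SEGMENT / 8];
} ChangedBlockMap;

/* Record that WAL modified block blkno of this segment. */
static void
mark_block_changed(ChangedBlockMap *map, unsigned int blkno)
{
    map->bits[blkno / 8] |= (unsigned char) (1 << (blkno % 8));
}

/* Ask, at incremental-backup time, whether a block needs to be copied. */
static int
block_is_changed(const ChangedBlockMap *map, unsigned int blkno)
{
    return (map->bits[blkno / 8] >> (blkno % 8)) & 1;
}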

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minmax indexes

2014-08-06 Thread Claudio Freire
On Wed, Aug 6, 2014 at 1:25 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
 Summary seems good.  If I get enough votes I can change it to that.

 CREATE INDEX foo ON t USING summary (cols)

 Summarizing seems weird on that command.  Not sure about compressed
 range, as you would have to use an abbreviation or run the words
 together.


Summarizing index sounds better to my ears, but both ideas based on
summary are quite succinct and to-the-point descriptions of what's
happening, so I vote for those.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minmax indexes

2014-08-06 Thread Bruce Momjian
On Wed, Aug  6, 2014 at 01:31:14PM -0300, Claudio Freire wrote:
 On Wed, Aug 6, 2014 at 1:25 PM, Alvaro Herrera alvhe...@2ndquadrant.com 
 wrote:
  CREATE INDEX foo ON t USING crange (cols)   -- misspelling of cringe?
  CREATE INDEX foo ON t USING comprange (cols)
  CREATE INDEX foo ON t USING compressedrng (cols)   -- ugh
  -- or use an identifier with whitespace:
  CREATE INDEX foo ON t USING compressed range (cols)
 
 
 The word you'd use there is not necessarily the one you use on the
 framework, since the framework applies to many such techniques, but
 the index type there is one specific one.

Block filter indexes?

 The create command can still use minmax, or rangemap if you prefer
 that, while the framework's code uses summary or summarizing.

Summary sounds like materialized views to me.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minmax indexes

2014-08-06 Thread Claudio Freire
On Wed, Aug 6, 2014 at 1:25 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
 CREATE INDEX foo ON t USING crange (cols)   -- misspelling of cringe?
 CREATE INDEX foo ON t USING comprange (cols)
 CREATE INDEX foo ON t USING compressedrng (cols)   -- ugh
 -- or use an identifier with whitespace:
 CREATE INDEX foo ON t USING compressed range (cols)


The word you'd use there is not necessarily the one you use on the
framework, since the framework applies to many such techniques, but
the index type there is one specific one.

The create command can still use minmax, or rangemap if you prefer
that, while the framework's code uses summary or summarizing.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minmax indexes

2014-08-06 Thread Claudio Freire
On Wed, Aug 6, 2014 at 1:35 PM, Bruce Momjian br...@momjian.us wrote:
 On Wed, Aug  6, 2014 at 01:31:14PM -0300, Claudio Freire wrote:
 On Wed, Aug 6, 2014 at 1:25 PM, Alvaro Herrera alvhe...@2ndquadrant.com 
 wrote:
  CREATE INDEX foo ON t USING crange (cols)   -- misspelling of cringe?
  CREATE INDEX foo ON t USING comprange (cols)
  CREATE INDEX foo ON t USING compressedrng (cols)   -- ugh
  -- or use an identifier with whitespace:
  CREATE INDEX foo ON t USING compressed range (cols)


 The word you'd use there is not necessarily the one you use on the
 framework, since the framework applies to many such techniques, but
 the index type there is one specific one.

 Block filter indexes?


Nice one


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] overflow checks optimized away

2014-08-06 Thread Bruce Momjian
On Fri, Nov 29, 2013 at 02:04:10AM +, Greg Stark wrote:
 Attached is what I have so far. I have to say I'm starting to come
 around to Tom's point of view. This is a lot of hassle for not much
 gain. i've noticed a number of other overflow checks that the llvm
 optimizer is not picking up on so even after this patch it's not clear
 that all the signed overflow checks that depend on -fwrapv will be
 gone.
 
 This patch still isn't quite finished though.
 
 a) It triggers a spurious gcc warning about overflows on constant
 expressions. The values of these expressions aren't actually being
 used as they're used in the other branch of the overflow test. I think
 I see how to work around it for gcc using __builtin_choose_expr but it
 might be pretty grotty.
 
 b) I'm concerned these checks depend on INT_MIN/MAX and SHRT_MIN/MAX
 which may not be exactly the right length. I'm kind of confused why
 c.h assumes long is 32 bits and short is 16 bits though so I don't
 think I'm making it any worse. It may be better for us to just define
 our own limits since we know exactly how large we expect these data
 types to be.
 
 c) I want to add regression tests that will ensure that the overflow
 checks are all working. So far I haven't been able to catch any being
 optimized away even with -fno-wrapv and -fstrict-overflow. I think I
 just didn't build with the right options though and it should be
 possible.
 
 The goal here imho is to allow building with -fno-wrapv
 -fstrict-overflow safely. Whether we actually do or not is another
 question but it means we would be able to use new compilers like clang
 without worrying about finding the equivalent of -fwrapv for them.

Where are we on this?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minmax indexes

2014-08-06 Thread Claudio Freire
On Wed, Aug 6, 2014 at 1:55 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
 Claudio Freire wrote:
 On Wed, Aug 6, 2014 at 1:25 PM, Alvaro Herrera alvhe...@2ndquadrant.com 
 wrote:
  CREATE INDEX foo ON t USING crange (cols)   -- misspelling of cringe?
  CREATE INDEX foo ON t USING comprange (cols)
  CREATE INDEX foo ON t USING compressedrng (cols)   -- ugh
  -- or use an identifier with whitespace:
  CREATE INDEX foo ON t USING compressed range (cols)

 The word you'd use there is not necessarily the one you use on the
 framework, since the framework applies to many such techniques, but
 the index type there is one specific one.

 The create command can still use minmax, or rangemap if you prefer
 that, while the framework's code uses summary or summarizing.

 I think you're confusing the AM name with the opclass name.  The name
 you specify in that part of the command is the access method name.  You
 can specify the opclass together with each column, like so:

 CREATE INDEX foo ON t USING blockfilter
 (order_date date_minmax_ops, geometry gis_bbox_ops);

Oh, uh... no, I'm not confusing them, but now I just realized how one
would implement other classes of block filtering indexes, and yeah...
you do it through the opclasses.

I'm sticking to bloom filters:

CREATE INDEX foo ON t USING blockfilter (order_date date_minmax_ops,
path character_bloom_ops);

Cool. Very cool.

So, I like blockfilter a lot. I change my vote to blockfilter ;)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_export_snapshot on standby side

2014-08-06 Thread Bruce Momjian

Seems we still have not addressed this.

---

On Sat, May 25, 2013 at 10:18:57AM +0100, Simon Riggs wrote:
 On 21 May 2013 19:16, Fujii Masao masao.fu...@gmail.com wrote:
 
  We cannot run parallel pg_dump on the standby server because
  pg_export_snapshot() always fails on the standby. Is this the oversight
  of parallel pg_dump or pg_export_snapshot()?
 
  pg_export_snapshot() fails in the standby because it always assigns
  new XID and which is not allowed in the standby. Do we really need
  to assign new XID even in the standby for the exportable snapshot?
 
 Having looked at the code, I say No, we don't *need* to.
 
 There are various parts of the code that deal with
 takenDuringRecovery, so much of this was clearly intended to work in
 recovery.
 
 We use the topXid for the name of the snapshot file. That is clearly
 unnecessary and we should be using the virtualxid instead like we do
 elsewhere. We also use the topXid to test whether it is still running,
 though again, we could equally use the virtualxid instead. There is no
 problem with virtualxids possibly not being active anymore, since if
 we didn't have an xid before and don't have one now, and the xmin is
 the same, the snapshot is still valid.
 
 I think we should treat this as a bug and fix it in 9.3 while we're
 still in beta. Why? Because once we begin using the topXid as the
 filename we can't then change later to using the vxid.
 
 --
  Simon Riggs   http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
 
 
 -- 
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Supporting Windows SChannel as OpenSSL replacement

2014-08-06 Thread Jeff Janes
On Fri, Aug 1, 2014 at 10:58 AM, Heikki Linnakangas hlinnakan...@vmware.com
 wrote:

 On 07/08/2014 08:11 PM, Jeff Janes wrote:

 Is there some recipe for testing the 0002 patch?  Can it be tested on an
 MinGW environment, or does it need to use the MicroSoft supplied
 compilers?


 I used MSVC. It ought to work with MinGW, I think, although you might need
 to tweak the Makefiles to make it compile. Certainly we should eventually
 make it work, before committing. If you try it out, let me know how it goes.


I couldn't get it to work when I tried it a long time ago, but I didn't
record the error, and it may have been a failure to use the correct
arguments to configure.  I think it was taking a path through all the #ifdefs that
resulted in a function never getting defined.

But now it looks like 0002 needs a rebase

Cheers,

Jeff


[HACKERS] posix_fadvise() and pg_receivexlog

2014-08-06 Thread Fujii Masao
Hi,

The WAL files that pg_receivexlog writes will not be re-read soon basically,
so we can advise the OS to release any cached pages when WAL file is
closed. I feel inclined to change pg_receivexlog that way. Thought?
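
For reference, the proposed change amounts to something like the following
when a completed segment is closed; this is just a sketch of the system call
involved, not the actual pg_receivexlog code.

#include <fcntl.h>
#include <unistd.h>

/* Flush a finished WAL segment and hint the kernel that its cached pages
 * will not be needed again.  Error handling omitted for brevity. */
static void
close_wal_segment(int fd)
{
    (void) fsync(fd);
#ifdef POSIX_FADV_DONTNEED
    (void) posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
#endif
    (void) close(fd);
}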

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A worst case for qsort

2014-08-06 Thread Peter Geoghegan
On Wed, Aug 6, 2014 at 9:18 AM, John Cochran j69coch...@gmail.com wrote:
 So it seems to me that the vulnerability only exits if an attacker supplied
 comparison function is permitted. For all other cases, assuming that only
 vetted comparison functions are permitted, the selection of a random pivot
 would make an attack on qsort using specially tailored input data
 improbable.

Whether or not random pivot selection would make it more difficult for
a malicious party to generate pre-processed data that will cause very
bad performance is not all that relevant IMV, since we don't do that.
There are good practical reasons to prefer median of medians pivot
selection, which is what we actually do, and which is clearly affected
to the extent that pre-processing malicious data that causes (at
least) near worst-case performance is possible. It's possible to
predict pivot selection. As the paper says, "An adversary can make
such a quicksort go quadratic by arranging for the pivot to compare
low against almost all items not seen during pivot selection." Random
pivot selection will certainly result in more frequent lopsided
partitions without any malicious intent.
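
For context, the deterministic pivot choice in question is the classic Bentley
& McIlroy style median-of-three (the median-of-medians selection Peter refers
to extends this to a pseudo-median of nine on larger arrays).  A sketch for an
int array, not the actual src/port/qsort.c code:

/* Return a pointer to the median of the three values. */
static int *
med3(int *a, int *b, int *c)
{
    return (*a < *b) ?
        ((*b < *c) ? b : ((*a < *c) ? c : a)) :
        ((*b > *c) ? b : ((*a < *c) ? a : c));
}

Because the candidate elements come from fixed positions in the array, an
adversary who knows those positions can arrange the input so that the chosen
pivot compares low against nearly all of the remaining elements.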

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.5: Better memory accounting, towards memory-bounded HashAgg

2014-08-06 Thread Tomas Vondra
On 2.8.2014 22:40, Jeff Davis wrote:
 Attached is a patch that explicitly tracks allocated memory (the blocks,
 not the chunks) for each memory context, as well as its children.
 
 This is a prerequisite for memory-bounded HashAgg, which I intend to

Anyway, I'm really looking forward to the memory-bounded hashagg, and
I'm willing to spend some time testing it.

 submit for the next CF. Hashjoin tracks the tuple sizes that it adds to
 the hash table, which is a good estimate for Hashjoin. But I don't think

Actually, even for HashJoin the estimate is pretty bad, and varies a lot
depending on the tuple width. With narrow tuples (e.g. ~40B of data),
which is actually quite a common case, you easily get ~100% palloc overhead.

I managed to address that by using a custom allocator. See this:

   https://commitfest.postgresql.org/action/patch_view?id=1503

I wonder whether something like that would be possible for the hashagg?

That would make the memory accounting accurate with 0% overhead (because
it's not messing with the memory context at all), but it only works for
the one node (but maybe that's OK?).

 it's as easy for Hashagg, for which we need to track transition values,
 etc. (also, for HashAgg, I expect that the overhead will be more
 significant than for Hashjoin). If we track the space used by the memory
 contexts directly, it's easier and more accurate.

I don't think that's comparable - I can easily think about cases leading
to extreme palloc overhead with HashAgg (think of an aggregate
implementing COUNT DISTINCT - that effectively needs to store all the
values, and with short values the palloc overhead will be terrible).

Actually, I was looking at HashAgg (it's a somewhat natural direction
after messing with Hash Join), and my plan was to use a similar dense
allocation approach. The trickiest part would probably be making this
available from within custom aggregates.

regards
Tomas
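
For readers following along, a minimal sketch of the block-level accounting
idea quoted above: each context keeps a running total of block bytes, and
updates walk up to the parents so a parent's figure includes its children.
The field and function names here are made up, not taken from the patch.

typedef struct AccountedContext
{
    struct AccountedContext *parent;
    long        total_block_bytes;      /* this context plus all children */
} AccountedContext;

/* Called with a positive delta when a block is allocated, negative on free. */
static void
account_blocks(AccountedContext *context, long delta)
{
    for (; context != NULL; context = context->parent)
        context->total_block_bytes += delta;
}

Whether that walk up the parent chain is cheap enough to do on every block
allocation and free is exactly the kind of overhead question such a patch has
to answer.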


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] posix_fadvise() and pg_receivexlog

2014-08-06 Thread Robert Haas
On Wed, Aug 6, 2014 at 1:39 PM, Fujii Masao masao.fu...@gmail.com wrote:
 The WAL files that pg_receivexlog writes will not be re-read soon basically,
 so we can advise the OS to release any cached pages when WAL file is
 closed. I feel inclined to change pg_receivexlog that way. Thought?

How do we know that the user doesn't plan to read them soon?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] posix_fadvise() and pg_receivexlog

2014-08-06 Thread Heikki Linnakangas

On 08/06/2014 08:39 PM, Fujii Masao wrote:

Hi,

The WAL files that pg_receivexlog writes will not be re-read soon basically,
so we can advise the OS to release any cached pages when WAL file is
closed. I feel inclined to change pg_receivexlog that way. Thought?


-1. The OS should be smart enough to not thrash the cache by files that 
are written sequentially and never read. If we go down this path, we'd 
need to sprinkle posix_fadvises into many, many places.


Anyway, who are we to say that they won't be re-read soon? You might e.g 
have a secondary backup site where you copy the files received by 
pg_receivexlog, as soon as they're completed.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A worst case for qsort

2014-08-06 Thread Fabien COELHO


Random pivot selection will certainly result in more frequent lopsided 
partitions without any malicious intent.


Yep. It makes adversarial input attacks more or less impossible, at the 
price of a higher average cost. Maybe a less randomized version would do, 
i.e. randomly select one of the 3 medians found, while keeping some of the 
nicer performance properties. It would need proof, though.


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] add modulo (%) operator to pgbench

2014-08-06 Thread Fabien COELHO



Maybe we ought to break down and support a real expression syntax.
Sounds like that would be better all around.


Adding operators is more or less orthogonal to providing a new 
expression syntax. I'm not sure that there is currently a crying need for 
one (a syntax). It would be a significant project. It would raise the 
question of where to stop... And I just really need a few more 
functions/operators which can be fitted into the current implementation 
quite easily.


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostrgeSQL vs oracle doing 1 million sqrts am I doing it wrong?

2014-08-06 Thread Pavel Stehule
Hi

I returned to this issue and I may have found the root cause: PL/pgSQL's
implicit I/O cast.

Original text:

postgres=# DO LANGUAGE plpgsql $$ DECLARE n real;
DECLARE f integer;
BEGIN
FOR f IN 1..1000 LOOP
if 0=0 then
n = SQRT (f);
end if;
END LOOP;
RAISE NOTICE 'Result = %',n;
END $$;
NOTICE:  Result = 3162.28
DO
Time: 31988.720 ms

Little bit modified

postgres=# DO LANGUAGE plpgsql $$ DECLARE n real;
DECLARE f integer;
BEGIN
FOR f IN 1..1000 LOOP
if 0=0 then
n = SQRT (f)::real;
end if;
END LOOP;
RAISE NOTICE 'Result = %',n;
END $$;
NOTICE:  Result = 3162.28
DO
Time: 9660.592 ms

It is 3x faster

there is an invisible I/O conversion from double precision to real via
libc's vfprintf

https://github.com/okbob/plpgsql_check/ can raise a performance warning in
this situation, but we cannot do too much now without possibly breaking
compatibility

Regards

Pavel


2014-08-05 16:02 GMT+02:00 Roberto Mello roberto.me...@gmail.com:

 On Tue, Aug 5, 2014 at 9:50 AM, Kevin Grittner kgri...@ymail.com wrote:
 
  Since that is outside the loop, the difference should be nominal;

 Apologies. I misread on my phone and though it was within the loop.

  and in a quick test it was.  On the other hand, reducing the
  procedural code made a big difference.

 snip

  test=# DO LANGUAGE plpgsql $$ DECLARE n real;
  BEGIN
PERFORM SQRT(f) FROM generate_series(1, 1000) x(f);
  END $$;
  DO
  Time: 3916.815 ms

 That is a big difference. Are you porting a lot of code from PL/SQL,
 and therefore evaluating the performance difference of running this
 code? Or is this just a general test where you wish to assess the
 performance difference?

 PL/pgSQL could definitely use some loving, as far as optimization
 goes, but my feeling is that it hasn't happened because there are
 other suitable backends that give the necessary flexibility for the
 different use cases.

 Roberto


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers



Re: [HACKERS] PostrgeSQL vs oracle doing 1 million sqrts am I doing it wrong?

2014-08-06 Thread Pavel Stehule
Hi

this code is +/- equal to the Oracle version (it should eliminate the useless code)

postgres=# DO LANGUAGE plpgsql $$ DECLARE n real;
DECLARE f integer;
BEGIN
FOR f IN 1..1000 LOOP
--if 0=0 then
n = SQRT (f)::real;
--end if;
END LOOP;
RAISE NOTICE 'Result = %',n;
END $$;
NOTICE:  Result = 3162.28
DO
Time: 5787.797 ms



2014-08-06 21:41 GMT+02:00 Pavel Stehule pavel.steh...@gmail.com:

 Hi

 I returned to this issue and I may have found the root cause: PL/pgSQL's
 implicit I/O cast.

 Original text:

 postgres=# DO LANGUAGE plpgsql $$ DECLARE n real;

 DECLARE f integer;
 BEGIN
 FOR f IN 1..1000 LOOP
 if 0=0 then
 n = SQRT (f);
 end if;
 END LOOP;
 RAISE NOTICE 'Result = %',n;
 END $$;
 NOTICE:  Result = 3162.28
 DO
 Time: 31988.720 ms

 Little bit modified

 postgres=# DO LANGUAGE plpgsql $$ DECLARE n real;

 DECLARE f integer;
 BEGIN
 FOR f IN 1..1000 LOOP
 if 0=0 then
 n = SQRT (f)::real;
 end if;

 END LOOP;
 RAISE NOTICE 'Result = %',n;
 END $$;
 NOTICE:  Result = 3162.28
 DO
 Time: 9660.592 ms

 It is 3x faster

 there is an invisible I/O conversion from double precision to real via
 libc's vfprintf

 https://github.com/okbob/plpgsql_check/ can raise a performance warning
 in this situation, but we cannot do too much now without possibly breaking
 compatibility

 Regards

 Pavel


 2014-08-05 16:02 GMT+02:00 Roberto Mello roberto.me...@gmail.com:

 On Tue, Aug 5, 2014 at 9:50 AM, Kevin Grittner kgri...@ymail.com wrote:
 
  Since that is outside the loop, the difference should be nominal;

 Apologies. I misread on my phone and though it was within the loop.

  and in a quick test it was.  On the other hand, reducing the
  procedural code made a big difference.

 snip

  test=# DO LANGUAGE plpgsql $$ DECLARE n real;
  BEGIN
PERFORM SQRT(f) FROM generate_series(1, 1000) x(f);
  END $$;
  DO
  Time: 3916.815 ms

 That is a big difference. Are you porting a lot of code from PL/SQL,
 and therefore evaluating the performance difference of running this
 code? Or is this just a general test where you wish to assess the
 performance difference?

 PL/pgSQL could definitely use some loving, as far as optimization
 goes, but my feeling is that it hasn't happened because there are
 other suitable backends that give the necessary flexibility for the
 different use cases.

 Roberto


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers





Re: [HACKERS] Minmax indexes

2014-08-06 Thread Nicolas Barbier
2014-08-06 Claudio Freire klaussfre...@gmail.com:

 So, I like blockfilter a lot. I change my vote to blockfilter ;)

+1 for blockfilter, because it stresses the fact that the physical
arrangement of rows in blocks matters for this index.

Nicolas

-- 
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostrgeSQL vs oracle doing 1 million sqrts am I doing it wrong?

2014-08-06 Thread James Cloos
 ST == Shaun Thomas stho...@optionshouse.com writes:

ST That said, the documentation here says FLOAT4 is an alias for REAL,
ST so it's somewhat nonintuitive for FLOAT4 to be so much slower than
ST FLOAT8, which is an alias for DOUBLE PRECISION.

There are some versions of glibc where doing certain math on double is
faster than doing it on float, depending on how things are compiled.

Maybe this is one of them?

-JimC
-- 
James Cloos cl...@jhcloos.com OpenPGP: 0x997A9F17ED7DAEA6


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-08-06 Thread Robert Haas
On Tue, Aug 5, 2014 at 3:54 PM, Peter Geoghegan p...@heroku.com wrote:
 On Tue, Aug 5, 2014 at 12:33 PM, Robert Haas robertmh...@gmail.com wrote:
 Per your other email, here's the patch again; hopefully I've got the
 right stuff in the file this time.

 Your patch looks fine to me. I recommend committing patch 1 with these
 additions. I can't think of any reason to rehash the 2012 discussion.

I've committed the patch I posted yesterday.  I did not see a good
reason to meld that together in a single commit with the first of the
patches you posted; I'll leave you to revise that patch to conform
with this approach.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] missing PG_RETURN_UINT16

2014-08-06 Thread Robert Haas
On Wed, Aug 6, 2014 at 2:36 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Wed, Aug 6, 2014 at 4:47 AM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Aug 4, 2014 at 11:35 AM, Manuel Kniep m.kn...@web.de wrote:
 I’m missing the PG_RETURN_UINT16 macro in fmgr.h
 Since we already have the PG_GETARG_UINT16 macro
 I guess it makes sense to to have it.

 here is the trivial patch for it.

 I see no reason not to add this.  Anyone else want to object?

 +1 to add that.

 What about backpatch to 9.4? This is very simple change and there seems to
 be no reason to wait for it until 9.5.

Well, that's true.  But on the other hand, if someone is wanting to
write code that will compile with multiple PostgreSQL versions,
they're probably going to add an #ifdef for this anyway, so I don't
see much of a reason to think that will help very many people in
practice.

I've committed this, but just to master.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostrgeSQL vs oracle doing 1 million sqrts am I doing it wrong?

2014-08-06 Thread Pavel Stehule
2014-08-06 22:07 GMT+02:00 James Cloos cl...@jhcloos.com:

  ST == Shaun Thomas stho...@optionshouse.com writes:

 ST That said, the documentation here says FLOAT4 is an alias for REAL,
 ST so it's somewhat nonintuitive for FLOAT4 to be so much slower than
 ST FLOAT8, which is an alias for DOUBLE PRECISION.

 There are some versions of glibc where doing certain math on double is
 faster than doing it on float, depending on how things are compiled.

 Maybe this is one of them?


no

It is a PL/pgSQL issue only. PL/pgSQL uses a generic cast via serialization
to a string and re-parsing.

It doesn't use the efficient libc casting functions.

see
https://github.com/postgres/postgres/blob/master/src/pl/plpgsql/src/pl_exec.c
function

exec_cast_value
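
To illustrate what an I/O cast costs in backend terms, a hedged sketch (not
the exec_cast_value() code): casting through the type's output and input
functions means printing the value to text and re-parsing it, whereas a direct
coercion just calls the binary conversion function.

#include "postgres.h"
#include "fmgr.h"
#include "utils/builtins.h"

/* I/O cast: float8 -> text -> float4 (print, then re-parse). */
static float4
cast_via_io(float8 value)
{
    char   *str = DatumGetCString(DirectFunctionCall1(float8out,
                                                      Float8GetDatum(value)));

    return DatumGetFloat4(DirectFunctionCall1(float4in,
                                              CStringGetDatum(str)));
}

/* Direct coercion: the float8-to-float4 cast function. */
static float4
cast_directly(float8 value)
{
    return DatumGetFloat4(DirectFunctionCall1(dtof, Float8GetDatum(value)));
}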




 -JimC
 --
 James Cloos cl...@jhcloos.com OpenPGP: 0x997A9F17ED7DAEA6


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers



Re: [HACKERS] Index-only scans for GIST

2014-08-06 Thread Heikki Linnakangas

On 08/01/2014 10:58 AM, Anastasia Lubennikova wrote:

Hi, hackers!
I work on a GSoC project Index-only scans for GIST
https://wiki.postgresql.org/wiki/Support_for_Index-only_scans_for_GIST_GSoC_2014

Repository is
https://github.com/lubennikovaav/postgres/tree/indexonlygist2
Patch is in attachments.


Thanks!

Some comments:

* I got this compiler warning:

gistget.c:556:5: warning: ISO C90 forbids mixed declarations and code 
[-Wdeclaration-after-statement]

 ListCell *tmpPageData = so-curPageData;
 ^

* I'm getting two regression failures with this (opr_sanity and join).

* After merging with master, build fails because of duplicate OIDs.

* The regression test queries that use LIMIT are not guaranteed to 
always return the same rows, hence they're not very good regression test 
cases. I'd suggest using more restricting WHERE clauses, so that each 
query only returns a handful of rows.


* What's the reason for turning GISTScanOpaqueData.pageData from an 
array to a List?


* I think it's leaking memory, in GIST scan context. I tested this with 
a variant of the regression tests:


insert into gist_tbl select box(point(0.05*i, 0.05*i), point(0.05*i, 
0.05*i)),
 point(0.05*i, 0.05*i) FROM generate_series(0, 
1000) as i;

CREATE INDEX gist_tbl_point_index ON gist_tbl USING gist (p);

set enable_seqscan=off;
set enable_bitmapscan=off;

explain analyze  select p from gist_tbl where p @ box(point(0,0), 
point(999,999)) and length(p::text)  10;


while the final query runs, 'top' shows constantly increasing memory usage.


It includes index-only scans for multicolumn GIST and new regression test.
Fetch() method is realized for box and point opclasses.


Can we have Fetch functions for all the datatypes in btree_gist contrib 
module, please? Do other contrib modules contain GiST opclasses that 
could have Fetch functions?



Documentation is not updated yet, but I'm going to do it till the end of
GSoC.

I've got one question about query with OR condition. It is the last query
in regression test gist_indexonly. It doesn't fail but it doesn't use
index-only scans. Could someone explain to me how it works?
It seems to depend on build_paths_for_OR
http://doxygen.postgresql.org/indxpath_8c.html#ae660d2e886355e53ed3b9ec693e4afd2
function.
But I couldn't understand how.


The query is:

select * from gist_tbl
where b @ box(point(5,5),  point(6,6))
or p @ box(point(0,0),  point(100,100)) limit 10;

It cannot use an index(-only) scan for this, because a single index scan 
can only return rows based on one key. In this case, you need to do two 
scans, and then return the rows returned by either scan, removing 
duplicates. A bitmap scan is possible, because it can remove the 
duplicates, but the planner can't produce a plain index scan plan that 
would do the same.


A common trick when that happens in a real-world application is to 
re-write the query using UNION:


select * from gist_tbl
where b @ box(point(5,5),  point(6,6))
UNION
select * from gist_tbl
where p @ box(point(0,0),  point(100,100))
limit 10;

Although that doesn't seem to actually work:

ERROR:  could not identify an equality operator for type box
LINE 1: select * from gist_tbl
   ^

but that's not your patch's fault, the same happens with unpatched master.

IOW, you don't need to worry about that case.

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] select_common_type()'s behavior doesn't match the documentation

2014-08-06 Thread Bruce Momjian

Any progress on this?

---

On Sat, Nov 30, 2013 at 12:43:39PM -0500, Tom Lane wrote:
 In our fine manual, at
 http://www.postgresql.org/docs/devel/static/typeconv-union-case.html
 it's claimed that the nontrivial parts of UNION type resolution
 work like this:
 
   4. Choose the first non-unknown input type which is a preferred type in
   that category, if there is one.
 
   5. Otherwise, choose the last non-unknown input type that allows all the
   preceding non-unknown inputs to be implicitly converted to it. (There
   always is such a type, since at least the first type in the list must
   satisfy this condition.)
 
 This appears to have only the vaguest of resemblances to what
 select_common_type() actually does, which is to make a single
 pass over the inputs in which it does this:
 
 /*
  * take new type if can coerce to it implicitly but not the
  * other way; but if we have a preferred type, stay on it.
  */
 
 Thus for example there's a surprising inconsistency between
 these cases:
 
 regression=# select pg_typeof(t) from (select 'a'::text union select 
 'b'::char(1)) s(t);
  pg_typeof 
 ---
  text
  text
 (2 rows)
 
 regression=# select pg_typeof(t) from (select 'a'::char(1) union select 
 'b'::text) s(t);
  pg_typeof 
 ---
  character
  character
 (2 rows)
 
 I think that at the very least, we ought to prefer preferred types,
 the way the manual claims.  I'm less certain about whether step 5
 is ideal as written.
 
 This came up because some of my Salesforce colleagues were griping about
 the fact that UNION isn't commutative.  They argue that the type
 resolution behavior ought not be sensitive at all to the ordering of the
 inputs.  I'm not sure we can achieve that in general, but the current
 approach certainly seems more order-sensitive than it oughta be.
 
 Some trolling in the git history says that the last actual change in
 this area was in my commit b26dfb95222fddd25322bdddf3a5a58d3392d8b1 of
 2002-09-18, though it appears the documentation has been rewritten more
 recently.  It's a bit scary to be proposing to change behavior that's been
 stable for eleven years, but ...
 
 Thoughts?
 
   regards, tom lane
 
 
 -- 
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] select_common_type()'s behavior doesn't match the documentation

2014-08-06 Thread David G Johnston
Tom Lane-2 wrote
 In our fine manual, at
 http://www.postgresql.org/docs/devel/static/typeconv-union-case.html
 it's claimed that the nontrivial parts of UNION type resolution
 work like this:
 
   4. Choose the first non-unknown input type which is a preferred type in
   that category, if there is one.
 
   5. Otherwise, choose the last non-unknown input type that allows all the
   preceding non-unknown inputs to be implicitly converted to it. (There
   always is such a type, since at least the first type in the list must
   satisfy this condition.)
 
 This came up because some of my Salesforce colleagues were griping about
 the fact that UNION isn't commutative.  They argue that the type
 resolution behavior ought not be sensitive at all to the ordering of the
 inputs.  I'm not sure we can achieve that in general, but the current
 approach certainly seems more order-sensitive than it oughta be.

4. Use the preferred type for whatever category all inputs share (per 3). 
Per 1 this is only used if at least one input does not agree.

5. No longer needed

6. Stays the same

It is possible for a result type to not match any of the input types but if
you want to be commutative this would have to be allowed.

You could add a majority-rules condition before 4 and punt if there
is no single dominant type.

Should #1 repeat after flattening domains to their base types?

I would probably logically place 2 before 1 since if everything is unknown
nothing else matters.

David J.





--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/select-common-type-s-behavior-doesn-t-match-the-documentation-tp5780985p5813963.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql: show only failed queries

2014-08-06 Thread Pavel Stehule
Hello

updated version patch in attachment

2014-08-05 13:31 GMT+02:00 Fujii Masao masao.fu...@gmail.com:

 On Wed, Jul 16, 2014 at 4:34 AM, Pavel Stehule pavel.steh...@gmail.com
 wrote:
 
 
 
  2014-07-15 12:07 GMT+02:00 Fujii Masao masao.fu...@gmail.com:
 
  On Tue, Jul 15, 2014 at 7:01 PM, Pavel Stehule pavel.steh...@gmail.com
 
  wrote:
  
  
  
   2014-07-15 11:29 GMT+02:00 Fujii Masao masao.fu...@gmail.com:
  
   On Thu, Jul 10, 2014 at 9:56 PM, Pavel Stehule
   pavel.steh...@gmail.com
   wrote:
Hello
   
here is a proposed patch - autocomplete for known psql variable
content
  
   Even after applying the patch, the following psql variables were not
   displayed
   on the tab-completion of \set command.
  
   HISTFILE
   HISTSIZE
   HOST
   IGNOREEOF
   LASTOID
  
   I'm not sure if it's worth displaying all of them on the
 tab-completion
   of
   \set
   because it seems strange to change some of them by \set command, for
   example
   HOST, though.
  
  
    These variables are not set by default - so they don't exist and cannot be
    completed by default.
  
   there are two fix:
  
   a) fix autocomplete source - preferred
 
  +1
 
 
  here is the patch

 Thanks for the patch!

 I got the following compiler warnings.

 tab-complete.c:4155: warning: assignment discards qualifiers from
 pointer target type
 tab-complete.c:4166: warning: assignment discards qualifiers from
 pointer target type


fixed


 +FETCH_COUNT, HISTCONTROL, HISTFILE, HISTSIZE, HOST,
 IGNOREOFF,

 Typo: IGNOREOFF - IGNOREEOF


fixed


 +/* all psql known variables are included in list by default */
 +for (known_varname = known_varnames; *known_varname; known_varname++)
 +varnames[nvars++] = pg_strdup(*known_varname);

 Don't we need to append both prefix and suffix to even known variables?


??? I am not sure I understand well - probably system read-only
variables such as DBNAME, USER, VERSION should not be there



 +else if (strcmp(prev2_wd, \\set) == 0)
 +{
 +if (strcmp(prev_wd, AUTOCOMMIT) == 0)

 ISTM that some psql variables like IGNOREEOF are not there. Why not?


yes, there is no completion for DBNAME, ENCODING, FETCH_COUNT, HISTCONTROL,
HISTFILE, HISTSIZE, HOST, IGNOREEOF, PROMPT*, USER, VERSION

There are more reasons:

* parameter is not an enum (string, number or both): FETCH_COUNT, PROMPT,
HISTSIZE, ...

* variable is pseudo read-only - changing it has no effect: DBNAME,
ENCODING, VERSION

Regards

Pavel



 Regards,

 --
 Fujii Masao



Re: [HACKERS] plpgsql.extra_warnings='num_into_expressions'

2014-08-06 Thread Marko Tiikkaja

On 7/21/14, 10:56 PM, I wrote:

Here's a patch which allows you to notice those annoying bugs with INTO
slightly more quickly.  Adding to the next commit phest.


New version, fixed bugs with set operations and VALUES lists.


.marko
*** a/doc/src/sgml/plpgsql.sgml
--- b/doc/src/sgml/plpgsql.sgml
***
*** 4719,4725  a_output := a_output || $$ if v_$$ || referrer_keys.kind || 
$$ like '$$
varnameplpgsql.extra_errors/ for errors. Both can be set either to
a comma-separated list of checks, literalnone/ or literalall/.
The default is literalnone/. Currently the list of available checks
!   includes only one:
variablelist
 varlistentry
  termvarnameshadowed_variables/varname/term
--- 4719,4725 
varnameplpgsql.extra_errors/ for errors. Both can be set either to
a comma-separated list of checks, literalnone/ or literalall/.
The default is literalnone/. Currently the list of available checks
!   is as follows:
variablelist
 varlistentry
  termvarnameshadowed_variables/varname/term
***
*** 4729,4734  a_output := a_output || $$ if v_$$ || referrer_keys.kind || 
$$ like '$$
--- 4729,4744 
   /para
  /listitem
 /varlistentry
+varlistentry
+ termvarnamenum_into_expressions/varname/term
+ listitem
+  para
+   When possible, checks that the number of expressions specified in the
+   SELECT or RETURNING list of a query matches the number expected by the
+   INTO target.  This check is done on a best effort basis.
+  /para
+ /listitem
+/varlistentry
/variablelist
  
The following example shows the effect of varnameplpgsql.extra_warnings/
*** a/src/pl/plpgsql/src/pl_gram.y
--- b/src/pl/plpgsql/src/pl_gram.y
***
*** 22,27 
--- 22,28 
  #include parser/scanner.h
  #include parser/scansup.h
  #include utils/builtins.h
+ #include nodes/nodefuncs.h
  
  
  /* Location tracking support --- simpler than bison's default */
***
*** 97,103  static  PLpgSQL_row *make_scalar_list1(char 
*initial_name,

   PLpgSQL_datum *initial_datum,

   int lineno, int location);
  staticvoid check_sql_expr(const char *stmt, int 
location,
!   
int leaderlen);
  staticvoid plpgsql_sql_error_callback(void *arg);
  staticPLpgSQL_type*parse_datatype(const char *string, int 
location);
  staticvoid check_labels(const char *start_label,
--- 98,104 

   PLpgSQL_datum *initial_datum,

   int lineno, int location);
  staticvoid check_sql_expr(const char *stmt, int 
location,
!   
int leaderlen, PLpgSQL_row *check_row);
  staticvoid plpgsql_sql_error_callback(void *arg);
  staticPLpgSQL_type*parse_datatype(const char *string, int 
location);
  staticvoid check_labels(const char *start_label,
***
*** 106,111  static void check_labels(const char 
*start_label,
--- 107,115 
  staticPLpgSQL_expr*read_cursor_args(PLpgSQL_var *cursor,

  int until, const char *expected);
  staticList*read_raise_options(void);
+ staticboolfind_a_star_walker(Node *node, void 
*context);
+ staticint tlist_result_column_count(Node 
*stmt);
+ 
  
  %}
  
***
*** 1438,1444  for_control  : for_variable K_IN

PLpgSQL_stmt_fori   *new;
  
/* Check first 
expression is well-formed */
!   
check_sql_expr(expr1-query, expr1loc, 7);
  
/* Read and 
check the second one */
expr2 = 
read_sql_expression2(K_LOOP, K_BY,
--- 1442,1448 

PLpgSQL_stmt_fori   *new;
  
/* Check first 
expression is well-formed */
!   
check_sql_expr(expr1-query, expr1loc, 7, NULL);
  

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-08-06 Thread Peter Geoghegan
On Wed, Aug 6, 2014 at 1:11 PM, Robert Haas robertmh...@gmail.com wrote:
 I've committed the patch I posted yesterday.  I did not see a good
 reason to meld that together in a single commit with the first of the
 patches you posted; I'll leave you to revise that patch to conform
 with this approach.

Okay. Attached is the same patch set, rebased on top of your commit
with appropriate amendments.

BTW, I haven't added any of the same portability measures that the
existing strxfrm() selfuncs.c caller has. Since Windows can only use
the optimization with the C locale, I don't believe we're affected,
although the link
http://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=99694
is now dead (even if you have a Microsoft account), so I won't swear
that Visual Studio 2005 is unaffected in the C locale case. However,
only VS 2008 and later versions support 64-bit builds, and only 64-bit
builds support abbreviated keys within sort support for text, so it
doesn't matter. As for the Solaris bugs that convert_string_datum()
also refers to, well, I just don't have much sympathy for that case.
If a version of Solaris that was old in 2003 still has a buggy
strxfrm() implementation and wants to use Postgres 9.5, too bad. My
usage of strxfrm() is the pattern that the standard expects. The idea
that we should always do a dry-run, where strxfrm() is called with a
NULL pointer to check how much memory is required, just because
somebody's strxfrm() is so badly written as to feel entitled to write
past the end of my buffer is just absurd. That should probably be
removed from the existing convert_string_datum() strxfrm() call site
too. I suspect that no Oracle-supported version of Solaris will be
affected, but if I'm wrong, that's what we have a buildfarm for.
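
For reference, the usage pattern being described (what the standard expects)
looks roughly like this standalone sketch, not the patch code: transform into
a buffer, and only retry with a larger buffer when the return value says the
result did not fit.

#include <stdlib.h>
#include <string.h>

/* Transform src for the current collation; returns a malloc'd buffer.
 * Error handling omitted for brevity. */
static char *
xfrm_copy(const char *src)
{
    size_t  bufsize = 64;
    char   *buf = malloc(bufsize);
    size_t  needed = strxfrm(buf, src, bufsize);

    if (needed >= bufsize)              /* didn't fit; contents indeterminate */
    {
        bufsize = needed + 1;
        buf = realloc(buf, bufsize);
        (void) strxfrm(buf, src, bufsize);
    }
    return buf;
}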

The nodeAgg.c FIXME item remains. I've added a new TODO comment which
suggests that we investigate more opportunistic attempts at getting
away with a cheap memcmp() that indicates equality. That continues to
only actually occur when the sort support comparator is called as a
tie-breaker.

-- 
Peter Geoghegan
From 13be5686aaa894381e70abd5532964c73c11b5b5 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan p...@heroku.com
Date: Sat, 2 Aug 2014 15:39:28 -0700
Subject: [PATCH 2/2] Abbreviated sortsupport keys

Add a new infrastructure to sortsupport, and add a single client of this new
infrastructure, the text sortsupport routine.  This infrastructure allows
opclasses whose sortsupport routines support this additional, optional new
capability to provide abbreviated representations of Datums in advance of
sorting, generated through an opclass-specific encoding scheme.  An abbreviated
representation is typically pass by value, and the capability is thought to
mostly be useful for pass by reference types, but there are no hard
requirements on representation.

Opclasses that support this new capability provide a special abbreviation-aware
comparator (in addition to the usual, ordinary sortsupport fmgr-eliding
comparator).  This comparator returns a value less than, equal to, or greater
than 0, in a manner similar to the usual fmgr-eliding comparator (which is to
say, a manner similar to any btree support function 1).  However, there is an
important difference:  sortsupport clients such as tuplesort interpret the
value 0 as inconclusive.  Inconclusive comparisons obligate these clients to
perform a tie-breaker ordinary comparison in respect of the same attribute.

By generating abbreviated keys, the optimization can speed up internal sorts
considerably.  Because of the high cost of CPU cache stalls when sorting, cache
efficiency is a crucial consideration in engineering sorting algorithms and
related infrastructure.  Pass by value representations have much greater
temporal and spatial locality than pass by reference representations.
Furthermore, the abbreviation-aware comparisons can themselves be far cheaper
than an authoritative comparison, if the abbreviation encoding scheme can
produce a reductionist representation sufficient to resolve most comparisons.
Encoding may be expensive, but if an encoding process can occur N times, rather
than having essentially the same encoding process occur N log N times, that can
be a highly effective optimization.

The expense of encoding may well be problematic if no benefit can be derived
from abbreviated keys during sorting, though.  Opclasses supporting abbreviated
keys also provide an abort callback, which hints the number of rows to be
sorted in total, and progress so far.  This can either abort the abbreviation
encoding process entirely, or give the encoding scheme the ability to adapt its
encoding strategy in order to maximize the number of comparisons that are
resolved with abbreviated keys.  HyperLogLog, a cardinality estimator
algorithm, is added to give clients of the new sortsupport infrastructure a
principled way to estimate the usefulness of encoding.
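Purely for illustration (the threshold and names below are invented, not
taken from the patch), the abort decision could look something like this,
with the distinct-key estimate coming from a HyperLogLog state that has been
fed every abbreviated key generated so far:

    #include <stdbool.h>

    /*
     * Sketch of an abbreviation abort heuristic: if only a small fraction
     * of the abbreviated keys are distinct, most comparisons will come
     * back "inconclusive" and pay for the tie-breaker anyway, so the
     * encoding work is not earning its keep.
     */
    static bool
    should_abort_abbreviation(double rows_so_far, double distinct_estimate)
    {
        if (rows_so_far < 1000)
            return false;       /* too early to judge */

        return (distinct_estimate / rows_so_far) < 0.05;
    }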

Currently, only the MinimalTuple tuplesort case uses 

Re: [HACKERS] Fixed redundant i18n strings in json

2014-08-06 Thread Jeff Janes
On Wed, Aug 6, 2014 at 8:35 AM, Robert Haas robertmh...@gmail.com wrote:

 On Tue, Aug 5, 2014 at 6:05 PM, Jeff Janes jeff.ja...@gmail.com wrote:
  I think you missed one of the regression tests, see attached

 Woops.  Thanks, committed.



Thanks.

It needs to go into 9_4_STABLE as well.

Cheers,

Jeff


Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-08-06 Thread Peter Geoghegan
On Tue, Aug 5, 2014 at 8:55 PM, Noah Misch n...@leadboat.com wrote:
 However, with work_mem set low enough to get an external sort, the
 difference is more interesting. If I set work_mem to 10 MB, then the
 query takes about 10.7 seconds to execute with a suitably patched
 Postgres. Whereas on master, it consistently takes a full 69 seconds.
 That's the largest improvement I've seen so far, for any case.

 Comparator cost affects external sorts more than it affects internal sorts.
 When I profiled internal and external int4 sorting, btint4cmp() was 0.37% of
 the internal sort profile and 10.26% of the external sort profile.

I took another look at this.

If I run dd if=/dev/zero of=/home/pg/test, I can see with iotop that
it sustains about 45 M/s of total disk write, with occasional mild
blips during write-back. This is a Crucial mobile SSD, with an
ext4/lvm file system, and happens to be what is close at hand.

If I run the same external sorting query with a patched Postgres, I
see 24 M/s of total disk write throughout. With master, it's about 6 M/s,
and it falls to 0 during the final 36-way merge. I'm not sure whether the
same thing occurs with the patched build during the final merge, because
the available resolution isn't good enough to tell. Anyway,
it's pretty clear that when patched, the external sort on text is, if
not totally I/O bound, much closer to being I/O bound. A good external
sort algorithm should be at least close to totally I/O bound. This
makes I/O parallelism a viable strategy for speeding up sorts, where
it might not otherwise be. I've heard of people using a dedicated temp
tablespace disk with Postgres to speed up sorting, but that always
seemed to be about reducing the impact on the heap filesystem, or vice
versa. I've never heard of anyone using multiple disks to speed up
sorting with Postgres (which I don't presume means it hasn't been done
at least somewhat effectively). However, with external sort benchmarks
(like http://sortbenchmark.org), using I/O parallelism strategically
seems to be table stakes for external sort entrants.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Reporting the commit LSN at commit time

2014-08-06 Thread Craig Ringer
Hi all

To support transparent client-side failover in BDR, it's necessary to
know what the LSN of a node was at the time a transaction committed and
keep track of that in the client/proxy.

I'm thinking about adding a new message type in the protocol that gets
sent immediately before CommandComplete, containing the LSN of the
commit. Clients would need to enable the sending of this message with a
GUC that they set when they connect, so it doesn't confuse clients that
aren't expecting it or aware of it.

Is this something you can see being useful for other non-BDR purposes?
Are there any obvious issues with this?


Clients can always follow up with a second query to get the xlog
position, after commit, but that's potentially slow and has a race that
might cause a client to wait longer than it has to after fail-over to a
different node. That's why having the server report it automatically
seems useful.
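For comparison, the follow-up-query workaround looks roughly like this in
libpq (the table and connection string are placeholders, and
pg_current_xlog_location() is assumed to be the xlog-position function on
the 9.x servers in question); the position read here is only an upper bound
for the commit's LSN, which is exactly the imprecision the proposal would
remove:

    #include <stdio.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        PGconn     *conn = PQconnectdb("dbname=postgres");
        PGresult   *res;

        if (PQstatus(conn) != CONNECTION_OK)
            return 1;

        PQclear(PQexec(conn, "BEGIN"));
        PQclear(PQexec(conn, "INSERT INTO t VALUES (1)"));  /* 't' is a stand-in */
        PQclear(PQexec(conn, "COMMIT"));

        /* Second round trip: ask for the xlog position after the commit. */
        res = PQexec(conn, "SELECT pg_current_xlog_location()");
        if (PQresultStatus(res) == PGRES_TUPLES_OK)
            printf("committed at or before %s\n", PQgetvalue(res, 0, 0));

        PQclear(res);
        PQfinish(conn);
        return 0;
    }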

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partitioning performance: cache stringToNode() of pg_constraint.ccbin

2014-08-06 Thread Bruce Momjian
On Sun, Jan 12, 2014 at 12:53:40PM -0500, Noah Misch wrote:
 On Sat, Jan 11, 2014 at 02:10:01PM -0500, Bruce Momjian wrote:
  On Mon, Jun  3, 2013 at 03:07:27PM -0400, Noah Misch wrote:
   A colleague, Korry Douglas, observed a table partitioning scenario where
   deserializing pg_constraint.ccbin is a hot spot.  The following test 
   case, a
   simplification of a typical partitioning setup, spends 28% of its time in
   stringToNode() and callees thereof:
  
  Noah, what is the status on this?
 
 Further study revealed a defect in the patch's memory management, and I have
 not gotten around to correcting that.

I talked to Noah and he can't continue on this item.  Can someone else
work on it?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] add modulo (%) operator to pgbench

2014-08-06 Thread Mitsumasa KONDO
2014-08-06 23:38 GMT+09:00 Fabien COELHO coe...@cri.ensmp.fr:


  Three different modulo operators seems like a lot for a language that
 doesn't even have a real expression syntax, but I'll yield to whatever
 the consensus is on this one.


 Here is a third, simpler patch which only implements Knuth's modulo,
 where the remainder has the same sign as the divisor.

 I would prefer this version 3 (one simple modulo based on Knuth's
 definition) or, if there is a problem, version 2 (all 3 modulos). Version 1,
 which provides a modulo compatible with C & SQL, is really useless to me.

I like version 3; it is simple and practical, and it's enough for pgbench.
If someone wants to use a different modulo implementation, he can just
change his source code.
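For reference, a floored (Knuth) modulo built on top of C's % can be
sketched like this (illustrative only; pgbench's actual implementation may
differ in detail):

    /*
     * Floored (Knuth) modulo: the remainder takes the sign of the divisor.
     * C's native % truncates toward zero, so its remainder follows the
     * dividend instead.
     */
    static long
    floored_mod(long a, long b)
    {
        long    r = a % b;

        if (r != 0 && ((r < 0) != (b < 0)))
            r += b;
        return r;
    }

    /* Examples:  floored_mod(-7, 3) == 2, while -7 % 3 == -1 in C;
     *            floored_mod(7, -3) == -2, while 7 % -3 == 1 in C. */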

Best regards,
--
Mitsumasa KONDO


Re: [HACKERS] postgresql.auto.conf and reload

2014-08-06 Thread Amit Kapila
On Wed, Aug 6, 2014 at 11:39 AM, Fujii Masao masao.fu...@gmail.com wrote:

 On Tue, Aug 5, 2014 at 12:49 PM, Fujii Masao masao.fu...@gmail.com
wrote:
  On Mon, Aug 4, 2014 at 11:52 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Fujii Masao masao.fu...@gmail.com writes:
  The patch chooses the last settings for every parameters and ignores
the
  former settings. But I don't think that every parameters need to be
processed
  this way. That is, we can change the patch so that only PGC_POSTMASTER
  parameters are processed that way. The invalid settings in the
parameters
  except PGC_POSTMASTER can be checked by pg_ctl reload as they are now.
  Also this approach can reduce the number of comparison to choose the
  last setting, i.e., n in O(n^2) is the number of uncommented
*PGC_POSTMASTER*
  parameters (not every parameters). Thought?
 
  I don't find that to be a particularly good idea.  In the first place,
  it introduces extra complication and a surprising difference in the
  behavior of different GUCs.  In the second place, I thought part of the
  point of this patch was to suppress log messages complaining about
  invalid values that then weren't actually used for anything.  That
issue
  exists just as much for non-POSTMASTER variables.  (IOW, value cannot
  be changed now is not the only log message we're trying to suppress.)
 
  Yep, sounds reasonable. This makes me think that we can suppress
  such invalid message of the parameters which are actually not used
  at not only conf file reload but also *postmaster startup*. That's
another
  story, though. Anyway, barring any objection, I will commit Amit's
patch.

 Applied the slightly-modified version!

Thanks.  There is a commitfest entry [1] for this patch; do you
want something more to be addressed, or shall we mark it as
committed?

[1]
https://commitfest.postgresql.org/action/patch_view?id=1500


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] postgresql.auto.conf and reload

2014-08-06 Thread Amit Kapila
On Wed, Aug 6, 2014 at 1:32 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Tue, Jul 29, 2014 at 9:26 PM, Amit Kapila amit.kapil...@gmail.com
wrote:
 
  The reason is that during startup DataDir is not set by the time the
  server processes the config file, which is why we process the .auto.conf
  file in a second pass.  Ideally it should ignore the invalid setting in
  such a case, as it does during reload; however, there is currently no
  good way to process .auto.conf in case DataDir is not set, so we handle
  it separately in the second pass.

 What about picking up only data_directory at the first pass?

I think that will work out, and for that I think we might need to add
a few checks during ProcessConfigFile.  Do you want to address it
during parsing of the config file or during setting of values?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] pg_export_snapshot on standby side

2014-08-06 Thread Fujii Masao
On Thu, Aug 7, 2014 at 2:17 AM, Bruce Momjian br...@momjian.us wrote:

 Seems we still have not addressed this.

Thanks for the reminder :) I'm not sure if I have time to work on this
for a while. If anyone is interested in this, please feel free to try it.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] posix_fadvise() and pg_receivexlog

2014-08-06 Thread Fujii Masao
On Thu, Aug 7, 2014 at 3:59 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 On 08/06/2014 08:39 PM, Fujii Masao wrote:

 Hi,

  The WAL files that pg_receivexlog writes will basically not be re-read
  soon, so we can advise the OS to release any cached pages when a WAL
  file is closed. I feel inclined to change pg_receivexlog that way.
  Thoughts?


 -1. The OS should be smart enough to not thrash the cache by files that are
 written sequentially and never read.

Yep, the OS should be that smart, but I'm not sure it actually is. Maybe not,
so I was thinking that posix_fadvise would be called when the server closes a
WAL file.
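As a sketch of what that suggestion amounts to in the file-closing path
(invented names, not pg_receivexlog's actual code): once a completed WAL
segment has been durably written, hint that the kernel can drop its cached
pages before closing the file.

    #define _XOPEN_SOURCE 600
    #include <fcntl.h>
    #include <unistd.h>

    static int
    close_walfile_dropping_cache(int fd)
    {
        if (fsync(fd) != 0)
            return -1;

    #if defined(POSIX_FADV_DONTNEED)
        /* Best effort; ignore failure, since this is only a hint. */
        (void) posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    #endif

        return close(fd);
    }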

 If we go down this path, we'd need to
 sprinkle posix_fadvises into many, many places.

Yes, that's a valid concern. But if we can prove that adding posix_fadvise in
a certain place improves performance significantly, I'm inclined to do that.

 Anyway, who are we to say that they won't be re-read soon? You might e.g
 have a secondary backup site where you copy the files received by
 pg_receivexlog, as soon as they're completed.

So whether posix_fadvise is called or not needs to be exposed as a
user-configurable option. We would need to measure how useful exposing
that is, though.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql: show only failed queries

2014-08-06 Thread Fujii Masao
On Thu, Aug 7, 2014 at 6:26 AM, Pavel Stehule pavel.steh...@gmail.com wrote:
 Hello

 updated version patch in attachment

Thanks! But ISTM you forgot to attach the patch.

 +/* all psql known variables are included in list by default */
 +for (known_varname = known_varnames; *known_varname; known_varname++)
 +varnames[nvars++] = pg_strdup(*known_varname);

 Don't we need to append both prefix and suffix to even known variables?


  ??? I am not sure I understand well - probably system read-only
  variables such as DBNAME, USER, VERSION should not be there

I had that question because complete_from_variables() is also called by the
tab-completion of \echo :, and it shows a half-baked list of variables. So
I thought we probably need to change complete_from_variables() further to
fix the problem.

 +else if (strcmp(prev2_wd, \\set) == 0)
 +{
 +if (strcmp(prev_wd, AUTOCOMMIT) == 0)

 ISTM that some psql variables like IGNOREEOF are not there. Why not?


  yes, completion is not provided for DBNAME, ENCODING, FETCH_COUNT,
  HISTCONTROL, HISTFILE, HISTSIZE, HOST, IGNOREEOF, PROMPT*, USER, VERSION

  There are more reasons:

  * the parameter is not an enum (string, number or both): FETCH_COUNT,
  PROMPT, HISTSIZE, ..

  * the variable is pseudo read-only - changing it has no effect: DBNAME,
  ENCODING, VERSION

So HISTCONTROL should be there, then, since neither of those reasons applies to it?

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-08-06 Thread Peter Geoghegan
On Tue, Aug 5, 2014 at 9:32 PM, Peter Geoghegan p...@heroku.com wrote:
 I knew that I'd heard that at least once. Apparently some other
 database systems have external sorts that tend to be faster than
 equivalent internal sorts. I'd guess that that is an artifact of their
 having a substandard internal sort, though.

This *almost* applies to patched Postgres if you pick a benchmark that
is very sympathetic to my patch. To my surprise, work_mem = '10MB'
(which results in an external tape sort) is sometimes snapping at the
heels of a work_mem = '5GB' setting (which results in an in-memory
quicksort).

I have a 338 MB table that consists of a single text column of 8-byte
strings, with high cardinality. I ran VACUUM FREEZE and took all the
usual precautions of that kind. On the test table n_distinct = -1, and
there is no discernible physical/logical correlation.

The external sort case stabilized as follows:

LOG:  duration: 9731.776 ms  statement: select * from (select * from
besttest order by tt offset 1000) i;
LOG:  duration: 9742.948 ms  statement: select * from (select * from
besttest order by tt offset 1000) i;
LOG:  duration: 9733.918 ms  statement: select * from (select * from
besttest order by tt offset 1000) i;

The in-memory case stabilized as follows:

LOG:  duration: 0.059 ms  statement: set work_mem = '5GB';
LOG:  duration: 9665.731 ms  statement: select * from (select * from
besttest order by tt offset 1000) i;
LOG:  duration: 9602.841 ms  statement: select * from (select * from
besttest order by tt offset 1000) i;
LOG:  duration: 9609.107 ms  statement: select * from (select * from
besttest order by tt offset 1000) i;

FWIW, master performed as follows with work_mem = '10MB':

LOG:  duration: 60456.943 ms  statement: select * from (select * from
besttest order by tt offset 1000) i;
LOG:  duration: 60368.987 ms  statement: select * from (select * from
besttest order by tt offset 1000) i;
LOG:  duration: 61223.942 ms  statement: select * from (select * from
besttest order by tt offset 1000) i;

And master did quite a lot better with work_mem = '5GB', in a way that
fits with my prejudices about how quicksort is supposed to perform
relative to tape sort:

LOG:  duration: 0.060 ms  statement: set work_mem = '5GB';
LOG:  duration: 41697.659 ms  statement: select * from (select * from
besttest order by tt offset 1000) i;
LOG:  duration: 41755.496 ms  statement: select * from (select * from
besttest order by tt offset 1000) i;
LOG:  duration: 41883.888 ms  statement: select * from (select * from
besttest order by tt offset 1000) i;

work_mem = '10MB' continues to be the external sort sweet spot for
this hardware, for whatever reason - I can add a few seconds to
execution time by increasing or decreasing the setting a bit. I'm
using an Intel Core i7-3520M CPU @ 2.90GHz, with a /proc/cpuinfo
reported L3 cache size of 4096 KB. I have been very careful to take
into account power saving features throughout all
experimentation/benchmarking of this patch and previous abbreviated
key patches - failing to do so is a good way to end up with complete
garbage when investigating this kind of thing.

Anyway, I'm not sure what this tells us about quicksort and tape sort,
but I think there might be an interesting and more general insight to
be gained here. I'd have thought that tape sort wastes memory
bandwidth by copying to operating system buffers to the extent that
things are slowed down considerably (this is after all a test
performed with lots of memory available, even when work_mem = '10
MB'). And even if that wasn't a significant factor, I'd expect
quicksort to win decisively anyway. Why does this happen?

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers