Re: [HACKERS] postgresql.auto.conf and reload
On Tue, Aug 5, 2014 at 12:49 PM, Fujii Masao masao.fu...@gmail.com wrote:
> On Mon, Aug 4, 2014 at 11:52 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>> Fujii Masao masao.fu...@gmail.com writes:
>>> The patch chooses the last setting for every parameter and ignores the former settings. But I don't think that every parameter needs to be processed this way. That is, we can change the patch so that only PGC_POSTMASTER parameters are processed that way. Invalid settings of the parameters other than PGC_POSTMASTER ones can still be checked by pg_ctl reload, as they are now. This approach also reduces the number of comparisons needed to choose the last setting, i.e., the n in O(n^2) becomes the number of uncommented *PGC_POSTMASTER* parameters (not every parameter). Thoughts?
>>
>> I don't find that to be a particularly good idea. In the first place, it introduces extra complication and a surprising difference in the behavior of different GUCs. In the second place, I thought part of the point of this patch was to suppress log messages complaining about invalid values that then weren't actually used for anything. That issue exists just as much for non-POSTMASTER variables. (IOW, "value cannot be changed now" is not the only log message we're trying to suppress.)
>
> Yep, sounds reasonable. This makes me think that we could also suppress such "invalid value" messages for parameters that are not actually used, not only at conf file reload but also at *postmaster startup*. That's another story, though.
>
> Anyway, barring any objection, I will commit Amit's patch.

Applied the slightly-modified version!

Regards,

--
Fujii Masao
Re: [HACKERS] add modulo (%) operator to pgbench
Hello Robert,

>> The issue is that there are 3 definitions of modulo, two of which are fine (Knuth's floored division and the Euclidean one), and the last one much less useful. Alas, C (%) and SQL (MOD) chose the bad definition :-( I really need one of the other two. The attached patch adds all versions, with % and mod consistent with their unfortunate C and SQL counterparts, and fmod and emod the sane ones.
>
> Three different modulo operators seems like a lot for a language that doesn't even have a real expression syntax, but I'll yield to whatever the consensus is on this one.

I agree that it is overkill. In fact there is a link: if there were a real expression syntax, the remainder sign could be fixed afterwards, so the standard C/SQL version would do. As there is not, the modulo operator itself must be the right one. If only one modulo is added, I would rather have the Knuth version.

However, I was afraid that someone would object if % did not return the same result as the C/PostgreSQL versions (even if I think that nearly nobody has a clue about what % returns when the arguments are negative), hence the 3 modulo versions to counter that potential criticism. But I would prefer just one version, with the Knuth (or Euclidean) definition.

--
Fabien.
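[Editor's note: for readers unfamiliar with the distinction being discussed, here is an illustrative C sketch of the three definitions, with invented function names (the patch itself uses %, mod, fmod and emod). For a = -7, b = 3 they yield -1, 2 and 2 respectively; for a = 7, b = -3 they yield 1, -2 and 1.]

    #include <stdint.h>

    /* truncated: remainder takes the sign of the dividend (C99 semantics) */
    static int64_t
    mod_trunc(int64_t a, int64_t b)
    {
        return a % b;
    }

    /* floored (Knuth): remainder takes the sign of the divisor */
    static int64_t
    mod_floor(int64_t a, int64_t b)
    {
        int64_t r = a % b;

        if ((b < 0 && r > 0) || (b > 0 && r < 0))
            r += b;
        return r;
    }

    /* Euclidean: remainder is always non-negative */
    static int64_t
    mod_euclid(int64_t a, int64_t b)
    {
        int64_t r = a % b;

        if (r < 0)
            r += (b < 0) ? -b : b;
        return r;
    }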
Re: [HACKERS] add modulo (%) operator to pgbench
Hello Alvaro,

> I wonder if it would be necessary to offer the division operator semantics corresponding to whatever additional modulo operator we choose to offer. That is, if we add emod, do we need ediv as well?

It would make sense; however I do not need it, and I'm not sure of a use case where it would be needed, so I do not think that it is necessary. If it happens to be, it could be added quite easily then.

--
Fabien.
Re: [HACKERS] pg_receivexlog add synchronous mode
On Wed, Aug 6, 2014 at 2:34 PM, furu...@pm.nttdata.co.jp wrote:
>>> I have improved the patch by making the following changes:
>>>
>>> 1. Since stream_stop() was redundant, the stream_stop() call at WAL file close was deleted.
>>>
>>> 2. Changed the timing of the flush decision for the readability of the source code: the flush check was moved from after a message is received to before the decision whether to continue with the next feedback message.
>>
>> Thanks for the updated version of the patch! While reviewing it, I found that HandleCopyStream() is still long, which decreases the readability of the source code. So I feel inclined to refactor HandleCopyStream() further for better readability. What about the attached refactoring patch?
>>
>> Sorry, I forgot to attach the patch in the previous email. So attached.
>
> Thank you for the refactoring patch. I did a review of the patch.
>
> -break; /* ignore the rest of this XLogData packet */
> +return true; /* ignore the rest of this XLogData packet */
>
> The break statement at close of the WAL file has become a return of true, so it may now behave like a continue statement. Is that satisfactory?

Sorry, I failed to see your point. In the original code, when we reach the end of a WAL file at a streaming stop point, we break out of the inner loop in the code block that processes an XLogData packet, and then go back to the top of the outer loop in HandleCopyStream(). ISTM that the refactored code works the same way. Anyway, could you elaborate on the problem?

> The walreceiver splits this work between XLogWalRcvProcessMsg and XLogWalRcvWrite; isn't that division necessary?

Not necessary, but I have no objection to the idea.

Regards,

--
Fujii Masao
Re: [HACKERS] missing PG_RETURN_UINT16
On Wed, Aug 6, 2014 at 4:47 AM, Robert Haas robertmh...@gmail.com wrote:
> On Mon, Aug 4, 2014 at 11:35 AM, Manuel Kniep m.kn...@web.de wrote:
>> I'm missing the PG_RETURN_UINT16 macro in fmgr.h. Since we already have the PG_GETARG_UINT16 macro, I guess it makes sense to have it. Here is the trivial patch for it.
>
> I see no reason not to add this. Anyone else want to object?

+1 to add that. What about backpatching to 9.4? This is a very simple change and there seems to be no reason to wait until 9.5 for it.

Regards,

--
Fujii Masao
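[Editor's note: for context, the addition under discussion is a one-liner mirroring the existing signed macro in fmgr.h, sketched here from the file's conventions; the committed form may differ in whitespace.]

    /* existing signed variant in fmgr.h */
    #define PG_RETURN_INT16(x)   return Int16GetDatum(x)

    /* the proposed unsigned counterpart, pairing with PG_GETARG_UINT16 */
    #define PG_RETURN_UINT16(x)  return UInt16GetDatum(x)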
Re: [HACKERS] A worst case for qsort
> For example, if we had reason to be concerned about *adversarial* inputs, I think that there is a good chance that our qsort() actually would be problematic to the point of driving us to prefer some generally slower alternative.

That is an interesting point. Indeed, a database in general often stores user-supplied data, which may happen to be sorted for presentation purposes in an interface. The same thing occurred with hash-table algorithms, and was/is a way to mount DoS attacks on web applications. I'm not sure whether the qsort version discussed here would apply to user-supplied data, though. If so, adding some randomness to the decision process would suffice to counter the adversarial-input argument you raised.

--
Fabien.
Re: [HACKERS] A worst case for qsort
On Tue, Aug 5, 2014 at 11:49 PM, Fabien COELHO coe...@cri.ensmp.fr wrote:
> If so, adding some randomness to the decision process would suffice to counter the adversarial-input argument you raised.

This is specifically addressed by the paper. Indeed, randomly choosing a pivot is a common strategy. It won't fix the problem.

--
Peter Geoghegan
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Tue, Aug 5, 2014 at 6:25 PM, Fujii Masao masao.fu...@gmail.com wrote:
> This gradual approach looks good to me. And, if an additional compression algorithm like lz4 is always better than pglz in every scenario, we can just change the code so that the additional algorithm is always used, which would make the code simpler.

Right.

>>> 3. Compressing one block vs all blocks: Andres suggested that compressing all backup blocks in one go may give us a better compression ratio. This is worth trying. I'm wondering what would be the best way to do so with minimal changes to the xlog insertion code. Today, we add more rdata items for the backup block header(s) and the backup blocks themselves (2 per backup block if there is a hole) beyond what the caller has supplied. If we have to compress all the backup blocks together, then one approach is to copy the backup block headers and the blocks to a temp buffer, compress that, and replace the rdata entries added previously with a single rdata.
>
> Basically sounds reasonable. But how does this logic work if there are multiple rdata entries and only some of them are backup blocks?

My idea is to just make a pass over the rdata entries past the rdt_lastnormal element, after processing the backup blocks and making the additional entries in the chain. These additional rdata entries correspond to the backup blocks and their headers. So we can copy the rdata->data of these elements into a temp buffer and compress the entire thing in one go. We can then replace the rdata chain past rdt_lastnormal with a single rdata whose data points to the compressed image. The recovery code just needs to decompress this data if the record header indicates that the backup data is compressed. Sure, the exact mechanism to indicate whether the data is compressed (and by which algorithm) can be worked out.

> If a hole is not copied to that temp buffer, ISTM that we should change the backup block header so that it contains the info for a hole, e.g., the location where the hole starts. No?

AFAICS it's not required if we compress the stream of BkpBlock headers and the block data. The current mechanism of constructing the additional rdata chain items takes care of holes anyway.

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee
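[Editor's note: to make the splicing idea above concrete, here is a rough, hypothetical C sketch. It is not the patch's code: the function name and the one-call pglz-style compression interface are assumptions, and the real XLogInsert() details (holes, headers, scratch-buffer management) are glossed over.]

    /* Flatten the rdata entries past rdt_lastnormal (backup-block headers
     * and backup blocks), compress them as one unit, and splice a single
     * entry pointing at the compressed image back into the chain. */
    static bool
    compress_backup_blocks(XLogRecData *rdt_lastnormal,
                           char *scratch, char *compressed)
    {
        XLogRecData *rdt;
        uint32       len = 0;
        int32        clen;

        /* 1. copy every entry past the last "normal" one into the buffer */
        for (rdt = rdt_lastnormal->next; rdt != NULL; rdt = rdt->next)
        {
            memcpy(scratch + len, rdt->data, rdt->len);
            len += rdt->len;
        }

        /* 2. compress the flattened image (pglz-style API assumed here) */
        clen = pglz_compress(scratch, len, compressed, PGLZ_strategy_default);
        if (clen < 0 || (uint32) clen >= len)
            return false;       /* incompressible: keep the original chain */

        /* 3. replace the tail of the chain with one compressed entry;
         * a flag in the record header would mark it as compressed */
        rdt = rdt_lastnormal->next;
        rdt->data = compressed;
        rdt->len = clen;
        rdt->next = NULL;
        return true;
    }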
Re: [HACKERS] postgresql.auto.conf and reload
On Tue, Jul 29, 2014 at 9:26 PM, Amit Kapila amit.kapil...@gmail.com wrote:
> On Mon, Jul 28, 2014 at 11:33 PM, Fujii Masao masao.fu...@gmail.com wrote:
>> There is another side effect of this patch. With the patch, when reloading the configuration file, the server cannot warn about an invalid setting value if it's not the last setting of the parameter. This may cause a problematic situation as follows.
>>
>> 1. ALTER SYSTEM SET work_mem TO '1024kB';
>> 2. Reload the configuration file --- success
>> 3. Then, a user accidentally adds work_mem = '2048KB' to postgresql.conf. The setting value '2048KB' is invalid; the unit should be 'kB' instead of 'KB'.
>> 4. Reload the configuration file --- success. The invalid setting is ignored because the setting of work_mem in postgresql.auto.conf is preferred, so the user cannot notice that postgresql.conf has an invalid setting.
>> 5. Failover on a shared-disk HA configuration happens; then PostgreSQL fails to start up because of the invalid setting. When PostgreSQL starts up, the last setting is preferred, but all the settings are checked.
>
> The reason is that during startup DataDir is not set by the time the server processes the config file, due to which we process the .auto.conf file in a second pass. I think ideally it should ignore the invalid setting in such a case as it does during reload; however currently there is no good way to process .auto.conf when DataDir is not set, so we handle it separately in a second pass.

What about picking up only data_directory in the first pass?

Regards,

--
Fujii Masao
Re: [HACKERS] pg_receivexlog add synchronous mode
>>> -break; /* ignore the rest of this XLogData packet */
>>> +return true; /* ignore the rest of this XLogData packet */
>>>
>>> The break statement at close of the WAL file has become a return of true, so it may now behave like a continue statement. Is that satisfactory?
>>
>> Sorry, I failed to see your point. In the original code, when we reach the end of a WAL file at a streaming stop point, we break out of the inner loop in the code block that processes an XLogData packet, and then go back to the top of the outer loop in HandleCopyStream(). ISTM that the refactored code works the same way. Anyway, could you elaborate on the problem?

I'm sorry, I was confused with the patch that I have yet to offer. In my patch it is necessary to exit the loop, because a loop was added to keep receiving messages continuously. The refactoring patch is no problem as presented.

Regards,

--
Furuya Osamu
[HACKERS] Enhancing pgbench parameter checking
Hi,

It is pretty annoying that pgbench does not check for parameters which should not be used with -i. I often type something like:

pgbench -c 10 -T 300 -S -i test

and accidentally initialize the pgbench database. This is pretty uncomfortable if the database is huge, since initializing a huge database takes a long time. Why don't we check for this case? Included is a patch to enhance the behavior of pgbench in this regard, IMO. Here is a sample session after patching:

$ ./pgbench -c 10 -T 300 -S -i test
some parameters cannot be used in initialize mode

Best regards,

--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index c0e5e24..d7a3f57 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -2520,6 +2520,8 @@ main(int argc, char **argv)
 	char	   *filename = NULL;
 	bool		scale_given = false;
 
+	bool		is_non_init_param_set = false;
+
 	CState	   *state;			/* status of clients */
 	TState	   *threads;		/* array of thread */
@@ -2599,12 +2601,14 @@
 				break;
 			case 'S':
 				ttype = 1;
+				is_non_init_param_set = true;
 				break;
 			case 'N':
 				ttype = 2;
+				is_non_init_param_set = true;
 				break;
 			case 'c':
-				nclients = atoi(optarg);
+				is_non_init_param_set = true;
 				if (nclients <= 0 || nclients > MAXCLIENTS)
 				{
 					fprintf(stderr, "invalid number of clients: %d\n", nclients);
@@ -2629,6 +2633,7 @@
 #endif   /* HAVE_GETRLIMIT */
 				break;
 			case 'j':			/* jobs */
+				is_non_init_param_set = true;
 				nthreads = atoi(optarg);
 				if (nthreads <= 0)
 				{
@@ -2637,9 +2642,11 @@
 				}
 				break;
 			case 'C':
+				is_non_init_param_set = true;
 				is_connect = true;
 				break;
 			case 'r':
+				is_non_init_param_set = true;
 				is_latencies = true;
 				break;
 			case 's':
@@ -2652,6 +2659,7 @@
 				}
 				break;
 			case 't':
+				is_non_init_param_set = true;
 				if (duration > 0)
 				{
 					fprintf(stderr, "specify either a number of transactions (-t) or a duration (-T), not both.\n");
@@ -2665,6 +2673,7 @@
 				}
 				break;
 			case 'T':
+				is_non_init_param_set = true;
 				if (nxacts > 0)
 				{
 					fprintf(stderr, "specify either a number of transactions (-t) or a duration (-T), not both.\n");
@@ -2681,12 +2690,14 @@
 				login = pg_strdup(optarg);
 				break;
 			case 'l':
+				is_non_init_param_set = true;
 				use_log = true;
 				break;
 			case 'q':
 				use_quiet = true;
 				break;
 			case 'f':
+				is_non_init_param_set = true;
 				ttype = 3;
 				filename = pg_strdup(optarg);
 				if (process_file(filename) == false || *sql_files[num_files - 1] == NULL)
@@ -2696,6 +2707,8 @@
 				{
 					char	   *p;
 
+					is_non_init_param_set = true;
+
 					if ((p = strchr(optarg, '=')) == NULL || p == optarg || *(p + 1) == '\0')
 					{
 						fprintf(stderr, "invalid variable definition: %s\n", optarg);
@@ -2716,6 +2729,7 @@
 				}
 				break;
 			case 'M':
+				is_non_init_param_set = true;
 				if (num_files > 0)
 				{
 					fprintf(stderr, "query mode (-M) should be specified before transaction scripts (-f)\n");
@@ -2731,6 +2745,7 @@
 				}
 				break;
 			case 'P':
+				is_non_init_param_set = true;
 				progress = atoi(optarg);
 				if (progress <= 0)
 				{
@@ -2745,6 +2760,8 @@
 					/* get a double from the beginning of option value */
 					double		throttle_value = atof(optarg);
 
+					is_non_init_param_set = true;
+
 					if (throttle_value <= 0.0)
 					{
 						fprintf(stderr, "invalid rate limit: %s\n", optarg);
@@ -2764,6 +2781,7 @@
 				index_tablespace = pg_strdup(optarg);
 				break;
 			case 4:
+				is_non_init_param_set = true;
 				sample_rate = atof(optarg);
 				if (sample_rate <= 0.0 || sample_rate > 1.0)
 				{
@@ -2776,6 +2794,7 @@
 				fprintf(stderr, "--aggregate-interval is not currently supported on Windows");
 				exit(1);
 #else
+				is_non_init_param_set = true;
 				agg_interval = atoi(optarg);
 				if (agg_interval <= 0)
 				{
@@ -2808,6 +2827,12 @@
 
 	if (is_init_mode)
 	{
+		if (is_non_init_param_set)
+		{
+			fprintf(stderr, "some parameters cannot be used in initialize mode\n");
+			exit(1);
+		}
+
 		init(is_no_vacuum);
 		exit(0);
 	}
Re: [HACKERS] Scaling shared buffer eviction
On Tue, Aug 5, 2014 at 9:21 PM, Robert Haas robertmh...@gmail.com wrote:
> On Thu, Jun 5, 2014 at 4:43 AM, Amit Kapila amit.kapil...@gmail.com wrote:
>
> This essentially removes BgWriterDelay, but it's still mentioned in BgBufferSync(). Looking further, I see that with the patch applied, BgBufferSync() is still present in the source code but is no longer called from anywhere. Please don't submit patches that render things unused without actually removing them; it makes it much harder to see what you've changed. I realize you probably left it that way for testing purposes, but you need to clean such things up before submitting. Likewise, if you've rendered GUCs or statistics counters unused, you need to rip them out, so that the scope of the changes you've made is clear to reviewers.

I kept it just for the reason that, if the basic approach sounds reasonable and is accepted, then I will clean it up. Sorry for the inconvenience; I didn't realize that it can be annoying for a reviewer. I will remove all such code from the patch in the next version.

> A comparison of BgBufferSync() with BgBufferSyncAndMoveBuffersToFreelist() reveals that you've removed at least one behavior that some people (at least, me) will care about, which is the guarantee that the background writer will scan the entire buffer pool at least every couple of minutes.

Okay, I will take care of this based on the conclusion of the other points in this mail.

> This is important because it guarantees that dirty data doesn't sit in memory forever. When the system becomes busy again after a long idle period, users will expect the system to have used the idle time to flush dirty buffers to disk. This also improves data recovery prospects if, for example, somebody loses their pg_xlog directory - there may be dirty buffers whose contents are lost, of course, but they won't be months old.
>
>> b. New stats for the number of buffers on the freelist have been added; some old ones like maxwritten_clean can be removed, as the new logic for syncing buffers and moving them to the freelist doesn't use them. However I think it's better to remove them once the new logic is accepted. Added some new logs for info related to the freelist under BGW_DEBUG.
>
> If I'm reading this right, the new statistic is an incrementing counter where, every time you update it, you add the number of buffers currently on the freelist. That makes no sense.

I think using 'number of buffers currently on the freelist' and 'number of recently allocated buffers' for consecutive cycles, we can figure out approximately how many buffer allocations need the clock sweep, assuming the low and high threshold watermarks are fixed. However, there can be cases where it is not easy to estimate that number.

> I think what you should be counting is the number of allocations that are being satisfied from the free-list. Then, by comparing the rate at which that value is incrementing to the rate at which buffers_alloc is incrementing, somebody can figure out what percentage of allocations are requiring a clock-sweep run. Actually, I think it's better to flip it around: count the number of allocations that require an individual backend to run the clock sweep (vs. being satisfied from the free-list); call it, say, buffers_backend_clocksweep. We can then try to tune the patch to make that number as small as possible under varying workloads.

This can give us a clear idea how to tune the patch; however, we need to maintain 3 counters for it in the code (recent_alloc (needed for the current bgwriter logic) and the other 2 suggested by you). Do you want to retain such counters in the code, or are they a kind of debug info for the patch?

>> d. Autotune the low and high threshold for freelist for various configurations.
>
> I think we need to come up with some kind of formula here rather than just a list of hard-coded constants.

That was my initial intention as well, and I have tried basing it on the number of shared buffers, like keeping the threshold values as a percentage of shared buffers, but nothing could satisfy different kinds of workloads. The current values I have chosen are based on experiments for various workloads at different thresholds. I have shown the lwlock_stats data for various loads based on the current thresholds upthread. Another way could be to make them config knobs and use the values given by the user in case they are provided, else go with fixed values. There are other instances in the code as well (one of them I remember offhand is in pglz_compress) where we use fixed values based on different sizes.

> And it definitely needs some comments explaining the logic behind the choices.

Agreed, I shall improve them in the next version of the patch.

> Aside from those specific remarks, I think the elephant in the room is the question of whether it really makes sense to have one process which is responsible both for populating the free list and for writing buffers to disk. One problem, which I alluded to above under point (1), is
Re: [HACKERS] inherit support for foreign tables
(2014/07/01 11:10), Etsuro Fujita wrote:
> (2014/06/30 22:48), Tom Lane wrote:
>> Etsuro Fujita fujita.ets...@lab.ntt.co.jp writes:
>>> Done. I think this is because create_foreignscan_plan() makes reference to attr_needed, which isn't computed for inheritance children.
>>
>> I wonder whether it isn't time to change that. It was coded like that originally only because calculating the values would've been a waste of cycles at the time. But this is at least the third place where it'd be useful to have attr_needed for child rels.
>
> +1 for calculating attr_needed for child rels. (I was wondering too.) I'll create a separate patch for it.

Attached is a WIP patch for that. The following functions have been changed to refer to attr_needed:

* check_index_only()
* remove_unused_subquery_outputs()
* check_selective_binary_conversion()

I'll add this to the upcoming commitfest. If anyone has any time to glance at it before then, that would be a great help.

Thanks,

Best regards,
Etsuro Fujita

*** a/contrib/file_fdw/file_fdw.c
--- b/contrib/file_fdw/file_fdw.c
*************** check_selective_binary_conversion(RelOptInfo *baserel,
*** 799,806 ****
  	}
  
  	/* Collect all the attributes needed for joins or final output. */
! 	pull_varattnos((Node *) baserel->reltargetlist, baserel->relid,
! 				   &attrs_used);
  
  	/* Add all the attributes used by restriction clauses. */
  	foreach(lc, baserel->baserestrictinfo)
--- 799,811 ----
  	}
  
  	/* Collect all the attributes needed for joins or final output. */
! 	for (i = baserel->min_attr; i <= baserel->max_attr; i++)
! 	{
! 		if (!bms_is_empty(baserel->attr_needed[i - baserel->min_attr]))
! 			attrs_used =
! 				bms_add_member(attrs_used,
! 							   i - FirstLowInvalidHeapAttributeNumber);
! 	}
  
  	/* Add all the attributes used by restriction clauses. */
  	foreach(lc, baserel->baserestrictinfo)
*** a/src/backend/optimizer/path/allpaths.c
--- b/src/backend/optimizer/path/allpaths.c
*************** set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
*** 577,588 ****
  	childrel->has_eclass_joins = rel->has_eclass_joins;
  
  	/*
! 	 * Note: we could compute appropriate attr_needed data for the child's
! 	 * variables, by transforming the parent's attr_needed through the
! 	 * translated_vars mapping.  However, currently there's no need
! 	 * because attr_needed is only examined for base relations not
! 	 * otherrels.  So we just leave the child's attr_needed empty.
  	 */
  
  	/*
  	 * Compute the child's size.
--- 577,585 ----
  	childrel->has_eclass_joins = rel->has_eclass_joins;
  
  	/*
! 	 * Compute the child's attr_needed.
  	 */
+ 	adjust_appendrel_attr_needed(rel, childrel, childRTE, appinfo);
  
  	/*
  	 * Compute the child's size.
*************** remove_unused_subquery_outputs(Query *subquery, RelOptInfo *rel)
*** 2173,2178 ****
--- 2170,2176 ----
  {
  	Bitmapset  *attrs_used = NULL;
  	ListCell   *lc;
+ 	int			i;
  
  	/*
  	 * Do nothing if subquery has UNION/INTERSECT/EXCEPT: in principle we
*************** remove_unused_subquery_outputs(Query *subquery, RelOptInfo *rel)
*** 2193,2204 ****
  	 * Collect a bitmap of all the output column numbers used by the upper
  	 * query.
  	 *
! 	 * Add all the attributes needed for joins or final output.  Note: we must
! 	 * look at reltargetlist, not the attr_needed data, because attr_needed
! 	 * isn't computed for inheritance child rels, cf set_append_rel_size().
! 	 * (XXX might be worth changing that sometime.)
  	 */
! 	pull_varattnos((Node *) rel->reltargetlist, rel->relid, &attrs_used);
  
  	/* Add all the attributes used by un-pushed-down restriction clauses. */
  	foreach(lc, rel->baserestrictinfo)
--- 2191,2205 ----
  	 * Collect a bitmap of all the output column numbers used by the upper
  	 * query.
  	 *
! 	 * Add all the attributes needed for joins or final output.
  	 */
! 	for (i = rel->min_attr; i <= rel->max_attr; i++)
! 	{
! 		if (!bms_is_empty(rel->attr_needed[i - rel->min_attr]))
! 			attrs_used =
! 				bms_add_member(attrs_used,
! 							   i - FirstLowInvalidHeapAttributeNumber);
! 	}
  
  	/* Add all the attributes used by un-pushed-down restriction clauses. */
  	foreach(lc, rel->baserestrictinfo)
*** a/src/backend/optimizer/path/indxpath.c
--- b/src/backend/optimizer/path/indxpath.c
*************** check_index_only(RelOptInfo *rel, IndexOptInfo *index)
*** 1768,1779 ****
  	 * way out.
  	 */
  
! 	/*
! 	 * Add all the attributes needed for joins or final output.  Note: we must
! 	 * look at reltargetlist, not the attr_needed data, because attr_needed
! 	 * isn't computed for inheritance child rels.
! 	 */
! 	pull_varattnos((Node *) rel->reltargetlist, rel->relid, &attrs_used);
  
  	/* Add all the attributes used by restriction clauses. */
  	foreach(lc, rel->baserestrictinfo)
--- 1768,1781 ----
  	 * way out.
  	 */
  
! 	/* Collect all the attributes needed for joins or final output. */
! 	for (i = rel->min_attr; i <= rel->max_attr; i++)
! 	{
! 		if (!bms_is_empty(rel->attr_needed[i
Re: [HACKERS] Introducing coarse grain parallelism by postgres_fdw.
On Tue, Aug 5, 2014 at 7:05 AM, Ashutosh Bapat ashutosh.ba...@enterprisedb.com wrote:
> There are quite a few members added to the generic Path and Plan structures whose use is induced only through foreign scans. Each path now stores two sets of costs, one with parallelism and one without. The parallel values will make sense only when there is a foreign scan, which uses parallelism, in the plan tree. So those costs are maintained unnecessarily, or the memory for those members is wasted, in most of the cases, where the tables involved are not foreign.

Yeah, I don't think that's going to be acceptable.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] pg_receivexlog add synchronous mode
On Wed, Aug 6, 2014 at 5:10 PM, furu...@pm.nttdata.co.jp wrote:
>>>> -break; /* ignore the rest of this XLogData packet */
>>>> +return true; /* ignore the rest of this XLogData packet */
>>>>
>>>> The break statement at close of the WAL file has become a return of true, so it may now behave like a continue statement. Is that satisfactory?
>>>
>>> Sorry, I failed to see your point. In the original code, when we reach the end of a WAL file at a streaming stop point, we break out of the inner loop in the code block that processes an XLogData packet, and then go back to the top of the outer loop in HandleCopyStream(). ISTM that the refactored code works the same way. Anyway, could you elaborate on the problem?
>
> I'm sorry, I was confused with the patch that I have yet to offer. In my patch it is necessary to exit the loop, because a loop was added to keep receiving messages continuously. The refactoring patch is no problem as presented.

Okay, applied the patch.

I heavily modified your patch, based on master with the refactoring patch applied. Attached is that modified version. Could you review it?

Regards,

--
Fujii Masao

diff --git a/doc/src/sgml/ref/pg_receivexlog.sgml b/doc/src/sgml/ref/pg_receivexlog.sgml
index 7c50b01..c15776f 100644
--- a/doc/src/sgml/ref/pg_receivexlog.sgml
+++ b/doc/src/sgml/ref/pg_receivexlog.sgml
@@ -106,6 +106,21 @@ PostgreSQL documentation
     </varlistentry>
 
     <varlistentry>
+     <term><option>-F <replaceable class="parameter">interval</replaceable></option></term>
+     <term><option>--fsync-interval=<replaceable class="parameter">interval</replaceable></option></term>
+     <listitem>
+      <para>
+       Specifies the maximum time to issue sync commands to ensure the
+       received WAL file is safely flushed to disk, in seconds. The default
+       value is zero, which disables issuing fsyncs except when WAL file is
+       closed. If <literal>-1</literal> is specified, WAL file is flushed as
+       soon as possible, that is, as soon as there are WAL data which has
+       not been flushed yet.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
      <term><option>-v</option></term>
      <term><option>--verbose</option></term>
      <listitem>
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 5df2eb8..0b02c4c 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -371,7 +371,7 @@ LogStreamerMain(logstreamer_param *param)
 	if (!ReceiveXlogStream(param->bgconn, param->startptr, param->timeline,
 						   param->sysidentifier, param->xlogdir,
 						   reached_end_position, standby_message_timeout,
-						   NULL))
+						   NULL, 0))
 
 	/*
 	 * Any errors will already have been reported in the function process,
diff --git a/src/bin/pg_basebackup/pg_receivexlog.c b/src/bin/pg_basebackup/pg_receivexlog.c
index 9640838..0b7af54 100644
--- a/src/bin/pg_basebackup/pg_receivexlog.c
+++ b/src/bin/pg_basebackup/pg_receivexlog.c
@@ -36,6 +36,7 @@
 static char *basedir = NULL;
 static int	verbose = 0;
 static int	noloop = 0;
 static int	standby_message_timeout = 10 * 1000;	/* 10 sec = default */
+static int	fsync_interval = 0;	/* 0 = default */
 static volatile bool time_to_abort = false;
@@ -62,6 +63,8 @@ usage(void)
 	printf(_("\nOptions:\n"));
 	printf(_("  -D, --directory=DIR    receive transaction log files into this directory\n"));
 	printf(_("  -n, --no-loop          do not loop on connection lost\n"));
+	printf(_("  -F  --fsync-interval=INTERVAL\n"
+			 "                         frequency of syncs to transaction log files (in seconds)\n"));
 	printf(_("  -v, --verbose          output verbose messages\n"));
 	printf(_("  -V, --version          output version information, then exit\n"));
 	printf(_("  -?, --help             show this help, then exit\n"));
@@ -330,7 +333,8 @@ StreamLog(void)
 			starttli);
 
 	ReceiveXlogStream(conn, startpos, starttli, NULL, basedir,
-					  stop_streaming, standby_message_timeout, ".partial");
+					  stop_streaming, standby_message_timeout, ".partial",
+					  fsync_interval);
 
 	PQfinish(conn);
 }
@@ -360,6 +364,7 @@ main(int argc, char **argv)
 		{"port", required_argument, NULL, 'p'},
 		{"username", required_argument, NULL, 'U'},
 		{"no-loop", no_argument, NULL, 'n'},
+		{"fsync-interval", required_argument, NULL, 'F'},
 		{"no-password", no_argument, NULL, 'w'},
 		{"password", no_argument, NULL, 'W'},
 		{"status-interval", required_argument, NULL, 's'},
@@ -389,7 +394,7 @@
 		}
 	}
 
-	while ((c = getopt_long(argc, argv, "D:d:h:p:U:s:nwWv",
+	while ((c = getopt_long(argc, argv, "D:d:h:p:U:s:nF:wWv",
 							long_options, &option_index)) != -1)
 	{
 		switch (c)
@@ -436,6 +441,15 @@
 			case 'n':
 				noloop = 1;
 				break;
+			case 'F':
+				fsync_interval = atoi(optarg) * 1000;
+				if (fsync_interval < -1000)
+				{
+					fprintf(stderr, _("%s:
[HACKERS] pgcrypto: PGP signatures
Hi hackers,

Attached is a patch to add support for PGP signatures in encrypted messages into pgcrypto. Currently, the list of limitations is the following:

- It only knows how to generate one signature per message. I don't see that as a problem.
- If a message has been signed with multiple keys which have the same keyid as the one specified to verify the message, an error is returned. Naively, it seems that we should try all of them and return OK if even one of them matches, but that seems icky.
- Only RSA signatures are supported. It wouldn't be too hard for someone familiar with DSA to add it in, but I'm not volunteering to do it. Personally I think supporting RSA is better than no support at all.

As per usual, I'll also add this to the upcoming commitfest. Any feedback appreciated before that, of course.

.marko

*** a/contrib/pgcrypto/Makefile
--- b/contrib/pgcrypto/Makefile
*************** SRCS = pgcrypto.c px.c px-hmac.c px-crypt.c \
*** 20,39 ****
  	mbuf.c pgp.c pgp-armor.c pgp-cfb.c pgp-compress.c \
  	pgp-decrypt.c pgp-encrypt.c pgp-info.c pgp-mpi.c \
  	pgp-pubdec.c pgp-pubenc.c pgp-pubkey.c pgp-s2k.c \
! 	pgp-pgsql.c
  
  MODULE_big	= pgcrypto
  OBJS		= $(SRCS:.c=.o) $(WIN32RES)
  
  EXTENSION = pgcrypto
! DATA = pgcrypto--1.1.sql pgcrypto--1.0--1.1.sql pgcrypto--unpackaged--1.0.sql
  PGFILEDESC = "pgcrypto - cryptographic functions"
  
  REGRESS = init md5 sha1 hmac-md5 hmac-sha1 blowfish rijndael \
  	$(CF_TESTS) \
  	crypt-des crypt-md5 crypt-blowfish crypt-xdes \
  	pgp-armor pgp-decrypt pgp-encrypt $(CF_PGP_TESTS) \
! 	pgp-pubkey-decrypt pgp-pubkey-encrypt pgp-info
  
  EXTRA_CLEAN = gen-rtab
--- 20,39 ----
  	mbuf.c pgp.c pgp-armor.c pgp-cfb.c pgp-compress.c \
  	pgp-decrypt.c pgp-encrypt.c pgp-info.c pgp-mpi.c \
  	pgp-pubdec.c pgp-pubenc.c pgp-pubkey.c pgp-s2k.c \
! 	pgp-sig.c pgp-pgsql.c
  
  MODULE_big	= pgcrypto
  OBJS		= $(SRCS:.c=.o) $(WIN32RES)
  
  EXTENSION = pgcrypto
! DATA = pgcrypto--1.2.sql pgcrypto--1.0--1.1.sql pgcrypto--1.1--1.2.sql pgcrypto--unpackaged--1.0.sql
  PGFILEDESC = "pgcrypto - cryptographic functions"
  
  REGRESS = init md5 sha1 hmac-md5 hmac-sha1 blowfish rijndael \
  	$(CF_TESTS) \
  	crypt-des crypt-md5 crypt-blowfish crypt-xdes \
  	pgp-armor pgp-decrypt pgp-encrypt $(CF_PGP_TESTS) \
! 	pgp-pubkey-decrypt pgp-pubkey-encrypt pgp-info pgp-sign
  
  EXTRA_CLEAN = gen-rtab
*** a/contrib/pgcrypto/expected/pgp-encrypt.out
--- b/contrib/pgcrypto/expected/pgp-encrypt.out
*************** select pgp_sym_decrypt(pgp_sym_encrypt('Secret.', 'key'),
*** 16,22 ****
  	expect-sess-key=0,
  	expect-s2k-mode=3,
  	expect-s2k-digest-algo=sha1,
! 	expect-compress-algo=0
  ');
   pgp_sym_decrypt
  -----------------
--- 16,23 ----
  	expect-sess-key=0,
  	expect-s2k-mode=3,
  	expect-s2k-digest-algo=sha1,
! 	expect-compress-algo=0,
! 	expect-digest-algo=sha512
  ');
   pgp_sym_decrypt
  -----------------
*************** select pgp_sym_decrypt(pgp_sym_encrypt('Secret.', 'key'),
*** 30,38 ****
  	expect-sess-key=1,
  	expect-s2k-mode=0,
  	expect-s2k-digest-algo=md5,
! 	expect-compress-algo=1
  ');
  NOTICE:  pgp_decrypt: unexpected cipher_algo: expected 4 got 7
  NOTICE:  pgp_decrypt: unexpected s2k_mode: expected 0 got 3
  NOTICE:  pgp_decrypt: unexpected s2k_digest_algo: expected 1 got 2
  NOTICE:  pgp_decrypt: unexpected use_sess_key: expected 1 got 0
--- 31,41 ----
  	expect-sess-key=1,
  	expect-s2k-mode=0,
  	expect-s2k-digest-algo=md5,
! 	expect-compress-algo=1,
! 	expect-digest-algo=md5
  ');
  NOTICE:  pgp_decrypt: unexpected cipher_algo: expected 4 got 7
+ NOTICE:  pgp_decrypt: unexpected digest_algo: expected 1 got 10
  NOTICE:  pgp_decrypt: unexpected s2k_mode: expected 0 got 3
  NOTICE:  pgp_decrypt: unexpected s2k_digest_algo: expected 1 got 2
  NOTICE:  pgp_decrypt: unexpected use_sess_key: expected 1 got 0
*** a/contrib/pgcrypto/expected/pgp-info.out
--- b/contrib/pgcrypto/expected/pgp-info.out
*************** from encdata order by id;
*** 76,78 ****
--- 76,151 ----
   FD0206C409B74875
  (4 rows)
  
+ -- pgp_main_key_id
+ select pgp_main_key_id(dearmor(pubkey)) from keytbl where id=1;
+  pgp_main_key_id
+ ------------------
+  1C29BC0D18177364
+ (1 row)
+ 
+ select pgp_main_key_id(dearmor(pubkey)) from keytbl where id=2;
+  pgp_main_key_id
+ ------------------
+  48E9CD56FEA668DB
+ (1 row)
+ 
+ select pgp_main_key_id(dearmor(pubkey)) from keytbl where id=3;
+  pgp_main_key_id
+ ------------------
+  63F875F63F6774A0
+ (1 row)
+ 
+ select
Re: [HACKERS] A worst case for qsort
>> If so, adding some randomness to the decision process would suffice to counter the adversarial-input argument you raised.
>
> This is specifically addressed by the paper. Indeed, randomly choosing a pivot is a common strategy. It won't fix the problem.

Too bad. I must admit that I do not see how to build a test case which would trigger worst-case behavior against a qsort which chooses the pivot randomly, but I have not read the paper, and possibly there is an element of context that is eluding me.

--
Fabien.
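[Editor's note: the paper in question (M. D. McIlroy, "A Killer Adversary for Quicksort", Software Practice and Experience, 1999) answers exactly this: the adversary does not precompute a bad input at all. It supplies a comparator that decides values lazily while the sort runs, so it drives any quicksort whose comparisons it can observe to quadratic behavior, random pivot choice included. A C sketch adapted from the paper, so treat the details as illustrative rather than the exact published code:]

    #include <stdio.h>
    #include <stdlib.h>

    static int *val;        /* item values; "gas" until frozen */
    static int  gas;        /* gas compares larger than any solid value */
    static int  nsolid;     /* number of values frozen so far */
    static int  candidate;  /* index of the current pivot candidate */
    static long ncmp;       /* comparison counter */

    static int
    cmp(const void *px, const void *py)
    {
        int x = *(const int *) px;
        int y = *(const int *) py;

        ncmp++;
        if (val[x] == gas && val[y] == gas)
        {
            /* freeze the candidate: it becomes the smallest value so far */
            if (x == candidate)
                val[x] = nsolid++;
            else
                val[y] = nsolid++;
        }
        if (val[x] == gas)
            candidate = x;
        else if (val[y] == gas)
            candidate = y;
        return val[x] - val[y];
    }

    int
    main(void)
    {
        int     n = 100000;
        int    *ptr = malloc(n * sizeof(int));
        int     i;

        val = malloc(n * sizeof(int));
        gas = n;                /* larger than any frozen value */
        for (i = 0; i < n; i++)
        {
            ptr[i] = i;         /* we sort item numbers, not values */
            val[i] = gas;
        }
        qsort(ptr, n, sizeof(int), cmp);
        printf("%d items, %ld comparisons\n", n, ncmp);
        return 0;
    }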
Re: [HACKERS] Enhancing pgbench parameter checking
> Included is the patch to enhance the behavior of pgbench in this regard IMO. Here is a sample session after patching:
>
> $ ./pgbench -c 10 -T 300 -S -i test
> some parameters cannot be used in initialize mode

I have not tested, but the patch looks OK in principle.

I'm not sure about the variable name is_non_init_parameter_set; I would suggest benchmarking_option_set.

Also, to be consistent, maybe it should check that no initialization-specific option is set when benchmarking.

--
Fabien.
Re: [HACKERS] add modulo (%) operator to pgbench
> Three different modulo operators seems like a lot for a language that doesn't even have a real expression syntax, but I'll yield to whatever the consensus is on this one.

Here is a third, simpler patch which only implements Knuth's modulo, where the remainder has the same sign as the divisor.

I would prefer this version 3 (one simple modulo based on Knuth's definition) or, if there is a problem with that, version 2 (all 3 modulos). Version 1, which provides a modulo compatible with C & SQL, is really useless to me.

--
Fabien.

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index c0e5e24..3f53e2c 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -1574,6 +1574,22 @@ top:
 				}
 				snprintf(res, sizeof(res), INT64_FORMAT, ope1 / ope2);
 			}
+			else if (strcmp(argv[3], "%") == 0)
+			{
+				int64_t		remainder;
+				if (ope2 == 0)
+				{
+					fprintf(stderr, "%s: division by zero in modulo\n", argv[0]);
+					st->ecnt++;
+					return true;
+				}
+				/* divisor-signed remainder */
+				remainder = ope1 % ope2;
+				if ((ope2 < 0 && remainder > 0) ||
+					(ope2 > 0 && remainder < 0))
+					remainder += ope2;
+				snprintf(res, sizeof(res), INT64_FORMAT, remainder);
+			}
 			else
 			{
 				fprintf(stderr, "%s: unsupported operator %s\n", argv[0], argv[3]);
diff --git a/doc/src/sgml/pgbench.sgml b/doc/src/sgml/pgbench.sgml
index b7d88f3..d722f31 100644
--- a/doc/src/sgml/pgbench.sgml
+++ b/doc/src/sgml/pgbench.sgml
@@ -735,7 +735,7 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
       Each <replaceable>operand</> is either an integer constant or a
       <literal>:</><replaceable>variablename</> reference to a variable
       having an integer value.  The <replaceable>operator</> can be
-      <literal>+</>, <literal>-</>, <literal>*</>, or <literal>/</>.
+      <literal>+</>, <literal>-</>, <literal>*</>, <literal>%</> (divisor-signed version) or <literal>/</>.
      </para>
Re: [HACKERS] Scaling shared buffer eviction
On Wed, Aug 6, 2014 at 6:12 AM, Amit Kapila amit.kapil...@gmail.com wrote:
>> If I'm reading this right, the new statistic is an incrementing counter where, every time you update it, you add the number of buffers currently on the freelist. That makes no sense.
>
> I think using 'number of buffers currently on the freelist' and 'number of recently allocated buffers' for consecutive cycles, we can figure out approximately how many buffer allocations need the clock sweep, assuming the low and high threshold watermarks are fixed. However, there can be cases where it is not easy to estimate that number.

Counters should be designed in such a way that you can read it, and then read it again later, and make sense of it - you should not need to read the counter on *consecutive* cycles to interpret it.

>> I think what you should be counting is the number of allocations that are being satisfied from the free-list. Then, by comparing the rate at which that value is incrementing to the rate at which buffers_alloc is incrementing, somebody can figure out what percentage of allocations are requiring a clock-sweep run. Actually, I think it's better to flip it around: count the number of allocations that require an individual backend to run the clock sweep (vs. being satisfied from the free-list); call it, say, buffers_backend_clocksweep. We can then try to tune the patch to make that number as small as possible under varying workloads.
>
> This can give us a clear idea how to tune the patch; however, we need to maintain 3 counters for it in the code (recent_alloc (needed for the current bgwriter logic) and the other 2 suggested by you). Do you want to retain such counters in the code, or are they a kind of debug info for the patch?

I only meant to propose one new counter, and I'd imagine including that in the final patch. We already have a counter of total buffer allocations; that's buffers_alloc. I'm proposing to add an additional counter for the number of those allocations not satisfied from the free list, with a name like buffers_alloc_clocksweep (I said buffers_backend_clocksweep above, but that's probably not best, as the existing buffers_backend counts buffer *writes*, not allocations). I think we would definitely want to retain this counter in the final patch, as an additional column in pg_stat_bgwriter.

>>> d. Autotune the low and high threshold for freelist for various configurations.
>>
>> I think we need to come up with some kind of formula here rather than just a list of hard-coded constants.
>
> That was my initial intention as well, and I have tried basing it on the number of shared buffers, like keeping the threshold values as a percentage of shared buffers, but nothing could satisfy different kinds of workloads. The current values I have chosen are based on experiments for various workloads at different thresholds. I have shown the lwlock_stats data for various loads based on the current thresholds upthread. Another way could be to make them config knobs and use the values given by the user in case they are provided, else go with fixed values.

How did you go about determining the optimal value for a particular workload? When the list is kept short, it's less likely that a value on the list will be referenced or dirtied again before the page is actually recycled. That's clearly good. But when the list is long, it's less likely to become completely empty and thereby force individual backends to run the clock-sweep.

My suspicion is that, when the number of buffers is small, the impact of the list being too short isn't likely to be very significant, because running the clock-sweep isn't all that expensive anyway - even if you have to scan through the entire buffer pool multiple times, there aren't that many buffers. But when the number of buffers is large, those repeated scans can cause a major performance hit, so having an adequate pool of free buffers becomes much more important.

I think your list of high-watermarks is far too generous for low buffer counts. With more than 100k shared buffers, you've got a high-watermark of 2k buffers, which means that 2% or less of the buffers will be on the freelist, which seems a little on the high side to me, but probably in the ballpark of what we should be aiming for. But at 10001 shared buffers, you can have 1000 of them on the freelist, which is 10% of the buffer pool; that seems high. At 101 shared buffers, 75% of the buffers in the system can be on the freelist; that seems ridiculous. The chances of a buffer still being unused by the time it reaches the head of the freelist seem very small.

Based on your existing list of thresholds, and taking the above into account, I'd suggest something like this: let the high-watermark for the freelist be 0.5% of the total number of buffers, with a maximum of 2000 and a minimum of 5. Let the low-watermark be 20% of the high-watermark. That might not be best, but I think some kind of formula like that can likely be made to work. I would
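[Editor's note: read as code, the formula suggested above amounts to the following minimal C sketch; the function and variable names are invented for illustration and are not from the patch. With 128 buffers it gives high = 5, low = 1; with 10001 buffers high = 50, low = 10; with 1,000,000 buffers high = 2000, low = 400.]

    /* High-watermark: 0.5% of shared buffers, clamped to [5, 2000];
     * low-watermark: 20% of the high-watermark. */
    static void
    compute_freelist_watermarks(int num_buffers, int *low, int *high)
    {
        *high = num_buffers / 200;
        if (*high > 2000)
            *high = 2000;
        if (*high < 5)
            *high = 5;
        *low = *high / 5;
    }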
Re: [HACKERS] Proposal: Incremental Backup
On Wed, Aug 6, 2014 at 06:48:55AM +0100, Simon Riggs wrote:
> On 6 August 2014 03:16, Bruce Momjian br...@momjian.us wrote:
>> On Wed, Aug 6, 2014 at 09:17:35AM +0900, Michael Paquier wrote:
>>> On Wed, Aug 6, 2014 at 9:04 AM, Simon Riggs si...@2ndquadrant.com wrote:
>>>> On 5 August 2014 22:38, Claudio Freire klaussfre...@gmail.com wrote:
>>>> Thinking some more, there seems like this whole store-multiple-LSNs thing is too much. We can still do block-level incrementals just by using a single LSN as the reference point. We'd still need a complex file format and a complex file reconstruction program, so I think that is still next release. We can call that INCREMENTAL BLOCK LEVEL.
>>>
>>> Yes, that's the approach taken by pg_rman for its block-level incremental backup. Btw, I don't think that the CPU cost to scan all the relation files, added to the cost of rebuilding the backups, is worth it on large instances. File-level backup would cover most of the
>>
>> Well, if you scan the WAL files from the previous backup, that will tell you what pages need incremental backup.
>
> That would require you to store that WAL, which is something we hope to avoid. Plus if you did store it, you'd need to retrieve it from long term storage, which is what we hope to avoid.

Well, for file-level backups we have:

1) use file modtime (possibly inaccurate)
2) use file modtime and checksums (heavy read load)

For block-level backups we have:

3) accumulate block numbers as WAL is written
4) read previous WAL at incremental backup time
5) read data page LSNs (high read load)

The question is which of these do we want to implement? #1 is very easy to implement, but incremental _file_ backups are larger than block-level backups. If we have #5, would we ever want #2? If we have #3, would we ever want #4 or #5? I am thinking we need a wiki page to outline all these options.

> There is a Wiki page.

I would like to see that wiki page have a more open approach to implementations. I do think this is a very important topic for us.

--
Bruce Momjian  br...@momjian.us  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ Everyone has their own god. +
Re: [HACKERS] Append to a GUC parameter ?
On Wed, Aug 6, 2014 at 12:12:29AM -0300, Fabrízio de Royes Mello wrote:
> On Tue, Aug 5, 2014 at 10:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>> Josh Berkus j...@agliodbs.com writes:
>>> BTW, while there's unlikely to be a good reason to put search_path in pg.conf with appends, there are a LOT of reasons to want to be able to append to it during a session.
>>
>> [shrug...] You can do that today with current_setting()/set_config().
>
> +1
>
> With a very simple statement you can do that:
>
> fabrizio=# SELECT current_setting('search_path');
>  current_setting
> -----------------
>  $user,public
> (1 row)
>
> fabrizio=# SELECT set_config('search_path', current_setting('search_path')||', another_schema', false);
>           set_config
> ------------------------------
>  $user,public, another_schema
> (1 row)

Yes, that is very nice, but it only works for session-level changes, i.e. you can't use that in postgresql.conf or via ALTER USER/DATABASE.

--
Bruce Momjian  br...@momjian.us  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ Everyone has their own god. +
Re: [HACKERS] Fixed redundant i18n strings in json
On Tue, Aug 5, 2014 at 6:05 PM, Jeff Janes jeff.ja...@gmail.com wrote:
> I think you missed one of the regression tests, see attached.

Woops. Thanks, committed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] 9.5: Better memory accounting, towards memory-bounded HashAgg
On Sat, Aug 2, 2014 at 4:40 PM, Jeff Davis pg...@j-davis.com wrote:
> Attached is a patch that explicitly tracks allocated memory (the blocks, not the chunks) for each memory context, as well as its children. This is a prerequisite for memory-bounded HashAgg, which I intend to submit for the next CF. Hashjoin tracks the tuple sizes that it adds to the hash table, which is a good estimate for Hashjoin. But I don't think it's as easy for HashAgg, for which we need to track transition values, etc. (also, for HashAgg, I expect that the overhead will be more significant than for Hashjoin). If we track the space used by the memory contexts directly, it's easier and more accurate. I did some simple pgbench select-only tests, and I didn't see any TPS difference.

I was curious whether a performance difference would show up when sorting, so I tried it out. I set up a test with pgbench -i 300. I then repeatedly restarted the database, and after each restart, did this:

time psql -c 'set trace_sort=on; reindex index pgbench_accounts_pkey;'

I alternated runs between master and master with this patch, and got the following results:

master:
LOG: internal sort ended, 1723933 KB used: CPU 2.58s/11.54u sec elapsed 16.88 sec
LOG: internal sort ended, 1723933 KB used: CPU 2.50s/12.37u sec elapsed 17.60 sec
LOG: internal sort ended, 1723933 KB used: CPU 2.14s/11.28u sec elapsed 16.11 sec

memory-accounting:
LOG: internal sort ended, 1723933 KB used: CPU 2.57s/11.97u sec elapsed 17.39 sec
LOG: internal sort ended, 1723933 KB used: CPU 2.30s/12.57u sec elapsed 17.68 sec
LOG: internal sort ended, 1723933 KB used: CPU 2.54s/11.99u sec elapsed 17.25 sec

Comparing the median times, that's about a 3% regression. For this particular case, we might be able to recapture that by replacing the bespoke memory-tracking logic in tuplesort.c with use of this new facility. I'm not sure whether there are other cases that we might also want to test; I think stuff that runs all on the server side is likely to show up problems more clearly than pgbench. Maybe a PL/pgSQL loop that does something allocation-intensive on each iteration, for example, like parsing a big JSON document.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] text search: restricting the number of parsed words in headline generation
FYI, I have kept this email from 2011 about poor performance of parsed words in headline generation. If someone wants to research it, please do so:

http://www.postgresql.org/message-id/1314117620.3700.12.camel@dragflick

---------------------------------------------------------------------------

On Tue, Aug 23, 2011 at 10:31:42PM -0400, Tom Lane wrote:
> Sushant Sinha sushant...@gmail.com writes:
>>> Doesn't this force the headline to be taken from the first N words of the document, independent of where the match was? That seems rather unworkable, or at least unhelpful.
>>
>> In the headline generation function, we don't have any index or knowledge of where the match is. We discover the matches by first tokenizing and then comparing the matches with the query tokens. So it is hard to do anything better than first N words.
>
> After looking at the code in wparser_def.c a bit more, I wonder whether this patch is doing what you think it is. Did you do any profiling to confirm that tokenization is where the cost is? Because it looks to me like the match searching in hlCover() is at least O(N^2) in the number of tokens in the document, which means it's probably the dominant cost for any long document. I suspect that your patch helps not so much because it saves tokenization costs as because it bounds the amount of effort spent in hlCover().
>
> I haven't tried to do anything about this, but I wonder whether it wouldn't be possible to eliminate the quadratic blowup by saving more state across the repeated calls to hlCover(). At the very least, it shouldn't be necessary to find the last query-token occurrence in the document from scratch on each and every call.
>
> Actually, this code seems probably flat-out wrong: won't every successful call of hlCover() on a given document return exactly the same q value (end position), namely the last token occurrence in the document? How is that helpful?
>
> regards, tom lane

--
Bruce Momjian  br...@momjian.us  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ Everyone has their own god. +
Re: [HACKERS] Minmax indexes
On Tue, Aug 5, 2014 at 7:55 PM, Josh Berkus j...@agliodbs.com wrote:
> On 08/05/2014 04:41 PM, Alvaro Herrera wrote:
>> I have chosen to keep the name minmax, even if the opclasses now let one implement completely different things on top of it, such as geometry bounding boxes and bloom filters (aka bitmap indexes).
>
> I don't see a need for a rename: essentially, in PR we can just say we have these neat minmax indexes that other databases also have, but instead of just being used for integer data, they can also be used for geometry, GIS and bitmap indexes, so as always we're more powerful than everyone else when implementing new database features. Plus we haven't come up with a better name ...

Several good suggestions have been made, like "summarizing" or "summary" indexes and "compressed range" indexes. I still really dislike the present name - you might think this is a type of index that has something to do with optimizing min and max, but what it really is is a kind of small index for a big table. The current name couldn't make that less clear.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] add modulo (%) operator to pgbench
On Tue, Aug 5, 2014 at 5:53 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
> Robert Haas wrote:
>> On Mon, Aug 4, 2014 at 5:20 AM, Fabien COELHO coe...@cri.ensmp.fr wrote:
>>>> This patch is pretty trivial.
>>>
>>> Another slightly less trivial but more useful version. The issue is that there are 3 definitions of modulo, two of which are fine (Knuth's floored division and the Euclidean one), and the last one much less useful. Alas, C (%) and SQL (MOD) chose the bad definition :-( I really need one of the other two. The attached patch adds all versions, with % and mod consistent with their unfortunate C and SQL counterparts, and fmod and emod the sane ones.
>>
>> Three different modulo operators seems like a lot for a language that doesn't even have a real expression syntax, but I'll yield to whatever the consensus is on this one.
>
> I wonder if it would be necessary to offer the division operator semantics corresponding to whatever additional modulo operator we choose to offer. That is, if we add emod, do we need ediv as well?
>
> Maybe we ought to break down and support a real expression syntax.

Sounds like that would be better all around.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Proposal: Incremental Backup
On Wed, Aug 6, 2014 at 12:20 PM, Bruce Momjian br...@momjian.us wrote:
> Well, for file-level backups we have:
>
> 1) use file modtime (possibly inaccurate)
> 2) use file modtime and checksums (heavy read load)
>
> For block-level backups we have:
>
> 3) accumulate block numbers as WAL is written
> 4) read previous WAL at incremental backup time
> 5) read data page LSNs (high read load)
>
> The question is which of these do we want to implement? #1 is very easy to implement, but incremental _file_ backups are larger than block-level backups. If we have #5, would we ever want #2? If we have #3, would we ever want #4 or #5?

You may want to implement both #3 and #2.

#3 would need a config switch to enable updating the bitmap. That would make it optional to incur the I/O cost of updating the bitmap. When the bitmap isn't there, the backup would use #2. Slow, but effective. If slowness is a problem for you, you enable the bitmap and do #3.

Sounds reasonable IMO, and it means you can start by implementing #2.
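[Editor's note: to illustrate the #3 side of that hybrid, here is a deliberately simplified, hypothetical C sketch. Every name is invented, the bitmap is fixed-size and per relation, and real code would additionally need crash-safe persistence of the bitmap.]

    #include <limits.h>
    #include <stdint.h>

    #define MAX_BLOCKS (1024 * 1024)            /* blocks tracked per relation */

    typedef struct BlockBitmap
    {
        uint8_t bits[MAX_BLOCKS / CHAR_BIT];    /* one bit per block */
    } BlockBitmap;

    static int bitmap_enabled = 0;              /* the proposed config switch */

    /* Hooked in wherever WAL records a change to block blkno. */
    static void
    mark_block_modified(BlockBitmap *bmp, uint32_t blkno)
    {
        if (!bitmap_enabled || blkno >= MAX_BLOCKS)
            return;
        bmp->bits[blkno / CHAR_BIT] |= (uint8_t) (1u << (blkno % CHAR_BIT));
    }

    /* At incremental-backup time: copy only blocks whose bit is set.
     * If the bitmap was disabled, fall back to modtime+checksums (#2). */
    static int
    block_needs_backup(const BlockBitmap *bmp, uint32_t blkno)
    {
        return (bmp->bits[blkno / CHAR_BIT] >> (blkno % CHAR_BIT)) & 1;
    }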
[HACKERS] Questions on dynamic execution and sqlca
I'm very new to Postgres, but have plenty of experience developing stored procs in Oracle. I'm going to be creating Postgres stored procedures (functions actually, since I discovered that in postgres, everything is a function) to do a variety of batch-type processing. These functions may or may not be called by the .Net application that is being developed. To support both my Postgres function development and run-time monitoring, I wanted to develop generic logging functions that would be called by other Postgres functions to be developed in order to help trace through code and collect error information. The attached create_log_utilities.sql holds plsql for creating two logging functions (one for logging status messages, and one for logging errors). In the log_msg function, the various sets of EXEC and EXECUTE statements are from my experimenting with dynamically generating SQL. If I could get it working, the intent is to be able to add a LogTableName as an input parameter, thereby allowing individual developers to utilize their own version of the log table (w/ the same columns). I've been able to do this sort of thing w/ Oracle before. I've tried a variety of ways based on the on-line docs I've seen, but I always get a syntax error on EXEC when I use only the line EXEC statement (is there a setting I need to set in order to be able to include EXEC directives?). The closest I've come is the currently uncommented prepared statement - it compiles, but I get the following error messages: INFO: INSERT INTO UTILITY.BPC_AUDIT (COMPONENT, ACTIVITY, AUDIT_LEVEL, AUDIT_TIME, NOTE, SQL) VALUES ('Overpayment','Create TLI','LOG','2014-08-06 10:44:23.933','Created TLI','INSERT INTO TLIA...') CONTEXT: SQL statement SELECT utility.LOG_MSG (p_log_yn, p_component, p_function, p_note, p_sql) PL/pgSQL function utility.logging_test() line 24 at PERFORM ERROR: INSERT has more expressions than target columns LINE 3: VALUES ($1, $2, $3, $4, $5, $6) ^ QUERY: PREPARE myinsert7 (text, text, text, timestamp, text, text) AS INSERT INTO UTILITY.BPC_AUDIT (COMPONENT, ACTIVITY, AUDIT_LEVEL, NOTE, SQL) VALUES ($1, $2, $3, $4, $5, $6) CONTEXT: PL/pgSQL function utility.log_msg (character,text,text,text,text) line 48 at SQL statement SQL statement SELECT utility.LOG_MSG (p_log_yn, p_component, p_function, p_note, p_sql) PL/pgSQL function utility.logging_test() line 24 at PERFORM ** Error ** ERROR: INSERT has more expressions than target columns SQL state: 42601 Context: PL/pgSQL function utility.log_msg (character,text,text,text,text) line 48 at SQL statement SQL statement SELECT utility.LOG_MSG (p_log_yn, p_component, p_function, p_note, p_sql) PL/pgSQL function utility.logging_test() line 24 at PERFORM In the other function (log_error ), the problem I'm having is that I'm trying to pull out the sqlca error code and description (as I've done in the past w/ Oracle), in order to write that information in my log table. The intent is that this function will only be called from within an EXCEPTION block (as I do in my logging_test function - I purposely run a bad query to trigger it). To exercise the code, I'm just executing select utility.logging_test(); in a query window. A few other items I could use clarification on: - What's the difference between hitting the Execute Query and Execute PGScript buttons? Both seem to compile the functions. - What are the differences among PL/SQL, PL/PGSQL and pgScript. - I installed Postgres 9.3.4 and I'm using PEM v4.0.2. 
When I click on the icon to Execute arbitrary SQL queries, I notice that the icons on the window that opens are different from the pgAdmin PostgreSQL Tools window that opens if I double-click on one of my .sql files. Is there a difference in these tools? Attached are the relevant scripts: (See attached file: create_bpc_audit.sql) - Create the log table (See attached file: create_log_utilities.sql) - Code to create the two logging functions (See attached file: test_log_utilities.sql) - Code to exercise the msg and error logging functions Thanks. Bill William Epstein Consulting I/T Specialist AIS ADM Information Management US Federal Office/Fax: 301-240-3887, Tie Line: 372-3887 International Business Machines (IBM) Corporation Global Business Services (GBS) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] A worst case for qsort
I just browsed the paper linked by Peter and it looks like the attack has to be active against a currently executing qsort. In the paper, what happens is the comparison function is supplied by the attacker and effectively lies about the result of a comparison. It keeps the lies consistent in a very specific manner so that eventually qsort returns its input in a properly sorted fashion. So it seems to me that the vulnerability only exists if an attacker-supplied comparison function is permitted. For all other cases, assuming that only vetted comparison functions are permitted, the selection of a random pivot would make an attack on qsort using specially tailored input data improbable.
Re: [HACKERS] Questions on dynamic execution and sqlca
Bill, * Bill Epstein (epste...@us.ibm.com) wrote: [...] This should really go to the -general mailing list. The -hackers mailing list is for discussion regarding developing the PostgreSQL server itself. Thanks! Stephen
Re: [HACKERS] Proposal: Incremental Backup
On Wed, Aug 6, 2014 at 01:15:32PM -0300, Claudio Freire wrote: On Wed, Aug 6, 2014 at 12:20 PM, Bruce Momjian br...@momjian.us wrote: Well, for file-level backups we have: 1) use file modtime (possibly inaccurate) 2) use file modtime and checksums (heavy read load) For block-level backups we have: 3) accumulate block numbers as WAL is written 4) read previous WAL at incremental backup time 5) read data page LSNs (high read load) The question is which of these do we want to implement? #1 is very easy to implement, but incremental _file_ backups are larger than block-level backups. If we have #5, would we ever want #2? If we have #3, would we ever want #4 or #5? You may want to implement both #3 and #2. #3 would need a config switch to enable updating the bitmap. That would make it optional to incur the I/O cost of updating the bitmap. When the bitmap isn't there, the backup would use #2. Slow, but effective. If slowness is a problem for you, you enable the bitmap and do #3. Sounds reasonable IMO, and it means you can start by implementing #2. Well, Robert Haas had the idea of a separate process that accumulates the changed WAL block numbers, making it low overhead. I question whether we need #2 just to handle cases where they didn't enable #3 accounting earlier. If that is the case, just do a full backup and enable #3. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Minmax indexes
On Wed, Aug 6, 2014 at 1:25 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Summary seems good. If I get enough votes I can change it to that. CREATE INDEX foo ON t USING summary (cols) Summarizing seems weird on that command. Not sure about compressed range, as you would have to use an abbreviation or run the words together. Summarizing index sounds better to my ears, but both ideas based on summary are quite succinct and to-the-point descriptions of what's happening, so I vote for those. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Minmax indexes
On Wed, Aug 6, 2014 at 01:31:14PM -0300, Claudio Freire wrote: On Wed, Aug 6, 2014 at 1:25 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: CREATE INDEX foo ON t USING crange (cols) -- misspelling of cringe? CREATE INDEX foo ON t USING comprange (cols) CREATE INDEX foo ON t USING compressedrng (cols) -- ugh -- or use an identifier with whitespace: CREATE INDEX foo ON t USING compressed range (cols) The word you'd use there is not necessarily the one you use on the framework, since the framework applies to many such techniques, but the index type there is one specific one. Block filter indexes? The create command can still use minmax, or rangemap if you prefer that, while the framework's code uses summary or summarizing. Summary sounds like materialized views to me. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Minmax indexes
On Wed, Aug 6, 2014 at 1:25 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: CREATE INDEX foo ON t USING crange (cols) -- misspelling of cringe? CREATE INDEX foo ON t USING comprange (cols) CREATE INDEX foo ON t USING compressedrng (cols) -- ugh -- or use an identifier with whitespace: CREATE INDEX foo ON t USING compressed range (cols) The word you'd use there is not necessarily the one you use on the framework, since the framework applies to many such techniques, but the index type there is one specific one. The create command can still use minmax, or rangemap if you prefer that, while the framework's code uses summary or summarizing. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Minmax indexes
On Wed, Aug 6, 2014 at 1:35 PM, Bruce Momjian br...@momjian.us wrote: On Wed, Aug 6, 2014 at 01:31:14PM -0300, Claudio Freire wrote: On Wed, Aug 6, 2014 at 1:25 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: CREATE INDEX foo ON t USING crange (cols) -- misspelling of cringe? CREATE INDEX foo ON t USING comprange (cols) CREATE INDEX foo ON t USING compressedrng (cols) -- ugh -- or use an identifier with whitespace: CREATE INDEX foo ON t USING compressed range (cols) The word you'd use there is not necessarily the one you use on the framework, since the framework applies to many such techniques, but the index type there is one specific one. Block filter indexes? Nice one -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC] overflow checks optimized away
On Fri, Nov 29, 2013 at 02:04:10AM +0000, Greg Stark wrote: Attached is what I have so far. I have to say I'm starting to come around to Tom's point of view. This is a lot of hassle for not much gain. I've noticed a number of other overflow checks that the llvm optimizer is not picking up on, so even after this patch it's not clear that all the signed overflow checks that depend on -fwrapv will be gone. This patch still isn't quite finished though. a) It triggers a spurious gcc warning about overflows on constant expressions. The values of these expressions aren't actually used, since they're only reached in the other branch of the overflow test. I think I see how to work around it for gcc using __builtin_choose_expr but it might be pretty grotty. b) I'm concerned these checks depend on INT_MIN/MAX and SHRT_MIN/MAX which may not be exactly the right length. I'm kind of confused why c.h assumes long is 32 bits and short is 16 bits though, so I don't think I'm making it any worse. It may be better for us to just define our own limits since we know exactly how large we expect these data types to be. c) I want to add regression tests that will ensure that the overflow checks are all working. So far I haven't been able to catch any being optimized away even with -fno-wrapv and -fstrict-overflow. I think I just didn't build with the right options though and it should be possible. The goal here imho is to allow building with -fno-wrapv -fstrict-overflow safely. Whether we actually do or not is another question but it means we would be able to use new compilers like clang without worrying about finding the equivalent of -fwrapv for them. Where are we on this? -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
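For readers unfamiliar with the problem, here is a hedged illustration (not code from the patch) of the kind of signed-overflow test that only works under -fwrapv, next to a limits-based rewrite that stays well-defined without it:

    #include <limits.h>
    #include <stdbool.h>

    /*
     * UNSAFE without -fwrapv: signed overflow is undefined behavior in
     * C, so an optimizer may assume "a + b" cannot wrap and delete the
     * whole test as dead code.
     */
    static bool
    add_overflows_unsafe(int a, int b)
    {
        int sum = a + b;

        return (b > 0 && sum < a) || (b < 0 && sum > a);
    }

    /*
     * Safe: compares against INT_MIN/INT_MAX before adding, so the
     * overflow never happens and no -fwrapv crutch is needed.
     */
    static bool
    add_overflows_safe(int a, int b)
    {
        if (b > 0)
            return a > INT_MAX - b;
        else
            return a < INT_MIN - b;
    }

The second form is the style that allows building with -fno-wrapv -fstrict-overflow, at the cost of writing one branch per sign of the operand.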
Re: [HACKERS] Minmax indexes
On Wed, Aug 6, 2014 at 1:55 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Claudio Freire wrote: On Wed, Aug 6, 2014 at 1:25 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: CREATE INDEX foo ON t USING crange (cols) -- misspelling of cringe? CREATE INDEX foo ON t USING comprange (cols) CREATE INDEX foo ON t USING compressedrng (cols) -- ugh -- or use an identifier with whitespace: CREATE INDEX foo ON t USING compressed range (cols) The word you'd use there is not necessarily the one you use on the framework, since the framework applies to many such techniques, but the index type there is one specific one. The create command can still use minmax, or rangemap if you prefer that, while the framework's code uses summary or summarizing. I think you're confusing the AM name with the opclass name. The name you specify in that part of the command is the access method name. You can specify the opclass together with each column, like so: CREATE INDEX foo ON t USING blockfilter (order_date date_minmax_ops, geometry gis_bbox_ops); Oh, uh... no, I'm not confusing them, but now I just realized how one would implement other classes of block filtering indexes, and yeah... you do it through the opclasses. I'm sticking to bloom filters: CREATE INDEX foo ON t USING blockfilter (order_date date_minmax_ops, path character_bloom_ops); Cool. Very cool. So, I like blockfilter a lot. I change my vote to blockfilter ;) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_export_snapshot on standby side
Seems we still have not addressed this. --- On Sat, May 25, 2013 at 10:18:57AM +0100, Simon Riggs wrote: On 21 May 2013 19:16, Fujii Masao masao.fu...@gmail.com wrote: We cannot run parallel pg_dump on the standby server because pg_export_snapshot() always fails on the standby. Is this the oversight of parallel pg_dump or pg_export_snapshot()? pg_export_snapshot() fails in the standby because it always assigns new XID and which is not allowed in the standby. Do we really need to assign new XID even in the standby for the exportable snapshot? Having looked at the code, I say No, we don't *need* to. There are various parts of the code that deal with takenDuringRecovery, so much of this was clearly intended to work in recovery. We use the topXid for the name of the snapshot file. That is clearly unnecessary and we should be using the virtualxid instead like we do elsewhere. We also use the topXid to test whether it is still running, though again, we could equally use the virtualxid instead. There is no problem with virtualxids possibly not being active anymore, since if we didn't have an xid before and don't have one now, and the xmin is the same, the snapshot is still valid. I think we should treat this as a bug and fix it in 9.3 while we're still in beta. Why? Because once we begin using the topXid as the filename we can't then change later to using the vxid. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Supporting Windows SChannel as OpenSSL replacement
On Fri, Aug 1, 2014 at 10:58 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 07/08/2014 08:11 PM, Jeff Janes wrote: Is there some recipe for testing the 0002 patch? Can it be tested on a MinGW environment, or does it need to use the Microsoft-supplied compilers? I used MSVC. It ought to work with MinGW, I think, although you might need to tweak the Makefiles to make it compile. Certainly we should eventually make it work, before committing. If you try it out, let me know how it goes. I couldn't get it to work when I tried it a long time ago, but I didn't record the error, and it may have been a failure to use the correct arguments to configure. I think it was taking a path through all the #ifdefs that resulted in a function never getting defined. But now it looks like 0002 needs a rebase. Cheers, Jeff
[HACKERS] posix_fadvise() and pg_receivexlog
Hi, The WAL files that pg_receivexlog writes basically will not be re-read soon, so we can advise the OS to release any cached pages when a WAL file is closed. I feel inclined to change pg_receivexlog that way. Thought? Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
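A minimal sketch of what that change might look like at the point where pg_receivexlog finishes a WAL segment (the helper name is illustrative; the real close path lives in the tool's receive logic):

    #include <fcntl.h>
    #include <unistd.h>

    /*
     * After a WAL segment is complete (and fsync'd), hint that its
     * pages will not be re-read, so the kernel may drop them from the
     * page cache instead of evicting more useful data.
     */
    static void
    close_walfile_with_hint(int fd)
    {
    #ifdef POSIX_FADV_DONTNEED
        (void) posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    #endif
        close(fd);
    }

Passing len = 0 applies the advice to the whole file; the call is purely advisory, so ignoring its return value is harmless.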
Re: [HACKERS] A worst case for qsort
On Wed, Aug 6, 2014 at 9:18 AM, John Cochran j69coch...@gmail.com wrote: So it seems to me that the vulnerability only exists if an attacker-supplied comparison function is permitted. For all other cases, assuming that only vetted comparison functions are permitted, the selection of a random pivot would make an attack on qsort using specially tailored input data improbable. Whether or not random pivot selection would make it more difficult for a malicious party to generate pre-processed data that will cause very bad performance is not all that relevant IMV, since we don't do that. There are good practical reasons to prefer median of medians pivot selection, which is what we actually do, and which is clearly affected to the extent that pre-processing malicious data that causes (at least) near worst-case performance is possible. It's possible to predict pivot selection. As the paper says, "An adversary can make such a quicksort go quadratic by arranging for the pivot to compare low against almost all items not seen during pivot selection." Random pivot selection will certainly result in more frequent lopsided partitions without any malicious intent. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.5: Better memory accounting, towards memory-bounded HashAgg
On 2.8.2014 22:40, Jeff Davis wrote: Attached is a patch that explicitly tracks allocated memory (the blocks, not the chunks) for each memory context, as well as its children. This is a prerequisite for memory-bounded HashAgg, which I intend to submit for the next CF. Anyway, I'm really looking forward to the memory-bounded hashagg, and I'm willing to spend some time testing it. Hashjoin tracks the tuple sizes that it adds to the hash table, which is a good estimate for Hashjoin. But I don't think it's as easy for Hashagg, for which we need to track transition values, etc. (also, for HashAgg, I expect that the overhead will be more significant than for Hashjoin). If we track the space used by the memory contexts directly, it's easier and more accurate. Actually, even for HashJoin the estimate is pretty bad, and varies a lot depending on the tuple width. With narrow tuples (e.g. ~40B of data), which is actually a quite common case, you easily get ~100% palloc overhead. I managed to address that by using a custom allocator. See this: https://commitfest.postgresql.org/action/patch_view?id=1503 I wonder whether something like that would be possible for the hashagg? That would make the memory accounting accurate with 0% overhead (because it's not messing with the memory context at all), but only for the one node (but maybe that's OK?). I don't think that's comparable - I can easily think about cases leading to extreme palloc overhead with HashAgg (think of an aggregate implementing COUNT DISTINCT - that effectively needs to store all the values, and with short values the palloc overhead will be terrible). Actually, I was looking at HashAgg (it's a somewhat natural direction after messing with Hash Join), and my plan was to use a similar dense allocation approach. The trickiest part would probably be making this available from the custom aggregates. regards Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
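For readers following the accounting idea, here is a conceptual sketch of block-level (rather than chunk-level) tracking with totals propagated to parent contexts. This is deliberately not the actual PostgreSQL MemoryContext API; the struct and function names are illustrative only:

    #include <stddef.h>

    /*
     * Each context tracks the bytes of the blocks it has requested from
     * malloc() (not the individual palloc chunks carved out of them).
     * Totals are also pushed up to every ancestor, so asking a context
     * "how much is allocated under you, children included?" is O(1).
     */
    typedef struct AccountedContext
    {
        struct AccountedContext *parent;
        size_t      self_allocated;     /* blocks owned by this context */
        size_t      total_allocated;    /* self plus all descendants */
    } AccountedContext;

    static void
    account_block_alloc(AccountedContext *ctx, size_t blksize)
    {
        ctx->self_allocated += blksize;
        for (; ctx != NULL; ctx = ctx->parent)
            ctx->total_allocated += blksize;
    }

    /* Block frees mirror this with subtraction. */

The walk up the parent chain happens only per block, not per palloc chunk, which is what keeps the bookkeeping overhead low relative to chunk-level tracking.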
Re: [HACKERS] posix_fadvise() and pg_receivexlog
On Wed, Aug 6, 2014 at 1:39 PM, Fujii Masao masao.fu...@gmail.com wrote: The WAL files that pg_receivexlog writes will not be re-read soon basically, so we can advise the OS to release any cached pages when WAL file is closed. I feel inclined to change pg_receivexlog that way. Thought? How do we know that the user doesn't plan to read them soon? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] posix_fadvise() and pg_receivexlog
On 08/06/2014 08:39 PM, Fujii Masao wrote: Hi, The WAL files that pg_receivexlog writes basically will not be re-read soon, so we can advise the OS to release any cached pages when a WAL file is closed. I feel inclined to change pg_receivexlog that way. Thought? -1. The OS should be smart enough to not thrash the cache with files that are written sequentially and never read. If we go down this path, we'd need to sprinkle posix_fadvises into many, many places. Anyway, who are we to say that they won't be re-read soon? You might e.g. have a secondary backup site where you copy the files received by pg_receivexlog, as soon as they're completed. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] A worst case for qsort
Random pivot selection will certainly result in more frequent lopsided partitions without any malicious intent. Yep. It makes adversary input attacks more or less impossible, at the price of higher average cost. Maybe a less randomized version would do, i.e. randomly select one of the 3 medians found, while keeping some of the nicer performance properties. It would need proof, though. -- Fabien. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
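To make the suggestion concrete, here is a sketch of that idea, simplified to an int array (the sampled positions follow the usual Bentley & McIlroy pattern but are illustrative): keep the three sampled group medians, then pick one of them at random instead of always taking their median (the deterministic "ninther"):

    #include <stdlib.h>

    /* Index of the median of elements a, b, c. */
    static int
    med3(const int *v, int a, int b, int c)
    {
        if (v[a] < v[b])
            return v[b] < v[c] ? b : (v[a] < v[c] ? c : a);
        else
            return v[a] < v[c] ? a : (v[b] < v[c] ? c : b);
    }

    /*
     * Sample three groups of three elements as usual, but return a
     * randomly chosen group median as the pivot.  The sampled positions
     * stay predictable; which sample becomes the pivot does not, which
     * is what frustrates precomputed adversarial input.  Assumes n is
     * reasonably large and that rand() has been seeded; a real
     * implementation would want a better RNG than rand().
     */
    static int
    choose_pivot(const int *v, int n)
    {
        int m1 = med3(v, 0, n / 8, n / 4);
        int m2 = med3(v, n / 2 - n / 8, n / 2, n / 2 + n / 8);
        int m3 = med3(v, n - 1 - n / 4, n - 1 - n / 8, n - 1);
        int pick[3] = {m1, m2, m3};

        return pick[rand() % 3];
    }

Whether this keeps enough of the ninther's balance guarantees to avoid hurting the average case is exactly the part that would need proof.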
Re: [HACKERS] add modulo (%) operator to pgbench
Maybe we ought to break down and support a real expression syntax. Sounds like that would be better all around. Adding operators is more or less orthogonal to providing a new expression syntax. I'm not sure that there is currently a crying need for it (a syntax). It would be a significant project, and it would raise the question of where to stop... And I just really need a few more functions/operators which can be fitted into the current implementation quite easily. -- Fabien. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostrgeSQL vs oracle doing 1 million sqrts am I doing it wrong?
Hi, I returned to this issue and I may have found the root issue. It is PL/pgSQL's implicit I/O cast. Original text: postgres=# DO LANGUAGE plpgsql $$ DECLARE n real; DECLARE f integer; BEGIN FOR f IN 1..1000 LOOP if 0=0 then n = SQRT (f); end if; END LOOP; RAISE NOTICE 'Result = %',n; END $$; NOTICE: Result = 3162.28 DO Time: 31988.720 ms Little bit modified: postgres=# DO LANGUAGE plpgsql $$ DECLARE n real; DECLARE f integer; BEGIN FOR f IN 1..1000 LOOP if 0=0 then n = SQRT (f)::real; end if; END LOOP; RAISE NOTICE 'Result = %',n; END $$; NOTICE: Result = 3162.28 DO Time: 9660.592 ms It is 3x faster. There is an invisible I/O conversion from double precision::real via libc vfprintf. https://github.com/okbob/plpgsql_check/ can raise a performance warning in this situation, but we cannot do too much now without possibly breaking compatibility. Regards Pavel 2014-08-05 16:02 GMT+02:00 Roberto Mello roberto.me...@gmail.com: On Tue, Aug 5, 2014 at 9:50 AM, Kevin Grittner kgri...@ymail.com wrote: Since that is outside the loop, the difference should be nominal; and in a quick test it was. Apologies. I misread on my phone and thought it was within the loop. On the other hand, reducing the procedural code made a big difference. snip test=# DO LANGUAGE plpgsql $$ DECLARE n real; BEGIN PERFORM SQRT(f) FROM generate_series(1, 1000) x(f); END $$; DO Time: 3916.815 ms That is a big difference. Are you porting a lot of code from PL/SQL, and therefore evaluating the performance difference of running this code? Or is this just a general test where you wish to assess the performance difference? PL/pgSQL could definitely use some loving, as far as optimization goes, but my feeling is that it hasn't happened because there are other suitable backends that give the necessary flexibility for the different use cases. Roberto -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostrgeSQL vs oracle doing 1 million sqrts am I doing it wrong?
Hi, this code is +/- equal to Oracle (it should eliminate the useless code): postgres=# DO LANGUAGE plpgsql $$ DECLARE n real; DECLARE f integer; BEGIN FOR f IN 1..1000 LOOP --if 0=0 then n = SQRT (f)::real; --end if; END LOOP; RAISE NOTICE 'Result = %',n; END $$; NOTICE: Result = 3162.28 DO Time: 5787.797 ms 2014-08-06 21:41 GMT+02:00 Pavel Stehule pavel.steh...@gmail.com: [...] -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Minmax indexes
2014-08-06 Claudio Freire klaussfre...@gmail.com: So, I like blockfilter a lot. I change my vote to blockfilter ;) +1 for blockfilter, because it stresses the fact that the physical arrangement of rows in blocks matters for this index. Nicolas -- A. Because it breaks the logical sequence of discussion. Q. Why is top posting bad? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostrgeSQL vs oracle doing 1 million sqrts am I doing it wrong?
ST == Shaun Thomas stho...@optionshouse.com writes: ST That said, the documentation here says FLOAT4 is an alias for REAL, ST so it's somewhat nonintuitive for FLOAT4 to be so much slower than ST FLOAT8, which is an alias for DOUBLE PRECISION. There are some versions of glibc where doing certain math on double is faster than doing it on float, depending on how things are compiled. Maybe this is one of them? -JimC -- James Cloos cl...@jhcloos.com OpenPGP: 0x997A9F17ED7DAEA6 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Tue, Aug 5, 2014 at 3:54 PM, Peter Geoghegan p...@heroku.com wrote: On Tue, Aug 5, 2014 at 12:33 PM, Robert Haas robertmh...@gmail.com wrote: Per your other email, here's the patch again; hopefully I've got the right stuff in the file this time. Your patch looks fine to me. I recommend committing patch 1 with these additions. I can't think of any reason to rehash the 2012 discussion. I've committed the patch I posted yesterday. I did not see a good reason to meld that together in a single commit with the first of the patches you posted; I'll leave you to revise that patch to conform with this approach. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] missing PG_RETURN_UINT16
On Wed, Aug 6, 2014 at 2:36 AM, Fujii Masao masao.fu...@gmail.com wrote: On Wed, Aug 6, 2014 at 4:47 AM, Robert Haas robertmh...@gmail.com wrote: On Mon, Aug 4, 2014 at 11:35 AM, Manuel Kniep m.kn...@web.de wrote: I’m missing the PG_RETURN_UINT16 macro in fmgr.h. Since we already have the PG_GETARG_UINT16 macro I guess it makes sense to have it. Here is the trivial patch for it. I see no reason not to add this. Anyone else want to object? +1 to add that. What about backpatch to 9.4? This is a very simple change and there seems to be no reason to wait for it until 9.5. Well, that's true. But on the other hand, if someone is wanting to write code that will compile with multiple PostgreSQL versions, they're probably going to add an #ifdef for this anyway, so I don't see much of a reason to think that will help very many people in practice. I've committed this, but just to master. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
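For reference, the addition is presumably a one-liner in fmgr.h mirroring the existing signed variant, along these lines:

    /* existing: #define PG_RETURN_INT16(x)  return Int16GetDatum(x) */
    #define PG_RETURN_UINT16(x)  return UInt16GetDatum(x)

It reuses the UInt16GetDatum() conversion that fmgr already provides for PG_GETARG_UINT16, which is why the patch is trivial.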
Re: [HACKERS] PostrgeSQL vs oracle doing 1 million sqrts am I doing it wrong?
2014-08-06 22:07 GMT+02:00 James Cloos cl...@jhcloos.com: ST == Shaun Thomas stho...@optionshouse.com writes: ST That said, the documentation here says FLOAT4 is an alias for REAL, ST so it's somewhat nonintuitive for FLOAT4 to be so much slower than ST FLOAT8, which is an alias for DOUBLE PRECISION. There are some versions of glibc where doing certain math on double is faster than doing it on float, depending on how things are compiled. Maybe this is one of them? No, it is a plpgsql issue only. PL/pgSQL uses a generic cast via serialization to a string and re-parsing; it doesn't use the efficient libc conversion functions directly. See https://github.com/postgres/postgres/blob/master/src/pl/plpgsql/src/pl_exec.c, function exec_cast_value. -JimC -- James Cloos cl...@jhcloos.com OpenPGP: 0x997A9F17ED7DAEA6 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
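To see why the I/O route costs so much, compare the two conversion paths in miniature (a standalone illustration of the mechanism, not the PL/pgSQL source itself):

    #include <stdio.h>
    #include <stdlib.h>

    /*
     * What an I/O cast effectively does: run the source type's output
     * function (double -> string), then the target type's input
     * function (string -> float).  The vfprintf-style formatting in
     * the output step dominates the cost.
     */
    static float
    cast_via_io(double d)
    {
        char buf[32];

        snprintf(buf, sizeof(buf), "%.17g", d);
        return strtof(buf, NULL);
    }

    /* The direct conversion is essentially one machine instruction. */
    static float
    cast_direct(double d)
    {
        return (float) d;
    }

Run a million times inside a loop, the difference between these two paths is roughly the 3x gap seen in the timings upthread.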
Re: [HACKERS] Index-only scans for GIST
On 08/01/2014 10:58 AM, Anastasia Lubennikova wrote: Hi, hackers! I work on a GSoC project Index-only scans for GIST https://wiki.postgresql.org/wiki/Support_for_Index-only_scans_for_GIST_GSoC_2014 Repository is https://github.com/lubennikovaav/postgres/tree/indexonlygist2 Patch is in attachments. Thanks! Some comments: * I got this compiler warning: gistget.c:556:5: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement] ListCell *tmpPageData = so->curPageData; ^ * I'm getting two regression failures with this (opr_sanity and join). * After merging with master, build fails because of duplicate OIDs. * The regression test queries that use LIMIT are not guaranteed to always return the same rows, hence they're not very good regression test cases. I'd suggest using more restrictive WHERE clauses, so that each query only returns a handful of rows. * What's the reason for turning GISTScanOpaqueData.pageData from an array to a List? * I think it's leaking memory, in GIST scan context. I tested this with a variant of the regression tests: insert into gist_tbl select box(point(0.05*i, 0.05*i), point(0.05*i, 0.05*i)), point(0.05*i, 0.05*i) FROM generate_series(0, 1000) as i; CREATE INDEX gist_tbl_point_index ON gist_tbl USING gist (p); set enable_seqscan=off; set enable_bitmapscan=off; explain analyze select p from gist_tbl where p <@ box(point(0,0), point(999,999)) and length(p::text) < 10; while the final query runs, 'top' shows constantly increasing memory usage. It includes index-only scans for multicolumn GIST and a new regression test. Fetch() method is implemented for box and point opclasses. Can we have Fetch functions for all the datatypes in the btree_gist contrib module, please? Do other contrib modules contain GiST opclasses that could have Fetch functions? Documentation is not updated yet, but I'm going to do it till the end of GSoC. I've got one question about a query with an OR condition. It is the last query in the regression test gist_indexonly. It doesn't fail but it doesn't use index-only scans. Could someone explain to me how it works? It seems to depend on the build_paths_for_OR http://doxygen.postgresql.org/indxpath_8c.html#ae660d2e886355e53ed3b9ec693e4afd2 function. But I couldn't understand how. The query is: select * from gist_tbl where b <@ box(point(5,5), point(6,6)) or p <@ box(point(0,0), point(100,100)) limit 10; It cannot use an index(-only) scan for this, because a single index scan can only return rows based on one key. In this case, you need to do two scans, and then return the rows returned by either scan, removing duplicates. A bitmap scan is possible, because it can remove the duplicates, but the planner can't produce a plain index scan plan that would do the same. A common trick when that happens in a real-world application is to re-write the query using UNION: select * from gist_tbl where b <@ box(point(5,5), point(6,6)) UNION select * from gist_tbl where p <@ box(point(0,0), point(100,100)) limit 10; Although that doesn't seem to actually work: ERROR: could not identify an equality operator for type box LINE 1: select * from gist_tbl ^ but that's not your patch's fault, the same happens with unpatched master. IOW, you don't need to worry about that case. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] select_common_type()'s behavior doesn't match the documentation
Any progress on this? --- On Sat, Nov 30, 2013 at 12:43:39PM -0500, Tom Lane wrote: In our fine manual, at http://www.postgresql.org/docs/devel/static/typeconv-union-case.html it's claimed that the nontrivial parts of UNION type resolution work like this: 4. Choose the first non-unknown input type which is a preferred type in that category, if there is one. 5. Otherwise, choose the last non-unknown input type that allows all the preceding non-unknown inputs to be implicitly converted to it. (There always is such a type, since at least the first type in the list must satisfy this condition.) This appears to have only the vaguest of resemblances to what select_common_type() actually does, which is to make a single pass over the inputs in which it does this: /* * take new type if can coerce to it implicitly but not the * other way; but if we have a preferred type, stay on it. */ Thus for example there's a surprising inconsistency between these cases: regression=# select pg_typeof(t) from (select 'a'::text union select 'b'::char(1)) s(t); pg_typeof --- text text (2 rows) regression=# select pg_typeof(t) from (select 'a'::char(1) union select 'b'::text) s(t); pg_typeof --- character character (2 rows) I think that at the very least, we ought to prefer preferred types, the way the manual claims. I'm less certain about whether step 5 is ideal as written. This came up because some of my Salesforce colleagues were griping about the fact that UNION isn't commutative. They argue that the type resolution behavior ought not be sensitive at all to the ordering of the inputs. I'm not sure we can achieve that in general, but the current approach certainly seems more order-sensitive than it oughta be. Some trolling in the git history says that the last actual change in this area was in my commit b26dfb95222fddd25322bdddf3a5a58d3392d8b1 of 2002-09-18, though it appears the documentation has been rewritten more recently. It's a bit scary to be proposing to change behavior that's been stable for eleven years, but ... Thoughts? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] select_common_type()'s behavior doesn't match the documentation
Tom Lane-2 wrote In our fine manual, at http://www.postgresql.org/docs/devel/static/typeconv-union-case.html it's claimed that the nontrivial parts of UNION type resolution work like this: 4. Choose the first non-unknown input type which is a preferred type in that category, if there is one. 5. Otherwise, choose the last non-unknown input type that allows all the preceding non-unknown inputs to be implicitly converted to it. (There always is such a type, since at least the first type in the list must satisfy this condition.) This came up because some of my Salesforce colleagues were griping about the fact that UNION isn't commutative. They argue that the type resolution behavior ought not be sensitive at all to the ordering of the inputs. I'm not sure we can achieve that in general, but the current approach certainly seems more order-sensitive than it oughta be. 4. Use the preferred type for whatever category all inputs share (per 3). Per 1 this is only used if at least one input does not agree. 5. No longer needed 6. Stays the same It is possible for a result type to not match any of the input types but if you want to be commutative this would have to be allowed. You could add a majority rules condition rules before 4 and punt if there is no one dominate type. Should #1 repeat after flattening domains to their base types? I would probably logically place 2 before 1 since if everything is unknown nothing else matters. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/select-common-type-s-behavior-doesn-t-match-the-documentation-tp5780985p5813963.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql: show only failed queries
Hello, an updated version of the patch is attached. 2014-08-05 13:31 GMT+02:00 Fujii Masao masao.fu...@gmail.com: On Wed, Jul 16, 2014 at 4:34 AM, Pavel Stehule pavel.steh...@gmail.com wrote: 2014-07-15 12:07 GMT+02:00 Fujii Masao masao.fu...@gmail.com: On Tue, Jul 15, 2014 at 7:01 PM, Pavel Stehule pavel.steh...@gmail.com wrote: 2014-07-15 11:29 GMT+02:00 Fujii Masao masao.fu...@gmail.com: On Thu, Jul 10, 2014 at 9:56 PM, Pavel Stehule pavel.steh...@gmail.com wrote: Hello here is a proposed patch - autocomplete for known psql variable content Even after applying the patch, the following psql variables were not displayed on the tab-completion of \set command. HISTFILE HISTSIZE HOST IGNOREEOF LASTOID I'm not sure if it's worth displaying all of them on the tab-completion of \set because it seems strange to change some of them by \set command, for example HOST, though. These variables are not set by default - so they don't exist and cannot be completed by default. There are two fixes: a) fix the autocomplete source - preferred +1 here is the patch Thanks for the patch! I got the following compiler warnings. tab-complete.c:4155: warning: assignment discards qualifiers from pointer target type tab-complete.c:4166: warning: assignment discards qualifiers from pointer target type fixed +FETCH_COUNT, HISTCONTROL, HISTFILE, HISTSIZE, HOST, IGNOREOFF, Typo: IGNOREOFF -> IGNOREEOF fixed +/* all psql known variables are included in list by default */ +for (known_varname = known_varnames; *known_varname; known_varname++) +varnames[nvars++] = pg_strdup(*known_varname); Don't we need to append both prefix and suffix to even known variables? ??? I am not sure I understand - probably system read-only variables such as DBNAME, USER, VERSION should not be there +else if (strcmp(prev2_wd, "\\set") == 0) +{ +if (strcmp(prev_wd, "AUTOCOMMIT") == 0) ISTM that some psql variables like IGNOREEOF are not there. Why not? yes, there is no completion for DBNAME, ENCODING, FETCH_COUNT, HISTCONTROL, HISTFILE, HISTSIZE, HOST, IGNOREEOF, PROMPT*, USER, VERSION. There are more reasons: * the parameter is not an enum (string, number or both): FETCH_COUNT, PROMPT, HISTSIZE, ... * the variable is pseudo read-only - changing it has no effect: DBNAME, ENCODING, VERSION Regards Pavel Regards, -- Fujii Masao
Re: [HACKERS] plpgsql.extra_warnings='num_into_expressions'
On 7/21/14, 10:56 PM, I wrote: Here's a patch which allows you to notice those annoying bugs with INTO slightly more quickly. Adding to the next commit phest. New version, fixed bugs with set operations and VALUES lists. .marko *** a/doc/src/sgml/plpgsql.sgml --- b/doc/src/sgml/plpgsql.sgml *** *** 4719,4725 a_output := a_output || $$ if v_$$ || referrer_keys.kind || $$ like '$$ varnameplpgsql.extra_errors/ for errors. Both can be set either to a comma-separated list of checks, literalnone/ or literalall/. The default is literalnone/. Currently the list of available checks ! includes only one: variablelist varlistentry termvarnameshadowed_variables/varname/term --- 4719,4725 varnameplpgsql.extra_errors/ for errors. Both can be set either to a comma-separated list of checks, literalnone/ or literalall/. The default is literalnone/. Currently the list of available checks ! is as follows: variablelist varlistentry termvarnameshadowed_variables/varname/term *** *** 4729,4734 a_output := a_output || $$ if v_$$ || referrer_keys.kind || $$ like '$$ --- 4729,4744 /para /listitem /varlistentry +varlistentry + termvarnamenum_into_expressions/varname/term + listitem + para + When possible, checks that the number of expressions specified in the + SELECT or RETURNING list of a query matches the number expected by the + INTO target. This check is done on a best effort basis. + /para + /listitem +/varlistentry /variablelist The following example shows the effect of varnameplpgsql.extra_warnings/ *** a/src/pl/plpgsql/src/pl_gram.y --- b/src/pl/plpgsql/src/pl_gram.y *** *** 22,27 --- 22,28 #include parser/scanner.h #include parser/scansup.h #include utils/builtins.h + #include nodes/nodefuncs.h /* Location tracking support --- simpler than bison's default */ *** *** 97,103 static PLpgSQL_row *make_scalar_list1(char *initial_name, PLpgSQL_datum *initial_datum, int lineno, int location); staticvoid check_sql_expr(const char *stmt, int location, ! int leaderlen); staticvoid plpgsql_sql_error_callback(void *arg); staticPLpgSQL_type*parse_datatype(const char *string, int location); staticvoid check_labels(const char *start_label, --- 98,104 PLpgSQL_datum *initial_datum, int lineno, int location); staticvoid check_sql_expr(const char *stmt, int location, ! int leaderlen, PLpgSQL_row *check_row); staticvoid plpgsql_sql_error_callback(void *arg); staticPLpgSQL_type*parse_datatype(const char *string, int location); staticvoid check_labels(const char *start_label, *** *** 106,111 static void check_labels(const char *start_label, --- 107,115 staticPLpgSQL_expr*read_cursor_args(PLpgSQL_var *cursor, int until, const char *expected); staticList*read_raise_options(void); + staticboolfind_a_star_walker(Node *node, void *context); + staticint tlist_result_column_count(Node *stmt); + %} *** *** 1438,1444 for_control : for_variable K_IN PLpgSQL_stmt_fori *new; /* Check first expression is well-formed */ ! check_sql_expr(expr1-query, expr1loc, 7); /* Read and check the second one */ expr2 = read_sql_expression2(K_LOOP, K_BY, --- 1442,1448 PLpgSQL_stmt_fori *new; /* Check first expression is well-formed */ ! check_sql_expr(expr1-query, expr1loc, 7, NULL);
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Wed, Aug 6, 2014 at 1:11 PM, Robert Haas robertmh...@gmail.com wrote: I've committed the patch I posted yesterday. I did not see a good reason to meld that together in a single commit with the first of the patches you posted; I'll leave you to revise that patch to conform with this approach. Okay. Attached is the same patch set, rebased on top of your commit with appropriate amendments. BTW, I haven't added any of the same portability measures that the existing strxfrm() selfuncs.c caller has. Since Windows can only use the optimization with the C locale, I don't believe we're affected, although the link "http://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=99694" is now dead (even if you have a Microsoft account), so I won't swear that Visual Studio 2005 is unaffected in the C locale case. However, only VS 2008 and later versions support 64-bit builds, and only 64-bit builds support abbreviated keys within sort support for text, so it doesn't matter. As for the Solaris bugs that convert_string_datum() also refers to, well, I just don't have much sympathy for that case. If a version of Solaris that was old in 2003 still has a buggy strxfrm() implementation and wants to use Postgres 9.5, too bad. My usage of strxfrm() is the pattern that the standard expects. The idea that we should always do a dry-run, where strxfrm() is called with a NULL pointer to check how much memory is required, just because somebody's strxfrm() is so badly written as to feel entitled to write past the end of my buffer is just absurd. That should probably be removed from the existing convert_string_datum() strxfrm() call site too. I suspect that no Oracle-supported version of Solaris will be affected, but if I'm wrong, that's what we have a buildfarm for. The nodeAgg.c FIXME item remains. I've added a new TODO comment which suggests that we investigate more opportunistic attempts at getting away with a cheap memcmp() that indicates equality. That continues to only actually occur when the sort support comparator is called as a tie-breaker. -- Peter Geoghegan From 13be5686aaa894381e70abd5532964c73c11b5b5 Mon Sep 17 00:00:00 2001 From: Peter Geoghegan p...@heroku.com Date: Sat, 2 Aug 2014 15:39:28 -0700 Subject: [PATCH 2/2] Abbreviated sortsupport keys Add a new infrastructure to sortsupport, and add a single client of this new infrastructure, the text sortsupport routine. This infrastructure allows opclasses whose sortsupport routines support this additional, optional new capability to provide abbreviated representations of Datums in advance of sorting, generated through an opclass-specific encoding scheme. An abbreviated representation is typically pass by value, and the capability is thought to mostly be useful for pass by reference types, but there are no hard requirements on representation. Opclasses that support this new capability provide a special abbreviation-aware comparator (in addition to the usual, ordinary sortsupport fmgr-eliding comparator). This comparator returns a value less than, equal to, or greater than 0, in a manner similar to the usual fmgr-eliding comparator (which is to say, a manner similar to any btree support function 1). However, there is an important difference: sortsupport clients such as tuplesort interpret the value 0 as inconclusive. Inconclusive comparisons obligate these clients to perform a tie-breaker ordinary comparison in respect of the same attribute. By generating abbreviated keys, the optimization can speed up internal sorts considerably.
Because of the high cost of CPU cache stalls when sorting, cache efficiency is a crucial consideration in engineering sorting algorithms and related infrastructure. Pass by value representations have much greater temporal and spatial locality than pass by reference representations. Furthermore, the abbreviation-aware comparisons can themselves be far cheaper than an authoritative comparison, if the abbreviation encoding scheme can produce a reductionist representation sufficient to resolve most comparisons. Encoding may be expensive, but if an encoding process can occur N times, rather than having essentially the same encoding process occur N log N times, that can be a highly effective optimization. The expense of encoding may well be problematic if no benefit can be derived from abbreviated keys during sorting, though. Opclasses supporting abbreviated keys also provide an abort callback, which hints the number of rows to be sorted in total, and progress so far. This can either abort the abbreviation encoding process entirely, or give the encoding scheme the ability to adapt its encoding strategy in order to maximize the number of comparisons that are resolved with abbreviated keys. HyperLogLog, a cardinality estimator algorithm, is added to give clients of the new sortsupport infrastructure a principled way to estimate the usefulness of encoding. Currently, only the MinimalTuple tuplesort case uses
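As a rough illustration of the encoding step described above (a simplified sketch, not the patch's code): pack the leading bytes of each strxfrm() blob into a small fixed-size key, compare those bytewise, and treat equality as merely inconclusive:

    #include <string.h>

    /*
     * Build an 8-byte abbreviated key from the strxfrm() blob of s
     * (zero-padded).  strxfrm() runs once per input value (N times),
     * not once per comparison (N log N times).  A sketch only: it
     * assumes the transformed string fits in buf[]; real code must
     * check strxfrm()'s return value and retry with a larger buffer
     * when it doesn't.
     */
    static void
    make_abbrev_key(const char *s, unsigned char key[8])
    {
        char        buf[1024];
        size_t      len = strxfrm(buf, s, sizeof(buf));

        if (len > 8)
            len = 8;
        memset(key, 0, 8);
        memcpy(key, buf, len);
    }

    /*
     * Abbreviated comparator: a nonzero result is authoritative,
     * because strxfrm() blobs compare bytewise the way strcoll()
     * compares the originals.  Zero is "inconclusive": the caller must
     * tie-break with the full, authoritative comparator.
     */
    static int
    cmp_abbrev(const unsigned char a[8], const unsigned char b[8])
    {
        return memcmp(a, b, 8);
    }

Most comparisons in a typical text sort are then a single 8-byte memcmp(), with the expensive strcoll()-style comparison reserved for prefixes that genuinely collide.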
Re: [HACKERS] Fixed redundant i18n strings in json
On Wed, Aug 6, 2014 at 8:35 AM, Robert Haas robertmh...@gmail.com wrote: On Tue, Aug 5, 2014 at 6:05 PM, Jeff Janes jeff.ja...@gmail.com wrote: I think you missed one of the regression tests, see attached Woops. Thanks, committed. Thanks. It needs to go into 9_4_STABLE as well. Cheers, Jeff
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Tue, Aug 5, 2014 at 8:55 PM, Noah Misch n...@leadboat.com wrote: However, with work_mem set low enough to get an external sort, the difference is more interesting. If I set work_mem to 10 MB, then the query takes about 10.7 seconds to execute with a suitably patched Postgres. Whereas on master, it consistently takes a full 69 seconds. That's the largest improvement I've seen so far, for any case. Comparator cost affects external sorts more than it affects internal sorts. When I profiled internal and external int4 sorting, btint4cmp() was 0.37% of the internal sort profile and 10.26% of the external sort profile. I took another look at this. If I run dd if=/dev/zero of=/home/pg/test, I can see with iotop that it has about 45 M/s total disk write fairly sustainably, with occasional mild blips during write-back. This is a Crucial mobile SSD, with an ext4/lvm file system, and happens to be what is close at hand. If I run the same external sorting query with a patched Postgres, I see 24 M/s total disk write throughout. With master, it's about 6 M/s, and falls to 0 during the final 36-way merge. I'm not sure if the same thing occurs when patched during the final merge, because the available resolution isn't good enough to be able to tell. Anyway, it's pretty clear that when patched, the external sort on text is, if not totally I/O bound, much closer to being I/O bound. A good external sort algorithm should be at least close to totally I/O bound. This makes I/O parallelism a viable strategy for speeding up sorts, where it might not otherwise be. I've heard of people using a dedicated temp tablespace disk with Postgres to speed up sorting, but that always seemed to be about reducing the impact on the heap filesystem, or vice versa. I've never heard of anyone using multiple disks to speed up sorting with Postgres (which I don't presume means it hasn't been done at least somewhat effectively). However, with external sort benchmarks (like http://sortbenchmark.org), using I/O parallelism strategically seems to be table stakes for external sort entrants. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Reporting the commit LSN at commit time
Hi all To support transparent client-side failover in BDR, it's necessary to know what the LSN of a node was at the time a transaction committed and keep track of that in the client/proxy. I'm thinking about adding a new message type in the protocol that gets sent immediately before CommandComplete, containing the LSN of the commit. Clients would need to enable the sending of this message with a GUC that they set when they connect, so it doesn't confuse clients that aren't expecting it or aware of it. Is this something you can see being useful for other non-BDR purposes? Are there any obvious issues with this? Clients can always follow up with a second query to get the xlog position, after commit, but that's potentially slow and has a race that might cause a client to wait longer than it has to after fail-over to a different node. That's why having the server report it automatically seems useful. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning performance: cache stringToNode() of pg_constraint.ccbin
On Sun, Jan 12, 2014 at 12:53:40PM -0500, Noah Misch wrote: On Sat, Jan 11, 2014 at 02:10:01PM -0500, Bruce Momjian wrote: On Mon, Jun 3, 2013 at 03:07:27PM -0400, Noah Misch wrote: A colleague, Korry Douglas, observed a table partitioning scenario where deserializing pg_constraint.ccbin is a hot spot. The following test case, a simplification of a typical partitioning setup, spends 28% of its time in stringToNode() and callees thereof: Noah, what is the status on this? Further study revealed a defect in the patch's memory management, and I have not gotten around to correcting that. I talked to Noah and he can't continue on this item. Can someone else work on it? -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] add modulo (%) operator to pgbench
2014-08-06 23:38 GMT+09:00 Fabien COELHO coe...@cri.ensmp.fr: Three different modulo operators seems like a lot for a language that doesn't even have a real expression syntax, but I'll yield to whatever the consensus is on this one. Here is a third simpler patch which only implements Knuth's modulo, where the remainder has the same sign as the divisor. I would prefer this version 3 (one simple modulo based on the Knuth definition) or, if there is a problem, version 2 (all 3 modulos). Version 1, which provides a modulo compatible with C & SQL, is really useless to me. I like version 3, it is simple and practical. And it's enough in pgbench. If someone wants to use another modulo implementation, he can just change his source code. Best regards, -- Mitsumasa KONDO
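For readers checking the definitions, here is a small sketch of the three variants under discussion, with the negative-operand cases spelled out. The fmod/emod names mirror those used in the patch discussion; the _knuth suffix below is only there to avoid clashing with libc's floating-point fmod():

    /* C / SQL: remainder takes the sign of the dividend: -7 % 3 == -1. */
    static int
    tmod(int a, int b)
    {
        return a % b;
    }

    /*
     * Knuth (floored division): remainder takes the sign of the
     * divisor: fmod_knuth(-7, 3) == 2, fmod_knuth(7, -3) == -2.
     */
    static int
    fmod_knuth(int a, int b)
    {
        int r = a % b;

        if (r != 0 && (r < 0) != (b < 0))
            r += b;
        return r;
    }

    /*
     * Euclidean: remainder is always non-negative:
     * emod(-7, 3) == 2, emod(7, -3) == 1.
     */
    static int
    emod(int a, int b)
    {
        int r = a % b;

        if (r < 0)
            r += (b < 0) ? -b : b;
        return r;
    }

The practical point upthread is visible in the -7 examples: for hashing a signed value into a non-negative bucket number, the C definition needs an extra fix-up, while the floored and Euclidean definitions do not.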
Re: [HACKERS] postgresql.auto.conf and reload
On Wed, Aug 6, 2014 at 11:39 AM, Fujii Masao masao.fu...@gmail.com wrote: [...] Anyway, barring any objection, I will commit Amit's patch. Applied the slightly-modified version! Thanks. There is a commitfest entry [1] for this patch; do you want something more to be addressed, or shall we mark it as committed? [1] https://commitfest.postgresql.org/action/patch_view?id=1500 With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] postgresql.auto.conf and reload
On Wed, Aug 6, 2014 at 1:32 PM, Fujii Masao masao.fu...@gmail.com wrote: On Tue, Jul 29, 2014 at 9:26 PM, Amit Kapila amit.kapil...@gmail.com wrote: The reason is that during startup DataDir is not set by the time the server processes the config file, which is why we process the .auto.conf file in a second pass. I think ideally it should ignore the invalid setting in such a case, as it does during reload; however, currently there is no good way to process .auto.conf in case DataDir is not set, so we handle it separately in a second pass. What about picking up only data_directory at the first pass? I think that will work out, and for that we might need to add a few checks during ProcessConfigFile. Do you want to address it during parsing of the config file or during setting of values? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
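As a rough illustration of the first-pass idea (a sketch assuming the ConfigVariable list the config-file parser already produces, not the eventual patch): scan the parsed items for data_directory alone, apply it, and defer everything else to the second pass once DataDir is known and postgresql.auto.conf can be located.

    /*
     * Sketch only: first pass over the parsed postgresql.conf items.
     * "head" is the ConfigVariable list from the parser; every setting
     * other than data_directory is left for the second pass.
     */
    static void
    apply_data_directory_first(ConfigVariable *head)
    {
        ConfigVariable *item;

        for (item = head; item != NULL; item = item->next)
        {
            if (strcmp(item->name, "data_directory") == 0)
                SetConfigOption(item->name, item->value,
                                PGC_POSTMASTER, PGC_S_FILE);
        }
    }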
Re: [HACKERS] pg_export_snapshot on standby side
On Thu, Aug 7, 2014 at 2:17 AM, Bruce Momjian br...@momjian.us wrote: Seems we still have not addressed this. Thanks for the reminder :) I'm not sure if I have time to work on this for a while. If anyone is interested in this, please feel free to try it. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] posix_fadvise() and pg_receivexlog
On Thu, Aug 7, 2014 at 3:59 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 08/06/2014 08:39 PM, Fujii Masao wrote: Hi, The WAL files that pg_receivexlog writes basically will not be re-read soon, so we can advise the OS to release any cached pages when a WAL file is closed. I feel inclined to change pg_receivexlog that way. Thought? -1. The OS should be smart enough not to thrash the cache with files that are written sequentially and never read. Yep, the OS should be that smart, but I'm not sure it actually is. Maybe not, so I was thinking that posix_fadvise should be called when the server closes a WAL file. If we go down this path, we'd need to sprinkle posix_fadvises into many, many places. Yes, that's a valid concern. But if we can prove that adding posix_fadvise to a certain place improves performance enough, I'm inclined to do it. Anyway, who are we to say that they won't be re-read soon? You might e.g. have a secondary backup site where you copy the files received by pg_receivexlog as soon as they're completed. So whether posix_fadvise is called or not needs to be exposed as a user-configurable option. We would need to measure how useful exposing that is, though. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
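For concreteness, a minimal sketch of what the pg_receivexlog side might look like when a completed segment is closed (the function name is invented; flushing first matters because POSIX_FADV_DONTNEED cannot drop dirty pages):

    #include <fcntl.h>
    #include <unistd.h>

    /* Sketch only: close a finished WAL segment and hint the kernel
     * that its pages will not be re-read soon. */
    static int
    close_walfile_with_hint(int fd)
    {
        if (fsync(fd) != 0)
            return -1;
    #if defined(POSIX_FADV_DONTNEED)
        /* Advisory; ignore failure on platforms that stub this out. */
        (void) posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    #endif
        return close(fd);
    }

If this were made user-configurable as suggested above, the posix_fadvise call would simply be gated on that option as well, leaving secondary-backup setups free to keep the pages cached.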
Re: [HACKERS] psql: show only failed queries
On Thu, Aug 7, 2014 at 6:26 AM, Pavel Stehule pavel.steh...@gmail.com wrote: Hello, updated version of the patch in attachment Thanks! But ISTM you forgot to attach the patch. +/* all psql known variables are included in list by default */ +for (known_varname = known_varnames; *known_varname; known_varname++) +varnames[nvars++] = pg_strdup(*known_varname); Don't we need to append both prefix and suffix even to known variables? ??? I am not sure I understand well - probably system read-only variables such as DBNAME, USER, VERSION should not be there. I had that question because complete_from_variables() is also called by the tab-completion of \echo : and it shows a half-baked variables list. So I thought we probably need to change complete_from_variables() further to fix the problem. +else if (strcmp(prev2_wd, \\set) == 0) +{ +if (strcmp(prev_wd, AUTOCOMMIT) == 0) ISTM that some psql variables like IGNOREEOF are not there. Why not? Yes, there is no completion for DBNAME, ENCODING, FETCH_COUNT, HISTCONTROL, HISTFILE, HISTSIZE, HOST, IGNOREEOF, PROMPT*, USER, VERSION. There are several reasons: * the parameter is not an enum (string, number or both): FETCH_COUNT, PROMPT, HISTSIZE, ... * the variable is pseudo read-only - changing it has no effect: DBNAME, ENCODING, VERSION. So HISTCONTROL should be there, because neither of those reasons applies to it? Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
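A small illustration of the prefix/suffix question (generic C, not psql's actual tab-complete.c code): whatever list of names is used, each candidate has to be decorated with the context's prefix (e.g. ':' after \echo) and suffix before being handed to readline.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Illustrative only: build a NULL-terminated vector of completion
     * candidates, wrapping every known variable name in the
     * caller-supplied prefix and suffix. */
    static char **
    decorate_varnames(const char *const names[],
                      const char *prefix, const char *suffix)
    {
        size_t n = 0, i;
        char **out;

        while (names[n] != NULL)
            n++;
        out = malloc((n + 1) * sizeof(char *));
        for (i = 0; i < n; i++)
        {
            size_t len = strlen(prefix) + strlen(names[i]) + strlen(suffix) + 1;

            out[i] = malloc(len);
            snprintf(out[i], len, "%s%s%s", prefix, names[i], suffix);
        }
        out[n] = NULL;
        return out;
    }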
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Tue, Aug 5, 2014 at 9:32 PM, Peter Geoghegan p...@heroku.com wrote: I knew that I'd heard that at least once. Apparently some other database systems have external sorts that tend to be faster than equivalent internal sorts. I'd guess that is an artifact of their having a substandard internal sort, though. This *almost* applies to patched Postgres if you pick a benchmark that is very sympathetic to my patch. To my surprise, work_mem = '10MB' (which results in an external tape sort) is sometimes snapping at the heels of a work_mem = '5GB' setting (which results in an in-memory quicksort). I have a 338 MB table that consists of a single text column of 8-byte strings, with high cardinality. I ran VACUUM FREEZE, and took all the usual precautions of that kind. On the test table n_distinct = -1, and there is no discernible physical/logical correlation. The external sort case stabilized as follows: LOG: duration: 9731.776 ms statement: select * from (select * from besttest order by tt offset 1000) i; LOG: duration: 9742.948 ms statement: select * from (select * from besttest order by tt offset 1000) i; LOG: duration: 9733.918 ms statement: select * from (select * from besttest order by tt offset 1000) i; The in-memory case stabilized as follows: LOG: duration: 0.059 ms statement: set work_mem = '5GB'; LOG: duration: 9665.731 ms statement: select * from (select * from besttest order by tt offset 1000) i; LOG: duration: 9602.841 ms statement: select * from (select * from besttest order by tt offset 1000) i; LOG: duration: 9609.107 ms statement: select * from (select * from besttest order by tt offset 1000) i; FWIW, master performed as follows with work_mem = '10MB': LOG: duration: 60456.943 ms statement: select * from (select * from besttest order by tt offset 1000) i; LOG: duration: 60368.987 ms statement: select * from (select * from besttest order by tt offset 1000) i; LOG: duration: 61223.942 ms statement: select * from (select * from besttest order by tt offset 1000) i; And master did quite a lot better with work_mem = '5GB', in a way that fits with my prejudices about how quicksort is supposed to perform relative to tape sort: LOG: duration: 0.060 ms statement: set work_mem = '5GB'; LOG: duration: 41697.659 ms statement: select * from (select * from besttest order by tt offset 1000) i; LOG: duration: 41755.496 ms statement: select * from (select * from besttest order by tt offset 1000) i; LOG: duration: 41883.888 ms statement: select * from (select * from besttest order by tt offset 1000) i; work_mem = '10MB' continues to be the external sort sweet spot for this hardware, for whatever reason - I can add a few seconds to execution time by increasing or decreasing the setting a bit. I'm using an Intel Core i7-3520M CPU @ 2.90GHz, with a /proc/cpuinfo reported L3 cache size of 4096 KB. I have been very careful to take power-saving features into account throughout all experimentation/benchmarking of this patch and previous abbreviated-key patches - failing to do so is a good way to end up with complete garbage when investigating this kind of thing. Anyway, I'm not sure what this tells us about quicksort and tape sort, but I think there might be an interesting and more general insight to be gained here. I'd have thought that tape sort wastes memory bandwidth by copying to operating system buffers to the extent that things are slowed down considerably (this is, after all, a test performed with lots of memory available, even when work_mem = '10MB'). 
And even if that weren't a significant factor, I'd expect quicksort to win decisively anyway. Why does this happen? -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
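To make the mechanism under test concrete, here is a self-contained sketch of the abbreviated-key idea (names and details are illustrative, not the patch's actual code): pack the leading bytes of strxfrm() output into one machine word so that most comparisons resolve as a single unsigned integer compare, falling back to a full strcoll() only on ties.

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    /* Illustrative only: derive an 8-byte abbreviated key whose unsigned
     * comparison order matches memcmp() order on the strxfrm() blob. */
    static uint64_t
    abbreviate_text(const char *s)
    {
        size_t   need = strxfrm(NULL, s, 0);    /* transformed length */
        char    *blob = malloc(need + 1);
        uint8_t  b[8] = {0};                    /* zero padding for short keys */
        uint64_t key = 0;
        int      i;

        strxfrm(blob, s, need + 1);
        memcpy(b, blob, need < 8 ? need : 8);
        free(blob);

        for (i = 0; i < 8; i++)                 /* big-endian packing */
            key = (key << 8) | b[i];
        return key;
    }

    /* Comparator: abbreviated keys first, full strcoll() only on a tie. */
    static int
    cmp_text_abbrev(uint64_t ka, uint64_t kb, const char *a, const char *b)
    {
        if (ka != kb)
            return (ka < kb) ? -1 : 1;
        return strcoll(a, b);
    }

Equal abbreviated keys prove nothing, so the tie-break is unavoidable; that is why high-cardinality data like the 8-byte strings above is so sympathetic to the patch, since ties on the abbreviated key are rare and the expensive locale-aware comparison is almost never reached.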