Re: [HACKERS] Parallel Seq Scan
On 2015-02-07 17:16:12 -0500, Robert Haas wrote:
> On Sat, Feb 7, 2015 at 4:30 PM, Andres Freund wrote:
> > > [ criticism of Amit's heapam integration ]
> > I'm not convinced that that reasoning is generally valid. While it may
> > work out nicely for seqscans - which might be useful enough on its own -
> > the more stuff we parallelize the *more* the executor will have to know
> > about it to make it sane. To actually scale nicely, e.g. a parallel sort
> > will have to execute the nodes below it on each backend, instead of
> > doing that in one backend as a separate step, ferrying over all tuples
> > to individual backends through queues, and only then parallelizing the
> > sort.
> >
> > Now, none of that is likely to matter immediately, but I think starting
> > to build the infrastructure at the points where we'll later need it does
> > make some sense.
>
> Well, I agree with you, but I'm not really sure what that has to do
> with the issue at hand. I mean, if we were to apply Amit's patch,
> we'd be in a situation where, for a non-parallel heap scan, heapam.c
> decides the order in which blocks get scanned, but for a parallel heap
> scan, nodeParallelSeqscan.c makes that decision. Maybe I'm an old
> fuddy-duddy[1] but that seems like an abstraction violation to me. I
> think the executor should see a parallel scan as a stream of tuples
> that streams into a bunch of backends in parallel, without really
> knowing how heapam.c is dividing up the work. That's how it's
> modularized today, and I don't see a reason to change it. Do you?

I don't really agree. Normally heapam just sequentially scans the heap
in one go; there is not much logic to that. Ok, then there's also the
synchronized seqscan stuff - which just about every user of heapscans
but the executor promptly disables again. I don't think a
heap_scan_page() or similar API will constitute a relevant layering
violation over what we already have.
Note that I'm not saying that Amit's patch is right - I haven't read it -
but that I don't think a 'scan this range of pages' heapscan API would be
a bad idea. Not even just for parallelism, but for a bunch of use cases.

> Regarding tuple flow between backends, I've thought about that before,
> I agree that we need it, and I don't think I know how to do it. I can
> see how to have a group of processes executing a single node in
> parallel, or a single process executing a group of nodes we break off
> from the query tree and push down to it, but what you're talking about
> here is a group of processes executing a group of nodes jointly.

I don't think it really is that. I think you'd do it essentially by
introducing a couple more nodes. Something like

                    SomeUpperLayerNode
                            |
                            |
                     AggCombinerNode
                    /        |       \
                   /         |        \
                  /          |         \
 PartialHashAggNode  PartialHashAggNode ... PartialHashAggNode
         |                   |
         |                   |
  PartialSeqScan      PartialSeqScan

The only thing that'd potentially need to end up working jointly would
be the block selection of the individual PartialSeqScans, to avoid
having to wait for stragglers for too long. E.g. each might just ask for
a range of 16 megabytes or so that it scans sequentially.

In such a plan - a pretty sensible and not that uncommon thing for
parallelized aggregates - you'd need to be able to tell the heap scans
which blocks to scan. Right?

> That seems like an excellent idea, but I don't know how to design it.
> Actually routing the tuples between whichever backends we want to
> exchange them between is easy enough, but how do we decide whether to
> generate such a plan? What does the actual plan tree look like?

I described above how I think it'd roughly look. Whether to generate it
would probably be dependent on the cardinality (not much point to do the
above if all groups are distinct) and possibly the aggregates in use (if
we have a parallelizable sum/count/avg etc).
> Maybe we designate nodes as can-generate-multiple-tuple-streams (seq
> scan, mostly, I would think) and can-absorb-parallel-tuple-streams
> (sort, hash, materialize), or something like that, but I'm really
> fuzzy on the details.

I don't think we really should have individual nodes that produce
multiple streams - that seems like it'd end up being really complicated.
I'd more say that we have distinct nodes (like the PartialSeqScan ones
above) that do a teensy bit of coordination about which work to perform.

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
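The coordination sketched above - each PartialSeqScan asking for the next unclaimed range of roughly 16 MB that it then scans sequentially - boils down to handing out chunks from a shared cursor. The following is a hypothetical standalone illustration, not PostgreSQL source: the names (ParallelScanState, claim_block_range) are invented, and a real implementation would keep this state in shared memory and advance next_block with an atomic fetch-and-add or under a spinlock.

```c
#include <stdbool.h>
#include <stdint.h>

#define BLCKSZ        8192
#define CHUNK_BYTES   (16 * 1024 * 1024)        /* ~16 MB per request */
#define CHUNK_BLOCKS  (CHUNK_BYTES / BLCKSZ)    /* 2048 blocks */

typedef struct ParallelScanState
{
    uint32_t    nblocks;        /* total blocks in the relation */
    uint32_t    next_block;     /* next unclaimed block */
} ParallelScanState;

/*
 * Claim the next chunk of blocks for one worker.  Returns false once the
 * relation is exhausted; otherwise sets *start and *count to a contiguous
 * range the worker should scan sequentially.  The last chunk may be short.
 */
bool
claim_block_range(ParallelScanState *ps, uint32_t *start, uint32_t *count)
{
    if (ps->next_block >= ps->nblocks)
        return false;
    *start = ps->next_block;
    *count = (ps->nblocks - *start < CHUNK_BLOCKS)
        ? ps->nblocks - *start
        : CHUNK_BLOCKS;
    ps->next_block += *count;
    return true;
}
```

With chunked claiming, a slow worker only delays completion by at most one chunk's worth of blocks, which is the straggler concern mentioned above.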
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
Hello,

A bug had been introduced in the latest versions of the patch: the order
of the parameters passed to pglz_decompress was wrong. Please find
attached a patch with the following correction.

Original code:

+ if (pglz_decompress(block_image, record->uncompressBuf,
+                     bkpb->bkp_len, bkpb->bkp_uncompress_len) == 0)

Correction:

+ if (pglz_decompress(block_image, bkpb->bkp_len,
+                     record->uncompressBuf, bkpb->bkp_uncompress_len) == 0)

>For example here you should not have a space between the function definition
>and the variable declarations:
>+{
>+
>+    int orig_len = BLCKSZ - hole_length;

>This is as well incorrect in two places:
>if(hole_length != 0)
>There should be a space between the if and its condition in parenthesis.

Also corrected the above code format mistakes.

Thank you,
Rahila Syed

-----Original Message-----
From: pgsql-hackers-ow...@postgresql.org
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Syed, Rahila
Sent: Monday, February 09, 2015 6:58 PM
To: Michael Paquier; Fujii Masao
Cc: PostgreSQL mailing lists
Subject: Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

Hello,

>> Do we always need extra two bytes for compressed backup block?
>> ISTM that extra bytes are not necessary when the hole length is zero.
>> In this case the length of the original backup block (i.e.,
>> uncompressed) must be BLCKSZ, so we don't need to save the original
>> size in the extra bytes.

>Yes, we would need an additional bit to identify that. We could steal it
>from length in XLogRecordBlockImageHeader.

This is implemented in the attached patch by dividing the length field
as follows:

uint16 length:15,
       with_hole:1;

>"2" should be replaced with the macro variable indicating the size of
>extra header for compressed backup block.

Macro SizeOfXLogRecordBlockImageCompressionInfo is used instead of 2.

>You can refactor XLogCompressBackupBlock() and move all the above code
>to it for more simplicity

This is also implemented in the patch attached.
Thank you,
Rahila Syed

-----Original Message-----
From: Michael Paquier [mailto:michael.paqu...@gmail.com]
Sent: Friday, February 06, 2015 6:00 PM
To: Fujii Masao
Cc: Syed, Rahila; PostgreSQL mailing lists
Subject: Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

On Fri, Feb 6, 2015 at 3:03 PM, Fujii Masao wrote:
> Do we always need extra two bytes for compressed backup block?
> ISTM that extra bytes are not necessary when the hole length is zero.
> In this case the length of the original backup block (i.e.,
> uncompressed) must be BLCKSZ, so we don't need to save the original
> size in the extra bytes.

Yes, we would need an additional bit to identify that. We could steal it
from length in XLogRecordBlockImageHeader.

> Furthermore, when fpw compression is disabled and the hole length is
> zero, we seem to be able to save one byte from the header of backup
> block. Currently we use 4 bytes for the header, 2 bytes for the length
> of backup block, 15 bits for the hole offset and 1 bit for the flag
> indicating whether block is compressed or not. But in that case, the
> length of backup block doesn't need to be stored because it must be
> BLCKSZ. Shouldn't we optimize the header in this way? Thought?

If we do it, that's something to tackle even before this patch on HEAD,
because you could use the 16th bit of the first 2 bytes of
XLogRecordBlockImageHeader to do the necessary sanity checks, and
actually reduce the record not by 1 byte, but by 2 bytes, as the
hole-related data is then not necessary. I imagine that a patch
optimizing that wouldn't be that hard to write as well.
> +    int page_len = BLCKSZ - hole_length;
> +    char *scratch_buf;
> +    if (hole_length != 0)
> +    {
> +        scratch_buf = compression_scratch;
> +        memcpy(scratch_buf, page, hole_offset);
> +        memcpy(scratch_buf + hole_offset,
> +               page + (hole_offset + hole_length),
> +               BLCKSZ - (hole_length + hole_offset));
> +    }
> +    else
> +        scratch_buf = page;
> +
> +    /* Perform compression of block */
> +    if (XLogCompressBackupBlock(scratch_buf,
> +                                page_len,
> +                                regbuf->compressed_page,
> +                                &compress_len))
> +    {
> +        /* compression is done, add record */
> +        is_compressed = true;
> +    }
>
> You can refactor XLogCompressBackupBlock() and move all the above code
> to it for more simplicity.

Sure.
--
Michael
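The quoted snippet's core move is eliding the page "hole" (the unused gap in the middle of the block) before handing the buffer to the compressor. A self-contained sketch of just that copy, assuming nothing beyond the memcpy logic shown above - the function name elide_page_hole and the buffer parameters are illustrative, not from the patch:

```c
#include <stdint.h>
#include <string.h>

#define BLCKSZ 8192

/*
 * Copy a BLCKSZ-byte page into scratch with the hole
 * [hole_offset, hole_offset + hole_length) removed, mirroring the
 * memcpy pair in the quoted patch.  Returns the number of bytes
 * written to scratch.
 */
size_t
elide_page_hole(const char *page, char *scratch,
                uint16_t hole_offset, uint16_t hole_length)
{
    /* bytes before the hole */
    memcpy(scratch, page, hole_offset);
    /* bytes after the hole, shifted down over it */
    memcpy(scratch + hole_offset,
           page + hole_offset + hole_length,
           BLCKSZ - (hole_offset + hole_length));
    return BLCKSZ - hole_length;
}
```

Compressing BLCKSZ - hole_length bytes instead of BLCKSZ is what makes storing the original (uncompressed) length necessary when a hole is present - the point of the extra-bytes discussion above.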
Re: [HACKERS] RangeType internal use
On Mon, Feb 09, 2015 at 12:37:05PM -0500, Tom Lane wrote:
> Robert Haas writes:
> > Yeah, but people expect to be able to partition on ranges that are not
> > all of equal width. I think any proposal that we shouldn't support
> > that is the kiss of death for a feature like this - it will be so
> > restricted as to eliminate 75% of the use cases.
>
> Well, that's debatable IMO (especially your claim that variable-size
> partitions would be needed by a majority of users).

I don't know about user wishes directly, though I do suspect a fixed
partition stride would cover more than 25% of use cases. I do know that
SQL Server, Oracle and MySQL have variable-stride range partitioning,
and none of them have fixed-stride range partitioning. So, like Heikki
and Robert, I would bet on variable-stride range partitioning.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgbench -f and vacuum
On Mon, Feb 9, 2015 at 2:54 PM, Tatsuo Ishii wrote:
> Agreed. Here is the patch to implement the idea: -f just implies -n.

Some small comments:
- is_no_vacuum, as well as is_init_mode, are defined as integers but
their use implies that they are boolean switches. This patch sets
is_no_vacuum to true, while the rest of the code actually increments its
value when -n is used. Why not simply change both flags to booleans? My
suggestion is not really related to this patch and purely cosmetic, but
we could change things at the same time, or update your patch to
increment is_no_vacuum when -f is used.
- The documentation misses some markups for pgbench and VACUUM and did
not respect the 80-character limit.

Attached is an updated patch with those things changed.
Regards,
--
Michael

From 3553151908a1be40b67703fcc27b696bad546b96 Mon Sep 17 00:00:00 2001
From: Michael Paquier
Date: Tue, 10 Feb 2015 03:59:19 +0900
Subject: [PATCH] Imply -n when using -f in pgbench

At the same time make the flags is_init_mode and is_no_vacuum booleans.
This is a purely cosmetic change but those flags have always been used
as such, even if they were defined as integers.
---
 contrib/pgbench/pgbench.c | 9 +++++----
 doc/src/sgml/pgbench.sgml | 7 +++++++
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index ddd11a0..eed9324 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -2679,8 +2679,8 @@ main(int argc, char **argv)
 	int		c;
 	int		nclients = 1;	/* default number of simulated clients */
 	int		nthreads = 1;	/* default number of threads */
-	int		is_init_mode = 0;	/* initialize mode? */
-	int		is_no_vacuum = 0;	/* no vacuum at all before testing? */
+	bool	is_init_mode = false;	/* initialize mode? */
+	bool	is_no_vacuum = false;	/* no vacuum at all before testing? */
 	int		do_vacuum_accounts = 0;	/* do vacuum accounts before testing? */
 	int		ttype = 0;	/* transaction type.  0: TPC-B, 1: SELECT only,
 				 * 2: skip update of branches and tellers */
@@ -2753,13 +2753,13 @@ main(int argc, char **argv)
 		switch (c)
 		{
 			case 'i':
-				is_init_mode++;
+				is_init_mode = true;
 				break;
 			case 'h':
 				pghost = pg_strdup(optarg);
 				break;
 			case 'n':
-				is_no_vacuum++;
+				is_no_vacuum = true;
 				break;
 			case 'v':
 				do_vacuum_accounts++;
@@ -2872,6 +2872,7 @@ main(int argc, char **argv)
 			case 'f':
 				benchmarking_option_set = true;
 				ttype = 3;
+				is_no_vacuum = true;	/* "-f" implies "-n" (no vacuum) */
 				filename = pg_strdup(optarg);
 				if (process_file(filename) == false || *sql_files[num_files - 1] == NULL)
 					exit(1);
diff --git a/doc/src/sgml/pgbench.sgml b/doc/src/sgml/pgbench.sgml
index 7d203cd..0070882 100644
--- a/doc/src/sgml/pgbench.sgml
+++ b/doc/src/sgml/pgbench.sgml
@@ -316,6 +316,13 @@ pgbench options dbname
     -N, -S, and -f are mutually exclusive.
+
+    Please note that -f implies -n,
+    which means that VACUUM for the standard
+    pgbench tables will not be performed before
+    running the benchmarking. The VACUUM command should
+    be run beforehand if needed.
+
--
2.3.0
Re: [HACKERS] Odd behavior of updatable security barrier views on foreign tables
On 2015/02/10 7:23, Dean Rasheed wrote:
> On 9 February 2015 at 21:17, Stephen Frost wrote:
>>> On Fri, Jan 30, 2015 at 5:20 AM, Etsuro Fujita wrote:
>>>> I noticed that when updating security barrier views on foreign tables,
>>>> we fail to give FOR UPDATE to selection queries issued at ForeignScan.
>>
>> I've looked into this a fair bit more over the weekend and the issue
>> appears to be that the FDW isn't expecting a do-instead sub-query.
>> I've been considering how we might be able to address that but haven't
>> come up with any particularly great ideas and would welcome any
>> suggestions. Simply having the FDW try to go up through the query
>> would likely end up with too many queries showing up with 'for
>> update'. We add the 'for update' to the sub-query before we even get
>> called from the 'Modify' path too, which means we can't use that to
>> realize when we're getting ready to modify rows and therefore need to
>> lock them.
>>
>> In any case, I'll continue to look but would welcome any other thoughts.
>
> Sorry, I didn't have time to look at this properly. My initial thought
> is that expand_security_qual() needs to request a lock on rows coming
> from the relation it pushes down into a subquery if that relation was
> the result relation, because otherwise it won't have any locks, since
> preprocess_rowmarks() only adds PlanRowMarks to non-target relations.

That seems close to what I had in mind; expand_security_qual() needs to
request a FOR UPDATE lock on rows coming from the relation it pushes
down into a subquery only when that relation is the result relation and
a *foreign table*.

Thanks for discussing this issue!

Best regards,
Etsuro Fujita
Re: [HACKERS] ExplainModifyTarget doesn't work as expected
On 2015/02/07 1:09, Tom Lane wrote:
> IIRC, this code was written at a time when we didn't have NO INHERIT check
> constraints and so it was impossible for the parent table to get optimized
> away while leaving children. So the comment in ExplainModifyTarget was
> good at the time. But it no longer is.
>
> I think your basic idea of preserving the original parent table's relid
> is correct; but instead of doing it like this patch does, I'd be inclined
> to make ModifyTable inherit from Scan not Plan, and use the scan.scanrelid
> field to carry the parent RTI. Then you would probably end up with a net
> savings of code rather than a net addition; certainly ExplainModifyTarget
> would go away entirely since you'd just treat ModifyTable like any other
> Scan in this part of EXPLAIN.

Will follow your revision. Thanks!

Best regards,
Etsuro Fujita
Re: [HACKERS] [pgsql-advocacy] GSoC 2015 - mentors, students and admins.
I am up for mentoring again. On 10 Feb 2015 02:23, "Thom Brown" wrote: > Hi all, > > Google Summer of Code 2015 is approaching. I'm intending on registering > PostgreSQL again this year. > > Before I do that, I'd like to have an idea of how many people are > interested in being either a student or a mentor. > > I've volunteered to be admin again, but if anyone else has a strong > interest of seeing the projects through this year, please let yourself be > known. > > Who would be willing to mentor projects this year? > > Project ideas are welcome, and I've copied many from last year's > submissions into this year's wiki page. Please feel free to add more (or > delete any that stand no chance or are redundant): > https://wiki.postgresql.org/wiki/GSoC_2015 > > Students can find more information about GSoC here: > http://www.postgresql.org/developer/summerofcode > > Thanks > > Thom >
Re: [HACKERS] RangeType internal use
On 10-02-2015 AM 02:37, Tom Lane wrote:
> Robert Haas writes:
>> On Mon, Feb 9, 2015 at 10:36 AM, Tom Lane wrote:
>>> It's going to be complicated and probably buggy, and I think it is heading
>>> in the wrong direction altogether. If you want to partition in some
>>> arbitrary complicated fashion that the system can't reason about very
>>> effectively, we *already have that*. IMO the entire point of building
>>> a new partitioning infrastructure is to build something simple, reliable,
>>> and a whole lot faster than what you can get from inheritance
>>> relationships. And "faster" is going to come mainly from making the
>>> partitioning rules as simple as possible, not as complex as possible.
>
>> Yeah, but people expect to be able to partition on ranges that are not
>> all of equal width. I think any proposal that we shouldn't support
>> that is the kiss of death for a feature like this - it will be so
>> restricted as to eliminate 75% of the use cases.
>
> Well, that's debatable IMO (especially your claim that variable-size
> partitions would be needed by a majority of users). But in any case,
> partitioning behavior that is emergent from a bunch of independent pieces
> of information scattered among N tables seems absolutely untenable from
> where I sit. Whatever we support, the behavior needs to be described by
> *one* chunk of information --- a sorted list of bin bounding values,
> perhaps.

I'm a bit confused here. I got the impression that the partitioning
formula you suggest would consist of two pieces of information - an
origin point and a bin width. Then routing a tuple consists of using
exactly these two values to compute a bin number, and hence a partition,
in O(1) time, assuming we've made all partitions exactly bin-width wide.
You mention here a sorted list of bin bounding values, which we can very
well put together for a partitioned table in its relation descriptor
based on whatever information we store in the catalog.

That is, we can always have *one* chunk of partitioning information as
the *internal* representation, irrespective of how generalized we make
our on-disk representation. We can get O(log N) if not O(1) from that,
I'd hope. In fact, that's what I had in mind about this. Did I read it
wrong?

Thanks,
Amit
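The O(log N) routing over a sorted list of bin bounding values mentioned above is an ordinary upper-bound binary search. A hypothetical standalone sketch - route_to_partition and the bounds array are illustrative names, not anything from a proposed patch:

```c
#include <stdint.h>

/*
 * Given nbounds sorted upper-bound values, return the index of the
 * partition that should hold the key: partition 0 holds keys below
 * bounds[0], partition i holds keys in [bounds[i-1], bounds[i]), and
 * keys >= bounds[nbounds-1] fall into an overflow partition with
 * index nbounds.  Runs in O(log N).
 */
int
route_to_partition(const int64_t *bounds, int nbounds, int64_t key)
{
    int lo = 0,
        hi = nbounds;

    while (lo < hi)
    {
        int mid = lo + (hi - lo) / 2;

        if (key < bounds[mid])
            hi = mid;       /* key is below this bound */
        else
            lo = mid + 1;   /* key is at or above this bound */
    }
    return lo;
}
```

With fixed-stride partitioning the same routing degenerates to the O(1) formula (key - origin) / bin_width, which is the trade-off being discussed.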
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Mon, Feb 9, 2015 at 10:27 PM, Syed, Rahila wrote:
> (snip)

Thanks for showing up here! I have not tested the patch; those comments
are based on what I read from v17.

>>> Do we always need extra two bytes for compressed backup block?
>>> ISTM that extra bytes are not necessary when the hole length is zero.
>>> In this case the length of the original backup block (i.e.,
>>> uncompressed) must be BLCKSZ, so we don't need to save the original
>>> size in the extra bytes.
>
>> Yes, we would need an additional bit to identify that. We could steal
>> it from length in XLogRecordBlockImageHeader.
>
> This is implemented in the attached patch by dividing the length field
> as follows:
> uint16 length:15,
>        with_hole:1;

IMO, we should add details about how this new field is used in the
comments on top of XLogRecordBlockImageHeader, meaning that when a page
hole is present we use the compression info structure, and when there is
no hole, we are sure that the FPW raw length is BLCKSZ, meaning that the
two bytes of the CompressionInfo stuff are unnecessary.

>> "2" should be replaced with the macro variable indicating the size of
>> extra header for compressed backup block.
> Macro SizeOfXLogRecordBlockImageCompressionInfo is used instead of 2.
>
>> You can refactor XLogCompressBackupBlock() and move all the
>> above code to it for more simplicity
> This is also implemented in the patch attached.

This portion looks correct to me.

A couple of other comments:
1) Nitpicky, but the code format is sometimes strange. For example, here
you should not have a space between the function definition and the
variable declarations:
+{
+
+    int orig_len = BLCKSZ - hole_length;
This is as well incorrect in two places:
if(hole_length != 0)
There should be a space between the if and its condition in parentheses.
2) For correctness, with_hole should be set even for uncompressed pages.
I think that we should as well use it for sanity checks in xlogreader.c
when decoding records.
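The field split under discussion - 15 bits of length plus a 1-bit hole flag packed into the same uint16 - can be sketched in isolation. This is a hypothetical stand-in for the real XLogRecordBlockImageHeader, only showing the bit-field layout; the struct name is invented, and actual bit-field packing is compiler-dependent (gcc/clang pack this into two bytes):

```c
#include <stdint.h>

/*
 * Hypothetical sketch of the discussed header layout, not the actual
 * PostgreSQL definition.  15 bits carry the (possibly compressed)
 * image length; the 16th bit records whether the page had a hole.
 * When with_hole == 0 the raw length must be BLCKSZ, so the extra
 * CompressionInfo bytes can be omitted from the record.
 */
typedef struct BlockImageHeaderSketch
{
    uint16_t    length:15,      /* image length, up to 32767 bytes */
                with_hole:1;    /* does the page image have a hole? */
} BlockImageHeaderSketch;
```

The point of stealing the bit is exactly this: one flag distinguishes "length is self-describing (BLCKSZ)" from "length needs the extra header", at zero additional header bytes.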
Regards,
--
Michael
Re: [HACKERS] enabling nestedloop and disabling hashjon
Ravi Kiran writes:
> I tried modifying the postgresql.conf file where I set the value
> enable_hashjoin=off and also enable_mergejoin=off, so that I could force
> postgres to use nested loop.
> but postgres is still using the hash join algorithm even after modifying
> the postgresql code.

Did you remember "pg_ctl reload"? enable_hashjoin=off will most
certainly work if it's active.

			regards, tom lane
Re: [HACKERS] sloppy back-patching of column-privilege leak
Stephen Frost writes:
> * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
>> FWIW using different commit messages for different branches is a
>> mistake. The implicit policy we have is that there is one message,
>> identical for all branches, that describes how the patch differs in
>> each branch whenever necessary. Note that the git_changelog output that
>> Robert pasted is not verbatim from the tool; what it actually prints is
>> three separate entries, one for each different message you used, which
>> is not what is supposed to occur.

> Ok, thanks. That's certainly easy enough to do and I'll do so in the
> future. I could have sworn I'd seen cases where further clarification
> was done for branch-specific commits but perhaps something else was
> different there.

Some people do it differently when the per-branch commits are very much
different, but what Alvaro suggests is certainly handy when it comes
time to make release notes ;-).

>> I say this policy is implicit because I don't recall it being spelled
>> out anywhere, but since it's embodied in git_changelog's working
>> principle it's important we stick to it.

> I have to admit that I've never really used git_changelog.

It's pretty handy. The output for a couple of recent commits looks like

Author: Noah Misch
Branch: master [a7a4adcf8] 2015-02-06 23:14:27 -0500

    Assert(PqCommReadingMsg) in pq_peekbyte().

    Interrupting pq_recvbuf() can break protocol sync, so its callers
    all deserve this assertion.  The one pq_peekbyte() caller suffices
    already.

Author: Heikki Linnakangas
Branch: master [ff16b40f8] 2015-02-06 11:26:50 +0200
Branch: REL9_4_STABLE [3bc4c6942] 2015-02-06 11:27:12 +0200
Branch: REL9_3_STABLE [5f0ba4abb] 2015-02-06 11:32:16 +0200
Branch: REL9_2_STABLE [2af568c6b] 2015-02-06 11:32:37 +0200
Branch: REL9_1_STABLE [0d36d9f2b] 2015-02-06 11:32:42 +0200

    Report WAL flush, not insert, position in replication IDENTIFY_SYSTEM

    When beginning streaming replication, the client usually issues the
    IDENTIFY_SYSTEM command, which used to return the current WAL insert
    position. That's not suitable for the intended purpose of that
    field, however. pg_receivexlog uses it to start replication from
    the reported point, but if it hasn't been flushed to disk yet, it
    will fail. Change IDENTIFY_SYSTEM to report the flush position
    instead.

    Backpatch to 9.1 and above. 9.0 doesn't report any WAL position.

Heikki's five commits got merged into one entry because they had
identical log-message texts and were committed on the same day. Further
back in the output you find things like

Author: Peter Eisentraut
Branch: master [1332bbb30] 2015-02-01 22:36:44 -0500
Branch: REL9_4_STABLE Release: REL9_4_1 [0ca8cc581] 2015-02-01 22:40:13 -0500
Branch: REL9_3_STABLE Release: REL9_3_6 [6b9b705c9] 2015-02-01 22:40:25 -0500
Branch: REL9_2_STABLE Release: REL9_2_10 [040f5ef50] 2015-02-01 22:40:36 -0500
Branch: REL9_1_STABLE Release: REL9_1_15 [2b0d33599] 2015-02-01 22:40:45 -0500
Branch: REL9_0_STABLE Release: REL9_0_19 [b09ca8834] 2015-02-01 22:40:53 -0500

    doc: Improve claim about location of pg_service.conf

    The previous wording claimed that the file was always in /etc, but
    of course this varies with the installation layout. Write instead
    that it can be found via `pg_config --sysconfdir`. Even though this
    is still somewhat incorrect because it doesn't account for moved
    installations, it at least conveys that the location depends on the
    installation.

which show the first actual release incorporating each patch.
So even if you're not writing release notes, it's mighty handy for
identifying when/whether a given patch has shipped. I tend to run this
once a week or so and keep the output around in a text file for quick
searching.

			regards, tom lane
Re: [HACKERS] Odd behavior of updatable security barrier views on foreign tables
On 9 February 2015 at 21:17, Stephen Frost wrote:
>>> On Fri, Jan 30, 2015 at 5:20 AM, Etsuro Fujita wrote:
>>>> I noticed that when updating security barrier views on foreign tables,
>>>> we fail to give FOR UPDATE to selection queries issued at ForeignScan.
>
> I've looked into this a fair bit more over the weekend and the issue
> appears to be that the FDW isn't expecting a do-instead sub-query.
> I've been considering how we might be able to address that but haven't
> come up with any particularly great ideas and would welcome any
> suggestions. Simply having the FDW try to go up through the query would
> likely end up with too many queries showing up with 'for update'. We
> add the 'for update' to the sub-query before we even get called from
> the 'Modify' path too, which means we can't use that to realize when
> we're getting ready to modify rows and therefore need to lock them.
>
> In any case, I'll continue to look but would welcome any other thoughts.

Sorry, I didn't have time to look at this properly. My initial thought
is that expand_security_qual() needs to request a lock on rows coming
from the relation it pushes down into a subquery if that relation was
the result relation, because otherwise it won't have any locks, since
preprocess_rowmarks() only adds PlanRowMarks to non-target relations.

Of course that means that it may end up locking more rows than are
actually updated, but that's essentially the same as a SELECT FOR UPDATE
on a s.b. view right now.

Regards,
Dean
Re: [HACKERS] For cursors, there is FETCH and MOVE, why no TELL?
On 09.02.15 at 13:13, Hakan Kocaman wrote:
> Hi,
>
> 2015-02-09 10:37 GMT+01:00 Marc Balmer <m...@msys.ch>:
>
>> (I use cursors to display large datasets in a page-wise way, where the
>> user can move per-page, or, when displaying a single record, per record.
>> When the user goes back from per-record view to page-view, I have to
>> restore the cursor to the position it was on before the user changed to
>> per-record view.)
>
> On a totally unrelated note:
> http://use-the-index-luke.com/de/blog/2013-07/pagination-done-the-postgresql-way

yes, totally unrelated, indeed.
[HACKERS] enabling nestedloop and disabling hashjon
Hi,

I want to disable the hash join algorithm used by postgres by default,
and enable the nested loop join algorithm. Can someone tell me how to do
that?

I tried modifying the postgresql.conf file, where I set the values
enable_hashjoin=off and also enable_mergejoin=off, so that I could force
postgres to use a nested loop, but postgres is still using the hash join
algorithm even after modifying postgresql.conf.

Can someone tell me what I am doing wrong, or is there any other file I
need to modify to enable the nested loop join algorithm?

Thanks
[HACKERS] Corner case for add_path_precheck
While thinking about the add_path_precheck() function, it occurred to me
that it can discard some paths that otherwise would have a chance of
being accepted in this part of add_path():

    /*
     * Same pathkeys and outer rels, and fuzzily
     * the same cost, so keep just one; to decide
     * which, first check rows and then do a fuzzy
     * cost comparison with very small fuzz limit.
     * (We used to do an exact cost comparison,
     * but that results in annoying
     * platform-specific plan variations due to
     * roundoff in the cost estimates.)  If things
     * are still tied, arbitrarily keep only the
     * old path.  Notice that we will keep only
     * the old path even if the less-fuzzy
     * comparison decides the startup and total
     * costs compare differently.
     */
    if (new_path->rows < old_path->rows)
        remove_old = true;      /* new dominates old */
    else if (new_path->rows > old_path->rows)
        accept_new = false;     /* old dominates new */
    else if (compare_path_costs_fuzzily(new_path, ...

The special case is that the path passed to add_path_precheck() has
costs *equal to* those of the old_path. If pathkeys, outer rels and
costs are the same, as summarized in the comment above, I expect
add_path_precheck() to return false. Do I misread anything?

(Maybe the reason this does not happen too often is that
add_path_precheck() compares the costs exactly, as opposed to the "fuzzy
comparison"?)

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at
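The corner case hinges on the difference between an exact cost comparison and the fuzzy one the quoted comment describes. The following is a hypothetical sketch in the style of compare_path_costs_fuzzily(), not the actual PostgreSQL source; the 1.0000000001 factor stands in for the "very small fuzz limit" the comment mentions:

```c
#include <stdbool.h>

/* Hypothetical sketch, not actual PostgreSQL source. */

/*
 * cost1 is "clearly larger" than cost2 only when it exceeds it by
 * more than the fuzz factor (> 1.0); costs closer together than the
 * fuzz are treated as equal.
 */
static bool
cost_clearly_larger(double cost1, double cost2, double fuzz_factor)
{
    return cost1 > cost2 * fuzz_factor;
}

/*
 * Returns +1 if total1 is fuzzily larger, -1 if total2 is fuzzily
 * larger, and 0 for a fuzzy tie.  With exactly equal costs this
 * returns 0 - the tie that add_path() resolves by arbitrarily keeping
 * the old path, whereas an exact precheck comparison may already have
 * rejected the new path before this logic is ever reached.
 */
int
compare_costs_fuzzily(double total1, double total2)
{
    const double fuzz = 1.0000000001;   /* very small fuzz limit */

    if (cost_clearly_larger(total1, total2, fuzz))
        return +1;
    if (cost_clearly_larger(total2, total1, fuzz))
        return -1;
    return 0;
}
```

Costs that differ by less than the fuzz land in the tie branch, which is exactly where the rows check and the keep-the-old-path rule from the quoted comment take over.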
Re: [HACKERS] sloppy back-patching of column-privilege leak
Alvaro,

* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
> FWIW using different commit messages for different branches is a
> mistake. The implicit policy we have is that there is one message,
> identical for all branches, that describes how the patch differs in
> each branch whenever necessary. Note that the git_changelog output that
> Robert pasted is not verbatim from the tool; what it actually prints is
> three separate entries, one for each different message you used, which
> is not what is supposed to occur.

Ok, thanks. That's certainly easy enough to do and I'll do so in the
future. I could have sworn I'd seen cases where further clarification
was done for branch-specific commits but perhaps something else was
different there.

> I say this policy is implicit because I don't recall it being spelled
> out anywhere, but since it's embodied in git_changelog's working
> principle it's important we stick to it.

I have to admit that I've never really used git_changelog.

Thanks!

Stephen
Re: [HACKERS] Odd behavior of updatable security barrier views on foreign tables
* Stephen Frost (sfr...@snowman.net) wrote: > * Robert Haas (robertmh...@gmail.com) wrote: > > On Fri, Jan 30, 2015 at 5:20 AM, Etsuro Fujita > > wrote: > > > I noticed that when updating security barrier views on foreign tables, > > > we fail to give FOR UPDATE to selection queries issued at ForeignScan. > > > Here is an example. > [...] > > > postgres=# alter view rw_view set (security_barrier = true); > > > ALTER VIEW > > > postgres=# explain verbose delete from rw_view; > > > QUERY PLAN > > > -- > > > Delete on public.base_ftbl base_ftbl_1 (cost=100.00..144.54 rows=14 > > > width=6) > > >Remote SQL: DELETE FROM public.base_tbl WHERE ctid = $1 > > >-> Subquery Scan on base_ftbl (cost=100.00..144.54 rows=14 width=6) > > > Output: base_ftbl.ctid > > > -> Foreign Scan on public.base_ftbl base_ftbl_2 > > > (cost=100.00..144.40 rows=14 width=6) > > >Output: base_ftbl_2.ctid > > >Remote SQL: SELECT ctid FROM public.base_tbl WHERE > > > ((visibility = 'public'::text)) > > > (7 rows) > > > > > > Correct me if I am wrong. > > > > That looks like a bug to me. > > Agreed. I've been looking at this and I suspect it's related to the > discussion around prepsecurity.c and generating the security barrier > subquery that I've been having with Dean. An initial look, at least, > shows that GetForeignPlan is looking at the subquery instead of the base > relation (as it expects to be). > > I'll continue digging into it. I've looked into this a fair bit more over the weekend and the issue appears to be that the FDW isn't expecting a do-instead sub-query. I've been considering how we might be able to address that but haven't come up with any particularly great ideas and would welcome any suggestions. Simply having the FDW try to go up through the query would likely end up with too many queries showing up with 'for update'. 
We add the 'for update' to the sub-query before we even get called from the 'Modify' path too, which means we can't use that to realize when we're getting ready to modify rows and therefore need to lock them. In any case, I'll continue to look but would welcome any other thoughts. Thanks, Stephen
Re: [HACKERS] sloppy back-patching of column-privilege leak
Stephen Frost wrote: > > Besides the sloppiness of back-patching in > > that one macro and nothing else, this is a huge fraction of the patch > > that's just *gone* in the 9.0 and 9.1 branches, and there's not so > > much as a hint about it in the commit message, which is pretty > > astonishing to me. > > Right, 9.0 and 9.1 don't have as detailed messages and so there wasn't > as much to do in those older branches. I was well aware of that and I > could have sworn that I included something in the commit messages.. > Looks like I did, but not in a way which was as clear as it should have > been. Specifically, in those older branches, the commit message only > talks about the functions which exist in those branches- > BuildIndexValueDescription and ri_ReportViolation. The commit messages > for 9.2 and beyond also reference ExecBuildSlotValueDescription, which > is what you're getting at. > > In hindsight, I agree that doing just that wasn't sufficient and > thinking on it now I realize that, while having different commit > messages for each branch may make sense if you're only looking at that > branch, it ends up being confusing for folks following the overall > project as they likely look at just the subject of each commit and > expect the contents for every branch to be the same. To that point, > I'll try to be clearer when there's a difference in commit message for > each branch, or simply include everything for every branch in an > identical commit message across all of them. FWIW using different commit messages for different branches is a mistake. The implicit policy we have is that there is one message, identical for all branches, that describes how the patch differs in each branch whenever necessary. Note that the git_changelog output that Robert pasted is not verbatim from the tool; what it actually prints is three separate entries, one for each different message you used, which is not what is supposed to occur. 
I say this policy is implicit because I don't recall it being spelled out anywhere, but since it's embodied in git_changelog's working principle it's important we stick to it. -- Álvaro Herrera  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] more RLS oversights
Robert, * Robert Haas (robertmh...@gmail.com) wrote: > I happened to notice this morning while hacking that the > "hasRowSecurity" fields added to PlannerGlobal and PlannedStmt have > not been given proper nodefuncs.c support. Both need to be added to > outfuncs.c, and the latter to copyfuncs.c. The latter omission may > well be a security bug, although I haven't attempted to verify that, > but fortunately this isn't released yet. I saw this and will address it. Would be great if you wouldn't mind CC'ing me directly on anything RLS-related, same as you CC'd me on the column-privilege backpatch. I expect I'll probably notice anyway, but I'll see them faster when I'm CC'd. I agree that it's great that we're catching issues prior to when the feature is released and look forward to anything else you (or anyone else!) finds. Thanks! Stephen
Re: [HACKERS] sloppy back-patching of column-privilege leak
Robert, * Robert Haas (robertmh...@gmail.com) wrote: > In 9.2 and newer branches, this commit makes substantial changes to > execMain.c. In 9.1 and 9.0, though, the change to that file is just > this: > > --- a/src/backend/executor/execMain.c > +++ b/src/backend/executor/execMain.c > @@ -95,6 +95,15 @@ static void intorel_receive(TupleTableSlot *slot, > DestReceiver *self); > static void intorel_shutdown(DestReceiver *self); > static void intorel_destroy(DestReceiver *self); > > +/* > + * Note that this macro also exists in commands/trigger.c. There does not > + * appear to be any good header to put it into, given the structures that > + * it uses, so we let them be duplicated. Be sure to update both if one > needs > + * to be changed, however. > + */ > +#define GetModifiedColumns(relinfo, estate) \ > + (rt_fetch((relinfo)->ri_RangeTableIndex, > (estate)->es_range_table)->modifiedCols) > + > /* end of local decls */ > > We shouldn't be adding a macro to that file that isn't used anywhere. Ah, yes, agreed, that didn't need to be added in the older branches. > Either that file needs more substantial changes, or it doesn't need > this, either. What gives? It doesn't need it. The older branches don't produce as detailed errors and because of that there weren't any further changes to be made. Including the extraneous #define in those older branches was certainly a mistake and I'll correct it. > Besides the sloppiness of back-patching in > that one macro and nothing else, this is a huge fraction of the patch > that's just *gone* in the 9.0 and 9.1 branches, and there's not so > much as a hint about it in the commit message, which is pretty > astonishing to me. Right, 9.0 and 9.1 don't have as detailed messages and so there wasn't as much to do in those older branches. I was well aware of that and I could have sworn that I included something in the commit messages.. Looks like I did, but not in a way which was as clear as it should have been. 
Specifically, in those older branches, the commit message only talks about the functions which exist in those branches- BuildIndexValueDescription and ri_ReportViolation. The commit messages for 9.2 and beyond also reference ExecBuildSlotValueDescription, which is what you're getting at. In hindsight, I agree that doing just that wasn't sufficient and thinking on it now I realize that, while having different commit messages for each branch may make sense if you're only looking at that branch, it ends up being confusing for folks following the overall project as they likely look at just the subject of each commit and expect the contents for every branch to be the same. To that point, I'll try to be clearer when there's a difference in commit message for each branch, or simply include everything for every branch in an identical commit message across all of them. > Notice how the comments match the actual behavior. But in 9.0, you've > got the SAME COMMENTS with DIFFERENT BEHAVIOR: Good point. The comments clearly were for the master-based behavior but the older branches don't have the same detail and so they should have been updated. I knew that they were different, and understood and expected those differences, made the specific changes which were appropriate for each branch, but missed updating the comments. > The comments say that you'll see the relevant columns, but you don't. > Since when is it OK to check things into our regression tests where > the comments say one thing and the behavior is something else? I'm a bit confused by this- I certainly didn't intend this mistake to be implying that our policy on comments and the code they're associated with should be changed to allow that they don't match up, nor do I advocate such a position now. > I don't even know what to say about this. 
I cannot understand how you > can back-patch something like this and fail to notice that the > regression tests give different output on different branches, and that > that output is inconsistent with the comments. I was quite aware that the regression tests were different on different branches, as I knew (which the commit messages show) that only a subset of the patch was relevant for the older branches since they didn't include the detail where the security issue originated from. I certainly reviewed the regression tests but obviously missed that the comments ended up outdated due to the back-branch capabilities. Thanks for the review and comments. I'll remove the extraneous #define, the comment change which was made to the other #define (as it's not relevant since the #define is only located in one place) and fix the regression test comments to match the behavior in the older branches. Thanks again! Stephen
[HACKERS] GSoC 2015 - mentors, students and admins.
Hi all, Google Summer of Code 2015 is approaching. I'm intending on registering PostgreSQL again this year. Before I do that, I'd like to have an idea of how many people are interested in being either a student or a mentor. I've volunteered to be admin again, but if anyone else has a strong interest of seeing the projects through this year, please let yourself be known. Who would be willing to mentor projects this year? Project ideas are welcome, and I've copied many from last year's submissions into this year's wiki page. Please feel free to add more (or delete any that stand no chance or are redundant): https://wiki.postgresql.org/wiki/GSoC_2015 Students can find more information about GSoC here: http://www.postgresql.org/developer/summerofcode Thanks Thom
[HACKERS] sloppy back-patching of column-privilege leak
Branch: master [804b6b6db] 2015-01-28 12:31:30 -0500 Branch: REL9_4_STABLE Release: REL9_4_1 [3cc74a3d6] 2015-01-28 12:32:06 -0500 Branch: REL9_3_STABLE Release: REL9_3_6 [4b9874216] 2015-01-28 12:32:39 -0500 Branch: REL9_2_STABLE Release: REL9_2_10 [d49f84b08] 2015-01-28 12:32:56 -0500 Branch: REL9_1_STABLE Release: REL9_1_15 [9406884af] 2015-01-28 12:33:15 -0500 Branch: REL9_0_STABLE Release: REL9_0_19 [3a2063369] 2015-01-28 12:33:29 -0500 In 9.2 and newer branches, this commit makes substantial changes to execMain.c. In 9.1 and 9.0, though, the change to that file is just this: --- a/src/backend/executor/execMain.c +++ b/src/backend/executor/execMain.c @@ -95,6 +95,15 @@ static void intorel_receive(TupleTableSlot *slot, DestReceiver *self); static void intorel_shutdown(DestReceiver *self); static void intorel_destroy(DestReceiver *self); +/* + * Note that this macro also exists in commands/trigger.c. There does not + * appear to be any good header to put it into, given the structures that + * it uses, so we let them be duplicated. Be sure to update both if one needs + * to be changed, however. + */ +#define GetModifiedColumns(relinfo, estate) \ + (rt_fetch((relinfo)->ri_RangeTableIndex, (estate)->es_range_table)->modifiedCols) + /* end of local decls */ We shouldn't be adding a macro to that file that isn't used anywhere. Either that file needs more substantial changes, or it doesn't need this, either. What gives? Besides the sloppiness of back-patching in that one macro and nothing else, this is a huge fraction of the patch that's just *gone* in the 9.0 and 9.1 branches, and there's not so much as a hint about it in the commit message, which is pretty astonishing to me. What's even more troubling is the new regression tests. 
In master, this commit added this: +INSERT INTO t1 (c1, c2) VALUES (1, 1); -- fail, but row not shown +ERROR: duplicate key value violates unique constraint "t1_pkey" +UPDATE t1 SET c2 = 1; -- fail, but row not shown +ERROR: duplicate key value violates unique constraint "t1_pkey" +INSERT INTO t1 (c1, c2) VALUES (null, null); -- fail, but see columns being inserted +ERROR: null value in column "c1" violates not-null constraint +DETAIL: Failing row contains (c1, c2) = (null, null). +INSERT INTO t1 (c3) VALUES (null); -- fail, but see columns being inserted or have SELECT +ERROR: null value in column "c1" violates not-null constraint +DETAIL: Failing row contains (c1, c3) = (null, null). +INSERT INTO t1 (c1) VALUES (5); -- fail, but see columns being inserted or have SELECT +ERROR: null value in column "c2" violates not-null constraint +DETAIL: Failing row contains (c1) = (5). +UPDATE t1 SET c3 = 10; -- fail, but see columns with SELECT rights, or being modified +ERROR: new row for relation "t1" violates check constraint "t1_c3_check" +DETAIL: Failing row contains (c1, c3) = (1, 10). Notice how the comments match the actual behavior. 
But in 9.0, you've got the SAME COMMENTS with DIFFERENT BEHAVIOR: +INSERT INTO t1 (c1, c2) VALUES (1, 1); -- fail, but row not shown +ERROR: duplicate key value violates unique constraint "t1_pkey" +UPDATE t1 SET c2 = 1; -- fail, but row not shown +ERROR: duplicate key value violates unique constraint "t1_pkey" +INSERT INTO t1 (c1, c2) VALUES (null, null); -- fail, but see columns being inserted +ERROR: null value in column "c1" violates not-null constraint +INSERT INTO t1 (c3) VALUES (null); -- fail, but see columns being inserted or have SELECT +ERROR: null value in column "c1" violates not-null constraint +INSERT INTO t1 (c1) VALUES (5); -- fail, but see columns being inserted or have SELECT +ERROR: null value in column "c2" violates not-null constraint +UPDATE t1 SET c3 = 10; -- fail, but see columns with SELECT rights, or being modified +ERROR: new row for relation "t1" violates check constraint "t1_c3_check" The comments say that you'll see the relevant columns, but you don't. Since when is it OK to check things into our regression tests where the comments say one thing and the behavior is something else? I don't even know what to say about this. I cannot understand how you can back-patch something like this and fail to notice that the regression tests give different output on different branches, and that that output is inconsistent with the comments. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] libpq's multi-threaded SSL callback handling is busted
I did some more digging on bug http://www.postgresql.org/message-id/cahul3dpwyfnugdgo95ohydq4kugdnbkptjq0mnbtubhcmg4...@mail.gmail.com which describes a deadlock when using libpq with SSL in a multi-threaded environment with other threads doing SSL independently. Attached is a reproducing Python script that in my experience is faster at showing the problem. Run it with: python -u pg_lock.py

As Heikki correctly diagnosed, the problem is with libpq unsetting the OpenSSL locking callback while another thread is holding one of the locks. The other thread then never releases the lock and the next time anyone tries to take it, the application deadlocks. The exact way it goes down is like this:

    T1 (libpq)                        T2 (Python)
    start libpq connection
      init ssl system
      add locking callback
                                      start ssl operation
                                        take lock
    finish libpq connection
      destroy ssl system
      remove locking callback (!)
                                      release lock, noop since no callback

And the next time any thread tries to take the lock, it deadlocks. We added unsetting the locking callback in 4e816286533dd34c10b368487d4079595a3e1418 due to this bug report: http://www.postgresql.org/message-id/48620925.6070...@pws.com.au Indeed, commenting out the CRYPTO_set_locking_callback(NULL) call in fe-secure-openssl.c gets rid of the deadlock. However, it makes php segfault with the (attached) reproduction script from the original 2008 bug report. If your php.ini loads both the pgsql and curl extensions, reproduce the segfault with: php -f pg_segfault.php

The most difficult part about fixing this bug is to determine *who's at fault*. I now lean towards the opinion that we shouldn't be messing with OpenSSL callbacks *at all*. First of all, the current behaviour is crazy. We're setting and unsetting the locking callback every time a connection is made/closed, which is not how OpenSSL is supposed to be used. The *application* using libpq should set a callback before it starts threads; it's no business of the library's. 
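The interleaving above can be simulated without OpenSSL at all: an ordinary lock table plus a removable callback reproduces exactly the "release becomes a no-op" failure. This is a hedged sketch — the names mimic the shape of OpenSSL's pre-1.1.0 locking-callback contract but are purely illustrative, and a timeout stands in for the real deadlock:

```python
import threading

# Simulation of the T1/T2 interleaving: a lock table guarded by a
# removable callback, standing in for OpenSSL's CRYPTO lock array.
# All names here are illustrative, not the actual OpenSSL API.

locks = [threading.Lock()]      # the library keeps an array of locks
locking_callback = None         # installed/removed by the "library user"

def set_locking_callback(cb):
    global locking_callback
    locking_callback = cb

def crypto_lock(n, acquire):
    # the library only locks/unlocks through the callback, if one is set
    if locking_callback is not None:
        locking_callback(n, acquire)

def default_callback(n, acquire):
    if acquire:
        locks[n].acquire()
    else:
        locks[n].release()

# T1 (libpq): connection start installs the callback
set_locking_callback(default_callback)
# T2: mid-way through an SSL operation, takes lock 0 via the callback
crypto_lock(0, True)
# T1 (libpq): connection teardown removes the callback ...
set_locking_callback(None)
# T2: ... so this release is a silent no-op and lock 0 stays held
crypto_lock(0, False)

# The next attempt to take lock 0 would block forever; a timeout shows it.
assert locks[0].acquire(timeout=0.1) is False   # lock is stuck held
```

This mirrors the failure mode: nothing crashes at the moment the callback is removed; the damage only surfaces on the *next* lock acquisition, which is what makes the bug so hard to attribute.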
The old behaviour was slightly less insane (set callbacks first time we're engaging OpenSSL code, never touch them again). The real sane solution is to leave it up to the application. I posit we should remove all CRYPTO_set_*_callback functions and associated cruft from libpq. This unfortunately means that multi-threaded applications using libpq and SSL will break if they haven't been setting their own callbacks (if they have, well, tough luck! libpq will just stomp over them the first time it connects to Postgres, but at least *some* callbacks are left present after that). However, AFAICS if your app is not in C, then runtimes already handle that for you (as they should). Python: https://hg.python.org/cpython/file/dc820b44ce21/Modules/_ssl.c#l4284 PHP: https://github.com/php/php-src/blob/master/ext/curl/interface.c#L1235 Note that the PHP pgsql extension doesn't set the OpenSSL callbacks, because libpq was setting them on its own. If we remove the callback handling from libpq, PHP will need to add them. By the way, the MySQL extension for PHP also does not set those callbacks. Let me reiterate: I now believe the callbacks should be set by the application; libraries should not touch them, since they don't know what they will be stomping on. If the application is run through a VM like Python or PHP, it's the VM that should make sure the callbacks are set. I could submit a patch to get rid of the crazy CRYPTO_*_callback dance in libpq, but at the very least this will require a warning in the release notes about how you can't assume that libpq will take care of making sure your app is multi-threaded safe when using OpenSSL. I also don't know how far that's back-patchable. I would very much like to have this change back-patched, since setting and resetting the callback makes using libpq in a threaded OpenSSL-enabled app arguably less safe than if it didn't use any locking. If the app is written correctly, it will have set locking callbacks before starting. 
Then libpq will happily stomp on them. If the app hasn't set callbacks, it wasn't written correctly in the first place and it will get segfaults instead of deadlocks. Thanks, Jan

#!/usr/bin/env python
import sys
import time
import threading
import urllib

import psycopg2

DB_HOST = '127.0.0.1'
DB_USER = 'postgres'
DB_NAME = 'postgres'
HTTPS_URL = 'https://google.com'
NUM_HTTPS_THREADS = 10

def connect():
    conn = psycopg2.connect(
        host=DB_HOST, user=DB_USER, database=DB_NAME, sslmode='require')
    return conn

class Worker(threading.Thread):
    def __init__(self):
        super(Worker, self).__init__()
        self._stop = threading.Event()

    def stop(self):
        self._stop.set()

    def stopped(self):
        return self._stop.isSet()

    def run(self):
        i = 0
        while not self.stopped():
            i += 1
            self.do_work(i)

class Libpq(Worke
Re: [HACKERS] RangeType internal use
On Mon, Feb 9, 2015 at 12:37 PM, Tom Lane wrote: > Robert Haas writes: >> On Mon, Feb 9, 2015 at 10:36 AM, Tom Lane wrote: >>> It's going to be complicated and probably buggy, and I think it is heading >>> in the wrong direction altogether. If you want to partition in some >>> arbitrary complicated fashion that the system can't reason about very >>> effectively, we *already have that*. IMO the entire point of building >>> a new partitioning infrastructure is to build something simple, reliable, >>> and a whole lot faster than what you can get from inheritance >>> relationships. And "faster" is going to come mainly from making the >>> partitioning rules as simple as possible, not as complex as possible. > >> Yeah, but people expect to be able to partition on ranges that are not >> all of equal width. I think any proposal that we shouldn't support >> that is the kiss of death for a feature like this - it will be so >> restricted as to eliminate 75% of the use cases. > > Well, that's debatable IMO (especially your claim that variable-size > partitions would be needed by a majority of users). But in any case, > partitioning behavior that is emergent from a bunch of independent pieces > of information scattered among N tables seems absolutely untenable from > where I sit. Whatever we support, the behavior needs to be described by > *one* chunk of information --- a sorted list of bin bounding values, > perhaps. That's exactly the representation I had in mind. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
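The "sorted list of bin bounding values" representation both Tom and Robert converge on admits a simple log-N routing step. A minimal sketch of that idea (illustrative only — this is not PostgreSQL code, and the boundary convention chosen here, lower bound inclusive, is an assumption):

```python
from bisect import bisect_right

# Hedged sketch: route a row to a partition by binary search over one
# sorted list of bin bounding values, as discussed in the thread above.

bounds = [10, 20, 30]   # partitions: (-inf,10), [10,20), [20,30), [30,+inf)

def route(key):
    """Return the partition index for key in O(log N)."""
    return bisect_right(bounds, key)

assert route(5) == 0     # below the first bound
assert route(10) == 1    # lower bound is inclusive
assert route(25) == 2
assert route(99) == 3    # above the last bound
```

Because the whole partitioning rule is one chunk of information (the sorted list), there is nothing emergent from N per-table CHECK constraints, and overlap/gap-freedom holds by construction.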
Re: [HACKERS] RangeType internal use
Robert Haas writes: > On Mon, Feb 9, 2015 at 10:36 AM, Tom Lane wrote: >> It's going to be complicated and probably buggy, and I think it is heading >> in the wrong direction altogether. If you want to partition in some >> arbitrary complicated fashion that the system can't reason about very >> effectively, we *already have that*. IMO the entire point of building >> a new partitioning infrastructure is to build something simple, reliable, >> and a whole lot faster than what you can get from inheritance >> relationships. And "faster" is going to come mainly from making the >> partitioning rules as simple as possible, not as complex as possible. > Yeah, but people expect to be able to partition on ranges that are not > all of equal width. I think any proposal that we shouldn't support > that is the kiss of death for a feature like this - it will be so > restricted as to eliminate 75% of the use cases. Well, that's debatable IMO (especially your claim that variable-size partitions would be needed by a majority of users). But in any case, partitioning behavior that is emergent from a bunch of independent pieces of information scattered among N tables seems absolutely untenable from where I sit. Whatever we support, the behavior needs to be described by *one* chunk of information --- a sorted list of bin bounding values, perhaps. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RangeType internal use
On Mon, Feb 9, 2015 at 10:36 AM, Tom Lane wrote: > It's going to be complicated and probably buggy, and I think it is heading > in the wrong direction altogether. If you want to partition in some > arbitrary complicated fashion that the system can't reason about very > effectively, we *already have that*. IMO the entire point of building > a new partitioning infrastructure is to build something simple, reliable, > and a whole lot faster than what you can get from inheritance > relationships. And "faster" is going to come mainly from making the > partitioning rules as simple as possible, not as complex as possible. Yeah, but people expect to be able to partition on ranges that are not all of equal width. I think any proposal that we shouldn't support that is the kiss of death for a feature like this - it will be so restricted as to eliminate 75% of the use cases. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New CF app deployment
On Mon, Feb 9, 2015 at 5:38 AM, Magnus Hagander wrote: > On Mon, Feb 9, 2015 at 11:09 AM, Marco Nenciarini > wrote: >> >> Il 08/02/15 17:04, Magnus Hagander ha scritto: >> > >> > Filenames are now shown for attachments, including a direct link to the >> > attachment itself. I've also run a job to populate all old threads. >> > >> >> I wonder what is the algorithm to detect when an attachment is a patch. >> >> If you look at https://commitfest.postgresql.org/4/94/ all the >> attachments are marked as "Patch: no", but many of them are >> clearly a patch. > > It uses the "magic" module, same as the "file" command. And that one claims: > > mha@mha-laptop:/tmp$ file 0003-File-based-incremental-backup-v9.patch > 0003-File-based-incremental-backup-v9.patch: ASCII English text, with very > long lines > > I think it doesn't consider it a patch because it's not actually a patch - > it looks like a git-format actual email message that *contains* a patch. It > even includes the unix From separator line. So if anything it should have > detected that it's an email message, which it apparently doesn't. > > Picking from the very top patch on the cf, an actual patch looks like this: > > mha@mha-laptop:/tmp$ file psql_fix_uri_service_004.patch > psql_fix_uri_service_004.patch: unified diff output, ASCII text, with very > long lines Can we make it smarter, so that the kinds of things people produce intending for them to be patches are thought by the CF app to be patches? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
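The detection gap Magnus describes — a git-format-patch attachment is really an email message *containing* a unified diff, so libmagic classifies it as plain text — could be closed with a cheap content heuristic layered on top of the magic check. The heuristic below is my own illustration, not the commitfest app's actual code:

```python
# Hedged sketch: a "does this attachment contain a patch?" heuristic that
# also accepts git format-patch emails, which libmagic reports as plain
# text rather than "unified diff output". Illustrative code only.

def looks_like_patch(text):
    """True if the text contains unified-diff file headers anywhere."""
    lines = text.splitlines()
    has_old = any(l.startswith('--- ') for l in lines)
    has_new = any(l.startswith('+++ ') for l in lines)
    return has_old and has_new

unified_diff = "--- a/foo.c\n+++ b/foo.c\n@@ -1 +1 @@\n-old\n+new\n"
format_patch = ("From 0123abc Mon Sep 17 00:00:00 2001\n"
                "Subject: [PATCH] fix\n\n"
                + unified_diff)
assert looks_like_patch(unified_diff)
assert looks_like_patch(format_patch)   # email wrapper no longer hides it
assert not looks_like_patch("just some prose\n")
```

Scanning for diff headers inside the body sidesteps the whole-file classification problem, at the cost of occasional false positives on mails that merely quote a diff.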
Re: [HACKERS] RangeType internal use
Amit Langote writes: > Okay, let me back up a little and think about your suggestion which I do > not seem to understand very well - it raises a few questions for me: > does this mean a partitioning criteria is associated with parent > (partitioned table) rather than each individual partition? Absolutely. Anything else is not scalable; it's just another flavor of the inheritance + CHECK constraint mechanism. The entire point of doing a new partitioning design IMO is to get away from that. It should be possible to determine which partition a row belongs to in O(1) time, not O(N). > I would guess > that bin width is partition interval such that each bin number gives > partition number (of equal-sized consecutively numbered partitions > without gaps). But I don't quite understand what origin point is? Is > that a key literal value from which to begin counting bins and if so, is > it stored in catalog as part of the partitioning rule? Yeah, I would think so. regards, tom lane
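The origin-plus-bin-width scheme Tom confirms here gives O(1) routing by pure arithmetic: two catalog values describe every partition. A minimal sketch, with all names assumed for illustration:

```python
# Hedged sketch: O(1) tuple routing for equal-sized, consecutively
# numbered partitions described by just an origin and a bin width.
# Names and conventions are illustrative, not PostgreSQL code.

def bin_for_key(key, origin, width):
    """Bin number for key; bin n covers [origin + n*width, origin + (n+1)*width)."""
    return (key - origin) // width

# e.g. bins of width 100 counted from origin 0
assert bin_for_key(0, 0, 100) == 0
assert bin_for_key(99, 0, 100) == 0
assert bin_for_key(100, 0, 100) == 1   # lower bound of the next bin
assert bin_for_key(250, 0, 100) == 2
```

This is the O(1) alternative to the log-N binary search over explicit bounds: no per-partition metadata is consulted at all, which is exactly why it only works for equal-width partitions.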
Re: [HACKERS] RangeType internal use
Heikki Linnakangas writes: > On 02/09/2015 03:21 AM, Tom Lane wrote: >> Meh. I don't care for that much --- it sounds a lot like deciding that >> your problem is a nail because there is a hammer within reach. A random >> collection of ranges doesn't seem like a very appropriate representation >> to me; first because there is no simple way to enforce that it partitions >> the key space (no overlaps and no missing portions), and second because it >> provides little purchase for efficient tuple routing algorithms. The best >> you could possibly hope for is some log-N tree search mechanism, and that >> would require a fair amount of setup beforehand. > Building a tree or a sorted array of the min or max bounds of the ranges > doesn't sound hard. log-N sounds fast enough. It's going to be complicated and probably buggy, and I think it is heading in the wrong direction altogether. If you want to partition in some arbitrary complicated fashion that the system can't reason about very effectively, we *already have that*. IMO the entire point of building a new partitioning infrastructure is to build something simple, reliable, and a whole lot faster than what you can get from inheritance relationships. And "faster" is going to come mainly from making the partitioning rules as simple as possible, not as complex as possible. Just to start with: one of the absolutely fundamental things we need out of partitioning is the ability to have uniqueness constraints that span a partitioned table set. That means the partitions have to be disjoint, and it has to be not possible to break that. The design proposed here won't enforce that without complicated (and again, possibly buggy) logic. In short, this design requires a whole lot of extra mechanism to get to places that we have to get to, and I don't believe that that extra complexity is going to buy anything useful at all. 
regards, tom lane
Re: [HACKERS] For cursors, there is FETCH and MOVE, why no TELL?
Hi, 2015-02-09 10:37 GMT+01:00 Marc Balmer : > > (I use cursors to display large datasets in a page-wise way, where the > user can move per-page, or, when displaying a single record, per record. > When the user goes back from per-record view to page-view, I have to > restore the cursor to the position it was on before the user changed to > per-record view.) > > > On a totally unrelated note: http://use-the-index-luke.com/de/blog/2013-07/pagination-done-the-postgresql-way kind regards hakm kocaman
Re: [HACKERS] Parallel Seq Scan
On Mon, Feb 9, 2015 at 2:31 AM, Amit Kapila wrote: > Another idea is to use Executor level interfaces (like ExecutorStart(), > ExecutorRun(), ExecutorEnd()) for execution rather than using Portal > level interfaces. I have used Portal level interfaces with the > thought that we can reuse the existing infrastructure of Portal to > make parallel execution of scrollable cursors, but as per my analysis > it is not so easy to support them especially backward scan, absolute/ > relative fetch, etc, so Executor level interfaces seems more appealing > to me (something like how Explain statement works (ExplainOnePlan)). > Using Executor level interfaces will have advantage that we can reuse them > for other parallel functionalaties. In this approach, we need to take > care of constructing relavant structures (with the information passed by > master backend) required for Executor interfaces, but I think these should > be lesser than what we need in previous approach (extract seqscan specific > stuff from executor). I think using the executor-level interfaces instead of the portal-level interfaces is a good idea. That would possibly let us altogether prohibit access to the portal layer from within a parallel worker, which seems like it might be a good sanity check to add. But that seems to still require us to have a PlannedStmt and a QueryDesc, and I'm not sure whether that's going to be too much of a pain. We might need to think about an alternative API for starting the Executor like ExecutorStartParallel() or ExecutorStartExtended(). But I'm not sure. If you can revise things to go through the executor interfaces I think that would be a good start, and then perhaps after that we can see what else makes sense to do. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
Hello, >> Do we always need extra two bytes for compressed backup block? >> ISTM that extra bytes are not necessary when the hole length is zero. >> In this case the length of the original backup block (i.e., >> uncompressed) must be BLCKSZ, so we don't need to save the original >> size in the extra bytes. >Yes, we would need an additional bit to identify that. We could steal it from >length in XLogRecordBlockImageHeader. This is implemented in the attached patch by dividing the length field as follows: uint16 length:15, with_hole:1; >"2" should be replaced with the macro variable indicating the size of >extra header for compressed backup block. The macro SizeOfXLogRecordBlockImageCompressionInfo is used instead of 2. >You can refactor XLogCompressBackupBlock() and move all the >above code to it for more simplicity This is also implemented in the patch attached. Thank you, Rahila Syed -----Original Message----- From: Michael Paquier [mailto:michael.paqu...@gmail.com] Sent: Friday, February 06, 2015 6:00 PM To: Fujii Masao Cc: Syed, Rahila; PostgreSQL mailing lists Subject: Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes On Fri, Feb 6, 2015 at 3:03 PM, Fujii Masao wrote: > Do we always need extra two bytes for compressed backup block? > ISTM that extra bytes are not necessary when the hole length is zero. > In this case the length of the original backup block (i.e., > uncompressed) must be BLCKSZ, so we don't need to save the original > size in the extra bytes. Yes, we would need an additional bit to identify that. We could steal it from length in XLogRecordBlockImageHeader. > Furthermore, when fpw compression is disabled and the hole length is > zero, we seem to be able to save one byte from the header of backup > block. Currently we use 4 bytes for the header, 2 bytes for the length > of backup block, 15 bits for the hole offset and 1 bit for the flag > indicating whether block is compressed or not. 
But in that case, the > length of backup block doesn't need to be stored because it must be > BLCKSZ. Shouldn't we optimize the header in this way? Thought? If we do it, that's something to tackle even before this patch on HEAD, because you could use the 16th bit of the first 2 bytes of XLogRecordBlockImageHeader to do necessary sanity checks, to actually not reduce record by 1 byte, but 2 bytes as hole-related data is not necessary. I imagine that a patch optimizing that wouldn't be that hard to write as well. > +int page_len = BLCKSZ - hole_length; > +char *scratch_buf; > +if (hole_length != 0) > +{ > +scratch_buf = compression_scratch; > +memcpy(scratch_buf, page, hole_offset); > +memcpy(scratch_buf + hole_offset, > + page + (hole_offset + hole_length), > + BLCKSZ - (hole_length + hole_offset)); > +} > +else > +scratch_buf = page; > + > +/* Perform compression of block */ > +if (XLogCompressBackupBlock(scratch_buf, > +page_len, > +regbuf->compressed_page, > +&compress_len)) > +{ > +/* compression is done, add record */ > +is_compressed = true; > +} > > You can refactor XLogCompressBackupBlock() and move all the above code > to it for more simplicity. Sure. -- Michael Attachment: Support-compression-for-full-page-writes-in-WAL_v17.patch
Re: [HACKERS] What exactly is our CRC algorithm?
At 2015-02-09 12:52:41 +0200, hlinnakan...@vmware.com wrote: > > Ok, I've committed a patch that just moves the existing code to > common/pg_crc.c […] Thanks, looks good. > Attached is a rebased version of the slicing-by-8 patch. Looks OK too. > Do you have access to big-endian hardware to test this on? Yes, I tested it on a Linux/ppc system. I wasn't speculating when I said it "does make a noticeable difference", though I'm afraid I did not keep the timings after submitting the revised patch. The speedup was some double-digit percentage, IIRC. -- Abhijit
Re: [HACKERS] The return value of allocate_recordbuf()
On Mon, Feb 9, 2015 at 7:58 PM, Fujii Masao wrote: > MemoryContextAllocExtended() was added, so isn't it time to replace palloc() > with MemoryContextAllocExtended(CurrentMemoryContext, MCXT_ALLOC_NO_OOM) > in allocate_recordbuf()? Yeah, let's move on here, but not with this patch, I'm afraid: it breaks any client utility using xlogreader.c in frontend code, like pg_xlogdump or pg_rewind, because MemoryContextAllocExtended() is available only in the backend, not in the frontend. We are going to need something like palloc_noerror, defined on both the backend (mcxt.c) and frontend (fe_memutils.c) sides. Btw, the MCXT_ALLOC_HUGE flag should be used as well, since palloc only allows allocations up to 1GB, and that is incompatible with ~9.4 where malloc is used. -- Michael
Re: [HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code
On Sun, Feb 8, 2015 at 2:54 PM, Michael Paquier wrote: > On Fri, Feb 6, 2015 at 4:58 PM, Fujii Masao wrote: >> - * Wait for more WAL to arrive. Time out after 5 >> seconds, >> - * like when polling the archive, to react to a trigger >> - * file promptly. >> + * Wait for more WAL to arrive. Time out after >> the amount of >> + * time specified by wal_retrieve_retry_interval, like >> + * when polling the archive, to react to a >> trigger file promptly. >> */ >> WaitLatch(&XLogCtl->recoveryWakeupLatch, >>WL_LATCH_SET | WL_TIMEOUT, >> - 5000L); >> + wal_retrieve_retry_interval * 1000L); >> >> This change can prevent the startup process from reacting to >> a trigger file. Imagine the case where the large interval is set >> and the user want to promote the standby by using the trigger file >> instead of pg_ctl promote. I think that the sleep time should be 5s >> if the interval is set to more than 5s. Thought? > > I disagree here. It is interesting to accelerate the check of WAL > availability from a source in some cases for replication, but the > opposite is true as well as mentioned by Alexey at the beginning of > the thread to reduce the number of requests when requesting WAL > archives from an external storage type AWS. Hence a correct solution > would be to check periodically for the trigger file with a maximum > one-time wait of 5s to ensure backward-compatible behavior. We could > reduce it to 1s or something like that as well. You seem to have misunderstood the code in question. Or I'm missing something. The timeout of the WaitLatch is just the interval to check for the trigger file while waiting for more WAL to arrive from streaming replication. Not related to the retry time to restore WAL from the archive. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] The return value of allocate_recordbuf()
On Mon, Jan 5, 2015 at 1:25 PM, Michael Paquier wrote: > On Thu, Jan 1, 2015 at 1:10 AM, Robert Haas wrote: >> On Mon, Dec 29, 2014 at 6:14 AM, Heikki Linnakangas >> wrote: >>> Hmm. There is no way to check beforehand if a palloc() will fail because of >>> OOM. We could check for MaxAllocSize, though. >> >> I think we need a version of palloc that returns NULL instead of >> throwing an error. The error-throwing behavior is for the best in >> almost every case, but I think the no-error version would find enough >> users to be worthwhile. > Compression is one of those areas, be it compression of WAL or another > type. The new API would allow to fallback to the non-compression code > path if buffer allocation for compression cannot be done because of an > OOM. > > FWIW, I actually looked at how to do that a couple of weeks back, and > you just need a wrapper function, whose content is the existing > AllocSetAlloc, taking an additional boolean flag to trigger an ERROR > or leave with NULL if an OOM appears. On top of that we will need a > new method in MemoryContextMethods, let's call it alloc_safe, for its > equivalent, the new palloc_safe. MemoryContextAllocExtended() was added, so isn't it time to replace palloc() with MemoryContextAllocExtended(CurrentMemoryContext, MCXT_ALLOC_NO_OOM) in allocate_recordbuf()? Regards, -- Fujii Masao *** a/src/backend/access/transam/xlogreader.c --- b/src/backend/access/transam/xlogreader.c *** *** 149,155 allocate_recordbuf(XLogReaderState *state, uint32 reclength) if (state->readRecordBuf) pfree(state->readRecordBuf); ! state->readRecordBuf = (char *) palloc(newSize); state->readRecordBufSize = newSize; return true; } --- 149,163 if (state->readRecordBuf) pfree(state->readRecordBuf); ! state->readRecordBuf = ! (char *) MemoryContextAllocExtended(CurrentMemoryContext, ! newSize, ! MCXT_ALLOC_NO_OOM); ! if (state->readRecordBuf == NULL) ! { ! state->readRecordBufSize = 0; ! return false; ! 
} state->readRecordBufSize = newSize; return true; }
Re: [HACKERS] For cursors, there is FETCH and MOVE, why no TELL?
On 09.02.15 11:47, Marc Balmer wrote: > > > On 09.02.15 10:46, Heikki Linnakangas wrote: >> [...] >> You could fairly easily write an extension to do that, btw. A C function >> could call GetPortalByName() and peek into the PortalData.portalPos field. >> > > Would > > PGresult *PQdescribePortal(PGconn *conn, const char *portalName); > > from libpq also provide this information? Should there be a way to run > PQdescribePortal() from psql (e.g. \dP)? ... and a quickly hacked test program shows that it does *not* return the cursor position.
Re: [HACKERS] What exactly is our CRC algorithm?
On 02/08/2015 08:33 PM, Abhijit Menon-Sen wrote: At 2015-02-08 18:46:30 +0200, hlinnakan...@vmware.com wrote: So I propose to move pg_crc.c to src/common, and move the tables from pg_crc_tables.h directly to pg_crc.c. Thoughts? Sounds fine to me. Ok, I've committed a patch that just moves the existing code to common/pg_crc.c. I also moved pg_crc.h from include/utils to include/common. That'll need any external programs to change their #include accordingly, but I think it was worth that. include/common is clearly the correct home for that file, and the only reason to keep it in include/utils would've been for backwards-compatibility. Attached is a rebased version of the slicing-by-8 patch. I've made some cosmetic changes. Most notably, I turned the bswap32() function into a macro. Better to avoid the overhead of a function call, and it also avoids building the function altogether on little-endian systems that don't need it. I'll continue reviewing this. At 2015-01-02 16:46:29 +0200, hlinnakan...@vmware.com wrote: Would it even make sense to keep the crc variable in different byte order, and only do the byte-swap once in END_CRC32() ? ...this certainly does make a noticeable difference. Will investigate. Do you have access to big-endian hardware to test this on? It seems like an obvious optimization to shave off that one instruction from the hot loop, but if it turns out not to make any measurable difference, I'd prefer to keep the tables in the same order on big and little endian systems, reducing the number of #ifdefs needed. I tested this on my laptop by adding a BSWAP32() into the hot loop - which is bogus on a little endian Intel system - and it seems to make about 5% difference in a quick micro-benchmark. But it would be nice to get some numbers from the kind of big endian systems that people run in the real world. 
- Heikki diff --git a/config/c-compiler.m4 b/config/c-compiler.m4 index 90b56e7..509f961 100644 --- a/config/c-compiler.m4 +++ b/config/c-compiler.m4 @@ -193,6 +193,23 @@ fi])# PGAC_C_TYPES_COMPATIBLE +# PGAC_C_BUILTIN_BSWAP32 +# - +# Check if the C compiler understands __builtin_bswap32(), +# and define HAVE__BUILTIN_BSWAP32 if so. +AC_DEFUN([PGAC_C_BUILTIN_BSWAP32], +[AC_CACHE_CHECK(for __builtin_bswap32, pgac_cv__builtin_bswap32, +[AC_TRY_COMPILE([static unsigned long int x = __builtin_bswap32(0xaabbccdd);], +[], +[pgac_cv__builtin_bswap32=yes], +[pgac_cv__builtin_bswap32=no])]) +if test x"$pgac_cv__builtin_bswap32" = xyes ; then +AC_DEFINE(HAVE__BUILTIN_BSWAP32, 1, + [Define to 1 if your compiler understands __builtin_bswap32.]) +fi])# PGAC_C_BUILTIN_BSWAP32 + + + # PGAC_C_BUILTIN_CONSTANT_P # - # Check if the C compiler understands __builtin_constant_p(), diff --git a/configure b/configure index 8490eb7..fa271fe 100755 --- a/configure +++ b/configure @@ -10333,6 +10333,36 @@ if test x"$pgac_cv__types_compatible" = xyes ; then $as_echo "#define HAVE__BUILTIN_TYPES_COMPATIBLE_P 1" >>confdefs.h fi +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for __builtin_bswap32" >&5 +$as_echo_n "checking for __builtin_bswap32... " >&6; } +if ${pgac_cv__builtin_bswap32+:} false; then : + $as_echo_n "(cached) " >&6 +else + cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. 
*/ +static unsigned long int x = __builtin_bswap32(0xaabbccdd); +int +main () +{ + + ; + return 0; +} +_ACEOF +if ac_fn_c_try_compile "$LINENO"; then : + pgac_cv__builtin_bswap32=yes +else + pgac_cv__builtin_bswap32=no +fi +rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext +fi +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv__builtin_bswap32" >&5 +$as_echo "$pgac_cv__builtin_bswap32" >&6; } +if test x"$pgac_cv__builtin_bswap32" = xyes ; then + +$as_echo "#define HAVE__BUILTIN_BSWAP32 1" >>confdefs.h + +fi { $as_echo "$as_me:${as_lineno-$LINENO}: checking for __builtin_constant_p" >&5 $as_echo_n "checking for __builtin_constant_p... " >&6; } if ${pgac_cv__builtin_constant_p+:} false; then : diff --git a/configure.in b/configure.in index b4bd09e..e6a49d1 100644 --- a/configure.in +++ b/configure.in @@ -1185,6 +1185,7 @@ PGAC_C_SIGNED PGAC_C_FUNCNAME_SUPPORT PGAC_C_STATIC_ASSERT PGAC_C_TYPES_COMPATIBLE +PGAC_C_BUILTIN_BSWAP32 PGAC_C_BUILTIN_CONSTANT_P PGAC_C_BUILTIN_UNREACHABLE PGAC_C_VA_ARGS diff --git a/src/common/pg_crc.c b/src/common/pg_crc.c index faf5a66..281707f 100644 --- a/src/common/pg_crc.c +++ b/src/common/pg_crc.c @@ -19,76 +19,1156 @@ #include "c.h" +#include "common/pg_crc.h" + +#ifdef WORDS_BIGENDIAN +#define CRC8(x) pg_crc32c_table[0][((crc >> 24) ^ (x)) & 0xFF] ^ (crc << 8) +#else +#define CRC8(x) pg_crc32c_table[0][(crc ^ (x)) & 0xFF] ^ (crc >> 8) +#endif + +/* + * This function computes a CRC using the slicing-by-8 algorithm, which + * uses an 8*256 lookup table to operate on eight bytes in parallel and + * recombine the results. + * + * Michael E. Kounavi
Re: [HACKERS] Parallel Seq Scan
On Sun, Feb 8, 2015 at 11:03 PM, Robert Haas wrote: > > On Sat, Feb 7, 2015 at 10:36 PM, Amit Kapila wrote: > >> Well, I agree with you, but I'm not really sure what that has to do > >> with the issue at hand. I mean, if we were to apply Amit's patch, > >> we'd been in a situation where, for a non-parallel heap scan, heapam.c > >> decides the order in which blocks get scanned, but for a parallel heap > >> scan, nodeParallelSeqscan.c makes that decision. > > > > I think other places also decide about the order/way heapam.c has > > to scan; for example, the order in which rows/pages have to be traversed is > > decided at the portal/executor layer and the same is passed down to heap, and > > in case of index, the scan limits (heap_setscanlimits()) are decided > > outside heapam.c, and something similar is done for parallel seq scan. > > In general, the scan is driven by the scan descriptor, which is constructed > > at an upper level, and there are some APIs exposed to drive the scan. > > If you are not happy with the current way nodeParallelSeqscan has > > set the scan limits, we can have some form of callback which does the > > required work, and this callback can be called from heapam.c. > > I thought about a callback, but what's the benefit of doing that vs. > hard-coding it in heapam.c? Basically I want to address your concern of setting the scan limit via the sequence scan node. One way could be to pass a callback_function and callback_state to heap_beginscan, which will remember that information in HeapScanDesc and then use it in heap_getnext(); callback_state will have info about the next page, which will be updated by callback_function. We can remember the callback_function and callback_state information in estate, which will be set only by a parallel worker, which means it won't affect the non-parallel case. I think this will be helpful in the future as well, where we want a particular scan or sort to use that information to behave as a parallel scan or sort. With Regards, Amit Kapila. 
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] For cursors, there is FETCH and MOVE, why no TELL?
On 09.02.15 10:46, Heikki Linnakangas wrote: > [...] > You could fairly easily write an extension to do that, btw. A C function > could call GetPortalByName() and peek into the PortalData.portalPos field. > Would PGresult *PQdescribePortal(PGconn *conn, const char *portalName); from libpq also provide this information? Should there be a way to run PQdescribePortal() from psql (e.g. \dP)?
Re: [HACKERS] New CF app deployment
On Mon, Feb 9, 2015 at 11:09 AM, Marco Nenciarini <marco.nenciar...@2ndquadrant.it> wrote: > On 08/02/15 17:04, Magnus Hagander wrote: > > > > Filenames are now shown for attachments, including a direct link to the > > attachment itself. I've also run a job to populate all old threads. > > > > I wonder what algorithm is used to detect when an attachment is a patch. > > If you look at https://commitfest.postgresql.org/4/94/ all the > attachments are marked as "Patch: no", but many of them are > clearly a patch. > It uses the "magic" module, same as the "file" command. And that one claims: mha@mha-laptop:/tmp$ file 0003-File-based-incremental-backup-v9.patch 0003-File-based-incremental-backup-v9.patch: ASCII English text, with very long lines I think it doesn't consider it a patch because it's not actually a patch - it looks like an actual git format-patch email message that *contains* a patch. It even includes the unix From separator line. So if anything it should have detected that it's an email message, which it apparently didn't. Picking from the very top patch on the cf, an actual patch looks like this: mha@mha-laptop:/tmp$ file psql_fix_uri_service_004.patch psql_fix_uri_service_004.patch: unified diff output, ASCII text, with very long lines -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] New CF app deployment
On 08/02/15 17:04, Magnus Hagander wrote: > > Filenames are now shown for attachments, including a direct link to the > attachment itself. I've also run a job to populate all old threads. > I wonder what algorithm is used to detect when an attachment is a patch. If you look at https://commitfest.postgresql.org/4/94/ all the attachments are marked as "Patch: no", but many of them are clearly a patch. Regards, Marco -- Marco Nenciarini - 2ndQuadrant Italy PostgreSQL Training, Services and Support marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it
Re: [HACKERS] For cursors, there is FETCH and MOVE, why no TELL?
2015-02-09 10:59 GMT+01:00 Marc Balmer : > > > > 2015-02-09 10:37 GMT+01:00 Marc Balmer <m...@msys.ch>: > > > > Currently there are FETCH and the (non standard) MOVE commands to > work > > on cursors. > > > > (I use cursors to display large datasets in a page-wise way, where > the > > user can move per-page, or, when displaying a single record, per > record. > > When the user goes back from per-record view to page-view, I have to > > restore the cursor to the position it was on before the user changed > to > > per-record view.) > > > > I have to "manually" keep track of the cursor position, but in some > > cases it would definitely be easier to just query the current cursor > > position directly from the database and later use "MOVE ABSOLUTE" to > > rewind it to that position. That could be achieved e.g. by a > > hypothetical "TELL " command. It does, however, not > exist > > and I have not found an alternative. Is there a way to query the > > current cursor position at all? If not, does a TELL command sound > like > > a good or bad idea? > > > > > > It sounds like a good idea. > > > > Do we need a new statement? We could make the MOVE statement return the > > position. It currently returns a delta, but it could return an absolute > > position too. > > On second thought, a new statement is not needed at all. As Heikki > noticed in his reply, it could either be a new function or have MOVE > return the current position somehow(tm). Or a new option to MOVE, maybe > "MOVE NOT" (don't move the cursor but return its position)? > > Returning an absolute position from the FETCH and MOVE statements would probably have minimal overhead, so you could get the current position as a side effect of the last statement - and we already support MOVE RELATIVE 0; Regards Pavel
Re: [HACKERS] For cursors, there is FETCH and MOVE, why no TELL?
> > 2015-02-09 10:37 GMT+01:00 Marc Balmer <m...@msys.ch>: > > Currently there are FETCH and the (non standard) MOVE commands to work > on cursors. > > (I use cursors to display large datasets in a page-wise way, where the > user can move per-page, or, when displaying a single record, per record. > When the user goes back from per-record view to page-view, I have to > restore the cursor to the position it was on before the user changed to > per-record view.) > > I have to "manually" keep track of the cursor position, but in some > cases it would definitely be easier to just query the current cursor > position directly from the database and later use "MOVE ABSOLUTE" to > rewind it to that position. That could be achieved e.g. by a > hypothetical "TELL " command. It does, however, not exist > and I have not found an alternative. Is there a way to query the > current cursor position at all? If not, does a TELL command sound like > a good or bad idea? > > > It sounds like a good idea. > > Do we need a new statement? We could make the MOVE statement return the > position. It currently returns a delta, but it could return an absolute > position too. On second thought, a new statement is not needed at all. As Heikki noticed in his reply, it could either be a new function or have MOVE return the current position somehow(tm). Or a new option to MOVE, maybe "MOVE NOT" (don't move the cursor but return its position)?
Re: [HACKERS] For cursors, there is FETCH and MOVE, why no TELL?
Hi 2015-02-09 10:37 GMT+01:00 Marc Balmer : > Currently there are FETCH and the (non standard) MOVE commands to work > on cursors. > > (I use cursors to display large datasets in a page-wise way, where the > user can move per-page, or, when displaying a single record, per record. > When the user goes back from per-record view to page-view, I have to > restore the cursor to the position it was on before the user changed to > per-record view.) > > I have to "manually" keep track of the cursor position, but in some > cases it would definitely be easier to just query the current cursor > position directly from the database and later use "MOVE ABSOLUTE" to > rewind it to that position. That could be achieved e.g. by a > hypothetical "TELL " command. It does, however, not exist > and I have not found an alternative. Is there a way to query the > current cursor position at all? If not, does a TELL command sound like > a good or bad idea? > It sounds like a good idea. Do we need a new statement? We could make the MOVE statement return the position. It currently returns a delta, but it could return an absolute position too. Regards Pavel
Re: [HACKERS] For cursors, there is FETCH and MOVE, why no TELL?
On 02/09/2015 11:37 AM, Marc Balmer wrote: Currently there are FETCH and the (non standard) MOVE commands to work on cursors. (I use cursors to display large datasets in a page-wise way, where the user can move per-page, or, when displaying a single record, per record. When the user goes back from per-record view to page-view, I have to restore the cursor to the position it was on before the user changed to per-record view.) I have to "manually" keep track of the cursor position, but in some cases it would definitely be easier to just query the current cursor position directly from the database and later use "MOVE ABSOLUTE" to rewind it to that position. That could be achieved e.g. by a hypothetical "TELL " command. It does, however, not exist and I have not found an alternative. Is there a way to query the current cursor position at all? If not, does a TELL command sound like a good or bad idea? It's the first time I hear anyone wanting that, but I guess it might come in handy sometimes. I think you'd usually still rather keep track of that in the client, though, because it's easy to do, and it avoids the extra round-trip to execute the TELL command. Not sure we'd want to add the TELL keyword for this. Perhaps a pg_cursor_pos() function would be better. You could fairly easily write an extension to do that, btw. A C function could call GetPortalByName() and peek into the PortalData.portalPos field. - Heikki
[HACKERS] For cursors, there is FETCH and MOVE, why no TELL?
Currently there are FETCH and the (non standard) MOVE commands to work on cursors. (I use cursors to display large datasets in a page-wise way, where the user can move per-page, or, when displaying a single record, per record. When the user goes back from per-record view to page-view, I have to restore the cursor to the position it was on before the user changed to per-record view.) I have to "manually" keep track of the cursor position, but in some cases it would definitely be easier to just query the current cursor position directly from the database and later use "MOVE ABSOLUTE" to rewind it to that position. That could be achieved e.g. by a hypothetical "TELL " command. It does, however, not exist and I have not found an alternative. Is there a way to query the current cursor position at all? If not, does a TELL command sound like a good or bad idea?
Re: [HACKERS] 9.6 Feature help requested: Inclusion Constraints
On Sat, 2015-02-07 at 16:08 -0800, Jeff Davis wrote: > I believe Inclusion Constraints will be important for postgres. I forgot to credit Darren Duncan with the name of this feature: http://www.postgresql.org/message-id/4f8bb9b0.5090...@darrenduncan.net Regards, Jeff Davis