Re: [HACKERS] Improving our clauseless-join heuristics
>> Another way to look at this is that if we have
>>     select ... from a,b,c,d where a.x = b.y + c.z
>> we want to consider a cross-join of b and c, in the hopes that we can do
>> something useful with the join clause at the next level where it can
>> join to a. From b's perspective there is no percentage in joining to d.

For this kind of query, the current code (referring to 9.0.3) also considers
joins of (b,c) and (b,d). Since there is no join clause among b, c, and d,
those rels go down the make_rels_by_clauseless_joins() path, which considers
joining (b,c) and (b,d).

For this kind of query, if the suggestion in my mail below is followed, then
make_rels_by_clause_joins() will consider joining (a,b), (a,c) and (a,d) at
level 2, which it currently doesn't do, and that may generate useless join
paths. However, in real-world scenarios, queries where columns of two
different tables appear in a single expression on one side of a WHERE clause
(b.y + c.z) may be rare.

>> On the other hand, when we come to consider d, it will have no join
>> clauses so we will consider joining it to each other rel in turn.

When we come to consider d at level 2, only later rels are considered, so d
will not be joined to each other rel.

-----Original Message-----
From: Tom Lane [mailto:t...@sss.pgh.pa.us]
Sent: Monday, April 16, 2012 9:57 AM
To: Amit Kapila
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Improving our clauseless-join heuristics

Amit Kapila writes:
> That case is handled by make_rels_by_clauseless_joins.
> It will be handled by make_rels_by_clauseless_joins() if the given rel
> old_rel doesn't have any join clause.
> However if it has a join clause but isn't able to join with any other
> rels, like in the example you have provided for relation c, it is not
> able to join with the other rel d.
> In such cases it can do a cross-join with d, because it has not found
> any relation to join with.
> Wouldn't that address the problem you mentioned?

Sounds to me like that's going in the wrong direction, ie, joining to
exactly the wrong relations. If you have to cross-join it's better to
cross-join against relations that are included in one of the join clauses
that does mention the current relation.

Another way to look at this is that if we have

	select ... from a,b,c,d where a.x = b.y + c.z

we want to consider a cross-join of b and c, in the hopes that we can do
something useful with the join clause at the next level where it can join
to a. From b's perspective there is no percentage in joining to d.

On the other hand, when we come to consider d, it will have no join clauses
so we will consider joining it to each other rel in turn. So if there is
any value in joining d early, we will find that out. Or at least that's the
theory.

Now that I look at it this way, I think there is a bug here: when we are at
level 2, we only consider later rels in the list for forced clauseless
joins. That would be okay if the condition were symmetrical, but it isn't.
This makes for a bogus FROM-list ordering dependency in handling of
clauseless joins. Not sure how much that matters in the real world, but it
still seems wrong.

			regards, tom lane
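For concreteness, the level-2 logic being discussed looks roughly like this
(a heavily simplified paraphrase of joinrels.c, not the verbatim 9.0.3
source; treat the names and control flow as approximations):

	foreach(r, joinrels[1])
	{
		RelOptInfo *old_rel = (RelOptInfo *) lfirst(r);

		/*
		 * At level 2, only rels AFTER old_rel in the FROM list are
		 * offered as join partners; earlier pairings were already formed
		 * when the earlier rel was old_rel.  That is fine for clause
		 * joins, which are symmetrical...
		 */
		ListCell   *other_rels = lnext(r);

		if (old_rel->joininfo != NIL)
			make_rels_by_clause_joins(root, old_rel, other_rels);
		else
			/*
			 * ...but for forced clauseless joins it is not: a clauseless
			 * rel d is paired only with rels after it, so which
			 * cross-joins get considered depends on d's position in the
			 * FROM list -- the bogus ordering dependency Tom points out.
			 */
			make_rels_by_clauseless_joins(root, old_rel, other_rels);
	}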
Re: [HACKERS] index-only scans vs. Hot Standby, round two
On Fri, Apr 13, 2012 at 12:33:06PM -0400, Robert Haas wrote:
> In the department of query cancellations, I believe Noah argued
> previously that this wasn't really going to cause a problem. And,
> indeed, if the master has a mix of inserts, updates, and deletes, then
> it seems likely that any recovery conflicts generated this way would be
> hitting about the same place in the XID space as the ones caused by
> pruning away dead tuples. What will be different, if we go this route,
> is that an insert-only master, which right now only generates conflicts
> at freezing time, will instead generate conflicts much sooner, as soon
> as the relation is vacuumed. I do not use Hot Standby enough to know
> whether this is a problem, and I'm not trying to block this approach,
> but I do want to make sure that everyone agrees that it's acceptable
> before we go do it, because inevitably somebody is going to get bit.

Given sufficient doubt, we could add a GUC, say hot_standby_use_all_visible.
A standby with the GUC "off" would ignore all-visible indicators and also
decline to generate conflicts when flipping them on.

> As to correctness, after further review of lazy_scan_heap(), I think
> there are some residual bugs here that need to be fixed whatever we
> decide to do about the Hot Standby case:
>
> 1. I noticed that when we do PageSetAllVisible() today we follow it up
> with SetBufferCommitInfoNeedsSave(). PD_ALL_VISIBLE is not merely a
> hint, so I think these should be changed to MarkBufferDirty(), which
> shouldn't make much difference given current code, but proposals to
> handle hint changes differently than non-hint changes abound, so it's
> probably not good to rely on those meaning the same thing forever.

Do you refer to PD_ALL_VISIBLE as "not merely a hint" due to the requirement
to prevent a page from simultaneously having a negative PD_ALL_VISIBLE and a
positive visibility map bit? That is to say, PD_ALL_VISIBLE is fully a hint
in its role as a witness of tuple visibility, but it is not a hint in its
role as a witness of the visibility map status?

In any event, I agree that those call sites should use MarkBufferDirty().

The large comment in SetBufferCommitInfoNeedsSave() seems incorrect. On
systems with weaker memory ordering, the FlushBuffer() process's clearing of
BM_JUST_DIRTIED may not be visible to the SetBufferCommitInfoNeedsSave()
process until some time after the former retrieved buffer contents from
shared memory. Memory barriers could make the comment true, but we should
probably just update the comment to explain that a race condition may eat
the update entirely.

Incidentally, is there a good reason for MarkBufferDirty() to increment
pgBufferUsage.shared_blks_dirtied and SetBufferCommitInfoNeedsSave() not to
do so? Looks like an oversight.

> 2. More seriously, lazy_scan_heap() releases the buffer lock before
> setting the all-visible bit on the heap. This looks just plain wrong
> (and it's my fault, to be clear). Somebody else can come along after
> we've released the buffer lock and mark the page not-all-visible. That
> concurrent process will check the visibility map bit, find it already
> clear, and go on its merry way. Then, we set the visibility map bit and
> go on our merry way. Now we have a not-all-visible page with the
> all-visible bit set in the visibility map. Oops.

Hmm, indeed. This deserves its own open item.
> I think this probably needs to be rewritten so that lazy_scan_heap()
> acquires a pin on the visibility map page before locking the buffer for
> cleanup and holds onto that pin until either we exit the range of heap
> pages covered by that visibility map page or trigger index vac due to
> maintenance_work_mem exhaustion. With that infrastructure in place, we
> can set the visibility map bit at the same time we set the page-level
> bit without risking holding the buffer lock across a buffer I/O (we
> might have to hold the buffer lock across a WAL flush in the worst
> case, but that hazard exists in a number of other places as well and
> fixing it here is well out of scope).

Looks reasonable at a glance.

> 1. vacuum on master sets the page-level bit and the visibility map bit
> 2. the heap page with the bit is written to disk BEFORE the WAL entry
> generated in (1) gets flushed; this is allowable, since it's not an
> error for the page-level bit to be set while the visibility-map bit is
> unset, only the other way around
> 3. crash (still before the WAL entry is flushed)
> 4. some operation on the master emits an FPW for the page without first
> rendering it not-all-visible
>
> At present, I'm not sure there's any real problem here, since normally
> an all-visible heap page is only going to get another WAL entry if
> somebody does an insert, update, or delete on it, in which case the
> visibility map bit is going to get cleared anyway. Freezing the page
> might emit a new FPW, but that's going to generate conflicts anyway, so
> no problem there. So I think there's no real problem here, but it
> doesn't seem totally future-proof - any future type of WAL record that
> might modify the page without rendering it not-all-visible would create
> a very subtle hazard for Hot Standby. We could dodge the whole issue by
> leaving the hack in heapgetpage() intact and just be happy with making
> index-only scans work, or maybe somebody has a more clever idea.

Good point. We could finagle things so the standby notices end-of-recovery
checkpoints and blocks the optimization for older snapshots. For example,
maintain an integer count of end-of-recovery checkpoints seen and store that
in each Snapshot instead of takenDuringRecovery; use 0 on a master. When the
global value is greater than the snapshot's value, disable the optimization
for that snapshot. I don't know whether the optimization is powerful enough
to justify that level of trouble, but it's an option to consider.

For a different approach, targeting the fragility, we could add assertions
to detect unexpected cases of a page bearing PD_ALL_VISIBLE submitted as a
full-page image.
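In code terms, the proposed pin-first discipline would look roughly like
this (a sketch only: the function names exist in the backend of that era,
but the exact signatures and surrounding logic here are assumptions, not
the eventual committed fix):

	/*
	 * Pin the visibility map page BEFORE taking the cleanup lock, so the
	 * map bit can be set while the heap buffer lock is still held,
	 * without doing buffer I/O under the lock.
	 */
	visibilitymap_pin(onerel, blkno, &vmbuffer);	/* may do I/O: do it here */

	buf = ReadBufferExtended(onerel, MAIN_FORKNUM, blkno,
							 RBM_NORMAL, vac_strategy);
	LockBufferForCleanup(buf);

	/* ... scan the page; if every tuple is visible to everyone ... */
	PageSetAllVisible(page);
	MarkBufferDirty(buf);			/* not SetBufferCommitInfoNeedsSave() */
	visibilitymap_set(onerel, blkno, InvalidXLogRecPtr, vmbuffer);
									/* argument list approximate */

	LockBuffer(buf, BUFFER_LOCK_UNLOCK);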
Re: [HACKERS] not null validation option in contrib/file_fdw
Thank you for the review.

> -----Original Message-----
> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Andrew Dunstan
> Sent: Friday, April 13, 2012 9:16 PM
> To: Shigeru HANADA
> Cc: Etsuro Fujita; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] not null validation option in contrib/file_fdw
>
> On 04/13/2012 07:21 AM, Shigeru HANADA wrote:
> > (2012/04/13 16:59), Etsuro Fujita wrote:
> >> I updated the patch added to CF 2012-Next [1]. Attached is the
> >> updated version of the patch.
> > I applied the patch and ran regression tests of file_fdw, and I got
> > SIGSEGV X-(
> >
> > The failure occurs in fileGetOptions, and it is caused by
> > list_delete_cell used in a foreach loop; the ListCell pointing to the
> > delete target has been freed by list_delete_cell, but foreach accesses
> > it to get the next element.
> >
> > Some backend functions which use list_delete_cell in a loop use a
> > "for" loop instead of foreach, and other functions exit the loop after
> > calling list_delete_cell. Since we can't stop searching for non-COPY
> > options until meeting the end of the options list, we would need to
> > choose the former (a "for" loop), or create another list which
> > contains only valid COPY options and return it via the other_options
> > parameter.
>
> Yes, the code in fileGetOptions() appears to be bogus.

Sorry, I will fix it.

> Also, "validate" is a terrible name for the option (and in the code)
> IMNSHO. It's far too generic. "validate_not_null" or some such would
> surely be better.

I thought it could be used not only for NOT NULL but also for CHECK and
foreign key constraints. That is, when a user sets the option to 'true',
file_fdw verifies that each tuple meets all kinds of constraints. So, how
about "validate_data_file" or simply "validate_file"?

Best regards,
Etsuro Fujita

> cheers
>
> andrew
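For reference, the pitfall and the safe rewrite look roughly like this (a
sketch against the List API of that era, where list_delete_cell() takes the
previous cell; is_copy_option() is a hypothetical stand-in for file_fdw's
actual test):

	/*
	 * UNSAFE: foreach() caches the current cell and follows its next
	 * pointer after the loop body runs, but list_delete_cell() pfree's
	 * that very cell -- hence the SIGSEGV.
	 *
	 * Safe pattern: walk the list manually, remembering the previous
	 * cell and grabbing the next pointer before any deletion.
	 */
	ListCell   *cell;
	ListCell   *prev = NULL;

	cell = list_head(options);
	while (cell != NULL)
	{
		DefElem    *def = (DefElem *) lfirst(cell);
		ListCell   *next = lnext(cell);		/* save before possible pfree */

		if (is_copy_option(def))			/* hypothetical predicate */
			options = list_delete_cell(options, cell, prev);
		else
			prev = cell;
		cell = next;
	}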
Re: [HACKERS] index-only scans vs. Hot Standby, round two
On Mon, Apr 16, 2012 at 8:02 AM, Noah Misch wrote:
> On Fri, Apr 13, 2012 at 12:33:06PM -0400, Robert Haas wrote:
>> In the department of query cancellations, I believe Noah argued
>> previously that this wasn't really going to cause a problem. And,
>> indeed, if the master has a mix of inserts, updates, and deletes, then
>> it seems likely that any recovery conflicts generated this way would
>> be hitting about the same place in the XID space as the ones caused by
>> pruning away dead tuples. What will be different, if we go this route,
>> is that an insert-only master, which right now only generates
>> conflicts at freezing time, will instead generate conflicts much
>> sooner, as soon as the relation is vacuumed. I do not use Hot Standby
>> enough to know whether this is a problem, and I'm not trying to block
>> this approach, but I do want to make sure that everyone agrees that
>> it's acceptable before we go do it, because inevitably somebody is
>> going to get bit.
>
> Given sufficient doubt, we could add a GUC, say
> hot_standby_use_all_visible. A standby with the GUC "off" would ignore
> all-visible indicators and also decline to generate conflicts when
> flipping them on.

I'm against adding a new GUC, especially for that fairly thin reason.

>> 1. vacuum on master sets the page-level bit and the visibility map bit
>> 2. the heap page with the bit is written to disk BEFORE the WAL entry
>> generated in (1) gets flushed; this is allowable, since it's not an
>> error for the page-level bit to be set while the visibility-map bit is
>> unset, only the other way around
>> 3. crash (still before the WAL entry is flushed)
>> 4. some operation on the master emits an FPW for the page without
>> first rendering it not-all-visible
>>
>> At present, I'm not sure there's any real problem here, since normally
>> an all-visible heap page is only going to get another WAL entry if
>> somebody does an insert, update, or delete on it, in which case the
>> visibility map bit is going to get cleared anyway. Freezing the page
>> might emit a new FPW, but that's going to generate conflicts anyway,
>> so no problem there. So I think there's no real problem here, but it
>> doesn't seem totally future-proof - any future type of WAL record that
>> might modify the page without rendering it not-all-visible would
>> create a very subtle hazard for Hot Standby. We could dodge the whole
>> issue by leaving the hack in heapgetpage() intact and just be happy
>> with making index-only scans work, or maybe somebody has a more clever
>> idea.
>
> Good point. We could finagle things so the standby notices
> end-of-recovery checkpoints and blocks the optimization for older
> snapshots. For example, maintain an integer count of end-of-recovery
> checkpoints seen and store that in each Snapshot instead of
> takenDuringRecovery; use 0 on a master. When the global value is
> greater than the snapshot's value, disable the optimization for that
> snapshot. I don't know whether the optimization is powerful enough to
> justify that level of trouble, but it's an option to consider.
>
> For a different approach, targeting the fragility, we could add
> assertions to detect unexpected cases of a page bearing PD_ALL_VISIBLE
> submitted as a full-page image.

We don't need a future proof solution, especially not at this stage of the
release cycle. Every time you add a WAL record, we need to consider the
possible conflict impact. We just need a simple and clear solution. I'll
work on that.
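A rough sketch of that counter scheme, with every name below invented for
illustration (nothing here is committed code):

	/* Shared state, maintained by the startup process. */
	typedef struct RecoverySharedState
	{
		uint32		eorCheckpointCount; /* end-of-recovery checkpoints seen */
	} RecoverySharedState;

	/* Each snapshot records the count at the time it was taken;
	 * this replaces the boolean takenDuringRecovery, and is 0 on a master. */
	typedef struct SnapshotSketch
	{
		uint32		eorCheckpointCount;
	} SnapshotSketch;

	static bool
	may_use_all_visible(const SnapshotSketch *snap,
						const RecoverySharedState *shared)
	{
		/*
		 * If an end-of-recovery checkpoint has happened since the snapshot
		 * was taken, a PD_ALL_VISIBLE full-page image may have bypassed
		 * conflict processing: disable the optimization for this snapshot.
		 */
		return snap->eorCheckpointCount >= shared->eorCheckpointCount;
	}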
-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] 9.3 Pre-proposal: Range Merge Join
On Mon, Apr 16, 2012 at 7:52 AM, Tom Lane wrote:
> Dunno. It might be easier to sell the idea of adding support for range
> joins in a couple of years, after we've seen how much use ranges get.

Once we've started the journey towards range types we must complete it
reasonably quickly. Having partially implemented, unoptimised features is
the same as not having the feature at all, so it will remain unused until
it really works.

We could say that about many features, but if Jeff is championing this,
I'd say go for it.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Last gasp
On Sun, Apr 15, 2012 at 11:31 PM, Greg Smith wrote:
> On 04/15/2012 05:46 AM, Simon Riggs wrote:
>> Our problem is not lack of resource, it is ineffective delegation. As
>> Hannu points out, he didn't know the patch would be rejected, so he
>> didn't know help was needed to save something useful. I considered
>> that the job of the CF manager, but perhaps it was not.
>
> Note that one of the influences on the "death march" here was lack of a
> fully devoted CF manager for the full duration. I did some rabble
> rousing to get things started in the usual way, but my time for this
> only budgeted for six weeks in that role. And that was quite optimistic
> for this one.
>
> Trying to quantify how much time investment the CF manager role really
> involves is one of my important projects to chew on. Whoever ends up
> doing that should at least have an idea what scale of problem they're
> getting into.

Again, if the CF manager role is too big, then we delegate. No reason why
we can't have a CF team, with different people responsible for different
areas. The key thing is working out how to mobilise the resources we have
to best effect.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Different gettext domain needed for error context
On 15.04.2012 00:54, Tom Lane wrote:
> I really think we need to change errcontext itself to pass the correct
> domain. If we are going to require a domain to be provided (and this
> does require that, for correct operation), then we need to break any
> code that doesn't provide it in a visible fashion.
>
> A possibly more attractive alternative is to redefine errcontext with a
> macro that allows TEXTDOMAIN to be passed in behind-the-scenes, thus
> keeping source-level compatibility. We can do this with the same type
> of hack we've used for many years for elog():
>
> #define errcontext  set_errcontext_domain(TEXTDOMAIN), errcontext_msg
>
> where the actual message-passing function is now called errcontext_msg.

Ok then, here's a patch using that approach. I had to rename a few local
variables called "errcontext", because the macro now tries to expand those
and you get an error.

Note: If you want to test this at home, the original test case I posted
doesn't currently work, because the text of the context messages in
PL/pgSQL has been slightly changed since I posted the original test case,
but the translations have not been updated yet. Until then, you can
manually remove the double quotes in messages like 'function \"%s\"' in
the .po file to test this.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 0c301b2..a10b569 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6504,7 +6504,7 @@ StartupXLOG(void)
 			bool		recoveryContinue = true;
 			bool		recoveryApply = true;
 			bool		recoveryPause = false;
-			ErrorContextCallback errcontext;
+			ErrorContextCallback errcallback;
 			TimestampTz xtime;
 
 			InRedo = true;
@@ -6566,10 +6566,10 @@ StartupXLOG(void)
 			}
 
 			/* Setup error traceback support for ereport() */
-			errcontext.callback = rm_redo_error_callback;
-			errcontext.arg = (void *) record;
-			errcontext.previous = error_context_stack;
-			error_context_stack = &errcontext;
+			errcallback.callback = rm_redo_error_callback;
+			errcallback.arg = (void *) record;
+			errcallback.previous = error_context_stack;
+			error_context_stack = &errcallback;
 
 			/*
 			 * ShmemVariableCache->nextXid must be beyond record's xid.
@@ -6614,7 +6614,7 @@ StartupXLOG(void)
 			RmgrTable[record->xl_rmid].rm_redo(EndRecPtr, record);
 
 			/* Pop the error context stack */
-			error_context_stack = errcontext.previous;
+			error_context_stack = errcallback.previous;
 
 			if (!XLogRecPtrIsInvalid(ControlFile->backupStartPoint) &&
 				XLByteLE(ControlFile->backupEndPoint, EndRecPtr))
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 95fec8d..cb7e67a 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -1854,7 +1854,7 @@ CopyFrom(CopyState cstate)
 	TupleTableSlot *myslot;
 	MemoryContext oldcontext = CurrentMemoryContext;
-	ErrorContextCallback errcontext;
+	ErrorContextCallback errcallback;
 	CommandId	mycid = GetCurrentCommandId(true);
 	int			hi_options = 0; /* start with default heap_insert options */
 	BulkInsertState bistate;
@@ -1998,10 +1998,10 @@ CopyFrom(CopyState cstate)
 	econtext = GetPerTupleExprContext(estate);
 
 	/* Set up callback to identify error line number */
-	errcontext.callback = CopyFromErrorCallback;
-	errcontext.arg = (void *) cstate;
-	errcontext.previous = error_context_stack;
-	error_context_stack = &errcontext;
+	errcallback.callback = CopyFromErrorCallback;
+	errcallback.arg = (void *) cstate;
+	errcallback.previous = error_context_stack;
+	error_context_stack = &errcallback;
 
 	for (;;)
 	{
@@ -2116,7 +2116,7 @@ CopyFrom(CopyState cstate)
 										nBufferedTuples, bufferedTuples);
 
 	/* Done, clean up */
-	error_context_stack = errcontext.previous;
+	error_context_stack = errcallback.previous;
 
 	FreeBulkInsertState(bistate);
diff --git a/src/backend/parser/parse_node.c b/src/backend/parser/parse_node.c
index 80dbdd1..33966ee 100644
--- a/src/backend/parser/parse_node.c
+++ b/src/backend/parser/parse_node.c
@@ -144,10 +144,10 @@ setup_parser_errposition_callback(ParseCallbackState *pcbstate,
 	/* Setup error traceback support for ereport() */
 	pcbstate->pstate = pstate;
 	pcbstate->location = location;
-	pcbstate->errcontext.callback = pcb_error_callback;
-	pcbstate->errcontext.arg = (void *) pcbstate;
-	pcbstate->errcontext.previous = error_context_stack;
-	error_context_stack = &pcbstate->errcontext;
+	pcbstate->errcallback.callback = pcb_error_callback;
+	pcbstate->errcallback.arg = (void *) pcbstate;
+	pcbstate->errcallback.previous = error_context_stack;
+	error_context_stack = &pcbstate->errcallback;
 }
 
 /*
@@ -157,7 +157,7 @@ void
 cancel_parser_errposition_callback(ParseCallbackState *pcbstate)
 {
 	/* Pop the error context stack */
-	error_context_stack = pcbstate->errcontext.previous;
+	error_context_stack = pcbstate->errcallback.previous;
 }
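For readers unfamiliar with the elog()-style comma trick, here is a tiny
self-contained analog (a toy model only; the real machinery lives in
elog.h/elog.c, and every name below except TEXTDOMAIN's role is invented):

	#include <stdio.h>

	/* Toy analogs of set_errcontext_domain() and errcontext_msg(). */
	static const char *current_domain = "postgres";

	static void set_domain(const char *d)
	{
		current_domain = d;
	}

	static void ctx_msg(const char *msg)
	{
		printf("context [%s]: %s\n", current_domain, msg);
	}

	/* Each module defines its own TEXTDOMAIN... */
	#define TEXTDOMAIN "plpgsql-9.2"

	/* ...and the macro smuggles it in while call sites stay unchanged:
	 * the comma operator evaluates set_domain() first, then the
	 * message-passing function consumes the original argument list. */
	#define errcontext  set_domain(TEXTDOMAIN), ctx_msg

	int main(void)
	{
		errcontext("PL/pgSQL function f()");	/* looks like the old API */
		return 0;
	}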
Re: [HACKERS] index-only scans vs. Hot Standby, round two
On 16.04.2012 10:38, Simon Riggs wrote:
> On Mon, Apr 16, 2012 at 8:02 AM, Noah Misch wrote:
>> On Fri, Apr 13, 2012 at 12:33:06PM -0400, Robert Haas wrote:
>>> In the department of query cancellations, I believe Noah argued
>>> previously that this wasn't really going to cause a problem. [...]
>>
>> Given sufficient doubt, we could add a GUC, say
>> hot_standby_use_all_visible. A standby with the GUC "off" would ignore
>> all-visible indicators and also decline to generate conflicts when
>> flipping them on.
>
> I'm against adding a new GUC, especially for that fairly thin reason.

Same here.

Can we have a "soft" hot standby conflict that doesn't kill the query, but
disables index-only-scans?

In the long run, perhaps we need to store XIDs in the visibility map
instead of just a bit. If we only stored one xid per 100 pages or something
like that, the storage overhead would not be much higher than what we have
at the moment. But that's obviously not going to happen for 9.2...

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] Last gasp
Alvaro Herrera writes:
> I've used Redmine a lot, as you know, and I only keep using it because
> it's a requirement at work. It is certainly not close to usable for
> general pgsql stuff. (Trac, which we used to use prior to Redmine, was
> certainly much worse, though).

Same story here, still using redmine a lot, all with custom reports etc.

> I can't say that it's all that slow, or that there's a problem with the
> code, or that the search doesn't work right (and I've never had a wiki
> edit disappear, either, and I've used that a lot). It's just the wrong
> tool altogether.

It's indeed slow here, and I agree that's not the problem. Not the tool we
need, +1.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] man pages for contrib programs
Peter Eisentraut writes:
> Good question. I guess we could keep the original name "... Modules"
> for that chapter. Those are a kind of server application in my mind.

I think we want to keep using “module” to mean the shared library file we
load at runtime, be it a .so, a .dylib or a .dll. That said, most of that
non-extension stuff is just a module.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Why can't I use pgxs to build a plpgsql plugin?
On 13.04.2012 19:17, Guillaume Lelarge wrote:
> On Thu, 2012-04-12 at 12:28 +0300, Heikki Linnakangas wrote:
>> On 08.04.2012 11:59, Guillaume Lelarge wrote:
>>> There could be a good reason which would explain why we can't (or
>>> don't want to) do this, but I don't see it right now.
>>
>> Me neither, except a general desire to keep internals hidden. I
>> propose the attached.
>
> Sounds good to me. I would love to see this happening in 9.2.

Ok, committed. I fixed the .PHONY line as Tom pointed out, and changed
MSVC install.pm to also copy the header file.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
[HACKERS] nodes/*funcs.c inconsistencies
I observed these inconsistencies in node support functions:

- _copyReassignOwnedStmt() uses COPY_SCALAR_FIELD() on the string field
  "newrole", and _equalReassignOwnedStmt() uses COMPARE_NODE_FIELD().

- _outCreateForeignTableStmt() calls _outCreateStmt() directly. This
  produces the label "CREATEFOREIGNTABLESTMTCREATESTMT". The attached patch
  splits things out the way we normally do in outfuncs.c. There's no
  readfuncs.c support, so this is strictly cosmetic.

- _outColumnDef() uses WRITE_INT_FIELD for the "storage" field, a char.
  Again, no readfuncs.c support to create a compatibility problem.

- _copyRenameStmt() and _equalRenameStmt() ignore the "relationType"
  field, but I can't see a good reason to do so. PostgreSQL 9.1 added this
  field, but only recent master (after commit 38b9693f of 3 April 2012)
  references the field beyond setting it in the parser.

- _copyViewStmt() and _equalViewStmt() ignore the "options" field, and
  _equalColumnDef() ignores "fdwoptions". These are new in PostgreSQL 9.2,
  and I see no good reason to ignore them.

I'd suggest backpatching the ReassignOwnedStmt() bits; the wrong code could
produce crashes. The rest are for master only.

Thanks,
nm

diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index c94799b..eed0ea4 100644
*** a/src/backend/nodes/copyfuncs.c
--- b/src/backend/nodes/copyfuncs.c
*************** _copyRenameStmt(const RenameStmt *from)
*** 2891,2896 ****
--- 2891,2897 ----
  	RenameStmt *newnode = makeNode(RenameStmt);
  
  	COPY_SCALAR_FIELD(renameType);
+ 	COPY_SCALAR_FIELD(relationType);
  	COPY_NODE_FIELD(relation);
  	COPY_NODE_FIELD(object);
  	COPY_NODE_FIELD(objarg);
*************** _copyViewStmt(const ViewStmt *from)
*** 3047,3052 ****
--- 3048,3054 ----
  	COPY_NODE_FIELD(aliases);
  	COPY_NODE_FIELD(query);
  	COPY_SCALAR_FIELD(replace);
+ 	COPY_NODE_FIELD(options);
  
  	return newnode;
  }
*************** _copyReassignOwnedStmt(const ReassignOwnedStmt *from)
*** 3650,3656 ****
  	ReassignOwnedStmt *newnode = makeNode(ReassignOwnedStmt);
  
  	COPY_NODE_FIELD(roles);
! 	COPY_SCALAR_FIELD(newrole);
  
  	return newnode;
  }
--- 3652,3658 ----
  	ReassignOwnedStmt *newnode = makeNode(ReassignOwnedStmt);
  
  	COPY_NODE_FIELD(roles);
! 	COPY_STRING_FIELD(newrole);
  
  	return newnode;
  }
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 9564210..c06b068 100644
*** a/src/backend/nodes/equalfuncs.c
--- b/src/backend/nodes/equalfuncs.c
*************** static bool
*** 1306,1311 ****
--- 1306,1312 ----
  _equalRenameStmt(const RenameStmt *a, const RenameStmt *b)
  {
  	COMPARE_SCALAR_FIELD(renameType);
+ 	COMPARE_SCALAR_FIELD(relationType);
  	COMPARE_NODE_FIELD(relation);
  	COMPARE_NODE_FIELD(object);
  	COMPARE_NODE_FIELD(objarg);
*************** _equalViewStmt(const ViewStmt *a, const ViewStmt *b)
*** 1438,1443 ****
--- 1439,1445 ----
  	COMPARE_NODE_FIELD(aliases);
  	COMPARE_NODE_FIELD(query);
  	COMPARE_SCALAR_FIELD(replace);
+ 	COMPARE_NODE_FIELD(options);
  
  	return true;
  }
*************** static bool
*** 1945,1951 ****
  _equalReassignOwnedStmt(const ReassignOwnedStmt *a, const ReassignOwnedStmt *b)
  {
  	COMPARE_NODE_FIELD(roles);
! 	COMPARE_NODE_FIELD(newrole);
  
  	return true;
  }
--- 1947,1953 ----
  _equalReassignOwnedStmt(const ReassignOwnedStmt *a, const ReassignOwnedStmt *b)
  {
  	COMPARE_NODE_FIELD(roles);
! 	COMPARE_STRING_FIELD(newrole);
  
  	return true;
  }
*************** _equalColumnDef(const ColumnDef *a, const ColumnDef *b)
*** 2182,2187 ****
--- 2184,2190 ----
  	COMPARE_NODE_FIELD(collClause);
  	COMPARE_SCALAR_FIELD(collOid);
  	COMPARE_NODE_FIELD(constraints);
+ 	COMPARE_NODE_FIELD(fdwoptions);
  
  	return true;
  }
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 594b3fd..f713773 100644
*** a/src/backend/nodes/outfuncs.c
--- b/src/backend/nodes/outfuncs.c
*************** _outPlannerParamItem(StringInfo str, const PlannerParamItem *node)
*** 1927,1937 ****
   *
   *****************************************************************************/
  
  static void
! _outCreateStmt(StringInfo str, const CreateStmt *node)
  {
- 	WRITE_NODE_TYPE("CREATESTMT");
- 
  	WRITE_NODE_FIELD(relation);
  	WRITE_NODE_FIELD(tableElts);
  	WRITE_NODE_FIELD(inhRelations);
--- 1927,1938 ----
   *
   *****************************************************************************/
  
+ /*
+  * print the basic stuff of all nodes that inherit from CreateStmt
+  */
  static void
! _outCreateStmtInfo(StringInfo str, const CreateStmt *node)
  {
  	WRITE_NODE_FIELD(relation);
  	WRITE_NODE_FIELD(tableElts);
  	WRITE_NODE_FIELD(inhRelations);
*************** _outCreateStmt(StringInfo str, const CreateStmt *node)
*** 1944,1954 ****
  }
  
  static void
  _outCreateForeignTableStmt(St
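To see why the ReassignOwnedStmt bits are crash material, recall how the
copyfuncs.c macros differ (paraphrased from memory; check the source for
the exact text):

	/* Copies the raw field value -- for a char * field, the POINTER. */
	#define COPY_SCALAR_FIELD(fldname) \
		(newnode->fldname = from->fldname)

	/* Duplicates the string into the current memory context. */
	#define COPY_STRING_FIELD(fldname) \
		(newnode->fldname = from->fldname ? pstrdup(from->fldname) : (char *) NULL)

With COPY_SCALAR_FIELD(newrole), the copied node aliases the original
string: once the original is pfree'd (say, its memory context is reset),
the copy is left holding a dangling pointer. Similarly,
COMPARE_NODE_FIELD(newrole) in equalfuncs.c hands a plain C string to
equal(), which expects a Node with a valid tag, whereas
COMPARE_STRING_FIELD does a NULL-safe string comparison.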
[HACKERS] [BUG] Checkpointer on hot standby runs without looking checkpoint_segments
Hello, this is a bug report and a patch for it. The first patch in the
attachments is for 9.2dev and the second one is for 9.1.3.

On the current 9.2dev, IsCheckpointOnSchedule() in checkpointer.c does not
check progress against WAL segments written. This makes the checkpointer
always run at the speed determined by checkpoint_timeout, regardless of
the WAL advance rate. For a high WAL advance rate, this leads to an
unexpected imbalance in the number of WAL segment files between the master
and the standby(s). What is worse, the master has a much higher chance of
removing some WAL segments before the standby receives them.

XLogPageRead() in xlog.c triggers checkpoints based on WAL segment
advance, so I think this is a bug of the bgwriter in 9.1 as well. The
attached patches fix this on 9.2dev and 9.1.3 respectively. In the version
backported to 9.1.3, bgwriter.c is modified instead of checkpointer.c, and
GetWalRcvWriteRecPtr() is used as the equivalent of
GetStandbyFlushRecPtr() in 9.2.

By the way, GetStandbyFlushRecPtr() acquires a spinlock internally. It
might be enough to read XLogCtl->recoveryLastRecPtr without the lock to
make a rough estimate, but I can't tell whether that is safe or not. The
same consideration applies to GetWalRcvWriteRecPtr() on 9.1.3. However, it
seems to work fine in a simple test.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

== My e-mail address has been changed since Apr. 1, 2012.

diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index c9473f7..f86e9b9 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -491,8 +491,8 @@ CheckpointerMain(void)
 		 * Initialize checkpointer-private variables used during checkpoint.
 		 */
 		ckpt_active = true;
-		if (!do_restartpoint)
-			ckpt_start_recptr = GetInsertRecPtr();
+		ckpt_start_recptr =
+			do_restartpoint ? GetStandbyFlushRecPtr() : GetInsertRecPtr();
 		ckpt_start_time = now;
 		ckpt_cached_elapsed = 0;
@@ -731,28 +731,29 @@ IsCheckpointOnSchedule(double progress)
 		return false;
 
 	/*
-	 * Check progress against WAL segments written and checkpoint_segments.
+	 * Check progress against WAL segments written, or replayed for
+	 * hot standby, and checkpoint_segments.
 	 *
 	 * We compare the current WAL insert location against the location
-	 * computed before calling CreateCheckPoint. The code in XLogInsert that
-	 * actually triggers a checkpoint when checkpoint_segments is exceeded
-	 * compares against RedoRecptr, so this is not completely accurate.
-	 * However, it's good enough for our purposes, we're only calculating an
-	 * estimate anyway.
+	 * computed before calling CreateCheckPoint. The code in
+	 * XLogInsert that actually triggers a checkpoint when
+	 * checkpoint_segments is exceeded compares against RedoRecptr.
+	 * Similarly, we consult WAL flush location instead on hot
+	 * standbys and XLogPageRead compares it aganst RedoRecptr, too.
+	 * Altough these are not completely accurate, it's good enough for
+	 * our purposes, we're only calculating an estimate anyway.
 	 */
-	if (!RecoveryInProgress())
+
+	recptr = RecoveryInProgress() ?
+		GetStandbyFlushRecPtr() : GetInsertRecPtr();
+	elapsed_xlogs =
+		(((double) (int32) (recptr.xlogid - ckpt_start_recptr.xlogid)) * XLogSegsPerFile +
+		 ((double) recptr.xrecoff - (double) ckpt_start_recptr.xrecoff) / XLogSegSize) /
+		CheckPointSegments;
+
+	if (progress < elapsed_xlogs)
 	{
-		recptr = GetInsertRecPtr();
-		elapsed_xlogs =
-			(((double) (int32) (recptr.xlogid - ckpt_start_recptr.xlogid)) * XLogSegsPerFile +
-			 ((double) recptr.xrecoff - (double) ckpt_start_recptr.xrecoff) / XLogSegSize) /
-			CheckPointSegments;
-
-		if (progress < elapsed_xlogs)
-		{
-			ckpt_cached_elapsed = elapsed_xlogs;
-			return false;
-		}
+		ckpt_cached_elapsed = elapsed_xlogs;
+		return false;
 	}
 
 	/*
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index 5643ec8..0ce9945 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -56,6 +56,7 @@
 #include "pgstat.h"
 #include "postmaster/bgwriter.h"
 #include "replication/syncrep.h"
+#include "replication/walreceiver.h"
 #include "storage/bufmgr.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
@@ -489,8 +490,8 @@ BackgroundWriterMain(void)
 		 * Initialize bgwriter-private variables used during checkpoint.
 		 */
 		ckpt_active = true;
-		if (!do_restartpoint)
-			ckpt_start_recptr = GetInsertRecPtr();
+		ckpt_start_recptr = do_restartpoint ?
+			GetWalRcvWriteRecPtr(NULL) : GetInsertRecPtr();
 		ckpt_start_time = now;
 		ckpt_cached_elapsed = 0;
@@ -764,30 +765,32 @@ IsCheckpointOnSchedule(double progress)
 		return false;
 
 	/*
-	 * Check progress against WAL segments written and checkpoint_segments.
+	 * Check progress against WAL segments written, or replayed for
+	 * hot standby, and checkpoint_segments.
 	 *
 	 * We compare the current WAL insert location against the location -
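To illustrate the pacing rule the patch extends to standbys, here is a tiny
standalone model (it collapses the xlogid/xrecoff arithmetic into a plain
segment count, purely for readability; the numbers are invented):

	#include <stdio.h>

	/*
	 * Toy model of IsCheckpointOnSchedule()'s WAL-progress check.
	 * progress: fraction of the checkpoint's work done so far (0..1).
	 * segs_used: WAL segments written (or, with the patch, replayed on a
	 * standby) since the checkpoint started.
	 */
	static int
	is_on_schedule(double progress, double segs_used,
				   double checkpoint_segments)
	{
		double elapsed_xlogs = segs_used / checkpoint_segments;

		return progress >= elapsed_xlogs;
	}

	int main(void)
	{
		/* With checkpoint_segments = 16: 12 segments consumed but only
		 * half of the buffers written -> behind schedule, write faster. */
		printf("%d\n", is_on_schedule(0.50, 12, 16));	/* prints 0 */

		/* 80% of buffers written after 12 segments -> on schedule. */
		printf("%d\n", is_on_schedule(0.80, 12, 16));	/* prints 1 */
		return 0;
	}

Without the patch, a standby never computes elapsed_xlogs at all, so its
restartpoints pace themselves purely by checkpoint_timeout -- which is the
reported imbalance.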
Re: [HACKERS] [BUG] Checkpointer on hot standby runs without looking checkpoint_segments
On Mon, Apr 16, 2012 at 1:05 PM, Kyotaro HORIGUCHI wrote:
> Hello, this is a bug report and a patch for it.
>
> The first patch in the attachments is for 9.2dev and the second one is
> for 9.1.3.
>
> On the current 9.2dev, IsCheckpointOnSchedule() in checkpointer.c does
> not check progress against WAL segments written. This makes the
> checkpointer always run at the speed determined by checkpoint_timeout,
> regardless of the WAL advance rate.

Thanks, I'll take a look.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] 9.3 Pre-proposal: Range Merge Join
On Mon, Apr 16, 2012 at 10:29 AM, Heikki Linnakangas
<heikki.linnakan...@enterprisedb.com> wrote:
> On 16.04.2012 08:40, Jeff Davis wrote:
>> Does someone know of a spatial join algorithm (without IP claims) that
>> would be as good as this one for ranges?
>
> I'd be happy with an algorithm that's specific to ranges, too, but my
> gut feeling is that there has to be a lot of research on spatial join
> algorithms out there. Implementing one of them would be more widely
> applicable, so I think we should look into spatial join algorithms
> first and see how far that gets us, before we write something specific
> to ranges.

There is a good overview article about spatial joins:
http://www.cs.umd.edu/users/hjs/pubs/jacoxtods07.pdf

It shows that there are a lot of methods based on building temporary
indexes. That might mean that building a temporary GiST index is not a
bad idea. Also, there are methods which look like extensions of Jeff
Davis's proposal to the multidimensional case: the Plane Sweep and
External Plane Sweep methods. However, they might use sophisticated data
structures for the sweep, and I believe they should, for good
performance.

------
With best regards,
Alexander Korotkov.
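For intuition, the 1-D special case of such a sweep -- a merge-style
overlap join over two range lists sorted by lower bound -- can be sketched
in a few lines (illustrative standalone code, not a proposed patch):

	#include <stdio.h>
	#include <stdlib.h>

	typedef struct { double lo, hi; } Range;

	static int cmp_lo(const void *a, const void *b)
	{
		double d = ((const Range *) a)->lo - ((const Range *) b)->lo;
		return (d > 0) - (d < 0);
	}

	/*
	 * Emit every overlapping pair between r[] and s[], both sorted by
	 * lower bound. The side whose current range starts first is joined
	 * against the other side's ranges until their lower bounds pass its
	 * upper bound; sortedness guarantees each pair is reported once.
	 */
	static void overlap_join(Range *r, int nr, Range *s, int ns)
	{
		int i = 0, j = 0;

		while (i < nr && j < ns)
		{
			if (r[i].lo <= s[j].lo)
			{
				for (int k = j; k < ns && s[k].lo <= r[i].hi; k++)
					printf("[%g,%g] overlaps [%g,%g]\n",
						   r[i].lo, r[i].hi, s[k].lo, s[k].hi);
				i++;
			}
			else
			{
				for (int k = i; k < nr && r[k].lo <= s[j].hi; k++)
					printf("[%g,%g] overlaps [%g,%g]\n",
						   r[k].lo, r[k].hi, s[j].lo, s[j].hi);
				j++;
			}
		}
	}

	int main(void)
	{
		Range r[] = {{1, 4}, {6, 9}};
		Range s[] = {{2, 3}, {3, 7}, {10, 12}};

		qsort(r, 2, sizeof(Range), cmp_lo);
		qsort(s, 3, sizeof(Range), cmp_lo);
		overlap_join(r, 2, s, 3);
		return 0;
	}

The multidimensional methods in the survey generalize exactly this idea,
which is why they need a real sweep structure instead of a pair of
cursors.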
[HACKERS] A typo fix in a comment in xlog.c
Hello, I found duplicated words in a comment in StartupXLOG() in xlog.c,
and the attached patch fixes them.

Essentially the fix is in one line, as follows:

-	 * We're in recovery, so unlogged relations relations may be trashed
+	 * We're in recovery, so unlogged relations may be trashed

But I re-filled the paragraph for the fixed comment, so the patch replaces
a little bit more.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

== My e-mail address has been changed since Apr. 1, 2012.

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 0c301b2..04dec4e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6354,10 +6354,10 @@ StartupXLOG(void)
 		CheckRequiredParameterValues();
 
 		/*
-		 * We're in recovery, so unlogged relations relations may be trashed
-		 * and must be reset. This should be done BEFORE allowing Hot Standby
-		 * connections, so that read-only backends don't try to read whatever
-		 * garbage is left over from before.
+		 * We're in recovery, so unlogged relations may be trashed and must
+		 * be reset. This should be done BEFORE allowing Hot Standby
+		 * connections, so that read-only backends don't try to read
+		 * whatever garbage is left over from before.
 		 */
 		ResetUnloggedRelations(UNLOGGED_RELATION_CLEANUP);
Re: [HACKERS] A typo fix in a comment in xlog.c
On mån, 2012-04-16 at 21:28 +0900, Kyotaro HORIGUCHI wrote:
> Hello, I found duplicated words in a comment in StartupXLOG() in
> xlog.c, and the attached patch fixes them.
>
> Essentially the fix is in one line, as follows:
>
> -	 * We're in recovery, so unlogged relations relations may be trashed
> +	 * We're in recovery, so unlogged relations may be trashed

Fixed.

> But I re-filled the paragraph for the fixed comment, so the patch
> replaces a little bit more.

You might want to adjust your fill-column setting to 79, so pgindent
doesn't reformat that again. Compare to what I just committed.
Re: [HACKERS] JSON for PG 9.2
On Tue, 2012-01-31 at 12:58 -0500, Andrew Dunstan wrote:
> On 01/30/2012 10:37 AM, Andrew Dunstan wrote:
>>> Aside: is query_to_json really necessary? It seems rather ugly and
>>> easily avoidable using row_to_json.
>>
>> I started with this, again by analogy with query_to_xml(). But I agree
>> it's a bit ugly. If we're not going to do it, then we definitely need
>> to look at caching the output funcs in the function info. A closer
>> approximation is actually:
>>
>>     SELECT array_to_json(array_agg(q))
>>     FROM ( your query here ) q;
>>
>> But then I'd want the ability to break that up a bit with line feeds,
>> so we'd need to adjust the interface slightly. (Hint: don't try the
>> above with "select * from pg_class".)
>>
>> I'll wait on further comments, but I can probably turn these changes
>> around very quickly once we're agreed.
>
> based on Abhijit's feeling and some discussion offline, the consensus
> seems to be to remove query_to_json.

The only comment I have here is that query_to_json could have been
replaced with json_agg, so that you don't need to do double-buffering for
the results of the array() call in

    SELECT array_to_json(array( your query here ));

Or is there some other way to avoid it, except wrapping row_to_json()
calls in your own aggregate function which adds the enclosing brackets and
comma separators (like this: '[' row [, row ...] ']')?

> cheers
>
> andrew
Re: [HACKERS] JSON for PG 9.2
On 04/16/2012 09:34 AM, Hannu Krosing wrote:
>> based on Abhijit's feeling and some discussion offline, the consensus
>> seems to be to remove query_to_json.
>
> The only comment I have here is that query_to_json could have been
> replaced with json_agg, so that you don't need to do double-buffering
> for the results of the array() call in
>
>     SELECT array_to_json(array( your query here ));
>
> Or is there some other way to avoid it, except wrapping row_to_json()
> calls in your own aggregate function which adds the enclosing brackets
> and comma separators?

The way I usually write this is:

    select array_to_json(array_agg(q))
    from ( your query here ) q;

It's a pity you didn't make this comment back in January when we were
talking about this. I think it's too late now in this release cycle to be
talking about adding the aggregate function.

cheers

andrew
Re: [HACKERS] index-only scans vs. Hot Standby, round two
Heikki Linnakangas writes:
> Can we have a "soft" hot standby conflict that doesn't kill the query,
> but disables index-only-scans?

Well, there wouldn't be any way for the planner to know whether an
index-only scan would be safe or not. I think this would have to look
like a run-time fallback. Could it be structured as "return that the
page's all-visible bit is not set, instead of failing?" Or am I confused
about where the conflict is coming from?

			regards, tom lane
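Structurally, that run-time fallback might look something like this (a
sketch only: visibilitymap_test() and RecoveryInProgress() are real calls
of that era, but the wrapper and the "trusted" predicate are invented for
illustration):

	static bool
	all_visible_for_scan(Relation rel, BlockNumber blkno, Buffer *vmbuf)
	{
		if (!visibilitymap_test(rel, blkno, vmbuf))
			return false;

		/*
		 * Soft conflict: on a standby whose snapshot cannot trust the
		 * bit, just report "not all-visible" and let the executor fetch
		 * the heap tuple, instead of cancelling the query.
		 */
		if (RecoveryInProgress() && !SnapshotTrustsAllVisible())	/* invented */
			return false;

		return true;
	}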
Re: [HACKERS] JSON for PG 9.2
On Mon, Apr 16, 2012 at 9:10 AM, Andrew Dunstan wrote:
> On 04/16/2012 09:34 AM, Hannu Krosing wrote:
>>> based on Abhijit's feeling and some discussion offline, the consensus
>>> seems to be to remove query_to_json.
>>
>> The only comment I have here is that query_to_json could have been
>> replaced with json_agg, so that you don't need to do double-buffering
>> for the results of the array() call in
>>
>>     SELECT array_to_json(array( your query here ));
>>
>> Or is there some other way to avoid it, except wrapping row_to_json()
>> calls in your own aggregate function which adds the enclosing brackets
>> and comma separators?
>
> The way I usually write this is:
>
>     select array_to_json(array_agg(q))
>     from ( your query here ) q;
>
> It's a pity you didn't make this comment back in January when we were
> talking about this. I think it's too late now in this release cycle to
> be talking about adding the aggregate function.

I find array_agg to be pretty consistently slower than array()...
although not much, say around 5-10%. I use array_agg only when grouping.
Try timing

    select array_to_json(array_agg(v))
    from (select v from generate_series(1,100) v) q;

vs

    select array_to_json(array(select v from generate_series(1,100) v));

I agree with Hannu, but as things stand, if I'm trying to avoid the extra
buffer I've found myself doing the final aggregation on the client --
it's easy enough.

BTW, I'm using the json stuff heavily and it's just absolutely fantastic.
Finally I can write web applications without wondering exactly where it
was that computer science went off the rails. I've already demoed a
prototype app that integrates pg directly with the many high quality js
libraries out there, and it makes things very easy and quick by making
writing data services trivial. Data pushes are still quite a pain, but I
figure something can be worked out.

merlin
Re: [HACKERS] ECPG FETCH readahead
On Mon, Apr 16, 2012 at 06:24:57AM +0200, Boszormenyi Zoltan wrote:
> Yes, just like when the readahead window set to 256, FETCH 1024 will
> iterate through 4 windows or FETCH 64 iterates through the same window
> 4 times. This is the idea behind the "readahead window".

Really? It's definitely not the idea behind FETCH 1024. Using the same
window 4 times for FETCH 64 is the idea though, I agree.

> How about allowing the readahead window to be resized for the
> non-decorated case if the runtime encounters FETCH N and N is greater
> than the previous window?

To be resized by what? IMO a FETCH N should always be a FETCH N, no matter
what, i.e. if the readahead window is larger, use it, but even if it's
smaller we should still fetch N at the same time.

Michael

-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
Jabber: michael.meskes at googlemail dot com
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL
Re: [HACKERS] JSON for PG 9.2
On Mon, 2012-04-16 at 10:10 -0400, Andrew Dunstan wrote:
> On 04/16/2012 09:34 AM, Hannu Krosing wrote:
>>> based on Abhijit's feeling and some discussion offline, the consensus
>>> seems to be to remove query_to_json.
>>
>> The only comment I have here is that query_to_json could have been
>> replaced with json_agg, so that you don't need to do double-buffering
>> for the results of the array() call in
>>
>>     SELECT array_to_json(array( your query here ));
>>
>> Or is there some other way to avoid it, except wrapping row_to_json()
>> calls in your own aggregate function which adds the enclosing brackets
>> and comma separators?
>
> The way I usually write this is:
>
>     select array_to_json(array_agg(q))
>     from ( your query here ) q;
>
> It's a pity you didn't make this comment back in January when we were
> talking about this. I think it's too late now in this release cycle to
> be talking about adding the aggregate function.

My comment is not meant to propose changing anything in 9.2. I think what
we have here is absolutely fantastic :)

If doing something in 9.3, then what I would like is some way to express
multiple queries. Basically a variant of

    query_to_json(query text[])

where the queries would be evaluated in order and then all the results
aggregated into one json object. But "aggregation on the client" as
suggested by Merlin may be a better way to do it for larger result(set)s,
especially as it could enable streaming of the resultsets without having
to first buffer everything on the server.

If we can add something, then perhaps a "deeper" pretty_print feature.

Samples:

hannu=# \d test
                                 Table "public.test"
 Column |            Type             |                     Modifiers
--------+-----------------------------+---------------------------------------------------
 id     | integer                     | not null default nextval('test_id_seq'::regclass)
 data   | text                        |
 tstamp | timestamp without time zone | default now()
Indexes:
    "test_pkey" PRIMARY KEY, btree (id)

hannu=# select array_to_json(array(select test from test),true);
-[ RECORD 1 ]-+----------------------------------------------------------------------------
array_to_json | [{"id":1,"data":"0.262814193032682","tstamp":"2012-04-05 13:21:03.235204"},
              |  {"id":2,"data":"0.157406373415142","tstamp":"2012-04-05 13:21:05.2033"}]

This is OK.

hannu=# \d test2
                                 Table "public.test2"
 Column |            Type             |                     Modifiers
--------+-----------------------------+----------------------------------------------------
 id     | integer                     | not null default nextval('test2_id_seq'::regclass)
 data2  | test                        |
 tstamp | timestamp without time zone | default now()
Indexes:
    "test2_pkey" PRIMARY KEY, btree (id)

hannu=# select array_to_json(array(select test2 from test2),true);
-[ RECORD 1 ]-+--------------------------------------------------------------------------------------------------------------------
array_to_json | [{"id":1,"data2":{"id":1,"data":"0.262814193032682","tstamp":"2012-04-05 13:21:03.235204"},"tstamp":"2012-04-05 13:25:03.644497"},
              |  {"id":2,"data2":{"id":2,"data":"0.157406373415142","tstamp":"2012-04-05 13:21:05.2033"},"tstamp":"2012-04-05 13:25:03.644497"}]

This is "kind of OK".

hannu=# \d test3
                                 Table "public.test3"
 Column |            Type             |                     Modifiers
--------+-----------------------------+----------------------------------------------------
 id     | integer                     | not null default nextval('test3_id_seq'::regclass)
 data3  | test2[]                     |
 tstamp | timestamp without time zone | default now()
Indexes:
    "test3_pkey" PRIMARY KEY, btree (id)

hannu=# select array_to_json(array(select test3 from test3),true);
-[ RECORD 1 ]-+--------------------------------------------------------------------------------------------------------------------
array_to_json | [{"id":1,"data3":[{"id":1,"data2":{"id":1,"data":"0.262814193032682","tstamp":"2012-04-05 13:21:03.235204"},"tstamp":"2012-04-05 13:25:03.644497"},{"id":2,"data2":{"id":2,"data":"0.157406373415142","tstamp":"2012-04-05 13:21:05.2033"},"tstamp":"2012-04-05 13:25:03.644497"}],"tstamp":"2012-04-16 14:40:15.795947"}]

but this would be nicer if printed like python's pprint:

>>> pprint.pprint(d)
[{'data3': [{'data2': {'data': '0.262814193032682',
                       'id': 1,
                       'tstamp': '2012-04-05 13:21:03.235204'},
             'id': 1,
             'tstamp': '2012-04-05 13:25:03.644497'},
            {'data2': {'data': '0.157406373415
Re: [HACKERS] [COMMITTERS] pgsql: Add new replication mode synchronous_commit = 'write'.
On Sun, Apr 15, 2012 at 12:13 AM, Thom Brown wrote:
> On 14 April 2012 15:58, Fujii Masao wrote:
>> On Sat, Apr 14, 2012 at 4:16 AM, Thom Brown wrote:
>>> I have a question though. What happens when this is set to "write"
>>> (or "remote_write" as proposed) but it's being used on a standalone
>>> primary? At the moment it's not documented what level of guarantee
>>> this would provide.
>>
>> http://www.postgresql.org/docs/devel/static/warm-standby.html#SYNCHRONOUS-REPLICATION-HA
>> ----
>> Commits made when synchronous_commit is set to on or write will wait
>> until the synchronous standby responds. The response may never occur
>> if the last, or only, standby should crash.
>> ----
>>
>> Is this description not enough? If not enough, how should we change
>> the document?
>
> No, that's not what I was referring to. If you don't have a standby
> (i.e. a single, isolated database cluster with no replication), and its
> synchronous_commit is set to 'remote_write', what effect does that
> have?

It has the same effect as 'on' and 'local', i.e., transaction commit waits
for only the local WAL flush. This behavior is not documented
explicitly... How should we change the document? What about adding the
following into the explanation of the synchronous_commit parameter (maybe
at the end of its second paragraph)?

----
If synchronous_standby_names is not set, on, remote_write and local
provide the same synchronization level; transaction commit only waits for
local flush.
----

Regards,

-- 
Fujii Masao
[HACKERS] Bug tracker tool we need (was: Last gasp)
Dimitri Fontaine writes:
> Alvaro Herrera writes:
>> I've used Redmine a lot, as you know, and I only keep using it because
>> it's a requirement at work. It is certainly not close to usable for
>> general pgsql stuff. (Trac, which we used to use prior to Redmine, was
>> certainly much worse, though).
>
> Same story here, still using redmine a lot, all with custom reports
> etc.
>
>> I can't say that it's all that slow, or that there's a problem with
>> the code, or that the search doesn't work right (and I've never had a
>> wiki edit disappear, either, and I've used that a lot). It's just the
>> wrong tool altogether.
>
> It's indeed slow here, and I agree that's not the problem. Not the tool
> we need, +1.

I still fail to see how Redmine doesn't fit the requirements summarized on
that wiki page[1], so the objection must be something other than the
formal requirements of being free/open software and running postgres
behind it (some sort of "feeling", maybe?).

Jay, Alvaro, Dimitri (and whoever else wants to speak up), could you
please describe your ideal tool for the task?

Given that every other existing tool has likely pissed off someone
already, I guess our best bet is writing one from scratch. Or maybe there
isn't really a need for a tracker? The core team have managed to live
without one for so long, after all...

--
Regards,
Alex

[1] http://wiki.postgresql.org/wiki/TrackerDiscussion
Re: [HACKERS] Bug tracker tool we need (was: Last gasp)
On Mon, Apr 16, 2012 at 18:24, Alex wrote:
> Dimitri Fontaine writes:
>> Alvaro Herrera writes:
>>> I've used Redmine a lot, as you know, and I only keep using it
>>> because it's a requirement at work. It is certainly not close to
>>> usable for general pgsql stuff. (Trac, which we used to use prior to
>>> Redmine, was certainly much worse, though).
>>
>> Same story here, still using redmine a lot, all with custom reports
>> etc.
>>
>>> I can't say that it's all that slow, or that there's a problem with
>>> the code, or that the search doesn't work right (and I've never had a
>>> wiki edit disappear, either, and I've used that a lot). It's just the
>>> wrong tool altogether.
>>
>> It's indeed slow here, and I agree that's not the problem. Not the
>> tool we need, +1.
>
> I still fail to see how Redmine doesn't fit the requirements summarized
> on that wiki page[1], so the objection must be something other than the
> formal requirements of being free/open software and running postgres
> behind it (some sort of "feeling", maybe?).

One thing to note is that the referenced wiki page is over a year old.
And many more things have been said on the email lists than are actually
on that page.

But as one note - I don't believe you can drive redmine completely from
email, which is certainly a requirement that has been discussed, but is
not entirely listed on that page.

> Jay, Alvaro, Dimitri (and whoever else wants to speak up), could you
> please describe your ideal tool for the task?
>
> Given that every other existing tool has likely pissed off someone
> already, I guess our best bet is writing one from scratch.

FWIW, I think the closest thing we've found so far would be debbugs -
which IIRC doesn't have any kind of reasonable database backend, which
would be a strange choice for a project like ours :) And makes many
things harder...

-- 
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] JSON for PG 9.2
On Mon, Apr 16, 2012 at 11:19 AM, Hannu Krosing wrote:
> If doing something in 9.3, then what I would like is some way to
> express multiple queries. Basically a variant of
>
>     query_to_json(query text[])
>
> where the queries would be evaluated in order and then all the results
> aggregated into one json object.

I personally don't like variants of to_json that push the query in as
text. They defeat parameterization and have other issues.

Another point for client-side processing is the new row-level processing
in libpq, so I'd argue that if the result is big enough to warrant
worrying about buffering (and it'd have to be a mighty big json doc), the
best bet is to extract it as rows. I'm playing around with node.js for
the json serving, and the sending code looks like this:

var first = true;
query.on('row', function(row) {
  if (first) {
    first = false;
    response.write('[');
  } else {
    response.write(',');
  }
  response.write(row.jsondata);
});

query.on('end', function() {
  response.write(']');
  response.end();
});

-- not too bad

merlin
Re: [HACKERS] how to create a non-inherited CHECK constraint in CREATE TABLE
Excerpts from Nikhil Sontakke's message of Mon Apr 16 03:56:06 -0300 2012:
>>> Displace yes. It would error out if someone says
>>>
>>>     ALTER TABLE ONLY ... CHECK ();
>>>
>>> suggesting to use the ONLY with the CHECK.
>>
>> I'd say the behavior for that case can revert to the PostgreSQL 9.1
>> behavior. If the table has children, raise an error. Otherwise, add an
>> inheritable CHECK constraint, albeit one lacking inheritors at that
>> moment.
>
> Ok, that sounds reasonable.

Good, I agree with that too. Are you going to submit an updated patch? I
started working on your original a couple of days ago but got distracted
by some family news here. I'll send it to you so that you can start from
there, to avoid duplicate work.

> Another thing that we should consider is that if we are replacing ONLY
> with NO INHERIT, then instead of just making a cosmetic syntactic
> change, we should also replace all the is*only type of field names with
> noinherit for the sake of completeness and uniformity.

Yeah, I was considering the same thing. "conisonly" isn't a very good
name on its own (it only made sense because the ONLY came from "ALTER
TABLE ONLY").

-- 
Álvaro Herrera
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] Bug tracker tool we need
Magnus Hagander writes: > One thing to note is that the referenced wiki page is over a year old. > And that many more things have been said on email lists than are > actually in that page.

Yeah, I went through it briefly and a rather important concern seems to have been raised by Tom Lane in this msg: http://archives.postgresql.org/pgsql-hackers/2011-05/msg01480.php This paragraph: > The real question is, who is going to keep it up to date? GSM has the > right point of view here: we need at least a couple of people who are > willing to invest substantial amounts of time, or it's not going to go > anywhere. Seeing that we can barely manage to keep the mailing list > moderator positions staffed, I'm not hopeful.

> But as one note - I don't believe you can drive redmine completely > from email, which is certainly a requirement that has been discussed, > but is not entirely listed on that page.

Ugh, what do you mean by that? You can change any attribute (like status, priority, assigned person, etc.) of a ticket via email. Anything else?

> FWIW, I think the closest thing we've found so far would be debbugs - > which IIRC doesn't have any kind of reasonable database backend, which > would be a strange choice for a project like ours :) And makes many > things harder...

What stops us from writing a postgres backend for debbugs if it is so brilliant at handling email and stuff? -- Alex -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ECPG FETCH readahead
On 2012-04-16 18:04, Michael Meskes wrote:
> On Mon, Apr 16, 2012 at 06:24:57AM +0200, Boszormenyi Zoltan wrote:
>> Yes, just like when the readahead window is set to 256, FETCH 1024 will iterate through 4 windows or FETCH 64 iterates through the same window 4 times. This is the idea behind the "readahead window".
> Really? It's definitely not the idea behind FETCH 1024. Using the same window 4 times for FETCH 64 is the idea though, I agree.

OK. I would like to stretch your agreement a little. :-) Can we state that caching means the cache should serve the incoming request(s) until the request spills out of it? If your answer to the above is "yes", then please consider this case:
- the readahead window is 256 (via ECPGFETCHSZ)
- FETCH 64 was executed twice, so you are in the middle of the cache
- FETCH 1024 is requested

So, if I understand you correctly, you expect this scenario:
- set a "one-time" readahead window size (N - # of rows that can be served = 1024 - 128 = 768) so the next FETCH by the runtime will fulfill this request fully
- serve the request's first 128 rows from the current cache
- for the 129th row, FETCH 768 will be executed
- all subsequent requests use the old readahead size

>> How about allowing the readahead window to be resized for the non-decorated case if the runtime encounters FETCH N and N is greater than the previous window?
> To be resized by what?

By the new FETCH request. Instead of the above, I imagined this:
- the runtime notices that the new request is larger than the current readahead window size, modifies the readahead window size upwards, so the next FETCH will use it
- serve the request's first 128 rows from the current cache
- for the 129th row, FETCH 1024 will be executed and the remaining 768 rows will be served from the new cache
- all subsequent requests use the new readahead size, 1024

> IMO a FETCH N should always be a FETCH N, no matter what

This part of your statement contradicts caching. :-)

> , i.e. if the readahead window is larger, use it, but even if it's smaller we should still fetch N at the same time.

So, there can be occasional one-time larger requests but smaller ones should apply the set window size, right?

Best regards, Zoltán Böszörményi

--
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de http://www.postgresql.at/

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUG] Checkpointer on hot standby runs without looking checkpoint_segments
On Mon, Apr 16, 2012 at 9:05 PM, Kyotaro HORIGUCHI wrote: > In the backported version to 9.1.3, bgwriter.c is modified > instead of checkpointer.c in 9.2. And GetWalRcvWriteRecPtr() is > used as the equivalent of GetStandbyFlushRecPtr() in 9.2.

In 9.2, GetXLogReplayRecPtr() should be used instead of GetStandbyFlushRecPtr(). A restartpoint is scheduled to finish before the next restartpoint is triggered. A restartpoint is triggered if too many WAL files have been replayed since the last restartpoint. So a restartpoint should be scheduled according to the replay location, not the receive location.

-* computed before calling CreateCheckPoint. The code in XLogInsert that
-* actually triggers a checkpoint when checkpoint_segments is exceeded
-* compares against RedoRecptr, so this is not completely accurate.
-* However, it's good enough for our purposes, we're only calculating an
-* estimate anyway.
+* computed before calling CreateCheckPoint. The code in
+* XLogInsert that actually triggers a checkpoint when
+* checkpoint_segments is exceeded compares against RedoRecptr.
+* Similarly, we consult WAL flush location instead on hot
+* standbys and XLogPageRead compares it aganst RedoRecptr, too.
+* Altough these are not completely accurate, it's good enough for
+* our purposes, we're only calculating an estimate anyway.

I think that basically it's better not to change the comments (i.e., not to add the line feed) if their contents are the same as previous ones, to highlight what you actually changed in the patch.

Typo: RedoRecptr should be RedoRecPtr?

Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.3 Pre-proposal: Range Merge Join
On Mon, 2012-04-16 at 02:52 -0400, Tom Lane wrote: > Jeff Davis writes: > > 1. Order the ranges on both sides by the lower bound, then upper bound. > > Empty ranges can be excluded entirely. > > 2. Left := first range on left, Right := first range on right > > 3. If Left or Right is empty, terminate. > > 4. If lower(Left) > upper(Right), discard Right, goto 2 > > 5. If lower(Right) > upper(Left), discard Left, goto 2 > > 6. return (Left, Right) as joined tuple > > 7. Right := next range on right > > 8. goto 3 > > This is surely not correct in detail. As written, it will be impossible > for any tuple on the right side to be joined to more than one left-side > tuple. You will need something analogous to the mark-and-restore > rewinding logic in standard mergejoin to make this work. Every time you discard a tuple on the left, you go to step 2, which rewinds the right side back to the first non-discarded tuple. So, implemented using mark and restore: * start off with the marks at the first tuple on each side * "discard" means move the mark down a tuple * setting it back to the first range means restoring to the mark * going to the next range (step 7) just means getting another tuple, without changing the mark Only one side really needs the mark and restore logic, but it was easier to write the pseudocode in a symmetrical way (except step 7). > I will note that mergejoin is not only one of the more complicated > executor modules, but it accounts for a huge percentage of the planner's > complexity; which doesn't exactly make me sympathize with the notion of > let's-copy-and-paste-all-that. It'd be a lot better if we could build > it as a small tweak to mergejoin rather than an independent thing. > > (Having said that, I have no idea how the planner support could work > at all, because its treatment of mergejoins is intimately tied to > mergejoinable equality and EquivalenceClasses and PathKeys, which don't > seem readily extensible to the range situation.) I think this is the more important problem area. It's clear that I'll need to do some more investigation into the planning. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
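To make the mark-and-restore reading concrete, here is a rough C sketch of that loop. next_left(), next_right(), current_right(), right_mark() and right_restore() are hypothetical stand-ins for the executor's tuple access and mark/restore machinery, and the finer points of mark management are glossed over:

    /* Range merge join, per the numbered steps above.  The right-side
     * mark points at the first non-discarded right tuple; "discard"
     * advances the mark. */
    left = next_left();
    right_mark();                       /* mark the first right tuple */
    for (;;)
    {
        right_restore();                /* "goto 2": rewind right to mark */
        right = current_right();
        for (;;)
        {
            if (left == NULL || right == NULL)
                return;                 /* step 3: terminate */
            if (range_lower(left) > range_upper(right))
            {
                right = next_right();   /* step 4: discard right... */
                right_mark();           /* ...by advancing the mark */
                break;
            }
            if (range_lower(right) > range_upper(left))
            {
                left = next_left();     /* step 5: discard left */
                break;                  /* right rewinds on re-entry */
            }
            emit_join(left, right);     /* step 6: joined tuple */
            right = next_right();       /* step 7: mark stays put */
        }
    }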
Re: [HACKERS] index-only scans vs. Hot Standby, round two
On Mon, Apr 16, 2012 at 3:19 PM, Tom Lane wrote: > Heikki Linnakangas writes: >> Can we have a "soft" hot standby conflict that doesn't kill the query, >> but disables index-only-scans? > > Well, there wouldn't be any way for the planner to know whether an > index-only scan would be safe or not. I think this would have to look > like a run-time fallback. Could it be structured as "return that the > page's all-visible bit is not set, instead of failing?" Or am I > confused about where the conflict is coming from?

The starting point is that HS now offers 4 different mechanisms for avoiding conflicts, probably the best of which is hot standby feedback. So we only need to improve things if those techniques don't work for people. So initially, my attitude is let's throw a conflict and wait for feedback during beta. If we do need to do something, then introduce concept of a visibility conflict.

On replay:
If feedback not set, set LSN of visibility conflict on PROCs that conflict, if not already set.

On query:
If feedback not set, check conflict LSN against page, if page is later, check visibility.

In terms of optimisation, I wouldn't expect to have to adjust costs at all. The difference would only show for long running queries and typically they don't touch too much new data as a fraction of total. The cost model for index only is pretty crude anyhow.

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
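In rough code terms, the two halves of that check might look like this; visibilityConflictLSN is a hypothetical PGPROC field, while XLByteLT(), PageGetLSN() and XLogRecPtrIsInvalid() are the existing primitives:

    /* On replay, when feedback is not set: remember the first
     * conflicting LSN for each conflicting backend. */
    if (XLogRecPtrIsInvalid(proc->visibilityConflictLSN))
        proc->visibilityConflictLSN = conflictLSN;

    /* On query: trust the all-visible bit only on pages that have not
     * been modified since the conflict; otherwise fall back to normal
     * per-tuple visibility checks. */
    if (!XLogRecPtrIsInvalid(MyProc->visibilityConflictLSN) &&
        XLByteLT(MyProc->visibilityConflictLSN, PageGetLSN(page)))
        must_check_visibility = true;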
Re: [HACKERS] Why can't I use pgxs to build a plpgsql plugin?
On Mon, 2012-04-16 at 13:09 +0300, Heikki Linnakangas wrote: > On 13.04.2012 19:17, Guillaume Lelarge wrote: > > On Thu, 2012-04-12 at 12:28 +0300, Heikki Linnakangas wrote: > >> On 08.04.2012 11:59, Guillaume Lelarge wrote: > >>> There could be a good reason which would explain why we can't (or don't > >>> want to) do this, but I don't see it right now. > >> > >> Me neither, except a general desire to keep internals hidden. I propose > >> the attached. > > > > Sounds good to me. I would love to see this happening in 9.2. > > Ok, committed. I fixed the .PHONY line as Tom pointed out, and changed > MSVC install.pm to also copy the header file. > Thanks. -- Guillaume http://blog.guillaume.lelarge.info http://www.dalibo.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] index-only scans vs. Hot Standby, round two
On Mon, Apr 16, 2012 at 4:26 AM, Heikki Linnakangas wrote: > Can we have a "soft" hot standby conflict that doesn't kill the query, but > disables index-only-scans?

Yeah, something like that seems possible. For example, suppose the master includes, in each mark-heap-page-all-visible record, the newest XID on the page. On the standby, we keep track of the most recent such XID ever seen in shared memory. After noting that a page is all-visible, the standby cross-checks its snapshot against this XID and does the heap fetch anyway if it's too new. Conceivably this can be done with memory barriers but without locks: we must update the XID in shared memory *before* marking the page all-visible, and we must check the visibility map or page-level bit *before* fetching the XID from shared memory - but the actual reads and writes of the XID are atomic. Now, if an index-only scan suffers a very high number of these "soft conflicts" and consequently does a lot more heap fetches than expected, performance might suck. But that should be rare, and could be mitigated by turning on hot_standby_feedback. Also, there might be some hit for repeatedly reading that XID from memory.

Alternatively, we could try to forbid index-only scan plans from being created in the first place *only when* the underlying snapshot is too old. But then what happens if a conflict happens later, after the plan has been created but before it's fully executed? At that point it's too late to switch gears, so we'd still need something like the above. And the above might be adequate all by itself, since in practice index-only scans are mostly going to be useful when the data isn't changing all that fast, so most of the queries that would be cancelled by "hard" conflicts wouldn't have actually had a problem anyway.

> In the long run, perhaps we need to store XIDs in the visibility map instead > of just a bit. If we only stored one xid per 100 pages or something like > that, the storage overhead would not be much higher than what we have at the > moment. But that's obviously not going to happen for 9.2...

Well, that would also have the undesirable side effect of greatly reducing the granularity of the map. As it is, updating a tiny fraction of the tuples in the table could result in the entire table ceasing to be all-visible if you happen to update exactly one tuple per page through the entire heap. Or more generally, updating X% of the rows in the table can cause Y% of the rows in the table to no longer be visible for index-only-scan purposes, where Y >> X. What you're proposing would make that much worse. I think we're actually going to find that the current system isn't tight enough; my suspicion is that users are going to complain that we're not aggressive enough about marking pages all-visible, because autovac won't kick in until updates+deletes>20%, but (1) index-only scans also care about inserts and (2) by the time we've got 20% dead tuples index-only scans may well be worthless.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
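A sketch of the ordering constraints just described, using the barrier primitives added for 9.2; latestAllVisibleXid is a hypothetical shared-memory field, and the exact snapshot test is likewise an assumption:

    /* Writer side (replaying the mark-heap-page-all-visible record):
     * publish the page's newest XID before the bit becomes visible. */
    shmem->latestAllVisibleXid = newest_xid_on_page;
    pg_write_barrier();
    PageSetAllVisible(page);

    /* Reader side (standby scan): check the bit first, then the XID. */
    if (PageIsAllVisible(page))
    {
        pg_read_barrier();
        if (!TransactionIdPrecedes(shmem->latestAllVisibleXid,
                                   snapshot->xmin))
            do_heap_fetch = true;   /* "soft conflict": XID too new */
    }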
Re: [HACKERS] index-only scans vs. Hot Standby, round two
On Mon, Apr 16, 2012 at 3:02 AM, Noah Misch wrote: > Do you refer to PD_ALL_VISIBLE as "not merely a hint" due to the requirement > to prevent a page from simultaneously having a negative PD_ALL_VISIBLE and a > positive visibility map bit? That is to say, PD_ALL_VISIBLE is fully a hint > in its role as a witness of tuple visibility, but it is not a hint in its role > as a witness of the visibility map status? Exactly. > The large comment in SetBufferCommitInfoNeedsSave() seems incorrect. On > systems with weaker memory ordering, the FlushBuffer() process's clearing of > BM_JUST_DIRTIED may not be visible to the SetBufferCommitInfoNeedsSave() > process until some time after the former retrieved buffer contents from shared > memory. True. > Memory barriers could make the comment true, but we should probably > just update the comment to explain that a race condition may eat the update > entirely. Agreed. > Incidentally, is there a good reason for MarkBufferDirty() to > increment pgBufferUsage.shared_blks_dirtied and SetBufferCommitInfoNeedsSave() > not to do so? Looks like an oversight. Also agreed. >> 2. More seriously, lazy_scan_heap() releases the buffer lock before >> setting the all-visible bit on the heap. This looks just plain wrong >> (and it's my fault, to be clear). Somebody else can come along after >> we've released the buffer lock and mark the page not-all-visible. >> That concurrent process will check the visibility map bit, find it >> already clear, and go on its merry way. Then, we set the visibility >> map bit and go on our merry way. Now we have a not-all-visible page >> with the all-visible bit set in the visibility map. Oops. > > Hmm, indeed. This deserves its own open item. Also agreed. > Good point. We could finagle things so the standby notices end-of-recovery > checkpoints and blocks the optimization for older snapshots. For example, > maintain an integer count of end-of-recovery checkpoints seen and store that > in each Snapshot instead of takenDuringRecovery; use 0 on a master. When the > global value is greater than the snapshot's value, disable the optimization > for that snapshot. I don't know whether the optimization is powerful enough > to justify that level of trouble, but it's an option to consider. I suspect not. It seems we can make index-only scans work without doing something like this; it's only the sequential-scan optimization we might lose. But nobody's ever really complained about not having that, to my knowledge. > For a different approach, targeting the fragility, we could add assertions to > detect unexpected cases of a page bearing PD_ALL_VISIBLE submitted as a > full-page image. The holes are narrow enough that I fear such cases would be detected only on high-velocity production systems, which is not exactly where you want to find out about such problems. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
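Presumably the fix for point 2 amounts to keeping the buffer lock across both steps, roughly as below. This is only a sketch following the surrounding lazy_scan_heap() context loosely, and the visibilitymap_set() argument list in particular is schematic rather than exact:

    LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
    if (all_visible && !PageIsAllVisible(page))
    {
        PageSetAllVisible(page);
        MarkBufferDirty(buf);   /* per point 1: this is not a mere hint */
        visibilitymap_set(onerel, blkno, recptr, vmbuffer);  /* schematic */
    }
    LockBuffer(buf, BUFFER_LOCK_UNLOCK);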
Re: [HACKERS] 9.3 Pre-proposal: Range Merge Join
On Mon, 2012-04-16 at 16:22 +0400, Alexander Korotkov wrote: > There is a good overview article about spatial joins. > http://www.cs.umd.edu/users/hjs/pubs/jacoxtods07.pdf

Thank you, that's exactly the kind of overview I was looking for.

> It shows that there is a lot of methods based on building temporary > indexes. It might mean that building of temporary GiST index is not a > bad idea.

That had occurred to me, but I was hesitant to only use temp indexes. It still doesn't really offer a good solution when both sides of the join are relatively large (because of random I/O). Also the build speed of the index would be more important than it is now. On the other hand, if I tackle the problem using temp indexes, it would be a more general solution useful for many problems (for instance, we really need a better solution when the result of a subquery doesn't fit in a work_mem-sized hashtable). We may end up with some kind of combination (as the sweep seems to be; see below).

> Also there are methods which looks like extension of Jeff Davis > proposal to the multidimensional case. It is Plane Sweep and > External Plane Sweep methods. However, it might use sophisticated data > structures for the "sweep". And I believe it should use it for good > performance.

That method looks closer to what I had in mind. My Range Join proposal is essentially the same as the sweep method except that it only joins on one dimension, and the rest of the dimensions have to be checked with RECHECK. So, this still works for an N-dimensional join, as long as a single dimension is fairly selective. The sweep method seems to do the first dimension with the sweep method, and maintains a changing index that it uses to do the lookup. In other words, it's essentially just a way to do the RECHECK on the other dimensions in less than O(n) time. So, if the sweep dimension is not selective at all, this degenerates to the temporary index method (or something close, anyway).

The fact that, in the sweep method, there is still one "special" dimension, makes me think that it will be easier to make the API work based on ranges (which is a big win because postgres already knows about ranges). If so, that makes it easier to divide the project into stages, because we could get it working for ranges and then extend it to support other dimensions more efficiently later.

Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] index-only scans vs. Hot Standby, round two
On Mon, Apr 16, 2012 at 1:58 PM, Simon Riggs wrote: > If we do need to do something, then introduce concept of a visibility > conflict. > > On replay: > If feedback not set, set LSN of visibility conflict on PROCs that > conflict, if not already set. > > On query: > If feedback not set, check conflict LSN against page, if page is > later, check visibility.

Hmm, should have read the whole thread before replying. This is similar to what I just proposed in response to Heikki's message, but using LSN in lieu of (or maybe you mean in addition to) XID. I don't think we can ignore the need to throw conflicts just because hot_standby_feedback is set; there are going to be corner cases, for example, when it's just recently been turned on and the master has already done cleanup; or if the master and standby have recently gotten disconnected for even just a few seconds. But fundamentally we all seem to be converging on some variant of the "soft conflict" idea.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Last gasp
On 04/15/2012 12:01 PM, Tom Lane wrote: Where I think we have been fooling ourselves is in failing to tell the difference between a patch that is committable in the current fest, versus one that is still WIP and is going to need more development time. I wonder if this bit of state might be worth extending the UI to include. Just a little toggle box with the options "WIP" and "Commit Submission" there. [I am unattached to those particular terms] I think everyone is clear that Command Triggers is an example that reflects a more general problem seen many times before; I'll continue using it as a fresh example here without meaning to pick on Dimitri in particular. If Dimitri had submitted that in January while ticking "Commit Submission", it might have sparked a talk about the difference in expectations earlier. If we use Robert as the bad guy watching the CF progress (again, just as an example, not trying to paint him with that title), I think it would have been easier for him to write an e-mail like this: "It's March now, and this feature has been under heavy review and development for 6 weeks. This looks more like a WIP feature to me, not one that arrived as a Commit Submission. Is there any useful subset to consider instead?" Compared to the current way such things happen, that's a more factual style of message without as much emotion or judging, and one that can be raised much earlier in the CF cycle. I think it will be easier for people to write those, compared with having to be the person saying "this isn't ready to commit" only at the end. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Improving our clauseless-join heuristics
Amit Kapila writes: > For this kind of query, currently (referring 9.0.3 code) also it considers > join of b,c and b,d. > As there is no join clause between b,c,d so it will go in path of > make_rels_by_clauseless_joins() where it considers join of b,c and b,d. > In this kind of query, if the suggestion by me in below mail is followed, > then it will consider joining a,b a,c a,d at level-2 in function > make_rels_by_clause_joins() which it currently doesn't do which may generate > useless join paths. > However in real-world scenario's this kind of queries where 2 cols of > different tables are > used in one side expression (b.y + c.z) of where clause may be less. > On the other hand, when we come to consider d, it will have no join > clauses so we will consider joining it to each other rel in turn. > When it come to consider d, as at level -2 it only consider later rels. So > it should not consider joining with each other rel. I might still be misunderstanding, but I think what you are suggesting is that in the loop in make_rels_by_clause_joins, if we find that the old_rel doesn't have a join clause/restriction with the current other_rel, we check to see whether other_rel has any join clauses at all, and force the join to occur anyway if it doesn't. I can't really get excited about doing it that way instead of the current way. In the first place, it seems to miss the need to clauseless-join two relations when neither of them have any join clauses, for instance plain old "SELECT * FROM a, b". So you still need something like the make_rels_by_clauseless_joins code path, and it's not entirely clear how to avoid duplicated work there. In the second place, instead of N tests to see whether a rel lacks any join clauses, we'd now have something approaching O(N^2) such tests, in the typical case where most base rels are directly joined to only a few other rels. So it seems to make things slower for little obvious benefit. In general, queries with join-clause-less rels are pretty uncommon, so I don't want to introduce extra work into make_rels_by_clause_joins to handle the case. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
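For reference, the loop being discussed looks roughly like this in the sources of that era, and the suggestion amounts to the extra disjunct sketched below; has_no_joinclauses() is a hypothetical stand-in for the "does this rel lack join clauses" test:

    for_each_cell(l, other_rels)
    {
        RelOptInfo *other_rel = (RelOptInfo *) lfirst(l);

        if (!bms_overlap(old_rel->relids, other_rel->relids) &&
            (have_relevant_joinclause(root, old_rel, other_rel) ||
             have_join_order_restriction(root, old_rel, other_rel) ||
             has_no_joinclauses(other_rel)))   /* proposed extra test; runs
                                                * once per rel pair, hence
                                                * the O(N^2) concern */
            (void) make_join_rel(root, old_rel, other_rel);
    }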
Re: [HACKERS] 9.3 Pre-proposal: Range Merge Join
On Sun, 2012-04-15 at 23:18 -0700, Darren Duncan wrote: > Your proposal makes me think of something similar which might be useful, > INclusion constraints. As "exclusion constraints" might be thought of like a > generalization of unique/key constraints, "inclusion constraints" are like a > generalization of foreign key constraints. The "inclusion constraints" > basically allow some comparison operator other than is-equal to test if > values > in one table match values in another table, and the constraint allows the > former > if the test results in true. An example of said inclusion test is whether > the > range in one table is contained in a range in another table. I assume/hope > that, similarly, now that we have range types in 9.2, that the existing > "exclusion constraints" can be used with range comparison operators. Yes, Range Foreign Keys are one of the loose ends for Range Types that I'd like to wrap up. > As to your actual proposal, it sounds like a generalization of the relational > join or set intersection operator where instead of comparing sets defined in > terms of an enumeration of discrete values we are comparing sets defined by a > range, which conceptually have infinite values depending on the data type the > range is defined over. But if we're doing this, then it would seem to make > sense to go further and see if we have set analogies for all of our > relational > or set operators, should we want to do work with non-discrete sets. > > Now this sounds interesting in theory, but I would also assume that these > could > be implemented by an extension in terms of existing normal relational > operators, > where each range value is a discrete value, combined with operators for > unioning > or differencing etc ranges. A relation of ranges effectively can represent a > discontinuous range; in that case, the empty discontinuous range is also > canonically representable by a relation with zero tuples. > > Jeff, I get the impression your proposal is partly about helping performance > by > supporting this internally, rather than one just defining it as a SQL > function, > am I right? Per my proposal, the problem statement is that it's "slow", so speeding it up is really the entire proposal ;) Extending the syntax might be interesting as well, but I don't have a proposal for that. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Last gasp
Alex wrote: Jay Levitt writes: Alex wrote: I didn't follow this whole thread, but have we considered Redmine[1]? As the resident "Ruby is shiny, let's do everything in Rails on my MacBook" guy, I'd like to make a statement against interest: I've tried Redmine a few times and it's been painful. Much of the codebase is deprecated, it's slow, it has no meaningful search (in 2012?!), I've seen wiki edits disappear, and at the moment pulling up its own FAQ page at redmine.org times out. Yay, that's totally FUD to me. You're right, it was. My bad. Someday I will find the balance between precision and concision. Could you please elaborate a bit on your points? Deprecated codebase? Let me guess... It runs on an outdated version of Rails (2.3) but only because Rails is changing so rapidly, I believe. There is work in progress[1] to move to the supported branch Rails-3.x. I wasn't even thinking of that; I know many production systems still run on Rails 2.3, and in fact it probably even performs better for some workloads. 3.x is a mixed bag. I don't hold that against Redmine. But it's still FUD, because I can't remember where I saw this information. So: withdrawn. Slow? Do you have any data to back this point up? No measurable data; just a sigh of relief when switching from Redmine to Github - and GitHub ain't a speed demon. In general, I've seen multi-second page load times on crazy-simple things like wiki edits; this was on a hosted provider (sourcerepo.com), but they also hosted our git repo and we had no speed problems there. No meaningful search, eh? Works for me. Redmine searches return partial-word matches, and there's no way to disable that. Searching for "test" finds "latest". To me, that's broken. Also, the UI is very 5 years ago; e.g., "compare revisions" uses the same columns-of-radio-buttons approach as MediaWiki. If the goal is a tool to reduce friction and increase involvement, you want a smoother UX. Jay -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Aggressive memory consumption in {ts,array}_typanalyze
Both $SUBJECT functions pass to hash_create() an expected hash table size of 10000 * attstattarget. Based on header comments, this represents a near-worst case. These typanalyze functions scan the hash tables sequentially, thereby visiting the entire allocation. Per the recommendation in comments at hash_create(), we should be more conservative. On my system, this tiny test case runs in 2.8s and dirties 1.0 GiB of local memory:

CREATE TEMP TABLE t AS SELECT '{}'::int[];
SET default_statistics_target = 10000;
ANALYZE t;

Rather arbitrarily, I reduced the hash_create() size hint by 99.9%, to the width of the histograms destined for pg_statistic. This streamlined the test case to <20ms runtime and 2 MiB of memory. To verify that nothing awful happens when the hash table sees considerable dynamic growth, I used a subject table entailing a 9M-element hash table at ANALYZE time:

CREATE UNLOGGED TABLE t AS
SELECT array[n,3000000+n,6000000+n] FROM generate_series(1,3000000) t(n);

Unpatched master takes 15s and dirties 2.1 GiB; patched takes 15s and dirties 1.2 GiB. The timing noise overlapped any systematic difference, but the patched version might have been around 500ms slower. Based on that, I'm comfortable trusting that improving smaller cases in this way will not greatly harm larger cases.

The lack of field complaints about ts_typanalyze() resource usage does argue against the need for a change here, but I think adding array_typanalyze() in PostgreSQL 9.2 significantly increases our risk exposure. Sites may have cranked up the statistics target on array columns to compensate for the lack of explicit statistical support. Every cluster has several array columns in the system catalogs.

The size hint I chose is fairly arbitrary. Any suggestions for principled alternatives?

Thanks, nm

*** a/src/backend/tsearch/ts_typanalyze.c
--- b/src/backend/tsearch/ts_typanalyze.c
*************** compute_tsvector_stats(VacAttrStats *stats,
*** 186,192 ****
  	hash_ctl.match = lexeme_match;
  	hash_ctl.hcxt = CurrentMemoryContext;
  	lexemes_tab = hash_create("Analyzed lexemes table",
! 							  bucket_width * 7,
  							  &hash_ctl,
  							  HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT);
--- 186,192 ----
  	hash_ctl.match = lexeme_match;
  	hash_ctl.hcxt = CurrentMemoryContext;
  	lexemes_tab = hash_create("Analyzed lexemes table",
! 							  num_mcelem,
  							  &hash_ctl,
  							  HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT);
*** a/src/backend/utils/adt/array_typanalyze.c
--- b/src/backend/utils/adt/array_typanalyze.c
*************** compute_array_stats(VacAttrStats *stats, AnalyzeAttrFetchFunc fetchfunc,
*** 282,288 ****
  	elem_hash_ctl.match = element_match;
  	elem_hash_ctl.hcxt = CurrentMemoryContext;
  	elements_tab = hash_create("Analyzed elements table",
! 							  bucket_width * 7,
  							  &elem_hash_ctl,
  							  HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT);
--- 282,288 ----
  	elem_hash_ctl.match = element_match;
  	elem_hash_ctl.hcxt = CurrentMemoryContext;
  	elements_tab = hash_create("Analyzed elements table",
! 							  num_mcelem,
  							  &elem_hash_ctl,
  							  HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT);

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.3 Pre-proposal: Range Merge Join
On Tue, Apr 17, 2012 at 12:12 AM, Jeff Davis wrote: > On Mon, 2012-04-16 at 16:22 +0400, Alexander Korotkov wrote: > > > There is a good overview article about spatial joins. > > http://www.cs.umd.edu/users/hjs/pubs/jacoxtods07.pdf > > Thank you, that's exactly the kind of overview I was looking for. > > > It shows that there is a lot of methods based on building temporary > > indexes. It might mean that building of temporary GiST index is not a > > bad idea. > > That had occurred to me, but I was hesitant to only use temp indexes. It > still doesn't really offer a good solution when both sides of the join > are relatively large (because of random I/O). Also the build speed of > the index would be more important than it is now.

Note that the amount of random I/O during GiST index build decreased significantly after my GSoC project on buffering GiST index build. However, GiST index build is still quite CPU-expensive. This problem could be avoided by supporting other methods of index build (other than ones producing a lot of penalty and picksplit calls). Hilbert curve and similar methods could help here. Implementation of such methods would require an extension of the GiST interface.

-- With best regards, Alexander Korotkov.
Re: [HACKERS] Memory usage during sorting
On 14 April 2012 14:34, Peter Geoghegan wrote: > FWIW, I started playing with adding timsort to Postgres last night: > > https://github.com/Peter2ndQuadrant/postgres/tree/timsort

I've fixed this feature branch so that every qsort_arg call site (including the tuplesort specialisations thereof) calls timsort_arg instead. All but 4 regression tests pass, and those don't really count as failures, since they're down to an assumption in the tests that certain tuples should appear in the same order as our current quicksort implementation returns them, even though, in these problematic cases, that order is partially dictated by implementation - our quicksort isn't stable, but timsort is. I think that the original tests are faulty, but I understand that they're only expected to provide smoke testing.

I did have to fix one bug in the implementation that I found, which was an assumption that comparators would only return one of {-1, 0, 1}. The C standard requires only that they return negative, zero or positive values, as appropriate. I also polished the code a bit, and added more comments.

I haven't actually quantified the benefits yet, and have no numbers to show just yet, but I suspect it will prove to be a fairly compelling win in certain situations.

-- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
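The comparator bug is easy to illustrate: a conforming comparator may return any negative or positive magnitude, so callers must test the sign rather than match on {-1, 0, 1}. A hypothetical self-contained example, unrelated to the actual timsort branch:

    #include <stdio.h>

    /* Legal qsort-style comparator: returns the raw difference,
     * e.g. -42, not just -1/0/1.  (Assumes values small enough
     * that the subtraction cannot overflow.) */
    static int
    cmp_int(const void *a, const void *b)
    {
        return *(const int *) a - *(const int *) b;
    }

    int
    main(void)
    {
        int x = 1, y = 43;
        int c = cmp_int(&x, &y);

        if (c == -1)
            printf("matched -1 exactly\n");   /* wrong: never reached */
        if (c < 0)
            printf("x sorts before y\n");     /* right: test the sign */
        return 0;
    }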
Re: [HACKERS] 9.3 Pre-proposal: Range Merge Join
On Mon, Apr 16, 2012 at 9:12 PM, Jeff Davis wrote: > That had occurred to me, but I was hesitant to only use temp indexes. It > still doesn't really offer a good solution when both sides of the join > are relatively large (because of random I/O). Also the build speed of > the index would be more important than it is now. The thing I like most about temp indexes is that they needn't be temporary. I'd like to see something along the lines of demand-created optional indexes, that we reclaim space/maintenance overhead on according to some cache management scheme. More space you have, the more of the important ones hang around. The rough same idea applies to materialised views. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bug tracker tool we need
Alex wrote: I still fail to see how Redmine doesn't fit into requirements summarized at that wiki page[1], so that must be something other than formal requirement of being free/open software and running postgres behind (some sort of "feeling" maybe?)

Well, if those requirements are in fact requirements, Redmine could suit the purpose (perhaps with some custom extensions). But yes, Redmine "feels" wrong to me, though I have never been particularly happy with *any* self-hosted bug tracker. Of those I've used - Redmine, Mantis, JIRA, Trac, Bugzilla, GForge, RT - Redmine feels the least-worst to me; it's about equal to Trac, and it's in Ruby so I could theoretically(!) improve it.

I think the biggest missing pieces in Redmine aside from custom CF stuff are: better search, single sign-on (it requires Yet Another Login), a better UX (AJAX, syntax highlighting) and better git integration (a la pull requests, where private git commits = patches). Those are some pretty big pieces. I don't think Redmine out-of-the-box would improve either CFs or community involvement.

Jay, Alvaro, Dimitri (and whoever else wants to speak up) could you please describe your ideal tool for the task?

My opinion isn't all that important, since I currently have an infinite opinion-to-contribution ratio, but in my unicorniverse: We'd accept that open source hasn't always produced great UX, we'd use GitHub's issue tracker, allow volunteers to do "bug wrangling" triage via tags, use GitHub hooks to integrate with the existing CF app, and write archiving tools that would let us easily export everything off of GitHub for when (a) something better comes along or (b) GitHub pops out of existence or adds egregious licensing terms like BitKeeper.

Reasons:
- Familiarity: Many developers already have a GitHub account and use it
- Discoverability: GitHub has great SEO
- Tight integration of git with patch and issue management (pull requests, fork networks, etc); eliminates ceremony rather than adding to it
- Readable UI with syntax highlighting, etc
- Patch commenting and git integration encourage actual review-resubmit cycles instead of "Here, look, I fixed it for you" reviews
- Two-way email/web integration
- Meets Tom's "would be sort of neat" criteria[1]
- Could easily implement Simon's "pony" criteria[2] with tags and API
- Easily extensible with API and hooks
- Subjectively: Its design encourages better community and core interactions than any I've seen in 25 years.

GitHub could well be a non-starter, but if third-party-dependence is really the holdup, I'd volunteer to write the tools - in fact, a google of [export issues from github] shows a few that might already suffice.

Given that every other existing tool likely have pissed off someone already, I guess our best bet is writing one from scratch.

ISTR there's a great "Writing your own bug tracker is an anti-pattern" blog, but I can't find it anymore.

Or maybe there isn't really a need for a tracker? The core team have managed to live without one for so long after all...

As an end-user, I've reported exactly one bug in a release version of Postgres, and it was fixed (and back-ported!) the next day. So I really can't complain about the tracking of actual bugs. Sounds like we do need something better for CF/patch workflow, tho.

Jay

[1] Tom wrote: Now what would be sort of neat is if we had a way to keep all the versions of patch X plus author and reviewer information, links to reviews and discussion, etc.
in some sort of centralized place

[2] Simon wrote: My I-Want-a-Pony idea is some kind of rating system that allows us all to judge patches in terms of importance/popularity, complexity and maturity. I guess a Balanced Scorecard for the development process. So we can all see what's going on.

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.3 Pre-proposal: Range Merge Join
Simon Riggs wrote: I'd like to see something along the lines of demand-created optional indexes, that we reclaim space/maintenance overhead on according to some cache management scheme. More space you have, the more of the important ones hang around. The rough same idea applies to materialised views. +10; this sort of demand-driven optimization could be the database equivalent of Java's HotSpot (which accomplished the amazing task of making Java kinda fastish sometimes). Jay -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.3 Pre-proposal: Range Merge Join
On Apr 16, 2012, at 1:40 AM, Jeff Davis wrote: > See attached SQL for example. The > Problem statement: slow. Nested loops are the only option, although they > can benefit from an inner GiST index if available. But if the join is > happening up in the plan tree somewhere, then it's impossible for any > index to be available. Hmm. This sounds like something that Tom's recent work on parameterized plans ought to have fixed, or if not, it seems closely related. And by "this" I mean specifically the ability to use a GiST index to drive a nested loop that is higher up in the plan tree than the immediate parent of the index scan. This is not an argument against your proposal, just an observation. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bug tracker tool we need
On 04/16/2012 09:24 AM, Alex wrote: Jay, Alvaro, Dimitri (and whoever else wants to speak up) could you please describe your ideal tool for the task? Given that every other existing tool likely have pissed off someone already, I guess our best bet is writing one from scratch. Or maybe there isn't really a need for a tracker? The core team have managed to live without one for so long after all... I believe the biggest hurdle for many hackers is that in redmine, email is not a first class citizen. The majority of hackers are never going to want to go into a web interface to get something done, they live in VI/Emacs and the command line. One thing that redmine definitely breaks is proper handling of attachments in email, thus the first thing moving to redmine would break would be patch submission. JD -- Regards, Alex [1] http://wiki.postgresql.org/wiki/TrackerDiscussion -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.3 Pre-proposal: Range Merge Join
Robert Haas writes: > On Apr 16, 2012, at 1:40 AM, Jeff Davis wrote: >> See attached SQL for example. The >> Problem statement: slow. Nested loops are the only option, although they >> can benefit from an inner GiST index if available. But if the join is >> happening up in the plan tree somewhere, then it's impossible for any >> index to be available. > Hmm. This sounds like something that Tom's recent work on > parameterized plans ought to have fixed, or if not, it seems closely > related. Not really. It's still going to be a nestloop, and as such not terribly well suited for queries where there are a lot of matchable rows on both sides. The work I've been doing is really about making nestloops usable in cases where join order restrictions formerly prevented it --- but Jeff's complaint has nothing to do with that. (This thought also makes me a bit dubious about the nearby suggestions that more indexes will fix it.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bug tracker tool we need (was: Last gasp)
On Mon, Apr 16, 2012 at 06:29:47PM +0200, Magnus Hagander wrote: > FWIW, I think the closest thing we've found so far would be debbugs - > which IIRC doesn't have any kind of reasonable database backend, which > would be a strange choice for a project like ours :) And makes many > things harder...

FWIW, Don Armstrong (the main debbugs hacker these days, I believe) recently posted a blog post about a more obscure feature of debbugs (forcemerge), where he finishes with "so we can eventually keep a postgresql database updated in addition to the flatfile database.": http://www.donarmstrong.com/posts/debbugs_forcemerge/ I don't know whether this implies a full PostgreSQL backend or just a helper DB for some part of the functionality, but it might be something to keep watching.

Regards, Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Slow temporary tables when using sync rep
Hi,

I've noticed that when using synchronous replication (on 9.2devel at least), temporary tables become really slow:

thom@test=# create temporary table temp_test (a text, b text);
CREATE TABLE
Time: 16.812 ms
thom@test=# SET synchronous_commit = 'local';
SET
Time: 2.739 ms
thom@test=# insert into temp_test (a, b) values ('one', 'two');
INSERT 0 1
Time: 3.911 ms
thom@test=# SET synchronous_commit = 'remote_write';
SET
Time: 2.826 ms
thom@test=# insert into temp_test (a, b) values ('one', 'two');
INSERT 0 1
Time: 831.384 ms
thom@test=# insert into temp_test (a, b) values ('one', 'two');
INSERT 0 1
Time: 1700.154 ms
thom@test=# insert into temp_test (a, b) values ('one', 'two');
INSERT 0 1
Time: 4976.853 ms
thom@test=# insert into temp_test (a, b) values ('one', 'two');
INSERT 0 1
Time: 5294.213 ms
thom@test=# insert into temp_test (a, b) values ('one', 'two');

It appears to be taking exactly 6 seconds between each transaction, but as if it's only attempting to complete every 6 seconds like a heartbeat:

thom@test=# insert into temp_test (a, b) values ('one', 'two');insert into temp_test (a, b) values ('one', 'two');insert into temp_test (a, b) values ('one', 'two');insert into temp_test (a, b) values ('one', 'two');
INSERT 0 1
Time: 141.586 ms
INSERT 0 1
Time: 6009.059 ms
INSERT 0 1
Time: 6009.305 ms
INSERT 0 1
Time: 6009.610 ms
thom@test=# begin;
BEGIN
Time: 0.469 ms
thom@test=*# insert into temp_test (a, b) values ('one', 'two');insert into temp_test (a, b) values ('one', 'two');insert into temp_test (a, b) values ('one', 'two');insert into temp_test (a, b) values ('one', 'two');
INSERT 0 1
Time: 0.841 ms
INSERT 0 1
Time: 0.395 ms
INSERT 0 1
Time: 0.354 ms
INSERT 0 1
Time: 0.419 ms
thom@test=*# end;
COMMIT
Time: 5649.679 ms

This doesn't apply to regular tables:

thom@test=# create table test (a text, b text);
CREATE TABLE
Time: 18.806 ms
thom@test=# SET synchronous_commit = 'local';
SET
Time: 2.751 ms
thom@test=# insert into test (a, b) values ('one', 'two');
INSERT 0 1
Time: 6.546 ms
thom@test=# SET synchronous_commit = 'remote_write';
SET
Time: 2.713 ms
thom@test=# insert into test (a, b) values ('one', 'two');
INSERT 0 1
Time: 7.257 ms
thom@test=# insert into test (a, b) values ('one', 'two');
INSERT 0 1
Time: 6.308 ms
thom@test=# insert into test (a, b) values ('one', 'two');
INSERT 0 1
Time: 8.871 ms

Is this a known problem?

-- Thom -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Slow temporary tables when using sync rep
On Mon, Apr 16, 2012 at 6:27 PM, Thom Brown wrote: > Hi, > > I've noticed that when using synchronous replication (on 9.2devel at > least), temporary tables become really slow:

Since temporary tables are only present until the session ends (or possibly only until a commit), why are they replicated at all? BTW, should we have an entry in the index for 'temporary tables'? -- Mike Nolan
Re: [HACKERS] Improving our clauseless-join heuristics
>>I might still be misunderstanding, but I think what you are suggesting >>is that in the loop in make_rels_by_clause_joins, if we find that the >>old_rel doesn't have a join clause/restriction with the current >>other_rel, we check to see whether other_rel has any join clauses at >>all, and force the join to occur anyway if it doesn't.

It is along similar lines; the only difference is that it will try to join old_rel with the other_rel list only in case old_rel is not able to join with any of the other_rels in the list via a proper join clause between them.

>>In the second place, instead of N tests to see whether a rel lacks any join clauses, >>we'd now have something approaching O(N^2) such tests, in the typical >>case where most base rels are directly joined to only a few other rels.

In the typical case where most base rels are directly joined to only a few other rels, it will not go down the path where it has to try new combinations: the idea is that only if old_rel is not able to find any suitable rel to join with in the first pass will it go into a second pass and try to join with the other_rel list.

>>In the first place, it seems to miss the need to >>clauseless-join two relations when neither of them have any join >>clauses, for instance plain old "SELECT * FROM a, b". So you still need >>something like the make_rels_by_clauseless_joins code path, and it's >>not entirely clear how to avoid duplicated work there.

I believe this is not related to clauseless joins, because we enter make_rels_by_clause_joins() for a relation which has a join clause and which can be joined to one of the other_rels. So in most cases I believe it will find a member of the other_rel list with a proper join clause. Only in the cases where it is not able to find any other_rel with a proper join clause between them will it go down the new path. Seen this way, the handling of this case is different from the handling of the clauseless-join case.

The other way, as you said, is "is there any join clause that both these relations participate in?" Does this mean that we will evaluate the join list to see whether there can be a valid join between the two relations? Is it possible, in all scenarios, to detect the existence of a valid join, possibly not at this level but at the next level?
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] A typo fix in a comment in xlog.c
> > But I did fill-paragraph for the fixed comment so the patch > > replaces a little bit more. > You might want to adjust your fill-column setting to 79, so pgindent > doesn't reformat that again. Compare to what I just committed.

Thank you for the suggestion. I couldn't decide on a fill-column that fits all code and comments.

-- Kyotaro Horiguchi NTT Open Source Software Center == My e-mail address has been changed since Apr. 1, 2012. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.3 Pre-proposal: Range Merge Join
On Mon, 2012-04-16 at 22:20 +0100, Simon Riggs wrote: > On Mon, Apr 16, 2012 at 9:12 PM, Jeff Davis wrote: > > > That had occurred to me, but I was hesitant to only use temp indexes. It > > still doesn't really offer a good solution when both sides of the join > > are relatively large (because of random I/O). Also the build speed of > > the index would be more important than it is now. > > The thing I like most about temp indexes is that they needn't be temporary. > > I'd like to see something along the lines of demand-created optional > indexes, that we reclaim space/maintenance overhead on according to > some cache management scheme. More space you have, the more of the > important ones hang around. The rough same idea applies to > materialised views. I think to make an informed decision, the next thing I need to do is hack up a prototype version of my merge join idea, and see how well it performs against an index nestloop today. It seems to me that both approaches may have merit independently. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bug tracker tool we need
> I believe the biggest hurdle for many hackers is that in redmine, email is not a first class citizen. The majority of hackers are never going to want to go into a web interface to get something done, they live in VI/Emacs and the command line.
>
> One thing that redmine definitely breaks is proper handling of attachments in email, thus the first thing moving to redmine would break would be patch submission.

Right. This is not too hard to fix, however, and I didn't suggest we move to vanilla Redmine. It's hackable, so we can just bash it into the shape we need (and we need a few Ruby hackers, so there's always someone in the postgres community to fix it when it breaks, of course.)

-- Alex -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUG] Checkpointer on hot standby runs without looking checkpoint_segments
Hello, thank you for the comment.

> > In the backported version to 9.1.3, bgwriter.c is modified > > instead of checkpointer.c in 9.2. And GetWalRcvWriteRecPtr() is > > used as the equivalent of GetStandbyFlushRecPtr() in 9.2.
> In 9.2, GetXLogReplayRecPtr() should be used instead of GetStandbyFlushRecPtr(). A restartpoint is scheduled to finish before the next restartpoint is triggered. A restartpoint is triggered if too many WAL files have been replayed since the last restartpoint. So a restartpoint should be scheduled according to the replay location, not the receive location.

I basically agree. But I got confused looking into GetStandbyFlushRecPtr():

| if (XLByteLT(receivePtr, replayPtr))
| 	return XLByteLT(replayPtr, restorePtr) ? restorePtr : replayPtr;
| else
| 	return XLByteLT(receivePtr, restorePtr) ? restorePtr : receivePtr;

This seems to imply that receivePtr may be behind replayPtr. I don't understand what condition leads to that, but anyway the bottom line, I think, is that a restartpoint should be based on WAL that has surely been synced. So I chose GetStandbyFlushRecPtr() to read the location. If receivePtr/restorePtr always precede or equal replayPtr, I prefer GetXLogReplayRecPtr() as you suggest. (And a comment about the ordering of these pointers should perhaps be supplied for that part.)

> I think that basically it's better not to change the comments > (i.e., not to add the line feed) if their contents are the same > as previous ones, to highlight what you actually changed in the > patch.

Hmm. It is a matter of priority between highlighting what changed in a compact patch and consistency in the resulting text. I think the latter takes precedence over the former, although I might have found a better-balanced wording. But more than that, I've found the fill-column for this comment to be too short...

> Typo: RedoRecptr should be RedoRecPtr?

I think that's right. I unconsciously carried that spelling over from the original comment.

regards, -- Kyotaro Horiguchi NTT Open Source Software Center == My e-mail address has been changed since Apr. 1, 2012. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ECPG FETCH readahead
On Mon, Apr 16, 2012 at 07:18:07PM +0200, Boszormenyi Zoltan wrote:
> OK. I would like to stretch your agreement a little. :-)
> ...

Yeah, you've got a point here.

> By the new FETCH request. Instead of the above, I imagined this:
> - the runtime notices that the new request is larger than the current
>   readahead window size and modifies the readahead window size upwards,
>   so the next FETCH will use it
> - serve the request's first 128 rows from the current cache
> - for the 129th row, FETCH 1024 will be executed and the remaining
>   768 rows will be served from the new cache

That means the window size goes up to 1024-128 for that one case?

> - all subsequent requests use the new readahead size, 1024

Sounds reasonable to me.

> So, there can be occasional one-time larger requests, but
> smaller ones should apply the set window size, right?

Yes. I do agree that FETCH N cannot fetch N rows all the time, but
please make it work like what you suggested, to make sure people don't
have to recompile.

Michael
--
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
Jabber: michael.meskes at googlemail dot com
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ECPG FETCH readahead
2012-04-17 05:52, Michael Meskes wrote:
> On Mon, Apr 16, 2012 at 07:18:07PM +0200, Boszormenyi Zoltan wrote:
> > OK. I would like to stretch your agreement a little. :-)
> > ...
>
> Yeah, you've got a point here.
>
> > By the new FETCH request. Instead of the above, I imagined this:
> > - the runtime notices that the new request is larger than the current
> >   readahead window size and modifies the readahead window size
> >   upwards, so the next FETCH will use it
> > - serve the request's first 128 rows from the current cache
> > - for the 129th row, FETCH 1024 will be executed and the remaining
> >   768 rows will be served from the new cache
>
> That means the window size goes up to 1024-128 for that one case?

I listed two scenarios:

1. occasional bump of the readahead window for large requests;
   smaller requests use the originally set size
2. permanent bump of the readahead window for large requests (larger
   than previously seen); all subsequent requests use the new size

Both can be implemented easily - which one do you prefer?
If you always use very large requests, 1) behaves like 2).

> > - all subsequent requests use the new readahead size, 1024
>
> Sounds reasonable to me.
>
> > So, there can be occasional one-time larger requests, but
> > smaller ones should apply the set window size, right?
>
> Yes. I do agree that FETCH N cannot fetch N rows all the time, but
> please make it work like what you suggested, to make sure people don't
> have to recompile.
>
> Michael

--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
     http://www.postgresql.at/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
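[Editorial note: a toy model of the two scenarios above; all names and
numbers here are illustrative only, not ECPG's actual runtime
interfaces:

    #include <stdio.h>
    #include <stdbool.h>

    static int  window = 128;           /* readahead window set at compile time */
    static int  cached = 128;           /* rows left in the readahead cache */
    static bool permanent_bump = true;  /* true = scenario 2, false = scenario 1 */

    static void
    fetch(int nrows)
    {
        int     from_cache = nrows < cached ? nrows : cached;
        int     fetch_size = nrows > window ? nrows : window;

        if (permanent_bump && nrows > window)
            window = nrows;             /* scenario 2: later FETCHes keep it */

        cached -= from_cache;
        if (from_cache < nrows)
        {
            printf("backend FETCH %d\n", fetch_size);   /* one larger FETCH */
            cached = fetch_size - (nrows - from_cache); /* leftover stays cached */
        }
        printf("served %d, window %d, cached %d\n", nrows, window, cached);
    }

    int
    main(void)
    {
        fetch(1024);    /* serve 128 from cache, then one big backend FETCH */
        fetch(256);     /* under scenario 2, the bumped window still applies */
        return 0;
    }

With permanent_bump = false, the second call would still get a one-time
fetch of 256 rows but leave the window at 128; with very large requests
throughout, the two policies behave identically, as noted above.]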
Re: [HACKERS] ECPG FETCH readahead
On Tue, Apr 17, 2012 at 06:02:34AM +0200, Boszormenyi Zoltan wrote:
> I listed two scenarios:
> 1. occasional bump of the readahead window for large requests;
>    smaller requests use the originally set size
> 2. permanent bump of the readahead window for large requests (larger
>    than previously seen); all subsequent requests use the new size
>
> Both can be implemented easily - which one do you prefer?
> If you always use very large requests, 1) behaves like 2).

I'd say let's go for #2. #1 is probably more efficient, but it's not
what the programmer asked us to do. After all, it's easy for a
programmer to increase the window size accordingly if they want to.

Michael
--
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
Jabber: michael.meskes at googlemail dot com
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why can't I use pgxs to build a plpgsql plugin?
2012/4/16 Heikki Linnakangas:
> On 13.04.2012 19:17, Guillaume Lelarge wrote:
>> On Thu, 2012-04-12 at 12:28 +0300, Heikki Linnakangas wrote:
>>> On 08.04.2012 11:59, Guillaume Lelarge wrote:
>>>> There could be a good reason which would explain why we can't (or
>>>> don't want to) do this, but I don't see it right now.
>>>
>>> Me neither, except a general desire to keep the internals hidden. I
>>> propose the attached.
>>
>> Sounds good to me. I would love to see this happening in 9.2.
>
> Ok, committed. I fixed the .PHONY line as Tom pointed out, and changed
> MSVC install.pm to also copy the header file.

Hello,

it doesn't work for modules in the contrib directory:

pavel ~/src/postgresql/contrib/check_plpgsql $ make
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
-fwrapv -fexcess-precision=standard -g -fpic -I. -I.
-I../../src/include -D_GNU_SOURCE -c -o check_plpgsql.o
check_plpgsql.c
check_plpgsql.c:16:21: fatal error: plpgsql.h: No such file or directory
compilation terminated.

Regards

Pavel

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] not null validation option in contrib/file_fdw
I updated the patch; attached is the new version. Changes:

* fix a bug in fileGetOptions()
* rename the validation option and its code to "validate_data_file"
* clean up

Best regards,
Etsuro Fujita

> -----Original Message-----
> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Etsuro Fujita
> Sent: Monday, April 16, 2012 4:09 PM
> To: 'Andrew Dunstan'; 'Shigeru HANADA'
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] not null validation option in contrib/file_fdw
>
> Thank you for the review.
>
> > On 04/13/2012 07:21 AM, Shigeru HANADA wrote:
> > > (2012/04/13 16:59), Etsuro Fujita wrote:
> > >> I updated the patch added to CF 2012-Next [1]. Attached is the
> > >> updated version of the patch.
> > >
> > > I applied the patch and ran the regression tests of file_fdw, and I
> > > got SIGSEGV X-(
> > >
> > > The failure occurs in fileGetOptions(), and it is caused by
> > > list_delete_cell used in a foreach loop; the ListCell pointing to
> > > the delete target has been freed in list_delete_cell, but foreach
> > > accesses it to get the next element.
> > >
> > > Some of the backend functions that use list_delete_cell in a loop
> > > use a "for" loop instead of foreach; other functions exit the loop
> > > after calling list_delete_cell. Since we can't stop searching for
> > > non-COPY options until we reach the end of the options list, we
> > > would need to choose the former (a "for" loop), or create another
> > > list that contains only valid COPY options and return it via the
> > > other_options parameter.
> >
> > Yes, the code in fileGetOptions() appears to be bogus.
>
> Sorry, I will fix it.
>
> > Also, "validate" is a terrible name for the option (and in the code)
> > IMNSHO. It's far too generic. "validate_not_null" or some such would
> > surely be better.
>
> I thought it would be used not only for NOT NULL but also for CHECK and
> foreign key constraints. That is, when a user sets the option to
> 'true', file_fdw verifies that each tuple meets all kinds of
> constraints. So, how about "validate_data_file" or simply
> "validate_file"?
>
> Best regards,
> Etsuro Fujita

file_fdw_notnull_v3.patch
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
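[Editorial note: for reference, a sketch of the foreach pitfall the
quoted review describes and the "for"-loop fix, modelled on the
9.x-era List API; the is_copy_option() predicate is invented here for
illustration:

    /* BROKEN (schematic): foreach() caches the current cell, so
     * list_delete_cell() frees memory that the loop will dereference
     * on the next iteration. */
    foreach(cell, options)
    {
        if (!is_copy_option(lfirst(cell)))      /* invented predicate */
            options = list_delete_cell(options, cell, prev);
    }

    /* SAFE: fetch the next cell before possibly deleting the current
     * one (list_delete_cell's third argument is the previous cell). */
    ListCell   *cell;
    ListCell   *prev = NULL;
    ListCell   *next;

    for (cell = list_head(options); cell != NULL; cell = next)
    {
        next = lnext(cell);
        if (!is_copy_option(lfirst(cell)))      /* invented predicate */
            options = list_delete_cell(options, cell, prev);
        else
            prev = cell;
    }

The alternative mentioned in the review, building a fresh list of the
valid COPY options instead of deleting in place, avoids the problem
entirely.]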
Re: [HACKERS] Bug tracker tool we need
On Mon, Apr 16, 2012 at 23:48, Jay Levitt wrote:
> Alex wrote:
>> I still fail to see how Redmine doesn't fit the requirements
>> summarized at that wiki page[1], so that must be something other than
>> the formal requirement of being free/open software and running
>> postgres behind it (some sort of "feeling", maybe?)
>
> Well, if those requirements are in fact requirements, Redmine could
> suit the purpose (perhaps with some custom extensions). But yes,
> Redmine "feels" wrong to me, though I have never been particularly
> happy with *any* self-hosted bug tracker.

That's probably one reason people aren't jumping on this: there is no
tracker out there that people actually *like*...

> Of those I've used - Redmine, Mantis, JIRA, Trac, Bugzilla, GForge, RT
> - Redmine feels the least-worst to me; it's about equal to Trac, and
> it's in Ruby so I could theoretically(!) improve it.

Its being in Ruby is actually a liability in the postgresql.org world -
we have very few people who actually know it, and some of those who do
no longer want to work with it. So if you're good with Ruby and want to
contribute, feel free to contact me off-list, because we have some
things we need help with :-)

> I think the biggest missing pieces in Redmine aside from custom CF
> stuff are: better search, single sign-on (it requires Yet Another
> Login), ...

We already have Redmine working with a single password (though not
single sign-on) for a limited number of projects, such as pgweb.

> ... a better UX (AJAX, syntax highlighting) and better git integration
> (a la pull requests, where private git commits = patches). Those are
> some pretty big pieces. I don't think Redmine out-of-the-box would
> improve either CFs or community involvement.

I think we can all agree on that :-)

>> Jay, Alvaro, Dimitri (and whoever else wants to speak up) could you
>> please describe your ideal tool for the task?
>
> My opinion isn't all that important, since I currently have an infinite
> opinion-to-contribution ratio, but in my unicorniverse: we'd accept
> that open source hasn't always produced great UX, we'd use GitHub's
> issue tracker, allow volunteers to do "bug wrangling" triage via tags,
> use GitHub hooks to integrate with the existing CF app, and write
> archiving tools that would let us easily export everything off of
> GitHub for when (a) something better comes along or (b) GitHub pops out
> of existence or adds egregious licensing terms like BitKeeper.
>
> Reasons:
>
> - Familiarity: Many developers already have a GitHub account and use it

Most of the more senior developers don't use GitHub, other than possibly
as a place to store a plain git repository. So that's not really
relevant.

> - Discoverability: GitHub has great SEO

I'm willing to bet that postgresql.org has better SEO when it comes to
postgres.

> - Tight integration of git with patch and issue management (pull
>   requests, fork networks, etc); eliminates ceremony rather than
>   adding to it

There are some things that help there, certainly. Pull requests are
basically email, but fork network tracking and such can be darn useful
sometimes.

> - Readable UI with syntax highlighting, etc

Yes, this part is very nice.

> - Patch commenting and git integration encourage actual review-resubmit
>   cycles instead of "Here, look, I fixed it for you" reviews

The amount of spam coming through that system, and the
inability/unwillingness of GitHub to even care about it, is a killer
argument *against* GitHub. We have working antispam for email; GitHub's
antispam is somewhere around where email antispam was in 1994.

> - Two-way email/web integration
> - Meets Tom's "would be sort of neat" criteria[1]
> - Could easily implement Simon's "pony" criteria[2] with tags and API
> - Easily extensible with API and hooks
> - Subjectively: Its design encourages better community and core
>   interactions than any I've seen in 25 years.
>
> GitHub could well be a non-starter, but if third-party-dependence is
> really the holdup, I'd volunteer to write the tools - in fact, a google
> of [export issues from github] shows a few that might already suffice.

It *is* a non-starter, because (a) it's a third-party dependency, and
(b) AFAIK they don't provide *data access* to the issue trackers. If the
actual issue trackers were stored in git, like the webpages are, for
example, it would be a different story...

>> Given that every other existing tool has likely pissed off someone
>> already, I guess our best bet is writing one from scratch.
>
> ISTR there's a great "Writing your own bug tracker is an anti-pattern"
> blog, but I can't find it anymore.

I'm sure there's more than one :-) If it were that easy, we would have
written one already. But doing that is likely to be a lot of work, and
it needs to be maintained. (Not that picking another one means it
doesn't need to be maintained - I've seen so many things broken by
upgrading Redmine or Bugzilla that it's silly...)

--
Magnus
[HACKERS] Gsoc2012 idea, tablesample
On 24.03.2012 22:12, Joshua Berkus wrote:
> Qi,
>
> Yeah, I can see that. That's a sign that you had a good idea for a
> project, actually: your idea is interesting enough that people want to
> debate it. Make a proposal on Monday and our potential mentors will
> help you refine the idea.

Yep. The discussion withered, so let me try to summarize:

1. We probably don't want the SQL syntax to be added to the grammar.
   This should be written as an extension, using custom functions as
   the API, instead of extra SQL syntax.

2. It's not very useful if it's just a dummy replacement for "WHERE
   random() < ?". It has to be more advanced than that. The quality of
   the sample is important, as is performance. There was also an
   interesting idea of implementing monetary unit sampling (see the
   sketch after this mail).

I think this would be a useful project if those two points are taken
care of.

Another idea, which Robert Haas suggested, was to add support for doing
a TID scan for a query like "WHERE ctid < '(501,1)'". That's not enough
work for a GSoC project on its own, but it could certainly be part of
one.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
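[Editorial note: as an aside on point 2, monetary unit sampling selects
rows with probability proportional to a monetary amount (rows larger
than the sampling step can be selected more than once). A standalone
toy sketch of the systematic variant - illustrative C only, not
PostgreSQL code:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Pick k sampling points spaced total/k apart and select whichever
     * row's cumulative amount covers each point. */
    static void
    mus_sample(const double *amount, int n, int k)
    {
        double  total = 0.0, cum = 0.0, step, point;
        int     i, picked = 0;

        for (i = 0; i < n; i++)
            total += amount[i];
        step = total / k;
        point = ((double) rand() / RAND_MAX) * step;    /* start in [0, step) */

        for (i = 0; i < n && picked < k; i++)
        {
            cum += amount[i];
            while (picked < k && point < cum)   /* row i covers this point */
            {
                printf("picked row %d (amount %.2f)\n", i, amount[i]);
                picked++;
                point += step;
            }
        }
    }

    int
    main(void)
    {
        double  amounts[] = {10.0, 200.0, 35.0, 900.0, 55.0};

        srand((unsigned) time(NULL));
        mus_sample(amounts, 5, 2);
        return 0;
    }

An extension implementing this over a real table would of course read
the amounts from the table rather than an array.]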
Re: [HACKERS] Why can't I use pgxs to build a plpgsql plugin?
On 17.04.2012 07:56, Pavel Stehule wrote:
> 2012/4/16 Heikki Linnakangas:
>> Ok, committed. I fixed the .PHONY line as Tom pointed out, and changed
>> MSVC install.pm to also copy the header file.
>
> Hello,
>
> it doesn't work for modules in the contrib directory:
>
> check_plpgsql.c:16:21: fatal error: plpgsql.h: No such file or directory
> compilation terminated.

Hmm, the makefile rule I added copies the plpgsql.h file to the
include/server directory when you do "make install". That makes the file
available when you build with USE_PGXS=1, without access to the source
tree, but it doesn't change the situation when you build inside contrib.
If you plop the module directly into contrib, I guess you'll have to do

	CFLAGS += -I$(top_srcdir)/src/pl/plpgsql/src

That's what pldebugger has always done.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
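[Editorial note: for context, a minimal sketch of the kind of module
that needs plpgsql.h: a plugin registered through the "PLpgSQL_plugin"
rendezvous variable. The hook name and body here are placeholders; the
struct layout follows the 9.x-era header:

    #include "postgres.h"
    #include "fmgr.h"
    #include "plpgsql.h"        /* the header this thread is about */

    PG_MODULE_MAGIC;

    void        _PG_init(void);

    /* placeholder hook: fires when a plpgsql function starts executing */
    static void
    my_func_beg(PLpgSQL_execstate *estate, PLpgSQL_function *func)
    {
        elog(NOTICE, "entering plpgsql function with oid %u", func->fn_oid);
    }

    static PLpgSQL_plugin plugin_funcs = {
        NULL,           /* func_setup */
        my_func_beg,    /* func_beg */
        NULL,           /* func_end */
        NULL,           /* stmt_beg */
        NULL            /* stmt_end */
    };

    void
    _PG_init(void)
    {
        PLpgSQL_plugin **plugin_ptr;

        /* plpgsql looks this rendezvous variable up at call time */
        plugin_ptr = (PLpgSQL_plugin **)
            find_rendezvous_variable("PLpgSQL_plugin");
        *plugin_ptr = &plugin_funcs;
    }

Without the CFLAGS addition above (or a plpgsql.h installed by "make
install"), the #include on the third line is exactly what fails in
Pavel's build.]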
Re: [HACKERS] [BUG] Checkpointer on hot standby runs without looking checkpoint_segments
This is the new version of the patch. I replaced GetStandbyFlushRecPtr()
with GetXLogReplayRecPtr() for checking the progress of a checkpoint,
following Fujii's suggestion. The first patch is for 9.2dev, and the
second is the version backported to 9.1.3.

===
By the way, I took a close look around there.

> I basically agree with that. But I got confused looking into
> GetStandbyFlushRecPtr():
>
> | if (XLByteLT(receivePtr, replayPtr))
> |     return XLByteLT(replayPtr, restorePtr) ? restorePtr : replayPtr;
> | else
> |     return XLByteLT(receivePtr, restorePtr) ? restorePtr : receivePtr;

- receivePtr seems to always be updated just after syncing received
  xlog,
- replayPtr is updated just BEFORE the xlog_redo operation,
- restorePtr is updated AFTER xlog_redo(), and
- replayPtr seems never to exceed receivePtr.

These all seem quite reasonable, and together they imply the following
ordering:

  restorePtr <= replayPtr <= receivePtr

But then XLByteLT(receivePtr, replayPtr) should never return true under
the condition above. Is something wrong in my assumptions? Anyway, I
understand here that you are saying the location returned by
GetXLogReplayRecPtr() is always flushed.

--
Kyotaro Horiguchi
NTT Open Source Software Center

== My e-mail address has been changed since Apr. 1, 2012.

diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index c9473f7..c2fafbf 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -491,8 +491,8 @@ CheckpointerMain(void)
 	 * Initialize checkpointer-private variables used during checkpoint.
 	 */
 	ckpt_active = true;
-	if (!do_restartpoint)
-		ckpt_start_recptr = GetInsertRecPtr();
+	ckpt_start_recptr =
+		do_restartpoint ? GetXLogReplayRecPtr(NULL) : GetInsertRecPtr();
 	ckpt_start_time = now;
 	ckpt_cached_elapsed = 0;

@@ -731,28 +731,30 @@ IsCheckpointOnSchedule(double progress)
 		return false;

 	/*
-	 * Check progress against WAL segments written and checkpoint_segments.
+	 * Check progress against WAL segments written, or replayed for
+	 * hot standby, and checkpoint_segments.
 	 *
 	 * We compare the current WAL insert location against the location
-	 * computed before calling CreateCheckPoint. The code in XLogInsert that
-	 * actually triggers a checkpoint when checkpoint_segments is exceeded
-	 * compares against RedoRecptr, so this is not completely accurate.
-	 * However, it's good enough for our purposes, we're only calculating an
-	 * estimate anyway.
+	 * computed before calling CreateCheckPoint. The code in
+	 * XLogInsert that actually triggers a checkpoint when
+	 * checkpoint_segments is exceeded compares against RedoRecPtr.
+	 * Similarly, we consult the WAL replay location instead on hot
+	 * standbys, and XLogPageRead compares it against RedoRecPtr, too.
+	 * Although these are not completely accurate, it's good enough for
+	 * our purposes, we're only calculating an estimate anyway.
 	 */
-	if (!RecoveryInProgress())
+
+	recptr =
+		RecoveryInProgress() ?
+			GetXLogReplayRecPtr(NULL) : GetInsertRecPtr();
+	elapsed_xlogs =
+		(((double) (int32) (recptr.xlogid - ckpt_start_recptr.xlogid)) * XLogSegsPerFile +
+		 ((double) recptr.xrecoff - (double) ckpt_start_recptr.xrecoff) / XLogSegSize) /
+		CheckPointSegments;
+
+	if (progress < elapsed_xlogs)
 	{
-		recptr = GetInsertRecPtr();
-		elapsed_xlogs =
-			(((double) (int32) (recptr.xlogid - ckpt_start_recptr.xlogid)) * XLogSegsPerFile +
-			 ((double) recptr.xrecoff - (double) ckpt_start_recptr.xrecoff) / XLogSegSize) /
-			CheckPointSegments;
-
-		if (progress < elapsed_xlogs)
-		{
-			ckpt_cached_elapsed = elapsed_xlogs;
-			return false;
-		}
+		ckpt_cached_elapsed = elapsed_xlogs;
+		return false;
 	}

 	/*
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index 5643ec8..a272866 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -56,6 +56,7 @@
 #include "pgstat.h"
 #include "postmaster/bgwriter.h"
 #include "replication/syncrep.h"
+#include "replication/walreceiver.h"
 #include "storage/bufmgr.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
@@ -489,8 +490,8 @@ BackgroundWriterMain(void)
 	 * Initialize bgwriter-private variables used during checkpoint.
 	 */
 	ckpt_active = true;
-	if (!do_restartpoint)
-		ckpt_start_recptr = GetInsertRecPtr();
+	ckpt_start_recptr = do_restartpoint ?
+		GetXLogReplayRecPtr() : GetInsertRecPtr();
 	ckpt_start_time = now;
 	ckpt_cached_elapsed = 0;

@@ -764,30 +765,32 @@ IsCheckpointOnSchedule(double progress)
 		return false;

 	/*
-	 * Check progress against WAL segments written and checkpoint_segments.
+	 * Check progress against WAL segments written, or replayed for
+	 * hot standby, and checkpoint_segments.
 	 *
 	 * We compare the current WAL insert location against the location
-	 * computed before calling CreateCheckPoint. The code in XLogInsert that
-	 * actually triggers a checkpoint when checkpoint_
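[Editorial note: to make the scheduling math in IsCheckpointOnSchedule()
concrete, here is a toy computation of elapsed_xlogs for the common case
where xlogid has not advanced; the segment size and checkpoint_segments
values are assumed for illustration only:

    #include <stdio.h>

    int
    main(void)
    {
        const double XLogSegSize = 16.0 * 1024 * 1024;  /* 16MB per WAL segment */
        const double CheckPointSegments = 3;            /* checkpoint_segments */

        double start_xrecoff = 0;                       /* at restartpoint start */
        double cur_xrecoff = 24.0 * 1024 * 1024;        /* 24MB replayed since */

        /* 1.5 segments replayed out of 3: restartpoint should be ~half done */
        double elapsed_xlogs =
            ((cur_xrecoff - start_xrecoff) / XLogSegSize) / CheckPointSegments;

        printf("elapsed_xlogs = %.2f\n", elapsed_xlogs);
        return 0;
    }

IsCheckpointOnSchedule(progress) reports "behind schedule" whenever
progress < elapsed_xlogs, which is what paces the restartpoint against
replay rather than against WAL receipt.]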