Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!
On 19/08/10 04:46, Robert Haas wrote:
> At any rate, we should definitely NOT wait another month to start
> thinking about Sync Rep again.

Agreed. EnterpriseDB is interested in having that feature, so I'm on the
hook to spend time on it regardless of commitfests.

> I haven't actually looked at any of the Sync Rep code AT ALL but IIRC
> Heikki expressed the view that the biggest thing standing in the way of
> a halfway decent Sync Rep implementation was a number of polling loops
> that needed to be replaced with something that wouldn't introduce
> up-to-100ms delays.

Well, that's the only uncontroversial thing about it that doesn't require
any fighting over the UI or desired behavior. That's why I've focused on
that first, and also because it's useful regardless of synchronous
replication. But once that's done, we'll have to nail down how synchronous
replication is supposed to behave, and how to configure it.

> And so far we haven't seen a patch for that. Somebody write one. And
> then let's get it reviewed and committed RSN.

Fujii is on vacation, but I've started working on it. The two issues with
Fujii's latest patch are that it would not respond promptly on platforms
where signals don't interrupt sleep, and that it suffers from the classic
race condition that pselect() was invented to solve. I'm going to replace
pg_usleep() with select(), and use the so-called "self-pipe trick" to get
around the race condition. I have that written up, but I want to do some
testing and cleanup before posting the patch.

> It may seem like we're early in the release cycle yet, but for a
> feature of this magnitude we are not. We committed way too much big
> stuff at the very end of the last release cycle; Hot Standby was still
> being cleaned up in May after commit in November. We'll be lucky to
> commit sync rep that early.

Agreed. We need to decide the scope and minimum set of features real soon
to get something concrete finished.

BTW, on what platforms do signals not interrupt sleep? Although that issue
has been discussed many times before, I couldn't find any reference to a
real platform in the archives.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] git: uh-oh
Magnus Hagander wrote:
> Is there some way to make cvs2git work this way, and just not bother
> even trying to create merge commits, or is that fundamentally
> impossible and we need to look at another tool?

The good news: (I just reminded myself/realized that) Max Bowsher has
already implemented pretty much exactly what you want in the cvs2svn trunk
version, including noting in the commit messages any cherry-picks that are
not reflected in the repo ancestry.

The bad news: it is broken [1]. But I don't think it should be too much
work to fix it.

Michael

[1] http://cvs2svn.tigris.org/ds/viewMessage.do?dsForumId=1670&dsMessageId=2624153
Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!
On Wed, Aug 18, 2010 at 7:46 PM, Greg Smith wrote:
> Kevin Grittner wrote:
>> I don't think I want to try to handle two in a row, and I think your
>> style is better suited than mine to the final CF for a release, but I
>> might be able to take on the 2010-11 CF if people want that.
>
> Ha, you just put yourself right back on the hook with that comment, and
> Robert does seem like the right guy for CF-4 @ 2011-01. Leaving the
> question of what's going to happen with CF-2 next month.

My reputation precedes me, apparently. Although I appreciate everyone so
far being willing to avoid mentioning exactly what that reputation might
be. :-)

> I think the crucial thing with the 2010-09 CF is that we have to get
> serious progress made sorting out all the sync rep ideas before/during
> that one. The review Yeb did and subsequent discussion was really
> helpful, but the scope on that needs to actually get nailed down to
> *something* concrete if it's going to get built early enough in the 9.1
> release to be properly reviewed and tested for more than one round.
> Parts of the design and scope still feel like they're expanding to me,
> and I think having someone heavily involved in the next CF who is
> willing to push on nailing down that particular area is pretty
> important. Will volunteer myself if I can stay on schedule to make it
> past the major time commitment sink I've had so far this year by then.

Sitting on Sync Rep is a job and a half by itself, without adding all the
other CF work on top of it. Maybe we should try to find two
vi^Holunteers: a CommitFest Manager (CFM) and a Major Feature Babysitter
(MBS).

At any rate, we should definitely NOT wait another month to start thinking
about Sync Rep again.

I haven't actually looked at any of the Sync Rep code AT ALL, but IIRC
Heikki expressed the view that the biggest thing standing in the way of a
halfway decent Sync Rep implementation was a number of polling loops that
needed to be replaced with something that wouldn't introduce up-to-100ms
delays. And so far we haven't seen a patch for that. Somebody write one.
And then let's get it reviewed and committed RSN.

It may seem like we're early in the release cycle yet, but for a feature
of this magnitude we are not. We committed way too much big stuff at the
very end of the last release cycle; Hot Standby was still being cleaned up
in May after commit in November. We'll be lucky to commit sync rep that
early.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
Re: [HACKERS] git: uh-oh
Alvaro Herrera wrote:
> Excerpts from Michael Haggerty's message of Wed Aug 18 12:00:44 -0400 2010:
>
>> 3. Run
>>
>>        git filter-branch
>>
>> This rewrites the commits using any parentage changes from the grafts
>> file. This changes most commits' SHA1 hashes. After this you can
>> discard the .git/info/grafts file. You would then want to remove the
>> original references, which were moved to "refs/original".
>
> Hmm. If I need to do two changes in the same branch, do I need to
> mention the new SHA1 for the second one (after filter-branch changes its
> SHA1), or the original one? If the former, then this is going to be a
> very painful process.

No, all SHA1s refer to the values for the *old* versions of the commits.

Michael
Re: [HACKERS] security label support, part.2
>>> How about an idea to add a new flag in RangeTblEntry which shows where
>>> the RangeTblEntry came from, instead of clearing requiredPerms?
>>> If the flag is true, I think ExecCheckRTEPerms() can simply skip checks
>>> on the child tables.
>>
>> How about the external module just checks if the current object being
>> queried has parents, and if so, goes and checks the
>> labels/permissions/etc on those children? That way the query either
>> always fails or never fails for a given caller, rather than sometimes
>> working and sometimes not depending on the query.
>>
> Hmm, this idea may be feasible. The RangeTblEntry->inh flag of the
> parent will give us a hint whether we also should check labels on its
> children.

http://code.google.com/p/sepgsql/source/browse/trunk/sepgsql/relation.c#293

At least, it seems to me this logic works as expected:

postgres=# CREATE TABLE tbl_p (a int, b text);
CREATE TABLE
postgres=# CREATE TABLE tbl_1 (check (a < 100)) inherits (tbl_p);
CREATE TABLE
postgres=# CREATE TABLE tbl_2 (check (a >= 100 and a < 200)) inherits (tbl_p);
CREATE TABLE
postgres=# CREATE TABLE tbl_3 (check (a >= 300)) inherits (tbl_p);
CREATE TABLE
postgres=# SECURITY LABEL on TABLE tbl_p IS 'system_u:object_r:sepgsql_table_t:s0';
SECURITY LABEL
postgres=# SECURITY LABEL on COLUMN tbl_p.a IS 'system_u:object_r:sepgsql_table_t:s0';
SECURITY LABEL
postgres=# SECURITY LABEL on COLUMN tbl_p.b IS 'system_u:object_r:sepgsql_table_t:s0';
SECURITY LABEL
postgres=# set sepgsql_debug_audit = on;
SET
postgres=# SELECT a FROM ONLY tbl_p WHERE a = 150;
LOG:  SELinux: allowed { select } scontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tcontext=system_u:object_r:sepgsql_table_t:s0 tclass=db_table name=tbl_p
STATEMENT:  SELECT a FROM ONLY tbl_p WHERE a = 150;
LOG:  SELinux: allowed { select } scontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tcontext=system_u:object_r:sepgsql_table_t:s0 tclass=db_column name=tbl_p.a
STATEMENT:  SELECT a FROM ONLY tbl_p WHERE a = 150;
 a
---
(0 rows)

-> ONLY tbl_p was not expanded

postgres=# SELECT a FROM tbl_p WHERE a = 150;
LOG:  SELinux: allowed { select } scontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tcontext=system_u:object_r:sepgsql_table_t:s0 tclass=db_table name=tbl_p
STATEMENT:  SELECT a FROM tbl_p WHERE a = 150;
LOG:  SELinux: allowed { select } scontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tcontext=system_u:object_r:sepgsql_table_t:s0 tclass=db_column name=tbl_p.a
STATEMENT:  SELECT a FROM tbl_p WHERE a = 150;
LOG:  SELinux: allowed { select } scontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tcontext=system_u:object_r:sepgsql_table_t:s0 tclass=db_table name=tbl_1
STATEMENT:  SELECT a FROM tbl_p WHERE a = 150;
LOG:  SELinux: allowed { select } scontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tcontext=system_u:object_r:sepgsql_table_t:s0 tclass=db_column name=tbl_1.a
STATEMENT:  SELECT a FROM tbl_p WHERE a = 150;
LOG:  SELinux: allowed { select } scontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tcontext=system_u:object_r:sepgsql_table_t:s0 tclass=db_table name=tbl_2
STATEMENT:  SELECT a FROM tbl_p WHERE a = 150;
LOG:  SELinux: allowed { select } scontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tcontext=system_u:object_r:sepgsql_table_t:s0 tclass=db_column name=tbl_2.a
STATEMENT:  SELECT a FROM tbl_p WHERE a = 150;
LOG:  SELinux: allowed { select } scontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tcontext=system_u:object_r:sepgsql_table_t:s0 tclass=db_table name=tbl_3
STATEMENT:  SELECT a FROM tbl_p WHERE a = 150;
LOG:  SELinux: allowed { select } scontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tcontext=system_u:object_r:sepgsql_table_t:s0 tclass=db_column name=tbl_3.a
STATEMENT:  SELECT a FROM tbl_p WHERE a = 150;
 a
---
(0 rows)

-> tbl_p was expanded to tbl_1, tbl_2 and tbl_3

postgres=# set sepgsql_debug_audit = off;
SET
postgres=# EXPLAIN SELECT a FROM tbl_p WHERE a = 150;
                           QUERY PLAN
-----------------------------------------------------------------
 Result  (cost=0.00..50.75 rows=12 width=4)
   ->  Append  (cost=0.00..50.75 rows=12 width=4)
         ->  Seq Scan on tbl_p  (cost=0.00..25.38 rows=6 width=4)
               Filter: (a = 150)
         ->  Seq Scan on tbl_2 tbl_p  (cost=0.00..25.38 rows=6 width=4)
               Filter: (a = 150)
(6 rows)

-> Actually, it does not scan tbl_1 and tbl_3, due to the a = 150
   condition.

--
KaiGai Kohei
Re: [HACKERS] security label support, part.2
(2010/08/18 21:52), Stephen Frost wrote:
> * KaiGai Kohei (kai...@ak.jp.nec.com) wrote:
>> If rte->requiredPerms would not be cleared, the user of the hook will
>> be able to check access rights on the child tables, as they like.
>
> This would only be the case for those children which are being touched
> in the current query, which would depend on what conditionals are
> applied, what the current setting of check_constraints is, and possibly
> other factors. I do *not* like this approach.

Indeed, the planner might omit scans on the children which are not
obviously referenced, but I'm not certain whether their RangeTblEntry
would also be removed from the PlannedStmt->rtable, or not.

>> How about an idea to add a new flag in RangeTblEntry which shows where
>> the RangeTblEntry came from, instead of clearing requiredPerms?
>> If the flag is true, I think ExecCheckRTEPerms() can simply skip checks
>> on the child tables.
>
> How about the external module just checks if the current object being
> queried has parents, and if so, goes and checks the
> labels/permissions/etc on those children? That way the query either
> always fails or never fails for a given caller, rather than sometimes
> working and sometimes not depending on the query.

Hmm, this idea may be feasible. The RangeTblEntry->inh flag of the parent
will give us a hint whether we also should check labels on its children.

Thanks,
--
KaiGai Kohei
Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!
Kevin Grittner wrote:
> I don't think I want to try to handle two in a row, and I think your
> style is better suited than mine to the final CF for a release, but I
> might be able to take on the 2010-11 CF if people want that.

Ha, you just put yourself right back on the hook with that comment, and
Robert does seem like the right guy for CF-4 @ 2011-01. That leaves the
question of what's going to happen with CF-2 next month.

I think the crucial thing with the 2010-09 CF is that we have to get
serious progress made sorting out all the sync rep ideas before/during
that one. The review Yeb did and the subsequent discussion were really
helpful, but the scope on that needs to actually get nailed down to
*something* concrete if it's going to get built early enough in the 9.1
release to be properly reviewed and tested for more than one round. Parts
of the design and scope still feel like they're expanding to me, and I
think having someone heavily involved in the next CF who is willing to
push on nailing down that particular area is pretty important. I will
volunteer myself if I can stay on schedule to make it past the major time
commitment sink I've had so far this year by then.

--
Greg Smith   2ndQuadrant US   Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us
[HACKERS] CommitFest 2010-07 final report
At the close of the 2010-07 CommitFest, the numbers were:

 72 patches were submitted
  3 patches were withdrawn (deleted) by their authors
 14 patches were moved to CommitFest 2010-09
 --
 55 patches in CommitFest 2010-07
 --
  3 committed to 9.0
 --
 52 patches for 9.1
 --
  1 rejected
 20 returned with feedback
 31 committed for 9.1

When we hit the end of the allotted time, I moved the last two patches to
the next CF, for want of a better idea for disposition. One is "Ready for
Committer" with an author who is a committer. The other is my WIP patch
for serializable transactions -- there's a lot to review, and the reviewer
had unexpected demands on his time during the CF; he said he'll continue
work on that outside the CF.

-Kevin

At the end of week four:

> 72 patches were submitted
>  3 patches were withdrawn (deleted) by their authors
> 12 patches were moved to CommitFest 2010-09
> --
> 57 patches in CommitFest 2010-07
> --
>  3 committed to 9.0
> --
> 54 patches for 9.1
> --
>  1 rejected
> 18 returned with feedback
> 28 committed for 9.1
> --
> 47 disposed
> --
>  7 pending
>  2 ready for committer
> --
>  5 will still need reviewer attention
>  1 waiting on author to respond to review
> --
>  4 patches need review now and have a reviewer assigned
Re: [HACKERS] Progress indication prototype
On Aug 18, 2010, at 9:02 AM, Robert Haas wrote:
> On Wed, Aug 18, 2010 at 8:45 AM, Greg Stark wrote:
>> On Tue, Aug 17, 2010 at 11:29 PM, Dave Page wrote:
>>> Which is ideal for monitoring your own connection - having the info in
>>> pg_stat_activity is also valuable for monitoring and system
>>> administration. Both would be ideal :-)
>>
>> Hm, I think I've come around to the idea that having the info in
>> pg_stat_activity would be very nice. I can just picture sitting in
>> pgAdmin while a bunch of reports are running and seeing progress bars
>> for all of them...
>>
>> But progress bars alone aren't really the big prize. I would really
>> love to see the explain plans for running queries. This would improve
>> the DBA's view of what's going on in the system immensely. Currently
>> you have to grab the query and try to set up a similar environment for
>> it to run EXPLAIN on it. If ANALYZE has run since, or if the tables
>> have grown or shrunk, or if the query was run with some constants as
>> parameters, it can be awkward. If some of the tables in the query were
>> temporary tables, it can be impossible. You can never really be sure
>> you're looking at precisely the same plan that the other user's
>> session is running.
>>
>> But stuffing the whole JSON or XML explain plan into pg_stat_activity
>> seems like it doesn't really fit the same model that the existing
>> infrastructure is designed around. It could be quite large, and if we
>> want to support progress feedback it could change quite frequently.
>>
>> We do stuff the whole query there (up to a limited size), so maybe I'm
>> all wet and stuffing the explain plan in there would be fine?
>
> It seems to me that progress reporting could add quite a bit of
> overhead. For example, in the whole-database vacuum case, the most
> logical way to report progress would be to compute pages visited
> divided by pages to be visited. But the total number of pages to be
> visited is something that doesn't need to be computed in advance
> unless someone cares about progress. I don't think we want to incur
> that overhead in all cases just on the off chance someone might ask.
> We need to think about ways to structure this so that it only costs
> when someone's using it.

I wish that I could get EXPLAIN ANALYZE output step-by-step while running
a long query, instead of seeing it jump out at the end of execution. Some
queries "never" end, and it would be nice to see which step is spinning
(plain EXPLAIN can be a red herring). To me the "progress bar" is nice,
but I don't see how it would be reliable enough to draw any inferences
from it (such as execution time). If I could get the EXPLAIN ANALYZE
results *and* the actual query results, that would be a huge win, too.

Cheers,
M
Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!
Robert Haas wrote:
> I'd just like to take a minute to thank him publicly for his efforts.
> We started this CommitFest with something like 60 patches, which is
> definitely on the larger side for a CommitFest, and Kevin did a great
> job staying on top of what was going on with all of them and, I felt,
> really helped keep us on track. At the same time, I felt he did this
> with a very light touch that made the whole thing go very smoothly.
> So -- thanks, Kevin!

You're welcome. It was educational for me. I don't think I want to try to
handle two in a row, and I think your style is better suited than mine to
the final CF for a release, but I might be able to take on the 2010-11 CF
if people want that.

My hand was not always so light behind the scenes, though -- I sent or
received about 100 off-list emails to try to keep things moving.
Hopefully nobody was too offended by my nagging. :-)

Oh, and thanks for putting together the CF web application. Without that,
I couldn't have done half as well as I did.

> I also appreciate the efforts of all those who reviewed.

Yes, I'll second that! I've always been impressed with the PostgreSQL
community, and managing this CF gave me new insights and appreciation for
the intelligence, professionalism, and community spirit of its members --
authors, reviewers, and committers.

-Kevin
Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?
Josh Berkus writes:
>> Most likely that's the libc implementation of the select()-based sleeps
>> for vacuum_cost_delay. I'm still suspicious that the writes are eating
>> more cost_delay points than you think.
>
> Tested that. It does look like if I increase vacuum_cost_limit to 1
> and lower vacuum_cost_page_dirty to 10, it reads 5-7 pages and writes
> 2-3 before each pollsys. The math seems completely wrong on that,
> though -- it should be 50 and 30 pages, or similar.

I think there could be a lot of cost_delay points getting expended without
any effects visible at the level of strace. Maybe try fooling with
vacuum_cost_page_hit and vacuum_cost_page_miss, too.

regards, tom lane
Re: [HACKERS] Progress indication prototype
On ons, 2010-08-18 at 13:45 +0100, Greg Stark wrote:
> But progress bars alone aren't really the big prize. I would really
> love to see the explain plans for running queries.

The auto_explain module does that already.
Re: [HACKERS] Progress indication prototype
On tis, 2010-08-17 at 13:52 -0400, Stephen Frost wrote:
> I don't like how the backend would have to send something NOTICE-like.
> I had originally been thinking "gee, it'd be nice if psql could query
> pg_stat while doing something else", but that's not really possible...
> So, I guess NOTICE-like messages would work, if the backend could be
> taught to do it.

That should be doable; you'd just have to do some ereport(NOTICE) variant
inside pgstat_report_progress and have a switch to turn it on and off, and
have psql do something with it. The latter is really the interesting part;
the former is relatively easy once the general framework is in place.
Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?
> Tested that. It does look like if I increase vacuum_cost_limit to 1
> and lower vacuum_cost_page_dirty to 10, it reads 5-7 pages and writes
> 2-3 before each pollsys. The math seems completely wrong on that,
> though -- it should be 50 and 30 pages, or similar. If I can, I'll test
> a vacuum without cost_delay and make sure the pollsys() are connected to
> the cost delay and not something else.

Hmmm. Looks like, at least in 8.3, running a manual vacuum on a table
doesn't prevent anti-wraparound vacuum from restarting. So I can't do any
further testing until we can restart the server.

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
Re: [HACKERS] Per-tuple memory leak in 9.0
Dean Rasheed writes:
> The problem is that the trigger code assumes that anything it allocates
> in the per-tuple memory context will be freed per tuple processed,
> which used to be the case because the loop in ExecutePlan() calls
> ResetPerTupleExprContext() once each time round the loop, and that used
> to correspond to once per tuple.
>
> However, with the refactoring of that code out to nodeModifyTable.c,
> this is no longer the case, because the ModifyTable node processes all
> the tuples from the subquery before returning, so I guess that the loop
> in ExecModifyTable() needs to call ResetPerTupleExprContext() each time
> round.

Hmmm ... it seems a bit unclean to be resetting the output-tuple
exprcontext at a level below the top of the plan. I agree that that's
probably the sanest fix at the moment, but I fear we may need to revisit
this in connection with writable CTEs. We might need a separate output
tuple context for each ModifyTable node, or something like that.

regards, tom lane
[HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!
Kevin didn't send out an official gavel-banging announcement of the end of
CommitFest 2009-07 (possibly because I neglected until today to give him
privileges to actually change it in the web application), but I'd just
like to take a minute to thank him publicly for his efforts. We started
this CommitFest with something like 60 patches, which is definitely on the
larger side for a CommitFest, and Kevin did a great job staying on top of
what was going on with all of them and, I felt, really helped keep us on
track. At the same time, I felt he did this with a very light touch that
made the whole thing go very smoothly. So -- thanks, Kevin!

I also appreciate the efforts of all those who reviewed. Good reviews are
really critical to keep the burden from building up on committers, and I
appreciate the efforts of everyone who contributed, in many cases probably
on their own time. I'm particularly grateful to the people who were
vigilant about spelling, grammar, coding style, whitespace, and other
nitpicky little issues that are not much fun, but which at least for me
are a major time sink if they're still lingering when it comes time to do
the actual commit.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?
> That would explain all the writes, but it doesn't seem to explain why
> your two servers aren't behaving similarly.

Well, that's why I said "ostensibly identical". There may in fact be
differences, not just in the databases but in some OS libs as well. These
servers have been in production for quite a while, and the owner has a
messy deployment process.

> Most likely that's the libc implementation of the select()-based sleeps
> for vacuum_cost_delay. I'm still suspicious that the writes are eating
> more cost_delay points than you think.

Tested that. It does look like if I increase vacuum_cost_limit to 1
and lower vacuum_cost_page_dirty to 10, it reads 5-7 pages and writes 2-3
before each pollsys. The math seems completely wrong on that, though --
it should be 50 and 30 pages, or similar. If I can, I'll test a vacuum
without cost_delay and make sure the pollsys() calls are connected to the
cost delay and not something else.

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
Re: [HACKERS] patch: utf8_to_unicode (trivial)
Robert Haas writes:
> Anyway, it's not really important enough to me to have a protracted
> argument about it. Let's wait and see if anyone else has an opinion,
> and perhaps a consensus will emerge.

Well, nobody else seems to care, so I went ahead and committed the shorter
form of the patch, i.e. just rename & export the function.

regards, tom lane
Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?
Tom Lane wrote:
> Josh Berkus writes:
>> This is an anti-wraparound vacuum, so it could have something to do
>> with the hint bits. Maybe it's setting the freeze bit on every page,
>> and writing them one page at a time?
>
> That would explain all the writes, but it doesn't seem to explain why
> your two servers aren't behaving similarly.

Was one bulk-loaded from the other, or were they bulk-loaded at different
times? Or did one have some other activity that boosted the xid count,
possibly in another database?

-Kevin
Re: [HACKERS] trace_recovery_messages
Fujii Masao writes:
> The explanation of trace_recovery_messages in the document is
> inconsistent with the definition of it in guc.c.

Setting the default to WARNING is confusing and useless, because there are
no trace_recovery calls with that debug level. IMO the default setting
should be LOG, which makes trace_recovery() a clear no-op (rather than not
clearly a no-op). There is circumstantial evidence in the code that this
was the original intention:

    int trace_recovery_messages = LOG;

The documentation of the parameter is about as clear as mud, too. We need
to explain what it does rather than just copy-and-paste a lot of text from
log_min_messages.

regards, tom lane
Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?
Josh Berkus writes:
>> Rather, what you need to be thinking about is how come vacuum seems to
>> be making lots of pages dirty on only one of these machines.
>
> This is an anti-wraparound vacuum, so it could have something to do
> with the hint bits. Maybe it's setting the freeze bit on every page,
> and writing them one page at a time?

That would explain all the writes, but it doesn't seem to explain why your
two servers aren't behaving similarly.

> Still don't understand the call to pollsys, even so, though.

Most likely that's the libc implementation of the select()-based sleeps
for vacuum_cost_delay. I'm still suspicious that the writes are eating
more cost_delay points than you think.

regards, tom lane
Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?
> On further reflection, though: since we put in the BufferAccessStrategy
> code, which was in 8.3, the background writer isn't *supposed* to be
> very much involved in writing pages that are dirtied by VACUUM. VACUUM
> runs in a small ring of buffers and is supposed to have to clean its
> own dirt most of the time. So it's wrong to blame this on the bgwriter
> not holding up its end. Rather, what you need to be thinking about is
> how come vacuum seems to be making lots of pages dirty on only one of
> these machines.

This is an anti-wraparound vacuum, so it could have something to do with
the hint bits. Maybe it's setting the freeze bit on every page, and
writing them one page at a time?

Still don't understand the call to pollsys, even so, though.

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?
Josh Berkus writes:
>> What I find interesting about that trace is the large proportion of
>> writes. That appears to me to indicate that it's *not* a matter of
>> vacuum delays, or at least not just a matter of that. The process
>> seems to be getting involved in having to dump dirty buffers to disk.
>> Perhaps the background writer is malfunctioning?
>
> You appear to be correct in that it's write-related. Will be testing
> what specifically is producing it.
>
> Note that this is one of two ostensibly duplicate servers, and the
> issue has never appeared on the other server.

On further reflection, though: since we put in the BufferAccessStrategy
code, which was in 8.3, the background writer isn't *supposed* to be very
much involved in writing pages that are dirtied by VACUUM. VACUUM runs in
a small ring of buffers and is supposed to have to clean its own dirt most
of the time. So it's wrong to blame this on the bgwriter not holding up
its end. Rather, what you need to be thinking about is how come vacuum
seems to be making lots of pages dirty on only one of these machines.

regards, tom lane
Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?
> What I find interesting about that trace is the large proportion of > writes. That appears to me to indicate that it's *not* a matter of > vacuum delays, or at least not just a matter of that. The process seems > to be getting involved in having to dump dirty buffers to disk. Perhaps > the background writer is malfunctioning? You appear to be correct in that it's write-related. Will be testing on what specifically is producing it. Note that this is one of two ostensibly duplicate servers, and the issue has never appeared on the other server. -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
Re: [HACKERS] Per-column collation, proof of concept
On Wed, Aug 18, 2010 at 11:29 AM, Peter Eisentraut wrote: > On tis, 2010-08-17 at 01:16 -0500, Jaime Casanova wrote: >> >> creating collations ...FATAL: invalid byte sequence for encoding >> >> "UTF8": 0xe56c09 >> >> CONTEXT: COPY tmp_pg_collation, line 86 >> >> STATEMENT: COPY tmp_pg_collation FROM >> >> E'/usr/local/pgsql/9.1/share/locales.txt'; >> >> """ >> > >> > Hmm, what is in that file on that line? >> > >> > >> >> bokmål ISO-8859-1 > > Hey, that borders on genius: Use a non-ASCII letter in the name of a > locale whose purpose it is to configure how non-ASCII letters are > interpreted. :-/ > > Interestingly, I don't see this on a Debian system. Good thing to know > that this needs separate testing on different Linux variants. > > Yeah! And when installing CentOS 5 I don't get a chance to choose which locales I want; it just installs all of them. -- Jaime Casanova www.2ndQuadrant.com Soporte y capacitación de PostgreSQL
Re: [HACKERS] git: uh-oh
Excerpts from Robert Haas's message of mié ago 18 13:10:19 -0400 2010: > I think what is frustrating is that we have a mental image of what the > history looks like in CVS based on what we actually do, and it doesn't > look anything like the history that cvs2git created. You can do all > kinds of crazy things in CVS, like tag the whole tree and then move > the tags on half a dozen individual files forward or backward in time, > or delete the tags off them altogether. But we believe (perhaps > naively) that we haven't done those things, so we're expecting to get > a simple linear history without merges, and definitely without commits > from one branch jumping into the midst of other branches. In fact, we went to some lengths to remove some of the more problematic artifacts in our original CVS repository, so that a Git conversion wouldn't have a problem with them. It's disappointing that it ends up punting in this manner. I do welcome the offer of Michael's development time to solve our problems. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
[HACKERS] Per-tuple memory leak in 9.0
While testing triggers, I came across the following memory leak. Here's a simple test case: CREATE TABLE foo(a int); CREATE OR REPLACE FUNCTION trig_fn() RETURNS trigger AS $$ BEGIN RETURN NEW; END; $$ LANGUAGE plpgsql; CREATE TRIGGER ins_trig BEFORE INSERT ON foo FOR EACH ROW EXECUTE PROCEDURE trig_fn(); INSERT INTO foo SELECT g FROM generate_series(1, 500) AS g; Memory usage goes up by around 100 bytes per row for the duration of the query. The problem is that the trigger code assumes that anything it allocates in the per-tuple memory context will be freed per-tuple processed, which used to be the case because the loop in ExecutePlan() calls ResetPerTupleExprContext() once each time round the loop, and that used to correspond to once per tuple. However, with the refactoring of that code out to nodeModifyTable.c, this is no longer the case because the ModifyTable node processes all the tuples from the subquery before returning, so I guess that the loop in ExecModifyTable() needs to call ResetPerTupleExprContext() each time round. Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
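The leak described above comes down to a reset that now happens once per statement instead of once per tuple. A toy model (Python, purely illustrative; the real code is the C executor's MemoryContext machinery, and the class and function names below are made up) shows why moving the reset inside the per-tuple loop bounds memory use:

```python
# Toy model of a per-tuple memory context: allocations made while
# processing one tuple should be freed before the next tuple, by
# resetting the context once per loop iteration (what the loop in
# ExecModifyTable would do by calling ResetPerTupleExprContext).

class MemoryContext:
    """Simplified stand-in for a palloc memory context."""
    def __init__(self):
        self.chunks = []
        self.peak = 0

    def alloc(self, nbytes):
        self.chunks.append(nbytes)
        self.peak = max(self.peak, sum(self.chunks))

    def reset(self):
        self.chunks.clear()

def process_rows(n_rows, reset_per_tuple):
    ctx = MemoryContext()
    for _ in range(n_rows):
        ctx.alloc(100)      # ~100 bytes of trigger-related allocations per row
        if reset_per_tuple:
            ctx.reset()     # free per-tuple garbage before the next row
    return ctx.peak

# Without the per-iteration reset, peak usage grows with the row count,
# as in the 500-row INSERT above; with it, usage stays bounded.
leaky = process_rows(500, reset_per_tuple=False)
fixed = process_rows(500, reset_per_tuple=True)
print(leaky, fixed)  # 50000 100
```

The point of the sketch is only the placement of the reset: per iteration of the node's row loop, not per call of the node.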
Re: [HACKERS] git: uh-oh
On Wed, Aug 18, 2010 at 12:18 PM, Michael Haggerty wrote: > Tom Lane wrote: >> Michael Haggerty writes: >>> The "exclusive" possibility is to ignore the fact that some of the >>> content of B4 came from trunk and to pretend that FILE1 just appeared >>> out of nowhere in commit B4 independent of the FILE1 in TRUNK: >> >>> T0 -- T1 -- T2 T3 -- T4 TRUNK >>> \ >>> B1 -- B2 -- B3 -- B4 BRANCH1 >> >>> This is also wrong, because it doesn't reflect the true lineage of FILE1. >> >> Maybe not, but that *is* how things appeared in the CVS history, [...] > > I forgot to point out that "the CVS history" looks nothing like this, > because the CVS history is only defined file by file. So the CVS > history of FILE0 might look like this: > > 1.0 - 1.1 -- 1.2 - 1.3 - 1.4 TRUNK > \ > 1.1.2.1 -- 1.1.2.2 -- 1.1.2.3 -- 1.1.2.4 BRANCH1 > > whereas the history of FILE1 probably looks more like this: > > 1.1 - 1.2 - 1.3 TRUNK > \ > 1.2.2.1 -- 1.2.2.2 BRANCH1 > > (here I've tried to put corresponding commits in the same relative > location) and there might be a FILE2 that looks like this: > > 1.0 1.1 --- 1.2 TRUNK > \ > *no commit here* BRANCH1 > > Perhaps this makes it clearer why creating a single git history requires > some compromises. I think we all understand that the conversion process may create some artifacts. Also, since I think this has not yet been mentioned, I really appreciate you being willing to jump into this discussion and possibly try to write some code to help us get what we want. I think what is frustrating is that we have a mental image of what the history looks like in CVS based on what we actually do, and it doesn't look anything like the history that cvs2git created. You can do all kinds of crazy things in CVS, like tag the whole tree and then move the tags on half a dozen individual files forward or backward in time, or delete the tags off them altogether.
But we believe (perhaps naively) that we haven't done those things, so we're expecting to get a simple linear history without merges, and definitely without commits from one branch jumping into the midst of other branches. What was really alarming to me about what I found yesterday is that - even after reading your explanation - I can't understand why it did that. I think it's human nature to like it when good things happen to us and to dislike it when bad things happen to us, but we tend to hate the bad things a lot more when we feel like we didn't deserve it. If you're going 90 MPH and get a speeding ticket, you may be steamed, but at some level you know you deserved it. If you were going 50 MPH on a road where the speed limit is 55 MPH and the cop tickets you for 60 MPH, even the most mild-mannered driver may feel an urge to say something less polite than "thank you, officer". Hence our consternation. Perhaps there is some way to tilt your head so that these merge commits are the Right Thing To Do, but to me at least it feels extremely weird and inexplicable. If at some point, we had taken the majority of the deltas between 9.0 and 8.3 and put them into 8.3 and the converter said "oh, that's a merge", well, we might want an option to turn that behavior off, but at least it would be clear why it happened. But the merge commit that got fabricated here almost by definition has to be ignoring the vast bulk of the activity on one side, which just doesn't feel right. To what degree does your proposed solution (an "exclusive" option) resemble "don't ever create merge commits"? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] git: uh-oh
On Wed, 2010-08-18 at 12:26 -0400, Alvaro Herrera wrote: > Excerpts from Magnus Hagander's message of mié ago 18 11:52:58 -0400 2010: > > On Wed, Aug 18, 2010 at 17:33, Khee Chin wrote: > > > I previously proposed off-list an alternate solution to generate the git > > > repository which was turned down due to it not being able to handle > > > incremental updates. However, since we are now looking at a one-time > > > conversion, this method might come in handy. > > > > cvs2git *is* the tool we've been using now that it's a one-off > > conversion. It's the one that's causing the current problems. We had a lot of luck with cvs to svn conversion in the past. And supposedly the git-svn stuff is top notch. It may be worth a shot. JD > -- > Álvaro Herrera > The PostgreSQL Company - Command Prompt, Inc. > PostgreSQL Replication, Consulting, Custom Development, 24x7 support > -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] git: uh-oh
Excerpts from Michael Haggerty's message of mié ago 18 12:00:44 -0400 2010: > 3. Run > > git filter-branch > > This rewrites the commits using any parentage changes from the grafts > file. This changes most commits' SHA1 hashes. After this you can > discard the .git/info/grafts file. You would then want to remove the > original references, which were moved to "refs/original". Hmm. If I need to do two changes in the same branch, do I need to mention the new SHA1 for the second one (after filter-branch changes its SHA1), or the original one? If the former, then this is going to be a very painful process. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Per-column collation, proof of concept
On tis, 2010-08-17 at 01:16 -0500, Jaime Casanova wrote: > >> creating collations ...FATAL: invalid byte sequence for encoding > >> "UTF8": 0xe56c09 > >> CONTEXT: COPY tmp_pg_collation, line 86 > >> STATEMENT: COPY tmp_pg_collation FROM > >> E'/usr/local/pgsql/9.1/share/locales.txt'; > >> """ > > > > Hmm, what is in that file on that line? > > > > > > bokmål ISO-8859-1 Hey, that borders on genius: Use a non-ASCII letter in the name of a locale whose purpose it is to configure how non-ASCII letters are interpreted. :-/ Interestingly, I don't see this on a Debian system. Good thing to know that this needs separate testing on different Linux variants. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] git: uh-oh
Excerpts from Magnus Hagander's message of mié ago 18 11:52:58 -0400 2010: > On Wed, Aug 18, 2010 at 17:33, Khee Chin wrote: > > I previously proposed off-list an alternate solution to generate the git > > repository which was turned down due to it not being able to handle > > incremental updates. However, since we are now looking at a one-time > > conversion, this method might come in handy. > > cvs2git *is* the tool we've been using now that it's a one-off > conversion. It's the one that's causing the current problems. I think the point is to run the repo through cvsclone, which apparently changes the repo in some (undocumented) ways, removing "corruption". Not sure how essential this is to Khee Chin's proposal. The cited URL is no longer valid, however. The code can be found here: http://samba.org/ftp/tridge/rtc/cvsclone.l -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] git: uh-oh
Tom Lane wrote: > Michael Haggerty writes: >> The "exclusive" possibility is to ignore the fact that some of the >> content of B4 came from trunk and to pretend that FILE1 just appeared >> out of nowhere in commit B4 independent of the FILE1 in TRUNK: >
>> T0 -- T1 -- T2    T3 -- T4    TRUNK
>>         \
>>          B1 -- B2 -- B3 -- B4    BRANCH1
>
>> This is also wrong, because it doesn't reflect the true lineage of FILE1. > > Maybe not, but that *is* how things appeared in the CVS history, [...] I forgot to point out that "the CVS history" looks nothing like this, because the CVS history is only defined file by file. So the CVS history of FILE0 might look like this:

1.0 - 1.1 -- 1.2 - 1.3 - 1.4    TRUNK
         \
          1.1.2.1 -- 1.1.2.2 -- 1.1.2.3 -- 1.1.2.4    BRANCH1

whereas the history of FILE1 probably looks more like this:

1.1 - 1.2 - 1.3    TRUNK
         \
          1.2.2.1 -- 1.2.2.2    BRANCH1

(here I've tried to put corresponding commits in the same relative location) and there might be a FILE2 that looks like this:

1.0    1.1 --- 1.2    TRUNK
   \
    *no commit here*    BRANCH1

Perhaps this makes it clearer why creating a single git history requires some compromises. Michael
Re: [HACKERS] git: uh-oh
Robert Haas wrote: > Exactly. IMHO, the way this should work is by starting at the > beginning of time and working forward. [...] What you are describing is more or less the algorithm that was used by cvs2svn version 1.x. It mostly works, but has nasty edge cases that are impossible to fix. cvs2svn version 2.x uses a better algorithm [1]. It can be changed to add an "exclusive" mode, it's a simple matter of programming. I will try to find some time to work on it. Michael [1] http://cvs2svn.tigris.org/source/browse/cvs2svn/trunk/doc/design-notes.txt?view=markup -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] git: uh-oh
Alvaro Herrera wrote: > Excerpts from Michael Haggerty's message of mié ago 18 05:01:29 -0400 2010: > >> [...] Alternatively, you >> could write a tool that would rewrite the ancestry information in the >> repository *after* the cvs2git conversion using .git/info/grafts (see >> git-filter-branch(1)). Such rewriting would have to occur before the >> repository is published, because the rewriting will change the hashes of >> most commits. > > AFAICT, graft points are not checked in[1], thus they don't propagate; are > you saying that we should run the migration, then manually inject the > graft points, then run some conversion tool that writes a different > repository with those graft points welded into the history? This sounds > like it needs some manual work (namely find out the appropriate graft > points for each branch), that can be prepared beforehand. Otherwise it > seems easier than reworking the cvs2git code for the "mostly-exclusive" > option. It is true that grafts are not propagated, but they can be baked into a repository (at the cost of rewriting the SHA1 hashes) using "git filter-branch". The procedure would be as follows:

1. Convert using cvs2git.

2. Create a file .git/info/grafts containing the changes that you want to make to the project's ancestry. The file has the format

       commit parent0 parent1 ...

   where each of the entries is a SHA1 hash from the existing repository. Only commits whose parentage should be changed need to be mentioned. This is the tricky step because it requires some logic to decide what needs changing. And it can only be done after the cvs2git conversion, because it requires the SHA1s resulting from the conversion.

3. Run

       git filter-branch

   This rewrites the commits using any parentage changes from the grafts file. This changes most commits' SHA1 hashes. After this you can discard the .git/info/grafts file. You would then want to remove the original references, which were moved to "refs/original".

4. Publish the repository.
As long as the repository is only published after the grafts have been baked in, there is no reason that anybody else would need the grafts file. Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
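Generating the grafts file in step 2 is mechanical once the desired parentage is known. A minimal sketch (Python; the `make_grafts` helper and the placeholder hashes are hypothetical - only the one-line-per-commit "commit parent0 parent1 ..." format comes from the message above):

```python
# Compose the contents of .git/info/grafts from a mapping of
# {commit_sha: [replacement_parent_shas]}. Only commits whose
# parentage should change are listed; each line is the commit's
# SHA1 followed by its new parents' SHA1s, space-separated.

def make_grafts(parentage):
    lines = []
    for commit, parents in parentage.items():
        lines.append(" ".join([commit] + list(parents)))
    return "\n".join(lines) + "\n"

# Hypothetical example: make commit B4 a child of B3 only, dropping
# a spurious trunk parent (the hex strings stand in for real SHA1s).
b4 = "b4" * 20
b3 = "b3" * 20
print(make_grafts({b4: [b3]}), end="")
```

The resulting text would be written to .git/info/grafts before running git filter-branch, as described in step 3.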
Re: [HACKERS] git: uh-oh
On Wed, Aug 18, 2010 at 11:03 AM, Tom Lane wrote: > Michael Haggerty writes: >> So let's take the simplest example: a branch BRANCH1 is created from >> trunk commit T1, then some time later another FILE1 from trunk commit T3 >> is added to BRANCH1 in commit B4. How should this series of events be >> represented in a git repository? >> ... >> The "exclusive" possibility is to ignore the fact that some of the >> content of B4 came from trunk and to pretend that FILE1 just appeared >> out of nowhere in commit B4 independent of the FILE1 in TRUNK: > >> T0 -- T1 -- T2 T3 -- T4 TRUNK >> \ >> B1 -- B2 -- B3 -- B4 BRANCH1 > >> This is also wrong, because it doesn't reflect the true lineage of FILE1. > > Maybe not, but that *is* how things appeared in the CVS history, and > we'd rather have a git history that looks like the CVS history than > one that claims that boatloads of utterly unrelated commits are part > of a branch's history. Exactly. IMHO, the way this should work is by starting at the beginning of time and working forward. At each step, we examine the earliest revision of each file for which no git commit has yet been written. From among those, we select the one with the earliest timestamp. We then also select all other files whose most recent unprocessed revision is nearly contemporaneous and shares the same author and log message. From the results, we generate a commit. Then we repeat. When we arrive at a branch point, the branch gets processed separately from the trunk. If there is no trunk rev which has every file at the rev where it starts on the branch, then we use some sane algorithm to pick the best one (perhaps, the one that has the right revs of the most files) and then insert a fixup commit on the branch to remove the deltas and carry on as before. > The "inclusive" possibility might be tolerable if it restricted itself > to mentioning commits that actually touched FILE1 in between its > addition to TRUNK and its addition to BRANCH1. 
So far as I can see, > though, cvs2git is mentioning *every* commit on TRUNK between T1 and B4 > ... not even between T3 and B4, but back to the branch point. How can > you possibly justify that as either sane or useful? git can't do that. It's finding those commits by following parent pointers from the merge commits. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
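For what it's worth, the forward-in-time grouping Robert describes can be sketched in a few lines. This is only an illustration of the idea, not cvs2svn/cvs2git's actual algorithm; the Rev fields, the 60-second window, and the sample data are all assumptions made for the example:

```python
# Group per-file CVS revisions into changesets: walk revisions in
# timestamp order, and batch together revisions that share an author
# and log message and are nearly contemporaneous.

from collections import namedtuple

Rev = namedtuple("Rev", "file revnum time author log")

def group_changesets(revs, window=60):
    """Group file revisions into commits; window is seconds of slack."""
    commits = []
    for rev in sorted(revs, key=lambda r: r.time):
        last = commits[-1] if commits else None
        if (last is not None
                and last[0].author == rev.author
                and last[0].log == rev.log
                and rev.time - last[-1].time <= window
                # one changeset cannot touch the same file twice
                and all(r.file != rev.file for r in last)):
            last.append(rev)
        else:
            commits.append([rev])
    return commits

revs = [
    Rev("a.c", "1.2", 100, "tgl", "fix bug"),
    Rev("b.c", "1.5", 101, "tgl", "fix bug"),
    Rev("a.c", "1.3", 500, "momjian", "pgindent"),
]
print(len(group_changesets(revs)))  # 2: the first two revisions fuse
```

The hard part Robert alludes to (picking a base revision at branch points and emitting fixup commits) is exactly what this simple model leaves out, and it is where the edge cases live.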
Re: [HACKERS] git: uh-oh
On Wed, Aug 18, 2010 at 17:33, Khee Chin wrote: > I previously proposed off-list an alternate solution to generate the git > repository which was turned down due to it not being able to handle > incremental updates. However, since we are now looking at a one-time > conversion, this method might come in handy. cvs2git *is* the tool we've been using now that it's a one-off conversion. It's the one that's causing the current problems. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] git: uh-oh
Tom Lane wrote: > Michael Haggerty writes: >> So let's take the simplest example: a branch BRANCH1 is created from >> trunk commit T1, then some time later another FILE1 from trunk commit T3 >> is added to BRANCH1 in commit B4. How should this series of events be >> represented in a git repository? >> ... >> The "exclusive" possibility is to ignore the fact that some of the >> content of B4 came from trunk and to pretend that FILE1 just appeared >> out of nowhere in commit B4 independent of the FILE1 in TRUNK: >
>> T0 -- T1 -- T2    T3 -- T4    TRUNK
>>         \
>>          B1 -- B2 -- B3 -- B4    BRANCH1
>
>> This is also wrong, because it doesn't reflect the true lineage of FILE1. > > Maybe not, but that *is* how things appeared in the CVS history, and > we'd rather have a git history that looks like the CVS history than > one that claims that boatloads of utterly unrelated commits are part > of a branch's history. > > The "inclusive" possibility might be tolerable if it restricted itself > to mentioning commits that actually touched FILE1 in between its > addition to TRUNK and its addition to BRANCH1. So far as I can see, > though, cvs2git is mentioning *every* commit on TRUNK between T1 and B4 > ... not even between T3 and B4, but back to the branch point. How can > you possibly justify that as either sane or useful? There is no way, in git, to claim that (say) T3 was incorporated into B4 but that T2 was not. If T3 is listed as a parent of B4, then it is implied that all ancestors of T3 are also incorporated into B4. This is a crucial simplification that helps DVCSs merge reliably. So an "exclusive" option is definitely the way to go for the postgresql project. [By the way, it *is* possible to list the commits that touched FILE1:

    git log BRANCH1 -- FILE1

The user would first have to find out that FILE1 is the file that is the subject of merge B4, which could be done using "git diff B3..B4".
But I am not arguing that this is the preferred solution, given your project's practice to do cherry-picks and never full merges.] Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: tuplestore, tuplesort aggregate functions
2010/8/18 Tom Lane : > Pavel Stehule writes: >> 2010/8/18 Tom Lane : >>> There would be plenty of scope to re-use the machinery without any >>> SQL-level extensions. All you need is a polymorphic aggregate >>> transition function that maintains a tuplestore or whatever. > >> Do we have to use a transition function? If we implement median as >> a special variant of aggregate - because we need to push a sort - then >> we can skip the transition function and call the final >> function directly. > > Well, that would require a whole bunch of *other* mechanisms, which you > weren't saying anything about in your original proposal. But driving > it off the transtype declaration would be quite inappropriate anyway IMO. > I'll test both variants first. Maybe there is no significant difference between them. Now nodeAgg can build and fill a tuplesort, so I think it is natural to use it. It needs only one thing - skipping the call to the transition function and directly calling the final function with the external tuplesort. At minimum, you don't need the same code twice. Regards Pavel Stehule > regards, tom lane >
Re: [HACKERS] proposal: tuplestore, tuplesort aggregate functions
Pavel Stehule writes: > 2010/8/18 Tom Lane : >> There would be plenty of scope to re-use the machinery without any >> SQL-level extensions. All you need is a polymorphic aggregate >> transition function that maintains a tuplestore or whatever. > Do we have to use a transition function? If we implement median as > a special variant of aggregate - because we need to push a sort - then > we can skip the transition function and call the final > function directly. Well, that would require a whole bunch of *other* mechanisms, which you weren't saying anything about in your original proposal. But driving it off the transtype declaration would be quite inappropriate anyway IMO. regards, tom lane
Re: [HACKERS] git: uh-oh
I previously proposed off-list an alternate solution to generate the git repository which was turned down due to it not being able to handle incremental updates. However, since we are now looking at a one-time conversion, this method might come in handy.

---
Caveat: cvs2git apparently requires CVSROOT somewhere in the path for it to work. I did a symbolic link of the current directory $PWD with CVSROOT to bypass the quirk cvs2git requires.

mkdir work
cd work
wget http://ftp.netbsd.se/pkgsrc/distfiles/cvsclone-0.00/cvsclone.l
flex cvsclone.l && gcc -Wall -O2 lex.yy.c -o cvsclone
cvsclone -d :pserver:anon...@anoncvs.postgresql.org:/projects/cvsroot pgsql
ln -s $PWD CVSROOT
cvs2git --blobfile=blobfile --dumpfile=dumpfile --username pgdude --encoding=UTF8 --fallback-encoding=UTF8 CVSROOT/pgsql > cvs2git.log
mkdir git && cd git && git init .
cat ../blobfile ../dumpfile | git fast-import
git reset --hard
cd ..
---

Regards, Khee Chin. On Wed, Aug 18, 2010 at 11:14 PM, Alvaro Herrera wrote: > Excerpts from Michael Haggerty's message of mié ago 18 05:01:29 -0400 2010: > > > cvs2git doesn't currently have this option. I'm not sure how much work > > it would be to implement; probably a few days'. Alternatively, you > > could write a tool that would rewrite the ancestry information in the > > repository *after* the cvs2git conversion using .git/info/grafts (see > > git-filter-branch(1)). Such rewriting would have to occur before the > > repository is published, because the rewriting will change the hashes of > > most commits. > > AFAICT, graft points are not checked in[1], thus they don't propagate; are > you saying that we should run the migration, then manually inject the > graft points, then run some conversion tool that writes a different > repository with those graft points welded into the history? This sounds > like it needs some manual work (namely find out the appropriate graft > points for each branch), that can be prepared beforehand.
Otherwise it > seems easier than reworking the cvs2git code for the "mostly-exclusive" > option. > > I am sort of assuming that this "conversion tool" already exists, but > maybe this is not the case? > > [1] > http://stackoverflow.com/questions/1488753/how-to-merge-two-branches-without-a-common-ancestor > > -- > Álvaro Herrera > The PostgreSQL Company - Command Prompt, Inc. > PostgreSQL Replication, Consulting, Custom Development, 24x7 support > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers >
Re: [HACKERS] git: uh-oh
Excerpts from Michael Haggerty's message of mié ago 18 05:01:29 -0400 2010: > cvs2git doesn't currently have this option. I'm not sure how much work > it would be to implement; probably a few days'. Alternatively, you > could write a tool that would rewrite the ancestry information in the > repository *after* the cvs2git conversion using .git/info/grafts (see > git-filter-branch(1)). Such rewriting would have to occur before the > repository is published, because the rewriting will change the hashes of > most commits. AFAICT, graft points are not checked in[1], thus they don't propagate; are you saying that we should run the migration, then manually inject the graft points, then run some conversion tool that writes a different repository with those graft points welded into the history? This sounds like it needs some manual work (namely find out the appropriate graft points for each branch), that can be prepared beforehand. Otherwise it seems easier than reworking the cvs2git code for the "mostly-exclusive" option. I am sort of assuming that this "conversion tool" already exists, but maybe this is not the case? [1] http://stackoverflow.com/questions/1488753/how-to-merge-two-branches-without-a-common-ancestor -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: tuplestore, tuplesort aggregate functions
On Wed, Aug 18, 2010 at 04:46:57PM +0200, Pavel Stehule wrote: > 2010/8/18 Tom Lane : > > David Fetter writes: > >> Apart from the medians, which "median-like" aggregates do you > >> have in mind to start with? If you can provide examples of > >> "median-like" aggregates that people might need to implement as > >> user-defined aggregates, or other places where people would use > >> this machinery, it will make your case stronger for this > >> refactoring. > > > > There would be plenty of scope to re-use the machinery without any > > SQL-level extensions. All you need is a polymorphic aggregate > > transition function that maintains a tuplestore or whatever. I > > don't see that extra syntax in CREATE AGGREGATE is really buying > > much of anything. > > > > Do we have to use a transition function? If we implement median as > a special variant of aggregate - because we need to push a sort - then > we can skip the transition function and call the final > function directly. This mechanism is used for aggregates with ORDER BY now. > So there can be a special path for a direct call of the final func. It > is useless to call the transition function. Just a wacky idea here. Could we make a special state transition function called IDENTITY or some such that would turn into a noop? Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] git: uh-oh
Michael Haggerty writes: > So let's take the simplest example: a branch BRANCH1 is created from > trunk commit T1, then some time later another FILE1 from trunk commit T3 > is added to BRANCH1 in commit B4. How should this series of events be > represented in a git repository? > ... > The "exclusive" possibility is to ignore the fact that some of the > content of B4 came from trunk and to pretend that FILE1 just appeared > out of nowhere in commit B4 independent of the FILE1 in TRUNK:
> T0 -- T1 -- T2    T3 -- T4    TRUNK
>         \
>          B1 -- B2 -- B3 -- B4    BRANCH1
> This is also wrong, because it doesn't reflect the true lineage of FILE1. Maybe not, but that *is* how things appeared in the CVS history, and we'd rather have a git history that looks like the CVS history than one that claims that boatloads of utterly unrelated commits are part of a branch's history. The "inclusive" possibility might be tolerable if it restricted itself to mentioning commits that actually touched FILE1 in between its addition to TRUNK and its addition to BRANCH1. So far as I can see, though, cvs2git is mentioning *every* commit on TRUNK between T1 and B4 ... not even between T3 and B4, but back to the branch point. How can you possibly justify that as either sane or useful? regards, tom lane
Re: [HACKERS] proposal: tuplestore, tuplesort aggregate functions
David Fetter writes: > On Wed, Aug 18, 2010 at 10:39:33AM -0400, Tom Lane wrote: >> There would be plenty of scope to re-use the machinery without any >> SQL-level extensions. All you need is a polymorphic aggregate >> transition function that maintains a tuplestore or whatever. >> I don't see that extra syntax in CREATE AGGREGATE is really buying >> much of anything. > Thanks for clarifying. Might this help out with things like GROUPING > SETS or wCTEs? Don't see how --- this is just about what you can do within an aggregate. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: tuplestore, tuplesort aggregate functions
2010/8/18 Tom Lane : > David Fetter writes: >> Apart from the medians, which "median-like" aggregates do you have in >> mind to start with? If you can provide examples of "median-like" >> aggregates that people might need to implement as user-defined >> aggregates, or other places where people would use this machinery, it >> will make your case stronger for this refactoring. > > There would be plenty of scope to re-use the machinery without any > SQL-level extensions. All you need is a polymorphic aggregate > transition function that maintains a tuplestore or whatever. > I don't see that extra syntax in CREATE AGGREGATE is really buying > much of anything. > Do we have to use a transition function at all? If we implement median as a special variant of aggregate - because we need to push down a sort - then we can skip the transition function and call the final function directly. This mechanism is already used for aggregates with ORDER BY, so there could be a special path for a direct call of the final function; calling a transition function there would be useless. Regards Pavel > regards, tom lane >
Re: [HACKERS] proposal: tuplestore, tuplesort aggregate functions
On Wed, Aug 18, 2010 at 10:39:33AM -0400, Tom Lane wrote: > David Fetter writes: > > Apart from the medians, which "median-like" aggregates do you have in > > mind to start with? If you can provide examples of "median-like" > > aggregates that people might need to implement as user-defined > > aggregates, or other places where people would use this machinery, it > > will make your case stronger for this refactoring. > > There would be plenty of scope to re-use the machinery without any > SQL-level extensions. All you need is a polymorphic aggregate > transition function that maintains a tuplestore or whatever. > I don't see that extra syntax in CREATE AGGREGATE is really buying > much of anything. Thanks for clarifying. Might this help out with things like GROUPING SETS or wCTEs? Cheers, David (a little slow today). -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] proposal: tuplestore, tuplesort aggregate functions
2010/8/18 David Fetter : > On Wed, Aug 18, 2010 at 04:10:18PM +0200, Pavel Stehule wrote: >> 2010/8/18 David Fetter : >> > Which median do you plan to implement? Or do you plan to implement >> > several different medians, each with distinguishing names? >> >> my proposal enabled implementation of any "median like" function. But >> if we implement median as special case of aggregate, then some basic >> "median" will be implemented. > > Apart from the medians, which "median-like" aggregates do you have in > mind to start with? If you can provide examples of "median-like" > aggregates that people might need to implement as user-defined > aggregates, or other places where people would use this machinery, it > will make your case stronger for this refactoring. > I wasn't thinking of any particular special median - this proposal is just about aggregates with a large transition state, where access to a tuplestore can be very useful. > Otherwise, it seems like a more reasonable thing to make the medians > special case code. Yes, at least for the moment. Regards Pavel > > Cheers, > David. > -- > David Fetter http://fetter.org/ > Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter > Skype: davidfetter XMPP: david.fet...@gmail.com > iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics > > Remember to vote! > Consider donating to Postgres: http://www.postgresql.org/about/donate >
Re: [HACKERS] proposal: tuplestore, tuplesort aggregate functions
David Fetter writes: > Apart from the medians, which "median-like" aggregates do you have in > mind to start with? If you can provide examples of "median-like" > aggregates that people might need to implement as user-defined > aggregates, or other places where people would use this machinery, it > will make your case stronger for this refactoring. There would be plenty of scope to re-use the machinery without any SQL-level extensions. All you need is a polymorphic aggregate transition function that maintains a tuplestore or whatever. I don't see that extra syntax in CREATE AGGREGATE is really buying much of anything. regards, tom lane
Re: [HACKERS] proposal: tuplestore, tuplesort aggregate functions
On Wed, Aug 18, 2010 at 04:10:18PM +0200, Pavel Stehule wrote: > 2010/8/18 David Fetter : > > Which median do you plan to implement? Or do you plan to implement > > several different medians, each with distinguishing names? > > my proposal enabled implementation of any "median like" function. But > if we implement median as special case of aggregate, then some basic > "median" will be implemented. Apart from the medians, which "median-like" aggregates do you have in mind to start with? If you can provide examples of "median-like" aggregates that people might need to implement as user-defined aggregates, or other places where people would use this machinery, it will make your case stronger for this refactoring. Otherwise, it seems like a more reasonable thing to make the medians special case code. Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] proposal: tuplestore, tuplesort aggregate functions
2010/8/18 David Fetter : > On Wed, Aug 18, 2010 at 04:03:25PM +0200, Pavel Stehule wrote: >> 2010/8/18 Tom Lane : >> > Pavel Stehule writes: >> >> I still thinking about a "median" type functions. My idea is to >> >> introduce a new syntax for stype definition - like >> > >> >> stype = type, or >> >> stype = ARRAY OF type [ ORDER [ DESC | ASC ]], or >> >> stype = TUPLESTORE OF type, or >> >> stype = TUPLESORT OF type [ DESC | ASC ] >> > >> > This seems like a fairly enormous amount of conceptual (and code) >> > infrastructure just to make it possible to build median() out of >> > spare parts. It's also exposing some implementation details that >> > I'd just as soon not expose in SQL. I'd rather just implement >> > median as a special-purpose aggregate. >> >> yes, it is little bit strange - but when we talked last time about >> this topic, I understand, so you dislike any special solution for >> this functionality. So I searched different more general way. On the >> other hand, I agree so special purpose aggregate (with a few changes >> in nodeAgg) can be enough. The median (and additional forms) is >> really special and there are not wide used use case. > > Which median do you plan to implement? Or do you plan to implement > several different medians, each with distinguishing names? My proposal would enable implementation of any "median-like" function. But if we implement median as a special case of aggregate, then only some basic "median" will be implemented. Regards Pavel > > Cheers, > David. > -- > David Fetter http://fetter.org/ > Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter > Skype: davidfetter XMPP: david.fet...@gmail.com > iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics > > Remember to vote! > Consider donating to Postgres: http://www.postgresql.org/about/donate >
Re: [HACKERS] proposal: tuplestore, tuplesort aggregate functions
On Wed, Aug 18, 2010 at 04:03:25PM +0200, Pavel Stehule wrote: > 2010/8/18 Tom Lane : > > Pavel Stehule writes: > >> I still thinking about a "median" type functions. My idea is to > >> introduce a new syntax for stype definition - like > > > >> stype = type, or > >> stype = ARRAY OF type [ ORDER [ DESC | ASC ]], or > >> stype = TUPLESTORE OF type, or > >> stype = TUPLESORT OF type [ DESC | ASC ] > > > > This seems like a fairly enormous amount of conceptual (and code) > > infrastructure just to make it possible to build median() out of > > spare parts. It's also exposing some implementation details that > > I'd just as soon not expose in SQL. I'd rather just implement > > median as a special-purpose aggregate. > > yes, it is little bit strange - but when we talked last time about > this topic, I understand, so you dislike any special solution for > this functionality. So I searched different more general way. On the > other hand, I agree so special purpose aggregate (with a few changes > in nodeAgg) can be enough. The median (and additional forms) is > really special and there are not wide used use case. Which median do you plan to implement? Or do you plan to implement several different medians, each with distinguishing names? Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] proposal: tuplestore, tuplesort aggregate functions
2010/8/18 Tom Lane : > Pavel Stehule writes: >> I still thinking about a "median" type functions. My idea is to >> introduce a new syntax for stype definition - like > >> stype = type, or >> stype = ARRAY OF type [ ORDER [ DESC | ASC ]], or >> stype = TUPLESTORE OF type, or >> stype = TUPLESORT OF type [ DESC | ASC ] > > This seems like a fairly enormous amount of conceptual (and code) > infrastructure just to make it possible to build median() out of spare > parts. It's also exposing some implementation details that I'd just as > soon not expose in SQL. I'd rather just implement median as a > special-purpose aggregate. Yes, it is a little bit strange - but when we talked about this topic last time, I understood that you disliked any special-case solution for this functionality, so I looked for a different, more general way. On the other hand, I agree that a special-purpose aggregate (with a few changes in nodeAgg) can be enough. The median (and related forms) is really special, and there is no widely used use case. Regards Pavel > > regards, tom lane >
Re: [HACKERS] proposal: tuplestore, tuplesort aggregate functions
Pavel Stehule writes: > I still thinking about a "median" type functions. My idea is to > introduce a new syntax for stype definition - like > stype = type, or > stype = ARRAY OF type [ ORDER [ DESC | ASC ]], or > stype = TUPLESTORE OF type, or > stype = TUPLESORT OF type [ DESC | ASC ] This seems like a fairly enormous amount of conceptual (and code) infrastructure just to make it possible to build median() out of spare parts. It's also exposing some implementation details that I'd just as soon not expose in SQL. I'd rather just implement median as a special-purpose aggregate. regards, tom lane
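[Editor's note] The "spare parts" approach Tom alludes to can already be assembled today with an array-valued transition state, which also shows the cost Pavel's proposal is aimed at. This is a hedged, untested sketch; the names median_final and median are invented for illustration:

```sql
-- Hypothetical sketch: a user-defined median built from existing parts,
-- accumulating values into an array and sorting in the final function.
CREATE FUNCTION median_final(numeric[]) RETURNS numeric AS $$
  SELECT CASE
           WHEN cnt % 2 = 1 THEN sorted[cnt / 2 + 1]
           ELSE (sorted[cnt / 2] + sorted[cnt / 2 + 1]) / 2
         END
  FROM (SELECT array(SELECT v FROM unnest($1) AS v ORDER BY v) AS sorted,
               array_length($1, 1) AS cnt) s;
$$ LANGUAGE sql IMMUTABLE;

CREATE AGGREGATE median(numeric) (
  sfunc = array_append,     -- repeatedly copies the state array
  stype = numeric[],
  finalfunc = median_final
);
```

Used as SELECT median(x) FROM t. The in-memory array state is exactly what makes this impractical for very large inputs, which is the motivation for exposing a tuplesort-backed state instead.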
Re: [HACKERS] Progress indication prototype
On Wed, Aug 18, 2010 at 8:45 AM, Greg Stark wrote: > On Tue, Aug 17, 2010 at 11:29 PM, Dave Page wrote: >> Which is ideal for monitoring your own connection - having the info in >> the pg_stat_activity is also valuable for monitoring and system >> administration. Both would be ideal :-) > > Hm, I think I've come around to the idea that having the info in > pg_stat_activity would be very nice. I can just picture sitting in > pgadmin while a bunch of reports are running and seeing progress bars > for all of them... > > But progress bars alone aren't really the big prize. I would really > love to see the explain plans for running queries. This would improve > the DBAs view of what's going on in the system immensely. Currently > you have to grab the query and try to set up a similar environment for > it to run explain on it. If analyze has run since or if the tables > have grown or shrank or if the query was run with some constants as > parameters it can be awkward. If some of the tables in the query were > temporary tables it can be impossible. You can never really be sure > you're looking at precisely the same plan than the other user's > session is running. > > But stuffing the whole json or xml explain plan into pg_stat_activity > seems like it doesn't really fit the same model that the existing > infrastructure is designed around. It could be quite large and if we > want to support progress feedback it could change quite frequently. > > We do stuff the whole query there (up to a limited size) so maybe I'm > all wet and stuffing the explain plan in there would be fine? It seems to me that progress reporting could add quite a bit of overhead. For example, in the whole-database vacuum case, the most logical way to report progress would be to compute pages visited divided by pages to be visited. But the total number of pages to be visited is something that doesn't need to be computed in advance unless someone cares about progress. 
I don't think we want to incur that overhead in all cases just on the off chance someone might ask. We need to think about ways to structure this so that it only costs when someone's using it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
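[Editor's note] The denominator Robert describes for the whole-database case can be roughly approximated from statistics already kept in the catalogs. A hedged sketch; relpages is only an estimate maintained by VACUUM/ANALYZE, so an exact total would still require the extra up-front work he is worried about:

```sql
-- Rough denominator for whole-database vacuum progress:
-- total heap ('r') and TOAST ('t') pages, per last-known relpages estimates.
SELECT sum(relpages) AS total_pages
FROM pg_class
WHERE relkind IN ('r', 't');
```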
Re: [HACKERS] Progress indication prototype
On 18 August 2010 13:45, Greg Stark wrote: > On Tue, Aug 17, 2010 at 11:29 PM, Dave Page wrote: >> Which is ideal for monitoring your own connection - having the info in >> the pg_stat_activity is also valuable for monitoring and system >> administration. Both would be ideal :-) > > Hm, I think I've come around to the idea that having the info in > pg_stat_activity would be very nice. I can just picture sitting in > pgadmin while a bunch of reports are running and seeing progress bars > for all of them... > > But progress bars alone aren't really the big prize. I would really > love to see the explain plans for running queries. Do you mean just see the explain plan? Or see what stage of the plan the query has reached? I think the latter would be awesome. And if it's broken down by step, wouldn't it be feasible to know how far through that step it has got, at least for some steps? Obviously for ones with a LIMIT applied it wouldn't know how far through it had got, but for things like a sequential scan or sort it should be able to indicate how far through it is. -- Thom Brown Registered Linux user: #516935
Re: [HACKERS] security label support, part.2
On Wed, Aug 18, 2010 at 8:49 AM, Stephen Frost wrote: > In the end, I'm thinking that if the external security module wants to > enforce a check against all the children of a parent, they could quite > possibly handle that already and do it in such a way that it won't break > depending on the specific query. To wit, it could query the catalog to > determine if the current table is a parent of any children, and if so, > go check the labels/permissions/etc on those children. I'd much rather > have something where the permissions check either succeeds or fails > against the parent, depending on the permissions of the parent and its > children, than on what the query is itself and what conditionals are > applied to it. Interesting idea. Again, I haven't read the code, but seems worth further investigation, at least. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [HACKERS] security label support, part.2
* KaiGai Kohei (kai...@ak.jp.nec.com) wrote: > If rte->requiredPerms would not be cleared, the user of the hook will > be able to check access rights on the child tables, as they like. This would only be the case for those children which are being touched in the current query, which would depend on what conditionals are applied, what the current setting of check_constraints is, and possibly other factors. I do *not* like this approach. > How about an idea to add a new flag in RangeTblEntry which shows where > the RangeTblEntry came from, instead of clearing requiredPerms? > If the flag is true, I think ExecCheckRTEPerms() can simply skip checks > on the child tables. How about the external module just checking whether the current object being queried has any children, and if so, going and checking the labels/permissions/etc on those children? That way the query either always fails or never fails for a given caller, rather than sometimes working and sometimes not depending on the query. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] security label support, part.2
Robert, * Robert Haas (robertmh...@gmail.com) wrote: > If C1, C2, and C3 inherit from P, it's perfectly reasonable to grant > permissions to X on C1 and C2, Y on C3, and Z on C1, C2, C3, and P. I > don't think we should disallow that. Sure, it's possible to do things > that are less sane, but if we put ourselves in the business of > removing useful functionality because it might be misused, we'll put > ourselves out of business. > > Having said that, I'm not sure that the same arguments really hold > water in the world of label based security. Suppose we have > compartmentalized security: P is a table of threats, with C1 > containing data on nukes, C2 containing data on terrorists, and C3 > containing data on foreign militaries. If we create a label for each > of these threat types, we can apply that label to the corresponding > table; but what label shall we assign P? Logically, the label for P > should be set up in such a fashion that the only people who can read P > are those who can read C1, C2, and C3 anyway, but who is to say that > such a label exists? Even if KaiGai's intended implementation of > SE-PostgreSQL supports construction of such a label, who is to say > that EVERY conceivable labeling system will also do so? I don't see why using labels in the second case changes anything. Consider roles. If you only had a role that could see threats, a role that could see nukes, and a role that could see terrorists, but no role that could see all of them, it's the same problem. Additionally, this kind of problem *isn't* typically addressed with the semantics or the structure of inheritance - it's done with row-level security and is completely orthogonal to the inheritance issue. Imagine a new table, C4, is added to P and the admin configures it such that only the 'view_c4' role has access to that child table directly. Now, Z can see what's in C4 through P, even though Z doesn't have access to C4. In the old system, if Z's query happened to hit C4, the whole query would fail, but at least Z wouldn't see any C4 data. Other queries on P done by Z would be fine, so long as they didn't hit C4. > In fact, it > seems to me that it might be far more reasonable, in a case like this, > to ignore the *parent* label and look only at each *child* label, > which to me is an argument that we should set this up so as to allow > individual users of this hook to do as they like. I think it'd be more reasonable to do this for inheritance in general, but the problem is that people use it for partitioning, and there is a claim out there that it's against what the SQL spec says. The folks using inheritance for partitioning would probably prefer not to have to deal with setting up the permissions on the child tables. I think that's less of an issue now, but I didn't like the previous behavior where certain queries would work and certain queries wouldn't work against the parent table, either. > It's also worth pointing out that the hook in ExecCheckRTPerms() does > not presuppose label-based security. It could be used to implement > some other policy altogether, which only strengthens the argument that > we can't know how the user of the hook wants to handle these cases. This comes back around, in my view, to the distinction between really using inheritance for inheritance, vs using it for partitioning. If it's used for partitioning (which covers the vast majority of the cases where I've seen it used), then I think it should really be considered and viewed as a single object by the authorization system. I don't suppose we're going to get rid of inheritance-as-inheritance any time soon, though. In the end, I'm thinking that if the external security module wants to enforce a check against all the children of a parent, it could quite possibly handle that already and do it in such a way that it won't break depending on the specific query. To wit, it could query the catalog to determine if the current table is a parent of any children, and if so, go check the labels/permissions/etc on those children. I'd much rather have something where the permissions check either succeeds or fails against the parent, depending on the permissions of the parent and its children, than on what the query is itself and what conditionals are applied to it. Thanks, Stephen
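[Editor's note] The catalog lookup Stephen suggests is straightforward. A hedged sketch of the query an external security module could run (the parent table name 'p' is a placeholder), after which it would apply its own label/permission checks to each child, independent of which children the current query happens to touch:

```sql
-- Enumerate the direct children of a parent table via pg_inherits,
-- so their labels/permissions can be checked up front.
SELECT c.oid, c.relname
FROM pg_inherits i
JOIN pg_class c ON c.oid = i.inhrelid
WHERE i.inhparent = 'p'::regclass;
```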
Re: [HACKERS] Progress indication prototype
On Tue, Aug 17, 2010 at 11:29 PM, Dave Page wrote: > Which is ideal for monitoring your own connection - having the info in > the pg_stat_activity is also valuable for monitoring and system > administration. Both would be ideal :-) Hm, I think I've come around to the idea that having the info in pg_stat_activity would be very nice. I can just picture sitting in pgadmin while a bunch of reports are running and seeing progress bars for all of them... But progress bars alone aren't really the big prize. I would really love to see the explain plans for running queries. This would improve the DBA's view of what's going on in the system immensely. Currently you have to grab the query and try to set up a similar environment for it to run explain on it. If analyze has run since, or if the tables have grown or shrunk, or if the query was run with some constants as parameters, it can be awkward. If some of the tables in the query were temporary tables it can be impossible. You can never really be sure you're looking at precisely the same plan that the other user's session is running. But stuffing the whole json or xml explain plan into pg_stat_activity seems like it doesn't really fit the same model that the existing infrastructure is designed around. It could be quite large and if we want to support progress feedback it could change quite frequently. We do stuff the whole query there (up to a limited size) so maybe I'm all wet and stuffing the explain plan in there would be fine? -- greg
[HACKERS] proposal: tuplestore, tuplesort aggregate functions
Hello I am still thinking about "median"-type functions. My idea is to introduce a new syntax for the stype definition - like stype = type, or stype = ARRAY OF type [ ORDER [ DESC | ASC ]], or stype = TUPLESTORE OF type, or stype = TUPLESORT OF type [ DESC | ASC ] When stype is ARRAY OF, the final and transition functions can be PL functions. When stype isn't scalar, sfunc can be left undefined (a built-in function is used), so we can implement an aggregate with only a final function. A median function could then be defined: CREATE FUNCTION num_median_final(internal) RETURNS numeric AS ... CREATE AGGREGATE median(numeric) (stype = TUPLESORT OF numeric, finalfunc = num_median_final); This feature impacts primarily the agg executor, and can be relatively simple - no (or only small) planner changes, minimal parser changes. The main reason for this feature is to give aggregates access to tuplestore and tuplesort. I hope this can solve the problems with computing a median and similar functions on very large datasets. comments? regards
Re: [HACKERS] security label support, part.2
2010/8/18 KaiGai Kohei : >> It's also worth pointing out that the hook in ExecCheckRTPerms() does >> not presuppose label-based security. It could be used to implement >> some other policy altogether, which only strengthens the argument that >> we can't know how the user of the hook wants to handle these cases. >> > If rte->requiredPerms would not be cleared, the user of the hook will > be able to check access rights on the child tables, as they like. > How about an idea to add a new flag in RangeTblEntry which shows where > the RangeTblEntry came from, instead of clearing requiredPerms? > If the flag is true, I think ExecCheckRTEPerms() can simply skip checks > on the child tables. Something along those lines might work, although I haven't yet scrutinized the code well enough to have a real clear opinion on what the best way of dealing with this is. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [HACKERS] GROUPING SETS revisited
Hello I have hit a roadblock in the GROUPING SETS implementation. I am now playing with my own executor and planner node and I can't move forward :(. This feature will probably need a significant update of our agg implementation. It probably needs a structure similar to CTEs, though it can be somewhat reduced - there is a simple relation between the source query and the result query - but I am not sure whether this has to be implemented via subqueries. The second question is the relatively big difference between GROUP BY behavior and GROUP BY GROUPING SETS behavior. At the moment I don't see a way to join GROUP BY and GROUPING SETS together. Any ideas welcome Regards Pavel
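[Editor's note] The behavioral difference Pavel mentions is easiest to see through the standard rewrite: a GROUPING SETS query is semantically a UNION ALL of plain GROUP BY queries over the same source. A hedged sketch; the emp table and its columns are hypothetical:

```sql
-- GROUP BY GROUPING SETS ((dept), (loc)) over a hypothetical emp table
-- computes the same result as this UNION ALL of ordinary GROUP BYs:
SELECT dept, NULL AS loc, sum(salary) AS total FROM emp GROUP BY dept
UNION ALL
SELECT NULL, loc, sum(salary) FROM emp GROUP BY loc;
```

The implementation question is whether to execute it literally like this (via subqueries, rescanning the source once per grouping set) or to share a single scan across all the sets, which is where the executor/planner changes get hard.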
Re: [HACKERS] git: uh-oh
On Wed, Aug 18, 2010 at 11:01, Michael Haggerty wrote: > Martijn van Oosterhout wrote: >> On Wed, Aug 18, 2010 at 08:25:45AM +0200, Michael Haggerty wrote: >>> So let's take the simplest example: a branch BRANCH1 is created from >>> trunk commit T1, then some time later another FILE1 from trunk commit T3 >>> is added to BRANCH1 in commit B4. How should this series of events be >>> represented in a git repository? >> >> >> >>> The "exclusive" possibility is to ignore the fact that some of the >>> content of B4 came from trunk and to pretend that FILE1 just appeared >>> out of nowhere in commit B4 independent of the FILE1 in TRUNK: >>> >>> T0 -- T1 -- T2 T3 -- T4 TRUNK >>> \ >>> B1 -- B2 -- B3 -- B4 BRANCH1 >>> >>> This is also wrong, because it doesn't reflect the true lineage of FILE1. >> >> But the "true lineage" is not stored anywhere in CVS so I don't see why >> you need to fabricate it for git. Sure, it would be really nice if you >> could, but if you can't do it reliably, you may as well not do it at >> all. What's the loss? > > CVS does record (albeit somewhat ambiguously) the branch from which a > new branch sprouted. The history above might result from commands like > > cvs update -A > cvs tag -b BRANCH1 > cvs update -r BRANCH1 > cvs commit -m T2 > touch FILE1 cvs commit -m B1 > cvs add FILE1 > cvs commit -m T3 cvs commit -m B2 > > cvs commit -m B3 > cvs tag -b BRANCH1 FILE1 > > or the last step might have been an explicit merge into BRANCH1: > > cvs update -j T1 -j T3 > cvs commit -m B4 > > Either way, the CVS history relatively clearly indicates that content > was ported from TRUNK to BRANCH1. There is no way to distinguish > whether it was a cherry-pick (not recordable in git's history) vs. a > full merge without more information or more intelligence. Well, in *our* case we know that it was a "cherry-pick". Because we've done no full merges ;) So if there's a way for us to short-wire the tool, that'd be great. 
> Magnus Hagander wrote: >> Our requirements are simple: our cvs history is linear, the git >> history should be linear. It is *not* the same commit that's on head >> and the branch. They are two different commits, that happen to have >> the same commit message and mostly the same content. > > I don't think this is at all an issue of cvs2svn merging commits that > happen to have the same commit message and/or commit time. The merge > commits are all manufactured by cvs2svn to do two things: > > 1. Add content that needs to be on the branch, because a file was added > to the branch after the branch's creation. This *needs* to be done to > ensure that the branch has the correct content. Ok. > 2. Indicate the origin of the new branch content. This goal is debatable. I agree this is debatable. We've kind of debated it already (though not in exactly this context) and decided we'd rather have it appear as brand new content on this branch and not as a merge. >> Bottom line is, we want zero merge commits in the git repository. We >> may start using that sometime in the future (but for now, we've >> decided we don't want that even in the future), but we most >> *definitely* don't want it in the past. We don't care about >> "representing the proper heritage of FILE1" in git, because we never >> did in cvs. >> >> Is there some way to make cvs2git work this way, and just not bother >> even trying to create merge commits, or is that fundamentally >> impossible and we need to look at another tool? > > A merge is just a special case of content being taken from one branch > and added to another. Logically, the same thing happens when a branch > is created, and some of the same problems can occur in that situation. > A branch can be created using content from multiple source branches, > which cvs2git currently also represents as a merge. Can be, yes. 
AFAIK, we don't ever do that (though I can't swear to that, since there have been some funky things in our cvs repository earlier) > Assuming that you don't want to discard all record of where a branch > sprouted from, it is therefore necessary to choose a single parent > branch for each branch creation. To be sure, this choice can be > incorrect the same way as the merge commits discussed above are > incorrect. But one reasonable "mostly-exclusive" approach would be to > choose the most likely parent as the source of the branch and ignore all > others. Yes, I believe that is what we'd prefer, as it's what most closely matches how *we*'ve been using CVS. > cvs2git doesn't currently have this option. I'm not sure how much work > it would be to implement; probably a few days'. Alternatively, you Would this be something you'd consider doing, since it might be of interest to others? I'm sure if it's
Re: [HACKERS] git: uh-oh
Martijn van Oosterhout wrote:
> On Wed, Aug 18, 2010 at 08:25:45AM +0200, Michael Haggerty wrote:
>> So let's take the simplest example: a branch BRANCH1 is created from
>> trunk commit T1, then some time later another FILE1 from trunk commit T3
>> is added to BRANCH1 in commit B4. How should this series of events be
>> represented in a git repository?
>>
>> The "exclusive" possibility is to ignore the fact that some of the
>> content of B4 came from trunk and to pretend that FILE1 just appeared
>> out of nowhere in commit B4 independent of the FILE1 in TRUNK:
>>
>>   T0 -- T1 -- T2 -- T3 -- T4      TRUNK
>>          \
>>           B1 -- B2 -- B3 -- B4    BRANCH1
>>
>> This is also wrong, because it doesn't reflect the true lineage of FILE1.
>
> But the "true lineage" is not stored anywhere in CVS so I don't see why
> you need to fabricate it for git. Sure, it would be really nice if you
> could, but if you can't do it reliably, you may as well not do it at
> all. What's the loss?

CVS does record (albeit somewhat ambiguously) the branch from which a
new branch sprouted. The history above might result from commands like

    cvs update -A
    cvs tag -b BRANCH1
    cvs update -r BRANCH1
    cvs commit -m T2
    touch FILE1
    cvs commit -m B1
    cvs add FILE1
    cvs commit -m T3
    cvs commit -m B2
    cvs commit -m B3
    cvs tag -b BRANCH1 FILE1

or the last step might have been an explicit merge into BRANCH1:

    cvs update -j T1 -j T3
    cvs commit -m B4

Either way, the CVS history relatively clearly indicates that content
was ported from TRUNK to BRANCH1. There is no way to distinguish
whether it was a cherry-pick (not recordable in git's history) vs. a
full merge without more information or more intelligence.

Magnus Hagander wrote:
> Our requirements are simple: our cvs history is linear, the git
> history should be linear. It is *not* the same commit that's on head
> and the branch. They are two different commits, that happen to have
> the same commit message and mostly the same content.
I don't think this is at all an issue of cvs2svn merging commits that
happen to have the same commit message and/or commit time. The merge
commits are all manufactured by cvs2svn to do two things:

1. Add content that needs to be on the branch, because a file was added
   to the branch after the branch's creation. This *needs* to be done to
   ensure that the branch has the correct content.

2. Indicate the origin of the new branch content. This goal is debatable.

> Bottom line is, we want zero merge commits in the git repository. We
> may start using that sometime in the future (but for now, we've
> decided we don't want that even in the future), but we most
> *definitely* don't want it in the past. We don't care about
> "representing the proper heritage of FILE1" in git, because we never
> did in cvs.
>
> Is there some way to make cvs2git work this way, and just not bother
> even trying to create merge commits, or is that fundamentally
> impossible and we need to look at another tool?

A merge is just a special case of content being taken from one branch
and added to another. Logically, the same thing happens when a branch
is created, and some of the same problems can occur in that situation.
A branch can be created using content from multiple source branches,
which cvs2git currently also represents as a merge.

Assuming that you don't want to discard all record of where a branch
sprouted from, it is therefore necessary to choose a single parent
branch for each branch creation. To be sure, this choice can be
incorrect the same way as the merge commits discussed above are
incorrect. But one reasonable "mostly-exclusive" approach would be to
choose the most likely parent as the source of the branch and ignore
all others.

cvs2git doesn't currently have this option. I'm not sure how much work
it would be to implement; probably a few days'.
Alternatively, you could write a tool that would rewrite the ancestry
information in the repository *after* the cvs2git conversion using
.git/info/grafts (see git-filter-branch(1)). Such rewriting would have
to occur before the repository is published, because the rewriting will
change the hashes of most commits.

Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
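The grafts mechanism mentioned above can be exercised on a toy repository. The sketch below manufactures a merge commit of the kind cvs2svn creates, then uses .git/info/grafts plus git filter-branch (with no content filters, purely to bake the grafted parent list into real commits) to reduce the merge to a single parent. The repository layout and commit messages are illustrative assumptions, not PostgreSQL's actual history.

```shell
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # newer git pauses with a warning otherwise
cd "$(mktemp -d)"
git init -q repo && cd repo
git config user.email demo@example.com && git config user.name demo
trunk=$(git symbolic-ref --short HEAD)   # default branch name varies by git version
echo t1 > trunkfile && git add trunkfile && git commit -qm "T1"
git checkout -qb BRANCH1
echo b1 > branchfile && git add branchfile && git commit -qm "B1"
git checkout -q "$trunk"
echo f1 > FILE1 && git add FILE1 && git commit -qm "T3 adds FILE1"
git checkout -q BRANCH1
git merge -q --no-edit "$trunk"          # the kind of merge commit cvs2svn manufactures
# Graft line format: <commit> <parent> [<parent>...]. Keep only the first
# parent of the merge, discarding the link back to trunk:
echo "$(git rev-parse HEAD) $(git rev-parse HEAD^1)" > .git/info/grafts
git filter-branch -f -- --all >/dev/null 2>&1   # rewrite history with grafted parents
rm .git/info/grafts                             # no longer needed after the rewrite
merges=$(git rev-list --merges --count BRANCH1)
echo "merge commits on BRANCH1: $merges"
```

After the rewrite, `git rev-list --merges --count BRANCH1` reports zero merges even with the grafts file removed, confirming the ancestry change is permanent; as noted above, all descendant commit hashes change, so this must happen before the repository is published.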
Re: [HACKERS] git: uh-oh
On Wed, Aug 18, 2010 at 08:25, Michael Haggerty wrote:
> Tom Lane wrote:
>> I lack git-fu pretty completely, but I do have the CVS logs ;-).
>> It looks like some of these commits that are being ascribed to the
>> REL8_3_STABLE branch were actually only committed on HEAD. For
>> instance my commit in contrib/xml2 on 28 Feb 2010 21:31:57 was
>> only in HEAD. It was back-patched a few hours later (1 Mar 3:41),
>> and that's also shown here, but the HEAD commit shouldn't be.
>>
>> I wonder whether the repository is completely OK and the problem
>> is that this webpage isn't filtering the commits correctly.
>
> Please don't panic :-)

We're not panic'ing just yet :-)

> The problem is that it is *impossible* to faithfully represent a CVS or
> Subversion history with its ancestry information in a git repository (or
> AFAIK any of the DVCS repositories). The reason is that CVS
> fundamentally records the history of single files, and each file can
> have a branching history that is incompatible with those of other files.
> For example, in CVS, a file can be added to a branch after the branch
> already exists, different files can be added to a branch from multiple
> parent branches, and even more perverse things are allowed. The CVS
> history can record this mish-mash (albeit with much ambiguity).

It can. IIRC we have cleaned a couple of such things out.

> Given the choice between two wrong histories, cvs2git uses the
> "inclusive" style. The result is that the ancestors of B4 include not
> only T0, T1, B1, B2, and B3 (as might be expected), but also T2 and T3.
> The display in the website that was quoted [2] seems to mash all of the
> ancestors together without showing the topology of the history, making
> the result quite confusing. The true history looks more like this:
>
> $ git log --oneline --graph REL8_3_10 master
> [...]
> | * 2a91f07 tag 8.3.10
> | * eb1b49f Preliminary release notes for releases 8.4.3, 8.3
> | * dcf9673 Use SvROK(sv) rather than directly checking SvTYP
> | * 1194fb9 Update time zone data files to tzdata release 201
> | * fdfd1ec Return proper exit code (3) from psql when ON_ERR
> | * 77524a1 Backport fix from HEAD that makes ecpglib give th
> | * 55391af Add missing space in example.
> | * 982aa23 Require hostname to be set when using GSSAPI auth
> | * cb58615 Update time zone data files to tzdata release 201
> | * ebe1e29 When reading pg_hba.conf and similar files, do no
> | * 5a401e6 Fix a couple of places that would loop forever if
> | * 5537492 Make contrib/xml2 use core xml.c's error handler,
> | * c720f38 Export xml.c's libxml-error-handling support so t
> | * 42ac390 Make iconv work like other optional libraries for
> | * b03d523 pgindent run on xml.c in 8.3 branch, per request
> | * 7efcdaa Add missing library and include dir for XSLT in M
> | * 6ab1407 Do not run regression tests for contrib/xml2 on M
> | * fff18e6 Backpatch MSVC build fix for XSLT
> | * 7ae09ef Fix numericlocale psql option when used with a nu
> | * de92a3d Fix contrib/xml2 so regression test still works w
> | * 80f81c3 This commit was manufactured by cvs2svn to crea
> | |\
> | |/
> |/|
> * | a08b04f Fix contrib/xml2 so regression test still works w
> * | 0d69e0f It's clearly now pointless to do backwards compat
> * | 4ad348c Buildfarm still unhappy, so I'll bet it's EACCES
> * | 6e96e1b Remove xmlCleanupParser calls from contrib/xml2.
> * | 5b65b67 add EPERM to the list of return codes to expect f
> | * a4067b3 Remove xmlCleanupParser calls from contrib/xml2.
> | * 91b76a4 Back-patch today's memory management fixups in co
> | * 5e74f21 Back-patch changes of 2009-05-13 in xml.c's memor
> | * 043041e This commit was manufactured by cvs2svn to crea
> | |\
> | |/
> |/|
> * | 98cc16f Fix up memory management problems in contrib/xml2
> * | 17e1420 Second try at fsyncing directories in CREATE DATA
> * | a350f70 Assorted code cleanup for contrib/xml2. No chang
> * | 3524149 Update complex locale example in the documentatio
> [...]
>
> The left branch is master, the right branch is the one leading to
> REL8_3_10. You can see that there are multiple merges from master to
> the branch, presumably when new files from trunk were ported to the
> branch. This is even easier to see using a graphical history browser
> like gitk.

Yeah, this is clearly the problem.

> There are good arguments for both the "inclusive" and the "exclusive"
> representation of history. The ideal would require a lot more
> intelligence and better heuristics (and slow down the conversion
> dramatically). But even the smartest conversion would still be wrong,
> because git is simply incapable of representing an arbitrary CVS
> history. The main practical result of the impedance mismatch is that it
> will be more difficult to merge between branches that originated in CVS
> (but that is no surprise!)

Our requirements are simple: our cvs history is linear, the git
history should be linear. It is *not* the same commit that's on