Re: [HACKERS] Command Triggers, patch v11
Thom Brown writes:
> Will you also be committing the trigger function variable changes
> shortly? I don't wish to test anything prior to this as that will
> involve a complete re-test from my side anyway.

It's on its way; I had to spend time elsewhere, sorry about that. With
some luck I can post an intermediate patch later this evening limited to
PL/pgSQL support for your testing.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On Mon, Mar 19, 2012 at 1:50 AM, Tom Lane wrote:
> For an example in our own problem space look at mysql, whose regression
> tests take well over an hour to run on a fast box. So they must be damn
> near bug-free right? Uh, not so much, and I think the fact that
> developers can't easily run their test suite is not unrelated to that.

The other problem with this approach is that it's hard to keep a huge
test suite 100% clean. Changes inevitably introduce behaviour changes
that cause some of the tests to fail. If the test suite is huge then
it's a lot of work to be continually fixing these tests, and you're
always behind. If it's always the case that some tests in this huge
suite are failing, then it's extra work whenever you make a change to
dig through the results and determine whether any of the failures are
caused by your changes and represent a real problem. Even if you do the
work, it's easy to get it wrong and miss a real failure.

My suggestion would be to go ahead and check in the python or perl
script but not make that the pg_regress tests that are run by make
check. Cherry-pick just a good set of tests that test most of the
tricky bits and check that in to run on make test. I think there's even
precedent for that in one of the other modules that has a make
longcheck or make slowcheck or something like that.

-- 
greg
Re: [HACKERS] Command Triggers, patch v11
On Sunday, March 18, 2012 07:29:30 PM Tom Lane wrote:
> BTW, I've been looking through how to do what I suggested earlier to get
> rid of the coziness and code duplication between CreateTableAs and the
> prepare.c code; namely, let CreateTableAs create a DestReceiver and then
> call ExecuteQuery with that receiver. It appears that we still need at
> least two bits of added complexity with that approach:
>
> 1. ExecuteQuery still has to know that's a CREATE TABLE AS operation so
> that it can enforce that the prepared query is a SELECT. (BTW, maybe
> this should be weakened to "something that returns tuples", in view of
> RETURNING?)

I don't really see the use case, but given the amount of work it
probably takes it seems reasonable to allow that.

> 2. Since ExecuteQuery goes through the Portal machinery, there's no way
> for it to pass in eflags to control the table OIDs setup. This is
> easily solved by adding an eflags parameter to PortalStart, which
> doesn't seem too awful to me, since the typical caller would just pass
> zero. (ExecuteQuery would also have to know about testing
> interpretOidsOption to compute the eflags to use, unless we add an
> eflags parameter to it, which might be the cleaner option.)

If we go down this route I think adding an eflag is the better choice.
Thinking of it - my patch already added that ;)

> In short I'm thinking: add an eflags parameter to PortalStart, and add
> both an eflags parameter and a "bool mustReturnTuples" parameter to
> ExecuteQuery. This definitely seems cleaner than the current
> duplication of code.

Hm. I am not *that* convinced anymore. It wasn't that much duplication
in the end...

Andres
Re: [HACKERS] pg_upgrade and pg_config dependency
On Fri, Mar 16, 2012 at 08:11:17PM -0300, Alvaro Herrera wrote:
> Excerpts from Bruce Momjian's message of vie mar 16 20:06:28 -0300 2012:
> > Álvaro told me he got a Spanish-language report that pg_upgrade
> > failed because it required pg_config, and pg_config is only supplied
> > with the devel packages.
> >
> > I initially thought that it was a packaging problem, but I later
> > realized the pg_config is mostly developer settings, and that using
> > pg_config was not getting any change to libdir by dynamic_library_path
> > in postgresql.conf, and that I should just merge the pg_upgrade_support
> > detection code into the existing shared library detection "LOAD" code I
> > already had.
> >
> > This avoids the pg_config dependency, works better for libdir, and
> > reduces the amount of code.
>
> Bruce also found a reference to this old bug report:
> http://archives.postgresql.org/pgsql-bugs/2011-12/msg00254.php
> This includes a link to a Debian bug report by Peter.
>
> > Patch attached. Should this be backpatched to PG 9.1; I think so.
>
> +1

Applied, and backpatched to 9.1.

-- 
Bruce Momjian http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +
Re: [HACKERS] sortsupport for text
On Sat, Mar 17, 2012 at 6:58 PM, Greg Stark wrote:
> On Fri, Mar 2, 2012 at 8:45 PM, Robert Haas wrote:
>> 12789 28.2686 libc-2.13.so strcoll_l
>> 6802 15.0350 postgres text_cmp
>
> I'm still curious how it would compare to call strxfrm and sort the
> resulting binary blobs. I don't think the sortsupport stuff actually
> makes this any easier though. Since using it requires storing the
> binary blob somewhere I think the support would have to be baked into
> tuplesort (or hacked into the sortkey as an expr that was evaluated
> earlier somehow).

Well, the real problem here is that the strxfrm'd representations aren't
just bigger - they are huge. On my MacBook Pro, if the input
representation is n characters, the strxfrm'd representation is 9n+3
characters. If we're sorting very wide tuples of which the sort key is
only a small part, maybe that would be acceptable, but if the sort key
makes up the bulk of the tuple then caching the strxfrm() representation
works out to slashing work_mem tenfold. That might be just fine if the
sort is going to fit in work_mem either way, but if it turns a quicksort
into a heap sort then I feel pretty confident that it's going to be a
loser.

Keep in mind that even if the strxfrm'd representation were no larger at
all, it would still amount to an additional copy of the data, so you'd
still potentially be eating up lots of work_mem that way. The fact that
it's an order of magnitude larger is just making a bad problem worse.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] sortsupport for text
On Sun, Mar 18, 2012 at 11:08 AM, Tom Lane wrote:
> However, it occurred to me that we could pretty easily jury-rig
> something that would give us an idea about the actual benefit available
> here. To wit: make a C function that wraps strxfrm, basically
> strxfrm(text) returns bytea. Then compare the performance of
> ORDER BY text_col to ORDER BY strxfrm(text_col).
>
> (You would need to have either both or neither of text and bytea
> using the sortsupport code paths for this to be a fair comparison.)

Since the index will be ~9x bigger at least on this machine, I think I
know the answer, but I suppose it doesn't hurt to test it. It's not
that much work.

> One other thing I've always wondered about in this connection is the
> general performance of sorting toasted datums. Is it better to detoast
> them in every comparison, or pre-detoast to save comparison cycles at
> the cost of having to push much more data around? I didn't see any
> discussion of this point in Robert's benchmarks, but I don't think we
> should go very far towards enabling sortsupport for text until we
> understand the issue and know whether we need to add more infrastructure
> for it. If you cross your eyes a little bit, this is very much like
> the strxfrm question...

It would be surprising to me if there is a one-size-fits-all answer to
this question. For example, if the tuple got toasted because it's got
lots of columns and we had to take desperate measures to get it to fit
into an 8kB block at all, chances are that detoasting will work out
well. We'll use a bit more memory, but hopefully that'll be repaid by
much faster comparisons. OTOH, if you have a data set with many
relatively short strings and a few very long ones, detoasting up-front
could turn a quicksort into a heapsort. Since only a small fraction of
the comparisons would have involved one of the problematic long strings
anyway, it's unlikely to be worth the expense of keeping those strings
around in detoasted form for the entire sort (unless maybe
reconstructing them even a few times is problematic because we're under
heavy cache pressure and we get lots of disk seeks as a result).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Command Triggers, patch v11
On Sun, Mar 18, 2012 at 2:29 PM, Tom Lane wrote:
> 1. ExecuteQuery still has to know that's a CREATE TABLE AS operation so
> that it can enforce that the prepared query is a SELECT. (BTW, maybe
> this should be weakened to "something that returns tuples", in view of
> RETURNING?)

+1 for "something that returns tuples". CREATE TABLE ... AS
DELETE FROM ... WHERE ... RETURNING ... seems like a cool thing to
support.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Command Triggers, patch v11
Robert Haas writes:
> On Sun, Mar 18, 2012 at 2:29 PM, Tom Lane wrote:
>> 1. ExecuteQuery still has to know that's a CREATE TABLE AS operation so
>> that it can enforce that the prepared query is a SELECT. (BTW, maybe
>> this should be weakened to "something that returns tuples", in view of
>> RETURNING?)

> +1 for "something that returns tuples". CREATE TABLE ... AS
> DELETE FROM ... WHERE ... RETURNING ... seems like a cool thing to
> support.

For the moment I've backed off that idea. The main definitional
question we'd have to resolve is whether we want to allow WITH NO DATA,
and if so what does that mean (should the DELETE execute, or not?).
I am also not certain that the RETURNING code paths would cope with
a WITH OIDS specification, and there are some other things that would
need to be fixed. It might be cool to do it sometime, but it's not
going to happen in this patch.

regards, tom lane
Re: [HACKERS] incompatible pointer types with newer zlib
On Sat, Mar 17, 2012 at 3:58 AM, Peter Eisentraut wrote:
> On tor, 2012-03-01 at 19:19 +0200, Peter Eisentraut wrote:
>> I think the best fix would be to rearrange _PrintFileData() so that it
>> doesn't use FH at all. Instead, we could define a separate
>> ArchiveHandle field IF that works more like OF, and then change
>> ahwrite() to use that.
>
> Here is a patch that might fix this. I haven't been able to test this
> properly, so this is just from tracing the code. It looks like
> _PrintFileData() doesn't need to use FH at all, so it could use a local
> file handle variable instead. Could someone verify this please?

It looks like this code can be used via the undocumented -Ff option to
pg_dump. But considering this code was added in 2000 as demonstration
code and has apparently never been documented, and considering also
that we now have the "directory" archive format, which is presumably
quite a similar idea but documented and intended for production use,
maybe we should just rip out pg_backup_files/archFiles altogether.
pg_dump is crufty enough without supporting undocumented and obsolete
options for multiple decades.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Command Triggers, patch v11
On Mon, Mar 19, 2012 at 12:45 PM, Tom Lane wrote:
> Robert Haas writes:
>> On Sun, Mar 18, 2012 at 2:29 PM, Tom Lane wrote:
>>> 1. ExecuteQuery still has to know that's a CREATE TABLE AS operation so
>>> that it can enforce that the prepared query is a SELECT. (BTW, maybe
>>> this should be weakened to "something that returns tuples", in view of
>>> RETURNING?)
>
>> +1 for "something that returns tuples". CREATE TABLE ... AS
>> DELETE FROM ... WHERE ... RETURNING ... seems like a cool thing to
>> support.
>
> For the moment I've backed off that idea. The main definitional
> question we'd have to resolve is whether we want to allow WITH NO DATA,
> and if so what does that mean (should the DELETE execute, or not?).
> I am also not certain that the RETURNING code paths would cope with
> a WITH OIDS specification, and there are some other things that would
> need to be fixed. It might be cool to do it sometime, but it's not
> going to happen in this patch.

Fair enough. It would be nice to have, but it definitely does not seem
worth spending a lot of time on right now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Command Triggers, patch v11
On sön, 2012-03-18 at 21:16 -0400, Tom Lane wrote:
> If we were going to change the output at all, I would vote for "CREATE
> TABLE <n>" so as to preserve the rowcount functionality. Keep in
> mind though that this would force client-side changes, for instance in
> libpq's PQcmdTuples(). Fixing that one routine isn't so painful, but
> what of other client-side libraries, not to mention applications?

Doesn't seem worth it to me. At least, "SELECT <n>" makes some sense:
<n> rows were selected. "CREATE TABLE <n>" means what? <n> tables were
created?

What might make sense is to delegate this additional information to
separate fields in a future protocol revision.
Re: [HACKERS] Command Triggers, patch v11
On Mon, Mar 19, 2012 at 12:53 PM, Peter Eisentraut wrote:
> On sön, 2012-03-18 at 21:16 -0400, Tom Lane wrote:
>> If we were going to change the output at all, I would vote for "CREATE
>> TABLE <n>" so as to preserve the rowcount functionality. Keep in
>> mind though that this would force client-side changes, for instance in
>> libpq's PQcmdTuples(). Fixing that one routine isn't so painful, but
>> what of other client-side libraries, not to mention applications?
>
> Doesn't seem worth it to me. At least, "SELECT <n>" makes some sense:
> <n> rows were selected. "CREATE TABLE <n>" means what? <n> tables
> were created?
>
> What might make sense is to delegate this additional information to
> separate fields in a future protocol revision.

I think that we would not have bothered to add the row count to the
command tag output for SELECT unless it were useful. It seems to be
*more* useful for CTAS than for SELECT; after all, SELECT also returns
the actual rows.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [PATCH] Support for foreign keys with arrays
Hi Noah,

thank you again for your thorough review, which is much appreciated.

> I think the patch has reached the stage where a committer can review
> it without wasting much time on things that might change radically.
> So, I'm marking it Ready for Committer. Please still submit an update
> correcting the above; I'm sure your committer will appreciate it.

Attached is v5, which should address all the remaining issues.

Regards,
Marco

-- 
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it

On Fri, Mar 16, 2012 at 11:33:12PM -0400, Noah Misch wrote:
> I recommend removing the new block of code in RI_FKey_eachcascade_del() and
> letting array_remove() throw the error. If you find a way to throw a nicer
> error without an extra scan, by all means submit that to a future CF as an
> improvement. I don't think it's important enough to justify cycles at this
> late hour of the current CF.

It makes sense; we have removed the block of code and updated the error
message following your suggestion. Now the error is thrown by
array_remove() itself. We'll keep an eye on this for further
improvements in the future.

> > > pg_constraint.confeach needs documentation.
> >
> > Most of pg_constraint columns, including all the boolean ones, are
> > documented only in the "description" column of
> > http://www.postgresql.org/docs/9.1/static/catalog-pg-constraint.html#AEN85760
> > it seems that confiseach should not be an exception, since it just
> > indicates whether the constraint is of a certain kind or not.
>
> Your patch adds two columns to pg_constraint, confiseach and confeach, but it
> only mentions confiseach in documentation. Just add a similar minimal mention
> of confeach.

Sorry, our mistake; a mention for confeach has now been added, and both
entries have been reordered to reflect the column position in
pg_constraint.

> That is to say, they start with a capital letter and end with a period. Your
> old text was fine apart from the lack of a period. (Your new text also lacks
> a period.)

Fixed, it should be fine now (another misunderstanding on our side,
apologies).

> If the cost doesn't exceed O(F log P), where F is the size of the FK table and
> P is the size of the PK table, I'm not worried. If it can be O(F^2), we would
> have a problem to be documented, if not fixed.

We have rewritten the old query in a simpler way; now its cost is
O(F log P). Here F must represent the size of the "flattened" table,
that is, the total number of values that must be checked, which seems a
reasonable assumption in any case.

> Your change to array_replace() made me look at array_remove() again and
> realize that it needs the same treatment. This command yields a segfault:
> SELECT array_remove('{a,null}'::text[], null);

Fixed.

> This latest version introduces two calls to get_element_type(), both of which
> should be get_base_element_type().

Fixed.

> Patch "Avoid FK validations for no-rewrite ALTER TABLE ALTER TYPE", committed
> between v3b and v4 of this patch, added code to ATAddForeignKeyConstraint()
> requiring an update in this patch. Run this in the regression DB:
> [local] regression=# alter table fktableforeachfk alter ftest1 type int[];
> ERROR: could not find cast from 1007 to 23

Thanks for pointing it out. We have added a regression test and then
fixed the issue.

> RI_PLAN_EACHCASCADE_DEL_DOUPDATE should be RI_PLAN_EACHCASCADE_DEL_DODELETE.

Fixed.

EACH-foreign-key.v5.patch.bz2
Description: application/bzip
Re: [HACKERS] Command Triggers, patch v11
Robert Haas writes:
> On Mon, Mar 19, 2012 at 12:53 PM, Peter Eisentraut wrote:
>> Doesn't seem worth it to me. At least, "SELECT <n>" makes some sense:
>> <n> rows were selected. "CREATE TABLE <n>" means what? <n> tables
>> were created?
>>
>> What might make sense is to delegate this additional information to
>> separate fields in a future protocol revision.

> I think that we would not have bothered to add the row count to the
> command tag output for SELECT unless it were useful. It seems to be
> *more* useful for CTAS than for SELECT; after all, SELECT also returns
> the actual rows.

I think we're all in agreement that we need to keep the rowcount
functionality. What seems to me to be in some doubt is whether to
continue to present the tag "SELECT <n>" or to change it to something
like "CREATE TABLE <n>". For the moment I've got the patch doing the
former. It would not be terribly hard to change it, but I'm not going
to break backward compatibility unless there's a pretty clear consensus
to do so.

BTW, I just came across another marginal-loss-of-functionality issue:
in previous versions you could PREPARE a SELECT INTO, but now you get

regression=# prepare foo as select * into bar from int8_tbl;
ERROR:  utility statements cannot be prepared

Is anybody excited about that? If it is something we have to keep, it
seems like pretty nearly a deal-breaker for this patch, because storing
a CreateTableAsStmt containing an already-prepared plan would complicate
matters unreasonably. You can still get approximately the same result
with

prepare foo as select * from int8_tbl;
create table bar as execute foo;

which if anything is more useful since you didn't nail down the target
table name in the PREPARE, but nonetheless it's different.

regards, tom lane
Re: [HACKERS] Regarding column reordering project for GSoc 2012
Excerpts from Atri Sharma's message of dom mar 18 01:05:23 -0300 2012:
> I am understanding the scenario now. I will take a little of your time to
> modify my original idea:
>
> The "middle layer" still exists, but NOT on the individual client side,
> rather, on the server side. That is, we maintain the middle layer on the
> server, and it is same for all the users. We can mutate the ordering, and
> changes would be reflected in all the clients, since they are all accessing
> the same middle layer, present on the server.

I will take a little of your time to ask whether you read the older
threads on the topic -- the one that Andrew Dunstan linked to, and the
older ones, to which that email links? This whole area has been
extensively discussed before.

In any case, I repeat what I said before: this is way too complex a
topic for it to make a good GSoC project (unless you're already
intimately familiar with the parser and executor code.) You need to
pick something smaller, more localized.

-- 
Álvaro Herrera
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] Regarding column reordering project for GSoc 2012
On Mon, Mar 19, 2012 at 11:51 PM, Alvaro Herrera wrote:
> Excerpts from Atri Sharma's message of dom mar 18 01:05:23 -0300 2012:
>
>> I am understanding the scenario now. I will take a little of your time to
>> modify my original idea:
>>
>> The "middle layer" still exists, but NOT on the individual client side,
>> rather, on the server side. That is, we maintain the middle layer on the
>> server, and it is same for all the users. We can mutate the ordering, and
>> changes would be reflected in all the clients, since they are all accessing
>> the same middle layer, present on the server.
>
> I will take a little of your time to ask whether you read the older
> threads on the topic -- the one that Andrew Dunstan linked to, and the
> older ones, to which that email links? This whole area has been
> extensively discussed before.
>
> In any case, I repeat what I said before: this is way too complex a
> topic for it to make a good GSoC project (unless you're already
> intimately familiar with the parser and executor code.) You need to
> pick something smaller, more localized.

Hi Alvaro,

Thanks for your reply. Yes, I have read the earlier posts.

I am familiar with database systems. I have worked on them before and I
have done designing of some very basic databases.

Please guide me to some area where I can research for some GSoC project.

Thanks,

Atri

-- 
Regards,

Atri
l'apprenant
Re: [HACKERS] Command Triggers, patch v11
I wrote:
> One thing I soon found is that it lacks support for EXPLAIN SELECT INTO.

While I'm not particularly excited about fixing PREPARE ... SELECT INTO
or CREATE RULE ... SELECT INTO, I've come to the conclusion that the
EXPLAIN case is a must-fix. Because not only is EXPLAIN SELECT INTO
broken, but so is EXPLAIN CREATE TABLE AS, and the latter functionality
is actually documented. So presumably somebody went out of their way to
make this work, at some point.

Since I've got the table-creation code moved into intorel_startup, this
doesn't look to be that painful, but it will require an API change for
ExplainOnePlan, which is slightly annoying because that's probably in
use by third-party plugins. We could either break them obviously (by
adding an explicit parameter) or break them subtly (by adding an
ExplainState field they might forget to initialize). The former seems
preferable.

regards, tom lane
Re: [HACKERS] incompatible pointer types with newer zlib
Robert Haas writes:
> maybe we should just rip out pg_backup_files/archFiles altogether.
> pg_dump is crufty enough without supporting undocumented and obsolete
> options for multiple decades.

+1

regards, tom lane
Re: [HACKERS] incompatible pointer types with newer zlib
On 03/19/2012 02:53 PM, Tom Lane wrote:
> Robert Haas writes:
>> maybe we should just rip out pg_backup_files/archFiles altogether.
>> pg_dump is crufty enough without supporting undocumented and obsolete
>> options for multiple decades.
>
> +1

Yeah, go for it.

cheers

andrew
Re: [HACKERS] Regarding column reordering project for GSoc 2012
On Tue, Mar 20, 2012 at 12:08 AM, Atri Sharma wrote:
> On Mon, Mar 19, 2012 at 11:51 PM, Alvaro Herrera wrote:
>> Excerpts from Atri Sharma's message of dom mar 18 01:05:23 -0300 2012:
>>
>>> I am understanding the scenario now. I will take a little of your time to
>>> modify my original idea:
>>>
>>> The "middle layer" still exists, but NOT on the individual client side,
>>> rather, on the server side. That is, we maintain the middle layer on the
>>> server, and it is same for all the users. We can mutate the ordering, and
>>> changes would be reflected in all the clients, since they are all accessing
>>> the same middle layer, present on the server.
>>
>> I will take a little of your time to ask whether you read the older
>> threads on the topic -- the one that Andrew Dunstan linked to, and the
>> older ones, to which that email links? This whole area has been
>> extensively discussed before.
>>
>> In any case, I repeat what I said before: this is way too complex a
>> topic for it to make a good GSoC project (unless you're already
>> intimately familiar with the parser and executor code.) You need to
>> pick something smaller, more localized.
>
> Hi Alvaro,
>
> Thanks for your reply. Yes, I have read the earlier posts.
>
> I am familiar with database systems. I have worked on them before and I
> have done designing of some very basic databases.
>
> Please guide me to some area where I can research for some GSoC project.
>
> Thanks,
>
> Atri

Hi Alvaro,

I was just going through pgFoundry and I think I will be able to work on
the PL/Java project. Please let me know if I can proceed further on it
for GSoC, and also if currently there is any work that needs to be done
on PL/Java.

-- 
Regards,

Atri
l'apprenant
Re: [HACKERS] Memory usage during sorting
On Sun, Mar 18, 2012 at 11:25 AM, Tom Lane wrote:
> So has somebody found a hole in the n log n lower bound on the cost of
> comparison-based sorting? I thought that had been proven pretty
> rigorously.

There's not much danger of anyone finding a way around that bound since
the proof is quite trivial, but keep in mind that it's a worst-case
bound. For example, if you have reason to believe that the input is
likely to already be sorted, you can check for that case using just n-1
comparisons. If it turns out you're right, you can "sort" the data in
linear time. Heck, you can also check for reverse-sorted input at the
same time, if you like, and handle that special case too. Of course,
when the data is unordered, such special-case checks hurt rather than
help, even though it's still O(n lg n) overall; those pesky constant
factors do matter quite a bit. Still, sometimes it's worth it: our
quicksort algorithm checks for sorted input on each recursive
invocation, since quicksort degrades rather badly in that scenario; but
our heapsort doesn't contain a similar check, probably because heapsort
typically works fine in that scenario.

One thing that seems inefficient to me about our current algorithm is
the use of the run number as a leading column in the sort key. There's
no real reason why the tuples destined for the next run need to be
maintained in heap order; we could just store them unordered and
heapify the whole lot of them when it's time to start the next run.
That ought to save comparison cycles for the current run, since the
heap will be less deep, and there are other possible savings as well -
in particular, an unordered array of tuples can be heapify'd in linear
time, whereas our current algorithm is O(n lg n). However, when I
actually implemented this, I found that doing it this way was a loser.
In fact, excluding the next-run tuples from the heap for the current
run was a BIG loser even before you add in the time required to
re-heapify.
This confused the daylights out of me for a while, because it's hard to
understand how insert and siftup can be slower on a small heap than a
large one. My working theory is that, even though we must necessarily
do more comparisons when manipulating a larger heap, many of those
comparisons are resolved by comparing run numbers, and that's a lot
cheaper than invoking the real comparator. For example, if we insert
into a heap whose rightmost child is in the next run, and the new tuple
goes into the current run, and the current size of the heap is such
that the initial position of the new element is a descendent of the
right node, it will very quickly crash through all of the next-run
tuples and only need one REAL comparison - against the root. Either the
new element ends up as the root, or it ends up as the direct child of
the root; now we remove the root and, perhaps, replace it with its
rightmost child, meaning that the next element we read in can do the
exact same thing all over again. If we exclude the next-run elements
from the heap, then the average depth of the heap when inserting a new
element is smaller, but all the elements are in the same run, and we
have to invoke the real comparator every time. In other words, those
next-run tuples act as dummies which effectively create a heap of
uneven depth, and the average depth encountered when inserting tuples
turns out to be less than what we get by pulling out the dummies and
making the depth uniform.

This makes me kind of nervous, because I don't understand why things
should work out so well as all that. If throwing some dummy elements
into the heap makes things work better, then maybe we should throw in
some more. Or maybe it would work better to take some but not all of
them out. There certainly doesn't seem to be any indication in the code
that this is an anticipated effect, and it seems an awfully
providential accident.
It may (or may not) be related to the effect that Greg Stark mentions in a nearby post: increasing work_mem makes the heap bigger, so heap maintenance gets more expensive, sometimes without any corresponding benefit. For example, if the input happens to already be sorted, then inserting into the heap is cheap, because each new element is already greater than its parent, and life is good. But removing from the heap is painfully expensive, because to remove an element from the heap we stick the last element in the heap into the hole left by the departing element and then sift it down. However, that requires 2*heap_depth comparisons, all of which involve really invoking the comparison function since everything's going to end up going into a single run, and that gets more and more expensive as work_mem increases. So a heap sort of already-sorted data seems to get slower and slower the more memory you give it to work with. For example, a sort of 5 million random strings of length 30, all in shared_buffers, takes 22.03 seconds on my MacBook Pro with work_mem = 128MB, but just
Re: [HACKERS] Regarding column reordering project for GSoc 2012
Excerpts from Atri Sharma's message of lun mar 19 16:20:09 -0300 2012: > I was just going through PGfoundry and I think I will be able to work > on the PL/Java project.Please let me know If I can proceed further on > ot for GSOC and also,if currently there is any work that needs to be > done on PL/java. Hm, I'm not sure on PL/Java. As far as I know, it is a dead project. But maybe I'm wrong and somebody can mentor you to do something useful with it -- there are so many advancements in PLs that I'm sure there must be something that can be done with PL/Java. Did you have a look at the TODO list? http://wiki.postgresql.org/wiki/Todo -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch: autocomplete for functions
On fre, 2012-03-16 at 13:47 -0400, Tom Lane wrote: > I'm a bit concerned about whether that's actually going to be useful. > A quick check shows that in the regression database, the proposed patch > produces 3246 possible completions, which suggests that by the time you > get down to a unique match you're going to have typed most of the name > anyway. Well, the regression test database is not really an example of real-life object naming, I think. I tried this out on a couple of real databases and found it quite handy. > > BTW, you should at least exclude dropped columns, I think. > Yes. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On mån, 2012-03-19 at 02:35 +, Peter Geoghegan wrote: > I see your point of view. I suppose I can privately hold onto the test > suite, since it might prove useful again. I would still like to have those tests checked in, but not run by default, in case someone wants to hack on this particular feature again. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Regarding column reordering project for GSoc 2012
On 19.03.2012 21:29, Alvaro Herrera wrote: > Excerpts from Atri Sharma's message of lun mar 19 16:20:09 -0300 2012: >> I was just going through PGfoundry and I think I will be able to work on the PL/Java project.Please let me know If I can proceed further on ot for GSOC and also,if currently there is any work that needs to be done on PL/java. > Hm, I'm not sure on PL/Java. As far as I know, it is a dead project. I haven't followed PL/java either, but it doesn't look quite dead to me. The last release was in September 2011, and there are active discussions on the mailing lists. If it's not completely up-to-date with latest JDK and PostgreSQL releases, perhaps bringing it up-to-date and then adding some missing functionality would be a useful GSoC project. I would suggest that you ask on the pl/java mailing list if there is something suitable for a GSoC project there, and if one of the pl/java developers would be willing to mentor. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch: autocomplete for functions
Peter Eisentraut writes: > On fre, 2012-03-16 at 13:47 -0400, Tom Lane wrote: >> I'm a bit concerned about whether that's actually going to be useful. >> A quick check shows that in the regression database, the proposed patch >> produces 3246 possible completions, which suggests that by the time you >> get down to a unique match you're going to have typed most of the name >> anyway. > Well, the regression test database is not really an example of real-life > object naming, I think. Perhaps not, but a solid 2000 of those names are from the system-created entries in pg_proc, and the regression DB doesn't have an especially large number of tables either. I doubt that real DBs are likely to have materially fewer completions. This connects somewhat to the discussions we've had in the past about trying to get not-intended-for-public-use functions out of the standard search path. Not that you want to put a full visibility check into the tab-completion query, but if it could simply exclude a "pg_private" namespace, that probably wouldn't be too expensive. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On mån, 2012-03-19 at 08:59 +, Greg Stark wrote: > The other problem with this approach is that it's hard to keep a huge > test suite 100% clean. Changes inevitably introduce behaviour changes > that cause some of the tests to fail. I think we are used to that because of the way pg_regress works. When you have a better test infrastructure that tests actual functionality rather than output formatting, this shouldn't be the case (nearly as much). If someone wanted to bite the bullet and do the work, I think we could move to a Perl/TAP-based test suite (not pgTAP, but Perl and some fairly standard Test::* modules) and reduce that useless reformatting work and test more interesting things. Just a thought ... -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch: autocomplete for functions
Excerpts from Tom Lane's message of lun mar 19 16:53:49 -0300 2012: > This connects somewhat to the discussions we've had in the past about > trying to get not-intended-for-public-use functions out of the standard > search path. Not that you want to put a full visibility check into the > tab-completion query, I'm rather of the contrary opinion -- surely if we're going to complete function names, we should only complete those that are in schemas in the path; similarly for column names. (BTW I didn't check but does this completion work if I schema-qualify a column name?) > but if it could simply exclude a "pg_private" > namespace, that probably wouldn't be too expensive. +1 -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On 19 March 2012 19:55, Peter Eisentraut wrote: > If someone wanted to bite the bullet and do the work, I think we could > move to a Perl/TAP-based test suite (not pgTAP, but Perl and some fairly > standard Test::* modules) and reduce that useless reformatting work and > test more interesting things. Just a thought ... I think that that is a good idea. However, I am not a Perl hacker, and this is the second time that has left me at a disadvantage when working on Postgres, so I think it's probably time to learn a certain amount. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] double free in current HEAD's pg_dump
Excerpts from Joachim Wieland's message of dom mar 18 02:24:21 -0300 2012: > There's a double free in the current HEAD's pg_dump. Fix attached. Thanks, applied. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Storage Manager crash at mdwrite()
On Fri, Mar 16, 2012 at 8:34 PM, Greg Stark wrote: > On Fri, Mar 16, 2012 at 11:29 PM, Jeff Davis wrote: > > There is a lot of difference between those two. In particular, it looks > > like the problem you are seeing is coming from the background writer, > > which is not running during initdb. > > The difference that comes to mind is that the postmaster forks. If the > library opens any connections prior to forking and then uses them > after forking that would work at first but it would get confused > quickly once more than one backend tries to use the same connection. > The data being sent would all be mixed together and they would see > responses to requests other processes sent. > The opened connections thing was the first thing I thought of.. the HDFS documentation claims the C interface is thread-safe.. However, since I noticed that different processes (having different process IDs) are calling mdwrite, it might be a possibility. I tried reconnecting to HDFS on every mdwrite request, but it didn't work out. > You need to ensure that any network connections are opened up *after* > the new processes are forked. > > There are other things going on that could cause problems. initdb > probably doesn't deal with many errors so it might not be casing any > longjmps that happen when Postgres handles errors. I suspect it > doesn't create any temporary files, etc. > > There's also a long history of threads not playing well with Postgres. > I think that's overblown and I believe they should work fine. Most of > it was caused by a bug in an old version of libc. But you do have to > ensure that the other threads don't call any Postgres functions > because the Postgres code is very much not thread-aware. > > -- > greg > I'm a novice in PG, and if I get this one running then I'll have achieved what I wanted to do without further digging into PG.
So I was wondering if there is a configuration option (or something similar) that would eliminate (or reduce) the role of the background writer process.. I believe it might be one of the WAL options but I'm not sure.. Any suggestions?
Re: [HACKERS] sortsupport for text
On Mon, Mar 19, 2012 at 12:19:53PM -0400, Robert Haas wrote: > On Sat, Mar 17, 2012 at 6:58 PM, Greg Stark wrote: > > On Fri, Mar 2, 2012 at 8:45 PM, Robert Haas wrote: > >> 12789 28.2686 libc-2.13.so strcoll_l > >> 6802 15.0350 postgres text_cmp > > > > I'm still curious how it would compare to call strxfrm and sort the > > resulting binary blobs. I don't think the sortsupport stuff actually > > makes this any easier though. Since using it requires storing the > > binary blob somewhere I think the support would have to be baked into > > tuplesort (or hacked into the sortkey as an expr that was evaluated > > earlier somehow). > > Well, the real problem here is that the strxfrm'd representations > aren't just bigger - they are huge. On MacBook Pro, if the input > representation is n characters, the strxfrm'd representation is 9x+3 > characters. Ouch. I was holding out hope that you could get a meaningful improvement if we could use the first X bytes of the strxfrm output so you only need to do a strcoll on strings that actually nearly match. But with an information density of 9 bytes per character it doesn't seem worthwhile. That and this gem in the strxfrm manpage: RETURN VALUE The strxfrm() function returns the number of bytes required to store the transformed string in dest excluding the terminating '\0' character. If the value returned is n or more, the contents of dest are indeterminate. Which means that you have to take the entire transformed string, you can't just ask for the first bit. I think that kind of leaves the whole idea dead in the water. Just for interest I looked at the ICU API for this and they have the same restriction. There is another function which you can use to return partial sort keys (ucol_nextSortKeyPart) but it produces "uncompressed sortkeys", which it seems is what Mac OSX is doing, which seems useless for our purposes. Either this is a hard problem or we're nowhere near a target use case.
Have a nice day, -- Martijn van Oosterhout http://svana.org/kleptog/ > He who writes carelessly confesses thereby at the very outset that he does > not attach much importance to his own thoughts. -- Arthur Schopenhauer signature.asc Description: Digital signature
Re: [HACKERS] Regarding column reordering project for GSoc 2012
On Mon, Mar 19, 2012 at 2:49 PM, Heikki Linnakangas wrote: > On 19.03.2012 21:29, Alvaro Herrera wrote: >> >> >> Excerpts from Atri Sharma's message of lun mar 19 16:20:09 -0300 2012: >> >>> I was just going through PGfoundry and I think I will be able to work >>> on the PL/Java project.Please let me know If I can proceed further on >>> ot for GSOC and also,if currently there is any work that needs to be >>> done on PL/java. >> >> >> Hm, I'm not sure on PL/Java. As far as I know, it is a dead project. > > > I haven't followed PL/java either, but it doesn't look quite dead to me. The > last release was in September 2011, and there are active discussions on the > mailing lists. If it's not completely up-to-date with latest JDK and > PostgreSQL releases, perhaps bringing it up-to-date and then adding some > missing functionality would be a useful GSoC project. > > I would suggest that you ask on the pl/java mailing list if there is > something suitable for a GSoC project there, and if one of the pl/java > developers would be willing to mentor. pl/java works pretty well and is somewhat widely used although it might need some more active maintainers. just jumping into the project and nailing some old bugs and getting the juices flowing would be a tremendous help, as well as the less glamorous work of documentation and regular status updates. one pl/java based project that IMO is just screaming to be done is a pl/java based FDW (wrapping JDBC) that would approximately reproduce dblink...maybe with some extra features geared towards ETL type jobs like a row based callback for transformations in java. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On Sun, Mar 18, 2012 at 7:35 PM, Peter Geoghegan wrote: > On 19 March 2012 01:50, Tom Lane wrote: >> I am *not* a fan of regression tests that try to microscopically test >> every feature in the system. > > I see your point of view. I suppose I can privately hold onto the test > suite, since it might prove useful again. > > I will work on a pg_regress based approach with a reasonably-sized > random subset of about 20 of my existing tests, to provide some basic > smoke testing. This may sound rather tortured, but in the main regression suite there is a .c file that links some stuff into the backend that is then accessed via CREATE FUNCTION to do some special fiddly bits. Could a creative hook be used here to avoid the repetition you are avoiding via Python? (e.g. constant resetting of pg_stat_statements or whatnot). It might sound too much like changing the system under test, but I think it would still retain most of the value. I also do like the pg_regress workflow in general, although clearly it cannot do absolutely everything. Running and interpreting the results of your tests was not hard, but it was definitely *different* which could be a headache if one-off testing frameworks proliferate. -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Memory usage during sorting
On Mon, Mar 19, 2012 at 7:23 PM, Robert Haas wrote: > There's no real reason why the tuples destined for the next run need > to be maintained in heap order; we could just store them unordered and > heapify the whole lot of them when it's time to start the next run. This sounded familiar http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=cf627ab41ab9f6038a29ddd04dd0ff0ccdca714e -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch for parallel pg_dump
Excerpts from Joachim Wieland's message of lun mar 19 00:31:47 -0300 2012: > On Wed, Mar 14, 2012 at 2:02 PM, Robert Haas wrote: > >> I think we should somehow unify both functions, the code is not very > >> consistent in this respect, it also calls exit_horribly() when it has > >> AH available. See for example pg_backup_tar.c > > > > I think we should get rid of die_horribly(), and instead have arrange > > to always clean up AH via an on_exit_nicely hook. > > Attached is a patch that gets rid of die_horribly(). > > For the parallel case it maintains an array with as many elements as > we have worker processes. When the workers start, they enter their Pid > (or ThreadId) and their ArchiveHandle (AH). The exit handler function > in a process can then find its own ArchiveHandle by comparing the own > Pid with all the elements in the array. Sounds good to me in general ... my only gripe is this: I wonder if it would be better to have a central routine that knows about both archive_close_connection and archive_close_connection_parallel -- and the argument to the callback is a struct that contains both a pointer to the struct with the connection to be closed, a ParallelState (either of which can be null), and a flag stating which of the ParallelState/AH is in use. That way, you avoid having to reset the callbacks when you switch from AH to parallel; instead you just clear out the AH connection, set the ParallelState, and flip the switch. The general archive_close_connection checks the flag to know which routine to call. I mean, what you have probably works fine now, but it doesn't seem very extensible. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
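The shape of the single-callback design suggested above can be sketched like this. This is illustrative Python, purely to show the dispatch-on-a-flag idea; the names are hypothetical and do not correspond to pg_dump's actual C structures:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ShutdownState:
    """Single state struct registered once with the exit handler; the
    `parallel` flag selects which cleanup path the callback takes."""
    parallel: bool = False
    ah_connection: Optional[str] = None            # stand-in for AH's connection
    worker_connections: List[str] = field(default_factory=list)

    def close_connections(self):
        # One central callback dispatches on the flag, so nothing has to
        # be re-registered when the program switches modes.
        if self.parallel:
            return list(self.worker_connections)   # close every worker connection
        return [self.ah_connection] if self.ah_connection else []

state = ShutdownState(ah_connection="conn-0")
# Switching to parallel mode: clear the AH connection, set the worker
# state, and flip the flag -- no callback swap needed.
state.ah_connection = None
state.worker_connections = ["conn-1", "conn-2"]
state.parallel = True
```

The point of the design is extensibility: adding a third cleanup mode later means adding a branch to one routine rather than juggling which callback is currently registered.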
Re: [HACKERS] Gsoc2012 Idea --- Social Network database schema
On 3/18/12 8:11 PM, HuangQi wrote: > The implementation seems to be done quite fully. There is even a patch > file. Why is the implementation not added into the release of Postgres? As > so much has already being done, what could I do in this case for the Gsoc? That would be good for you to research. archives.postgresql.org will help you find the discussions around that. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Incorrect assumptions with low LIMITs
On Sat, 2012-03-17 at 12:48 +, Simon Riggs wrote: > The problems are as I described them > > (1) no account made for sparsity, and other factors leading to an > overestimate of rows (N) > > (2) inappropriate assumption of the effect of LIMIT m, which causes a > costly SeqScan to appear better than an IndexScan for low m/N, when in > fact that is seldom the case. > > Overestimating N in (1) inverts the problem, so that an overestimate > isn't the safe thing at all. I think the actual problem has more to do with risk. The planner doesn't know how uniform the distribution of the table is, which introduces risk for the table scan. I would tend to agree that for low selectivity fraction and a very low limit (e.g. 1-3 in your example) and a large table, it doesn't seem like a good risk to use a table scan. I don't know how that should be modeled or implemented though. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sortsupport for text
On Mon, Mar 19, 2012 at 9:23 PM, Martijn van Oosterhout wrote: > Ouch. I was holding out hope that you could get a meaningful > improvement if we could use the first X bytes of the strxfrm output so > you only need to do a strcoll on strings that actually nearly match. > But with an information density of 9 bytes for one 1 character it > doesn't seem worthwhile. When I was playing with glibc it was 4n. I think what they do is have n bytes for the high order bits, then n bytes for low order bits like capitalization or whitespace differences. I suspect they used to use 16 bits for each and have gone to some larger size. > That and this gem in the strxfrm manpage: > > RETURN VALUE > The strxfrm() function returns the number of bytes required to > store the transformed string in dest excluding the terminating > '\0' character. If the value returned is n or more, the > contents of dest are indeterminate. > > Which means that you have to take the entire transformed string, you > can't just ask for the first bit. I think that kind of leaves the whole > idea dead in the water. I believe the intended API is that you allocate a buffer with your guess of the right size, call strxfrm and if it returns a larger number you realloc your buffer and call it again. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
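At a higher level, Python's locale module exposes the same strxfrm/strcoll pair, which makes the key equivalence at the heart of this thread easy to demonstrate. (The C-level retry loop Greg describes doesn't arise here, since Python sizes the transformed result itself; the "C" locale is used so the example is portable.)

```python
import locale
from functools import cmp_to_key

# Sorting by strxfrm'd keys must agree with sorting by the strcoll
# comparator -- that equivalence is what makes the transform-then-memcmp
# idea attractive in the first place.
locale.setlocale(locale.LC_COLLATE, "C")

words = ["banana", "Apple", "cherry", "apple"]
by_transform = sorted(words, key=locale.strxfrm)
by_comparator = sorted(words, key=cmp_to_key(locale.strcoll))
```

In the "C" locale the transform is essentially the identity and both orderings are plain byte order; the trade-off discussed above only appears in real collations, where the transformed keys blow up in size.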
Re: [HACKERS] Gsoc2012 Idea --- Social Network database schema
On Mon, Mar 19, 2012 at 6:17 PM, Josh Berkus wrote: > On 3/18/12 8:11 PM, HuangQi wrote: >> The implementation seems to be done quite fully. There is even a patch >> file. Why is the implementation not added into the release of Postgres? As >> so much has already being done, what could I do in this case for the Gsoc? > > That would be good for you to research. archives.postgresql.org will > help you find the discussions around that. I actually tried to find out, personally...not sure if I was searching wrongly, but searching for TABLESAMPLE did not yield a cornucopia of useful conversations at the right time in history (~2007), even when the search is given a broad date-horizon (all), so I, too, am uninformed as to the specific objections. http://www.postgresql.org/search/?m=1&q=TABLESAMPLE&l=&d=-1&s=d -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Command Triggers, patch v11
I've applied the CTAS patch after rather heavy editorialization. Don't know what consequences that will have for Dimitri's patch. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Memory usage during sorting
Greg Stark writes: > On Mon, Mar 19, 2012 at 7:23 PM, Robert Haas wrote: >> There's no real reason why the tuples destined for the next run need >> to be maintained in heap order; we could just store them unordered and >> heapify the whole lot of them when it's time to start the next run. > This sounded familiar > http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=cf627ab41ab9f6038a29ddd04dd0ff0ccdca714e Yeah, see also the pgsql-hackers thread starting here: http://archives.postgresql.org/pgsql-hackers/1999-10/msg00384.php That was a long time ago, of course, but I have some vague recollection that keeping next-run tuples in the current heap achieves a net savings in the total number of comparisons needed to heapify both runs. Robert's point about integer comparisons being faster than data comparisons may or may not be relevant. Certainly they are faster, but there are never very many run numbers in the heap at once (possibly no more than 2, I forget; and in any case often only 1). So I'd expect most tuple comparisons to end up having to do a data comparison anyway. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Gsoc2012 Idea --- Social Network database schema
> On Mon, Mar 19, 2012 at 6:17 PM, Josh Berkus wrote: > > On 3/18/12 8:11 PM, HuangQi wrote: > >> The implementation seems to be done quite fully. There is even a patch > >> file. Why is the implementation not added into the release of Postgres? As > >> so much has already being done, what could I do in this case for the Gsoc? > > > > That would be good for you to research. archives.postgresql.org will > > help you find the discussions around that. > > I actually tried to find out, personally...not sure if I was searching > wrongly, but searching for TABLESAMPLE did not yield a cornucopia of > useful conversations at the right time in history (~2007), even when > the search is given a broad date-horizon (all), so I, too, an > uninformed as to the specific objections. > > http://www.postgresql.org/search/?m=1&q=TABLESAMPLE&l=&d=-1&s=d > I sent a mail to Neil Conway asking him about this; I hope he can give a good answer. While waiting for the response, how about the skip scan? Daniel mentioned there are still some unknowns. I found this mail thread suggesting adding the skip scan to the TODO list: http://archives.postgresql.org/pgsql-bugs/2010-03/msg00144.php and also this thread discussing it: http://archives.postgresql.org/pgsql-hackers/2010-03/msg00328.php Not sure whether this is feasible for GSoC. Best Regards, Huang Qi Victor
Re: [HACKERS] patch for parallel pg_dump
On Tue, March 20, 2012 04:04, Joachim Wieland wrote: > On Mon, Mar 19, 2012 at 9:14 PM, Alvaro Herrera > wrote: >> Sounds good to me in general ... my only gripe is this: I wonder if it >> would be better to have a central routine that knows about both >> archive_close_connection and archive_close_connection_parallel -- and >> the argument to the callback is a struct that contains both a pointer to >> the struct with the connection to be closed [...] > > I had a similar idea before but then concluded that for it you need to > have this struct globally available so that everybody (pg_dump.c / > pg_restore.c / pg_backup_archive.c) can access it to set the > appropriate state. > > I gave it a second thought and now just defined a function that these > consumers can call, that way the variable can stay at file scope at > least. > > Also we don't need this switch, we can set the ParallelState in the > struct before any child forks off and reset it to NULL after the last > child has terminated. > > New patch attached, thanks for your comments. > [pg_dump_die_horribly.2.diff ] In my hands, the patch complains: In file included from gram.y:13255:0: scan.c: In function yy_try_NUL_trans: scan.c:16243:23: warning: unused variable yyg [-Wunused-variable] pg_backup_archiver.c:3320:1: error: static declaration of archive_close_connection follows non-static declaration pg_backup.h:170:13: note: previous declaration of archive_close_connection was here make[3]: *** [pg_backup_archiver.o] Error 1 make[2]: *** [all-pg_dump-recurse] Error 2 make[2]: *** Waiting for unfinished jobs make[1]: *** [all-bin-recurse] Error 2 make: *** [all-src-recurse] Error 2 -- make returned 2 - abort Erik Rijkers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cross-backend signals and administration (Was: Re: pg_terminate_backend for same-role)
On Sat, Mar 17, 2012 at 8:28 PM, Daniel Farina wrote: > This thread evolved out of an attempt to implement > pg_terminate_backend for non-superusers. I thought -- probably > erroneously -- that the major objection to that was the known > possibility of a PID-cycling race condition, whereby a signal could be > misdirected, in the case of terminate_backend, SIGTERM. So now this > fork of the thread is about fixing these unlikely races, and then > passing administration requests (such as "query cancel" or "die" ) as > out-of-band information via SIGUSR1, just like how LISTEN/NOTIFY and > conflict signals are passed. > > To prevent ambiguity, I am using a new special number -- a 'SessionId' > -- that is guaranteed unique to all backends ever created during the > uptime of a database. This number is currently implemented in a way > that is guessable (so it cannot be accepted from external sources), > but I actually think it may be even more useful for a number of other > uses if given a non-guessable form (like cancellation keys). In this > respect it would fulfill pretty much the same purposes as the notion > of a "session" seen on the web. It's after midnight here so maybe I'm being slow, but I don't understand what problem the SessionId solves. ISTM that you could solve the problem like this:

1. Acquire ProcArrayLock in exclusive mode, to keep the set of PGPROCs from changing.
2. Search for the target PGPROC by PID; when you find it, set a bit in the PGPROC indicating that you want it to cancel/die/whatever.
3. Remember the PID.
4. Release ProcArrayLock.
5. Send SIGUSR1.

If the PGPROC gets recycled between 4 and 5, the target backend will find the relevant bits no longer set, and will do nothing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
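Robert's scheme can be simulated in miniature (illustrative Python only, with a mutex and a flag standing in for ProcArrayLock and the PGPROC bit; all names are made up and nothing here reflects the real backend code):

```python
import threading

class PGProc:
    """Toy stand-in for a PGPROC slot (its pid may later be recycled)."""
    def __init__(self, pid):
        self.pid = pid
        self.cancel_requested = False

proc_array_lock = threading.Lock()         # stand-in for ProcArrayLock
proc_array = [PGProc(100), PGProc(101)]

def request_cancel(target_pid):
    # Steps 1-4: under the lock, find the PGPROC by pid and set the bit.
    with proc_array_lock:
        for proc in proc_array:
            if proc.pid == target_pid:
                proc.cancel_requested = True
                break
        else:
            return False                   # no such backend
    # Step 5: send SIGUSR1 to target_pid (not actually sent here).  If the
    # pid was recycled between steps 4 and 5, the new backend's PGPROC has
    # the bit clear, so its signal handler does nothing.
    return True

def on_sigusr1(proc):
    # The signalled backend acts only if the bit is set on *its own* slot,
    # which is what makes a misdirected signal harmless.
    return proc.cancel_requested

request_cancel(101)
```

The key property is that the signal itself carries no authority: it only prompts the receiver to check a flag that was set under the lock, so a signal landing on a recycled pid is a no-op.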
Re: [HACKERS] patch for parallel pg_dump
> > [pg_dump_die_horribly.2.diff ] > > > In my hands, the patch complains: > > > In file included from gram.y:13255:0: > scan.c: In function yy_try_NUL_trans: > scan.c:16243:23: warning: unused variable yyg [-Wunused-variable] > pg_backup_archiver.c:3320:1: error: static declaration of > archive_close_connection follows > non-static declaration > pg_backup.h:170:13: note: previous declaration of archive_close_connection > was here > make[3]: *** [pg_backup_archiver.o] Error 1 > make[2]: *** [all-pg_dump-recurse] Error 2 > make[2]: *** Waiting for unfinished jobs > make[1]: *** [all-bin-recurse] Error 2 > make: *** [all-src-recurse] Error 2 > -- make returned 2 - abort > I should add: Centos 5.8, gcc 4.6.3 Erik Rijkers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On Mon, Mar 19, 2012 at 09:49:32PM +0200, Peter Eisentraut wrote:
> On mån, 2012-03-19 at 02:35 +0000, Peter Geoghegan wrote:
> > I see your point of view. I suppose I can privately hold onto the test
> > suite, since it might prove useful again.
>
> I would still like to have those tests checked in, but not run by
> default, in case someone wants to hack on this particular feature again.

Agreed. Also, patch review becomes materially smoother when the author
includes comprehensive tests: when a usage I wish to verify already
appears in the submitted tests, that saves time. I respect the desire to
keep a regular "make check" run lean, but not if it means comprehensive
tests get written only to be buried in the mailing list archives, or
never get submitted at all.
Re: [HACKERS] Incorrect assumptions with low LIMITs
On Mon, Mar 19, 2012 at 6:19 PM, Jeff Davis wrote:
> On Sat, 2012-03-17 at 12:48 +0000, Simon Riggs wrote:
>> The problems are as I described them
>>
>> (1) no account made for sparsity, and other factors leading to an
>> overestimate of rows (N)
>>
>> (2) inappropriate assumption of the effect of LIMIT m, which causes a
>> costly SeqScan to appear better than an IndexScan for low m/N, when in
>> fact that is seldom the case.
>>
>> Overestimating N in (1) inverts the problem, so that an overestimate
>> isn't the safe thing at all.
>
> I think the actual problem has more to do with risk. The planner doesn't
> know how uniform the distribution of the table is, which introduces risk
> for the table scan.
>
> I would tend to agree that for a low selectivity fraction and a very low
> limit (e.g. 1-3 in your example) and a large table, it doesn't seem like
> a good risk to use a table scan. I don't know how that should be modeled
> or implemented though.

FWIW, we have been bitten by the uniformity assumption a number of
times, but we kind of know what to look for. It also has an interesting
interpretation as applied to garbage (insomuch as every query carries a
conceptual "WHERE xmin...xmax..." predicate). Sometimes a table is
mostly garbage for a particular value, and the index scan constantly has
to seek the next tuple in its search to find non-garbage.

I don't know how many times the uniformity assumption has held up fine
(guess: many), but cases where an index condition carries that little
"Filter:" footnote have been a ticking time bomb for a number of people
I've interacted with.
It can be subtle because the amount of scanned data may not be enormous;
upon re-running the query, cache effects make it nowhere near as
pathological as the original run. The only way to make it clearly
visible is to look at EXPLAIN with the BUFFERS option and note how many
pages are discarded by the filter versus how many pages of the
underlying relation and indexes are processed at all.

Were I writing an OLTP query performance linting tool, queries with
partially matched index conditions might even be on my list of things to
warn about. That is unfortunate for cases where the unmatched part of
the index condition may only qualify, say, five records in all
situations, but AFAIK the database has no construct to know and enforce
this.

-- 
fdr
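The diagnostic described above can be sketched with a query like the following. The table and column names are hypothetical, and it needs a live database, so it is illustrative only: the indexed predicate fetches rows, while the non-indexed one surfaces as a Filter: line that can silently discard most of them.

```
-- Hypothetical schema: status is indexed, flags is not, so the
-- planner fetches by status and filters on flags afterwards.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM orders
WHERE status = 'rare' AND flags = 7
LIMIT 3;
```

In the output, compare the Buffers: shared hit/read counts (and, in versions that report it, "Rows Removed by Filter") against the handful of rows actually returned; a large gap is the pattern described above.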
Re: [HACKERS] [PATCH] Support for foreign keys with arrays
On Mon, Mar 19, 2012 at 06:41:39PM +0100, Marco Nenciarini wrote:
> Attached is v5, which should address all the remaining issues.

Looks clean to me.

> On Fri, Mar 16, 2012 at 11:33:12PM -0400, Noah Misch wrote:
> > If the cost doesn't exceed O(F log P), where F is the size of the FK
> > table and P is the size of the PK table, I'm not worried. If it can
> > be O(F^2), we would have a problem to be documented, if not fixed.
>
> We have rewritten the old query in a simpler way; now its cost is
> O(F log P). Here F must represent the size of the "flattened" table,
> that is, the total number of values that must be checked, which seems
> a reasonable assumption in any case.

Great; I find that approach easier to reason about.