Re: [HACKERS] Proposal: Create index on foreign table
(2012/03/17 2:07), David Fetter wrote:
> On Fri, Mar 16, 2012 at 11:58:29AM +0200, Heikki Linnakangas wrote:
>> On 16.03.2012 10:44, Etsuro Fujita wrote:
>>> For a flat file, CREATE INDEX constructs an index in the same way as an
>>> index for a regular table.
>>
>> For starters, how would you keep the index up-to-date when the flat file
>> is modified?
>
> One way is to poll the remote source for evidence of such changes during
> auto_vacuum or with similar daemon processes. Another is by waiting for a
> signal from an external source such as a NOTIFY. Which is more appropriate
> will again depend on circumstances.

I think that's a good idea. I'd like to work on it in the future.

Best regards,
Etsuro Fujita

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Create index on foreign table
(2012/03/16 22:51), Shigeru Hanada wrote:
> 2012/3/16 Etsuro Fujita:
>> The index creation is supported for a flat file such as CSV and a remote
>> table on a RDB, e.g., Postgres, using CREATE INDEX.
>
> IMHO CREATE INDEX for foreign tables should have a general design, not one
> specific to some kind of FDWs.

OK. I'll try to design CREATE INDEX more generally, though I'll at first
focus on those two.

>> I'd like to build the index physical data file for a flat file using the
>> index access methods of regular tables (i.e., btree, hash, gin, gist, and
>> spgist), based on the following transformation between the TID and the
>> file offset to some data in the file:
>>
>>    block_number  = file_offset_to_some_data / BLCKSZ
>>    offset_number = file_offset_to_some_data % BLCKSZ
>
> Indeed this information would help in searching data stored in local
> files. But, again, it seems too specific to file-based FDWs. I'd suggest
> separating the basic general design from the implementation by FDWs. The
> design you show here seems like an indexed file_fdw to me...

This transformation is for a flat file. I think an FDW author who wants to
build the index physical data file for external data may choose any
transformation that defines a one-to-one mapping into the TID space.

Best regards,
Etsuro Fujita

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
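To make the transformation quoted above concrete, here is a minimal sketch of
how an FDW author might map a flat-file byte offset to a TID and back. It is
not taken from any posted patch; the function names are invented for
illustration, and any other one-to-one mapping into the TID space would serve
as well.

    #include "postgres.h"
    #include "storage/itemptr.h"

    /*
     * Hypothetical helpers: block_number = offset / BLCKSZ,
     * offset_number = offset % BLCKSZ, as described in the thread.
     */
    static void
    file_offset_to_tid(off_t file_offset, ItemPointer tid)
    {
        BlockNumber  blkno  = (BlockNumber) (file_offset / BLCKSZ);
        OffsetNumber offnum = (OffsetNumber) (file_offset % BLCKSZ);

        ItemPointerSet(tid, blkno, offnum);
    }

    static off_t
    tid_to_file_offset(ItemPointer tid)
    {
        return (off_t) ItemPointerGetBlockNumber(tid) * BLCKSZ +
               ItemPointerGetOffsetNumber(tid);
    }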
Re: [HACKERS] Why does exprCollation reject List node?
On Fri, Mar 16, 2012 at 10:17 PM, Tom Lane wrote:
> Shigeru HANADA writes:
>> While writing pgsql_fdw code, I noticed that exprCollation rejects
>> non-Expr nodes with the error "unrecognized node type: %d". Is this
>> intentional behavior, or can it return InvalidOid for unrecognized nodes
>> like exprInputCollation?
>
> Doesn't seem to me that asking for the collation of a list is very
> sensible, so I don't see a problem with that.

Oh, I've used the function wrongly. It returns the collation of the result
of the expression, so passing a list doesn't make any sense. The comment of
expression_tree_walker clearly says that it can handle List as well, so
handling List in foreign_expr_walker by calling itself recursively for each
element in the list seems necessary.

Regards,
--
Shigeru Hanada

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
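For what it's worth, a sketch of that recursion is below. The context type
and surrounding checks are placeholders for whatever pgsql_fdw's actual
walker uses, so treat this as illustrative only: the point is simply that a
List is unwrapped element by element, and exprCollation() is never called on
the List itself.

    #include "postgres.h"
    #include "nodes/nodes.h"
    #include "nodes/pg_list.h"

    typedef struct foreign_expr_cxt foreign_expr_cxt;   /* placeholder */

    static bool
    foreign_expr_walker(Node *node, foreign_expr_cxt *context)
    {
        if (node == NULL)
            return true;

        /* Recurse into each member of a List rather than treating it as an Expr */
        if (IsA(node, List))
        {
            ListCell   *lc;

            foreach(lc, (List *) node)
            {
                if (!foreign_expr_walker((Node *) lfirst(lc), context))
                    return false;
            }
            return true;
        }

        /*
         * ... per-node-type checks go here; exprCollation() is only safe on
         * genuine expression nodes ...
         */
        return true;
    }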
Re: [HACKERS] Gsoc2012 Idea --- Social Network database schema
The implementation seems to be done quite fully. There is even a patch file.
Why is the implementation not added into the release of Postgres? As so much
has already been done, what could I do in this case for the GSoC?

On Sun, Mar 18, 2012 at 6:38 PM, Daniel Farina wrote:
> On Sat, Mar 17, 2012 at 8:48 PM, HuangQi wrote:
>> About the second topic, so currently TABLESAMPLE is not implemented
>> inside Postgres? I didn't see this query before, but I googled it just
>> now and the query seems very weird and interesting.
>> http://www.fotia.co.uk/fotia/DY.18.TheTableSampleClause.aspx
>> Still, do you have any mail thread talking about this?
>
> I think there may be a few, but there's a nice implementation plan
> discussed by Neil Conway and written into slides from a few years ago:
>
> http://www.pgcon.org/2007/schedule/attachments/9-Introduction_to_Hacking_PostgreSQL_Neil_Conway.pdf
>
> He also had his implementation, although at this point some of the bitrot
> will be intense:
>
> http://www.neilconway.org/talks/hacking/
>
> I also seem to remember writing this (to some degree) as a student as
> part of a class project, so a full-blown production implementation in a
> summer sounds reasonable, unless someone has thought more about this and
> run into some icebergs. I'm not sure exactly what the blockers were to
> this being committed back in 2007 (not to suggest there weren't any).
>
> I haven't thought enough about skipscan, but there are a number more
> unknowns there to me...
>
> --
> fdr

--
Best Regards
Huang Qi Victor
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On 19 March 2012 01:50, Tom Lane wrote:
> I am *not* a fan of regression tests that try to microscopically test
> every feature in the system.

I see your point of view. I suppose I can privately hold onto the test
suite, since it might prove useful again. I will work on a pg_regress based
approach with a reasonably-sized random subset of about 20 of my existing
tests, to provide some basic smoke testing.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
Peter Geoghegan writes:
> On 19 March 2012 00:10, Andrew Dunstan wrote:
>> Why exactly does this feature need particularly to have script-driven
>> regression test generation when others don't?

> It's not that it needs it, so much as that it is possible to provide
> coverage for much of the code with black-box testing. In the case of
> most of the hundreds of tests, I can point to a particular piece of
> code that is being tested, that was written *after* the test was.

Well, basically what you're saying is that you did test-driven development,
which is fine. However, that does not mean that those same tests are ideal
for ongoing regression testing. What we want from a regression test these
days is primarily (a) portability testing, ie does the feature work on
platforms other than yours?, and (b) early warning if someone breaks it down
the road. In most cases, fairly coarse testing is enough to catch drive-by
breakage; and when it's not enough, like as not the breakage is due to
something you never thought about originally and thus never tested for, so
you'd not have caught it anyway.

I am *not* a fan of regression tests that try to microscopically test every
feature in the system. Sure you should do that when initially developing a
feature, but it serves little purpose to do it over again every time any
other developer runs the regression tests for the foreseeable future. That
road leads to a regression suite that's so voluminous that it takes too long
to run and developers start to avoid running it, which is counterproductive.
For an example in our own problem space look at mysql, whose regression
tests take well over an hour to run on a fast box. So they must be damn near
bug-free right? Uh, not so much, and I think the fact that developers can't
easily run their test suite is not unrelated to that.

So what I'd rather see is a small set of tests that are designed to do a
smoke-test of functionality and then exercise any interfaces to the rest of
the system that seem likely to break. Over time we might augment that, when
we find particular soft spots as a result of previously undetected bugs. But
sheer volume of tests is not a positive IMO.

As for the scripted vs raw-SQL-in-pg_regress question, I'm making the same
point as Andrew: only the pg_regress method is likely to get run nearly
everywhere, which means that the scripted approach is a FAIL so far as the
portability-testing aspect is concerned.

Lastly, even given that we were willing to accept a scripted set of tests,
I'd want to see it in perl not python. Perl is the project standard; I see
no reason to expect developers to learn two different scripting languages to
work on PG. (There might be a case for accepting python-scripted
infrastructure for pl/python, say, but not for components that are 100%
unrelated to python.)

                        regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Command Triggers, patch v11
Peter Eisentraut writes:
> On lör, 2012-03-17 at 18:04 -0400, Tom Lane wrote:
>> I'm not sure what we should do instead. We have gotten push-back before
>> anytime we changed the command tag for an existing command (and in fact
>> it seems that we intentionally added the rowcount display in 9.0, which
>> means there are people out there who care about that functionality).
>> On the other hand, the traditional output for CREATE TABLE AS doesn't
>> seem to satisfy the principle of least astonishment. A third
>> consideration is that if we are pushing CREATE TABLE AS as the preferred
>> spelling, people will probably complain if it omits functionality that
>> SELECT INTO provides; so I'm not sure that "SELECT n" in one case and
>> "CREATE TABLE AS" in the other would be a good idea either. Any
>> opinions what to do here?

> Another consideration is that the SQL command tags are defined by the
> SQL standard. So if we were to change it, then it should be "CREATE
> TABLE". I'm not convinced that it should be changed, though. (I recall
> cross-checking our list against the SQL standard in the past, so there
> might have been discussion on this already.)

If we were going to change the output at all, I would vote for "CREATE TABLE
n" so as to preserve the rowcount functionality. Keep in mind though that
this would force client-side changes, for instance in libpq's PQcmdTuples().
Fixing that one routine isn't so painful, but what of other client-side
libraries, not to mention applications?

                        regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
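For readers unfamiliar with the client-side dependency being discussed, the
small libpq example below shows what a client sees today; the table name and
query are invented, and the exact tag reported for CREATE TABLE AS depends on
the server release (the thread notes the rowcount display was added in 9.0).
PQcmdTuples() is the routine that would have to learn about any new tag
format before it could report a row count for it.

    #include <stdio.h>
    #include <libpq-fe.h>

    /*
     * PQcmdStatus() returns the raw command tag; PQcmdTuples() extracts the
     * trailing row count when it recognizes the tag, and returns an empty
     * string otherwise.
     */
    static void
    show_ctas_tag(PGconn *conn)
    {
        PGresult   *res = PQexec(conn,
            "CREATE TABLE ctas_demo AS SELECT g FROM generate_series(1, 10) g");

        if (PQresultStatus(res) == PGRES_COMMAND_OK)
            printf("tag = \"%s\", rows = \"%s\"\n",
                   PQcmdStatus(res), PQcmdTuples(res));
        PQclear(res);
    }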
Re: [HACKERS] Command Triggers, patch v11
Dimitri Fontaine writes:
> That lights a bulb: what about rewriting CREATE TABLE AS as two
> commands, internally:

Given the compatibility constraints on issues like what command tag to
return, I think that would probably make our jobs harder not easier.

                        regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On 19 March 2012 00:10, Andrew Dunstan wrote:
> If your tests are that voluminous then maybe they are not what we're
> looking for anyway. As Tom noted:
>
> IMO the objective of a regression test is not to memorialize every single
> case the code author thought about during development. ISTM it would not
> take very many cases to have reasonable code coverage.

Fair enough.

> Why exactly does this feature need particularly to have script-driven
> regression test generation when others don't?

It's not that it needs it, so much as that it is possible to provide
coverage for much of the code with black-box testing. In the case of most of
the hundreds of tests, I can point to a particular piece of code that is
being tested, that was written *after* the test was. Doing this with
pg_regress the old-fashioned way is going to be incredibly verbose.

I'm all for doing script-generation of pg_regress tests in a well-principled
way, and I'm happy to take direction from others as to what that should look
like. I know that for the most part the tests provide coverage for discrete
units of functionality, and so add value. If they add value, why not include
them? Tests are supposed to be comprehensive. If that inconveniences you, by
slowing down the buildfarm for questionable benefits, maybe it would be okay
to have some tests not run automatically, even if that did make them "next
door to useless" in Tom's estimation.

There could be a more limited set of conventional pg_regress tests that are
run automatically, plus more comprehensive tests that are run less
frequently, typically only as it becomes necessary to alter
pg_stat_statements to take account of those infrequent changes (typically
additions) to the query tree. We have tests that ensure that header files
don't contain C++ keywords, and nominally promise to not do so, and they are
not run automatically.

I don't see the sense in requiring that tests should be easy to run, while
also aspiring to have tests that are as useful and comprehensive as
possible. It seems like the code should dictate the testing infrastructure,
and not the other way around.

Part of the reason why I'm resistant to reducing the number of tests is that
it seems to me that excluding some tests but not others would be quite
arbitrary. It is not the case that some tests are clearly more useful than
others (except for the fuzz testing stuff, which probably isn't all that
useful).

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On 03/18/2012 07:46 PM, Peter Geoghegan wrote:
> On 18 March 2012 22:46, Andrew Dunstan wrote:
>> If you want to generate the tests using some tool, then use whatever
>> works for you, be it Python or Perl or Valgol, but ideally what is
>> committed (and this is what should be in your patch) will be the SQL
>> output of that, not the generator plus input.
>
> The reason that I'd prefer to use Perl or even Python to generate
> pg_regress input, and then have that infrastructure committed is because
> it's a lot more natural and succinct to deal with the problem that way. I
> would have imagined that a patch that repeats the same boilerplate again
> and again, to test almost every minor facet of normalisation would be
> frowned upon. However, if you prefer that, it can easily be accommodated.

If your tests are that voluminous then maybe they are not what we're looking
for anyway. As Tom noted:

   IMO the objective of a regression test is not to memorialize every single
   case the code author thought about during development. ISTM it would not
   take very many cases to have reasonable code coverage.

Why exactly does this feature need particularly to have script-driven
regression test generation when others don't? If this is a general pattern
that people want to follow, then maybe we need to plan and support it rather
than just add a random test generation script here and there.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On 18 March 2012 22:46, Andrew Dunstan wrote:
> If you want to generate the tests using some tool, then use whatever works
> for you, be it Python or Perl or Valgol, but ideally what is committed
> (and this is what should be in your patch) will be the SQL output of that,
> not the generator plus input.

The reason that I'd prefer to use Perl or even Python to generate pg_regress
input, and then have that infrastructure committed is because it's a lot
more natural and succinct to deal with the problem that way. I would have
imagined that a patch that repeats the same boilerplate again and again, to
test almost every minor facet of normalisation would be frowned upon.
However, if you prefer that, it can easily be accommodated.

The best approach might be to commit the output of the Python script as well
as the Python script itself, with some clean-up work. That way, no one is
actually required to run the Python script themselves as part of a standard
build, and so they have no basis to complain about additional dependencies.
We can run the regression tests from the buildfarm without any additional
infrastructure to invoke the Python script to generate the pg_regress tests
each time.

When the time comes to change the representation of the query tree, which is
not going to be that frequent an event, but will occur every once in a
while, the author of the relevant patch should think to add some tests to my
existing set, and verify that they pass. That's going to be made a lot
easier by having them edit a file that expresses the problem in terms of
whether two queries should be equivalent or distinct, or what a given
query's final canonicalised representation should look like, all with
minimal boilerplate. I'm only concerned with making the patch as easy as
possible to maintain.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On 03/18/2012 06:12 PM, Peter Geoghegan wrote:
> On 18 March 2012 16:13, Tom Lane wrote:
>> Is there a really strong reason why adequate regression testing isn't
>> possible in a plain-vanilla pg_regress script? A quick look at the script
>> says that it's just doing some SQL commands and then checking the results
>> of queries on the pg_stat_statements views. Admittedly the output would
>> be bulkier in pg_regress, which would mean that we'd not likely want
>> several hundred test cases. But IMO the objective of a regression test is
>> not to memorialize every single case the code author thought about during
>> development. ISTM it would not take very many cases to have reasonable
>> code coverage.
>
> Hmm. It's difficult to have much confidence that a greatly reduced number
> of test cases ought to provide sufficient coverage. I don't disagree with
> your contention, I just don't know how to judge this matter. Given that
> there isn't really a maintenance burden with regression tests, I imagine
> that that makes it compelling to be much more inclusive.
>
> The fact that we rely on there being no concurrent queries might have to
> be worked around for parallel scheduled regression tests, such as by doing
> everything using a separate database, with that database oid always in the
> predicate of the query checking the pg_stat_statements view.
>
> I probably would have written the tests in Perl in the first place, but I
> don't know Perl. These tests existed in some form from day 1, as I
> followed a test-driven development methodology, and needed to use a
> language that I could be productive in immediately. There is probably no
> reason why they cannot be re-written in Perl, but spit out pg_regress
> tests, compacting the otherwise-verbose pg_regress input. Should I cut my
> teeth on Perl by writing the tests to do so? How might this be integrated
> with the standard regression tests, if that's something that is important?

A pg_regress script doesn't require any perl. It's pure SQL.

It is perfectly possible to make a single test its own group in a parallel
schedule, and this is done now for a number of cases. See
src/test/regress/parallel_schedule. Regression tests run in their own
database set up for the purpose. You should be able to restrict the
regression queries to only the current database.

If you want to generate the tests using some tool, then use whatever works
for you, be it Python or Perl or Valgol, but ideally what is committed (and
this is what should be in your patch) will be the SQL output of that, not
the generator plus input.

Tests built that way get automatically run by the buildfarm. Tests that
don't use the standard testing framework don't. You need a *really* good
reason, therefore, not to do it that way.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On 18 March 2012 16:13, Tom Lane wrote:
> Is there a really strong reason why adequate regression testing isn't
> possible in a plain-vanilla pg_regress script? A quick look at the
> script says that it's just doing some SQL commands and then checking the
> results of queries on the pg_stat_statements views. Admittedly the
> output would be bulkier in pg_regress, which would mean that we'd not
> likely want several hundred test cases. But IMO the objective of a
> regression test is not to memorialize every single case the code author
> thought about during development. ISTM it would not take very many
> cases to have reasonable code coverage.

Hmm. It's difficult to have much confidence that a greatly reduced number of
test cases ought to provide sufficient coverage. I don't disagree with your
contention, I just don't know how to judge this matter. Given that there
isn't really a maintenance burden with regression tests, I imagine that that
makes it compelling to be much more inclusive.

The fact that we rely on there being no concurrent queries might have to be
worked around for parallel scheduled regression tests, such as by doing
everything using a separate database, with that database oid always in the
predicate of the query checking the pg_stat_statements view.

I probably would have written the tests in Perl in the first place, but I
don't know Perl. These tests existed in some form from day 1, as I followed
a test-driven development methodology, and needed to use a language that I
could be productive in immediately. There is probably no reason why they
cannot be re-written in Perl, but spit out pg_regress tests, compacting the
otherwise-verbose pg_regress input. Should I cut my teeth on Perl by writing
the tests to do so? How might this be integrated with the standard
regression tests, if that's something that is important?

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Command Triggers, patch v11
On lör, 2012-03-17 at 18:04 -0400, Tom Lane wrote:
> I'm not sure what we should do instead. We have gotten push-back before
> anytime we changed the command tag for an existing command (and in fact
> it seems that we intentionally added the rowcount display in 9.0, which
> means there are people out there who care about that functionality).
> On the other hand, the traditional output for CREATE TABLE AS doesn't
> seem to satisfy the principle of least astonishment. A third
> consideration is that if we are pushing CREATE TABLE AS as the preferred
> spelling, people will probably complain if it omits functionality that
> SELECT INTO provides; so I'm not sure that "SELECT n" in one case and
> "CREATE TABLE AS" in the other would be a good idea either. Any
> opinions what to do here?

Another consideration is that the SQL command tags are defined by the SQL
standard. So if we were to change it, then it should be "CREATE TABLE". I'm
not convinced that it should be changed, though. (I recall cross-checking
our list against the SQL standard in the past, so there might have been
discussion on this already.)

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Command Triggers, patch v11
Tom Lane writes:
> 1. ExecuteQuery still has to know that's a CREATE TABLE AS operation so
> that it can enforce that the prepared query is a SELECT. (BTW, maybe
> this should be weakened to "something that returns tuples", in view of
> RETURNING?)

That lights a bulb: what about rewriting CREATE TABLE AS as two commands,
internally:

  1. CREATE TABLE …
  2. WITH x AS ( … ) INSERT INTO … SELECT * FROM x;

It seems like it's not solving much from a practical point of view, though,
as you need to be able to parse the output of the subquery before you can do
the first step, but on the other hand it might be that you already have the
pieces to just do that.

Well, I had to share that thought, somehow.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Command Triggers, patch v11
BTW, I've been looking through how to do what I suggested earlier to get rid
of the coziness and code duplication between CreateTableAs and the prepare.c
code; namely, let CreateTableAs create a DestReceiver and then call
ExecuteQuery with that receiver. It appears that we still need at least two
bits of added complexity with that approach:

1. ExecuteQuery still has to know that's a CREATE TABLE AS operation so that
it can enforce that the prepared query is a SELECT. (BTW, maybe this should
be weakened to "something that returns tuples", in view of RETURNING?)

2. Since ExecuteQuery goes through the Portal machinery, there's no way for
it to pass in eflags to control the table OIDs setup. This is easily solved
by adding an eflags parameter to PortalStart, which doesn't seem too awful
to me, since the typical caller would just pass zero. (ExecuteQuery would
also have to know about testing interpretOidsOption to compute the eflags to
use, unless we add an eflags parameter to it, which might be the cleaner
option.)

In short I'm thinking: add an eflags parameter to PortalStart, and add both
an eflags parameter and a "bool mustReturnTuples" parameter to ExecuteQuery.
This definitely seems cleaner than the current duplication of code.

                        regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: Cross-backend signals and administration (Was: Re: pg_terminate_backend for same-role)
On Sat, Mar 17, 2012 at 05:28:11PM -0700, Daniel Farina wrote:
> Noah offered me these comments:
>> This patch still changes the policy for pg_terminate_backend(), and it
>> does not fix other SIGINT senders like processCancelRequest() and
>> ProcSleep(). If you're concerned about PID-reuse races, audit all backend
>> signalling. Either fix all such problems or propose a plan to get there
>> eventually.
>
> Is the postmaster signaling its children intrinsically vulnerable to
> PID racing? Because it controls when it can call wait() or waitpid()
> on child processes, it can unambiguously know that PIDs have not been
> cycled for use.

Agreed, for Unix anyway.

> For this reason, a credible and entirely alternate design might be to
> bounce IPC requests through the postmaster, but since postmaster is so
> critical I had decided not to introduce nor change mechanics there.

Good point, but I also agree with your decision there.

> The Postmaster I think keeps a private copy of cancellation keys that
> are not in shared memory, if I read it properly (not 100% sure), and
> uses that for cancellation requests. This has a useful property of
> allowing cancellations even in event that shared memory goes insane
> (and since postmaster is typically left as last sane process of the
> group I thought it wise to not have it reuse a shared-memory based
> approach).

Yes.

>> Currently, when pg_terminate_backend() follows a pg_cancel_backend() on
>> which the target has yet to act, the eventual outcome is a terminated
>> process. With this patch, the pg_terminate_backend() becomes a no-op
>> with this warning:
>>
>>> !            ereport(WARNING,
>>> !                    (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>>> !                     (errmsg("process is busy responding to administrative "
>>> !                             "request")),
>>> !                     (errhint("This is temporary, and may be retried.";
>>
>> That's less useful than the current behavior.
>
> Yes. It could be fixed with dynamic allocation (holding more
> administration requests), but for just getting a flavor of what a
> solution might be like. I wanted to avoid additional dynamic allocation
> (which would necessitate a similar condition in the form of much-less
> likely OOM), but at some point I think this error condition is inevitable
> in some form. I see it as akin to EAGAIN. Right now, administrative
> requests are so short (copying and clearing a handful of words out of
> PGPROC) that it's unlikely that this would be a problem in practice.

I nominally agree that the new race would be rare, but not rarer than the
race this patch purposes to remove. You could also fix this by having the
sender wait until the target is ready to accept an admin request.

For the particular case of cancel/terminate, a terminate could overwrite a
cancel; a cancel can reduce to a no-op when either request is pending. I
share your interest in not tying a design to the narrow needs of
cancel/terminate, though.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Memory usage during sorting
On Sun, Mar 18, 2012 at 8:56 AM, Greg Stark wrote:
> On Sun, Mar 18, 2012 at 3:25 PM, Tom Lane wrote:
>>>> No, it's about reducing the number of comparisons needed to maintain
>>>> the heap property.
>>
>>> That sounds very interesting. I didn't know it was even theoretically
>>> possible to do that.
>>
>> So has somebody found a hole in the n log n lower bound on the cost of
>> comparison-based sorting? I thought that had been proven pretty
>> rigorously.
>
> I think what he's describing is, for example, say you have a heap of
> 1,000 elements and that lets you do the entire sort in a single pass --
> i.e. it generates a single sorted run after the first pass. Increasing
> the heap size to 2,000 will just mean doing an extra comparison for each
> tuple to sift up. And increasing the heap size further will just do more
> and more work for nothing. It's still nlogn but the constant factor is
> slightly higher. Of course to optimize this you would need to be able to
> see the future.
>
> I always thought the same behaviour held if the heap was larger than
> necessary to do N merge passes. That is, making the heap larger might
> generate fewer tapes but still enough to require the same number of
> passes. However, trying to work out an example, I'm not too sure. If you
> generate fewer tapes then the heap you use to do the merge is smaller, so
> it might work out the same (or more likely, you might come out ahead).

Except for rounding effects, it does come out the same. Every extra layer on
the tuple-heap during the initial run building causes a reduced layer on the
tape-heap during the merge. So they wash.

I haven't analyzed the exact rounding effects in detail, but by just
instrumenting the code I found a huge tuple-heap with a two-tape merge at
the end used less than one percent more comparisons, but was >40% slower,
than a lower-memory sort which used a modest sized tuple-heap followed by a
modest-size tape-heap merge. My conclusion is that it isn't the number of
comparisons that drove the difference, but the number of on-chip cache
misses. Two modest sized heaps are more cache friendly than one giant heap
and one tiny heap.

Of course if the data is partially sorted in a way that is apparent over
large ranges but not short ranges, the larger initial heap will capture that
during the initial run construction while the smaller one will not.

Cheers,

Jeff

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
Peter Geoghegan writes:
> Is there anything that I could be doing to help bring this patch
> closer to a committable state?

Sorry, I've not actually looked at that patch yet. I felt I should push on
Andres' CTAS patch first, since that's blocking progress on the command
triggers patch.

> I'm thinking of the tests in particular - do you suppose it's acceptable
> to commit them more or less as-is?

If they rely on having python, that's a 100% guaranteed rejection in my
opinion. It's difficult enough to sell people on incremental additions of
perl dependencies to the build/test process. Bringing in an entire new
scripting language seems like a nonstarter.

I suppose we could commit such a thing as an appendage that doesn't get run
in standard builds, but then I see little point in it at all. Tests that
don't get run regularly are next door to useless.

Is there a really strong reason why adequate regression testing isn't
possible in a plain-vanilla pg_regress script? A quick look at the script
says that it's just doing some SQL commands and then checking the results of
queries on the pg_stat_statements views. Admittedly the output would be
bulkier in pg_regress, which would mean that we'd not likely want several
hundred test cases. But IMO the objective of a regression test is not to
memorialize every single case the code author thought about during
development. ISTM it would not take very many cases to have reasonable code
coverage.

                        regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Memory usage during sorting
On Sun, Mar 18, 2012 at 3:25 PM, Tom Lane wrote:
>>> No, it's about reducing the number of comparisons needed to maintain
>>> the heap property.
>
>> That sounds very interesting. I didn't know it was even theoretically
>> possible to do that.
>
> So has somebody found a hole in the n log n lower bound on the cost of
> comparison-based sorting? I thought that had been proven pretty
> rigorously.

I think what he's describing is, for example, say you have a heap of 1,000
elements and that lets you do the entire sort in a single pass -- i.e. it
generates a single sorted run after the first pass. Increasing the heap size
to 2,000 will just mean doing an extra comparison for each tuple to sift up.
And increasing the heap size further will just do more and more work for
nothing. It's still nlogn but the constant factor is slightly higher. Of
course to optimize this you would need to be able to see the future.

I always thought the same behaviour held if the heap was larger than
necessary to do N merge passes. That is, making the heap larger might
generate fewer tapes but still enough to require the same number of passes.
However, trying to work out an example, I'm not too sure. If you generate
fewer tapes then the heap you use to do the merge is smaller, so it might
work out the same (or more likely, you might come out ahead).

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
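As a rough illustration of the per-tuple cost Greg describes, here is a toy
min-heap sift-down on plain integers; it is purely illustrative and is not
tuplesort's code (which works on SortTuples with run numbers). The point is
only that restoring the heap property walks one level per iteration, so each
doubling of the heap adds another level's worth of comparisons for every
element pushed through it.

    #include <stddef.h>

    /* Toy min-heap sift-down: up to two comparisons per level descended. */
    static void
    sift_down(int *heap, size_t nelems, size_t i)
    {
        for (;;)
        {
            size_t  left = 2 * i + 1;
            size_t  right = 2 * i + 2;
            size_t  smallest = i;
            int     tmp;

            if (left < nelems && heap[left] < heap[smallest])
                smallest = left;            /* comparison 1 at this level */
            if (right < nelems && heap[right] < heap[smallest])
                smallest = right;           /* comparison 2 at this level */
            if (smallest == i)
                break;                      /* heap property restored */

            tmp = heap[i];
            heap[i] = heap[smallest];
            heap[smallest] = tmp;
            i = smallest;                   /* descend one level and repeat */
        }
    }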
Re: [HACKERS] Memory usage during sorting
Jeremy Harris writes:
> On 2012-03-18 15:25, Tom Lane wrote:
>> Yeah, that was me, and it came out of actual user complaints ten or more
>> years back. (It's actually not 2X growth but more like 4X growth
>> according to the comments in logtape.c, though I no longer remember the
>> exact reasons why.) We knew when we put in the logtape logic that we
>> were trading off speed for space, and we accepted that.

> How about a runtime check of disk-free?

Not very workable, the foremost reason why not being that it assumes that
the current sort is the only thing consuming disk space. I'm not sure there
is any portable method of finding out how much disk space is available,
anyway. (Think user quotas and such before supposing you know how to do
that.)

                        regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Recent MinGW postgres builds with -O2 do not pass regression tests
On 03/18/2012 11:12 AM, Yeb Havinga wrote:
> When building a minimal PostgreSQL under the latest mingw (2018), make
> check will give a few dozen fails with the server exiting on code 2. The
> build is fine when -O2 is removed from the CFLAGS. This behaviour is
> present on all revs on the REL9_1_STABLE branch that I tested, among
> which were 9.1.3 and 9.1.0.

That makes it look like a mingw bug to me. Have you tried with the latest 32
bit compiler from mingw-w64, which appears to be 20111219?

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Memory usage during sorting
On 2012-03-18 15:25, Tom Lane wrote:
> Jeff Janes writes:
>> On Wed, Mar 7, 2012 at 11:55 AM, Robert Haas wrote:
>> The problem there is that none of the files can be deleted until it was
>> entirely read, so you end up with all the data on disk twice. I don't
>> know how often people run their databases so close to the edge on disk
>> space that this matters, but someone felt that that extra storage was
>> worth avoiding.
>
> Yeah, that was me, and it came out of actual user complaints ten or more
> years back. (It's actually not 2X growth but more like 4X growth
> according to the comments in logtape.c, though I no longer remember the
> exact reasons why.) We knew when we put in the logtape logic that we
> were trading off speed for space, and we accepted that.

How about a runtime check of disk-free?

--
Jeremy

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Memory usage during sorting
Jeff Janes writes:
> On Wed, Mar 7, 2012 at 11:55 AM, Robert Haas wrote:
>> On Sat, Mar 3, 2012 at 4:15 PM, Jeff Janes wrote:
>>> Anyway, I think the logtape could use redoing.

> The problem there is that none of the files can be deleted until it was
> entirely read, so you end up with all the data on disk twice. I don't
> know how often people run their databases so close to the edge on disk
> space that this matters, but someone felt that that extra storage was
> worth avoiding.

Yeah, that was me, and it came out of actual user complaints ten or more
years back. (It's actually not 2X growth but more like 4X growth according
to the comments in logtape.c, though I no longer remember the exact reasons
why.) We knew when we put in the logtape logic that we were trading off
speed for space, and we accepted that.

It's possible that with the growth of hard drive sizes, real-world
applications would no longer care that much about whether the space required
to sort is 4X data size rather than 1X. Or then again, maybe their data has
grown just as fast and they still care.

> As a desirable side effect, I think it would mean that we could dispense
> with retail palloc and pfree altogether. We could just allocate a big
> chunk of memory, copy tuples into it until it's full, using a pointer to
> keep track of the next unused byte, and then, after writing the run,
> reset the allocation pointer back to the beginning of the buffer. That
> would not only avoid the cost of going through palloc/pfree, but also the
> memory overhead imposed by bookkeeping and power-of-two rounding.

That would be worthwhile, probably. The power-of-2 rounding in palloc is not
nice at all for tuplesort's purposes. We've occasionally talked about
inventing a different memory context type that is less willing to waste
space ... but on the other hand, such an allocator would run slower, so
you're right back to making a speed for space tradeoff. If the tuples could
be deallocated in bulk then that catch-22 goes away.

>> No, it's about reducing the number of comparisons needed to maintain
>> the heap property.

> That sounds very interesting. I didn't know it was even theoretically
> possible to do that.

So has somebody found a hole in the n log n lower bound on the cost of
comparison-based sorting? I thought that had been proven pretty rigorously.

                        regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
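The bulk-allocation idea quoted above is simple enough to sketch. The names
and structure below are invented for illustration and are not taken from any
patch: tuples are copied into one big buffer with a pointer tracking the next
unused byte, and the whole buffer is reset at once after the run is written
out, so there is no per-tuple pfree and no power-of-two rounding overhead.

    #include <stdlib.h>
    #include <string.h>

    typedef struct
    {
        char   *buf;        /* one big allocation, sized from work_mem */
        size_t  size;       /* total bytes available */
        size_t  used;       /* offset of the next unused byte */
    } RunArena;

    static RunArena
    arena_create(size_t size)
    {
        RunArena a;

        a.buf = malloc(size);
        a.size = (a.buf != NULL) ? size : 0;
        a.used = 0;
        return a;
    }

    static void *
    arena_copy_tuple(RunArena *a, const void *tuple, size_t len)
    {
        char   *dst;

        if (a->used + len > a->size)
            return NULL;    /* arena full: caller writes out the run first */
        dst = a->buf + a->used;
        memcpy(dst, tuple, len);
        a->used += len;     /* bump the pointer; nothing to free later */
        return dst;
    }

    static void
    arena_reset(RunArena *a)
    {
        a->used = 0;        /* after the run hits the tape, reuse it all */
    }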
[HACKERS] Recent MinGW postgres builds with -O2 do not pass regression tests
When building a minimal PostgreSQL under the latest mingw (2018), make check
will give a few dozen fails with the server exiting on code 2. The build is
fine when -O2 is removed from the CFLAGS. This behaviour is present on all
revs on the REL9_1_STABLE branch that I tested, among which were 9.1.3 and
9.1.0.

$ gcc -v
Using built-in specs.
COLLECT_GCC=C:\MinGW\bin\gcc.exe
COLLECT_LTO_WRAPPER=c:/mingw/bin/../libexec/gcc/mingw32/4.6.1/lto-wrapper.exe
Target: mingw32
Configured with: ../gcc-4.6.1/configure --enable-languages=c,c++,fortran,objc,obj-c++
--disable-sjlj-exceptions --with-dwarf2 --enable-shared --enable-libgomp
--disable-win32-registry --enable-libstdcxx-debug
--enable-version-specific-runtime-libs --build=mingw32 --prefix=/mingw
Thread model: win32
gcc version 4.6.1 (GCC)

regards,
Yeb

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sortsupport for text
Greg Stark writes:
> I'm still curious how it would compare to call strxfrm and sort the
> resulting binary blobs.

In principle that should be a win; it's hard to believe that strxfrm would
have gotten into the standards if it were not a win for sorting
applications.

> I don't think the sortsupport stuff actually makes this any easier
> though. Since using it requires storing the binary blob somewhere I think
> the support would have to be baked into tuplesort (or hacked into the
> sortkey as an expr that was evaluated earlier somehow).

Well, obviously something has to be done, but I think it might be possible
to express this as another sortsupport API function rather than doing
anything as ugly as hardwiring strxfrm into the callers.

However, it occurred to me that we could pretty easily jury-rig something
that would give us an idea about the actual benefit available here. To wit:
make a C function that wraps strxfrm, basically strxfrm(text) returns bytea.
Then compare the performance of ORDER BY text_col to ORDER BY
strxfrm(text_col). (You would need to have either both or neither of text
and bytea using the sortsupport code paths for this to be a fair
comparison.)

One other thing I've always wondered about in this connection is the general
performance of sorting toasted datums. Is it better to detoast them in every
comparison, or pre-detoast to save comparison cycles at the cost of having
to push much more data around? I didn't see any discussion of this point in
Robert's benchmarks, but I don't think we should go very far towards
enabling sortsupport for text until we understand the issue and know whether
we need to add more infrastructure for it. If you cross your eyes a little
bit, this is very much like the strxfrm question...

                        regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
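A rough cut at the jury-rig described above might look like the sketch below.
It glosses over encoding, collation, and error handling, the function name is
invented, and the SQL-level declaration (something along the lines of CREATE
FUNCTION text_strxfrm(text) RETURNS bytea ... LANGUAGE C STRICT IMMUTABLE) is
left out; it is only meant to be enough to run the ORDER BY comparison Tom
suggests.

    #include "postgres.h"
    #include "fmgr.h"
    #include "utils/builtins.h"

    PG_MODULE_MAGIC;

    PG_FUNCTION_INFO_V1(text_strxfrm);

    /*
     * text_strxfrm(text) returns bytea: run the input through strxfrm() so
     * that ORDER BY text_strxfrm(col) sorts the transformed blobs with
     * byte-wise comparisons, for comparison against plain ORDER BY col.
     */
    Datum
    text_strxfrm(PG_FUNCTION_ARGS)
    {
        text       *src = PG_GETARG_TEXT_PP(0);
        char       *str = text_to_cstring(src);
        size_t      len = strxfrm(NULL, str, 0);        /* required size */
        bytea      *result = (bytea *) palloc(VARHDRSZ + len + 1);

        (void) strxfrm(VARDATA(result), str, len + 1);
        SET_VARSIZE(result, VARHDRSZ + len);            /* drop trailing NUL */

        pfree(str);
        PG_RETURN_BYTEA_P(result);
    }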
Re: [HACKERS] pg_prewarm
> Would be nice to sort out the features of the two Postgres extensions
> pgfincore (https://github.com/klando/pgfincore ) and pg_prewarm: what do
> they have in common, what is complementary?

pg_prewarm uses PostgreSQL functions (the buffer manager) to warm data
(there are different kinds of 'warm'; see the pg_prewarm code). Relations
are warmed block by block, for a range of blocks.

pgfincore does not use the PostgreSQL buffer manager; it uses POSIX calls.
It can proceed per block or per full relation.

Both need a POSIX_FADVISE-compatible system to be efficient. The main
difference between pgfincore and pg_prewarm for full-relation warming is
that pgfincore will make very few system calls where pg_prewarm will make
many more. The current implementation of pgfincore allows making a snapshot
and restoring it via pgfincore or via pg_prewarm (the latter just needs some
SQL-fu).

> I would be happy to test both. But when reading the current documentation
> I'm missing installation requirements (PG version, replication?
> memory/hardware requirements), specifics of Linux (and Windows if
> supported), and some config. hints (e.g. relationships/dependencies of OS
> cache and PG cache and postgresql.conf).

pgfincore works with all PostgreSQL stable releases. Probably the same for
pg_prewarm. In both cases: make && make install, then an SQL file to load
for <= 9.0. With 9.1, once you've built and installed it, just CREATE
EXTENSION pgfincore; (probably the same with pg_prewarm).

> -Stefan
>
> 2012/3/11 Cédric Villemain :
>> Le vendredi 9 mars 2012 16:50:05, Robert Haas a écrit :
>>> On Fri, Mar 9, 2012 at 10:33 AM, Dimitri Fontaine wrote:
>>>> So that's complementary with pgfincore, ok. I still wish we could
>>>> maintain the RAM content HOT on the standby in the same way we are
>>>> able to maintain its data set on disk, though.
>>>
>>> That's an interesting idea. It seems tricky, though.
>>
>> it is the purpose of the latest pgfincore version. I use a varbit as
>> output of introspection on master, then you are able to store it in a
>> table, stream to slaves, then replay locally.
>>
>> --
>> Cédric Villemain +33 (0)6 20 30 22 52
>> http://2ndQuadrant.fr/
>> PostgreSQL: Support 24x7 - Développement, Expertise et Formation
>>
>> --
>> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-hackers

--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation
Re: [HACKERS] Cross-backend signals and administration (Was: Re: pg_terminate_backend for same-role)
On Sun, Mar 18, 2012 at 01:28, Daniel Farina wrote:
> Noah offered me these comments:
>> This patch still changes the policy for pg_terminate_backend(), and it
>> does not fix other SIGINT senders like processCancelRequest() and
>> ProcSleep(). If you're concerned about PID-reuse races, audit all
>> backend signalling. Either fix all such problems or propose a plan to
>> get there eventually.
>
> Is the postmaster signaling its children intrinsically vulnerable to
> PID racing? Because it controls when it can call wait() or waitpid() on
> child processes, it can unambiguously know that PIDs have not been
> cycled for use. For this reason, a credible and entirely alternate

As a note for future work, I don't think this assumption holds on win32. We
have a background thread there that picks up "child dead" events, and posts
those on an asynchronous queue (see pgwin32_deadchild_callback).

And even if we didn't, I'm not sure the *process id* is "blocked" until you
wait on it. There is no "zombie state" for processes on win32 - it dies, and
the process handle becomes signaled (note that this is the process *handle*,
and not the process id. There may be multiple handles opened to the same
process, but the process itself goes away as soon as they are switched to
signalled mode, even if nobody was paying attention).

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Gsoc2012 Idea --- Social Network database schema
On Sat, Mar 17, 2012 at 8:48 PM, HuangQi wrote:
> About the second topic, so currently TABLESAMPLE is not implemented
> inside Postgres? I didn't see this query before, but I googled it just
> now and the query seems very weird and interesting.
> http://www.fotia.co.uk/fotia/DY.18.TheTableSampleClause.aspx
> Still, do you have any mail thread talking about this?

I think there may be a few, but there's a nice implementation plan discussed
by Neil Conway and written into slides from a few years ago:

http://www.pgcon.org/2007/schedule/attachments/9-Introduction_to_Hacking_PostgreSQL_Neil_Conway.pdf

He also had his implementation, although at this point some of the bitrot
will be intense:

http://www.neilconway.org/talks/hacking/

I also seem to remember writing this (to some degree) as a student as part
of a class project, so a full-blown production implementation in a summer
sounds reasonable, unless someone has thought more about this and run into
some icebergs. I'm not sure exactly what the blockers were to this being
committed back in 2007 (not to suggest there weren't any).

I haven't thought enough about skipscan, but there are a number more
unknowns there to me...

--
fdr

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers