Re: [HACKERS] Don't allow relative path for copy from file
From: Tom Lane [mailto:t...@sss.pgh.pa.us]

Etsuro Fujita fujita.ets...@lab.ntt.co.jp writes: As described in the reference manual for COPY, we should check the file's path format so as not to allow relative paths. Please find attached a patch.

The argument for disallowing writing to a relative path is to make it harder to accidentally overwrite a database file. That argument does not apply to COPY IN, so I'm not convinced we should impose an additional restriction. It's not out of the question that this would break real-world use cases --- imagine someone whose workflow involves copying data files across a network to a directory accessible to the server (and quite possibly specified by a relative path) and then doing COPY IN.

In any case, this patch is missing documentation updates, specifically the paragraph in the COPY reference page that it falsifies.

Agreed. I'd like to withdraw the patch sent in the earlier post, and instead propose updating the documentation in the COPY reference page. Please find attached a patch.

Thanks,

Best regards,
Etsuro Fujita

copy_ref_page.patch
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] feature request: auto savepoint for interactive psql when in transaction.
On Mon, Nov 14, 2011 at 3:05 PM, Ross Reedstrom reeds...@rice.edu wrote:

On Mon, Nov 14, 2011 at 02:45:04PM -0800, Will Leinweber wrote: My coworker Dan suggested that some people copy and paste scripts. However I feel that that is an orthogonal problem, and if there is a very high rate of input psql should detect that and turn interactive off. And I still strongly feel that on_error_rollback=interactive should be the default.

Hmm, I think that falls under the "don't do that, then" use case. I've been known to cp the occasional script - I guess the concern here would be not seeing failed steps that scrolled off the terminal. (I set my scrollback to basically infinity and actually use it, but then I'm strange that way :-) )

I do this and have done this all the time. Because emacs. On the other hand, I only really do it in line-buffered modes.

I also feel there is something of a development/production parity that is broken by this, but then again, so are backslash commands interpreted by psql, and that has never proven to be a practical problem. That I know of. I wouldn't let my particular use case (M-x shell) get in the way of changing the default if that was the consensus, because I think this would help a lot more people than hurt.

In the discoverability department, one can avoid hacking the server error message by having psql emit its own hint when it receives an error while in a transaction block (vs auto-commit). This is knowable via the ReadyForQuery message, which can tell you "idle in transaction".

-- fdr

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
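For reference, the behavior under discussion can already be tried by setting the psql variable explicitly; here is a rough sketch of an interactive session (table name invented, output abbreviated):

\set ON_ERROR_ROLLBACK interactive

BEGIN;
CREATE TABLE t (i int);
INSERT INTO t VALUES (1);
INSERT INTO t VALUES ('oops');   -- fails, but psql wrapped it in a savepoint
INSERT INTO t VALUES (2);        -- still accepted; the transaction is not aborted
COMMIT;
SELECT * FROM t;                 -- rows 1 and 2 survive

Without the variable set, the failed INSERT would leave the session in a failed transaction block and every later statement would be rejected until a ROLLBACK, which is exactly the surprise the proposed default is meant to remove.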
[HACKERS] UNION ALL with WHERE clause does not use Merge Append
Hi list,

I have a table with (start_time, end_time) and I'd want to tally up the number of concurrent connections at any point in time. The Merge Append plan node introduced in 9.1 would be perfect for this purpose. It seems to work out fine for the most trivial case, but as soon as I add a WHERE clause to any of the UNION subqueries, this stops working. Tested on 9.1.4 and 9.2beta3.

Here's a simplified test case:

create table foo as select generate_series(1,100) i;
create index on foo(i);
vacuum analyze foo;

This works as expected:

EXPLAIN ANALYZE select i from foo UNION ALL select i from foo ORDER BY 1 LIMIT 100;

 Limit  (cost=0.01..3.31 rows=100 width=4) (actual time=0.028..0.078 rows=100 loops=1)
   ->  Result  (cost=0.01..65981.61 rows=200 width=4) (actual time=0.026..0.064 rows=100 loops=1)
         ->  Merge Append  (cost=0.01..65981.61 rows=200 width=4) (actual time=0.026..0.053 rows=100 loops=1)
               Sort Key: public.foo.i
               ->  Index Only Scan using foo_i_idx on foo  (cost=0.00..20490.80 rows=100 width=4) (actual time=0.017..0.021 rows=51 loops=1)
                     Heap Fetches: 0
               ->  Index Only Scan using foo_i_idx on foo  (cost=0.00..20490.80 rows=100 width=4) (actual time=0.007..0.012 rows=50 loops=1)
                     Heap Fetches: 0
 Total runtime: 0.106 ms

But once I add even a basic WHERE clause, suddenly it decides that sorting is the only way:

EXPLAIN ANALYZE select i from foo where i is not null UNION ALL select i from foo where i is not null ORDER BY 1 LIMIT 100;

 Limit  (cost=127250.56..127250.81 rows=100 width=4) (actual time=1070.799..1070.812 rows=100 loops=1)
   ->  Sort  (cost=127250.56..132250.56 rows=200 width=4) (actual time=1070.798..1070.804 rows=100 loops=1)
         Sort Key: public.foo.i
         Sort Method: top-N heapsort  Memory: 29kB
         ->  Result  (cost=0.00..50812.00 rows=200 width=4) (actual time=0.009..786.806 rows=200 loops=1)
               ->  Append  (cost=0.00..50812.00 rows=200 width=4) (actual time=0.007..512.201 rows=200 loops=1)
                     ->  Seq Scan on foo  (cost=0.00..15406.00 rows=100 width=4) (actual time=0.007..144.872 rows=100 loops=1)
                           Filter: (i IS NOT NULL)
                     ->  Seq Scan on foo  (cost=0.00..15406.00 rows=100 width=4) (actual time=0.003..139.196 rows=100 loops=1)
                           Filter: (i IS NOT NULL)
 Total runtime: 1070.847 ms

It works again when I stuff it in a subquery and put the WHERE above the UNION. But this loses flexibility -- I can't filter the subqueries with different clauses any more:

EXPLAIN ANALYZE select * from ( select i from foo UNION ALL select i from foo ) subq where i is not null ORDER BY 1 LIMIT 100;

 Limit  (cost=0.01..3.56 rows=100 width=4) (actual time=0.033..0.088 rows=100 loops=1)
   ->  Result  (cost=0.01..70981.61 rows=200 width=4) (actual time=0.032..0.071 rows=100 loops=1)
         ->  Merge Append  (cost=0.01..70981.61 rows=200 width=4) (actual time=0.031..0.059 rows=100 loops=1)
               Sort Key: public.foo.i
               ->  Index Only Scan using foo_i_idx on foo  (cost=0.00..22990.80 rows=100 width=4) (actual time=0.020..0.025 rows=51 loops=1)
                     Index Cond: (i IS NOT NULL)
                     Heap Fetches: 0
               ->  Index Only Scan using foo_i_idx on foo  (cost=0.00..22990.80 rows=100 width=4) (actual time=0.010..0.014 rows=50 loops=1)
                     Index Cond: (i IS NOT NULL)
                     Heap Fetches: 0
 Total runtime: 0.115 ms

Is this just a planner shortcoming or a bug? Or is there some justification for this behavior?

Regards,
Marti

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ToDo: allow to get a number of processed rows by COPY statement
Hello here is updated patch postgres=# copy omega to '/tmp/xxx'; COPY 60 postgres=# do $$ declare r int; begin copy omega from '/tmp/xxx'; get diagnostics r = row_count; raise notice ' %', r; end; $$ language plpgsql; NOTICE: 60 DO Regards Pavel 2012/8/16 Bruce Momjian br...@momjian.us: What ever happened to this patch? I don't see it on any of the commit-fests, though someone was asked for it to be added: http://archives.postgresql.org/pgsql-hackers/2011-10/msg00381.php --- On Tue, Oct 4, 2011 at 12:22:19PM +0200, Pavel Stehule wrote: Hello There is not possible to get a number of processed rows when COPY is evaluated via SPI. Client can use a tag, but SPI doesn't use a tag. I propose a small change a ProcessUtility to return a processed rows. Note: I found a small inconsistency between SPI and Utility interface. SPI still use a 4 byte unsign int for storing a number of processed rows. Utility use a 8bytes unsign int. Motivation: postgres=# \sf fx CREATE OR REPLACE FUNCTION public.fx(tablename text, filename text) RETURNS integer LANGUAGE plpgsql AS $function$ declare r int; begin execute format('COPY %s FROM %s', quote_ident(tablename), quote_literal(filename)); get diagnostics r = row_count; return r; end; $function$ Regards Pavel Stehule diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c index 398bc40..a7c2b8f 100644 --- a/src/backend/executor/functions.c +++ b/src/backend/executor/functions.c @@ -600,6 +600,7 @@ postquel_getnext(execution_state *es, SQLFunctionCachePtr fcache) es-qd-params, false, /* not top level */ es-qd-dest, +NULL, NULL); result = true; /* never stops early */ } diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c index 688279c..21cabcc 100644 --- a/src/backend/executor/spi.c +++ b/src/backend/executor/spi.c @@ -1838,6 +1838,7 @@ _SPI_execute_plan(SPIPlanPtr plan, ParamListInfo paramLI, { Node *stmt = (Node *) lfirst(lc2); boolcanSetTag; + boolisCopyStmt = false; DestReceiver *dest; _SPI_current-processed = 0; @@ -1857,6 +1858,7 @@ _SPI_execute_plan(SPIPlanPtr plan, ParamListInfo paramLI, { CopyStmt *cstmt = (CopyStmt *) stmt; + isCopyStmt = true; if (cstmt-filename == NULL) { my_res = SPI_ERROR_COPY; @@ -1911,16 +1913,23 @@ _SPI_execute_plan(SPIPlanPtr plan, ParamListInfo paramLI, } else { + uint32 processed; + ProcessUtility(stmt, plansource-query_string, paramLI, false, /* not top level */ dest, -NULL); +NULL, +processed); /* Update processed if stmt returned tuples */ + if (_SPI_current-tuptable) _SPI_current-processed = _SPI_current-tuptable-alloced - _SPI_current-tuptable-free; + else if (canSetTag isCopyStmt) + _SPI_current-processed = processed; + res = SPI_OK_UTILITY; } diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c index 466727b..1a861ee 100644 --- a/src/backend/tcop/pquery.c +++ b/src/backend/tcop/pquery.c @@ -1184,7 +1184,8 @@ PortalRunUtility(Portal portal, Node *utilityStmt, bool isTopLevel, portal-portalParams, isTopLevel, dest, -completionTag); +completionTag, +NULL); /* Some utility statements may change context on us */ MemoryContextSwitchTo(PortalGetHeapMemory(portal)); diff
Re: [HACKERS] SP-GiST for ranges based on 2d-mapping and quad-tree
On 09.08.2012 18:42, Alexander Korotkov wrote: In this revision of patch I tried to handle conditions more generally using variables minLower, maxLower, minUpper, maxUpper, inclusive and strictEmpty. However some strategies still contain additional logic. Thanks, that clarified the code tremendously. The comments I added about the geometrical interpretations of the operations earlier seem unnecessary now, so removed those. What is our conclusion about saving previous choice for RANGESTRAT_ADJACENT strategy? I think we're going to do what you did in the patch. A more generic mechanism for holding private state across consistent calls would be nice, but it's not that ugly the way you wrote it. I committed the patch now, but left out the support for adjacent for now. Not because there was necessarily anything wrong with that, but because I have limited time for reviewing, and the rest of the patch looks ready for commit now. I reworded the comments quite a lot, you might want to proofread those to double-check that they're still correct. I'll take a look at the adjacent-support next, as a separate patch. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Statistics and selectivity estimation for ranges
On 15.08.2012 11:34, Alexander Korotkov wrote: On Wed, Aug 15, 2012 at 12:14 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Histogram of upper bounds would be both more accurate and natural for some operators. However, it requires collecting additional statistics while AFAICS it doesn't liberate us from having histogram of range lengths. Hmm, if we collected a histogram of lower bounds and a histogram of upper bounds, that would be roughly the same amount of data as for the standard histogram with both bounds in the same histogram. Ok, we've to decide if we need standard histogram. In some cases it can be used for more accurate estimation of and operators. But I think it is not so important. So, we can replace standard histogram with histograms of lower and upper bounds? Yeah, I think that makes more sense. The lower bound histogram is still useful for and operators, just not as accurate if there are lots of values with the same lower bound but different upper bound. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] cataloguing NOT NULL constraints
Excerpts from Kevin Grittner's message of jue ago 02 10:48:02 -0400 2012: Kevin Grittner kevin.gritt...@wicourts.gov wrote: Don't forget the peculiarities of columns with record types. I forgot to include the type creation in the example: test=# create type a as (a1 int, a2 int); CREATE TYPE Thanks for the example. After playing with this, I think that a NOT NULL constraint attached to a column with a composite type is equivalent to a CHECK (col IS DISTINCT FROM NULL); at least they seem to behave identically. Is that what you would expect? This seems a bit complicated to handle with the way I'm doing things today; at parse analysis time, when my current code is creating the check constraint, we don't know anything about the type of the column IIRC. Maybe I will have to delay creating the constraint until execution. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
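A quick way to try the comparison described above, reusing the composite type from Kevin's example; the table names are invented for illustration, and the sketch is meant for testing the claimed equivalence rather than asserting it:

create type a as (a1 int, a2 int);

create table t_notnull (c a not null);
create table t_check   (c a check (c is distinct from null));

-- compare the two behaviors side by side
insert into t_notnull values (null);             -- rejected?
insert into t_check   values (null);             -- rejected?
insert into t_notnull values (row(null, null));  -- accepted?
insert into t_check   values (row(null, null));  -- accepted?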
Re: [HACKERS] UNION ALL with WHERE clause does not use Merge Append
Marti Raudsepp ma...@juffo.org writes: Is this just a planner shortcoming or a bug? Or is there some justification for this behavior? Per the comment in is_safe_append_member(): * It's only safe to pull up the child if its jointree contains exactly * one RTE, else the AppendRelInfo data structure breaks. The one base RTE * could be buried in several levels of FromExpr, however. * * Also, the child can't have any WHERE quals because there's no place to * put them in an appendrel. (This is a bit annoying...) This will probably get fixed someday, but I wouldn't recommend holding your breath for it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Clean up the #include mess a little.
Excerpts from Bruce Momjian's message of mié ago 15 18:30:40 -0400 2012:

On Wed, Sep 7, 2011 at 01:20:17AM +0300, Ants Aasma wrote: On Tue, Sep 6, 2011 at 10:18 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: I wonder what happens if files in the same subdir are grouped in a subgraph. Is that possible?

Possible, and done. Also added the possibility to add .c files to the graph, coloring by subdir, and the possibility to exclude nodes from the graph. I didn't yet bother to clean up the code - to avoid eye damage, don't look at the source. Bad news is that it doesn't significantly help readability for the all nodes case. See all_with_subgraphs.svgz. It does help for other cases. For example parsenodes.h.svgz has the result for render_includes.py --select='nodes/parsenodes.h+*-*' --subgraphs and execnodes.h.svgz for --subgraphs --select='nodes/execnodes.h+*-*'

Should we add this script and instructions to src/tools/pginclude?

Probably not, but maybe the developer FAQ in the wiki?

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Underspecified window queries in regression tests
I have used your notes below to rewrite the Window function SQL manual section. As you said, it was very hard to read. I now understand it better, having restructured it, and I hope others do too.

After waiting 30 minutes for our developer doc build to refresh, I am giving up and posting my own URL for the doc changes:

http://momjian.us/tmp/pgsql/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS

Perhaps I need to go back to having my own doc build.

---

On Mon, Oct 17, 2011 at 11:48:38AM +0200, Florian Pflug wrote:

On Oct 17, 2011, at 01:09, Tom Lane wrote: Florian Pflug f...@phlo.org writes: ... reading those parts again, I realize that it says "When ORDER BY is omitted the *default* frame consists ...", and that the second quote is followed by a footnote which says "There are options to define the window frame in other ways, but this tutorial does not cover them. See Section 4.2.8 for details." [3.5, Window Functions] So it was just me being thick. Sorry for the noise.

Hmm. Maybe the use of a footnote there is too subtle, and we should instead have that text in-line (probably in parentheses)? Or we could use a note, but that's probably too much emphasis.

Inline and in parentheses sounds fine.

In addition, I think we should reword the explanation in 4.2.8 (The SQL Language / SQL Syntax / Value Expressions / Window Functions). Instead of that rather long (and IMHO hard to read) paragraph about possible frame clauses and their behaviour in the presence or absence of an ORDER BY clause, we should go with a more algorithmic explanation, I think. Something along these lines maybe:

--
.) PARTITION BY splits the rows into disjoint partitions. All further processing happens only inside a single partition.

.) In RANGE mode, ORDER BY then splits each partition into an ordered list of sub-partitions, each containing rows which the ORDER BY considers to be equivalent.

.) In ROWS mode, OTOH, each sub-partition contains only a single row. Thus, if there are rows which are considered to be equivalent by the ORDER BY, the ordering of the sub-partitions isn't fully determined.

.) Each row's frame then consists of some consecutive range of sub-partitions.

.) In RANGE mode, that consecutive range can only start at either the first sub-partition or the current row's sub-partition, and can only end at either the current row's sub-partition or the last sub-partition.

.) In ROWS mode, the consecutive range may additionally start n sub-partitions (or rows, it's the same thing here) before the current row, and may additionally end m sub-partitions/rows after the current row.

From that, it follows that even with an underspecified sort order, the contents of each frame are still fully determined in RANGE mode. The ordering of rows within a frame is not determined, though. So overall, in RANGE mode, a query's result is only non-deterministic if the window function is sensitive to the ordering of rows within a frame. In ROWS mode, OTOH, the contents of each frame themselves are not fully determined, so even an ordering-agnostic window function may produce non-deterministic results.
--

If you think that something along these lines would be an improvement, I can try to come up with a patch.

best regards,
Florian Pflug

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers

--
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB                           http://enterprisedb.com

  + It's impossible for everything to be true. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
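As a concrete illustration of the RANGE-vs-ROWS point (an invented example, not taken from the proposed doc text): with the default RANGE frame, peer rows produced by an underspecified ORDER BY all see the same frame, while an explicit ROWS frame depends on the arbitrary ordering among those peers:

SELECT i, grp,
       sum(i) OVER (ORDER BY grp) AS range_sum,   -- default frame: RANGE ... CURRENT ROW
       sum(i) OVER (ORDER BY grp
                    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_sum
FROM (VALUES (1, 1), (2, 1), (3, 2)) AS t(i, grp);

-- range_sum is 3, 3, 6 no matter how the two grp = 1 peers happen to be ordered;
-- rows_sum can come out as 1, 3, 6 or as 2, 3, 6 depending on that ordering.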
Re: [HACKERS] HeapTupleHeaderAdvanceLatestRemovedXid doing the wrong thing with multixacts
On Mon, Oct 17, 2011 at 08:38:18PM +0100, Simon Riggs wrote: On Mon, Oct 17, 2011 at 8:03 PM, Tom Lane t...@sss.pgh.pa.us wrote: Alvaro Herrera alvhe...@alvh.no-ip.org writes: I just noticed that HeapTupleHeaderAdvanceLatestRemovedXid is comparing Xmax as a TransactionId without verifying whether it is a multixact or not. Since they advance separately, this could lead to bogus answers. This probably needs to be fixed. I didn't look into past releases to see if there's a live released bug here or not. I think the fix is simply to ignore the Xmax if the HEAP_XMAX_IS_MULTI bit is set. Additionally I think it should check HEAP_XMAX_INVALID before reading the Xmax at all. If it's failing to even check XMAX_INVALID, surely it's completely broken? Perhaps it assumes its caller has checked all this? HeapTupleHeaderAdvanceLatestRemovedXid() is only ever called when HeapTupleSatisfiesVacuum() returns HEAPTUPLE_DEAD, which only happens when HEAP_XMAX_IS_MULTI is not set. I'll add an assert to check this and a comment to explain. Was this completed? -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [COMMITTERS] pgsql: In docs, change a few cases of not important to unimportant.
Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 16.08.2012 17:49, Bruce Momjian wrote: Heikki Linnakangas wrote: On 16.08.2012 17:36, Bruce Momjian wrote: In docs, change a few cases of not important to unimportant. FWIW, I don't think these changes were an improvement. I find not important more readable in those sentences. +1 Really? All of them? Yes.. Perhaps not important just feels better to me in general than unimportant. Yeah, it's probably subjective, but for me unimportant has negative, dismissive overtones, while not important does not. Sort of like the way unpleasant has a different meaning than not pleasant. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] HeapTupleHeaderAdvanceLatestRemovedXid doing the wrong thing with multixacts
On Thu, Aug 16, 2012 at 11:38:14AM -0400, Alvaro Herrera wrote: Excerpts from Bruce Momjian's message of jue ago 16 11:24:55 -0400 2012: On Mon, Oct 17, 2011 at 08:38:18PM +0100, Simon Riggs wrote: On Mon, Oct 17, 2011 at 8:03 PM, Tom Lane t...@sss.pgh.pa.us wrote: Alvaro Herrera alvhe...@alvh.no-ip.org writes: I just noticed that HeapTupleHeaderAdvanceLatestRemovedXid is comparing Xmax as a TransactionId without verifying whether it is a multixact or not. Since they advance separately, this could lead to bogus answers. This probably needs to be fixed. I didn't look into past releases to see if there's a live released bug here or not. I think the fix is simply to ignore the Xmax if the HEAP_XMAX_IS_MULTI bit is set. Additionally I think it should check HEAP_XMAX_INVALID before reading the Xmax at all. If it's failing to even check XMAX_INVALID, surely it's completely broken? Perhaps it assumes its caller has checked all this? HeapTupleHeaderAdvanceLatestRemovedXid() is only ever called when HeapTupleSatisfiesVacuum() returns HEAPTUPLE_DEAD, which only happens when HEAP_XMAX_IS_MULTI is not set. I'll add an assert to check this and a comment to explain. Was this completed? As far as I recall, there are changes related to this in my fklocks patch. I am hoping to have some review happen on it during the upcoming commitfest (which presumably means I need to do a merge to newer sources.) I was asking about adding the assert check --- does that need to wait too? -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] The pgrminclude problem
I found a tool that Google authored that attempts to solve the same problems as pgrminclude does in our codebase. It's called include what you use, and is based on Clang. The project is hosted here: http://code.google.com/p/include-what-you-use/ I'm not suggesting that we should start using this tool instead of pgrminclude, because it has enough caveats of its own, and is mostly written with Google's C++ codebase in mind. However, it's worth being aware of. There is a useful analysis of the general problem in the README - Why IWYU is Difficult (you can probably just skip the extensive analysis of C++ templates as they relate to the problem there). The tool is authored by Craig Silverstein, Google's director of technology. If he believes that IWYU is a difficult problem, well, it probably is. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] The pgrminclude problem
On Thu, Aug 16, 2012 at 04:48:34PM +0100, Peter Geoghegan wrote:

I found a tool that Google authored that attempts to solve the same problems as pgrminclude does in our codebase. It's called include what you use, and is based on Clang. The project is hosted here: http://code.google.com/p/include-what-you-use/

I'm not suggesting that we should start using this tool instead of pgrminclude, because it has enough caveats of its own, and is mostly written with Google's C++ codebase in mind. However, it's worth being aware of. There is a useful analysis of the general problem in the README - Why IWYU is Difficult (you can probably just skip the extensive analysis of C++ templates as they relate to the problem there). The tool is authored by Craig Silverstein, Google's director of technology. If he believes that IWYU is a difficult problem, well, it probably is.

Good to know. We only use pgrminclude every five years or so, and Tom isn't even keen on that. I added the URL to our src/tools/pginclude/README.

--
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB                           http://enterprisedb.com

  + It's impossible for everything to be true. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] HeapTupleHeaderAdvanceLatestRemovedXid doing the wrong thing with multixacts
Excerpts from Bruce Momjian's message of jue ago 16 11:44:48 -0400 2012: On Thu, Aug 16, 2012 at 11:38:14AM -0400, Alvaro Herrera wrote: Excerpts from Bruce Momjian's message of jue ago 16 11:24:55 -0400 2012: On Mon, Oct 17, 2011 at 08:38:18PM +0100, Simon Riggs wrote: On Mon, Oct 17, 2011 at 8:03 PM, Tom Lane t...@sss.pgh.pa.us wrote: Alvaro Herrera alvhe...@alvh.no-ip.org writes: I just noticed that HeapTupleHeaderAdvanceLatestRemovedXid is comparing Xmax as a TransactionId without verifying whether it is a multixact or not. Since they advance separately, this could lead to bogus answers. This probably needs to be fixed. I didn't look into past releases to see if there's a live released bug here or not. I think the fix is simply to ignore the Xmax if the HEAP_XMAX_IS_MULTI bit is set. Additionally I think it should check HEAP_XMAX_INVALID before reading the Xmax at all. If it's failing to even check XMAX_INVALID, surely it's completely broken? Perhaps it assumes its caller has checked all this? HeapTupleHeaderAdvanceLatestRemovedXid() is only ever called when HeapTupleSatisfiesVacuum() returns HEAPTUPLE_DEAD, which only happens when HEAP_XMAX_IS_MULTI is not set. I'll add an assert to check this and a comment to explain. Was this completed? As far as I recall, there are changes related to this in my fklocks patch. I am hoping to have some review happen on it during the upcoming commitfest (which presumably means I need to do a merge to newer sources.) I was asking about adding the assert check --- does that need to wait too? I don't think it's worth fussing about. Also I don't need any more merge conflicts than I already have. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] cataloguing NOT NULL constraints
Alvaro Herrera alvhe...@2ndquadrant.com wrote: I think that a NOT NULL constraint attached to a column with a composite type is equivalent to a CHECK (col IS DISTINCT FROM NULL); at least they seem to behave identically. Is that what you would expect? I had not thought about that, but now that you point it out I think that interpretation makes more sense than any other. In a quick test they behaved identically for me. This seems a bit complicated to handle with the way I'm doing things today; at parse analysis time, when my current code is creating the check constraint, we don't know anything about the type of the column IIRC. Maybe I will have to delay creating the constraint until execution. Why? CHECK (col IS DISTINCT FROM NULL) works correctly for *any* type, doesn't it? -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [COMMITTERS] pgsql: In docs, change a few cases of not important to unimportant.
On Thu, Aug 16, 2012 at 10:38:09AM -0500, Kevin Grittner wrote: Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 16.08.2012 17:49, Bruce Momjian wrote: Heikki Linnakangas wrote: On 16.08.2012 17:36, Bruce Momjian wrote: In docs, change a few cases of not important to unimportant. FWIW, I don't think these changes were an improvement. I find not important more readable in those sentences. +1 Really? All of them? Yes.. Perhaps not important just feels better to me in general than unimportant. Yeah, it's probably subjective, but for me unimportant has negative, dismissive overtones, while not important does not. Sort of like the way unpleasant has a different meaning than not pleasant. OK, thanks for the feedback. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] The pgrminclude problem
On 16 August 2012 16:56, Bruce Momjian br...@momjian.us wrote: Good to know. We only use pgrminclude very five years or so, and Tom isn't even keen on that. Yeah. Even if this could be made to work well, we'd still have to do something like get an absolute consensus from all build farm animals, if we expected to have an absolutely trustworthy list. I don't think pgrminclude is a bad idea. I just think that it should only be used to guide the efforts of a human to remove superfluous #includes, which is how it is used anyway. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] HeapTupleHeaderAdvanceLatestRemovedXid doing the wrong thing with multixacts
Excerpts from Bruce Momjian's message of jue ago 16 11:24:55 -0400 2012: On Mon, Oct 17, 2011 at 08:38:18PM +0100, Simon Riggs wrote: On Mon, Oct 17, 2011 at 8:03 PM, Tom Lane t...@sss.pgh.pa.us wrote: Alvaro Herrera alvhe...@alvh.no-ip.org writes: I just noticed that HeapTupleHeaderAdvanceLatestRemovedXid is comparing Xmax as a TransactionId without verifying whether it is a multixact or not. Since they advance separately, this could lead to bogus answers. This probably needs to be fixed. I didn't look into past releases to see if there's a live released bug here or not. I think the fix is simply to ignore the Xmax if the HEAP_XMAX_IS_MULTI bit is set. Additionally I think it should check HEAP_XMAX_INVALID before reading the Xmax at all. If it's failing to even check XMAX_INVALID, surely it's completely broken? Perhaps it assumes its caller has checked all this? HeapTupleHeaderAdvanceLatestRemovedXid() is only ever called when HeapTupleSatisfiesVacuum() returns HEAPTUPLE_DEAD, which only happens when HEAP_XMAX_IS_MULTI is not set. I'll add an assert to check this and a comment to explain. Was this completed? As far as I recall, there are changes related to this in my fklocks patch. I am hoping to have some review happen on it during the upcoming commitfest (which presumably means I need to do a merge to newer sources.) -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CREATE SCHEMA IF NOT EXISTS
2012/8/15 David E. Wheeler da...@justatheory.com

On Aug 15, 2012, at 11:31 AM, Fabrízio de Royes Mello wrote: Is there any reason not to add $subject? Would it be difficult? Looking at the source code I think this feature isn't hard to implement... I'm writing a little patch to do that and I'll send it soon...

Cool, thanks!

The attached patch implements this feature:

CREATE SCHEMA [ IF NOT EXISTS ] schema_name [ AUTHORIZATION user_name ] [ schema_element [ ... ] ]
CREATE SCHEMA [ IF NOT EXISTS ] AUTHORIZATION user_name [ schema_element [ ... ] ]

Now PostgreSQL doesn't throw an error if we use IF NOT EXISTS in a CREATE SCHEMA statement. So, I don't know the next steps...

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
Blog sobre TI: http://fabriziomello.blogspot.com
Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
Twitter: http://twitter.com/fabriziomello

create_schema_if_not_exists.patch
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
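For what it's worth, a minimal sketch of how the new syntax would be exercised (schema and role names are invented, and the exact message emitted for the skipped statement depends on how the patch reports it):

CREATE SCHEMA IF NOT EXISTS reporting AUTHORIZATION report_owner;
CREATE SCHEMA IF NOT EXISTS reporting;   -- with the patch applied: no error, the statement is skipped
CREATE SCHEMA reporting;                 -- without IF NOT EXISTS this still fails because the schema already exists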
Re: [HACKERS] CREATE SCHEMA IF NOT EXISTS
On 08/16/2012 01:36 PM, Fabrízio de Royes Mello wrote: 2012/8/15 David E. Wheeler da...@justatheory.com mailto:da...@justatheory.com On Aug 15, 2012, at 11:31 AM, Fabrízio de Royes Mello wrote: Is there any reason not to add $subject? Would it be difficult? Looking to the source code I think this feature isn't hard to implement... I'm writing a little path to do that and I'll send soon... Cool, thanks! The attached patch implement this feature: CREATE SCHEMA [ IF NOT EXISTS ] schema_name [ AUTHORIZATION user_name ] [ schema_element [ ... ] ] CREATE SCHEMA [ IF NOT EXISTS ] AUTHORIZATION user_name [ schema_element [ ... ] ] Now, PostgreSQL don't trow an error if we use IF NOT EXISTS in CREATE SCHEMA statement. So, I don't know the next steps... Please see http://wiki.postgresql.org/wiki/Developer_FAQ#I_have_developed_a_patch.2C_what_next.3F cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CREATE SCHEMA IF NOT EXISTS
On Aug 16, 2012, at 10:36 AM, Fabrízio de Royes Mello wrote: The attached patch implement this feature: CREATE SCHEMA [ IF NOT EXISTS ] schema_name [ AUTHORIZATION user_name ] [ schema_element [ ... ] ] CREATE SCHEMA [ IF NOT EXISTS ] AUTHORIZATION user_name [ schema_element [ ... ] ] Now, PostgreSQL don't trow an error if we use IF NOT EXISTS in CREATE SCHEMA statement. So, I don't know the next steps... Awesome, thanks! Please add it to the next CommitFest: https://commitfest.postgresql.org/action/commitfest_view?id=15 Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CREATE SCHEMA IF NOT EXISTS
2012/8/16 David E. Wheeler da...@justatheory.com On Aug 16, 2012, at 10:36 AM, Fabrízio de Royes Mello wrote: The attached patch implement this feature: CREATE SCHEMA [ IF NOT EXISTS ] schema_name [ AUTHORIZATION user_name ] [ schema_element [ ... ] ] CREATE SCHEMA [ IF NOT EXISTS ] AUTHORIZATION user_name [ schema_element [ ... ] ] Now, PostgreSQL don't trow an error if we use IF NOT EXISTS in CREATE SCHEMA statement. So, I don't know the next steps... Awesome, thanks! Please add it to the next CommitFest: https://commitfest.postgresql.org/action/commitfest_view?id=15 Patch added to CommitFest: https://commitfest.postgresql.org/action/patch_view?id=907 Thanks, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Blog sobre TI: http://fabriziomello.blogspot.com Perfil Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello
Re: [HACKERS] psql \set vs \copy - bug or expected behaviour?
On Fri, Oct 21, 2011 at 05:31:41PM -0400, Robert Haas wrote: On Fri, Oct 21, 2011 at 7:24 AM, Richard Huxton d...@archonet.com wrote: It looks like \copy is just passing the text of the query unadjusted to COPY. I get a syntax error on :x with the \copy below on both 9.0 and 9.1 === test script === \set x '''HELLO''' -- Works \echo :x -- Works \o '/tmp/test1.txt' COPY (SELECT :x) TO STDOUT; -- Doesn't work \copy (SELECT :x) TO '/tmp/test2.txt' === end script === I'm not sure whether that's a bug per se, but I can see where a behavior change might be an improvement. I did some research on this and learned a little more about flex rules. Turns out we can allow variable substitution in psql whole-line commands, like \copy and \!, by sharing the variable expansion flex rules with the code that does argument processing. What we can't easily do is to allow quotes to prevent variable substitution in these whole-line commands because we can't process the quotes because that will remove them. Here are some examples; \copy and \! behave the same: test= \set x abc test= \echo :x abc test= \echo :x -- :x test= \! echo :x abc test= \! echo :x -- abc Notice the last line has expanded :x even though it is in quotes. So, what do we want? The attached patch is pretty short. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + diff --git a/src/bin/psql/psqlscan.l b/src/bin/psql/psqlscan.l new file mode 100644 index 1208c8f..3732dc5 *** a/src/bin/psql/psqlscan.l --- b/src/bin/psql/psqlscan.l *** other . *** 934,949 } ! xslasharg{ /* * Default processing of text in a slash command's argument. * * Note: unquoted_option_chars counts the number of characters at the * end of the argument that were not subject to any form of quoting. * psql_scan_slash_option needs this to strip trailing semicolons safely. */ ! {space}|\\ { /* * Unquoted space is end of arg; do not eat. Likewise * backslash is end of command or next command, do not eat --- 934,956 } ! /* * Default processing of text in a slash command's argument. + * It shares token actions with xslasharg and xslashwholeline. * * Note: unquoted_option_chars counts the number of characters at the * end of the argument that were not subject to any form of quoting. * psql_scan_slash_option needs this to strip trailing semicolons safely. */ ! xslashwholeline{space}+ { ! /* process entire line, but suppress leading whitespace */ ! if (output_buf-len 0) ! ECHO; ! } ! ! xslasharg{space}|\\ { /* * Unquoted space is end of arg; do not eat. Likewise * backslash is end of command or next command, do not eat *** other . *** 957,982 return LEXRES_OK; } ! {quote} { ! *option_quote = '\''; ! unquoted_option_chars = 0; ! BEGIN(xslashquote); ! } ! ! `{ backtick_start_offset = output_buf-len; *option_quote = '`'; unquoted_option_chars = 0; BEGIN(xslashbackquote); } ! {dquote} { ECHO; *option_quote = ''; unquoted_option_chars = 0; BEGIN(xslashdquote); } :{variable_char}+ { /* Possible psql variable substitution */ if (option_type == OT_NO_EVAL) --- 964,1005 return LEXRES_OK; } ! xslasharg` { ! /* Only in xslasharg, so backticks are potentially passed to the shell */ backtick_start_offset = output_buf-len; *option_quote = '`'; unquoted_option_chars = 0; BEGIN(xslashbackquote); } ! xslasharg{quote} { ! *option_quote = '\''; ! unquoted_option_chars = 0; ! BEGIN(xslashquote); ! } ! ! 
xslasharg{dquote} { ECHO; *option_quote = ''; unquoted_option_chars = 0; BEGIN(xslashdquote); } + xslasharg{other} { + unquoted_option_chars++; + ECHO; + } + + xslashwholeline{other} { ECHO; } + + /* + * This code allows variable processing in slasharg and wholeline + * modes. wholeline does not allow quoting to prevent variable + * subtitution because quote detection would remove the quotes. + */ + + xslasharg,xslashwholeline{ + :{variable_char}+ { /* Possible psql variable substitution */ if (option_type == OT_NO_EVAL) *** other . *** 1044,1054 ECHO; } - {other} { - unquoted_option_chars++; - ECHO; - } - } xslashquote{ --- 1067,1072 *** other . *** 1115,1133 } - xslashwholeline{ - /* copy everything until end of input line */ - /* but suppress leading whitespace */ - - {space}+ { - if (output_buf-len 0) -
Re: [HACKERS] TRUE/FALSE vs true/false
On Tue, Aug 14, 2012 at 10:57:08PM -0400, Peter Eisentraut wrote: On Tue, 2012-08-14 at 17:36 -0400, Bruce Momjian wrote: On Tue, Aug 14, 2012 at 05:34:02PM -0400, Tom Lane wrote: Bruce Momjian br...@momjian.us writes: On Thu, Aug 4, 2011 at 09:00:11PM +0300, Peter Eisentraut wrote: On tor, 2011-08-04 at 14:44 +0200, Boszormenyi Zoltan wrote: I meant a mass sed -e 's/TRUE/true/g' -e 's/FALSE/false/g' run so all the ~200 occurrences of both TRUE and FALSE get converted so the whole source tree is consistent. I would be in favor of that. I have implemented this with the patch at: http://momjian.us/expire/true_diff.txt Does this really do anything for us that will justify the extra back-patching pain it will cause? I don't see that it's improving code readability any. I think it is more of a consistency issue. There were multiple people who wanted this change. Of course, some of those people don't backport stuff. I guess you could argue this particular case without end, but I think we should be open to these kinds of changes. They will make future code easier to deal with and confuse new developers less (when to use which, do they mean different things, etc.). If back-patching really becomes a problem, we might want to look a little deeper into git options. I think the Linux kernel people do these kinds of cleanups more often, so there is probably some better support for it. So what do we want to do with this? I am a little concerned that we are sacrificing code clarity for backpatching ease, but I don't do as much backpatching as Tom. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TRUE/FALSE vs true/false
Bruce Momjian br...@momjian.us wrote: So what do we want to do with this? I am a little concerned that we are sacrificing code clarity for backpatching ease, but I don't do as much backpatching as Tom. Well, if you back-patched this change, it would eliminate the issue for Tom, wouldn't it? Not sure if that's sane; just a thought. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TRUE/FALSE vs true/false
On Thu, Aug 16, 2012 at 02:21:12PM -0500, Kevin Grittner wrote: Bruce Momjian br...@momjian.us wrote: So what do we want to do with this? I am a little concerned that we are sacrificing code clarity for backpatching ease, but I don't do as much backpatching as Tom. Well, if you back-patched this change, it would eliminate the issue for Tom, wouldn't it? Not sure if that's sane; just a thought. I would be worried about some instability in backpatching. I was looking for an 'ignore-case' mode to patch, but I don't see it. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Planner avoidance of index only scans for partial indexes
On 8/15/12 3:28 PM, Merlin Moncure wrote: Aside: the performance gains I'm seeing for IOS are nothing short of spectacular. Do you have some metrics? I could use them for publicity stuff. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] State of the on-disk bitmap index
Hello Jonah, Simon, and the hackers,

I am going to implement a simple kind of encoded bitmap index (EBI). That is an index type where the set of bits belonging to a tuple may contain more than a single '1'. Instead, an additional mapping table translates the distinct values of the table column into a unique encoding. To select for a given value, all bitmap columns must be compared instead of only one. Thanks to boolean logic, queries that match multiple different values (like IN lists or range queries) need to compare less than the full set of bitmaps. The total number of bitmaps required to represent unique encodings for all different values is ceil(ld(n)), i.e. the binary logarithm rounded up, where n is the number of distinct values. Compared to normal bitmap indexes this solves the problem of high-cardinality columns. It is targeted at data warehousing scenarios with insert-only data. The respective scientific paper can be found at http://www.dvs.tu-darmstadt.de/publications/pdf/ebi_a4.pdf

I thought it could be a good idea to base my work on the long-proposed on-disk bitmap index implementation. According to the wiki, you, Jonah and Simon, were the last devs that touched this thing. Unfortunately I could not find the patch representing your state of that work. I could only capture the development history up to Gianni Ciolli and Gabriele Bartolini from the old pgsql-patches archives. Other people involved were Jie Zhang, Gavin Sherry, Heikki Linnakangas, and Leonardo F.

Are you and the others still interested in getting this into PG? A rebase of the most current bitmap index implementation onto master HEAD will be the first 'byproduct' that I am going to deliver back to you.

1. Is anyone working on this currently?
2. Who has got the most current source code?
3. Is there a git repository of that, or will I need to reconstruct the history from the patches I collected?

Disclaimer: Although I read and manually processed most of the executor's code during my last work, I would still call myself a newbie and therefore will most probably need some assistance.

Regards,
Daniel

--
Daniel Bausch
Wissenschaftlicher Mitarbeiter
Technische Universität Darmstadt
Fachbereich Informatik
Fachgebiet Datenbanken und Verteilte Systeme

Hochschulstraße 10
64289 Darmstadt
Germany

Tel.: +49 6151 16 6706
Fax: +49 6151 16 6229

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
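To make the ceil(ld(n)) figure concrete (a throwaway illustration, not part of any proposed patch): a column with 1000 distinct values needs only 10 encoded bitmaps instead of 1000 conventional ones, and a lookup for the value whose assigned code is, say, 613 just tests each bitmap against the corresponding bit of that code:

-- number of bitmap columns needed for 1000 distinct values
SELECT ceil(log(2, 1000));                      -- => 10

-- which bitmaps must match 1 and which must match 0 for the value encoded as 613
SELECT i AS bitmap, (613 >> i) & 1 AS required_bit
FROM generate_series(0, 9) AS i;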
Re: [HACKERS] Unreproducible bug in snapshot import code
On Fri, Oct 28, 2011 at 11:07:25AM -0400, Gurjeet Singh wrote: On Fri, Oct 28, 2011 at 10:11 AM, Bruce Momjian br...@momjian.us wrote: Gurjeet Singh wrote: I have tried reproducing the bug starting from 1 and 2 transactions before the one shown in snippet, and I used tab-completion to get the same screen-output as termonal1.txt and yet it's not reproducible. I could reproduce it when I typed TAB just after typing set in set transaction snapshot. As Tom and Alvaro pointed out, the tab-completion issues a query and which prevents the set transaction snapshot command. Great! That settles it then. Reproducible, but not a bug. Yes, it is only tabs that query the database for completion that cause this. Should this be documented somehow? (No idea how.) If we have a doc section on psql's tab-completion, I think this needs to go there. A note like: Trying to tab-complete on psql may send queries to the server, and depending on the transaction state, execution of these queries may lead to non-default/ unexpected behaviour by the queries executed after tab-completion. For example, ... I have added the attached patch to document this limitation. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml new file mode 100644 index ebb0ad4..a47af51 *** a/doc/src/sgml/ref/psql-ref.sgml --- b/doc/src/sgml/ref/psql-ref.sgml *** testdb=gt; \set PROMPT1 '%[%033[1;33;40 *** 3272,3278 exits and is reloaded when applicationpsql/application starts up. Tab-completion is also supported, although the completion logic makes no claim to be an ! acronymSQL/acronym parser. If for some reason you do not like the tab completion, you can turn it off by putting this in a file named filename.inputrc/filename in your home directory: programlisting --- 3272,3281 exits and is reloaded when applicationpsql/application starts up. Tab-completion is also supported, although the completion logic makes no claim to be an ! acronymSQL/acronym parser. The queries generated by tab-completion ! can also interfere with other SQL commands, e.g. literalSET ! TRANSACTION ISOLATION LEVEL/. ! If for some reason you do not like the tab completion, you can turn it off by putting this in a file named filename.inputrc/filename in your home directory: programlisting -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
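A sketch of the failure mode the new paragraph warns about (the snapshot identifier is only an illustrative value, and the exact error text may vary by version):

BEGIN;
SET TRANSACTION SNAPSHOT '000003A1-1';
-- If TAB was pressed while typing the command above, psql already issued a
-- completion query inside this transaction, so the server rejects the import
-- with something like: ERROR:  SET TRANSACTION SNAPSHOT must be called before any query
ROLLBACK;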
Re: [HACKERS] [COMMITTERS] pgsql: Clean up the #include mess a little.
On Thu, Aug 16, 2012 at 11:15:24AM -0400, Alvaro Herrera wrote: Excerpts from Bruce Momjian's message of mié ago 15 18:30:40 -0400 2012: On Wed, Sep 7, 2011 at 01:20:17AM +0300, Ants Aasma wrote: On Tue, Sep 6, 2011 at 10:18 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: I wonder what happens if files in the same subdir are grouped in a subgraph. Is that possible? Possible, and done. Also added possivility to add .c files to the graph, coloring by subdir and possibility exclude nodes from the graph. I didn't yet bother to clean up the code - to avoid eye damage, don't look at the source. Bad news is that it doesn't significantly help readability for the all nodes case. See all_with_subgraphs.svgz. It does help for other cases. For example parsenodes.h.svgz has the result for render_includes.py --select='nodes/parsenodes.h+*-*' --subgraphs and execnodes.h.svgz for --subgraphs --select='nodes/execnodes.h+*-*' Should we add this script and instructions to src/tools/pginclude? Probably not, but maybe the developer FAQ in the wiki? I just added the script email URL to the pginclude README. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] heap_page_prune comments
On Wed, Nov 2, 2011 at 12:27:02PM -0400, Robert Haas wrote: The following comment - or at least the last sentence thereof - appears to be out of date. /* * XXX Should we update the FSM information of this page ? * * There are two schools of thought here. We may not want to update FSM * information so that the page is not used for unrelated UPDATEs/INSERTs * and any free space in this page will remain available for further * UPDATEs in *this* page, thus improving chances for doing HOT updates. * * But for a large table and where a page does not receive further UPDATEs * for a long time, we might waste this space by not updating the FSM * information. The relation may get extended and fragmented further. * * One possibility is to leave fillfactor worth of space in this page * and update FSM with the remaining space. * * In any case, the current FSM implementation doesn't accept * one-page-at-a-time updates, so this is all academic for now. */ The simple fix here is just to delete that last sentence, but does anyone think we ought to do change the behavior, now that we have the option to do so? Last sentence removed. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tuplesort memory usage: grow_memtuples
On 27 July 2012 16:39, Jeff Janes jeff.ja...@gmail.com wrote: Can you suggest a benchmark that will usefully exercise this patch? I think the given sizes below work on most 64 bit machines. My apologies for not getting around to taking a look at this sooner. I tested this patch on my x86_64 laptop: [peter@peterlaptop tests]$ uname -r 3.4.4-4.fc16.x86_64 I have been able to recreate your results with the work_mem setting you supplied, 16384 (both queries that you suggested are executed together, for a less sympathetic workload): [peter@peterlaptop tests]$ cat sortmemgrow_sort.sql select count(distinct foo) from (select random() as foo from generate_series(1,524200)) asdf; select count(distinct foo) from (select random() as foo from generate_series(1,524300)) asdf; [peter@peterlaptop tests]$ # For master: [peter@peterlaptop tests]$ pgbench -f sortmemgrow_sort.sql -T 45 -n transaction type: Custom query scaling factor: 1 query mode: simple number of clients: 1 number of threads: 1 duration: 45 s number of transactions actually processed: 45 tps = 0.992526 (including connections establishing) tps = 0.992633 (excluding connections establishing) [peter@peterlaptop tests]$ # For patch: [peter@peterlaptop tests]$ pgbench -f sortmemgrow_sort.sql -T 45 -n transaction type: Custom query scaling factor: 1 query mode: simple number of clients: 1 number of threads: 1 duration: 45 s number of transactions actually processed: 66 tps = 1.461739 (including connections establishing) tps = 1.461911 (excluding connections establishing) I didn't find trace_sort all that useful for my earlier work on optimising in memory-sorting (at least when running benchmarks), as the granularity is too small for simple, relatively inexpensive queries with sort nodes (or some tuplesort usage, like the datum sort of your example). Also, the instrumentation can have a Heisenberg effect. I've avoided using it here. The less expensive query's executions costs are essentially the same with and without the patch. This patch works by not doubling the size of state-memtupsize when available memory is less than half of allowed memory (i.e. we are one call to grow_memtuples() away from reporting ). Rather, in that situation, it is set to a size that extrapolates the likely size of the total amount of memory needed in a way that is quite flawed, but likely to work well for the pass-by-value Datum case. Now, on the face of it, this may appear to be a re-hashing of the question of how paranoid do we need to be about wasting memory in memory-constrained situations - should we consider anything other than a geometric growth rate, to be parsimonious with memory at the risk of thrashing?. However, this is not really the case, because this is only a single, last-ditch effort to avoid a particularly undesirable eventuality. We have little to lose and quite a bit to gain. A cheap guestimation seems reasonable. I have attached a revision for your consideration, with a few editorialisations, mostly style-related. I removed this entirely: +* is the new method still follow this? The last allocation is no +* longer necessarily a power of 2, but that is not freed. I did so because, according to aset.c: * Further improvement 12/00: as the code stood, request sizes in the * midrange between small and large were handled very inefficiently, * because any sufficiently large free chunk would be used to satisfy a * request, even if it was much larger than necessary. This led to more * and more wasted space in allocated chunks over time. 
To fix, get rid * of the midrange behavior: we now handle only small power-of-2-size * chunks as chunks. Anything large is passed off to malloc(). Change * the number of freelists to change the small/large boundary. So, at the top of grow_memtuples, this remark still holds: * and so there will be no unexpected addition to what we ask for. (The * minimum array size established in tuplesort_begin_common is large * enough to force palloc to treat it as a separate chunk, so this * assumption should be good. But let's check it.) It continues to hold because we're still invariably passing off this request to malloc() (or, in this particular case, realloc()) regardless of the alignment of the request size. Certainly, the extant code verifies !LACKMEM, and the regression tests pass when this patch is applied. However, there is still a danger of LACKMEM. This revision has the following logic for extrapolating newmemtupsize (This is almost the same as the original patch): + newmemtupsize = (int) floor(oldmemtupsize * allowedMem / memNowUsed); Suppose that the code was: + newmemtupsize = (int) ceil(oldmemtupsize * allowedMem / memNowUsed); We'd now have an error with your latter example query because !LACKMEM, due to the inclusion of a single additional SortTuple. This is because GetMemoryChunkSpace (and, ultimately,
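To make the last-ditch growth heuristic described above concrete, here is a minimal standalone C sketch. It is not the actual patch: the function and the memory-accounting inputs are invented for illustration, and the real logic lives in grow_memtuples() in src/backend/utils/sort/tuplesort.c, where GetMemoryChunkSpace() supplies the accounting.

#include <math.h>
#include <stdio.h>

/*
 * Sketch of the growth decision: keep doubling while at least half of the
 * memory budget is still free; otherwise make one final, extrapolated
 * growth step sized to what the budget can plausibly accommodate.
 */
static int
grow_memtuples_sketch(int oldsize, double memNowUsed, double allowedMem)
{
	if (memNowUsed <= allowedMem / 2.0)
		return oldsize * 2;		/* normal geometric growth */

	/*
	 * Last-ditch effort: extrapolate from the observed per-slot cost.
	 * floor() errs on the small side so we do not overshoot the budget.
	 */
	{
		int		newsize = (int) floor(oldsize * allowedMem / memNowUsed);

		return (newsize > oldsize) ? newsize : oldsize;
	}
}

int
main(void)
{
	/* e.g. 1024 slots so far, 900kB used of a 1MB budget */
	printf("grown to %d slots\n",
		   grow_memtuples_sketch(1024, 900.0 * 1024, 1024.0 * 1024));
	return 0;
}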
Re: [HACKERS] Avoiding shutdown checkpoint at failover
On Thu, Mar 8, 2012 at 08:20:02AM -0500, Robert Haas wrote: On Sat, Jan 28, 2012 at 8:57 AM, Simon Riggs si...@2ndquadrant.com wrote: On Thu, Jan 26, 2012 at 5:27 AM, Fujii Masao masao.fu...@gmail.com wrote: One thing I would like to ask is why you think walreceiver is more appropriate for writing the XLOG_END_OF_RECOVERY record than the startup process. I was thinking the opposite, because if we do so, we might be able to skip the end-of-recovery checkpoint even in the file-based log-shipping case. Right now, WALReceiver has one code path/use case. Startup has so many, it's much harder to know whether we'll screw up one of them. If we can add it in either place then I choose the simplest, most relevant place. If the code is the same, we can move it around later. Let me write the code and then we can think some more. Are we still considering trying to do this for 9.2? Seems it's been over a month without a new patch, and it's not entirely clear that we know what the design should be. Did this get completed? -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Regression tests fail once XID counter exceeds 2 billion
On Wed, Nov 16, 2011 at 07:08:27PM -0500, Tom Lane wrote: I wrote: Simon Riggs si...@2ndquadrant.com writes: We need a function called transactionid_current() so a normal user can write select virtualtransaction from pg_locks where transactionid = transactionid_current() and have it just work. That would solve that one specific use-case. The reason I suggested txid_from_xid is that it could also be used to compare XIDs seen in tuples to members of a txid_snapshot, which is not possible now. BTW, a pgsql-general question just now made me realize that txid_from_xid() could have another use-case too. Right now, there are no inequality comparisons on XIDs, which is necessary because XIDs in themselves don't have a total order. However, you could ORDER BY txid_from_xid(xmin) and it would work, ie, give you rows in their XID order. This could be useful for finding the latest-modified rows in a table, modulo the fact that it would be ordering by transaction start time not commit time. Added to TODO: Add function to allow easier transaction id comparisons http://archives.postgresql.org/pgsql-hackers/2011-11/msg00786.php -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
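As a rough illustration of why such a function would make inequality comparisons work: this is only a sketch under the assumption that it operates by epoch-widening, roughly the way the existing txid machinery in src/backend/utils/adt/txid.c does. It is not the proposed patch and ignores special XIDs and other wraparound edge cases.

#include <stdint.h>
#include <stdio.h>

/*
 * Widen a 32-bit xid into a 64-bit, epoch-qualified value so that plain
 * integer comparison orders transactions by start order.  An xid that is
 * numerically ahead of the current next-xid counter cannot be from the
 * current epoch, so it must belong to the previous one.
 */
static uint64_t
widen_xid(uint32_t xid, uint32_t current_epoch, uint32_t next_xid)
{
	uint32_t	epoch = current_epoch;

	if (xid > next_xid)
		epoch--;

	return ((uint64_t) epoch << 32) | xid;
}

int
main(void)
{
	/* after one wraparound (epoch 1), xid 10 is newer than xid 4000000000 */
	printf("%d\n",
		   widen_xid(10, 1, 1000) > widen_xid(4000000000u, 1, 1000));
	return 0;
}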
Re: [HACKERS] CREATE SCHEMA IF NOT EXISTS
2012/8/16 Fabrízio de Royes Mello fabriziome...@gmail.com: The attached patch implements this feature: CREATE SCHEMA [ IF NOT EXISTS ] schema_name [ AUTHORIZATION user_name ] [ schema_element [ ... ] ] CREATE SCHEMA [ IF NOT EXISTS ] AUTHORIZATION user_name [ schema_element [ ... ] ] Now, PostgreSQL doesn't throw an error if we use IF NOT EXISTS in a CREATE SCHEMA statement. I started testing this, but I didn't see regression tests for it. Could you write them? Best. -- Dickson S. Guedes mail/xmpp: gue...@guedesoft.net - skype: guediz http://guedesoft.net - http://www.postgresql.org.br -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RFC: list API / memory allocations
Did we make any headway on this? --- On Sat, Nov 19, 2011 at 12:31:09PM -0500, Stephen Frost wrote: Andres, * Andres Freund (and...@anarazel.de) wrote: For that I added new functions/defines which allocate all the needed memory in one hunk: list_immutable_make$n(), List *list_new_immutable_n(NodeTag t, size_t n); List *list_make_n(NodeTag t, size_t n, ...); A while back, I posted a patch to try and address this same issue. The approach that I took was to always pre-allocate a certain (#defined) amount (think it was 5 or 10 elements). There were a number of places that caused problems with that approach because they hacked on the list element structures directly (instead of using the macros/functions) - you'll want to watch out for those areas in any work on lists. That patch is here: http://archives.postgresql.org/pgsql-hackers/2011-05/msg01213.php The thread on it might also be informative. I do like your approach of being able to pass the ultimate size of the list in... Perhaps the two approaches could be merged? I was able to make everything work with my approach, provided all the callers used the list API (I did that by making sure the links, etc, actually pointed to the right places in the pre-allocated array). One downside was that the size ended up being larger than it might have been in some cases. Thanks, Stephen -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
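A generic sketch of the single-allocation idea being discussed (deliberately not the pg_list.h API or either proposed patch; the type and function names are invented): all cells are carved out of one malloc'd hunk, the next pointers are wired up inside that hunk, and one free() releases the whole list.

#include <stdio.h>
#include <stdlib.h>

typedef struct Cell
{
	void	   *value;
	struct Cell *next;
} Cell;

typedef struct
{
	Cell	   *head;
	size_t		length;
	/* the cells themselves follow the header in the same allocation */
} FixedList;

static FixedList *
list_make_fixed(size_t n, void **values)
{
	FixedList  *list = malloc(sizeof(FixedList) + n * sizeof(Cell));
	Cell	   *cells;

	if (list == NULL)
		return NULL;
	cells = (Cell *) (list + 1);
	list->length = n;
	list->head = (n > 0) ? &cells[0] : NULL;
	for (size_t i = 0; i < n; i++)
	{
		cells[i].value = values[i];
		cells[i].next = (i + 1 < n) ? &cells[i + 1] : NULL;
	}
	return list;
}

int
main(void)
{
	char	   *vals[] = {"a", "b", "c"};
	FixedList  *list = list_make_fixed(3, (void **) vals);

	for (Cell *c = list->head; c != NULL; c = c->next)
		printf("%s\n", (char *) c->value);
	free(list);					/* one free releases header and all cells */
	return 0;
}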
Re: [HACKERS] RFC: list API / memory allocations
Is this a TODO? --- On Sat, Nov 19, 2011 at 12:31:09PM -0500, Stephen Frost wrote: Andres, * Andres Freund (and...@anarazel.de) wrote: For that I added new functions/defines which allocate all the needed memory in one hunk: list_immutable_make$n(), List *list_new_immutable_n(NodeTag t, size_t n); List *list_make_n(NodeTag t, size_t n, ...); A while back, I posted a patch to try and address this same issue. The approach that I took was to always pre-allocate a certain (#defined) amount (think it was 5 or 10 elements). There were a number of places that caused problems with that approach because they hacked on the list element structures directly (instead of using the macros/functions) - you'll want to watch out for those areas in any work on lists. That patch is here: http://archives.postgresql.org/pgsql-hackers/2011-05/msg01213.php The thread on it might also be informative. I do like your approach of being able to pass the ultimate size of the list in... Perhaps the two approaches could be merged? I was able to make everything work with my approach, provided all the callers used the list API (I did that by making sure the links, etc, actually pointed to the right places in the pre-allocated array). One downside was that the size ended up being larger than it might have been in some cases. Thanks, Stephen -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Not HOT enough
Did we want to apply this? --- On Wed, Nov 23, 2011 at 07:33:18PM +, Simon Riggs wrote: On Wed, Nov 23, 2011 at 6:15 PM, Tom Lane t...@sss.pgh.pa.us wrote: The real question is do we favour HOT cleanup on those small 8 tables, or do we favour HOT cleanup of every other table? No, the real question is why not think a little harder and see if we can come up with a solution that doesn't involve making some cases worse to make others better. Slightly modified patch attached. When we access a shared relation and the page is prunable we re-derive the cutoff value using GetOldestXmin. That code path is rarely taken. In particular, pg_shdepend is only accessed during object creation/alter/drop, so this is a path we can spare a little extra on, just like the trade-off we've taken to make locking faster for DML while making DDL a little slower. If this is still unacceptable, then I'll have to look at reducing impact on pg_shdepend from temp tables - which is still a plan. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c index 61f2ce4..5feaedc 100644 --- a/src/backend/access/heap/pruneheap.c +++ b/src/backend/access/heap/pruneheap.c @@ -16,13 +16,13 @@ #include "access/heapam.h" #include "access/transam.h" +#include "catalog/catalog.h" #include "miscadmin.h" #include "pgstat.h" #include "storage/bufmgr.h" #include "utils/rel.h" #include "utils/tqual.h" - /* Working data for heap_page_prune and subroutines */ typedef struct { @@ -72,6 +72,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer, TransactionId OldestXmin) { Page page = BufferGetPage(buffer); Size minfree; + TransactionId CutoffXmin = OldestXmin; /* * Let's see if we really need pruning. @@ -91,6 +92,18 @@ heap_page_prune_opt(Relation relation, Buffer buffer, TransactionId OldestXmin) return; /* + * We choose to set RecentGlobalXmin only for the current database, which + * means we cannot use it to prune shared relations when reading them with + * MVCC snapshots. By making that choice it allows our snapshots to be + * smaller and faster, and the RecentGlobalXmin will be further forward + * and offer better pruning of non-shared relations. So when accessing a + * shared relation and we see the page is prunable (above) we get an + * OldestXmin across all databases. + */ + if (IsSharedRelation(relation->rd_id)) + CutoffXmin = GetOldestXmin(true, true); + + /* * We prune when a previous UPDATE failed to find enough space on the page * for a new tuple version, or when free space falls below the relation's * fill-factor target (but not less than 10%). @@ -124,7 +137,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer, TransactionId OldestXmin) * needed */ /* OK to prune */ - (void) heap_page_prune(relation, buffer, OldestXmin, true, &ignore); + (void) heap_page_prune(relation, buffer, CutoffXmin, true, &ignore); } /* And release buffer lock */ diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c index 1a48485..60153f4 100644 --- a/src/backend/storage/ipc/procarray.c +++ b/src/backend/storage/ipc/procarray.c @@ -55,6 +55,7 @@ #include "storage/spin.h" #include "utils/builtins.h" #include "utils/snapmgr.h" +#include "utils/tqual.h" /* Our shared memory area */ @@ -1185,6 +1186,8 @@ GetMaxSnapshotSubxidCount(void) * RecentGlobalXmin: the global xmin (oldest TransactionXmin across all * running transactions, except those running LAZY VACUUM). 
This is * the same computation done by GetOldestXmin(true, true). + * For MVCC snapshots we examine transactions running only in our + * database, ignoring longer running transactions in other databases. * * Note: this function should probably not be called with an argument that's * not statically allocated (see xip allocation below). @@ -1200,9 +1203,12 @@ GetSnapshotData(Snapshot snapshot) int count = 0; int subcount = 0; bool suboverflowed = false; + bool allDbs = false; Assert(snapshot != NULL); + allDbs = !IsMVCCSnapshot(snapshot); + /* * Allocating space for maxProcs xids is usually overkill; numProcs
Re: [HACKERS] Avoiding repeated snapshot computation
Did we ever make a decision on this patch? --- On Sat, Nov 26, 2011 at 09:22:50PM +0530, Pavan Deolasee wrote: On some recent benchmarks and profile data, I saw GetSnapshotData figures at the very top or near the top. For a smaller number of clients, it can account for 10-20% of time, but for larger numbers of clients I have seen it take up as much as 40% of sample time. Unfortunately, the machine on which I was running these tests is currently not available and so I don't have the exact numbers. But the observation is almost correct. Our recent work on separating the hot members of PGPROC into a separate array would definitely reduce data cache misses and reduce the GetSnapshotData time, but it probably still accounts for a large enough critical section for a highly contended lock. Now that we have reduced the run time of the function itself, I think we should try to reduce the number of times the function is called. Robert proposed a way to reduce the number of calls per transaction. I think we can go one step further and reduce the number of calls across transactions. One major problem today could be because of the way LWLock works. If the lock is currently held in SHARED mode by some backend and some other backend now requests it in SHARED mode, it will immediately get it. That's probably the right thing to do because you don't want the reader to really wait when the lock is readily available. But in the case of GetSnapshotData(), every reader is doing exactly the same thing; they are computing a snapshot based on the same shared state and would compute exactly the same snapshot (if we ignore the fact that we don't include the caller's XID in the xip array, but that's a minor detail). And because of the way LWLock works, more and more readers would get in to compute the snapshot, until the exclusive waiters get a window to sneak in, perhaps because more and more processes slowly start sleeping for exclusive access. To depict it, the four transactions make overlapping calls to GetSnapshotData() and hence the total critical section starts when the first caller enters it and ends when the last caller exits:
Txn1 --[ SHARED]-
Txn2    [ SHARED]---
Txn3      -[SHARED]-
Txn4        ---[ SHARED ]-
     |---Total Time |
A couple of ideas come to mind to solve this issue. A snapshot once computed will remain valid for every call, irrespective of its origin, until at least one transaction ends. So we can store the last computed snapshot in some shared area and reuse it for all subsequent GetSnapshotData calls. The shared snapshot will get invalidated when some transaction ends by calling ProcArrayEndTransaction(). I tried this approach and saw a 15% improvement for 32-80 clients on the 32 core HP IA box with pgbench -s 100 -N tests. Not bad, but I think this can be improved further. What we can do is, when a transaction comes to compute its snapshot, it checks whether some other transaction is already computing a snapshot for itself. If so, it just sleeps on the lock. When the other process finishes computing the snapshot, it saves the snapshot in a shared area and wakes up all processes waiting for the snapshot. All those processes then just copy the snapshot from the shared area and they are done. This will not only reduce the total CPU consumption by avoiding repetitive work, but would also reduce the total time for which ProcArrayLock is held in SHARED mode by avoiding a pipeline of GetSnapshotData calls. I am currently trying the shared work queue mechanism to implement this, but I am sure we can do it in some other way too. 
Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
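A toy illustration of the first idea (cache the last computed snapshot and reuse it until a transaction ends). This is not Pavan's patch: it uses pthreads and a plain mutex instead of LWLocks and shared memory, the snapshot content is stubbed out, and a real implementation would also have to let concurrent readers copy the cached snapshot without serialising on a single mutex. It only shows the compute-once-then-copy shape of the idea.

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

typedef struct
{
	unsigned int xmin;
	unsigned int xmax;
	/* the real thing also carries the xip[] array, etc. */
} SnapshotData;

static pthread_mutex_t snap_lock = PTHREAD_MUTEX_INITIALIZER;
static bool		cached_valid = false;
static SnapshotData cached_snapshot;

/* stand-in for the expensive scan of all running transactions */
static void
compute_snapshot(SnapshotData *snap)
{
	snap->xmin = 100;
	snap->xmax = 200;
}

void
get_snapshot_cached(SnapshotData *out)
{
	pthread_mutex_lock(&snap_lock);
	if (!cached_valid)
	{
		/* first arrival since the last invalidation does the real work;
		 * everyone else blocks briefly and then just copies the result */
		compute_snapshot(&cached_snapshot);
		cached_valid = true;
	}
	*out = cached_snapshot;
	pthread_mutex_unlock(&snap_lock);
}

/* analogue of ProcArrayEndTransaction(): the set of running XIDs changed */
void
invalidate_cached_snapshot(void)
{
	pthread_mutex_lock(&snap_lock);
	cached_valid = false;
	pthread_mutex_unlock(&snap_lock);
}

int
main(void)
{
	SnapshotData snap;

	get_snapshot_cached(&snap);		/* computes */
	get_snapshot_cached(&snap);		/* reuses the cached copy */
	printf("xmin=%u xmax=%u\n", snap.xmin, snap.xmax);
	return 0;
}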
Re: [HACKERS] Large number of open(2) calls with bulk INSERT into empty table
A TODO for this? --- On Tue, Dec 6, 2011 at 02:53:42PM -0500, Robert Haas wrote: On Tue, Dec 6, 2011 at 7:12 AM, Florian Weimer fwei...@bfk.de wrote: * Robert Haas: I tried whacking out the call to GetPageWithFreeSpace() in RelationGetBufferForTuple(), and also with the unpatched code, but the run-to-run randomness was way more than any difference the change made. Is there a better test case? I think that if you want to exercise file system lookup performance, you need a larger directory, which presumably means a large number of tables. OK. I created 100,000 dummy tables, 10,000 at a time to avoid blowing up the lock manager. I then repeated my previous tests, and I still can't see any meaningful difference (on my MacBook Pro, running MacOS X v10.6.8). So at least on this OS, it doesn't seem to matter much. I'm inclined to defer putting any more work into it until such time as someone can demonstrate that it actually causes a problem and provides a reproducible test case. I don't deny that there's probably an effect and it would be nice to improve this, but it doesn't seem worth spending a lot of time on until we can find a case where the effect is measurable. On the other hand, the problem of the FSM taking up 24kB for an 8kB table seems clearly worth fixing, but I don't think I have the cycles for it at present. Maybe a TODO is in order. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe
Was this resolved? (Sorry to be bugging everyone.) --- On Tue, Nov 29, 2011 at 11:42:10PM -0500, Robert Haas wrote: On Tue, Nov 29, 2011 at 11:50 AM, Bruce Momjian br...@momjian.us wrote: Just confirming we decided not to persue this. Doesn't sound like it. I've been thinking a lot about the more general problem here - namely, that allowing catalog changes without an access exclusive lock is unsafe - and trying to come up with a good solution, because I'd really like to see us make this work. It strikes me that there are really two separate problems here. 1. If you are scanning a system catalog using SnapshotNow, and a commit happens at just the right moment, you might see two versions of the row or zero rather than one. You'll see two if you scan the old row version, then the concurrent transaction commits, and then you scan the new one. You'll see zero if you scan the new row (ignoring it, since it isn't committed yet) and then the concurrent transaction commits, and then you scan the old row. On a related note, if you scan several interrelated catalogs (e.g. when building a relcache entry) and a transaction that has made matching modifications to multiple catalogs commits midway through, you might see the old version of the data from one table and the new version of the data from some other table. Using AccessExclusiveLock fixes the problem because we guarantee that anybody who wants to read those rows first takes some kind of lock, which they won't be able to do while the updater holds AccessExclusiveLock. 2. Other backends may have data in the relcache or catcaches which won't get invalidated until they do AcceptInvalidationMessages(). That will always happen no later than the next time they lock the relation, so if you are holding AccessExclusiveLock then you should be OK: no one else holds any lock, and they'll have to go get one before doing anything interesting. But if you are holding any weaker lock, there's nothing to force AcceptInvalidationMessages to happen before you reuse those cache entries. In both cases, name lookups are an exception: we don't know what OID to lock until we've done the name lookup. Random ideas for solving the first problem: 1-A. Take a fresh MVCC snapshot for each catalog scan. I think we've rejected this before because of the cost involved, but maybe with Pavan's patch and some of the other improvements that are on the way we could make this cheap enough to be acceptable. 1-B. Don't allow transactions that have modified system catalog X to commit while we're scanning system catalog X; make the scans take some kind of shared lock and commits an exclusive lock. This seems awfully bad for concurrency, though, not to mention the overhead of taking and releasing all those locks even when nothing conflicts. 1-C. Wrap a retry loop around every place where we scan a system catalog. If a transaction that has written rows in catalog X commits while we're scanning catalog X, discard the results of the first scan and declare a do-over (or maybe something more coarse-grained, or at the other extreme even more fine-grained). If we're doing several interrelated scans then the retry loop must surround the entire unit, and enforce a do-over of the whole operation if commits occur in any of the catalogs before the scan completes. Unfortunately, any of these seem likely to require a Lot of Work, probably more than can be done in time for 9.2. However, perhaps it would be feasible to hone in on a limited subset of the cases (e.g. 
name lookups, which are mainly done in only a handful of places, and represent an existing bug) and implement the necessary code changes just for those cases. Then we could generalize that pattern to other cases as time and energy permit. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
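As a rough sketch of what option 1-C's retry loop could look like (illustrative only, with invented names; this is not how PostgreSQL's catalog scans are actually structured): committing writers bump a shared change counter, and a reader redoes its scan if the counter moved while it was scanning, so it can never mix pre- and post-commit row versions.

#include <stdatomic.h>
#include <stdio.h>

/* bumped by any transaction that commits changes to the catalog(s) of interest */
static atomic_ulong catalog_change_count;

/* stand-in for one SnapshotNow scan (or a group of interrelated scans) */
static int
scan_catalogs(void)
{
	return 42;
}

int
scan_catalogs_with_retry(void)
{
	for (;;)
	{
		unsigned long before = atomic_load(&catalog_change_count);
		int			result = scan_catalogs();

		/*
		 * If no relevant commit happened while we were scanning, the result
		 * is consistent; otherwise throw it away and do the whole unit over.
		 */
		if (atomic_load(&catalog_change_count) == before)
			return result;
	}
}

int
main(void)
{
	printf("%d\n", scan_catalogs_with_retry());
	return 0;
}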
Re: [HACKERS] RFC: list API / memory allocations
Bruce, * Bruce Momjian (br...@momjian.us) wrote: Did we make any headway on this? I've got the code written but I need to test it in a stable environment to see what kind of improvement it provides. I've actually been fighting for the past 2 months with the box that I want to use and think I may just give up and steal a different server. It hasn't fallen off my list of things to do, just hasn't been a high priority. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] RFC: list API / memory allocations
On Thu, Aug 16, 2012 at 09:51:28PM -0400, Stephen Frost wrote: Bruce, * Bruce Momjian (br...@momjian.us) wrote: Did we make any headway on this? I've got the code written but I need to test it in a stable environment to see what kind of improvement it provides. I've actually been fighting for the past 2 months with the box that I want to use and think I may just give up and steal a different server. It hasn't fallen off my list of things to do, just hasn't been a high priority. OK, thanks for the status update. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Timing overhead and Linux clock sources
FYI, I am planning to go ahead and package this tool in /contrib for PG 9.3. --- On Fri, Dec 9, 2011 at 08:26:12PM -0500, Greg Smith wrote: On 12/09/2011 06:48 PM, Ants Aasma wrote: The attached test program (test_gettimeofday_monotonic) shows that one test loop iteration takes 34ns with tsc and 1270ns with hpet. This test program is great, I've wanted this exact sort of visibility into this problem for years. I've toyed with writing something like this and then seeing what results it returns on all of the build farm animals. For now I can just run it on all the hardware I have access to, which is not a small list. I think we should bundle this up similarly to test_fsync, document some best practices on time sources, and then point the vague warning about EXPLAIN overhead toward that. Then new sources of timing overhead can point there too. Much like low-level fsync timing, there's nothing PostgreSQL can do about it; the best we can do is provide a program to help users classify a system as likely or unlikely to have high-overhead timing. I can make the needed docs changes; this resolves a long-standing issue impacting code I wanted to add. The rough guideline I see shaping up is that 50ns is unlikely to cause clock timing to be a significant problem, 500ns certainly is, and values in the middle should concern people but not necessarily invalidate the timing data collected. I just confirmed that switching the clock source by echoing a new value into /sys/devices/system/clocksource/clocksource0/current_clocksource works (on the 2.6.32 kernel at least). What we want to see on a good server is a result that looks like this, from my Intel i7-860 system: $ cat /sys/devices/system/clocksource/clocksource0/current_clocksource tsc $ ./test_gettimeofday_monotonic 5 Per loop: 39.30 ns usec: count percent 4: 6 0.0% 2:104 0.8% 1:4999760 3.92983% 0: 16109 96.07009% Here's how badly that degrades if I use one of the alternate time sources: # echo acpi_pm > /sys/devices/system/clocksource/clocksource0/current_clocksource $ ./test_gettimeofday_monotonic 5 Per loop: 727.65 ns usec: count percent 16: 1 0.1% 8: 0 0.0% 4: 1233 0.01794% 2:699 0.01017% 1:4992622 72.65764% 0:1876879 27.31423% echo hpet > /sys/devices/system/clocksource/clocksource0/current_clocksource $ ./test_gettimeofday_monotonic 5 Per loop: 576.96 ns usec: count percent 8: 2 0.2% 4: 1273 0.01469% 2:767 0.00885% 1:4993028 57.61598% 0:3670977 42.36046% Switching to the Intel T7200 CPU in my T60 laptop only provides the poor quality time sources, not TSC, and results show timing is really slow there: $ cat /sys/devices/system/clocksource/clocksource0/current_clocksource hpet $ ./test_gettimeofday_monotonic 5 Per loop: 1019.60 ns usec: count percent 256: 2 0.4% 128: 3 0.6% 64: 90 0.00184% 32: 23 0.00047% 16: 92 0.00188% 8: 1246 0.02541% 4: 34 0.00069% 2: 136154 2.77645% 1:4700772 95.85818% 0: 65466 1.33498% # echo acpi_pm > /sys/devices/system/clocksource/clocksource0/current_clocksource $ ./test_gettimeofday_monotonic 5 Per loop: 1864.66 ns usec: count percent 256: 2 0.7% 128: 0 0.0% 64: 3 0.00011% 32: 6 0.00022% 16: 90 0.00336% 8: 1741 0.06493% 4: 2062 0.07690% 2:2260601 84.30489% 1: 416954 15.54952% 0: 0 0.0% This seems to be measuring exactly the problem I only had a hand-wave "it's bad on this hardware" explanation of before. 
-- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
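For reference, a minimal standalone loop in the same spirit as the attached test program (this is not Ants's test_gettimeofday_monotonic, just a rough sketch that reports the average per-call cost of gettimeofday() under whatever clock source is currently selected):

#include <stdio.h>
#include <sys/time.h>

int
main(void)
{
	const long	loops = 5 * 1000 * 1000;
	struct timeval start, end, tv;

	gettimeofday(&start, NULL);
	for (long i = 0; i < loops; i++)
		gettimeofday(&tv, NULL);
	gettimeofday(&end, NULL);

	{
		double		elapsed_us = (end.tv_sec - start.tv_sec) * 1e6 +
								 (end.tv_usec - start.tv_usec);

		printf("%.2f ns per gettimeofday() call\n",
			   elapsed_us * 1000.0 / loops);
	}
	return 0;
}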
[HACKERS] Slow tab completion w/ lots of tables
Greetings, When doing tab-completion under 9.1, pg_table_is_visible(oid) is slow and is ending up as the first thing tested against all the rows in pg_class. Increasing the cost of pg_table_is_visible() up to 10 causes it to move to the end of the tests, which improves things greatly - I thought there was a plan to make that the default? This is with 9.1.4. After researching this a bit, I'm left wondering why we're using substring() to do the matching test. I don't see any easy way to index a substring() call which can be of any size (depending on how many characters precede the user hitting 'tab'). On the other hand, using LIKE with 'string%' and indexing with varchar_pattern_ops should give us a nice index which could be used for tab completion, right? If no one points out an obvious flaw in that, I'll take a look at making that happen. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] huge tlb support
On Mon, 9 Jul 2012 12:30:23 +0200 Andres Freund and...@2ndquadrant.com wrote: On Monday, July 09, 2012 08:11:00 AM Tom Lane wrote: y...@mwd.biglobe.ne.jp (YAMAMOTO Takashi) writes: Also, I was under the impression that recent Linux kernels use hugepages automatically if they can, so I wonder exactly what Andres was testing on ... if you mean the transparent hugepage feature, iirc it doesn't affect MAP_SHARED mappings like this. Oh! That would explain some things. It seems like a pretty nasty restriction though ... do you know why they did that? Looking a bit deeper they explicitly only work on private memory. The reason apparently being that it's too hard to update the page table entries in multiple processes at once without introducing locking problems/scalability issues. To be sure one can check /proc/$pid_of_pg_process/smaps and look for the mapping to /dev/zero or the biggest mapping ;). It's not counted as Anonymous memory and it doesn't have transparent hugepages. I was confused before because there are quite a few (400MB here) huge pages allocated for postgres during a pgbench run, but that's just all the local memory... A warning: on RHEL 6.1 (2.6.32-131.4.1.el6.x86_64 #1 SMP) we have had horrible problems caused by transparent_hugepages running postgres on largish systems (128GB to 512GB memory, 32 cores). The system sometimes goes 99% system time and is very slow and unresponsive to the point of not successfully completing new TCP connections. Turning off transparent_hugepages fixes it. That said, explicit hugepage support for the buffer cache would be a big win especially for high connection counts. -dg -- David Gould da...@sonic.net If simplicity worked, the world would be overrun with insects. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
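For what it's worth, a minimal Linux-only sketch of the kind of explicit huge page request being suggested for shared memory (this assumes a kernel new enough for MAP_HUGETLB and huge pages reserved via vm.nr_hugepages; it is not PostgreSQL code, just an illustration of the mmap flag that transparent huge pages do not cover for shared mappings):

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>

int
main(void)
{
	size_t		sz = 16UL * 1024 * 1024;	/* must be a multiple of the huge page size */
	void	   *p = mmap(NULL, sz, PROT_READ | PROT_WRITE,
						 MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

	if (p == MAP_FAILED)
	{
		perror("mmap(MAP_HUGETLB)");	/* e.g. no huge pages reserved */
		return 1;
	}
	printf("mapped %zu bytes backed by explicit huge pages\n", sz);
	munmap(p, sz);
	return 0;
}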
Re: [HACKERS] external_pid_file not removed on postmaster exit
On Fri, 2012-07-27 at 08:09 +0300, Peter Eisentraut wrote: It seems strange that the external_pid_file is never removed. There is even a C comment about it: /* Should we remove the pid file on postmaster exit? */ I think it should be removed with a proc_exit hook just like the main postmaster.pid file. Does anyone remember why this was not done originally or have any concerns? Since that was not the case, I propose the attached patch to unlink the external pid file. diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index a4fb2a4..73520a6 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -329,6 +329,7 @@ /* * postmaster.c - function prototypes */ +static void unlink_external_pid_file(int status, Datum arg); static void getInstallationPaths(const char *argv0); static void checkDataDir(void); static Port *ConnCreate(int serverFd); @@ -1071,7 +1072,6 @@ static bool save_backend_variables(BackendParameters *param, Port *port, { fprintf(fpidfile, "%d\n", MyProcPid); fclose(fpidfile); - /* Should we remove the pid file on postmaster exit? */ /* Make PID file world readable */ if (chmod(external_pid_file, S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH) != 0) @@ -1081,6 +1081,8 @@ static bool save_backend_variables(BackendParameters *param, Port *port, else write_stderr("%s: could not write external PID file \"%s\": %s\n", progname, external_pid_file, strerror(errno)); + + on_proc_exit(unlink_external_pid_file, 0); } /* @@ -1183,6 +1185,17 @@ static bool save_backend_variables(BackendParameters *param, Port *port, /* + * on_proc_exit callback to delete external_pid_file + */ +static void +unlink_external_pid_file(int status, Datum arg) +{ + if (external_pid_file) + unlink(external_pid_file); +} + + /* * Compute and check the directory paths to files that are part of the * installation (as deduced from the postgres executable's own location) */ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers