Re: [HACKERS] RFC: Making TRUNCATE more "MVCC-safe"
On Fri, Feb 10, 2012 at 01:59:18PM -0500, Robert Haas wrote:
> On Fri, Feb 10, 2012 at 6:42 AM, Noah Misch wrote:
> > I like the design you have chosen.  It would find applications beyond
> > TRUNCATE, so your use of non-specific naming is sound.  For example, older
> > snapshots will see an empty table "t" after "CREATE TABLE t AS SELECT 1"
> > commits; that's a comparable MVCC anomaly.  Some of our non-MVCC-safe
> > commands should perhaps just become MVCC-safe, but there will always be
> > use cases for operations that shortcut MVCC.  When one truly does want
> > that, your proposal for keeping behavior consistent makes plenty of sense.
>
> I guess I'm not particularly excited by the idea of trying to make
> TRUNCATE MVCC-safe.  I notice that the example involves the REPEATABLE
> READ isolation level, which is already known to be busted in a variety
> of ways; that's why we now have SERIALIZABLE, and why most people use
> READ COMMITTED.  Are there examples of this behavior at other
> isolation levels?

I've yet to see an MVCC anomaly that one can reproduce at REPEATABLE READ
and not at READ COMMITTED.  They tend to be narrow race conditions at READ
COMMITTED, yet easy to demonstrate at REPEATABLE READ.  Related:
http://archives.postgresql.org/pgsql-performance/2011-02/msg00451.php

Incidentally, people use READ COMMITTED because they don't question the
default, not because they know the hazards of REPEATABLE READ.  I don't
know the bustedness you speak of; could we improve the documentation to
inform folks?

> But I have to admit I'm intrigued by the idea of extending this to
> other cases, if there's a reasonable way to do that.  For example, if
> we could fix things up so that we don't see a table at all if it was
> created after we took our snapshot, that would remove one of the
> obstacles to marking pages bulk-loaded into that table with FrozenXID
> and PD_ALL_VISIBLE from the get-go.  I think a lot of people would be
> mighty happy about that.

Exactly.
> But the necessary semantics seem somewhat different.  For TRUNCATE,
> you want to throw a serialization error; but is that also what you
> want for CREATE TABLE?  Or do you just want it to appear empty?

I think an error helps just as much there.  If you create a table and
populate it with data in the same transaction, letting some snapshot see an
empty table is an atomicity failure.  Your comment illustrates a helpful
point: this is just another kind of ordinary serialization failure, one
that can happen at any isolation level.  No serial transaction sequence can
yield one of these errors.

Thanks,
nm

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
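For readers following the design being discussed, the idea can be sketched
as a toy model. Everything below, including the field name rel_valid_xmin,
is invented for illustration and is not the actual patch logic:

```python
# Toy model of the proposed behavior: if a table's contents were
# replaced (by TRUNCATE or CREATE TABLE ... AS) by a transaction that
# our snapshot cannot see, raise a serialization error rather than
# silently presenting an empty table.  All names are hypothetical.
class SerializationFailure(Exception):
    pass

def check_scan_allowed(snapshot_xmin, rel_valid_xmin):
    # rel_valid_xmin: xid of the transaction that last replaced the
    # table's contents; 0 means "valid for every snapshot"
    if rel_valid_xmin and rel_valid_xmin >= snapshot_xmin:
        raise SerializationFailure(
            "canceling statement due to conflict with TRUNCATE")

check_scan_allowed(snapshot_xmin=200, rel_valid_xmin=100)  # old rewrite: visible, fine
try:
    check_scan_allowed(snapshot_xmin=100, rel_valid_xmin=150)
    failed = False
except SerializationFailure:
    failed = True
assert failed  # rewrite happened after our snapshot: error, not empty table
```

Note this matches Noah's point above: the error is an ordinary serialization
failure that no serial execution could produce, so raising it at any
isolation level is defensible.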
Re: [HACKERS] plpgsql leaking memory when stringifying datums
Jan Urbański writes:
> While chasing a PL/Python memory leak, I did a few tests with PL/pgSQL
> and I think there are places where memory is not freed sufficiently early.

I think the basic issue here is that the type output function might
generate (and not bother to free) additional cruft besides its output
string, so that pfree'ing the output alone is not sufficient to avoid a
memory leak if the call occurs in a long-lived context.

However, I don't much care for the details of the proposed patch: if we're
going to fix this by running the output function in the per-tuple memory
context, and expecting the caller to do exec_eval_cleanup later, why should
we add extra pstrdup/pfree overhead?  We can just leave the result in the
temp context in most cases, and thus get a net savings rather than a net
cost from fixing this.  The attached modified patch does it like that.

BTW, it occurs to me to wonder whether we need to worry about such
subsidiary leaks in type input functions as well.  I see at least one place
where pl_exec.c is tediously freeing the result of exec_simple_cast_value,
but if there are secondary leaks that's not going to be good enough.  Maybe
we should switch over to a similar definition where the cast result is in
the per-tuple context, and you've got to copy it if you want it to be
long-lived.

			regards, tom lane

diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index bf952b62478792b5564fe2af744a318322ea197c..e93b7c63be5d8a385b420dc0d9afa04cb90174a7 100644
*** a/src/pl/plpgsql/src/pl_exec.c
--- b/src/pl/plpgsql/src/pl_exec.c
*************** static void exec_move_row(PLpgSQL_execst
*** 188,201 ****
  static HeapTuple make_tuple_from_row(PLpgSQL_execstate *estate,
  					 PLpgSQL_row *row,
  					 TupleDesc tupdesc);
! static char *convert_value_to_string(Datum value, Oid valtype);
! static Datum exec_cast_value(Datum value, Oid valtype,
  				Oid reqtype,
  				FmgrInfo *reqinput,
  				Oid reqtypioparam,
  				int32 reqtypmod,
  				bool isnull);
! static Datum exec_simple_cast_value(Datum value, Oid valtype,
  						Oid reqtype, int32 reqtypmod,
  						bool isnull);
  static void exec_init_tuple_store(PLpgSQL_execstate *estate);
--- 188,204 ----
  static HeapTuple make_tuple_from_row(PLpgSQL_execstate *estate,
  					 PLpgSQL_row *row,
  					 TupleDesc tupdesc);
! static char *convert_value_to_string(PLpgSQL_execstate *estate,
! 						Datum value, Oid valtype);
! static Datum exec_cast_value(PLpgSQL_execstate *estate,
! 				Datum value, Oid valtype,
  				Oid reqtype,
  				FmgrInfo *reqinput,
  				Oid reqtypioparam,
  				int32 reqtypmod,
  				bool isnull);
! static Datum exec_simple_cast_value(PLpgSQL_execstate *estate,
! 						Datum value, Oid valtype,
  						Oid reqtype, int32 reqtypmod,
  						bool isnull);
  static void exec_init_tuple_store(PLpgSQL_execstate *estate);
*************** plpgsql_exec_function(PLpgSQL_function *
*** 430,436 ****
  	else
  	{
  		/* Cast value to proper type */
! 		estate.retval = exec_cast_value(estate.retval, estate.rettype,
  										func->fn_rettype,
  										&(func->fn_retinput),
  										func->fn_rettypioparam,
--- 433,440 ----
  	else
  	{
  		/* Cast value to proper type */
! 		estate.retval = exec_cast_value(&estate, estate.retval,
! 										estate.rettype,
  										func->fn_rettype,
  										&(func->fn_retinput),
  										func->fn_rettypioparam,
*************** exec_stmt_fori(PLpgSQL_execstate *estate
*** 1757,1763 ****
  	 * Get the value of the lower bound
  	 */
  	value = exec_eval_expr(estate, stmt->lower, &isnull, &valtype);
! 	value = exec_cast_value(value, valtype, var->datatype->typoid,
  							&(var->datatype->typinput),
  							var->datatype->typioparam,
  							var->datatype->atttypmod, isnull);
--- 1761,1767 ----
  	 * Get the value of the lower bound
  	 */
  	value = exec_eval_expr(estate, stmt->lower, &isnull, &valtype);
! 	value = exec_cast_value(estate, value, valtype, var->datatype->typoid,
  							&(var->datatype->typinput),
  							var->datatype->typioparam,
  							var->datatype->atttypmod, isnull);
*************** exec_stmt_fori(PLpgSQL_execstate *estate
*** 1772,1778 ****
  	 * Get the value of the upper bound
  	 */
  	value = exec_eval_expr(estate, stmt->upper, &isnull, &valtype);
! 	value = exec_cast_value(value, valtype, var->datatype->typoid,
  							&(var->datatype->typinput),
  							var->datatype->typioparam,
  							var->datatype->atttypmod, isnull);
--- 1776,1782 ----
  	 * Get the value of the upper bound
  	 */
  	value = exec_eval_expr(estate, stmt->upper, &isnull, &valtype);
! 	value = exec_cast_value(estate, value, valtype, var->datatype->typoid,
  							&(var->datatype->typinput),
  							var->datatype->typioparam,
  							var->datatype->atttypmod, isnull);
*************** exec_stmt_fori(PLpgSQL_execstate *estate
*** 1789,1795 ****
  	if (stmt->step)
  	{
  		value = exec_eval_exp
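The "leave it in the temp context and reset later" idea above is a general
arena pattern. A loose Python analogue (all names invented here; this is a
conceptual sketch, not the plpgsql code) shows why bulk reset beats
individual pfree calls:

```python
# Loose analogue of the per-tuple memory context pattern: everything a
# callee allocates, including hidden "cruft", lands in one short-lived
# context that the caller resets in bulk (the exec_eval_cleanup
# analogue), so no individual frees are needed and nothing leaks.
class MemoryContext:
    def __init__(self):
        self._allocations = []

    def alloc(self, obj):
        self._allocations.append(obj)
        return obj

    def reset(self):
        self._allocations.clear()   # frees everything at once

per_tuple_ctx = MemoryContext()

def convert_value_to_string(value):
    per_tuple_ctx.alloc(bytearray(16))       # hidden cruft the output
                                             # function never frees itself
    return per_tuple_ctx.alloc(str(value))   # the actual output string

result = convert_value_to_string(42)
assert result == "42"
assert len(per_tuple_ctx._allocations) == 2  # cruft + result both tracked
per_tuple_ctx.reset()                        # bulk cleanup catches both
assert len(per_tuple_ctx._allocations) == 0
```

The point of Tom's objection is visible here: pfree'ing only `result` would
have stranded the cruft, while a context reset reclaims it for free.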
[HACKERS] bitfield and gcc
I wonder whether somewhere in the Postgres source "we" are relying on the
GCC "correct behaviour" regarding read-modify-write of bitfields in
structures.  Take a read of this: https://lwn.net/Articles/478657/

Sorry if this was already mentioned.

Regards
Gaetano Mendola
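For context, the hazard the LWN article describes can be shown with a toy
model. Python integers stand in for a machine word that holds two adjacent
bitfields; the interleaving is written out deterministically rather than
with real threads:

```python
# Two flags packed into one word.  If the compiler implements "set
# flag_a" as a read-modify-write of the whole containing word, a
# concurrent update of the adjacent flag_b can be silently lost.
FLAG_A = 0b01
FLAG_B = 0b10

word = 0b00

t1_snapshot = word           # thread 1 loads the whole word (sees flag_b = 0)
word |= FLAG_B               # thread 2 sets flag_b in the meantime
word = t1_snapshot | FLAG_A  # thread 1 stores the whole word back

assert word & FLAG_A         # thread 1's update survived
assert not (word & FLAG_B)   # thread 2's update was clobbered
```

This is exactly why code that modifies fields sharing a word with
concurrently-updated fields needs explicit locking or careful layout.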
Re: [HACKERS] Removing special case OID generation
On 2/7/12 8:14 AM, Alvaro Herrera wrote:
> Having one sequence for each toast table could be wasteful though.  I
> mean, sequences are not the best use of shared buffer cache currently.
> If we could have more than one sequence data in a shared buffer page,
> things would be different.  Not sure how serious this really is.

This would actually be an argument for supporting multiple page sizes...
too bad that's such a beast.

FWIW, from our most complex production database:

cnuapp_p...@postgres08.obr=# select relkind, count(*) from pg_class group by 1;
 relkind | count
---------+-------
 S       |   522
 r       |  1058
 t       |   698
 i       |  2894
 v       |   221
 c       |    12
(6 rows)

--
Jim C. Nasby, Database Architect    j...@nasby.net
512.569.9461 (cell)                 http://jim.nasby.net
Re: [HACKERS] initdb and fsync
On sön, 2012-02-05 at 10:53 -0800, Jeff Davis wrote:
> > initdb should do these syncs by default and offer an option to
> > disable them.
>
> For test frameworks that run initdb often, that makes sense.
>
> But for developers, it doesn't make sense to spend 0.5s typing an option
> that saves you 0.3s.  So, we'd need some more convenient way to choose
> the no-fsync option, like an environment variable that developers can
> set.  Or maybe developers don't care about 0.3s?

You can use https://launchpad.net/libeatmydata for those cases.
Re: [HACKERS] Core Extensions relocation
All,

Andrew ran crake on these modules, and they do not add any links not added
by core postgres already.  As such, can we proceed with this patch?

Greg, do you have an updated version to run against HEAD?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: [HACKERS] [GENERAL] pg_dump -s dumps data?!
I wrote:
> Now, back to the original subject of this thread: both HEAD and 9.1 are
> now operating as designed, in that they will dump the (user-provided
> portion of the) contents of an extension config table whenever that
> extension is dumped, even if --schema is specified.

Or so I thought, anyway.  Further experimentation with despez's example
shows that in HEAD, --schema is still able to block dumping of extension
config table contents, and the reason appears to be commit
a4cd6abcc901c1a8009c62a27f78696717bb8fe1, which added yet another set of
filtering conditions in a poorly chosen place; or possibly I should say it
made arbitrary changes in the definition of the --schema switch.  That
patch needs some rethinking too, though I'm not sure what yet.

I also note that his example shows that if you have a selective dump (say,
with a -t switch), config table contents will be dumped even when the
owning extension is not.  This seems like a pretty clear bug:
getExtensionMembership should not be creating TableDataInfo objects for
extension config tables if the owning extension is not to be dumped.
Barring objections, I'll go fix and back-patch that.

			regards, tom lane
Re: [HACKERS] pl/perl and utf-8 in sql_ascii databases
On Thu, Feb 9, 2012 at 03:21, Christoph Berg wrote:
> Hi,
>
> we have a database that is storing strings in various encodings (and
> non-encodings, namely the arbitrary byte soup [ ... ]
> For this reason, the database uses sql_ascii encoding
> ...snip...
> In sql_ascii databases, utf_e2u does not do any recoding, but then
> SvUTF8_on still marks the string as utf-8, while it isn't.
>
> (Returned values might also need fixing.)
>
> In my view, this is clearly a bug in pl/perl on sql_ascii databases.

Yeah, there was some musing about this over in:
http://archives.postgresql.org/pgsql-hackers/2011-02/msg01142.php

Seems like we missed the fact that we still did SvUTF8_on() in sv2cstr and
SvPVUTF8() when turning a perl string into a cstring.

With the attached I get:

=> create or replace function perl_white(a text) returns text as $$ return shift; $$ language plperlu;
=> select perl_white(E'\200'), perl_white(E'\200')::bytea, coalesce(perl_white(E'\200'), 'null');
 perl_white | perl_white | coalesce
------------+------------+----------
            | \x80       |

=> select perl_white(E'\401');
 perl_white
------------
 \x01
(1 row)

Does the attached fix the issue for you?

I'll note that all the pls seem to behave a bit differently:

=> create or replace function py_white(a text) returns text as $$ return a; $$ language plpython3u;
=> select py_white(E'\200'), py_white(E'\200')::bytea, coalesce(py_white(E'\200'), 'null');
 py_white | py_white | coalesce
----------+----------+----------
          |          | null
(1 row)

=> select py_white(E'\401');
 py_white
----------
 \x01
(1 row)

=> create or replace function tcl_white(text) returns text as $$ return $1; $$ language pltcl;
=> select tcl_white(E'\200'), tcl_white(E'\200')::bytea, coalesce(tcl_white(E'\200'), 'null');
 tcl_white | tcl_white | coalesce
-----------+-----------+----------
           | \x80      |

=> select tcl_white(E'\402');
 tcl_white
-----------
 \x02
(1 row)

*** a/src/pl/plperl/plperl_helpers.h
--- b/src/pl/plperl/plperl_helpers.h
***************
*** 5,23 ****
   * convert from utf8 to database encoding
   */
  static inline char *
! utf_u2e(const char *utf8_str, size_t len)
  {
! 	int			enc = GetDatabaseEncoding();
! 
! 	char	   *ret = (char *) pg_do_encoding_conversion((unsigned char *) utf8_str, len, PG_UTF8, enc);
  
  	/*
! 	 * when we are a PG_UTF8 or SQL_ASCII database
! 	 * pg_do_encoding_conversion() will not do any conversion or
! 	 * verification. we need to do it manually instead.
  	 */
  	if (enc == PG_UTF8 || enc == PG_SQL_ASCII)
! 		pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);
  
  	if (ret == utf8_str)
  		ret = pstrdup(ret);
--- 5,24 ----
   * convert from utf8 to database encoding
   */
  static inline char *
! utf_u2e(char *utf8_str, size_t len)
  {
! 	int			enc = GetDatabaseEncoding();
! 	char	   *ret = utf8_str;
  
  	/*
! 	 * when we are a PG_UTF8 or SQL_ASCII database pg_do_encoding_conversion()
! 	 * will not do any conversion or verification. we need to do it manually
! 	 * instead.
  	 */
  	if (enc == PG_UTF8 || enc == PG_SQL_ASCII)
! 		pg_verify_mbstr_len(enc, utf8_str, len, false);
! 	else
! 		ret = (char *) pg_do_encoding_conversion((unsigned char *) utf8_str, len, PG_UTF8, enc);
  
  	if (ret == utf8_str)
  		ret = pstrdup(ret);
*************** sv2cstr(SV *sv)
*** 66,72 ****
  	 * we are done
  	 */
  	SvREFCNT_inc(sv);
! 	val = SvPVutf8(sv, len);
  
  	/*
  	 * we use perl's length in the event we had an embedded null byte to ensure
--- 67,80 ----
  	 * we are done
  	 */
  	SvREFCNT_inc(sv);
! 	/*
! 	 * when SQL_ASCII just treat it as byte soup, that is fetch the string out
! 	 * however it is currently stored by perl
! 	 */
! 	if (GetDatabaseEncoding() == PG_SQL_ASCII)
! 		val = SvPV(sv, len);
! 	else
! 		val = SvPVutf8(sv, len);
  
  	/*
  	 * we use perl's length in the event we had an embedded null byte to ensure
***************
*** 89,99 ****
  static inline SV *
  cstr2sv(const char *str)
  {
  	SV		   *sv;
! 	char	   *utf8_str = utf_e2u(str);
  
  	sv = newSVpv(utf8_str, 0);
  	SvUTF8_on(sv);
- 	pfree(utf8_str);
  
  	return sv;
--- 97,112 ----
  cstr2sv(const char *str)
  {
  	SV		   *sv;
! 	char	   *utf8_str;
! 
! 	/* no conversion when SQL_ASCII */
! 	if (GetDatabaseEncoding() == PG_SQL_ASCII)
! 		return newSVpv(str, 0);
! 
! 	utf8_str = utf_e2u(str);
  
  	sv = newSVpv(utf8_str, 0);
  	SvUTF8_on(sv);
  	pfree(utf8_str);
  
  	return sv;
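The gist of the fix, as I read it: under SQL_ASCII, hand the bytes through
untouched and never mark them as utf-8; under any other encoding, actually
verify or convert. A rough Python analogue of that intent (the function
name is invented; this is not the plperl code itself):

```python
# Rough analogue of the corrected behavior: byte soup passes through
# unchanged under SQL_ASCII, while other encodings insist the bytes
# really are valid UTF-8 before treating them as such.
def perl_string_to_db(raw: bytes, db_encoding: str):
    if db_encoding == "SQL_ASCII":
        return raw                 # no recoding, no utf-8 flag set
    return raw.decode("utf-8")     # raises on invalid byte sequences

# The E'\200' case from the session above: fine as byte soup...
assert perl_string_to_db(b"\x80", "SQL_ASCII") == b"\x80"

# ...but rejected if we claim it is UTF-8, which is the bug: the old
# code marked such strings as utf-8 without any verification.
try:
    perl_string_to_db(b"\x80", "UTF8")
    rejected = False
except UnicodeDecodeError:
    rejected = True
assert rejected
```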
Re: [HACKERS] patch : Allow toast tables to be moved to a different tablespace
On Fri, Feb 10, 2012 at 2:23 PM, Jim Nasby wrote:
> On 2/8/12 6:17 AM, Christian Nicolaisen wrote:
>> We have some tables with documents and their metadata (filename,
>> filetype, etc).  Most of the time we just want to get the metadata
>> (filename, filetype, etc) and not the document itself.  In this case it
>> would be nice to have the metadata (which wouldn't end up in TOAST) on a
>> fast array (SSD-based), and put the file data on a slow array
>> (HDD-based).  It's probably possible to have two tables, one for
>> metadata and one for the extra file, but this is a workaround.
>
> Did you intend to post a patch?  Because nothing was attached.  Also, if
> you're submitting a patch you should add it to the next commitfest.

He was commenting on the possible utility of a previously-submitted patch,
which got pushed off because we weren't sure it was good for anything.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] patch : Allow toast tables to be moved to a different tablespace
On 2/8/12 6:17 AM, Christian Nicolaisen wrote:
> Hi,
>
> We have some tables with documents and their metadata (filename,
> filetype, etc).  Most of the time we just want to get the metadata
> (filename, filetype, etc) and not the document itself.  In this case it
> would be nice to have the metadata (which wouldn't end up in TOAST) on a
> fast array (SSD-based), and put the file data on a slow array
> (HDD-based).  It's probably possible to have two tables, one for
> metadata and one for the extra file, but this is a workaround.

Did you intend to post a patch?  Because nothing was attached.  Also, if
you're submitting a patch you should add it to the next commitfest.
Re: [HACKERS] CLOG contention, part 2
On Fri, Feb 10, 2012 at 7:01 PM, Ants Aasma wrote:
> On Feb 9, 2012 1:27 AM, "Robert Haas"
>> However, there is a potential fly in the ointment: in other cases in
>> which we've reduced contention at the LWLock layer, we've ended up
>> with very nasty contention at the spinlock layer that can sometimes
>> eat more CPU time than the LWLock contention did.  In that light, it
>> strikes me that it would be nice to be able to partition the
>> contention N ways rather than just 2 ways.  I think we could do that
>> as follows.  Instead of having one control lock per SLRU, have N
>> locks, where N is probably a power of 2.  Divide the buffer pool for
>> the SLRU N ways, and decree that each slice of the buffer pool is
>> controlled by one of the N locks.  Route all requests for a page P to
>> slice P mod N.  Unlike this approach, that wouldn't completely
>> eliminate contention at the LWLock level, but it would reduce it
>> proportional to the number of partitions, and it would reduce spinlock
>> contention according to the number of partitions as well.  A down side
>> is that you'll need more buffers to get the same hit rate, but this
>> proposal has the same problem: it doubles the amount of memory
>> allocated for CLOG.
>
> Splitting the SLRU into different parts is exactly the same approach as
> the associativity used in CPU caches.  I found some numbers that analyze
> cache hit rate with different associativities:

My suggested approach is essentially identical to the one we already use
for partitioning the buffer cache and the lock manager, and I expect it to
be equally effective at reducing contention.  There is little danger of
everything hitting the same partition at once, since there are many xids
and they are served out sequentially.  In the lock manager case we use the
relid as the key, so there is some skewing.
--
Simon Riggs                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
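The "route page P to slice P mod N" scheme quoted above can be sketched
like so. This is a toy, not the SLRU code; all names are invented for
illustration:

```python
import threading

# N-way partitioned page store: page P lives in slice P mod N, and
# each slice has its own lock, so concurrent lookups of different
# pages usually contend on different locks rather than one big one.
N_PARTITIONS = 8   # a power of 2, per the proposal

class PartitionedSlru:
    def __init__(self):
        self._locks = [threading.Lock() for _ in range(N_PARTITIONS)]
        self._slices = [{} for _ in range(N_PARTITIONS)]

    def write_page(self, pageno, data):
        i = pageno % N_PARTITIONS
        with self._locks[i]:
            self._slices[i][pageno] = data

    def read_page(self, pageno):
        i = pageno % N_PARTITIONS
        with self._locks[i]:          # only ~1/N of traffic lands here
            return self._slices[i].get(pageno)

slru = PartitionedSlru()
slru.write_page(17, "clog-page-17")
assert slru.read_page(17) == "clog-page-17"
assert slru.read_page(18) is None    # different slice, different lock
```

Sequentially assigned xids spread adjacent pages across slices, which is
why the "everything hits one partition" worry is small, as noted above.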
Re: [HACKERS] CLOG contention, part 2
On Feb 9, 2012 1:27 AM, "Robert Haas"
> However, there is a potential fly in the ointment: in other cases in
> which we've reduced contention at the LWLock layer, we've ended up
> with very nasty contention at the spinlock layer that can sometimes
> eat more CPU time than the LWLock contention did.  In that light, it
> strikes me that it would be nice to be able to partition the
> contention N ways rather than just 2 ways.  I think we could do that
> as follows.  Instead of having one control lock per SLRU, have N
> locks, where N is probably a power of 2.  Divide the buffer pool for
> the SLRU N ways, and decree that each slice of the buffer pool is
> controlled by one of the N locks.  Route all requests for a page P to
> slice P mod N.  Unlike this approach, that wouldn't completely
> eliminate contention at the LWLock level, but it would reduce it
> proportional to the number of partitions, and it would reduce spinlock
> contention according to the number of partitions as well.  A down side
> is that you'll need more buffers to get the same hit rate, but this
> proposal has the same problem: it doubles the amount of memory
> allocated for CLOG.

Splitting the SLRU into different parts is exactly the same approach as the
associativity used in CPU caches.  I found some numbers that analyze cache
hit rate with different associativities:

http://research.cs.wisc.edu/multifacet/misc/spec2000cache-data/

Now obviously CPU cache access patterns are different from CLOG patterns,
but I think that the numbers strongly suggest that the reduction in hit
rate might be less than what you fear.
For example, the harmonic mean of data cache misses over all benchmarks for
16, 32 and 64 cache lines:

| Size | Direct    | 2-way LRU | 4-way LRU | 8-way LRU | Full LRU  |
|------+-----------+-----------+-----------+-----------+-----------|
| 1KB  | 0.0863842 | 0.0697167 | 0.0634309 | 0.0563450 | 0.0533706 |
| 2KB  | 0.0571524 | 0.0423833 | 0.0360463 | 0.0330364 | 0.0305213 |
| 4KB  | 0.0370053 | 0.0260286 | 0.0222981 | 0.0202763 | 0.0190243 |

As you can see, the reduction in hit rate is rather small down to 4-way
associative caches.

There may be a performance problem when multiple CLOG pages that happen to
sit in a single way become hot at the same time.  The most likely case that
I can come up with is multiple scans going over unhinted pages created at
different time periods.  If that is something to worry about, then a tool
that's used for CPUs is to employ a fully associative victim cache behind
the main cache.  If a CLOG page is evicted, it is transferred into the
victim cache, evicting a page from there.  When a page isn't found in the
main cache, the victim cache is first checked for a possible hit.  The
movement between the two caches doesn't need to involve any memory copying:
just swap pointers in metadata.

The victim cache will bring back concurrency issues when the hit rate of
the main cache is small, like the pgbench example you mentioned.  In that
case, a simple associative cache will allow multiple reads of clog pages
simultaneously.  On the other hand, in that case lock contention seems to
be the symptom rather than the disease.  I think those cases would be
better handled by increasing the maximum CLOG SLRU size.  The increase in
memory usage should be a drop in the bucket for systems that have enough
transaction processing velocity for that to be a problem.

--
Ants Aasma
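The victim-cache idea described above can be modeled in a few lines. This
is a toy with invented names: a small direct-mapped main cache backed by a
fully associative LRU victim cache, with evictions demoted by reference
(no data copying) rather than deleted:

```python
from collections import OrderedDict

# Direct-mapped main cache (pageno % main_slots picks the slot) with a
# fully associative LRU victim cache catching main-cache evictions.
class VictimCache:
    def __init__(self, main_slots=4, victim_slots=2):
        self.main = {}                  # slot -> (pageno, data)
        self.main_slots = main_slots
        self.victim = OrderedDict()     # pageno -> data, in LRU order
        self.victim_slots = victim_slots

    def get(self, pageno):
        slot = pageno % self.main_slots
        entry = self.main.get(slot)
        if entry and entry[0] == pageno:
            return entry[1]                     # main-cache hit
        if pageno in self.victim:
            data = self.victim.pop(pageno)      # victim-cache hit:
            self.put(pageno, data)              # promote back to main
            return data
        return None                             # miss: caller reads from disk

    def put(self, pageno, data):
        slot = pageno % self.main_slots
        old = self.main.get(slot)
        if old and old[0] != pageno:
            self.victim[old[0]] = old[1]        # demote evictee (pointer swap)
            if len(self.victim) > self.victim_slots:
                self.victim.popitem(last=False) # drop true LRU entry
        self.main[slot] = (pageno, data)

cache = VictimCache()
cache.put(0, "page0")
cache.put(4, "page4")            # same slot as page 0: page 0 goes to victim
assert cache.get(0) == "page0"   # still found, via the victim cache
assert cache.get(4) == "page4"   # and page 4 survives in the victim cache
```

Two pages that collide in the direct-mapped cache thus keep ping-ponging
between main and victim storage instead of thrashing to disk.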
Re: [HACKERS] RFC: Making TRUNCATE more "MVCC-safe"
On Fri, Feb 10, 2012 at 6:42 AM, Noah Misch wrote:
> I like the design you have chosen.  It would find applications beyond
> TRUNCATE, so your use of non-specific naming is sound.  For example, older
> snapshots will see an empty table "t" after "CREATE TABLE t AS SELECT 1"
> commits; that's a comparable MVCC anomaly.  Some of our non-MVCC-safe
> commands should perhaps just become MVCC-safe, but there will always be
> use cases for operations that shortcut MVCC.  When one truly does want
> that, your proposal for keeping behavior consistent makes plenty of sense.

I guess I'm not particularly excited by the idea of trying to make TRUNCATE
MVCC-safe.  I notice that the example involves the REPEATABLE READ
isolation level, which is already known to be busted in a variety of ways;
that's why we now have SERIALIZABLE, and why most people use READ
COMMITTED.  Are there examples of this behavior at other isolation levels?

But I have to admit I'm intrigued by the idea of extending this to other
cases, if there's a reasonable way to do that.  For example, if we could
fix things up so that we don't see a table at all if it was created after
we took our snapshot, that would remove one of the obstacles to marking
pages bulk-loaded into that table with FrozenXID and PD_ALL_VISIBLE from
the get-go.  I think a lot of people would be mighty happy about that.

But the necessary semantics seem somewhat different.  For TRUNCATE, you
want to throw a serialization error; but is that also what you want for
CREATE TABLE?  Or do you just want it to appear empty?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Patch: fix pg_dump for inherited defaults & not-null flags
Robert Haas writes:
> ... I'd lean toward back-patching.

Not hearing any contrary opinions, that's what I've done.

			regards, tom lane
[HACKERS] auto_explain produces invalid JSON
The auto_explain module appears to create invalid JSON (in all versions
since 9.0).  For example, with the settings

    auto_explain.log_format = 'json'
    auto_explain.log_min_duration = 0

the query

    select * from pg_type;

produces this in the log:

LOG:  duration: 529.808 ms  plan:
	[
	  "Query Text": "select * from pg_type;",
	  "Plan": {
	    "Node Type": "Seq Scan",
	    "Relation Name": "pg_type",
	    "Alias": "pg_type",
	    "Startup Cost": 0.00,
	    "Total Cost": 9.87,
	    "Plan Rows": 287,
	    "Plan Width": 611
	  }
	]

Note that at the top level it uses the array delimiters [ ] around what is
actually an object (key/value pairs).  Running this through a JSON parser
will fail.  By contrast, EXPLAIN (FORMAT JSON) on the command line
produces:

              QUERY PLAN
--------------------------------------
 [
   {
     "Plan": {
       "Node Type": "Seq Scan",
       "Relation Name": "pg_type",
       "Alias": "pg_type",
       "Startup Cost": 0.00,
       "Total Cost": 9.87,
       "Plan Rows": 287,
       "Plan Width": 611
     }
   }
 ]
(1 row)

So there is evidently something out of sync between what EXPLAIN produces
and what auto_explain produces.
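The mismatch is easy to confirm with any JSON parser; a quick check with
the plan text abbreviated:

```python
import json

# auto_explain wraps bare key/value pairs in [ ], which is not legal
# JSON; EXPLAIN (FORMAT JSON) wraps an object inside the array instead.
auto_explain_form = '[ "Query Text": "select 1;", "Plan": { "Node Type": "Seq Scan" } ]'
explain_form = '[ { "Plan": { "Node Type": "Seq Scan" } } ]'

try:
    json.loads(auto_explain_form)
    parsed = True
except json.JSONDecodeError:
    parsed = False

assert not parsed     # the auto_explain shape is rejected outright
assert json.loads(explain_form)[0]["Plan"]["Node Type"] == "Seq Scan"
```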
Re: [HACKERS] psql tab completion for SELECT
Robert Haas wrote:
> One thing that's been bugging me for a while is that the tab
> completion code all works by looking backward up to n words.  What we
> really want to know is what kind of statement we're in and where we
> are in it.  Absent other information, if we're in the target list of a
> SELECT statement (nested arbitrarily) that behavior is reasonable.  If
> we're someplace in a GRANT statement, or someplace in a CREATE
> statement where, say, a column name is expected, it's really not.

I played with this years ago, but readline does not really offer a good
way to easily get what we want (the whole statement, chunked into nice bits
to analyze).  Of course, at this point we should think about making things
more generic so we can drop in whatever readline alternative comes along in
the future.

> Unfortunately, making the tab completion something other than
> incredibly stupid is likely to be an insane amount of work.

Insane amount of work?  Check.  Incredibly stupid?  No, I think we've done
pretty well given the limitations we have.  You want to see incredibly
stupid, see some of the *other* CLIs out there (hi, mysql! :)

--
Greg Sabino Mullane g...@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201202101157
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
[HACKERS] Fix PL/Python metadata when there is no result
Dear hackers,

Thanks for the work on PL/Python result metadata, it is very useful!

I just came across a crash when trying to access this metadata on the
result of an UPDATE, which obviously cannot return any tuple (unless you
specify a RETURNING clause, maybe?).  Please find attached a patch that
solves this issue.  Instead of a PG crash, we get the following message:

ERROR:  plpy.Error: no result fetched

All the best,
--
Jean-Baptiste Quenot

0001-Fix-PLPython-metadata-access-when-there-is-no-result.patch
Re: [HACKERS] psql tab completion for SELECT
On Fri, Feb 10, 2012 at 11:22 AM, Tom Lane wrote:
> Robert Haas writes:
>> On Fri, Feb 10, 2012 at 11:01 AM, Tom Lane wrote:
>>> Well, if you want a patch with low standards, what about tab-completing
>>> function names anywhere that we do not see context suggesting something
>>> else?
>
>> I think that without a bit more contextual information that's likely
>> to lead to some odd results.  Unimplemented completions will lead to
>> bizarre things happening.
>
> True.  I was first thinking of doing this only if we know we're in
> a DML query, ie *first* word on the line is
> WITH/SELECT/INSERT/UPDATE/DELETE.  However, in the current
> implementation that is not terribly workable because we are only looking
> at the current line of text, not the whole input buffer; so making such
> a restriction would disable completion after the first line of a multi-
> line command.
>
>> One thing that's been bugging me for a while is that the tab
>> completion code all works by looking backward up to n words.
>
> Yup.  At the very least it would be good if it had access to the entire
> current command, so that we could sanity-check on the basis of the first
> word.

Agreed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] psql tab completion for SELECT
Robert Haas writes:
> On Fri, Feb 10, 2012 at 11:01 AM, Tom Lane wrote:
>> Well, if you want a patch with low standards, what about tab-completing
>> function names anywhere that we do not see context suggesting something
>> else?

> I think that without a bit more contextual information that's likely
> to lead to some odd results.  Unimplemented completions will lead to
> bizarre things happening.

True.  I was first thinking of doing this only if we know we're in a DML
query, ie *first* word on the line is WITH/SELECT/INSERT/UPDATE/DELETE.
However, in the current implementation that is not terribly workable
because we are only looking at the current line of text, not the whole
input buffer; so making such a restriction would disable completion after
the first line of a multi-line command.

> One thing that's been bugging me for a while is that the tab
> completion code all works by looking backward up to n words.

Yup.  At the very least it would be good if it had access to the entire
current command, so that we could sanity-check on the basis of the first
word.

			regards, tom lane
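With the whole buffer in hand, the first-word sanity check Tom describes is
straightforward. A sketch with invented names, not psql's actual code:

```python
# Only offer function-name completion when the whole input buffer
# starts with a DML keyword; a multi-line statement still qualifies
# because we inspect the buffer, not just the current line.
DML_KEYWORDS = {"WITH", "SELECT", "INSERT", "UPDATE", "DELETE"}

def could_complete_functions(buffer: str) -> bool:
    words = buffer.strip().split()
    if not words:
        return False
    return words[0].upper() in DML_KEYWORDS

assert could_complete_functions("select\n  substr(")      # multi-line DML: yes
assert not could_complete_functions("GRANT SELECT ON t")  # not a DML query: no
```

This is exactly the restriction that is "not terribly workable" today,
since the completion code sees only the current line of text.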
[HACKERS] Upcoming PG back-branch releases, end of this month
In view of the recently fixed data-corruption issues in hot standby
operation (bugs 6200, 6425), the core team feels we should push out
back-branch update releases soon.  Due to unavailability of some key
people, the earliest feasible schedule turns out to be: wrap Thursday 2/23
for public release Monday, Feb 27.  So that's what it will be.

			regards, tom lane
Re: [HACKERS] psql tab completion for SELECT
On Fri, Feb 10, 2012 at 11:01 AM, Tom Lane wrote:
> Robert Haas writes:
>> On Fri, Feb 10, 2012 at 10:20 AM, Tom Lane wrote:
>>> I'm not against tab-completing functions, if people think that's
>>> useful. I am against tab-completing them in 1% of use-cases, which is
>>> what this patch accomplishes. The fact that it's short doesn't make it
>>> good.
>
>> Our tab completion is in general very incomplete; we have made a
>> practice of cherry-picking the most commonly encountered cases and
>> handling only those. Whether or not that is a good policy is a
>> philosophical question, but there is no reason to hold this particular
>> patch to a higher standard than the quality of our tab completion code
>> in general.
>
> Well, if you want a patch with low standards, what about tab-completing
> function names anywhere that we do not see context suggesting something
> else? I really think that doing it only immediately after SELECT is
> going to prove far more of an annoyance than a help, because once you
> get used to relying on it you are going to wish it worked elsewhere.

I think that without a bit more contextual information that's likely
to lead to some odd results. Unimplemented completions will lead to
bizarre things happening.

One thing that's been bugging me for a while is that the tab
completion code all works by looking backward up to n words. What we
really want to know is what kind of statement we're in and where we
are in it. Absent other information, if we're in the target list of a
SELECT statement (nested arbitrarily) that behavior is reasonable. If
we're someplace in a GRANT statement, or someplace in a CREATE
statement where, say, a column name is expected, it's really not.
Unfortunately, making the tab completion something other than
incredibly stupid is likely to be an insane amount of work.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
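Tom's proposed restriction in this thread — complete function names only when the command's *first* word is WITH/SELECT/INSERT/UPDATE/DELETE — is cheap to express once the whole input buffer, not just the current line, is visible. A minimal sketch of such a check (the function name and shape are illustrative assumptions, not code from psql's actual tab-complete.c):

```c
#include <ctype.h>
#include <string.h>
#include <strings.h>

/*
 * Hypothetical helper: does the full input buffer begin a DML statement?
 * Because it inspects the whole buffer rather than the current line, the
 * check keeps working on the second and later lines of a multi-line
 * command -- the case Tom notes the current implementation cannot handle.
 */
static int
buffer_begins_dml(const char *buf)
{
	static const char *const dml[] =
		{"WITH", "SELECT", "INSERT", "UPDATE", "DELETE"};
	size_t		i;

	/* skip leading whitespace before the first word */
	while (isspace((unsigned char) *buf))
		buf++;

	for (i = 0; i < sizeof(dml) / sizeof(dml[0]); i++)
	{
		size_t		len = strlen(dml[i]);

		/* keyword must be followed by whitespace or end of buffer */
		if (strncasecmp(buf, dml[i], len) == 0 &&
			(buf[len] == '\0' || isspace((unsigned char) buf[len])))
			return 1;
	}
	return 0;
}
```

With access to the entire command, this is the "sanity-check on the basis of the first word" Tom describes; the hard part in psql is plumbing the whole buffer through to the completion callback.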
Re: [HACKERS] psql tab completion for SELECT
Robert Haas writes:
> On Fri, Feb 10, 2012 at 10:20 AM, Tom Lane wrote:
>> I'm not against tab-completing functions, if people think that's
>> useful. I am against tab-completing them in 1% of use-cases, which is
>> what this patch accomplishes. The fact that it's short doesn't make it
>> good.

> Our tab completion is in general very incomplete; we have made a
> practice of cherry-picking the most commonly encountered cases and
> handling only those. Whether or not that is a good policy is a
> philosophical question, but there is no reason to hold this particular
> patch to a higher standard than the quality of our tab completion code
> in general.

Well, if you want a patch with low standards, what about tab-completing
function names anywhere that we do not see context suggesting something
else? I really think that doing it only immediately after SELECT is
going to prove far more of an annoyance than a help, because once you
get used to relying on it you are going to wish it worked elsewhere.

regards, tom lane
Re: [HACKERS] Patch: fix pg_dump for inherited defaults & not-null flags
On Fri, Feb 10, 2012 at 10:52 AM, Tom Lane wrote:
> Robert Haas writes:
>> On Thu, Feb 9, 2012 at 6:21 PM, Tom Lane wrote:
>>> Although this is a bug fix, it's a nontrivial change in the logic and
>>> so I'm hesitant to back-patch into stable branches. Given the lack of
>>> prior complaints, maybe it would be best to leave it unfixed in existing
>>> branches? Not sure. Thoughts?
>
>> I guess I'd be in favor of back-patching it, if that doesn't look like
>> too much of a job. We shouldn't assume that because only one person
>> reports a problem, no one else has been or will be affected.
>
> I don't think it's too much work --- what I'm more worried about is
> introducing new bugs. If I apply it only in HEAD then it will go
> through a beta test cycle before anybody relies on it in production.
> I *think* the patch is okay, but I've been wrong before.

Well, I'm not going to second-guess your judgement too much, but my
general feeling is that it's worth taking a bit of risk to get pg_dump
to DTRT. Dump and restore failures are extremely painful and difficult
to work around, so in my book they rank pretty highly in the list of
things that are important to get fixed. So if you're on the fence, I'd
lean toward back-patching.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Patch: fix pg_dump for inherited defaults & not-null flags
Robert Haas writes:
> On Thu, Feb 9, 2012 at 6:21 PM, Tom Lane wrote:
>> Although this is a bug fix, it's a nontrivial change in the logic and
>> so I'm hesitant to back-patch into stable branches. Given the lack of
>> prior complaints, maybe it would be best to leave it unfixed in existing
>> branches? Not sure. Thoughts?

> I guess I'd be in favor of back-patching it, if that doesn't look like
> too much of a job. We shouldn't assume that because only one person
> reports a problem, no one else has been or will be affected.

I don't think it's too much work --- what I'm more worried about is
introducing new bugs. If I apply it only in HEAD then it will go
through a beta test cycle before anybody relies on it in production.
I *think* the patch is okay, but I've been wrong before.

regards, tom lane
Re: [HACKERS] psql tab completion for SELECT
On Fri, Feb 10, 2012 at 10:20 AM, Tom Lane wrote:
> Robert Haas writes:
>> On Fri, Feb 10, 2012 at 1:24 AM, Tom Lane wrote:
>>> That seems pretty nearly entirely bogus. What is the argument for
>>> supposing that the word right after SELECT is a function name?
>
>> It isn't necessarily, but it might be. It'd certainly be nice to type:
>> SELECT pg_si
>> and get:
>> SELECT pg_size_pretty(
>
> Yeah, and then you'll type
>
> SELECT pg_size_pretty(pg_dat
>
> and get nothing, and curse the authors of such a misbegotten incomplete
> concept that leads your fingers to rely on something that doesn't work
> where it should.
>
> I'm not against tab-completing functions, if people think that's
> useful. I am against tab-completing them in 1% of use-cases, which is
> what this patch accomplishes. The fact that it's short doesn't make it
> good.

Our tab completion is in general very incomplete; we have made a
practice of cherry-picking the most commonly encountered cases and
handling only those. Whether or not that is a good policy is a
philosophical question, but there is no reason to hold this particular
patch to a higher standard than the quality of our tab completion code
in general.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Patch: fix pg_dump for inherited defaults & not-null flags
On Thu, Feb 9, 2012 at 6:21 PM, Tom Lane wrote:
> Attached is a proposed patch to deal with the issue described here:
> http://archives.postgresql.org/pgsql-bugs/2012-02/msg0.php
>
> Even though we'd previously realized that comparing the text of
> inherited CHECK expressions is an entirely unsafe way to detect
> expression equivalence (cf comments for guessConstraintInheritance),
> pg_dump is still doing that for inherited DEFAULT expressions, with
> the predictable result that it does the wrong thing in this sort
> of example.
>
> Furthermore, as I looked more closely at the code, I realized that
> there is another pretty fundamental issue: if an inherited column has a
> default expression or NOT NULL bit that it did not inherit from its
> parent, flagInhAttrs forces the column to be treated as non-inherited,
> so that it will be emitted as part of the child table's CREATE TABLE
> command. This is *wrong* if the column is not attislocal; it will
> result in the column incorrectly having the attislocal property after
> restore. (Note: such a situation could only arise if the user had
> altered the column's default or NOT NULL property with ALTER TABLE after
> creation.)
>
> All of this logic predates the invention of attislocal, and really is
> attempting to make up for the lack of that bit, so it's not all that
> surprising that it falls down.
>
> So the attached patch makes the emit-column-or-not behavior depend only
> on attislocal (except for binary upgrade, which has its own kluge
> solution for the problem; though note that the whether-to-dump tests now
> exactly match the special cases for binary upgrade, which they did not
> before). Also, I've dropped the former attempts to exploit inheritance
> of defaults, and so the code will now emit a default explicitly for each
> child in an inheritance hierarchy, even if it didn't really need to.
> Since the backend doesn't track whether defaults are inherited, this
> doesn't cause any failure to restore the catalog state properly.
>
> Although this is a bug fix, it's a nontrivial change in the logic and
> so I'm hesitant to back-patch into stable branches. Given the lack of
> prior complaints, maybe it would be best to leave it unfixed in existing
> branches? Not sure. Thoughts?

I guess I'd be in favor of back-patching it, if that doesn't look like
too much of a job. We shouldn't assume that because only one person
reports a problem, no one else has been or will be affected.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Progress on fast path sorting, btree index creation time
On 9 February 2012 14:51, Robert Haas wrote:
> I'm not sure I entirely follow all this, but I'll look at the code
> once you have it.

I have attached a revision of the patch, with the adjustments already
described. Note that I have not made this support btree tuplesorting
yet, as there is an impedance mismatch that must be resolved,
particularly with the SortSupport stuff, and I wanted to know what you
think of the multiple key specialisation first. Arguably, we could get
away with only a single specialisation - I haven't really thought about
it much.

You say "Well, how often will we sort 10,000 integers?", and I think
that btree index creation is one very common and useful case, so I'd
like to look at that in more detail. I certainly don't see any reason
not to do it too. This should give you performance for sorting multiple
keys that is almost as good as the single-key optimisation that you
found to be more compelling. Obviously the need to actually call
comparetup_heap to look at non-leading sortkeys will vary from case to
case, and this is based on your test case, where there are no
duplicates and thus no need to ever do that. That isn't too far from
representative, as I think that in general, a majority of comparisons
won't result in equality.
--
Peter Geoghegan
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
new file mode 100644
index 1452e8c..e040ace
*** a/src/backend/utils/sort/tuplesort.c
--- b/src/backend/utils/sort/tuplesort.c
***************
*** 113,118 ****
--- 113,119 ----
  #include "utils/pg_rusage.h"
  #include "utils/rel.h"
  #include "utils/sortsupport.h"
+ #include "utils/template_qsort_arg.h"
  #include "utils/tuplesort.h"
*************** struct Tuplesortstate
*** 281,286 ****
--- 282,301 ----
  	int			currentRun;

  	/*
+ 	 * Will invocations of a tuple-class encapsulating comparator
+ 	 * (comparetup_heap, comparetup_index_btree, etc) skip the leading key,
+ 	 * because that has already been compared elsewhere?
+ 	 */
+ 	bool		skipLeadingkey;
+
+ 	/*
+ 	 * This specialization function pointer is sometimes used as an alternative
+ 	 * to the standard qsort_arg, when it has been determined that we can
+ 	 * benefit from various per number-of-sortkey performance optimizations.
+ 	 */
+ 	void (*qsort_arg_spec)(void *a, size_t n, size_t es, void *arg);
+
+ 	/*
  	 * Unless otherwise noted, all pointer variables below are pointers to
  	 * arrays of length maxTapes, holding per-tape data.
  	 */
*************** static void readtup_datum(Tuplesortstate
*** 492,497 ****
--- 507,519 ----
  static void reversedirection_datum(Tuplesortstate *state);
  static void free_sort_tuple(Tuplesortstate *state, SortTuple *stup);

+ /*
+  * A set of type-neutral specializations, for single sortkey and multiple
+  * sortkey sorts. These specialization are used for sorting both heap and
+  * btree index tuples that meet that criteria.
+  */
+ DO_TEMPLATE_QSORT_ARG(single, ONE_ADDITIONAL_CODE, single_comparetup_inline)
+ DO_TEMPLATE_QSORT_ARG(mult, MULTI_ADDITIONAL_CODE, mult_comparetup_inline)

  /*
   * tuplesort_begin_xxx
*************** tuplesort_begin_heap(TupleDesc tupDesc,
*** 631,636 ****
--- 653,661 ----
  			PrepareSortSupportFromOrderingOp(sortOperators[i], sortKey);
  	}

+ 	state->qsort_arg_spec =
+ 		nkeys==1?single_qsort_arg:mult_qsort_arg;
+ 	state->skipLeadingkey = true;
  	MemoryContextSwitchTo(oldcontext);

  	return state;
*************** tuplesort_performsort(Tuplesortstate *st
*** 1222,1232 ****
  	 * amount of memory. Just qsort 'em and we're done.
  	 */
  	if (state->memtupcount > 1)
! 		qsort_arg((void *) state->memtuples,
! 				  state->memtupcount,
! 				  sizeof(SortTuple),
! 				  (qsort_arg_comparator) state->comparetup,
! 				  (void *) state);
  	state->current = 0;
  	state->eof_reached = false;
  	state->markpos_offset = 0;
--- 1247,1273 ----
  	 * amount of memory. Just qsort 'em and we're done.
  	 */
  	if (state->memtupcount > 1)
! 	{
! 		/* Use a sorting specialization if available */
! 		if (state->qsort_arg_spec)
! 			/* specialization available */
! 			state->qsort_arg_spec(
! 				(void *) state->memtuples,
! 				state->memtupcount,
! 				sizeof(SortTuple),
! 				(void *) state);
! 		else
! 			/*
! 			 * Fall back on regular qsort_arg, with function pointer
! 			 * comparator, making no compile-time assumptions about the
! 			 * number of sortkeys or the datatype(s) to be sorted.
! 			 */
! 			qsort_arg((void *) state->memtuples,
! 					  state->memtupcount,
! 					  sizeof(SortTuple),
! 					  (qsort_arg_comparator) state->comparetup,
! 					  (void *) state);
! 	}
  	state->current = 0;
  	state->eof_reached = false;
  	state->markpos_offset = 0;
*************** make_bounded_heap(Tuplesortstate *state)
*** 2407,2412 ---
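For readers not following the patch internals: the speedup these specializations chase comes from letting the compiler see the comparator at compile time, instead of paying qsort_arg's indirect function-pointer call on every comparison. A standalone illustration of that contrast, with a plain insertion sort standing in for the template-generated qsort — nothing here is code from the patch itself:

```c
#include <stdlib.h>

/*
 * Generic path: the comparator is only visible through a function
 * pointer, so qsort(3) pays an indirect call per comparison and the
 * compiler cannot inline the comparison logic.
 */
static int
cmp_int(const void *a, const void *b)
{
	int			x = *(const int *) a;
	int			y = *(const int *) b;

	return (x > y) - (x < y);
}

void
sort_generic(int *v, size_t n)
{
	qsort(v, n, sizeof(int), cmp_int);
}

/*
 * "Specialized" path: the comparison is written inline, so the compiler
 * sees it directly and can optimize across it.  A simple insertion sort
 * stands in here for the patch's template-generated qsort.
 */
void
sort_specialized(int *v, size_t n)
{
	for (size_t i = 1; i < n; i++)
	{
		int			key = v[i];
		size_t		j = i;

		while (j > 0 && v[j - 1] > key)
		{
			v[j] = v[j - 1];
			j--;
		}
		v[j] = key;
	}
}
```

The patch generates one full qsort body per specialization (single-key, multiple-key) so that the leading-key comparison is inlined, falling back to comparetup_heap only when the leading keys compare equal.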
Re: [HACKERS] psql tab completion for SELECT
Robert Haas writes:
> On Fri, Feb 10, 2012 at 1:24 AM, Tom Lane wrote:
>> That seems pretty nearly entirely bogus. What is the argument for
>> supposing that the word right after SELECT is a function name?

> It isn't necessarily, but it might be. It'd certainly be nice to type:
> SELECT pg_si
> and get:
> SELECT pg_size_pretty(

Yeah, and then you'll type

SELECT pg_size_pretty(pg_dat

and get nothing, and curse the authors of such a misbegotten incomplete
concept that leads your fingers to rely on something that doesn't work
where it should.

I'm not against tab-completing functions, if people think that's
useful. I am against tab-completing them in 1% of use-cases, which is
what this patch accomplishes. The fact that it's short doesn't make it
good.

regards, tom lane
Re: [HACKERS] psql tab completion for SELECT
On 10/02/12 08:50, Robert Haas wrote:
> On Fri, Feb 10, 2012 at 1:24 AM, Tom Lane wrote:
>> Peter Eisentraut writes:
>>> That seems pretty useful, and it's more or less a one-line change, as in
>>> the attached patch.
>>
>> That seems pretty nearly entirely bogus. What is the argument for
>> supposing that the word right after SELECT is a function name? I would
>> think it would be a column name (from who-knows-what table) much more
>> often.
>
> It isn't necessarily, but it might be. It'd certainly be nice to type:
>
> SELECT pg_si
>
> and get:
>
> SELECT pg_size_pretty(

Well, the important problem in completion is how the dictionary of
possible terms is determined and how much context info is used to
reduce / enhance that dictionary.

case 1: select x
case 2: select x from bar

Possible dictionaries in case 1:
1.1 complete nothing
1.2 complete assuming the union of all columns from all tables in the
    search_path as the dictionary
1.3 complete assuming the union of all function names in the
    search_path as the dictionary
1.4 complete assuming 1.2 and 1.3

Possible dictionaries in case 2:
2.1 treat it like case 1
2.2 complete assuming the union of all columns from bar in the
    search_path as the dictionary
2.3 2.2 + 1.3

Now these are heuristics, and the decision becomes a question of
usefulness vs. the amount of time it costs to implement, and possibly
how expensive computing the dictionary is. I personally would like 1.3
in case 1 and 2.3 in case 2, because I expect 1.2 to show too many
possible completions to be useful. But even 1.3 in both cases wouldn't
be that bad.

Cheers,
Bene
Re: [HACKERS] psql tab completion for SELECT
2012/2/10 Robert Haas :
> On Fri, Feb 10, 2012 at 1:24 AM, Tom Lane wrote:
>> Peter Eisentraut writes:
>>> That seems pretty useful, and it's more or less a one-line change, as in
>>> the attached patch.
>>
>> That seems pretty nearly entirely bogus. What is the argument for
>> supposing that the word right after SELECT is a function name? I would
>> think it would be a column name (from who-knows-what table) much more
>> often.
>
> It isn't necessarily, but it might be. It'd certainly be nice to type:
>
> SELECT pg_si
>
> and get:
>
> SELECT pg_size_pretty(

+1

Pavel
Re: [HACKERS] psql tab completion for SELECT
On Fri, Feb 10, 2012 at 1:24 AM, Tom Lane wrote:
> Peter Eisentraut writes:
>> That seems pretty useful, and it's more or less a one-line change, as in
>> the attached patch.
>
> That seems pretty nearly entirely bogus. What is the argument for
> supposing that the word right after SELECT is a function name? I would
> think it would be a column name (from who-knows-what table) much more
> often.

It isn't necessarily, but it might be. It'd certainly be nice to type:

SELECT pg_si

and get:

SELECT pg_size_pretty(

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] WIP: Collecting statistics on CSV file data
(2011/12/15 11:30), Etsuro Fujita wrote:
> (2011/12/14 15:34), Shigeru Hanada wrote:
>> I think this patch could be marked as "Ready for committer" with some
>> minor fixes.

Please find attached a revised patch (v6.1).

I've tried to make pgsql_fdw work with this feature, and found that a
few static functions need to be exported to implement an ANALYZE
handler in short-cut style. "Short-cut style" means the way to generate
statistics (pg_class and pg_statistic) for foreign tables without
retrieving sample data from the foreign server.

The attached patch (export_funcs.patch) exports examine_attribute and
update_attstats, which are necessary to implement an ANALYZE handler
for pgsql_fdw. In addition to exporting, update_attstats is also
renamed to vac_update_attstats to fit with the already exported
function vac_update_relstats.

I also attached an archive of a WIP pgsql_fdw with ANALYZE support.
This version has better estimation than the original pgsql_fdw,
because it can use the selectivity of qualifiers evaluated on the local
side to estimate the number of result rows. To show the effect of
ANALYZE clearly, the WHERE push-down feature is disabled. Please see
pgsqlAnalyzeForeignTable and store_remote_stats in pgsql_fdw.c.

I used pgbench_accounts tables with 3 records, and got reasonable rows
estimation for the queries below.
postgres=# UPDATE pgbench_accounts SET filler = NULL
postgres-# WHERE aid % 3 = 0;
postgres=# ANALYZE;
postgres=# ANALYZE pgbench_accounts;  -- needs explicit table name
postgres=# EXPLAIN SELECT * FROM pgbench_accounts WHERE filler IS NULL;
                                   QUERY PLAN
 Foreign Scan on pgbench_accounts  (cost=100.00..40610.00 rows=100030 width=97)
   Filter: (filler IS NULL)
   Remote SQL: DECLARE pgsql_fdw_cursor_13 SCROLL CURSOR FOR SELECT aid, bid, abalance, filler FROM public.pgbench_accounts
(3 rows)

postgres=# EXPLAIN SELECT * FROM pgbench_accounts WHERE aid < 100;
                                   QUERY PLAN
 Foreign Scan on pgbench_accounts  (cost=100.00..40610.00 rows=96 width=97)
   Filter: (aid < 100)
   Remote SQL: DECLARE pgsql_fdw_cursor_14 SCROLL CURSOR FOR SELECT aid, bid, abalance, filler FROM public.pgbench_accounts
(3 rows)

postgres=# EXPLAIN SELECT * FROM pgbench_accounts WHERE aid < 1000;
                                   QUERY PLAN
 Foreign Scan on pgbench_accounts  (cost=100.00..40610.00 rows=1004 width=97)
   Filter: (aid < 1000)
   Remote SQL: DECLARE pgsql_fdw_cursor_15 SCROLL CURSOR FOR SELECT aid, bid, abalance, filler FROM public.pgbench_accounts
(3 rows)

In implementing the ANALYZE handler, the hardest part was copying
anyarray values from remote to local. If we could make that common in
core, it would help FDW authors who want to implement an ANALYZE
handler without retrieving sample rows from the remote server.

Regards,
--
Shigeru Hanada

commit bb28cb5a69aae3bd9c7fbebc8b9483d23711bec4
Author: Shigeru Hanada
Date: Thu Feb 9 16:06:14 2012 +0900

    Export functions which are useful for FDW analyze support.

    Export examine_attribute and update_attstats (with renaming to
    vac_update_attstats) which are useful (and nearly required) to
    implement a short-cut version of the ANALYZE handler in FDWs.
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 6a22d49..d0a323a 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -94,8 +94,6 @@ static void compute_index_stats(Relation onerel, double totalrows,
 				AnlIndexData *indexdata, int nindexes,
 				HeapTuple *rows, int numrows,
 				MemoryContext col_context);
-static VacAttrStats *examine_attribute(Relation onerel, int attnum,
-				Node *index_expr);
 static int	acquire_sample_rows(Relation onerel, HeapTuple *rows,
 				int targrows, double *totalrows, double *totaldeadrows,
@@ -105,8 +103,6 @@ static int acquire_inherited_sample_rows(Relation onerel,
 				double *totalrows, double *totaldeadrows,
 				BlockNumber *totalpages, int elevel);
 static int	compare_rows(const void *a, const void *b);
-static void update_attstats(Oid relid, bool inh,
-				int natts, VacAttrStats **vacattrstats);
 static Datum std_fetch_func(VacAttrStatsP stats, int
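The better row estimates described above come from the planner's ordinary arithmetic — total tuples times qualifier selectivity — which locally stored statistics make possible for a foreign table. A toy sketch of that calculation (the function name is hypothetical; the real planner logic lives in clauselist_selectivity and friends):

```c
/*
 * Toy sketch: once ANALYZE has stored reltuples and per-column stats
 * locally, the planner can estimate "rows" for a foreign scan the same
 * way it does for a local table: total rows times qual selectivity.
 */
static double
estimated_rows(double reltuples, double selectivity)
{
	double		rows = reltuples * selectivity;

	/* clamp, as the planner does, so we never estimate below one row */
	return rows < 1.0 ? 1.0 : rows;
}
```

This is why the EXPLAIN output above shows rows=96 and rows=1004 for the two aid-range quals rather than a flat default estimate: the selectivity is computed from the locally stored histogram.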
Re: [HACKERS] RFC: Making TRUNCATE more "MVCC-safe"
On Thu, Feb 09, 2012 at 11:11:16PM +0200, Marti Raudsepp wrote:
> I've always been a little wary of using the TRUNCATE command due to
> the warning in the documentation about it not being "MVCC-safe":
> queries may silently give wrong results and it's hard to tell when
> they are affected.
>
> That got me thinking, why can't we handle this like a standby server
> does -- if some query has data removed from underneath it, it aborts
> with a serialization failure.
>
> Does this solution sound like a good idea?
>
> The attached patch is a lame attempt at implementing this. I added a
> new pg_class.relvalidxmin attribute which tracks the Xid of the last
> TRUNCATE (maybe it should be called reltruncatexid?). Whenever
> starting a relation scan with a snapshot older than relvalidxmin, an
> error is thrown. This seems to work out well since TRUNCATE updates
> pg_class anyway, and anyone who sees the new relfilenode automatically
> knows when it was truncated.

I like the design you have chosen. It would find applications beyond
TRUNCATE, so your use of non-specific naming is sound. For example,
older snapshots will see an empty table "t" after "CREATE TABLE t AS
SELECT 1" commits; that's a comparable MVCC anomaly. Some of our
non-MVCC-safe commands should perhaps just become MVCC-safe, but there
will always be use cases for operations that shortcut MVCC. When one
truly does want that, your proposal for keeping behavior consistent
makes plenty of sense.

> Should I also add another counter to pg_stat_database_conflicts?
> Currently this table is only used on standby servers.
> ERROR: canceling statement due to conflict with TRUNCATE TABLE on foo
> DETAIL: Rows visible to this transaction have been removed.

My initial reaction is not to portray this like a recovery conflict,
since several aspects distinguish it from all recovery conflict types.
(I have not read your actual patch.)
Thanks,
nm
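The check Marti describes reduces to comparing the scan snapshot's age against the relation's relvalidxmin. A deliberately simplified sketch of that test — real TransactionIds wrap around and must be compared with TransactionIdPrecedes, so a plain integer comparison stands in here:

```c
/* Simplified: real xids are 32-bit and wrap around. */
typedef unsigned long TransactionId;

/*
 * Sketch of the proposed test: a scan whose snapshot predates the
 * relation's relvalidxmin (the xid of the last TRUNCATE) would see rows
 * that TRUNCATE removed, so the patch raises a serialization failure
 * instead of silently returning wrong results.
 */
int
scan_conflicts_with_truncate(TransactionId snapshot_xmin,
							 TransactionId relvalidxmin)
{
	return snapshot_xmin < relvalidxmin;
}
```

Because TRUNCATE already updates the pg_class row, a backend that sees the new relfilenode necessarily sees the new relvalidxmin too, which is what makes the check cheap.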
Re: [HACKERS] xlog location arithmetic
On Fri, Feb 10, 2012 at 7:00 AM, Euler Taveira de Oliveira wrote:
> On 08-02-2012 09:35, Fujii Masao wrote:
>
> Fujii, new patch attached. Thanks for your tests.

Thanks for the new patch!

>> But another problem happened. When I changed pg_proc.h so that the unused
>> OID was assigned to pg_xlog_location_diff(), and executed the above again,
>> I encountered the segmentation fault:
>
> I reproduced the problems in my old 32-bit laptop. I fixed it casting to
> int64. I also updated the duplicated OID.

Yep, in the updated patch, I could confirm that the function works fine
without any error in my machine. The patch looks fine to me except the
following minor comments:

In the document, it's better to explain clearly that the function
subtracts the second argument from the first.

-These functions cannot be executed during recovery.
+ These functions cannot be executed during recovery (except
+ pg_xlog_location_diff).

+ pg_xlog_location_diff calculates the difference in bytes
+ between two transaction log locations. It can be used with
+ pg_stat_replication or some functions shown in
+ to get the replication lag.

Very minor comment: you should use spaces rather than a tab to indent
each line.

>> Why OID needs to be reassigned?
>
> There isn't a compelling reason. It is just a way to say: "hey, it is
> another function with the same old name".
>
> I'll not attach another version for pg_size_pretty because it is a
> matter of updating a duplicated OID.

Okay, I reviewed the previous patch again. That looks fine to me.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
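The arithmetic behind pg_xlog_location_diff is small; the subtlety is exactly the 32-bit overflow Euler fixed by casting to int64. A hypothetical standalone sketch — the 2^32 bytes-per-xlogid multiplier is a simplification of the real XLogRecPtr layout, and the function mirrors but is not the server's implementation:

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Parse "xlogid/xrecoff" (both hex, as psql displays xlog locations)
 * and subtract b from a.  The arithmetic is done in 64 bits so that
 * locations in different log files do not overflow a 32-bit difference
 * -- the segfault-adjacent bug class discussed in the thread.
 */
int64_t
xlog_location_diff(const char *a, const char *b)
{
	unsigned int ahi, alo, bhi, blo;

	if (sscanf(a, "%X/%X", &ahi, &alo) != 2 ||
		sscanf(b, "%X/%X", &bhi, &blo) != 2)
		return 0;				/* sketch only: real code would ereport */

	return ((int64_t) ahi * 4294967296LL + alo)
		 - ((int64_t) bhi * 4294967296LL + blo);
}
```

As the doc comment Fujii requested notes, the second argument is subtracted from the first, so a negative result means the first location is behind the second.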