Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL
Hi David, Sorry, but I do not completely understand your suggestions: 1. IMCS really contains a single patch file, sysv_shmem.patch. Applying this patch is not mandatory for using IMCS: it just solves the problem of supporting > 256GB of shared memory. Right now PostgreSQL is not able to use more than 256GB of shared buffers on Linux with standard 4kB pages. I found a proposal for using the MAP_HUGETLB flag in the commit fest: http://www.postgresql.org/message-id/20131125032920.ga23...@toroid.org but unfortunately it was rejected. Huge pages are used intensively by Oracle and I think they will be useful for improving the performance of PostgreSQL, so it is not just IMCS that can benefit from this patch. My patch is much simpler - I deliberately limited its scope to one file. Certainly, switching huge TLB pages on/off should be done through the postgresql.conf configuration file. In any case, IMCS can be used without this patch: you just cannot use more than 256GB of memory, even if your system has more RAM. 2. I do not understand "The add-on is not formatted as an EXTENSION". IMCS was created as a standard extension - I just looked at the examples of other PostgreSQL extensions included in the PostgreSQL distribution (for example pg_stat_statements). It can be added with the "create extension imcs" command and removed with "drop extension imcs". If there are some violations of the PostgreSQL extension rules, please let me know and I will fix them. But I thought I had done everything the standard way. On 01/04/2014 03:21 AM, David Fetter wrote: On Thu, Jan 02, 2014 at 08:48:24PM +0400, knizhnik wrote: I want to announce implementation of In-Memory Columnar Store extension for PostgreSQL. Vertical representation of data is stored in PostgreSQL shared memory. Thanks for the hard work! I noticed a couple of things about this that probably need some improvement. 1. There are unexplained patches against other parts of PostgreSQL, which means that they may break other parts of PostgreSQL in equally inexplicable ways. Please rearrange the patch so it doesn't require this. This leads to: 2. The add-on is not formatted as an EXTENSION, which would allow people to add it or remove it cleanly. Would you be so kind as to fix these? Cheers, David. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
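For reference, the Linux feature being discussed is the MAP_HUGETLB mmap() flag. A minimal, self-contained sketch of how a huge-page shared mapping is typically requested (purely illustrative; this is not the content of sysv_shmem.patch, and the size chosen is made up):

    #define _GNU_SOURCE
    #include <sys/mman.h>
    #include <stdio.h>

    int main(void)
    {
        /* Size must be a multiple of the huge page size (2MB on typical x86-64). */
        size_t size = (size_t) 1024 * 1024 * 1024;   /* 1GB */
        void *seg = mmap(NULL, size, PROT_READ | PROT_WRITE,
                         MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

        if (seg == MAP_FAILED)
        {
            /* Typically means vm.nr_hugepages has not been configured. */
            perror("mmap(MAP_HUGETLB)");
            return 1;
        }
        printf("mapped %zu bytes with huge pages at %p\n", size, seg);
        munmap(seg, size);
        return 0;
    }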
[HACKERS] cleanup in code
1. Compiling with MSVC shows a warning in relcache.c: 1>e:\workspace\postgresql\master\postgresql\src\backend\utils\cache\relcache.c(3959): warning C4715: 'RelationGetIndexAttrBitmap' : not all control paths return a value Attached is a patch, remove_msvc_warning.patch, to remove the above warning. 2. It seems option K is not used in pg_dump: while ((c = getopt_long(argc, argv, "abcCd:E:f:F:h:ij:K:n:N:oOp:RsS:t:T:U:vwWxZ:", long_options, &optindex)) != -1) I have checked both the docs and the code but didn't find any use of this option. Am I missing something here? Attached is a patch, remove_redundant_option_K_pgdump.patch, to remove this option from the code. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com remove_msvc_warning.patch Description: Binary data remove_redundant_option_K_pgdump.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On Fri, Dec 13, 2013 at 4:06 PM, Tom Lane wrote: > BTW, so far as the syntax goes, I'm quite distressed by having to make > REJECTS into a fully-reserved word. It's not reserved according to the > standard, and it seems pretty likely to be something that apps might be > using as a table or column name. I've been looking at this, but I'm having a hard time figuring out a way to eliminate shift/reduce conflicts while not maintaining REJECTS as a fully reserved keyword - I'm pretty sure it's impossible with an LALR parser. I'm not totally enamored with the exact syntax proposed -- I appreciate the flexibility on the one hand, but on the other hand I suppose that REJECTS could just as easily be any number of other words. One possible compromise would be to use a synonym that is not imagined to be in use very widely, although I looked up "reject" in a thesaurus and didn't feel too great about that idea afterwards. Another idea would be to have a REJECTING keyword, as the sort of complement of RETURNING (currently you can still ask for RETURNING, without REJECTS but with ON DUPLICATE KEY LOCK FOR UPDATE if that happens to make sense). I think that would work fine, and might actually be more elegant. Now, REJECTING will probably have to be a reserved keyword, but that seems less problematic, particularly as RETURNING is itself a reserved keyword not described by the standard. In my opinion REJECTING would reinforce the notion of projecting the complement of what RETURNING would project in the same context. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] new json funcs
Here is a patch for the new json functions I mentioned a couple of months ago. These are: json_to_record json_to_recordset json_object json_build_array json_build_object json_object_agg So far there are no docs, but the way these work is illustrated in the regression tests - I hope to have docs within a few days. cheers andrew diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c index af8ddc6..9b9d11f 100644 --- a/src/backend/utils/adt/json.c +++ b/src/backend/utils/adt/json.c @@ -68,6 +68,11 @@ static void array_dim_to_json(StringInfo result, int dim, int ndims, int *dims, bool use_line_feeds); static void array_to_json_internal(Datum array, StringInfo result, bool use_line_feeds); +static void +datum_to_json(Datum val, bool is_null, StringInfo result, + TYPCATEGORY tcategory, Oid typoutputfunc, bool key_scalar); +static void add_json(Datum orig_val, bool is_null, StringInfo result, + Oid val_type, bool key_scalar); /* the null action object used for pure validation */ static JsonSemAction nullSemAction = @@ -1217,7 +1222,7 @@ extract_mb_char(char *s) */ static void datum_to_json(Datum val, bool is_null, StringInfo result, - TYPCATEGORY tcategory, Oid typoutputfunc) + TYPCATEGORY tcategory, Oid typoutputfunc, bool key_scalar) { char *outputstr; text *jsontext; @@ -1239,23 +1244,31 @@ datum_to_json(Datum val, bool is_null, StringInfo result, composite_to_json(val, result, false); break; case TYPCATEGORY_BOOLEAN: - if (DatumGetBool(val)) -appendStringInfoString(result, "true"); + if (!key_scalar) +appendStringInfoString(result, DatumGetBool(val) ? "true" : "false"); else -appendStringInfoString(result, "false"); +escape_json(result, DatumGetBool(val) ? "true" : "false"); break; case TYPCATEGORY_NUMERIC: outputstr = OidOutputFunctionCall(typoutputfunc, val); - /* - * Don't call escape_json here if it's a valid JSON number. - */ - dummy_lex.input = *outputstr == '-' ? outputstr + 1 : outputstr; - dummy_lex.input_length = strlen(dummy_lex.input); - json_lex_number(&dummy_lex, dummy_lex.input, &numeric_error); - if (! numeric_error) -appendStringInfoString(result, outputstr); - else + if (key_scalar) + { +/* always quote keys */ escape_json(result, outputstr); + } + else + { +/* + * Don't call escape_json for a non-key if it's a valid JSON number. + */ +dummy_lex.input = *outputstr == '-' ? outputstr + 1 : outputstr; +dummy_lex.input_length = strlen(dummy_lex.input); +json_lex_number(&dummy_lex, dummy_lex.input, &numeric_error); +if (! 
numeric_error) + appendStringInfoString(result, outputstr); +else + escape_json(result, outputstr); + } pfree(outputstr); break; case TYPCATEGORY_JSON: @@ -1273,6 +1286,10 @@ datum_to_json(Datum val, bool is_null, StringInfo result, break; default: outputstr = OidOutputFunctionCall(typoutputfunc, val); + if (key_scalar && *outputstr == '\0') +ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("key value must not be empty"))); escape_json(result, outputstr); pfree(outputstr); break; @@ -1306,7 +1323,7 @@ array_dim_to_json(StringInfo result, int dim, int ndims, int *dims, Datum *vals, if (dim + 1 == ndims) { datum_to_json(vals[*valcount], nulls[*valcount], result, tcategory, - typoutputfunc); + typoutputfunc, false); (*valcount)++; } else @@ -1487,13 +1504,85 @@ composite_to_json(Datum composite, StringInfo result, bool use_line_feeds) else tcategory = TypeCategory(tupdesc->attrs[i]->atttypid); - datum_to_json(val, isnull, result, tcategory, typoutput); + datum_to_json(val, isnull, result, tcategory, typoutput, false); } appendStringInfoChar(result, '}'); ReleaseTupleDesc(tupdesc); } +static void +add_json(Datum orig_val, bool is_null, StringInfo result, Oid val_type, bool key_scalar) +{ +Datum val; + TYPCATEGORY tcategory; + Oid typoutput; + bool typisvarlena; + Oid castfunc = InvalidOid; + +if (val_type == InvalidOid) +ereport(ERROR, +(errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("could not determine input data type"))); + + + getTypeOutputInfo(val_type, &typoutput, &typisvarlena); + + if (val_type > FirstNormalObjectId) + { + HeapTuple tuple; + Form_pg_cast castForm; + + tuple = SearchSysCache2(CASTSOURCETARGET, + ObjectIdGetDatum(val_type), + ObjectIdGetDatum(JSONOID)); + if (HeapTupleIsValid(tuple)) + { +castForm = (Form_pg_cast) GETSTRUCT(tuple); + +if (castForm->castmethod == COERCION_METHOD_FUNCTION) + castfunc = typoutput = castForm->castfunc; + +ReleaseSysCache(tuple); + } + } + + if (castfunc != InvalidOid) + tcategory = TYPCATEGORY_JSON_CAST; + else if (val_type == RECORDARRAYOID) + tcategory =
[HACKERS] [PATCH] Support for pg_stat_archiver view
Hello, please find attached the patch that adds basic support for the pg_stat_archiver system view, which allows users that have continuous archiving procedures in place to keep track of some important metrics and information. Currently, pg_stat_archiver displays: * archived_wals: number of successfully archived WAL files since start (or the last reset) * last_archived_wal: last successfully archived WAL file * last_archived_wal_time: timestamp of the latest successful WAL archival * stats_reset: time of last stats reset This is an example of output: postgres=# select * from pg_stat_archiver ; -[ RECORD 1 ]--+-- archived_wals | 1 last_archived_wal | 00010001 last_archived_wal_time | 2014-01-04 01:01:08.858648+01 stats_reset| 2014-01-04 00:59:25.895034+01 Similarly to pg_stat_bgwriter, it is possible to reset statistics just for this context, calling the pg_stat_reset_shared('archiver') function. The patch is here for discussion and has been prepared against HEAD. It includes also changes in the documentation and the rules.out test. I plan to add further information to the pg_stat_archiver view, including the number of failed attempts of archival and the WAL and timestamp of the latest failure. However, before proceeding, I'd like to get some feedback on this small patch as well as advice on possible regression tests to be added. Thank you. Cheers, Gabriele -- Gabriele Bartolini - 2ndQuadrant Italia PostgreSQL Training, Services and Support gabriele.bartol...@2ndquadrant.it | www.2ndQuadrant.it diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 4ec6981..6d45972 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -270,6 +270,14 @@ postgres: user database host + pg_stat_archiverpg_stat_archiver + One row only, showing statistics about the + WAL archiver process's activity. See +for details. + + + + pg_stat_bgwriterpg_stat_bgwriter One row only, showing statistics about the background writer process's activity. See @@ -648,6 +656,49 @@ postgres: user database host + + pg_stat_archiver View + + + + + Column + Type + Description + + + + + + archived_wals + bigint + Number of WAL files that have been successfully archived + + + last_archived_wal + text + Name of the last successfully archived WAL file + + + last_archived_wal_time + timestamp with time zone + Time of the last successful archival operation + + + stats_reset + timestamp with time zone + Time at which these statistics were last reset + + + + + + + The pg_stat_archiver view will always have a + single row, containing data about the archiver process of the cluster. + + + pg_stat_bgwriter View @@ -1613,6 +1664,8 @@ postgres: user database host pg_stat_reset_shared('bgwriter') will zero all the counters shown in the pg_stat_bgwriter view. + Calling pg_stat_reset_shared('archiver') will zero all the + counters shown in the pg_stat_archiver view. 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index 575a40f..3a8d7b4 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -672,6 +672,13 @@ CREATE VIEW pg_stat_xact_user_functions AS WHERE P.prolang != 12 -- fast check to eliminate built-in functions AND pg_stat_get_xact_function_calls(P.oid) IS NOT NULL; +CREATE VIEW pg_stat_archiver AS +SELECT +pg_stat_get_archiver_archived_wals() AS archived_wals, +pg_stat_get_archiver_last_archived_wal() AS last_archived_wal, +pg_stat_get_archiver_last_archived_wal_time() AS last_archived_wal_time, +pg_stat_get_archiver_stat_reset_time() AS stats_reset; + CREATE VIEW pg_stat_bgwriter AS SELECT pg_stat_get_bgwriter_timed_checkpoints() AS checkpoints_timed, diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c index 2bb572e..3cb1ddc 100644 --- a/src/backend/postmaster/pgarch.c +++ b/src/backend/postmaster/pgarch.c @@ -36,6 +36,7 @@ #include "access/xlog_internal.h" #include "libpq/pqsignal.h" #include "miscadmin.h" +#include "pgstat.h" #include "postmaster/fork_process.h" #include "postmaster/pgarch.h" #include "postmaster/postmaster.h" @@ -46,6 +47,7 @@ #include "storage/pmsignal.h" #include "utils/guc.h" #include "utils/ps_status.h" +#include "utils/timestamp.h" /* -- @@ -755,4 +757,12 @@ pgarch_archiveDone(char *xlog) (errcode_for_file_access(), errmsg("could not rename file \"%s\" to \"%s\": %m", rlogready, rlogdone
Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL
On Thu, Jan 02, 2014 at 08:48:24PM +0400, knizhnik wrote: > I want to announce implementation of In-Memory Columnar Store > extension for PostgreSQL. > Vertical representation of data is stored in PostgreSQL shared memory. Thanks for the hard work! I noticed a couple of things about this that probably need some improvement. 1. There are unexplained patches against other parts of PostgreSQL, which means that they may break other parts of PostgreSQL in equally inexplicable ways. Please rearrange the patch so it doesn't require this. This leads to: 2. The add-on is not formatted as an EXTENSION, which would allow people to add it or remove it cleanly. Would you be so kind as to fix these? Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On Fri, Jan 3, 2014 at 7:39 AM, Peter Eisentraut wrote: > This patch doesn't apply anymore. Yes, there was some bit-rot. I previously deferred dealing with a shift/reduce conflict implied by commit 1b4f7f93b4693858cb983af3cd557f6097dab67b. I've fixed that problem now using non operator precedence, and performed a clean rebase on master. I've also fixed the basis of your much earlier complaint about breakage of ecpg's regression tests (without adding support for the feature to ecpg). All make check-world tests pass. Patch is attached. I have yet to figure out how to make REJECTS a non-reserved keyword, or even just a type_func_name_keyword, though intuitively I have a sense that the latter ought to be possible. This is the same basic patch as benchmarked above, with various tricks to avoid stronger lock acquisition when that's likely profitable (we can even do _bt_check_unique() with only a shared lock and no hwlock much of the time, on the well-informed suspicion that it won't be necessary to insert, but only to return a TID). There has also been some clean-up to aspects of serializable behavior, but that needs further attention and scrutiny from a subject matter expert, hopefully Heikki. Though it's probably also true that I should find time to think about transaction isolation some more. I've since had another idea relating to performance optimization, which was to hint that the last attempt to insert a key was unsuccessful, so the next one (after the conflicting transaction's commit/abort) of that same value will very likely conflict too, making lock avoidance profitable on average. This appears to be much more effective than the previous woolly heuristic (never published, just benchmarked), which I've left in as an additional reason to avoid heavyweight locking, if only for discussion. This benchmark now shows my approach winning convincingly with this additional "priorConflict" optimization: http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/upsert-cmp-2/ If someone had time to independently recreate the benchmark I have here, or perhaps to benchmark the patch in some other way, that would be useful (for full details see my recent e-mail about the prior benchmark, where the exact details are described - this is the same, but with one more run for the priorConflict optimization). Subtleties of visibility also obviously deserve closer inspection, but perhaps I shouldn't be so hasty: No consensus on the way forward looks even close to emerging. How do people feel about my approach now? -- Peter Geoghegan btreelock_insert_on_dup.v6.2014_01_03.patch.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] costing of hash join
Jeff Janes writes: > I'm trying to figure out why hash joins seem to be systematically underused > in my hands. In the case I am immediately looking at it prefers a merge > join with both inputs getting seq scanned and sorted, despite the hash join > being actually 2 to 3 times faster, where inputs and intermediate working > sets are all in memory. I normally wouldn't worry about a factor of 3 > error, but I see this a lot in many different situations. The row > estimates are very close to actual, the errors is only in the cpu estimates. Can you produce a test case for other people to look at? What datatype(s) are the join keys? > A hash join is charged cpu_tuple_cost for each inner tuple for inserting it > into the hash table: Doesn't seem like monkeying with that is going to account for a 3x error. Have you tried using perf or oprofile or similar to see where the time is actually, rather than theoretically, going? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] costing of hash join
I'm trying to figure out why hash joins seem to be systematically underused in my hands. In the case I am immediately looking at, it prefers a merge join with both inputs getting seq scanned and sorted, despite the hash join being actually 2 to 3 times faster, where inputs and intermediate working sets are all in memory. I normally wouldn't worry about a factor of 3 error, but I see this a lot in many different situations. The row estimates are very close to actual; the error is only in the cpu estimates. A hash join is charged cpu_tuple_cost for each inner tuple for inserting it into the hash table: * charge one cpu_operator_cost for each column's hash function. Also, * tack on one cpu_tuple_cost per inner row, to model the costs of * inserting the row into the hashtable. But a sort is not charged a similar amount for inserting a tuple into the sort memory pool: * Also charge a small amount (arbitrarily set equal to operator cost) per * extracted tuple. We don't charge cpu_tuple_cost because a Sort node * doesn't do qual-checking or projection, so it has less overhead than * most plan nodes. Note it's correct to use tuples not output_tuples Are these operations different enough to justify this difference? The qual-checking (and I think projection) needed on a hash join should have already been performed by and costed to the seq scan feeding the hash join, right? Cheers, Jeff
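For a rough sense of the scale involved: with the default settings cpu_tuple_cost = 0.01 and cpu_operator_cost = 0.0025, loading 1 million inner rows with a single join key into the hash table is charged 1,000,000 x (0.01 + 0.0025) = 12,500 cost units, while feeding the same 1 million tuples through a sort is charged only 1,000,000 x 0.0025 = 2,500 units for the corresponding per-tuple component (the sort's comparison costs are added separately). The asymmetry in this one component alone is therefore about a factor of five.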
Re: [HACKERS] Streaming replication bug in 9.3.2, "WAL contains references to invalid pages"
We had the same issues running 9.2.4: [2013-10-15 00:23:01 GMT/0/15396] WARNING: page 8789807 of relation base/16429/2349631976 is uninitialized [2013-10-15 00:23:01 GMT/0/15396] CONTEXT: xlog redo vacuum: rel 1663/16429/2349631976; blk 8858544, lastBlockVacuumed 0 [2013-10-15 00:23:01 GMT/0/15396] PANIC: WAL contains references to invalid pages [2013-10-15 00:23:01 GMT/0/15396] CONTEXT: xlog redo vacuum: rel 1663/16429/2349631976; blk 8858544, lastBlockVacuumed 0 [2013-10-15 00:23:11 GMT/0/15393] LOG: startup process (PID 15396) was terminated by signal 6: Aborted [2013-10-15 00:23:11 GMT/0/15393] LOG: terminating any other active server processes Also on an index. I ended up manually patching the heap files at that block location to "fix" the problem. It happened again about 2 weeks after that, then never again. It hit all connected secondaries. On Fri, Jan 3, 2014 at 12:50 PM, Sergey Konoplev wrote: > On Thu, Jan 2, 2014 at 11:59 AM, Christophe Pettus wrote: >> In both cases, the indicated relation was a primary key index. In one case, >> rebuilding the primary key index caused the problem to go away permanently >> (to date). In the second case, the problem returned even after a full dump >> / restore of the master database (that is, after a dump / restore of the >> master, and reimaging the secondary, the problem returned at the same >> primary key index, although of course with a different OID value). >> >> It looks like this has been experienced on 9.2.6, as well: >> >> >> http://www.postgresql.org/message-id/flat/CAL_0b1s4QCkFy_55kk_8XWcJPs7wsgVWf8vn4=jxe6v4r7h...@mail.gmail.com > > This problem worries me a lot too. If someone is interested I still > have a file system copy of the buggy cluster including WAL. > > -- > Kind regards, > Sergey Konoplev > PostgreSQL Consultant and DBA > > http://www.linkedin.com/in/grayhemp > +1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979 > gray...@gmail.com > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming replication bug in 9.3.2, "WAL contains references to invalid pages"
On Thu, Jan 2, 2014 at 11:59 AM, Christophe Pettus wrote: > In both cases, the indicated relation was a primary key index. In one case, > rebuilding the primary key index caused the problem to go away permanently > (to date). In the second case, the problem returned even after a full dump / > restore of the master database (that is, after a dump / restore of the > master, and reimaging the secondary, the problem returned at the same primary > key index, although of course with a different OID value). > > It looks like this has been experienced on 9.2.6, as well: > > > http://www.postgresql.org/message-id/flat/CAL_0b1s4QCkFy_55kk_8XWcJPs7wsgVWf8vn4=jxe6v4r7h...@mail.gmail.com This problem worries me a lot too. If someone is interested I still have a file system copy of the buggy cluster including WAL. -- Kind regards, Sergey Konoplev PostgreSQL Consultant and DBA http://www.linkedin.com/in/grayhemp +1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979 gray...@gmail.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ISN extension bug? (with patch)
If so, there is only the one-liner patch to consider. This patch doesn't apply anymore. Please submit an updated patch for the commit fest. In src/include/utils/elog.h there is an include for "utils/errcodes.h" which is generated somehow when compiling postgresql but not present by default. So you have to compile postgresql and then the contrib, or use PGXS with an already installed version. With this caveat, the one-liner patch (4 characters removed) reattached does compile for me: sh> git branch ismn2 sh> git checkout ismn2 sh> patch -p1 < ~/ismn-checksum.patch patching file contrib/isn/isn.c sh> ... sh> cd contrib/isn sh> make gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -fpic -I. -I. -I../../src/include -D_GNU_SOURCE -c -o isn.o isn.c gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -fpic -L../../src/port -L../../src/common -Wl,--as-needed -Wl,-rpath,'/usr/local/pgsql/lib',--enable-new-dtags -shared -o isn.so isn.o sh> -- Fabiendiff --git a/contrib/isn/isn.c b/contrib/isn/isn.c index 3db6b84..c4e2333 100644 --- a/contrib/isn/isn.c +++ b/contrib/isn/isn.c @@ -827,7 +827,7 @@ string2ean(const char *str, bool errorOK, ean13 *result, case ISMN: strncpy(buf, "9790", 4); /* this isn't for sure yet, for now * ISMN it's only 9790 */ - valid = (valid && ((rcheck = checkdig(buf + 3, 10)) == check || magic)); + valid = (valid && ((rcheck = checkdig(buf, 13)) == check || magic)); break; case ISBN: strncpy(buf, "978", 3); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RFC: Async query processing
On Fri, Jan 3, 2014 at 11:06 AM, Claudio Freire wrote: > On Fri, Jan 3, 2014 at 12:20 PM, Tom Lane wrote: >> Claudio Freire writes: >>> On Fri, Jan 3, 2014 at 10:22 AM, Florian Weimer wrote: Loading data into the database isn't such an uncommon task. Not everything is OLTP. >> >>> Truly, but a sustained insert stream of 10 Mbps is certainly way >>> beyond common non-OLTP loads. This is far more specific than non-OLTP. >> >> I think Florian has a good point there, and the reason is this: what >> you are talking about will be of exactly zero use to applications that >> want to see the results of one query before launching the next. Which >> eliminates a whole lot of apps. I suspect that almost the *only* >> common use case in which a stream of queries can be launched without >> feedback is going to be bulk data loading. It's not clear at all >> that pipelining the PQexec code path is the way to better performance >> for that --- why not use COPY, instead? > > You're forgetting ORM workloads. > > ORMs can usually plan the inserts to be in a sequence that both don't > require feedback (except the knowledge that they were successful), and > that do not violate constraints. > > Flushing a whole object hierarchy for instance, can be done without > feedback. Not even serial columns need feedback, since many ORMs > (SQLAlchemy, Hibernate) support allocation of ID sequences in batches > (by issuing a proper select nextval). > > I agree, that with the proposed API, it's too error prone to be > useful. But I also think, if the API is simple and fool-proof enough, > it could be "build them and they will come". I know I'll be happy to > implement support for SQLAlchemy (since it will benefit me), if the > API resembles the proposition below (at least in simplicity). > > Per-query expectations could be such a thing. And it can even work with > PQexec: > > PQexec(con, "SELECT nextval('a_id_seq') FROM generate_series(1,10);"); > --read-- > PQexec(con, "SELECT nextval('b_id_seq') FROM generate_series(1,10);"); > --read-- > PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK); > PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK); > PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK); > PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK); > PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK); > PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK); > ... 9 times... > PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK); > PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC); > do { >// do something useful > } while (PQflush()); > > Here, the PQASYNC flag would temporarily switch to non-blocking I/O, > and buffer what cannot be sent. PQASNC_CORK, would only buffer (only > send if the buffer is full). After any ASYNC call, PQflush would be > necessary (to flush the send queue and to consume the expected > responses), but I can imagine any synchronous call (PQexec, > PQsendQuery or whatever) could detect a non-empty buffer and just > blockingly flush right there. > > This can benefit many useful patterns. ORM flush, is one, if there can > be preallocation of IDs (which I know at least SQLAlchemy and > Hibernate both support). > > Execute-many of prepared statements is another one, quite common. > > I'm not sure what would happen if one of the queries returned an > error. If in a transaction, all the following queries would error out > I'd imagine. 
If not, they would simply be executed blindly.. am I correct? Long term, I'd rather see an optimized 'ORM flush' assemble the data into a structured data set (perhaps a JSON document) and pass it to some receiving routine that decomposed it into records. This is a better way to do things on so many levels. Maybe I'm an old cranky guy yelling at pigeons, but I don't think the current approach that many ORMs take is going to withstand the test of time. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RFC: Async query processing
On Fri, Jan 3, 2014 at 12:20 PM, Tom Lane wrote: > Claudio Freire writes: >> On Fri, Jan 3, 2014 at 10:22 AM, Florian Weimer wrote: >>> Loading data into the database isn't such an uncommon task. Not everything >>> is OLTP. > >> Truly, but a sustained insert stream of 10 Mbps is certainly way >> beyond common non-OLTP loads. This is far more specific than non-OLTP. > > I think Florian has a good point there, and the reason is this: what > you are talking about will be of exactly zero use to applications that > want to see the results of one query before launching the next. Which > eliminates a whole lot of apps. I suspect that almost the *only* > common use case in which a stream of queries can be launched without > feedback is going to be bulk data loading. It's not clear at all > that pipelining the PQexec code path is the way to better performance > for that --- why not use COPY, instead? You're forgetting ORM workloads. ORMs can usually plan the inserts to be in a sequence that both don't require feedback (except the knowledge that they were successful), and that do not violate constraints. Flushing a whole object hierarchy for instance, can be done without feedback. Not even serial columns need feedback, since many ORMs (SQLAlchemy, Hibernate) support allocation of ID sequences in batches (by issuing a proper select nextval). I agree, that with the proposed API, it's too error prone to be useful. But I also think, if the API is simple and fool-proof enough, it could be "build them and they will come". I know I'll be happy to implement support for SQLAlchemy (since it will benefit me), if the API resembles the proposition below (at least in simplicity). Per-query expectations could be such a thing. And it can even work with PQexec: PQexec(con, "SELECT nextval('a_id_seq') FROM generate_series(1,10);"); --read-- PQexec(con, "SELECT nextval('b_id_seq') FROM generate_series(1,10);"); --read-- PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK); PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK); PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK); PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK); PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK); PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK); ... 9 times... PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK); PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC); do { // do something useful } while (PQflush()); Here, the PQASYNC flag would temporarily switch to non-blocking I/O, and buffer what cannot be sent. PQASNC_CORK, would only buffer (only send if the buffer is full). After any ASYNC call, PQflush would be necessary (to flush the send queue and to consume the expected responses), but I can imagine any synchronous call (PQexec, PQsendQuery or whatever) could detect a non-empty buffer and just blockingly flush right there. This can benefit many useful patterns. ORM flush, is one, if there can be preallocation of IDs (which I know at least SQLAlchemy and Hibernate both support). Execute-many of prepared statements is another one, quite common. I'm not sure what would happen if one of the queries returned an error. If in a transaction, all the following queries would error out I'd imagine. If not, they would simply be executed blindly.. am I correct? 
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
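For context, libpq's existing asynchronous API - which the PQASYNC/PQASYNC_CORK flags sketched above would extend - still processes one query at a time. A minimal sketch of the current pattern (connection string and table are placeholders; a real client would wait on the socket with select/poll instead of spinning):

    #include <stdio.h>
    #include <libpq-fe.h>

    int main(void)
    {
        PGconn   *conn = PQconnectdb("dbname=postgres");
        PGresult *res;

        if (PQstatus(conn) != CONNECTION_OK ||
            PQsendQuery(conn, "INSERT INTO a VALUES (1)") == 0)
        {
            fprintf(stderr, "%s", PQerrorMessage(conn));
            PQfinish(conn);
            return 1;
        }

        while (PQisBusy(conn))              /* normally: wait on PQsocket(conn) first */
            if (PQconsumeInput(conn) == 0)
                break;                      /* connection trouble */

        while ((res = PQgetResult(conn)) != NULL)   /* NULL => query fully processed */
        {
            if (PQresultStatus(res) != PGRES_COMMAND_OK)
                fprintf(stderr, "%s", PQerrorMessage(conn));
            PQclear(res);
        }
        /* Only now can the next PQsendQuery() be issued: one round trip per query. */
        PQfinish(conn);
        return 0;
    }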
Re: [HACKERS] RFC: Async query processing
On Fri, Jan 3, 2014 at 9:46 AM, Florian Weimer wrote: > On 01/03/2014 04:20 PM, Tom Lane wrote: > >> I think Florian has a good point there, and the reason is this: what >> you are talking about will be of exactly zero use to applications that >> want to see the results of one query before launching the next. Which >> eliminates a whole lot of apps. I suspect that almost the *only* >> common use case in which a stream of queries can be launched without >> feedback is going to be bulk data loading. It's not clear at all >> that pipelining the PQexec code path is the way to better performance >> for that --- why not use COPY, instead? > > > The data I encounter has to be distributed across multiple tables. Switching > between the COPY TO commands would again need client-side buffering and > heuristics for sizing these buffers. Lengths of runs vary a lot in my case. > > I also want to use binary mode as a far as possible to avoid the integer > conversion overhead, but some columns use custom enum types and are better > transferred in text mode. > > Some INSERTs happen via stored procedures, to implement de-duplication. > > These issues could be addressed by using temporary staging tables. However, > when I did that in the past, this caused pg_shdepend bloat. Carefully > reusing them when possible might avoid that. Again, due to the variance in > lengths of runs, the staging tables are not always beneficial. > > I understand that pipelining introduces complexity. But solving the issues > described above is no picnic, either. Maybe consider using libpqtypes (http://libpqtypes.esilo.com/)? It transfers most everything in binary (enums notably are handled as strings). A typical usage of libpqtypes would be to arrange multiple records into an array on the client then hand them off to a stored procedure on the server side (perhaps over an asynchronous call while you assemble the next batch). libpqtypes was written for C applications with very high performance requirements (for non performance critical cases we might use json instead). In my experience it's not too difficult to arrange an assembly/push loop that amortizes the round trip overhead to zero; it's not as efficient as COPY but much more flexible and will blow away any scheme that sends data row per query. I agree with Tom that major changes to the libpq network stack is probably not a good idea. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
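A rough sketch of the batch-to-stored-procedure pattern described above, using plain libpq text-mode parameters rather than libpqtypes (the load_items function and the array contents are hypothetical; a real server-side routine would decompose the array into rows):

    #include <stdio.h>
    #include <libpq-fe.h>

    int main(void)
    {
        PGconn     *conn = PQconnectdb("dbname=postgres");
        /* One round trip hands a whole batch to a server-side function,
         * e.g. CREATE FUNCTION load_items(int4[]) RETURNS void ... */
        const char *batch = "{1,2,3,4,5,6,7,8,9,10}";
        const char *params[1] = { batch };
        PGresult   *res;

        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "%s", PQerrorMessage(conn));
            return 1;
        }
        res = PQexecParams(conn, "SELECT load_items($1::int4[])",
                           1, NULL, params, NULL, NULL, 0);
        if (PQresultStatus(res) != PGRES_TUPLES_OK)
            fprintf(stderr, "%s", PQerrorMessage(conn));
        PQclear(res);
        PQfinish(conn);
        return 0;
    }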
Re: [HACKERS] REINDEX CONCURRENTLY 2.0
Michael Paquier escribió: > Hi all, > > Please find attached updated patches for the support of REINDEX > CONCURRENTLY, renamed 2.0 for the occasion: > - 20131114_1_index_drop_comments.patch, patch that updates some > comments in index_drop. This updates only a couple of comments in > index_drop but has not been committed yet. It should be IMO... Pushed this one, thanks. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix] "pg_ctl stop" times out when it should respond quickly
On 12/25/13, 6:40 AM, MauMau wrote: > pg_regress must wait for postgres to terminate by calling waitpid(), > because it invoked postgres directly. The attached > pg_regress_pg_stop.patch does this. If you like the combination of this > and the original fix for pg_ctl in one patch, please use > pg_stop_fail_v3.patch. This patch doesn't apply. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Doc fix for VACUUM FREEZE
On 12/17/13, 8:16 PM, Maciek Sakrejda wrote: > (now with patch--sorry about that) This patch doesn't apply. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RFC: Async query processing
On 01/03/2014 04:20 PM, Tom Lane wrote: I think Florian has a good point there, and the reason is this: what you are talking about will be of exactly zero use to applications that want to see the results of one query before launching the next. Which eliminates a whole lot of apps. I suspect that almost the *only* common use case in which a stream of queries can be launched without feedback is going to be bulk data loading. It's not clear at all that pipelining the PQexec code path is the way to better performance for that --- why not use COPY, instead? The data I encounter has to be distributed across multiple tables. Switching between the COPY TO commands would again need client-side buffering and heuristics for sizing these buffers. Lengths of runs vary a lot in my case. I also want to use binary mode as far as possible to avoid the integer conversion overhead, but some columns use custom enum types and are better transferred in text mode. Some INSERTs happen via stored procedures, to implement de-duplication. These issues could be addressed by using temporary staging tables. However, when I did that in the past, this caused pg_shdepend bloat. Carefully reusing them when possible might avoid that. Again, due to the variance in lengths of runs, the staging tables are not always beneficial. I understand that pipelining introduces complexity. But solving the issues described above is no picnic, either. -- Florian Weimer / Red Hat Product Security Team -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
This patch doesn't apply anymore. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ISN extension bug? (with patch)
On 12/24/13, 10:29 AM, Fabien COELHO wrote: > >> On 12/22/13, 2:36 AM, Fabien COELHO wrote: >>> I'm not sure whether the policy is to update the version number of the >>> extension for such a change. As the library is always "isn.so", two >>> versions cannot live in parallel anyway. If it is useful, the second >>> patch attached also upgrade the version number. >> >> If you are not changing anything in the SQL, then you don't need to >> change the version number. > > Ok, thanks for the information. I understand that the version number is > about the API, not the implementation. > > If so, there is only the one-liner patch to consider. This patch doesn't apply anymore. Please submit an updated patch for the commit fest. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RFC: Async query processing
Claudio Freire writes: > On Fri, Jan 3, 2014 at 10:22 AM, Florian Weimer wrote: >> Loading data into the database isn't such an uncommon task. Not everything >> is OLTP. > Truly, but a sustained insert stream of 10 Mbps is certainly way > beyond common non-OLTP loads. This is far more specific than non-OLTP. I think Florian has a good point there, and the reason is this: what you are talking about will be of exactly zero use to applications that want to see the results of one query before launching the next. Which eliminates a whole lot of apps. I suspect that almost the *only* common use case in which a stream of queries can be launched without feedback is going to be bulk data loading. It's not clear at all that pipelining the PQexec code path is the way to better performance for that --- why not use COPY, instead? Or to put it another way, I don't subscribe to "if you build it they will come" for this proposed feature. I think that making any use of it would be so complex and error-prone that the vast majority of apps won't bother. Before we start adding a lot of complexity to libpq's API and internals to support this, you need to make a better case that there would be a significant number of users. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
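For reference, the COPY path suggested here can be driven from libpq roughly as follows (a minimal text-mode sketch; target_table and the generated rows are placeholders):

    #include <stdio.h>
    #include <string.h>
    #include <libpq-fe.h>

    int main(void)
    {
        PGconn   *conn = PQconnectdb("dbname=postgres");
        PGresult *res = PQexec(conn, "COPY target_table (id, val) FROM STDIN");

        if (PQresultStatus(res) != PGRES_COPY_IN)
        {
            fprintf(stderr, "%s", PQerrorMessage(conn));
            return 1;
        }
        PQclear(res);

        for (int i = 0; i < 1000; i++)
        {
            char row[64];
            int  len = snprintf(row, sizeof(row), "%d\tvalue %d\n", i, i);

            /* Rows are streamed without waiting for per-row acknowledgements. */
            if (PQputCopyData(conn, row, len) != 1)
                break;
        }
        PQputCopyEnd(conn, NULL);        /* NULL = no error, finish the COPY */
        res = PQgetResult(conn);         /* final status of the COPY command */
        if (PQresultStatus(res) != PGRES_COMMAND_OK)
            fprintf(stderr, "%s", PQerrorMessage(conn));
        PQclear(res);
        PQfinish(conn);
        return 0;
    }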
Re: [HACKERS] Changeset Extraction Interfaces
On 2013-12-12 10:01:21 -0500, Robert Haas wrote: > On Thu, Dec 12, 2013 at 7:04 AM, Andres Freund wrote: > > I think there'll always be a bit of a difference between slots for > > physical and logical data, even if 90% of the implementation is the > > same. We can signal that difference by specifying logical/physical as an > > option or having two different sets of commands. > > > > Maybe? > > > > ACQUIRE_REPLICATION_SLOT slot_name PHYSICAL physical_opts > > ACQUIRE_REPLICATION_SLOT slot_name LOGICAL logical_opts > > -- already exists without slot, PHYSICAL arguments > > START_REPLICATION [SLOT slot] [PHYSICAL] RECPTR opt_timeline > > START_REPLICATION SLOT LOGICAL slot plugin_options > > RELEASE_REPLICATION_SLOT slot_name > > I assume you meant START_REPLICATION SLOT slot LOGICAL plugin_options, > but basically this seems OK to me. When writing the code for this, I decided that I need to reneg a bit on those names - they don't work nicely enough on the C level for me. Specifically during a START_REPLICATION we need to temporarily mark the slot as being actively used and mark it unused again afterwards. That's much more Acquire/Release like than the persistent Acquire/Release above for me. The C names in the version I am working on currently are: extern void ReplicationSlotCreate(const char *name); extern void ReplicationSlotDrop(const char *name); extern void ReplicationSlotAcquire(const char *name); extern void ReplicationSlotRelease(void); extern void ReplicationSlotSave(void); which would make the walsender ones CREATE_REPLICATION_SLOT ... START_REPLICATION [SLOT slot] [LOGICAL | PHYSICAL] ... DROP_REPLICATION_SLOT ... where START_REPLICATION internally does acquire/release on the passed SLOT. Does that work for you? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] truncating pg_multixact/members
Hi, On 2014-01-03 11:11:13 -0300, Alvaro Herrera wrote: > > Yeah. Since we expect mxids to be composed at a much lower rate than > > xids, we can keep pg_multixact small without needing to increase the > > rate of full table scans. I don't think that's necessarily true - there have been several pg_controldata outputs posted lately which had more multis used than xids. In workloads using explicit row locking or heavily used FKs that's not that surprising. > > However, it seems to me that we ought to > > have GUCs for mxid_freeze_table_age and mxid_freeze_min_age. There's > > no principled way to derive those values from the corresponding values > > for XIDs, and I can't see any reason to suppose that we know how to > > auto-tune brand new values better than we know how to auto-tune their > > XID equivalents that we've had for years. > > > > One million is probably a reasonable default for mxid_freeze_min_age, > > though. I think setting mxid_freeze_min_age to something lower is fair game, I'd even start at 100k or so. What I think is important is that we do *not* set mxid_freeze_table_age to something very low. People justifiably hate anti-wraparound vacuums. What's your thought about the autovacuum_freeze_max_age equivalent? I am not sure about introducing new GUCs in the back branches; I don't have a problem with it, but I am also not sure it's necessary. Fixing members wraparound into itself seems more important, and once we trigger vacuums via that it doesn't seem to be too important to have low settings. > Also, what would be good names? Peter E. complained recently about the > word MultiXactId being exposed in some error messages; maybe "mxid" is > too short an abbreviation of that. Perhaps > multixactid_freeze_min_age = 1 million > multixactid_freeze_table_age = 3 million > ? I personally am fine with mxid - we use xid in other settings after all. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] truncating pg_multixact/members
Robert Haas escribió: > On Mon, Dec 30, 2013 at 10:59 PM, Alvaro Herrera > wrote: > > One problem I see is length of time before freezing multis: they live > > for far too long, causing the SLRU files to eat way too much disk space. > > I ran burnmulti in a loop, creating multis of 3 members each, with a min > > freeze age of 50 million, and this leads to ~770 files in > > pg_multixact/offsets and ~2900 files in pg_multixact/members. Each file > > is 32 pages long. 256kB apiece. Probably enough to be bothersome. > > > > I think for computing the freezing point for multis, we should slash > > min_freeze_age by 10 or something like that. Or just set a hardcoded > > one million. > > Yeah. Since we expect mxids to be composed at a much lower rate than > xids, we can keep pg_multixact small without needing to increase the > rate of full table scans. However, it seems to me that we ought to > have GUCs for mxid_freeze_table_age and mxid_freeze_min_age. There's > no principled way to derive those values from the corresponding values > for XIDs, and I can't see any reason to suppose that we know how to > auto-tune brand new values better than we know how to auto-tune their > XID equivalents that we've had for years. > > One million is probably a reasonable default for mxid_freeze_min_age, though. I didn't want to propose having new GUCs, but if there's no love for my idea of deriving it from the Xid freeze policy, I guess it's the only solution. Just keep in mind we will need to back-patch these new GUCs to 9.3. Are there objections to this? Also, what would be good names? Peter E. complained recently about the word MultiXactId being exposed in some error messages; maybe "mxid" is too short an abbreviation of that. Perhaps multixactid_freeze_min_age = 1 million multixactid_freeze_table_age = 3 million ? I imagine this stuff would be described somewhere in the docs, perhaps within the "routine maintenance" section somewhere. FWIW the idea of having a glossary sounds good to me. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
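To put the figures quoted above in perspective: each SLRU segment file is 32 pages of 8kB, i.e. 256kB, so ~770 offsets segments plus ~2900 members segments amount to roughly 3670 x 256kB, a bit over 900MB of pg_multixact for that run - which gives an idea of how much disk space a lower multixact freeze age could save.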
Re: [HACKERS] Add CREATE support to event triggers
Robert Haas escribió: > The other thing that bothers me here is that, while a normalized > command string sounds great in theory, as soon as you want to allow > (for example) mapping schema A on node 1 to schema B on node 2, the > wheels come off: you'll have to deparse that normalized command string > so you can change out the schema name and then reassemble it back into > a command string again. So we're going to parse the user input, then > deparse it, hand over the results to the application code, which will > then parse it, modify that, and deparse it again. I have considered several ideas on this front, but most of them turn out to be useless or too cumbersome to use. What seems most adequate is to build a command string containing certain patterns, and an array of replacement values for such patterns; each pattern corresponds to one element that somebody might want modified in the command. As a trivial example, a command such as CREATE TABLE foo (bar INTEGER); would return a string like CREATE TABLE ${table_schema}.${table_name} (bar INTEGER); and the replacement array would be {table_schema => "public", table_name => "foo"} If we additionally provide a function to expand the replacements in the string, we would have the base funcionality of a normalized command string. If somebody wants to move the table to some other schema, they can simply modify the array to suit their taste, and again expand using the provided function; this doesn't require parsing SQL. It's likely that there are lots of fine details that need exploring before this is a fully workable idea -- I have just started work on it, so please bear with me. I think this is basically what you call "a JSON blob". > Finally, I'm very skeptical of the word "normalized". To me, that > sounds like an alias for "modifying the command string in unspecified > ways that big brother thinks will be useful to event trigger authors". > Color me skeptical. What if somebody doesn't want their command > string normalized? What if they want it normalized in a way that's > different from the way that we've chosen to normalize it? I fear that > this whole direction amounts to "we don't know how to design a real > API so let's just do surgery on the command string and call whatever > pops out the API". You might criticize the example above by saying that I haven't considered using a JSON array for the list of table elements; in a sense, I would be being Big Brother and deciding that you (as the user) don't need to mess up with the column/constraints list in a table you're creating. I thought about it and wasn't sure if there was a need to implement that bit in the first iteration of this implementation. One neat thing about this string+replaceables idea is that we can later change what replaceable elements the string has, thus providing more functionality (thus, for example, perhaps the column list can be altered in v2 that was a "constant" in v1), without breaking existing users of the v1. > > but there > > is a slight problem for some kind of objects that are represented partly > > as ALTER state during creation; for example creating a table with a > > sequence uses ALTER SEQ/OWNED BY internally at some point. There might > > be other cases I'm missing, also. (The REFRESH command is nominally > > also supported.) > > There are lots of places in the DDL code where we pass around > constructed parse trees as a substitute for real argument lists. 
I > expect that many of those places will eventually get refactored away, > so it's important that this feature does not end up relying on > accidents of the current code structure. For example, an > AlterTableStmt can actually do a whole bunch of different things in a > single statement: SOME of those are handled by a loop in > ProcessUtilitySlow() and OTHERS are handled internally by AlterTable. > I'm pretty well convinced that that division of labor is a bad design, > and I think it's important that this feature doesn't make that dubious > design decision into documented behavior. Yeah, the submitted patch took care of these elements by invoking the appropriate collection function at all the right places. Most of it happened right in ProcessUtilitySlow, but other bits were elsewhere (for instance, sub-objects created in a complex CREATE SCHEMA command). I mentioned the ALTER SEQUENCE example above because that happens in a code path that wasn't even close to the rest of the stuff. > > Now about the questions I mentioned above: > > > > a) It doesn't work to reverse-parse the statement nodes in all cases; > > there are several unfixable bugs if we only do that. In order to create > > always-correct statements, we need access to the catalogs for the > > created objects. But if we are doing catalog access, then it seems to > > me that we can do away with the statement parse nodes completely and > > just reconstruct the objects from catalog information. Shall we go that > > ro
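To make the replacement-array idea concrete, here is a purely hypothetical sketch of what client-side expansion of such a string could look like (the ${...} syntax and key names are taken from the example above; none of this code is part of the patch):

    #include <stdio.h>
    #include <string.h>

    struct repl { const char *key; const char *val; };

    static const char *
    lookup(const struct repl *r, int n, const char *key, size_t keylen)
    {
        for (int i = 0; i < n; i++)
            if (strlen(r[i].key) == keylen && strncmp(r[i].key, key, keylen) == 0)
                return r[i].val;
        return NULL;
    }

    int main(void)
    {
        const char *cmd = "CREATE TABLE ${table_schema}.${table_name} (bar INTEGER);";
        struct repl r[] = { { "table_schema", "public" }, { "table_name", "foo" } };
        const char *p = cmd;

        while (*p)
        {
            const char *end;
            const char *val;

            if (p[0] == '$' && p[1] == '{' && (end = strchr(p + 2, '}')) != NULL &&
                (val = lookup(r, 2, p + 2, (size_t) (end - p - 2))) != NULL)
            {
                fputs(val, stdout);     /* substitute the replacement value */
                p = end + 1;
            }
            else
                putchar(*p++);
        }
        putchar('\n');                  /* prints: CREATE TABLE public.foo (bar INTEGER); */
        return 0;
    }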
Re: [HACKERS] RFC: Async query processing
On Fri, Jan 3, 2014 at 10:22 AM, Florian Weimer wrote: > On 01/02/2014 07:52 PM, Claudio Freire wrote: > >>> No, because this doesn't scale automatically with the bandwidth-delay >>> product. It also requires that the client buffers queries and their >>> parameters even though the network has to do that anyway. >> >> >> Why not? I'm talking about transport-level packets, btw, not libpq >> frames/whatever. >> >> Yes, the network stack will sometimes do that. But the it doesn't have >> to do it. It does it sometimes, which is not the same. > > > The network inevitably buffers because the speed of light is not infinite. > > Here's a concrete example. Suppose the server is 100ms away, and you want > to send data at a constant rate of 10 Mbps. The server needs to acknowledge > the data you sent, but this acknowledgment arrives after 200 ms. As a > result, you've sent 2 Mbits before the acknowledgment arrives, so the > network appears to have buffered 250 KB. This effect can actually be used > for data storage, called "delay line memory", but it is somewhat out of > fashion now. ... >> So, trusting the network start to do the quick start won't work. For >> steady streams of queries, it will work. But not for short bursts, >> which will be the most heavily used case I believe (most apps create >> short bursts of inserts and not continuous streams at full bandwidth). > > > Loading data into the database isn't such an uncommon task. Not everything > is OLTP. Truly, but a sustained insert stream of 10 Mbps is certainly way beyond common non-OLTP loads. This is far more specific than non-OLTP. Buffering will benefit the vast majority of applications that don't do steady, sustained query streams. Which is the vast majority of applications. An ORM doing a flush falls in this category, so it's an overwhelmingly common case. >> And buffering algorithms are quite platform-dependent anyway, so it's >> not the best idea to make libpq highly reliant on them. > > > That is why I think libpq needs to keep sending until the first response > from the server arrives. Batching a fixed number of INSERTs together in a > single conceptual query does not achieve auto-tuning to the buffering > characteristics of the path. Not on its own, but it does improve thoughput during slow start, which benefits OLTP, which is a hugely common use case. As you say, the network will then auto-tune when the query stream is consistent enough, so what's the problem with explicitly buffering a little then? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RFC: Async query processing
On 01/02/2014 07:52 PM, Claudio Freire wrote: No, because this doesn't scale automatically with the bandwidth-delay product. It also requires that the client buffers queries and their parameters even though the network has to do that anyway. Why not? I'm talking about transport-level packets, btw, not libpq frames/whatever. Yes, the network stack will sometimes do that. But the it doesn't have to do it. It does it sometimes, which is not the same. The network inevitably buffers because the speed of light is not infinite. Here's a concrete example. Suppose the server is 100ms away, and you want to send data at a constant rate of 10 Mbps. The server needs to acknowledge the data you sent, but this acknowledgment arrives after 200 ms. As a result, you've sent 2 Mbits before the acknowledgment arrives, so the network appears to have buffered 250 KB. This effect can actually be used for data storage, called "delay line memory", but it is somewhat out of fashion now. And buffering algorithms are quite platform-dependent anyway, so it's not the best idea to make libpq highly reliant on them. That is why I think libpq needs to keep sending until the first response from the server arrives. Batching a fixed number of INSERTs together in a single conceptual query does not achieve auto-tuning to the buffering characteristics of the path. So, trusting the network start to do the quick start won't work. For steady streams of queries, it will work. But not for short bursts, which will be the most heavily used case I believe (most apps create short bursts of inserts and not continuous streams at full bandwidth). Loading data into the database isn't such an uncommon task. Not everything is OLTP. -- Florian Weimer / Red Hat Product Security Team -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Make various variables read-only (const)
On Sun, Dec 22, 2013 at 09:43:57PM -0500, Robert Haas wrote: > On Fri, Dec 20, 2013 at 12:01 PM, Oskari Saarenmaa wrote: > > This allows the variables to be moved from .data to .rodata section which > > means that more data can be shared by processes and makes sure that nothing > > can accidentally overwrite the read-only definitions. On a x86-64 Linux > > system this moves roughly 9 kilobytes of previously writable data to the > > read-only data segment in the backend and 4 kilobytes in libpq. > > > > https://github.com/saaros/postgres/compare/constify > > > > 24 files changed, 108 insertions(+), 137 deletions(-) > > This sounds like a broadly good thing, but I've had enough painful > experiences with const to be a little wary. And how much does this > really affect data sharing? Doesn't copy-on-write do the same thing > for writable data? Could we get most of the benefit by const-ifying > one or two large data structures and forget the rest? Thanks for the review and sorry for the late reply, I was offline for a while. As Wim Lewis pointed out in his mail the const data is most likely mixed with non-const data and copy-on-write won't help with all of it. Also, some of the const data includes duplicates and thus .data actually shrinks more than .rodata grows. We'd probably get most of the space-saving benefits by just constifying the biggest variables, but I think applying const to more things will also make things more correct. > Other comments: > > - The first hunk of the patch mutilates the comment it modifies for no > apparent reason. Please revert. > > - Why change the API of transformRelOptions()? The comment was changed to reflect the new API, I modified transformRelOptions to only accept a single valid namespace to make things simpler in the calling code. Nothing used more than one valid namespace anyway, and it allows us to just use a constant "toast" without having to create a 2 char* array with a NULL. > -#define DEF_ENC2NAME(name, codepage) { #name, PG_##name } > +/* The extra NUL-terminator will make sure a warning is raised if the > + * storage space for name is too small, otherwise when strlen(name) == > + * sizeof(pg_enc2name.name) the NUL-terminator would be silently dropped. > + */ > +#define DEF_ENC2NAME(name, codepage) { #name "\0", PG_##name } > > - The above hunk is not related to the primary purpose of this patch. It sort-of is. Without fixed size char-arrays it's not possible to move everything to .rodata, but fixed size char-arrays come with the drawback of silently dropping the NUL-terminator when strlen(str) == sizeof(array), by forcing a NUL-terminator in we always get a warning if it would've been dropped and the size of the array can then be increased. Thanks, Oskari -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
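The effect of the extra "\0" can be shown in isolation with a tiny stand-alone example (the struct and field sizes here are made up; this is not PostgreSQL code):

    #include <stdio.h>

    /* A fixed-size name field, modeled loosely on pg_enc2name. */
    struct enc2name { char name[4]; int enc; };

    /* Valid C, but the implicit NUL terminator is silently dropped because
     * strlen("UTF8") == sizeof(name); the field ends up unterminated. */
    static const struct enc2name quiet = { "UTF8", 6 };

    /* Appending "\0" makes the explicit initializer one byte longer than the
     * array, so the compiler warns ("initializer-string ... too long")
     * instead of truncating silently. */
    static const struct enc2name noisy = { "UTF8" "\0", 6 };

    int main(void)
    {
        printf("%.4s %.4s\n", quiet.name, noisy.name);  /* %.4s: fields may be unterminated */
        return 0;
    }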