Re: [HACKERS] Patch: Show process IDs of processes holding a lock; show relation and tuple infos of a lock to acquire
On Sat, Feb 1, 2014 at 7:41 PM, Christian Kruse <christ...@2ndquadrant.com> wrote:
> Hi,
>
> On 01/02/14 02:45, Fujii Masao wrote:
>> LOG: process 33662 still waiting for ShareLock on transaction 1011 after 1000.184 ms
>> DETAIL: Process holding the lock: 33660. Request queue: 33662.
>> [... snip ...]
>> LOG: process 33665 still waiting for ExclusiveLock on tuple (0,4) of relation 16384 of database 12310 after 1000.134 ms
>> DETAIL: Process holding the lock: 33662. Request queue: 33665
>>
>> This log message says that the process 33662 is holding the lock, but
>> it's not true.
>
> As the message says: the first lock is waiting for the transaction, the
> second one for the tuple. So those are two different locks, thus the two
> different holders and queues.
>
>> So... Is this the intentional behavior?
>
> Yes, I think so.

Oh, yes. You're right.

I have other minor comments:

Since you added errdetail_log_plural(), ISTM that you need to update
sources.sgml.

>> While I'm griping, this message isn't even trying to follow the project's
>> message style guidelines.  Detail or context messages are supposed to be
>> complete sentence(s), with capitalization and punctuation to match.
>
> Hm, I hope I fixed it in this version of the patch.

Current message doesn't look like complete sentence yet... We would need to
use something like "Processes X, Y are holding while Z is waiting for the
lock.". I could not come up with good message, though...

Regards,

-- 
Fujii Masao

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: Show process IDs of processes holding a lock; show relation and tuple infos of a lock to acquire
Hi,

On 03/02/14 17:59, Fujii Masao wrote:
> Since you added errdetail_log_plural(), ISTM that you need to update
> sources.sgml.

[x] Fixed.

>>> While I'm griping, this message isn't even trying to follow the project's
>>> message style guidelines.  Detail or context messages are supposed to be
>>> complete sentence(s), with capitalization and punctuation to match.
>>
>> Hm, I hope I fixed it in this version of the patch.
>
> Current message doesn't look like complete sentence yet... We would need
> to use something like "Processes X, Y are holding while Z is waiting for
> the lock.". I could not come up with good message, though...

The problem is that we have two potential plural cases in this message. That
leads to the need to formulate the second part independently from
singular/plural. I tried to improve a little bit and propose this message:

Singular:
  The following process is holding the lock: A. The request queue consists of: B.

Plural:
  Following processes are holding the lock: A, B. The request queue consists of: C.

Attached you will find an updated patch.

Best regards,

-- 
 Christian Kruse               http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

diff --git a/doc/src/sgml/sources.sgml b/doc/src/sgml/sources.sgml
index 881b0c3..aa20807 100644
--- a/doc/src/sgml/sources.sgml
+++ b/doc/src/sgml/sources.sgml
@@ -251,6 +251,15 @@ ereport(ERROR,
     </listitem>
     <listitem>
      <para>
+      <function>errdetail_log_plural(const char *fmt_singular, const char
+      *fmt_plural, unsigned long n, ...)</function> is like
+      <function>errdetail_log</>, but with support for various plural forms of
+      the message.
+      For more information see <xref linkend="nls-guidelines">.
+     </para>
     </listitem>
     <listitem>
      <para>
       <function>errhint(const char *msg, ...)</function> supplies an optional
       <quote>hint</> message; this is to be used when offering suggestions
       about how to fix the problem, as opposed to factual details about

diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index fb449a8..da72c82 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -1209,13 +1209,23 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 	 */
 	if (log_lock_waits && deadlock_state != DS_NOT_YET_CHECKED)
 	{
-		StringInfoData buf;
+		StringInfoData buf,
+					lock_waiters_sbuf,
+					lock_holders_sbuf;
 		const char *modename;
 		long		secs;
 		int			usecs;
 		long		msecs;
+		SHM_QUEUE  *procLocks;
+		PROCLOCK   *proclock;
+		bool		first_holder = true,
+					first_waiter = true;
+		int			lockHoldersNum = 0;

 		initStringInfo(&buf);
+		initStringInfo(&lock_waiters_sbuf);
+		initStringInfo(&lock_holders_sbuf);
+
 		DescribeLockTag(&buf, &locallock->tag.lock);
 		modename = GetLockmodeName(locallock->tag.lock.locktag_lockmethodid,
 								   lockmode);
@@ -1225,10 +1235,67 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 		msecs = secs * 1000 + usecs / 1000;
 		usecs = usecs % 1000;

+		/*
+		 * we loop over the lock's procLocks to gather a list of all
+		 * holders and waiters. Thus we will be able to provide more
+		 * detailed information for lock debugging purposes.
+		 *
+		 * lock->procLocks contains all processes which hold or wait for
+		 * this lock.
+		 */
+
+		LWLockAcquire(partitionLock, LW_SHARED);
+
+		procLocks = &(lock->procLocks);
+		proclock = (PROCLOCK *) SHMQueueNext(procLocks, procLocks,
+											 offsetof(PROCLOCK, lockLink));
+
+		while (proclock)
+		{
+			/*
+			 * we are a waiter if myProc->waitProcLock == proclock; we are
+			 * a holder if it is NULL or something different
+			 */
+			if (proclock->tag.myProc->waitProcLock == proclock)
+			{
+				if (first_waiter)
+				{
+					appendStringInfo(&lock_waiters_sbuf, "%d",
+									 proclock->tag.myProc->pid);
+					first_waiter = false;
+				}
+				else
+					appendStringInfo(&lock_waiters_sbuf, ", %d",
+									 proclock->tag.myProc->pid);
+			}
+			else
+			{
+				if (first_holder)
+				{
+					appendStringInfo(&lock_holders_sbuf, "%d",
+									 proclock->tag.myProc->pid);
+					first_holder = false;
+				}
+				else
+					appendStringInfo(&lock_holders_sbuf, ", %d",
+									 proclock->tag.myProc->pid);
+
+				lockHoldersNum++;
+			}
+
+			proclock = (PROCLOCK *) SHMQueueNext(procLocks, &proclock->lockLink,
+												 offsetof(PROCLOCK, lockLink));
+		}
+
+		LWLockRelease(partitionLock);
+
 		if (deadlock_state == DS_SOFT_DEADLOCK)
 			ereport(LOG,
 					(errmsg("process %d avoided deadlock for %s on %s by rearranging queue order after %ld.%03d ms",
-							MyProcPid, modename, buf.data, msecs, usecs)));
+							MyProcPid, modename, buf.data, msecs, usecs),
+					 (errdetail_log_plural("The following process is holding the lock: %s. The request queue consists of: %s.",
+										   "Following processes are holding the lock: %s. The request queue consists of: %s.",
+
Re: [HACKERS] narwhal and PGDLLIMPORT
On 2014-02-03 12:00:40 +0800, Craig Ringer wrote:
> I think it's a good thing personally - we shouldn't be exporting every
> little internal var in the symbol table. If we built with
> -fvisibility=hidden on 'nix there'd be no need to complain about commits
> made on 'nix then breaking on Windows, 'cos the 'nix build would break in
> the same place. That's all or nothing though, there's no "vars hidden,
> procs exported" option in gcc.

I think that'd be an exercise in futility. We're not talking about a general
purpose library here, where I agree -fvisibility=hidden is a useful thing,
but about the backend. We'd break countless extensions people have written.
Most of those have been authored on *nix.

To make any form of sense we'd need to have a really separate API layer
between internal/external stuff. That doesn't seem likely to arrive anytime
soon, if ever. I think all that would achieve is that we'd regularly need to
backpatch visibility fixes. And have countless pointless flames about which
variables to expose.

Greetings,

Andres Freund

-- 
 Andres Freund                 http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] UNION ALL on partitioned tables won't use indices.
Hello,

The attached file unionall_inh_idx_typ3_v6_20140203.patch is the new version
of this patch, which fixes the 'whole-row-var' bug.

First of all, on second thought about this:

  create table parent (a int, b int);
  create table child () inherits (parent);
  insert into parent values (1,10);
  insert into child values (2,20);
  select a, b from parent union all select a, b from child;

> Mmm. I had the same result. Please let me have a bit more time.

This turned out to be a correct result. The two tables have the following
records after the two INSERTs.

| =# select * from only parent;
|  1 | 10
| (1 row)
|
| =# select * from child;
|  2 | 20
| (1 row)

Then it is natural that the parent side in the UNION ALL returns the
following results.

| =# select * from parent;
|  a | b
| ---+----
|  1 | 10
|  2 | 20
| (2 rows)

Finally, the result we got has proved not to be a problem.

Second, about the crash in this sql,

  select parent from parent union all select parent from parent;

It is the ignored whole-row reference (= 0) which makes the index of the
child's translated-vars list invalid (-1). I completely ignored it, in spite
that I myself referred to it before. Unfortunately ConvertRowtypeExpr
prevents appendrels from being removed currently, and such a case doesn't
seem to take place so often, so I decided to exclude the case. In addition,
I found that only setting rte->inh = false causes duplicate scanning of the
same relations through abandoned appendrels, so I set parent/child_relid to
InvalidOid so that they are no longer referred to from the ex-parent (and
ex-children). The following queries seem to work correctly (but with no
optimization) after these fixes.
  create table parent (a int, b int);
  create table child () inherits (parent);
  insert into parent values (1,10);
  insert into child values (2,20);

  select parent from parent union all select parent from parent;
   parent
  --------
   (1,10)
   (2,20)
   (1,10)
   (2,20)
  (4 rows)

  select a, parent from parent union all select a, parent from parent;
   a | parent
  ---+--------
   1 | (1,10)
   2 | (2,20)
   1 | (1,10)
   2 | (2,20)
  (4 rows)

  select a, b from parent union all select a, b from parent;
   a | b
  ---+----
   1 | 10
   2 | 20
   1 | 10
   2 | 20
  (4 rows)

  select a, b from parent union all select a, b from child;
   a | b
  ---+----
   2 | 20
   1 | 10
   2 | 20
  (3 rows)

The attached two patches are rebased to current 9.4dev HEAD, and make check
at the topmost directory and in src/test/isolation passes without error.

One bug was found and fixed on the way. It was an assertion failure caused
by a probably unexpected type conversion introduced by collapse_appendrels(),
which leads an implicit whole-row cast from some valid reltype to an invalid
reltype (0) into adjust_appendrel_attrs_mutator().

> What query demonstrated this bug in the previous type 2/3 patches?
> I would still like to know the answer to the above question.

I remembered after some struggles. It failed during 'make check', on the
following query in inherit.sql.

| update bar set f2 = f2 + 100
| from
|   ( select f1 from foo union all select f1+3 from foo ) ss
| where bar.f1 = ss.f1;

The following SQLs cause the same type of crash.

  create temp table foo(f1 int, f2 int);
  create temp table foo2(f3 int) inherits (foo);
  create temp table bar(f1 int, f2 int);
  update bar set f2 = 1 from
    ( select f1 from foo union all select f1+3 from foo ) ss
    where bar.f1 = ss.f1;

The trigger was the "wholerow1" junk var for the subquery created in
preprocess_targetlist.

|    /* Not a table, so we need the whole row as a junk var */
|    var = makeWholeRowVar(rt_fetch(rc->rti, range_table), ..
|    snprintf(resname, sizeof(resname), "wholerow%u", rc->rowmarkId);

Then the finishing blow was an appendRelInfo corresponding to the var above,
whose parent_reltype == 0 and child_reltype != 0, and of course introduced
by my patch. Since such a situation takes place even for this patch, the
modification is left alone.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 52dcc72..ece9318 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -57,6 +57,12 @@ typedef struct
 	AppendRelInfo *appinfo;
 } adjust_appendrel_attrs_context;

+typedef struct {
+	AppendRelInfo *child_appinfo;
+	Index		target_rti;
+	bool		failed;
+} transvars_merge_context;
+
 static Plan *recurse_set_operations(Node *setOp, PlannerInfo *root,
 					   double tuple_fraction,
 					   List *colTypes, List *colCollations,
@@ -98,6 +104,8 @@ static List *generate_append_tlist(List *colTypes, List *colCollations,
 					  List *input_plans,
 					  List *refnames_tlist);
 static List *generate_setop_grouplist(SetOperationStmt *op, List *targetlist);
+static Node *transvars_merge_mutator(Node *node,
+					  transvars_merge_context *ctx);
 static void expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte,
Re: [HACKERS] Patch: Show process IDs of processes holding a lock; show relation and tuple infos of a lock to acquire
On 3 February 2014 10:06, Christian Kruse <christ...@2ndquadrant.com> wrote:
> Hi,
>
> On 03/02/14 17:59, Fujii Masao wrote:
>> Since you added errdetail_log_plural(), ISTM that you need to update
>> sources.sgml.
>
> [x] Fixed.
>
>>>> While I'm griping, this message isn't even trying to follow the
>>>> project's message style guidelines.  Detail or context messages are
>>>> supposed to be complete sentence(s), with capitalization and
>>>> punctuation to match.
>>>
>>> Hm, I hope I fixed it in this version of the patch.
>>
>> Current message doesn't look like complete sentence yet... We would need
>> to use something like "Processes X, Y are holding while Z is waiting for
>> the lock.". I could not come up with good message, though...
>
> The problem is that we have two potential plural cases in this message.
> That leads to the need to formulate the second part independently from
> singular/plural. I tried to improve a little bit and propose this message:
>
> Singular:
>   The following process is holding the lock: A. The request queue consists of: B.
>
> Plural:
>   Following processes are holding the lock: A, B. The request queue consists of: C.

Seems too complex. How about this...

Lock holder(s): A. Lock waiter(s) B
Lock holder(s): A, B. Lock waiter(s) C

-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] narwhal and PGDLLIMPORT
On Sun, Feb 2, 2014 at 3:59 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Dave Page <dp...@pgadmin.org> writes:
>> On Sun, Feb 2, 2014 at 1:03 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> I think we should give serious consideration to desupporting this
>>> combination so that we can get rid of the plague of PGDLLIMPORT marks.
>>
>> No objection here - though I should point out that it's not been offline
>> for a long time (aside from a couple of weeks in January) - it's been
>> happily building most pre-9.2 branches for ages. 9.1 seems to be stuck,
>> along with HEAD, and I forgot to add 9.3. I'm in the process of cleaning
>> that up as time allows, but am happy to drop it instead if we no longer
>> want to support anything that old. We certainly don't use anything
>> resembling that config for the EDB installer builds.
>
> Further discussion pointed out that currawong, for example, seems to want
> PGDLLIMPORT markings but is able to get by without them in some cases that
> narwhal evidently doesn't like. So at this point, desupporting narwhal's
> configuration is clearly premature --- we should instead be looking into
> exactly what is causing the different cases to fail or not fail. I still
> have hopes that we might be able to get rid of PGDLLIMPORT marks, but by
> actually understanding why they seem to be needed in some cases and not
> others, not by just arbitrarily dropping support.
>
> In the meantime, please do get HEAD running again on that machine.

Done:

http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=narwhal&dt=2014-02-03%2009%3A26%3A43

It's not happy though :-(

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake
EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [review] PostgreSQL Service on Windows does not start if data directory given is relative path
From: Rajeev rastogi <rajeev.rast...@huawei.com>
> Please find the attached revised patch.

Thanks, your patch looks good. I confirmed the following:

* Applied cleanly to the latest source tree, and built on Linux and Windows
  (with MSVC) without any warnings.
* Changed to D:\ and ran "pg_ctl register -N pg -D pgdata", and checked the
  registry value ImagePath with regedit. It contained "-D D:\pgdata".
* Changed to D:\pgdata and ran "pg_ctl register -N pg -D ..\pgdata", and
  checked the registry value ImagePath with regedit. It contained
  "-D D:\pgdata".

Finally, please change the function description comment of
get_absolute_path() so that the style follows other functions in path.c.
That is:

/*
 * function_name
 *
 * Function description
 */

Then update the CommitFest entry with the latest patch and let me know of
that. Then, I'll change the patch status to ready for committer.

Regards
MauMau
[HACKERS] Re: [review] PostgreSQL Service on Windows does not start if data directory given is relative path
On 3rd February 2014, MauMau Wrote:
> Thanks, your patch looks good. I confirmed the following:
>
> * Applied cleanly to the latest source tree, and built on Linux and
>   Windows (with MSVC) without any warnings.
> * Changed to D:\ and ran "pg_ctl register -N pg -D pgdata", and checked
>   the registry value ImagePath with regedit. It contained "-D D:\pgdata".
> * Changed to D:\pgdata and ran "pg_ctl register -N pg -D ..\pgdata", and
>   checked the registry value ImagePath with regedit. It contained
>   "-D D:\pgdata".

Thanks for reviewing and testing the patch.

> Finally, please change the function description comment of
> get_absolute_path() so that the style follows other functions in path.c.
> That is:
>
> /*
>  * function_name
>  *
>  * Function description
>  */

I have modified the function description comment as per your suggestion.

> Then update the CommitFest entry with the latest patch and let me know of
> that. Then, I'll change the patch status to ready for committer.

I will update the CommitFest with the latest patch immediately after sending
the mail.

Thanks and Regards,
Kumar Rajeev Rastogi
Re: [HACKERS] Patch: Show process IDs of processes holding a lock; show relation and tuple infos of a lock to acquire
Hi Simon,

On 03/02/14 10:43, Simon Riggs wrote:
>> Singular:
>>   The following process is holding the lock: A. The request queue consists of: B.
>>
>> Plural:
>>   Following processes are holding the lock: A, B. The request queue consists of: C.
>
> Seems too complex. How about this...
>
> Lock holder(s): A. Lock waiter(s) B
> Lock holder(s): A, B. Lock waiter(s) C

This is basically the same as before, it is even shorter. The complaint was
that I don't use a whole sentence in this error detail. Won't the change
fall foul of the same complaint?

To be honest, I'd like to stick with your earlier proposal:

Singular:
  Process holding the lock: A. Request queue: B

Plural:
  Processes holding the lock: A, B. Request queue: C, D

This seems to be a good trade-off between project guidelines, readability
and parsability.

Best regards,

-- 
 Christian Kruse               http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [COMMITTERS] pgsql: Include planning time in EXPLAIN ANALYZE output.
On Sun, Feb 2, 2014 at 11:13 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Peter Geoghegan <p...@heroku.com> writes:
>> On Wed, Jan 29, 2014 at 1:10 PM, Robert Haas <rh...@postgresql.org> wrote:
>>> Include planning time in EXPLAIN ANALYZE output.
>>
>> Isn't it perhaps a little confusing that "Planning time" may well exceed
>> "Total runtime"? Perhaps s/Total runtime/Execution time/ ?
>
> I'm not really feeling a compelling need to change that. We've been
> displaying total runtime - described exactly that way - for many releases,
> and it surely is confusing to the novice that the time reported can be
> much less than the time reported by psql's \timing option, usually because
> of planning time. But adding the planning time to the output seems to me
> to make that better, not worse. If the user can't figure out that runtime
> != planning time, I'm not sure they'll be able to figure out execution
> time != planning time, either.

One of the reasons it's called "Total runtime", or so I've always assumed,
is because it's more inclusive than the time shown for the root node of the
plan tree. Specifically, it includes the time required to fire triggers.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] GIN improvements part2: fast scan
Hi Oleg,

On 3 February 2014 7:53, Oleg Bartunov wrote:
> Tomas, it'd be nice if you use real data in your testing.

I'm using a mailing list archive (the benchmark is essentially a simple
search engine on top of the archive, implemented using the built-in
full-text search). So I think this is a quite 'real' dataset, not something
synthetic. The queries are generated, of course, but I strived to make them
as real as possible. Sure, this is not a hstore-centric benchmark, but the
thread is about GIN indexes, so I think it's fair.

> One very good application of gin fast-scan is dramatic performance
> improvement of the hstore/jsonb @> operator, see slides 57, 58
> http://www.sai.msu.su/~megera/postgres/talks/hstore-dublin-2013.pdf.
> I'd like not to lose this benefit :)

Yes, that's something we could/should test, probably. Sadly I don't have a
dataset or a complete real-world test case at hand. Any ideas?

I certainly agree that it'd be very sad to lose the performance gain for
hstore/json. OTOH my fear is that to achieve that gain, we'll noticeably
slow down other important use cases (e.g. full-text search), which is one
of the reasons why I was doing the tests.

regards
Tomas
Re: [HACKERS] narwhal and PGDLLIMPORT
On Mon, Feb 3, 2014 at 5:45 AM, Dave Page <dp...@pgadmin.org> wrote:
> Done:
>
> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=narwhal&dt=2014-02-03%2009%3A26%3A43
>
> It's not happy though :-(

Specifically, it says:

dlltool --export-all --output-def libpg_buffercachedll.def pg_buffercache_pages.o
dllwrap -o pg_buffercache.dll --dllname pg_buffercache.dll --def libpg_buffercachedll.def pg_buffercache_pages.o -L../../src/port -L../../src/common -Wl,--allow-multiple-definition -L/mingw/lib -Wl,--as-needed -L../../src/backend -lpostgres
Info: resolving _MainLWLockArray by linking to __imp__MainLWLockArray (auto-import)
fu01.o(.idata$3+0xc): undefined reference to `libpostgres_a_iname'
fu02.o(.idata$3+0xc): undefined reference to `libpostgres_a_iname'
nmth00.o(.idata$4+0x0): undefined reference to `_nm__MainLWLockArray'

I assume I should go stick a DLLIMPORT on MainLWLockArray, unless somebody
has another proposal.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [GENERAL] postgres FDW cost estimation options unrecognized in 9.3-beta1
Hi All,

Is there any way to automate the archive deletion process? Any script or
command in an HA setup using pgpool?

Thanks in advance.

Best Regards,
Rajni Baliyan | Database - Consultant
ASHNIK PTE. LTD.
101 Cecil Street, #11-11 Tong Eng Building, Singapore 069533
M: +65 83858518  T: +65 6438 3504 | www.ashnik.com
www.facebook.com/ashnikbiz | www.twitter.com/ashnikbiz

This email may contain confidential, privileged or copyright material and
is solely for the use of the intended recipient(s).

On Fri, Jan 31, 2014 at 11:22 AM, Bruce Momjian <br...@momjian.us> wrote:
> On Fri, Jul 26, 2013 at 06:28:05PM -0400, Tom Lane wrote:
>> Our documentation appears not to disclose this fine point, but a look at
>> the SQL-MED standard says it's operating per spec. The standard also
>> says that ADD is an error if the option is already defined, which is a
>> bit more defensible, but still not exactly what I'd call user-friendly.
>> And the error we issue for that case is pretty misleading too:
>>
>> regression=# ALTER SERVER cuda_db10 OPTIONS (use_remote_estimate 'true') ;
>> ALTER SERVER
>> regression=# ALTER SERVER cuda_db10 OPTIONS (use_remote_estimate 'false') ;
>> ERROR: option "use_remote_estimate" provided more than once
>>
>> I think we could do with both more documentation, and better error
>> messages for these cases. In the SET-where-you-should-use-ADD case,
>> perhaps
>>
>> ERROR: option "use_remote_estimate" has not been set
>> HINT: Use ADD not SET to define an option that wasn't already set.
>>
>> In the ADD-where-you-should-use-SET case, perhaps
>>
>> ERROR: option "use_remote_estimate" is already set
>> HINT: Use SET not ADD to change an option's value.
>>
>> The "provided more than once" wording would be appropriate if the same
>> option is specified more than once in the command text, but I'm not sure
>> that it's worth the trouble to detect that case.
>
> Where are we on this?
>
> -- 
>   Bruce Momjian  <br...@momjian.us>        http://momjian.us
>   EnterpriseDB                             http://enterprisedb.com
>
>   + Everyone has their own god.
+

-- 
Sent via pgsql-general mailing list (pgsql-gene...@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [HACKERS] GIN improvements part2: fast scan
On 03/02/14 02:44, Tomas Vondra wrote:
> (2) The question is whether the new patch works fine on rare words. See
>     this for comparison of the patches against HEAD:
>
>     http://www.fuzzy.cz/tmp/gin/3-rare-words.png
>     http://www.fuzzy.cz/tmp/gin/3-rare-words-new.png
>
>     and this is the comparison of the two patches:
>
>     http://www.fuzzy.cz/tmp/gin/patches-rare-words.png
>
>     That seems fine to me - some queries are slower, but we're talking
>     about queries taking 1 or 2 ms, so the measurement error is probably
>     the main cause of the differences.
>
> (3) With higher numbers of frequent words, the differences (vs. HEAD or
>     the previous patch) are not that dramatic as in (1) - the new patch
>     is consistently ~20% faster.

Just thinking: this is about one algorithm being better for frequent words
and another algorithm being better for rare words... We do have this
information (at least for tsvector) in the statistics; would it be possible
to just call the consistent function more often if the statistics give
signs that it actually is a frequent word?

Jesper - heavily dependent on tsvector searches, with both frequent and
rare words.
Re: [HACKERS] bugfix patch for json_array_elements
On 02/02/2014 08:54 PM, Craig Ringer wrote:
> On 02/03/2014 09:09 AM, Craig Ringer wrote:
>> At a guess, we're looking at a case where a new child context is created
>> at every call, so every MemoryContextResetChildren call has to deal with
>> more child contexts.
>
> That would be yes. After a short run, I see 32849 lines like:
>
>   json_array_elements temporary cxt: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
>
> under the context:
>
>   PortalMemory: 8192 total in 1 blocks
>     PortalHeapMemory: 7168 total in 3 blocks
>       ExecutorState: 65600 total in 4 blocks
>         ExprContext: 8192 total in 1 blocks
>           json_array_elements temporary cxt: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
>
> The attached patch deletes the context after use, bringing performance
> back into line. It should be backpatched to 9.3.
>
> [...]
>
> +		MemoryContextDelete(state->tmp_cxt);
> +		state->tmp_cxt = NULL;
> +		PG_RETURN_NULL();

Hmm. I guess I was assuming that the tmp_cxt would be cleaned up at the end
of the function, since it's a child of the CurrentMemoryContext. But if I
got that wrong I'm happy to fix it.

I don't see the point in setting state->tmp_cxt to NULL since it's about to
go out of scope anyway.

cheers

andrew
Re: [HACKERS] narwhal and PGDLLIMPORT
Robert Haas <robertmh...@gmail.com> writes:
> I assume I should go stick a DLLIMPORT on MainLWLockArray, unless somebody
> has another proposal.

Hold up awhile. The reason we're resurrecting it is to get a crosscheck on
what symbols it thinks need DLLIMPORT that no other platform does.

			regards, tom lane
Re: [HACKERS] [review] PostgreSQL Service on Windows does not start if data directory given is relative path
From: Rajeev rastogi <rajeev.rast...@huawei.com>
> I will update the CommitFest with the latest patch immediately after
> sending the mail.

OK, done setting the status to ready for committer.

Regards
MauMau
Re: [HACKERS] narwhal and PGDLLIMPORT
Andres Freund and...@2ndquadrant.com writes: On 2014-02-03 12:00:40 +0800, Craig Ringer wrote: I think it's a good thing personally - we shouldn't be exporting every little internal var in the symbol table. If we built with -fvisibility=hidden on 'nix there'd be no need to complain about commits being on on 'nix then breaking on Windows, 'cos the 'nix build would break in the same place. That's all or nothing though, there's no vars hidden, procs exported option in gcc. To make any form of sense we'd need to have a really separate API layer between internal/external stuff. That doesn't seem likely to arrive anytime soon, if ever. Indeed. Aside from the difficulty of having tighter requirements on just one platform (which, for development purposes, isn't even a mainstream one), it seems completely silly to me that the restriction applies only to variables and not functions. We expose *far* more global functions than global variables; and there's no particular reason to think that calling a random function is any safer than frobbing a random global variable. What we need is to make the Windows platform stop thinking that it is the center of the world, and make it act as much as possible like the Unix-oid platforms. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] narwhal and PGDLLIMPORT
On 2014-02-03 09:13:02 -0500, Tom Lane wrote: What we need is to make the Windows platform stop thinking that it is the center of the world, and make it act as much as possible like the Unix-oid platforms. What about the completely crazy thing of simply redefining export to include the declspec on windows for backend build? That will possibly blow up the size of the .def files et al a bit, but I have little mercy with that. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] narwhal and PGDLLIMPORT
On 02/03/2014 06:37 PM, Andres Freund wrote: On 2014-02-03 12:00:40 +0800, Craig Ringer wrote: I think it's a good thing personally - we shouldn't be exporting every little internal var in the symbol table. If we built with -fvisibility=hidden on 'nix there'd be no need to complain about commits being on on 'nix then breaking on Windows, 'cos the 'nix build would break in the same place. That's all or nothing though, there's no vars hidden, procs exported option in gcc. I think that'd be an exercise in futility. We're not talking about a general purpose library here, where I agree -fvisibility=hidden is a useful thing, but about the backend. We'd break countless extensions people have written. Most of those have been authored on *nix. To make any form of sense we'd need to have a really separate API layer between internal/external stuff. That doesn't seem likely to arrive anytime soon, if ever. I think all that would achieve is that we'd regularly need to backpatch visibility fixes. And have countless pointless flames about which variables to expose. Fair point. If we're not going to define a proper API, then export control is not useful. And since there isn't a proper API, nor any on the cards, _that_ is a reasonable reason to just export all. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] narwhal and PGDLLIMPORT
On 2014-02-03 22:23:16 +0800, Craig Ringer wrote: On 02/03/2014 06:37 PM, Andres Freund wrote: I think that'd be an exercise in futility. We're not talking about a general purpose library here, where I agree -fvisibility=hidden is a useful thing, but about the backend. We'd break countless extensions people have written. Most of those have been authored on *nix. To make any form of sense we'd need to have a really separate API layer between internal/external stuff. That doesn't seem likely to arrive anytime soon, if ever. I think all that would achieve is that we'd regularly need to backpatch visibility fixes. And have countless pointless flames about which variables to expose. Fair point. If we're not going to define a proper API, then export control is not useful. And since there isn't a proper API, nor any on the cards, _that_ is a reasonable reason to just export all. We have a (mostly) proper API. Just not an internal/external API split. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup and pg_stat_tmp directory
On Fri, Jan 31, 2014 at 9:29 PM, Fujii Masao masao.fu...@gmail.com wrote: On Tue, Jan 28, 2014 at 5:51 PM, Magnus Hagander mag...@hagander.net wrote: On Tue, Jan 28, 2014 at 6:11 AM, Amit Kapila amit.kapil...@gmail.com wrote: On Tue, Jan 28, 2014 at 9:26 AM, Fujii Masao masao.fu...@gmail.com wrote: Hi, The files in pg_stat_tmp directory don't need to be backed up because they are basically reset at the archive recovery. So I think it's worth changing pg_basebackup so that it skips any files in pg_stat_tmp directory. Thought? I think this is good idea, but can't it also avoid PGSTAT_STAT_PERMANENT_TMPFILE along with temp files in pg_stat_tmp All stats files should be excluded. IIRC the PGSTAT_STAT_PERMANENT_TMPFILE refers to just the global one. You want to exclude based on PGSTAT_STAT_PERMANENT_DIRECTORY (and of course based on the guc stats_temp_directory if it's in PGDATA. Attached patch changes basebackup.c so that it skips all files in both pg_stat_tmp and stats_temp_directory. Even when a user sets stats_temp_directory to the directory other than pg_stat_tmp, we need to skip the files in pg_stat_tmp. Because, per recent change of pg_stat_statements, the external query file is always created there. Committed. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] narwhal and PGDLLIMPORT
Andres Freund and...@2ndquadrant.com writes: What about the completely crazy thing of simply redefining export to include the declspec on windows for backend build? That will possibly blow up the size of the .def files et al a bit, but I have little mercy with that. You mean #define extern extern PGDLLIMPORT ? What will that do to function declarations that have extern? Are we sure C compilers are smart enough to not go into an infinite loop on such a macro? Anyway, narwhal's latest results thicken the plot even more. How come pg_buffercache has been building OK on the other Windows critters, when MainLWLockArray lacks PGDLLIMPORT? That lets out the theory about linking to libpq having something to do with it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Postgresql for cygwin - 3rd
On 02/01/2014 05:46 PM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: On 02/01/2014 05:12 PM, Marco Atzeri wrote: is it possible the tsearch test never worked on en_US.utf8 but only on C locale ? Yes, that's more or less what I said, I thought. Maybe we need to test this on other Windows systems in non-C encodings. Let's make sure it's only a Cygwin problem. I'll work on that. You should try to concentrate on the things like prepared_xacts and isolation_test that we know don't work ONLY on Cygwin. Please test the patch I just posted in the pgsql-bugs thread. It looks to me like there are bugs in both the C and non-C locale cases, but only the latter case would be exercised by our regression tests, since we don't use any non-ASCII characters in the tests. This is commit 082c0dfa140b5799bc7eb574d68610dcfaa619ba and friends, right? If so, it appears to have done the trick. brolga is now happy running tsearch with en_US.utf8: http://www.pgbuildfarm.org/cgi-bin/show_stage_log.pl?nm=brolga&dt=2014-02-03%2011%3A26%3A10&stg=install-check-en_US.utf8 Cool. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GIN improvements part2: fast scan
On Mon, Jan 27, 2014 at 7:30 PM, Alexander Korotkov aekorot...@gmail.comwrote: On Mon, Jan 27, 2014 at 2:32 PM, Alexander Korotkov aekorot...@gmail.comwrote: On Sun, Jan 26, 2014 at 8:14 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 01/26/2014 08:24 AM, Tomas Vondra wrote: Hi! On 25.1.2014 22:21, Heikki Linnakangas wrote: Attached is a new version of the patch set, with those bugs fixed. I've done a bunch of tests with all the 4 patches applied, and it seems to work now. I've done tests with various conditions (AND/OR, number of words, number of conditions) and so far I did not get any crashes, infinite loops or anything like that. I've also compared the results to 9.3 - by dumping the database and running the same set of queries on both machines, and indeed I got 100% match. I also did some performance tests, and that's when I started to worry. For example, I generated and ran 1000 queries that look like this: SELECT id FROM messages WHERE body_tsvector @@ to_tsquery('english','(header & 53 & 32 & useful & dropped)') ORDER BY ts_rank(body_tsvector, to_tsquery('english','(header & 53 & 32 & useful & dropped)')) DESC; i.e. in this case the query always was 5 words connected by AND. This query is a pretty common pattern for fulltext search - sort by a list of words and give me the best ranked results. On 9.3, the script was running for ~23 seconds, on patched HEAD it was ~40. It's perfectly reproducible, I've repeated the test several times with exactly the same results. The test is CPU bound, there's no I/O activity at all. I got the same results with more queries (~100k). Attached is a simple chart with x-axis used for durations measured on 9.3.2, y-axis used for durations measured on patched HEAD. It's obvious a vast majority of queries is up to 2x slower - that's pretty obvious from the chart. Only about 50 queries are faster on HEAD, and 700 queries are more than 50% slower on HEAD (i.e. if the query took 100ms on 9.3, it takes 150ms on HEAD). 
Typically, the EXPLAIN ANALYZE looks something like this (on 9.3): http://explain.depesz.com/s/5tv and on HEAD (same query): http://explain.depesz.com/s/1lI Clearly the main difference is in the Bitmap Index Scan which takes 60ms on 9.3 and 120ms on HEAD. On 9.3 the perf top looks like this: 34.79% postgres [.] gingetbitmap 28.96% postgres [.] ginCompareItemPointers 9.36% postgres [.] TS_execute 5.36% postgres [.] check_stack_depth 3.57% postgres [.] FunctionCall8Coll while on 9.4 it looks like this: 28.20% postgres [.] gingetbitmap 21.17% postgres [.] TS_execute 8.08% postgres [.] check_stack_depth 7.11% postgres [.] FunctionCall8Coll 4.34% postgres [.] shimTriConsistentFn Not sure how to interpret that, though. For example where did the ginCompareItemPointers go? I suspect it's thanks to inlining, and that it might be related to the performance decrease. Or maybe not. Yeah, inlining makes it disappear from the profile, and spreads that time to the functions calling it. The profile tells us that the consistent function is called a lot more than before. That is expected - with the fast scan feature, we're calling consistent not only for potential matches, but also to refute TIDs based on just a few entries matching. If that's effective, it allows us to skip many TIDs and avoid consistent calls, which compensates, but if it's not effective, it's just overhead. I would actually expect it to be fairly effective for that query, so that's a bit surprising. I added counters to see where the calls are coming from, and it seems that about 80% of the calls are actually coming from this little the feature I explained earlier: In addition to that, I'm using the ternary consistent function to check if minItem is a match, even if we haven't loaded all the entries yet. That's less important, but I think for something like rare1 | (rare2 frequent) it might be useful. It would allow us to skip fetching 'frequent', when we already know that 'rare1' matches for the current item. 
I'm not sure if that's worth the cycles, but it seemed like an obvious thing to do, now that we have the ternary consistent function. So, that clearly isn't worth the cycles :-). At least not with an expensive consistent function; it might be worthwhile if we pre-build the truth-table, or cache the results of the consistent function. Attached is a quick patch to remove that, on top of all the other patches, if you want to test the effect. Every single change you did in fast scan seems to be reasonable, but testing shows that something went wrong. Simple test with 3 words of different selectivities. After applying your patches: #
Re: [HACKERS] narwhal and PGDLLIMPORT
On 2014-02-03 09:25:57 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: What about the completely crazy thing of simply redefining export to include the declspec on windows for backend build? That will possibly blow up the size of the .def files et al a bit, but I have little mercy with that. You mean #define extern extern PGDLLIMPORT ? What will that do to function declarations that have extern? As far as I understand it's perfectly ok to have that on functions. http://msdn.microsoft.com/en-us/library/aa271769%28v=vs.60%29.aspx even claims it's faster in some scenarios. Are we sure C compilers are smart enough to not go into an infinite loop on such a macro? I'd be really surprised if there were any that wouldn't completely choke on pg otherwise. Especially not as we are only talking about windows where only gcc and msvc is supported these days. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Q: How to use indexer api smartly
My goal is to implement a new index type based on the bitmap index algorithm. I have two main problems: 1. How to get the target list - the list of columns that need to be returned from a query on the index. I want to implement index-only access: today the indexer API returns a row-id and then PG retrieves the data from the table. My idea is to return the list of columns directly from the index. Is that possible? And how do I know what the column list is (assuming the index contains all those columns)? 2. How to implement a query context within the indexer API. The indexer functions (amcostestimate, ambeginscan and amrescan) do not maintain a context. The idea is better memory management of index structures (lock the memory structure for the query's lifetime - until it ends). Thanks, Amihay.
Re: [HACKERS] narwhal and PGDLLIMPORT
On 02/03/2014 01:13 AM, Peter Geoghegan wrote: On Sun, Feb 2, 2014 at 5:22 PM, Craig Ringer cr...@2ndquadrant.com wrote: I'm not a fan of MinGW (gcc) on Windows, it's a right pain. It's also the only open source compiler currently supported for PostgreSQL on Windows - practically the only one available. I don't know about you, but I'm not too keen on assuming Microsoft will continue to offer free toolchains that're usable for our purposes. They're crippling their free build tools more and more with each release, which isn't a good trend. I was under the impression that Microsoft had finally come around to implementing a few C99 features in Visual Studio 2013 precisely because they want there to be an open source ecosystem on Windows. It's not so long ago that they were saying they would no longer publish free-as-in-beer command line compilers at all. The outrage made them change their minds, but we really can't rely on only Microsoft compilers for Windows. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] narwhal and PGDLLIMPORT
On Mon, Feb 3, 2014 at 9:57 AM, Andrew Dunstan and...@dunslane.net wrote: It's not so long ago that they were saying they would no longer publish free-as-in-beer command line compilers at all. The outrage made them change their minds, but we really can't rely on only Microsoft compilers for Windows. +1. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] narwhal and PGDLLIMPORT
Andrew Dunstan and...@dunslane.net writes: On 02/03/2014 01:13 AM, Peter Geoghegan wrote: On Sun, Feb 2, 2014 at 5:22 PM, Craig Ringer cr...@2ndquadrant.com wrote: I'm not a fan of MinGW (gcc) on Windows, it's a right pain. It's also the only open source compiler currently supported for PostgreSQL on Windows - practically the only one available. I don't know about you, but I'm not too keen on assuming Microsoft will continue to offer free toolchains that're usable for our purposes. They're crippling their free build tools more and more with each release, which isn't a good trend. I was under the impression that Microsoft had finally come around to implementing a few C99 features in Visual Studio 2013 precisely because they want there to be an open source ecosystem on Windows. It's not so long ago that they were saying they would no longer publish free-as-in-beer command line compilers at all. The outrage made them change their minds, but we really can't rely on only Microsoft compilers for Windows. FWIW, I'd definitely not be in favor of desupporting mingw altogether. However, narwhal appears to be running a pretty ancient version: configure: using compiler=gcc.exe (GCC) 3.4.2 (mingw-special) If it turns out that more recent versions have saner behavior, I'd not have the slightest qualms about saying we don't support such old versions anymore. But this is all speculation at the moment. I'm hoping somebody with a Windows build environment (ie, not me) will poke into the situation and figure out exactly why some of these global variables seem to need PGDLLIMPORT while others don't. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] bgworker crashed or not?
On Fri, Jan 31, 2014 at 12:44 PM, Antonin Houska antonin.hou...@gmail.com wrote: In 9.3 I noticed that postmaster considers bgworker crashed (and therefore tries to restart it) even if it has exited with zero status code. I first thought about a patch like the one below, but then noticed that postmaster.c:bgworker_quickdie() signal handler exits with 0 too (when there's no success). Do we need my patch, my patch + something for the handler or no patch at all? I think the word crashed here doesn't actually mean crashed. If a worker dies with an exit status of other than 0 or 1, we call HandleChildCrash(), which effects a system-wide restart. But if it exits with code 0, we don't do that. We just set HaveCrashedWorker so that it gets restarted immediately, and that's it. In other words, exit(0) in a bgworker causes an immediate relaunch, and exit(1) causes the restart interval to be respected, and exit(other) causes a system-wide crash-and-restart cycle. This is admittedly a weird API, and we've had some discussion of whether to change it, but I don't know that we've reached any final conclusion. I'm tempted to propose exactly inverting the current meaning of exit(0). That is, make it mean don't restart me, ever, even if I have a restart interval configured rather than restart me right away, even if I have a restart interval configured. That way, a background process that wants to run until it accomplishes some task could be written to exit(1) on error and exit(0) on success, which seems quite natural. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] bgworker crashed or not?
Robert Haas robertmh...@gmail.com writes: This is admittedly a weird API, and we've had some discussion of whether to change it, but I don't know that we've reached any final conclusion. I'm tempted to propose exactly inverting the current meaning of exit(0). That is, make it mean don't restart me, ever, even if I have a restart interval configured rather than restart me right away, even if I have a restart interval configured. That way, a background process that wants to run until it accomplishes some task could be written to exit(1) on error and exit(0) on success, which seems quite natural.

So

    exit(0)     -> done, permanently
    exit(1)     -> done until restart interval
    exit(other) -> crash

and there's no way to obtain the restart immediately behavior? I think this is an improvement, but it probably depends on what you think the use-cases are for bgworkers. I can definitely see that there is a need for a bgworker to be just plain done, though. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On Sat, Feb 1, 2014 at 4:20 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-01-30 14:07:42 -0500, Andrew Dunstan wrote: + para id=functions-json-table + xref linkend=functions-json-creation-table shows the functions that are + available for creating typejson/type values. + (see xref linkend=datatype-json) /para - table id=functions-json-table -titleJSON Support Functions/title + indexterm + primaryarray_to_json/primary + /indexterm + indexterm + primaryrow_to_json/primary + /indexterm + indexterm + primaryto_json/primary + /indexterm + indexterm + primaryjson_build_array/primary + /indexterm + indexterm + primaryjson_build_object/primary + /indexterm + indexterm + primaryjson_object/primary + /indexterm Hm, why are you collecting the indexterms at the top in the contrast to the previous way of collecting them at the point of documentation? diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile index 1ae9fa0..fd93d9b 100644 --- a/src/backend/utils/adt/Makefile +++ b/src/backend/utils/adt/Makefile @@ -32,7 +32,8 @@ OBJS = acl.o arrayfuncs.o array_selfuncs.o array_typanalyze.o \ tsquery_op.o tsquery_rewrite.o tsquery_util.o tsrank.o \ tsvector.o tsvector_op.o tsvector_parser.o \ txid.o uuid.o windowfuncs.o xml.o rangetypes_spgist.o \ - rangetypes_typanalyze.o rangetypes_selfuncs.o + rangetypes_typanalyze.o rangetypes_selfuncs.o \ + jsonb.o jsonb_support.o Odd, most OBJS lines are kept in alphabetical order, but that doesn't seem to be the case here. +/* + * for jsonb we always want the de-escaped value - that's what's in token + */ + strange newline. +static void +jsonb_in_scalar(void *state, char *token, JsonTokenType tokentype) +{ + JsonbInState *_state = (JsonbInState *) state; + JsonbValue v; + + v.size = sizeof(JEntry); + + switch (tokentype) + { + ... + default:/* nothing else should be here in fact */ + break; Shouldn't this at least Assert(false) or something? 
+static void +recvJsonbValue(StringInfo buf, JsonbValue *v, uint32 level, int c) +{ + uint32 hentry = c & JENTRY_TYPEMASK; + + if (hentry == JENTRY_ISNULL) + { + v->type = jbvNull; + v->size = sizeof(JEntry); + } + else if (hentry == JENTRY_ISOBJECT || hentry == JENTRY_ISARRAY || hentry == JENTRY_ISCALAR) + { + recvJsonb(buf, v, level + 1, (uint32) c); + } + else if (hentry == JENTRY_ISFALSE || hentry == JENTRY_ISTRUE) + { + v->type = jbvBool; + v->size = sizeof(JEntry); + v->boolean = (hentry == JENTRY_ISFALSE) ? false : true; + } + else if (hentry == JENTRY_ISNUMERIC) + { + v->type = jbvNumeric; + v->numeric = DatumGetNumeric(DirectFunctionCall3(numeric_recv, PointerGetDatum(buf), + Int32GetDatum(0), Int32GetDatum(-1))); + + v->size = sizeof(JEntry) * 2 + VARSIZE_ANY(v->numeric); What's the *2 here? +static void +recvJsonb(StringInfo buf, JsonbValue *v, uint32 level, uint32 header) +{ + uint32 hentry; + uint32 i; This function and recvJsonbValue call each other recursively, afaics without any limit, shouldn't they check for the stack depth? + hentry = header & JENTRY_TYPEMASK; + + v->size = 3 * sizeof(JEntry); *3? + if (hentry == JENTRY_ISOBJECT) + { + v->type = jbvHash; + v->hash.npairs = header & JB_COUNT_MASK; + if (v->hash.npairs > 0) + { + v->hash.pairs = palloc(sizeof(*v->hash.pairs) * v->hash.npairs); + Hm, if I understand correctly, we're just allocating JB_COUNT_MASK (which is 0x0FFFFFFF) * sizeof(*v->hash.pairs) bytes here, without any crosschecks about the actual length of the data? So with a few bytes the server can be coaxed to allocate a gigabyte of data? Since this immediately calls another input routine, this can be done in a nested fashion, quickly OOMing the server. I think this and several other places really need a bit more input sanity checking. 
+ for (i = 0; i < v->hash.npairs; i++) + { + recvJsonbValue(buf, &v->hash.pairs[i].key, level, pq_getmsgint(buf, 4)); + if (v->hash.pairs[i].key.type != jbvString) + elog(ERROR, "jsonb's key could be only a string"); Shouldn't that be an ereport(ERRCODE_DATATYPE_MISMATCH)? Similar in a few other places. +char * +JsonbToCString(StringInfo out, char *in, int estimated_len) +{ + bool first = true; + JsonbIterator *it; + int type; + JsonbValue
Re: [HACKERS] GIN improvements part2: fast scan
Hi Alexander, On 3 Únor 2014, 15:31, Alexander Korotkov wrote: I found my patch 0005-Ternary-consistent-implementation.patch to be completely wrong. It introduces ternary consistent function to opclass, but don't uses it, because I forgot to include ginlogic.c change into patch. So, it shouldn't make any impact on performance. However, testing results with that patch significantly differs. That makes me very uneasy. Can we now reproduce exact same? Do I understand it correctly that the 0005 patch should give exactly the same performance as the 9.4-heikki branch (as it was applied on it, and effectively did no change). This wasn't exactly what I measured, although the differences were not that significant. I can rerun the tests, if that's what you're asking for. I'll improve the test a bit - e.g. I plan to average multiple runs, to filter out random noise (which might be significant for such short queries). Right version of these two patches in one against current head is attached. I've rerun tests with it, results are /mnt/sas-raid10/gin-testing/queries/9.4-fast-scan-10. Could you rerun postprocessing including graph drawing? Yes, I'll do that. However I'll have to rerun the other tests too, because the previous runs were done on a different machine. I'm a bit confused right now. The previous patches (0005 + 0007) were supposed to be applied on top of the 4 from Heikki (0001-0004), right? AFAIK those were not commited yet, so why is this version against HEAD? 
To summarize, I know of these patch sets:

9.4-heikki (old version)
  0001-Optimize-GIN-multi-key-queries.patch
  0002-Further-optimize-the-multi-key-GIN-searches.patch
  0003-Further-optimize-GIN-multi-key-searches.patch
  0004-Add-the-concept-of-a-ternary-consistent-check-and-us.patch

9.4-alex-1 (based on 9.4-heikki)
  0005-Ternary-consistent-implementation.patch

9.4-alex-1 (based on 9.4-alex-1)
  0006-Sort-entries.patch

9.4-heikki (new version)
  gin-ternary-logic+binary-heap+preconsistent-only-on-new-page.patch

9.4-alex-2 (new version)
  gin-fast-scan.10.patch.gz

Or did I get that wrong? Sometimes test cases are not what we expect. For example:

=# explain SELECT id FROM messages WHERE body_tsvector @@ to_tsquery('english','(5alpha1-initdb''d)');
                                           QUERY PLAN
 Bitmap Heap Scan on messages  (cost=84.00..88.01 rows=1 width=4)
   Recheck Cond: (body_tsvector @@ '''5alpha1-initdb'' & ''5alpha1'' & ''initdb'' & ''d'''::tsquery)
   ->  Bitmap Index Scan on messages_body_tsvector_idx  (cost=0.00..84.00 rows=1 width=0)
         Index Cond: (body_tsvector @@ '''5alpha1-initdb'' & ''5alpha1'' & ''initdb'' & ''d'''::tsquery)
 Planning time: 0.257 ms
(5 rows)

5alpha1-initdb'd is 3 gin entries with different frequencies. Why do you find that strange? The way the query is formed or the way it's evaluated? The query generator certainly is not perfect, so it may produce some strange queries. Also, these patches are not intended to change relevance ordering speed. When number of results are high, most of time is relevance calculating and sorting. I propose to remove ORDER BY clause from test cases to see scan speed more clear. Sure, I can do that. Or maybe one set of queries with ORDER BY, the other one without it. I've dump of postgresql.org search queries from Magnus. We can add them to our test case. You mean search queries from the search for mailing list archives? Sure, we add that. 
Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On 2014-02-03 09:22:52 -0600, Merlin Moncure wrote: I lost my stomach (or maybe it was the glass of red) somewhere in the middle, but I think this needs a lot of work. Especially the io code doesn't seem ready to me. I'd consider ripping out the send/recv code for 9.4, that seems the biggest can of worms. It will still be usable without. Not having type send/recv functions is somewhat dangerous; it can cause problems for libraries that run everything through the binary wire format. I'd give jsonb a pass on that, being a new type, but would be concerned if hstore had that ability revoked. Yea, removing it for hstore would be a compat problem... offhand note: hstore_send seems pretty simply written and clean; it's a simple nonrecursive iterator... But a send function is pretty pointless without the corresponding recv function... And imo recv simply is to dangerous as it's currently written. I am not saying that it cannot be made work, just that it's still nearly as ugly as when I pointed out several of the dangers some weeks back. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GIN improvements part2: fast scan
On Thu, Jan 30, 2014 at 8:38 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 01/30/2014 01:53 AM, Tomas Vondra wrote: (3) A file with explain plans for 4 queries suffering ~2x slowdown, and explain plans with 9.4 master and Heikki's patches is available here: http://www.fuzzy.cz/tmp/gin/queries.txt All the queries have 6 common words, and the explain plans look just fine to me - exactly like the plans for other queries. Two things now caught my eye. First some of these queries actually have words repeated - either exactly like database database or in negated form like !anything anything. Second, while generating the queries, I use dumb frequency, where only exact matches count. I.e. write != written etc. But the actual number of hits may be much higher - for example write matches exactly just 5% documents, but using @@ it matches more than 20%. I don't know if that's the actual cause though. I tried these queries with the data set you posted here: http://www.postgresql.org/message-id/52e4141e.60...@fuzzy.cz. The reason for the slowdown is the extra consistent calls it causes. That's expected - the patch certainly does call consistent in situations where it didn't before, and if the pre-consistent checks are not able to eliminate many tuples, you lose. So, what can we do to mitigate that? Some ideas: 1. Implement the catalog changes from Alexander's patch. That ought to remove the overhead, as you only need to call the consistent function once, not both ways. OTOH, currently we only call the consistent function if there is only one unknown column. If with the catalog changes, we always call the consistent function even if there are more unknown columns, we might end up calling it even more often. 2. Cache the result of the consistent function. 3. Make the consistent function cheaper. (How? Magic?) 4. Use some kind of a heuristic, and stop doing the pre-consistent checks if they're not effective. Not sure what the heuristic would look like. 
I'm fiddling around with the following idea for such a heuristic: 1) Sort entries ascending by their estimated number of TIDs. Assume the number of entries to be n. 2) Sequentially call ternary consistent with the first m entries as GIN_FALSE and the rest n-m as GIN_MAYBE until it returns GIN_FALSE. 3) Decide whether to use fast-scan depending on the number of TIDs in the first m entries and the number of TIDs in the rest (n-m) entries. Something like number_of_TIDs_in_first_m_entries * k < number_of_TIDs_in_rest_n-m_entries where k is some quotient. For sure, it is a rough method, but it should be fast. Without a natural implementation of the ternary consistent function the algorithm will be slightly different: only m values close to n should be checked. I'm going to implement this heuristic against the last version of your patch. -- With best regards, Alexander Korotkov.
Re: [HACKERS] Regression tests failing if not launched on db regression
On Fri, Jan 31, 2014 at 5:48 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Sat, Feb 1, 2014 at 12:42 AM, Robert Haas robertmh...@gmail.com wrote: On Fri, Jan 31, 2014 at 2:28 AM, Michael Paquier michael.paqu...@gmail.com wrote: I took a look at this with a view to committing it but on examination I'm not sure this is the best way to proceed. The proposed text documents that the tests should be run in a database called regression, but the larger documentation chapter of which this section is a part never explains how to run them anywhere else, so it feels a bit like telling a ten-year-old not to burn out the clutch. The bit about not changing enable_* probably belongs, if anywhere, in section 30.2, on test evaluation, rather than here. And what about the attached? I have moved all the content to 30.2, and added two paragraphs: one about the planner flags, the other about the database used. Regards, Well, it doesn't really address my first concern, which was that you talk about running the tests in a database named regression, but that's exactly what make check does and it's unclear how you would do anything else without modifying the source code. It's not the purpose of the documentation to tell you all the ways that you could break things if you patch the tree. I also don't want to document exactly which tests would fail if you did hack things like that; that documentation is likely to become outdated. I think the remaining points you raise are worth mentioning. I'm attaching a patch with my proposed rewording of your changes. I made the section about configuration parameters a bit more generic and adjusted the phrasing to sound more natural in English, and I moved your mention of the other issues around a bit. What do you think of this version? The part about the planning parameter looks good, thanks. The places used to mention the databases used also makes more sense. Thanks for your input. OK, committed and back-patched to 9.3. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] KNN-GiST with recheck
On Tue, Jan 28, 2014 at 9:32 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 01/28/2014 04:12 PM, Alexander Korotkov wrote: On Tue, Jan 28, 2014 at 5:54 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: 4. (as you mentioned in the other thread: ) It's a modularity violation that you peek into the heap tuple from gist. I think the proper way to do this would be to extend the IndexScan executor node to perform the re-shuffling of tuples that come from the index in the wrong order, or perhaps add a new node type for it. Of course that's exactly what your partial sort patch does :-). I haven't looked at that in detail, but I don't think the approach the partial sort patch takes will work here as is. In the KNN-GiST case, the index is returning tuples roughly in the right order, but a tuple that it returns might in reality belong somewhere later in the ordering. In the partial sort patch, the input stream of tuples is divided into non-overlapping groups, so that the tuples within a group are not sorted, but the groups are. I think the partial sort case is a special case of the KNN-GiST case, if you consider the lower bound of each tuple to be the leading keys that you don't need to sort. Yes. But, for instance, btree accesses the heap for unique checking. Is it really so criminal? :-) Well, it is generally considered an ugly hack in b-tree too. I'm not 100% opposed to doing such a hack in GiST, but would very much prefer not to. This is not only a question of a new node or of extending an existing node. We need to teach the planner/executor that an access method can return the value of some expression which is a lower bound of another expression. AFAICS an access method can currently return only the original indexed datums and TIDs. So, I'm afraid that enormous infrastructure changes are required. And I can hardly imagine what they should look like. Yeah, I'm not sure either. Maybe a new field in IndexScanDesc, along with xs_itup. Or as an attribute of xs_itup itself. This shouldn't look like a hack either. 
Otherwise I see no point in it: it's better to have some isolated hack in the access method than a hack in the planner/executor. So I see the following changes as needed to implement this the right way: 1) implement a new relation between operators: operator1 is a lower bound of operator2; 2) extend the am interface to let it return values of operators; 3) implement a new node for knn-sorting. However, this requires a lot of changes in PostgreSQL infrastructure and may turn out not to be general enough either (we don't know until we have another application). -- With best regards, Alexander Korotkov.
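The reordering this thread circles around — an index that returns tuples ordered only by a lower bound on the real sort key, where the exact key is known after a recheck — can be sketched with a priority queue: a candidate may be emitted once its exact key is no larger than the lower bound of everything the index has yet to return. A toy Python illustration (the function and the `exact_key` recheck callback are hypothetical, not PostgreSQL code):

```python
import heapq

def reorder(index_stream, exact_key):
    """index_stream yields (lower_bound, item) in ascending lower_bound
    order; exact_key(item) computes the true sort key (the 'recheck').
    Yields items in exact-key order, assuming lower_bound <= exact key."""
    pending = []        # heap of (exact, seq, item) awaiting emission
    seq = 0             # tie-breaker so items are never compared directly
    stream = iter(index_stream)
    lookahead = next(stream, None)
    while pending or lookahead is not None:
        # Pull from the index while the next lower bound could still
        # beat the best pending exact key.
        while lookahead is not None and (
                not pending or lookahead[0] <= pending[0][0]):
            lb, item = lookahead
            heapq.heappush(pending, (exact_key(item), seq, item))
            seq += 1
            lookahead = next(stream, None)
        # Now either the stream is exhausted, or every future item has a
        # lower bound above the smallest pending exact key: safe to emit.
        exact, _, item = heapq.heappop(pending)
        yield item

# Items whose lower bound underestimates the true key:
items = [(1, 'a'), (2, 'b'), (2.5, 'c')]
exact = {'a': 3.0, 'b': 2.0, 'c': 2.5}
print(list(reorder(items, lambda i: exact[i])))  # -> ['b', 'c', 'a']
```

The partial-sort case discussed above falls out as the special case where the lower bound equals the leading sort keys that need no re-sorting.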
Re: [HACKERS] GIN improvements part2: fast scan
On Mon, Feb 3, 2014 at 7:24 PM, Tomas Vondra t...@fuzzy.cz wrote: On 3 Únor 2014, 15:31, Alexander Korotkov wrote: I found my patch 0005-Ternary-consistent-implementation.patch to be completely wrong. It introduces a ternary consistent function to the opclass, but doesn't use it, because I forgot to include the ginlogic.c change in the patch. So, it shouldn't make any impact on performance. However, the testing results with that patch differ significantly. That makes me very uneasy. Can we now reproduce the exact same results? Do I understand it correctly that the 0005 patch should give exactly the same performance as the 9.4-heikki branch (as it was applied on it, and effectively did no change)? This wasn't exactly what I measured, although the differences were not that significant. Do I understand correctly that it's 9.4-heikki and 9.4-alex-1 here: http://www.fuzzy.cz/tmp/gin/# For some queries the times differ. I wonder why. I can rerun the tests, if that's what you're asking for. I'll improve the test a bit - e.g. I plan to average multiple runs, to filter out random noise (which might be significant for such short queries). The right version of these two patches, combined into one against current head, is attached. I've rerun the tests with it; the results are in /mnt/sas-raid10/gin-testing/queries/9.4-fast-scan-10. Could you rerun the postprocessing, including graph drawing? Yes, I'll do that. However I'll have to rerun the other tests too, because the previous runs were done on a different machine. I'm a bit confused right now. The previous patches (0005 + 0007) were supposed to be applied on top of the 4 from Heikki (0001-0004), right? AFAIK those were not committed yet, so why is this version against HEAD? 
To summarize, I know of these patch sets:

9.4-heikki (old version)
  0001-Optimize-GIN-multi-key-queries.patch
  0002-Further-optimize-the-multi-key-GIN-searches.patch
  0003-Further-optimize-GIN-multi-key-searches.patch
  0004-Add-the-concept-of-a-ternary-consistent-check-and-us.patch
9.4-alex-1 (based on 9.4-heikki)
  0005-Ternary-consistent-implementation.patch
9.4-alex-1 (based on 9.4-alex-1)
  0006-Sort-entries.patch

From these patches I only need to compare 9.4-heikki (old version) and 9.4-alex-1 to resolve my doubts.

9.4-heikki (new version)
  gin-ternary-logic+binary-heap+preconsistent-only-on-new-page.patch
9.4-alex-2 (new version)
  gin-fast-scan.10.patch.gz

Or did I get that wrong? Except that you mentioned 9.4-alex-1 twice. I'm afraid I made some mess in the numbering. Sometimes test cases are not what we expect. For example:

=# explain SELECT id FROM messages WHERE body_tsvector @@ to_tsquery('english','(5alpha1-initdb''d)');
QUERY PLAN
Bitmap Heap Scan on messages (cost=84.00..88.01 rows=1 width=4)
  Recheck Cond: (body_tsvector @@ '''5alpha1-initdb'' ''5alpha1'' ''initdb'' ''d'''::tsquery)
  -> Bitmap Index Scan on messages_body_tsvector_idx (cost=0.00..84.00 rows=1 width=0)
       Index Cond: (body_tsvector @@ '''5alpha1-initdb'' ''5alpha1'' ''initdb'' ''d'''::tsquery)
Planning time: 0.257 ms
(5 rows)

5alpha1-initdb'd is 3 gin entries with different frequencies. Why do you find that strange? The way the query is formed or the way it's evaluated? The query generator certainly is not perfect, so it may produce some strange queries. I just mean that in this case 3 words don't mean 3 gin entries. Also, these patches are not intended to change relevance ordering speed. When the number of results is high, most of the time goes to relevance calculation and sorting. I propose removing the ORDER BY clause from the test cases to see the scan speed more clearly. Sure, I can do that. Or maybe one set of queries with ORDER BY, the other one without it. Good. I've got a dump of postgresql.org search queries from Magnus. 
We can add them to our test case. You mean search queries from the search for the mailing list archives? Sure, we can add that. Yes. I'll transform it into tsquery form and send it to you privately. -- With best regards, Alexander Korotkov.
Re: [HACKERS] bugfix patch for json_array_elements
Andrew Dunstan and...@dunslane.net writes: On 02/02/2014 08:54 PM, Craig Ringer wrote: The attached patch deletes the context after use, bringing performance back into line. It should be backpatched to 9.3. Hmm. I guess I was assuming that the tmp_cxt would be cleaned up at the end of the function, since it's a child of the CurrentMemoryContext. The executor does MemoryContextReset, not MemoryContextResetAndDeleteChildren, on the per-tuple context. That means that child contexts will be reset, not deleted. I seem to recall some discussions about changing that, or even redefining MemoryContextReset to automatically delete child contexts; but it would take a fair amount of research to be sure such a change was safe. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
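Tom's point — that resetting a context recursively resets its children but does not delete the child context objects themselves — is the crux of the leak. A toy Python model of that behavior (purely illustrative; the real API is C: AllocSetContextCreate, MemoryContextReset, MemoryContextDelete):

```python
class MemoryContext:
    """Toy model of a PostgreSQL memory context tree."""
    def __init__(self, parent=None, name=""):
        self.name = name
        self.children = []
        self.chunks = []                 # allocations made in this context
        if parent is not None:
            parent.children.append(self)

    def alloc(self, obj):
        self.chunks.append(obj)

    def reset(self):
        # Like MemoryContextReset: free the chunks and reset children,
        # but keep the child context objects (their headers) alive.
        self.chunks.clear()
        for child in self.children:
            child.reset()

    def reset_and_delete_children(self):
        # Like MemoryContextResetAndDeleteChildren: children go away too.
        self.chunks.clear()
        self.children.clear()

per_tuple = MemoryContext(name="per-tuple")

def leaky_call():
    # Creates a fresh child context per call and never deletes it,
    # mirroring the bug Craig's patch fixes by adding the delete.
    tmp = MemoryContext(parent=per_tuple, name="tmp")
    tmp.alloc("scratch data")

for _ in range(1000):
    leaky_call()
    per_tuple.reset()            # frees the data, keeps the empty contexts

print(len(per_tuple.children))   # 1000 context headers leaked
```

This is why deleting tmp_cxt explicitly (or having the executor use the delete-children variant) matters: each retained empty context still costs header memory and reset work.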
Re: [HACKERS] GIN improvements part2: fast scan
On 3 Únor 2014, 17:08, Alexander Korotkov wrote: On Mon, Feb 3, 2014 at 7:24 PM, Tomas Vondra t...@fuzzy.cz wrote: On 3 Únor 2014, 15:31, Alexander Korotkov wrote: I found my patch 0005-Ternary-consistent-implementation.patch to be completely wrong. It introduces a ternary consistent function to the opclass, but doesn't use it, because I forgot to include the ginlogic.c change in the patch. So, it shouldn't make any impact on performance. However, the testing results with that patch differ significantly. That makes me very uneasy. Can we now reproduce the exact same results? Do I understand it correctly that the 0005 patch should give exactly the same performance as the 9.4-heikki branch (as it was applied on it, and effectively did no change)? This wasn't exactly what I measured, although the differences were not that significant. Do I understand correctly that it's 9.4-heikki and 9.4-alex-1 here: http://www.fuzzy.cz/tmp/gin/# Yes. For some queries the times differ. I wonder why. Not sure. I can rerun the tests, if that's what you're asking for. I'll improve the test a bit - e.g. I plan to average multiple runs, to filter out random noise (which might be significant for such short queries). The right version of these two patches, combined into one against current head, is attached. I've rerun the tests with it; the results are in /mnt/sas-raid10/gin-testing/queries/9.4-fast-scan-10. Could you rerun the postprocessing, including graph drawing? Yes, I'll do that. However I'll have to rerun the other tests too, because the previous runs were done on a different machine. I'm a bit confused right now. The previous patches (0005 + 0007) were supposed to be applied on top of the 4 from Heikki (0001-0004), right? AFAIK those were not committed yet, so why is this version against HEAD? 
To summarize, I know of these patch sets:

9.4-heikki (old version)
  0001-Optimize-GIN-multi-key-queries.patch
  0002-Further-optimize-the-multi-key-GIN-searches.patch
  0003-Further-optimize-GIN-multi-key-searches.patch
  0004-Add-the-concept-of-a-ternary-consistent-check-and-us.patch
9.4-alex-1 (based on 9.4-heikki)
  0005-Ternary-consistent-implementation.patch
9.4-alex-1 (based on 9.4-alex-1)
  0006-Sort-entries.patch

From these patches I only need to compare 9.4-heikki (old version) and 9.4-alex-1 to resolve my doubts. OK, understood.

9.4-heikki (new version)
  gin-ternary-logic+binary-heap+preconsistent-only-on-new-page.patch
9.4-alex-2 (new version)
  gin-fast-scan.10.patch.gz

Or did I get that wrong? Except that you mentioned 9.4-alex-1 twice. I'm afraid I made some mess in the numbering. You're right. It should have been like this:

9.4-alex-1 (based on 9.4-heikki)
  0005-Ternary-consistent-implementation.patch
9.4-alex-2 (based on 9.4-alex-1)
  0006-Sort-entries.patch
9.4-alex-3 (new version, not yet tested)
  gin-fast-scan.10.patch.gz

Sometimes test cases are not what we expect. For example:

=# explain SELECT id FROM messages WHERE body_tsvector @@ to_tsquery('english','(5alpha1-initdb''d)');
QUERY PLAN
Bitmap Heap Scan on messages (cost=84.00..88.01 rows=1 width=4)
  Recheck Cond: (body_tsvector @@ '''5alpha1-initdb'' ''5alpha1'' ''initdb'' ''d'''::tsquery)
  -> Bitmap Index Scan on messages_body_tsvector_idx (cost=0.00..84.00 rows=1 width=0)
       Index Cond: (body_tsvector @@ '''5alpha1-initdb'' ''5alpha1'' ''initdb'' ''d'''::tsquery)
Planning time: 0.257 ms
(5 rows)

5alpha1-initdb'd is 3 gin entries with different frequencies. Why do you find that strange? The way the query is formed or the way it's evaluated? The query generator certainly is not perfect, so it may produce some strange queries. I just mean that in this case 3 words don't mean 3 gin entries. Isn't that expected? I mean, that's what to_tsquery may do, right? 
Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
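The ternary consistent check at the heart of these patches (Heikki's 0004 and the ginlogic.c change Alexander forgot to include) can be illustrated with three-valued logic: entries whose posting lists have not been fetched yet are "maybe", and if the consistent function already returns false with some keys at "maybe", those entries never need to be scanned — which is what makes the fast scan fast. A hedged Python sketch, not the actual GIN code:

```python
# Three-valued logic: 't' = match, 'f' = no match, 'm' = not yet
# fetched / unknown (the "maybe" the ternary consistent check adds).
def ternary_and(vals):
    if 'f' in vals:
        return 'f'
    return 'm' if 'm' in vals else 't'

def ternary_or(vals):
    if 't' in vals:
        return 't'
    return 'm' if 'm' in vals else 'f'

# A query shaped like: rare AND (common1 OR common2)
def consistent(check):
    return ternary_and([check['rare'],
                        ternary_or([check['c1'], check['c2']])])

# With only the rare entry inspected, the result is already FALSE,
# so the frequent entries' posting lists need not be scanned at all:
print(consistent({'rare': 'f', 'c1': 'm', 'c2': 'm'}))  # -> f
# If the rare entry matches, the common entries become relevant:
print(consistent({'rare': 't', 'c1': 'm', 'c2': 'm'}))  # -> m
```

A binary consistent function would have to guess concrete values for the unknown entries; the ternary form lets the scan prove an item cannot match before fetching the expensive frequent entries.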
Re: [HACKERS] bgworker crashed or not?
On Mon, Feb 3, 2014 at 10:20 AM, Tom Lane t...@sss.pgh.pa.us wrote: So

exit(0) -> done, permanently
exit(1) -> done until restart interval
exit(other) -> crash

and there's no way to obtain the restart immediately behavior? That's what I was thinking about, yes. Of course, when the restart interval is 0, done until restart interval is the same as restart immediately, so for anyone who wants to *always* restart immediately there is no problem. Where you will run into trouble is if you sometimes want to wait for the restart interval and other times want to restart immediately. But I'm not sure that's a real use case. If it is, I suggest that we assign it some other, more obscure exit code and reserve 0 and 1 for what I believe will probably be the common cases. It would be potentially more useful and more general to have a function BackgroundWorkerSetMyRestartInterval() or similar. That way, a background worker could choose to get restarted after whatever amount of time it likes in a particular instance, rather than only 0 or the interval configured at registration time. But I'm not that excited about implementing that, and certainly not for 9.4. It's not clear that there's a real need, and there are thorny questions like should a postmaster crash and restart cycle cause an immediate restart? and what if some other process wants to poke that background worker and cause it to restart sooner?. There are lots of interesting and potentially useful behaviors here, but I'm wary of letting the engineering get out ahead of the use cases. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] slow startup due to LWLockAssign() spinlock
Andres Freund and...@2ndquadrant.com writes: On larger, multi-socket, machines, startup takes a fair bit of time. As I was profiling anyway I looked into it and noticed that just about all of it is spent in LWLockAssign() called by InitBufferPool(). Starting with shared_buffers=48GB on the server Nate Boley provided, takes about 12 seconds. Nearly all of it spent taking the ShmemLock spinlock. Simply modifying LWLockAssign() to not take the spinlock when !IsUnderPostmaster speeds it up to 2 seconds. While certainly not making LWLockAssign() prettier it seems enough of a speedup to be worthwile nonetheless. Hm. This patch only works if the postmaster itself never assigns any LWLocks except during startup. That's *probably* all right, but it seems a bit scary. Is there any cheap way to make the logic actually be what your comment claims, namely Interlocking is not necessary during postmaster startup? I guess we could invent a ShmemInitInProgress global flag ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] bgworker crashed or not?
Robert Haas robertmh...@gmail.com writes: On Mon, Feb 3, 2014 at 10:20 AM, Tom Lane t...@sss.pgh.pa.us wrote: So

exit(0) -> done, permanently
exit(1) -> done until restart interval
exit(other) -> crash

and there's no way to obtain the restart immediately behavior? That's what I was thinking about, yes. Of course, when the restart interval is 0, done until restart interval is the same as restart immediately, so for anyone who wants to *always* restart immediately there is no problem. Where you will run into trouble is if you sometimes want to wait for the restart interval and other times want to restart immediately. But I'm not sure that's a real use case. If it is, I suggest that we assign it some other, more obscure exit code and reserve 0 and 1 for what I believe will probably be the common cases. Agreed, but after further reflection it seems like if you've declared a restart interval, then done until restart interval is probably the common case. So how about

exit(0) -> done until restart interval, or permanently if there is none
exit(1) -> done permanently, even if a restart interval was declared
exit(other) -> crash

I don't offhand see a point in an exit and restart immediately case. Why exit at all, if you could just keep running? If you *can't* just keep running, it probably means you know you've bollixed something, so that the crash case is probably what to do anyway. It would be potentially more useful and more general to have a function BackgroundWorkerSetMyRestartInterval() or similar. That might be a good idea too, but I think it's orthogonal to what the exit codes mean. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] bgworker crashed or not?
On Mon, Feb 3, 2014 at 11:29 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Mon, Feb 3, 2014 at 10:20 AM, Tom Lane t...@sss.pgh.pa.us wrote: So

exit(0) -> done, permanently
exit(1) -> done until restart interval
exit(other) -> crash

and there's no way to obtain the restart immediately behavior? That's what I was thinking about, yes. Of course, when the restart interval is 0, done until restart interval is the same as restart immediately, so for anyone who wants to *always* restart immediately there is no problem. Where you will run into trouble is if you sometimes want to wait for the restart interval and other times want to restart immediately. But I'm not sure that's a real use case. If it is, I suggest that we assign it some other, more obscure exit code and reserve 0 and 1 for what I believe will probably be the common cases. Agreed, but after further reflection it seems like if you've declared a restart interval, then done until restart interval is probably the common case. So how about

exit(0) -> done until restart interval, or permanently if there is none
exit(1) -> done permanently, even if a restart interval was declared
exit(other) -> crash

I think what I proposed is better for two reasons: 1. It doesn't change the meaning of exit(1) vs. 9.3. All background worker code I've seen or heard about (which is admittedly not all there is) does exit(1) because the current exit(0) behavior doesn't seem to be what anyone wants. So I don't think it matters much whether we redefine exit(0), but redefining exit(1) will require version-dependent changes to the background workers in contrib and, most likely, every other background worker anyone has written. 2. If you want a worker that starts up, does something, and doesn't need to be further restarted when that succeeds, then under your definition you return 0 when you fail and 1 when you succeed. Under my definition you do the opposite, which I think is more natural. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
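The two competing mappings can be written down as a small decision function — an illustration of the proposals in this thread, not actual postmaster code, and the behavior names are informal:

```python
def robert_policy(exit_code, restart_interval):
    # Robert's proposal: 0 = done permanently, 1 = done until the
    # restart interval elapses (same as exit(1) in 9.3), other = crash.
    if exit_code == 0:
        return "done-permanently"
    if exit_code == 1:
        # With a zero interval, "wait for the interval" degenerates
        # to an immediate restart, as noted above.
        return ("restart-immediately" if restart_interval == 0
                else "restart-after-interval")
    return "crash-and-restart"

def tom_policy(exit_code, restart_interval):
    # Tom's counter-proposal: 0 = restart per interval (or done if
    # none was declared), 1 = done permanently, other = crash.
    if exit_code == 0:
        return ("done-permanently" if restart_interval is None
                else "restart-after-interval")
    if exit_code == 1:
        return "done-permanently"
    return "crash-and-restart"

print(robert_policy(1, 0))    # -> restart-immediately
print(tom_policy(0, None))    # -> done-permanently
```

Writing it out this way makes Robert's point 2 visible: for a run-once worker, his mapping uses exit(0) for success, while Tom's uses exit(1).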
Re: [HACKERS] Backup throttling
On 01/31/2014 06:26 AM, Fujii Masao wrote: Is there a good place to define the constant, so that both backend and client can use it? I'd say 'include/common' but no existing file seems to be appropriate. I'm not sure if it's worth adding a new file there. If there is no good place to define them, it's okay to define them also on the client side for now. +termBASE_BACKUP [literalLABEL/literal replaceable'label'/replaceable] [literalPROGRESS/literal] [literalFAST/literal] [literalWAL/literal] [literalNOWAIT/literal] [literalMAX_RATE/literal]/term It's better to add something like replaceable'rate'/replaceable just after literalMAX_RATE/literal. + para + literalMAX_RATE/literal does not affect WAL streaming. + /para I don't think that this paragraph is required here because BASE_BACKUP is basically independent from WAL streaming. Why did you choose bytes per second as a valid rate which we can specify? Since the minimum rate is 32kB, isn't it better to use KB per second for that? If we do that, we can easily increase the maximum rate from 1GB to a very large number in the future if required. The attached version addresses all the comments above. +wait_result = WaitLatch(MyWalSnd->latch, WL_LATCH_SET | WL_TIMEOUT +| WL_POSTMASTER_DEATH, (long) (sleep / 1000)); If WL_POSTMASTER_DEATH is triggered, we should exit immediately like other processes do? This is not a problem with this patch; it also exists in current master. But ISTM it's better to solve that together. Thought? Once we're careful about not missing signals, I think PM death should be noticed too. The backup functionality itself would probably manage to finish without the postmaster; however, it's executed under the walsender process. The question is where the !PostmasterIsAlive() check should be added. I think it should go into the main loop of perform_base_backup(), but that's probably not in the scope of this patch. 
Do you think that my patch should only add a comment like Don't forget to set WL_POSTMASTER_DEATH flag when making basebackup.c sensitive to PM death? // Antonin Houska (Tony) diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml index 832524e..704b653 100644 --- a/doc/src/sgml/protocol.sgml +++ b/doc/src/sgml/protocol.sgml @@ -1772,7 +1772,7 @@ The commands accepted in walsender mode are: /varlistentry varlistentry -termBASE_BACKUP [literalLABEL/literal replaceable'label'/replaceable] [literalPROGRESS/literal] [literalFAST/literal] [literalWAL/literal] [literalNOWAIT/literal]/term +termBASE_BACKUP [literalLABEL/literal replaceable'label'/replaceable] [literalPROGRESS/literal] [literalFAST/literal] [literalWAL/literal] [literalNOWAIT/literal] [literalMAX_RATE/literal replaceable'rate'/replaceable]/term listitem para Instructs the server to start streaming a base backup. @@ -1840,7 +1840,20 @@ The commands accepted in walsender mode are: the waiting and the warning, leaving the client responsible for ensuring the required log is available. /para - /listitem +/listitem + /varlistentry + varlistentry +termliteralMAX_RATE/literal replaceablerate//term +listitem + para + Limit (throttle) the maximum amount of data transferred from server + to client per unit of time. The expected unit is kilobytes per second. + If this option is specified, the value must either be equal to zero + or it must fall within the range from 32 kB through 1 GB (inclusive). + If zero is passed or the option is not specified, no restriction is + imposed on the transfer. 
+ /para +/listitem /varlistentry /variablelist /para diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml index c379df5..f8866db 100644 --- a/doc/src/sgml/ref/pg_basebackup.sgml +++ b/doc/src/sgml/ref/pg_basebackup.sgml @@ -189,6 +189,27 @@ PostgreSQL documentation /varlistentry varlistentry + termoption-r replaceable class=parameterrate/replaceable/option/term + termoption--max-rate=replaceable class=parameterrate/replaceable/option/term + listitem + para +The maximum amount of data transferred from server per second. +The purpose is to limit the impact of applicationpg_basebackup/application +on the running server. + /para + para +This option always affects transfer of the data directory. Transfer of +WAL files is only affected if the collection method is literalfetch/literal. + /para + para +Valid values are between literal32 kB/literal and literal1024 MB/literal. +Suffixes literalk/literal (kilobytes) and literalM/literal +(megabytes) are accepted. + /para + /listitem + /varlistentry + + varlistentry
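The throttling mechanism in the patch boils down to tracking how many bytes have gone out and sleeping (via the WaitLatch call quoted above) whenever the transfer is running ahead of what the configured rate allows. A simplified Python model of that loop (illustrative only; the real code waits on the process latch and works in microseconds):

```python
import time

def send_throttled(chunks, max_rate_bytes_per_sec):
    """'Send' chunks, sleeping so the average rate stays under the cap."""
    start = time.monotonic()
    sent = 0
    for chunk in chunks:
        sent += len(chunk)
        # Earliest time at which 'sent' bytes may have been transferred:
        allowed_at = start + sent / max_rate_bytes_per_sec
        delay = allowed_at - time.monotonic()
        if delay > 0:
            # The real code sleeps in WaitLatch(..., WL_TIMEOUT | ...)
            # so it still reacts to latch sets and postmaster death.
            time.sleep(delay)
    return sent

data = [b"x" * 1024] * 8                # 8 kB in 1 kB chunks
print(send_throttled(data, 32 * 1024))  # 8192 bytes in roughly 0.25 s
```

Sleeping toward an absolute "allowed_at" deadline, rather than a fixed per-chunk pause, keeps the long-run average at the cap even when individual sleeps are imprecise.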
Re: [HACKERS] slow startup due to LWLockAssign() spinlock
On 2014-02-03 11:22:45 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On larger, multi-socket, machines, startup takes a fair bit of time. As I was profiling anyway I looked into it and noticed that just about all of it is spent in LWLockAssign() called by InitBufferPool(). Starting with shared_buffers=48GB on the server Nate Boley provided, takes about 12 seconds. Nearly all of it spent taking the ShmemLock spinlock. Simply modifying LWLockAssign() to not take the spinlock when !IsUnderPostmaster speeds it up to 2 seconds. While certainly not making LWLockAssign() prettier it seems enough of a speedup to be worthwile nonetheless. Hm. This patch only works if the postmaster itself never assigns any LWLocks except during startup. That's *probably* all right, but it seems a bit scary. Is there any cheap way to make the logic actually be what your comment claims, namely Interlocking is not necessary during postmaster startup? I guess we could invent a ShmemInitInProgress global flag ... I'd be fine with inventing such a flag, I couldn't find one and decided that this alone didn't merit it, since it seems to be really unlikely that we will start to allocate such resources after startup in the postmaster. Unless we're talking about single user mode obviously, but the spinlock isn't necessary there anyway. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
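The optimization being debated is, at bottom, conditional locking: the shared counter that LWLockAssign() hands out from only needs spinlock protection once child processes exist. A schematic Python rendering (conceptual only; IsUnderPostmaster and ShmemLock are PostgreSQL names used loosely, and a threading.Lock stands in for the spinlock):

```python
import threading

shmem_lock = threading.Lock()      # stands in for the ShmemLock spinlock
counter = {"next": 0}              # stands in for the shared allocation counter

def lwlock_assign(is_under_postmaster):
    # During single-process startup nothing else can touch shared
    # memory, so the spinlock acquisition can be skipped entirely --
    # the change that cut startup from ~12 s to ~2 s in the report.
    if not is_under_postmaster:
        n = counter["next"]
        counter["next"] = n + 1
        return n
    with shmem_lock:               # stands in for SpinLockAcquire/Release
        n = counter["next"]
        counter["next"] = n + 1
        return n

# Startup path: many assignments, no lock traffic.
startup_ids = [lwlock_assign(False) for _ in range(100_000)]

# Once workers exist, the lock keeps concurrent assignment safe.
threads = [threading.Thread(
    target=lambda: [lwlock_assign(True) for _ in range(1000)])
    for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(counter["next"])  # 104000: no ids lost or duplicated
```

Tom's concern maps directly onto the `is_under_postmaster` test: the fast path is only safe if the "single process" assumption genuinely holds at every call site, which is what an explicit ShmemInitInProgress flag would make checkable.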
Re: [HACKERS] bugfix patch for json_array_elements
On 02/03/2014 11:12 AM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: On 02/02/2014 08:54 PM, Craig Ringer wrote: The attached patch deletes the context after use, bringing performance back into line. It should be backpatched to 9.3. Hmm. I guess I was assuming that the tmp_cxt would be cleaned up at the end of the function since it's a child of the CurrentMemoryContext. The executor does MemoryContextReset, not MemoryContextResetAndDeleteChildren, on the per-tuple context. That means that child contexts will be reset, not deleted. I seem to recall some discussions about changing that, or even redefining MemoryContextReset to automatically delete child contexts; but it would take a fair amount of research to be sure such a change was safe. Good to know. Is it worth a note in src/backend/utils/mmgr/README so people are warned against making the same mistake I did? Both of these are set-returning functions operating in materialize mode - not sure if that makes any difference. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] bgworker crashed or not?
Robert Haas robertmh...@gmail.com writes: On Mon, Feb 3, 2014 at 11:29 AM, Tom Lane t...@sss.pgh.pa.us wrote: Agreed, but after further reflection it seems like if you've declared a restart interval, then done until restart interval is probably the common case. So how about ... I think what I proposed is better for two reasons: 1. It doesn't change the meaning of exit(1) vs. 9.3. All background worker code I've seen or heard about (which is admittedly not all there is) does exit(1) because the current exit(0) behavior doesn't seem to be what anyone wants. Hm. If that's actually the case, then I agree that preserving the current behavior of exit(1) is useful. I'd been assuming we were breaking things anyway. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PERFORM] encouraging index-only scans
On Fri, Jan 31, 2014 at 10:22 PM, Bruce Momjian br...@momjian.us wrote: On Thu, Sep 19, 2013 at 02:39:43PM -0400, Robert Haas wrote: Right now, whether or not to autovacuum is the result of a two-pronged test. The first prong is based on the number of updates and deletes relative to table size; that triggers a regular autovacuum. The second prong is based on age(relfrozenxid) and triggers a non-page-skipping vacuum (colloquially, an anti-wraparound vacuum). The typical case in which this doesn't work out well is when the table has a lot of inserts but few or no updates and deletes. So I propose that we change the first prong to count inserts as well as updates and deletes when deciding whether it needs to vacuum the table. We already use that calculation to decide whether to auto-analyze, so it wouldn't be very novel. We know that the work of marking pages all-visible will need to be done at some point, and doing it sooner will result in doing it in smaller batches, which seems generally good. However, I do have one concern: it might lead to excessive index-vacuuming. Right now, we skip the index vac step only if ZERO dead tuples are found during the heap scan. Even one dead tuple (or line pointer) will cause an index vac cycle, which may easily be excessive. So I further propose that we introduce a threshold for index-vac, so that we only do an index vac cycle if the number of dead tuples exceeds, say, 0.1% of the table size. Thoughts? Let the hurling of rotten tomatoes begin. Robert, where are we on this? Should I post a patch? I started working on this at one point but didn't finish the implementation, let alone the no-doubt-onerous performance testing that will be needed to validate whatever we come up with. It would be really easy to cause serious regressions with ill-considered changes in this area, and I don't think many people here have the bandwidth for a detailed study of all the different workloads that might be affected here right this very minute. 
More generally, you're sending all these pings three weeks after the deadline for CF4. I don't think that's a good time to encourage people to *start* revising old patches, or writing new ones. I've also had some further thoughts about the right way to drive vacuum scheduling. I think what we need to do is tightly couple the rate at which we're willing to do vacuuming to the rate at which we're incurring vacuum debt. That is, if we're creating 100kB/s of pages needing vacuum, we vacuum at 2-3MB/s (with default settings). If we're creating 10MB/s of pages needing vacuum, we *still* vacuum at 2-3MB/s. Not shockingly, vacuum gets behind, the database bloats, and everything goes to heck. The rate of vacuuming needs to be tied somehow to the rate at which we're creating stuff that needs to be vacuumed. Right now we don't even have a way to measure that, let alone auto-regulate the aggressiveness of autovacuum on that basis. Similarly, for marking of pages as all-visible, we currently make the same decision whether the relation is getting index-scanned (in which case the failure to mark those pages all-visible may be suppressing the use of index scans or making them less effective) or whether it's not being accessed at all (in which case vacuuming it won't help anything, and might hurt by pushing other pages out of cache). Again, if we had better statistics, we could measure this - counting heap fetches for actual index-only scans plus heap fetches for index scans that might have been planned index-only scans but for the relation having too few all-visible pages doesn't sound like an impossible metric to gather. And if we had that, we could use it to trigger vacuuming, instead of guessing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
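Robert's two proposals can be expressed against the familiar autovacuum trigger formula. A sketch, assuming the documented defaults (autovacuum_vacuum_threshold = 50, autovacuum_vacuum_scale_factor = 0.2); the insert-counting flag and the 0.1% index-vacuum threshold are the proposals in this thread, not current behavior:

```python
def needs_autovacuum(reltuples, dead, inserts,
                     base_threshold=50, scale_factor=0.2,
                     count_inserts=False):
    # Current rule: vacuum when dead tuples exceed
    # base_threshold + scale_factor * reltuples.
    # Proposed change: count inserts in the activity total too.
    activity = dead + (inserts if count_inserts else 0)
    return activity > base_threshold + scale_factor * reltuples

def needs_index_vacuum(reltuples, dead, min_fraction=0.001):
    # Proposed: skip the index-vacuum cycle unless dead line pointers
    # exceed ~0.1% of the table (today any nonzero count triggers it).
    return dead > reltuples * min_fraction

# Insert-only table: never triggers the first prong under the current
# rule, but does under the proposed one.
print(needs_autovacuum(1_000_000, dead=0, inserts=500_000))    # False
print(needs_autovacuum(1_000_000, dead=0, inserts=500_000,
                       count_inserts=True))                    # True
```

The insert-only case is exactly the workload Robert describes: pages never get marked all-visible because nothing ever trips the update/delete-based threshold.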
Re: [HACKERS] GIN improvements part2: fast scan
On Mon, Feb 3, 2014 at 8:19 PM, Tomas Vondra t...@fuzzy.cz wrote: Sometimes test cases are not what we expect. For example:

=# explain SELECT id FROM messages WHERE body_tsvector @@ to_tsquery('english','(5alpha1-initdb''d)');
QUERY PLAN
Bitmap Heap Scan on messages (cost=84.00..88.01 rows=1 width=4)
  Recheck Cond: (body_tsvector @@ '''5alpha1-initdb'' ''5alpha1'' ''initdb'' ''d'''::tsquery)
  -> Bitmap Index Scan on messages_body_tsvector_idx (cost=0.00..84.00 rows=1 width=0)
       Index Cond: (body_tsvector @@ '''5alpha1-initdb'' ''5alpha1'' ''initdb'' ''d'''::tsquery)
Planning time: 0.257 ms
(5 rows)

5alpha1-initdb'd is 3 gin entries with different frequencies. Why do you find that strange? The way the query is formed or the way it's evaluated? The query generator certainly is not perfect, so it may produce some strange queries. I just mean that in this case 3 words don't mean 3 gin entries. Isn't that expected? I mean, that's what to_tsquery may do, right? Everything is absolutely correct. :-) It just may not be what you expect if you don't get into the details. -- With best regards, Alexander Korotkov.
Re: [HACKERS] GIN improvements part2: fast scan
On 3 February 2014, 19:18, Alexander Korotkov wrote: On Mon, Feb 3, 2014 at 8:19 PM, Tomas Vondra t...@fuzzy.cz wrote: Sometimes test cases are not what we expect. For example: =# explain SELECT id FROM messages WHERE body_tsvector @@ to_tsquery('english','(5alpha1-initdb''d)'); QUERY PLAN Bitmap Heap Scan on messages (cost=84.00..88.01 rows=1 width=4) Recheck Cond: (body_tsvector @@ '''5alpha1-initdb'' ''5alpha1'' ''initdb'' ''d'''::tsquery) - Bitmap Index Scan on messages_body_tsvector_idx (cost=0.00..84.00 rows=1 width=0) Index Cond: (body_tsvector @@ '''5alpha1-initdb'' ''5alpha1'' ''initdb'' ''d'''::tsquery) Planning time: 0.257 ms (5 rows) 5alpha1-initdb'd is 3 gin entries with different frequencies. Why do you find that strange? The way the query is formed or the way it's evaluated? The query generator certainly is not perfect, so it may produce some strange queries. I just mean that in this case 3 words doesn't mean 3 gin entries. Isn't that expected? I mean, that's what to_tsquery may do, right? Everything is absolutely correct. :-) It just may not be what you expect if you don't look into the details. Well, that's not how I designed the benchmark. I haven't based the benchmark on GIN entries, but on 'natural' words, to simulate real queries. I understand using GIN terms might get more consistent results (e.g. 3 GIN terms with given frequency) than the current approach. However this was partially a goal, to cover a wider range of cases. Also, that's why the benchmark works with relative speedup - comparing the query duration with and without the patch. Tomas
Re: [HACKERS] [PATCH] pg_sleep(interval)
On Sun, Feb 2, 2014 at 4:50 AM, Julien Rouhaud rjuju...@gmail.com wrote: It seems like pg_sleep_until() has issues if used within a transaction, as it uses now() and not clock_timestamp(). Please find attached a patch that solves this issue. For consistency reasons, I also modified pg_sleep_for() to also use clock_timestamp. Good catch! Committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
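The arithmetic behind the bug is worth spelling out: inside a transaction, now() is frozen at transaction start, so a sleep duration computed from it is off by the transaction's age. A toy model (invented names, not the backend's actual code):

```c
#include <assert.h>

/* How long pg_sleep_until() should sleep: the distance from the current
 * time to the target, clamped at zero.  Feeding this the transaction-start
 * timestamp (now()) instead of the real clock (clock_timestamp()) inflates
 * the result by however long the transaction has already been running,
 * which is the bug the committed patch fixes. */
static double remaining_sleep(double target, double current)
{
    double d = target - current;

    return d > 0.0 ? d : 0.0;
}
```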
Re: [HACKERS] plpgsql.warn_shadow
Hello, I am not happy with warnings_as_errors; what about stop_on_warning instead? Second question: should these errors be catchable or uncatchable? When I worked on a large project where I had to use an error handler with EXCEPTION WHEN OTHERS, I found it very strange and not useful that all syntax errors were caught by this handler. Any debugging was terribly difficult, and I had to write plpgsql_lint as a solution. Regards Pavel 2014-02-03 Marko Tiikkaja ma...@joh.to: Hi everyone, Here's a new version of the patch. Added new tests and docs and changed the GUCs per discussion. plpgsql.warnings_as_errors only affects compilation at CREATE FUNCTION time: =# set plpgsql.warnings to 'all'; SET =#* set plpgsql.warnings_as_errors to true; SET =#* select foof(1); -- compiled since it's the first call in this session WARNING: variable f1 shadows a previously defined variable LINE 2: declare f1 int; ^ foof -- (1 row) =#* create or replace function foof(f1 int) returns void as $$ declare f1 int; begin end $$ language plpgsql; ERROR: variable f1 shadows a previously defined variable LINE 3: declare f1 int; Currently, plpgsql_warnings is a boolean since there's only one warning we implement. The idea is to make it a bit field of some kind in the future when we add more warnings. Starting that work for 9.4 seemed like overkill, though. I tried to keep things simple. Regards, Marko Tiikkaja
Re: [HACKERS] plpgsql.warn_shadow
On 2014-02-03 9:17 PM, Pavel Stehule wrote: I am not happy from warnings_as_error what about stop_on_warning instead? abort_compilation_on_warning? I think if we're not going to make it more clear that warnings are only promoted to errors at CREATE FUNCTION time, we can't do much better than warnings_as_errors. second question: should be these errors catchable or uncatchable? When I work on large project, where I had to use some error handler of EXCEPTION WHEN OTHERS I found very strange and not useful so all syntax errors was catched by this handler. Any debugging was terribly difficult and I had to write plpgsql_lint as solution. Is this really an issue considering these errors can only happen when someone runs CREATE FUNCTION? Regards, Marko Tiikkaja -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] plpgsql.warn_shadow
On 3.2.2014 21:48, Marko Tiikkaja ma...@joh.to wrote: On 2014-02-03 9:17 PM, Pavel Stehule wrote: I am not happy from warnings_as_error what about stop_on_warning instead? abort_compilation_on_warning? I think if we're not going to make it more clear that warnings are only promoted to errors at CREATE FUNCTION time, we can't do much better than warnings_as_errors. second question: should be these errors catchable or uncatchable? When I work on large project, where I had to use some error handler of EXCEPTION WHEN OTHERS I found very strange and not useful so all syntax errors was catched by this handler. Any debugging was terribly difficult and I had to write plpgsql_lint as solution. Is this really an issue considering these errors can only happen when someone runs CREATE FUNCTION? Could you look, please, at what terminology is used in gcc, clang, Perl or Python? I agree with the idea, but the proposed names don't mean anything to me. Regards Pavel Regards, Marko Tiikkaja
[HACKERS] Re: Misaligned BufferDescriptors causing major performance problems on AMD
On Sun, Feb 2, 2014 at 7:13 AM, Andres Freund and...@2ndquadrant.com wrote: Just as reference, we're talking about a performance degradation from 475963.613865 tps to 197744.913556 in a pgbench -S -cj64 just by setting max_connections to 90, from 91... That's pretty terrible. So, I looked into this, and I am fairly certain it's because of the (mis-)alignment of the buffer descriptors. With certain max_connections settings InitBufferPool() happens to get 64byte aligned addresses, with others not. I checked the alignment with gdb to confirm that. I find your diagnosis to be quite plausible. A quick hack (attached) making BufferDescriptor 64byte aligned indeed restored performance across all max_connections settings. It's not surprising that a misaligned buffer descriptor causes problems - there'll be plenty of false sharing of the spinlocks otherwise. Curious that the Intel machine isn't hurt much by this. I think that is explained here: http://www.agner.org/optimize/blog/read.php?i=142v=t With Sandy Bridge, Misaligned memory operands [are] handled efficiently. Now all this hinges on the fact that by a mere accident BufferDescriptors are 64byte in size: Are they 64 bytes in size on REL9_*_STABLE? How about on win64? I think we're reasonably disciplined here already, but long is 32-bits in length even on win64. Looks like it would probably be okay, but as you say, it doesn't seem like something to leave to chance. We could polish up the attached patch and apply it to all the branches, the costs of memory are minimal. But I wonder if we shouldn't instead make ShmemInitStruct() always return cacheline aligned addresses. That will require some fiddling, but it might be a good idea nonetheless? What fiddling are you thinking of? I think we should also consider some more reliable measures to have BufferDescriptors cacheline sized, rather than relying on the happy accident. Debugging alignment issues isn't fun, too much of a guessing game... +1.
Maybe make code that isn't appropriately aligned fail to compile? -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
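The "union trickery" alluded to in this thread would look roughly like the sketch below: pad each descriptor to a full cache line so that, given a 64-byte-aligned array base, no two descriptors ever share a line. The field names here are invented stand-ins, not the real sbufdesc definition:

```c
#include <assert.h>
#include <stddef.h>

#define CACHELINE_SIZE 64

/* Toy stand-in for a buffer descriptor; field names are invented. */
typedef struct ToyBufferDesc
{
    int  buf_id;
    int  flags;
    int  refcount;
    char spinlock;              /* stand-in for slock_t */
} ToyBufferDesc;

/* Pad each element to a full cache line.  Combined with a 64-byte-aligned
 * base address for the array, adjacent descriptors (and hence their
 * spinlocks) never share a cache line, removing the false sharing
 * discussed above. */
typedef union ToyBufferDescPadded
{
    ToyBufferDesc desc;
    char          pad[CACHELINE_SIZE];
} ToyBufferDescPadded;
```

The size guarantee comes from the union, so it holds regardless of how the struct's own fields change, which is exactly the "not left to chance" property being asked for.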
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On Sun, Feb 2, 2014 at 6:00 AM, Andres Freund and...@2ndquadrant.com wrote: Some background: The setups that triggered me into working on the patchset didn't really have a pgbench like workload, the individual queries were/are more complicated even though it's still a high throughput OLTP workload. And the contention was *much* higher than what I can reproduce with pgbench -S, there was often nearly all time spent in the lwlock's spinlock, and it was primarily the buffer mapping lwlocks, being locked in shared mode. The difference is that instead of locking very few buffers per query like pgbench does, they touched much more. Perhaps I should try to argue for this extension to pgbench again: http://www.postgresql.org/message-id/CAMkU=1w0K3RNhtPuLF8WQoVi6gxgG6mcnpC=-ivjwkjkydp...@mail.gmail.com I think it would do a good job of exercising what you want, provided you set the scale so that all data fit in RAM but not in shared_buffers. Or maybe you want it to fit in shared_buffers, since the buffer mapping lock was contended in shared mode--that suggests the problem is finding the buffer that already has the page, not making a buffer to have the page. Cheers, Jeff
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On Sun, Feb 2, 2014 at 6:00 AM, Andres Freund and...@2ndquadrant.com wrote: The changed algorithm for lwlock imo is an *algorithmic* improvement, not one for a particular architecture. The advantage being that locking a lwlock which is primarily taken in shared mode will never need to wait or loop. I agree. My point was only that the messaging ought to be that this is something that those with multi-socket Intel systems should take note of. Yes, that branch is used by some of them. But to make that clear to all that are still reading, I have *first* presented the patch findings to -hackers and *then* backported it, and I have referenced the existence of the patch for 9.2 on list before. This isn't some kind of secret sauce deal... No, of course not. I certainly didn't mean to imply that. My point was only that anyone that is affected to the same degree as the party with the 4 socket server might be left with a very poor impression of Postgres if we failed to fix the problem. It clearly rises to the level of a bugfix. That might be something to do later, as it *really* can hurt in practice. We had one server go from load 240 to 11... Well, we have to commit something on master first. But it should be a priority to avoid having this hurt users further, since the problems are highly predictable for certain types of servers. But I think we should first focus on getting the patch ready for master, then we can see where it's going. At the very least I'd like to split off the part modifying the current spinlocks to use the atomics, that seems far too invasive. Agreed. I unfortunately can't tell you that much more, not because it's private, but because it mostly was diagnosed by remote hand debugging, limiting insights considerably. Of course. -- Peter Geoghegan
Re: [HACKERS] [COMMITTERS] pgsql: Include planning time in EXPLAIN ANALYZE output.
On Mon, Feb 3, 2014 at 4:15 AM, Robert Haas robertmh...@gmail.com wrote: I'm not really feeling a compelling need to change that. We've been displaying total runtime - described exactly that way - for many releases and it surely is confusing to the novice that the time reported can be much less than the time reported by psql's \timing option, usually because of planning time. I think that has a lot more to do with network roundtrip time and protocol/serialization overhead. -- Peter Geoghegan
Re: [HACKERS] Re: Misaligned BufferDescriptors causing major performance problems on AMD
On 2014-02-03 15:17:13 -0800, Peter Geoghegan wrote: On Sun, Feb 2, 2014 at 7:13 AM, Andres Freund and...@2ndquadrant.com wrote: Just as reference, we're talking about a performance degradation from 475963.613865 tps to 197744.913556 in a pgbench -S -cj64 just by setting max_connections to 90, from 91... That's pretty terrible. Yea, I was scared myself. A quick hack (attached) making BufferDescriptor 64byte aligned indeed restored performance across all max_connections settings. It's not surprising that a misaligned buffer descriptor causes problems - there'll be plenty of false sharing of the spinlocks otherwise. Curious that the the intel machine isn't hurt much by this. I think that is explained here: http://www.agner.org/optimize/blog/read.php?i=142v=t With Sandy Bridge, Misaligned memory operands [are] handled efficiently. No, I don't think so. Those improvements afair refer to unaligned accesses as in accessing a 4 byte variable at address % 4 != 0. Now all this hinges on the fact that by a mere accident BufferDescriptors are 64byte in size: Are they 64 bytes in size on REL9_*_STABLE? How about on win64? I think we're reasonably disciplined here already, but long is 32-bits in length even on win64. Looks like it would probably be okay, but as you say, it doesn't seem like something to leave to chance. I haven't checked any branches yet, but I don't remember any recent changes to sbufdesc. And I think we're lucky enough that all the used types are the same width across common 64bit OSs. But I absolutely agree we shouldn't leave this to chance. We could polish up the attached patch and apply it to all the branches, the costs of memory are minimal. But I wonder if we shouldn't instead make ShmemInitStruct() always return cacheline aligned addresses. That will require some fiddling, but it might be a good idea nonetheless? What fiddling are you thinking of? 
Basically always doing a TYPEALIGN(CACHELINE_SIZE, addr) before returning from ShmemAlloc() (and thereby ShmemInitStruct). After I'd written the email I came across an interesting bit of code: void * ShmemAlloc(Size size) { ... /* extra alignment for large requests, since they are probably buffers */ if (size >= BLCKSZ) newStart = BUFFERALIGN(newStart); /* * Preferred alignment for disk I/O buffers. On some CPUs, copies between * user space and kernel space are significantly faster if the user buffer * is aligned on a larger-than-MAXALIGN boundary. Ideally this should be * a platform-dependent value, but for now we just hard-wire it. */ #define ALIGNOF_BUFFER 32 #define BUFFERALIGN(LEN) TYPEALIGN(ALIGNOF_BUFFER, (LEN)) So, we're already trying to make large allocations (which we'll be using here) try to fit to a 32 byte alignment. Which unfortunately isn't enough here, but it explains why max_connections has to be changed by exactly 1 to achieve adequate performance... So, the absolutely easiest thing would be to just change ALIGNOF_BUFFER to 64. The current value was accurate in 2003 (common cacheline size back then), but it really isn't anymore today. Anything relevant is 64byte. But I really think we should also remove the BLCKSZ limit. There's relatively few granular/dynamic usages of ShmemAlloc(), basically just shared memory hashes. And I'd be rather unsurprised if aligning all allocations for hashes resulted in a noticeable speedup. We have frightening bottlenecks in those today. I think we should also consider some more reliable measures to have BufferDescriptors cacheline sized, rather than relying on the happy accident. Debugging alignment issues isn't fun, too much of a guessing game... +1. Maybe make code that isn't appropriately aligned fail to compile? I was wondering about using some union trickery to make sure the array only consists of properly aligned types.
That'd require some changes but luckily code directly accessing the BufferDescriptors array isn't too far spread. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
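For readers not following along in c.h: TYPEALIGN rounds a length up to the next multiple of a power-of-two alignment, which also shows why 32-byte alignment is not enough here; a 32-byte-aligned address is 64-byte-aligned only half the time. A standalone rendering (same shape as the macro in c.h, transcribed by hand):

```c
#include <assert.h>
#include <stdint.h>

/* Round LEN up to the next multiple of ALIGNVAL, which must be a power
 * of two.  This mirrors the TYPEALIGN macro in src/include/c.h. */
#define TYPEALIGN(ALIGNVAL, LEN) \
    (((uintptr_t) (LEN) + ((ALIGNVAL) - 1)) & ~((uintptr_t) ((ALIGNVAL) - 1)))
```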
Re: [HACKERS] [COMMITTERS] pgsql: Include planning time in EXPLAIN ANALYZE output.
Peter Geoghegan p...@heroku.com writes: On Mon, Feb 3, 2014 at 4:15 AM, Robert Haas robertmh...@gmail.com wrote: I'm not really feeling a compelling need to change that. We've been displaying total runtime - described exactly that way - for many releases and it's surely is confusing to the novice that the time reported can be much less than the time reported by psql's \timing option, usually because of planning time. I think that has a lot more to do with network roundtrip time and protocol/serialization overhead. The problem I'm having with the way it stands now is that one would reasonably expect that Total time is the total of all times counted by EXPLAIN, including main plan execution time, trigger firing time, and now planning time. Since it is not, any longer, a total, I think renaming it would be a good idea. I'm not wedded to execution time in particular, but I don't like total. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] slow startup due to LWLockAssign() spinlock
On 2014-02-03 11:22:45 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On larger, multi-socket, machines, startup takes a fair bit of time. As I was profiling anyway I looked into it and noticed that just about all of it is spent in LWLockAssign() called by InitBufferPool(). Starting with shared_buffers=48GB on the server Nate Boley provided, takes about 12 seconds. Nearly all of it spent taking the ShmemLock spinlock. Simply modifying LWLockAssign() to not take the spinlock when !IsUnderPostmaster speeds it up to 2 seconds. While certainly not making LWLockAssign() prettier it seems enough of a speedup to be worthwile nonetheless. Hm. This patch only works if the postmaster itself never assigns any LWLocks except during startup. That's *probably* all right, but it seems a bit scary. Is there any cheap way to make the logic actually be what your comment claims, namely Interlocking is not necessary during postmaster startup? I guess we could invent a ShmemInitInProgress global flag ... So, here's a flag implementing things with that flag. I kept your name, as it's more in line with ipci.c's naming, but it looks kinda odd besides proc_exit_inprogress. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c index c392d4f..ece9674 100644 --- a/src/backend/storage/ipc/ipci.c +++ b/src/backend/storage/ipc/ipci.c @@ -48,6 +48,11 @@ shmem_startup_hook_type shmem_startup_hook = NULL; static Size total_addin_request = 0; static bool addin_request_allowed = true; +/* + * Signals whether shared memory is currently being initialized, while the + * postmaster hasn't forked any children yet. 
+ */ +bool ShmemInitInProgress = false; /* * RequestAddinShmemSpace @@ -97,6 +102,12 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port) int numSemas; /* + * We're now initializing shared memory, and we won't have forked + * children yet. + */ + ShmemInitInProgress = true; + + /* * Size of the Postgres shared-memory block is estimated via * moderately-accurate estimates for the big hogs, plus 100K for the * stuff that's too small to bother with estimating. @@ -261,4 +272,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port) */ if (shmem_startup_hook) shmem_startup_hook(); + + if(!IsUnderPostmaster) + ShmemInitInProgress = false; } diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c index 82ef440..0365f9b 100644 --- a/src/backend/storage/lmgr/lwlock.c +++ b/src/backend/storage/lmgr/lwlock.c @@ -355,9 +355,11 @@ CreateLWLocks(void) * LWLockAssign - assign a dynamically-allocated LWLock number * * We interlock this using the same spinlock that is used to protect - * ShmemAlloc(). Interlocking is not really necessary during postmaster - * startup, but it is needed if any user-defined code tries to allocate - * LWLocks after startup. + * ShmemAlloc(). Interlocking is not necessary while we're initializing + * shared memory, but it is needed for any code allocating LWLocks after + * startup. Since we allocate large numbers of LWLocks for the buffer pool + * during startup, we avoid taking the spinlock when not needed, as it has + * been shown to slow down startup considerably.
*/ LWLock * LWLockAssign(void) @@ -368,14 +370,17 @@ LWLockAssign(void) volatile int *LWLockCounter; LWLockCounter = (int *) ((char *) MainLWLockArray - 3 * sizeof(int)); - SpinLockAcquire(ShmemLock); + if (!ShmemInitInProgress) + SpinLockAcquire(ShmemLock); if (LWLockCounter[0] = LWLockCounter[1]) { - SpinLockRelease(ShmemLock); + if (!ShmemInitInProgress) + SpinLockRelease(ShmemLock); elog(ERROR, no more LWLocks available); } result = MainLWLockArray[LWLockCounter[0]++].lock; - SpinLockRelease(ShmemLock); + if (!ShmemInitInProgress) + SpinLockRelease(ShmemLock); return result; } diff --git a/src/include/storage/ipc.h b/src/include/storage/ipc.h index 37eca7a..0f1e00a 100644 --- a/src/include/storage/ipc.h +++ b/src/include/storage/ipc.h @@ -63,6 +63,7 @@ typedef void (*shmem_startup_hook_type) (void); /* ipc.c */ extern bool proc_exit_inprogress; +extern bool ShmemInitInProgress; extern void proc_exit(int code) __attribute__((noreturn)); extern void shmem_exit(int code); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
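The magnitude of the startup cost is easy to reconstruct. With shared_buffers = 48GB and the default 8kB block size there are about 6.3 million buffers, and 9.3-era buffer headers carry two dynamically assigned LWLocks each (the content lock and the I/O-in-progress lock), so InitBufferPool() makes millions of LWLockAssign() calls, each one taking ShmemLock:

```c
#include <assert.h>
#include <stdint.h>

/* Number of shared buffers for a given shared_buffers setting. */
static int64_t nbuffers(int64_t shared_buffers_bytes, int64_t blcksz)
{
    return shared_buffers_bytes / blcksz;
}
```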
Re: [HACKERS] should we add a XLogRecPtr/LSN SQL type?
On Sun, Feb 2, 2014 at 12:07 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-02-02 00:04:41 +0900, Michael Paquier wrote: On Fri, Dec 13, 2013 at 2:47 AM, Tom Lane t...@sss.pgh.pa.us wrote: Andres Freund and...@2ndquadrant.com writes: On 2013-12-12 11:55:51 -0500, Tom Lane wrote: I'm not, however, terribly thrilled with the suggestions to add implicit casts associated with this type. Implicit casts are generally dangerous. It's a tradeof. Currently we have the following functions returning LSNs as text: * pg_current_xlog_location * pg_current_xlog_insert_location * pg_last_xlog_receive_location * pg_last_xlog_replay_location one view containing LSNs * pg_stat_replication and the following functions accepting LSNs as textual paramters: * pg_xlog_location_diff * pg_xlogfile_name The question is how do we deal with backward compatibility when introducing a LSN type? There might be some broken code around monitoring if we simply replace the type without implicit casts. Given the limited usage, how bad would it really be if we simply made all those take/return the LSN type? As long as the type's I/O representation looks like the old text format, I suspect most queries wouldn't notice. I've asked around inside 2ndq and we could find one single problematic query, so it's really not too bad. Are there some plans to awaken this patch (including changing the output of the functions of xlogfuncs.c)? This would be useful for the differential backup features I am looking at these days. I imagine that it is too late for 9.4 though... I think we should definitely go ahead with the patch per-se, we just added another system view with lsns in it... I don't have a too strong opinion whether to do it in 9.4 or 9.5. It seems fairly low impact to me, and it's an old patch, but I personally don't have the tuits to refresh the patch right now. Please find attached a patch implementing lsn as a datatype, based on the one Robert wrote a couple of years ago. 
Since XLogRecPtr has been changed to a simple uint64, this *refresh* has needed some work. In order to have this data type plugged in more natively with other xlog system functions, I have also added PG_RETURN_LSN and PG_GETARG_LSN to facilitate the interface, making code management easier if the XLogRecPtr or LSN format is changed in the future. Patch contains regression tests as well as a bit of documentation. Perhaps this is too late for 9.4, so if there are no objections I'll simply add this patch to the next commit fest in June for 9.5. Having the system functions use this data type (pg_start_backup, pg_xlog_*, create_rep_slot, etc.) does not look to be that difficult as all the functions in xlogfuncs.c actually use XLogRecPtr before changing it to text, so it would actually simplify the code. I think I'll simply code it, as I'm just looking at it now... Regards, -- Michael *** a/doc/src/sgml/datatype.sgml --- b/doc/src/sgml/datatype.sgml *** *** 155,160 --- 155,166 /row row +entrytypelsn/type/entry +entry/entry +entryLog Sequence Number/entry + /row + + row entrytypemacaddr/type/entry entry/entry entryMAC (Media Access Control) address/entry *** *** 4502,4507 SELECT * FROM pg_attribute --- 4508,4527 /para /sect1 + sect1 id=datatype-lsn +titleacronymLSN/ Type/title + +indexterm zone=datatype-lsn + primaryLSN/primary +/indexterm + +para + The typelsn/type data type can be used to store LSN (Log Sequence + Number) data which is a pointer to a location in the XLOG. This type is a + representation of XLogRecPtr. +/para + /sect1 + sect1 id=datatype-pseudo titlePseudo-Types/title *** a/src/backend/utils/adt/Makefile --- b/src/backend/utils/adt/Makefile *** *** 20,26 OBJS = acl.o arrayfuncs.o array_selfuncs.o array_typanalyze.o \ cash.o char.o date.o datetime.o datum.o domains.o \ enum.o float.o format_type.o \ geo_ops.o geo_selfuncs.o int.o int8.o json.o jsonfuncs.o like.o \ !
lockfuncs.o misc.o nabstime.o name.o numeric.o numutils.o \ oid.o oracle_compat.o orderedsetaggs.o \ pseudotypes.o rangetypes.o rangetypes_gist.o \ rowtypes.o regexp.o regproc.o ruleutils.o selfuncs.o \ --- 20,26 cash.o char.o date.o datetime.o datum.o domains.o \ enum.o float.o format_type.o \ geo_ops.o geo_selfuncs.o int.o int8.o json.o jsonfuncs.o like.o \ ! lockfuncs.o lsn.o misc.o nabstime.o name.o numeric.o numutils.o \ oid.o oracle_compat.o orderedsetaggs.o \ pseudotypes.o rangetypes.o rangetypes_gist.o \ rowtypes.o regexp.o regproc.o ruleutils.o selfuncs.o \ *** /dev/null --- b/src/backend/utils/adt/lsn.c *** *** 0 --- 1,180 + /*- + * + * lsn.c + * Internal LSN operations + *
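As background for the patch, the conversion such a datatype encapsulates is small: the textual LSN XXXXXXXX/XXXXXXXX is just the high and low 32-bit halves of the uint64 XLogRecPtr printed in hex. A hedged sketch with toy names (not the patch's actual lsn_in/lsn_out, and without the trailing-garbage validation a real type input function needs):

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Parse "hi/lo" (both hex) into a 64-bit LSN; returns 0 on success.
 * Illustrative only: unlike a real input function, this accepts and
 * ignores trailing garbage. */
static int toy_lsn_in(const char *str, uint64_t *result)
{
    unsigned int hi, lo;

    if (sscanf(str, "%X/%X", &hi, &lo) != 2)
        return -1;              /* invalid input syntax */
    *result = ((uint64_t) hi << 32) | lo;
    return 0;
}

/* Format a 64-bit LSN back into the familiar "hi/lo" text form. */
static void toy_lsn_out(uint64_t lsn, char *buf, size_t buflen)
{
    snprintf(buf, buflen, "%X/%X",
             (unsigned int) (lsn >> 32), (unsigned int) lsn);
}
```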
Re: [HACKERS] [COMMITTERS] pgsql: Include planning time in EXPLAIN ANALYZE output.
* Tom Lane (t...@sss.pgh.pa.us) wrote: The problem I'm having with the way it stands now is that one would reasonably expect that Total time is the total of all times counted by EXPLAIN, including main plan execution time, trigger firing time, and now planning time. Since it is not, any longer, a total, I think renaming it would be a good idea. I'm not wedded to execution time in particular, but I don't like total. Agreed. Thanks, Stephen
[HACKERS] Proposing pg_hibernate
Please find attached the pg_hibernate extension. It is a set-it-and-forget-it solution to enable hibernation of Postgres shared-buffers. It can be thought of as an amalgam of pg_buffercache and pg_prewarm. It uses the background worker infrastructure. It registers one worker process (BufferSaver) to save the shared-buffer metadata when the server is shutting down, and one worker per database (BlockReader) when restoring the shared buffers. It stores the buffer metadata under $PGDATA/pg_database/, one file per database, and one separate file for global objects. It sorts the list of buffers before storage, so that when it encounters a range of consecutive blocks of a relation's fork, it stores that range as just one entry, hence reducing the storage and I/O overhead. The on-disk binary format, which is used to create the per-database save-files, is defined as: 1. One or more relation filenodes; stored as 'r' relfilenode. 2. Each relation is followed by one or more fork numbers; stored as 'f' forknumber. 3. Each fork number is followed by one or more block numbers; stored as 'b' blocknumber. 4. Each block number is followed by zero or more range numbers; stored as 'N' number. The overall grammar is {r {f {b N* }+ }+ }+ Possible enhancements: - Ability to save/restore only specific databases. - Control how many BlockReaders are active at a time; to avoid I/O storms. - Be smart about lowered shared_buffers across the restart. - Different modes of reading like pg_prewarm does. - Include PgFincore functionality, at least for Linux platforms. The extension currently works with PG 9.3, and may work on 9.4 without any changes; I haven't tested, though. If not, I think it'd be easy to port to HEAD/PG 9.4.
I see that 9.4 has put a cap on maximum background workers via a GUC, and since my aim is to provide a non-intrusive no-tuning-required extension, I'd like to use the new dynamic-background-worker infrastructure in 9.4, which doesn't seem to have any preset limits (I think it's limited by max_connections, but I may be wrong). I can work on 9.4 port, if there's interest in including this as a contrib/ module. To see the extension in action: .) Compile it. .) Install it. .) Add it to shared_preload_libraries. .) Start/restart Postgres. .) Install pg_buffercache extension, to inspect the shared buffers. .) Note the result of pg_buffercache view. .) Work on your database to fill up the shared buffers. .) Note the result of pg_buffercache view, again; there should be more blocks than last time we checked. .) Stop and start the Postgres server. .) Note the output of pg_buffercache view; it should contain the blocks seen just before the shutdown. .) Future server restarts will automatically save and restore the blocks in shared-buffers. The code is also available as Git repository at https://github.com/gurjeet/pg_hibernate/ Demo: $ make -C contrib/pg_hibernate/ $ make -C contrib/pg_hibernate/ install $ vi $B/db/data/postgresql.conf $ grep shared_preload_libraries $PGDATA/postgresql.conf shared_preload_libraries = 'pg_hibernate' # (change requires restart) $ pgstart waiting for server to start done server started $ pgsql -c 'create extension pg_buffercache;' CREATE EXTENSION $ pgsql -c 'select count(*) from pg_buffercache where relblocknumber is not null group by reldatabase;' count --- 163 14 (2 rows) $ pgsql -c 'create table test_hibernate as select s as a, s::char(1000) as b from generate_series(1, 10) as s;' SELECT 10 $ pgsql -c 'create index on test_hibernate(a);' CREATE INDEX $ pgsql -c 'select count(*) from pg_buffercache where relblocknumber is not null group by reldatabase;' count --- 2254 14 (2 rows) $ pgstop waiting for server to shut down... 
done server stopped $ pgstart waiting for server to start done server started $ pgsql -c 'select count(*) from pg_buffercache where relblocknumber is not null group by reldatabase;' count --- 2264 17 (2 rows) There are a few more blocks than the time they were saved, but all the blocks from before the restart are present in shared buffers after the restart. Best regards, -- Gurjeet Singh http://gurjeet.singh.im/ EDB www.EnterpriseDB.com http://www.enterprisedb.com pg_hibernate.tgz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
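The range compression described in the proposal ("a range of consecutive blocks ... stored as just one entry") amounts to run-length encoding a sorted block list. An illustrative sketch, not pg_hibernate's actual code:

```c
#include <assert.h>
#include <stdint.h>

/* Collapse a sorted list of block numbers into (first, length) runs, so a
 * contiguous range costs one entry instead of one entry per block.  This
 * mirrors the idea behind pg_hibernate's 'b'/'N' records; names are
 * invented.  out_first/out_len must have room for nblocks entries. */
static int encode_ranges(const uint32_t *blocks, int nblocks,
                         uint32_t out_first[], uint32_t out_len[])
{
    int nranges = 0;

    for (int i = 0; i < nblocks; i++)
    {
        if (nranges > 0 &&
            blocks[i] == out_first[nranges - 1] + out_len[nranges - 1])
            out_len[nranges - 1]++;     /* extends the current run */
        else
        {
            out_first[nranges] = blocks[i];
            out_len[nranges] = 1;
            nranges++;
        }
    }
    return nranges;
}
```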
Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun
On Fri, Jan 31, 2014 at 9:09 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: I refactored the loop in _bt_moveright to, well, not have that bug anymore. The 'page' and 'opaque' pointers are now fetched at the beginning of the loop. Did I miss something? I think so, yes. You still aren't assigning the value returned by _bt_getbuf() to 'buf'. Since, as I mentioned, _bt_finish_split() ultimately unlocks *and unpins*, it may not be the same buffer as before, so even with the refactoring there are race conditions. A closely related issue is that you haven't mentioned anything about buffer pins/refcount side effects in comments above _bt_finish_split(), even though I believe you should. A minor stylistic concern is that I think it would be better to only have one pair of _bt_finish_split()/_bt_getbuf() calls regardless of the initial value of 'access'. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] should we add a XLogRecPtr/LSN SQL type?
Michael Paquier michael.paqu...@gmail.com writes: Please find attached a patch implementing lsn as a datatype, based on the one Robert wrote a couple of years ago. Patch contains regression tests as well as a bit of documentation. Perhaps this is too late for 9.4, so if there are no objections I'll simply add this patch to the next commit fest in June for 9.5. I may have lost count, but aren't a bunch of the affected functions new in 9.4? If so, there's a good argument to be made that we should get this in now, rather than waiting and having an API change for those functions in 9.5. regards, tom lane
Re: [HACKERS] should we add a XLogRecPtr/LSN SQL type?
On Tue, Feb 4, 2014 at 10:10 AM, Tom Lane t...@sss.pgh.pa.us wrote: Michael Paquier michael.paqu...@gmail.com writes: Please find attached a patch implementing lsn as a datatype, based on the one Robert wrote a couple of years ago. Patch contains regression tests as well as a bit of documentation. Perhaps this is too late for 9.4, so if there are no objections I'll simply add this patch to the next commit fest in June for 9.5. I may have lost count, but aren't a bunch of the affected functions new in 9.4? If so, there's a good argument to be made that we should get this in now, rather than waiting and having an API change for those functions in 9.5. Cool... I have created a second patch that updates all the system functions to use the new lsn datatype introduced in the 1st patch (pg_start|stop_backup, replication_slot stuff, xlog diff things, etc.) and it is attached. This cleans up quite a bit of code in xlogfuncs.c because we no longer need the LSN-to-cstring transformations! I am also attaching a v2 of the first patch; I noticed that lsn_in introduced in the first patch was using some error messages not consistent with the ones of validate_xlog_location:xlogfuncs.c. Note as well that validate_xlog_location is removed in the 2nd patch, where all the system functions are switched to the new datatype. Regards, -- Michael *** a/doc/src/sgml/datatype.sgml --- b/doc/src/sgml/datatype.sgml *** *** 155,160 --- 155,166 </row> <row> +<entry><type>lsn</type></entry> +<entry></entry> +<entry>Log Sequence Number</entry> + </row> + + <row> <entry><type>macaddr</type></entry> <entry></entry> <entry>MAC (Media Access Control) address</entry> *** *** 4502,4507 SELECT * FROM pg_attribute --- 4508,4527 </para> </sect1> + <sect1 id="datatype-lsn"> +<title><acronym>LSN</> Type</title> + +<indexterm zone="datatype-lsn"> + <primary>LSN</primary> +</indexterm> + +<para> + The <type>lsn</type> data type can be used to store LSN (Log Sequence + Number) data which is a pointer to a location in the XLOG. This type is a + representation of XLogRecPtr.
+/para + /sect1 + sect1 id=datatype-pseudo titlePseudo-Types/title *** a/src/backend/utils/adt/Makefile --- b/src/backend/utils/adt/Makefile *** *** 20,26 OBJS = acl.o arrayfuncs.o array_selfuncs.o array_typanalyze.o \ cash.o char.o date.o datetime.o datum.o domains.o \ enum.o float.o format_type.o \ geo_ops.o geo_selfuncs.o int.o int8.o json.o jsonfuncs.o like.o \ ! lockfuncs.o misc.o nabstime.o name.o numeric.o numutils.o \ oid.o oracle_compat.o orderedsetaggs.o \ pseudotypes.o rangetypes.o rangetypes_gist.o \ rowtypes.o regexp.o regproc.o ruleutils.o selfuncs.o \ --- 20,26 cash.o char.o date.o datetime.o datum.o domains.o \ enum.o float.o format_type.o \ geo_ops.o geo_selfuncs.o int.o int8.o json.o jsonfuncs.o like.o \ ! lockfuncs.o lsn.o misc.o nabstime.o name.o numeric.o numutils.o \ oid.o oracle_compat.o orderedsetaggs.o \ pseudotypes.o rangetypes.o rangetypes_gist.o \ rowtypes.o regexp.o regproc.o ruleutils.o selfuncs.o \ *** /dev/null --- b/src/backend/utils/adt/lsn.c *** *** 0 --- 1,180 + /*- + * + * lsn.c + * Internal LSN operations + * + * Portions Copyright (c) 1996-2011, PostgreSQL Global Development Group + * Portions Copyright (c) 1994, Regents of the University of California + * + * IDENTIFICATION + * src/backend/utils/adt/lsn.c + * + *- + */ + #include postgres.h + + #include funcapi.h + #include libpq/pqformat.h + #include utils/builtins.h + #include utils/lsn.h + + #define MAXLSNLEN 17 + #define MAXLSNCOMPONENT 8 + + /*-- + * Formatting and conversion routines. + *-*/ + + Datum + lsn_in(PG_FUNCTION_ARGS) + { + char *str = PG_GETARG_CSTRING(0); + int len1, len2; + uint32 id, off; + XLogRecPtr result; + + /* Sanity check input format. 
*/ + len1 = strspn(str, "0123456789abcdefABCDEF"); + if (len1 < 1 || len1 > MAXLSNCOMPONENT || str[len1] != '/') + ereport(ERROR, + (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), + errmsg("invalid input syntax for transaction log location: \"%s\"", str))); + len2 = strspn(str + len1 + 1, "0123456789abcdefABCDEF"); + if (len2 < 1 || len2 > MAXLSNCOMPONENT || str[len1 + 1 + len2] != '\0') + ereport(ERROR, + (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), + errmsg("invalid input syntax for transaction log location: \"%s\"", str))); + + /* Decode result. */ + id = (uint32) strtoul(str, NULL, 16); + off = (uint32) strtoul(str + len1 + 1, NULL, 16); + result = (XLogRecPtr) (((uint64) id << 32) | off); + + PG_RETURN_LSN(result);
[HACKERS] Viability of text HISTORY/INSTALL/regression README files (was Re: [COMMITTERS] pgsql: Document a few more regression test hazards.)
Robert Haas robertmh...@gmail.com writes: On Mon, Feb 3, 2014 at 4:38 PM, Tom Lane t...@sss.pgh.pa.us wrote: guaibasaurus doesn't like this patch: you need to be more careful about links, because the regression instructions are supposed to compile as a standalone document. I wonder if these standalone things are really worthwhile. The whole point of this, ten years ago, was that people who were trying to get started with PostgreSQL might have had neither the doc toolchain nor convenient Internet access available. Plain text documentation was essential for getting off the ground. This seems much less relevant today; is the annoyance of not being able to use links everywhere really buying us anything at this point? That's a very fair question. It's a reasonable bet that pretty much nobody actually looks at the text versions of either HISTORY or regress_README anymore. It's conceivable that somebody somewhere makes use of the text version of INSTALL when trying to get PG going on some bare-bones platform ... but really, can't they look it up on the net? How'd they get the PG sources they're installing, anyway? I'm prepared to believe that these things are just dinosaurs now. However, at least in the case of the release notes, we'd have to kill them in all active branches, not only HEAD, if we don't want surprises. Comments? regards, tom lane
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On Sun, Feb 2, 2014 at 6:00 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-02-01 19:47:29 -0800, Peter Geoghegan wrote: Here are the results of a benchmark on Nathan Boley's 64-core, 4 socket server: http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/amd-4-socket-rwlocks/ That's interesting. The maximum number of what you see here (~293125) is markedly lower than what I can get. ... poke around ... Hm, that's partially because you're using pgbench without -M prepared if I see that correctly. The bottleneck in that case is primarily memory allocation. But even after that I am getting higher numbers: ~342497. Trying to nail down the differnce it oddly seems to be your max_connections=80 vs my 100. The profile in both cases is markedly different, way much more spinlock contention with 80. All in Pin/UnpinBuffer(). I updated this benchmark, with your BufferDescriptors alignment patch [1] applied on top of master (while still not using -M prepared in order to keep the numbers comparable). So once again, that's: http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/amd-4-socket-rwlocks/ It made a bigger, fairly noticeable difference, but not so big a difference as you describe here. Are you sure that you saw this kind of difference with only 64 clients, as you mentioned elsewhere [1] (perhaps you fat-fingered [1] -- -cj is ambiguous)? Obviously max_connections is still 80 in the above. Should I have gone past 64 clients to see the problem? The best numbers I see with the [1] patch applied on master is only ~327809 for -S 10 64 clients. Perhaps I've misunderstood. [1] Misaligned BufferDescriptors causing major performance problems on AMD: http://www.postgresql.org/message-id/20140202151319.gd32...@awork2.anarazel.de -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] memory usage of pg_upgrade
On Mon, Sep 9, 2013 at 07:39:00PM -0400, Bruce Momjian wrote: In the case of tablespaces, I should have thought you could keep a hash table of the names and just store an entry id in the table structure. But that's just my speculation without actually looking at the code, so don't take my word for it :-) Yes, please feel free to improve the code. I improved pg_upgrade CPU usage for a large number of objects, but never thought to look at memory usage. It would be a big win to just palloc/pfree the memory, rather than allocate tons of memory. If you don't get to it, I will in a few weeks. Thank you for pointing out this problem. I have researched the cause and the major problem was that I was allocating the maximum path length in a struct rather than allocating just the length I needed, and was not reusing string pointers that I knew were not going to change. The updated attached patch significantly decreases memory consumption:

tables        orig          patch    % decrease
     1      27,168 kB     27,168 kB       0
    1k      46,136 kB     27,920 kB      39
    2k      65,224 kB     28,796 kB      56
    4k     103,276 kB     30,472 kB      70
    8k     179,512 kB     33,900 kB      81
   16k     331,860 kB     40,788 kB      88
   32k     636,544 kB     54,572 kB      91
   64k   1,245,920 kB     81,876 kB      93

As you can see, a database with 64k tables shows a 93% decrease in memory use. I plan to apply this for PG 9.4. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + diff --git a/contrib/pg_upgrade/check.c b/contrib/pg_upgrade/check.c new file mode 100644 index f04707f..acf8660 *** a/contrib/pg_upgrade/check.c --- b/contrib/pg_upgrade/check.c *** create_script_for_old_cluster_deletion(c *** 707,726 fprintf(script, "\n"); /* remove PG_VERSION? */ if (GET_MAJOR_VERSION(old_cluster.major_version) <= 804) ! fprintf(script, RM_CMD " %s%s%cPG_VERSION\n", fix_path_separator(os_info.old_tablespaces[tblnum]), - fix_path_separator(old_cluster.tablespace_suffix), PATH_SEPARATOR); for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++) ! { !
fprintf(script, RMDIR_CMD " %s%s%c%d\n", fix_path_separator(os_info.old_tablespaces[tblnum]), - fix_path_separator(old_cluster.tablespace_suffix), PATH_SEPARATOR, old_cluster.dbarr.dbs[dbnum].db_oid); - } } else /* * Simply delete the tablespace directory, which might be .old --- 707,724 fprintf(script, "\n"); /* remove PG_VERSION? */ if (GET_MAJOR_VERSION(old_cluster.major_version) <= 804) ! fprintf(script, RM_CMD " %s%cPG_VERSION\n", fix_path_separator(os_info.old_tablespaces[tblnum]), PATH_SEPARATOR); for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++) ! fprintf(script, RMDIR_CMD " %s%c%d\n", fix_path_separator(os_info.old_tablespaces[tblnum]), PATH_SEPARATOR, old_cluster.dbarr.dbs[dbnum].db_oid); } else + { + char *suffix_path = pg_strdup(old_cluster.tablespace_suffix); /* * Simply delete the tablespace directory, which might be .old *** create_script_for_old_cluster_deletion(c *** 728,734 */ fprintf(script, RMDIR_CMD " %s%s\n", fix_path_separator(os_info.old_tablespaces[tblnum]), ! fix_path_separator(old_cluster.tablespace_suffix)); } fclose(script); --- 726,734 */ fprintf(script, RMDIR_CMD " %s%s\n", fix_path_separator(os_info.old_tablespaces[tblnum]), ! fix_path_separator(suffix_path)); ! pfree(suffix_path); ! } } fclose(script); diff --git a/contrib/pg_upgrade/info.c b/contrib/pg_upgrade/info.c new file mode 100644 index 3d0dd1a..fd083de *** a/contrib/pg_upgrade/info.c --- b/contrib/pg_upgrade/info.c *** create_rel_filename_map(const char *old_ *** 108,127 * relation belongs to the default tablespace, hence relfiles should * exist in the data directories. */ ! strlcpy(map->old_tablespace, old_data, sizeof(map->old_tablespace)); ! strlcpy(map->new_tablespace, new_data, sizeof(map->new_tablespace)); ! strlcpy(map->old_tablespace_suffix, "/base", sizeof(map->old_tablespace_suffix)); ! strlcpy(map->new_tablespace_suffix, "/base", sizeof(map->new_tablespace_suffix)); } else { /* relation belongs to a tablespace, so use the tablespace location */ !
strlcpy(map->old_tablespace, old_rel->tablespace, sizeof(map->old_tablespace)); ! strlcpy(map->new_tablespace, new_rel->tablespace, sizeof(map->new_tablespace)); ! strlcpy(map->old_tablespace_suffix, old_cluster.tablespace_suffix, ! sizeof(map->old_tablespace_suffix)); ! strlcpy(map->new_tablespace_suffix, new_cluster.tablespace_suffix, ! sizeof(map->new_tablespace_suffix));
Re: [HACKERS] Regression tests failing if not launched on db regression
On Tue, Feb 4, 2014 at 12:49 AM, Robert Haas robertmh...@gmail.com wrote: On Fri, Jan 31, 2014 at 5:48 PM, Michael Paquier michael.paqu...@gmail.com wrote: The part about the planning parameter looks good, thanks. The places used to mention the databases used also makes more sense. Thanks for your input. OK, committed and back-patched to 9.3. Thanks. -- Michael
Re: [HACKERS] [bug fix] postgres.exe fails to start on Windows Server 2012 due to ASLR
On 02/01/2014 09:53 PM, MauMau wrote: Not all users use PostgreSQL built by EnterpriseDB, so I think src\tools\msvc\build.bat should produce modules that are not affected by ASLR. I completely agree; just saying that any installer can set the key. I'm convinced that setting this flag is appropriate, at least while Pg relies on having the shared memory segment mapped in the same zone in every executable. Just pointing out that there's a workaround in the mean time. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Prohibit row-security + inheritance in 9.4?
On 01/31/2014 11:24 PM, Yeb Havinga wrote: On 2014-01-31 16:05, Stephen Frost wrote: * Yeb Havinga (y.t.havi...@mgrid.net) wrote: IMHO, there is another way to implement this, other than the procedure to override the child-rel-quals with the ones from the parent. At DDL time, synchronize quals on the parent with rls quals of the childs. Isn't this also what happens with constraints? No, we're not going to do that. We don't do it for GRANT and I don't think it makes sense to do it here. This reasoning could go either way. GRANT is on a complete set of rows. This is a restriction on the level of individual rows, and in that sense, it is more like a row-level CHECK constraint. If we wanted to make them the same then we'd throw out the ability to do any kind of changes or actions on the child and then we'd have actual partitioning. We don't have that though, we have inheiritance. I fail to understand this, probably because I do not have a partition use case for inheritance, but rather an information model that is more ontology like. The more specific childs get down the inheritance tree, more columns they get, and their requirements might differ completely in nature from their siblings, and make no sense at all as well when specified at the level of the parent (or even impossible, since the parent does not have all the columns). That's a bit inconsistent with how Pg's inheritance works, though. Conceptually rows in children *are* part of the parent. You cannot see the child columns when querying via the parent, so it's not a problem to constrain the ability to see the extra child columns with row-security. They can only be accessed when querying the child directly, where the child's row-security expression will apply. So you're not exposing information that's specific to the children, and the inherited common cols are, conceptually *part* of the parent, so applying the parent's qual makes sense. 
Then during expansion of the range table, no code is needed to ignore child rls quals and copy parent rels to child rels. This is what's already implemented and isn't a huge amount of code to begin with, so I don't see this as being an argument against having the flexibility. It would seem to me that any additional logic that can be avoided during planning is a good thing. Also, the argument that something is already implemented does not itself make it good to commit. The implementation with the minimum of required logic and complexity is: apply the parent's row-security quals to the children, and don't check children for quals. Any special handling of child rels creates exciting challenges because inheritance expansion happens _after_ a bunch of the query planner and optimizer has run. Injecting new expression trees at this point is messy, especially if you want those expression trees to in turn contain row-security qualified tables, inheritance, etc. As previously discussed, applying the parent qual to children ensures that what's visible when querying a relation that has children is consistent with its row-security qual. If the parent qual is applied only on the parent rel directly, not children, then querying the parent could emit rows not permitted by the parent's row-security qual. I'm not happy with that, and as Stephen pointed out upthread, it isn't really consistent with how permissions work with inheritance otherwise. If instead the parent qual is applied to the parent and all children, and you *also* add the quals of each child, you get a complex, hard to debug, hard to optimize mess. You also run back into the problem mentioned above, that adding quals after inheritance expansion is messy and problematic. It's also really ugly in what's going to be the most common case, where the child quals are the same as the parent quals, as you'll get nested identical quals.
Despite the challenges with it, I think that's the least insane way to respect child quals on parents. It has pretty much zero chance of happening in 9.4, though; the discussed approach of building row-security on top of updatable security barrier views doesn't play well with adding inheritance on inheritance-expanded children. One answer for that would be to keep inheritance as it is for 9.4 (if I can get the remaining issues sorted out) and in 9.5, if possible, allow the addition of a row-security qual that, if it appears on a child rel during inheritance expansion, _is_ expanded. At least, if it proves necessary, which I'm not entirely convinced of. I suggested it as a solution for a requirement worded upthread as It makes absolutely zero sense, in my head anyway, to have rows returned when querying the parent which should NOT be returned based on the quals of the parent. without disregarding rls-quals on childs. I'm not sure I understand what you are saying here. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent
Re: [HACKERS] Patch: Show process IDs of processes holding a lock; show relation and tuple infos of a lock to acquire
On Mon, Feb 3, 2014 at 8:53 PM, Christian Kruse christ...@2ndquadrant.com wrote: Hi Simon, On 03/02/14 10:43, Simon Riggs wrote: Singular: The following process is holding the lock: A. The request queue consists of: B. Plural: Following processes are holding the lock: A, B. The request queue consists of: C. Seems too complex. How about this... Lock holder(s): A. Lock waiter(s) B Lock holder(s): A, B. Lock waiter(s) C This is basically the same as before, it is even shorter. The complaint was that I don't use a whole sentence in this error detail. Won't the change draw the same complaint? To be honest, I'd like to stick with your earlier proposal: Singular: Process holding the lock: A. Request queue: B Plural: Processes holding the lock: A, B. Request queue: C, D This seems to be a good trade-off between project guidelines, readability and parsability. ISTM that the phrase "Request queue" is not used much around the lock. Using the phrase "wait queue", or Simon's suggestion, sounds better to me at least. Thoughts? Regards, -- Fujii Masao
[HACKERS] Inconsistency between pg_stat_activity and log_duration
I found an interesting inconsistency between pg_stat_activity and log_duration.

-[ RECORD 1 ]----+------------------------------
datid            | 16392
datname          | test
pid              | 21815
usesysid         | 10
usename          | t-ishii
application_name | psql
client_addr      |
client_hostname  |
client_port      | -1
backend_start    | 2014-02-04 12:46:55.178688+09
xact_start       | 2014-02-04 12:47:27.210976+09
query_start      | 2014-02-04 12:47:27.210976+09
state_change     | 2014-02-04 12:47:27.210981+09
waiting          | f
state            | active
query            | select * from pg_stat_activity;

[snip]

-[ RECORD 3 ]----+------------------------------
datid            | 16392
datname          | test
pid              | 21850
usesysid         | 10
usename          | t-ishii
application_name | pgbench
client_addr      |
client_hostname  |
client_port      | -1
backend_start    | 2014-02-04 12:47:11.233245+09
xact_start       | 2014-02-04 12:47:11.235236+09
query_start      | 2014-02-04 12:47:11.241084+09
state_change     | 2014-02-04 12:47:11.241085+09
waiting          | f
state            | active
query            | SELECT count(*) FROM pg_catalog.pg_class AS c WHERE c.oid = pgpool_regclass('pgbench_accounts') AND c.relpersistence = 'u'

As you can see, at 2014-02-04 12:47:27.210981+09 the query SELECT count(*) FROM pg_catalog.pg_class... is active and it seems still running. On the other side, here is an excerpt from the PostgreSQL log:

21850 2014-02-04 12:47:11.241 JST LOG: execute pgpool21805/pgpool21805: SELECT count(*) FROM pg_catalog.pg_class AS c WHERE c.oid = pgpool_regclass('pgbench_accounts') AND c.relpersistence = 'u'
21850 2014-02-04 12:47:11.241 JST LOG: duration: 0.078 ms

The duration was shown as 0.078 ms, thus it seems the query has already finished. The reason why pg_stat_activity thinks that the query in question is still running is that the Sync message has not been sent to the backend yet (at least from what I read in postgres.c). I think this inconsistency is not very intuitive to users.

Best regards,
--
Tatsuo Ishii SRA OSS, Inc.
Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
Re: [HACKERS] Triggers on foreign tables
On Sun, Feb 02, 2014 at 11:53:51AM +0100, Ronan Dunklau wrote: Le jeudi 30 janvier 2014 14:05:08 Noah Misch a écrit : On Thu, Jan 23, 2014 at 03:17:35PM +0100, Ronan Dunklau wrote: What do you think about this approach ? Is there something I missed which would make it not sustainable ? Seems basically reasonable. I foresee multiple advantages from having one tuplestore per query level as opposed to one for the entire transaction. You would remove the performance trap of backing up the tuplestore by rescanning. It permits reclaiming memory and disk space in AfterTriggerEndQuery() rather than at end of transaction. You could remove ate_ptr1 and ate_ptr2 from AfterTriggerEventDataFDW and just store the flags word: depending on AFTER_TRIGGER_2CTIDS, grab either the next one or the next two tuples from the tuplestore. Using work_mem per AfterTriggerBeginQuery() instead of per transaction is no problem. What do you think of that design change? I agree that this design is better, but I have some objections. We can remove ate_ptr2 and rely on the AFTER_TRIGGER_2CTIDS flag, but the rescanning and ate_ptr1 (renamed ate_tupleindex in the attached patch) can't go away. Consider for example the case of a foreign table with more than one AFTER UPDATE triggers. Unless we store the tuples once for each trigger, we will have to rescan the tuplestore. Will we? Within a given query level, when do (non-deferred) triggers execute in an order other than the enqueue order? To mitigate the effects of this behaviour, I added the option to perform a reverse_seek when the looked-up tuple is nearer from the current index than from the start. If there's still a need to seek within the tuplestore, that should get rid of the O(n^2) effect. I'm hoping that per-query-level tuplestores will eliminate the need to seek entirely. If you do pursue that change, make sure the code still does the right thing when it drops queued entries during subxact abort. 
I don't really understand what should be done at that stage. Since triggers on foreign tables are not allowed to be deferred, everything should be cleaned up at the end of each query, right? So, there shouldn't be any queued entries. I suspect that's right. If you haven't looked over AfterTriggerEndSubXact(), please do so and ensure all its actions still make sense in the context of this new kind of trigger storage. The attached patch checks this, and adds documentation for this limitation. I'm not really sure about how to phrase that correctly in the error message and the documentation. One can store at most INT_MAX foreign tuples, which means that at most INT_MAX inserts, deletes, or half-updates can occur. By half-updates, I mean that for an update two tuples are stored, in contrast to only one for an insert or delete trigger. Besides, I don't know where this disclaimer should be in the documentation. Any advice here? I wouldn't mention that limitation. Maybe it shouldn't be so prominent, but I still think a note somewhere couldn't hurt. Perhaps. There's not much documentation of such implementation upper limits, and there's no usage of INT_MAX outside of parts that discuss writing C code. I'm not much of a visionary when it comes to the documentation; I try to document new things with an amount of detail similar to old features. Should the use of work_mem be documented somewhere, too? I wouldn't. Most uses of work_mem are undocumented, even relatively major ones like count(DISTINCT ...) and CTEs. So, while I'd generally favor a patch documenting all/most of the things that use work_mem, it would be odd to document one new consumer apart from the others. This is the performance trap I mentioned above; it brings potential O(n^2) complexity to certain AFTER trigger execution scenarios. What scenarios do you have in mind? I fixed the problem when there are multiple triggers on a foreign table; do you have any other one?
I'm not aware of such a performance trap in your latest design. For what it's worth, I don't think multiple triggers were ever a problem. In the previous design, a problem arose if you had a scenario like this: Query level 1: queue one million events ... Repeat this section many times: Query level 2: queue one event Query level 3: queue one event Query level 3: execute events Query level 2: execute events -- had to advance through the 1M stored events ... Query level 1: execute events Thanks, nm -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] could not create IPv6 socket (AI_ADDRCONFIG)
Hello, I have often seen inquiries about a log message from the PostgreSQL server. LOG: could not create IPv6 socket: Address family not supported by protocol This is emitted in an IPv6-disabled environment, which I reproduced with the following steps on CentOS 6.5:

- Add 'NETWORKING_IPV6=no' into /etc/sysconfig/network
- Create /etc/modprobe.d/disable-ipv6.conf with the following content. options net-pf-10 off install ipv6 /bin/true
- Comment out the entry for ::1 in /etc/hosts
- Reboot.
- [Confirm that ip a and ifconfig -a don't show ipv6 addrs]

The cause is that the server collects available listen addresses by getaddrinfo with hint.ai_flags not having AI_ADDRCONFIG set. pqcomm.c:299 in function StreamServerPort hint.ai_family = family; ! hint.ai_flags = AI_PASSIVE; hint.ai_socktype = SOCK_STREAM; The man page of getaddrinfo says that AI_ADDRCONFIG would help, and I saw it actually did. Well, I excavated some discussions about this option from ancient pgsql-hackers, a decade ago. http://www.postgresql.org/message-id/20030703204355.ga12...@ping.be This looks a little broken behaviour to me. My guess is that it returns an AF_INET6 socket but doesn't support it in the kernel. In that case with the AI_ADDRCONFIG option it shouldn't have returned that address. The question is wether your getaddrinfo() supports that option, and wether it's working or not. I suppose AI_ADDRCONFIG was a quite obscure option at the time, but the time seems to have come when we can use this option. man getaddrinfo NOTES getaddrinfo() supports the address%scope-id notation for specifying the IPv6 scope-ID. AI_ADDRCONFIG, AI_ALL, and AI_V4MAPPED are available since glibc 2.3.3. AI_NUMERICSERV is available since glibc 2.3.4. RHEL/CentOS 4.9 and after seems to satisfy the condition. I don't know about other platforms, but it is enough to define it as 0 if not defined.
In addition, it would not be a problem even if the option doesn't work as expected because the error handling code added at the time the above conversation took place would act as the last resort. There would be no way it can work destructively. Attached patch does this. Any suggestions ? regards, -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c index 2b24793..121f70b 100644 --- a/src/backend/libpq/pqcomm.c +++ b/src/backend/libpq/pqcomm.c @@ -297,7 +297,7 @@ StreamServerPort(int family, char *hostName, unsigned short portNumber, /* Initialize hint structure */ MemSet(hint, 0, sizeof(hint)); hint.ai_family = family; - hint.ai_flags = AI_PASSIVE; + hint.ai_flags = AI_PASSIVE | AI_ADDRCONFIG; hint.ai_socktype = SOCK_STREAM; #ifdef HAVE_UNIX_SOCKETS diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c index 305d126..1dd7f2a 100644 --- a/src/backend/postmaster/pgstat.c +++ b/src/backend/postmaster/pgstat.c @@ -344,7 +344,7 @@ pgstat_init(void) /* * Create the UDP socket for sending and receiving statistic messages */ - hints.ai_flags = AI_PASSIVE; + hints.ai_flags = AI_PASSIVE | AI_ADDRCONFIG; hints.ai_family = PF_UNSPEC; hints.ai_socktype = SOCK_DGRAM; hints.ai_protocol = 0; diff --git a/src/include/getaddrinfo.h b/src/include/getaddrinfo.h index 6192d1f..685d922 100644 --- a/src/include/getaddrinfo.h +++ b/src/include/getaddrinfo.h @@ -64,6 +64,11 @@ #define AI_PASSIVE 0x0001 #endif +#ifndef AI_ADDRCONFIG +/* Works as NOP if AI_ADDRCONFIG is not defined */ +#define AI_ADDRCONFIG 0x +#endif + #ifndef AI_NUMERICHOST /* * some platforms don't support AI_NUMERICHOST; define as zero if using -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
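The patched hint setup can be tried standalone. The sketch below mirrors the StreamServerPort change (the port number and AF_UNSPEC are illustrative); which address families come back depends on the machine's interface configuration, which is exactly the filtering AI_ADDRCONFIG provides:

```c
/* Standalone version of the hint setup in the patch: request passive
 * TCP listen addresses, filtered by AI_ADDRCONFIG so that families with
 * no configured address on this host are skipped. */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <netdb.h>
#include <sys/socket.h>

int main(void)
{
    struct addrinfo hint, *res = NULL;

    memset(&hint, 0, sizeof(hint));
    hint.ai_family = AF_UNSPEC;
    hint.ai_flags = AI_PASSIVE | AI_ADDRCONFIG;   /* the patched flags */
    hint.ai_socktype = SOCK_STREAM;

    int rc = getaddrinfo(NULL, "5432", &hint, &res);
    assert(rc == 0);

    int n = 0;
    for (struct addrinfo *ai = res; ai; ai = ai->ai_next)
    {
        n++;
        printf("family=%s\n", ai->ai_family == AF_INET  ? "IPv4" :
                              ai->ai_family == AF_INET6 ? "IPv6" : "other");
    }
    assert(n >= 1);
    freeaddrinfo(res);
    return 0;
}
```

On the IPv6-disabled CentOS setup described above, the AF_INET6 entry disappears from the result list, so the server never attempts the doomed IPv6 socket and the log message goes away.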
[HACKERS] Re: Viability of text HISTORY/INSTALL/regression README files (was Re: [COMMITTERS] pgsql: Document a few more regression test hazards.)
On Mon, Feb 03, 2014 at 08:48:06PM -0500, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: On Mon, Feb 3, 2014 at 4:38 PM, Tom Lane t...@sss.pgh.pa.us wrote: guaibasaurus doesn't like this patch: you need to be more careful about links, because the regression instructions are supposed to compile as a standalone document. I wonder if these standalone things are really worthwhile. The whole point of this, ten years ago, was that people who were trying to get started with PostgreSQL might not have had neither the doc toolchain nor convenient Internet access available. Plain text documentation was essential for getting off the ground. This seems much less relevant today; is the annoyance of not being able to use links everywhere really buying us anything at this point? That's a very fair question. It's a reasonable bet that pretty much nobody actually looks at the text versions of either HISTORY or regress_README anymore. It's conceivable that somebody somewhere makes use of the text version of INSTALL when trying to get PG going on some bare-bones platform ... but really, can't they look it up on the net? How'd they get the PG sources they're installing, anyway? I sometimes read text-based INSTALL files when building other projects, but a tiny file giving a URL is almost as good. (The other two generated files do seem much less important.) I'm prepared to believe that these things are just dinosaurs now. However, at least in the case of the release notes, we'd have to kill them in all active branches not only HEAD, if we don't want surprises. I wonder how difficult it would be to make sufficient link data available when building the standalone files. There would be no linking per se; we would just need the referent's text fragment emitted where the xref tag appears. For example, have a stylesheet that produces a file bearing all link IDs and titles appearing anywhere in the documentation. Let the standalone builds include that file. 
-- Noah Misch EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] narwhal and PGDLLIMPORT
On Mon, Feb 3, 2014 at 11:36 AM, Tom Lane t...@sss.pgh.pa.us wrote: Amit Kapila amit.kapil...@gmail.com writes: Also, as per the link below, it seems that PGDLLIMPORT is required for exported global variables. http://msdn.microsoft.com/en-us/library/8fskxacy(v=vs.80).aspx That was what we'd always assumed, but the fact that postgres_fdw has been working for the last year (on everything except narwhal) puts the lie to that as a blanket assumption. So what I want to know now is how come it wasn't failing; because then we might be able to exploit that to eliminate the need for the PGDLLIMPORT cruft everywhere. Today I tried to debug the same scenario on Win XP 32-bit, but the value of DateStyle is junk on that machine as well. Is it possible to somehow check, on a buildfarm machine (Windows) where the regression test is passing, what the value of DateStyle is? We can use something like the attached patch to print the value in the log. In my test, it prints the values as below: Without using PGDLLIMPORT for DateStyle LOG: Value of DateStyle is: 326903295 Using PGDLLIMPORT for DateStyle LOG: Value of DateStyle is: 1 In the function where it is used, it seems to me that it sets DateStyle to ISO if it is not ISO, and the function configure_remote_session() will set it to ISO initially. So basically, even if the value of DateStyle is junk, it will just do what we wanted, i.e. set it to ISO. Is the failure due to the fact that the test is not expecting DateStyle to be ISO, and because of the junk value of this parameter it gets set to ISO? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com elog_for_datestyle.patch Description: Binary data
Re: [HACKERS] Viability of text HISTORY/INSTALL/regression README files (was Re: [COMMITTERS] pgsql: Document a few more regression test hazards.)
Noah Misch n...@leadboat.com writes: Robert Haas robertmh...@gmail.com writes: I wonder if these standalone things are really worthwhile. I wonder how difficult it would be to make sufficient link data available when building the standalone files. There would be no linking per se; we would just need the referent's text fragment emitted where the xref tag appears. IIRC, that's basically what the workaround is, except it's not very automated. Even if it were automated, though, there's still a problem: such links aren't really *useful* in flat text format. I think that forcing the author to actually think about what to put there in the flat text version is a good thing, if we're going to retain the flat text version at all. regards, tom lane
Re: [HACKERS] Row-security on updatable s.b. views
On 01/30/2014 04:05 PM, Craig Ringer wrote: On 01/30/2014 01:25 PM, Craig Ringer wrote: On 01/29/2014 09:47 PM, Craig Ringer wrote: https://github.com/ringerc/postgres/compare/rls-9.4-upd-sb-views i.e. https://github.com/ringerc/postgres.git , branch rls-9.4-upd-sb-views (subject to rebasing) or the non-rebased tag rls-9.4-upd-sb-views-v2 Pushed an update to the branch. New update tagged rls-9.4-upd-sb-views-v3 . Fixes an issue with rowmarking that stems from the underlying updatable s.b. views patch. Other tests continue to fail; this isn't ready yet. Specifically: - row-security rule recursion detection isn't solved yet, it just overflows the stack. This is now fixed in the newly tagged rls-9.4-upd-sb-views-v4 in g...@github.com:ringerc/postgres.git . I landed up adding a field to RangeTblEntry that keeps track of all the oids of relations row-security expanded to produce this RTE. When testing an RTE for row-security policy, this list is checked to see if the oid of the relation being expanded is already on the list. The list is copied to all RTEs inside sublinks after a relation is expanded, using a query_tree_walker. - COPY doesn't know anything about row-security COPY will ERROR on row-security rels in rls-9.4-upd-sb-views-v4. I'm looking at integrating the approach Kohei KaiGai took in the original patch now, then I'll be moving on to: - I'm just starting to chase some odd errors in the tests, ERROR: failed to find unique expression in subplan tlist and ERROR: could not open file base/16384/30070: No such file or directory. Their cause/origin is not yet known, but they're specific to when row-security policy is being applied. I was hoping these would be fixed by solving the recursion issues, but that was not the case. Input/comments would be appreciated. I haven't looked into this yet. - policies based on current_user don't remember current_user when rows are pulled from a refcursor returned by a security definer function.
This is actually a separate, existing bug, or surprising behaviour. I'd like to address it, but it's really a separate patch. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Retain dynamic shared memory segments for postmaster lifetime
Hello, I now have a workable DLL, thanks to your advice. I think both the problems are related and the reason is that dsm_demo.dll is not built properly. Let us first try to solve your second problem, because I think if that is solved, you will not face problem-1. Thank you for your kindness. I understood the situation after successfully getting a correct DLL by using a .def file, following your advice. cl needs __declspec(dllexport) in the symbol definitions to expose them externally when not using a .def file. The PostgreSQL platform seems to offer the macro PGDLLEXPORT for such use. I suppose this should be used in extension module DLLs to expose symbols, like this: - void _PG_init(void); - Datum dsm_demo_create(PG_FUNCTION_ARGS); - Datum dsm_demo_read(PG_FUNCTION_ARGS); === + PGDLLEXPORT void _PG_init(void); + PGDLLEXPORT Datum dsm_demo_create(PG_FUNCTION_ARGS); + PGDLLEXPORT Datum dsm_demo_read(PG_FUNCTION_ARGS); # This hardly seems to be used commonly... I followed these instructions to set up the build environment: http://blog.2ndquadrant.com/compiling-postgresql-extensions-visual-studio-windows/ And the change above enables us to build this module without a .def file. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
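For reference, the .def-file route discussed above lists the exported symbols in a module definition file instead of annotating the declarations. A hypothetical dsm_demo.def along those lines might look like:

```
; dsm_demo.def -- module definition file (hypothetical sketch)
LIBRARY dsm_demo
EXPORTS
    _PG_init
    dsm_demo_create
    dsm_demo_read
```

Passed to the linker (e.g. /DEF:dsm_demo.def), this exposes the same three symbols without any PGDLLEXPORT markers in the source.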
Re: [HACKERS] could not create IPv6 socket (AI_ADDRCONFIG)
Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp writes: Hello, I have often seen inquiries about a log message from the PostgreSQL server. LOG: could not create IPv6 socket: Address family not supported by protocol That's merely a harmless log message. - hints.ai_flags = AI_PASSIVE; + hints.ai_flags = AI_PASSIVE | AI_ADDRCONFIG; This, on the other hand, might actively break things. It did when we had it in before (cf the thread you link to and commit df63503dc). I don't have any faith that systems on which it is broken have vanished from the face of the earth. One good reason not to trust this too much is that getaddrinfo() is fundamentally a userspace DNS access function, and as such it has no very good way to know if there's currently an IPv4 or IPv6 interface configured on the local system. At minimum there are obvious race conditions in that. If we're concerned about users worrying about log messages from this, I'd rather see us downgrade those log messages to DEBUG level than risk breaking the code with behaviors that were proven to be a bad idea a decade ago. But TBH I see no strong need to do anything here. regards, tom lane
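For readers following along, here is a minimal, self-contained sketch of the call under discussion — requesting passive (listening) addresses with and without AI_ADDRCONFIG. This assumes a POSIX system; port 5432 is arbitrary and the helper name is made up, so it is not the actual postmaster code:

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/* Counts the passive (listening) addresses getaddrinfo() reports for the
 * given ai_flags.  Port 5432 is arbitrary; the helper name is made up. */
static int count_passive_addrs(int flags)
{
    struct addrinfo hints, *res, *p;
    int n = 0;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;      /* ask for both IPv4 and IPv6 */
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = flags;           /* AI_PASSIVE, optionally | AI_ADDRCONFIG */

    if (getaddrinfo(NULL, "5432", &hints, &res) != 0)
        return -1;
    for (p = res; p != NULL; p = p->ai_next)
        n++;
    freeaddrinfo(res);
    return n;
}

int main(void)
{
    printf("AI_PASSIVE only:            %d address(es)\n",
           count_passive_addrs(AI_PASSIVE));
    printf("AI_PASSIVE | AI_ADDRCONFIG: %d address(es)\n",
           count_passive_addrs(AI_PASSIVE | AI_ADDRCONFIG));
    return 0;
}
```

With only AI_PASSIVE, getaddrinfo() typically returns wildcard entries for both families even when one has no configured interface, and the later socket() call for the unsupported family is what produces the log message; AI_ADDRCONFIG filters those entries up front, subject to the caveats Tom describes.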
Re: [HACKERS] narwhal and PGDLLIMPORT
Amit Kapila amit.kapil...@gmail.com writes: In the function where it is used, it seems to me that it sets DateStyle to ISO if it is not ISO, and the function configure_remote_session() will set it to ISO initially. So basically, even if the value of DateStyle is junk, it will just do what we wanted, i.e. set it to ISO. Is the failure due to the fact that the test is not expecting DateStyle to be ISO, and because of the junk value of this parameter it gets set to ISO? Meh. It might be that the DateStyle usage in postgres_fdw would accidentally fail to malfunction if it saw a bogus value of the variable. But it's hard to believe that this would be true of MainLWLockArray. regards, tom lane
Re: [HACKERS] Row-security on updatable s.b. views
Craig Ringer cr...@2ndquadrant.com writes: I landed up adding a field to RangeTblEntry that keeps track of all the oids of relations row-security expanded to produce this RTE. When testing an RTE for row-security policy, this list is checked to see if the oid of the relation being expanded is already on the list. The list is copied to all RTEs inside sublinks after a relation is expanded using a query_tree_walker. That's impossibly ugly, both as to the idea that an RTE is a mutable data structure for this processing, and as to having to copy the list around (meaning that you can't even claim to have one source of truth for the state...) regards, tom lane
Re: [HACKERS] Row-security on updatable s.b. views
On 02/04/2014 03:14 PM, Tom Lane wrote: Craig Ringer cr...@2ndquadrant.com writes: I landed up adding a field to RangeTblEntry that keeps track of all the oids of relations row-security expanded to produce this RTE. When testing an RTE for row-security policy, this list is checked to see if the oid of the relation being expanded is already on the list. The list is copied to all RTEs inside sublinks after a relation is expanded using a query_tree_walker. That's impossibly ugly, both as to the idea that an RTE is a mutable data structure for this processing, and as to having to copy the list around (meaning that you can't even claim to have one source of truth for the state...) I see. Do you have any suggestion about how this might be approached in a manner that would be more satisfactory? It's not strictly necessary to duplicate the parent list when annotating the RTEs in an expanded row-security qual; I did so only out of concern that an object of indeterminate/shared ownership may not be considered desirable. The only time it's strictly necessary to copy the parent list is when a new row-security qual is being expanded. In this case it cannot be shared, because there may be multiple relations with row-security applied, and a separate path is required for each. If the list is shared, infinite recursion is falsely reported in cases like: rel a has a qual referring to b rel b has a qual referring to c rel c has a qual referring to d select ... from a inner join b on ... There's no infinite recursion there, but a simple approach that keeps a global list of expanded quals will think there is, as it'll see b and c expanded twice. So whenever you expand a qual, you have to fork the parent list, copying it. The approach used in the rewriter won't work here, because the rewriter eagerly depth-first recurses when expanding things. In the optimizer, all securityQuals get expanded in one pass, _then_ sublink processing expands any added sublinks in a separate pass.
It's possible the parent list could be associated with the Query that's produced by securityQual expansion instead, but at the time you're expanding row-security on any RTEs within that you don't necessarily have access to the Query node that might be a couple of subquery levels up, since the security qual can be an arbitrary expression. I looked at keeping a structure in PlannerGlobal that tracked the chain of row-security qual expansion, but I couldn't figure out a way to cleanly tell what branch of the query a given RTE that's being expanded was in, and thus how to walk the global tree to check for infinite recursion. The only alternative I see is to immediately and recursively expand row-security entries within subqueries using a walker, as soon as the initial row-security qual is expanded on the top level rel. I'm concerned about code duplication when following that path, since that's pretty much what's already happening with sublink expansion, just using the existing execution paths. If I'm right, and immediate recursive expansion isn't an acceptable alternative, any ideas on how the row-security expansion breadcrumb trail might be stored in a more appropriate manner and accessed from where they're needed? Note that it's trivial to prevent simple, direct recursion. Preventing mutual recursion without also breaking non-recursive cases where rels are used in totally different parts of the query is harder. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Retain dynamic shared memory segments for postmaster lifetime
On Tue, Feb 4, 2014 at 12:25 PM, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote: Hello, I now have a workable DLL, thanks to your advice. I think both the problems are related and the reason is that dsm_demo.dll is not built properly. Let us first try to solve your second problem, because I think if that is solved, you will not face problem-1. Thank you for your kindness. I understood the situation after successfully getting a correct DLL by using a .def file, following your advice. cl needs __declspec(dllexport) in the symbol definitions to expose them externally when not using a .def file. The PostgreSQL platform seems to offer the macro PGDLLEXPORT for such use. I suppose this should be used in extension module DLLs to expose symbols, like this: - void _PG_init(void); - Datum dsm_demo_create(PG_FUNCTION_ARGS); - Datum dsm_demo_read(PG_FUNCTION_ARGS); === + PGDLLEXPORT void _PG_init(void); + PGDLLEXPORT Datum dsm_demo_create(PG_FUNCTION_ARGS); + PGDLLEXPORT Datum dsm_demo_read(PG_FUNCTION_ARGS); # This hardly seems to be used commonly... Yeah, for functions we mainly rely on exporting via the .def file only, and so is the case for this module. Anyway, this is just a test module, so if things work for you by changing them in the above way, it's fine. However, I wonder why it's not generating a .def file for you. I followed these instructions to set up the build environment: http://blog.2ndquadrant.com/compiling-postgresql-extensions-visual-studio-windows/ And the change above enables us to build this module without a .def file. Okay, you can complete your test in the way in which you are able to successfully build it. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com