Re: [HACKERS] narwhal and PGDLLIMPORT
On Tue, Feb 4, 2014 at 12:40 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Amit Kapila amit.kapil...@gmail.com writes:
>> In the function where it is used, it seems to me that it is setting
>> DateStyle to ISO if it is not ISO, and configure_remote_session() will
>> set it to ISO initially. So basically even if the value of DateStyle
>> is junk, it will just do what we wanted, i.e. set it to ISO. Is the
>> failure because the code is not expecting DateStyle to be ISO, and the
>> junk value of this parameter gets it set to ISO?
>
> Meh. It might be that the DateStyle usage in postgres_fdw would
> accidentally fail to malfunction if it saw a bogus value of the
> variable. But it's hard to believe that this would be true of
> MainLWLockArray.

That's true, but for me it's failing because MainLWLockArray contains a
junk value. The point to look at now is why it passes on some of the
buildfarm machines. I will study this more; if you have anything
specific in mind, do let me know. Can I get details of the machine
environments on which it is passing, i.e. which Windows version and
which MSVC version? It might give some clue.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: Show process IDs of processes holding a lock; show relation and tuple infos of a lock to acquire
Hi,

On 04/02/14 12:38, Fujii Masao wrote:
> ISTM that the phrase "Request queue" is not used much around the lock.
> Using the phrase "wait queue" or Simon's suggestion sounds better to at
> least me. Thought?

Sounds reasonable to me. Attached patch changes messages to the
following:

Process holding the lock: A. Wait queue: B.
Processes holding the lock: A, B. Wait queue: C.

Best regards,

--
 Christian Kruse               http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training Services

diff --git a/doc/src/sgml/sources.sgml b/doc/src/sgml/sources.sgml
index 881b0c3..aa20807 100644
--- a/doc/src/sgml/sources.sgml
+++ b/doc/src/sgml/sources.sgml
@@ -251,6 +251,15 @@ ereport(ERROR,
    </listitem>
    <listitem>
     <para>
+     <function>errdetail_log_plural(const char *fmt_singular, const char
+     *fmt_plural, unsigned long n, ...)</function> is like
+     <function>errdetail_log</>, but with support for various plural forms of
+     the message.
+     For more information see <xref linkend="nls-guidelines">.
+    </para>
+   </listitem>
+   <listitem>
+    <para>
      <function>errhint(const char *msg, ...)</function> supplies an optional
      <quote>hint</> message; this is to be used when offering suggestions
      about how to fix the problem, as opposed to factual details about
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index fb449a8..9620f6e 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -1209,13 +1209,23 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 		 */
 		if (log_lock_waits && deadlock_state != DS_NOT_YET_CHECKED)
 		{
-			StringInfoData buf;
+			StringInfoData buf,
+						lock_waiters_sbuf,
+						lock_holders_sbuf;
 			const char *modename;
 			long		secs;
 			int			usecs;
 			long		msecs;
+			SHM_QUEUE  *procLocks;
+			PROCLOCK   *proclock;
+			bool		first_holder = true,
+						first_waiter = true;
+			int			lockHoldersNum = 0;
 
 			initStringInfo(&buf);
+			initStringInfo(&lock_waiters_sbuf);
+			initStringInfo(&lock_holders_sbuf);
+
 			DescribeLockTag(&buf, &locallock->tag.lock);
 			modename = GetLockmodeName(locallock->tag.lock.locktag_lockmethodid,
									   lockmode);
@@ -1225,10 +1235,67 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 			msecs = secs * 1000 + usecs / 1000;
 			usecs = usecs % 1000;
 
+			/*
+			 * we loop over the lock's procLocks to gather a list of all
+			 * holders and waiters. Thus we will be able to provide more
+			 * detailed information for lock debugging purposes.
+			 *
+			 * lock->procLocks contains all processes which hold or wait for
+			 * this lock.
+			 */
+
+			LWLockAcquire(partitionLock, LW_SHARED);
+
+			procLocks = &(lock->procLocks);
+			proclock = (PROCLOCK *) SHMQueueNext(procLocks, procLocks,
											offsetof(PROCLOCK, lockLink));
+
+			while (proclock)
+			{
+				/*
+				 * we are a waiter if myProc->waitProcLock == proclock; we are
+				 * a holder if it is NULL or something different
+				 */
+				if (proclock->tag.myProc->waitProcLock == proclock)
+				{
+					if (first_waiter)
+					{
+						appendStringInfo(&lock_waiters_sbuf, "%d",
+										 proclock->tag.myProc->pid);
+						first_waiter = false;
+					}
+					else
+						appendStringInfo(&lock_waiters_sbuf, ", %d",
+										 proclock->tag.myProc->pid);
+				}
+				else
+				{
+					if (first_holder)
+					{
+						appendStringInfo(&lock_holders_sbuf, "%d",
+										 proclock->tag.myProc->pid);
+						first_holder = false;
+					}
+					else
+						appendStringInfo(&lock_holders_sbuf, ", %d",
+										 proclock->tag.myProc->pid);
+
+					lockHoldersNum++;
+				}
+
+				proclock = (PROCLOCK *) SHMQueueNext(procLocks, &proclock->lockLink,
											offsetof(PROCLOCK, lockLink));
+			}
+
+			LWLockRelease(partitionLock);
+
 			if (deadlock_state == DS_SOFT_DEADLOCK)
 				ereport(LOG,
 						(errmsg("process %d avoided deadlock for %s on %s by rearranging queue order after %ld.%03d ms",
-								MyProcPid, modename, buf.data, msecs, usecs)));
+								MyProcPid, modename, buf.data, msecs, usecs),
+						 (errdetail_log_plural("Process holding the lock: %s. Wait queue: %s.",
											   "Processes holding the lock: %s. Wait queue: %s.",
											   lockHoldersNum, lock_holders_sbuf.data, lock_waiters_sbuf.data))));
 			else if (deadlock_state == DS_HARD_DEADLOCK)
 			{
 				/*
@@ -1240,13 +1307,19 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 				 */
 				ereport(LOG,
 						(errmsg("process %d detected deadlock while waiting for %s on %s after %ld.%03d ms",
-								MyProcPid, modename, buf.data, msecs, usecs)));
+								MyProcPid, modename, buf.data, msecs, usecs),
+						 (errdetail_log_plural("Process holding the lock: %s. Wait queue: %s.",
											   "Processes holding the lock: %s. Wait queue: %s.",
											   lockHoldersNum, lock_holders_sbuf.data, lock_waiters_sbuf.data))));
 			}
 
 			if (myWaitStatus == STATUS_WAITING)
 				ereport(LOG,
 						(errmsg("process %d still waiting for
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On 2014-02-03 17:51:20 -0800, Peter Geoghegan wrote:
> On Sun, Feb 2, 2014 at 6:00 AM, Andres Freund and...@2ndquadrant.com wrote:
>> On 2014-02-01 19:47:29 -0800, Peter Geoghegan wrote:
>>> Here are the results of a benchmark on Nathan Boley's 64-core, 4
>>> socket server:
>>> http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/amd-4-socket-rwlocks/
>>
>> That's interesting. The maximum number of what you see here (~293125)
>> is markedly lower than what I can get.
>>
>> ... poke around ...
>>
>> Hm, that's partially because you're using pgbench without -M prepared
>> if I see that correctly. The bottleneck in that case is primarily
>> memory allocation. But even after that I am getting higher numbers:
>> ~342497.
>>
>> Trying to nail down the difference, it oddly seems to be your
>> max_connections=80 vs my 100. The profile in both cases is markedly
>> different: way more spinlock contention with 80, all in
>> Pin/UnpinBuffer().
>
> I updated this benchmark, with your BufferDescriptors alignment patch
> [1] applied on top of master (while still not using -M prepared in
> order to keep the numbers comparable). So once again, that's:
> http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/amd-4-socket-rwlocks/
>
> It made a bigger, fairly noticeable difference, but not so big a
> difference as you describe here. Are you sure that you saw this kind
> of difference with only 64 clients, as you mentioned elsewhere [1]
> (perhaps you fat-fingered [1] -- -cj is ambiguous)? Obviously
> max_connections is still 80 in the above. Should I have gone past 64
> clients to see the problem?
>
> The best numbers I see with the [1] patch applied on master is only
> ~327809 for -S 10 64 clients. Perhaps I've misunderstood.

That's likely due to -M prepared. It was with -c 64 -j 64...
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Patch: Show process IDs of processes holding a lock; show relation and tuple infos of a lock to acquire
On 4th February 2014, Christian Kruse wrote: On 04/02/14 12:38, Fujii Masao wrote: ISTM that the phrase "Request queue" is not used much around the lock. Using the phrase "wait queue" or Simon's suggestion sounds better to at least me. Thought? Sounds reasonable to me. Attached patch changes messages to the following: Process holding the lock: A. Wait queue: B. Processes holding the lock: A, B. Wait queue: C. This looks good to me also. Thanks and Regards, Kumar Rajeev Rastogi
Re: [HACKERS] should we add a XLogRecPtr/LSN SQL type?
Hi,

On 2014-02-04 10:23:14 +0900, Michael Paquier wrote:
> On Tue, Feb 4, 2014 at 10:10 AM, Tom Lane t...@sss.pgh.pa.us wrote:
>> Michael Paquier michael.paqu...@gmail.com writes:
>>> Please find attached a patch implementing lsn as a datatype, based on
>>> the one Robert wrote a couple of years ago. Patch contains regression
>>> tests as well as a bit of documentation. Perhaps this is too late for
>>> 9.4, so if there are no objections I'll simply add this patch to the
>>> next commit fest in June for 9.5.
>>
>> I may have lost count, but aren't a bunch of the affected functions
>> new in 9.4? If so, there's a good argument to be made that we should
>> get this in now, rather than waiting and having an API change for
>> those functions in 9.5.
> Yes, that sounds sensible.

+ /*--------------------------------------------------------------
+  * Relational operators for LSNs
+  *--------------------------------------------------------------*/

Isn't it just "operators"? They aren't really relational...

*** 302,307 **** extern struct varlena *pg_detoast_datum_packed(struct varlena * datum);
--- 303,309 ----
  #define PG_RETURN_CHAR(x)    return CharGetDatum(x)
  #define PG_RETURN_BOOL(x)    return BoolGetDatum(x)
  #define PG_RETURN_OID(x)     return ObjectIdGetDatum(x)
+ #define PG_RETURN_LSN(x)     return LogSeqNumGetDatum(x)
  #define PG_RETURN_POINTER(x) return PointerGetDatum(x)
  #define PG_RETURN_CSTRING(x) return CStringGetDatum(x)
  #define PG_RETURN_NAME(x)    return NameGetDatum(x)

*** a/src/include/postgres.h
--- b/src/include/postgres.h
*** 484,489 **** typedef Datum *DatumPtr;
--- 484,503 ----
  #define ObjectIdGetDatum(X) ((Datum) SET_4_BYTES(X))

  /*
+  * DatumGetLogSeqNum
+  *    Returns log sequence number of a datum.
+  */
+ #define DatumGetLogSeqNum(X) ((XLogRecPtr) GET_8_BYTES(X))

I am not a fan of LogSeqNum. I think at this point fewer people
understand that than LSN. There's also no reason to invent a third term
for LSNs. We'd have LSN, XLogRecPtr, and LogSeqNum.

*** a/src/backend/replication/slotfuncs.c
--- b/src/backend/replication/slotfuncs.c
*** 141,148 **** pg_get_replication_slots(PG_FUNCTION_ARGS)
  	bool		active;
  	Oid			database;
  	const char *slot_name;
- 
- 	char		restart_lsn_s[MAXFNAMELEN];
  	int			i;
 
  	SpinLockAcquire(&slot->mutex);
--- 141,146 ----

Unrelated change.

Looks reasonable on a first look. Thanks!

Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] narwhal and PGDLLIMPORT
On 2014-02-04 02:10:47 -0500, Tom Lane wrote:
> Amit Kapila amit.kapil...@gmail.com writes:
>> In the function where it is used, it seems to me that it is setting
>> DateStyle to ISO if it is not ISO, and configure_remote_session() will
>> set it to ISO initially. So basically even if the value of DateStyle
>> is junk, it will just do what we wanted, i.e. set it to ISO. Is the
>> failure because the code is not expecting DateStyle to be ISO, and the
>> junk value of this parameter gets it set to ISO?
>
> Meh. It might be that the DateStyle usage in postgres_fdw would
> accidentally fail to malfunction if it saw a bogus value of the
> variable. But it's hard to believe that this would be true of
> MainLWLockArray.

There's not that much lwlock usage in contrib. It's just
pg_stat_statements and pg_buffercache. Neither has tests... So it very
well could be that breakage simply hasn't been observed.

Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] should we add a XLogRecPtr/LSN SQL type?
On Tue, Feb 4, 2014 at 6:15 PM, Andres Freund and...@2ndquadrant.com wrote:
>> + /*--------------------------------------------------------------
>> +  * Relational operators for LSNs
>> +  *--------------------------------------------------------------*/
> Isn't it just "operators"? They aren't really relational...
"Operators for LSNs"?

>> *** 302,307 **** extern struct varlena *pg_detoast_datum_packed(struct varlena * datum);
>> --- 303,309 ----
>>   #define PG_RETURN_CHAR(x)    return CharGetDatum(x)
>>   #define PG_RETURN_BOOL(x)    return BoolGetDatum(x)
>>   #define PG_RETURN_OID(x)     return ObjectIdGetDatum(x)
>> + #define PG_RETURN_LSN(x)     return LogSeqNumGetDatum(x)
>>   #define PG_RETURN_POINTER(x) return PointerGetDatum(x)
>>   #define PG_RETURN_CSTRING(x) return CStringGetDatum(x)
>>   #define PG_RETURN_NAME(x)    return NameGetDatum(x)
>>
>> *** a/src/include/postgres.h
>> --- b/src/include/postgres.h
>> *** 484,489 **** typedef Datum *DatumPtr;
>> --- 484,503 ----
>>   #define ObjectIdGetDatum(X) ((Datum) SET_4_BYTES(X))
>>
>>   /*
>> +  * DatumGetLogSeqNum
>> +  *    Returns log sequence number of a datum.
>> +  */
>> + #define DatumGetLogSeqNum(X) ((XLogRecPtr) GET_8_BYTES(X))
> I am not a fan of LogSeqNum. I think at this point fewer people
> understand that than LSN. There's also no reason to invent a third
> term for LSNs. We'd have LSN, XLogRecPtr, and LogSeqNum.
So let's go with DatumGetLSN and LSNGetDatum instead...

>> *** a/src/backend/replication/slotfuncs.c
>> --- b/src/backend/replication/slotfuncs.c
>> *** 141,148 **** pg_get_replication_slots(PG_FUNCTION_ARGS)
>>   	bool		active;
>>   	Oid			database;
>>   	const char *slot_name;
>> - 
>> - 	char		restart_lsn_s[MAXFNAMELEN];
>>   	int			i;
>>  
>>   	SpinLockAcquire(&slot->mutex);
>> --- 141,146 ----
> Unrelated change.
Funnily, the patch attached in my previous mail did not include all the
diffs; it is an error with filterdiff, which I use to generate context
diff patches... My original branch includes the following diffs as well
in slotfuncs.c for the second patch:

diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 98a860e..68ecdcd 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -141,8 +141,6 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
 	bool		active;
 	Oid			database;
 	const char *slot_name;
-
-	char		restart_lsn_s[MAXFNAMELEN];
 	int			i;
 
 	SpinLockAcquire(&slot->mutex);
@@ -164,9 +162,6 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
 
 	memset(nulls, 0, sizeof(nulls));
 
-	snprintf(restart_lsn_s, sizeof(restart_lsn_s), "%X/%X",
-			 (uint32) (restart_lsn >> 32), (uint32) restart_lsn);
-
 	i = 0;
 	values[i++] = CStringGetTextDatum(slot_name);
 	if (database == InvalidOid)
@@ -180,7 +175,7 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
 	else
 		nulls[i++] = true;
 	if (restart_lsn != InvalidTransactionId)
-		values[i++] = CStringGetTextDatum(restart_lsn_s);
+		values[i++] = restart_lsn;
 	else
 		nulls[i++] = true;

Anything else?
--
Michael
Re: [HACKERS] should we add a XLogRecPtr/LSN SQL type?
On 2014-02-04 19:17:51 +0900, Michael Paquier wrote:
> On Tue, Feb 4, 2014 at 6:15 PM, Andres Freund and...@2ndquadrant.com wrote:
>>> + /*--------------------------------------------------------------
>>> +  * Relational operators for LSNs
>>> +  *--------------------------------------------------------------*/
>> Isn't it just "operators"? They aren't really relational...
> "Operators for LSNs"?

Fine with me.

>>> + #define DatumGetLogSeqNum(X) ((XLogRecPtr) GET_8_BYTES(X))
>> I am not a fan of LogSeqNum. I think at this point fewer people
>> understand that than LSN. There's also no reason to invent a third
>> term for LSNs. We'd have LSN, XLogRecPtr, and LogSeqNum.
> So let's go with DatumGetLSN and LSNGetDatum instead...

Sup.

>>> *** a/src/backend/replication/slotfuncs.c
>>> --- b/src/backend/replication/slotfuncs.c
>>> *** 141,148 **** pg_get_replication_slots(PG_FUNCTION_ARGS)
>>>   	bool		active;
>>>   	Oid			database;
>>>   	const char *slot_name;
>>> - 
>>> - 	char		restart_lsn_s[MAXFNAMELEN];
>>>   	int			i;
>>>  
>>>   	SpinLockAcquire(&slot->mutex);
>>> --- 141,146 ----
>> Unrelated change.
> Funnily, the patch attached in my previous mail did not include all
> the diffs, it is an error with filterdiff that I use to generate
> context diff patches... My original branch includes the following

Ah, then it makes more sense.

> diffs as well in slotfuncs.c for the second patch:
>
> diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
> index 98a860e..68ecdcd 100644
> --- a/src/backend/replication/slotfuncs.c
> +++ b/src/backend/replication/slotfuncs.c
> @@ -141,8 +141,6 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
>  	bool		active;
>  	Oid			database;
>  	const char *slot_name;
> -
> -	char		restart_lsn_s[MAXFNAMELEN];
>  	int			i;
>  
>  	SpinLockAcquire(&slot->mutex);
> @@ -164,9 +162,6 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
>  
>  	memset(nulls, 0, sizeof(nulls));
>  
> -	snprintf(restart_lsn_s, sizeof(restart_lsn_s), "%X/%X",
> -			 (uint32) (restart_lsn >> 32), (uint32) restart_lsn);
> -
>  	i = 0;
>  	values[i++] = CStringGetTextDatum(slot_name);
>  	if (database == InvalidOid)
> @@ -180,7 +175,7 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
>  	else
>  		nulls[i++] = true;
>  	if (restart_lsn != InvalidTransactionId)
> -		values[i++] = CStringGetTextDatum(restart_lsn_s);
> +		values[i++] = restart_lsn;
>  	else
>  		nulls[i++] = true;

Isn't that missing a LSNGetDatum()?

Also, isn't it lacking the corresponding pg_proc change?

Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] [bug fix] PostgreSQL fails to start on Windows if it crashes after tablespace creation
I'm sorry, I'm replying to an older mail, because I lost your latest mail by mistake. Ah. Sorry, I missed that part. As NTFS junctions and symbolic links are different (although they behave similarly), there seems only a minor inconvenience related to misleading error message i.e. You are right. Fixed. Regards MauMau remove_tblspc_symlink_v4.patch Description: Binary data
Re: [HACKERS] [bug fix] postgres.exe fails to start on Windows Server 2012 due to ASLR
From: Craig Ringer cr...@2ndquadrant.com I completely agree; just saying that any installer can set the key. I'm convinced that setting this flag is appropriate, at least while Pg relies on having the shared memory segment mapped in the same zone in every executable. Just pointing out that there's a workaround in the mean time. Please don't mind, I didn't misunderstand your intent. I think we should apply this in the next minor release to avoid unnecessary confusion -- more new users would use PostgreSQL on Windows 8/2012 and hit this problem. I added this patch to the CommitFest. Regards MauMau
Re: [HACKERS] inherit support for foreign tables
On Sun, Feb 2, 2014 at 10:15 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: Allowing ALTER COLUMN SET STORAGE on foreign tables would make sense if for example, SELECT * INTO local_table FROM foreign_table did create a new local table of columns having the storage types associated with those of a foreign table? Seems like a pretty weak argument. It's not that we can't find strange corner cases where applying SET STORAGE to a foreign table does something; it's that they *are* strange corner cases. The options as we normally understand them just aren't sensible in this context, and a good deal of work has been put into an alternative options framework, which is what authors of FDWs ought to be using. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [bug fix] postgres.exe fails to start on Windows Server 2012 due to ASLR
On 02/04/2014 07:28 PM, MauMau wrote: Please don't mind, I didn't misunderstand your intent. I think we should apply this in the next minor release to avoid unnecessary confusion -- more new users would use PostgreSQL on Windows 8/2012 and hit this problem. I added this patch to the CommitFest. It's really a bugfix suitable for backpatching IMO. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] should we add a XLogRecPtr/LSN SQL type?
On Tue, Feb 4, 2014 at 7:22 PM, Andres Freund and...@2ndquadrant.com wrote:
> On 2014-02-04 19:17:51 +0900, Michael Paquier wrote:
>> @@ -180,7 +175,7 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
>>  	else
>>  		nulls[i++] = true;
>>  	if (restart_lsn != InvalidTransactionId)
>> -		values[i++] = CStringGetTextDatum(restart_lsn_s);
>> +		values[i++] = restart_lsn;
>>  	else
>>  		nulls[i++] = true;
> Isn't that missing a LSNGetDatum()?

Oops yes. Will fix.

> Also, isn't it lacking the corresponding pg_proc change?

restart_lsn is the 6th argument of pg_get_replication_slots, and the
list of arguments of this function is already changed like that in my
patch: {25,25,26,16,28,25} => {25,25,26,16,28,3220}

Regards,
--
Michael
Re: [HACKERS] should we add a XLogRecPtr/LSN SQL type?
On 2014-02-04 21:04:13 +0900, Michael Paquier wrote: On Tue, Feb 4, 2014 at 7:22 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-02-04 19:17:51 +0900, Michael Paquier wrote: @@ -180,7 +175,7 @@ pg_get_replication_slots(PG_FUNCTION_ARGS) else nulls[i++] = true; if (restart_lsn != InvalidTransactionId) - values[i++] = CStringGetTextDatum(restart_lsn_s); + values[i++] = restart_lsn; else nulls[i++] = true; Isn't that missing a LSNGetDatum()? Oops yes. Will fix. Also, isn't it lacking the corresponding pg_proc change? restart_lsn is the 6th argument of pg_get_replication_slots, and the list of arguments of this function is already changed like that in my patch: {25,25,26,16,28,25} = {25,25,26,16,28,3220} Regards, Ok. I think the patch should also adapt pageinspect... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Re: Misaligned BufferDescriptors causing major performance problems on AMD
On 2014-02-04 00:38:19 +0100, Andres Freund wrote: A quick hack (attached) making BufferDescriptor 64byte aligned indeed restored performance across all max_connections settings. It's not surprising that a misaligned buffer descriptor causes problems - there'll be plenty of false sharing of the spinlocks otherwise. Curious that the intel machine isn't hurt much by this. I think that is explained here: http://www.agner.org/optimize/blog/read.php?i=142&v=t With Sandy Bridge, "Misaligned memory operands [are] handled efficiently". No, I don't think so. Those improvements afair refer to unaligned accesses as in accessing a 4 byte variable at address % 4 != 0. So, Christian did some benchmarking on the intel machine, and his results were also lower than mine, and I've since confirmed that it's also possible to reproduce the alignment problems on the intel machine. Which imo means fixing this got more important... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
[HACKERS] specifying repeatable read in PGOPTIONS
Hi,

I recently had the need to bury the used isolation level in the connection string, but it turns out that doesn't work that well...

PGOPTIONS='-c default_transaction_isolation=serializable' \
    psql ... -c "SHOW default_transaction_isolation"

works well enough, but

PGOPTIONS='-c default_transaction_isolation=repeatable read' \
    psql ... -c "SHOW default_transaction_isolation"

doesn't, because of the whitespace. I couldn't come up with any adequate quoting.

I'd like to propose adding aliases with dashes instead of spaces to the isolation_level_options array. I'd even like to backport it, because it makes benchmarking across versions unnecessarily hard.

Additionally we might want to think about a bit better quoting support for such options?

Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] could not create IPv6 socket (AI_ADDRCONFIG)
Tom Lane wrote: Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp writes: Hello, I have often seen inquiries about a log message from the PostgreSQL server. LOG: could not create IPv6 socket: Address family not supported by protocol That's merely a harmless log message. If we're concerned about users worrying about log messages from this, I'd rather see us downgrade those log messages to DEBUG level than risk breaking the code with behaviors that were proven to be a bad idea a decade ago. But TBH I see no strong need to do anything here. How about just adding a HINT? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] could not create IPv6 socket (AI_ADDRCONFIG)
Alvaro Herrera alvhe...@2ndquadrant.com writes: Tom Lane wrote: Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp writes: Hello, I have often seen inquiries about a log message from the PostgreSQL server. LOG: could not create IPv6 socket: Address family not supported by protocol That's merely a harmless log message. How about just adding a HINT? Hmm ... maybe, but how would you phrase the hint exactly? regards, tom lane
Re: [HACKERS] narwhal and PGDLLIMPORT
Andres Freund and...@2ndquadrant.com writes: On 2014-02-04 02:10:47 -0500, Tom Lane wrote: Meh. It might be that the DateStyle usage in postgres_fdw would accidentally fail to malfunction if it saw a bogus value of the variable. But it's hard to believe that this would be true of MainLWLockArray. There's not that much lwlock usage in contrib. It's just pg_stat_statements and pg_buffercache. Neither has tests... So it very well could be that breakage simply hasn't been observed. Hm, you're right --- I'd have thought there were more of those. Ugh. This problem was bad enough when I thought that it would only lead to link-time errors detectable in the buildfarm. If it can lead to errors only observable at runtime --- and maybe not obvious even then --- then I think we *have to* do something about it. By that I mean that we must get rid of the need to manually plaster PGDLLIMPORT on global variables. Anybody with a Windows build environment want to test the #define extern trick? regards, tom lane
Re: [HACKERS] narwhal and PGDLLIMPORT
On 02/04/2014 10:43 AM, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-02-04 02:10:47 -0500, Tom Lane wrote: Meh. It might be that the DateStyle usage in postgres_fdw would accidentally fail to malfunction if it saw a bogus value of the variable. But it's hard to believe that this would be true of MainLWLockArray. There's not that much lwlock usage in contrib. It's just pg_stat_statements and pg_buffercache. Neither has tests... So it very well could be that breakage simply hasn't been observed. Hm, you're right --- I'd have thought there were more of those. Ugh. This problem was bad enough when I thought that it would only lead to link-time errors detectable in the buildfarm. If it can lead to errors only observable at runtime --- and maybe not obvious even then --- then I think we *have to* do something about it. By that I mean that we must get rid of the need to manually plaster PGDLLIMPORT on global variables. Anybody with a Windows build environment want to test the #define extern trick? We have details on how to build with Mingw/Msys on Windows on an Amazon VM http://wiki.postgresql.org/wiki/Building_With_MinGW which is either free or very cheap. Do I need to give instructions on how to do this for MSVC builds too? It's really not terribly hard. cheers andrew
Re: [HACKERS] [doc patch] extra_float_digits and casting from real to numeric
Re: To Tom Lane 2014-01-08 <20140108094017.ga20...@msgid.df7cb.de>
> What about this patch to mention this gotcha more explicitly in the
> documentation?
>
> diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
> new file mode 100644
> index 0386330..968f4a7
> *** a/doc/src/sgml/datatype.sgml
> --- b/doc/src/sgml/datatype.sgml
> *** NUMERIC
> *** 689,694 ****
> --- 689,697 ----
>        <literal>0</literal>, the output is the same on every platform
>        supported by PostgreSQL. Increasing it will produce output that
>        more accurately represents the stored value, but may be unportable.
> +      Casts to other numeric data types and the <literal>to_char</literal>
> +      function are not affected by this setting; it affects only the text
> +      representation.
>       </para>
>      </note>

Anyone for that patch?

Christoph
--
c...@df7cb.de | http://www.df7cb.de/
Re: [HACKERS] specifying repeatable read in PGOPTIONS
Andres Freund and...@2ndquadrant.com writes: PGOPTIONS='-c default_transaction_isolation=serializable' \ psql ... -c SHOW default_transaction_isolation works well enough, but PGOPTIONS='-c default_transaction_isolation=repeatable read' \ psql ... -c SHOW default_transaction_isolation doesn't, because of the whitespace. I couldn't come up with any adequate quoting. I'd like to propose adding aliases with dashes instead of spaces to the isolation_level_options array? I'd even like to backport it, because it makes benchmarking across versions unneccessarily hard. -1. This is not a general solution to the problem. There are other GUCs for which people might want spaces in the value. Additionally we might want to think about a bit better quoting support for such options? Yeah. See pg_split_opts(), which explicitly acknowledges that it'll fall down for space-containing options. Not sure what the most appropriate quoting convention would be there, but I'm sure we can think of something. regards, tom lane
Re: [HACKERS] narwhal and PGDLLIMPORT
On February 4, 2014 5:06:52 PM CET, Andrew Dunstan and...@dunslane.net wrote: On 02/04/2014 10:43 AM, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-02-04 02:10:47 -0500, Tom Lane wrote: Meh. It might be that the DateStyle usage in postgres_fdw would accidentally fail to malfunction if it saw a bogus value of the variable. But it's hard to believe that this would be true of MainLWLockArray. There's not that much lwlock usage in contrib. It's just pg_stat_statements and pg_buffercache. Neither has tests... So it very well could be that breakage simply hasn't been observed. Hm, you're right --- I'd have thought there were more of those. Ugh. This problem was bad enough when I thought that it would only lead to link-time errors detectable in the buildfarm. If it can lead to errors only observable at runtime --- and maybe not obvious even then --- then I think we *have to* do something about it. By that I mean that we must get rid of the need to manually plaster PGDLLIMPORT on global variables. Anybody with a Windows build environment want to test the #define extern trick? We have details on how to build with Mingw/Msys on Windows on an Amazon VM http://wiki.postgresql.org/wiki/Building_With_MinGW which is either free or very cheap. Do I need to give instructions on how to do this for MSVC builds too? It's really not terribly hard. Err. It might not be very hard but it certainly is time consuming. And that for people not caring about Windows. If there were usable, regularly refreshed instances out there, it'd be slightly less bad. But this is still by far the most annoying and intrusive platform to care about. Andres -- Please excuse brevity and formatting - I am writing this on my mobile phone. Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
[HACKERS] nested hstore - large insert crashes server
CentOS Release 6.5 (final)
AMD FX(tm)-8120 Eight-Core
2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Jan 3 21:39:27 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
memory: 8GB

I am testing nested hstore, on a server built with both these patches:
  jsonb-9.patch.gz
  nested-hstore-9.patch.gz

One of the first tries brings down the server (gobbles up too much memory, I think). When I run:

  select version();
  drop table if exists t_1000;
  create table t_1000 (
    id  serial,
    hs  hstore primary key,
    hs2 hstore
  );
  insert into t_1000 (hs,hs2)
  select
    ( '[' || i || ',' || i || ']' )::hstore,
    ( '{' || i || '=>' || i || '}' )::hstore
  from generate_series(1, 1000) as f(i);

I get:

  $ time psql -af nestedhs.sql
  \timing on
  Timing is on.
  select version();
                                  version
  ------------------------------------------------------------------------
   PostgreSQL 9.4devel_nested_hstore_20140204_0814_00d4f2a on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.8.2, 64-bit
  (1 row)

  Time: 1.288 ms
  drop table if exists t_1000;
  DROP TABLE
  Time: 0.808 ms
  create table t_1000 ( id serial, hs hstore primary key, hs2 hstore );
  CREATE TABLE
  Time: 111.397 ms
  insert into t_1000 (hs,hs2) select ( '[' || i || ',' || i || ']' )::hstore, ( '{' || i || '=>' || i || '}' )::hstore from generate_series(1, 1000) as f(i);
  psql:nestedhs.sql:14: connection to server was lost

  real  5m4.780s
  user  0m0.005s
  sys   0m0.009s

logging:

  2014-02-04 10:34:25.376 CET 29133 LOG:  server process (PID 29459) was terminated by signal 9: Killed
  2014-02-04 10:34:25.854 CET 29133 DETAIL:  Failed process was running: insert into t_1000 (hs,hs2) select ( '[' || i || ',' || i || ']' )::hstore, ( '{' || i || '=>' || i || '}' )::hstore from generate_series(1, 1000) as f(i);
  2014-02-04 10:34:25.884 CET 29133 LOG:  terminating any other active server processes
  2014-02-04 10:34:28.541 CET 29133 LOG:  all server processes terminated; reinitializing
  2014-02-04 10:34:30.002 CET 29534 LOG:  database system was interrupted; last known up at 2014-02-04 10:30:42 CET
  2014-02-04 10:34:30.933 CET 29535 FATAL:  the database system is in recovery mode
  2014-02-04 10:34:31.150 CET 29534 LOG:  database system was not properly shut down; automatic recovery in progress
  2014-02-04 10:34:31.344 CET 29534 LOG:  redo starts at 1/B1CC92F8
  2014-02-04 10:34:46.681 CET 29534 LOG:  unexpected pageaddr 1/86F4A000 in log segment 0000000100000001000000CC, offset 16031744
  2014-02-04 10:34:46.681 CET 29534 LOG:  redo done at 1/CCF49F50
  2014-02-04 10:34:52.039 CET 29133 LOG:  database system is ready to accept connections

(and btw, I end up with a large but unusable table:

  testdb=# \dt+ t_1000
                       List of relations
   Schema |  Name  | Type  |  Owner   |  Size  | Description
  --------+--------+-------+----------+--------+-------------
   public | t_1000 | table | aardvark | 291 MB |
  (1 row)

  testdb=# select count(*) from t_1000;
   count
  -------
       0
  (1 row)
)

Main .conf settings:

           setting          |               current_setting
  --------------------------+-----------------------------------------------
   autovacuum               | off
   port                     | 6541
   shared_buffers           | 512MB
   effective_cache_size     | 2GB
   work_mem                 | 50MB
   maintenance_work_mem     | 1GB
   checkpoint_segments      | 20
   server_version           | 9.4devel_nested_hstore_20140204_0814_00d4f2a
   pg_postmaster_start_time | 2014-02-04 10:12 (uptime: 0d 0h 33m 42s)
   data_checksums           | off

   feature_id |    feature_name    | is_supported | is_verified_by | comments
  ------------+--------------------+--------------+----------------+----------
   PKG100     | project name       | YES          | ej             | nested_hstore
   PKG101     | patched            | YES          | ej             | YES
   PKG102     | patch file         | YES          | ej             | /home/aardvark/download/pgpatches/0094/nested_hstore/20140130/jsonb-9.patch +
              |                    |              |                | /home/aardvark/download/pgpatches/0094/nested_hstore/20140130/nested-hstore-9.patch
   PKG103     | build time         | YES          | ej             | 2014-02-04 08:19:13.600371+01
   PKG104     | server_version     | YES          | ej             | 9.4devel_nested_hstore_20140204_0814_00d4f2a
   PKG105     | server_version_num | YES          | ej             | 90400
   PKG106     | port               | YES          | ej             | 6541
   PKG110     | commit hash        | YES          | ej
Re: [HACKERS] specifying repeatable read in PGOPTIONS
On 2014-02-04 11:36:22 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes:

  PGOPTIONS='-c default_transaction_isolation=serializable' \
    psql ... -c "SHOW default_transaction_isolation"

works well enough, but

  PGOPTIONS='-c default_transaction_isolation=repeatable read' \
    psql ... -c "SHOW default_transaction_isolation"

doesn't, because of the whitespace. I couldn't come up with any adequate quoting. I'd like to propose adding aliases with dashes instead of spaces to the isolation_level_options array? I'd even like to backport it, because it makes benchmarking across versions unnecessarily hard. -1. This is not a general solution to the problem. There are other GUCs for which people might want spaces in the value. Sure, I didn't say it was. But I don't see any other values that are likely being passed via PGOPTIONS that frequently contain spaces. Sure, you can generate a search_path that does so, but that's just asking for problems. Most other GUCs that can contain spaces are PGC_SIGHUP/POSTMASTER. And having to use quoting just makes it awkward to use from shell. Since all the other option values take care not to force using spaces, I see little reason not to do so here as well. Additionally we might want to think about a bit better quoting support for such options? Yeah. See pg_split_opts(), which explicitly acknowledges that it'll fall down for space-containing options. Not sure what the most appropriate quoting convention would be there, but I'm sure we can think of something. No argument against introducing it. What about simply allowing escaping of the next character using \? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] extension_control_path
On Jan 30, 2014, at 10:06 AM, Sergey Muraviov sergey.k.murav...@gmail.com wrote: Now it looks fine for me. Just as another data point, I recently submitted pgTAP to the Homebrew project. This is the build-from-source system for OS X, used by a lot of web developers. In my build script, I originally had

  depends_on :postgresql

Which means, “require any version of PostgreSQL.” But then tests failed on OS X Server, which includes a system-distributed PostgreSQL. Homebrew installs everything in /usr/local, and not only does it disallow installing anything outside of that directory, it doesn’t have any permissions to do so. The install failed, of course, because extensions want to install in $PGROOT/share/extensions. For now, I had to change it to

  depends_on 'postgresql'

A subtle difference that means, “require the latest version of the Homebrew-built PostgreSQL in /usr/local.” However, if extension_control_path were supported, I could change it back to requiring any Postgres and install pgTAP somewhere under /usr/local, as required for Homebrew. Then all the user would have to do to use it with their preferred Postgres would be to set extension_control_path. In other words, I am strongly in favor of this patch, as it gives distribution systems a lot more flexibility (for better and for worse) in determining where extensions should be installed. My $0.02. Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
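For illustration, with such a patch applied the Homebrew scenario might reduce to a single postgresql.conf entry along these lines. The GUC name comes from the patch under review, but the value syntax and search semantics shown here are assumptions, not the patch's confirmed behavior:

```
# hypothetical postgresql.conf entry, assuming the extension_control_path
# patch is applied; the path shown and the lookup semantics are assumptions
extension_control_path = '/usr/local/share/postgresql/extension'
```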
Re: [HACKERS] specifying repeatable read in PGOPTIONS
Andres Freund and...@2ndquadrant.com writes: On 2014-02-04 11:36:22 -0500, Tom Lane wrote: -1. This is not a general solution to the problem. There are other GUCs for which people might want spaces in the value. Sure, I didn't say it was. But I don't see any oother values that are likely being passed via PGOPTIONS that frequently contain spaces. application_name --- weren't we just reading about people passing entire command lines there? (They must be using some other way of setting it currently, but PGOPTIONS doesn't seem like an implausible source.) Yeah. See pg_split_opts(), which explicitly acknowledges that it'll fall down for space-containing options. Not sure what the most appropriate quoting convention would be there, but I'm sure we can think of something. No argument against introducing it. What about simply allowing escaping of the next character using \? The same thought had occurred to me. Since it'll typically already be inside some levels of quoting, any quoted-string convention seems like it'd be a pain to use. But a straight backslash-escapes-the-next-char thing wouldn't be too awful, I think. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] narwhal and PGDLLIMPORT
On 02/04/2014 11:30 AM, Andres Freund wrote: We have details on how to build with Mingw/Msys on Windows on an Amazon VM http://wiki.postgresql.org/wiki/Building_With_MinGW which is either free or very cheap. Do I need to give instructions on how to do this for MSVC builds too? It's really not terribly hard. Err. It might not be very hard but it certainly is time consuming. And that for people not caring about Windows. If there were usable, regularly refreshed instances out there, it'd be slightly less bad. But this is still by far the most annoying and intrusive platform to care about. If someone volunteered to pay for the storage, I'd be prepared to make some time to create an AMI to reduce the startup time dramatically. Basically it would be: boot the AMI and start testing your patches. I'd even make it as friendly as possible for people who don't like to get too far from unix-ish environments. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] narwhal and PGDLLIMPORT
Andrew Dunstan and...@dunslane.net writes: If someone volunteered to pay for the storage, I'd be prepared to make some time to create an AMI to reduce the startup time dramatically. Basically it would be boot the AMI and start testing your patches. I'd even make it as friendly as possible for people who don't like to get too far from unix-ish environments. My own opinion is that I've already wasted untold man-hours thanks to the random porting problems induced by Windows, a platform that I never have and never will care about personally. I will *not* spend my own time doing tests that someone else could do. If we can't get some effort contributed by someone who does use that platform, I'm personally prepared to declare the entire damn thing no longer supported. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] narwhal and PGDLLIMPORT
On 02/04/2014 09:34 AM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: If someone volunteered to pay for the storage, I'd be prepared to make some time to create an AMI to reduce the startup time dramatically. Basically it would be boot the AMI and start testing your patches. I'd even make it as friendly as possible for people who don't like to get too far from unix-ish environments. My own opinion is that I've already wasted untold man-hours thanks to the random porting problems induced by Windows, a platform that I never have and never will care about personally. I will *not* spend my own time doing tests that someone else could do. If we can't get some effort contributed by someone who does use that platform, I'm personally prepared to declare the entire damn thing no longer supported. Although that is obviously your prerogative, it is important to remember that Windows is easily the second most used version of PostgreSQL out there (behind Linux). I know many people that run it on Windows, especially in the medical field. I also know many people that embed it (Apple notwithstanding). Yes it is an obnoxious platform but it is THE platform, no matter how much we like to tout our Linux(Hiker) creds. Sincerely, JD -- Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc For my dreams of your image that blossoms a rose in the deeps of my heart. - W.B. Yeats -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] narwhal and PGDLLIMPORT
On Tue, Feb 4, 2014 at 8:06 AM, Andrew Dunstan and...@dunslane.net wrote: On 02/04/2014 10:43 AM, Tom Lane wrote: Ugh. This problem was bad enough when I thought that it would only lead to link-time errors detectable in the buildfarm. If it can lead to errors only observable at runtime --- and maybe not obvious even then --- then I think we *have to* do something about it. By that I mean that we must get rid of the need to manually plaster PGDLLIMPORT on global variables. Anybody with a Windows build environment want to test the #define extern trick? We have details on how to build with Mingw/Msys on Windows on an Amazon VM http://wiki.postgresql.org/wiki/Building_With_MinGW which is either free or very cheap. Do I need to give instructions on how to do this for MSVC builds too? It's really not terribly hard. If you gave step-by-step instructions like the ones for MinGW, I would at least give it a try. Last time I looked into it, I gave up after I couldn't figure out which of the umpteen similarly-named products was the one I needed to buy/download-for-free/register-and-download and then install, and I tried a few of them at random without much success. Cheers, Jeff
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Tue, Feb 4, 2014 at 12:39 PM, Amit Kapila amit.kapil...@gmail.com wrote: Now there is approximately 1.4~5% CPU gain for "hundred tiny fields, half nulled" case I don't want to advocate too strongly for this patch because, number one, Amit is a colleague and, more importantly, number two, I can't claim to be an expert on compression. But that having been said, I think these numbers are starting to look awfully good. The only remaining regressions are in the cases where a large fraction of the tuple turns over, and they're not that big even then. The two *worst* tests now seem to be "hundred tiny fields, all changed" and "hundred tiny fields, half changed". For the all-changed case, the median unpatched time is 16.3172590732574 and the median patched time is 16.9294109344482, a 4% loss; for the half-changed case, the median unpatched time is 16.5795118808746 and the median patched time is 17.0454230308533, a 3% loss. Both cases show minimal change in WAL volume. Meanwhile, in friendlier cases, like "one short and one long field, no change", we're seeing big improvements. That particular case shows a speedup of 21% and a WAL reduction of 36%. That's a pretty big deal, and I think not unrepresentative of many real-world workloads. Some might well do better, having either more or longer unchanged fields. Assuming that the logic isn't buggy, a point in need of further study, I'm starting to feel like we want to have this. And I might even be tempted to remove the table-level off switch. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] nested hstore - large insert crashes server
Hi, On 04/02/14 17:41, Erik Rijkers wrote: 2014-02-04 10:34:25.376 CET 29133 LOG: server process (PID 29459) was terminated by signal 9: Killed Did you check if this was the OOM killer? Should be logged in dmesg. Best regards, -- Christian Kruse http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] nested hstore - large insert crashes server
On Tue, February 4, 2014 18:56, Christian Kruse wrote: Hi, On 04/02/14 17:41, Erik Rijkers wrote: 2014-02-04 10:34:25.376 CET 29133 LOG: server process (PID 29459) was terminated by signal 9: Killed Did you check if this was the OOM killer? Should be logged in dmesg. I would be surprised if it wasn't. (no access to that machine at the moment) How do we regard such crashes? It seems to me this was rather easily 'provoked' (for want of a better word). I am inclined to blame the patch... -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] narwhal and PGDLLIMPORT
Joshua D. Drake j...@commandprompt.com writes: On 02/04/2014 09:34 AM, Tom Lane wrote: My own opinion is that I've already wasted untold man-hours thanks to the random porting problems induced by Windows, a platform that I never have and never will care about personally. I will *not* spend my own time doing tests that someone else could do. If we can't get some effort contributed by someone who does use that platform, I'm personally prepared to declare the entire damn thing no longer supported. Although that is obviously your prerogative it is important to remember that Windows is easily the second most used version of PostgreSQL out there (behind Linux). [ shrug... ] If it's so widely used, why is it so hard to find somebody who's willing to put in some development effort for it? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Tue, Feb 4, 2014 at 01:28:38PM -0500, Robert Haas wrote: Meanwhile, in friendlier cases, like one short and one long field, no change, we're seeing big improvements. That particular case shows a speedup of 21% and a WAL reduction of 36%. That's a pretty big deal, and I think not unrepresentative of many real-world workloads. Some might well do better, having either more or longer unchanged fields. Assuming that the logic isn't buggy, a point in need of further study, I'm starting to feel like we want to have this. And I might even be tempted to remove the table-level off switch. Does this feature relate to compression of WAL page images at all? -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On 2014-02-04 14:09:57 -0500, Bruce Momjian wrote: On Tue, Feb 4, 2014 at 01:28:38PM -0500, Robert Haas wrote: Meanwhile, in friendlier cases, like one short and one long field, no change, we're seeing big improvements. That particular case shows a speedup of 21% and a WAL reduction of 36%. That's a pretty big deal, and I think not unrepresentative of many real-world workloads. Some might well do better, having either more or longer unchanged fields. Assuming that the logic isn't buggy, a point in need of further study, I'm starting to feel like we want to have this. And I might even be tempted to remove the table-level off switch. Does this feature relate to compression of WAL page images at all? No. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PQputCopyData dont signal error
On Thu, 2011-04-14 at 10:50 +0300, Heikki Linnakangas wrote: On 14.04.2011 10:15, Pavel Stehule wrote: Hello I have a problem with PQputCopyData function. It doesn't signal some error.

    while ((row = mysql_fetch_row(res)) != NULL)
    {
        snprintf(buffer, sizeof(buffer), "%s%s\n", row[0], row[1]);
        copy_result = PQputCopyData(pconn, buffer, strlen(buffer));
        printf("%s\n", PQerrorMessage(pconn));
        printf("%d\n", copy_result);
        if (copy_result != 1)
        {
            fprintf(stderr, "Copy to target table failed: %s", PQerrorMessage(pconn));
            EXIT;
        }
    }

it returns 1 for broken values too :( Is some special check necessary? The way COPY works is that PQputCopyData just sends the data to the server, and the server will buffer it in its internal buffer and processes it when it feels like it. The PQputCopyData() calls don't even need to match line boundaries. I think you'll need to send all the data and finish the COPY until you get an error. If you have a lot of data to send, you might want to slice it into multiple COPY statements of, say, 50MB each, so that you can catch errors in between. [ replying to old thread ] According to the protocol docs[1]: In the event of a backend-detected error during copy-in mode (including receipt of a CopyFail message), the backend will issue an ErrorResponse message. If the COPY command was issued via an extended-query message, the backend will now discard frontend messages until a Sync message is received, then it will issue ReadyForQuery and return to normal processing. If the COPY command was issued in a simple Query message, the rest of that message is discarded and ReadyForQuery is issued. In either case, any subsequent CopyData, CopyDone, or CopyFail messages issued by the frontend will simply be dropped. If the remaining CopyData messages are dropped, I don't see why PQputCopyData can't return some kind of error indicating that further CopyData messages are useless so that it can stop sending them.
Asking the client to break the copy into multiple COPY commands is bad, because then the client needs to figure out the line breaks, which is a burden in many cases. Certainly we don't want to *guarantee* that the backend will issue an error at any particular point, because of the buffering on the server side. But from a practical standpoint, the server will let the client know fairly quickly and it will avoid a lot of client-side work and network traffic. Would a change to PQputCopyData be welcome? Regards, Jeff Davis [1] http://www.postgresql.org/docs/9.3/static/protocol-flow.html -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
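For reference, the pattern Heikki describes — push all the data, then finish the COPY and inspect the final result — looks roughly like this with today's libpq. This is a sketch only: it needs a live connection and -lpq to run, and the table name and generated rows are placeholders, not anything from the thread.

```c
#include <stdio.h>
#include <string.h>
#include <libpq-fe.h>

/* Send generated rows with COPY FROM STDIN; server-side data errors only
 * become visible after PQputCopyEnd(), via the final command result. */
static int
copy_rows(PGconn *conn)
{
    PGresult *res;
    int ok = 1;

    res = PQexec(conn, "COPY target FROM STDIN");
    if (PQresultStatus(res) != PGRES_COPY_IN)
    {
        fprintf(stderr, "COPY failed to start: %s", PQerrorMessage(conn));
        PQclear(res);
        return 0;
    }
    PQclear(res);

    for (int i = 0; i < 1000; i++)
    {
        char buf[64];
        int len = snprintf(buf, sizeof(buf), "%d\t%d\n", i, i * 2);

        /* Returns -1 only on connection-level trouble, not on bad data. */
        if (PQputCopyData(conn, buf, len) != 1)
        {
            ok = 0;
            break;
        }
    }

    /* Finish the COPY; a backend-detected error surfaces here. */
    if (PQputCopyEnd(conn, NULL) != 1)
        ok = 0;

    while ((res = PQgetResult(conn)) != NULL)
    {
        if (PQresultStatus(res) != PGRES_COMMAND_OK)
        {
            fprintf(stderr, "COPY failed: %s", PQerrorMessage(conn));
            ok = 0;
        }
        PQclear(res);
    }
    return ok;
}
```

The proposal in this thread is about letting the loop learn of that failure before PQputCopyEnd(), so the client can stop generating and sending data early.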
Re: [HACKERS] nested hstore - large insert crashes server
Erik Rijkers e...@xs4all.nl writes: On Tue, February 4, 2014 18:56, Christian Kruse wrote: Did you check if this was the OOM killer? Should be logged in dmesg. I would be surprised if it wasn't. (no access to that machine at the moment) How do we regard such crashes? It seems to me this was rather eaasily 'provoked' (for want of a better word). Well, it suggests that there may be a memory leak, which would be a bug even though we'd assign it lower priority than a true crash. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] narwhal and PGDLLIMPORT
On 02/04/2014 10:53 AM, Tom Lane wrote: Joshua D. Drake j...@commandprompt.com writes: On 02/04/2014 09:34 AM, Tom Lane wrote: My own opinion is that I've already wasted untold man-hours thanks to the random porting problems induced by Windows, a platform that I never have and never will care about personally. I will *not* spend my own time doing tests that someone else could do. If we can't get some effort contributed by someone who does use that platform, I'm personally prepared to declare the entire damn thing no longer supported. Although that is obviously your prerogative it is important to remember that Windows is easily the second most used version of PostgreSQL out there (behind Linux). [ shrug... ] If it's so widely used, why is it so hard to find somebody who's willing to put in some development effort for it? Well, from what I have seen of the Windows world there is a fairly sharp demarcation between users and developers. So use does not necessarily mean developer knowledge. regards, tom lane -- Adrian Klaver adrian.kla...@gmail.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] narwhal and PGDLLIMPORT
On 02/04/2014 01:53 PM, Tom Lane wrote: Joshua D. Drake j...@commandprompt.com writes: On 02/04/2014 09:34 AM, Tom Lane wrote: My own opinion is that I've already wasted untold man-hours thanks to the random porting problems induced by Windows, a platform that I never have and never will care about personally. I will *not* spend my own time doing tests that someone else could do. If we can't get some effort contributed by someone who does use that platform, I'm personally prepared to declare the entire damn thing no longer supported. Although that is obviously your prerogative it is important to remember that Windows is easily the second most used version of PostgreSQL out there (behind Linux). [ shrug... ] If it's so widely used, why is it so hard to find somebody who's willing to put in some development effort for it? I suspect that the open hostility towards it doesn't help much. I do put in quite a lot of effort supporting it one way and another, committing patches and running buildfarm members among other things. And we have several others who put in significant amounts of effort. And no, it's not my platform of choice either. Anyway, I will explore my suggestion upthread of a pre-prepared Windows development AMI, when I have time. Right now between work for actual money and things like jsonb I am slammed. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] nested hstore - large insert crashes server
* Erik Rijkers (e...@xs4all.nl) wrote: On Tue, February 4, 2014 18:56, Christian Kruse wrote: On 04/02/14 17:41, Erik Rijkers wrote: 2014-02-04 10:34:25.376 CET 29133 LOG: server process (PID 29459) was terminated by signal 9: Killed Did you check if this was the OOM killer? Should be logged in dmesg. I would be surprised if it wasn't. (no access to that machine at the moment) How do we regard such crashes? It seems to me this was rather eaasily 'provoked' (for want of a better word). I am inclined to blame the patch... It sounds like there is at least some investigation which should happen here to see why we're using so much memory (well beyond work_mem and even maint_work_mem it sounds like), but it would also be good to have the machine reconfigured to not allow OOM killing. Thanks, Stephen
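On the reconfiguration point: the usual Linux recipe (documented in the "Linux Memory Overcommit" section of the PostgreSQL docs) is to confirm the OOM killer in the kernel log and then disable overcommit, so the backend gets an ENOMEM it can report instead of a SIGKILL:

```
# confirm it was the OOM killer:
#   dmesg | grep -i -E 'out of memory|killed process'

# /etc/sysctl.conf (apply with: sysctl -p)
vm.overcommit_memory = 2
```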
Re: [HACKERS] PQputCopyData dont signal error
Jeff Davis pg...@j-davis.com writes: On Thu, 2011-04-14 at 10:50 +0300, Heikki Linnakangas wrote: I think you'll need to send all the data and finish the COPY until you get an error. If you have a lot of data to send, you might want to slice it into multiple COPY statements of say 50MB each, so that you can catch errors in between. If the remaining CopyData messages are dropped, I don't see why PQputCopyData can't return some kind of error indicating that further CopyData messages are useless so that it can stop sending them. An error from PQputCopyData indicates a connection-level problem, typically; that is it means that it couldn't send data not that the server had detected a problem with some earlier data. I'd not be surprised if applications would respond to such an error indication by just pre-emptively dropping the connection. (A quick trawl through our own code shows a precedent in pg_restore, in fact.) So the idea of having PQputCopyData start failing as soon as an error has arrived from the server doesn't sound that attractive. What'd be safer is to provide a way of detecting whether an error has arrived (without actually consuming it, of course) so that the case could be handled by adding something like if (PQerrorIsPending(conn)) break; to the send-data loop. This would allow the application code to know what is happening. It would also not impose the penalty of checking for errors on apps that prefer to optimize for the no-error case. A different approach would be for PQputCopyData to just internally suppress sending of further CopyData messages, *without* returning an error; this would be transparent and it would save the network traffic at least. However, my reading of the original example is that the OP was at least as interested in suppressing the creation of further data as in suppressing the sending, so this wouldn't really solve his problem completely. 
And it's also optimizing for the error case not the no-error case, in a way that doesn't give the app any say in the matter. So I'm not for this particularly, just mentioning it for completeness. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition
Hi, I'm doing some benchmarks regarding this problem: one set with baseline and one set with your patch. Machine was a 32-core machine (4 CPUs with 8 cores), 252 GiB RAM. Both versions have the type align patch applied. pgbench-tools config:

  SCALES=100
  SETCLIENTS=1 4 8 16 32 48 64 96 128
  SETTIMES=2

I added -M prepared to the pgbench call in the benchwarmer script. The read-only tests are finished, I come to similar results as yours: http://wwwtech.de/pg/benchmarks-lwlock-read-only/ I think the small differences are caused by the fact that I use TCP connections and not Unix domain sockets. The results are pretty impressive… I will post the read-write results as soon as they are finished. Best regards, -- Christian Kruse http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
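For context, each data point here boils down to a pgbench run roughly like the following. The -M prepared and read-only (-S, select-only) parts match the setup described above, while the client count, duration, and database name are placeholders, not benchwarmer's exact invocation:

```shell
# one read-only data point at scale 100; -c/-j/-T values are placeholders
pgbench -M prepared -S -c 64 -j 64 -T 60 pgbench
```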
Re: [HACKERS] Wait free LW_SHARED acquisition
On Tue, Feb 4, 2014 at 11:39 AM, Christian Kruse christ...@2ndquadrant.com wrote: I'm doing some benchmarks regarding this problem: one set with baseline and one set with your patch. Machine was a 32 core machine (4 CPUs with 8 cores), 252 gib RAM. Both versions have the type align patch applied. It certainly seems as if the interesting cases are where clients > cores. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition
On Tue, Feb 4, 2014 at 11:39 AM, Christian Kruse christ...@2ndquadrant.com wrote: I added -M prepared to the pgbench call in the benchwarmer script. The read-only tests are finished, I come to similiar results as yours: http://wwwtech.de/pg/benchmarks-lwlock-read-only/ Note that Christian ran this test with max_connections=201, presumably to exercise the alignment problem. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition
On 2014-02-04 11:48:14 -0800, Peter Geoghegan wrote: On Tue, Feb 4, 2014 at 11:39 AM, Christian Kruse christ...@2ndquadrant.com wrote: I added -M prepared to the pgbench call in the benchwarmer script. The read-only tests are finished, I come to similiar results as yours: http://wwwtech.de/pg/benchmarks-lwlock-read-only/ Note that Christian ran this test with max_connections=201, presumably to exercise the alignment problem. I think he has applied the patch to hack around the alignment issue I pushed to git for both branches. It's not nice enough to be applied yet, but it should fix the issue. I think the 201 is just a remembrance of debugging the issue. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition
On Tue, Feb 4, 2014 at 11:50 AM, Andres Freund and...@2ndquadrant.com wrote: I think he has applied the patch to hack around the alignment issue I pushed to git for both branches. It's not nice enough to be applied yet, but it should fix the issue. I think the 201 is just a leftover from debugging the issue. I guess that given that *both* cases tested had the patch applied, that makes sense. However, I would have liked to see a real master baseline. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] narwhal and PGDLLIMPORT
On 02/04/2014 11:17 AM, Andrew Dunstan wrote: prepared to declare the entire damn thing no longer supported. Although that is obviously your prerogative it is important to remember that Windows is easily the second most used version of PostgreSQL out there (behind Linux). [ shrug... ] If it's so widely used, why is it so hard to find somebody who's willing to put in some development effort for it? Because Windows developers get paid and have better things to do than interact with a bunch of Open Source people and their opinions. Let's be honest, all of us here are not the easiest to work with and we are here largely due to an ideology. Why would a Windows developer bother? I know lots of Windows Developers and most of them don't follow anywhere near the ideology we do. I know very few (read 0) Windows developers that do it for fun (I am not saying they don't have fun. I am saying they don't do it for fun). They do it for their career. Heck, look back through the years at our archives. Nobody really thought that we would somehow bring windows developers into the fold if we ported to that platform. It was a market share play and it worked. However, now we are paying the price for it. To be perfectly frank, the first thing I do when a customer calls and says we are on Windows is tell them I will spend our relationship pushing them to move PostgreSQL to Linux. It works over time for the most part but not on all. Sincerely, Joshua D. Drake -- Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc For my dreams of your image that blossoms a rose in the deeps of my heart. - W.B. Yeats -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition
On Tue, Feb 4, 2014 at 11:39 AM, Christian Kruse christ...@2ndquadrant.com wrote: I'm doing some benchmarks regarding this problem: one set with baseline and one set with your patch. Machine was a 32 core machine (4 CPUs with 8 cores), 252 GiB RAM. What CPU model? Can you post /proc/cpuinfo? The distinction between logical and physical cores matters here. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition
On February 4, 2014 8:53:36 PM CET, Peter Geoghegan p...@heroku.com wrote: On Tue, Feb 4, 2014 at 11:50 AM, Andres Freund and...@2ndquadrant.com wrote: I think he has applied the patch to hack around the alignment issue I pushed to git for both branches. It's not nice enough to be applied yet, but it should fix the issue. I think the 201 is just a leftover from debugging the issue. I guess that given that *both* cases tested had the patch applied, that makes sense. However, I would have liked to see a real master baseline. Christian, could you rerun with master (the commit the branch is based on), the alignment patch, and then the lwlock patch? Best with max_connections 200. That's probably more important than the write tests as a first step. Thanks, Andres -- Please excuse brevity and formatting - I am writing this on my mobile phone. Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] narwhal and PGDLLIMPORT
On Tue, Feb 4, 2014 at 9:26 AM, Andrew Dunstan and...@dunslane.net wrote: On 02/04/2014 11:30 AM, Andres Freund wrote: We have details on how to build with Mingw/Msys on Windows on an Amazon VM http://wiki.postgresql.org/wiki/Building_With_MinGW which is either free or very cheap. Do I need to give instructions on how to do this for MSVC builds too? It's really not terribly hard. Err. It might not be very hard but it certainly is time consuming. And that's for people who don't care about Windows. If there were usable, regularly refreshed instances out there, it'd be slightly less bad. But this is still by far the most annoying and intrusive platform to care about. If someone volunteered to pay for the storage, I'd be prepared to make some time to create an AMI to reduce the startup time dramatically. Basically it would be boot the AMI and start testing your patches. I'd even make it as friendly as possible for people who don't like to get too far from unix-ish environments. Do you know about what it would cost? Could official community funds be used for it (it seems like something that is cheap, but which you wouldn't want to be forgotten about some month.) Having an AMI would help, but even with an AMI in place, MinGW is still insanely slow. Running make on already made PostgreSQL (so there was nothing to actually do) takes 1.5 minutes. And a make after a make clean takes half an hour. This is on an actual desktop, not an AWS micro instance. So doing a git bisect is just painful. Is the MSVC build faster? Cheers, Jeff
Re: [HACKERS] Wait free LW_SHARED acquisition
Hi, On 04/02/14 12:02, Peter Geoghegan wrote: On Tue, Feb 4, 2014 at 11:39 AM, Christian Kruse christ...@2ndquadrant.com wrote: I'm doing some benchmarks regarding this problem: one set with baseline and one set with your patch. Machine was a 32 core machine (4 CPUs with 8 cores), 252 GiB RAM. What CPU model? Can you post /proc/cpuinfo? The distinction between logical and physical cores matters here. model name : Intel(R) Xeon(R) CPU E5-4620 0 @ 2.20GHz 32 physical cores, 64 logical cores. /proc/cpuinfo is attached. Best regards, -- Christian Kruse http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services pgpZpWnKfQtb4.pgp Description: PGP signature
Re: [HACKERS] narwhal and PGDLLIMPORT
On Tue, Feb 4, 2014 at 12:34 PM, Tom Lane t...@sss.pgh.pa.us wrote: Andrew Dunstan and...@dunslane.net writes: If someone volunteered to pay for the storage, I'd be prepared to make some time to create an AMI to reduce the startup time dramatically. Basically it would be boot the AMI and start testing your patches. I'd even make it as friendly as possible for people who don't like to get too far from unix-ish environments. My own opinion is that I've already wasted untold man-hours thanks to the random porting problems induced by Windows, a platform that I never have and never will care about personally. I will *not* spend my own time doing tests that someone else could do. If we can't get some effort contributed by someone who does use that platform, I'm personally prepared to declare the entire damn thing no longer supported. You know, I would really prefer to just stick a PGDLLIMPORT on this place and any others that need it, and any others that come up, than turn this into a political football. Having to sprinkle PGDLLIMPORT on the handful of variables that are accessed by contrib modules is, indeed, annoying. But it's not any more annoying than twelve other things that I have to do as a committer, like remembering to bump whichever of catversion, the xlog page magic, and the control data version are relevant to a particular commit, knowing which particular SGML files have to build standalone, and enforcing the project's code formatting, comment, and documentation conventions on everyone who submits a patch. So acting as if this particular annoyance is somehow unique or a good reason to desupport what is despite all of our misgivings one of our most popular platforms doesn't impress me one bit. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] should we add a XLogRecPtr/LSN SQL type?
Perhaps this type should be called pglsn, since it's an implementation-specific detail and not a universal concept like int, point, or uuid. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
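[Editorial note: whatever the type ends up being named, the textual form it would parse and emit is the usual rendering of a 64-bit XLogRecPtr as two 32-bit hex halves, e.g. 16/B374D848. A minimal Python sketch of that conversion; the helper names here are made up for the illustration, not PostgreSQL code:]

```python
def parse_lsn(text):
    """Parse PostgreSQL's textual LSN form 'hi/lo' (hex) into a 64-bit int."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def format_lsn(value):
    """Render a 64-bit LSN back into the familiar 'X/X' hex form."""
    return "%X/%X" % (value >> 32, value & 0xFFFFFFFF)

# Round-trip a typical WAL position:
lsn = parse_lsn("16/B374D848")
print(format_lsn(lsn))  # 16/B374D848
```

Having a real SQL type would make such positions comparable and sortable server-side instead of forcing clients to do this arithmetic.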
[HACKERS] Re: Viability of text HISTORY/INSTALL/regression README files (was Re: [COMMITTERS] pgsql: Document a few more regression test hazards.)
On Tue, Feb 4, 2014 at 1:38 AM, Tom Lane t...@sss.pgh.pa.us wrote: Noah Misch n...@leadboat.com writes: Robert Haas robertmh...@gmail.com writes: I wonder if these standalone things are really worthwhile. I wonder how difficult it would be to make sufficient link data available when building the standalone files. There would be no linking per se; we would just need the referent's text fragment emitted where the xref tag appears. IIRC, that's basically what the workaround is, except it's not very automated. Even if it were automated, though, there's still a problem: such links aren't really *useful* in flat text format. I think that forcing the author to actually think about what to put there in the flat text version is a good thing, if we're going to retain the flat text version at all. Right. I mean, a lot of the links say things like Section 26.2 which obviously makes no sense in a standalone text file. I agree with your comments upthread: INSTALL *might* still be useful to somebody, but I would be pretty surprised if anyone uses HISTORY or regress_README for anything any more. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition
Hi, On 04/02/14 21:03, Andres Freund wrote: Christian, could you rerun with master (the commit on which the branch is based on), the alignment patch, and then the lwlock patch? Best with max_connections 200. That's probably more important than the write tests as a first step.. Ok, benchmark for baseline+alignment patch is running. This will take a couple of hours and since I have to get up at about 05:00 I won't be able to post it before tomorrow. Best regards, -- Christian Kruse http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services pgpsN5kcUiOgQ.pgp Description: PGP signature
Re: [HACKERS] narwhal and PGDLLIMPORT
On 02/04/2014 03:08 PM, Jeff Janes wrote: Do you know about what it would cost? Could official community funds be used for it (it seems like something that is cheap, but which you wouldn't want to be forgotten about some month.) Having an AMI would help, but even with an AMI in place, MinGW is still insanely slow. Running make on already made PostgreSQL (so there was nothing to actually do) takes 1.5 minutes. And a make after a make clean takes half an hour. This is on an actual desktop, not an AWS micro instance. So doing a git bisect is just painful. Is the MSVC build faster? Would have to check with the same build options (cassert and debug have major timing effects.) I agree it's not lightning fast like make -j 4 on a decent linux box. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GIN improvements part2: fast scan
On Mon, Feb 3, 2014 at 6:31 PM, Alexander Korotkov aekorot...@gmail.com wrote: On Mon, Jan 27, 2014 at 7:30 PM, Alexander Korotkov aekorot...@gmail.com wrote: On Mon, Jan 27, 2014 at 2:32 PM, Alexander Korotkov aekorot...@gmail.com wrote: On Sun, Jan 26, 2014 at 8:14 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 01/26/2014 08:24 AM, Tomas Vondra wrote: Hi! On 25.1.2014 22:21, Heikki Linnakangas wrote: Attached is a new version of the patch set, with those bugs fixed. I've done a bunch of tests with all the 4 patches applied, and it seems to work now. I've done tests with various conditions (AND/OR, number of words, number of conditions) and so far I did not get any crashes, infinite loops or anything like that. I've also compared the results to 9.3 - by dumping the database and running the same set of queries on both machines, and indeed I got a 100% match. I also did some performance tests, and that's when I started to worry. For example, I generated and ran 1000 queries that look like this: SELECT id FROM messages WHERE body_tsvector @@ to_tsquery('english','(header & 53 & 32 & useful & dropped)') ORDER BY ts_rank(body_tsvector, to_tsquery('english','(header & 53 & 32 & useful & dropped)')) DESC; i.e. in this case the query always was 5 words connected by AND. This query is a pretty common pattern for fulltext search - search for a list of words and give me the best ranked results. On 9.3, the script was running for ~23 seconds, on patched HEAD it was ~40. It's perfectly reproducible, I've repeated the test several times with exactly the same results. The test is CPU bound, there's no I/O activity at all. I got the same results with more queries (~100k). Attached is a simple chart with the x-axis used for durations measured on 9.3.2, y-axis used for durations measured on patched HEAD. A vast majority of queries is up to 2x slower - that's pretty obvious from the chart. Only about 50 queries are faster on HEAD, and 700 queries are more than 50% slower on HEAD (i.e.
if the query took 100ms on 9.3, it takes 150ms on HEAD). Typically, the EXPLAIN ANALYZE looks something like this (on 9.3): http://explain.depesz.com/s/5tv and on HEAD (same query): http://explain.depesz.com/s/1lI Clearly the main difference is in the Bitmap Index Scan, which takes 60ms on 9.3 and 120ms on HEAD. On 9.3 the perf top looks like this: 34.79% postgres [.] gingetbitmap 28.96% postgres [.] ginCompareItemPointers 9.36% postgres [.] TS_execute 5.36% postgres [.] check_stack_depth 3.57% postgres [.] FunctionCall8Coll while on 9.4 it looks like this: 28.20% postgres [.] gingetbitmap 21.17% postgres [.] TS_execute 8.08% postgres [.] check_stack_depth 7.11% postgres [.] FunctionCall8Coll 4.34% postgres [.] shimTriConsistentFn Not sure how to interpret that, though. For example, where did the ginCompareItemPointers go? I suspect it's thanks to inlining, and that it might be related to the performance decrease. Or maybe not. Yeah, inlining makes it disappear from the profile, and spreads that time to the functions calling it. The profile tells us that the consistent function is called a lot more than before. That is expected - with the fast scan feature, we're calling consistent not only for potential matches, but also to refute TIDs based on just a few entries matching. If that's effective, it allows us to skip many TIDs and avoid consistent calls, which compensates, but if it's not effective, it's just overhead. I would actually expect it to be fairly effective for that query, so that's a bit surprising. I added counters to see where the calls are coming from, and it seems that about 80% of the calls are actually coming from this little feature I explained earlier: In addition to that, I'm using the ternary consistent function to check if minItem is a match, even if we haven't loaded all the entries yet. That's less important, but I think for something like rare1 | (rare2 & frequent) it might be useful.
It would allow us to skip fetching 'frequent', when we already know that 'rare1' matches for the current item. I'm not sure if that's worth the cycles, but it seemed like an obvious thing to do, now that we have the ternary consistent function. So, that clearly isn't worth the cycles :-). At least not with an expensive consistent function; it might be worthwhile if we pre-build the truth-table, or cache the results of the consistent function. Attached is a quick patch to remove that, on top of all the other patches, if you want to test the effect. Every single change you did in fast scan seems to be reasonable, but testing shows that something went wrong.
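[Editorial note: the shortcut discussed above, consulting the ternary consistent function before all entries are loaded, boils down to three-valued logic. A small illustrative Python sketch, not GIN's actual C API; all names here are invented for the example:]

```python
# Three-valued logic: True, False, or MAYBE (entry not fetched yet).
MAYBE = "maybe"

def tri_and(a, b):
    # False dominates; only True & True is True; anything else is unknown.
    if a is False or b is False:
        return False
    if a is True and b is True:
        return True
    return MAYBE

def tri_or(a, b):
    # True dominates; only False | False is False; anything else is unknown.
    if a is True or b is True:
        return True
    if a is False and b is False:
        return False
    return MAYBE

# Query: rare1 | (rare2 & frequent).  Suppose we already know rare1
# matches the current item, but 'frequent' has not been fetched (MAYBE).
result = tri_or(True, tri_and(True, MAYBE))
print(result)  # True
```

Once rare1 is known TRUE the whole expression is TRUE regardless of the unfetched entry, which is exactly the skip described above; the cost is that every such check is one more consistent-function call, the overhead visible in the perf profile.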
Re: [HACKERS] Wait free LW_SHARED acquisition
On Tue, Feb 4, 2014 at 12:30 PM, Christian Kruse christ...@2ndquadrant.com wrote: Ok, benchmark for baseline+alignment patch is running. I see that you have enabled latency information. For this kind of thing I prefer to hack pgbench-tools to not collect this (i.e. to not pass the -l flag, Per-Transaction Logging). Just remove it and pgbench-tools rolls with it. It may well be that the overhead added is completely insignificant, but for something like this, where the latency information is unlikely to add any value, I prefer to not take the chance. This is a fairly minor point, however, especially since these are only 60 second runs where you're unlikely to accumulate enough transaction latency information to notice any effect. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Tue, Feb 4, 2014 at 08:11:18PM +0100, Andres Freund wrote: On 2014-02-04 14:09:57 -0500, Bruce Momjian wrote: On Tue, Feb 4, 2014 at 01:28:38PM -0500, Robert Haas wrote: Meanwhile, in friendlier cases, like one short and one long field, no change, we're seeing big improvements. That particular case shows a speedup of 21% and a WAL reduction of 36%. That's a pretty big deal, and I think not unrepresentative of many real-world workloads. Some might well do better, having either more or longer unchanged fields. Assuming that the logic isn't buggy, a point in need of further study, I'm starting to feel like we want to have this. And I might even be tempted to remove the table-level off switch. Does this feature relate to compression of WAL page images at all? No. I guess it bothers me we are working on compressing row change sets while the majority(?) of WAL is page images. I know we had a page image compression patch that got stalled. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Tue, Feb 4, 2014 at 11:11 AM, Andres Freund and...@2ndquadrant.com wrote: Does this feature relate to compression of WAL page images at all? No. So the obvious question is: where, if anywhere, do the two efforts (this patch, and Fujii's patch) overlap? Does Fujii have any concerns about this patch as it relates to his effort to compress FPIs? -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On February 4, 2014 10:50:10 PM CET, Peter Geoghegan p...@heroku.com wrote: On Tue, Feb 4, 2014 at 11:11 AM, Andres Freund and...@2ndquadrant.com wrote: Does this feature relate to compression of WAL page images at all? No. So the obvious question is: where, if anywhere, do the two efforts (this patch, and Fujii's patch) overlap? Does Fujii have any concerns about this patch as it relates to his effort to compress FPIs? I think there's zero overlap. They're completely complementary features. It's not like normal WAL records have an irrelevant volume. Andres -- Please excuse brevity and formatting - I am writing this on my mobile phone. Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Tue, Feb 4, 2014 at 1:58 PM, Andres Freund and...@2ndquadrant.com wrote: I think there's zero overlap. They're completely complementary features. It's not like normal WAL records have an irrelevant volume. I'd have thought so too, but I would not like to assume. Like many people commenting on this thread, I don't know very much about compression. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On 02/03/2014 07:27 AM, Andres Freund wrote: On 2014-02-03 09:22:52 -0600, Merlin Moncure wrote: I lost my stomach (or maybe it was the glass of red) somewhere in the middle, but I think this needs a lot of work. Especially the io code doesn't seem ready to me. I'd consider ripping out the send/recv code for 9.4, that seems the biggest can of worms. It will still be usable without. Not having type send/recv functions is somewhat dangerous; it can cause problems for libraries that run everything through the binary wire format. I'd give jsonb a pass on that, being a new type, but would be concerned if hstore had that ability revoked. Yea, removing it for hstore would be a compat problem... offhand note: hstore_send seems pretty simply written and clean; it's a simple nonrecursive iterator... But a send function is pretty pointless without the corresponding recv function... And imo recv simply is to dangerous as it's currently written. I am not saying that it cannot be made work, just that it's still nearly as ugly as when I pointed out several of the dangers some weeks back. Oleg, Teodor, any comments on the above? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] integrate pg_upgrade analyze_new_cluster.sh into vacuumdb
On Tue, Jan 21, 2014 at 9:06 AM, Oskari Saarenmaa o...@ohmu.fi wrote: 09.01.2014 05:15, Peter Eisentraut kirjoitti: pg_upgrade creates a script analyze_new_cluster.{sh|bat} that runs vacuumdb --analyze-only in three stages with different statistics target settings to get a fresh cluster analyzed faster. I think this behavior is also useful for clusters or databases freshly created by pg_restore or any other loading mechanism, so it's suboptimal to have this constrained to pg_upgrade. I think the three stage analyze is a wrong solution to the slow analyze problem. It certainly is not the best possible solution. But it might be the best one that can be arrived at within a reasonable amount of time. If we really want to micromanage the process we could put a lot of work into it. Take the case of a small table of about 300 pages. We read the table 3 times (separated by enough time that it is probably no longer cached), first keeping one tuple per page, then 10, then 100. Instead it should probably just jump directly to sampling at statistics target of 100 and then forget those small tables, at least if all we are concerned about is IO costs. (Since the selected tuples are sorted once for each column, there might be a CPU reason to take a small sample at the first pass, if we are more concerned with CPU than IO.) But I do wonder what experience people have with the 3 stage process, how useful is it empirically? If you can't open the database for general use until the 3rd phase is done, then you would just jump to doing that stage, rather than working through all 3 of them. If you can open the database and muddle through without statistics for a while, why not muddle through for the little bit longer that it would take to collect the full set right off the bat, rather than making intermediate passes? So I agree that the current system in not optimal. 
But this patch is just moving existing behavior from a less general location to a more general one, so I don't think it should be held hostage to improvements that could theoretically be made but which no one has offered to do. I wouldn't want to put in a change that forces users to learn something new for 9.4 only to have it completely redone in 9.5 and then make them learn that. But the documentation and training burden of this change seems small enough that I wouldn't worry about that. (On the other hand, the benefit of the change also seems pretty small.) In my experience most of the analyze time goes to reading random blocks from the disk but we usually use only a small portion of that data (1 row per block.) If we were able to better utilize the data we read we could get good statistics with a lot less IO than we currently need. This was discussed at length at http://www.postgresql.org/message-id/CAM-w4HOjRbNPMW=shjhw_qfapcuu5ege1tmdr0zqu+kqx8q...@mail.gmail.com but it hasn't turned into patches so far. I don't think it is an accident that it hasn't turned into patches. You can't change the laws of statistics just by wanting it badly enough. Our current sampling method is already insufficiently random. We aren't going to fix things by making it even less random. But I guess that that is an empirical question again, have most statistics problems been due to the sample being insufficiently large, or insufficiently random? (Or insufficiently recent?) There could be a different stage-by-stage approach where default_statistics_target is fixed but in the first pass you just take the first 30,000 rows in each table, and then in a second pass you take a random 30,000 rows. But the knobs to do that currently do not exist, and I doubt they would be welcomed if their only use is to support pg_upgrade. So that idea is not a blocker to this patch, either. Cheers, Jeff
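[Editorial note: the small-table I/O argument above can be put in rough numbers. ANALYZE samples about 300 * statistics_target rows, so for a 300-page table each stage rereads essentially the whole table. A back-of-envelope Python sketch, under the simplifying assumption that each sampled row comes from a distinct page until the sample exceeds the table size:]

```python
def sample_rows(statistics_target):
    # ANALYZE's sample size is 300 rows per unit of statistics target.
    return 300 * statistics_target

def pages_read(table_pages, rows):
    # Rough model: one distinct page per sampled row, capped at table size.
    return min(table_pages, rows)

table_pages = 300  # the small table from the example above

# Three-stage analyze at targets 1, 10, 100 (the stages described above):
staged = sum(pages_read(table_pages, sample_rows(t)) for t in (1, 10, 100))

# Jumping straight to target 100:
direct = pages_read(table_pages, sample_rows(100))

print(staged, direct)  # 900 300
```

Under this model the staged approach reads the small table three times (900 page reads) where a single pass at the final target would read it once (300), which is the case for skipping straight to the full sample on small tables.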
[HACKERS] Minor performance improvement in transition to external sort
The attached patch replaces the existing siftup method for heapify with a siftdown method. Tested with random integers it does 18% fewer compares and takes 10% less time for the heapify, over the work_mem range 1024 to 1048576. Both algorithms appear to be O(n) (contradicting Wikipedia's claim that a siftup heapify is O(n log n)). -- Cheers, Jeremy

diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 8b520c1..2ea7b2d 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -1211,7 +1211,8 @@ puttuple_common(Tuplesortstate *state, SortTuple *tuple)
 			(void) grow_memtuples(state);
 			Assert(state->memtupcount < state->memtupsize);
 		}
-		state->memtuples[state->memtupcount++] = *tuple;
+		state->memtuples[state->memtupcount] = *tuple;
+		state->memtuples[state->memtupcount++].tupindex = 0;
 
 		/*
 		 * Check if it's time to switch over to a bounded heapsort. We do
@@ -1884,23 +1885,47 @@ inittapes(Tuplesortstate *state)
 	state->tp_tapenum = (int *) palloc0(maxTapes * sizeof(int));
 
 	/*
-	 * Convert the unsorted contents of memtuples[] into a heap. Each tuple is
-	 * marked as belonging to run number zero.
-	 *
-	 * NOTE: we pass false for checkIndex since there's no point in comparing
-	 * indexes in this step, even though we do intend the indexes to be part
-	 * of the sort key...
+	 * Convert the unsorted contents of memtuples[] into a heap. Each tuple was
+	 * marked as belonging to run number zero on input; we don't compare that.
 	 */
 	ntuples = state->memtupcount;
-	state->memtupcount = 0;		/* make the heap empty */
-	for (j = 0; j < ntuples; j++)
 	{
-		/* Must copy source tuple to avoid possible overwrite */
-		SortTuple	stup = state->memtuples[j];
+		SortTuple  *m = state->memtuples;
 
-		tuplesort_heap_insert(state, stup, 0, false);
+		for (j = (ntuples-2)/2;	/* comb the array from the last heap parent */
+			 j >= 0;			/* to the start */
+			 j--)
+		{
+			int			root = j;
+			int			child,
+						swap = 0;
+			SortTuple	ptup = m[root];
+
+			while ((child = root*2 + 1) <= ntuples-1)
+			{					/* root has at least one child. Check left-child */
+				if (COMPARETUP(state, &ptup, &m[child]) < 0)
+				{				/* [root] < [lchild] */
+					if (++child <= ntuples-1	/* check right-child */
+						&& COMPARETUP(state, &ptup, &m[child]) > 0)
+						swap = child;
+					else
+						break;	/* [root] smallest */
+				}
+				else
+				{
+					swap = child;
+					if (++child <= ntuples-1	/* check right-child */
+						&& COMPARETUP(state, &m[swap], &m[child]) > 0)
+						swap = child;
+				}
+				/* [swap] is smallest of three; move into the parent slot */
+				m[root] = m[swap];
+				root = swap;	/* and repeat with the child subtree */
+			}
+			if (swap)
+				m[swap] = ptup;
+			/* This and all heap nodes after are now well-positioned */
+		}
 	}
-	Assert(state->memtupcount == ntuples);
 
 	state->currentRun = 0;

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
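[Editorial note: the two heapify strategies can be restated side by side as a toy Python illustration with explicit comparison counters. This is not the patch itself, which operates on SortTuples via COMPARETUP; the function names are invented for the example:]

```python
import random

def heapify_top_down(a):
    """Old tuplesort approach: insert elements one at a time, sifting each up."""
    comparisons = 0
    for i in range(1, len(a)):
        j = i
        while j > 0:
            parent = (j - 1) // 2
            comparisons += 1
            if a[parent] <= a[j]:
                break
            a[parent], a[j] = a[j], a[parent]
            j = parent
    return comparisons

def heapify_bottom_up(a):
    """Patched approach (Floyd): sift each parent down, last parent first."""
    comparisons = 0
    n = len(a)
    for root in range((n - 2) // 2, -1, -1):
        j = root
        while 2 * j + 1 < n:
            child = 2 * j + 1
            if child + 1 < n:           # pick the smaller of the two children
                comparisons += 1
                if a[child + 1] < a[child]:
                    child += 1
            comparisons += 1
            if a[j] <= a[child]:        # parent already smallest: done here
                break
            a[j], a[child] = a[child], a[j]
            j = child
    return comparisons

def is_min_heap(a):
    return all(a[(i - 1) // 2] <= a[i] for i in range(1, len(a)))

random.seed(1)
data = [random.random() for _ in range(100000)]
up, down = data[:], data[:]
c_up, c_down = heapify_top_down(up), heapify_bottom_up(down)
print(c_up, c_down)
```

On random input the bottom-up build typically needs fewer comparisons, in line with the reduction reported above; the bottom-up version is also bounded by roughly 2n comparisons in the worst case, while repeated insertion degrades to O(n log n) on adversarial (e.g. descending) input.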
Re: [HACKERS] narwhal and PGDLLIMPORT
Robert Haas robertmh...@gmail.com writes: You know, I would really prefer to just stick a PGDLLIMPORT on this place and any others that need it, and any others that come up, than turn this into a political football. Having to sprinkle PGDLLIMPORT on the handful of variables that are accessed by contrib modules is, indeed, annoying. I'm not actually trying to turn this into a political football. What I want is a solution that we can trust, ie, that will allow us to ship Windows code that's not broken. We have failed to do so for at least the past year, and not even known it. I had been okay with the manual PGDLLIMPORT-sprinkling approach (not happy with it, of course, but prepared to tolerate it) as long as I believed the buildfarm would reliably tell us of the need for it. That assumption has now been conclusively disproven, though. The question therefore becomes, what are we going to do instead? Keep on doing what we were doing doesn't strike me as an acceptable answer. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition
On 2014-02-04 13:42:51 -0800, Peter Geoghegan wrote: On Tue, Feb 4, 2014 at 12:30 PM, Christian Kruse christ...@2ndquadrant.com wrote: Ok, benchmark for baseline+alignment patch is running. I see that you have enabled latency information. For this kind of thing I prefer to hack pgbench-tools to not collect this (i.e. to not pass the -l flag, Per-Transaction Logging). Just remove it and pgbench-tools rolls with it. It may well be that the overhead added is completely insignificant, but for something like this, where the latency information is unlikely to add any value, I prefer to not take the chance. This is a fairly minor point, however, especially since these are only 60 second runs where you're unlikely to accumulate enough transaction latency information to notice any effect. Hm, I don't find that convincing. If you look at the results from the last run the latency information is actually quite interesting. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] narwhal and PGDLLIMPORT
Andrew Dunstan and...@dunslane.net writes: On 02/04/2014 03:08 PM, Jeff Janes wrote: Having an AMI would help, but even with an AMI in place, MinGW is still insanely slow. Running make on already made PostgreSQL (so there was nothing to actually do) takes 1.5 minutes. And a make after a make clean takes half an hour. This is on an actual desktop, not an AWS micro instance. So doing a git bisect is just painful. Is the MSVC build faster? Would have to check with the same build options (cassert and debug have major timing effects.) I agree it's not lightning fast like make -j 4 on a decent linux box. I wonder if ccache exists for Mingw. That thing makes a huge difference in the perceived build speed ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Misaligned BufferDescriptors causing major performance problems on AMD
On Tue, Feb 4, 2014 at 4:21 AM, Andres Freund and...@2ndquadrant.com wrote: Which imo means fixing this got more important... I strongly agree. -- Peter Geoghegan
Re: [HACKERS] narwhal and PGDLLIMPORT
On 02/04/2014 05:47 PM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: On 02/04/2014 03:08 PM, Jeff Janes wrote: Having an AMI would help, but even with an AMI in place, MinGW is still insanely slow. Running make on already made PostgreSQL (so there was nothing to actually do) takes 1.5 minutes. And a make after a make clean takes half an hour. This is on an actual desktop, not an AWS micro instance. So doing a git bisect is just painful. Is the MSVC build faster? Would have to check with the same build options (cassert and debug have major timing effects.) I agree it's not lightning fast like make -j 4 on a decent linux box. I wonder if ccache exists for Mingw. That thing makes a huge difference in the perceived build speed ... Indeed. But it's not really, AFAIK. Certainly it's not in the list of packages known to mingw-get on the machine i checked on (jacana). cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] should we add a XLogRecPtr/LSN SQL type?
On Wed, Feb 5, 2014 at 5:26 AM, Peter Eisentraut pete...@gmx.net wrote: Perhaps this type should be called pglsn, since it's an implementation-specific detail and not a universal concept like int, point, or uuid. It makes sense. I'll update the patches according to that. -- Michael
Re: [HACKERS] [PERFORM] encouraging index-only scans
On Mon, Feb 3, 2014 at 8:55 AM, Robert Haas robertmh...@gmail.com wrote: I've also had some further thoughts about the right way to drive vacuum scheduling. I think what we need to do is tightly couple the rate at which we're willing to do vacuuming to the rate at which we're incurring vacuum debt. That is, if we're creating 100kB/s of pages needing vacuum, we vacuum at 2-3MB/s (with default settings). If we can tolerate 2-3MB/s without adverse impact on other work, then we can tolerate it. Do we gain anything substantial by sand-bagging it? If we're creating 10MB/s of pages needing vacuum, we *still* vacuum at 2-3MB/s. Not shockingly, vacuum gets behind, the database bloats, and everything goes to heck. (Your reference to bloat made me think your comments here are about vacuuming in general, not specific to IOS. If that isn't the case, then please ignore.) If we can only vacuum at 2-3MB/s without adversely impacting other activity, but we are creating 10MB/s of future vacuum need, then there are basically two possibilities I can think of. Either the 10MB/s represents a spike, and vacuum should tolerate it and hope to catch up on the debt later. Or it represents a new permanent condition, in which case I bought too few hard drives for the work load, and no scheduling decision that autovacuum can make will save me from my folly. Perhaps there is some middle ground between those possibilities, but I don't see room for much middle ground. I guess there might be entirely different possibilities not between those two; for example, I don't realize I'm doing something that is generating 10MB/s of vacuum debt, and would like to have this thing I'm doing be automatically throttled to the point it doesn't interfere with other processes (either directly, or indirectly by bloat). The rate of vacuuming needs to be tied somehow to the rate at which we're creating stuff that needs to be vacuumed. 
Right now we don't even have a way to measure that, let alone auto-regulate the aggressiveness of autovacuum on that basis. There is the formula used to decide when a table gets vacuumed. Isn't the time delta in this formula a measure of how fast we are creating stuff that needs to be vacuumed for bloat reasons? Is your objection that it doesn't include other reasons we might want to vacuum, or that it just doesn't work very well, or that is not explicitly exposed? Similarly, for marking of pages as all-visible, we currently make the same decision whether the relation is getting index-scanned (in which case the failure to mark those pages all-visible may be suppressing the use of index scans or making them less effective) or whether it's not being accessed at all (in which case vacuuming it won't help anything, and might hurt by pushing other pages out of cache). If it is not getting accessed at all because the database is not very active right now, that would be the perfect time to vacuum it. Between I can accurately project current patterns of (in)activity into the future and People don't build large tables just to ignore them forever, I think the latter is more likely to be true. If the system is busy but this particular table is not, then that would be a better reason to de-prioritise vacuuming that table. But can this degree of reasoning really be implemented in a practical way? In core? Again, if we had better statistics, we could measure this - counting heap fetches for actual index-only scans plus heap fetches for index scans that might have been planned index-only scans but for the relation having too few all-visible pages doesn't sound like an impossible metric to gather. My experience has been that if too few pages are all visible, it generally switches to a seq scan, not an index scan of a different index. 
But many things that are semantically possible to be index-only-scans would never be planned that way even if allvisible were 100%, so I think it would have to do two planning passes, one with the real allvisible, and a hypothetical one with allvisible set to 100%. And then there is the possibility that, while a high allvisible would be useful, the table is so active that no amount of vacuuming could ever keep it high. Cheers, Jeff
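For reference, the trigger condition Jeff alludes to ("the formula used to decide when a table gets vacuumed") can be sketched as follows. This is a simplified illustration of the documented autovacuum rule (vacuum when dead tuples exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples), not the actual backend code; 50 and 0.2 are the stock defaults for those two GUCs.

```c
#include <assert.h>

/*
 * Hypothetical sketch of the autovacuum trigger test: a table becomes a
 * vacuum candidate once its dead-tuple count exceeds
 *     base_threshold + scale_factor * reltuples
 * where base_threshold mirrors autovacuum_vacuum_threshold (default 50)
 * and scale_factor mirrors autovacuum_vacuum_scale_factor (default 0.2).
 */
static int
needs_vacuum(double reltuples, double dead_tuples,
             int base_threshold, double scale_factor)
{
    return dead_tuples > base_threshold + scale_factor * reltuples;
}
```

With the defaults, a one-million-row table is vacuumed only after roughly 200,050 dead tuples accumulate, which is one way of reading Jeff's point: the time between threshold crossings is itself a crude measure of how fast vacuum debt is being created.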
Re: [HACKERS] Re: Misaligned BufferDescriptors causing major performance problems on AMD
On Mon, Feb 3, 2014 at 3:38 PM, Andres Freund and...@2ndquadrant.com wrote: A quick hack (attached) making BufferDescriptor 64byte aligned indeed restored performance across all max_connections settings. It's not surprising that a misaligned buffer descriptor causes problems - there'll be plenty of false sharing of the spinlocks otherwise. Curious that the Intel machine isn't hurt much by this. What fiddling are you thinking of? Basically always doing a TYPEALIGN(CACHELINE_SIZE, addr) before returning from ShmemAlloc() (and thereby ShmemInitStruct). There is something you have not drawn explicit attention to that is very interesting. If we take REL9_3_STABLE tip to be representative (built with full -O2 optimization, no assertions just debugging symbols), setting max_connections to 91 from 90 does not have the effect of making the BufferDescriptors array aligned; it has the effect of making it *misaligned*. You reported that 91 was much better than 90. I think that the problem actually occurs when the array *is* aligned! I suspect that the scenario described in this article accounts for the quite noticeable effect reported: http://danluu.com/3c-conflict -- Peter Geoghegan
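The TYPEALIGN(CACHELINE_SIZE, addr) fix Andres describes amounts to rounding an allocation address up to the next cache-line boundary. A minimal standalone sketch (the macro mirrors PostgreSQL's TYPEALIGN; CACHELINE_SIZE = 64 is an assumption here, since the real value is platform-dependent):

```c
#include <assert.h>
#include <stdint.h>

/* Assumed cache-line size; x86-64 commonly uses 64 bytes. */
#define CACHELINE_SIZE 64

/* Round LEN (a size or address) up to the next multiple of ALIGNVAL,
 * which must be a power of two. */
#define TYPEALIGN(ALIGNVAL, LEN) \
    (((uintptr_t) (LEN) + ((ALIGNVAL) - 1)) & ~((uintptr_t) ((ALIGNVAL) - 1)))

/* What a ShmemAlloc()-style allocator would do before handing back a
 * pointer: bump it to a cache-line boundary so arrays of descriptors
 * holding spinlocks don't straddle (or falsely share) cache lines. */
static void *
cacheline_align(void *ptr)
{
    return (void *) TYPEALIGN(CACHELINE_SIZE, ptr);
}
```

Aligning each allocation this way wastes at most CACHELINE_SIZE - 1 bytes per allocation, which is cheap insurance against the false-sharing behavior discussed in this thread.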
Re: [HACKERS] Minor performance improvement in transition to external sort
On Wed, Feb 5, 2014 at 7:22 AM, Jeremy Harris j...@wizmail.org wrote: The attached patch replaces the existing siftup method for heapify with a siftdown method. Tested with random integers it does 18% fewer compares and takes 10% less time for the heapify, over the work_mem range 1024 to 1048576. Both algorithms appear to be O(n) (contradicting Wikipedia's claim that a siftup heapify is O(n log n)). It looks interesting but it is too late to have that in 9.4... You should definitely add this patch to the next commit fest so that it is not lost in the void: https://commitfest.postgresql.org/action/commitfest_view?id=22 Regards, -- Michael
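For readers unfamiliar with the two strategies being compared: bottom-up heapify sifts each internal node down, and the work sums to O(n), whereas inserting elements one at a time and sifting each up costs O(n log n) in the worst case. A standalone sketch of the sift-down approach (an illustration of the technique, not Jeremy's patch, which operates on tuplesort's heap):

```c
#include <assert.h>
#include <stddef.h>

/* Restore the max-heap property for the subtree rooted at i, assuming
 * both child subtrees are already valid heaps. */
static void
sift_down(int *a, size_t n, size_t i)
{
    for (;;)
    {
        size_t  largest = i;
        size_t  l = 2 * i + 1;
        size_t  r = 2 * i + 2;
        int     tmp;

        if (l < n && a[l] > a[largest])
            largest = l;
        if (r < n && a[r] > a[largest])
            largest = r;
        if (largest == i)
            break;              /* heap property holds here; done */
        tmp = a[i];
        a[i] = a[largest];
        a[largest] = tmp;
        i = largest;            /* continue sifting the displaced key */
    }
}

/* Bottom-up heapify: sift down every internal node, deepest first.
 * Most nodes sit near the leaves and sift only a short distance,
 * which is why the total work is O(n). */
static void
heapify(int *a, size_t n)
{
    if (n < 2)
        return;
    for (size_t i = n / 2; i-- > 0;)
        sift_down(a, n, i);
}
```

Counting comparisons in sift_down versus a sift-up loop over random input is an easy way to reproduce the kind of "18% fewer compares" measurement described above.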
Re: [HACKERS] should we add a XLogRecPtr/LSN SQL type?
On Wed, Feb 5, 2014 at 9:38 AM, Michael Paquier michael.paqu...@gmail.com wrote: On Wed, Feb 5, 2014 at 8:59 AM, Michael Paquier michael.paqu...@gmail.com wrote: I'll update the patches according to that. Here are the updated patches with the following changes (according to previous comments): - Datatype is renamed to pglsn, documentation, file names, regressions and APIs are updated as well. - The DatumGet* and *GetDatum APIs are renamed with PGLSN (Should be PgLsn? But that's a detail) - pg_create_physical_replication_slot uses PGLSNGetDatum for its 6th argument For pageinspect, only page_header is impacted and I think that this should be a separated patch as it makes necessary to dump it to 1.2. I can write it later once the core parts are decided. I just forgot to mention that the 2nd patch does not use context diffs but git diffs because of filterdiff not able to catch all the new content of slotfuncs.c. Regards, -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PoC: Duplicate Tuple Elidation during External Sort for DISTINCT
What - if anything - do I need to do to get this on the commitfest list for the next commitfest?
Re: [HACKERS] inherit support for foreign tables
(2014/02/04 20:56), Robert Haas wrote: On Sun, Feb 2, 2014 at 10:15 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: Allowing ALTER COLUMN SET STORAGE on foreign tables would make sense if for example, SELECT * INTO local_table FROM foreign_table did create a new local table of columns having the storage types associated with those of a foreign table? Seems like a pretty weak argument. It's not that we can't find strange corner cases where applying SET STORAGE to a foreign table doesn't do something; it's that they *are* strange corner cases. The options as we normally understand them just aren't sensible in this context, and a good deal of work has been put into an alternative options framework, which is what authors of FDWs ought to be using. I just wanted to discuss the possibility of allowing SET STORAGE on a foreign table, but I've got the point. I'll resume the patch review. Thanks, Best regards, Etsuro Fujita
Re: [HACKERS] could not create IPv6 socket (AI_ADDRCONFIG)
Hello, At Tue, 04 Feb 2014 02:07:08 -0500, Tom Lane t...@sss.pgh.pa.us wrote in 3176.1391497...@sss.pgh.pa.us One good reason not to trust this too much is that getaddrinfo() is fundamentally a userspace DNS access function, and as such it has no very good way to know if there's currently an IPv4 or IPv6 interface configured on the local system. At minimum there are obvious race conditions in that. A more common case is ::1 in /etc/hosts. I got the following error with this patch in such a case. | LOG: could not bind IPv4 socket: Address already in use | HINT: Is another postmaster already running on port 5432? If not, wait a few seconds and retry. getaddrinfo returned two identical entries with the same address AF_INET 127.0.0.1:14357. One of them is for the ::1 entry in hosts. This is worse than the current behavior X-( At Tue, 04 Feb 2014 10:31:03 -0500, Tom Lane t...@sss.pgh.pa.us wrote in 12552.1391527...@sss.pgh.pa.us Alvaro Herrera alvhe...@2ndquadrant.com writes: Tom Lane wrote: Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp writes: LOG: could not create IPv6 socket: Address family not supported by protocol That's merely a harmless log message. How about just adding a HINT? Hmm ... maybe, but how would you phrase the hint exactly? Putting the 'exactly' aside, would it mean something like 'You will get this message when the feature to handle the address family is disabled', only for EAFNOSUPPORT? Though I don't know whether such a hint is helpful for those who tend to mind that kind of message. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [HACKERS] could not create IPv6 socket (AI_ADDRCONFIG)
Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp writes: getaddrinfo returned two identical entries with the same address AF_INET 127.0.0.1:14357. One of them is for the ::1 entry in hosts. This is worse than the current behavior X-( Yeah, the fundamental issue is that getaddrinfo tends to return bogus info. How about just adding a HINT? Hmm ... maybe, but how would you phrase the hint exactly? Putting the 'exactly' aside, would it mean something like 'You will get this message when the feature to handle the address family is disabled', only for EAFNOSUPPORT? Though I don't know whether such a hint is helpful for those who tend to mind that kind of message. I still think the best thing might be to reduce the individual messages to DEBUG-something, and only produce a LOG entry if we are unable to bind to *any* of the addresses returned by getaddrinfo. regards, tom lane
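Tom's proposal can be sketched as follows. This is a hedged, simplified illustration rather than backend code: walk everything getaddrinfo() returns, treat per-address failures as debug-level noise, and complain only if nothing could be bound at all. bind_all and its behavior of closing each socket immediately (instead of keeping it for listen()) are artifacts of the sketch.

```c
#include <assert.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Try to bind every address getaddrinfo() returns for host:port.
 * Returns the number of addresses successfully bound; logs only when
 * the count is zero, per Tom's suggestion above. */
static int
bind_all(const char *host, const char *port)
{
    struct addrinfo hints, *res, *ai;
    int     bound = 0;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;        /* both IPv4 and IPv6 */
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_PASSIVE;

    if (getaddrinfo(host, port, &hints, &res) != 0)
        return 0;

    for (ai = res; ai != NULL; ai = ai->ai_next)
    {
        int     fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);

        if (fd < 0)
            continue;           /* would be a DEBUG message, not LOG */
        if (bind(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            bound++;            /* real code would keep fd for listen() */
        close(fd);
    }
    freeaddrinfo(res);

    if (bound == 0)
        fprintf(stderr, "LOG: could not bind to any address for %s:%s\n",
                host, port);
    return bound;
}
```

The key property is that a bogus extra entry (like the duplicate 127.0.0.1 Horiguchi-san observed) produces at worst a quiet per-address failure, not a user-visible complaint, as long as at least one address binds.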
Re: [HACKERS] narwhal and PGDLLIMPORT
On 02/05/2014 04:08 AM, Jeff Janes wrote: So doing a git bisect is just painful. Is the MSVC build faster? Yes, but not on EC2. I've found Windows EC2 instances so impossibly slow I just gave up working with it. It took 1.5 hours to do a build and regression check with msvc on a Medium EC2 instance; the same build takes 10 mins on my tiny Intel i3 based Windows test machine. It's possible that some of the larger instance types may perform better as they use different approaches to virtualization than simple Xen HVM. I haven't tested, as the cost of those instances rapidly becomes problematic. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] narwhal and PGDLLIMPORT
On 02/05/2014 02:53 AM, Tom Lane wrote: Joshua D. Drake j...@commandprompt.com writes: On 02/04/2014 09:34 AM, Tom Lane wrote: My own opinion is that I've already wasted untold man-hours thanks to the random porting problems induced by Windows, a platform that I never have and never will care about personally. I will *not* spend my own time doing tests that someone else could do. If we can't get some effort contributed by someone who does use that platform, I'm personally prepared to declare the entire damn thing no longer supported. Although that is obviously your prerogative it is important to remember that Windows is easily the second most used version of PostgreSQL out there (behind Linux). [ shrug... ] If it's so widely used, why is it so hard to find somebody who's willing to put in some development effort for it? Sadly, I'm now convinced that Windows users are just much less likely to contribute anything constructive to a project - code, documentation, anything. It's a real gimme world, and has a really strong ethic that the vendor does things with their software, you don't just go and get involved. That said, I think the fuss being made about the intrusiveness of Windows support and its impact is overblown here. These are a few macros that're noops on other platforms anyway, and some build code hidden away in src/tools . It's ugly. It's annoying. It's crap that users don't contribute back. It's also just not that big a deal; there are many other things that are similarly painful or more so. Expecting folks to fire up an AMI and hand-control the build with a GUI over a high latency connection is a waste of time better spent elsewhere, though, and will result in everyone continuing to avoid any sort of testing on Windows. Personally what I think we need is a *public* Jenkins instance, or similar, to which you can push a branch and have it automatically build and make check on Windows. 
I've got that running for internal use, but it's on a host I can't share access to (and an unreliable one, at that). I'd be happy to share the setup for the Jenkins instance and the Windows integration parts, along with the instructions I wrote on how to set up the Windows build test node(s) and the tooling I'm using to automate the Windows build. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] narwhal and PGDLLIMPORT
On 02/04/2014 10:48 PM, Craig Ringer wrote: On 02/05/2014 04:08 AM, Jeff Janes wrote: So doing a git bisect is just painful. Is the MSVC build faster? Yes, but not on EC2. I've found Windows EC2 instances so impossibly slow I just gave up working with it. It took 1.5 hours to do a build and regression check with msvc on a Medium EC2 instance; the same build takes 10 mins on my tiny Intel i3 based Windows test machine. It's possible that some of the larger instance types may perform better as they use different approaches to virtualization than simple Xen HVM. I haven't tested, as the cost of those instances rapidly becomes problematic. I typically use m1.medium. A spot instance for that is currently $0.033 / hour. When I was working on one such the other day it took nothing like 1.5 hours to build and test. I didn't time it so I can't tell you how long it took, but much less than that. Of course YMMV. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] narwhal and PGDLLIMPORT
On 02/05/2014 06:29 AM, Tom Lane wrote: I had been okay with the manual PGDLLIMPORT-sprinkling approach (not happy with it, of course, but prepared to tolerate it) as long as I believed the buildfarm would reliably tell us of the need for it. That assumption has now been conclusively disproven, though. The question therefore becomes, what are we going to do instead? Keep on doing what we were doing doesn't strike me as an acceptable answer. I'm in complete agreement here. Silent failures we can't test for that might sneak data corruption in are not cool. I'll have a look into ways to making sure that globals with incorrect linkage fail at runtime link time, as is the case for functions. I won't be able to spend much time on it immediately; will take a quick look and if I don't find anything, will follow up post-CF4. I'm kind of horrified that the dynamic linker doesn't throw its toys when it sees this. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] narwhal and PGDLLIMPORT
On 02/05/2014 12:06 AM, Andrew Dunstan wrote: On 02/04/2014 10:43 AM, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-02-04 02:10:47 -0500, Tom Lane wrote: Meh. It might be that the DateStyle usage in postgres_fdw would accidentally fail to malfunction if it saw a bogus value of the variable. But it's hard to believe that this would be true of MainLWLockArray. There's not that much lwlock usage in contrib. It's just pg_stat_statements and pg_buffercache. Neither has tests... So it very well could be that breakage simply hasn't been observed. Hm, you're right --- I'd have thought there were more of those. Ugh. This problem was bad enough when I thought that it would only lead to link-time errors detectable in the buildfarm. If it can lead to errors only observable at runtime --- and maybe not obvious even then --- then I think we *have to* do something about it. By that I mean that we must get rid of the need to manually plaster PGDLLIMPORT on global variables. Anybody with a Windows build environment want to test the #define extern trick? We have details on how to build with Mingw/Msys on Windows on an Amazon VM http://wiki.postgresql.org/wiki/Building_With_MinGW which is either free or very cheap. Do I need to give instructions on how to do this for MSVC builds too? It's really not terribly hard. I've got some guidance on that here: https://github.com/2ndQuadrant/pg_build_win from setting up a clean Windows instance for builds, on to the build process. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
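For context, the macro at issue looks roughly like this. It is a simplified sketch of the real definition in the backend headers, and the stand-in definition of DateStyle at the bottom exists only so the fragment links on its own; in reality the backend binary provides it. On Windows, data imported from another binary must be declared __declspec(dllimport), or the toolchain may silently resolve the reference to garbage rather than failing at link time, which is exactly the class of bug this thread is about. On every other platform the macro is empty, which is why non-Windows buildfarm members cannot catch a missing PGDLLIMPORT.

```c
#include <assert.h>

/* Sketch of the PGDLLIMPORT mechanism (simplified from c.h). */
#ifdef _WIN32
#define PGDLLIMPORT __declspec(dllimport)
#else
#define PGDLLIMPORT             /* empty on non-Windows platforms */
#endif

/* In a backend header, an exported global is declared like this: */
extern PGDLLIMPORT int DateStyle;

/* Stand-in definition so this sketch links by itself; in PostgreSQL
 * the definition lives in the backend, not in extension code. */
int     DateStyle = 1;
```

The "#define extern" trick mentioned above would effectively make every extern declaration carry the dllimport attribute automatically, removing the need to sprinkle the macro by hand.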
Re: [HACKERS] PoC: Duplicate Tuple Elidation during External Sort for DISTINCT
On Wed, Feb 5, 2014 at 10:33 AM, Jon Nelson jnelson+pg...@jamponi.net wrote: What - if anything - do I need to do to get this on the commitfest list for the next commitfest? The list of instructions is here: http://wiki.postgresql.org/wiki/Submitting_a_Patch#Patch_submission Then the next commit fest (#1 for 9.5) will be in June and is here: https://commitfest.postgresql.org/action/commitfest_view?id=22 Note that you will need an account on postgresql.org to register your patch. Regards, -- Michael
[HACKERS] Re: Viability of text HISTORY/INSTALL/regression README files (was Re: [COMMITTERS] pgsql: Document a few more regression test hazards.)
On Tue, Feb 04, 2014 at 03:28:45PM -0500, Robert Haas wrote: On Tue, Feb 4, 2014 at 1:38 AM, Tom Lane t...@sss.pgh.pa.us wrote: Noah Misch n...@leadboat.com writes: Robert Haas robertmh...@gmail.com writes: I wonder if these standalone things are really worthwhile. I wonder how difficult it would be to make sufficient link data available when building the standalone files. There would be no linking per se; we would just need the referent's text fragment emitted where the xref tag appears. IIRC, that's basically what the workaround is, except it's not very automated. Even if it were automated, though, there's still a problem: such links aren't really *useful* in flat text format. I think that forcing the author to actually think about what to put there in the flat text version is a good thing, if we're going to retain the flat text version at all. Right. I mean, a lot of the links say things like Section 26.2 which obviously makes no sense in a standalone text file. For xrefs normally displayed that way, text output could emit a URL, either inline or in the form of a footnote. For link targets (e.g. SQL commands) having a friendly text fragment for xref sites, use the normal fragment. -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] PostgreSQL Failback without rebuild
Hello All, I have been reading through some of the recent discussions about failback when in a streaming replication setup. I define failback as: 1. Node A is master, Node B is slave 2. Node A crashes || Node A is stopped || nothing happens 3. Promote Node B to Master 4. Attach Node A as slave My understanding is currently to achieve step three you need to take a base backup of Node B and deploy it to Node A before starting streaming replication (or use rsync etc...). This is very undesirable for many users, especially if they have a very large database. From the discussions I can see that the problem is to do with Node A writing changes to disk that are not streamed to Node B before Node A crashes. Has there been any consensus on this issue? Are there any solutions which might make it into 9.4 or 9.5? I've seen some proposals and a tool (pg_rewind), but all seem to have drawbacks. I've been looking mainly at these threads: http://www.postgresql.org/message-id/CAF8Q-Gy7xa60HwXc0MKajjkWFEbFDWTG=ggyu1kmt+s2xcq...@mail.gmail.com http://www.postgresql.org/message-id/caf8q-gxg3pqtf71nvece-6ozraew5pwhk7yqtbjgwrfu513...@mail.gmail.com http://www.postgresql.org/message-id/519df910.4020...@vmware.com Cheers, James Sewell, PostgreSQL Team Lead / Solutions Architect
Re: [HACKERS] narwhal and PGDLLIMPORT
Craig Ringer cr...@2ndquadrant.com writes: On 02/05/2014 06:29 AM, Tom Lane wrote: I had been okay with the manual PGDLLIMPORT-sprinkling approach (not happy with it, of course, but prepared to tolerate it) as long as I believed the buildfarm would reliably tell us of the need for it. That assumption has now been conclusively disproven, though. I'm kind of horrified that the dynamic linker doesn't throw its toys when it sees this. Indeed :-(. The truly strange part of this is that it seems that the one Windows buildfarm member that's telling the truth (or most nearly so, anyway) is narwhal, which appears to have the oldest and cruftiest toolchain of the lot. I'd really like to come out the other end of this investigation with a clear understanding of why the newer toolchains are failing to report a link problem, and yet not building working executables. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Tue, Feb 4, 2014 at 11:58 PM, Robert Haas robertmh...@gmail.com wrote: On Tue, Feb 4, 2014 at 12:39 PM, Amit Kapila amit.kapil...@gmail.com wrote: Now there is approximately 1.4~5% CPU gain for hundred tiny fields, half nulled case Assuming that the logic isn't buggy, a point in need of further study, I'm starting to feel like we want to have this. And I might even be tempted to remove the table-level off switch. I have tried to stress the worst case more, as you are thinking of removing the table-level switch, and found that even if we increase the data by approx. 8 times (ten long fields, all changed, each field contains 80 byte data), the CPU overhead is still 5%, which clearly shows that the overhead doesn't increase much even if the length of unmatched data is increased by a much larger factor. So the data for the worst case adds more weight to your statement (remove table-level switch); however, there is no harm in keeping the table-level option with default as 'true', and if some users are really sure the updates in their system will have nothing in common, then they can set this new option to 'false'. Below is the data for the new case ten long fields, all changed added in the attached script file:

Unpatched

           testname           | wal_generated |     duration
------------------------------+---------------+------------------
 ten long fields, all changed |    3473999520 | 45.0375978946686
 ten long fields, all changed |    3473999864 | 45.2536928653717
 ten long fields, all changed |    3474006880 | 45.1887288093567

After pgrb_delta_encoding_v8.patch

           testname           | wal_generated |     duration
------------------------------+---------------+------------------
 ten long fields, all changed |    3474006456 | 47.5744359493256
 ten long fields, all changed |    3474000136 | 47.3830440044403
 ten long fields, all changed |    3474002688 | 46.9923310279846

With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com wal-update-testsuite.sh Description: Bourne shell script
Re: [HACKERS] PostgreSQL Failback without rebuild
On Wed, Feb 5, 2014 at 10:30 AM, James Sewell james.sew...@lisasoft.com wrote: Hello All, I have been reading through some of the recent discussions about failback when in a streaming replication setup. I define failback as: Node A is master, Node B is slave Node A crashes || Node A is stopped || nothing happens Promote Node B to Master Attach Node A as slave My understanding is currently to achieve step three you need to take a base backup of Node B and deploy it to Node A before starting streaming replication (or use rsync etc...). I think in the above sentence you mean to say to achieve step *four*. This is very undesirable for many users, especially if they have a very large database. From the discussions I can see that the problem is to do with Node A writing changes to disk that Node B are not streamed before Node A crashes. Yes, this is right. Has there been any consensus on this issue? Are there any solutions which might make it into 9.4 or 9.5? As far as I know, there is still no solution provided in 9.4; I can't say anything for 9.5 with any certainty. However, in 9.4 there is a new parameter wal_log_hints which can be useful to overcome a drawback of pg_rewind. I've seen some proposals and a tool (pg_rewind), but all seem to have draw backs. As far as I remember, one of the main drawbacks of pg_rewind was related to hint bits, which can be avoided by wal_log_hints. pg_rewind is not part of core PostgreSQL code; however, if you wish, you can try that tool to see if it can solve your purpose. Note - James, in my previous reply I forgot to cc hackers, so I am sending it again. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] jsonb and nested hstore
On 02/03/2014 05:22 PM, Merlin Moncure wrote: I lost my stomach (or maybe it was the glass of red) somewhere in the middle, but I think this needs a lot of work. Especially the io code doesn't seem ready to me. I'd consider ripping out the send/recv code for 9.4, that seems the biggest can of worms. It will still be usable without. Not having type send/recv functions is somewhat dangerous; it can cause problems for libraries that run everything through the binary wire format. I'd give jsonb a pass on that, being a new type, but would be concerned if hstore had that ability revoked. send/recv functions are also needed for binary-format COPY. IMHO jsonb must have send/recv functions. All other built-in types have them, except for types like 'smgr', 'aclitem' and 'any*' that no-one should be using as column types. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL Failback without rebuild
On Wed, Feb 5, 2014 at 3:14 PM, Amit Kapila amit.kapil...@gmail.com wrote: On Wed, Feb 5, 2014 at 10:30 AM, James Sewell james.sew...@lisasoft.com I've seen some proposals and a tool (pg_rewind), but all seem to have drawbacks. As far as I remember, one of the main drawbacks of pg_rewind was related to hint bits, which can be avoided by wal_log_hints. pg_rewind is not part of core PostgreSQL code; however, if you wish, you can try that tool to see if it can solve your purpose. For 9.3, pg_rewind is only safe with page checksums enabled. For 9.4, yes, wal_log_hints or checksums is mandatory. The code also contains some safety checks to ensure that a node not using those parameters cannot be rewound. Regards, -- Michael
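For anyone wanting to try the setup Michael describes, a hedged sketch of the relevant configuration (wal_log_hints is a real 9.4 parameter; the pg_rewind invocation in the comment follows the tool's documented options, but option spelling has varied between versions, so check the copy you have):

```
# postgresql.conf (9.4+): required on the old master for pg_rewind
# to be safe, unless the cluster was initdb'd with data checksums
wal_log_hints = on

# After promoting node B, node A could then be resynchronized with
# something like (illustrative paths and connection string):
#   pg_rewind --target-pgdata=/path/to/nodeA/data \
#             --source-server='host=nodeB port=5432 user=postgres'
```

This avoids the full base backup James was hoping to skip, at the cost of extra WAL volume from logging hint-bit changes.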
Re: [HACKERS] jsonb and nested hstore
Andrew provided us more information and we'll work on recv. What do people think about testing this stuff? Btw, we don't have any regression tests for this. Oleg On Wed, Feb 5, 2014 at 2:03 AM, Josh Berkus j...@agliodbs.com wrote: On 02/03/2014 07:27 AM, Andres Freund wrote: On 2014-02-03 09:22:52 -0600, Merlin Moncure wrote: I lost my stomach (or maybe it was the glass of red) somewhere in the middle, but I think this needs a lot of work. Especially the io code doesn't seem ready to me. I'd consider ripping out the send/recv code for 9.4, that seems the biggest can of worms. It will still be usable without. Not having type send/recv functions is somewhat dangerous; it can cause problems for libraries that run everything through the binary wire format. I'd give jsonb a pass on that, being a new type, but would be concerned if hstore had that ability revoked. Yea, removing it for hstore would be a compat problem... offhand note: hstore_send seems pretty simply written and clean; it's a simple nonrecursive iterator... But a send function is pretty pointless without the corresponding recv function... And imo recv simply is too dangerous as it's currently written. I am not saying that it cannot be made to work, just that it's still nearly as ugly as when I pointed out several of the dangers some weeks back. Oleg, Teodor, any comments on the above? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun
On 02/04/2014 02:40 AM, Peter Geoghegan wrote: On Fri, Jan 31, 2014 at 9:09 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: I refactored the loop in _bt_moveright to, well, not have that bug anymore. The 'page' and 'opaque' pointers are now fetched at the beginning of the loop. Did I miss something? I think so, yes. You still aren't assigning the value returned by _bt_getbuf() to 'buf'. D'oh, you're right. Since, as I mentioned, _bt_finish_split() ultimately unlocks *and unpins*, it may not be the same buffer as before, so even with the refactoring there are race conditions. Care to elaborate? Or are you just referring to the missing buf = ? A closely related issue is that you haven't mentioned anything about buffer pins/refcount side effects in comments above _bt_finish_split(), even though I believe you should. Ok. A minor stylistic concern is that I think it would be better to only have one pair of _bt_finish_split()/_bt_getbuf() calls regardless of the initial value of 'access'. Ok. I also changed _bt_moveright to never return a write-locked buffer, when the caller asked for a read-lock (an issue you pointed out earlier in this thread). Attached is a new version of the patch, with those issues fixed. btree-incomplete-split-4.patch is a complete patch against the latest fix-btree-page-deletion patch, and moveright-assign-fix.patch is just the changes to _bt_moveright, if you want to review just the changes since the previous patch I posted. - Heikki diff --git a/src/backend/access/nbtree/README b/src/backend/access/nbtree/README index 03efc29..43ee75f 100644 --- a/src/backend/access/nbtree/README +++ b/src/backend/access/nbtree/README @@ -404,12 +404,34 @@ an additional insertion above that, etc). For a root split, the followon WAL entry is a new root entry rather than an insertion entry, but details are otherwise much the same. 
-Because insertion involves multiple atomic actions, the WAL replay logic -has to detect the case where a page split isn't followed by a matching -insertion on the parent level, and then do that insertion on its own (and -recursively for any subsequent parent insertion, of course). This is -feasible because the WAL entry for the split contains enough info to know -what must be inserted in the parent level. +Because splitting involves multiple atomic actions, it's possible that the +system crashes between splitting a page and inserting the downlink for the +new half to the parent. After recovery, the downlink for the new page will +be missing. The search algorithm works correctly, as the page will be found +by following the right-link from its left sibling, although if a lot of +downlinks in the tree are missing, performance will suffer. A more serious +consequence is that if the page without a downlink gets split again, the +insertion algorithm will fail to find the location in the parent level to +insert the downlink. + +Our approach is to create any missing downlinks on-the-fly, when +searching the tree for a new insertion. It could be done during searches, +too, but it seems best not to put any extra updates in what would otherwise +be a read-only operation (updating is not possible in hot standby mode +anyway). To identify missing downlinks, when a page is split, the left page +is flagged to indicate that the split is not yet complete (INCOMPLETE_SPLIT). +When the downlink is inserted to the parent, the flag is cleared atomically +with the insertion. The child page is kept locked until the insertion in the +parent is finished and the flag in the child cleared, but can be released +immediately after that, before recursing up the tree, if the parent also +needs to be split. This ensures that incompletely split pages should not be +seen under normal circumstances; only when insertion to the parent fails +for some reason. 
+ +We flag the left page, even though it's the right page that's missing the +downlink, because it's more convenient to know already when following the +right-link from the left page to the right page that it will need to have +its downlink inserted to the parent. When splitting a non-root page that is alone on its level, the required metapage update (of the fast root link) is performed and logged as part @@ -422,6 +444,14 @@ page is a second record. If vacuum is interrupted for some reason, or the system crashes, the tree is consistent for searches and insertions. The next VACUUM will find the half-dead leaf page and continue the deletion. +Before 9.4, we used to keep track of incomplete splits and page deletions +during recovery and finish them immediately at end of recovery, instead of +doing it lazily at the next insertion or vacuum. However, that made the +recovery much more complicated, and only fixed the problem when crash +recovery was performed. An incomplete split can also occur if an otherwise +recoverable error, like out-of-memory or out-of-disk-space, happens while +inserting the downlink to