Re: Partitioning: issues/ideas (Was: Re: [HACKERS] On partitioning)
On 27-01-2015 AM 05:46, Jim Nasby wrote:
> On 1/25/15 7:42 PM, Amit Langote wrote:
>> On 21-01-2015 PM 07:26, Amit Langote wrote:
>>> Ok, I will limit myself to focusing on following things at the moment:
>>>
>>> * Provide syntax in CREATE TABLE to declare partition key
>>
>> While working on this, I stumbled upon the question of how we deal with
>> any index definitions following from constraints defined in a CREATE
>> statement. I think we do not want to have a physical index created for a
>> table that is partitioned (in other words, has no heap of itself). As
>> the current mechanisms dictate, constraints like PRIMARY KEY, UNIQUE,
>> EXCLUSION CONSTRAINT are enforced as indexes. It seems there are really
>> two decisions to make here:
>>
>> 1) how do we deal with any index definitions (either explicit or
>> implicit following from constraints defined on it) - do we allow them by
>> marking them specially, say, in pg_index, as being mere
>> placeholders/templates, or invent some other mechanism?
>>
>> 2) As a short-term solution, do we simply reject creating any indexes
>> (/any constraints that require them) on a table whose definition also
>> includes a PARTITION ON clause? Instead define them on its partitions (or
>> any relations in the hierarchy that are not further partitioned).
>>
>> Or maybe I'm missing something...
>
> Wasn't the idea that the parent table in a partitioned table wouldn't
> actually have a heap of its own? If there's no heap there can't be an
> index.

Yes, that's right. Perhaps, as you say below, we should not look at the
heap-less partitioned relation thingy so soon.

> That said, I think this is premature optimization that could be done later.

It seems so.

Thanks,
Amit

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Safe memory allocation functions
On Sat, Jan 17, 2015 at 11:06 PM, Robert Haas wrote:
> On Fri, Jan 16, 2015 at 10:56 AM, Alvaro Herrera wrote:
>> So how about something like
>>
>> #define ALLOCFLAG_HUGE 0x01
>> #define ALLOCFLAG_NO_ERROR_ON_OOM 0x02
>> void *
>> MemoryContextAllocFlags(MemoryContext context, Size size, int flags);
>
> That sounds good, although personally I'd rather have the name be
> something like MemoryContextAllocExtended; we have precedent for using
> "Extended" for this sort of thing elsewhere. Also, I'd suggest trying
> to keep the flag names short, e.g. ALLOC_HUGE and ALLOC_NO_OOM (or
> ALLOC_SOFT_FAIL?).

Yes, I think that this name makes more sense (LockAcquire[Extended],
RangeVarGetRelid[Extended]), as well as using shorter names for the
flags.
--
Michael
Re: [HACKERS] Using 128-bit integers for sum, avg and statistics aggregates
On 2015-01-27 08:21:57 +0100, Andreas Karlsson wrote:
> On 01/23/2015 02:58 AM, Petr Jelinek wrote:
>> On 23/01/15 00:40, Andreas Karlsson wrote:
>>> - Renamed some things from int12 to int128, there are still some places
>>> with int16 which I am not sure what to do with.
>>
>> I'd vote for renaming them to int128 too; there are enough C functions
>> that use int16 for a 16-bit integer that this is going to be confusing
>> otherwise.
>
> Do you also think the SQL functions should be named numeric_int128_sum,
> numeric_int128_avg, etc?

I'm pretty sure we already decided upthread that the SQL interface is
going to keep using int4/8 and by extension int16.

Greetings,

Andres Freund

--
Andres Freund                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Hot Standby WAL reply uses heavyweight session locks, but doesn't have enough infrastructure set up
On 2015-01-27 16:23:53 +0900, Michael Paquier wrote:
> On Tue, Jan 27, 2015 at 6:24 AM, Andres Freund wrote:
>> Unfortunately that Assert()s when there's a lock conflict because
>> e.g. another backend is currently connecting. That's because ProcSleep()
>> does a enable_timeout_after(DEADLOCK_TIMEOUT, DeadlockTimeout) - and
>> there's no deadlock timeout (or lock timeout) handler registered.
> Yes, that could logically happen if there is a lock conflict, as
> RowExclusiveLock or lower locks can be taken in recovery.

I don't think this specific lock (it's an object lock, not a relation
lock) can easily be taken directly by a user except during
authentication.

>> [...]
>> afaics, that should work? Not pretty, but probably easier than starting
>> to reason about the deadlock detector in the startup process.
> Wouldn't it be cleaner to simply register a dedicated handler in
> StartupXlog instead, obviously something else than DEADLOCK_TIMEOUT as
> it is reserved for backend operations? For back-branches, we may even
> consider using DEADLOCK_TIMEOUT..

What would that timeout handler actually do?

Two problems:

a) We really don't want the startup process to be killed/error out, as
that likely means a postmaster restart. So the default deadlock detector
strategy is something that's really useless for us.

b) Pretty much all other (if not actually all other) heavyweight lock
acquisitions in the startup process acquire locks using dontWait = true
- which means that the deadlock detector isn't actually run. That's
essentially fine because we simply kill everything in our way. C.f.
StandbyAcquireAccessExclusiveLock() et al.

There's a dedicated 'deadlock detector'-like infrastructure around
ResolveRecoveryConflictWithBufferPin(), but it deals with a class of
deadlocks that's not handled in the deadlock detector anyway. I think
this isn't particularly pretty, but it seems to be working well enough,
and changing it would be pretty invasive.
So keeping in line with all that code seems to be easier.

>> We probably should also add a Assert(!InRecovery || sessionLock) to
>> LockAcquireExtended() - these kind of problems are otherwise hard to
>> find in a developer setting.
> So this means that locks other than session ones cannot be taken while
> a node is in recovery, but RowExclusiveLock can be taken while in
> recovery. Don't we have a problem with this assertion then?

Note that InRecovery doesn't mean what you probably think it means:

/*
 * Are we doing recovery from XLOG?
 *
 * This is only ever true in the startup process; it should be read as meaning
 * "this process is replaying WAL records", rather than "the system is in
 * recovery mode". It should be examined primarily by functions that need
 * to act differently when called from a WAL redo function (e.g., to skip WAL
 * logging). To check whether the system is in recovery regardless of which
 * process you're running in, use RecoveryInProgress() but only after shared
 * memory startup and lock initialization.
 */
bool		InRecovery = false;

The assertion actually should be even stricter:

Assert(!InRecovery || (sessionLock && dontWait));

i.e. we never acquire a heavyweight lock in the startup process unless
it's a session lock (as we don't have resource managers/a xact to track
locks) and we don't wait (as we don't have the deadlock detector
infrastructure set up).

Greetings,

Andres Freund

--
Andres Freund                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Using 128-bit integers for sum, avg and statistics aggregates
On 01/27/2015 09:05 AM, Andres Freund wrote:
> On 2015-01-27 08:21:57 +0100, Andreas Karlsson wrote:
>> On 01/23/2015 02:58 AM, Petr Jelinek wrote:
>>> On 23/01/15 00:40, Andreas Karlsson wrote:
>>>> - Renamed some things from int12 to int128, there are still some places
>>>> with int16 which I am not sure what to do with.
>>>
>>> I'd vote for renaming them to int128 too; there are enough C functions
>>> that use int16 for a 16-bit integer that this is going to be confusing
>>> otherwise.
>>
>> Do you also think the SQL functions should be named numeric_int128_sum,
>> numeric_int128_avg, etc?
>
> I'm pretty sure we already decided upthread that the SQL interface is
> going to keep using int4/8 and by extension int16.

Excellent, then I will leave my latest patch as-is and let the reviewer
say what he thinks.

--
Andreas Karlsson
Re: [HACKERS] Safe memory allocation functions
Alvaro Herrera wrote:
>> So how about something like
>>
>> #define ALLOCFLAG_HUGE 0x01
>> #define ALLOCFLAG_NO_ERROR_ON_OOM 0x02
>> void *
>> MemoryContextAllocFlags(MemoryContext context, Size size, int flags);

The flag for huge allocations may be useful, but I don't actually see
much value in the flag ALLOC_NO_OOM if the stuff in aset.c
unconditionally returns NULL in case of an OOM and we let palloc
complain about an OOM when allocation returns NULL. Something I am
missing perhaps?

>> I definitely do not want to push the nofail stuff via the
>> MemoryContextData-> API into aset.c. Imo aset.c should always return
>> NULL and then mcxt.c should throw the error if in the normal palloc()
>> function.
>
> Sure, that seems reasonable ...

Yes, this would reduce the footprint of this patch in aset.c to a
minimum, by changing the ereport to returning NULL in a couple of
places.
--
Michael
Re: [HACKERS] Safe memory allocation functions
On 2015-01-27 17:27:53 +0900, Michael Paquier wrote:
> Alvaro Herrera wrote:
>>> So how about something like
>>>
>>> #define ALLOCFLAG_HUGE 0x01
>>> #define ALLOCFLAG_NO_ERROR_ON_OOM 0x02
>>> void *
>>> MemoryContextAllocFlags(MemoryContext context, Size size, int flags);
> The flag for huge allocations may be useful, but I don't actually see
> much value in the flag ALLOC_NO_OOM if the stuff in aset.c returns
> unconditionally NULL in case of an OOM and we let palloc complain
> about an OOM when allocation returns NULL. Something I am missing
> perhaps?

I guess the idea is to have a *user facing* MemoryContextAllocExtended()
that can do both huge and no-oom allocations. Otherwise we need
palloc-like wrappers for all combinations. We're certainly not just
going to ignore memory allocation failures generally in
MemoryContextAllocExtended().

Greetings,

Andres Freund

--
Andres Freund                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
At 2015-01-26 17:45:52 -0500, robertmh...@gmail.com wrote:
>
>> Based on the recent emails, it appears there's been a shift of
>> preference to having it be in-core […]
>
> Well, I'm not sure that anyone else here agreed with me on that

Sure, an in-core AUDIT command would be great. Stephen has always said
that would be the most preferable solution; and if we had the code to
implement it, I doubt anyone would prefer the contrib module instead.

But we don't have that code now, and we won't have it in time for 9.5.
We had an opportunity to work on pgaudit in its current form, and I like
to think that the result is useful. To me, the question has always been
whether some variant of that code would be acceptable for 9.5's contrib.
If so, I had some time to work on that. If not… well, hard luck. But the
option to implement AUDIT was not available to me, which is why I have
not commented much on it recently.

> The basic dynamic here seems to be you asking for changes and Abhijit
> making them but without any real confidence, and I don't feel good
> about that.

I understand how I might have given you that impression, but I didn't
mean to, and I don't really feel that way. I appreciate Stephen's
suggestions and, although it took me some time to understand them fully,
I think the use of GRANT to provide finer-grained auditing configuration
has improved pgaudit. I am slightly concerned by the resulting
complexity, but I think that can be addressed by examples and so on. I
wouldn't be unhappy if this code were to go into contrib.

(I should point out that it is also not the case that I do not hold any
opinions and would be happy with anything pgaudit-shaped being included.
For example, I strongly prefer GRANT to the 'alice:*:*' approach.)

Anyway, I think it's reasonably clear now that pgaudit is unlikely to
make it into 9.5 in any form, so I'll find something else to do.
-- Abhijit
Re: [HACKERS] alter user/role CURRENT_USER
Hello, thank you for the comment.

> Looking at this a bit, I'm not sure completely replacing RoleId with a
> node is the best idea; some of the users of that production in the
> grammar are okay with accepting only normal strings as names, and don't
> need all the CURRENT_USER etc stuff.

You're right. The productions that don't need RoleSpec already use char*
for the role member in their *Stmt structs. I might have done a bit too
much.

Adding a new nonterminal RoleId and using it makes the check code for
"public"/"none" and others somewhat duplicated, but it removes other
redundant checks for "!= CSTRING" from some productions: CREATE
ROLE/USER/GROUP and ALTER ROLE/USER/GROUP RENAME.

RoleId in the patch still has rule components for CURRENT_USER,
SESSION_USER, and CURRENT_ROLE. Without them, the parser prints an error
that is not understandable to users:

| =# alter role current_user rename to "PuBlic";
| ERROR:  syntax error at or near "rename"
| LINE 1: alter role current_user rename to "PuBlic";
|                                 ^

With the components, the error message becomes like this:

| =# alter role current_role rename to "PuBlic";
| ERROR:  reserved word cannot be used
| LINE 1: alter role current_role rename to "PuBlic";
|                    ^

> Maybe we need a new production,
> say RoleSpec, and we modify the few productions that need the extra
> flexibility? For instance we could have ALTER USER RoleSpec instead of
> ALTER USER RoleId. But we would keep CREATE USER RoleId, because it
> doesn't make any sense to accept CREATE USER CURRENT_USER.
> I think that would lead to a less invasive patch also.
> Am I making sense?

I think I did what you expected. It was good as the code got simpler,
but the invasiveness doesn't seem to be reduced..

What do you think about this?
regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

>From d90b7e09968f32a5b543242469fb65e304df3318 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Fri, 14 Nov 2014 17:37:22 +0900
Subject: [PATCH] ALTER USER CURRENT_USER v4

Added RoleId for the use in where CURRENT_USER-like stuff is not used.
CREATE ROLE/USER/GROUP, ALTER ROLE/USER/GROUP RENAME are changed to use
RoleId.
---
 src/backend/catalog/aclchk.c       |  30 +++--
 src/backend/commands/alter.c       |   2 +-
 src/backend/commands/extension.c   |   2 +-
 src/backend/commands/foreigncmds.c |  57 -
 src/backend/commands/schemacmds.c  |  30 -
 src/backend/commands/tablecmds.c   |   4 +-
 src/backend/commands/tablespace.c  |   2 +-
 src/backend/commands/user.c        |  86 +++---
 src/backend/nodes/copyfuncs.c      |  37 --
 src/backend/nodes/equalfuncs.c     |  35 --
 src/backend/parser/gram.y          | 229 +
 src/backend/parser/parse_utilcmd.c |   4 +-
 src/backend/utils/adt/acl.c        |  34 ++
 src/include/commands/user.h        |   2 +-
 src/include/nodes/nodes.h          |   1 +
 src/include/nodes/parsenodes.h     |  48 ++--
 src/include/utils/acl.h            |   2 +-
 17 files changed, 400 insertions(+), 205 deletions(-)

diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
index 1e3888e..1c90626 100644
--- a/src/backend/catalog/aclchk.c
+++ b/src/backend/catalog/aclchk.c
@@ -430,13 +430,16 @@ ExecuteGrantStmt(GrantStmt *stmt)
 	foreach(cell, stmt->grantees)
 	{
 		PrivGrantee *grantee = (PrivGrantee *) lfirst(cell);
+		Oid			grantee_uid = ACL_ID_PUBLIC;
 
-		if (grantee->rolname == NULL)
-			istmt.grantees = lappend_oid(istmt.grantees, ACL_ID_PUBLIC);
-		else
-			istmt.grantees =
-				lappend_oid(istmt.grantees,
-							get_role_oid(grantee->rolname, false));
+		/* "public" is mapped to ACL_ID_PUBLIC */
+		if (grantee->role->roltype != ROLESPEC_PUBLIC)
+		{
+			grantee_uid = get_rolespec_oid(grantee->role, false);
+			if (!OidIsValid(grantee_uid))
+				grantee_uid = ACL_ID_PUBLIC;
+		}
+		istmt.grantees = lappend_oid(istmt.grantees, grantee_uid);
 	}
 
 	/*
@@ -913,13 +916,16 @@ ExecAlterDefaultPrivilegesStmt(AlterDefaultPrivilegesStmt *stmt)
 	foreach(cell, action->grantees)
 	{
 		PrivGrantee *grantee = (PrivGrantee *) lfirst(cell);
+		Oid			grantee_uid = ACL_ID_PUBLIC;
 
-		if (grantee->rolname == NULL)
-			iacls.grantees = lappend_oid(iacls.grantees, ACL_ID_PUBLIC);
-		else
-			iacls.grantees =
-				lappend_oid(iacls.grantees,
-							get_role_oid(grantee->rolname, false));
+		/* "public" is mapped to ACL_ID_PUBLIC */
+		if (grantee->role->roltype != ROLESPEC_PUBLIC)
+		{
+			grantee_uid = get_rolespec_oid(grantee->role, false);
+			if (!OidIsValid(grantee_uid))
+				grantee_uid = ACL_ID_PUBLIC;
+		}
+		iacls.grantees = lappend_oid(iacls.grantees, grantee_uid);
 	}
 
 	/*
diff --git a/src/backend/commands/alter.c b/src/backend/commands/alter.c
index 78b54b4..1d8799b 100644
--- a/src/backend/commands/alter.c
+++ b/src/backend/commands/alter.c
@@ -679,7 +679,7 @@ AlterObjectNamespace_internal(Relation rel, Oid objid, Oid nspOid)
  Oid
 ExecAlt
Re: [HACKERS] [RFC] Incremental backup v3: incremental PoC
Hi Marco,

On 16/01/15 16:55, Marco Nenciarini wrote:
> On 14/01/15 17:22, Gabriele Bartolini wrote:
>>
>> My opinion, Marco, is that for version 5 of this patch, you:
>>
>> 1) update the information on the wiki (it is outdated - I know you have
>> been busy with LSN map optimisation)
>
> Done.
>
>> 2) modify pg_basebackup in order to accept a directory (or tar file) and
>> automatically detect the LSN from the backup profile
>
> New version of patch attached. The -I parameter now requires a backup
> profile from a previous backup. I've added a sanity check that forbids
> incremental file-level backups if the base timeline is different from
> the current one.
>
>> 3) add the documentation regarding the backup profile and pg_basebackup
>
> Next on my TODO list.
>
>> Once we have all of this, we can continue trying the patch. Some
>> unexplored paths are:
>>
>> * tablespace usage
>
> I've improved my pg_restorebackup python PoC. It now supports
> tablespaces.

About tablespaces: I noticed that any pointer to tablespace locations is
lost during the recovery of an incremental backup when changing the
tablespace mapping (-T option).
Here are the steps I followed:

- creating and filling a test database obtained through pgbench:

    psql -c "CREATE DATABASE pgbench"
    pgbench -U postgres -i -s 5 -F 80 pgbench

- a first base backup with pg_basebackup:

    mkdir -p backups/$(date '+%d%m%y%H%M')/data && \
      pg_basebackup -v -F p -D backups/$(date '+%d%m%y%H%M')/data -x

- creation of a new tablespace, altering the table "pgbench_accounts" to
  set the new tablespace:

    mkdir -p /home/gbroccolo/pgsql/tbls
    psql -c "CREATE TABLESPACE tbls LOCATION '/home/gbroccolo/pgsql/tbls'"
    psql -c "ALTER TABLE pgbench_accounts SET TABLESPACE tbls" pgbench

- doing some work on the database:

    pgbench -U postgres -T 120 pgbench

- a second incremental backup with pg_basebackup, specifying the new
  location for the tablespace through the tablespace mapping:

    mkdir -p backups/$(date '+%d%m%y%H%M')/data backups/$(date '+%d%m%y%H%M')/tbls && \
      pg_basebackup -v -F p -D backups/$(date '+%d%m%y%H%M')/data -x \
        -I backups/2601151641/data/backup_profile \
        -T /home/gbroccolo/pgsql/tbls=/home/gbroccolo/pgsql/backups/$(date '+%d%m%y%H%M')/tbls

- a recovery based on the tool pg_restorebackup.py attached in
  http://www.postgresql.org/message-id/54b9428e.9020...@2ndquadrant.it

    ./pg_restorebackup.py backups/2601151641/data backups/2601151707/data /tmp/data \
      -T /home/gbroccolo/pgsql/backups/2601151707/tbls=/tmp/tbls

In the last step, I obtained the following stack trace:

    Traceback (most recent call last):
      File "./pg_restorebackup.py", line 74, in <module>
        shutil.copy2(base_file, dest_file)
      File "/home/gbroccolo/.pyenv/versions/2.7.5/lib/python2.7/shutil.py", line 130, in copy2
        copyfile(src, dst)
      File "/home/gbroccolo/.pyenv/versions/2.7.5/lib/python2.7/shutil.py", line 82, in copyfile
        with open(src, 'rb') as fsrc:
    IOError: [Errno 2] No such file or directory: 'backups/2601151641/data/base/16384/16406_fsm'

Any idea on what's going wrong?

Thanks,
Giuseppe.
--
Giuseppe Broccolo - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
giuseppe.brocc...@2ndquadrant.it | www.2ndQuadrant.it
Re: [HACKERS] [POC] FETCH limited by bytes.
Thank you for the comment.

The automatic way to determine the fetch_size looks to become too much
for the purpose. An example of a non-automatic way is a new foreign
table option like 'fetch_size', but this exposes the inside too much...
Which do you think is preferable?

On Thu, 22 Jan 2015 11:17:52 -0500, Tom Lane wrote in
<24503.1421943...@sss.pgh.pa.us>:
> Kyotaro HORIGUCHI writes:
>> Hello, as in the discussion on async fetching on postgres_fdw, FETCH
>> with data-size limitation would be useful to get memory usage
>> stability of postgres_fdw.
>
>> Could such a feature and syntax be allowed to be added?
>
> This seems like a lot of work, and frankly an incredibly ugly API,
> for a benefit that is entirely hypothetical. Have you got numbers
> showing any actual performance win for postgres_fdw?

The API is rush work to make a path for the new parameter (but, yes, I
did too much for the purpose of use from postgres_fdw..) and it can be
any saner syntax, but it's not the time to do so yet.

The data-size limitation, at any size to limit, would give a
significant gain, especially for small-sized rows. This patch began
from the fact that it runs about twice as fast when fetch size = 1
than 100.

http://www.postgresql.org/message-id/20150116.171849.109146500.horiguchi.kyot...@lab.ntt.co.jp

I took exec times to get 1M rows from localhost via postgres_fdw and it
showed the following numbers.
=# SELECT a from ft1;
fetch_size  avg row size(*1)  time   alloced_mem/fetch(Mbytes)(*1)
(local)                       0.75s
100         60                6.2s   6000 (0.006)
1           60                2.7s   60 (0.6 )
3           60                2.2s   180 (2.0 )
6           60                2.4s   360 (4.0 )

=# SELECT a, b, c from ft1;
fetch_size  avg row size(*1)  time   alloced_mem/fetch(Mbytes)(*1)
(local)                       0.8s
100         204               12 s   20400 (0.02 )
1000        204               10 s   204000 (0.2 )
1           204               5.8s   204 (2)
2           204               5.9s   408 (4)

=# SELECT a, b, d from ft1;
fetch_size  avg row size(*1)  time   alloced_mem/fetch(Mbytes)(*1)
(local)                       0.8s
100         1356              17 s   135600 (0.136)
1000        1356              15 s   1356000 (1.356)
1475        1356              13 s   2000100 (2.0 )
2950        1356              13 s   4000200 (4.0 )

The definitions of the environment are the following.

CREATE SERVER sv1 FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host 'localhost', dbname 'postgres');
CREATE USER MAPPING FOR PUBLIC SERVER sv1;
CREATE TABLE lt1 (a int, b timestamp, c text, d text);
CREATE FOREIGN TABLE ft1 (a int, b timestamp, c text, d text)
  SERVER sv1 OPTIONS (table_name 'lt1');
INSERT INTO lt1 (SELECT a, now(), repeat('x', 128), repeat('x', 1280)
  FROM generate_series(0, 99) a);

The "avg row size" is alloced_mem/fetch_size, and the alloced_mem is the
sum of HeapTuple[fetch_size] and (HEAPTUPLESIZE + tup->t_len) for all
stored tuples on the receiver side, fetch_more_data() in postgres_fdw.

They show about a 50% gain for the smaller tuple size and 25% for the
larger. They look to be optimal where alloced_mem is around 2MB, for a
reason unknown to me. Anyway the difference seems to be significant.

> Even if we wanted to do something like this, I strongly object to
> measuring size by heap_compute_data_size. That's not a number that users
> would normally have any direct knowledge of; nor does it have anything
> at all to do with the claimed use-case, where what you'd really need to
> measure is bytes transmitted down the wire. (The difference is not small:
> for instance, toasted values would likely still be toasted at the point
> where you're measuring.)

Sure.
Finally, the attached patch #1 does the following things:

- The sender limits the number of tuples using the sum of the net length
  of the column values to be sent, not including protocol overhead. It
  is calculated in the added function slot_compute_attr_size(), using
  raw length for compressed values.

- postgres_fdw calculates the fetch limit bytes by the following
  formula:

    MAX_FETCH_MEM - MAX_FETCH_SIZE * (estimated overhead per tuple);

The result of the patch is as follows, with MAX_FETCH_MEM = 2MiB and
MAX_FETCH_SIZE = 3.

fetch_size  avg row size(*1)  time   max alloced_mem/fetch(Mbytes)
(auto)      60                2.4s   108 ( 1.08)
(auto)      204               7.3s   536400 ( 0.54)
(auto)      1356              15 s   430236 ( 0.43)

This is meaningfully fast, but the patch looks too big and the meaning
of the new parameter is hard to understand.. :(

On the other hand, the cause of the displacements of alloced_mem shown
above is the per-tuple overhead, the sum of which is unknown before
execution. The second patch makes FE
Re: [HACKERS] a fast bloat measurement tool (was Re: Measuring relation free space)
At 2015-01-26 18:47:29 -0600, jim.na...@bluetreble.com wrote:
>
> Anything happen with this?

Nothing so far.

-- Abhijit
Re: [HACKERS] proposal: searching in array function - array_position
2015-01-26 23:29 GMT+01:00 Jim Nasby:
> On 1/26/15 4:17 PM, Pavel Stehule wrote:
>>> Any way to reduce the code duplication between the array and
>>> non-array versions? Maybe factor out the operator caching code?
>>
>> I thought about it - but there are different checks, different result
>> processing, different result types.
>>
>> I didn't find any readable and reduced form :(
>
> Yeah, that's why I was thinking specifically of the operator caching
> code... isn't that identical? That would at least remove a dozen
> lines...

It is only partially identical - I would like to use a cache for
array_offset, but it is not necessary for array_offsets .. it depends on
how we would modify the current API to support externally cached data.

Regards

Pavel

> --
> Jim Nasby, Data Architect, Blue Treble Consulting
> Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] proposal: row_to_array function
2015-01-26 21:44 GMT+01:00 Jim Nasby:
> On 1/25/15 4:23 AM, Pavel Stehule wrote:
>> I tested a concept of iteration over an array in the format [key1,
>> value1, key2, value2, ..] - what is nice, it works for
>> [[key1,value1],[key2,value2], ...] too.
>>
>> It is only a few lines more than the current code, and this change
>> doesn't break compatibility.
>>
>> Do you think this patch is acceptable?
>>
>> Ideas, comments?
>
> Aside from fixing the comments... I think this needs more tests on
> corner cases. For example, what happens when you do
>
> foreach a, b, c in array(array(1,2),array(3,4)) ?

It is relatively simple behaviour -- empty values are NULL.

array(1,2),array(3,4) -- do you think ARRAY[[1,2],[3,4]]? It is
effectively ARRAY[1,2,3,4].

> Or the opposite case of
>
> foreach a,b in array(array(1,2,3))
>
> Also, what about:
>
> foreach a,b in '{{{1,2},{3,4}},{{5,6},{7,8}}}'::int[] ?

postgres=# select array(select unnest('{{{1,2},{3,4}},{{5,6},{7,8}}}'::int[]));
       array
-------------------
 {1,2,3,4,5,6,7,8}
(1 row)

So it generates pairs {1,2},{3,4},{5,6},{7,8}.

Regards

Pavel Stehule

> --
> Jim Nasby, Data Architect, Blue Treble Consulting
> Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] basebackups during ALTER DATABASE ... SET TABLESPACE ... not safe?
On Mon, Jan 26, 2015 at 4:03 PM, Andres Freund wrote:
>> I basically have two ideas to fix this.
>>
>> 1) Make do_pg_start_backup() acquire a SHARE lock on
>>    pg_database. That'll prevent it from starting while a movedb() is
>>    still in progress. Then additionally add a pg_backup_in_progress()
>>    function to xlog.c that checks (XLogCtl->Insert.exclusiveBackup ||
>>    XLogCtl->Insert.nonExclusiveBackups != 0). Use that in createdb() and
>>    movedb() to error out if a backup is in progress.
>
> Attached is a patch trying to do this. Doesn't look too bad, and it led
> me to discover missing recovery conflicts during an AD ST.
>
> But: It doesn't actually work on standbys, because lock.c prevents any
> stronger lock than RowExclusive from being acquired. And we need a
> lock that can conflict with WAL replay of DBASE_CREATE, to handle base
> backups that are executed on the primary. Those obviously can't detect
> whether any standby is currently doing a base backup...
>
> I currently don't have a good idea how to mangle lock.c to allow
> this. I've played with doing it like in the second patch, but that
> doesn't actually work because of some asserts around ProcSleep -
> leading to locks on database objects not working in the startup
> process (despite already being used).
>
> The easiest thing would be to just use a lwlock instead of a
> heavyweight lock - but those aren't cancelable...

How about just wrapping an lwlock around a flag variable? movedb()
increments the variable when starting and decrements it when done (must
use PG_ENSURE_ERROR_CLEANUP). Starting a backup errors out (or waits in
1-second increments) if it's non-zero.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] pg_upgrade and rsync
On Fri, Jan 23, 2015 at 1:48 PM, Andres Freund wrote:
> On 2015-01-22 20:54:47 -0500, Stephen Frost wrote:
>> * Bruce Momjian (br...@momjian.us) wrote:
>>> On Fri, Jan 23, 2015 at 01:19:33AM +0100, Andres Freund wrote:
>>>> Or do you - as the text edited in your patch, but not the quote above -
>>>> mean to run pg_upgrade just on the primary and then rsync?
>>>
>>> No, I was going to run it on both, then rsync.
>>
>> I'm pretty sure this is all a lot easier than you believe it to be. If
>> you want to recreate what pg_upgrade does to a cluster then the simplest
>> thing to do is rsync before removing any of the hard links. rsync will
>> simply recreate the same hard link tree that pg_upgrade created when it
>> ran, and update files which were actually changed (the catalog tables).
>
> I don't understand why that'd be better than simply fixing (yes, that's
> imo the correct term) pg_upgrade to retain relfilenodes across the
> upgrade. Afaics there's no conflict risk and it'd make the clusters much
> more similar, which would be good; independent of rsyncing standbys.

+1.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] pg_upgrade and rsync
On Sat, Jan 24, 2015 at 10:04 PM, Bruce Momjian wrote:
> On Fri, Jan 23, 2015 at 02:34:36PM -0500, Stephen Frost wrote:
>>>> You'd have to replace the existing data directory on the master to do
>>>> that, which pg_upgrade was designed specifically to not do, in case
>>>> things went poorly.
>>>
>>> Why? Just rsync the new data directory onto the old directory on the
>>> standbys. That's fine and simple.
>>
>> That still doesn't address the need to use --size-only, it would just
>> mean that you don't need to use -H. If anything the -H part is the
>> aspect which worries me the least about this approach.
>
> I can now confirm that it works, just as Stephen said. I was able to
> upgrade a standby cluster that contained the regression database, and
> the pg_dump output was perfect.
>
> I am attaching doc instructions that I will add to all branches as soon
> as someone else confirms my results. You will need to use rsync
> --itemize-changes to see the hard links being created, e.g.:
>
>    hf+ pgsql/data/base/16415/28188 => pgsql.old/data/base/16384/28188

My rsync manual page (on two different systems) mentions nothing about
remote_dir, so I'd be quite unable to follow your proposed directions.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Windows buildfarm animals are still not happy with abbreviated keys patch
On Mon, Jan 26, 2015 at 9:52 PM, Michael Paquier wrote:
> On Tue, Jan 27, 2015 at 4:21 AM, Robert Haas wrote:
>> That's what I hope to find out. :-)
> Buildfarm seems happy now. I just gave a try to that on one of my
> small Windows VMs and compared the performance with 9.4 for this
> simple test case when building with MSVC 2010:
> create table aa as select random()::text as a, 'filler filler filler'
> as b from generate_series(1,100);
> create index aai on aa(a);
> On 9.4, the index creation took 26.5s, while on master it took 18s.
> That's nice, particularly for things like a restore from a dump.

Cool. That's a little bit smaller win than I would have expected given my Linux results, but it's certainly respectable.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] GetLockConflicts() and thus recovery conflicts seem pretty broken
Hi,

While investigating other bugs I noticed that ResolveRecoveryConflictWithLock() wasn't really working. Turns out GetLockConflicts() violates its API contract, which says:

 * The result array is palloc'd and is terminated with an invalid VXID.

Problem 1:

We don't actually put the terminator there. It happens to more or less accidentally work on a master because the array is palloc0()ed there and, while 0 is a valid backend id, it happens not to be a valid local transaction id. In HS we don't actually allocate the array every time; it's instead statically allocated, without zeroing. I have no idea why this doesn't crash ResolveRecoveryConflictWithInterrupt(), which does:

    while (!lock_acquired)
    {
        if (++num_attempts < 3)
            backends = GetLockConflicts(locktag, AccessExclusiveLock);
        ...
        ResolveRecoveryConflictWithVirtualXIDs(backends,
                                               PROCSIG_RECOVERY_CONFLICT_LOCK);

and ResolveRecoveryConflictWithVirtualXIDs does:

    ...
    while (VirtualTransactionIdIsValid(*waitlist))
    {
        /* kill kill kill */

        /* The virtual transaction is gone now, wait for the next one */
        waitlist++;
    }

I guess we just accidentally happen to come across appropriately set memory at some point. I'm really baffled that this hasn't caused significant problems so far.

Problem 2:

Since bcd8528f001 and 29eedd312274 the "result array is palloc'd" part is wrong, because we're now doing:

    static VirtualTransactionId *vxids;

    /*
     * Allocate memory to store results, and fill with InvalidVXID.  We only
     * need enough space for MaxBackends + a terminator, since prepared xacts
     * don't count.  InHotStandby allocate once in TopMemoryContext.
     */
    if (InHotStandby)
    {
        if (vxids == NULL)
            vxids = (VirtualTransactionId *)
                MemoryContextAlloc(TopMemoryContext,
                                   sizeof(VirtualTransactionId) * (MaxBackends + 1));
    }
    else
        vxids = (VirtualTransactionId *)
            palloc0(sizeof(VirtualTransactionId) * (MaxBackends + 1));

Obviously that violates the API contract. I'm inclined to rip the HS special case out and add a pfree() to the single HS caller.
The commit message introducing it says: Use malloc() in GetLockConflicts() when called InHotStandby to avoid repeated palloc calls. Current code assumed this was already true, so this is a bug fix. But I can't really believe this is relevant. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
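To make the terminator issue concrete, here is a small Python model (an illustration only, not the actual C code) of the API contract and the caller's walk over the result array; the fix is simply to always write an invalid VXID after the last real entry instead of relying on the memory happening to be zeroed:

```python
# VXIDs are modelled as (backendId, localTransactionId) pairs.
INVALID_VXID = (0, 0)

def vxid_is_valid(vxid):
    # mirrors VirtualTransactionIdIsValid: a zero lxid marks the terminator
    return vxid[1] != 0

def get_lock_conflicts(conflicting_vxids, max_backends):
    # MaxBackends + 1 slots; the bug was relying on zeroed memory
    # instead of explicitly writing the terminator.
    result = [INVALID_VXID] * (max_backends + 1)
    for i, vxid in enumerate(conflicting_vxids):
        result[i] = vxid
    result[len(conflicting_vxids)] = INVALID_VXID  # always terminate
    return result

def resolve_conflicts(waitlist):
    # mirrors ResolveRecoveryConflictWithVirtualXIDs' loop
    cancelled = []
    i = 0
    while vxid_is_valid(waitlist[i]):
        cancelled.append(waitlist[i])   # "kill kill kill"
        i += 1
    return cancelled
```

Without the explicit terminator, the statically allocated hot-standby array hands resolve_conflicts() whatever happened to be in memory past the last entry, and the loop runs off the end — the accident described above.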
Re: [HACKERS] pg_upgrade and rsync
* Robert Haas (robertmh...@gmail.com) wrote: > On Sat, Jan 24, 2015 at 10:04 PM, Bruce Momjian wrote: > > On Fri, Jan 23, 2015 at 02:34:36PM -0500, Stephen Frost wrote: > >> > > You'd have to replace the existing data directory on the master to do > >> > > that, which pg_upgrade was designed specifically to not do, in case > >> > > things went poorly. > >> > > >> > Why? Just rsync the new data directory onto the old directory on the > >> > standbys. That's fine and simple. > >> > >> That still doesn't address the need to use --size-only, it would just > >> mean that you don't need to use -H. If anything the -H part is the > >> aspect which worries me the least about this approach. > > > > I can now confirm that it works, just as Stephen said. I was able to > > upgrade a standby cluster that contained the regression database, and > > the pg_dump output was perfect. > > > > I am attaching doc instruction that I will add to all branches as soon > > as someone else confirms my results. You will need to use rsync > > --itemize-changes to see the hard links being created, e.g.: > > > >hf+ pgsql/data/base/16415/28188 => > > pgsql.old/data/base/16384/28188 > > My rsync manual page (on two different systems) mentions nothing about > remote_dir, so I'd be quite unable to follow your proposed directions. The example listed works, but only when it's a local rsync: rsync --archive --hard-links --size-only old_dir new_dir remote_dir Perhaps a better example (or additional one) would be with a remote rsync, including clarification of old and new dir, like so: (run in /var/lib/postgresql) rsync --archive --hard-links --size-only \ 9.3/main \ 9.4/main \ server:/var/lib/postgresql/ Note that 9.3/main and 9.4/main are two source directories for rsync to copy over, while server:/var/lib/postgresql/ is a remote destination directory. The above directories match a default Debian/Ubuntu install. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] pg_upgrade and rsync
Robert Haas writes: > On Fri, Jan 23, 2015 at 1:48 PM, Andres Freund wrote: >> I don't understand why that'd be better than simply fixing (yes, that's >> imo the correct term) pg_upgrade to retain relfilenodes across the >> upgrade. Afaics there's no conflict risk and it'd make the clusters much >> more similar, which would be good; independent of rsyncing standbys. > +1. That's certainly impossible for the system catalogs, which means you have to be able to deal with relfilenode discrepancies for them, which means that maintaining the same relfilenodes for user tables is of dubious value. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Seq Scan
On Tue, Jan 27, 2015 at 08:02:37AM +0100, Daniel Bausch wrote: > Hi PG devs! > > Tom Lane writes: > > >> Wait for first IO, issue second IO request > >> Compute > >> Already have second IO request, issue third > >> ... > > > >> We'd be a lot less sensitive to IO latency. > > > > It would take about five minutes of coding to prove or disprove this: > > stick a PrefetchBuffer call into heapgetpage() to launch a request for the > > next page as soon as we've read the current one, and then see if that > > makes any obvious performance difference. I'm not convinced that it will, > > but if it did then we could think about how to make it work for real. > > Sorry for dropping in so late... > > I have done all this two years ago. For TPC-H Q8, Q9, Q17, Q20, and Q21 > I see a speedup of ~100% when using IndexScan prefetching + Nested-Loops > Look-Ahead (the outer loop!). > (On SSD with 32 Pages Prefetch/Look-Ahead + Cold Page Cache / Small RAM) Would you be so kind as to pass along any patches (ideally applicable to git master), tests, and specific measurements you made? Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade and rsync
On Tue, Jan 27, 2015 at 9:50 AM, Tom Lane wrote: > Robert Haas writes: >> On Fri, Jan 23, 2015 at 1:48 PM, Andres Freund >> wrote: >>> I don't understand why that'd be better than simply fixing (yes, that's >>> imo the correct term) pg_upgrade to retain relfilenodes across the >>> upgrade. Afaics there's no conflict risk and it'd make the clusters much >>> more similar, which would be good; independent of rsyncing standbys. > >> +1. > > That's certainly impossible for the system catalogs, which means you > have to be able to deal with relfilenode discrepancies for them, which > means that maintaining the same relfilenodes for user tables is of > dubious value. Why is that impossible for the system catalogs? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade and rsync
On Tue, Jan 27, 2015 at 9:36 AM, Stephen Frost wrote: > The example listed works, but only when it's a local rsync: > > rsync --archive --hard-links --size-only old_dir new_dir remote_dir > > Perhaps a better example (or additional one) would be with a remote > rsync, including clarification of old and new dir, like so: > > (run in /var/lib/postgresql) > rsync --archive --hard-links --size-only \ > 9.3/main \ > 9.4/main \ > server:/var/lib/postgresql/ > > Note that 9.3/main and 9.4/main are two source directories for rsync to > copy over, while server:/var/lib/postgresql/ is a remote destination > directory. The above directories match a default Debian/Ubuntu install. My point is that Bruce's patch suggests looking for "remote_dir" in the rsync documentation, but no such term appears there. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade and rsync
Robert Haas writes: > On Tue, Jan 27, 2015 at 9:50 AM, Tom Lane wrote: >> That's certainly impossible for the system catalogs, which means you >> have to be able to deal with relfilenode discrepancies for them, which >> means that maintaining the same relfilenodes for user tables is of >> dubious value. > Why is that impossible for the system catalogs? New versions aren't guaranteed to have the same system catalogs, let alone the same relfilenodes for them. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade and rsync
* Robert Haas (robertmh...@gmail.com) wrote: > On Tue, Jan 27, 2015 at 9:36 AM, Stephen Frost wrote: > > The example listed works, but only when it's a local rsync: > > > > rsync --archive --hard-links --size-only old_dir new_dir remote_dir > > > > Perhaps a better example (or additional one) would be with a remote > > rsync, including clarification of old and new dir, like so: > > > > (run in /var/lib/postgresql) > > rsync --archive --hard-links --size-only \ > > 9.3/main \ > > 9.4/main \ > > server:/var/lib/postgresql/ > > > > Note that 9.3/main and 9.4/main are two source directories for rsync to > > copy over, while server:/var/lib/postgresql/ is a remote destination > > directory. The above directories match a default Debian/Ubuntu install. > > My point is that Bruce's patch suggests looking for "remote_dir" in > the rsync documentation, but no such term appears there.

Ah, well, perhaps we could simply add a bit of clarification to this:

    for details on specifying remote_dir

like so:

    for details on specifying the destination remote_dir

? On my system, the rsync man page has '[DEST]' in the synopsis, but it doesn't actually go on to specifically define what 'DEST' is, rather referring to it later as 'destination' or 'remote directory'. I'm sure other suggestions would be welcome if they'd help clarify.

Thanks,

Stephen

signature.asc Description: Digital signature
Re: [HACKERS] pg_upgrade and rsync
On 2015-01-27 10:20:48 -0500, Robert Haas wrote: > On Tue, Jan 27, 2015 at 9:50 AM, Tom Lane wrote: > > Robert Haas writes: > >> On Fri, Jan 23, 2015 at 1:48 PM, Andres Freund > >> wrote: > >>> I don't understand why that'd be better than simply fixing (yes, that's > >>> imo the correct term) pg_upgrade to retain relfilenodes across the > >>> upgrade. Afaics there's no conflict risk and it'd make the clusters much > >>> more similar, which would be good; independent of rsyncing standbys. > > > >> +1. > > > > That's certainly impossible for the system catalogs, which means you > > have to be able to deal with relfilenode discrepancies for them, which > > means that maintaining the same relfilenodes for user tables is of > > dubious value. > > Why is that impossible for the system catalogs? Maybe it's not impossible for existing catalogs, but it's certainly complicated. But I don't think it's all that desirable anyway - they're not the same relation after the pg_upgrade anyway (initdb/pg_dump filled them). That's different for the user defined relations. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: dynahash replacement for buffer table
On Mon, Jan 26, 2015 at 11:20 PM, Robert Haas wrote:
> This developed a slight merge conflict. I've rebased the attached
> version, and I also took the step of getting rid of buf_table.c, as I
> think I proposed somewhere upthread. This avoids the overhead of
> constructing a BufferTag only to copy it into a BufferLookupEnt, plus
> some function calls and so forth. A quick-and-dirty test suggests
> this might not have cut down on the 1-client overhead much, but I
> think it's worth doing anyway: it's certainly saving a few cycles, and
> I don't think it's complicating anything measurably.

So here are median-of-three results for 5-minute read-only pgbench runs at scale factor 1000, shared_buffers = 8GB, on hydra (POWER7) at various client counts.

clients  master@4b2a25   master+chash-buftable-v2
      1    8319.254199    8105.479632
      2   15485.561948   15138.227533
      3   23690.186576   23264.702784
      4   32157.346740   31536.870881
      5   40879.532747   40455.794841
      6   49063.279970   49625.681573
      7   57518.374517   57492.275197
      8   65351.415323   65622.211763
     16  126166.416498  126668.793798
     24  155727.916112  155587.414299
     32  180329.012019  179543.424754
     40  201384.186317  200109.614362
     48  222352.265595  225688.574611
     56  240400.659338  254398.144976
     64  253699.031075  266624.224699
     72  261198.336625  270652.578322
     80  264569.862257  270332.738084

That's extremely unimpressive. You (Andres) showed a much bigger benefit on a different machine with a much higher scale factor (5000) so I don't know whether the modest benefit here is because of the different machine, the different scale factor, or what.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
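Spelling out the comparison being made, those numbers work out to roughly a 2.6% regression at 1 client, noise through the middle of the range, and about a 5% gain at 64 clients (a quick Python check, rounding to one decimal):

```python
# Percent change of the patched branch (master+chash-buftable-v2)
# relative to master, from the pgbench table above.

def pct_change(master, patched):
    return round((patched / master - 1.0) * 100, 1)

one_client     = pct_change(8319.254199, 8105.479632)      # 1 client
thirty_two     = pct_change(180329.012019, 179543.424754)  # 32 clients
sixty_four     = pct_change(253699.031075, 266624.224699)  # 64 clients
```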
Re: [HACKERS] WIP: dynahash replacement for buffer table
On 2015-01-27 10:36:35 -0500, Robert Haas wrote:
> On Mon, Jan 26, 2015 at 11:20 PM, Robert Haas wrote:
> > This developed a slight merge conflict. I've rebased the attached
> > version, and I also took the step of getting rid of buf_table.c, as I
> > think I proposed somewhere upthread. This avoids the overhead of
> > constructing a BufferTag only to copy it into a BufferLookupEnt, plus
> > some function calls and so forth. A quick-and-dirty test suggests
> > this might not have cut down on the 1-client overhead much, but I
> > think it's worth doing anyway: it's certainly saving a few cycles, and
> > I don't think it's complicating anything measurably.
>
> So here are median-of-three results for 5-minute read-only pgbench
> runs at scale factor 1000, shared_buffers = 8GB, on hydra (POWER7) at
> various client counts.
>
> That's extremely unimpressive. You (Andres) showed a much bigger
> benefit on a different machine with a much higher scale factor (5000)
> so I don't know whether the modest benefit here is because of the
> different machine, the different scale factor, or what.

Based on my test on hydra some time back, the bottleneck is nearly entirely in GetSnapshotData() at higher client counts, so I'm not that surprised you don't see the big benefit here. IIRC on hydra the results for using a large enough shared_buffers setting for a fully cached run at that scale are pretty close to your master results - so there's really not much performance increase to be expected by making the buf table lockless.

I guess you would see a slightly bigger difference at a different shared_buffers/scale combination. IIRC a scale 1000 is about 15GB large? So about half of the data set fits in shared buffers. In the scale 5k results I posted it was about 1/5. I had also tested on a four-socket x86 machine where GetSnapshotData() was a big, but not the sole, bottleneck.
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade and rsync
* Tom Lane (t...@sss.pgh.pa.us) wrote: > Robert Haas writes: > > On Tue, Jan 27, 2015 at 9:50 AM, Tom Lane wrote: > >> That's certainly impossible for the system catalogs, which means you > >> have to be able to deal with relfilenode discrepancies for them, which > >> means that maintaining the same relfilenodes for user tables is of > >> dubious value. > > > Why is that impossible for the system catalogs? > > New versions aren't guaranteed to have the same system catalogs, let alone > the same relfilenodes for them. Indeed, new versions almost certainly have wholly new system catalogs. While there might be a reason to keep the relfilenodes the same, it doesn't actually help with the pg_upgrade use-case we're currently discussing (at least, not without additional help). The problem is that we certainly must transfer all the new catalogs, but how would rsync know that those catalog files have to be transferred but not the user relations? Using --size-only would mean that system catalogs whose sizes happen to match after the upgrade wouldn't be transferred and that would certainly lead to a corrupt situation. Andres proposed a helper script which would go through the entire tree on the remote side and set all the timestamps on the remote side to match those on the local side (prior to the pg_upgrade). If all the relfilenodes remained the same and the timestamps on the catalog tables all changed then it might work to do (without using --size-only): stop-cluster set-timestamp-script pg_upgrade rsync new_data_dir -> remote:existing_cluster This would mean that any other files which happened to be changed by pg_upgrade beyond the catalog tables would also get copied across. The issue that I see with that is that if the pg_upgrade process does touch anything outside of the system catalogs, then its documented revert mechanism (rename the control file and start the old cluster back up, prior to having started the new cluster) wouldn't be valid. 
Requiring an extra script which runs around changing timestamps on files is a bit awkward too, though I suppose possible, and then we'd also have to document that this process only works with $version of pg_upgrade that does the preservation of the relfilenodes. I suppose there's also technically a race condition to consider, if the whole thing is scripted and pg_upgrade manages to change an existing file in the same second that the old cluster did then that file wouldn't be recognized by the rsync as having been updated. That's not too hard to address though- just wait a second somewhere in there. Still, I'm not really sure that this approach really gains us much over the approach that Bruce is proposing. Thanks, Stephen signature.asc Description: Digital signature
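The timestamp-mirroring helper mentioned above could look something like this sketch (Python; hypothetical, not an existing tool). Two local temporary trees stand in for the primary's old data directory and the already-rsynced copy on the standby:

```python
import os
import tempfile

def mirror_mtimes(src_root, dst_root):
    """Stamp every file under src_root's mtime onto the matching file
    under dst_root, so a timestamp-based rsync compares fairly."""
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(dst_root, rel, name)
            if os.path.exists(dst):
                st = os.stat(src)
                os.utime(dst, (st.st_atime, st.st_mtime))

# Demonstration on two throwaway trees with deliberately different mtimes.
with tempfile.TemporaryDirectory() as tmp:
    src_tree = os.path.join(tmp, "old")
    dst_tree = os.path.join(tmp, "new")
    os.makedirs(src_tree)
    os.makedirs(dst_tree)
    for root in (src_tree, dst_tree):
        with open(os.path.join(root, "pg_class"), "w") as f:
            f.write(root)                      # contents differ per side
    os.utime(os.path.join(src_tree, "pg_class"), (1422374400, 1422374400))

    mirror_mtimes(src_tree, dst_tree)
    stamped = os.stat(os.path.join(dst_tree, "pg_class")).st_mtime
```

Note the one-second race described above applies here too: the stamping only helps if it runs after pg_upgrade has finished touching files.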
Re: [HACKERS] jsonb, unicode escapes and escaped backslashes
On 01/27/2015 12:24 AM, Noah Misch wrote:
>> For the moment, maybe I could commit the fix for the non U+0000 case for
>> escape_json, and we could think some more about detecting and warning
>> about the problem strings.
>
> +1 for splitting development that way. Fixing the use of escape_json() is
> objective, but the choices around the warning are more subtle.

OK, so something like this patch? I'm mildly concerned that you and I are the only ones commenting on this.

cheers

andrew

diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 3c137ea..8d2b38f 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -94,6 +94,8 @@ static void datum_to_json(Datum val, bool is_null, StringInfo result,
 static void add_json(Datum val, bool is_null, StringInfo result,
 		 Oid val_type, bool key_scalar);
 static text *catenate_stringinfo_string(StringInfo buffer, const char *addon);
+static void escape_json_work(StringInfo buf, const char *str,
+		 bool jsonb_unicode);
 
 /* the null action object used for pure validation */
 static JsonSemAction nullSemAction =
@@ -2356,6 +2358,18 @@ json_object_two_arg(PG_FUNCTION_ARGS)
 void
 escape_json(StringInfo buf, const char *str)
 {
+	escape_json_work( buf, str, false);
+}
+
+void
+escape_jsonb(StringInfo buf, const char *str)
+{
+	escape_json_work( buf, str, true);
+}
+
+static void
+escape_json_work(StringInfo buf, const char *str, bool jsonb_unicode)
+{
 	const char *p;
 
 	appendStringInfoCharMacro(buf, '\"');
@@ -2398,7 +2412,9 @@ escape_json(StringInfo buf, const char *str)
 			 * unicode escape that should be present is \u0000, all the
 			 * other unicode escapes will have been resolved.
 			 */
-			if (p[1] == 'u' &&
+			if (jsonb_unicode && strncmp(p+1, "u0000", 5) == 0)
+				appendStringInfoCharMacro(buf, *p);
+			else if (!jsonb_unicode && p[1] == 'u' &&
 				isxdigit((unsigned char) p[2]) &&
 				isxdigit((unsigned char) p[3]) &&
 				isxdigit((unsigned char) p[4]) &&
diff --git a/src/backend/utils/adt/jsonb.c b/src/backend/utils/adt/jsonb.c
index 644ea6d..22e1263 100644
--- a/src/backend/utils/adt/jsonb.c
+++ b/src/backend/utils/adt/jsonb.c
@@ -305,7 +305,7 @@ jsonb_put_escaped_value(StringInfo out, JsonbValue *scalarVal)
 			appendBinaryStringInfo(out, "null", 4);
 			break;
 		case jbvString:
-			escape_json(out, pnstrdup(scalarVal->val.string.val, scalarVal->val.string.len));
+			escape_jsonb(out, pnstrdup(scalarVal->val.string.val, scalarVal->val.string.len));
 			break;
 		case jbvNumeric:
 			appendStringInfoString(out,
diff --git a/src/include/utils/json.h b/src/include/utils/json.h
index 6f69403..661d7bd 100644
--- a/src/include/utils/json.h
+++ b/src/include/utils/json.h
@@ -42,7 +42,13 @@ extern Datum json_build_array_noargs(PG_FUNCTION_ARGS);
 extern Datum json_object(PG_FUNCTION_ARGS);
 extern Datum json_object_two_arg(PG_FUNCTION_ARGS);
 
+/*
+ * escape_jsonb is more strict about unicode escapes, and will only not
+ * escape \u0000, as that is the only unicode escape that should still be
+ * present.
+ */
 extern void escape_json(StringInfo buf, const char *str);
+extern void escape_jsonb(StringInfo buf, const char *str);
 
 extern Datum json_typeof(PG_FUNCTION_ARGS);
 
diff --git a/src/test/regress/expected/jsonb.out b/src/test/regress/expected/jsonb.out
index aa5686f..b5457c4 100644
--- a/src/test/regress/expected/jsonb.out
+++ b/src/test/regress/expected/jsonb.out
@@ -324,11 +324,20 @@ select to_jsonb(timestamptz '2014-05-28 12:22:35.614298-04');
 (1 row)
 
 COMMIT;
--- unicode escape - backslash is not escaped
+-- unicode null - backslash not escaped
+-- note: using to_jsonb here bypasses the jsonb input routine.
+select jsonb '"\u0000"', to_jsonb(text '\u0000');
+  jsonb   | to_jsonb
+----------+----------
+ "\u0000" | "\u0000"
+(1 row)
+
+-- any other unicode-looking escape - backslash is escaped
+-- all unicode characters should have been resolved
 select to_jsonb(text '\uabcd');
- to_jsonb
-----------
- "\uabcd"
+ to_jsonb
+-----------
+ "\\uabcd"
 (1 row)
 
 -- any other backslash is escaped
@@ -338,6 +347,14 @@ select to_jsonb(text '\abcd');
  "\\abcd"
 (1 row)
 
+-- doubled backslash should be reproduced
+-- this isn't a unicode escape, it's an escaped backslash followed by 'u'
+select jsonb '"\\uabcd"';
+   jsonb
+-----------
+ "\\uabcd"
+(1 row)
+
 --jsonb_agg
 CREATE TEMP TABLE rows AS
 SELECT x, 'txt' || x as y
diff --git a/src/test/regress/expected/jsonb_1.out b/src/test/regress/expected/jsonb_1.out
index 687ae63..10aaac8 100644
--- a/src/test/regress/expected/jsonb_1.out
+++ b/src/test/regress/expected/jsonb_1.out
@@ -324,11 +324,20 @@ select to_jsonb(timestamptz '2014-05-28 12:22:35.614298-04');
 (1 row)
 
 COMMIT;
--- unicode escape - backslash is not escaped
+-- unicode null - backslash not escaped
+-- note: using to_jsonb here bypasses the jsonb input routine.
+select jsonb '"\u0000"', to_jsonb(text '\u0000');
+  jsonb   | to_jsonb
+----------+--
Re: [HACKERS] numeric access out of bounds
I wrote: > Andrew Gierth writes: >> I can see two possible fixes: one to correct the assumptions in the >> macros, the other to check for NaN before calling init_var_from_num in >> numeric_send (all the other functions seem to do this check explicitly). >> Which would be preferable? > I'm inclined to think special-casing NaN in numeric_send is the thing to > do, since it is that way everywhere else. If we could push down all the > special casing into init_var_from_num then that would probably be better, > but in most cases that looks unattractive. After taking a closer look I realized that you were right with your first alternative: the field access macros are just plain wrong, and we should fix 'em. Done now. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb, unicode escapes and escaped backslashes
Andrew Dunstan writes: > On 01/27/2015 12:24 AM, Noah Misch wrote: >> +1 for splitting development that way. Fixing the use of escape_json() is >> objective, but the choices around the warning are more subtle. > OK, so something like this patch? I'm mildly concerned that you and I > are the only ones commenting on this. Doesn't seem to me like this fixes anything. If the content of a jsonb value is correct, the output will be the same with or without this patch; and if it's not, this isn't going to really improve matters. I think coding anything is premature until we decide how we're going to deal with the fundamental ambiguity. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Abbreviated keys for Numeric
> "Peter" == Peter Geoghegan writes: Peter> What I find particularly interesting about this patch is that it Peter> makes sorting numerics significantly faster than even sorting Peter> float8 values, Played some more with this. Testing on some different gcc versions showed that the results were not consistent between versions; the latest I tried (4.9) showed float8 as somewhat faster, while 4.7 showed float8 as slightly slower; the difference was all in the time of the float8 case, the time for numeric was virtually the same. For one specific test query, taking the best time of multiple runs, float8: gcc4.7 = 980ms, gcc4.9 = 833ms numeric: gcc4.7 = 940ms, gcc4.9 = 920ms (vs. 650ms for bigint on either version) So honestly I think abbreviation for float8 is a complete red herring. Also, I couldn't get any detectable benefit from inlining DatumGetFloat8, though I may have to play more with that to be certain (above tests did not have any float8-related modifications at all, just the datum and numeric abbrevs patches). -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] File based incremental backup v6
Hi,

Here is another version of the file-based incremental backup patch. Changelog from the previous one:

* the pg_basebackup --incremental option takes the directory containing the base backup instead of the backup profile file
* rename the backup_profile file at the same time as the backup_label file when starting for the first time from a backup
* handle "pg_basebackup -D -" by appending the backup profile to the resulting tar stream
* added documentation for the -I/--incremental option to the pg_basebackup doc
* updated the replication protocol documentation

The rationale for moving the backup_profile out of the way during recovery is to avoid using, as the base of a backup, a data directory which has already been started.

I've also lightly improved the pg_restorebackup PoC, implementing the syntax suggested by Gabriele:

./pg_restorebackup.py DESTINATION BACKUP_1 BACKUP_2 [BACKUP_3, ...]

It also supports relocation of tablespaces with the -T option. The -T option is mandatory if any tablespace was defined in the PostgreSQL instance when the incremental backup was taken.
Regards, Marco -- Marco Nenciarini - 2ndQuadrant Italy PostgreSQL Training, Services and Support marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it From 56fed6e250280f8e5d5c17252db631f33a3c9d8f Mon Sep 17 00:00:00 2001 From: Marco Nenciarini Date: Tue, 14 Oct 2014 14:31:28 +0100 Subject: [PATCH] File-based incremental backup v6 Add backup profile to pg_basebackup INCREMENTAL option implementaion --- doc/src/sgml/protocol.sgml | 86 - doc/src/sgml/ref/pg_basebackup.sgml| 31 ++- src/backend/access/transam/xlog.c | 18 +- src/backend/access/transam/xlogfuncs.c | 2 +- src/backend/replication/basebackup.c | 335 +++-- src/backend/replication/repl_gram.y| 6 + src/backend/replication/repl_scanner.l | 1 + src/bin/pg_basebackup/pg_basebackup.c | 191 +-- src/include/access/xlog.h | 3 +- src/include/replication/basebackup.h | 5 + 10 files changed, 639 insertions(+), 39 deletions(-) diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml index efe75ea..fc24648 100644 *** a/doc/src/sgml/protocol.sgml --- b/doc/src/sgml/protocol.sgml *** The commands accepted in walsender mode *** 1882,1888 ! BASE_BACKUP [LABEL 'label'] [PROGRESS] [FAST] [WAL] [NOWAIT] [MAX_RATE rate] BASE_BACKUP --- 1882,1888 ! BASE_BACKUP [LABEL 'label'] [INCREMENTAL 'start_lsn'] [PROGRESS] [FAST] [WAL] [NOWAIT] [MAX_RATE rate] BASE_BACKUP *** The commands accepted in walsender mode *** 1905,1910 --- 1905,1928 + INCREMENTAL 'start_lsn' + + + Requests a file-level incremental backup of all files changed after + start_lsn. When operating with + INCREMENTAL, the content of every block-organised + file will be analyzed and the file will be sent if at least one + block has a LSN higher than or equal to the provided + start_lsn. + + + The backup_profile will contain information on + every file that has been analyzed, even those that have not been sent. 
+ + + + + PROGRESS *** The commands accepted in walsender mode *** 2022,2028 ustar interchange format specified in the POSIX 1003.1-2008 standard) dump of the tablespace contents, except that the two trailing blocks of zeroes specified in the standard are omitted. ! After the tar data is complete, a final ordinary result set will be sent, containing the WAL end position of the backup, in the same format as the start position. --- 2040,2046 ustar interchange format specified in the POSIX 1003.1-2008 standard) dump of the tablespace contents, except that the two trailing blocks of zeroes specified in the standard are omitted. ! After the tar data is complete, an ordinary result set will be sent, containing the WAL end position of the backup, in the same format as the start position. *** The commands accepted in walsender mode *** 2073,2082 the server supports it. ! Once all tablespaces have been sent, a final regular result set will be sent. This result set contains the end position of the backup, given in XLogRecPtr format as a single column in a single row. --- 2091,2162 the server supports it. ! Once all tablespaces have been sent, another regular result set will be sent. This result set contains the end position of the backup, given in XLogRecPtr format as a single column in a single row. + +
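The selection rule in the INCREMENTAL documentation above — a block-organised file is sent iff at least one of its blocks carries an LSN at or after the requested start LSN, while every analyzed file gets a backup_profile entry regardless — can be sketched like this (Python, with LSNs modelled as plain integers for illustration):

```python
def file_needs_backup(block_lsns, start_lsn):
    """True if at least one block changed at or after start_lsn."""
    return any(lsn >= start_lsn for lsn in block_lsns)

def profile_entries(files, start_lsn):
    """Every analyzed file appears in the profile, sent or not."""
    return [(name, file_needs_backup(lsns, start_lsn))
            for name, lsns in files.items()]
```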
Re: [HACKERS] jsonb, unicode escapes and escaped backslashes
On 01/27/2015 12:23 PM, Tom Lane wrote: Andrew Dunstan writes: On 01/27/2015 12:24 AM, Noah Misch wrote: +1 for splitting development that way. Fixing the use of escape_json() is objective, but the choices around the warning are more subtle. OK, so something like this patch? I'm mildly concerned that you and I are the only ones commenting on this. Doesn't seem to me like this fixes anything. If the content of a jsonb value is correct, the output will be the same with or without this patch; and if it's not, this isn't going to really improve matters. I think coding anything is premature until we decide how we're going to deal with the fundamental ambiguity. The input \\uabcd will be stored correctly as \uabcd, but this will in turn be rendered as \uabcd, whereas it should be rendered as \\uabcd. That's what the patch fixes. There are two problems here and this addresses one of them. The other problem is the ambiguity regarding \\u0000 and \u0000. cheers andrew
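For illustration, the rendering fix Andrew describes can be sketched as follows. This is a Python model with an invented function name, not the actual jsonb C code: because jsonb stores strings in decoded form, any literal backslash must be re-escaped on output, so a value stored as the six characters \uabcd renders as \\uabcd rather than as a Unicode escape.

```python
def render_jsonb_string(stored: str) -> str:
    # Sketch of the output rule: jsonb stores the decoded string, so a
    # literal backslash must be doubled when the value is rendered as JSON.
    out = ['"']
    for ch in stored:
        if ch == '\\':
            out.append('\\\\')   # stored backslash -> \\ in the output
        elif ch == '"':
            out.append('\\"')
        else:
            out.append(ch)
    out.append('"')
    return ''.join(out)

# The stored value \uabcd (six characters, from the JSON input "\\uabcd")
# renders with its backslash re-escaped, giving "\\uabcd".
print(render_jsonb_string('\\uabcd'))
```

The ambiguity the thread worries about is exactly the case this cannot distinguish without a storage change: a stored literal `\u0000` versus an escape that could not be decoded.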
[HACKERS] File based Incremental backup v7
Il 27/01/15 10:25, Giuseppe Broccolo ha scritto:> Hi Marco, > > On 16/01/15 16:55, Marco Nenciarini wrote: >> On 14/01/15 17:22, Gabriele Bartolini wrote: >> > >> > My opinion, Marco, is that for version 5 of this patch, you: >> > >> > 1) update the information on the wiki (it is outdated - I know you have >> > been busy with LSN map optimisation) >> >> Done. >> >> > 2) modify pg_basebackup in order to accept a directory (or tar file) and >> > automatically detect the LSN from the backup profile >> >> New version of patch attached. The -I parameter now requires a backup >> profile from a previous backup. I've added a sanity check that forbid >> incremental file level backups if the base timeline is different from >> the current one. >> >> > 3) add the documentation regarding the backup profile and pg_basebackup >> > >> >> Next on my TODO list. >> >> > Once we have all of this, we can continue trying the patch. Some >> > unexplored paths are: >> > >> > * tablespace usage >> >> I've improved my pg_restorebackup python PoC. It now supports tablespaces. > > About tablespaces, I noticed that any pointing to tablespace locations > is lost during the recovery of an incremental backup changing the > tablespace mapping (-T option). 
Here the steps I followed: > > * creating and filling a test database obtained through pgbench > > psql -c "CREATE DATABASE pgbench" > pgbench -U postgres -i -s 5 -F 80 pgbench > > * a first base backup with pg_basebackup: > > mkdir -p backups/$(date '+%d%m%y%H%M')/data && pg_basebackup -v -F p -D backups/$(date '+%d%m%y%H%M')/data -x > > * creation of a new tablespace, alter the table "pgbench_accounts" to > set the new tablespace: > > mkdir -p /home/gbroccolo/pgsql/tbls > psql -c "CREATE TABLESPACE tbls LOCATION '/home/gbroccolo/pgsql/tbls'" > psql -c "ALTER TABLE pgbench_accounts SET TABLESPACE tbls" pgbench > > * Doing some work on the database: > > pgbench -U postgres -T 120 pgbench > > * a second incremental backup with pg_basebackup specifying the new > location for the tablespace through the tablespace mapping: > > mkdir -p backups/$(date '+%d%m%y%H%M')/data backups/$(date '+%d%m%y%H%M')/tbls && pg_basebackup -v -F p -D backups/$(date '+%d%m%y%H%M')/data -x -I backups/2601151641/data/backup_profile -T /home/gbroccolo/pgsql/tbls=/home/gbroccolo/pgsql/backups/$(date '+%d%m%y%H%M')/tbls > > * a recovery based on the tool pg_restorebackup.py attached in > http://www.postgresql.org/message-id/54b9428e.9020...@2ndquadrant.it > > ./pg_restorebackup.py backups/2601151641/data backups/2601151707/data /tmp/data -T /home/gbroccolo/pgsql/backups/2601151707/tbls=/tmp/tbls > > In the last step, I obtained the following stack trace: > > Traceback (most recent call last): > File "./pg_restorebackup.py", line 74, in > shutil.copy2(base_file, dest_file) > File "/home/gbroccolo/.pyenv/versions/2.7.5/lib/python2.7/shutil.py", line 130, in copy2 > copyfile(src, dst) > File "/home/gbroccolo/.pyenv/versions/2.7.5/lib/python2.7/shutil.py", line 82, in copyfile > with open(src, 'rb') as fsrc: > IOError: [Errno 2] No such file or directory: 'backups/2601151641/data/base/16384/16406_fsm' > > > Any idea on what's going wrong? 
> I've done some test and it looks like that FSM nodes always have InvalidXLogRecPtr as LSN. Ive updated the patch to always include files if all their pages have LSN == InvalidXLogRecPtr Updated patch v7 attached. Regards, Marco -- Marco Nenciarini - 2ndQuadrant Italy PostgreSQL Training, Services and Support marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it From bffcdf0d5c3258c8848215011eb8e8b3377d9f18 Mon Sep 17 00:00:00 2001 From: Marco Nenciarini Date: Tue, 14 Oct 2014 14:31:28 +0100 Subject: [PATCH] File-based incremental backup v7 Add backup profiles and --incremental to pg_basebackup --- doc/src/sgml/protocol.sgml | 86 - doc/src/sgml/ref/pg_basebackup.sgml| 31 ++- src/backend/access/transam/xlog.c | 18 +- src/backend/access/transam/xlogfuncs.c | 2 +- src/backend/replication/basebackup.c | 344 +++-- src/backend/replication/repl_gram.y| 6 + src/backend/replication/repl_scanner.l | 1 + src/bin/pg_basebackup/pg_basebackup.c | 191 -- src/include/access/xlog.h | 3 +- src/include/replication/basebackup.h | 5 + 10 files changed, 648 insertions(+), 39 deletions(-) diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml index efe75ea..fc24648 100644 *** a/doc/src/sgml/protocol.sgml --- b/doc/src/sgml/protocol.sgml *** The commands accepted in walsender mode *** 1882,1888 ! BASE_BACKUP [LABEL 'label'] [PROGRESS] [FAST] [WAL] [NOWAIT] [MAX_RATE rate] BASE_BACKUP --- 1882,1888 ! BASE_BACKUP [LABEL 'label'] [INCREMENTAL 'start_lsn'] [PROGRESS] [FAST] [WAL] [NOWAIT] [MAX_RATE rate] B
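To make the file-selection rule concrete, here is a rough Python model of the decision described above (hypothetical names; the real logic lives in the C patch): a block-organised file is included in the incremental backup if any page LSN is at or past the start LSN, or, after the v7 fix, if every page carries InvalidXLogRecPtr, as FSM pages do, since such files can never be filtered by LSN.

```python
INVALID_XLOG_REC_PTR = 0  # models InvalidXLogRecPtr

def file_needs_backup(page_lsns, start_lsn):
    # Files whose pages all carry InvalidXLogRecPtr (e.g. FSM files)
    # cannot be filtered by LSN, so always include them (the v7 fix).
    if page_lsns and all(lsn == INVALID_XLOG_REC_PTR for lsn in page_lsns):
        return True
    # Otherwise send the file only if some page changed at/after start_lsn.
    return any(lsn >= start_lsn for lsn in page_lsns)
```

Without the special case on the first branch, an FSM file would never be sent, which is the failure Giuseppe hit during recovery.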
[HACKERS] Release notes
Where can one find the running copy of release notes for pending releases? -- Command Prompt, Inc. - http://www.commandprompt.com/ 503-667-4564 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, @cmdpromptinc "If we send our children to Caesar for their education, we should not be surprised when they come back as Romans."
Re: [HACKERS] Release notes
On 2015-01-27 10:09:12 -0800, Joshua D. Drake wrote: > Where can one find the running copy of release notes for pending releases? In the source tree on the master once it exists. It doesn't yet for the next set of releases. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Dereferenced pointers checked as NULL in btree_utils_var.c
Heikki Linnakangas writes: > I think you are confusing NULL pointers with an SQL NULL. > gistgetadjusted checks that it's not an SQL NULL (!oldisnull[i]), but it > does not check if it's a NULL pointer > (DatumGetPointer(oldentries[i].key) != NULL). The case we're worried > about is that the value is not an SQL NULL, i.e. isnull flag is false, > but the Datum is a NULL pointer. Actually both of these deserve to be worried about; though it's fairly clear from looking at the core GIST code that it avoids calling gistKeyIsEQ on SQL NULLs. > Nevertheless, looking at the code, I don't think that a NULL pointer can ever > be passed to the same-method, for any of the built-in or contrib > opclasses, but it's very confusing because some functions check for > that. My proof goes like this: > 1. The key value passed as argument must've been originally formed by > the compress, union, or picksplit methods. > 1.1. Some compress methods do nothing, and just return the passed-in > key, which comes from the table and cannot be a NULL pointer (the > compress method is never called for SQL NULLs). Other compress methods > don't check for a NULL pointer, and would crash if there was one. > gist_poly_compress() and gist_circle_compress() do check for a NULL, but > they only return a NULL key if the input key is NULL, which cannot > happen because the input comes from a table. So I believe the > NULL-checks in those functions are dead code, and none of the compress > methods ever return a NULL key. > 1.2. None of the union methods return a NULL pointer (nor expect one as > input). > 1.3. None of the picksplit methods return a NULL pointer. They all > return one of the original values, or one formed with the union method. > The picksplit method can return a NULL pointer in the spl_ldatum or > spl_rdatum field, in the degenerate case that it puts all keys on the > same half. 
However, the caller, gistUserPickSplit checks for that and > immediately overwrites spl_ldatum and spl_rdatum with sane values in > that case. Sounds good to me. > I don't understand what this sentence in the documentation on the > compress method means: >> Depending on your needs, you could also need to care about >> compressing NULL values in there, storing for example (Datum) 0 like >> gist_circle_compress does. I believe you're right that I added this because there were checks for null pointers in some but not all of the opclass support functions. It looked to me like some opclasses might be intending to pass around null pointers as valid (not-SQL-NULL) values. I think your analysis above eliminates that idea though. It's a sufficiently weird concept that I don't feel a need to document or support it. So I'm fine with taking out both this documentation text and the dead null-pointer checks; but let's do that all in one patch not piecemeal. In any case, this is just cosmetic cleanup; there's no actual hazard here. regards, tom lane
Re: [HACKERS] Release notes
Andres Freund writes: > On 2015-01-27 10:09:12 -0800, Joshua D. Drake wrote: >> Where can one find the running copy of release notes for pending releases? > In the source tree on the master once it exists. It doesn't yet for the > next set of releases. The closest thing to a "running copy" is the git commit log. Somebody (usually Bruce or I) writes release notes from that shortly before a release is made. regards, tom lane
Re: [HACKERS] Release notes
On 01/27/2015 10:17 AM, Tom Lane wrote: Andres Freund writes: On 2015-01-27 10:09:12 -0800, Joshua D. Drake wrote: Where can one find the running copy of release notes for pending releases? In the source tree on the master once it exists. It doesn't yet for the next set of releases. The closest thing to a "running copy" is the git commit log. Somebody (usually Bruce or I) writes release notes from that shortly before a release is made. Thanks for the info. JD -- Command Prompt, Inc. - http://www.commandprompt.com/ 503-667-4564 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, @cmdpromptinc "If we send our children to Caesar for their education, we should not be surprised when they come back as Romans."
Re: [HACKERS] jsonb, unicode escapes and escaped backslashes
Andrew Dunstan writes: > On 01/27/2015 12:23 PM, Tom Lane wrote: >> I think coding anything is premature until we decide how we're going to >> deal with the fundamental ambiguity. > The input \\uabcd will be stored correctly as \uabcd, but this will in > turn be rendered as \uabcd, whereas it should be rendered as \\uabcd. > That's what the patch fixes. > There are two problems here and this addresses one of them. The other > problem is the ambiguity regarding \\u0000 and \u0000. It's the same problem really, and until we have an answer about what to do with \u0000, I think any patch is half-baked and possibly counterproductive. In particular, I would like to suggest that the current representation of \u0000 is fundamentally broken and that we have to change it, not try to band-aid around it. This will mean an on-disk incompatibility for jsonb data containing U+0000, but hopefully there is very little of that out there yet. If we can get a fix into 9.4.1, I think it's reasonable to consider such solutions. The most obvious way to store such data unambiguously is to just go ahead and store U+0000 as a NUL byte (\000). The only problem with that is that then such a string cannot be considered to be a valid value of type TEXT, which would mean that we'd need to throw an error if we were asked to convert a JSON field containing such a character to text. I don't particularly have a problem with that, except possibly for the time cost of checking for \000 before allowing a conversion to occur. While a memchr() check might be cheap enough, we could also consider inventing a new JEntry type code for string-containing-null, so that there's a distinction in the type system between strings that are coercible to text and those that are not. If we went down a path like that, the currently proposed patch would be quite useless. regards, tom lane
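Tom's proposed rule can be modelled in a few lines. This is a hedged Python sketch with an invented name, not the real C code (which would do a memchr() over the string data): store U+0000 as an actual NUL byte, and refuse conversion to text for any jsonb string containing one.

```python
def jsonb_string_to_text(s: str) -> str:
    # Under the proposed representation, U+0000 is stored as a real NUL
    # byte; such a string is valid jsonb but cannot become a text value,
    # so the cast must fail (the memchr()-style scan Tom mentions).
    if '\x00' in s:
        raise ValueError('jsonb string contains \\u0000; cannot convert to text')
    return s
```

A separate JEntry type code for strings containing NUL, as Tom suggests, would let this check be a constant-time flag test instead of a scan.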
Re: [HACKERS] Re: Abbreviated keys for Numeric
On 28/01/15 06:29, Andrew Gierth wrote: "Peter" == Peter Geoghegan writes: Peter> What I find particularly interesting about this patch is that it Peter> makes sorting numerics significantly faster than even sorting Peter> float8 values, Played some more with this. Testing on some different gcc versions showed that the results were not consistent between versions; the latest I tried (4.9) showed float8 as somewhat faster, while 4.7 showed float8 as slightly slower; the difference was all in the time of the float8 case, the time for numeric was virtually the same. For one specific test query, taking the best time of multiple runs, float8: gcc4.7 = 980ms, gcc4.9 = 833ms numeric: gcc4.7 = 940ms, gcc4.9 = 920ms (vs. 650ms for bigint on either version) So honestly I think abbreviation for float8 is a complete red herring. Also, I couldn't get any detectable benefit from inlining DatumGetFloat8, though I may have to play more with that to be certain (above tests did not have any float8-related modifications at all, just the datum and numeric abbrevs patches). Since gcc5.0 is due to be released in less than 3 months, it might be worth testing with that. Cheers, Gavin
Re: [HACKERS] proposal: row_to_array function
Hi 2015-01-27 11:41 GMT+01:00 Pavel Stehule : > > > 2015-01-26 21:44 GMT+01:00 Jim Nasby : > >> On 1/25/15 4:23 AM, Pavel Stehule wrote: >> >>> >>> I tested a concept iteration over array in format [key1, value1, key2, >>> value2, .. ] - what is nice, it works for [[key1,value1],[key2, value2], >>> ...] too >>> >>> It is only a few lines more to current code, and this change doesn't >>> break a compatibility. >>> >>> Do you think, so this patch is acceptable? >>> >>> Ideas, comments? >>> >> >> Aside from fixing the comments... I think this needs more tests on corner >> cases. For example, what happens when you do >> >> foreach a, b, c in array(array(1,2),array(3,4)) ? >> > > it is relative simple behave -- empty values are NULL > > array(1,2),array(3,4) -- do you think ARRAY[[1,2],[3,4]] is effectively > ARRAY[1,2,3,4] > > >> >> Or the opposite case of >> >> foreach a,b in array(array(1,2,3)) >> >> Also, what about: >> >> foreach a,b in '{{{1,2},{3,4}},{{5,6},{7,8}}}'::int[] ? > > > > postgres=# select array(select > unnest('{{{1,2},{3,4}},{{5,6},{7,8}}}'::int[])); >array > --- > {1,2,3,4,5,6,7,8} > (1 row) > > so it generate pairs {1,2}{3,4},{5,6},{7,8} > I fixed situation when array has not enough elements. More tests, simple doc Regards Pavel > > Regards > > Pavel Stehule > > >> -- >> Jim Nasby, Data Architect, Blue Treble Consulting >> Data in Trouble? Get it in Treble! 
http://BlueTreble.com >> > > diff --git a/contrib/hstore/expected/hstore.out b/contrib/hstore/expected/hstore.out new file mode 100644 index 9749e45..e44532e *** a/contrib/hstore/expected/hstore.out --- b/contrib/hstore/expected/hstore.out *** select %% 'aa=>1, cq=>l, b=>g, fg=>NULL' *** 1148,1153 --- 1148,1169 {b,g,aa,1,cq,l,fg,NULL} (1 row) + -- fast iteration over keys + do $$ + declare + key text; + value text; + begin + foreach key, value in array hstore_to_array('aa=>1, cq=>l, b=>g, fg=>NULL'::hstore) + loop + raise notice 'key: %, value: %', key, value; + end loop; + end; + $$; + NOTICE: key: b, value: g + NOTICE: key: aa, value: 1 + NOTICE: key: cq, value: l + NOTICE: key: fg, value: select hstore_to_matrix('aa=>1, cq=>l, b=>g, fg=>NULL'::hstore); hstore_to_matrix - diff --git a/contrib/hstore/sql/hstore.sql b/contrib/hstore/sql/hstore.sql new file mode 100644 index 5a9e9ee..7b9eb09 *** a/contrib/hstore/sql/hstore.sql --- b/contrib/hstore/sql/hstore.sql *** select avals(''); *** 257,262 --- 257,275 select hstore_to_array('aa=>1, cq=>l, b=>g, fg=>NULL'::hstore); select %% 'aa=>1, cq=>l, b=>g, fg=>NULL'; + -- fast iteration over keys + do $$ + declare + key text; + value text; + begin + foreach key, value in array hstore_to_array('aa=>1, cq=>l, b=>g, fg=>NULL'::hstore) + loop + raise notice 'key: %, value: %', key, value; + end loop; + end; + $$; + select hstore_to_matrix('aa=>1, cq=>l, b=>g, fg=>NULL'::hstore); select %# 'aa=>1, cq=>l, b=>g, fg=>NULL'; diff --git a/doc/src/sgml/plpgsql.sgml b/doc/src/sgml/plpgsql.sgml new file mode 100644 index 69a0885..4ef0299 *** a/doc/src/sgml/plpgsql.sgml --- b/doc/src/sgml/plpgsql.sgml *** NOTICE: row = {7,8,9} *** 2490,2495 --- 2490,2518 NOTICE: row = {10,11,12} + + + FOREACH cycle can be used for iteration over record. You + need a extension. For this case a clause + SLICE should not be used. FOREACH + statements supports list of target variables. 
When source array is + a array of composites, then composite array element is saved to target + variables. When the array is a array of scalar values, then target + variables are filled item by item. + + CREATE FUNCTION trig_function() RETURNS TRIGGER AS $$ + DECLARE + key text; value text; + BEGIN + FOREACH key, value IN ARRAY hstore_to_array(hstore(NEW)) + LOOP + RAISE NOTICE 'key = %, value = %', key, value; + END LOOP; + RETURN NEW; + END; + $$ LANGUAGE plpgsql; + + diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c new file mode 100644 index ae5421f..4ab3d90 *** a/src/pl/plpgsql/src/pl_exec.c --- b/src/pl/plpgsql/src/pl_exec.c *** exec_stmt_foreach_a(PLpgSQL_execstate *e *** 2242,2247 --- 2242,2250 Datum value; bool isnull; + + bool multiassign = false; + /* get the value of the array expression */ value = exec_eval_expr(estate, stmt->expr, &isnull, &arrtype); if (isnull) *** exec_stmt_foreach_a(PLpgSQL_execstate *e *** 2303,2308 --- 2306,2328 (errcode(ERRCODE_DATATYPE_MISMATCH), errmsg("FOREACH loop variable must not be of an array type"))); + /* + * Proof concept -- multiassign in FOREACH cycle + * + * Motivation: FOREACH key, value IN ARRAY hstore_to_array(hstore(NEW)) ... + */ + if (loop_var->dtype == PLPGSQL_DTYPE_ROW + &
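The multi-assignment semantics being discussed, values consumed in groups with NULL filling a short tail, can be modelled outside plpgsql like this. `foreach_multiassign` is an invented name and `None` stands in for SQL NULL; the flattening step mirrors Pavel's unnest() observation that '{{{1,2},{3,4}},{{5,6},{7,8}}}' yields the pairs {1,2}, {3,4}, {5,6}, {7,8}.

```python
def foreach_multiassign(arr, nvars):
    # Model of the proposed FOREACH a, b, ... IN ARRAY behaviour: the
    # array is walked in flat order and consumed nvars elements at a
    # time; when the tail runs short, remaining targets become NULL.
    flat = []

    def flatten(x):
        for item in x:
            if isinstance(item, list):
                flatten(item)
            else:
                flat.append(item)

    flatten(arr)
    out = []
    for i in range(0, len(flat), nvars):
        chunk = flat[i:i + nvars]
        chunk += [None] * (nvars - len(chunk))
        out.append(tuple(chunk))
    return out
```

So `[[1, 2], [3, 4]]` with two targets yields (1, 2) and (3, 4), while `[1, 2, 3]` yields (1, 2) and (3, None), matching the "empty values are NULL" rule from the fix.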
Re: [HACKERS] basebackups during ALTER DATABASE ... SET TABLESPACE ... not safe?
On 2015-01-27 07:16:27 -0500, Robert Haas wrote: > On Mon, Jan 26, 2015 at 4:03 PM, Andres Freund wrote: > >> I basically have two ideas to fix this. > >> > >> 1) Make do_pg_start_backup() acquire a SHARE lock on > >>pg_database. That'll prevent it from starting while a movedb() is > >>still in progress. Then additionally add pg_backup_in_progress() > >>function to xlog.c that checks (XLogCtl->Insert.exclusiveBackup || > >>XLogCtl->Insert.nonExclusiveBackups != 0). Use that in createdb() and > >>movedb() to error out if a backup is in progress. > > > > Attached is a patch trying to this. Doesn't look too bad and lead me to > > discover missing recovery conflicts during a AD ST. > > > > But: It doesn't actually work on standbys, because lock.c prevents any > > stronger lock than RowExclusive from being acquired. And we need need a > > lock that can conflict with WAL replay of DBASE_CREATE, to handle base > > backups that are executed on the primary. Those obviously can't detect > > whether any standby is currently doing a base backup... > > > > I currently don't have a good idea how to mangle lock.c to allow > > this. I've played with doing it like in the second patch, but that > > doesn't actually work because of some asserts around ProcSleep - leading > > to locks on database objects not working in the startup process (despite > > already being used). > > > > The easiest thing would be to just use a lwlock instead of a heavyweight > > lock - but those aren't canceleable... > > How about just wrapping an lwlock around a flag variable? movedb() > increments the variable when starting and decrements it when done > (must use PG_ENSURE_ERROR_CLEANUP). Starting a backup errors out (or > waits in 1-second increments) if it's non-zero. That'd end up essentially being a re-emulation of locks. Don't find that all that pretty. But maybe we have to go there. Here's an alternative approach. 
I think it generally is superior and going in the right direction, but I'm not sure it's backpatchable. It basically consists of: 1) Make GetLockConflicts() actually work. 2) Allow the startup process to actually acquire locks other than AccessExclusiveLocks. There already is code acquiring other locks, but it's currently broken because they're acquired in blocking mode which isn't actually supported in the startup mode. Using this infrastructure we can actually fix a couple small races around database creation/drop. 3) Allow session locks during recovery to be heavier than RowExclusiveLock - since relation/object session locks aren't ever taken out by the user (without a plain lock first) that's safe. 4) Perform streaming base backups from within a transaction command, to provide error handling. 5) Make walsender actually react to recovery conflict interrupts. 6) Prevent access to the template database of a CREATE DATABASE during WAL replay. 7) Add an interlock to prevent base backups and movedb() from running concurrently. What we actually came here for. Comments? At the very least it's missing some documentation updates about the locking changes in ALTER DATABASE, CREATE DATABASE and the base backup sections. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services >From 6e196d17e3dc3ae923321c1b49eb46ccd5ac75b0 Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Tue, 27 Jan 2015 19:52:11 +0100 Subject: [PATCH] Fix various issues around WAL replay and ALTER DATABASE SET TABLESPACE. Discussion: 20150120152819.gc24...@alap3.anarazel.de Fix GetLockConflicts() to properly terminate the list of conflicting backends. It's unclear why this hasn't caused more problems. Discussion: 20150127142713.gd29...@awork2.anarazel.de Don't acquire blocking locks on the database in dbase_redo(), not enough state setup. 
Discussion: 20150126212458.ga29...@awork2.anarazel.de Don't allow access to the template database during the replay of a CREATE DATABASE. --- src/backend/access/transam/xlog.c| 29 + src/backend/commands/dbcommands.c| 104 ++- src/backend/replication/basebackup.c | 15 + src/backend/replication/walsender.c | 14 + src/backend/storage/ipc/standby.c| 117 ++- src/backend/storage/lmgr/lmgr.c | 31 ++ src/backend/storage/lmgr/lock.c | 13 ++-- src/backend/utils/init/postinit.c| 2 +- src/include/access/xlog.h| 1 + src/include/storage/lmgr.h | 3 + src/include/storage/standby.h| 2 +- 11 files changed, 240 insertions(+), 91 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 629a457..38e7dff 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -53,6 +53,7 @@ #include "storage/ipc.h" #include "storage/large_object.h" #include "storage/latch.
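For reference, Robert's earlier flag-variable suggestion (quoted at the top of this mail) amounts to a counter protected by a lock. A minimal sketch, with invented names and Python's threading.Lock standing in for an LWLock, not the approach the attached patch actually takes:

```python
import threading

class MoveDbInterlock:
    """movedb() bumps a counter under a lock while it runs; starting a
    base backup fails (or could wait and retry) while it is non-zero."""

    def __init__(self):
        self._lock = threading.Lock()
        self._movedb_count = 0

    def movedb_start(self):
        with self._lock:
            self._movedb_count += 1

    def movedb_finish(self):
        # In PostgreSQL this would need PG_ENSURE_ERROR_CLEANUP so the
        # counter is decremented even when movedb() errors out.
        with self._lock:
            self._movedb_count -= 1

    def backup_allowed(self):
        with self._lock:
            return self._movedb_count == 0
```

As Andres notes, this re-emulates what the heavyweight lock manager already provides, which is why the patch pursues real locks instead.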
Re: [HACKERS] jsonb, unicode escapes and escaped backslashes
On 01/27/2015 01:40 PM, Tom Lane wrote: In particular, I would like to suggest that the current representation of \u0000 is fundamentally broken and that we have to change it, not try to band-aid around it. This will mean an on-disk incompatibility for jsonb data containing U+0000, but hopefully there is very little of that out there yet. If we can get a fix into 9.4.1, I think it's reasonable to consider such solutions. Hmm, OK. I had thought we'd be ruling that out, but I agree if it's on the table what I suggested is unnecessary. cheers andrew
Re: [HACKERS] jsonb, unicode escapes and escaped backslashes
Andrew Dunstan writes: > On 01/27/2015 01:40 PM, Tom Lane wrote: >> In particular, I would like to suggest that the current representation of >> \u0000 is fundamentally broken and that we have to change it, not try to >> band-aid around it. This will mean an on-disk incompatibility for jsonb >> data containing U+0000, but hopefully there is very little of that out >> there yet. If we can get a fix into 9.4.1, I think it's reasonable to >> consider such solutions. > Hmm, OK. I had thought we'd be ruling that out, but I agree if it's on > the table what I suggested is unnecessary. Well, we can either fix it now or suffer with a broken representation forever. I'm not wedded to the exact solution I described, but I think we'll regret it if we don't change the representation. The only other plausible answer seems to be to flat out reject \u0000. But I assume nobody likes that. regards, tom lane
Re: [HACKERS] proposal: plpgsql - Assert statement
2015-01-26 22:34 GMT+01:00 Jim Nasby : > On 1/22/15 2:01 PM, Pavel Stehule wrote: > >> >> * I would to simplify a behave of evaluating of message >> expression - probably I disallow NULL there. >> >> >> Well, the only thing I could see you doing there is throwing a >> different error if the hint is null. I don't see that as an improvement. >> I'd just leave it as-is. >> >> >> I enabled a NULL - but enforced a WARNING before. >> > > I don't see the separate warning as being helpful. I'd just do something > like > > +(err_hint != NULL) ? errhint("%s", > err_hint) : errhint("Message attached to failed assertion is null") )); > done > > There should also be a test case for a NULL message. > is there, if I understand well Regards Pavel > > * GUC enable_asserts will be supported >> >> >> That would be good. Would that allow for enabling/disabling on a >> per-function basis too? >> >> >> sure - there is only question if we develop a #option >> enable|disable_asserts. I have no string idea. >> > > The option would be nice, but I don't think it's strictly necessary. The > big thing is being able to control this on a per-function basis (which I > think you can do with ALTER FUNCTION SET?) > > -- > Jim Nasby, Data Architect, Blue Treble Consulting > Data in Trouble? Get it in Treble! http://BlueTreble.com > diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml new file mode 100644 index 6bcb106..5663983 *** a/doc/src/sgml/config.sgml --- b/doc/src/sgml/config.sgml *** dynamic_library_path = 'C:\tools\postgre *** 6912,6917 --- 6912,6931 + + enable_user_asserts (boolean) + +enable_user_asserts configuration parameter + + + + + If true, any user assertions are evaluated. By default, this + is set to true. 
+ + + + exit_on_error (boolean) diff --git a/doc/src/sgml/plpgsql.sgml b/doc/src/sgml/plpgsql.sgml new file mode 100644 index 69a0885..26f7eba *** a/doc/src/sgml/plpgsql.sgml --- b/doc/src/sgml/plpgsql.sgml *** END LOOP label Errors and Messages + + RAISE statement + RAISE *** RAISE unique_violation USING MESSAGE = ' *** 3564,3570 --- 3567,3599 the whole category. + + + ASSERT statement + + + ASSERT + + + + assertions + in PL/pgSQL + + + + Use the ASSERT statement to ensure so expected + predicate is allways true. If the predicate is false or is null, + then a assertion exception is raised. + + + ASSERT expression , message expression ; + + + The user assertions can be enabled or disabled by the + . + + diff --git a/src/backend/utils/errcodes.txt b/src/backend/utils/errcodes.txt new file mode 100644 index 28c8c40..da12428 *** a/src/backend/utils/errcodes.txt --- b/src/backend/utils/errcodes.txt *** PEERRCODE_PLPGSQL_ERROR *** 454,459 --- 454,460 P0001EERRCODE_RAISE_EXCEPTIONraise_exception P0002EERRCODE_NO_DATA_FOUND no_data_found P0003EERRCODE_TOO_MANY_ROWS too_many_rows + P0004EERRCODE_ASSERT_EXCEPTION assert_exception Section: Class XX - Internal Error diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c new file mode 100644 index c35867b..cd55e94 *** a/src/backend/utils/init/globals.c --- b/src/backend/utils/init/globals.c *** bool IsBinaryUpgrade = false; *** 99,104 --- 99,105 bool IsBackgroundWorker = false; bool ExitOnAnyError = false; + bool enable_user_asserts = true; int DateStyle = USE_ISO_DATES; int DateOrder = DATEORDER_MDY; diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c new file mode 100644 index f6df077..b3a2660 *** a/src/backend/utils/misc/guc.c --- b/src/backend/utils/misc/guc.c *** static struct config_bool ConfigureNames *** 980,985 --- 980,994 }, { + {"enable_user_asserts", PGC_USERSET, ERROR_HANDLING_OPTIONS, + gettext_noop("Enable user asserts checks."), + NULL + }, + &enable_user_asserts, 
+ true, + NULL, NULL, NULL + }, + { {"exit_on_error", PGC_USERSET, ERROR_HANDLING_OPTIONS, gettext_noop("Terminate session on any error."), NULL diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h new file mode 100644 index 6e33a17..fab3e8a *** a/src/include/miscadmin.h --- b/src/include/miscadmin.h *** extern bool IsBackgroundWorker; *** 137,142 --- 137,143 extern PGDLLIMPORT bool IsBinaryUpgrade; extern bool ExitOnAnyError; + extern bool enable_user_asserts; extern PG
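The behaviour the patch documents, raise when the predicate is false or NULL and skip entirely when enable_user_asserts is off, can be sketched outside plpgsql like this (Python model of the three-valued test; the function name is invented and None stands in for SQL NULL):

```python
def plpgsql_assert(predicate, message=None, enable_user_asserts=True):
    # Models the proposed ASSERT statement: with assertions enabled, the
    # statement fails unless the predicate is exactly true; both false
    # and SQL NULL (None here) raise the assert_exception condition.
    if not enable_user_asserts:
        return
    if predicate is not True:
        raise AssertionError(message or 'assertion failed')
```

Per the thread, the GUC would also be settable per function via ALTER FUNCTION ... SET, which the keyword argument here only loosely imitates.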
Re: [HACKERS] jsonb, unicode escapes and escaped backslashes
On 01/27/2015 02:28 PM, Tom Lane wrote: Andrew Dunstan writes: On 01/27/2015 01:40 PM, Tom Lane wrote: In particular, I would like to suggest that the current representation of \u0000 is fundamentally broken and that we have to change it, not try to band-aid around it. This will mean an on-disk incompatibility for jsonb data containing U+0000, but hopefully there is very little of that out there yet. If we can get a fix into 9.4.1, I think it's reasonable to consider such solutions. Hmm, OK. I had thought we'd be ruling that out, but I agree if it's on the table what I suggested is unnecessary. Well, we can either fix it now or suffer with a broken representation forever. I'm not wedded to the exact solution I described, but I think we'll regret it if we don't change the representation. The only other plausible answer seems to be to flat out reject \u0000. But I assume nobody likes that. I don't think we can be in the business of rejecting valid JSON. cheers andrew
Re: [HACKERS] proposal: row_to_array function
Hello here is an initial version of the row_to_array function - it transforms any row to an array in the format proposed by Andrew. Regards Pavel 2015-01-27 19:58 GMT+01:00 Pavel Stehule : > Hi > > 2015-01-27 11:41 GMT+01:00 Pavel Stehule : > >> >> >> 2015-01-26 21:44 GMT+01:00 Jim Nasby : >> >>> On 1/25/15 4:23 AM, Pavel Stehule wrote: I tested a concept of iteration over an array in the format [key1, value1, key2, value2, .. ] - what is nice, it works for [[key1,value1],[key2, value2], ...] too It is only a few lines more than the current code, and this change doesn't break compatibility. Do you think this patch is acceptable? Ideas, comments? >>> >>> Aside from fixing the comments... I think this needs more tests on >>> corner cases. For example, what happens when you do >>> >>> foreach a, b, c in array(array(1,2),array(3,4)) ? >>> >> >> the behavior is relatively simple -- empty values are NULL >> >> array(1,2),array(3,4) -- do you think ARRAY[[1,2],[3,4]] is effectively >> ARRAY[1,2,3,4] >> >> >>> >>> Or the opposite case of >>> >>> foreach a,b in array(array(1,2,3)) >>> >>> Also, what about: >>> >>> foreach a,b in '{{{1,2},{3,4}},{{5,6},{7,8}}}'::int[] ? >> >> >> >> postgres=# select array(select >> unnest('{{{1,2},{3,4}},{{5,6},{7,8}}}'::int[])); >>array >> --- >> {1,2,3,4,5,6,7,8} >> (1 row) >> >> so it generates pairs {1,2},{3,4},{5,6},{7,8} >> > > I fixed the situation when the array does not have enough elements. > > More tests, simple doc > > Regards > > Pavel > > >> >> Regards >> >> Pavel Stehule >> >> >>> -- >>> Jim Nasby, Data Architect, Blue Treble Consulting >>> Data in Trouble? Get it in Treble!
http://BlueTreble.com >>> >> >> > diff --git a/src/backend/utils/adt/rowtypes.c b/src/backend/utils/adt/rowtypes.c new file mode 100644 index 3dc9a84..d758d2d *** a/src/backend/utils/adt/rowtypes.c --- b/src/backend/utils/adt/rowtypes.c *** *** 21,26 --- 21,27 #include "catalog/pg_type.h" #include "funcapi.h" #include "libpq/pqformat.h" + #include "utils/array.h" #include "utils/builtins.h" #include "utils/lsyscache.h" #include "utils/typcache.h" *** btrecordimagecmp(PG_FUNCTION_ARGS) *** 1810,1812 --- 1811,1898 { PG_RETURN_INT32(record_image_cmp(fcinfo)); } + + /* + * transform any record to array in format [key1, value1, key2, value2 [, ...]] + */ + Datum + row_to_array(PG_FUNCTION_ARGS) + { + HeapTupleHeader rec = PG_GETARG_HEAPTUPLEHEADER(0); + TupleDesc rectupdesc; + Oid rectuptyp; + int32 rectuptypmod; + HeapTupleData rectuple; + int ncolumns; + Datum *recvalues; + bool *recnulls; + ArrayBuildState *builder; + int i; + + /* Extract type info from the tuple itself */ + rectuptyp = HeapTupleHeaderGetTypeId(rec); + rectuptypmod = HeapTupleHeaderGetTypMod(rec); + rectupdesc = lookup_rowtype_tupdesc(rectuptyp, rectuptypmod); + ncolumns = rectupdesc->natts; + + /* Build a temporary HeapTuple control structure */ + rectuple.t_len = HeapTupleHeaderGetDatumLength(rec); + ItemPointerSetInvalid(&(rectuple.t_self)); + rectuple.t_tableOid = InvalidOid; + rectuple.t_data = rec; + + recvalues = (Datum *) palloc(ncolumns * sizeof(Datum)); + recnulls = (bool *) palloc(ncolumns * sizeof(bool)); + + /* Break down the tuple into fields */ + heap_deform_tuple(&rectuple, rectupdesc, recvalues, recnulls); + + /* Prepare target array */ + builder = initArrayResult(TEXTOID, CurrentMemoryContext); + + for (i = 0; i < ncolumns; i++) + { + Oid columntyp = rectupdesc->attrs[i]->atttypid; + Datum value; + bool isnull; + + /* Ignore dropped columns */ + if (rectupdesc->attrs[i]->attisdropped) + continue; + + builder = accumArrayResult(builder, + 
CStringGetTextDatum(NameStr(rectupdesc->attrs[i]->attname)), + false, + TEXTOID, + CurrentMemoryContext); + + if (!recnulls[i]) + { + char *outstr; + bool typIsVarlena; + Oid typoutput; + FmgrInfo proc; + + getTypeOutputInfo(columntyp, &typoutput, &typIsVarlena); + fmgr_info_cxt(typoutput, &proc, CurrentMemoryContext); + outstr = OutputFunctionCall(&proc, recvalues[i]); + + value = CStringGetTextDatum(outstr); + isnull = false; + } + else + { + value = (Datum) 0; + isnull = true; + } + + builder = accumArrayResult(builder, + value, isnull, + TEXTOID, + CurrentMemoryContext); + } + + ReleaseTupleDesc(rectupdesc); + + PG_RETURN_DATUM(makeArrayResult(builder, CurrentMemoryContext)); + } diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h new file mode 100644 index 9edfdb8..a27cf4a *** a/src/include/catalog/pg_proc.h --- b/src/include/catalog/pg_proc.h *** DATA(insert OID = 376 ( string_to_array *** 891,896 --- 891,898 DESCR("split delimited text into text[], with null string"); DATA(insert OID = 384 ( array_to_string PGNSP PGUID 12 1 0 0 0
Re: [HACKERS] proposal: row_to_array function
Example: postgres=# do $$ declare r record; declare k text; v text; begin for r in select * from foo loop foreach k,v in array row_to_array(r) loop raise notice 'k: %, v: %', k, v; end loop; end loop; end; $$; NOTICE: k: a, v: 2 NOTICE: k: b, v: NAZDAR NOTICE: k: c, v: 2015-01-27 NOTICE: k: a, v: 2 NOTICE: k: b, v: AHOJ NOTICE: k: c, v: 2015-01-27 DO Regards Pavel 2015-01-27 21:26 GMT+01:00 Pavel Stehule : > Hello > > here is a initial version of row_to_array function - transform any row to > array in format proposed by Andrew. > > Regards > > Pavel > > 2015-01-27 19:58 GMT+01:00 Pavel Stehule : > >> Hi >> >> 2015-01-27 11:41 GMT+01:00 Pavel Stehule : >> >>> >>> >>> 2015-01-26 21:44 GMT+01:00 Jim Nasby : >>> On 1/25/15 4:23 AM, Pavel Stehule wrote: > > I tested a concept iteration over array in format [key1, value1, key2, > value2, .. ] - what is nice, it works for [[key1,value1],[key2, value2], > ...] too > > It is only a few lines more to current code, and this change doesn't > break a compatibility. > > Do you think, so this patch is acceptable? > > Ideas, comments? > Aside from fixing the comments... I think this needs more tests on corner cases. For example, what happens when you do foreach a, b, c in array(array(1,2),array(3,4)) ? >>> >>> it is relative simple behave -- empty values are NULL >>> >>> array(1,2),array(3,4) -- do you think ARRAY[[1,2],[3,4]] is effectively >>> ARRAY[1,2,3,4] >>> >>> Or the opposite case of foreach a,b in array(array(1,2,3)) Also, what about: foreach a,b in '{{{1,2},{3,4}},{{5,6},{7,8}}}'::int[] ? >>> >>> >>> >>> postgres=# select array(select >>> unnest('{{{1,2},{3,4}},{{5,6},{7,8}}}'::int[])); >>>array >>> --- >>> {1,2,3,4,5,6,7,8} >>> (1 row) >>> >>> so it generate pairs {1,2}{3,4},{5,6},{7,8} >>> >> >> I fixed situation when array has not enough elements. 
>> >> More tests, simple doc >> >> Regards >> >> Pavel >> >> >>> >>> Regards >>> >>> Pavel Stehule >>> >>> -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com >>> >>> >> >
Re: [HACKERS] jsonb, unicode escapes and escaped backslashes
Andrew Dunstan writes: > On 01/27/2015 02:28 PM, Tom Lane wrote: >> Well, we can either fix it now or suffer with a broken representation >> forever. I'm not wedded to the exact solution I described, but I think >> we'll regret it if we don't change the representation. >> >> The only other plausible answer seems to be to flat out reject \u0000. >> But I assume nobody likes that. > I don't think we can be in the business of rejecting valid JSON. Actually, after studying the code a bit, I wonder if we wouldn't be best off to do exactly that, at least for 9.4.x. At minimum we're talking about an API change for JsonSemAction functions (which currently get the already-de-escaped string as a C string; not gonna work for embedded nulls). I'm not sure if there are already third-party extensions using that API, but it seems possible, in which case changing it in a minor release wouldn't be nice. Even ignoring that risk, making sure we'd fixed everything seems like more than a day's work, which is as much as I for one could spare before 9.4.1. Also, while the idea of throwing error only when a \0 needs to be converted to text seems logically clean, it looks like that might pretty much cripple the usability of such values anyhow, because we convert to text at the drop of a hat. So some investigation and probably additional work would be needed to ensure you could do at least basic things with such values. (A function for direct conversion to bytea might be useful too.) I think the "it would mean rejecting valid JSON" argument is utter hogwash. We already reject, eg, "\u00A0" if you're not using a UTF8 encoding. And we reject "1e1", not because that's invalid JSON but because of an implementation restriction of our underlying numeric type. I don't see any moral superiority of that over rejecting "\u0000" because of an implementation restriction of our underlying text type. So at this point I propose that we reject \u0000 when de-escaping JSON.
Anybody who's seriously unhappy with that can propose a patch to fix it properly in 9.5 or later. We probably need to rethink the re-escaping behavior as well; I'm not sure if your latest patch is the right answer for that. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Seq Scan
On Thu, Jan 22, 2015 at 5:57 AM, Amit Kapila wrote: > Script used to test is attached (parallel_count.sh) Why does this use EXPLAIN ANALYZE instead of \timing ? > IBM POWER-7 16 cores, 64 hardware threads > RAM = 64GB > > Table Size - 120GB > > Used below statements to create table - > create table tbl_perf(c1 int, c2 char(1000)); > insert into tbl_perf values(generate_series(1,1000),'a'); > insert into tbl_perf values(generate_series(1001,3000),'a'); > insert into tbl_perf values(generate_series(3001,11000),'a'); I generated this table using this same method and experimented with copying the whole file to the bit bucket using dd. I did this on hydra, which I think is the same machine you used. time for i in `seq 0 119`; do if [ $i -eq 0 ]; then f=16388; else f=16388.$i; fi; dd if=$f of=/dev/null bs=8k; done There is a considerable amount of variation in the amount of time this takes to run based on how much of the relation is cached. Clearly, there's no way for the system to cache it all, but it can cache a significant portion, and that affects the results to no small degree. dd on hydra prints information on the data transfer rate; on uncached 1GB segments, it runs at right around 400 MB/s, but that can soar to upwards of 3GB/s when the relation is fully cached. I tried flushing the OS cache via echo 1 > /proc/sys/vm/drop_caches, and found that immediately after doing that, the above command took 5m21s to run - i.e. ~321000 ms. Most of your test times are faster than that, which means they reflect some degree of caching. When I immediately reran the command a second time, it finished in 4m18s the second time, or ~258000 ms. The rate was the same as the first test - about 400 MB/s - for most of the files, but 27 of the last 28 files went much faster, between 1.3 GB/s and 3.7 GB/s. This tells us that the OS cache on this machine has anti-spoliation logic in it, probably not dissimilar to what we have in PG. 
If the data were cycled through the system cache in strict LRU fashion, any data that was leftover from the first run would have been flushed out by the early part of the second run, so that all the results from the second set of runs would have hit the disk. But in fact, that's not what happened: the last pages from the first run remained cached even after reading an amount of new data that exceeds the size of RAM on that machine. What I think this demonstrates is that we're going to have to be very careful to control for caching effects, or we may find that we get misleading results. To make this simpler, I've installed a setuid binary /usr/bin/drop_caches that you (or anyone who has an account on that machine) can use to drop the caches; run 'drop_caches 1'. > Block-By-Block > > No. of workers/Time (ms) 0 2 > Run-1 267798 295051 > Run-2 276646 296665 > Run-3 281364 314952 > Run-4 290231 326243 > Run-5 288890 295684 The next thing I did was run the test with the block-by-block method after having dropped the caches. I did this with 0 workers and with 8 workers. I dropped the caches and restarted postgres before each test, but then ran each test a second time to see the effect of caching by both the OS and by PostgreSQL. I got these results: With 0 workers, first run took 883465.352 ms, and second run took 295050.106 ms. With 8 workers, first run took 340302.250 ms, and second run took 307767.758 ms. This is a confusing result, because you expect parallelism to help more when the relation is partly cached, and make little or no difference when it isn't cached. But that's not what happened. I've also got a draft of a prefetching implementation here that I'd like to test out, but I've just discovered that it's buggy, so I'm going to send these results for now and work on fixing that.
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Seq Scan
Robert, all, * Robert Haas (robertmh...@gmail.com) wrote: > There is a considerable amount of variation in the amount of time this > takes to run based on how much of the relation is cached. Clearly, > there's no way for the system to cache it all, but it can cache a > significant portion, and that affects the results to no small degree. > dd on hydra prints information on the data transfer rate; on uncached > 1GB segments, it runs at right around 400 MB/s, but that can soar to > upwards of 3GB/s when the relation is fully cached. I tried flushing > the OS cache via echo 1 > /proc/sys/vm/drop_caches, and found that > immediately after doing that, the above command took 5m21s to run - > i.e. ~321000 ms. Most of your test times are faster than that, which > means they reflect some degree of caching. When I immediately reran > the command a second time, it finished in 4m18s the second time, or > ~258000 ms. The rate was the same as the first test - about 400 MB/s > - for most of the files, but 27 of the last 28 files went much faster, > between 1.3 GB/s and 3.7 GB/s. [...] > With 0 workers, first run took 883465.352 ms, and second run took 295050.106 > ms. > With 8 workers, first run took 340302.250 ms, and second run took 307767.758 > ms. > > This is a confusing result, because you expect parallelism to help > more when the relation is partly cached, and make little or no > difference when it isn't cached. But that's not what happened. These numbers seem to indicate that the oddball is the single-threaded uncached run. If I followed correctly, the uncached 'dd' took 321s, which is relatively close to the uncached-lots-of-workers and the two cached runs. What in the world is the uncached single-thread case doing that it takes an extra 543s, or over twice as long? It's clearly not disk i/o which is causing the slowdown, based on your dd tests. One possibility might be round-trip latency. 
The multi-threaded case is able to keep the CPUs and the i/o system going, and the cached results don't have as much latency since things are cached, but the single-threaded uncached case, going i/o -> cpu -> i/o -> cpu, ends up with a lot of wait time as it switches between being on CPU and waiting on the i/o. Just some thoughts. Thanks, Stephen
Re: [HACKERS] Shortcoming in CLOBBER_FREED_MEMORY coverage: disk buffer pointers
On 1/26/15 6:11 PM, Greg Stark wrote: On Tue, Jan 27, 2015 at 12:03 AM, Jim Nasby mailto:jim.na...@bluetreble.com>> wrote: But one backend can effectively "pin" a buffer more than once, no? If so, then ISTM there's some risk that code path A pins and forgets to unpin, but path B accidentally unpins for A. The danger is that there's a codepath that refers to memory it doesn't have a pin on but that there is no actual test in our regression suite that doesn't actually have a second pin on the same buffer. If there is in fact no reachable code path at all without the second pin then there's no active bug, only a bad coding practice. But if there are code paths that we just aren't testing then that's a real bug. IIRC CLOBBER_FREED_MEMORY only affects palloc'd blocks. Do we not have a mode that automatically removes any buffer as soon as it's not pinned? That seems like it would be a valuable addition. By setting BufferDesc.tag to 0? On a related note... I'm confused by this part of UnpinBuffer. How is refcount ending up > 0?? Assert(ref->refcount > 0); ref->refcount--; if (ref->refcount == 0) { /* I'd better not still hold any locks on the buffer */ Assert(!LWLockHeldByMe(buf->content_lock)); Assert(!LWLockHeldByMe(buf->io_in_progress_lock)); LockBufHdr(buf); /* Decrement the shared reference count */ Assert(buf->refcount > 0); buf->refcount--; BTW, I certainly see no evidence of CLOBBER_FREED_MEMORY coming into play here. Fwiw I think our experience is that bugs where buffers are unpinned get exposed pretty quickly in production. I suppose the same might not be true for rarely called codepaths or in cases where the buffers are usually already pinned. Yeah, that's what I was thinking. If there's some easy way to correctly associate pins with specific code paths (owners?) then maybe it's worth doing so; but I don't think it's worth much effort. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! 
http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission
On 1/27/15 1:04 AM, Haribabu Kommi wrote: On Mon, Jun 30, 2014 at 5:06 PM, Abhijit Menon-Sen wrote: I think having two columns would work. The columns could be called "database" and "database_list" and "user" and "user_list" respectively. The database column may contain one of "all", "sameuser", "samegroup", "replication", but if it's empty, database_list will contain an array of database names. Then ("all", {}) and ("", {all}) are easily separated. Likewise for user and user_list. Thanks for the review. I corrected all the review comments except the one to add two columns as (database, database_list and user, user_list). I feel this may cause some confusion to the users. Here I attached the latest version of the patch. I will add this patch to the next commitfest. Apologies if this was covered, but why isn't the IP address an inet instead of text? Also, what happens if someone reloads the config in the middle of running the SRF? ISTM it'd be better to do something like process all of parsed_hba_lines into a tuplestore. Obviously there's still a race condition there, but at least it's a lot smaller, and AFAIK no worse than the pg_stats views. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
On 1/26/15 4:45 PM, Robert Haas wrote: On Mon, Jan 26, 2015 at 5:21 PM, Stephen Frost wrote: I don't disagree with you about any of that. I don't think you disagree with my comment either. What I'm not entirely clear on is how consensus could be reached. Leaving it dormant for the better part of a year certainly doesn't appear to have helped that situation. We've discussed having it be part of the main server and having it be a contrib module and until about a week ago, I had understood that having it in contrib would be preferable. Based on the recent emails, it appears there's been a shift of preference to having it be in-core, but clearly there's no time left to do that in this release cycle. Well, I'm not sure that anyone else here agreed with me on that, and one person does not a consensus make no matter who it is. The basic problem here is that we don't seem to have even two people here who agree on how this ought to be done. The basic dynamic here seems to be you asking for changes and Abhijit making them but without any real confidence, and I don't feel good about that. I'm willing to defer to an emerging consensus here when there is one, but what Abhijit likes best is not a consensus, and neither is what you like, and neither is what I like. What we need is some people agreeing with each other. BTW, I know that at least earlier versions of EnterpriseDB's version of Postgres (circa 2007) had an auditing feature. I never dealt with any customers who were using it when I was there, but perhaps some other folks could shed some light on what customers wanted to see in an auditing feature... (I'm looking at you, Jimbo!) -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] a fast bloat measurement tool (was Re: Measuring relation free space)
On 1/27/15 3:56 AM, Abhijit Menon-Sen wrote: At 2015-01-26 18:47:29 -0600, jim.na...@bluetreble.com wrote: Anything happen with this? Nothing so far. It would be best to get this into a commit fest so it's not lost. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Seq Scan
On Fri, Jan 23, 2015 at 6:42 AM, Amit Kapila wrote: > Fixed-Chunks > > No. of workers/Time (ms) 0 2 4 8 16 24 32 > Run-1 250536 266279 251263 234347 87930 50474 35474 > Run-2 249587 230628 225648 193340 83036 35140 9100 > Run-3 234963 220671 230002 256183 105382 62493 27903 > Run-4 239111 245448 224057 189196 123780 63794 24746 > Run-5 239937 222820 219025 220478 114007 77965 39766 I cannot reproduce these results. I applied your fixed-chunk size patch and ran SELECT parallel_count('tbl_perf', 32) a few times. The first thing I notice is that, as I predicted, there's an issue with different workers finishing at different times. For example, from my first run: 2015-01-27 22:13:09 UTC [34660] LOG: worker process: parallel worker for PID 34668 (PID 34700) exited with exit code 0 2015-01-27 22:13:09 UTC [34660] LOG: worker process: parallel worker for PID 34668 (PID 34698) exited with exit code 0 2015-01-27 22:13:09 UTC [34660] LOG: worker process: parallel worker for PID 34668 (PID 34701) exited with exit code 0 2015-01-27 22:13:10 UTC [34660] LOG: worker process: parallel worker for PID 34668 (PID 34699) exited with exit code 0 2015-01-27 22:15:00 UTC [34660] LOG: worker process: parallel worker for PID 34668 (PID 34683) exited with exit code 0 2015-01-27 22:15:29 UTC [34660] LOG: worker process: parallel worker for PID 34668 (PID 34673) exited with exit code 0 2015-01-27 22:15:58 UTC [34660] LOG: worker process: parallel worker for PID 34668 (PID 34679) exited with exit code 0 2015-01-27 22:16:38 UTC [34660] LOG: worker process: parallel worker for PID 34668 (PID 34689) exited with exit code 0 2015-01-27 22:16:39 UTC [34660] LOG: worker process: parallel worker for PID 34668 (PID 34671) exited with exit code 0 2015-01-27 22:16:47 UTC [34660] LOG: worker process: parallel worker for PID 34668 (PID 34677) exited with exit code 0 2015-01-27 22:16:47 UTC [34660] LOG: worker process: parallel worker for PID 34668 (PID 34672) exited with exit code 0 2015-01-27 22:16:48 
UTC [34660] LOG: worker process: parallel worker for PID 34668 (PID 34680) exited with exit code 0 2015-01-27 22:16:50 UTC [34660] LOG: worker process: parallel worker for PID 34668 (PID 34686) exited with exit code 0 2015-01-27 22:16:51 UTC [34660] LOG: worker process: parallel worker for PID 34668 (PID 34670) exited with exit code 0 2015-01-27 22:16:51 UTC [34660] LOG: worker process: parallel worker for PID 34668 (PID 34690) exited with exit code 0 2015-01-27 22:16:51 UTC [34660] LOG: worker process: parallel worker for PID 34668 (PID 34674) exited with exit code 0 2015-01-27 22:16:52 UTC [34660] LOG: worker process: parallel worker for PID 34668 (PID 34684) exited with exit code 0 2015-01-27 22:16:53 UTC [34660] LOG: worker process: parallel worker for PID 34668 (PID 34675) exited with exit code 0 2015-01-27 22:16:53 UTC [34660] LOG: worker process: parallel worker for PID 34668 (PID 34682) exited with exit code 0 2015-01-27 22:16:53 UTC [34660] LOG: worker process: parallel worker for PID 34668 (PID 34691) exited with exit code 0 2015-01-27 22:16:54 UTC [34660] LOG: worker process: parallel worker for PID 34668 (PID 34676) exited with exit code 0 2015-01-27 22:16:54 UTC [34660] LOG: worker process: parallel worker for PID 34668 (PID 34685) exited with exit code 0 2015-01-27 22:16:55 UTC [34660] LOG: worker process: parallel worker for PID 34668 (PID 34692) exited with exit code 0 2015-01-27 22:16:56 UTC [34660] LOG: worker process: parallel worker for PID 34668 (PID 34687) exited with exit code 0 2015-01-27 22:16:56 UTC [34660] LOG: worker process: parallel worker for PID 34668 (PID 34678) exited with exit code 0 2015-01-27 22:16:57 UTC [34660] LOG: worker process: parallel worker for PID 34668 (PID 34681) exited with exit code 0 2015-01-27 22:16:57 UTC [34660] LOG: worker process: parallel worker for PID 34668 (PID 34688) exited with exit code 0 2015-01-27 22:16:59 UTC [34660] LOG: worker process: parallel worker for PID 34668 (PID 34694) exited with exit 
code 0 2015-01-27 22:16:59 UTC [34660] LOG: worker process: parallel worker for PID 34668 (PID 34693) exited with exit code 0 2015-01-27 22:17:02 UTC [34660] LOG: worker process: parallel worker for PID 34668 (PID 34695) exited with exit code 0 2015-01-27 22:17:02 UTC [34660] LOG: worker process: parallel worker for PID 34668 (PID 34697) exited with exit code 0 2015-01-27 22:17:02 UTC [34660] LOG: worker process: parallel worker for PID 34668 (PID 34696) exited with exit code 0 That run started at 22:13:01. Within 4 seconds, 4 workers exited. So clearly we are not getting the promised 32-way parallelism for the whole test. Granted, in this instance, *most* of the workers run until the end, but I think we'll find that there are uncomfortably-frequent cases where we get significantly less parallelism than we planned on because the work isn't divided evenly. But leaving that aside, I've run this test 6 times in a row now, with a warm cache, and the bes
Re: [HACKERS] proposal: searching in array function - array_position
On 1/27/15 4:36 AM, Pavel Stehule wrote: 2015-01-26 23:29 GMT+01:00 Jim Nasby: On 1/26/15 4:17 PM, Pavel Stehule wrote: Any way to reduce the code duplication between the array and non-array versions? Maybe factor out the operator caching code? I thought about it - but there are different checks, different result processing, a different result type. I didn't find any readable and reduced form :( Yeah, that's why I was thinking specifically of the operator caching code... isn't that identical? That would at least remove a dozen lines... It is only partially identical - I would like to use a cache for array_offset, but it is not necessary for array_offsets .. it depends on how we would modify the current API to support externally cached data. Externally cached data? -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
On Tue, Jan 27, 2015 at 5:55 PM, Jim Nasby wrote: > BTW, I know that at least earlier versions of EnterpriseDB's version of > Postgres (circa 2007) had an auditing feature. I never dealt with any > customers who were using it when I was there, but perhaps some other folks > could shed some light on what customers wanted to see an an auditing > feature... (I'm looking at you, Jimbo!) It's still there, but it's nothing like pgaudit. What I hear is that our customers are looking for something that has the capabilities of DBMS_FGA. I haven't researched either that or pgaudit enough to know how similar they are. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade and rsync
On 1/27/15 9:29 AM, Stephen Frost wrote: My point is that Bruce's patch suggests looking for "remote_dir" in > the rsync documentation, but no such term appears there. Ah, well, perhaps we could simply add a bit of clarification to this: for details on specifying remote_dir The whole remote_dir discussion made me think of something... would --link-dest be any help here? --link-dest=DIR This option behaves like --copy-dest, but unchanged files are hard linked from DIR to the destination directory. The files must be identical in all preserved attributes (e.g. permissions, possibly ownership) in order for the files to be linked together. An example: rsync -av --link-dest=$PWD/prior_dir host:src_dir/ new_dir/ -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: plpgsql - Assert statement
On 1/27/15 1:30 PM, Pavel Stehule wrote: I don't see the separate warning as being helpful. I'd just do something like +(err_hint != NULL) ? errhint("%s", err_hint) : errhint("Message attached to failed assertion is null") )); done There should also be a test case for a NULL message. is there, if I understand well I see it now. Looks good. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: row_to_array function
On 1/27/15 2:26 PM, Pavel Stehule wrote: here is a initial version of row_to_array function - transform any row to array in format proposed by Andrew. Please start a new thread for this... does it depend on the key-value patch? -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Shortcoming in CLOBBER_FREED_MEMORY coverage: disk buffer pointers
Jim Nasby writes: > On 1/26/15 6:11 PM, Greg Stark wrote: >> Fwiw I think our experience is that bugs where buffers are unpinned get >> exposed pretty quickly in production. I suppose the same might not be true >> for rarely called codepaths or in cases where the buffers are usually >> already pinned. > Yeah, that's what I was thinking. If there's some easy way to correctly > associate pins with specific code paths (owners?) then maybe it's worth doing > so; but I don't think it's worth much effort. If you have a working set larger than shared_buffers, then yeah it's likely that reference-after-unpin bugs would manifest pretty quickly. This does not necessarily translate into something reproducible or debuggable, however; and besides that you'd really rather that such bugs not get into the field in the first place. The point of my Valgrind proposal was to provide a mechanism that would have a chance of catching such bugs in a *development* context, and provide some hint of where in the codebase the fault is, too. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb, unicode escapes and escaped backslashes
On Tue, Jan 27, 2015 at 12:40 PM, Tom Lane wrote: > Andrew Dunstan writes: >> On 01/27/2015 12:23 PM, Tom Lane wrote: >>> I think coding anything is premature until we decide how we're going to >>> deal with the fundamental ambiguity. > >> The input \\uabcd will be stored correctly as \uabcd, but this will in >> turn be rendered as \uabcd, whereas it should be rendered as \\uabcd. >> That's what the patch fixes. >> There are two problems here and this addresses one of them. The other >> problem is the ambiguity regarding \\u0000 and \u0000. > > It's the same problem really, and until we have an answer about > what to do with \u0000, I think any patch is half-baked and possibly > counterproductive. > > In particular, I would like to suggest that the current representation of > \u0000 is fundamentally broken and that we have to change it, not try to > band-aid around it. This will mean an on-disk incompatibility for jsonb > data containing U+0000, but hopefully there is very little of that out > there yet. If we can get a fix into 9.4.1, I think it's reasonable to > consider such solutions. > > The most obvious way to store such data unambiguously is to just go ahead > and store U+0000 as a NUL byte (\000). The only problem with that is that > then such a string cannot be considered to be a valid value of type TEXT, > which would mean that we'd need to throw an error if we were asked to > convert a JSON field containing such a character to text. Hm, does this include text out operations for display purposes? If so, then any query selecting jsonb objects with null bytes would fail. How come we have to error out? How about a warning indicating the string was truncated? merlin
Re: [HACKERS] proposal: row_to_array function
On 1/27/15 12:58 PM, Pavel Stehule wrote: postgres=# select array(select unnest('{{{1,2},{3,4}},{{5,6},{7,8}}}'::int[])); array --- {1,2,3,4,5,6,7,8} (1 row) so it generate pairs {1,2}{3,4},{5,6},{7,8} I fixed situation when array has not enough elements. More tests, simple doc Hrm, this wasn't what I was expecting: + select foreach_test_ab(array[1,2,3,4]); + NOTICE: a: 1, b: 2 + NOTICE: a: 3, b: 4 I was expecting that foreach a,b array would be expecting something in the array to have a dimension of 2. :( I think this is bad, because this: foreach_test_ab('{{1,2,3},{4,5,6}}'::int[]); will give you 1,2; 3,4; 5,6. I don't see any way that that makes sense. Even if it did make sense, I'm more concerned that adding this will seriously paint us into a corner when it comes to the (to me) more rational case of returning {1,2,3},{4,5,6}. I think we need to think some more about this, at least to make sure we're not painting ourselves into a corner for more appropriate array iteration. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
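The two iteration semantics being debated above can be contrasted with a short sketch. Python nested lists stand in for a 2-D SQL array here, and the function names are made up for illustration; neither is the actual plpgsql implementation:

```python
# Sketch of the two possible semantics for plpgsql's "FOREACH a, b IN ARRAY",
# using nested Python lists to stand in for a 2-D SQL array.

def flat_pairs(arr):
    """Behavior Jim observed: flatten the array, then take elements two at a time."""
    flat = [x for row in arr for x in row]
    return [tuple(flat[i:i + 2]) for i in range(0, len(flat), 2)]

def slice_rows(arr):
    """Alternative reading: iterate over slices of the first dimension."""
    return [tuple(row) for row in arr]

two_by_three = [[1, 2, 3], [4, 5, 6]]
print(flat_pairs(two_by_three))   # pairing ignores the array's shape: (1,2), (3,4), (5,6)
print(slice_rows(two_by_three))   # tuples follow the first dimension: (1,2,3), (4,5,6)
```

The first function reproduces the "1,2; 3,4; 5,6" result Jim complains about for a `{{1,2,3},{4,5,6}}` input; the second is the shape-respecting behavior he was expecting.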
Re: [HACKERS] jsonb, unicode escapes and escaped backslashes
Merlin Moncure writes: > On Tue, Jan 27, 2015 at 12:40 PM, Tom Lane wrote: >> In particular, I would like to suggest that the current representation of >> \u0000 is fundamentally broken and that we have to change it, not try to >> band-aid around it. This will mean an on-disk incompatibility for jsonb >> data containing U+0000, but hopefully there is very little of that out >> there yet. If we can get a fix into 9.4.1, I think it's reasonable to >> consider such solutions. >> >> The most obvious way to store such data unambiguously is to just go ahead >> and store U+0000 as a NUL byte (\000). The only problem with that is that >> then such a string cannot be considered to be a valid value of type TEXT, >> which would mean that we'd need to throw an error if we were asked to >> convert a JSON field containing such a character to text. > Hm, does this include text out operations for display purposes? If > so, then any query selecting jsonb objects with null bytes would fail. > How come we have to error out? How about a warning indicating the > string was truncated? No, that's not a problem, because jsonb_out would produce an escaped textual representation, so any null would come out as \u0000 again. The trouble comes up when you do something that's supposed to produce a *non escaped* text equivalent of a JSON string value, such as the ->> operator. Arguably, ->> is broken already with the current coding, in that these results are entirely inconsistent: regression=# select '{"a":"foo\u0040bar"}'::jsonb ->> 'a'; ?column? -- foo@bar (1 row) regression=# select '{"a":"foo\u0000bar"}'::jsonb ->> 'a'; ?column? -- foo\u0000bar (1 row) regression=# select '{"a":"foo\\u0000bar"}'::jsonb ->> 'a'; ?column? -- foo\u0000bar (1 row) regards, tom lane
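The collision being described can be reproduced outside the server with any JSON implementation. Here is a sketch using Python's json module as an analogy; it is not the jsonb code path itself:

```python
import json

# JSON input \u0000 denotes a real NUL character; input \\u0000 is an escaped
# backslash followed by the five literal characters u0000. Once the NUL is
# re-escaped for storage as the text "\u0000", the two originals can no
# longer be told apart -- the ambiguity discussed above.

escaped_backslash = json.loads(r'"\\u0000"')   # the six characters \u0000
unicode_escape    = json.loads('"\\u0000"')    # a single NUL character

assert escaped_backslash == "\\u0000"
assert unicode_escape == "\x00"

# Re-serializing the NUL turns it back into the six characters \u0000,
# identical to the other value's stored form: the round trip is lossy.
assert json.dumps(unicode_escape) == '"\\u0000"'
```

This is why storing the escape sequence as text is ambiguous, and why the thread leans toward either storing an actual NUL byte or rejecting \u0000 outright.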
Re: [HACKERS] Parallel Seq Scan
On 1/26/15 11:11 PM, Amit Kapila wrote: On Tue, Jan 27, 2015 at 3:18 AM, Jim Nasby <jim.na...@bluetreble.com> wrote: > > On 1/23/15 10:16 PM, Amit Kapila wrote: >> >> Further, if we want to just get the benefit of parallel I/O, then >> I think we can get that by parallelising partition scan where different >> table partitions reside on different disk partitions, however that is >> a matter of a separate patch. > > > I don't think we even have to go that far. > > > We'd be a lot less sensitive to IO latency. > > I wonder what kind of gains we would see if every SeqScan in a query spawned a worker just to read tuples and shove them in a queue (or shove a pointer to a buffer in the queue). > Here IIUC, you want to say that just get the read done by one parallel worker and then all expression calculation (evaluation of qualification and target list) in the main backend; it seems to me that by doing it that way, the benefit of parallelisation will be lost due to tuple communication overhead (maybe the overhead is less if we just pass a pointer to a buffer, but that will have another kind of problem, like holding buffer pins for a longer period of time). I could see the advantage of testing along the lines suggested by Tom Lane, but that seems to be not directly related to what we want to achieve by this patch (parallel seq scan), or if you think otherwise then let me know? There's some low-hanging fruit when it comes to improving our IO performance (or more specifically, decreasing our sensitivity to IO latency). Perhaps the way to do that is with the parallel infrastructure, perhaps not. But I think it's premature to look at parallelism for increasing IO performance, or worrying about things like how many IO threads we should have, before we at least look at simpler things we could do. We shouldn't assume there's nothing to be gained short of a full parallelization implementation. That's not to say there's nothing else we could use parallelism for. 
Sort, merge and hash operations come to mind. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
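The "one reader worker feeding the main backend" design floated above, and the per-tuple communication overhead it raises, can be sketched with a toy queue. Python threads stand in for background workers here; nothing below is the actual executor API:

```python
import queue
import threading

# Toy sketch of the design discussed above: one worker does the "reads" and
# pushes tuples into a queue, while the main thread evaluates the
# qualification. The per-tuple queue traffic is the communication overhead
# being worried about.

def reader(rows, q):
    for row in rows:
        q.put(row)          # every tuple crosses the queue boundary
    q.put(None)             # end-of-scan marker

rows = list(range(100))     # stand-in for a heap of 100 tuples
q = queue.Queue(maxsize=10)
t = threading.Thread(target=reader, args=(rows, q))
t.start()

matched = []
while True:
    row = q.get()
    if row is None:
        break
    if row % 7 == 0:        # stand-in for qual evaluation in the main backend
        matched.append(row)
t.join()
print(len(matched))         # → 15
```

Passing a buffer pointer instead of the tuple itself would shrink the queue traffic, at the cost of holding pins longer, which is exactly the trade-off described in the message above.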
Re: [HACKERS] Parallel Seq Scan
On 1/27/15 3:46 PM, Stephen Frost wrote: With 0 workers, first run took 883465.352 ms, and second run took 295050.106 ms. >With 8 workers, first run took 340302.250 ms, and second run took 307767.758 ms. > >This is a confusing result, because you expect parallelism to help >more when the relation is partly cached, and make little or no >difference when it isn't cached. But that's not what happened. These numbers seem to indicate that the oddball is the single-threaded uncached run. If I followed correctly, the uncached 'dd' took 321s, which is relatively close to the uncached-lots-of-workers and the two cached runs. What in the world is the uncached single-thread case doing that it takes an extra 543s, or over twice as long? It's clearly not disk i/o which is causing the slowdown, based on your dd tests. One possibility might be round-trip latency. The multi-threaded case is able to keep the CPUs and the i/o system going, and the cached results don't have as much latency since things are cached, but the single-threaded uncached case going i/o -> cpu -> i/o -> cpu, ends up with a lot of wait time as it switches between being on CPU and waiting on the i/o. This exactly mirrors what I've seen on production systems. On a single SeqScan I can't get anywhere close to the IO performance I could get with dd. Once I got up to 4-8 SeqScans of different tables running together, I saw iostat numbers that were similar to what a single dd bs=8k would do. I've tested this with iSCSI SAN volumes on both 1Gbit and 10Gbit ethernet. This is why I think that when it comes to IO performance, before we start worrying about real parallelization we should investigate ways to do some kind of async IO. I only have my SSD laptop and a really old server to test on, but I'll try Tom's suggestion of adding a PrefetchBuffer call into heapgetpage() unless someone beats me to it. I should be able to do it tomorrow. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? 
Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
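The prefetching approach under discussion amounts to issuing an asynchronous read hint before a block is needed, so the read overlaps with CPU work. Below is a minimal standalone sketch assuming a POSIX system; PostgreSQL's PrefetchBuffer is built on the same posix_fadvise(WILLNEED) hint, but the file layout and block size here are purely illustrative:

```python
import os
import tempfile

BLOCK_SIZE = 8192   # matches Postgres' default block size, for flavor
NBLOCKS = 4

# Build a tiny 4-block "relation" on disk.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\x7f" * (BLOCK_SIZE * NBLOCKS))
    path = f.name

blocks_read = 0
fd = os.open(path, os.O_RDONLY)
try:
    for blkno in range(NBLOCKS):
        nxt = blkno + 1
        if nxt < NBLOCKS and hasattr(os, "posix_fadvise"):
            # Asynchronous hint: start pulling in the next block now,
            # while we "process" the current one.
            os.posix_fadvise(fd, nxt * BLOCK_SIZE, BLOCK_SIZE,
                             os.POSIX_FADV_WILLNEED)
        data = os.pread(fd, BLOCK_SIZE, blkno * BLOCK_SIZE)
        assert len(data) == BLOCK_SIZE
        blocks_read += 1
finally:
    os.close(fd)
    os.unlink(path)
```

The hasattr guard is there because posix_fadvise is not available on every platform, mirroring Postgres' own USE_PREFETCH conditional.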
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
On 1/27/15 5:06 PM, Robert Haas wrote: On Tue, Jan 27, 2015 at 5:55 PM, Jim Nasby wrote: BTW, I know that at least earlier versions of EnterpriseDB's version of Postgres (circa 2007) had an auditing feature. I never dealt with any customers who were using it when I was there, but perhaps some other folks could shed some light on what customers wanted to see in an auditing feature... (I'm looking at you, Jimbo!) It's still there, but it's nothing like pgaudit. What I hear is that our customers are looking for something that has the capabilities of DBMS_FGA. I haven't researched either that or pgaudit enough to know how similar they are. Do they really need the full capabilities of DBMS_FGA? At first glance, it looks even more sophisticated than anything that's been discussed so far. To wit (from http://docs.oracle.com/cd/B19306_01/appdev.102/b14258/d_fga.htm#CDEIECAG): DBMS_FGA.ADD_POLICY( object_schema VARCHAR2, object_name VARCHAR2, policy_name VARCHAR2, audit_condition VARCHAR2, audit_column VARCHAR2, handler_schema VARCHAR2, handler_module VARCHAR2, enable BOOLEAN, statement_types VARCHAR2, audit_trail BINARY_INTEGER IN DEFAULT, audit_column_opts BINARY_INTEGER IN DEFAULT); -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
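For a rough feel of what a DBMS_FGA-style policy does, here is a sketch of policy evaluation guessed from the parameter names above (audit only when the statement type matches and an audited column is touched, with an empty column list meaning "any column"). This is an illustration only, not Oracle's actual implementation:

```python
from dataclasses import dataclass, field

@dataclass
class AuditPolicy:
    # Hypothetical analogue of a few DBMS_FGA.ADD_POLICY parameters.
    policy_name: str
    statement_types: set = field(default_factory=lambda: {"SELECT"})
    audit_columns: set = field(default_factory=set)

def should_audit(policy, stmt_type, columns_touched):
    """Decide whether a statement trips the policy."""
    if stmt_type not in policy.statement_types:
        return False
    # An empty audit_columns set is treated here as "audit any column".
    return not policy.audit_columns or bool(policy.audit_columns & columns_touched)

salary_policy = AuditPolicy("watch_salary", {"SELECT", "UPDATE"}, {"salary"})
print(should_audit(salary_policy, "SELECT", {"name", "salary"}))  # True
print(should_audit(salary_policy, "SELECT", {"name"}))            # False
```

The real feature also carries a predicate (audit_condition) and a handler callback, which is the "even more sophisticated" part noted above.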
Re: [HACKERS] Shortcoming in CLOBBER_FREED_MEMORY coverage: disk buffer pointers
On 1/27/15 5:16 PM, Tom Lane wrote: Jim Nasby writes: On 1/26/15 6:11 PM, Greg Stark wrote: Fwiw I think our experience is that bugs where buffers are unpinned get exposed pretty quickly in production. I suppose the same might not be true for rarely called codepaths or in cases where the buffers are usually already pinned. Yeah, that's what I was thinking. If there's some easy way to correctly associate pins with specific code paths (owners?) then maybe it's worth doing so; but I don't think it's worth much effort. If you have a working set larger than shared_buffers, then yeah it's likely that reference-after-unpin bugs would manifest pretty quickly. This does not necessarily translate into something reproducible or debuggable, however; and besides that you'd really rather that such bugs not get into the field in the first place. The point of my Valgrind proposal was to provide a mechanism that would have a chance of catching such bugs in a *development* context, and provide some hint of where in the codebase the fault is, too. That's what I was looking for too; I was just wondering if there was an easy way to also cover the case of one path forgetting to Unpin and a second path accidentally Unpinning (with neither dropping the refcount to 0). It sounds like it's just not worth worrying about that though. Do you think there's merit to having bufmgr.c do something special when refcount hits 0 in a CLOBBER_FREED_MEMORY build? It seems like it's a lot easier to enable that than to set up valgrind (though I've never tried the latter). -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
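The "do something special when refcount hits 0" idea can be illustrated with a toy buffer pool that poisons the page on final unpin, in the spirit of CLOBBER_FREED_MEMORY's 0x7F fill pattern. This is a Python analogy; the real change would live in bufmgr.c:

```python
CLOBBER_BYTE = 0x7F   # same sentinel pattern CLOBBER_FREED_MEMORY writes

class Buffer:
    """Toy buffer: pin() hands out the page, unpin() clobbers it at refcount 0."""

    def __init__(self, data):
        self.data = bytearray(data)
        self.refcount = 0

    def pin(self):
        self.refcount += 1
        return self.data

    def unpin(self):
        assert self.refcount > 0
        self.refcount -= 1
        if self.refcount == 0:
            # Clobber the page so any read through a stale reference sees
            # obvious garbage instead of plausible-looking tuple data.
            for i in range(len(self.data)):
                self.data[i] = CLOBBER_BYTE

buf = Buffer(b"tuple data")
page = buf.pin()
assert bytes(page) == b"tuple data"
buf.unpin()
# A use-after-unpin now reads the sentinel, not stale contents.
assert all(b == CLOBBER_BYTE for b in page)
```

As noted in the message above, this catches reads after the *last* unpin but not mismatched pin/unpin pairs that never drive the count to zero.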
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
On Tue, Jan 27, 2015 at 6:54 PM, Jim Nasby wrote: > On 1/27/15 5:06 PM, Robert Haas wrote: >> >> On Tue, Jan 27, 2015 at 5:55 PM, Jim Nasby >> wrote: >>> >>> BTW, I know that at least earlier versions of EnterpriseDB's version of >>> Postgres (circa 2007) had an auditing feature. I never dealt with any >>> customers who were using it when I was there, but perhaps some other >>> folks >>> could shed some light on what customers wanted to see in an auditing >>> feature... (I'm looking at you, Jimbo!) >> >> >> It's still there, but it's nothing like pgaudit. What I hear is that >> our customers are looking for something that has the capabilities of >> DBMS_FGA. I haven't researched either that or pgaudit enough to know >> how similar they are. > > > Do they really need the full capabilities of DBMS_FGA? At first glance, it > looks even more sophisticated than anything that's been discussed so far. To > wit (from > http://docs.oracle.com/cd/B19306_01/appdev.102/b14258/d_fga.htm#CDEIECAG): > > DBMS_FGA.ADD_POLICY( >object_schema VARCHAR2, >object_name VARCHAR2, >policy_name VARCHAR2, >audit_condition VARCHAR2, >audit_column VARCHAR2, >handler_schema VARCHAR2, >handler_module VARCHAR2, >enable BOOLEAN, >statement_types VARCHAR2, >audit_trail BINARY_INTEGER IN DEFAULT, >audit_column_opts BINARY_INTEGER IN DEFAULT); I don't know. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] basebackups during ALTER DATABASE ... SET TABLESPACE ... not safe?
On Tue, Jan 27, 2015 at 2:16 PM, Andres Freund wrote: > That'd end up essentially being a re-emulation of locks. Don't find that > all that pretty. But maybe we have to go there. The advantage of it is that it is simple. The only thing we're really giving up is the deadlock detector, which I think isn't needed in this case. > Here's an alternative approach. I think it generally is superior and > going in the right direction, but I'm not sure it's backpatchable. I tend to think this is changing too many things to back-patch. It might all be fine, but it's pretty wide-reaching, so the chances of collateral damage seem non-trivial. Even in master, I'm not sure I see the point in rocking the apple cart to this degree. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Dereferenced pointers checked as NULL in btree_utils_var.c
On Wed, Jan 28, 2015 at 3:15 AM, Tom Lane wrote: > So I'm fine with taking out both this documentation text and the dead > null-pointer checks; but let's do that all in one patch not piecemeal. > In any case, this is just cosmetic cleanup; there's no actual hazard > here. Attached is a patch with all those things done. I added as well an assertion in gistKeyIsEQ checking if the input datums are NULL. I believe that this is still useful for developers, feel free to rip it out from the patch if you think otherwise. Regards, -- Michael From 95b051aac56de68412a7b0635484e46eb4e24ad0 Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Wed, 28 Jan 2015 09:36:11 +0900 Subject: [PATCH] Remove dead NULL-pointer checks in GiST code The key value passed to the same method is generated by either the compress, union or picksplit methods, but it happens that none of them actually pass a NULL pointer. Some compress methods do not check for a NULL pointer, what should have resulted in a crash. Note that gist_poly_compress() and gist_circle_compress() do check for a NULL, but they only return a NULL key if the input key is NULL, which cannot happen because the input comes from a table. This commit also removes a documentation note added in commit a0a3883 that has been introduced based on the fact that some routines of the same method were doing NULL-pointer checks, and adds an assertion in gistKeyIsEQ to ensure that no NULL keys are passed when calling the same method. 
--- contrib/btree_gist/btree_utils_num.c | 8 ++ contrib/btree_gist/btree_utils_var.c | 10 ++- doc/src/sgml/gist.sgml | 5 src/backend/access/gist/gistproc.c | 56 src/backend/access/gist/gistutil.c | 3 ++ 5 files changed, 25 insertions(+), 57 deletions(-) diff --git a/contrib/btree_gist/btree_utils_num.c b/contrib/btree_gist/btree_utils_num.c index 505633c..7c5b911 100644 --- a/contrib/btree_gist/btree_utils_num.c +++ b/contrib/btree_gist/btree_utils_num.c @@ -147,13 +147,9 @@ gbt_num_same(const GBT_NUMKEY *a, const GBT_NUMKEY *b, const gbtree_ninfo *tinfo b2.lower = &(((GBT_NUMKEY *) b)[0]); b2.upper = &(((GBT_NUMKEY *) b)[tinfo->size]); - if ( - (*tinfo->f_eq) (b1.lower, b2.lower) && - (*tinfo->f_eq) (b1.upper, b2.upper) - ) - return TRUE; - return FALSE; + return ((*tinfo->f_eq) (b1.lower, b2.lower) && + (*tinfo->f_eq) (b1.upper, b2.upper)); } diff --git a/contrib/btree_gist/btree_utils_var.c b/contrib/btree_gist/btree_utils_var.c index b7dd060..6ad3347 100644 --- a/contrib/btree_gist/btree_utils_var.c +++ b/contrib/btree_gist/btree_utils_var.c @@ -337,7 +337,6 @@ bool gbt_var_same(Datum d1, Datum d2, Oid collation, const gbtree_vinfo *tinfo) { - bool result; GBT_VARKEY *t1 = (GBT_VARKEY *) DatumGetPointer(d1); GBT_VARKEY *t2 = (GBT_VARKEY *) DatumGetPointer(d2); GBT_VARKEY_R r1, @@ -346,13 +345,8 @@ gbt_var_same(Datum d1, Datum d2, Oid collation, r1 = gbt_var_key_readable(t1); r2 = gbt_var_key_readable(t2); - if (t1 && t2) - result = ((*tinfo->f_cmp) (r1.lower, r2.lower, collation) == 0 && - (*tinfo->f_cmp) (r1.upper, r2.upper, collation) == 0); - else - result = (t1 == NULL && t2 == NULL); - - return result; + return ((*tinfo->f_cmp) (r1.lower, r2.lower, collation) == 0 && + (*tinfo->f_cmp) (r1.upper, r2.upper, collation) == 0); } diff --git a/doc/src/sgml/gist.sgml b/doc/src/sgml/gist.sgml index 5de282b..f9644b0 100644 --- a/doc/src/sgml/gist.sgml +++ b/doc/src/sgml/gist.sgml @@ -498,11 +498,6 @@ my_compress(PG_FUNCTION_ARGS) course. 
- -Depending on your needs, you could also need to care about -compressing NULL values in there, storing for example -(Datum) 0 like gist_circle_compress does. - diff --git a/src/backend/access/gist/gistproc.c b/src/backend/access/gist/gistproc.c index 4decaa6..1df6357 100644 --- a/src/backend/access/gist/gistproc.c +++ b/src/backend/access/gist/gistproc.c @@ -1039,25 +1039,15 @@ gist_poly_compress(PG_FUNCTION_ARGS) if (entry->leafkey) { - retval = palloc(sizeof(GISTENTRY)); - if (DatumGetPointer(entry->key) != NULL) - { - POLYGON*in = DatumGetPolygonP(entry->key); - BOX *r; + POLYGON*in = DatumGetPolygonP(entry->key); + BOX *r; - r = (BOX *) palloc(sizeof(BOX)); - memcpy((void *) r, (void *) &(in->boundbox), sizeof(BOX)); - gistentryinit(*retval, PointerGetDatum(r), - entry->rel, entry->page, - entry->offset, FALSE); - - } - else - { - gistentryinit(*retval, (Datum) 0, - entry->rel, entry->page, - entry->offset, FALSE); - } + retval = palloc(sizeof(GISTENTRY)); + r = (BOX *) palloc(sizeof(BOX)); + memcpy((void *) r, (void *) &(in->boundbox), sizeof(BOX)); + gistentryinit(*retval, PointerGetDatum(r), + entry->rel, entry->page, + entry->offset, FALSE); } else retval = entry; @@ -1113,28 +1103,18 @@ gist_circle_compress(PG_FUNCTION
Re: [HACKERS] basebackups during ALTER DATABASE ... SET TABLESPACE ... not safe?
On 2015-01-27 19:24:24 -0500, Robert Haas wrote: > On Tue, Jan 27, 2015 at 2:16 PM, Andres Freund wrote: > > That'd end up essentially being a re-emulation of locks. Don't find that > > all that pretty. But maybe we have to go there. > > The advantage of it is that it is simple. The only thing we're really > giving up is the deadlock detector, which I think isn't needed in this > case. I think it's more than just the deadlock detector. Consider pg_locks/pg_stat_activity.waiting, cancelling an acquisition, error cleanup and recursive acquisitions. Acquiring session locks + a surrounding transaction command deals with cleanups without introducing PG_ENSURE blocks in a couple of places. And we need recursive acquisition so a streaming base backup can acquire the lock over the whole runtime, while a manual pg_start_backup() does so only for its own time. Especially the visibility in the system views is something I'd not like to give up in additional blocks we introduce in the backbranches. > > Here's an alternative approach. I think it generally is superior and > > going in the right direction, but I'm not sure it's backpatchable. It's big, true. But a fair amount of it is stuff I think we have to do anyway. The current code acquiring session locks in dbase_redo() is borked - we either assert or segfault if there's a connection in the wrong moment because there's no deadlock handler. Plus it has races that aren't that hard to hit :(. To fix the crashes (but not the race) we can have a separate ResolveRecoveryConflictWithObjectLock that doesn't record the existence of the lock, but doesn't ever do a ProcSleep(). 
Not pretty either :( Seems like a situation with no nice solutions so far :( Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade and rsync
On 1/27/15 6:09 PM, Jim Nasby wrote: > The whole remote_dir discussion made me think of something... would > --link-dest be any help here? I'm pretty sure --link-dest would not be effective in this case. The problem exists on the source side and --link-dest only operates on the destination. -- - David Steele da...@pgmasters.net signature.asc Description: OpenPGP digital signature
Re: [HACKERS] Additional role attributes && superuser review
All, I have attached and updated patch for review. Thanks, Adam -- Adam Brightwell - adam.brightw...@crunchydatasolutions.com Database Engineer - www.crunchydatasolutions.com diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml new file mode 100644 index 62305d2..fd4b9ab *** a/doc/src/sgml/catalogs.sgml --- b/doc/src/sgml/catalogs.sgml *** *** 1445,1450 --- 1445,1486 + rolbypassrls + bool + Role can bypass row level security + + + + rolexclbackup + bool + Role can perform on-line exclusive backup operations + + + + rolxlogreplay + bool + Role can control xlog recovery replay operations + + + + rollogfile + bool + Role can rotate log files + + + + rolmonitor + bool + Role can view pg_stat_* details + + + + rolsignal + bool + Role can signal backend processes + + + rolconnlimit int4 diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml new file mode 100644 index d57243a..096c770 *** a/doc/src/sgml/func.sgml --- b/doc/src/sgml/func.sgml *** SELECT set_config('log_statement_stats', *** 16425,16438 pg_start_backup(label text , fast boolean ) pg_lsn !Prepare for performing on-line backup (restricted to superusers or replication roles) pg_stop_backup() pg_lsn !Finish performing on-line backup (restricted to superusers or replication roles) --- 16425,16438 pg_start_backup(label text , fast boolean ) pg_lsn !Prepare for performing on-line exclusive backup (restricted to superusers or replication roles) pg_stop_backup() pg_lsn !Finish performing on-line exclusive backup (restricted to superusers or replication roles) diff --git a/doc/src/sgml/ref/create_role.sgml b/doc/src/sgml/ref/create_role.sgml new file mode 100644 index ea26027..0fc3b42 *** a/doc/src/sgml/ref/create_role.sgml --- b/doc/src/sgml/ref/create_role.sgml *** CREATE ROLE connlimit | [ ENCRYPTED | UNENCRYPTED ] PASSWORD 'password' | VALID UNTIL 'timestamp' *** CREATE ROLE connlimit diff --git a/doc/src/sgml/ref/create_user.sgml b/doc/src/sgml/ref/create_user.sgml new file mode 100644 
index 065999c..f7f10c7 *** a/doc/src/sgml/ref/create_user.sgml --- b/doc/src/sgml/ref/create_user.sgml *** CREATE USER connlimit | [ ENCRYPTED | UNENCRYPTED ] PASSWORD 'password' | VALID UNTIL 'timestamp' diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c new file mode 100644 index 2179bf7..12b8a17 *** a/src/backend/access/transam/xlogfuncs.c --- b/src/backend/access/transam/xlogfuncs.c *** *** 27,32 --- 27,33 #include "miscadmin.h" #include "replication/walreceiver.h" #include "storage/smgr.h" + #include "utils/acl.h" #include "utils/builtins.h" #include "utils/numeric.h" #include "utils/guc.h" *** pg_start_backup(PG_FUNCTION_ARGS) *** 54,63 backupidstr = text_to_cstring(backupid); ! if (!superuser() && !has_rolreplication(GetUserId())) ereport(ERROR, (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), ! errmsg("must be superuser or replication role to run a backup"))); startpoint = do_pg_start_backup(backupidstr, fast, NULL, NULL); --- 55,65 backupidstr = text_to_cstring(backupid); ! if (!has_replication_privilege(GetUserId()) ! && !has_exclbackup_privilege(GetUserId())) ereport(ERROR, (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), ! errmsg("must be superuser, replication role or exclusive backup role to run a backup"))); startpoint = do_pg_start_backup(backupidstr, fast, NULL, NULL); *** pg_stop_backup(PG_FUNCTION_ARGS) *** 82,91 { XLogRecPtr stoppoint; ! if (!superuser() && !has_rolreplication(GetUserId())) ereport(ERROR, (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), ! (errmsg("must be superuser or replication role to run a backup"; stoppoint = do_pg_stop_backup(NULL, true, NULL); --- 84,94 { XLogRecPtr stoppoint; ! if (!has_replication_privilege(GetUserId()) ! && !has_exclbackup_privilege(GetUserId())) ereport(ERROR, (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), ! 
errmsg("must be superuser, replication role or exclusive backup role to run a backup"))); stoppoint = do_pg_stop_backup(NULL, true, NULL); *** pg_switch_xlog(PG_FUNCTION_ARGS) *** 100,109 { XLogRecPtr switchpoint; ! if (!superuser()) ereport(ERROR, (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), ! (errmsg("must be superuser to sw
Re: [HACKERS] [POC] FETCH limited by bytes.
Last year I was working on a patch to postgres_fdw where the fetch_size could be set at the table level and the server level. I was able to get the settings parsed and they would show up in pg_foreign_table and pg_foreign_servers. Unfortunately, I'm not very familiar with how foreign data wrappers work, so I wasn't able to figure out how to get these custom values passed from the PgFdwRelationInfo struct into the query's PgFdwScanState struct. I bring this up only because it might be a simpler solution, in that the table designer could set the fetch size very high for narrow tables, and lower or default for wider tables. It's also a very clean syntax, just another option on the table and/or server creation. My incomplete patch is attached. On Tue, Jan 27, 2015 at 4:24 AM, Kyotaro HORIGUCHI < horiguchi.kyot...@lab.ntt.co.jp> wrote: > Thank you for the comment. > > The automatic way to determin the fetch_size looks become too > much for the purpose. An example of non-automatic way is a new > foreign table option like 'fetch_size' but this exposes the > inside too much... Which do you think is preferable? > > Thu, 22 Jan 2015 11:17:52 -0500, Tom Lane wrote in < > 24503.1421943...@sss.pgh.pa.us> > > Kyotaro HORIGUCHI writes: > > > Hello, as the discuttion on async fetching on postgres_fdw, FETCH > > > with data-size limitation would be useful to get memory usage > > > stability of postgres_fdw. > > > > > Is such a feature and syntax could be allowed to be added? > > > > This seems like a lot of work, and frankly an incredibly ugly API, > > for a benefit that is entirely hypothetical. Have you got numbers > > showing any actual performance win for postgres_fdw? > > The API is a rush work to make the path for the new parameter > (but, yes, I did too much for the purpose that use from > postgres_fdw..) and it can be any saner syntax but it's not the > time to do so yet. 
> > The data-size limitation, any size to limit, would give > significant gain especially for small sized rows. > > This patch began from the fact that it runs about twice faster > when fetch size = 1 than 100. > > > http://www.postgresql.org/message-id/20150116.171849.109146500.horiguchi.kyot...@lab.ntt.co.jp > > I took exec times to get 1M rows from localhost via postgres_fdw > and it showed the following numbers. > > =# SELECT a from ft1; > fetch_size, avg row size(*1), time, alloced_mem/fetch(Mbytes)(*1) > (local)0.75s > 10060 6.2s 6000 (0.006) > 1 60 2.7s 60 (0.6 ) > 3 60 2.2s180 (2.0 ) > 6 60 2.4s360 (4.0 ) > > =# SELECT a, b, c from ft1; > fetch_size, avg row size(*1), time, alloced_mem/fetch(Mbytes)(*1) > (local)0.8s > 100 204 12 s 20400 (0.02 ) > 1000 204 10 s 204000 (0.2 ) > 1 204 5.8s204 (2) > 2 204 5.9s408 (4) > > =# SELECT a, b, d from ft1; > fetch_size, avg row size(*1), time, alloced_mem/fetch(Mbytes)(*1) > (local)0.8s > 100 1356 17 s 135600 (0.136) > 1000 1356 15 s1356000 (1.356) > 1475 1356 13 s2000100 (2.0 ) > 2950 1356 13 s4000200 (4.0 ) > > The definitions of the environment are the following. > > CREATE SERVER sv1 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host > 'localhost', dbname 'postgres'); > CREATE USER MAPPING FOR PUBLIC SERVER sv1; > CREATE TABLE lt1 (a int, b timestamp, c text, d text); > CREATE FOREIGN TABLE ft1 (a int, b timestamp, c text, d text) SERVER sv1 > OPTIONS (table_name 'lt1'); > INSERT INTO lt1 (SELECT a, now(), repeat('x', 128), repeat('x', 1280) FROM > generate_series(0, 99) a); > > The "avg row size" is alloced_mem/fetch_size and the alloced_mem > is the sum of HeapTuple[fetch_size] and (HEAPTUPLESIZE + > tup->t_len) for all stored tuples in the receiver side, > fetch_more_data() in postgres_fdw. > > They are about 50% gain for the smaller tuple size and 25% for > the larger. They looks to be optimal at where alloced_mem is > around 2MB by the reason unknown to me. Anyway the difference > seems to be significant. 
> > > Even if we wanted to do something like this, I strongly object to > > measuring size by heap_compute_data_size. That's not a number that users > > would normally have any direct knowledge of; nor does it have anything > > at all to do with the claimed use-case, where what you'd really need to > > measure is bytes transmitted down the wire. (The difference is not > small: > > for instance, toasted values would likely still be toasted at the point > > where you're measuring.) > > Sure. Finally, the attached patch #1 which does the following > things. > > - Sen
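The size-based limit under discussion amounts to choosing fetch_size from an average-row-size estimate so that per-fetch memory lands near the ~2MB sweet spot visible in the measurements above. A back-of-the-envelope sketch; the 2MB target and the clamp bounds are illustrative assumptions taken from those numbers, not tuned constants:

```python
# Per-fetch memory is roughly avg_row_size * fetch_size; the runs above
# were fastest when that product landed near 2MB.

TARGET_BYTES = 2 * 1024 * 1024   # observed sweet spot, not a tuned constant

def fetch_size_for(avg_row_size, target=TARGET_BYTES, lo=100, hi=100000):
    """Pick a fetch_size so per-fetch memory is close to the target.

    Clamped so tiny rows don't yield absurd batch sizes and wide rows
    still fetch a sane minimum.
    """
    return max(lo, min(hi, target // avg_row_size))

# Row sizes from the measurements above (bytes).
for row_size in (60, 204, 1356):
    n = fetch_size_for(row_size)
    print(row_size, n, n * row_size)   # per-fetch memory stays near 2MB
```

The computed sizes (roughly 35000, 10000 and 1500 rows) are close to the fetch sizes that measured fastest above, which is the point of replacing a fixed row count with a size-based limit.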
Re: [HACKERS] Parallel Seq Scan
On Tue, Jan 27, 2015 at 4:46 PM, Stephen Frost wrote:
>> With 0 workers, first run took 883465.352 ms, and second run took
>> 295050.106 ms.
>> With 8 workers, first run took 340302.250 ms, and second run took
>> 307767.758 ms.
>>
>> This is a confusing result, because you expect parallelism to help
>> more when the relation is partly cached, and make little or no
>> difference when it isn't cached. But that's not what happened.
>
> These numbers seem to indicate that the oddball is the single-threaded
> uncached run. If I followed correctly, the uncached 'dd' took 321s,
> which is relatively close to the uncached-lots-of-workers and the two
> cached runs. What in the world is the uncached single-thread case doing
> that it takes an extra 543s, or over twice as long? It's clearly not
> disk i/o which is causing the slowdown, based on your dd tests.

Yeah, I'm wondering if the disk just froze up on that run for a long
while, which has been known to occasionally happen on this machine,
because I can't reproduce that crappy number. I did the 0-worker test a
few more times, with the block-by-block method, dropping the caches and
restarting PostgreSQL each time, and got:

320000.968 ms
322873.325 ms
322967.722 ms
321759.273 ms

After that last run, I ran it a few more times without restarting
PostgreSQL or dropping the caches, and got:

257629.348 ms
289668.976 ms
290342.970 ms
258035.226 ms
284237.729 ms

Then I redid the 8-client test. Cold cache, I got 337312.554 ms. On the
rerun, 323423.813 ms. Third run, 324940.785 ms.

There is more variability than I would like here. Clearly, it goes a bit
faster when the cache is warm, but that's about all I can say with any
confidence.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Seq Scan
On Tue, Jan 27, 2015 at 6:00 PM, Robert Haas wrote: > Now, when you did what I understand to be the same test on the same > machine, you got times ranging from 9.1 seconds to 35.4 seconds. > Clearly, there is some difference between our test setups. Moreover, > I'm kind of suspicious about whether your results are actually > physically possible. Even in the best case where you somehow had the > maximum possible amount of data - 64 GB on a 64 GB machine - cached, > leaving no space for cache duplication between PG and the OS and no > space for the operating system or postgres itself - the table is 120 > GB, so you've got to read *at least* 56 GB from disk. Reading 56 GB > from disk in 9 seconds represents an I/O rate of >6 GB/s. I grant that > there could be some speedup from issuing I/O requests in parallel > instead of serially, but that is a 15x speedup over dd, so I am a > little suspicious that there is some problem with the test setup, > especially because I cannot reproduce the results. So I thought about this a little more, and I realized after some poking around that hydra's disk subsystem is actually six disks configured in a software RAID5[1]. So one advantage of the chunk-by-chunk approach you are proposing is that you might be able to get all of the disks chugging away at once, because the data is presumably striped across all of them. Reading one block at a time, you'll never have more than 1 or 2 disks going, but if you do sequential reads from a bunch of different places in the relation, you might manage to get all 6. So that's something to think about. One could imagine an algorithm like this: as long as there are more 1GB segments remaining than there are workers, each worker tries to chug through a separate 1GB segment. When there are not enough 1GB segments remaining for that to work, then they start ganging up on the same segments. 
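The assignment policy sketched above can be illustrated with a toy scheduler (plain Python, illustrative only; `next_segment` and its bookkeeping are invented names, not code from the patch under discussion):

```python
# Toy sketch of the idea above: while unclaimed 1GB segments remain, each
# idle worker takes its own; once every remaining segment is claimed,
# workers gang up on the least-crowded one so no one sits idle at the end.
def next_segment(remaining, claims):
    """Pick a segment for an idle worker.

    remaining -- unfinished segment numbers
    claims    -- dict: segment -> number of workers currently scanning it
    """
    unclaimed = [s for s in remaining if claims.get(s, 0) == 0]
    if unclaimed:
        return unclaimed[0]
    return min(remaining, key=lambda s: claims.get(s, 0))

# Three workers arrive with two segments left: the first two get their own
# segment, the third doubles up on one that is already being scanned.
claims = {}
for _ in range(3):
    seg = next_segment([7, 8], claims)
    claims[seg] = claims.get(seg, 0) + 1
print(claims)  # {7: 2, 8: 1}
```

While segments outnumber workers, every worker reads its own file and the I/O spreads across the RAID members; only at the tail do reads converge on shared segments.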
That way, you get the benefit of spreading out the I/O across multiple files (and thus hopefully multiple members of the RAID group) when the data is coming from disk, but you can still keep everyone busy until the end, which will be important when the data is all in-memory and you're just limited by CPU bandwidth. All that aside, I still can't account for the numbers you are seeing. When I run with your patch and what I think is your test case, I get different (slower) numbers. And even if we've got 6 drives cranking along at 400MB/s each, that's still only 2.4 GB/s, not >6 GB/s. So I'm still perplexed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company [1] Not my idea. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade and rsync
On Mon, Jan 26, 2015 at 05:41:59PM -0500, Stephen Frost wrote:
> I've thought about it a fair bit actually and I agree that there is some
> risk to using rsync for *incremental* base backups. That is, you have
> a setup where you loop with:
>
> pg_start_backup
> rsync -> dest
> pg_stop_backup
>
> without using -I, changing what 'dest' is, or making sure it's empty
> every time. The problem is the 1s-level granularity used on the
> timestamp. A possible set of operations, all within 1s, is:
>
> file changed
> rsync starts copying the file
> file changed again (somewhere prior to where rsync is at)
> rsync finishes the file copy
>
> Now, this isn't actually a problem for the first time that file is
> backed up- the issue is if that file isn't changed again. rsync won't
> re-copy it, but that change that rsync missed won't be in the WAL
> history for the *second* backup that's done (only the first), leading to
> a case where that file would end up corrupted.

Interesting problem, but doesn't rsync use sub-second accuracy?

--
  Bruce Momjian          http://momjian.us
  EnterpriseDB           http://enterprisedb.com

  + Everyone has their own god. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
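Bruce's question is answered downthread; for intuition, the race Stephen describes can be reduced to a toy model of rsync's quick check (illustrative Python only — real rsync compares mtime and size, unless --checksum is given):

```python
# A toy illustration (not rsync itself) of the race described above: if the
# receiver compares only (mtime, size) with 1-second granularity, a file
# changed twice within the same second -- once before and once after the
# copy -- looks "unchanged" on the next pass, so the second change is
# never re-copied.
def needs_copy(src_mtime, src_size, dst_mtime, dst_size):
    """rsync-style quick check: copy only if whole-second mtime or size differ."""
    return int(src_mtime) != int(dst_mtime) or src_size != dst_size

# t=10.2: file changed; rsync copies it, recording mtime within second 10
# t=10.8: file changed again, same size, still within second 10
src = (10.8, 100)   # source after the second change
dst = (10.2, 100)   # destination holds only the first change
print(needs_copy(*src, *dst))  # False: the second change is silently skipped
```

The window closes only if the file changes again in some later second, or if checksums are used instead of the mtime/size shortcut.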
Re: [HACKERS] pg_upgrade and rsync
On Tue, Jan 27, 2015 at 09:36:58AM -0500, Stephen Frost wrote:
> The example listed works, but only when it's a local rsync:
>
> rsync --archive --hard-links --size-only old_dir new_dir remote_dir
>
> Perhaps a better example (or additional one) would be with a remote
> rsync, including clarification of old and new dir, like so:
>
> (run in /var/lib/postgresql)
> rsync --archive --hard-links --size-only \
>       9.3/main \
>       9.4/main \
>       server:/var/lib/postgresql/
>
> Note that 9.3/main and 9.4/main are two source directories for rsync to
> copy over, while server:/var/lib/postgresql/ is a remote destination
> directory. The above directories match a default Debian/Ubuntu install.

OK, sorry everyone was confused by 'remote_dir'. Does this new patch help?

--
  Bruce Momjian          http://momjian.us
  EnterpriseDB           http://enterprisedb.com

  + Everyone has their own god. +

diff --git a/doc/src/sgml/pgupgrade.sgml b/doc/src/sgml/pgupgrade.sgml
new file mode 100644
index e1cd260..4e4fe64
*** a/doc/src/sgml/pgupgrade.sgml
--- b/doc/src/sgml/pgupgrade.sgml
*** pg_upgrade.exe
*** 409,414 ****
--- 409,486 ----

+ Upgrade any Log-Shipping Standby Servers
+
+ If you have Log-Shipping Standby Servers (), follow these steps to
+ upgrade them (before starting any servers):
+
+ Install the new PostgreSQL binaries on standby servers
+
+ Make sure the new binaries and support files are installed
+ on all the standby servers. Do not run
+ initdb. If initdb was run, delete
+ the standby server data directories. Also, install any custom
+ shared object files on the new standbys that you installed in the
+ new master cluster.
+
+ Run rsync
+
+ From a directory that is above the old and new database cluster
+ directories, run this for each slave:
+
+ rsync --archive --hard-links --size-only old_dir new_dir remote_dir
+
+ where old_dir and new_dir are relative
+ to the current directory, and remote_dir is
+ above the old and new cluster directories on
+ the standby server. The old and new relative cluster paths
+ must match on the master and standby server. Consult the
+ rsync manual page for details on specifying the
+ remote directory, e.g. slavehost:/var/lib/postgresql/.
+ rsync will be fast when --link mode is
+ used because it will create hard links on the remote server rather
+ than transfering user data.
+
+ Configure log-shipping to standby servers
+
+ Configure the servers for log shipping. (You do not need to run
+ pg_start_backup() and pg_stop_backup()
+ or take a file system backup as the slaves are still sychronized
+ with the master.)
+
+ Start the new server
+
+ The new server and any rsync'ed standby servers can
+ now be safely started.
+
+ Post-Upgrade processing

*** psql --username postgres --file script.s
*** 548,562 ****

- A Log-Shipping Standby Server () cannot
- be upgraded because the server must allow writes. The simplest way
- is to upgrade the primary and use rsync to rebuild the
- standbys. You can run rsync while the primary is down,
- or as part of a base backup ()
- which overwrites the old standby cluster.

  If you want to use link mode and you do not want your old cluster to
  be modified when the new cluster is started, make a copy of the old
  cluster and upgrade that in link mode. To make a valid copy
--- 620,625 ----

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade and rsync
On 1/27/15 9:32 PM, Bruce Momjian wrote:
>> Now, this isn't actually a problem for the first time that file is
>> backed up- the issue is if that file isn't changed again. rsync won't
>> re-copy it, but that change that rsync missed won't be in the WAL
>> history for the *second* backup that's done (only the first), leading to
>> a case where that file would end up corrupted.
>
> Interesting problem, but doesn't rsync use sub-second accuracy?

According to my empirical testing on Linux and OSX the answer is no:
rsync does not use sub-second accuracy. This seems to be true even on
file systems like ext4 that support millisecond mod times; at least it
was true on Ubuntu 12.04 running ext4.

Even on my laptop there is a full half-second of vulnerability for
rsync. Faster systems may have a larger window.

--
- David Steele
da...@pgmasters.net
Re: [HACKERS] pg_upgrade and rsync
On Tue, Jan 27, 2015 at 09:44:51PM -0500, David Steele wrote:
> According to my empirical testing on Linux and OSX the answer is no:
> rsync does not use sub-second accuracy. This seems to be true even on
> file systems like ext4 that support millisecond mod times, at least it
> was true on Ubuntu 12.04 running ext4.
>
> Even on my laptop there is a full half-second of vulnerability for
> rsync. Faster systems may have a larger window.

OK, bummer. Well, I don't think we ever recommend running rsync without
checksums, but the big problem is that rsync doesn't do checksums by
default. :-(

pg_upgrade recommends using two rsyncs:

   To make a valid copy of the old cluster, use rsync to create
   a dirty copy of the old cluster while the server is running, then shut
   down the old server and run rsync again to update the copy
   with any changes to make it consistent. You might want to exclude some

I am afraid that will not work as it could miss changes, right? When
would the default mod-time checking ever be safe?

--
  Bruce Momjian          http://momjian.us
  EnterpriseDB           http://enterprisedb.com

  + Everyone has their own god. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade and rsync
On 1/27/15 9:51 PM, Bruce Momjian wrote:
>> According to my empirical testing on Linux and OSX the answer is no:
>> rsync does not use sub-second accuracy. This seems to be true even on
>> file systems like ext4 that support millisecond mod times, at least it
>> was true on Ubuntu 12.04 running ext4.
>>
>> Even on my laptop there is a full half-second of vulnerability for
>> rsync. Faster systems may have a larger window.
>
> OK, bummer. Well, I don't think we ever recommend to run rsync without
> checksums, but the big problem is that rsync doesn't do checksums by
> default. :-(
>
> pg_upgrade recommends using two rsyncs:
>
>    To make a valid copy of the old cluster, use rsync to create
>    a dirty copy of the old cluster while the server is running, then shut
>    down the old server and run rsync again to update the copy
>    with any changes to make it consistent. You might want to exclude some
>
> I am afraid that will not work as it could miss changes, right? When
> would the default mod-time checking every be safe?

According to my testing, the default mod-time checking is never completely
safe in rsync. I've worked around this in PgBackRest by building the
manifest and then waiting until the start of the next second before
starting to copy. It was the only way I could make the incremental backups
reliable without requiring checksums (which are optional, as in rsync, for
performance). Of course, you still have to trust the clock for this to
work.

This is definitely an edge case. Not only does the file have to be
modified in the same second *after* rsync has done the copy, but the file
also has to not be modified in *any other subsequent second* before the
next incremental backup. If the file is busy enough to have a collision
with rsync in that second, then it is very likely to be modified before
the next incremental backup, which is generally a day or so later. And, of
course, the backup where the issue occurs is fine - it's the next backup
that is invalid.
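The wait-until-the-next-second workaround described here can be sketched as follows (illustrative Python; the function names are made up, this is not pgBackRest code):

```python
# Sketch of the workaround described above: after building the backup
# manifest, sleep until the wall clock enters the next whole second before
# copying. Any file modified after that point gets a strictly later
# whole-second mtime than anything recorded in the manifest, closing the
# same-second window without checksums (assuming a trustworthy clock).
import time

def seconds_until_next_second(now):
    """How long to sleep so the clock crosses into the next whole second."""
    return 1.0 - (now % 1.0)

def wait_for_next_second():
    time.sleep(seconds_until_next_second(time.time()))

# e.g. manifest finished at t = ...830.25 -> sleep 0.75s before copying
print(seconds_until_next_second(1422409830.25))  # 0.75
```

The cost is at most one second of wall time per backup, which is cheap compared with checksumming every file.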
However, the hot/cold backup scheme as documented does make the race
condition more likely, since the two backups are done in close proximity
temporally.

Ultimately, the most reliable method is to use checksums. For me the
biggest issue is that there is no way to discover whether a db is
consistent, no matter how much time/resources you are willing to spend. I
could live with the idea of the occasional bad backup (since I keep as
many as possible), but having no way to know whether it is good or not is
very frustrating.

I know data checksums are a step in that direction, but they are a long
way from providing the optimal solution. I've implemented rigorous
checksums in PgBackRest, but something closer to the source would be even
better.

--
- David Steele
da...@pgmasters.net
Re: [HACKERS] a fast bloat measurement tool (was Re: Measuring relation free space)
At 2015-01-27 17:00:27 -0600, jim.na...@bluetreble.com wrote: > > It would be best to get this into a commit fest so it's not lost. It's there already. -- Abhijit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: row_to_array function
On 28.1.2015 0:25, "Jim Nasby" wrote:
>
> On 1/27/15 12:58 PM, Pavel Stehule wrote:
>>
>> postgres=# select array(select unnest('{{{1,2},{3,4}},{{5,6},{7,8}}}'::int[]));
>>        array
>> -------------------
>>  {1,2,3,4,5,6,7,8}
>> (1 row)
>>
>> so it generates the pairs {1,2},{3,4},{5,6},{7,8}
>>
>> I fixed the situation when the array does not have enough elements.
>>
>> More tests, simple doc
>
> Hrm, this wasn't what I was expecting:
>
> + select foreach_test_ab(array[1,2,3,4]);
> + NOTICE: a: 1, b: 2
> + NOTICE: a: 3, b: 4
>
> I was expecting that foreach a,b array would be expecting something in
> the array to have a dimension of 2. :(

Your expectation is inconsistent with the current implementation of
FOREACH. It doesn't produce an array when SLICING is missing, and it
doesn't take dimensions into account.

I would not change this rule. It is not ambiguous and it allows working
with 1d, 2d, and 3d arrays. You can process Andrew's format well, and my
proposed format (a 2d array) well too.

The behavior can be different when SLICING is used; there we iterate
exactly by dimensions. Shall we design the behavior for that case?

> I think this is bad, because this:
>
> foreach_test_ab('{{1,2,3},{4,5,6}}'::int[]);
>
> will give you 1,2; 3,4; 5,6. I don't see any way that that makes sense.
> Even if it did make sense, I'm more concerned that adding this will
> seriously paint us into a corner when it comes to the (to me) more
> rational case of returning {1,2,3},{4,5,6}.
>
> I think we need to think some more about this, at least to make sure
> we're not painting ourselves into a corner for more appropriate array
> iteration.
>
> --
> Jim Nasby, Data Architect, Blue Treble Consulting
> Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] jsonb, unicode escapes and escaped backslashes
On Tue, Jan 27, 2015 at 03:56:22PM -0500, Tom Lane wrote:
> Andrew Dunstan writes:
> > On 01/27/2015 02:28 PM, Tom Lane wrote:
> >> Well, we can either fix it now or suffer with a broken representation
> >> forever. I'm not wedded to the exact solution I described, but I think
> >> we'll regret it if we don't change the representation.
>
> So at this point I propose that we reject \u0000 when de-escaping JSON.

I would have agreed on 2014-12-09, and this release is the last chance to
make such a change. It is a bold wager that could pay off, but -1 from me
anyway. I can already envision the blog post from the DBA staying on
9.4.0 because 9.4.1 pulled his ability to store U+0000 in jsonb. jsonb
was *the* top-billed 9.4 feature, and this thread started with Andrew
conveying a field report of a scenario more obscure than storing U+0000.
Therefore, we have to assume many users will notice the change. This move
would also add to the growing evidence that our .0 releases are really
beta(N+1) releases in disguise.

> Anybody who's seriously unhappy with that can propose a patch to fix it
> properly in 9.5 or later.

Someone can still do that by introducing a V2 of the jsonb binary format
and preserving the ability to read both formats. (Too bad Andres's
proposal to include a format version didn't inform the final format, but
we can wing it.) I agree that storing U+0000 as 0x00 is the best end
state.

> We probably need to rethink the re-escaping behavior as well; I'm not
> sure if your latest patch is the right answer for that.

Yes, we do. No change to the representation of U+0000 is going to fix the
following bug, but that patch does fix it:

[local] test=# select
test-# $$"\\u05e2"$$::jsonb = $$"\\u05e2"$$::jsonb,
test-# $$"\\u05e2"$$::jsonb = $$"\\u05e2"$$::jsonb::text::jsonb;
 ?column? | ?column?
----------+----------
 t        | f
(1 row)

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
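For intuition on the two kinds of values involved here (outside PostgreSQL): a JSON Unicode escape and an escaped backslash followed by literal text decode to different strings, which is exactly the distinction the round-trip above loses. A quick illustration with Python's json module:

```python
# "\u05e2" is a Unicode escape for the single character U+05E2, while
# "\\u05e2" is an escaped backslash followed by the five literal
# characters "u05e2". A re-escaping step that conflates the two makes
# distinct values compare equal (or equal values compare unequal).
import json

unicode_escape = json.loads(r'"\u05e2"')    # one character: U+05E2
escaped_bslash = json.loads(r'"\\u05e2"')   # six characters: \ u 0 5 e 2

print(len(unicode_escape), len(escaped_bslash))  # 1 6
print(unicode_escape == escaped_bslash)          # False: distinct values
```

Any storage format that keeps strings in escaped form has to preserve this distinction when re-serializing, which is the re-escaping behavior Tom says needs rethinking.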
Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission
On Wed, Jan 28, 2015 at 9:47 AM, Jim Nasby wrote:
> On 1/27/15 1:04 AM, Haribabu Kommi wrote:
>>
>> Here I attached the latest version of the patch.
>> I will add this patch to the next commitfest.
>
> Apologies if this was covered, but why isn't the IP address an inet
> instead of text?

Corrected the address type to inet instead of text. The updated patch is
attached.

> Also, what happens if someone reloads the config in the middle of
> running the SRF?

hba entries are reloaded only in the postmaster process, not in every
backend, so there shouldn't be any problem with a config file reload. Am
I missing something?

Regards,
Hari Babu
Fujitsu Australia

Catalog_view_to_HBA_settings_patch_V4.patch
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Seq Scan
On 01/28/2015 04:16 AM, Robert Haas wrote:
> On Tue, Jan 27, 2015 at 6:00 PM, Robert Haas wrote:
> [...]
>
> One could imagine an algorithm like this: as long as there are more
> 1GB segments remaining than there are workers, each worker tries to
> chug through a separate 1GB segment. When there are not enough 1GB
> segments remaining for that to work, then they start ganging up on the
> same segments. That way, you get the benefit of spreading out the I/O
> across multiple files (and thus hopefully multiple members of the RAID
> group) when the data is coming from disk, but you can still keep
> everyone busy until the end, which will be important when the data is
> all in-memory and you're just limited by CPU bandwidth.

OTOH, spreading the I/O across multiple files is not a good thing, if
you don't have a RAID setup like that. With a single spindle, you'll
just induce more seeks.

Perhaps the OS is smart enough to read in large-enough chunks that the
occasional seek doesn't hurt much. But then again, why isn't the OS
smart enough to read in large-enough chunks to take advantage of the
RAID even when you read just a single file?

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: searching in array function - array_position
2015-01-28 0:01 GMT+01:00 Jim Nasby:
> On 1/27/15 4:36 AM, Pavel Stehule wrote:
>> 2015-01-26 23:29 GMT+01:00 Jim Nasby:
>>
>>     On 1/26/15 4:17 PM, Pavel Stehule wrote:
>>
>>         Any way to reduce the code duplication between the array and
>>         non-array versions? Maybe factor out the operator caching code?
>>
>>     I thought about it - but there are different checks, different
>>     result processing, and a different result type.
>>
>>     I didn't find any readable and reduced form :(
>>
>> Yeah, that's why I was thinking specifically of the operator caching
>> code... isn't that identical? That would at least remove a dozen
>> lines...
>>
>> It is only partially identical - I would like to use a cache for
>> array_offset, but it is not necessary for array_offsets .. it depends
>> on how we would modify the current API to support externally cached
>> data.
>
> Externally cached data?

Some of these functions have their own caches to minimize access to the
typcache (array_map and array_cmp are examples). In the first case I am
trying to pass this information via fn_extra; in the second case I don't,
because I don't expect repeated calls (and I expect the type cache will
be enough). I plan to run some benchmarks to determine whether we have to
do this, or whether we can use the type cache only.

Pavel

> --
> Jim Nasby, Data Architect, Blue Treble Consulting
> Data in Trouble? Get it in Treble! http://BlueTreble.com
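For readers following the thread, the semantics of the two functions under discussion can be sketched in a few lines (illustrative Python; the real proposal is a C implementation with operator lookup cached via fn_extra, and SQL NULL handling is omitted here):

```python
# Sketch of the proposed semantics: array_offset returns the 1-based
# subscript of the first element equal to the target (None when there is
# no match); array_offsets returns all matching subscripts.
def array_offset(arr, target):
    for i, v in enumerate(arr, start=1):   # SQL arrays are 1-based by default
        if v == target:
            return i                       # first match only
    return None

def array_offsets(arr, target):
    return [i for i, v in enumerate(arr, start=1) if v == target]

print(array_offset(['a', 'b', 'c', 'b'], 'b'))   # 2
print(array_offsets(['a', 'b', 'c', 'b'], 'b'))  # [2, 4]
```

The caching question above only matters for the first function: array_offset may be called once per row in a query, so memoizing the equality-operator lookup pays off, whereas array_offsets does all its comparisons in a single call.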
Re: [HACKERS] proposal: row_to_array function
2015-01-28 0:16 GMT+01:00 Jim Nasby:
> On 1/27/15 2:26 PM, Pavel Stehule wrote:
>> here is an initial version of the row_to_array function - it transforms
>> any row to an array in the format proposed by Andrew.
>
> Please start a new thread for this... does it depend on the key-value
> patch?

Partially - the selected format should be well supported by the FOREACH
statement.

Regards

Pavel

> --
> Jim Nasby, Data Architect, Blue Treble Consulting
> Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] TABLESAMPLE patch
Hi, I took a look at this and it looks nice.

By the way, the parameter for REPEATABLE seems to be allowed to be an
expression in ParseTableSample, but the grammar rejects it. The following
change seems enough.

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 4578b5e..8cf09d5 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -10590,7 +10590,7 @@ tablesample_clause:
 		;
 
 opt_repeatable_clause:
-			REPEATABLE '(' AexprConst ')'	{ $$ = (Node *) $3; }
+			REPEATABLE '(' a_expr ')'		{ $$ = (Node *) $3; }
 			| /*EMPTY*/						{ $$ = NULL; }
 		;

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: row_to_array function
2015-01-28 6:49 GMT+01:00 Pavel Stehule:
> [...]
>
> It is inconsist (your expectation) with current implementation of
> FOREACH. It doesnt produce a array when SLICING is missing. And it
> doesnt calculate with dimensions.
>
> I would not to change this rule. It is not ambigonuous and it allows to
> work with 1d, 2d, 3d dimensions array. You can process Andrew format
> well and my proposed format (2d array) well too.

one small example

CREATE OR REPLACE FUNCTION iterate_over_pairs(text[])
RETURNS void AS $$
DECLARE
  v1 text; v2 text;
  e text;
  i int := 0;
BEGIN
  FOREACH e IN ARRAY $1
  LOOP
    IF i % 2 = 0 THEN
      v1 := e;
    ELSE
      v2 := e;
      RAISE NOTICE 'v1: %, v2: %', v1, v2;
    END IF;
    i := i + 1;
  END LOOP;
END;
$$ LANGUAGE plpgsql;

postgres=# SELECT iterate_over_pairs(ARRAY[1,2,3,4]::text[]);
NOTICE:  v1: 1, v2: 2
NOTICE:  v1: 3, v2: 4
 iterate_over_pairs
--------------------

(1 row)

postgres=# SELECT iterate_over_pairs(ARRAY[[1,2],[3,4]]::text[]);
NOTICE:  v1: 1, v2: 2
NOTICE:  v1: 3, v2: 4
 iterate_over_pairs
--------------------

(1 row)

I can use iterate_over_pairs for 1D or 2D arrays equally well -- FOREACH
was designed in this direction: without SLICE, the dimensions of the data
are unimportant. The discussed enhancement of FOREACH is a faster and
shorter (more readable) form of the iterate_over_pairs use case:

FOREACH v1, v2 IN ARRAY $1
LOOP
  ..
END LOOP;

It is consistent with the current design. You can look at the patch -- at
this moment, SLICE > 0 is disallowed for the situation where the target
variable is a ROW and the source is not a ROW.

Regards

Pavel

> There can be different behavior when SLICING is used. There we can
> iterate exactly by dimensions. Shall we design the behavior for that
> case?
>
>> I think this is bad, because this:
>>
>> foreach_test_ab('{{1,2,3},{4,5,6}}'::int[]);
>>
>> will give you 1,2; 3,4; 5,6. I don't see any way that that makes sense.
>> Even if it did make sense, I'm more concerned that adding this will
>> seriously paint us into a corner when it comes to the (to me) more
>> rational case of returning {1,2,3},{4,5,6}.
>>
>> I think we need to think some more about this, at least to make sure
>> we're not painting ourselves into a corner for more appropriate array
>> iteration.
>>
>> --
>> Jim Nasby, Data Architect, Blue Treble Consulting
>> Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] proposal: plpgsql - Assert statement
2015-01-28 0:13 GMT+01:00 Jim Nasby:
> On 1/27/15 1:30 PM, Pavel Stehule wrote:
>>
>>     I don't see the separate warning as being helpful. I'd just do
>>     something like
>>
>>     + (err_hint != NULL) ? errhint("%s", err_hint) :
>>       errhint("Message attached to failed assertion is null") ));
>>
>> done
>>
>>     There should also be a test case for a NULL message.
>>
>> it is there, if I understand well
>
> I see it now. Looks good.

Please, can you enhance the documentation part? I am sorry, my English
skills are not good enough.

Thank you

Pavel

> --
> Jim Nasby, Data Architect, Blue Treble Consulting
> Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] Parallel Seq Scan
On Wed, Jan 28, 2015 at 12:38 PM, Heikki Linnakangas < hlinnakan...@vmware.com> wrote: > > On 01/28/2015 04:16 AM, Robert Haas wrote: >> >> On Tue, Jan 27, 2015 at 6:00 PM, Robert Haas wrote: >>> >>> Now, when you did what I understand to be the same test on the same >>> machine, you got times ranging from 9.1 seconds to 35.4 seconds. >>> Clearly, there is some difference between our test setups. Moreover, >>> I'm kind of suspicious about whether your results are actually >>> physically possible. Even in the best case where you somehow had the >>> maximum possible amount of data - 64 GB on a 64 GB machine - cached, >>> leaving no space for cache duplication between PG and the OS and no >>> space for the operating system or postgres itself - the table is 120 >>> GB, so you've got to read *at least* 56 GB from disk. Reading 56 GB >>> from disk in 9 seconds represents an I/O rate of >6 GB/s. I grant that >>> there could be some speedup from issuing I/O requests in parallel >>> instead of serially, but that is a 15x speedup over dd, so I am a >>> little suspicious that there is some problem with the test setup, >>> especially because I cannot reproduce the results. >> >> >> So I thought about this a little more, and I realized after some >> poking around that hydra's disk subsystem is actually six disks >> configured in a software RAID5[1]. So one advantage of the >> chunk-by-chunk approach you are proposing is that you might be able to >> get all of the disks chugging away at once, because the data is >> presumably striped across all of them. Reading one block at a time, >> you'll never have more than 1 or 2 disks going, but if you do >> sequential reads from a bunch of different places in the relation, you >> might manage to get all 6. So that's something to think about. >> >> One could imagine an algorithm like this: as long as there are more >> 1GB segments remaining than there are workers, each worker tries to >> chug through a separate 1GB segment. 
>> When there are not enough 1GB segments remaining for that to work,
>> then they start ganging up on the same segments. That way, you get
>> the benefit of spreading out the I/O across multiple files (and thus
>> hopefully multiple members of the RAID group) when the data is coming
>> from disk, but you can still keep everyone busy until the end, which
>> will be important when the data is all in-memory and you're just
>> limited by CPU bandwidth.
>
> OTOH, spreading the I/O across multiple files is not a good thing, if
> you don't have a RAID setup like that. With a single spindle, you'll
> just induce more seeks.

Yeah, if that happens then there is less chance that the user will get
any major benefit from parallel sequential scan, unless the qualification
or other expressions used in the statement are costly. So one option is
for the user to configure the parallel sequential scan parameters so that
a parallel scan is performed only when it can be beneficial (something
like increasing parallel_tuple_comm_cost, or perhaps another parameter),
or simply not to use parallel sequential scan (parallel_seqscan_degree =
0).

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Hot Standby WAL reply uses heavyweight session locks, but doesn't have enough infrastructure set up
Andres Freund wrote:
> I think this isn't particularly pretty, but it seems to be working well
> enough, and changing it would be pretty invasive. So keeping in line
> with all that code seems to be easier.

OK, I'm convinced by this part: remove the call of
LockSharedObjectForSession that uses dontWait and replace it with a loop
in ResolveRecoveryConflictWithDatabase.

> Note that InRecovery doesn't mean what you probably think it means:
> [stuff]
> bool InRecovery = false;

Yes, right. I had confused it with RecoveryInProgress().

> The assertion actually should be even stricter:
> Assert(!InRecovery || (sessionLock && dontWait));
> i.e. we never acquire a heavyweight lock in the startup process unless
> it's a session lock (as we don't have resource managers/a xact to track
> locks) and we don't wait (as we don't have the deadlock detector
> infrastructure set up).

No problems with this assertion here.

--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers