Re: [HACKERS] Parallel Seq Scan
On Sat, Jan 24, 2015 at 12:24 AM, Joshua D. Drake wrote:
>
> On 01/23/2015 10:44 AM, Jim Nasby wrote:
>> number of workers especially at slightly higher worker count.
>>
>> Those fixed chunk numbers look pretty screwy. 2, 4 and 8 workers make no
>> difference, then suddenly 16 cuts times by 1/2 to 1/3? Then 32 cuts time
>> by another 1/2 to 1/3?

There is variation in the tests at different worker counts, but there is definitely an improvement from 0 to 2 workers (if you refer to my initial mail on this data, with 2 workers there is a benefit of ~20%). I think if we run the tests in a similar way (comparing 0 and 2, 0 and 4, or 0 and 8 workers), the other effects can be minimised and we might see better consistency; the general trend, however, seems to be that scanning with fixed chunks is better.

I think the real benefit of the current approach/patch can be seen with qualifications (especially costly expression evaluation). Further, if we want just the benefit of parallel I/O, I think we can get that by parallelising a partition scan where different table partitions reside on different disk partitions; however, that is a matter for a separate patch.

> cached? First couple of runs gets the relations into memory?

Not entirely: as the table size is double the size of RAM, each run has to perform I/O.

With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
[HACKERS] Unsafe coding in ReorderBufferCommit()
There are at least two bugs in reorderbuffer.c's ReorderBufferCommit():

1. Although "iterstate" is modified within the PG_TRY segment and referenced within the PG_CATCH segment, it is not marked "volatile". This means that its value upon reaching the PG_CATCH segment is indeterminate. In practice, what can happen is that it gets set back to its value at the time of reaching PG_TRY, which will always be NULL, so that the effect would be to miss out calling ReorderBufferIterTXNFinish in the CATCH code.

2. On the other hand, if we get past the ReorderBufferIterTXNFinish call within the PG_TRY block and then suffer a failure, ReorderBufferIterTXNFinish will be called again in the PG_CATCH block. This is due to failure to reset iterstate to NULL after the finish call. (So this error could be masked if the compiler did cause iterstate to revert to NULL during longjmp.)

I'm not sure whether #1 is harmless, but #2 most certainly isn't, as it would result in access to already-freed memory.

The first of these was pointed out to me by Mark Wilding of Salesforce. It's really pretty distressing that modern versions of gcc don't warn about this (not even with -Wclobbered). The very ancient gcc on "gaur" does warn, but my experience is that it emits a lot of false positives too, so I'm not that eager anymore to plaster "volatile" on any variable it whinges about. Still, it sure looks like we need a "volatile" here.

regards, tom lane

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On Thu, Jan 22, 2015 at 1:50 PM, Merlin Moncure wrote: > Quick update: not done yet, but I'm making consistent progress, with > several false starts. (for example, I had a .conf problem with the > new dynamic shared memory setting and git merrily bisected down to the > introduction of the feature.). Thanks. Looking forward to your findings. Ideally, we'd be able to get a fix in for 9.4.1. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Perl coding error in msvc build system?
On 23.01.2015 09:17, Abhijit Menon-Sen wrote:
> At 2014-06-03 22:30:50 -0400, pete...@gmx.net wrote:
>> I'm not sure whether the following coding actually detects any errors:
>>
>> Solution.pm: open(P, "cl /? 2>&1 |") || die "cl command not found";
>
> Since nobody with a Windows system has commented, I'm just writing to say
> that from a Perl perspective, I agree with your analysis and the patch
> looks perfectly sensible.

I can confirm it on my Windows system. When calling build from a console without nmake in the path, I always get:

Unable to determine Visual Studio version: The nmake version could not be determined. at src/tools/msvc/Mkvcbuild.pm line 63.

This means that the following construct in VSObjectFactory.pm doesn't have the desired effect:

open(P, "nmake /? 2>&1 |") || croak "Unable to determine Visual Studio version: The nmake command wasn't found.";

On the other hand, complexity is in the eye of the beholder. Perl constructs like the following get quite a few wtf's (http://www.osnews.com/story/19266/WTFs_m) from a simple-minded person like me:

$? >> 8 == 0 or die "cl command not found";

However, as it fixes a confirmed problem, and as maintenance of Perl code is an issue of its own, please go ahead.

Regards, Brar

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Abbreviated keys for Datum tuplesort (was: Re: B-Tree support function number 3 (strxfrm() optimization))
[pruning the Cc: list and starting a new thread]

Here's the cleaned-up version of the patch to allow abbreviated keys when sorting a single Datum. This also removes comments that suggest that the caller of tuplesort_begin_datum should ever have to care about the abbreviated key optimization.

I'll add this to the CF.

--
Andrew (irc:RhodiumToad)

diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 8079d97..08088ea 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -363,10 +363,6 @@ initialize_aggregates(AggState *aggstate,
 	 * We use a plain Datum sorter when there's a single input column;
 	 * otherwise sort the full tuple.  (See comments for
 	 * process_ordered_aggregate_single.)
-	 *
-	 * In the future, we should consider forcing the
-	 * tuplesort_begin_heap() case when the abbreviated key
-	 * optimization can thereby be used, even when numInputs is 1.
 	 */
 	peraggstate->sortstate =
 		(peraggstate->numInputs == 1) ?
diff --git a/src/backend/utils/adt/orderedsetaggs.c b/src/backend/utils/adt/orderedsetaggs.c
index f9a5f7f..39ed85b 100644
--- a/src/backend/utils/adt/orderedsetaggs.c
+++ b/src/backend/utils/adt/orderedsetaggs.c
@@ -268,10 +268,6 @@ ordered_set_startup(FunctionCallInfo fcinfo, bool use_tuples)
 	/*
 	 * Initialize tuplesort object.
-	 *
-	 * In the future, we should consider forcing the tuplesort_begin_heap()
-	 * case when the abbreviated key optimization can thereby be used, even
-	 * when !use_tuples.
 	 */
 	if (use_tuples)
 		osastate->sortstate = tuplesort_begin_heap(qstate->tupdesc,
diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index b6e302f..1f506bf 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -147,13 +147,18 @@ bool optimize_bounded_sort = true;
  * case where the first key determines the comparison result.  Note that
  * for a pass-by-reference datatype, datum1 points into the "tuple" storage.
  *
+ * There is one special case: when the sort support infrastructure provides an
+ * "abbreviated key" representation, where the key is (typically) a pass by
+ * value proxy for a pass by reference type.  In this case, the abbreviated key
+ * is stored in datum1 in place of the actual first key column.
+ *
  * When sorting single Datums, the data value is represented directly by
- * datum1/isnull1.  If the datatype is pass-by-reference and isnull1 is false,
- * then datum1 points to a separately palloc'd data value that is also pointed
- * to by the "tuple" pointer; otherwise "tuple" is NULL.  There is one special
- * case: when the sort support infrastructure provides an "abbreviated key"
- * representation, where the key is (typically) a pass by value proxy for a
- * pass by reference type.
+ * datum1/isnull1 for pass by value types (or null values).  If the datatype is
+ * pass-by-reference and isnull1 is false, then "tuple" points to a separately
+ * palloc'd data value, otherwise "tuple" is NULL.  The value of datum1 is then
+ * either the same pointer as "tuple", or is an abbreviated key value as
+ * described above.  Accordingly, "tuple" is always used in preference to
+ * datum1 as the authoritative value for pass-by-reference cases.
  *
  * While building initial runs, tupindex holds the tuple's run number. During
  * merge passes, we re-use it to hold the input tape number that each tuple in
@@ -901,30 +906,36 @@ tuplesort_begin_datum(Oid datumType, Oid sortOperator, Oid sortCollation,
 	state->copytup = copytup_datum;
 	state->writetup = writetup_datum;
 	state->readtup = readtup_datum;
+	state->abbrevNext = 10;

 	state->datumType = datumType;

-	/* Prepare SortSupport data */
-	state->onlyKey = (SortSupport) palloc0(sizeof(SortSupportData));
-
-	state->onlyKey->ssup_cxt = CurrentMemoryContext;
-	state->onlyKey->ssup_collation = sortCollation;
-	state->onlyKey->ssup_nulls_first = nullsFirstFlag;
-
-	/*
-	 * Conversion to abbreviated representation infeasible in the Datum case.
-	 * It must be possible to subsequently fetch original datum values within
-	 * tuplesort_getdatum(), which would require special-case preservation of
-	 * original values.
-	 */
-	state->onlyKey->abbreviate = false;
-
-	PrepareSortSupportFromOrderingOp(sortOperator, state->onlyKey);
-
 	/* lookup necessary attributes of the datum type */
 	get_typlenbyval(datumType, &typlen, &typbyval);
 	state->datumTypeLen = typlen;
 	state->datumTypeByVal = typbyval;

+	/* Prepare SortSupport data */
+	state->sortKeys = (SortSupport) palloc0(sizeof(SortSupportData));
+
+	state->sortKeys->ssup_cxt = CurrentMemoryContext;
+	state->sortKeys->ssup_collation = sortCollation;
+	state->sortKeys->ssup_nulls_first = nullsFirstFlag;
+
+	/* abbreviation is possible here only for by-reference types */
+	state->sortKeys->abbreviate = !typbyval;
+
+	PrepareSortSupportFromOrderingOp(sortOperator, state->sortKeys);
+
+	/*
+	 * The "onlyKey" optimiza
Re: [HACKERS] REVIEW: Track TRUNCATE via pgstat
Here's v0.5. (Why did you use that weird decimal versioning scheme? You could just say "v4" and save a couple of keystrokes.)

This patch makes perfect sense to me now. I was ready to commit, but I checked the regression test you added and noticed that you're only reading results for the last set of operations, because they all use the same table and so each new set clobbers the values for the previous one. So I modified them to use one table for each set, and report the counters for all tables. In doing this I noticed that the one for trunc_stats_test3 is at odds with what your comment in the .sql file says; would you review it please? Thanks. (I didn't update the expected file.)

BTW you forgot to update expected/prepared_xact_1.out, for the case when prep xacts are disabled.

If some other committer decides to give this a go, please remember to bump catversion before pushing.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 66d5083..b2993b8 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -71,6 +71,7 @@
 #include "parser/parse_type.h"
 #include "parser/parse_utilcmd.h"
 #include "parser/parser.h"
+#include "pgstat.h"
 #include "rewrite/rewriteDefine.h"
 #include "rewrite/rewriteHandler.h"
 #include "rewrite/rewriteManip.h"
@@ -1220,6 +1221,8 @@ ExecuteTruncate(TruncateStmt *stmt)
 			 */
 			reindex_relation(heap_relid, REINDEX_REL_PROCESS_TOAST);
 		}
+
+		pgstat_count_truncate(rel);
 	}

 /*
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 827ad71..afe1da2 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -197,8 +197,12 @@ typedef struct TwoPhasePgStatRecord
 	PgStat_Counter tuples_inserted;	/* tuples inserted in xact */
 	PgStat_Counter tuples_updated;	/* tuples updated in xact */
 	PgStat_Counter tuples_deleted;	/* tuples deleted in xact */
+	PgStat_Counter inserted_pre_trunc;	/* tuples inserted prior to truncate */
+	PgStat_Counter updated_pre_trunc;	/* tuples updated prior to truncate */
+	PgStat_Counter deleted_pre_trunc;	/* tuples deleted prior to truncate */
 	Oid			t_id;			/* table's OID */
 	bool		t_shared;		/* is it a shared catalog? */
+	bool		t_truncated;	/* was the relation truncated? */
 } TwoPhasePgStatRecord;

 /*
@@ -1859,6 +1863,59 @@ pgstat_count_heap_delete(Relation rel)
 }

 /*
+ * pgstat_table_record_truncate - remember i/u/d counts accumulated so far
+ */
+static void
+pgstat_table_record_truncate(PgStat_TableXactStatus *trans)
+{
+	if (!trans->truncated)
+	{
+		trans->inserted_pre_trunc = trans->tuples_inserted;
+		trans->updated_pre_trunc = trans->tuples_updated;
+		trans->deleted_pre_trunc = trans->tuples_deleted;
+	}
+}
+
+/*
+ * pgstat_table_restore_truncate - restore counters when a truncate aborts
+ */
+static void
+pgstat_table_restore_truncate(PgStat_TableXactStatus *trans)
+{
+	if (trans->truncated)
+	{
+		trans->tuples_inserted = trans->inserted_pre_trunc;
+		trans->tuples_updated = trans->updated_pre_trunc;
+		trans->tuples_deleted = trans->deleted_pre_trunc;
+	}
+}
+
+/*
+ * pgstat_count_truncate - update tuple counters due to truncate
+ */
+void
+pgstat_count_truncate(Relation rel)
+{
+	PgStat_TableStatus *pgstat_info = rel->pgstat_info;
+
+	if (pgstat_info != NULL)
+	{
+		/* We have to log the effect at the proper transactional level */
+		int			nest_level = GetCurrentTransactionNestLevel();
+
+		if (pgstat_info->trans == NULL ||
+			pgstat_info->trans->nest_level != nest_level)
+			add_tabstat_xact_level(pgstat_info, nest_level);
+
+		pgstat_table_record_truncate(pgstat_info->trans);
+		pgstat_info->trans->truncated = true;
+		pgstat_info->trans->tuples_inserted = 0;
+		pgstat_info->trans->tuples_updated = 0;
+		pgstat_info->trans->tuples_deleted = 0;
+	}
+}
+
+/*
  * pgstat_update_heap_dead_tuples - update dead-tuples count
  *
  * The semantics of this are that we are reporting the nontransactional
@@ -1916,12 +1973,22 @@ AtEOXact_PgStat(bool isCommit)
 			Assert(trans->upper == NULL);
 			tabstat = trans->parent;
 			Assert(tabstat->trans == trans);
+			/* restore pre-truncate stats (if any) in case of aborted xact */
+			if (!isCommit)
+				pgstat_table_restore_truncate(trans);
 			/* count attempted actions regardless of commit/abort */
 			tabstat->t_counts.t_tuples_inserted += trans->tuples_inserted;
 			tabstat->t_counts.t_tuples_updated += trans->tuples_updated;
 			tabstat->t_counts.t_tuples_deleted += trans->tuples_deleted;
 			if (isCommit)
 			{
+				tabstat->t_counts.t_truncated = trans->truncated;
+				if (trans->truncated)
+				{
+					/* forget live/dead stats seen by backend thus far */
+					tabstat->t_counts.t_delta_live_tuples = 0;
+					tabstat->t_counts.t_delta_dead_tuples = 0;
+				}
 				/* insert adds a live tuple, delete removes one */
 				tabstat
Re: [HACKERS] pg_upgrade and rsync
On Fri, Jan 23, 2015 at 02:34:36PM -0500, Stephen Frost wrote:
> > Why? Just rsync the new data directory onto the old directory on the
> > standbys. That's fine and simple.
>
> That still doesn't address the need to use --size-only, it would just
> mean that you don't need to use -H. If anything the -H part is the
> aspect which worries me the least about this approach.

It took me a while to understand what Stephen was saying, so let me explain the details so everyone can get on the same page.

First, let's look at the downsides of using non-hardlink rsync against a slave cluster, whether we run pg_upgrade on the slave or not:

o  must preserve db directory and relfilenodes (4 new things for pg_upgrade to preserve)
o  must checksum files, because there is no way to distinguish user tables/indexes (which don't need to be copied) from system tables/indexes (which must be copied so they are in sync with the master)
o  must use wal_log_hints when the slave is installed so the checksums match

So, even if all the checksums work, it will be slow/expensive.

Stephen's idea is quite interesting. You run pg_upgrade on the master, then, before you start the new server, you run rsync with special flags and sync the old _and_ new clusters on the master with just the old cluster on the standby (yeah, odd).

Yes, this is very odd, and where I got lost too. First, this only works when pg_upgrade is run in --link mode. What rsync --hard-links --size-only is going to do is record which files have hard links, remember their inode numbers, and cross-reference the hard-linked files. When doing the rsync remote comparisons, the master's old relfilenodes will match the standby's old relfilenodes, and because we are using --size-only, they will be considered identical and not copied, or even checksummed.
When it goes to do the standby's new cluster, none of the directories will exist, so they will all be created and the system objects copied (they are small), but the user tables/indexes will be identified as already existing in the standby's old cluster, so rsync will hard-link them to the standby's old cluster files. Once rsync is complete, you can delete the old cluster on the master and the standby. This effectively duplicates the way pg_upgrade works.

What is interesting is that this will work with any version of pg_upgrade, with no modifications, as long as link mode is used. You _cannot_ run initdb on the standby, as this would create system files that would prevent the master's system files from being copied. This is also going to remove your recovery.conf on the standby, and replace your postgresql.conf with the master's, so any modifications you made to the standby will have to be saved and restored into the new cluster before starting.

I plan to run some tests soon to verify this method works, and if so, I can document it in the pg_upgrade manual page.

-- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
* Jim Nasby (jim.na...@bluetreble.com) wrote:
> On 1/23/15 12:16 PM, Stephen Frost wrote:
> > Just to clarify- this concept isn't actually mine but was suggested by a
> > pretty sizable PG user who has a great deal of familiarity with other
> > databases. I don't mean to try and invoke the 'silent majority' but
> > rather to make sure folks don't think this is my idea alone or that it's
> > only me who thinks it makes sense. :) Simon had weighed in earlier
> > with, iirc, a comment that he thought it was a good approach also,
> > though that was a while ago and things have changed.
>
> I know there's definitely demand for auditing. I'd love to see us support it.
>
> > I happen to like the idea specifically because it would allow regular
> > roles to change the auditing settings (no need to be a superuser or to
> > be able to modify postgresql.conf/postgresql.auto.conf)
>
> Is there really a use case for non-superusers to be able to change auditing
> config? That seems like a bad idea.

What's a bad idea is having every auditor on the system running around as superuser.

> Also, was there a solution to how to configure auditing on specific objects
> with a role-based mechanism? I think we really do need something akin to
> role:action:object tuples, and I don't see how to do that with roles alone.

That is supported with the grant-based proposal.

> BTW, I'm starting to feel like this needs a wiki page to get the design
> pulled together.

I agree with that, and was planning to offer help with the documentation and building of such a wiki with examples, etc, once the implementation was far enough along to demonstrate that the design will actually work.

Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
On 1/23/15 12:16 PM, Stephen Frost wrote:
> Just to clarify- this concept isn't actually mine but was suggested by a
> pretty sizable PG user who has a great deal of familiarity with other
> databases. I don't mean to try and invoke the 'silent majority' but
> rather to make sure folks don't think this is my idea alone or that it's
> only me who thinks it makes sense. :) Simon had weighed in earlier with,
> iirc, a comment that he thought it was a good approach also, though that
> was a while ago and things have changed.

I know there's definitely demand for auditing. I'd love to see us support it.

> I happen to like the idea specifically because it would allow regular
> roles to change the auditing settings (no need to be a superuser or to
> be able to modify postgresql.conf/postgresql.auto.conf)

Is there really a use case for non-superusers to be able to change auditing config? That seems like a bad idea.

Also, was there a solution to how to configure auditing on specific objects with a role-based mechanism? I think we really do need something akin to role:action:object tuples, and I don't see how to do that with roles alone.

BTW, I'm starting to feel like this needs a wiki page to get the design pulled together.

-- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Perl coding error in msvc build system?
On 01/23/2015 03:00 PM, Alvaro Herrera wrote: Why is the "last;" still there? AFAICS it matters in the old coding because of the while, but I don't think there is an equivalent looping construct in the new code. You're right, I missed that. It shouldn't be there, and will produce an error like Can't "last" outside a loop block cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Minor issues with code comments related to abbreviated keys
On Fri, Jan 23, 2015 at 2:59 PM, Peter Geoghegan wrote: > On Thu, Jan 22, 2015 at 5:17 PM, Peter Geoghegan wrote: >> Attached patch fixes minor issues in code comments that relate to >> abbreviated keys. > > There should also be a description of hyperLogLog in the new README > file within ./src/backend/lib/ > > I suggest: > > hyperloglog.c - a streaming cardinality estimator Committed with that addition. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Perl coding error in msvc build system?
Why is the "last;" still there? AFAICS it matters in the old coding because of the while, but I don't think there is an equivalent looping construct in the new code. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Minor issues with code comments related to abbreviated keys
On Thu, Jan 22, 2015 at 5:17 PM, Peter Geoghegan wrote: > Attached patch fixes minor issues in code comments that relate to > abbreviated keys. There should also be a description of hyperLogLog in the new README file within ./src/backend/lib/ I suggest: hyperloglog.c - a streaming cardinality estimator -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade and rsync
* Andres Freund (and...@2ndquadrant.com) wrote:
> On 2015-01-23 14:27:51 -0500, Stephen Frost wrote:
> > * Andres Freund (and...@2ndquadrant.com) wrote:
> > > On 2015-01-23 14:05:10 -0500, Stephen Frost wrote:
> > > > If I follow what you're suggesting, pg_upgrade would
> > > > need a new 'in-place' mode that removes all of the catalog tables from
> > > > the old cluster and puts the new catalog tables into place and leaves
> > > > everything else alone.
> > >
> > > No. Except that it'd preserve the relfilenodes (i.e. the filenames of
> > > relations) it'd work exactly the same as today. The standby is simply
> > > updated by rsyncing the new data directory of the primary to the
> > > standby.
> >
> > You'd have to replace the existing data directory on the master to do
> > that, which pg_upgrade was designed specifically to not do, in case
> > things went poorly.
>
> Why? Just rsync the new data directory onto the old directory on the
> standbys. That's fine and simple.

That still doesn't address the need to use --size-only; it would just mean that you don't need to use -H. If anything, the -H part is the aspect which worries me the least about this approach.

Thanks, Stephen signature.asc Description: Digital signature
[HACKERS] src/port/gethostname.c sure looks like dead code
AFAICS there are no provisions to ever build this file, on any platform, and have not been since 8.0. If we did build it, it would likely fail on a large subset of platforms, because it refers to a constant SYS_NMLN that isn't too portable. I propose we just remove it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade and rsync
On 2015-01-23 14:27:51 -0500, Stephen Frost wrote: > * Andres Freund (and...@2ndquadrant.com) wrote: > > On 2015-01-23 14:05:10 -0500, Stephen Frost wrote: > > > If I follow what you're suggesting, pg_upgrade would > > > need a new 'in-place' mode that removes all of the catalog tables from > > > the old cluster and puts the new catalog tables into place and leaves > > > everything else alone. > > > > No. Except that it'd preserve the relfilenodes (i.e. the filenames of > > relations) it'd work exactly the same as today. The standby is simply > > updated by rsyncing the new data directory of the primary to the > > standby. > > You'd have to replace the existing data directory on the master to do > that, which pg_upgrade was designed specifically to not do, in case > things went poorly. Why? Just rsync the new data directory onto the old directory on the standbys. That's fine and simple. > You'd still have to deal with the tablespace directories being renamed > also, since we include the major version and catalog build in the > directory name.. True. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade and rsync
* Andres Freund (and...@2ndquadrant.com) wrote:
> On 2015-01-23 14:05:10 -0500, Stephen Frost wrote:
> > * Andres Freund (and...@2ndquadrant.com) wrote:
> > > On 2015-01-23 13:52:54 -0500, Stephen Frost wrote:
> > > > That wouldn't actually help with what Bruce is trying to do, which
> > > > is to duplicate the results of the pg_upgrade from the master over to
> > > > the standby.
> > >
> > > Well, it'd pretty much obviate the need to run pg_upgrade on the
> > > standby. As there's no renamed files you don't need to muck around with
> > > leaving hardlinks in place and such just so that rsync recognizes
> > > unchanged files.
> >
> > Uh, pg_upgrade always either creates a hard link tree or copies
> > everything over.
>
> Yes. The problem is that the filenames after pg_upgrade aren't the same
> as before. Which means that a simple rsync call won't be able to save
> anything because the standby's filenames differ. What you can do is
> rsync both cluster directories (i.e. the old and the post pg_upgrade
> ones) and use rsync -H, right? Without transferring both -H won't detect
> the hardlinks as they need to be in the synced set. That's pretty
> cumbersome/complicated, and far from cheap.

The filenames don't need to be the same for rsync -H to work. You specifically do *not* want to independently rsync the old and new clusters- you need to run a single rsync (and one for each tablespace) with -H, and then it'll realize that the old cluster on both systems is identical, will just recreate the hard links, and will copy only the completely new files (the catalog tables).

> > If I follow what you're suggesting, pg_upgrade would
> > need a new 'in-place' mode that removes all of the catalog tables from
> > the old cluster and puts the new catalog tables into place and leaves
> > everything else alone.
>
> No. Except that it'd preserve the relfilenodes (i.e. the filenames of
> relations) it'd work exactly the same as today. The standby is simply
> updated by rsyncing the new data directory of the primary to the
> standby.

You'd have to replace the existing data directory on the master to do that, which pg_upgrade was designed specifically not to do, in case things went poorly. You'd still have to deal with the tablespace directories being renamed, since we include the major version and catalog build in the directory name.

This whole process really isn't all that complicated in the end:

  my_data_dir/old_cluster
  my_data_dir/new_cluster

  pg_upgrade
  rsync -H --size-only my_data_dir/ standby:/path/to/my_data_dir
  start the clusters
  remove the old cluster on the master and standby

Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On Thu, Jan 22, 2015 at 1:50 PM, Merlin Moncure wrote: > > So far, the 'nasty' damage seems to generally if not always follow a > checksum failure and the checksum failures are always numerically > adjacent. For example: > > [cds2 12707 2015-01-22 12:51:11.032 CST 2754]WARNING: page > verification failed, calculated checksum 9465 but expected 9477 at > character 20 > [cds2 21202 2015-01-22 13:10:18.172 CST 3196]WARNING: page > verification failed, calculated checksum 61889 but expected 61903 at > character 20 > [cds2 29153 2015-01-22 14:49:04.831 CST 4803]WARNING: page > verification failed, calculated checksum 27311 but expected 27316 > > I'm not up on the intricacies of our checksum algorithm but this is > making me suspicious that we are looking at a improperly flipped > visibility bit via some obscure problem -- almost certainly with > vacuum playing a role. That very much sounds like the block is getting duplicated from one place to another. Even flipping one hint bit (aren't these index pages? Do they have hint bits) should thoroughly scramble the checksum. Because the checksum adds in the block number after the scrambling has been done, copying a page to another nearby location will just move the (expected) checksum a little bit. Cheers, Jeff
Re: [HACKERS] pg_upgrade and rsync
On 2015-01-23 14:05:10 -0500, Stephen Frost wrote:
> * Andres Freund (and...@2ndquadrant.com) wrote:
> > On 2015-01-23 13:52:54 -0500, Stephen Frost wrote:
> > > That wouldn't actually help with what Bruce is trying to do, which
> > > is to duplicate the results of the pg_upgrade from the master over to
> > > the standby.
> >
> > Well, it'd pretty much obviate the need to run pg_upgrade on the
> > standby. As there's no renamed files you don't need to muck around with
> > leaving hardlinks in place and such just so that rsync recognizes
> > unchanged files.
>
> Uh, pg_upgrade always either creates a hard link tree or copies
> everything over.

Yes. The problem is that the filenames after pg_upgrade aren't the same as before, which means that a simple rsync call won't be able to save anything, because the standby's filenames differ. What you can do is rsync both cluster directories (i.e. the old and the post-pg_upgrade ones) and use rsync -H, right? Without transferring both, -H won't detect the hardlinks, as they need to be in the synced set. That's pretty cumbersome/complicated, and far from cheap.

> If I follow what you're suggesting, pg_upgrade would
> need a new 'in-place' mode that removes all of the catalog tables from
> the old cluster and puts the new catalog tables into place and leaves
> everything else alone.

No. Except that it'd preserve the relfilenodes (i.e. the filenames of relations), it'd work exactly the same as today. The standby is simply updated by rsyncing the new data directory of the primary to the standby.

Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] basebackups during ALTER DATABASE ... SET TABLESPACE ... not safe?
On 2015-01-23 13:01:34 -0600, Jim Nasby wrote: > On 1/23/15 9:10 AM, Andres Freund wrote: > >On 2015-01-22 22:58:17 +0100, Andres Freund wrote: > >>On 2015-01-22 16:38:49 -0500, Stephen Frost wrote: > >>>I'm trying to figure out why you'd do '2' in master when in discussion > >>>of '1' you say "I also don't think ALTER DATABASE is even intentionally > >>>run at the time of a base backup." I agree with that sentiment and am > >>>inclined to say that '1' is good enough throughout. > >> > >>Because the way it currently works is a major crock. It's more luck than > >>anything that it actually somewhat works. We normally rely on WAL to > >>bring us into a consistent state. But around CREATE/MOVE/DROP DATABASE > >>we've ignored that. > > > >>And. Hm. The difficulty of the current method is evidenced by the fact > >>that so far nobody recognized that 1) as described above isn't actually > >>safe. It fails to protect against basebackups on a standby as its > >>XLogCtl state will obviously not be visible on the master. > > > >Further evidenced by the fact that the current method isn't > >crash/shutdown safe at all. If a standby was shut down/crashed/was > >started on a base backup when a CREATE DATABASE from the primary is > >replayed the template database used can be in a nearly arbitrarily bad > >state. It'll later get fixed up by recovery - but those changes won't > >make it to the copied database. > > I think we all agree that ADAT can't run while a base backup is > happening, which I believe is what you're describing above. No, that's not what I'm describing in the last paragraph. The problem is present without any AD ST. When a cluster crashes it's likely not in a consistent state - we need to replay from the last checkpoint's REDO pointer to the minimum recovery location to make it so. The problem though is that the record which copies the database (both for CREATE DB/SET TABLESPACE) can be replayed well before the database is in a consistent state. 
In that case the new copy will be in a corrupted state as it'll not be fixed up by recovery, in contrast to the original, which will. > Perhaps there isn't really an issue here; I suspect ADAT is very > rarely used. What I'm worried about though is that someone is using it > regularly for some reason, with non-trivial databases, and this is > going to completely hose them. I think if somebody does this regularly on nontrivially sized databases, over replication, they'd have hit this bug a long time ago. It's really not that hard to reproduce. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade and rsync
* Andres Freund (and...@2ndquadrant.com) wrote: > On 2015-01-23 13:52:54 -0500, Stephen Frost wrote: > > That wouldn't actually help with what Bruce is trying to do, which > > is to duplicate the results of the pg_upgrade from the master over to > > the standby. > > Well, it'd pretty much obviate the need to run pg_upgrade on the > standby. As there's no renamed files you don't need to muck around with > leaving hardlinks in place and such just so that rsync recognizes > unchanged files. Uh, pg_upgrade always either creates a hard link tree or copies everything over. If I follow what you're suggesting, pg_upgrade would need a new 'in-place' mode that removes all of the catalog tables from the old cluster and puts the new catalog tables into place and leaves everything else alone. I don't really think I'd want to go there either.. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] basebackups during ALTER DATABASE ... SET TABLESPACE ... not safe?
On 2015-01-23 12:52:03 -0600, Jim Nasby wrote: > On 1/22/15 3:18 PM, Andres Freund wrote: > >>, but to then put its entire contents into WAL? Blech. > >Besides actually having a chance of being correct, doing so will save > >having to do two checkpoints inside movedb(). I think it's pretty likely > >that that actually saves overall IO, even including the WAL > >writes. Especially if there's other databases in the cluster at the same > >time. > > If you're moving a small amount of data, maybe. If you're moving > several hundred GB or more? You're going to flood WAL and probably > cause all replication/archiving to lag. I have a hard time believing many people do AD ST on a database a couple hundred GB in size. That'd mean the database is out of business for a significant amount of time (remember, there can't be any connected users during that time). I think realistically it's only used on smaller databases. The reason createdb()/movedb() work the way they do is imo simply because they're old. So far I can't see how the current solution can actually be made safe to be executed on a not yet consistent database. And we obviously can't wait for it to be consistent (as we're replaying a linear stream of WAL). > Is there some way we can add an option to control this? I'm thinking > that by default ADAT will error if a backup is running, but allow the > user to over-ride that. Why would we want to allow that? There's simply no way it's safe, so ...? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] basebackups during ALTER DATABASE ... SET TABLESPACE ... not safe?
On 1/23/15 9:10 AM, Andres Freund wrote: On 2015-01-22 22:58:17 +0100, Andres Freund wrote: On 2015-01-22 16:38:49 -0500, Stephen Frost wrote: I'm trying to figure out why you'd do '2' in master when in discussion of '1' you say "I also don't think ALTER DATABASE is even intentionally run at the time of a base backup." I agree with that sentiment and am inclined to say that '1' is good enough throughout. Because the way it currently works is a major crock. It's more luck than anything that it actually somewhat works. We normally rely on WAL to bring us into a consistent state. But around CREATE/MOVE/DROP DATABASE we've ignored that. And. Hm. The difficulty of the current method is evidenced by the fact that so far nobody recognized that 1) as described above isn't actually safe. It fails to protect against basebackups on a standby as its XLogCtl state will obviously not be visible on the master. Further evidenced by the fact that the current method isn't crash/shutdown safe at all. If a standby was shut down/crashed/was started on a base backup when a CREATE DATABASE from the primary is replayed the template database used can be in a nearly arbitrarily bad state. It'll later get fixed up by recovery - but those changes won't make it to the copied database. I think we all agree that ADAT can't run while a base backup is happening, which I believe is what you're describing above. We'd have to somehow cover that same scenario on replicas too. Perhaps there isn't really an issue here; I suspect ADAT is very rarely used. What I'm worried about though is that someone is using it regularly for some reason, with non-trivial databases, and this is going to completely hose them. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade and rsync
On 2015-01-23 13:52:54 -0500, Stephen Frost wrote: > * Andres Freund (and...@2ndquadrant.com) wrote: > > On 2015-01-22 20:54:47 -0500, Stephen Frost wrote: > > > * Bruce Momjian (br...@momjian.us) wrote: > > > > On Fri, Jan 23, 2015 at 01:19:33AM +0100, Andres Freund wrote: > > > > > Or do you - as the text edited in your patch, but not the quote above > > > > > - > > > > > mean to run pg_upgrade just on the primary and then rsync? > > > > > > > > No, I was going to run it on both, then rsync. > > > > > > I'm pretty sure this is all a lot easier than you believe it to be. If > > > you want to recreate what pg_upgrade does to a cluster then the simplest > > > thing to do is rsync before removing any of the hard links. rsync will > > > simply recreate the same hard link tree that pg_upgrade created when it > > > ran, and update files which were actually changed (the catalog tables). > > > > I don't understand why that'd be better than simply fixing (yes, that's > > imo the correct term) pg_upgrade to retain relfilenodes across the > > upgrade. Afaics there's no conflict risk and it'd make the clusters much > > more similar, which would be good; independent of rsyncing standbys. > > That's an entirely orthogonal discussion from the original one though, > no? Don't think so. > That wouldn't actually help with what Bruce is trying to do, which > is to duplicate the results of the pg_upgrade from the master over to > the standby. Well, it'd pretty much obviate the need to run pg_upgrade on the standby. As there's no renamed files you don't need to muck around with leaving hardlinks in place and such just so that rsync recognizes unchanged files. > Trying to pg_upgrade both the master and the standby, to me at least, > seems like an even *worse* approach than trusting rsync with -H and > --size-only.. I think running pg_upgrade on the standby is a dangerous folly. 
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Seq Scan
On 01/23/2015 10:44 AM, Jim Nasby wrote: number of workers especially at slightly higher worker count. Those fixed chunk numbers look pretty screwy. 2, 4 and 8 workers make no difference, then suddenly 16 cuts times by 1/2 to 1/3? Then 32 cuts time by another 1/2 to 1/3? cached? First couple of runs gets the relations into memory? JD -- Command Prompt, Inc. - http://www.commandprompt.com/ 503-667-4564 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, @cmdpromptinc "If we send our children to Caesar for their education, we should not be surprised when they come back as Romans." -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade and rsync
* Andres Freund (and...@2ndquadrant.com) wrote: > On 2015-01-22 20:54:47 -0500, Stephen Frost wrote: > > * Bruce Momjian (br...@momjian.us) wrote: > > > On Fri, Jan 23, 2015 at 01:19:33AM +0100, Andres Freund wrote: > > > > Or do you - as the text edited in your patch, but not the quote above - > > > > mean to run pg_upgrade just on the primary and then rsync? > > > > > > No, I was going to run it on both, then rsync. > > > > I'm pretty sure this is all a lot easier than you believe it to be. If > > you want to recreate what pg_upgrade does to a cluster then the simplest > > thing to do is rsync before removing any of the hard links. rsync will > > simply recreate the same hard link tree that pg_upgrade created when it > > ran, and update files which were actually changed (the catalog tables). > > I don't understand why that'd be better than simply fixing (yes, that's > imo the correct term) pg_upgrade to retain relfilenodes across the > upgrade. Afaics there's no conflict risk and it'd make the clusters much > more similar, which would be good; independent of rsyncing standbys. That's an entirely orthogonal discussion from the original one though, no? That wouldn't actually help with what Bruce is trying to do, which is to duplicate the results of the pg_upgrade from the master over to the standby. Even if the relfilenodes were the same across the upgrade, I don't think it'd be a good idea to run pg_upgrade on the standby and hope the results match closely enough to the master that you can trust updates to the catalog tables on the standby from the master going forward to work.. Trying to pg_upgrade both the master and the standby, to me at least, seems like an even *worse* approach than trusting rsync with -H and --size-only.. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] basebackups during ALTER DATABASE ... SET TABLESPACE ... not safe?
On 1/22/15 3:18 PM, Andres Freund wrote: , but to then put its entire contents into WAL? Blech. Besides actually having a chance of being correct, doing so will save having to do two checkpoints inside movedb(). I think it's pretty likely that that actually saves overall IO, even including the WAL writes. Especially if there's other databases in the cluster at the same time. If you're moving a small amount of data, maybe. If you're moving several hundred GB or more? You're going to flood WAL and probably cause all replication/archiving to lag. Obviously, we need to be reliable, but I think a lot of users would much rather that they can't muck with tablespaces while a backup is running than have ADAT suddenly consume way more resources than before. Is there some way we can add an option to control this? I'm thinking that by default ADAT will error if a backup is running, but allow the user to over-ride that. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade and rsync
On 2015-01-22 20:54:47 -0500, Stephen Frost wrote: > * Bruce Momjian (br...@momjian.us) wrote: > > On Fri, Jan 23, 2015 at 01:19:33AM +0100, Andres Freund wrote: > > > Or do you - as the text edited in your patch, but not the quote above - > > > mean to run pg_upgrade just on the primary and then rsync? > > > > No, I was going to run it on both, then rsync. > > I'm pretty sure this is all a lot easier than you believe it to be. If > you want to recreate what pg_upgrade does to a cluster then the simplest > thing to do is rsync before removing any of the hard links. rsync will > simply recreate the same hard link tree that pg_upgrade created when it > ran, and update files which were actually changed (the catalog tables). I don't understand why that'd be better than simply fixing (yes, that's imo the correct term) pg_upgrade to retain relfilenodes across the upgrade. Afaics there's no conflict risk and it'd make the clusters much more similar, which would be good; independent of rsyncing standbys. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Seq Scan
On 1/23/15 5:42 AM, Amit Kapila wrote:

Fixed-Chunks, No. of workers / Time (ms):

Workers:      0       2       4       8      16      24      32
Run-1    250536  266279  251263  234347   87930   50474   35474
Run-2    249587  230628  225648  193340   83036   35140    9100
Run-3    234963  220671  230002  256183  105382   62493   27903
Run-4    239111  245448  224057  189196  123780   63794   24746
Run-5    239937  222820  219025  220478  114007   77965   39766

The trend remains the same although there is some variation. In the block-by-block approach, performance dips (execution takes more time) with more workers, though it stabilizes at some higher value; still, I feel it is random as it leads to a random scan. In the fixed-chunk approach, performance improves with more workers, especially at slightly higher worker counts.

Those fixed chunk numbers look pretty screwy. 2, 4 and 8 workers make no difference, then suddenly 16 cuts times by 1/2 to 1/3? Then 32 cuts time by another 1/2 to 1/3? -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade and rsync
* Jim Nasby (jim.na...@bluetreble.com) wrote: > On 1/22/15 7:54 PM, Stephen Frost wrote: > >* Bruce Momjian (br...@momjian.us) wrote: > >>>On Fri, Jan 23, 2015 at 01:19:33AM +0100, Andres Freund wrote: > >Or do you - as the text edited in your patch, but not the quote above - > >mean to run pg_upgrade just on the primary and then rsync? > >>> > >>>No, I was going to run it on both, then rsync. > >I'm pretty sure this is all a lot easier than you believe it to be. If > >you want to recreate what pg_upgrade does to a cluster then the simplest > >thing to do is rsync before removing any of the hard links. rsync will > >simply recreate the same hard link tree that pg_upgrade created when it > >ran, and update files which were actually changed (the catalog tables). > > > >The problem, as mentioned elsewhere, is that you have to checksum all > >the files because the timestamps will differ. You can actually get > >around that with rsync if you really want though- tell it to only look > >at file sizes instead of size+time by passing in --size-only. > > What if instead of trying to handle that on the rsync side, we changed > pg_upgrade so that it created hardlinks that had the same timestamp as the > original file? So, two things, I chatted w/ Bruce and he was less concerned about the lack of being able to match up the timestamps than I was. He has a point though- the catalog tables are going to get copied anyway since they won't be hard links and checking that all the other files match in size and that both the master and the standby are at the same xlog position should give you a pretty good feeling that everything matches up sufficiently. Second, I don't follow what you mean by having pg_upgrade change the hardlinks to have the same timestamp- for starters, the timestamp is in the inode and not the actual hard link (two files hard linked together won't have different timestamps..) and second, the problem isn't on the master side- it's on the standby side. 
The standby's files will have timestamps different from the master and there really isn't much to be done about that. > That said, the whole timestamp race condition in rsync gives me the > heebie-jeebies. For normal workloads maybe it's not that big a deal, but when > dealing with fixed-size data (ie: Postgres blocks)? Eww. The race condition is a problem for pg_start/stop_backup and friends. In this instance, everything will be shut down when the rsync is running, so there isn't a timestamp race condition to worry about. > How horribly difficult would it be to allow pg_upgrade to operate on multiple > servers? Could we have it create a shell script instead of directly modifying > things itself? Or perhaps some custom "command file" that could then be > replayed by pg_upgrade on another server? Of course, that's assuming that > replicas are compatible enough with masters for that to work... Yeah, I had suggested that to Bruce also, but it's not clear why that would be any different from an rsync --size-only in the end, presuming everything went according to plan. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] pg_upgrade and rsync
On 1/22/15 7:54 PM, Stephen Frost wrote: * Bruce Momjian (br...@momjian.us) wrote: >On Fri, Jan 23, 2015 at 01:19:33AM +0100, Andres Freund wrote: > >Or do you - as the text edited in your patch, but not the quote above - > >mean to run pg_upgrade just on the primary and then rsync? > >No, I was going to run it on both, then rsync. I'm pretty sure this is all a lot easier than you believe it to be. If you want to recreate what pg_upgrade does to a cluster then the simplest thing to do is rsync before removing any of the hard links. rsync will simply recreate the same hard link tree that pg_upgrade created when it ran, and update files which were actually changed (the catalog tables). The problem, as mentioned elsewhere, is that you have to checksum all the files because the timestamps will differ. You can actually get around that with rsync if you really want though- tell it to only look at file sizes instead of size+time by passing in --size-only. What if instead of trying to handle that on the rsync side, we changed pg_upgrade so that it created hardlinks that had the same timestamp as the original file? That said, the whole timestamp race condition in rsync gives me the heebie-jeebies. For normal workloads maybe it's not that big a deal, but when dealing with fixed-size data (ie: Postgres blocks)? Eww. How horribly difficult would it be to allow pg_upgrade to operate on multiple servers? Could we have it create a shell script instead of directly modifying things itself? Or perhaps some custom "command file" that could then be replayed by pg_upgrade on another server? Of course, that's assuming that replicas are compatible enough with masters for that to work... -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
Alvaro Herrera wrote: > I'm tweaking your v24 a bit more now, thanks -- main change is to make > vacuum_one_database be called only to run one analyze stage, so it never > iterates for each stage; callers must iterate calling it multiple times > in those cases. (There's only one callsite that needs changing anyway.) I made some more changes, particularly so that the TAP test pass (we were missing the semicolon when a table name was not specified to prepare_vacuum_command). I reordered the code in a more sensible manner, remove the vacuum_database_stage layer (which was pretty useless), and changed the analyze-in-stages mode: if we pass a valid stage number, run that stage, if not, then we're not in analyze-in-stage mode. So I got rid of the boolean flag altogether. I also moved the per-stage commands and messages back into a struct inside a function, since there's no need to have them be file-level variables anymore. -j1 is now the same as not specifying anything, and vacuum_one_database uses more common code in the parallel and not-parallel cases: the not-parallel case is just the parallel case with a single connection, so the setup and shutdown is mostly the same in both cases. I pushed the result. Please test, particularly on Windows. If you can use configure --enable-tap-tests and run them ("make check" in the src/bin/scripts subdir) that would be good too .. not sure whether that's expected to work on Windows. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
* Jim Nasby (jim.na...@bluetreble.com) wrote: > On 1/21/15 6:50 PM, Stephen Frost wrote: > >>I'm still nervous about overloading this onto the roles system; I think it > >>will end up being very easy to accidentally break. But if others think > >>it'll work then I guess I'm just being paranoid. > >Break in which way..? If you're saying "it'll be easy for a user to > >misconfigure" then I might agree with you- but documentation and > >examples can help to address that. > > I'm worried about user misconfiguration. Setting up a good system of roles > (as in, distinguishing between application accounts, users, account(s) used > to deploy code changes, DBAs, etc) is already tricky because of all the > different use cases to consider. I fear that adding auditing to that matrix > is just going to make it worse. Even with an in-core solution, users would need to work out who should be able to configure auditing.. I agree that seeing the permission grants to the auditing roles might be confusing for folks who have not seen it before, but I think that'll quickly resolve itself since the only people who would see that are those who want to use pgaudit... > I do like Robert's idea of role:action:object triplets more, though I'm not > sure it's enough. For example, what happens if you I'd suggest considering what happens if you: ALTER ROLE su_role RENAME TO new_su_role; Or if you want to grant a non-superuser the ability to modify the auditing rules.. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
* Robert Haas (robertmh...@gmail.com) wrote: > On Wed, Jan 21, 2015 at 7:05 PM, Jim Nasby wrote: > > I'm still nervous about overloading this onto the roles system; I think it > > will end up being very easy to accidentally break. But if others think it'll > > work then I guess I'm just being paranoid. > > I agree with you. I don't hear anyone who actually likes that > proposal except for Stephen. The list of people who don't like it > seems to include the patch author, which strikes me as a rather > significant point. Just to clarify- this concept isn't actually mine but was suggested by a pretty sizable PG user who has a great deal of familiarity with other databases. I don't mean to try and invoke the 'silent majority' but rather to make sure folks don't think this is my idea alone or that it's only me who thinks it makes sense. :) Simon had weighed in earlier with, iirc, a comment that he thought it was a good approach also, though that was a while ago and things have changed. I happen to like the idea specifically because it would allow regular roles to change the auditing settings (no need to be a superuser or to be able to modify postgresql.conf/postgresql.auto.conf), it would provide the level of granularity desired, would follow table, column, role changes, renames, drops, recreations, the dependency system would function as expected, etc, while keeping it as an extension, which I understood to be pretty desirable, especially initially. * Robert Haas (robertmh...@gmail.com) wrote: > On Wed, Jan 21, 2015 at 1:42 AM, Abhijit Menon-Sen > wrote: > > At 2015-01-20 20:36:39 -0500, robertmh...@gmail.com wrote: > >> I think this is throwing the baby out with the bathwater. Neither C > >> functions nor all-or-nothing are going to be of any practical use. > > > > Do you see some approach that has a realistic chance of making 9.5 and > > would also actually be worth putting into 9.5? Suggestions very welcome. 
> > Well, I'm still of the view that there's little to lose by having this > remain out-of-core for a release or three. We don't really all agree > on what we want, and non-core code can evolve a lot faster than core > code, so what's the rush? Being out of core makes it unusable in production environments for a large number of organizations. :/ > I'm coming around to the view that we're going to need fairly deep > integration to make this work nicely. It seems to me that the natural > way of controlling auditing of a table is with table or column options > on that table, but that requires either an extensible reloptions > framework that we don't have today, or that it be in the core server. > Similarly, the nice way of controlling options on a user is some > property attached to the user: audit operations X, Y, Z when performed > by this user. This is basically the same position which I held about a year ago when we were discussing this, but there was quite a bit of push-back on having an in-core solution, from the additional catalog tables to making the grammar larger and slowing things down. For my 2c, auditing is more than valuable enough to warrant those compromises, but I'm not anxious to spend time developing an in-core solution only to get it shot down after all the work has been put into it. Still, if folks have come around to feeling like an in-core solution makes sense, I'm certainly willing to work towards making that happen; it's, after all, what I've been interested in for years. :) > If you held a gun to my head and said, we must have something this > release, I'd probably go for a GUC with a value that is a > comma-separated list of role:operation:object triplets, like this: > > audit_stuff = 'alice:*:*, *:delete:*, bob:*:table secret' > > ...read as: > > - Audit everything alice does. > - Audit all deletes by anyone. > - Audit all access by bob to table secret. 
And this approach doesn't address any of the issues mentioned above, unfortunately, which would make it pretty difficult to really use. I think I'd rather deal with pgaudit being outside of the tree than pursue an approach which has so many issues. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
On 1/21/15 6:50 PM, Stephen Frost wrote: I'm still nervous about overloading this onto the roles system; I think it will end up being very easy to accidentally break. But if others think it'll work then I guess I'm just being paranoid. Break in which way..? If you're saying "it'll be easy for a user to misconfigure" then I might agree with you- but documentation and examples can help to address that. I'm worried about user misconfiguration. Setting up a good system of roles (as in, distinguishing between application accounts, users, account(s) used to deploy code changes, DBAs, etc) is already tricky because of all the different use cases to consider. I fear that adding auditing to that matrix is just going to make it worse. I do like Robert's idea of role:action:object triplets more, though I'm not sure it's enough. For example, what happens if you

CREATE ROLE su SUPERUSER NOINHERIT NOLOGIN;
CREATE ROLE su_role IN ROLE su NOLOGIN;
GRANT su_role TO bob;

and have su_role:*:*? Does bob get audited all the time then? Only when he does SET ROLE su? For that matter, how does SET ROLE affect auditing? -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
Andres Freund wrote: > On 2014-12-31 18:35:38 +0530, Amit Kapila wrote: > > + PQsetnonblocking(connSlot[0].connection, 1); > > + > > + for (i = 1; i < concurrentCons; i++) > > + { > > + connSlot[i].connection = connectDatabase(dbname, host, port, > > username, > > + > > prompt_password, progname, false); > > + > > + PQsetnonblocking(connSlot[i].connection, 1); > > + connSlot[i].isFree = true; > > + connSlot[i].sock = PQsocket(connSlot[i].connection); > > + } > > Are you sure about this global PQsetnonblocking()? This means that you > might not be able to send queries... And you don't seem to be waiting > for sockets waiting for writes in the select loop - which means you > might end up being stuck waiting for reads when you haven't submitted > the query. > > I think you might need a more complex select() loop. On nonfree > connections also wait for writes if PQflush() returns != 0. I removed the PQsetnonblocking() calls. They were a misunderstanding, I think. > > +/* > > + * GetIdleSlot > > + * Process the slot list, if any free slot is available then return > > + * the slotid else perform the select on all the socket's and wait > > + * until atleast one slot becomes available. > > + */ > > +static int > > +GetIdleSlot(ParallelSlot *pSlot, int max_slot, const char *dbname, > > + const char *progname, bool completedb) > > +{ > > + int i; > > + fd_set slotset; > > > Hm, you probably need to limit -j to FD_SETSIZE - 1 or so. I tried without the check to use 1500 connections, and select() didn't even blink -- everything worked fine vacuuming 1500 tables in parallel on a set of 2000 tables. Not sure what's the actual limit but my FD_SETSIZE says 1024, so I'm clearly over the limit. (I tried to run it with -j2000 but the server didn't start with that many connections. I didn't try any intermediate numbers.) Anyway I added the check. I fixed some more minor issues and pushed. 
-- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
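The FD_SETSIZE check discussed above can be sketched as a small helper that caps the requested job count so every connection socket still fits in an fd_set. The helper name and the exact policy (clamping rather than erroring out) are assumptions for illustration; vacuumdb's actual committed code may differ:

```c
#include <assert.h>
#include <sys/select.h>

/* Hypothetical helper: cap the number of parallel connections so that
 * every socket fd can still be registered in a select() fd_set.  One
 * slot is reserved as headroom, matching the "FD_SETSIZE - 1" idea
 * discussed above. */
static int
clamp_parallel_jobs(int requested)
{
	int			max_jobs = FD_SETSIZE - 1;

	if (requested < 1)
		return 1;
	if (requested > max_jobs)
		return max_jobs;
	return requested;
}
```

With FD_SETSIZE at 1024, a request for 1500 slots may appear to work until select() misbehaves on fds beyond the set's capacity, so bounding the job count up front is the safer design.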
Re: [HACKERS] Windows buildfarm animals are still not happy with abbreviated keys patch
On Thu, Jan 22, 2015 at 5:51 PM, Peter Geoghegan wrote: > That having been said, it's clearer to continue to handle each case (C > locale vs other locales) separately within the new > bttext_abbrev_convert() function, just to be consistent, but also to > avoid NUL-terminating the text strings to pass to strxfrm(), which as > you point out is an avoidable cost. So, I'm not asking you to restore > the terser uniform use of strxfrm() we had before, but, for what it's > worth, that was not the real issue. The real issue was that strxfrm() > spuriously used the wrong locale, as you said. Provided strxfrm() had > the correct locale (the C locale), then AFAICT it ought to have been > fine, regardless of whether or not it then behaved exactly equivalently > to memcpy(). I don't agree. On a system where HAVE_LOCALE_T is not defined, there is *no way* to get strxfrm() to behave like memcpy(), because we're not passing a locale to it. Clearly, strxfrm() is going to do a locale-aware transformation in that case whether the user wrote collate "C" or not. The comments in regc_pg_locale.c explain: * 1. In C/POSIX collations, we use hard-wired code. We can't depend on * the <ctype.h> functions since those will obey LC_CTYPE. Note that these * collations don't give a fig about multibyte characters. * * 2. In the "default" collation (which is supposed to obey LC_CTYPE): * * 2a. When working in UTF8 encoding, we use the <wctype.h> functions if * available. This assumes that every platform uses Unicode codepoints * directly as the wchar_t representation of Unicode. On some platforms * wchar_t is only 16 bits wide, so we have to punt for codepoints > 0xFFFF. * * 2b. In all other encodings, or on machines that lack <wctype.h>, we use * the <ctype.h> functions for pg_wchar values up to 255, and punt for values * above that. This is only 100% correct in single-byte encodings such as * LATINn. 
However, non-Unicode multibyte encodings are mostly Far Eastern * character sets for which the properties being tested here aren't very * relevant for higher code values anyway. The difficulty with using the * <wctype.h> functions with non-Unicode multibyte encodings is that we can * have no certainty that the platform's wchar_t representation matches * what we do in pg_wchar conversions. * * 3. Other collations are only supported on platforms that HAVE_LOCALE_T. * Here, we use the locale_t-extended forms of the <wctype.h> and <ctype.h> * functions, under exactly the same cases as #2. In other words, even on systems that don't HAVE_LOCALE_T, we still have to support the default collation and the C collation, and they have to behave differently. There's no way to make that work using only strxfrm(), because nothing gets passed to that function to tell it which of those two things it is supposed to do. Now even if the above were not an issue, for example because we dropped support for systems where !HAVE_LOCALE_T, I think it would be poor form to depend on strxfrm_l() to behave like memcpy() where we could just as easily call memcpy() and be *sure* that it was going to do what we wanted. Much of writing good code is figuring out what could go wrong and then figuring out how to prevent it, and relying on strxfrm_l() would be an unnecessary risk regardless. Given the existence of !HAVE_LOCALE_T systems, it's just plain broken. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
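The memcpy()-versus-strxfrm() point can be illustrated with a toy version of the C-collation path. This is purely a sketch (the function name and fixed width are invented; PostgreSQL's real bttext_abbrev_convert() is more involved): for collate "C" the abbreviated key is just the leading bytes of the string, which memcpy() produces deterministically, whereas strxfrm() output depends on whichever locale happens to be in effect.

```c
#include <assert.h>
#include <string.h>

#define ABBREV_BYTES 8			/* e.g. sizeof(Datum) on a 64-bit machine */

/* Toy sketch of C-collation abbreviation: copy at most ABBREV_BYTES
 * leading bytes of the string and zero-pad the rest.  Unlike strxfrm(),
 * this cannot be influenced by LC_COLLATE, so it behaves identically
 * on systems with and without locale_t support. */
static void
abbrev_key_c_collation(const char *str, size_t len,
					   unsigned char out[ABBREV_BYTES])
{
	memset(out, 0, ABBREV_BYTES);
	memcpy(out, str, len < ABBREV_BYTES ? len : ABBREV_BYTES);
}
```

Comparing such keys with memcmp() yields exactly the byte-wise ordering the C collation promises, which is why calling memcpy() directly is more trustworthy than hoping strxfrm() in the right locale degenerates to the same thing.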
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
On Wed, Jan 21, 2015 at 1:42 AM, Abhijit Menon-Sen wrote: > At 2015-01-20 20:36:39 -0500, robertmh...@gmail.com wrote: >> I think this is throwing the baby out with the bathwater. Neither C >> functions nor all-or-nothing are going to be of any practical use. > > Do you see some approach that has a realistic chance of making 9.5 and > would also actually be worth putting into 9.5? Suggestions very welcome. Well, I'm still of the view that there's little to lose by having this remain out-of-core for a release or three. We don't really all agree on what we want, and non-core code can evolve a lot faster than core code, so what's the rush? I'm coming around to the view that we're going to need fairly deep integration to make this work nicely. It seems to me that the natural way of controlling auditing of a table is with table or column options on that table, but that requires either an extensible reloptions framework that we don't have today, or that it be in the core server. Similarly, the nice way of controlling options on a user is some property attached to the user: audit operations X, Y, Z when performed by this user. If you held a gun to my head and said, we must have something this release, I'd probably go for a GUC with a value that is a comma-separated list of role:operation:object triplets, like this: audit_stuff = 'alice:*:*, *:delete:*, bob:*:table secret' ...read as: - Audit everything alice does. - Audit all deletes by anyone. - Audit all access by bob to table secret. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
On Wed, Jan 21, 2015 at 7:05 PM, Jim Nasby wrote: > I'm still nervous about overloading this onto the roles system; I think it > will end up being very easy to accidentally break. But if others think it'll > work then I guess I'm just being paranoid. I agree with you. I don't hear anyone who actually likes that proposal except for Stephen. The list of people who don't like it seems to include the patch author, which strikes me as a rather significant point. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Wed, Jan 21, 2015 at 2:22 AM, Peter Geoghegan wrote: > You'll probably prefer the attached. This patch works by disabling > abbreviation, but only after writing out runs, with the final merge > left to go. That way, it doesn't matter when abbreviated keys are not > read back from disk (or regenerated). Yes, this seems like the way to go for now. Thanks, committed. And thanks to Andrew for spotting this so quickly. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Fri, Jan 23, 2015 at 2:18 AM, David Rowley wrote: > On 20 January 2015 at 17:10, Peter Geoghegan wrote: >> >> On Mon, Jan 19, 2015 at 7:47 PM, Michael Paquier >> wrote: >> >> > With your patch applied, the failure with MSVC disappeared, but there >> > is still a warning showing up: >> > (ClCompile target) -> >> > src\backend\lib\hyperloglog.c(73): warning C4334: '<<' : result of >> > 32-bit shift implicitly converted to 64 bits (was 64-bit shift >> > intended?) >> >> That seems harmless. I suggest an explicit cast to "Size" here. > > This caught my eye too. > > I agree about casting to Size. > > Patch attached. Thanks, committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
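The warning comes from C's integer promotion rules: in an expression like 1 << shift, the literal 1 is a 32-bit int, so the shift is performed at 32-bit width and only the truncated result is widened. A standalone illustration (using uint64_t here for portability; the committed fix casts to Size, PostgreSQL's size_t-like typedef):

```c
#include <assert.h>
#include <stdint.h>

/* Shifting a 32-bit 1 by >= 32 bits is undefined behavior, and even
 * when it happens to "work" the result is truncated before being
 * widened.  Widening the left operand first makes the shift happen at
 * 64-bit width, which is what MSVC's C4334 hints was intended. */
static uint64_t
pow2_64(unsigned shift)
{
	return (uint64_t) 1 << shift;	/* 64-bit shift, as intended */
}
```

With shift = 40, pow2_64() returns 2^40, whereas the uncast 1 << 40 would be undefined behavior on any platform with a 32-bit int.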
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
Dilip kumar wrote: > Changes: > 1. In current patch vacuum_one_database (for table list), have the table loop > outside and analyze_stage loop inside, so it will finish > All three stage for one table first and then pick the next table. But > vacuum_one_database_parallel will do the stage loop outside and will call > run_parallel_vacuum, > Which will have table loop, so for one stage all the tables will be vacuumed > first, then go to next stage. > So for merging two function both functions behaviors should be identical, I > think if user have given a list of tables in analyze-in-stages, than doing > all the table > Atleast for one stage and then picking next stage will be better solution so > I have done it that way. Yeah, I think the stages loop should be outermost, as discussed upthread somewhere -- it's better to have initial stats for all tables as soon as possible, and improve them later, than have some tables/dbs with no stats for a longer period while full stats are computed for some specific tables/database. I'm tweaking your v24 a bit more now, thanks -- main change is to make vacuum_one_database be called only to run one analyze stage, so it never iterates for each stage; callers must iterate calling it multiple times in those cases. (There's only one callsite that needs changing anyway.) > 2. in select_loop > For WIN32 TranslateSocketError function I replaced with > if (WSAGetLastError() == WSAEINTR) > errno == EINTR; > > otherwise I have to expose TranslateSocketError function from socket and > include extra header. Grumble. Don't like this bit, but moving TranslateSocketError to src/common is outside the scope of this patch, so okay. (pg_dump's parallel stuff has the same issue anyway.) In case you're up for doing some more work later on, there are two ideas here: move the backend's TranslateSocketError to src/common, and try to merge pg_dump's select_loop function with the one in this new code. 
But that's for another patch anyway (and this new function needs a little battle-testing, of course.) > I have tested in windows also its working fine. Great, thanks. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] WITH CHECK and Column-Level Privileges
* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote: > Stephen Frost wrote: > > > > +#define GetModifiedColumns(relinfo, estate) \ > > > > + (rt_fetch((relinfo)->ri_RangeTableIndex, > > > > (estate)->es_range_table)->modifiedCols) > > > > > > I assume you are aware that this GetModifiedColumns() macro is a > > > duplicate of another one found elsewhere. However I think this is not > > > such a hot idea; the UPSERT patch series has a preparatory patch that > > > changes that other macro definition, as far as I recall; probably it'd > > > be a good idea to move it elsewhere, to avoid a future divergence. > > > > Yeah, I had meant to do something about that and had looked around but > > didn't find any particularly good place to put it. Any suggestions on > > that? > > Hmm, tough call now that I look it up. This macro depends on > ResultRelInfo and EState, both of which are in execnodes.h, and also on > rt_fetch which is in parsetree.h. There is no existing header that > includes parsetree.h (only .c files), so we would have to add one > inclusion on some other header file, or create a new header with just > this definition. None of these sounds real satisfactory (including > parsetree.h in execnodes.h sounds very bad). Maybe just add a comment > on both definitions to note that they are dupes of each other? That > would be more backpatchable than anything else that occurs to me right > away. Good thought, I'll do that. Thanks! Stephen
Re: [HACKERS] Proposal: knowing detail of config files via SQL
On Fri, Jan 23, 2015 at 4:36 AM, Jim Nasby wrote: > > > Would this view have a row for every option in a config file? IE: if you set > something in both postgresql.conf and postgresql.auto.conf, would it show up > twice? I think it should, and that there should be a way to see which > setting is actually in effect. Yes. > It looks like you're attempting to handle #include, yes? Of course, yes. >> Also other idea is to add additional columns existing view >> (pg_settings), according to prev discussion. > > > I think it would be useful to have a separate view that shows all > occurrences of a setting. I recall some comment about source_file and > source_line not always being correct in pg_settings; if that's true we > should fix that. You mean that the pg_settings view shows the variable as seen by the current session? What pg_settings shows can change if the user sets the GUC parameter in the session, or if the parameter is set for a role individually. But I don't think the pg_file_settings view has such behavior. Regards, --- Sawada Masahiko
Re: [HACKERS] basebackups during ALTER DATABASE ... SET TABLESPACE ... not safe?
On 2015-01-22 22:58:17 +0100, Andres Freund wrote: > On 2015-01-22 16:38:49 -0500, Stephen Frost wrote: > > I'm trying to figure out why you'd do '2' in master when in discussion > > of '1' you say "I also don't think ALTER DATABASE is even intentionally > > run at the time of a base backup." I agree with that sentiment and am > > inclined to say that '1' is good enough throughout. > > Because the way it currently works is a major crock. It's more luck than > anything that it actually somewhat works. We normally rely on WAL to > bring us into a consistent state. But around CREATE/MOVE/DROP DATABASE > we've ignored that. > And. Hm. The difficulty of the current method is evidenced by the fact > that so far nobody recognized that 1) as described above isn't actually > safe. It fails to protect against basebackups on a standby as its > XLogCtl state will obviously not be visible on the master. Further evidenced by the fact that the current method isn't crash/shutdown safe at all. If a standby was shut down/crashed/was started on a base backup when a CREATE DATABASE from the primary is replayed, the template database used can be in a nearly arbitrarily bad state. It'll later get fixed up by recovery - but those changes won't make it to the copied database. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] WITH CHECK and Column-Level Privileges
Stephen Frost wrote: > > > +#define GetModifiedColumns(relinfo, estate) \ > > > + (rt_fetch((relinfo)->ri_RangeTableIndex, > > > (estate)->es_range_table)->modifiedCols) > > > > I assume you are aware that this GetModifiedColumns() macro is a > > duplicate of another one found elsewhere. However I think this is not > > such a hot idea; the UPSERT patch series has a preparatory patch that > > changes that other macro definition, as far as I recall; probably it'd > > be a good idea to move it elsewhere, to avoid a future divergence. > > Yeah, I had meant to do something about that and had looked around but > didn't find any particularly good place to put it. Any suggestions on > that? Hmm, tough call now that I look it up. This macro depends on ResultRelInfo and EState, both of which are in execnodes.h, and also on rt_fetch which is in parsetree.h. There is no existing header that includes parsetree.h (only .c files), so we would have to add one inclusion on some other header file, or create a new header with just this definition. None of these sounds real satisfactory (including parsetree.h in execnodes.h sounds very bad). Maybe just add a comment on both definitions to note that they are dupes of each other? That would be more backpatchable than anything else that occurs to me right away. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Perl coding error in msvc build system?
On 01/23/2015 03:17 AM, Abhijit Menon-Sen wrote: At 2014-06-03 22:30:50 -0400, pete...@gmx.net wrote: I'm not sure whether the following coding actually detects any errors: Solution.pm: open(P, "cl /? 2>&1 |") || die "cl command not found"; Since nobody with a Windows system has commented, I'm just writing to say that from a Perl perspective, I agree with your analysis and the patch looks perfectly sensible. Not quite. This line: if ($output =~ /^\/favor:<.+AMD64/) needs an m modifier on the regex, I think, so that the ^ matches any beginning of line, not just the first. cheers andrew
Re: [HACKERS] WITH CHECK and Column-Level Privileges
Alvaro, Thanks for the review. * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote: > Note the first comment in this hunk was not update to talk about NULL > instead of "": Ah, good catch, will fix. > Hm, this is a bit odd. I thought you were going to return the subset of > columns that the user had permission to read, not empty if any of them > was inaccesible. Did I misunderstand? The subset is for regular relations. For indexes and keys, we only return either the whole key or nothing. Returning a partial key didn't make much sense to me and we also don't know which columns were actually provided by the user since it's going through the index AM, so we can't return just what the user provided. > > +#define GetModifiedColumns(relinfo, estate) \ > > + (rt_fetch((relinfo)->ri_RangeTableIndex, > > (estate)->es_range_table)->modifiedCols) > > I assume you are aware that this GetModifiedColumns() macro is a > duplicate of another one found elsewhere. However I think this is not > such a hot idea; the UPSERT patch series has a preparatory patch that > changes that other macro definition, as far as I recall; probably it'd > be a good idea to move it elsewhere, to avoid a future divergence. Yeah, I had meant to do something about that and had looked around but didn't find any particularly good place to put it. Any suggestions on that? > > @@ -1689,25 +1722,54 @@ ExecWithCheckOptions(ResultRelInfo *resultRelInfo, > > * dropped columns. We used to use the slot's tuple descriptor to decode > > the > > * data, but the slot's descriptor doesn't identify dropped columns, so we > > * now need to be passed the relation's descriptor. > > + * > > + * Note that, like BuildIndexValueDescription, if the user does not have > > + * permission to view any of the columns involved, a NULL is returned. 
> > Unlike > > + * BuildIndexValueDescription, if the user has access to view a subset of > > the > > + * column involved, that subset will be returned with a key identifying > > which > > + * columns they are. > > */ > > Ah, I now see that you are aware of the NULL-or-nothing nature of > BuildIndexValueDescription ... but the comments there don't explain the > reason. Perhaps a comment in BuildIndexValueDescription is in order > rather than a code change? Yeah, I'll add comments to BuildIndexValueDescription to explain why it's all-or-nothing. I've also been working on back-patching the fixes; the next update will hopefully include patches for all branches. Thanks! Stephen
Re: [HACKERS] WITH CHECK and Column-Level Privileges
Note the first comment in this hunk was not update to talk about NULL instead of "": > @@ -163,13 +168,63 @@ BuildIndexValueDescription(Relation indexRelation, > Datum *values, bool *isnull) > { > StringInfoData buf; > + Form_pg_index idxrec; > + HeapTuple ht_idx; > int natts = indexRelation->rd_rel->relnatts; > int i; > + int keyno; > + Oid indexrelid = RelationGetRelid(indexRelation); > + Oid indrelid; > + AclResult aclresult; > + > + /* > + * Check permissions- if the users does not have access to view the > + * data in the key columns, we return "" instead, which callers can > + * test for and use or not accordingly. > + * > + * First we need to check table-level SELECT access and then, if > + * there is no access there, check column-level permissions. > + */ [hunk continues] > + /* Table-level SELECT is enough, if the user has it */ > + aclresult = pg_class_aclcheck(indrelid, GetUserId(), ACL_SELECT); > + if (aclresult != ACLCHECK_OK) > + { > + /* > + * No table-level access, so step through the columns in the > + * index and make sure the user has SELECT rights on all of > them. > + */ > + for (keyno = 0; keyno < idxrec->indnatts; keyno++) > + { > + AttrNumber attnum = idxrec->indkey.values[keyno]; > + > + aclresult = pg_attribute_aclcheck(indrelid, attnum, > GetUserId(), > + > ACL_SELECT); > + > + if (aclresult != ACLCHECK_OK) > + { > + /* No access, so clean up and return */ > + ReleaseSysCache(ht_idx); > + return NULL; > + } > + } > + } Hm, this is a bit odd. I thought you were going to return the subset of columns that the user had permission to read, not empty if any of them was inaccesible. Did I misunderstand? 
> diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c > index 4c1..20acf04 100644 > --- a/src/backend/executor/execMain.c > +++ b/src/backend/executor/execMain.c > @@ -82,12 +82,17 @@ static void ExecutePlan(EState *estate, PlanState > *planstate, > DestReceiver *dest); > static bool ExecCheckRTEPerms(RangeTblEntry *rte); > static void ExecCheckXactReadOnly(PlannedStmt *plannedstmt); > -static char *ExecBuildSlotValueDescription(TupleTableSlot *slot, > +static char *ExecBuildSlotValueDescription(Oid reloid, > + TupleTableSlot *slot, > TupleDesc tupdesc, > + Bitmapset > *modifiedCols, > int maxfieldlen); > static void EvalPlanQualStart(EPQState *epqstate, EState *parentestate, > Plan *planTree); > > +#define GetModifiedColumns(relinfo, estate) \ > + (rt_fetch((relinfo)->ri_RangeTableIndex, > (estate)->es_range_table)->modifiedCols) I assume you are aware that this GetModifiedColumns() macro is a duplicate of another one found elsewhere. However I think this is not such a hot idea; the UPSERT patch series has a preparatory patch that changes that other macro definition, as far as I recall; probably it'd be a good idea to move it elsewhere, to avoid a future divergence. > @@ -1689,25 +1722,54 @@ ExecWithCheckOptions(ResultRelInfo *resultRelInfo, > * dropped columns. We used to use the slot's tuple descriptor to decode the > * data, but the slot's descriptor doesn't identify dropped columns, so we > * now need to be passed the relation's descriptor. > + * > + * Note that, like BuildIndexValueDescription, if the user does not have > + * permission to view any of the columns involved, a NULL is returned. > Unlike > + * BuildIndexValueDescription, if the user has access to view a subset of the > + * column involved, that subset will be returned with a key identifying which > + * columns they are. > */ Ah, I now see that you are aware of the NULL-or-nothing nature of BuildIndexValueDescription ... but the comments there don't explain the reason. 
Perhaps a comment in BuildIndexValueDescription is in order rather than a code change? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Parallel Seq Scan
On Thu, Jan 22, 2015 at 7:23 PM, Robert Haas wrote: > > On Thu, Jan 22, 2015 at 5:57 AM, Amit Kapila wrote: > > 1. Scanning block-by-block has negative impact on performance and > > I think it will degrade more if we increase parallel count as that can lead > > to more randomness. > > > > 2. Scanning in fixed chunks improves the performance. Increasing > > parallel count to a very large number might impact the performance, > > but I think we can have a lower bound below which we will not allow > > multiple processes to scan the relation. > > I'm confused. Your actual test numbers seem to show that the > performance with the block-by-block approach was slightly higher with > parallelism than without, whereas the performance with the > chunk-by-chunk approach was lower with parallelism than without, but > the text quoted above, summarizing those numbers, says the opposite. > > Also, I think testing with 2 workers is probably not enough. I think > we should test with 8 or even 16. > Below is the data with more workers; the amount of data and other configuration remain as before, I have only increased the parallel worker count:

Block-By-Block, time in ms

Workers:      0       2       4       8      16      24      32
Run-1    257851  287353  350091  330193  284913  338001  295057
Run-2    263241  314083  342166  347337  378057  351916  348292
Run-3    315374  334208  389907  340327  328695  330048  330102
Run-4    301054  312790  314682  352835  323926  324042  302147
Run-5    304547  314171  349158  350191  350468  341219  281315

Fixed-Chunks, time in ms

Workers:      0       2       4       8      16      24      32
Run-1    250536  266279  251263  234347   87930   50474   35474
Run-2    249587  230628  225648  193340   83036   35140    9100
Run-3    234963  220671  230002  256183  105382   62493   27903
Run-4    239111  245448  224057  189196  123780   63794   24746
Run-5    239937  222820  219025  220478  114007   77965   39766

The trend remains the same, although there is some variation. 
In the block-by-block approach, performance dips (execution takes more time) with more workers; though it stabilizes at some higher value, I still feel it is erratic, as that approach leads to a more random access pattern. In the fixed-chunk approach, performance improves with more workers, especially at slightly higher worker counts. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
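The difference between the two strategies is in how workers claim blocks. A minimal, single-process sketch of the fixed-chunk side (names and the chunk size are invented for illustration; the actual patch coordinates workers through shared state):

```c
#include <assert.h>
#include <stdbool.h>

#define CHUNK_SIZE 32			/* blocks handed out per claim; assumed value */

/* Hypothetical chunk allocator: each call claims the next CHUNK_SIZE
 * blocks of the relation, so a worker scans a long sequential range
 * instead of interleaved single blocks (the block-by-block scheme).
 * In a real parallel scan the counter would live in shared memory and
 * be advanced under a lock or with an atomic fetch-add. */
static long next_block = 0;

static bool
claim_chunk(long nblocks, long *start, long *end)
{
	if (next_block >= nblocks)
		return false;			/* relation fully claimed */
	*start = next_block;
	*end = (next_block + CHUNK_SIZE <= nblocks)
		? next_block + CHUNK_SIZE - 1
		: nblocks - 1;
	next_block = *end + 1;
	return true;
}
```

Each worker looping on claim_chunk() reads mostly sequential block ranges, which is one plausible explanation for the fixed-chunk numbers above improving with worker count while the block-by-block numbers degrade.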
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On 22 January 2015 23:16, Alvaro Herrera Wrote, > Here's v23. > > There are two things that continue to bother me and I would like you, > dear patch author, to change them before committing this patch: > > 1. I don't like having vacuum_one_database() and a separate > vacuum_one_database_parallel(). I think we should merge them into one > function, which does either thing according to parameters. There's > plenty in there that's duplicated. > > 2. in particular, the above means that run_parallel_vacuum can no > longer exist as it is. Right now vacuum_one_database_parallel relies > on run_parallel_vacuum to do the actual job parallellization. I would > like to have that looping in the improved vacuum_one_database() > function instead. > Looking forward to v24, Thank you for your effort; I have tried to change the patch as per your instructions and come up with v24. Changes: 1. In the current patch, vacuum_one_database (for a table list) has the table loop outside and the analyze_stage loop inside, so it finishes all three stages for one table first and then picks the next table. But vacuum_one_database_parallel does the stage loop outside and calls run_parallel_vacuum, which has the table loop, so for one stage all the tables are vacuumed first before going to the next stage. To merge the two functions, both behaviors should be identical; I think if the user has given a list of tables with analyze-in-stages, then doing all the tables for at least one stage and then picking the next stage is the better solution, so I have done it that way. 2. In select_loop, for WIN32 I replaced the TranslateSocketError function with if (WSAGetLastError() == WSAEINTR) errno = EINTR; otherwise I would have to expose TranslateSocketError from the socket code and include an extra header. I have tested on Windows also; it's working fine. 
Regards, Dilip Attachment: vacuumdb_parallel_v24.patch
Re: [HACKERS] pg_basebackup fails with long tablespace paths
At 2014-12-24 08:10:46 -0500, pete...@gmx.net wrote: > > As a demo for how this might look, attached is a wildly incomplete > patch to produce GNU long-link headers. Hi Peter. In what way exactly is this patch wildly incomplete? (I ask because it's been added to the current CF). -- Abhijit
Re: [HACKERS] Perl coding error in msvc build system?
At 2014-06-03 22:30:50 -0400, pete...@gmx.net wrote: > > I'm not sure whether the following coding actually detects any errors: > > Solution.pm: > > open(P, "cl /? 2>&1 |") || die "cl command not found"; Since nobody with a Windows system has commented, I'm just writing to say that from a Perl perspective, I agree with your analysis and the patch looks perfectly sensible. -- Abhijit