Re: Using a single standalone-backend run in initdb (was Re: [HACKERS] Bootstrap DATA is a pita)
Mark Dilger writes:
>> On Dec 12, 2015, at 3:42 PM, Tom Lane wrote:
>> ... In general, though, I'd rather not try to
>> teach InteractiveBackend() such a large amount about SQL syntax.

> I use CREATE RULE within startup files in the fork that I maintain. I
> have lots of them, totaling perhaps 50k lines of rule code. I don't
> think any of that code would have a problem with the double-newline
> separation you propose, which seems a more elegant solution to me.

Yeah? Just as a proof of concept, could you run your startup files with
the postgres.c patch as proposed, and see whether you get any failures?

> Admittedly, the double-newline separation would need to be documented
> at the top of each sql file, otherwise it would be quite surprising to
> those unfamiliar with it.

Agreed, that wouldn't be a bad thing.

I thought of a positive argument for not doing the "fully right" thing,
namely implementing the exactly-right command boundary rules. Suppose
you mess up in information_schema.sql or another large input file by
introducing an extra left parenthesis in some query. If
InteractiveBackend() were cognizant of the paren-matching rule, it would
slurp everything up to end-of-file and then produce a syntax error
message quoting all that text; not much better than what happens today.
With a command break rule like semi-newline-newline, there is a limited
horizon on how much text gets swallowed before you get the error message.

Note that this is different from the situation with a fully interactive
input processor like psql: if you're typing the same thing into psql,
you'll realize something is wrong as soon as the command doesn't execute
when expected. You won't type another thousand lines of input before
looking closely at what you typed already.
I'm still not quite sold on semi-newline-newline as being the best possible command boundary rule here; but I do think that "fully correct" boundary rules are less attractive than they might sound. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
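For illustration only, the limited-horizon property Tom describes can be modeled with a toy Python sketch (this is not the postgres.c code; the function name and sample commands are invented). An unbalanced parenthesis stays confined to the one chunk between boundaries instead of swallowing the rest of the file:

```python
def split_commands(script):
    """Split a startup script on the proposed boundary: a semicolon
    immediately followed by a blank line (";\n\n").  Anything left after
    the last boundary is a trailing (possibly incomplete) command."""
    commands = []
    rest = script
    while True:
        idx = rest.find(";\n\n")
        if idx == -1:
            break
        commands.append(rest[:idx + 1])   # keep the semicolon
        rest = rest[idx + 3:]
    if rest.strip():
        commands.append(rest.strip())
    return commands

script = "SELECT 1;\n\nSELECT (oops;\n\nSELECT 2;\n"
print(split_commands(script))
# -> ['SELECT 1;', 'SELECT (oops;', 'SELECT 2;']
```

The extra "(" in the second command produces a syntax error quoting only that one chunk; a paren-aware splitter would have merged everything from "(oops" onward into a single giant erroneous command.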
Re: [HACKERS] Disabling an index temporarily
On Sun, Dec 13, 2015 at 1:16 AM, Jaime Casanova
<jaime.casan...@2ndquadrant.com> wrote:
> indexrelid = 'indexname'::regclass;

This works, but it might bloat the system catalog.
Re: [HACKERS] [patch] Proposal for \rotate in psql
2015-12-10 19:29 GMT+01:00 Pavel Stehule:

>> postgres=# \crosstabview 4 +month label
>
> Maybe using an optional int order column instead of a label is better -
> then you can do the sort on the client side
>
> so the syntax can be "\crosstabview VCol [+/-]HCol [[+-]HOrderCol]"

here is a patch - supported syntax: \crosstabview VCol [+/-]HCol [HOrderCol]

The order column should contain a numeric value. Values are sorted on the
client side.

Regards

Pavel

[quoted example \crosstabview output elided: an 18-row crosstab of
customers against month columns (led..lis); the table did not survive
transit intact. Daniel Vérité's quoted signature followed it
(PostgreSQL-powered mailer: http://www.manitou-mail.org, Twitter:
@DanielVerite).]

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
new file mode 100644
index 47e9da2..a45f4b8
*** a/doc/src/sgml/ref/psql-ref.sgml
--- b/doc/src/sgml/ref/psql-ref.sgml
*** lo_import 152801
*** 2461,2466 --- 2461,2555

+ \crosstabview [ colV [-|+]colH ]
+
+ Execute the current query buffer (like \g) and show the results
+ inside a crosstab grid. The output column colV becomes a vertical
+ header and the output column colH becomes a horizontal header.
+ The results for the other output columns are projected inside the grid.
+
+ colV and colH can indicate a column position (starting at 1), or a
+ column name. Normal case folding and quoting rules apply to column
+ names. By default, colV is column 1 and colH is column 2.
+ A query having only one output column cannot be viewed in crosstab, and
+ colH must differ from colV.
+
+ The vertical header, displayed as the leftmost column, contains the set
+ of all distinct values found in column colV, in the order of their
+ first appearance in the query results.
+
+ The horizontal header, displayed as the first row, contains the set of
+ all distinct non-null values found in column colH. They come by default
+ in their order of appearance in the query results, or in ascending
+ order if
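For what it's worth, the client-side pivot that \crosstabview performs can be sketched roughly like this (a toy Python model, not the psql implementation; the function and sample rows are invented). Rows of (colV, colH, value) become a grid, with headers ordered by first appearance:

```python
def crosstab(rows):
    """Pivot (v, h, value) rows into a grid: vertical header from the
    distinct v values, horizontal header from the distinct non-null h
    values, both in order of first appearance."""
    vheader, hheader = [], []
    grid = {}
    for v, h, val in rows:
        if v not in vheader:
            vheader.append(v)
        if h is not None and h not in hheader:
            hheader.append(h)
        grid[(v, h)] = val
    return [[v] + [grid.get((v, h), "") for h in hheader] for v in vheader]

rows = [("A", "jan", 10), ("B", "feb", 20), ("A", "feb", 30)]
print(crosstab(rows))
# -> [['A', 10, 30], ['B', '', 20]]
```

Cells with no matching (v, h) pair stay empty, which is why the example crosstab output above is mostly blank.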
Re: [HACKERS] PATCH: add pg_current_xlog_flush_location function
On Sun, Dec 13, 2015 at 12:07 AM, Tomas Vondra wrote:
> Hi,
>
> attached is a patch adding a function pg_current_xlog_flush_location(),
> which proved quite useful when investigating the ext4 data loss bug.
> It's mostly what was already sent to that thread, except for docs that
> were missing in the initial version.

+/*
+ * Get latest WAL flush pointer
+ */
+XLogRecPtr
+GetXLogFlushRecPtr(void)
+{
+   SpinLockAcquire(&XLogCtl->info_lck);
+   LogwrtResult = XLogCtl->LogwrtResult;
+   SpinLockRelease(&XLogCtl->info_lck);
+
+   return LogwrtResult.Flush;
+}

Is there a reason why you can't use the existing function
GetFlushRecPtr() in xlog.c?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
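The lock-copy-release shape of the quoted function can be modeled in a toy Python sketch (threading.Lock standing in for the info_lck spinlock; all names are invented for illustration): the shared result is copied while holding the lock, and only the copy is used afterwards.

```python
import threading

class XLogCtlToy:
    """Toy analogue of XLogCtl: shared write/flush positions protected
    by info_lck (a spinlock in the real code, a Lock here)."""
    def __init__(self):
        self.info_lck = threading.Lock()
        self.write = 0
        self.flush = 0

def get_xlog_flush_rec_ptr(ctl):
    # Copy the shared state under the lock, return a field of the copy
    # outside it -- the same shape as the quoted GetXLogFlushRecPtr().
    with ctl.info_lck:
        logwrt_result = (ctl.write, ctl.flush)
    return logwrt_result[1]

ctl = XLogCtlToy()
with ctl.info_lck:
    ctl.write, ctl.flush = 100, 80
print(get_xlog_flush_rec_ptr(ctl))  # -> 80
```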
Re: [HACKERS] Should TIDs be typbyval = FLOAT8PASSBYVAL to speed up CREATE INDEX CONCURRENTLY?
On Sat, Dec 12, 2015 at 9:48 AM, Corey Huinker wrote:
> What, if any, other load should be placed on the underlying table
> during the test?
>
> I ask because CIC statements that run in seconds on our staging machine
> can take many hours on our production machine, when most of the access
> is just reads, though those reads may have been part of a larger
> transaction that did updates elsewhere.

That sounds like the CIC is just blocked waiting for long-lived
transactions to go away. There isn't much that changes to the sorting
algorithm can do about that.

Cheers,
Jeff
Re: [HACKERS] PATCH: track last known XLOG segment in control file
On Sun, Dec 13, 2015 at 4:24 AM, Tomas Vondra wrote:
>
> On 12/12/2015 11:39 PM, Andres Freund wrote:
>> On 2015-12-12 23:28:33 +0100, Tomas Vondra wrote:
>>> On 12/12/2015 11:20 PM, Andres Freund wrote:
>>>> On 2015-12-12 22:14:13 +0100, Tomas Vondra wrote:
>>>>> this is the second improvement proposed in the thread [1] about the
>>>>> ext4 data loss issue. It adds another field to the control file,
>>>>> tracking the last known WAL segment. This does not eliminate the
>>>>> data loss, just the silent part of it when the last segment gets
>>>>> lost (due to forgetting the rename, deleting it by mistake or
>>>>> whatever). The patch makes sure the cluster refuses to start if
>>>>> that happens.
>>>>
>>>> Uh, that's fairly expensive. In many cases it'll significantly
>>>> increase the number of fsyncs.
>>>
>>> It should do exactly 1 additional fsync per WAL segment. Or do you
>>> think otherwise?
>>
>> Which is nearly doubling the number of fsyncs, for a good number of
>> workloads. And it does so to a separate file, i.e. it's not like these
>> writes and the flushes can be combined. In workloads where pg_xlog is
>> on a separate partition it'll add the only source of fsyncs besides
>> checkpoint to the main data directory.

I also think so.

> I doubt it will make any difference in practice, at least on reasonable
> hardware (which you should have, if fsync performance matters to you).
>
> But some performance testing will be necessary, I don't expect this to
> go in without that. It'd be helpful if you could describe the workload.

To start with, I think you can try to test a pgbench read-write workload
where the data fits in shared_buffers.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [patch] Proposal for \rotate in psql
2015-12-13 8:14 GMT+01:00 Pavel Stehule:

> 2015-12-10 19:29 GMT+01:00 Pavel Stehule:
>
>>> postgres=# \crosstabview 4 +month label
>>
>> Maybe using an optional int order column instead of a label is better -
>> then you can do the sort on the client side
>>
>> so the syntax can be "\crosstabview VCol [+/-]HCol [[+-]HOrderCol]"
>
> here is a patch - supported syntax: \crosstabview VCol [+/-]HCol
> [HOrderCol]
>
> The order column should contain a numeric value. Values are sorted on
> the client side.

fixed error messages

Regards

Pavel

[quoted example \crosstabview output and Daniel Vérité's signature
elided: the same 18-row customers-by-months crosstab quoted in the
previous message, which did not survive transit intact.]

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
new file mode 100644
index 47e9da2..a45f4b8
*** a/doc/src/sgml/ref/psql-ref.sgml
--- b/doc/src/sgml/ref/psql-ref.sgml
*** lo_import 152801
*** 2461,2466 --- 2461,2555

+ \crosstabview [ colV [-|+]colH ]
+
+ Execute the current query buffer (like \g) and show the results
+ inside a crosstab grid. The output column colV becomes a vertical
+ header and the output column colH becomes a horizontal header.
+ The results for the other output columns are projected inside the grid.
+
+ colV and colH can indicate a column position (starting at 1), or a
+ column name. Normal case folding and quoting rules apply to column
+ names. By default, colV is column 1 and colH is column 2.
+ A query having only one output column cannot be viewed in crosstab, and
+ colH must differ from colV.
+
+ The vertical header, displayed as the leftmost column, contains the set
+ of all distinct values found in column colV, in the order of their
+ first appearance in the query results.
+
+ The horizontal header, displayed as the first row,
Re: [HACKERS] Parallel Aggregate
On Sat, Dec 12, 2015 at 8:42 AM, David Rowley wrote:
> On 12 December 2015 at 04:00, Robert Haas wrote:
>>
>> I'd like to commit David Rowley's patch from the other thread first,
>> and then deal with this one afterwards. The only thing I feel
>> strongly needs to be changed in that patch is CFUNC -> COMBINEFUNC,
>> for clarity.
>
> I have addressed that in my local copy. I'm now just working on adding
> some test code which uses the new infrastructure. Perhaps I'll just
> experiment with the parallel aggregate stuff instead now.

Here I attached a patch with the following changes; I feel it is better
to include them as part of the combine aggregate patch.

1. Added missing outfuncs.c changes for newly added variables in the
   Aggref structure.
2. Keeping the aggregate function in the final aggregate stage to do the
   final aggregation on the tuples received from all workers.

The patch still needs a fix to correct the explain plan output issue:

postgres=# explain analyze verbose select count(*), sum(f1) from tbl where f1 % 100 = 0 group by f3;
                                  QUERY PLAN
------------------------------------------------------------------------
 Finalize HashAggregate (cost=1853.75..1853.76 rows=1 width=12) (actual time=92.428..92.429 rows=1 loops=1)
   Output: pg_catalog.count(*), pg_catalog.sum((sum(f1))), f3
   Group Key: tbl.f3
   -> Gather (cost=0.00..1850.00 rows=500 width=12) (actual time=92.408..92.416 rows=3 loops=1)
         Output: f3, (count(*)), (sum(f1))

Regards,
Hari Babu
Fujitsu Australia

set_ref_final_agg.patch
Description: Binary data
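The partial/combine/finalize split that the plan above reflects can be sketched in toy Python (invented names; not the executor code). Each worker produces a transition state, and the leader merges the states with the combine function before finalizing:

```python
def partial_count_sum(rows):
    """Worker side: accumulate a transition state (count, sum)."""
    count = total = 0
    for v in rows:
        count += 1
        total += v
    return (count, total)

def combine(a, b):
    """The COMBINEFUNC role: merge two transition states."""
    return (a[0] + b[0], a[1] + b[1])

# The leader's Finalize step gathers one partial state per worker and
# combines them -- which is why the plan shows sum((sum(f1))).
workers = [[100, 200, 300], [400, 500], [600]]
state = (0, 0)
for w in workers:
    state = combine(state, partial_count_sum(w))
print(state)  # -> (6, 2100)
```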
Re: [HACKERS] Disabling an index temporarily
> Tatsuo Ishii wrote:
>>> Wouldn't something like:
>>>
>>> ALTER INDEX foo SET DISABLED;
>>>
>>> seem more in line with our grammar?
>>
>> But this will affect other sessions, no?
>
> Not if it is used in a transaction that ends with a ROLLBACK,
> but then you might as well use DROP INDEX, except
> that DROP INDEX takes an access exclusive lock.

I thought about this. The problem with the transaction-rollback technique
is that I would not be able to test with an application which runs
multiple transactions.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Re: [HACKERS] Error with index on unlogged table
On Fri, Dec 11, 2015 at 4:54 PM, Michael Paquier wrote:
> On Fri, Dec 11, 2015 at 4:27 AM, Andres Freund wrote:
>> On 2015-12-10 18:36:32 +0100, Andres Freund wrote:
>>> On 2015-12-10 12:19:12 -0500, Robert Haas wrote:
>>>>> The real problem there imo isn't that copy_relation_data() doesn't
>>>>> deal with 0 block tables, but that ATExecSetTableSpace() doesn't
>>>>> have an unlogged-table-specific codepath like
>>>>> heap_create_with_catalog() has.
>>>>
>>>> It looks to me like somewhere we need to do log_smgrcreate(...,
>>>> INIT_FORKNUM) in the unlogged table case.
>>>
>>> Yes.
>>>
>>>> RelationCreateStorage() skips this for the main forknum of an
>>>> unlogged table, which seems OK, but there's nothing that even
>>>> attempts it for the init fork, which does not seem OK.
>>>
>>> We unfortunately can't trivially delegate that work to
>>> RelationCreateStorage(). E.g. heap_create() documents that only the
>>> main fork is created :(
>>>
>>>> I guess that logic should happen in ATExecSetTableSpace just after
>>>> smgrcreate(dstrel, forkNum, false).
>>>
>>> Looks like it's the easiest place.
>>
>>>>> A second problem is that the smgrimmedsync() in
>>>>> copy_relation_data() isn't called for the init fork of unlogged
>>>>> relations, even if it needs to be.
>>
>> Here's a patch doing that. It's not yet fully polished, but I wanted
>> to get it out, because I noticed one thing:
>>
>> In ATExecSetTableSpace(), for !main forks, we currently call
>> smgrcreate(), but not log_smgrcreate(). Even for PERSISTENT relations.
>> That seems a bit odd to me. It currently seems to be without further
>> consequence because, if there's actual data in the fork, we'll just
>> create the relation in _mdfd_getseg(); or we can cope with the
>> relation not being there. But to me that feels wrong.
>>
>> It seems better to do the log_smgrcreate() for
>> RELPERSISTENCE_PERMANENT, not just INIT_FORKNUM. What do you guys
>> think?
>
> This fixes the problem in my environment.
>
> + if (rel->rd_rel->relpersistence == RELPERSISTENCE_PERMANENT ||
> +     (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED &&
> +      forkNum == INIT_FORKNUM))
> +     log_smgrcreate(, forkNum);
>
> There should be a XLogIsNeeded() check as well. Removing the check on
> RELPERSISTENCE_UNLOGGED is fine as well... Not mandatory though :)
>
> + * The init fork for an unlogged relation in many respects has to be
> + * treated the same as a normal relation; changes need to be WAL
> + * logged and it needs to be synced to disk.
> + */
> + copying_initfork = relpersistence == RELPERSISTENCE_UNLOGGED &&
> +     forkNum == INIT_FORKNUM;
>
> Here as well just a check on INIT_FORKNUM would be fine.

Should we consider this bug a 9.5 blocker? I feel uneasy with the fact of
releasing a new major version knowing that we know some bugs in it, and
this one is not cool, so I have added it to the list of open items. We
know the problem and there is a patch, so this is definitely solvable.
--
Michael
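The persistence/fork-number condition in the quoted hunk boils down to the following (a Python restatement for illustration, not the C code; the constant values are stand-ins): WAL-log fork creation for permanent relations, and for the init fork of unlogged relations, which must survive a crash so the relation can be reinitialized.

```python
# Stand-in constants mirroring the identifiers in the quoted patch.
RELPERSISTENCE_PERMANENT = "p"
RELPERSISTENCE_UNLOGGED = "u"
MAIN_FORKNUM, INIT_FORKNUM = 0, 3

def needs_log_smgrcreate(relpersistence, forknum):
    """The condition from the quoted hunk: permanent relations always,
    unlogged relations only for their init fork."""
    return (relpersistence == RELPERSISTENCE_PERMANENT
            or (relpersistence == RELPERSISTENCE_UNLOGGED
                and forknum == INIT_FORKNUM))

print(needs_log_smgrcreate(RELPERSISTENCE_UNLOGGED, INIT_FORKNUM))  # -> True
print(needs_log_smgrcreate(RELPERSISTENCE_UNLOGGED, MAIN_FORKNUM))  # -> False
```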
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Wed, Dec 9, 2015 at 1:02 AM, Robert Haas wrote:
>
> On Thu, Dec 3, 2015 at 1:48 AM, Amit Kapila wrote:
>> I think the way to address this is: don't add a backend to the group
>> list if it does not intend to update the same page as the group
>> leader. For transactions to be on different pages, they have to be
>> 32768 transaction ids apart, and I don't see much possibility of that
>> happening for concurrent transactions that are going to be grouped.
>
> That might work.

Okay, the attached patch group_update_clog_v3.patch implements the above.

>>> My idea for how this could possibly work is that you could have a
>>> list of waiting backends for each SLRU buffer page.
>>
>> Won't this mean that first we need to ensure that the page exists in
>> one of the buffers, and once we have the page in an SLRU buffer, we
>> can form the list and ensure that the list is processed before
>> eviction? If my understanding is right, then for this to work we
>> probably need to acquire CLogControlLock in shared mode in addition
>> to acquiring it in exclusive mode for updating the status on the page
>> and performing pending updates for other backends.
>
> Hmm, that wouldn't be good. You're right: this is a problem with my
> idea. We can try what you suggested above and see how that works. We
> could also have two or more slots for groups - if a backend doesn't
> get the lock, it joins the existing group for the same page, or else
> creates a new group if any slot is unused.

I have implemented this idea as well in the attached patch
group_slots_update_clog_v3.patch.

> I think it might be advantageous to have at least two groups because
> otherwise things might slow down when some transactions are rolling
> over to a new page while others are still in flight for the previous
> page. Perhaps we should try it both ways and benchmark.

Sure, I can do the benchmarks with both the patches, but before that it
would be helpful if you could check whether
group_slots_update_clog_v3.patch is in line with what you have in mind.

Note - I have used the attached patch transactionid_burner_v1.patch
(extracted from Jeff's patch upthread) to verify the transactions that
fall into different page boundaries.

group_update_clog_v3.patch
Description: Binary data

group_slots_update_clog_v3.patch
Description: Binary data

transactionid_burner_v1.patch
Description: Binary data
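The grouping rule under discussion - a backend joins a group only if it updates the same CLOG page as the group leader - amounts to partitioning waiters by page number, since xids 32768 apart land on different pages. A toy Python sketch (invented names, not the patch code):

```python
CLOG_XACTS_PER_PAGE = 32768  # xids this far apart land on different pages

def group_for_update(xids):
    """Partition pending status updates by CLOG page number; each
    bucket is one candidate group with its own leader."""
    groups = {}
    for xid in xids:
        groups.setdefault(xid // CLOG_XACTS_PER_PAGE, []).append(xid)
    return groups

print(group_for_update([100, 101, 32768, 32769, 65536]))
# -> {0: [100, 101], 1: [32768, 32769], 2: [65536]}
```

With the multi-slot variant, each of these page buckets could occupy its own group slot, so transactions rolling over to a new page need not wait behind the previous page's group.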
Re: [HACKERS] Logical replication and multimaster
On 12 December 2015 at 02:24, Robert Haas wrote:
> On Fri, Dec 11, 2015 at 5:16 AM, Andres Freund wrote:
>> On 2015-12-11 18:12:55 +0800, Craig Ringer wrote:
>>> On 10 December 2015 at 03:19, Robert Haas wrote:
>>>> On Sun, Dec 6, 2015 at 10:24 PM, Craig Ringer wrote:
>>>>> * A way to securely make a libpq connection from a bgworker without
>>>>> messing with passwords etc. Generate one-time cookies, something
>>>>> like that.
>>>>
>>>> Why would you have the bgworker connect to the database via TCP
>>>> instead of just doing whatever it wants to do directly?
>>>
>>> pg_dump and pg_restore, mainly, for copying the initial database
>>> state.
>>
>> Well, you don't want to necessarily directly connect from the
>> bgworker, but from processes started from a bgworker. I guess that's
>> where a good bit of Robert's confusion originated.
>
> That's part of it, yeah. I'm a little scared of this design. I mean, I
> understand now why Craig wants to do this (thanks for explaining,
> Craig!), but it seems like it's going to have a lot of the same
> reliability problems that pg_upgrade does.

Yes, and more. Especially when dealing with multiple upstream servers,
etc. It's not very nice. I would very much prefer a better way to achieve
the initial data sync, but at present I don't think there is any better
approach that's even remotely practical.

> I'm not saying there's a better way to get the functionality

Yup. That's the problem.

> but it's pretty obvious that depending on tools other than the server
> itself, and in particular pg_dump, vastly increases the failure surface
> area.

It's not too bad to find pg_dump, though we ended up not being able to
reuse find_other_exec for various reasons I'll have to try to dig out of
the cruftier corners of my memory. It has a fairly sane interface too.
Things get hairy when you want to do things like "give me all the
upstream's non-table objects, then give me [this set of table
definitions]"... then you go and sync the data from an exported snapshot
using COPY, then finish up by restoring the constraints for the set of
tables you dumped.

Being able to access pg_dump and pg_restore's dependency resolution
logic, object dumping routines, etc from regular SQL and from the SPI
would be wonderful.

I believe the main complaints about doing that when it was discussed in
the past were issues with downgrading. A pg_get_tabledef(...) in 9.6
might emit keywords etc that a 9.2 server wouldn't understand, and the
way we currently solve this is to require that you run 9.2's pg_dump
against the 9.6 server to get a 9.2-compatible dump. So if we had
pg_get_blahdef functions we'd still need external versions, so why
bother having them?

The alternative is to have all the get_blahdef functions accept a param
for server version compatibility, which would work but burden future
servers with knowledge about older versions' features and corresponding
code cruft for some extended period of time. So it's gone nowhere to
date.

For that matter it's not clear that pg_get_blahdef functions would be
the right solution, but I can't see directly poking around in the
catalogs and basically re-implementing pg_dump being OK either. So what
else could we do?

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
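A version-parameter get_blahdef could look roughly like the following (entirely hypothetical; the signature, table representation, and version gate are invented to illustrate the compatibility burden being described - every syntax addition would need such a gate, forever):

```python
def pg_get_tabledef(table, target_version_num):
    """Hypothetical version-aware pg_get_tabledef(): emit only syntax
    the target server version understands."""
    cols = ", ".join(f"{name} {typ}" for name, typ in table["columns"])
    keyword = "TABLE"
    if table.get("unlogged"):
        if target_version_num < 90100:
            raise ValueError("UNLOGGED tables do not exist before 9.1")
        keyword = "UNLOGGED TABLE"
    return f"CREATE {keyword} {table['name']} ({cols});"

t = {"name": "t1", "columns": [("id", "int")], "unlogged": True}
print(pg_get_tabledef(t, 90600))
# -> CREATE UNLOGGED TABLE t1 (id int);
```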
Re: [HACKERS] Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
On Sat, Dec 12, 2015 at 11:37 AM, Noah Misch wrote:
> On Fri, Dec 11, 2015 at 09:34:34PM +0900, Michael Paquier wrote:
>> On Fri, Dec 11, 2015 at 8:48 PM, Alvaro Herrera wrote:
>>> Michael Paquier wrote:
>>>> On Fri, Dec 11, 2015 at 5:35 AM, Alvaro Herrera wrote:
>>>> I guess that to complete your idea we could allow PostgresNode to
>>>> get a custom name for its log file through an optional parameter
>>>> like logfile => 'myname' or similar. And if nothing is defined, the
>>>> process falls back to the application name. So this would give the
>>>> following: ${testname}_${logfile}.log
>>>
>>> Sure. I don't think we should use the name only for the log file,
>>> though, but also for things like the "## " informative messages we
>>> print here and there. That would make the log file simpler to
>>> follow. Also, I'm not sure about having it be optional. (TBH I'm not
>>> sure about applname either; why do we keep that one?)
>>
>> OK, so let's do this: the node name is a mandatory argument of
>> get_new_node, which is passed to "new PostgresNode" like the port and
>> the host, and it is then used in the log file name as well as in the
>> information messages you are mentioning. That's a simple enough
>> patch. Are you fine with this approach?
>
> Sounds reasonable so far.

OK, done so.

>> Regarding the application name, I still think it is useful to have it
>> though. pg_rewind should actually use it, and the other patch adding
>> the recovery routines will use it.
>
> Using the application_name connection parameter is fine, but I can't
> think of a reason to set it to "node_".$node->port instead of
> $node->name. And I can't think of a use for the $node->applname field
> once you have $node->name. What use case would benefit?

I have the applname stuff, and updated the log messages to use the node
name for clarity. The patch to address those points is attached.

Regards,
--
Michael

20151212_tap_node_name.patch
Description: binary/octet-stream
Re: [HACKERS] Disabling an index temporarily
Tatsuo Ishii wrote:
>> Wouldn't something like:
>>
>> ALTER INDEX foo SET DISABLED;
>>
>> seem more in line with our grammar?
>
> But this will affect other sessions, no?

Not if it is used in a transaction that ends with a ROLLBACK,
but then you might as well use DROP INDEX, except
that DROP INDEX takes an access exclusive lock.

Yours,
Laurenz Albe
Re: [HACKERS] Making tab-complete.c easier to maintain
On Sat, Dec 12, 2015 at 12:00 AM, Tom Lane wrote:
> Greg Stark writes:
>> So I don't think it makes sense to hold up improvements today hoping
>> for something like this.
>
> Yeah, it's certainly a wishlist item rather than something that should
> block shorter-term improvements.

OK, hence it seems that the next step is to move on with Thomas' stuff.
--
Michael
Re: [HACKERS] Error with index on unlogged table
On 2015-12-11 16:54:45 +0900, Michael Paquier wrote:
> + if (rel->rd_rel->relpersistence == RELPERSISTENCE_PERMANENT ||
> +     (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED &&
> +      forkNum == INIT_FORKNUM))
> +     log_smgrcreate(, forkNum);
>
> There should be a XLogIsNeeded() check as well.

RelationCreateStorage(), which creates the main fork, has no such check.
Seems better to stay symmetric, even if it's not strictly necessary.

> Removing the check on RELPERSISTENCE_UNLOGGED is fine as well... Not
> mandatory though :)

I've gone back and forth on this, I can't really convince myself either
way. We might end up reusing init forks for 'global temporary tables' or
some such, so being a bit stricter seems slightly better. Anyway, it's of
no actual consequence for now.

Andres
Re: [HACKERS] Using quicksort for every external sort run
On Sun, Dec 6, 2015 at 4:25 PM, Peter Geoghegan wrote:
> Maybe we should consider trying to get patch 0002 (the memory
> pool/merge patch) committed first, something Greg Stark suggested
> privately. That might actually be an easier way of integrating this
> work, since it changes nothing about the algorithm we use for merging
> (it only improves memory locality), and so is really an independent
> piece of work (albeit one that makes a huge overall difference due to
> the other patches increasing the time spent merging in absolute terms,
> and especially as a proportion of the total).

I have a question about the terminology used in this patch. What is a
"tuple proper"? What is it in contradistinction to? I would think that a
tuple located in its own palloc'ed space is the "proper" one, leaving a
tuple allocated in the bulk memory pool to be called... something else. I
don't know what the non-judgmental-sounding antonym of postpositive
"proper" is.

Also, if I am reading this correctly, when we refill a pool from a
logical tape we still transform each tuple as it is read from the disk
format to the memory format. This inflates the size quite a bit, at
least for single-datum tuples. If we instead just read the disk format
directly into the pool, and converted each tuple into the in-memory
format only when it came due for the merge heap, would that destroy the
locality of reference you are seeking to gain?

Cheers,
Jeff
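Jeff's eager-versus-lazy distinction can be modeled in a toy Python sketch (struct standing in for the on-disk tuple format; all names are invented, and this says nothing about what tuplesort.c actually does):

```python
import struct

DISK_TUPLE = struct.Struct("<q")  # toy on-disk format: one int64 datum

def refill_eager(raw):
    """What the email describes happening today: every tuple is expanded
    to the in-memory format while the pool is refilled."""
    return [DISK_TUPLE.unpack_from(raw, off)[0]
            for off in range(0, len(raw), DISK_TUPLE.size)]

def refill_lazy(raw):
    """The alternative being asked about: keep the disk format in the
    pool and hand out cheap references; convert only on demand."""
    return [(raw, off) for off in range(0, len(raw), DISK_TUPLE.size)]

def fetch(ref):
    """Convert one pooled tuple when it is about to enter the merge heap."""
    raw, off = ref
    return DISK_TUPLE.unpack_from(raw, off)[0]

raw = DISK_TUPLE.pack(7) + DISK_TUPLE.pack(9)
print(refill_eager(raw))           # -> [7, 9]
print(fetch(refill_lazy(raw)[1]))  # -> 9
```

The lazy variant keeps the pool at the (smaller) on-disk size; the open question in the email is whether deferring the conversion hurts the locality the pool was introduced to provide.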
Re: [HACKERS] Disabling an index temporarily
On 12/12/2015 11:42, Albe Laurenz wrote:
> Tatsuo Ishii wrote:
>>> Wouldn't something like:
>>>
>>> ALTER INDEX foo SET DISABLED;
>>>
>>> seem more in line with our grammar?
>>
>> But this will affect other sessions, no?
>
> Not if it is used in a transaction that ends with a ROLLBACK,
> but then you might as well use DROP INDEX, except
> that DROP INDEX takes an access exclusive lock.

Oleg and Teodor announced some time ago an extension for this exact use
case, see
http://www.postgresql.org/message-id/pine.lnx.4.64.0910062354510.6...@sn.sai.msu.ru

This also has the advantage of not needing an exclusive lock on the
index.

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org
Re: [HACKERS] Error with index on unlogged table
On 2015-12-12 20:49:52 +0900, Michael Paquier wrote:
> Should we consider this bug a 9.5 blocker?

I don't think so. But either way, I'm right now working on getting it
fixed anyway.
[HACKERS] psql tab completion bug for ALL IN TABLESPACE
Hi all,

I just bumped into the following thing while looking again at Thomas'
patch for psql tab completion:

--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -1040,7 +1040,7 @@ psql_completion(const char *text, int start, int end)
    pg_strcasecmp(prev5_wd, "IN") == 0 &&
    pg_strcasecmp(prev4_wd, "TABLESPACE") == 0 &&
    pg_strcasecmp(prev2_wd, "OWNED") == 0 &&
-   pg_strcasecmp(prev4_wd, "BY") == 0)
+   pg_strcasecmp(prev_wd, "BY") == 0)
 {
    COMPLETE_WITH_QUERY(Query_for_list_of_roles);

This should be backpatched; attached is the needed patch.

Regards,
--
Michael

20151212_psql_alltblspc.patch
Description: binary/octet-stream
Re: [HACKERS] Making tab-complete.c easier to maintain
On Wed, Dec 9, 2015 at 8:17 PM, Thomas Munro wrote:
> Thanks for looking at this Michael. It's probably not much fun to
> review! Here is a new version with that bit removed. More responses
> inline below.

Could this patch be rebased? There are now conflicts with 8b469bd7 and it
does not apply cleanly.
--
Michael
Re: [HACKERS] Error with index on unlogged table
On 2015-12-12 12:52:21 +0100, Andres Freund wrote: > On 2015-12-12 20:49:52 +0900, Michael Paquier wrote: > > Should we consider this bug a 9.5 blocker? > > I don't think so. But either way, I'm right now working on getting it > fixed anyway. And done. Took a bit longer than predicted - I had to adjust my test scripts to cope with < 9.4 not having tablespace mapping. Rather annoying. It'd be really convenient if tablespaces relative to the main data directory were supported...
Re: [HACKERS] Error with index on unlogged table
Michael Paquier writes: > Should we consider this bug a 9.5 blocker? I feel uneasy with the fact > of releasing a new major version knowing that we know some bugs on it, > and this one is not cool so I have added it in the list of open items. > We know the problem and there is a patch, so this is definitely > solvable. FWIW, general project policy has been that pre-existing bugs are not release blockers as such. However, if we have a fix in hand that we would like to get some field testing on, it's certainly desirable to push it out first in a beta or RC release, rather than having it first hit the field in stable-branch updates. So we might delay a beta/RC to make sure such a fix is available for testing. But that's not quite the same thing as a release blocker. To my mind, a release blocker is something that would make it undesirable for users to upgrade to the new release compared to the previous branch, so bugs the branches have in common are never that. regards, tom lane
Re: [HACKERS] Disabling an index temporarily
> Oleg and Teodor announced some time ago an extension for this exact use > case, see > http://www.postgresql.org/message-id/pine.lnx.4.64.0910062354510.6...@sn.sai.msu.ru > > This also has the advantage of not needing an exclusive lock on the index. Thanks for the info. I will try them out. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
Re: [HACKERS] Rework the way multixact truncations work
Noah, Robert, All On 2015-12-11 11:20:21 -0500, Robert Haas wrote: > On Thu, Dec 10, 2015 at 9:32 AM, Robert Haas wrote: > >> Adding a 'prevOffsetStopLimit' and using it seems like a ~5 line patch. > > > > So let's do that, then. > > Who is going to take care of this? Here's two patches: 1) The fix to SetOffsetVacuumLimit(). I've tested this by introducing a probabilistic "return false;" to find_multixact_start(), and torturing postgres by burning through billions of multixactids of various sizes. Behaves about as bad^Wgood as without the induced failures; before the patch there were moments of spurious warnings/errors when ->offsetStopLimit was out of whack. 2) A subset of the comment changes from Noah's repository. Some of the comment changes didn't make sense without the removal of SlruScanDirCbFindEarliest(), a small number of other changes I couldn't fully agree with. Noah, are you ok with pushing that subset of your changes? Is "Slightly edited subset of a larger patchset by Noah Misch" an OK attribution? Noah, on a first glance I think e50cca0ae ("Remove the SlruScanDirCbFindEarliest() test from TruncateMultiXact().") is a good idea. So I do encourage you to submit that as a separate patch. Regards, Andres From d26d41a96a7fc8dbe9827d55837875790b21ca1b Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Sat, 12 Dec 2015 17:57:21 +0100 Subject: [PATCH 1/2] Fix bug in SetOffsetVacuumLimit() triggered by find_multixact_start() failure. In case find_multixact_start() failed SetOffsetVacuumLimit() installed 0 into MultiXactState->offsetStopLimit. Unlike oldestOffset the to-be-installed value was not restored in the error branch. Luckily there are no known cases where find_multixact_start() will return an error in 9.5 and above. But if the bug is triggered, e.g.
due to filesystem permission issues, it'd be somewhat bad: GetNewMultiXactId() could continue allocating mxids even if close to a wraparound, or it could erroneously stop allocating mxids, even if no wraparound is looming. Luckily the wrong value would be corrected the next time SetOffsetVacuumLimit() is called, or by a restart. Reported-By: Noah Misch, although this is not his preferred fix Backpatch: 9.5, where the bug was introduced as part of 4f627f --- src/backend/access/transam/multixact.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c index b66a2b6..d2619bd 100644 --- a/src/backend/access/transam/multixact.c +++ b/src/backend/access/transam/multixact.c @@ -2552,6 +2552,7 @@ SetOffsetVacuumLimit(void) bool oldestOffsetKnown = false; bool prevOldestOffsetKnown; MultiXactOffset offsetStopLimit = 0; + MultiXactOffset prevOffsetStopLimit; /* * NB: Have to prevent concurrent truncation, we might otherwise try to @@ -2566,6 +2567,7 @@ SetOffsetVacuumLimit(void) nextOffset = MultiXactState->nextOffset; prevOldestOffsetKnown = MultiXactState->oldestOffsetKnown; prevOldestOffset = MultiXactState->oldestOffset; + prevOffsetStopLimit = MultiXactState->offsetStopLimit; Assert(MultiXactState->finishedStartup); LWLockRelease(MultiXactGenLock); @@ -2633,11 +2635,13 @@ SetOffsetVacuumLimit(void) { /* * If we failed to get the oldest offset this time, but we have a - * value from a previous pass through this function, use the old value - * rather than automatically forcing it. + * value from a previous pass through this function, use the old + * values rather than automatically forcing an emergency autovacuum + * cycle again. 
*/ oldestOffset = prevOldestOffset; oldestOffsetKnown = true; + offsetStopLimit = prevOffsetStopLimit; } /* Install the computed values */ -- 2.6.0.rc3 >From addafc27e7b89187a3f55a2cbce136d58722c97b Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Sat, 12 Dec 2015 17:53:41 +0100 Subject: [PATCH 2/2] Improve/Fix comments around multixact truncation. My commits 4f627f8 ("Rework the way multixact truncations work.") and aa29c1c ("Remove legacy multixact truncation support.") missed updating a number of comments. Fix that. Additionally improve accuracy of a few of the added comments. Reported-By: Noah Misch Author: Slightly edited subset of a larger patchset by Noah Misch Backpatch: 9.5, which is the first branch to contain the above commits --- src/backend/access/heap/README.tuplock | 6 ++ src/backend/access/transam/multixact.c | 36 +++--- src/backend/access/transam/xlog.c | 4 3 files changed, 22 insertions(+), 24 deletions(-) diff --git a/src/backend/access/heap/README.tuplock b/src/backend/access/heap/README.tuplock index 10b8d78..f845958 100644 --- a/src/backend/access/heap/README.tuplock +++ b/src/backend/access/heap/README.tuplock @@ -102,10 +102,8 @@ the MultiXacts in them are no longer of
Re: [HACKERS] Should TIDs be typbyval = FLOAT8PASSBYVAL to speed up CREATE INDEX CONCURRENTLY?
On Fri, Dec 11, 2015 at 5:35 PM, Peter Geoghegan wrote: > On Fri, Dec 11, 2015 at 2:26 PM, Corey Huinker > wrote: > > Sure, the machine we called "ninefivealpha", which incidentally, failed > to > > find a single bug in alpha2 thru beta2, is currently idle, and concurrent > > index creation times are a bugbear around these parts. Can somebody, > either > > in this thread or privately, outline what sort of a test they'd like to > see? > > Any kind of CREATE INDEX CONCURRENTLY test, before and after. > > I looked at a simple, random int4 column. That seems like a good case > to focus on, since there isn't too much other overhead. I think I > performed my test on an unlogged table, to make sure other overhead > was even further minimized. > > -- > Peter Geoghegan > What, if any, other load should be placed on the underlying table during the test? I ask because CIC statements that run in seconds on our staging machine can take many hours on our production machine, when most of the access is just reads, though those reads may have been part of a larger transaction that did updates elsewhere.
Re: [HACKERS] Bootstrap DATA is a pita
Mark Dilger writes: >> On Dec 11, 2015, at 2:54 PM, Caleb Welton wrote: >> Compare: >> CREATE FUNCTION lo_export(oid, text) RETURNS integer LANGUAGE internal >> STRICT AS 'lo_export' WITH (OID=765); >> >> DATA(insert OID = 765 ( lo_export PGNSP PGUID 12 1 0 0 0 f f f >> f t f v u 2 0 23 "26 25" _null_ _null_ _null_ _null_ _null_ lo_export _null_ >> _null_ _null_ )); > I would like to hear more about this idea. Are you proposing that we use > something > like the above CREATE FUNCTION format to express what is currently being > expressed > with DATA statements? Yes, that sort of idea has been kicked around some already, see the archives. > That is an interesting idea, though I don't know what exactly > that would look like. If you want to forward this idea, I'd be eager to hear > your thoughts. > If not, I'll try to make progress with my idea of tab delimited files and > such (or really, > Alvaro's idea of csv files that I only slightly corrupted). Personally I would like to see both approaches explored. Installing as much as we can via SQL commands is attractive for a number of reasons; but there is going to be an irreducible minimum amount of stuff that has to be inserted by something close to the current bootstrapping process. (And I'm not convinced that that "minimum amount" is going to be very small...) So it's not impossible that we'd end up accepting *both* types of patches, one to do more in the post-bootstrap SQL world and one to make the bootstrap data notation less cumbersome. In any case it would be useful to push both approaches forward some more before we make any decisions between them. BTW, there's another thing I'd like to see improved in this area, which is a problem already but will get a lot worse if we push more work into the post-bootstrap phase of initdb. That is that the post-bootstrap phase is both inefficient and impossible to debug.
If you've ever had a failure there, you'll have seen that the backend spits out an entire SQL script and says there's an error in it somewhere; that's because it gets the whole per-stage script as one submission. (Try introducing a syntax error somewhere in information_schema.sql, and you'll see what I mean.) Breaking the stage scripts down further would help, but that is unattractive because each one requires a fresh backend startup/shutdown, including a full checkpoint. I'd like to see things rejiggered so that there's only one post-bootstrap standalone backend session that performs all the steps, but initdb feeds it just one SQL command at a time so that errors are better localized. That should both speed up initdb noticeably and make debugging easier. regards, tom lane
[HACKERS] PATCH: add pg_current_xlog_flush_location function
Hi, attached is a patch adding a function pg_current_xlog_flush_location(), which proved quite useful when investigating the ext4 data loss bug. It's mostly what was already sent to that thread, except for docs that were missing in the initial version. I'll put this into 2016-01 CF. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 60b9a09..0d67b82 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -16755,6 +16755,9 @@ SELECT set_config('log_statement_stats', 'off', false); pg_create_restore_point +pg_current_xlog_flush_location + + pg_current_xlog_insert_location @@ -16811,6 +16814,13 @@ SELECT set_config('log_statement_stats', 'off', false); +pg_current_xlog_flush_location() + + pg_lsn + Get current transaction log flush location + + + pg_current_xlog_insert_location() pg_lsn @@ -16943,13 +16953,14 @@ postgres=# select pg_start_backup('label_goes_here'); pg_current_xlog_location displays the current transaction log write location in the same format used by the above functions. Similarly, pg_current_xlog_insert_location displays the current transaction log -insertion point. The insertion point is the logical end -of the transaction log -at any instant, while the write location is the end of what has actually -been written out from the server's internal buffers. The write location -is the end of what can be examined from outside the server, and is usually +insertion point and pg_current_xlog_flush_location displays the +current transaction log flush point. The insertion point is the logical +end of the transaction log at any instant, while the write location is the end of +what has actually been written out from the server's internal buffers and flush +location is the location guaranteed to be written to durable storage. 
The write +location is the end of what can be examined from outside the server, and is usually what you want if you are interested in archiving partially-complete transaction log -files. The insertion point is made available primarily for server +files. The insertion and flush points are made available primarily for server debugging purposes. These are both read-only operations and do not require superuser permissions. diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 71fc8ff..40a17e8 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -10632,6 +10632,19 @@ GetXLogWriteRecPtr(void) } /* + * Get latest WAL flush pointer + */ +XLogRecPtr +GetXLogFlushRecPtr(void) +{ + SpinLockAcquire(>info_lck); + LogwrtResult = XLogCtl->LogwrtResult; + SpinLockRelease(>info_lck); + + return LogwrtResult.Flush; +} + +/* * Returns the redo pointer of the last checkpoint or restartpoint. This is * the oldest point in WAL that we still need, if we have to restart recovery. */ diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c index 329bb8c..35c581d 100644 --- a/src/backend/access/transam/xlogfuncs.c +++ b/src/backend/access/transam/xlogfuncs.c @@ -195,7 +195,7 @@ pg_current_xlog_location(PG_FUNCTION_ARGS) } /* - * Report the current WAL insert location (same format as pg_start_backup etc) + * Report the current WAL flush location (same format as pg_start_backup etc) * * This function is mostly for debugging purposes. */ @@ -216,6 +216,27 @@ pg_current_xlog_insert_location(PG_FUNCTION_ARGS) } /* + * Report the current WAL insert location (same format as pg_start_backup etc) + * + * This function is mostly for debugging purposes. 
+ */ +Datum +pg_current_xlog_flush_location(PG_FUNCTION_ARGS) +{ + XLogRecPtr current_recptr; + + if (RecoveryInProgress()) + ereport(ERROR, +(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("recovery is in progress"), + errhint("WAL control functions cannot be executed during recovery."))); + + current_recptr = GetXLogFlushRecPtr(); + + PG_RETURN_LSN(current_recptr); +} + +/* * Report the last WAL receive location (same format as pg_start_backup etc) * * This is useful for determining how much of WAL is guaranteed to be received diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h index 790ca66..985291d 100644 --- a/src/include/access/xlog.h +++ b/src/include/access/xlog.h @@ -235,6 +235,7 @@ extern void GetXLogReceiptTime(TimestampTz *rtime, bool *fromStream); extern XLogRecPtr GetXLogReplayRecPtr(TimeLineID *replayTLI); extern XLogRecPtr GetXLogInsertRecPtr(void); extern XLogRecPtr GetXLogWriteRecPtr(void); +extern XLogRecPtr GetXLogFlushRecPtr(void); extern bool RecoveryIsPaused(void);
Re: [HACKERS] Disabling an index temporarily
On 11 December 2015 at 22:03, Joshua D. Drake wrote: > On 12/11/2015 06:25 PM, Tatsuo Ishii wrote: >> What about inventing a new SET command something like: >> >> SET disabled_index to <index name> >> >> This adds <index name> to the "disabled index list". The disabled index >> list lets the planner disregard the indexes in the list. >> >> SET enabled_index to <index name> >> >> This removes <index name> from the disabled index list. >> >> SHOW disabled_index >> >> This shows the content of the disabled index list. > > Wouldn't something like: > > ALTER INDEX foo SET DISABLED; > > Seem more in line with our grammar? > > I assume the index is only disabled as far as the planner is concerned and > all updates/inserts/deletes will still actually update the index > appropriately? > BTW, you can do that today with UPDATE pg_index SET indisvalid = false WHERE indexrelid = 'indexname'::regclass; -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: 24x7 support and training
Re: [HACKERS] Should TIDs be typbyval = FLOAT8PASSBYVAL to speed up CREATE INDEX CONCURRENTLY?
On Sat, Dec 12, 2015 at 9:48 AM, Corey Huinker wrote: > I ask because CIC statements that run in seconds on our staging machine can > take many hours on our production machine, when most of the access is just > reads, though those reads may have been part of a larger transaction that > did updates elsewhere. I don't think it's necessary to simulate any other load. -- Peter Geoghegan
Re: [HACKERS] [sqlsmith] Failed to generate plan on lateral subqueries
Greg Stark writes: > There may be other errors that would be surprising for Tom or Robert. What > I did with the string argument fuzzer was print counts for each sqlstate > for the errors and watch for errors that only occurred occasionally or > didn't make sense to me. > > Also, do you have any timeouts? I currently set statement_timeout to 1s to avoid wasting time letting postgres crunch numbers. Less than 0.5% of the queries run into this timeout. > Do you have any stats on how long these queries are taking to plan? > What's the longest query to plan you've found? No, I'm currently not logging timing aspects. The schema I let the instances log into is part of the repo[1]. > Do you have coverage data for the corpus? I do have some older numbers for line coverage from before the recent grammar extension:

| revision | overall | parser |
|----------+---------+--------|
| a4c1989  |    26.0 |   20.4 |

regards, Andreas Footnotes: [1] https://github.com/anse1/sqlsmith/blob/master/log.sql
Re: [HACKERS] WIP: Rework access method interface
On 2015-12-09 15:09, Alexander Korotkov wrote: > Patch was rebased against current master. Any notes about current version > of patch? It would be nice to commit it and continue work on other parts > of am extendability. The rebase seems broken, there are things missing in this version of the patch (for example the validation functions). -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [COMMITTERS] pgsql: Handle policies during DROP OWNED BY
On 12/11/15 4:12 PM, Stephen Frost wrote: > As with ACLs, the DROP OWNED BY caller must have permission to modify > the policy or a WARNING is thrown and no change is made to the policy. That warning doesn't tell the user anything about how to fix the situation or whether or why the situation is a problem and what to do about it.
Re: [HACKERS] [PATCH] Refactoring of LWLock tranches
On 2015-11-15 16:24:24 -0500, Robert Haas wrote: > I think what we should do about the buffer locks is polish up this > patch and get it applied: > > http://www.postgresql.org/message-id/20150907175909.gd5...@alap3.anarazel.de > > I think it needs to be adapted to use predefined constants for the > tranche IDs instead of hardcoded values, maybe based on the attached > tranche-constants.patch. Here's two patches doing that. The first is an adaption of your constants patch, using an enum and also converting xlog.c's locks. The second is the separation into distinct tranches. One thing to call out is that an "oversized" s_lock can now make BufferDesc exceed 64 bytes, right now that's just the case when it's larger than 4 bytes. I'm not sure if that's cause for real concern, given that it's not very concurrent or ancient platforms where that's the case. http://archives.postgresql.org/message-id/20150915020625.GI9666%40alap3.anarazel.de would alleviate that concern again, as it collapses flags, usage_count, buf_hdr_lock and refcount into one 32 bit int... Greetings, Andres Freund >From 5f10d977f285109df16bfef6b79456564251aa54 Mon Sep 17 00:00:00 2001 From: Andres FreundDate: Sat, 12 Dec 2015 18:41:59 +0100 Subject: [PATCH 1/2] Define enum of builtin LWLock tranches. It's a bit cumbersome having to allocate tranche IDs with LWLockNewTrancheId() because the returned value needs to be shared between backends, which all need to call LWLockRegisterTranche(), somehow. For builtin tranches we can easily do better, and simple pre-define tranche IDs for those. This is motivated by an upcoming patch adding further builtin tranches. 
Author: Robert Haas and Andres Freund Discussion: CA+TgmoYciHS4FuU2rYNt8bX0+7ZcNRBGeO-L20apcXTeo7=4...@mail.gmail.com --- src/backend/access/transam/xlog.c | 10 +++--- src/backend/storage/lmgr/lwlock.c | 7 --- src/include/storage/lwlock.h | 11 +++ 3 files changed, 18 insertions(+), 10 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 71fc8ff..147fd53 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -512,7 +512,6 @@ typedef struct XLogCtlInsert */ WALInsertLockPadded *WALInsertLocks; LWLockTranche WALInsertLockTranche; - int WALInsertLockTrancheId; } XLogCtlInsert; /* @@ -4653,7 +4652,7 @@ XLOGShmemInit(void) /* Initialize local copy of WALInsertLocks and register the tranche */ WALInsertLocks = XLogCtl->Insert.WALInsertLocks; - LWLockRegisterTranche(XLogCtl->Insert.WALInsertLockTrancheId, + LWLockRegisterTranche(LWTRANCHE_WAL_INSERT, >Insert.WALInsertLockTranche); return; } @@ -4677,17 +4676,14 @@ XLOGShmemInit(void) (WALInsertLockPadded *) allocptr; allocptr += sizeof(WALInsertLockPadded) * NUM_XLOGINSERT_LOCKS; - XLogCtl->Insert.WALInsertLockTrancheId = LWLockNewTrancheId(); - XLogCtl->Insert.WALInsertLockTranche.name = "WALInsertLocks"; XLogCtl->Insert.WALInsertLockTranche.array_base = WALInsertLocks; XLogCtl->Insert.WALInsertLockTranche.array_stride = sizeof(WALInsertLockPadded); - LWLockRegisterTranche(XLogCtl->Insert.WALInsertLockTrancheId, >Insert.WALInsertLockTranche); + LWLockRegisterTranche(LWTRANCHE_WAL_INSERT, >Insert.WALInsertLockTranche); for (i = 0; i < NUM_XLOGINSERT_LOCKS; i++) { - LWLockInitialize([i].l.lock, - XLogCtl->Insert.WALInsertLockTrancheId); + LWLockInitialize([i].l.lock, LWTRANCHE_WAL_INSERT); WALInsertLocks[i].l.insertingAt = InvalidXLogRecPtr; } diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c index b13ebc6..84691df 100644 --- a/src/backend/storage/lmgr/lwlock.c +++ b/src/backend/storage/lmgr/lwlock.c @@ 
-445,7 +445,7 @@ CreateLWLocks(void) /* Initialize all LWLocks in main array */ for (id = 0, lock = MainLWLockArray; id < numLocks; id++, lock++) - LWLockInitialize(>lock, 0); + LWLockInitialize(>lock, LWTRANCHE_MAIN); /* * Initialize the dynamic-allocation counters, which are stored just @@ -457,7 +457,7 @@ CreateLWLocks(void) LWLockCounter = (int *) ((char *) MainLWLockArray - 3 * sizeof(int)); LWLockCounter[0] = NUM_FIXED_LWLOCKS; LWLockCounter[1] = numLocks; - LWLockCounter[2] = 1; /* 0 is the main array */ + LWLockCounter[2] = LWTRANCHE_FIRST_USER_DEFINED; } if (LWLockTrancheArray == NULL) @@ -466,12 +466,13 @@ CreateLWLocks(void) LWLockTrancheArray = (LWLockTranche **) MemoryContextAlloc(TopMemoryContext, LWLockTranchesAllocated * sizeof(LWLockTranche *)); + Assert(LWLockTranchesAllocated >= LWTRANCHE_FIRST_USER_DEFINED); } MainLWLockTranche.name = "main"; MainLWLockTranche.array_base = MainLWLockArray; MainLWLockTranche.array_stride = sizeof(LWLockPadded); - LWLockRegisterTranche(0, ); + LWLockRegisterTranche(LWTRANCHE_MAIN, ); } /* diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h index 4653e09..9d22273
Re: [HACKERS] Using quicksort for every external sort run
On Sat, Dec 12, 2015 at 12:41 AM, Greg Stark wrote: > On Wed, Dec 9, 2015 at 2:44 AM, Peter Geoghegan wrote: >> On Tue, Dec 8, 2015 at 6:39 PM, Greg Stark wrote: >> >> I guess you mean insertion sort. What's the theoretical justification >> for the change? > > Well my thinking was that hard coding a series of comparisons would be > faster than a loop doing an O(n^2) algorithm even for small constants. > And sort networks are perfect for hard coded sorts because they do the > same comparisons regardless of the results of previous comparisons so > there are no branches. And even better the comparisons are as much as > possible independent of each other -- sort networks are typically > measured by the depth which assumes any comparisons between disjoint > pairs can be done in parallel. Even if it's implemented in serial the > processor is probably parallelizing some of the work. > > So I implemented a quick benchmark outside Postgres based on sorting > actual SortTuples with datum1 defined to be random 64-bit integers (no > nulls). Indeed the sort networks perform faster on average despite > doing more comparisons. That makes me think the cpu is indeed doing > some of the work in parallel. The open coded version you shared bloats the code by 37kB, I'm not sure it is pulling it's weight, especially given relatively heavy comparators. A quick index creation test on int4's profiled with perf shows about 3% of CPU being spent in the code being replaced. Any improvement on that is going to be too small to easily quantify. As the open coding doesn't help with eliminating control flow dependencies, my idea is to encode the sort network comparison order in an array and use that to drive a simple loop. The code size would be pretty similar to insertion sort and the loop overhead should mostly be hidden by the CPU OoO machinery. Probably won't help much, but would be interesting and simple enough to try out. Can you share your code for the benchmark so I can try it out?
Regards, Ants Aasma
Re: [HACKERS] Using quicksort for every external sort run
On Sat, Dec 12, 2015 at 7:42 PM, Ants Aasmawrote: > As the open coding doesn't help with eliminating control flow > dependencies, so my idea is to encode the sort network comparison > order in an array and use that to drive a simple loop. The code size > would be pretty similar to insertion sort and the loop overhead should > mostly be hidden by the CPU OoO machinery. Probably won't help much, > but would be interesting and simple enough to try out. Can you share > you code for the benchmark so I can try it out? I can. But the further results showing the number of comparisons is higher than for insertion sort have dampened my enthusiasm for the change. I'm assuming that even if it's faster for a simple integer or sort it'll be much slower for anything that requires calling out to the datatype comparator. I also hadn't actually measured what percentage of the sort was being spent in the insertion sort. I had guessed it would be higher. The test is attached. qsort_tuple.c is copied from tuplesort (with the ifdef for NOPRESORT added, but you could skip that if you want.). Compile with something like: gcc -DNOPRESORT -O3 -DCOUNTS -Wall -Wno-unused-function simd-sort-test.c -- greg #include #include #include #include #include #include #include #ifdef COUNTS unsigned nswaps; unsigned ncmps; #define countswap (nswaps++) #define countcmp (ncmps++) #else #define countswap ((void)0) #define countcmp ((void)0) #endif typedef void *Tuplesortstate; typedef long long int Datum; typedef int bool; typedef struct { void *tuple; /* the tuple proper */ Datum datum1; /* value of first key column */ bool isnull1; /* is first key column NULL? 
*/ int tupindex; /* see notes above */ } SortTuple; typedef SortTuple elem_t; typedef int (*SortTupleComparator) (const SortTuple *a, const SortTuple *b, Tuplesortstate *state); typedef struct SortSupportData *SortSupport; static int comparator(Datum a, Datum b, SortSupport ssup) { if (b > a) return -1; else if (b < a) return 1; else return 0; }; struct SortSupportData { bool ssup_nulls_first; bool ssup_reverse; int (*comparator)(Datum,Datum,SortSupport); }; struct SortSupportData ssup = {0, 0, comparator}; static inline int ApplySortComparator(Datum datum1, bool isNull1, Datum datum2, bool isNull2, SortSupport ssup) { int compare; countcmp; if (isNull1) { if (isNull2) compare = 0; /* NULL "=" NULL */ else if (ssup->ssup_nulls_first) compare = -1; /* NULL "<" NOT_NULL */ else compare = 1; /* NULL ">" NOT_NULL */ } else if (isNull2) { if (ssup->ssup_nulls_first) compare = 1; /* NOT_NULL ">" NULL */ else compare = -1; /* NOT_NULL "<" NULL */ } else { compare = (*ssup->comparator) (datum1, datum2, ssup); if (ssup->ssup_reverse) compare = -compare; } return compare; } #define CHECK_FOR_INTERRUPTS() do {} while (0) #define Min(a,b) ((a)<(b)?(a):(b)) #define Max(a,b) ((a)<(b)?(b):(a)) #include "qsort_tuple.c" #define mycmp(a,b) \ (ApplySortComparator(list[a].datum1, list[a].isnull1, \ list[b].datum1, list[b].isnull1, \ )) #define myswap(a,b) \ do { \ elem_t _tmp;\ _tmp = list[a];\ list[a] = list[b]; \ list[b] = _tmp;\ countswap; \ } while (0) #define myswapif(a,b)\ do { \ if (mycmp(a,b) > 0) \ myswap(a,b);\ } while (0) static int insertion_ok(unsigned N) { return N<1000; } static void insertion(elem_t *list, unsigned N) { if (N > 1000) { printf("insertion sort not feasible for large N\n"); exit(1); } for (unsigned pm = 1; pm < N; pm++) for (unsigned pl = pm; pl && mycmp(pl-1, pl) > 0; pl--) myswap(pl, pl-1); } static int sort_networks_ok(unsigned N) { return N<=8; } static void sort_networks(elem_t *list, unsigned N) { if (N > 8) { printf("sort_networks only implemented 
for N in 0..8\n"); exit(1); } switch(N) { case 0: case 1: break; case 2: myswapif(0,1); break; case 3: myswapif(0,1); myswapif(1,2); myswapif(0,1); break; case 4: myswapif(0,1); myswapif(2,3); myswapif(1,3); myswapif(0,2); myswapif(1,2); break; case 5: myswapif(1,2); myswapif(3,4); myswapif(1,3); myswapif(0,2); myswapif(2,4); myswapif(0,3); myswapif(0,1); myswapif(2,3); myswapif(1,2); break; case 6: myswapif(0,1); myswapif(2,3); myswapif(4,5); myswapif(0,2); myswapif(3,5); myswapif(1,4); myswapif(0,1); myswapif(2,3); myswapif(4,5); myswapif(1,2); myswapif(3,4); myswapif(2,3); break; case 7: myswapif(1,2); myswapif(3,4); myswapif(5,6); myswapif(0,2); myswapif(4,6); myswapif(3,5); myswapif(2,6); myswapif(1,5); myswapif(0,4); myswapif(2,5); myswapif(0,3); myswapif(2,4); myswapif(1,3); myswapif(0,1); myswapif(2,3); myswapif(4,5); break; case 8: myswapif(0, 1); myswapif(2, 3); myswapif(4, 5); myswapif(6, 7); myswapif(0, 3); myswapif(1, 2); myswapif(4, 7); myswapif(5, 6);
Re: [HACKERS] [PATCH] Refactoring of LWLock tranches
On Sat, Dec 12, 2015 at 1:17 PM, and...@anarazel.de wrote:
> On 2015-11-15 16:24:24 -0500, Robert Haas wrote:
>> I think what we should do about the buffer locks is polish up this
>> patch and get it applied:
>>
>> http://www.postgresql.org/message-id/20150907175909.gd5...@alap3.anarazel.de
>>
>> I think it needs to be adapted to use predefined constants for the
>> tranche IDs instead of hardcoded values, maybe based on the attached
>> tranche-constants.patch.
>
> Here's two patches doing that. The first is an adaption of your
> constants patch, using an enum and also converting xlog.c's locks. The
> second is the separation into distinct tranches.

Personally, I prefer the #define approach to the enum, but I can live
with doing it this way.  Other than that, I think these patches look
good, although if it's OK with you I would like to make a pass over the
comments and the commit messages, which seem to me like they could
benefit from a bit of editing (but not much substantive change).

> One thing to call out is that an "oversized" s_lock can now make
> BufferDesc exceed 64 bytes, right now that's just the case when it's
> larger than 4 bytes. I'm not sure if that's cause for real concern,
> given that it's not very concurrent or ancient platforms where that's
> the case.
> http://archives.postgresql.org/message-id/20150915020625.GI9666%40alap3.anarazel.de
> would alleviate that concern again, as it collapses flags, usage_count,
> buf_hdr_lock and refcount into one 32 bit int...

I don't think that would be worth worrying about even if we didn't have
a plan in mind that would make it go away again, and even less so given
that we do have such a plan.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Aggregate
On Fri, Dec 11, 2015 at 4:42 PM, David Rowley wrote:
> On 12 December 2015 at 04:00, Robert Haas wrote:
>> I'd like to commit David Rowley's patch from the other thread first,
>> and then deal with this one afterwards.  The only thing I feel
>> strongly needs to be changed in that patch is CFUNC -> COMBINEFUNC,
>> for clarity.
>
> I have addressed that in my local copy. I'm now just working on adding some
> test code which uses the new infrastructure.

Excellent.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [COMMITTERS] pgsql: Handle policies during DROP OWNED BY
On Sat, Dec 12, 2015 at 4:35 PM, Stephen Frostwrote: > * Peter Eisentraut (pete...@gmx.net) wrote: >> On 12/11/15 4:12 PM, Stephen Frost wrote: >> > As with ACLs, the DROP OWNED BY caller must have permission to modify >> > the policy or a WARNING is thrown and no change is made to the policy. >> >> That warning doesn't tell the user anything about how to fix the >> situation or whether or why the situation is a problem and what to do >> about it. > > I modeled it after the other warnings which are output by DROP OWNED BY > when it's unable to perform the requested drop. I'm not against trying > to add something, but you tend to get a bunch of those messages at once > which means having a hint would result in a bunch of repeated messages > and I don't think that'd be very helpful. Further, it's essentially a > 'permission denied' type of error, which generally means that the > individual who is running it can't do anything to fix it anyway. > > I'm not against looking to improve things here, but I don't think just > trying to make a change here makes sense. We could throw a warning+hint > at the end of DROP OWNED, if anything wasn't able to be dropped, which > provided more information, perhaps. I'm not convinced that would really > be very useful to the individual running the command and would need to, > in essence, be "please get someone with higher privileges to run this, > or get them to give you permission to run it". I don't think we really > want to go there (anyone else recall the "please see your network > administrator" errors..?). > > If I'm misunderstanding your thoughts here, please let me know. This appears to address one of the open items at https://wiki.postgresql.org/wiki/PostgreSQL_9.5_Open_Items -- if so, please update that page. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Using a single standalone-backend run in initdb (was Re: [HACKERS] Bootstrap DATA is a pita)
> On Dec 12, 2015, at 3:42 PM, Tom Lanewrote: > > Joe Conway writes: >> On 12/12/2015 02:31 PM, Tom Lane wrote: >>> I'm not particularly wedded to this rule. In principle we could go so >>> far as to import psql's code that parses commands and figures out which >>> semicolons are command terminators --- but that is a pretty large chunk >>> of code, and I think it'd really be overkill considering that initdb >>> deals only with fixed input scripts. But if anyone has another simple >>> rule for breaking SQL into commands, we can certainly discuss >>> alternatives. > >> Possibly inadequate, but I wrote a get_one_query() function to grab one >> statement at a time from a possibly multi-statement string and it isn't >> all that many lines of code: >> https://github.com/jconway/pgsynck/blob/master/pgsynck.c > > Hmm. Doesn't look like that handles semicolons embedded in CREATE RULE; > for that you'd have to track parenthesis nesting as well. (It's arguable > that we won't ever need that case during initdb, but I'd just as soon not > wire in such an assumption.) In general, though, I'd rather not try to > teach InteractiveBackend() such a large amount about SQL syntax. I use CREATE RULE within startup files in the fork that I maintain. I have lots of them, totaling perhaps 50k lines of rule code. I don't think any of that code would have a problem with the double-newline separation you propose, which seems a more elegant solution to me. Admittedly, the double-newline separation would need to be documented at the top of each sql file, otherwise it would be quite surprising to those unfamiliar with it. You mentioned upthread that introducing a syntax error in one of these files results in a not-so-helpful error message that dumps the contents of the entire file. I can confirm that happens, and is hardly useful. mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rework the way multixact truncations work
On Sat, Dec 12, 2015 at 12:02 PM, Andres Freundwrote: > Noah, Robert, All > > On 2015-12-11 11:20:21 -0500, Robert Haas wrote: >> On Thu, Dec 10, 2015 at 9:32 AM, Robert Haas wrote: >> >> Adding a 'prevOffsetStopLimit' and using it seems like a ~5 line patch. >> > >> > So let's do that, then. >> >> Who is going to take care of this? > > Here's two patches: > > 1) The fix to SetOffsetVacuumLimit(). > >I've tested this by introducing a probabilistic "return false;" to >find_multixact_start(), and torturing postgres by burning through >billions of multixactids of various sizes. Behaves about as >bad^Wgood as without the induced failures; before the patch there >were moments of spurious warnings/errors when ->offsetStopLimit was >out of whack. I find the commit message you wrote a little difficult to read, and propose the following version instead, which reads better to me: Previously, if find_multixact_start() failed, SetOffsetVacuumLimit() would install 0 into MultiXactState->offsetStopLimit. Luckily, there are no known cases where find_multixact_start() will return an error in 9.5 and above. But if it were to happen, for example due to filesystem permission issues, it'd be somewhat bad: GetNewMultiXactId() could continue allocating mxids even if close to a wraparound, or it could erroneously stop allocating mxids, even if no wraparound is looming. The wrong value would be corrected the next time SetOffsetVacuumLimit() is called, or by a restart. (I have no comments on the substance of either patch and have reviewed the first one to a negligible degree - it doesn't look obviously wrong - and the second one not at all.) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [PATCH] PostGIS Doc Urls
The attached patch changes web references to PostGIS to point to the actual community site (postgis.net) which is active, rather than the historical site (postgis.org). P. -- Paul Ramsey http://cleverelephant.ca http://postgis.net postgis_doc.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Using quicksort for every external sort run
On Sat, Dec 12, 2015 at 2:28 PM, Peter Geogheganwrote: > On Sat, Dec 12, 2015 at 12:10 AM, Jeff Janes wrote: >> I have a question about the terminology used in this patch. What is a >> tuple proper? What is it in contradistinction to? I would think that >> a tuple which is located in its own palloc'ed space is the "proper" >> one, leaving a tuple allocated in the bulk memory pool to be >> called...something else. I don't know what the >> non-judgmental-sounding antonym of postpositive "proper" is. > > "Tuple proper" is a term that appears 5 times in tuplesort.c today. As > it says at the top of that file: > > /* > * The objects we actually sort are SortTuple structs. These contain > * a pointer to the tuple proper (might be a MinimalTuple or IndexTuple), > * which is a separate palloc chunk --- we assume it is just one chunk and > * can be freed by a simple pfree(). SortTuples also contain the tuple's > * first key column in Datum/nullflag format, and an index integer. Those usages make sense to me, as they are locally self-contained and it is clear what they are in contradistinction to. But your usage is spread throughout (even in function names, not just comments) and seems to contradict the current usage as yours are not separately palloced, as the "proper" ones described here are. I think that "proper" only works when the same comment also defines the alternative, rather than as some file-global description. Maybe "pooltuple" rather than "tupleproper" > >> Also, if I am reading this correctly, when we refill a pool from a >> logical tape we still transform each tuple as it is read from the disk >> format to the memory format. This inflates the size quite a bit, at >> least for single-datum tuples. If we instead just read the disk >> format directly into the pool, and converted them into the in-memory >> format when each tuple came due for the merge heap, would that destroy >> the locality of reference you are seeking to gain? 
> > Are you talking about alignment? Maybe alignment, but also the size of the SortTuple struct itself, which is not present on tape but is present in memory if I understand correctly. When reading 128kb (32 blocks) worth of in-memory pool, it seems like it only gets to read 16 to 18 blocks of tape to fill them up, in the case of building an index on single column 32-byte random md5 digests. I don't exactly know where all of that space goes, I'm taking an experimentalist approach. Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
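Jeff's observation (32 blocks of in-memory pool filled from only 16-18 blocks of tape) is roughly what back-of-envelope arithmetic predicts once the per-tuple SortTuple entry and palloc overhead are added to the on-tape size. The sketch below is purely illustrative: the 40-byte on-tape and 48-byte palloc'd sizes are guesses for a 32-byte md5 datum, not measured values, and `SortTuple` here is a mock of tuplesort.c's struct:

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

typedef uintptr_t Datum;

/* Mock of tuplesort.c's in-memory SortTuple array element */
typedef struct
{
	void	   *tuple;			/* pointer to the tuple proper */
	Datum		datum1;			/* first key column */
	bool		isnull1;
	int			tupindex;
} SortTuple;

/* Illustrative constants -- assumptions, not measurements */
enum
{
	BLCKSZ_ASSUMED = 8192,		/* PostgreSQL's default block size */
	TAPE_TUPLE_SZ = 40,			/* guess: 32-byte datum + on-tape overhead */
	PALLOC_TUPLE_SZ = 48		/* guess: same tuple palloc'd, with header */
};

/* Tape blocks consumed to fill `poolblocks` blocks of in-memory pool */
static int
tape_blocks_read(int poolblocks)
{
	double		mem_per_tuple = sizeof(SortTuple) + PALLOC_TUPLE_SZ;
	double		ntuples = poolblocks * (double) BLCKSZ_ASSUMED / mem_per_tuple;

	return (int) (ntuples * TAPE_TUPLE_SZ / BLCKSZ_ASSUMED + 0.5);
}
```

With a 24-byte SortTuple (typical 64-bit layout), `tape_blocks_read(32)` lands in the high teens, consistent with the 16-18 blocks Jeff reports.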
Re: [HACKERS] [sqlsmith] Failed to generate plan on lateral subqueries
On Sun, Dec 13, 2015 at 12:04:34AM +, Greg Stark wrote:
> On Sat, Dec 12, 2015 at 8:30 PM, Andreas Seltenreich wrote:
> Did you publish the source already? I haven't been following all
> along, sorry if these are all answered questions.

Either he has, or the following is a truly astonishing coincidence:

https://github.com/anse1/sqlsmith

Cheers,
David.
-- 
David Fetter http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: Using a single standalone-backend run in initdb (was Re: [HACKERS] Bootstrap DATA is a pita)
Andres Freund writes:
> That's cool too. Besides processing the .bki files, and there largely
> reg*_in, the many restarts are the most expensive parts of initdb.

BTW, in case anyone is doubting it, I did a little bit of "perf" tracing
and confirmed Andres' comment here: more than 50% of the runtime of the
bootstrap phase is eaten by the pg_proc seqscans performed by regprocin.
There's nothing else amounting to more than a few percent, so basically
nothing else in bootstrap is worth optimizing before we fix that.

			regards, tom lane
Re: [HACKERS] [PATCH] PostGIS Doc Urls
Paul Ramsey writes:
> The attached patch changes web references to PostGIS to point to the actual
> community site (postgis.net) which is active, rather than the historical site
> (postgis.org).

Pushed, thanks.

			regards, tom lane
Re: [HACKERS] [sqlsmith] Failed to generate plan on lateral subqueries
Greg Stark writes:
> On Sat, Dec 12, 2015 at 8:30 PM, Andreas Seltenreich wrote:
> When you hit the timeout is this implemented in your fuzzer or using
> statement_timeout? If the former, can you add a statement_timeout of
> just short of the timeout in the fuzzer and find cases where the
> planner might not be calling CHECK_FOR_INTERRUPTS frequently enough?

It's the latter.  I don't think I can add a client-side timeout into
sqlsmith elegantly.  IMHO it's better to write another test tool that
just re-runs the queries that were logged with a timeout by sqlsmith
and investigates their timeout-behavior more closely.

>> I do have some older numbers for line coverage from before the recent
>> grammar extension:
>
> If you have a corpus of queries in a simple format it would be pretty
> convenient to add them in a regression test and then run make coverage
> to get html reports.

Hmm, I thought I found a workflow that would yield sqlsmith's coverage
without integrating it into the regression tests.  This is what I did:

    make install
    initdb /tmp/gcov
    pg_ctl -D /tmp/gcov start
    make installcheck
    pg_ctl -D /tmp/gcov stop
    make coverage-clean
    pg_ctl -D /tmp/gcov start
    sqlsmith --target='dbname=regression' --max-queries=1
    pg_ctl -D /tmp/gcov stop
    make coverage-html

It seems to yield a pure sqlsmith-only coverage report, as a "make
coverage-html" before the "make coverage-clean" yields a report with
much higher score.  Maybe there are drawbacks to the workflow you are
suggesting?

I just re-did it with the current sqlsmith code, and it's up by 25%
compared to the latest tested revision:

    | revision | overall | parser |
    |----------+---------+--------|
    | a4c1989  |    26.0 |   20.4 |
    | ee099e6  |    33.8 |   25.8 |

I also put the report here, in case someone wants to look at certain
details, or make suggestions into what directions to best extend the
grammar to increase coverage.

http://ansel.ydns.eu/~andreas/coverage/
http://ansel.ydns.eu/~andreas/gcov.tar.xz

> Did you publish the source already? I haven't been following all
> along, sorry if these are all answered questions.

It's not had a proper release yet, but the code is available via github
in all its rapid-prototypesque glory:

https://github.com/anse1/sqlsmith

regards,
Andreas
Re: [HACKERS] [COMMITTERS] pgsql: Handle policies during DROP OWNED BY
* Peter Eisentraut (pete...@gmx.net) wrote: > On 12/11/15 4:12 PM, Stephen Frost wrote: > > As with ACLs, the DROP OWNED BY caller must have permission to modify > > the policy or a WARNING is thrown and no change is made to the policy. > > That warning doesn't tell the user anything about how to fix the > situation or whether or why the situation is a problem and what to do > about it. I modeled it after the other warnings which are output by DROP OWNED BY when it's unable to perform the requested drop. I'm not against trying to add something, but you tend to get a bunch of those messages at once which means having a hint would result in a bunch of repeated messages and I don't think that'd be very helpful. Further, it's essentially a 'permission denied' type of error, which generally means that the individual who is running it can't do anything to fix it anyway. I'm not against looking to improve things here, but I don't think just trying to make a change here makes sense. We could throw a warning+hint at the end of DROP OWNED, if anything wasn't able to be dropped, which provided more information, perhaps. I'm not convinced that would really be very useful to the individual running the command and would need to, in essence, be "please get someone with higher privileges to run this, or get them to give you permission to run it". I don't think we really want to go there (anyone else recall the "please see your network administrator" errors..?). If I'm misunderstanding your thoughts here, please let me know. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] PATCH: track last known XLOG segment in control file
Hi, On 2015-12-12 22:14:13 +0100, Tomas Vondra wrote: > this is the second improvement proposed in the thread [1] about ext4 data > loss issue. It adds another field to control file, tracking the last known > WAL segment. This does not eliminate the data loss, just the silent part of > it when the last segment gets lost (due to forgetting the rename, deleting > it by mistake or whatever). The patch makes sure the cluster refuses to > start if that happens. Uh, that's fairly expensive. In many cases it'll significantly increase the number of fsyncs. I've a bit of a hard time believing this'll be worthwhile. Additionally this doesn't seem to take WAL replay into account? Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Using quicksort for every external sort run
On Sat, Dec 12, 2015 at 12:10 AM, Jeff Janeswrote: > I have a question about the terminology used in this patch. What is a > tuple proper? What is it in contradistinction to? I would think that > a tuple which is located in its own palloc'ed space is the "proper" > one, leaving a tuple allocated in the bulk memory pool to be > called...something else. I don't know what the > non-judgmental-sounding antonym of postpositive "proper" is. "Tuple proper" is a term that appears 5 times in tuplesort.c today. As it says at the top of that file: /* * The objects we actually sort are SortTuple structs. These contain * a pointer to the tuple proper (might be a MinimalTuple or IndexTuple), * which is a separate palloc chunk --- we assume it is just one chunk and * can be freed by a simple pfree(). SortTuples also contain the tuple's * first key column in Datum/nullflag format, and an index integer. > Also, if I am reading this correctly, when we refill a pool from a > logical tape we still transform each tuple as it is read from the disk > format to the memory format. This inflates the size quite a bit, at > least for single-datum tuples. If we instead just read the disk > format directly into the pool, and converted them into the in-memory > format when each tuple came due for the merge heap, would that destroy > the locality of reference you are seeking to gain? Are you talking about alignment? -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: track last known XLOG segment in control file
On 12/12/2015 11:20 PM, Andres Freund wrote: Hi, On 2015-12-12 22:14:13 +0100, Tomas Vondra wrote: this is the second improvement proposed in the thread [1] about ext4 data loss issue. It adds another field to control file, tracking the last known WAL segment. This does not eliminate the data loss, just the silent part of it when the last segment gets lost (due to forgetting the rename, deleting it by mistake or whatever). The patch makes sure the cluster refuses to start if that happens. Uh, that's fairly expensive. In many cases it'll significantly increase the number of fsyncs. It should do exactly 1 additional fsync per WAL segment. Or do you think otherwise? > I've a bit of a hard time believing this'll be worthwhile. The trouble is protections like this only seem worthwhile after the fact, when something happens. I think it's reasonable protection against issues similar to the one I reported ~2 weeks ago. YMMV. > Additionally this doesn't seem to take WAL replay into account? I think the comparison in StartupXLOG needs to be less strict, to allow cases when we actually replay more WAL segments. Is that what you mean? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Using a single standalone-backend run in initdb (was Re: [HACKERS] Bootstrap DATA is a pita)
On 12/12/2015 02:31 PM, Tom Lane wrote:
> I'm not particularly wedded to this rule.  In principle we could go so
> far as to import psql's code that parses commands and figures out which
> semicolons are command terminators --- but that is a pretty large chunk
> of code, and I think it'd really be overkill considering that initdb
> deals only with fixed input scripts.  But if anyone has another simple
> rule for breaking SQL into commands, we can certainly discuss
> alternatives.

Possibly inadequate, but I wrote a get_one_query() function to grab one
statement at a time from a possibly multi-statement string and it isn't
all that many lines of code:
https://github.com/jconway/pgsynck/blob/master/pgsynck.c

> Anyway, the attached patch tweaks postgres.c to follow that rule instead
> of slurp-to-EOF when -j is given.  I doubt that being non-backwards-
> compatible is a problem here; in fact, I'm tempted to rip out the -j
> switch altogether and just have standalone mode always parse input the
> same way.  Does anyone know of people using standalone mode other than
> for initdb?

sepgsql uses it for installation, but it does not appear to use -j. I'm
not sure why it is required, but at some point I'd like to dig into that.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development

signature.asc
Description: OpenPGP digital signature
Re: [HACKERS] Should TIDs be typbyval = FLOAT8PASSBYVAL to speed up CREATE INDEX CONCURRENTLY?
On Fri, Dec 11, 2015 at 9:13 AM, Robert Haas wrote:
> Also, I'd be in favor of you updating the patch to reflect the
> comments from Tom and Simon on November 17th.

Attached revision:

* Has more worked out comments on encoding, per Simon's request.

* Uses Tom's preferred formulation for encoding TIDs (which involves
  unsigned integer casts).

* Sets indexcursor = NULL when the tuplesort is empty, just to be tidy,
  per Tom's request.

-- 
Peter Geoghegan

From dd15ae3d0a2bc29fbcefebe2ae08834d2e247070 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan
Date: Sat, 14 Nov 2015 16:57:20 -0800
Subject: [PATCH] Speed up CREATE INDEX CONCURRENTLY's TID sort

Add a simple TID encoding scheme.  This is used for encoding and
decoding TIDs to and from int8/int64 values, for the benefit of an
expensive, extra tuplesort operation performed by CREATE INDEX
CONCURRENTLY.  The sort now uses the default int8 opclass to sort values
in the ad-hoc encoding, and so benefits from the availability of int8
SortSupport.

Despite the fact that item pointers take up only 6 bytes on disk, the
TID type is always pass-by-reference due to considerations about how
Datums are represented and accessed.  On the other hand, int8 is
usually, in practice, a pass-by-value type.

Testing shows this approach makes CREATE INDEX CONCURRENTLY
significantly faster on platforms where int8 is pass-by-value,
especially when the big saving in memory within tuplesort.c prevents the
sort from being performed externally.  The approach will help to some
degree on all platforms, though.
---
 src/backend/catalog/index.c | 71 ++---
 1 file changed, 67 insertions(+), 4 deletions(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index e59b163..c10be3d 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -109,6 +109,8 @@ static void index_update_stats(Relation rel,
 static void IndexCheckExclusion(Relation heapRelation,
 					Relation indexRelation,
 					IndexInfo *indexInfo);
+static inline int64 itemptr_encode(ItemPointer itemptr);
+static inline void itemptr_decode(ItemPointer itemptr, int64 encoded);
 static bool validate_index_callback(ItemPointer itemptr, void *opaque);
 static void validate_index_heapscan(Relation heapRelation,
 					Relation indexRelation,
@@ -2832,7 +2834,13 @@ validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
 	ivinfo.num_heap_tuples = heapRelation->rd_rel->reltuples;
 	ivinfo.strategy = NULL;
 
-	state.tuplesort = tuplesort_begin_datum(TIDOID, TIDLessOperator,
+	/*
+	 * Encode TIDs as int8 values for the sort, rather than directly sorting
+	 * item pointers.  This can be significantly faster, primarily because TID
+	 * is a pass-by-reference type on all platforms, whereas int8 is
+	 * pass-by-value on most platforms.
+	 */
+	state.tuplesort = tuplesort_begin_datum(INT8OID, Int8LessOperator,
 											InvalidOid, false,
 											maintenance_work_mem,
 											false);
@@ -2872,14 +2880,55 @@ validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
 }
 
 /*
+ * itemptr_encode - Encode ItemPointer as int64/int8
+ *
+ * This representation must produce values encoded as int64 that sort in the
+ * same order as their corresponding original TID values would (using the
+ * default int8 opclass to produce a result equivalent to the default TID
+ * opclass).
+ *
+ * As noted in validate_index(), this can be significantly faster.
+ */
+static inline int64
+itemptr_encode(ItemPointer itemptr)
+{
+	BlockNumber block = ItemPointerGetBlockNumber(itemptr);
+	OffsetNumber offset = ItemPointerGetOffsetNumber(itemptr);
+	int64		encoded;
+
+	/*
+	 * Use the 16 least significant bits for the offset.  32 adjacent bits are
+	 * used for the block number.  Since remaining bits are unused, there
+	 * cannot be negative encoded values (We assume a two's complement
+	 * representation).
+	 */
+	encoded = ((uint64) block << 16) | (uint16) offset;
+
+	return encoded;
+}
+
+/*
+ * itemptr_decode - Decode int64/int8 representation back to ItemPointer
+ */
+static inline void
+itemptr_decode(ItemPointer itemptr, int64 encoded)
+{
+	BlockNumber block = (BlockNumber) (encoded >> 16);
+	OffsetNumber offset = (OffsetNumber) (encoded & 0xffff);
+
+	ItemPointerSet(itemptr, block, offset);
+}
+
+/*
  * validate_index_callback - bulkdelete callback to collect the index TIDs
  */
 static bool
 validate_index_callback(ItemPointer itemptr, void *opaque)
 {
 	v_i_state  *state = (v_i_state *) opaque;
+	int64		encoded = itemptr_encode(itemptr);
 
-	tuplesort_putdatum(state->tuplesort, PointerGetDatum(itemptr), false);
+	tuplesort_putdatum(state->tuplesort, Int64GetDatum(encoded), false);
 	state->itups += 1;
 	return false;				/* never actually delete anything */
 }
@@ -2911,6 +2960,7 @@ validate_index_heapscan(Relation heapRelation,
 	/* state variables for the merge */
 	ItemPointer indexcursor = NULL;
+
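The encoding's key property — that comparing the int64 values agrees with (block, offset) order — can be checked outside the backend with a small mock. `TidMock` and the `*_mock` names below are stand-ins for illustration, not PostgreSQL code; the bit layout matches the patch:

```c
#include <stdint.h>

typedef uint32_t BlockNumber;
typedef uint16_t OffsetNumber;

/* Minimal stand-in for ItemPointerData, enough to exercise the encoding */
typedef struct
{
	BlockNumber block;
	OffsetNumber offset;
} TidMock;

/* 32 bits of block number above 16 bits of offset, as in the patch */
static int64_t
itemptr_encode_mock(const TidMock *tid)
{
	return (int64_t) (((uint64_t) tid->block << 16) | (uint16_t) tid->offset);
}

/* Inverse: recover (block, offset) from the encoded value */
static void
itemptr_decode_mock(TidMock *tid, int64_t encoded)
{
	tid->block = (BlockNumber) (encoded >> 16);
	tid->offset = (OffsetNumber) (encoded & 0xffff);
}
```

Since the unused high bits are zero, the encoded values are always non-negative, so plain int64 comparison orders first by block and then by offset, matching the default TID opclass ordering.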
Using a single standalone-backend run in initdb (was Re: [HACKERS] Bootstrap DATA is a pita)
I wrote: > BTW, there's another thing I'd like to see improved in this area, which is > a problem already but will get a lot worse if we push more work into the > post-bootstrap phase of initdb. That is that the post-bootstrap phase is > both inefficient and impossible to debug. If you've ever had a failure > there, you'll have seen that the backend spits out an entire SQL script > and says there's an error in it somewhere; that's because it gets the > whole per-stage script as one submission. (Try introducing a syntax error > somewhere in information_schema.sql, and you'll see what I mean.) > Breaking the stage scripts down further would help, but that is > unattractive because each one requires a fresh backend startup/shutdown, > including a full checkpoint. I'd like to see things rejiggered so that > there's only one post-bootstrap standalone backend session that performs > all the steps, but initdb feeds it just one SQL command at a time so that > errors are better localized. That should both speed up initdb noticeably > and make debugging easier. I thought this sounded like a nice lazy-Saturday project, so I started poking at it, and attached is a WIP patch. The core issue that has to be dealt with is that standalone-backend mode currently has just two rules for deciding when to stop collecting input and execute the command buffer, and they both suck: 1. By default, execute after every newline. (Actually, you can quote a newline with a backslash, but we don't use that ability anywhere.) 2. With -j, slurp the entire input until EOF, and execute it as one giant multicommand string. We're doing #2 to handle information_schema.sql and the other large SQL scripts that initdb runs, which is why the response to an error in those scripts is so yucky. After some experimentation, I came up with the idea of executing any time that a semicolon followed by two newlines is seen. This nicely breaks up input like information_schema.sql. 
There are probably some residual places where more than one command is
executed in a single string, but we could fix that with some more
newlines.  Obviously, this rule is capable of being fooled if you have a
newline followed by a blank line in a comment or quoted literal --- but
it turns out that no such case exists anywhere in initdb's data.

I'm not particularly wedded to this rule.  In principle we could go so
far as to import psql's code that parses commands and figures out which
semicolons are command terminators --- but that is a pretty large chunk
of code, and I think it'd really be overkill considering that initdb
deals only with fixed input scripts.  But if anyone has another simple
rule for breaking SQL into commands, we can certainly discuss
alternatives.

Anyway, the attached patch tweaks postgres.c to follow that rule instead
of slurp-to-EOF when -j is given.  I doubt that being
non-backwards-compatible is a problem here; in fact, I'm tempted to rip
out the -j switch altogether and just have standalone mode always parse
input the same way.  Does anyone know of people using standalone mode
other than for initdb?

The other part of the patch modifies initdb to do all its post-bootstrap
steps using a single standalone backend session.  I had to remove the
code that currently prints out progress markers for individual phases of
that processing, so that now you get output that looks like

    creating directory /home/postgres/testversion/data ... ok
    creating subdirectories ... ok
    selecting default max_connections ... 100
    selecting default shared_buffers ... 128MB
    selecting dynamic shared memory implementation ... posix
    creating configuration files ... ok
    creating template1 database in /home/postgres/testversion/data/base/1 ... ok
    performing post-bootstrap initialization ... ok
    syncing data to disk ... ok

Since initdb is just printing to the backend in an open-loop fashion, it
doesn't really know whether each command succeeds; in fact it is usually
pretty far ahead of the backend, until it does pclose() which waits for
the subprocess to exit.  So we can't readily keep the old progress
markers.  I don't think we'll miss them though.  The whole
"post-bootstrap initialization" step only takes a second or so on my
main dev machine, so breaking down the progress more finely isn't that
useful anymore.

I had to change ;\n to ;\n\n in a few places in initdb's internal
scripts, to ensure that VACUUM commands were executed by themselves
(otherwise you get "VACUUM can't run in a transaction block" type
failures).  I wasn't very thorough about that though, pending a decision
on exactly what the new command-boundary rule will be.

The upshot of these changes is that initdb runs about 10% faster overall
(more in -N mode), which is a useful savings.  Also, the response to a
syntax error in information_schema.sql now looks like this:

    creating template1 database in /home/postgres/testversion/data/base/1 ... ok
    performing post-bootstrap initialization ... FATAL: column
Re: [HACKERS] Bootstrap DATA is a pita
On 2015-12-12 13:28:28 -0500, Tom Lane wrote:
> BTW, there's another thing I'd like to see improved in this area, which is
> a problem already but will get a lot worse if we push more work into the
> post-bootstrap phase of initdb. That is that the post-bootstrap phase is
> both inefficient and impossible to debug. If you've ever had a failure
> there, you'll have seen that the backend spits out an entire SQL script
> and says there's an error in it somewhere; that's because it gets the
> whole per-stage script as one submission.

Seen that more than once :(

> Breaking the stage scripts down further would help, but that is
> unattractive because each one requires a fresh backend startup/shutdown,
> including a full checkpoint. I'd like to see things rejiggered so that
> there's only one post-bootstrap standalone backend session that performs
> all the steps, but initdb feeds it just one SQL command at a time so that
> errors are better localized. That should both speed up initdb noticeably
> and make debugging easier.

One way to do that would be to not use single-user mode for that stage. Afair, at that point we could actually just start the cluster "normally", with the socket pointing somewhere locally (ugh, some path length issues afoot, maybe allow relative directories? And, uh, Windows), and use psql to do the splitting and everything for us.

Your approach probably has some significant performance benefits, because it essentially does pipelining. So while aesthetically attractive, I'm afraid my proposal would lead to worse performance. So it's probably actually DOA :(

Andres

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Using a single standalone-backend run in initdb (was Re: [HACKERS] Bootstrap DATA is a pita)
On 2015-12-12 17:31:49 -0500, Tom Lane wrote:
> I thought this sounded like a nice lazy-Saturday project, so I started
> poking at it, and attached is a WIP patch.

Not bad, not bad at all.

> After some experimentation, I came up with the idea of executing any
> time that a semicolon followed by two newlines is seen. ...
> but it turns out that no such case exists anywhere in initdb's data.

Not pretty, but hardly any worse than the current situation.

> I'm not particularly wedded to this rule. In principle we could go so
> far as to import psql's code that parses commands and figures out which
> semicolons are command terminators --- but that is a pretty large chunk
> of code, and I think it'd really be overkill considering that initdb
> deals only with fixed input scripts.

Having that code abstracted somewhere wouldn't be bad, though; extension scripts have a somewhat similar problem.

> Anyway, the attached patch tweaks postgres.c to follow that rule instead
> of slurp-to-EOF when -j is given. I doubt that being non-backwards-
> compatible is a problem here; in fact, I'm tempted to rip out the -j
> switch altogether and just have standalone mode always parse input the
> same way.

No objection here.

> Does anyone know of people using standalone mode other than
> for initdb?

Unfortunately yes. There's docker instances around that configure users and everything using it.

> The other part of the patch modifies initdb to do all its post-bootstrap
> steps using a single standalone backend session. I had to remove the
> code that currently prints out progress markers for individual phases
> of that processing, so that now you get output that looks like

That's cool too. Besides processing the .bki files, and there largely reg*_in, the many restarts are the most expensive parts of initdb.

Greetings, Andres Freund
Re: [HACKERS] Error with index on unlogged table
On Sat, Dec 12, 2015 at 10:31 PM, Andres Freund wrote:
> On 2015-12-12 12:52:21 +0100, Andres Freund wrote:
>> On 2015-12-12 20:49:52 +0900, Michael Paquier wrote:
>> > Should we consider this bug a 9.5 blocker?
>>
>> I don't think so. But either way, I'm right now working on getting it
>> fixed anyway.
>
> And done. Took a bit longer than predicted - I had to adjust my test
> scripts to cope with < 9.4 not having tablespace mapping. Rather
> annoying.

Thanks! Cool to see this thread finally addressed.

> It'd be really convenient if tablespaces relative to the main data
> directory were supported...

I got bitten at some point as well by the fact that tablespaces created after a base backup is taken have their location map to the same point for a master and a standby :)

-- Michael
Re: Using a single standalone-backend run in initdb (was Re: [HACKERS] Bootstrap DATA is a pita)
Andres Freund writes:
> On 2015-12-12 17:31:49 -0500, Tom Lane wrote:
>> Does anyone know of people using standalone mode other than
>> for initdb?

> Unfortunately yes. There's docker instances around that configure users
> and everything using it.

Hm, that means that we *do* have to worry about backwards compatibility. We might be able to get away with changing the behavior of -j mode anyway, though, since this proposal mostly only changes when execution happens and not what is valid input for -j mode. (It would probably break some apps if we took away the switch, since right now, you do not need a semicolon to terminate commands in the regular standalone mode.) Failing that, we could define a new switch, I guess.

> That's cool too. Besides processing the .bki files, and there largely
> reg*_in, the many restarts are the most expensive parts of initdb.

Right. The proposal we were discussing upthread would move all the reg* lookups into creation of the .bki file, basically, which would improve that part of things quite a bit. (BTW, if we are concerned about initdb speed, that might be a reason not to be too eager to shift processing from bootstrap to non-bootstrap mode. Other than the reg* issue, bootstrap is certainly a far faster way to put rows into the catalogs than individual SQL commands could be.)

regards, tom lane
[HACKERS] PATCH: track last known XLOG segment in control file
Hi, this is the second improvement proposed in the thread [1] about the ext4 data loss issue. It adds another field to the control file, tracking the last known WAL segment. This does not eliminate the data loss, just the silent part of it when the last segment gets lost (due to forgetting the rename, deleting it by mistake or whatever). The patch makes sure the cluster refuses to start if that happens.

[1] http://www.postgresql.org/message-id/56583bdd.9060...@2ndquadrant.com

It's a fairly simple patch, but obviously it touches a very complex part of the code. I'll add it to the 2016-01 CF.

regards

-- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 71fc8ff..50f10a5 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -,6 +,16 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible)
 			use_existent = true;
 			openLogFile = XLogFileInit(openLogSegNo, &use_existent, true);
 			openLogOff = 0;
+
+			/* update the last known segment in the control file */
+			LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+			if (ControlFile->lastKnownSegment != openLogSegNo)
+			{
+				elog(WARNING, "updating segment number = %lu", openLogSegNo);
+				ControlFile->lastKnownSegment = openLogSegNo;
+				UpdateControlFile();
+			}
+			LWLockRelease(ControlFileLock);
 		}

 		/* Make sure we have the current logfile open */
@@ -5904,6 +5914,7 @@ StartupXLOG(void)
 	XLogPageReadPrivate private;
 	bool		fast_promoted = false;
 	struct stat st;
+	XLogSegNo	lastLogSegNo = 0;

 	/*
 	 * Read control file and check XLOG status looks valid.
@@ -6865,6 +6876,9 @@ StartupXLOG(void)
 		/* Remember this record as the last-applied one */
 		LastRec = ReadRecPtr;

+		/* Also remember the segment number */
+		XLByteToSeg(ReadRecPtr, lastLogSegNo);
+
 		/* Allow read-only connections if we're consistent now */
 		CheckRecoveryConsistency();

@@ -6942,6 +6956,18 @@ StartupXLOG(void)
 			RmgrTable[rmid].rm_cleanup();
 	}

+	/*
+	 * Check that we've actually seen all the XLOG segments, i.e. that
+	 * we've reached ControlFile->lastKnownSegment (this may fail for
+	 * example when someone deletes the last XLOG segment, or in case
+	 * of a filesystem issue).
+	 */
+	if (ControlFile->lastKnownSegment != lastLogSegNo)
+		ereport(FATAL,
+				(errmsg("not reached the last known segment (expected %lX/%lX seen %lX/%lX)",
+						(ControlFile->lastKnownSegment >> 8), (ControlFile->lastKnownSegment & 0xFF),
+						(lastLogSegNo >> 8), (lastLogSegNo & 0xFF))));
+
 	ereport(LOG,
 			(errmsg("redo done at %X/%X",
 					(uint32) (ReadRecPtr >> 32), (uint32) ReadRecPtr)));

diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index 32e1d81..44dde42 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -293,6 +293,9 @@ main(int argc, char *argv[])
 		   (uint32) ControlFile.backupEndPoint);
 	printf(_("End-of-backup record required:%s\n"),
 		   ControlFile.backupEndRequired ? _("yes") : _("no"));
+	printf(_("Last known segment: %lX/%X\n"),
+		   (uint64) (ControlFile.lastKnownSegment >> 8),
+		   (uint32) (ControlFile.lastKnownSegment & 0xFF));
 	printf(_("wal_level setting:%s\n"),
 		   wal_level_str(ControlFile.wal_level));
 	printf(_("wal_log_hints setting:%s\n"),

diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index ad1eb4b..f0ba450 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -164,12 +164,18 @@ typedef struct ControlFileData
 	 * start up.
 If it's false, but backupStartPoint is set, a backup_label
 	 * file was found at startup but it may have been a leftover from a stray
 	 * pg_start_backup() call, not accompanied by pg_stop_backup().
+	 *
+	 * lastKnownSegment is the segment sequence number of the last known XLOG
+	 * segment. This is useful to check that the recovery actually processed
+	 * all segments allocated before the crash (serves as a protection against
+	 * accidentally deleted segments etc.)
 	 */
 	XLogRecPtr	minRecoveryPoint;
 	TimeLineID	minRecoveryPointTLI;
 	XLogRecPtr	backupStartPoint;
 	XLogRecPtr	backupEndPoint;
 	bool		backupEndRequired;
+	XLogSegNo	lastKnownSegment;

 	/*
 	 * Parameter settings that determine if the WAL can be used for archival
Re: [HACKERS] PATCH: track last known XLOG segment in control file
On 2015-12-12 23:28:33 +0100, Tomas Vondra wrote:
> On 12/12/2015 11:20 PM, Andres Freund wrote:
> > On 2015-12-12 22:14:13 +0100, Tomas Vondra wrote:
> >> this is the second improvement proposed in the thread [1] about ext4 data
> >> loss issue. It adds another field to control file, tracking the last known
> >> WAL segment. This does not eliminate the data loss, just the silent part of
> >> it when the last segment gets lost (due to forgetting the rename, deleting
> >> it by mistake or whatever). The patch makes sure the cluster refuses to
> >> start if that happens.
> >
> > Uh, that's fairly expensive. In many cases it'll significantly
> > increase the number of fsyncs.
>
> It should do exactly 1 additional fsync per WAL segment. Or do you think
> otherwise?

Which is nearly doubling the number of fsyncs, for a good number of workloads. And it does so to a separate file, i.e. it's not like these writes and the flushes can be combined. In workloads where pg_xlog is on a separate partition it'll add the only source of fsyncs besides checkpoint to the main data directory.

> > I've a bit of a hard time believing this'll be worthwhile.
>
> The trouble is protections like this only seem worthwhile after the fact,
> when something happens. I think it's reasonable protection against issues
> similar to the one I reported ~2 weeks ago. YMMV.

Meh. That argument can be used to justify about everything. Obviously we should be more careful about fsyncing files, including the directories. I do plan to come back to your recent patch.

> > Additionally this doesn't seem to take WAL replay into account?
>
> I think the comparison in StartupXLOG needs to be less strict, to allow
> cases when we actually replay more WAL segments. Is that what you mean?

What I mean is that the value isn't updated during recovery, afaics. You could argue that minRecoveryPoint is that, in a way.
Greetings, Andres Freund
Re: [HACKERS] PATCH: track last known XLOG segment in control file
On 12/12/2015 11:39 PM, Andres Freund wrote:
> On 2015-12-12 23:28:33 +0100, Tomas Vondra wrote:
>> On 12/12/2015 11:20 PM, Andres Freund wrote:
>>> On 2015-12-12 22:14:13 +0100, Tomas Vondra wrote:
>>>> this is the second improvement proposed in the thread [1] about ext4
>>>> data loss issue. It adds another field to control file, tracking the
>>>> last known WAL segment. This does not eliminate the data loss, just the
>>>> silent part of it when the last segment gets lost (due to forgetting
>>>> the rename, deleting it by mistake or whatever). The patch makes sure
>>>> the cluster refuses to start if that happens.
>>>
>>> Uh, that's fairly expensive. In many cases it'll significantly
>>> increase the number of fsyncs.
>>
>> It should do exactly 1 additional fsync per WAL segment. Or do you think
>> otherwise?
>
> Which is nearly doubling the number of fsyncs, for a good number of
> workloads. And it does so to a separate file, i.e. it's not like these
> writes and the flushes can be combined. In workloads where pg_xlog is on
> a separate partition it'll add the only source of fsyncs besides
> checkpoint to the main data directory.

I doubt it will make any difference in practice, at least on reasonable hardware (which you should have, if fsync performance matters to you). But some performance testing will be necessary; I don't expect this to go in without that. It'd be helpful if you could describe the workload.

>>> I've a bit of a hard time believing this'll be worthwhile.
>>
>> The trouble is protections like this only seem worthwhile after the fact,
>> when something happens. I think it's reasonable protection against issues
>> similar to the one I reported ~2 weeks ago. YMMV.
>
> Meh. That argument can be used to justify about everything. Obviously we
> should be more careful about fsyncing files, including the directories. I
> do plan to come back to your recent patch.

My argument is that this is a reasonable protection against failures in that area - both our faults (in understanding the durability guarantees on a particular file system) and the file system developers'. Maybe it's not, because the chance of running into exactly the same issue in this part of the code is negligible.

>>> Additionally this doesn't seem to take WAL replay into account?
>>
>> I think the comparison in StartupXLOG needs to be less strict, to allow
>> cases when we actually replay more WAL segments. Is that what you mean?
>
> What I mean is that the value isn't updated during recovery, afaics. You
> could argue that minRecoveryPoint is that, in a way.

Oh, right. Will fix if we conclude that the general idea makes sense.

regards

-- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Using a single standalone-backend run in initdb (was Re: [HACKERS] Bootstrap DATA is a pita)
Joe Conway writes:
> On 12/12/2015 02:31 PM, Tom Lane wrote:
>> I'm not particularly wedded to this rule. In principle we could go so
>> far as to import psql's code that parses commands and figures out which
>> semicolons are command terminators --- but that is a pretty large chunk
>> of code, and I think it'd really be overkill considering that initdb
>> deals only with fixed input scripts. But if anyone has another simple
>> rule for breaking SQL into commands, we can certainly discuss
>> alternatives.

> Possibly inadequate, but I wrote a get_one_query() function to grab one
> statement at a time from a possibly multi-statement string and it isn't
> all that many lines of code:
> https://github.com/jconway/pgsynck/blob/master/pgsynck.c

Hmm. Doesn't look like that handles semicolons embedded in CREATE RULE; for that you'd have to track parenthesis nesting as well. (It's arguable that we won't ever need that case during initdb, but I'd just as soon not wire in such an assumption.) In general, though, I'd rather not try to teach InteractiveBackend() such a large amount about SQL syntax.

With a rule like "break at ;\n\n" it's possible to ensure that command breaks occur only where wanted, though in corner cases you might have to format your input oddly. (For instance, if you needed that sequence in a SQL literal, you might resort to E';\n\n' or use the standard's rules about concatenated string literals.) If you get it wrong, the consequences aren't too disastrous: you'll get an unterminated-input syntax error, or in the other direction multiple commands will get run together for execution, which most of the time isn't a big issue.

>> Does anyone know of people using standalone mode other than
>> for initdb?

> sepgsql uses it for installation, but it does not appear to use -j.
> I'm not sure why it is required but at some point I'd like to dig into that.

It might be easier than starting a full postmaster and having to figure out a secure place for the socket etc.
I'm prepared to back off the proposal about changing the default behavior of standalone mode; that leaves us with a choice between changing -j's behavior and inventing a new switch. regards, tom lane
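For contrast with the ;\n\n approach, here is a toy sketch of the kind of state a smarter splitter has to carry --- tracking single-quoted literals and parenthesis depth so that semicolons inside a CREATE RULE body are not mistaken for terminators. (This is an illustration only, not psql's actual lexer, which also handles dollar quoting, E'...' escapes, comments, and more.)

```c
/*
 * Toy terminator finder: return the index of the first semicolon that
 * lies outside single-quoted literals and outside any parentheses, or
 * -1 if none.  A real implementation (psql's lexer) must handle many
 * more cases; this sketch deliberately ignores them.
 */
static long
terminator_pos(const char *input)
{
	int		depth = 0;		/* parenthesis nesting level */
	int		in_quote = 0;	/* inside a '...' literal? */
	long	i;

	for (i = 0; input[i] != '\0'; i++)
	{
		char	c = input[i];

		if (in_quote)
		{
			if (c == '\'')
				in_quote = 0;
		}
		else if (c == '\'')
			in_quote = 1;
		else if (c == '(')
			depth++;
		else if (c == ')')
			depth--;
		else if (c == ';' && depth == 0)
			return i;
	}
	return -1;
}
```

Even this stripped-down version shows why importing the full parsing logic into InteractiveBackend() would be a sizable addition compared with the simple ;\n\n scan.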
Re: [HACKERS] [sqlsmith] Failed to generate plan on lateral subqueries
On Sat, Dec 12, 2015 at 8:30 PM, Andreas Seltenreich wrote:
> I currently set statement_timeout to 1s to avoid wasting time letting
> postgres crunch numbers. Less than 0.5% of the queries run into this
> timeout.

I wonder if any of these timeouts would be interesting to look at. Some may just be very large queries that will take a few seconds to plan, but others may be queries that are uncovering N^2 algorithms or even conceivably loops that are not terminating. When you hit the timeout, is this implemented in your fuzzer or using statement_timeout? If the former, can you add a statement_timeout of just short of the timeout in the fuzzer and find cases where the planner might not be calling CHECK_FOR_INTERRUPTS frequently enough?

> > Do you have coverage data for the corpus?
>
> I do have some older numbers for line coverage from before the recent
> grammar extension:

If you have a corpus of queries in a simple format it would be pretty convenient to add them in a regression test and then run make coverage to get html reports. Did you publish the source already? I haven't been following all along, sorry if these are all answered questions.

-- greg