[HACKERS] Many processes blocked at ProcArrayLock
Test configuration:

Hardware: 4P Intel server, 60 cores / 120 hardware threads
Memory: 512 GB
SSD: 2.4 TB

PostgreSQL settings:

max_connections = 160          # (change requires restart)
shared_buffers = 32GB
work_mem = 128MB
maintenance_work_mem = 32MB
bgwriter_delay = 100ms         # 10-10000ms between rounds
bgwriter_lru_maxpages = 200    # 0-1000 max buffers written/round
bgwriter_lru_multiplier = 2.0  # 0-10.0 multiplier on buffers scanned/round
wal_level = minimal            # minimal, archive, or hot_standby
wal_buffers = 256MB            # min 32kB, -1 sets based on shared_buffers
autovacuum = off
checkpoint_timeout = 60min
checkpoint_segments = 1000
archive_mode = off
synchronous_commit = off
fsync = off
full_page_writes = off

We used tpcc and pgbench to test PostgreSQL 9.4beta2 performance, and found that tps/tpmC does not increase as the number of terminals increases. The detailed information is in the attachment.

Many processes are blocked; I dumped their call stacks and found they are blocked on ProcArrayLock: 60% of processes are blocked in ProcArrayEndTransaction waiting for ProcArrayLock in EXCLUSIVE mode, and 20% are in GetSnapshotData waiting for it in SHARED mode. Other locks, such as XLogFlush and WALInsertLock, are not heavily contended.

Is there any way we can solve this problem?

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] using Core Foundation locale functions
On Fri, Nov 28, 2014 at 8:43 AM, Peter Eisentraut wrote:
> At the moment, this is probably just an experiment that shows where
> refactoring and better abstractions might be suitable if we want to
> support multiple locale libraries. If we want to pursue ICU, I think
> this could be a useful third option.

FWIW, I think that the richer API that ICU provides for string transformations could be handy in optimizing sorting using abbreviated keys. For example, ICU will happily produce only parts of sort keys (the equivalent of strxfrm() blobs) if that is all that is required [1].

I think that ICU also allows clients to parse individual primary weights in a principled way (primary weights tend to be isomorphic to the Unicode code points in the original string). I think that this will enable order-preserving compression of the type anticipated by the Unicode collation algorithm [2]. That could be useful for certain languages, like Russian, where the primary weight level usually contains multi-byte code points with glibc's strxfrm() (this is generally not true of languages that use the Latin alphabet, or of East Asian languages). Note that there is already naturally a form of what you might call compression with strxfrm() [3]. This is very useful for abbreviated keys.

[1] http://userguide.icu-project.org/collation/architecture
[2] http://www.unicode.org/reports/tr10/#Run-length_Compression
[3] http://www.postgresql.org/message-id/cam3swztywe5j69tapvzf2cm7mhskke3uhhnk9gluqckkwqo...@mail.gmail.com

--
Peter Geoghegan
Re: [HACKERS] excessive amounts of consumed memory (RSS), triggering OOM killer
On 2014-12-02 02:52, Tom Lane wrote:
> Tomas Vondra writes:
>> On 2.12.2014 01:33, Tom Lane wrote:
>>> What I suspect you're looking at here is the detritus of creation of
>>> a huge number of memory contexts. mcxt.c keeps its own state about
>>> existing contexts in TopMemoryContext. So, if we posit that those
>>> contexts weren't real small, there's certainly room to believe that
>>> there was some major memory bloat going on recently.
>>
>> Aha! MemoryContextCreate allocates the memory for the new context from
>> TopMemoryContext explicitly, so that it survives resets of the parent
>> context. Is that what you had in mind by keeping state about existing
>> contexts?
>
> Right.

>> That'd probably explain the TopMemoryContext size, because array_agg()
>> creates a separate context for each group. So if you have 1M groups,
>> you have 1M contexts. Although I don't see how the size of those
>> contexts would matter?
>
> Well, if they're each 6K, that's your 6GB right there.

Yeah, but this memory should be freed after the query finishes, no?

>> Maybe we could move this info (list of child contexts) to the parent
>> context somehow, so that it's freed when the context is destroyed?
>
> We intentionally didn't do that, because in many cases it'd result in
> parent contexts becoming nonempty when otherwise they'd never have
> anything actually in them. The idea was that such "shell" parent
> contexts should be cheap, requiring only a control block in
> TopMemoryContext and not an actual allocation arena. This idea of a
> million separate child contexts was never part of the design of course;
> we might need to rethink whether that's a good idea. Or maybe there
> need to be two different policies about where to put child control
> blocks.

Maybe. For me, the 130MB is not really a big deal, because for this to happen there really need to be many child contexts at the same time, consuming much more memory. With 6.5GB consumed in total, 130MB amounts to ~2%, which is negligible. Unless we can fix the RSS bloat.
>> Also, this explains the TopMemoryContext size, but not the RSS size
>> (or am I missing something)?
>
> Very possibly you're left with "islands" that prevent reclaiming very
> much of the peak RAM usage. It'd be hard to be sure without some sort
> of memory map, of course.

Yes, that's something I was thinking about too - I believe what happens is that allocations of info in TopMemoryContext and the actual contexts are interleaved, and at the end only the memory contexts are deleted. The blocks allocated in TopMemoryContext are kept, creating the "islands".

If that's the case, allocating the child context info within the parent context would solve this, because those pieces would be reclaimed with the rest of the parent memory. But then again, there are probably other ways to create such islands (e.g. allocating an additional block in a long-lived context while the child contexts exist).

regards
Tomas
[HACKERS] Serialization exception : Who else was involved?
Hello,

I'm using PostgreSQL 9.2.8 on Windows from a .NET application using Npgsql. I'm working in the Radiology Information System field. We have thousands of users against a big accounting database. We're using the SERIALIZABLE isolation level to ensure data consistency. Because of the large number of users, and probably because of the database design, we're facing serialization exceptions, and we retry our transactions. So far so good.

I was wondering if there was a log level in PostgreSQL that could tell me which query was the trigger of a doomed transaction. The goal is to understand the failures in order to improve the database and application designs. I pushed the logs to the DEBUG5 level with no luck. After carefully reviewing the documentation, it seems that there is nothing for this.

So I downloaded the code and looked at it. Serialization conflict detection is done in src/backend/storage/lmgr/predicate.c, where transactions that are doomed to fail are marked as such with the SXACT_FLAG_DOOMED flag. I simply added elog(...) calls at the NOTICE level each time the flag is set, compiled the code, and gave it a try. The results are amazing for me, because this simple modification lets me know which query is marking other running transactions to fail. I'm pretty sure that in the production environment of our major customers, there should be no more than a few transactions involved.

I would like to see this useful and simple addition in a future version of PostgreSQL. Is it in the spirit of what is done to ease the work of the developer? Maybe the level I've chosen is not appropriate? Please let me know what you think.

Kind Regards,
Olivier.
Re: [HACKERS] Role Attribute Bitmask Catalog Representation
Adam,

* Adam Brightwell (adam.brightw...@crunchydatasolutions.com) wrote:
> Ok. Though, this would affect how CATUPDATE is handled. Peter Eisentraut
> previously raised a question about whether superuser checks should be
> included with catupdate which led me to create the following post.
>
> http://www.postgresql.org/message-id/cakrt6cqovt2kiykg2gff7h9k8+jvu1149zlb0extkkk7taq...@mail.gmail.com
>
> Certainly, we could keep has_rolcatupdate for this case and put the
> superuser check in role_has_attribute, but it seems like it might be worth
> taking a look at whether a superuser can bypass catupdate or not. Just a
> thought.

My recollection matches the documentation: rolcatupdate should be required to update the catalogs. The fact that rolcatupdate is set by AlterUser to match rolsuper is an interesting point and one which we might want to reconsider, but that's beyond the scope of this patch. Ergo, I'd suggest keeping has_rolcatupdate, but have it do the check itself directly instead of calling down into role_has_attribute().

There's an interesting flip side to that though, which is the question of what to do with pg_roles and psql. Based on the discussion thus far, it seems like we'd want to keep the distinction for pg_roles and psql based on what bits have explicitly been set rather than what's actually checked for. As such, I'd have one other function, check_has_attribute(), which *doesn't* have the superuser allow-all check and is what is used in pg_roles and by psql. I'd expose both functions at the SQL level.

> Ok. I had originally thought for this patch that I would try to minimize
> these types of changes, though perhaps this is that opportunity previously
> mentioned in refactoring those. However, the catupdate question still
> remains.

It makes sense to me, at least, to include removing those individual attribute functions in this patch.

> I have no reason for one over the other, though I did ask myself that
> question. I did find it curious that in some cases there is "has_X" and
> then in others "pg_has_X". Perhaps I'm not looking in the right places,
> but I haven't found anything that helps to distinguish when one vs the
> other is appropriate (even if it is a general rule of thumb).

Given that we're changing things anyway, it seems to me that the pg_ prefix makes sense.

> Yes, we were, however the latter causes a syntax error with initdb. :-/

Ok, then just stuff the 255 back there and add a comment about why it's required and mention that cute tricks to calculate the value won't work.

Thanks!

Stephen
Re: [HACKERS] excessive amounts of consumed memory (RSS), triggering OOM killer
On 2 December 2014 10:59, Tomas Vondra wrote:
> On 2014-12-02 02:52, Tom Lane wrote:
>> Tomas Vondra writes:
>>
>>> Also, this explains the TopMemoryContext size, but not the RSS size
>>> (or am I missing something)?
>>
>> Very possibly you're left with "islands" that prevent reclaiming very
>> much of the peak RAM usage. It'd be hard to be sure without some sort
>> of memory map, of course.
>
> Yes, that's something I was thinking about too - I believe what happens
> is that allocations of info in TopMemoryContext and the actual contexts
> are interleaved, and at the end only the memory contexts are deleted.
> The blocks allocated in TopMemoryContext are kept, creating the
> "islands".
>
> If that's the case, allocating the child context info within the parent
> context would solve this, because these pieces would be reclaimed with
> the rest of the parent memory.

On second thought, I'm not sure this explains the continuous increase of consumed memory. When the first iteration consumes 2.818 GB of memory, why should the following iterations consume significantly more? The allocation patterns should be (almost) exactly the same, reusing the already allocated memory (either at the system or TopMemoryContext level).

regards
Tomas
[HACKERS] pg_stat_statement normalization fails due to temporary tables
Hi,

pg_stat_statements' query normalization fails when temporary tables are used in a query. That's because JumbleRangeTable() uses the relid in the RTE to build the query fingerprint. I think in this case pgss from 9.1 actually would do better than 9.2+, as the hash lookup previously didn't use the relid.

I don't really have a good idea about fixing this, though. The best thing that comes to mind is to simply use eref->aliasname for the disambiguation...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Turning recovery.conf into GUCs
Michael Paquier writes:
> On Tue, Dec 2, 2014 at 1:58 AM, Alex Shulgin wrote:
>> Here's the patch rebased against current HEAD, that is, including the
>> recently committed action_at_recovery_target option.
>
> If this patch gets in, it gives a good argument to jump to 10.0 IMO.
> That's not a bad thing, only the cost of making recovery params GUCs,
> which is still a wanted feature.
>
>> The default for the new GUC is 'pause', as in HEAD, and
>> pause_at_recovery_target is removed completely in favor of it.
>
> Makes sense. Another idea that popped up was to rename this parameter
> to recovery_target_action as well, but that's not really something
> this patch should care about.

Indeed, but changing the name after the fact is straightforward.

>> I've also taken the liberty to remove the part that errors out when
>> finding $PGDATA/recovery.conf.
>
> I am not in favor of this part. It may be better to let the users know
> that their old configuration is not valid anymore with an error. This
> patch cuts into the flesh with a huge axe; let's be sure that users do
> not ignore the painful side effects, or recovery.conf would simply be
> ignored and users would not be aware of that.

Yeah, that is a good point. I'd be in favor of a solution that works the same way as before the patch, without the need for extra trigger files, etc., but that doesn't seem to be possible. Whatever tricks we might employ will likely be defeated by the fact that the old-school user will fail to *include* recovery.conf in the main conf file.

--
Alex
Re: [HACKERS] [v9.5] Custom Plan API
On Tue, Nov 25, 2014 at 3:44 AM, Kouhei Kaigai wrote:
> Today, I had a talk with Hanada-san to clarify which parts can be a common
> portion of them and how to implement it. We concluded that both features
> can share most of the infrastructure.
> Let me put an introduction of join replacement by foreign-/custom-scan
> below.
>
> Its overall design intends to inject a foreign-/custom-scan node instead
> of the built-in join logic (based on the estimated cost). From the
> viewpoint of the core backend, it looks like a sub-query scan that
> contains a relations join internally.
>
> What we need to do is below:
>
> (1) Add a hook in add_paths_to_joinrel()
> It gives extensions (including FDW drivers and custom-scan providers) a
> chance to add alternative paths for a particular join of relations, using
> ForeignScanPath or CustomScanPath, if they can run instead of the built-in
> ones.
>
> (2) Inform the core backend of the varno/varattno mapping
> One thing we need to pay attention to is that a foreign-/custom-scan node
> that performs instead of the built-in join node must return a mixture of
> values coming from both relations. In case the FDW driver fetches a remote
> record (or a record computed by an external computing resource), the most
> reasonable way is to store it in ecxt_scantuple of ExprContext, then kick
> projection with varnodes that reference this slot.
> It needs an infrastructure that tracks the relationship between the
> original varnode and the alternative varno/varattno. We thought it should
> be mapped to INDEX_VAR and a virtual attribute number to reference
> ecxt_scantuple naturally, and this infrastructure is quite helpful for
> both ForeignScan and CustomScan.
> We'd like to add a List *fdw_varmap/*custom_varmap variable to both plan
> nodes. It contains a list of the original Var nodes that shall be mapped
> to the position according to the list index (e.g., the first varnode is
> varno=INDEX_VAR and varattno=1).
>
> (3) Reverse mapping on EXPLAIN
> For EXPLAIN support, the above varnodes on the pseudo relation scan need
> to be resolved. All we need to do is initialize dpns->inner_tlist in
> set_deparse_planstate() according to the above mapping.
>
> (4) Case of scanrelid == 0
> To skip opening/closing (foreign) tables, we need a mark to instruct the
> backend not to initialize the scan node according to the table definition,
> but according to the pseudo varnode list.
> As the earlier custom-scan patch does, scanrelid == 0 is a straightforward
> mark to show that the scan node is not combined with a particular real
> relation. So we also need to add special-case handling around the
> foreign-/custom-scan code.
>
> We expect the above changes are small enough to implement basic join
> push-down functionality (one that does not involve external computing of
> complicated expression nodes), but valuable to support in v9.5.
>
> Please comment on the proposition above.

I don't really have any technical comments on this design right at the moment, but I think it's an important area where PostgreSQL needs to make some progress sooner rather than later, so I hope that we can get something committed in time for 9.5.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] pg_stat_statement normalization fails due to temporary tables
Andres Freund writes:
> pg_stat_statement's query normalization fails when temporary tables are
> used in a query. That's because JumbleRangeTable() uses the relid in the
> RTE to build the query fingerprint. I think in this case pgss from 9.1
> actually would do better than 9.2+ as the hash lookup previously didn't
> use the relid.
>
> I don't really have a good idea about fixing this though. The best thing
> that comes to mind is simply use eref->aliasname for the
> disambiguation...

Hmm ... by "fails" I suppose you mean "doesn't treat two different instances of the same temp table name as the same"? I'm not sure that's a bug.

If we go over to using the aliasname, then "select x from foo a" and "select x from bar a" would be treated as the same query, which clearly *is* a bug. More generally, I don't think that schema1.tab and schema2.tab should be considered the same table for this purpose. So it's hard to see how to do much better than using the OID.

regards, tom lane
Re: [HACKERS] pg_stat_statement normalization fails due to temporary tables
On 2014-12-02 09:59:00 -0500, Tom Lane wrote:
> Andres Freund writes:
>> pg_stat_statement's query normalization fails when temporary tables are
>> used in a query. That's because JumbleRangeTable() uses the relid in the
>> RTE to build the query fingerprint. I think in this case pgss from 9.1
>> actually would do better than 9.2+ as the hash lookup previously didn't
>> use the relid.
>
>> I don't really have a good idea about fixing this though. The best thing
>> that comes to mind is simply use eref->aliasname for the
>> disambiguation...
>
> Hmm ... by "fails" I suppose you mean "doesn't treat two different
> instances of the same temp table name as the same"? I'm not sure
> that's a bug.

Well, it did work < 9.2...

> If we go over to using the aliasname then "select x from foo a" and
> "select x from bar a" would be treated as the same query, which
> clearly *is* a bug.

Yep. The fully qualified name + alias would probably work - but IIRC that isn't available in the RTE without further syscache lookups...

> More generally, I don't think that schema1.tab
> and schema2.tab should be considered the same table for this purpose.
> So it's hard to see how to do much better than using the OID.

Well, from a practical point of view it's highly annoying if half of pg_stat_statements suddenly consists of the same query, just issued in a different session.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] test_shm_mq failing on mips*
On Tue, Nov 25, 2014 at 10:42 AM, Christoph Berg wrote:
> Re: To Robert Haas 2014-11-24 <20141124200824.ga22...@msg.df7cb.de>
>>> Does it fail every time when run on a machine where it fails sometimes?
>>
>> So far there's a consistent host -> fail-or-not mapping, but most
>> mips/mipsel build hosts have seen only one attempt so far which
>> actually got far enough to actually run the shm_mq test.
>
> I got the build rescheduled on the same machine and it's hanging
> again.
>
>> Atm I don't have access to the boxes where it was failing (the builds
>> succeed on the mips(el) porter hosts available to Debian developers).
>> I'll see if I can arrange access there and run a test.
>
> Julien Cristau was so kind to poke into the hanging processes. The
> build has been stuck now for about 4h, while that postgres backend has
> only consumed 4s of CPU time (according to plain "ps"). The currently
> executing query is:
>
> SELECT test_shm_mq_pipelined(16384, (select
> string_agg(chr(32+(random()*95)::int), '') from generate_series(1,27)),
> 200, 3);
>
> (Waiting f, active, backend_start 6s older than xact_start, xact_start
> same as query_start, state_change 19µs newer, all 4h old)

I can't tell from this exactly what's going wrong. Questions:

1. Are there any background worker processes running at the same time? If so, how many? We'd expect to see 3.

2. Can we get a printout of the following variables in stack frame 3 (test_shm_mq_pipelined)? send_count, loop_count, *outqh, *inqh, inqh->mqh_queue[0], outqh->mqh_queue[0]

3. What does a backtrace of each background worker process look like? If they are stuck inside copy_messages(), can you print *outqh, *inqh, inqh->mqh_queue[0], outqh->mqh_queue[0] from that stack frame?

Sorry for the hassle; I just don't have a better idea of how to debug this.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Nitpicky doc corrections for BRIN functions of pageinspect
Peter Geoghegan wrote:
> On Wed, Nov 19, 2014 at 8:02 PM, Michael Paquier wrote:
>> Just a small thing I noticed while looking at pageinspect.sgml: the
>> set of SQL examples related to BRIN indexes uses lower-case characters
>> for reserved keywords. This was introduced by 7516f52.
>> Patch is attached.
>
> My nitpicky observation about pageinspect + BRIN was that the comments
> added to the pageinspect SQL file for the SQL-callable function
> brin_revmap_data() have a comment header style slightly inconsistent
> with the existing items.

Pushed.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] On partitioning
On Tue, Nov 25, 2014 at 8:20 PM, Amit Langote wrote:
>> Before going too much further with this I'd mock up schemas for your
>> proposed catalogs and a list of DDL operations to be supported, with
>> the corresponding syntax, and float that here for comment.

More people should really comment on this. This is a pretty big deal if it goes forward, so it shouldn't be based on what one or two people think.

> * Catalog schema:
>
> CREATE TABLE pg_catalog.pg_partitioned_rel
> (
>     partrelid    oid        NOT NULL,
>     partkind     oid        NOT NULL,
>     partissub    bool       NOT NULL,
>     partkey      int2vector NOT NULL, -- partitioning attributes
>     partopclass  oidvector,
>
>     PRIMARY KEY (partrelid, partissub),
>     FOREIGN KEY (partrelid) REFERENCES pg_class (oid),
>     FOREIGN KEY (partopclass) REFERENCES pg_opclass (oid)
> )
> WITHOUT OIDS;

So, we're going to support exactly two levels of partitioning? partitions with partissub=false and subpartitions with partissub=true? Why not support only one level of partitioning here but then let the children have their own pg_partitioned_rel entries if they are subpartitioned? That seems like a cleaner design and lets us support an arbitrary number of partitioning levels if we ever need them.

> CREATE TABLE pg_catalog.pg_partition_def
> (
>     partitionid          oid  NOT NULL,
>     partitionparentrel   oid  NOT NULL,
>     partitionisoverflow  bool NOT NULL,
>     partitionvalues      anyarray,
>
>     PRIMARY KEY (partitionid),
>     FOREIGN KEY (partitionid) REFERENCES pg_class(oid)
> )
> WITHOUT OIDS;
>
> ALTER TABLE pg_catalog.pg_class ADD COLUMN relispartitioned;

What is an overflow partition and why do we want that? What are you going to do if the partitioning key has two columns of different data types?

> * DDL syntax (no multi-column partitioning, sub-partitioning support as yet):
>
> -- create partitioned table and child partitions at once.
> CREATE TABLE parent (...)
> PARTITION BY [ RANGE | LIST ] (key_column) [ opclass ]
> [ (
>      PARTITION child
>        {
>            VALUES LESS THAN { ... | MAXVALUE } -- for RANGE
>          | VALUES [ IN ] ( { ... | DEFAULT } ) -- for LIST
>        }
>        [ WITH ( ... ) ] [ TABLESPACE tbs ]
>      [, ...]
> ) ] ;

How are you going to dump and restore this, bearing in mind that you have to preserve a bunch of OIDs across pg_upgrade? What if somebody wants to do pg_dump --table name_of_a_partition? I actually think it will be much cleaner to declare the parent first and then have separate CREATE TABLE statements that glue the children in, like CREATE TABLE child PARTITION OF parent VALUES LESS THAN (1, 1).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] superuser() shortcuts
On Wed, Nov 26, 2014 at 10:12 AM, Stephen Frost wrote:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> In the context at hand, I think most of the messages in question are
>> currently phrased like "must be superuser to do X". I'd be fine with
>> changing that to "permission denied to do X", but not to just
>> "permission denied".
>
> Apologies for the terseness of my (earlier) reply. This is exactly what
> I'm suggesting. What was in the patch was this change:
>
> ! ERROR: must be superuser or replication role to use replication slots
>
> ---
>
> ! ERROR: permission denied to use replication slots
> ! HINT: You must be superuser or replication role to use replication slots.

Your proposed change takes two lines to convey the same amount of information that we are currently conveying in one line. How is that better?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] 9.2 recovery/startup problems
On Wed, Nov 26, 2014 at 7:13 PM, Jeff Janes wrote:
> If I do a pg_ctl stop -mf, then both files go away. If I do a pg_ctl stop
> -mi, then neither goes away. It is only with the /sbin/reboot that I get
> the fatal combination of _init being gone but the other still present.

Eh? That sounds wonky. I mean, reboot normally kills processes with SIGTERM or SIGKILL, in which case I'd expect the outcome to match what you get with pg_ctl stop -mf or pg_ctl stop -mi. The only way I can see that you'd get a different behavior is if you did a hard reboot (like echo b > /proc/sysrq-trigger); if that changes things, then we might have a missing-fsync bug. How is that reboot managing to leave the main fork behind while losing the init fork?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] memory explosion on planning complex query
On Wed, Nov 26, 2014 at 7:24 PM, Andrew Dunstan wrote:
> On 11/26/2014 05:00 PM, Andrew Dunstan wrote:
>> Attached is some anonymized DDL for a fairly complex schema from a
>> PostgreSQL Experts client. Also attached is an explain query that runs
>> against the schema. The client's problem is that in trying to run the
>> explain, Postgres simply runs out of memory. On my untuned 9.3 test rig
>> (Scientific Linux 6.4 with 24GB of RAM and 24GB of swap), vmstat clearly
>> shows the explain chewing up about 7GB of memory. When it's done, the
>> free memory jumps back to where it was. In a similar case on the client's
>> test rig we saw memory use jump lots more.
>>
>> The client's question is whether this is not a bug. It certainly seems
>> like it should be possible to plan a query without chewing up this much
>> memory, or at least to be able to limit the amount of memory that can be
>> grabbed during planning. Going from humming along happily to OOM
>> conditions all through running "explain " is not very friendly.
>
> Further data point - thanks to Andrew Gierth (a.k.a. RhodiumToad) for
> pointing this out. The query itself grabs about 600MB to 700MB to run,
> whereas the EXPLAIN takes vastly more - on my system 10 times more.
> Surely that's not supposed to happen?

Hmm. So you can run the query but you can't EXPLAIN it? That sounds like it could well be a bug, but I'm thinking you might have to instrument palloc() to find out where all of that space is being allocated to figure out why it's happening - or maybe connect gdb to the server while the EXPLAIN is chewing up memory and pull some backtraces to figure out what section of code it's stuck in.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] dblink_get_connections() result for no connections
On Tue, Nov 25, 2014 at 3:49 PM, Tom Lane wrote:
> While fooling around with the array_agg(anyarray) patch, I noted
> that dblink_get_connections() returns NULL if there are no active
> connections. It seems like an empty array might be a saner
> definition --- thoughts?

I don't think it matters much. I wouldn't break backward compatibility to change it from this behavior to that one, and if we'd picked the other one I wouldn't break compatibility to change it to this behavior. Anybody who cares can wrap a COALESCE() around it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] copy.c handling for RLS is insecure
On Thu, Nov 27, 2014 at 2:03 AM, Stephen Frost wrote:
> Alright, I've done the change to use the RangeVar from CopyStmt, but
> also added a check wherein we verify that the relation's OID returned
> from the planned query is the same as the relation's OID that we did the
> RLS check on - if they're different, we throw an error. Please let me
> know if there are any remaining concerns.

That's clearly an improvement, but I'm not sure it's water-tight. What if the name that originally referenced a table ended up referencing a view? Then you could get list_length(plan->relationOids) != 1. (And, in that case, I also wonder if you could get eval_const_expressions() to do evil things on your behalf while planning.)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] test_shm_mq failing on mips*
Re: Robert Haas 2014-12-02 > I can't tell from this exactly what's going wrong. Questions: > > 1. Are there any background worker processes running at the same time? > If so, how many? We'd expect to see 3. > 2. Can we get a printout of the following variables in stack frame 3 > (test_shm_mq_pipelined)? send_count, loop_count, *outqh, *inqh, > inqh->mqh_queue[0], outqh->mqh_queue[0] > 3. What does a backtrace of each background worker process look like? > If they are stuck inside copy_messages(), can you print *outqh, *inqh, > inqh->mqh_queue[0], outqh->mqh_queue[0] from that stack frame? > > Sorry for the hassle; I just don't have a better idea how to debug this. No problem, I'm aware this isn't an easy target. I didn't have access to the build env myself, and I'm not sure if I can find someone to retry the test there. I'll let you know when I have more data, but that will take a while (if at all :-/). Christoph -- c...@df7cb.de | http://www.df7cb.de/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] initdb: Improve error recovery.
On Thu, Nov 27, 2014 at 7:28 AM, Mats Erik Andersson wrote: > I would like to improve the error recovery of initdb when the > password file is empty. The present code declares "Error 0" > as the cause of failure. It suffices to use ferror(), since > fgets() returns NULL also at a premature EOF. > > In addition, as a minor point, a line feed is needed in > order to print the error message on a line of its own. > > Best regards, > Mats Erik Andersson Please add your patch here so that it doesn't get forgotten about: https://commitfest.postgresql.org/action/commitfest_view/open Also, note that the PostgreSQL project prefers for patches to be attached rather than inline, as mailers have a tendency to mangle them. Thanks, -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] superuser() shortcuts
* Robert Haas (robertmh...@gmail.com) wrote: > On Wed, Nov 26, 2014 at 10:12 AM, Stephen Frost wrote: > > * Tom Lane (t...@sss.pgh.pa.us) wrote: > >> In the context at hand, I think most of the messages in question are > >> currently phrased like "must be superuser to do X". I'd be fine with > >> changing that to "permission denied to do X", but not to just > >> "permission denied". > > > > Apologies for the terseness of my (earlier) reply. This is exactly what > > I'm suggesting. What was in the patch was this change: > > > > ! ERROR: must be superuser or replication role to use replication slots > > > > --- > > > > ! ERROR: permission denied to use replication slots > > ! HINT: You must be superuser or replication role to use replication slots. > > Your proposed change takes two lines to convey the same amount of > information that we are currently conveying in one line. How is that > better? It includes the actual error message, unlike what we have today, which is guidance as to what's required to get past the permission denied error. In other words, I disagree that the same amount of information is being conveyed. Thanks, Stephen
Re: [HACKERS] copy.c handling for RLS is insecure
* Robert Haas (robertmh...@gmail.com) wrote: > On Thu, Nov 27, 2014 at 2:03 AM, Stephen Frost wrote: > > Alright, I've done the change to use the RangeVar from CopyStmt, but > > also added a check wherein we verify that the relation's OID returned > > from the planned query is the same as the relation's OID that we did the > > RLS check on- if they're different, we throw an error. Please let me > > know if there are any remaining concerns. > > That's clearly an improvement, but I'm not sure it's water-tight. > What if the name that originally referenced a table ended up > referencing a view? Then you could get > list_length(plan->relationOids) != 1. I'll test it out and see what happens. Certainly a good question, and if there's an issue there then I'll get it addressed. > (And, in that case, I also wonder if you could get > eval_const_expressions() to do evil things on your behalf while > planning.) If it can be made to reference a view then there's an issue, as the view might itself include a function call which is provided by the attacker. I'm not sure that we have to really worry about anything more complicated than that. Clearly, if we found a relation originally then we need that same relation with the same OID after the conversion to a query. Thanks, Stephen
Re: [HACKERS] superuser() shortcuts
On Tue, Dec 2, 2014 at 11:29 AM, Stephen Frost wrote: > * Robert Haas (robertmh...@gmail.com) wrote: >> On Wed, Nov 26, 2014 at 10:12 AM, Stephen Frost wrote: >> > * Tom Lane (t...@sss.pgh.pa.us) wrote: >> >> In the context at hand, I think most of the messages in question are >> >> currently phrased like "must be superuser to do X". I'd be fine with >> >> changing that to "permission denied to do X", but not to just >> >> "permission denied". >> > >> > Apologies for the terseness of my (earlier) reply. This is exactly what >> > I'm suggesting. What was in the patch was this change: >> > >> > ! ERROR: must be superuser or replication role to use replication slots >> > >> > --- >> > >> > ! ERROR: permission denied to use replication slots >> > ! HINT: You must be superuser or replication role to use replication >> > slots. >> >> Your proposed change takes two lines to convey the same amount of >> information that we are currently conveying in one line. How is that >> better? > > It includes the actual error message, unlike what we have today which is > guidence as to what's required to get past the permission denied error. > > In other words, I disagree that the same amount of information is being > conveyed. So, you think that someone might see the message "must be superuser or replication role to use replication slots" and fail to understand that they have a permissions problem? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add CREATE support to event triggers
> Normally you would replicate between an older master and a newer > replica, so this shouldn't be an issue. I find it unlikely that we > would de-support some syntax that works in an older version: it would > break pg_dump, for one thing. The most common way in which we break forward-compatibility is by reserving additional keywords. Logical replication can deal with this by quoting all identifiers, so it's not a big issue. It's not the only possible issue; it is certainly conceivable that we could change something in the server and then try to change pg_dump to compensate. I think we wouldn't do that with anything very big, but suppose spgist were supplanted by a new and better access method tsigps. Well, we might figure that there are few enough people accustomed to the current syntax that we can get away with hacking the pg_dump output, and yes, anyone with an existing dump from an older system will have problems, but there won't be many of them, and they can always adjust the dump by hand. If we ever decided to do such a thing, whatever syntax transformation we did in pg_dump would have to be mimicked by any logical replication solution. I think it's legitimate to say that this could be a problem, but I think it probably won't be a problem very often, and I think when it is a problem it probably won't be a very big problem, because we already have good reasons not to break the ability to restore older dumps on newer servers more often than absolutely necessary. One of the somewhat disappointing things about this is that we're talking about adding a lot of new deparsing code to the server that probably, in some sense, duplicates pg_dump. Perhaps in an ideal world there would be a way to avoid that, but in the world we really live in, there probably isn't. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] HINT: pg_hba.conf changed since last config reload
On Thu, Nov 27, 2014 at 8:49 AM, Bruce Momjian wrote: > On Thu, Nov 6, 2014 at 05:46:42PM -0500, Peter Eisentraut wrote: >> Finally, the fact that a configuration change is in progress is >> privileged information. Unprivileged users can deduce from the presence >> of this message that administrators are doing something, and possibly >> that they have done something wrong. >> >> I think it's fine to log a message in the server log if the pg_hba.conf >> file needs reloading. But the client shouldn't know about this at all. > > Should we do this for postgresql.conf too? It doesn't really make sense; or at least, the exact same thing doesn't make any sense. If an authentication attempt fails unexpectedly, it might be because of a pg_hba.conf change that wasn't reloaded, so it makes sense to try to tip off the DBA. But it can't really be because of a pending postgresql.conf change that hasn't been reloaded, because those don't generally affect authentication. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.2 recovery/startup problems
On Tue, Dec 2, 2014 at 7:41 AM, Robert Haas wrote: > On Wed, Nov 26, 2014 at 7:13 PM, Jeff Janes wrote: > > If I do a pg_ctl stop -mf, then both files go away. If I do a pg_ctl stop > > -mi, then neither goes away. It is only with the /sbin/reboot that I get > > the fatal combination of _init being gone but the other still present. > > Eh? That sounds wonky. > > I mean, reboot normally kills processes with SIGTERM or SIGKILL, in > which case I'd expect the outcome to match what you get with pg_ctl > stop -mf or pg_ctl stop -mi. The only way I can see that you'd get a > different behavior is if you did a hard reboot (like echo b > > /proc/sysrq-trigger); if that changes things, then we might have a > missing-fsync bug. How is that reboot managing to leave the main fork > behind while losing the init fork? During abort processing after getting a SIGTERM, the backend truncates 59288 to zero size and unlinks all the other files (including 59288_init). The actual removal of 59288 is left until the checkpoint. So if you SIGTERM the backend, then take down the server uncleanly before the next checkpoint completes, you are left with just 59288. Here is the strace:

open("base/16416/59288", O_RDWR) = 8
ftruncate(8, 0) = 0
close(8) = 0
unlink("base/16416/59288.1") = -1 ENOENT (No such file or directory)
unlink("base/16416/59288_fsm") = -1 ENOENT (No such file or directory)
unlink("base/16416/59288_vm") = -1 ENOENT (No such file or directory)
unlink("base/16416/59288_init") = 0
unlink("base/16416/59288_init.1") = -1 ENOENT (No such file or directory)

Cheers, Jeff
Re: [HACKERS] why is PG_AUTOCONF_FILENAME is pg_config_manual.h?
On Thu, Nov 27, 2014 at 12:56 PM, Fujii Masao wrote: > On Thu, Nov 27, 2014 at 11:58 PM, Peter Eisentraut wrote: >> Surely that's not a value that we expect users to be able to edit. Is >> pg_config_manual.h just abused as a place that's included everywhere? >> >> (I suggest utils/guc.h as a better place.) > > +1 +1 -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Report search_path value back to the client.
Hi, As of now Postgres reports only "really" important variables, but among them there are also not-so-important ones, like 'application_name'. The only reason to report it was to "help pgbouncer, and perhaps other connection poolers". The change was introduced by commit 59ed94ad0c9f74a3f057f359316c845cedc4461e. This makes me wonder why the 'search_path' value is not reported back to the client. The use case is absolutely the same as with 'application_name', but a little bit more important. Our databases provide different versions of stored procedures, which are located in different schemas: 'api_version1', 'api_version2', 'api_version5', etc... When an application establishes a connection to the database, it sets the search_path value, for example, to api_version1. At the same time, a new version of the same application will set the search_path value to api_version2. Sometimes we have hundreds of instances of applications which may use different versions of stored procedures located in different schemas. It's really crazy to keep so many (hundreds of) connections to the database, and it would be much better to have something like pgbouncer in front of Postgres. Right now that's not possible, because pgbouncer is not aware of search_path, and it's not really possible to fix pgbouncer, because there is no easy way to get the current value of search_path. I would like to mark 'search_path' as GUC_REPORT:

--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -2904,7 +2904,7 @@ static struct config_string ConfigureNamesString[] =
 		{"search_path", PGC_USERSET, CLIENT_CONN_STATEMENT,
 			gettext_noop("Sets the schema search order for names that are not schema-qualified."),
 			NULL,
-			GUC_LIST_INPUT | GUC_LIST_QUOTE
+			GUC_LIST_INPUT | GUC_LIST_QUOTE | GUC_REPORT
 		},
 		&namespace_search_path,
 		"\"$user\",public",

What do you think? Regards, -- Alexander Kukushkin
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Wed, Nov 26, 2014 at 11:00 PM, Michael Paquier wrote: > On Wed, Nov 26, 2014 at 8:27 PM, Syed, Rahila wrote: >> Don't we need to initialize doPageCompression similar to doPageWrites in >> InitXLOGAccess? > Yep, you're right. I missed this code path. > >> Also, in the earlier patches compression was set 'on' even when the fpw GUC is >> 'off'. This was to facilitate compression of FPWs which are forcibly written >> even when the fpw GUC is turned off. >> doPageCompression in this patch is set to true only if the value of the fpw GUC is >> 'compress'. I think it's better to compress forcibly written full page writes. > Meh? (stealing a famous quote). > This is backward-incompatible in the fact that forcibly-written FPWs > would be compressed all the time, even if FPW is set to off. The > documentation of the previous patches also mentioned that images are > compressed only if this parameter value is switched to compress. If we have a separate GUC to determine whether to do compression of full page writes, then it seems like that parameter ought to apply regardless of WHY we are doing full page writes, which might be either that full_pages_writes=on in general, or that we've temporarily turned them on for the duration of a full backup. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Testing DDL deparsing support
On Thu, Nov 27, 2014 at 11:43 PM, Ian Barwick wrote: > DDL deparsing is a feature that allows collection of DDL commands as they are > executed in a server, in some flexible, complete, and fully-contained format > that allows manipulation, storage, and transmission. This feature has several > use cases; the two best known ones are DDL replication and DDL auditing. > > We have come up with a design that uses a JSON structure to store commands. > It is similar to the C sprintf() call in spirit: there is a base format > string, which is a generic template for each command type, and contains > placeholders that represent the variable parts of the command. The values for > the placeholders in each specific command are members of the JSON object. A > helper function is provided that expands the format string and replaces the > placeholders with the values, and returns the SQL command as text. This > design lets the user operate on the JSON structure in either a read-only > fashion (for example to block table creation if the names don't satisfy a > certain condition), or by modifying it (for example, to change the schema name > so that tables are created in different schemas when they are replicated to > some remote server). > > This design is mostly accepted by the community. The one sticking point is > testing: how do we ensure that the JSON representation we have created > correctly deparses back into a command that has the same effect as the > original command. This was expressed by Robert Haas: > http://www.postgresql.org/message-id/CA+TgmoZ=vzrijmxlkqi_v0jg4k4leapmwusc6rwxs5mquxu...@mail.gmail.com > > The problem cannot be solved by a standard regression test module which runs a > bunch of previously-defined commands and verifies the output. 
We need not > only check the output for the commands as they exist today, but also we need > to ensure that this does not get broken as future patches modify the existing > commands as well as create completely new commands. > > The challenge here is to create a new testing framework that ensures the DDL > deparsing module will be maintained by future hackers as the DDL grammar is > modified. > > What and How to Test > > > Our goal should be that patch authors run "make check-world" in their patched > copies and notice that the DDL deparse test is failing; they can then modify > deparse_utility.c to add support for the new commands, which should in general > be pretty straightforward. This way, maintaining deparsing code would be part > of new patches just like we require pg_dump support and documentation for new > features. > > It would not work to require patch authors to add their new commands to a new > pg_regress test file, because most would not be aware of the need, or they > would just forget to do it, and patches would be submitted and possibly even > committed without any realization of the breakage caused. > > There are two things we can rely on: standard regression tests, and pg_dump. > > Standard regression tests are helpful because patch authors include new test > cases in the patches that stress their new options or commands. This new test > framework needs to be something that internally runs the regression tests and > exercises DDL deparsing as the regression tests execute DDL. That would mean > that new commands and options would automatically be deparse-tested by our new > framework as soon as patch authors add standard regress support. > > One thing is first-grade testing, that is, ensuring that the deparsed version of > a DDL command can be executed at all, for if the deparsed version throws an > error, it's immediately obvious that the deparse code is bogus. 
This is > because we know the original command did not throw an error: otherwise, the > deparse code would not have run at all, because ddl_command_end triggers are > only executed once the original command has completed execution. So > first-grade testing ensures that no trivial bugs are present. > > But there's second-grade verification as well: is the object produced by the > deparsed version identical to the one produced by the original command? One > trivial but incomplete approach is to run the command, then save the deparsed > version; run the deparsed version, and deparse that one too; compare both. > The problem with this approach is that if the deparse code is omitting some > clause (say it omits IN TABLESPACE in a CREATE TABLE command), then both > deparsed versions would contain the same bug yet they would compare equal. > Therefore this approach is not good enough. > > The best idea we have so far to attack second-grade testing is to trust > pg_dump to expose differences: accumulate commands as they run in the > regression database, then run the deparsed versions in a different database; > then pg_dump both databases and compare the dumped outputs. > > Proof-of-concept > > > We have now implemented th
Re: [HACKERS] Using pg_rewind for differential backup
On Fri, Nov 28, 2014 at 2:49 AM, Heikki Linnakangas wrote: > It also would be quite straightforward to write a separate tool to do just > that. Would be better than conflating pg_rewind with this. You could use > pg_rewind as the basis for it - it's under the same license as PostgreSQL. If we had such a tool in core, would that completely solve the differential backup problem, or would more be needed? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fillfactor for GIN indexes
On Fri, Nov 28, 2014 at 4:27 AM, Alexander Korotkov wrote: > On Fri, Nov 21, 2014 at 8:12 AM, Michael Paquier > wrote: >> Please find attached a simple patch adding fillfactor as a storage parameter >> for GIN indexes. The default value is the same as the current one, aka 100, >> to have the pages completely packed when a GIN index is created. > > > That's not true. Let us discuss it a little bit. > In the btree access method, the index is created by sorting index tuples. Once they > are sorted, it can write a tree with any fullness of the pages. With > completely filled pages you will get a flood of index page splits on table > updates or inserts. In order to avoid that, the default fillfactor is 90. > Besides index creation, the btree access method tries to follow the given fillfactor > in the case of inserting ordered data. When the rightmost page splits, then > fillfactor percent of the data goes to the left page. > The btree page life cycle runs between being freshly created by a split (50% > filled) and being fully packed (100% filled) and then split again. Thus we can > roughly estimate the average fullness of a btree page to be 75%. This > "equilibrium" state can be achieved by a huge number of inserts over the btree > index. > > Fillfactor was spread to other access methods. For instance, GiST. In > GiST, the tree is always constructed by page splits. So, a freshly created GiST is > already in its "equilibrium" state. Moreover, GiST doesn't split pages by > halves. The split ratio depends on the opclass. So, the "equilibrium" fullness of > a particular GiST index is even less than 75%. What does fillfactor do with > GiST? It makes GiST pages split at fillfactor fullness on index build. So, > GiST is even less full. But why? You don't get a flood of splits on a > fresh GiST index anyway, because it's in its "equilibrium" state. That's why I would > rather delete fillfactor from GiST until we have some algorithm to construct > GiST without page splits. > > What is going on with GIN? GIN is, basically, a two-level nested btree. 
So, > fillfactor should be applicable here. But GIN is not constructed like a > btree... > GIN accumulates maintenance_work_mem of data and then inserts it into the > tree. So a fresh GIN is the result of the insertion of maintenance_work_mem > buckets, and each bucket is sorted. On a small index (or with a large > maintenance_work_mem) there would be only one bucket. GIN has two levels of > btree: the entry tree and the posting tree. Currently the entry tree has no special > logic to control fullness during index build. So, with only one bucket the entry > tree will be 50% filled. With many buckets the entry tree will be in the > "equilibrium" state. In some specific cases (about two buckets) the entry tree > can be almost fully packed. Insertion into the posting tree is always ordered > during index build, because the posting tree is a tree of TIDs. When a posting tree > page splits, it leaves the left page fully packed during index build and 75% > packed during insertion (because it expects some sequential inserts). > > The idea of the GIN build algorithm is that the entry tree is expected to be > relatively small and to fit in cache, and insertions into posting trees are ordered. > Thus this algorithm should be IO-efficient and have low overhead. However, > with large entry trees this algorithm can perform much worse. > > My summary is the following: > 1) In order to have fully correct support of fillfactor in GIN, we need to > rewrite the GIN build algorithm. > 2) Without rewriting the GIN build algorithm, not much can be done with the entry > tree. However, you can implement some heuristics. > 3) You definitely need to touch the code that selects the split ratio in > dataPlaceToPageLeaf (starting with if (!btree->isBuild)). > 4) GIN data pages are always compressed, except for pg_upgraded indexes from > pre-9.4. Take care about that in the following code. 
> if (GinPageIsCompressed(page))
> freespace = GinDataLeafPageGetFreeSpace(page);
> + else if (btree->isBuild)
> + freespace = BLCKSZ * (100 - fillfactor) / 100;

This is a very interesting explanation; thanks for writing it up! It does leave me wondering why anyone would want fillfactor for GIN at all, even if they were willing to rewrite the build algorithm. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch : Allow toast tables to be moved to a different tablespace
On Fri, Nov 28, 2014 at 11:38 AM, Alex Shulgin wrote: > Should I add this patch to the next CommitFest? If you don't want it to get forgotten about, yes. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add shutdown_at_recovery_target option to recovery.conf
On Fri, Nov 28, 2014 at 11:59 AM, Petr Jelinek wrote: >>> I'm a bit late to the party, but wouldn't >>> >>> recovery_target_action = ... >>> >>> have been a better name for this? It'd be in line with the other >>> recovery_target_* parameters, and also a bit shorter than the imho >>> somewhat ugly "action_at_recovery_target". >> >> FWIW, I too think that "recovery_target_action" is a better name. > > I agree. +1. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Testing DDL deparsing support
Robert Haas wrote: > On Thu, Nov 27, 2014 at 11:43 PM, Ian Barwick wrote: > > A simple schedule to demonstrate this is available; execute from the > > src/test/regress/ directory like this: > >
> > ./pg_regress \
> > --temp-install=./tmp_check \
> > --top-builddir=../../.. \
> > --dlpath=. \
> > --schedule=./schedule_ddl_deparse_demo
> > I haven't read the code, but this concept seems good to me. Excellent, thanks. > It has the unfortunate weakness that a difference could exist during > the *middle* of the regression test run that is gone by the *end* of > the run, but our existing pg_upgrade testing has the same weakness, so > I guess we can view this as one more reason not to be too aggressive > about having regression tests drop the unshared objects they create. Agreed. Not dropping objects also helps test pg_dump itself; the normal procedure there is to run the regression tests, then pg_dump the regression database. Objects that are dropped never exercise their corresponding pg_dump support code, which I think is a bad thing. I think we should institute a policy that regression tests must keep the objects they create; maybe not all of them, but at least a sample large enough to cover all interesting possibilities. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Turning recovery.conf into GUCs
On 12/02/2014 06:25 AM, Alex Shulgin wrote: >> I am not in favor of this part. It may be better to let the users know >> > that their old configuration is not valid anymore with an error. This >> > patch cuts in the flesh with a huge axe, let's be sure that users do >> > not ignore the side pain effects, or recovery.conf would be simply >> > ignored and users would not be aware of that. > Yeah, that is good point. > > I'd be in favor of a solution that works the same way as before the > patch, without the need for extra trigger files, etc., but that doesn't > seem to be nearly possible. As previously discussed, there are ways to avoid having a trigger file for replication. However, it's hard to avoid having one for PITR recovery; at least, I can't think of a method which isn't just as awkward, and we might as well stick with the awkward method we know. > Whatever tricks we might employ will likely > be defeated by the fact that the oldschool user will fail to *include* > recovery.conf in the main conf file. Well, can we merge this patch and then fight out what to do for the transitional users as a separate patch? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of GetUserId() Usage
On Sat, Nov 29, 2014 at 3:00 PM, Stephen Frost wrote: > * Stephen Frost (sfr...@snowman.net) wrote: >> * Stephen Frost (sfr...@snowman.net) wrote: >> > Attached is a patch to address the pg_cancel/terminate_backend and the >> > statistics info as discussed previously. It sounds like we're coming to >> >> And I forgot the attachment, of course. Apologies. > > Updated patch attached which also changes the error messages (which > hadn't been updated in the prior versions and really should be). > > Barring objections, I plan to move forward with this one and get this > relatively minor change wrapped up. As mentioned in the commit, while > this might be an arguably back-patchable change, the lack of field > complaints and the fact that it changes existing behavior mean it should > go only against master, imv. This patch does a couple of different things: 1. It makes more of the crappy error message change that Andres and I already objected to on the other thread. Whether you disagree with those objections or not, don't make an end-run around them by putting more of the same stuff into patches on other threads. 2. It changes the functions in pgstatfuncs.c so that you can see the relevant information not only for your own role, but also for roles of which you are a member. That seems fine, but do we need a documentation change someplace? 3. It messes around with pg_signal_backend(). There are currently no cases in which pg_signal_backend() throws an error, which is good, because it lets you write queries against pg_stat_activity() that don't fail halfway through, even if you are missing permissions on some things. This patch introduces such a case, which is bad. I think it's unfathomable that you would consider anything in this patch a back-patchable bug fix. It's clearly a straight-up behavior change... or more properly three different changes, only one of which I agree with. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How about a option to disable autovacuum cancellation on lock conflict?
On Sat, Nov 29, 2014 at 11:46 PM, Jim Nasby wrote: > What do you mean by "never succeed"? Is it skipping a large number of pages? > Might re-trying the locks within the same vacuum help, or are the user locks > too persistent? You are confused. He's talking about the relation-level lock that vacuum attempts to take before doing any work at all on a given table, not the per-page cleanup locks that it takes while processing each page. If the relation-level lock can't be acquired, the whole table is skipped. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Getting references for OID
Hello, How can I get all dependent objects for an object's OID? For example, I have the OID of a database, db_id, and I want to get the OIDs of all namespaces contained in this database. Thank you! -- Best regards, Dmitry Voronin
Re: [HACKERS] Turning recovery.conf into GUCs
Josh Berkus wrote: > On 12/02/2014 06:25 AM, Alex Shulgin wrote: > > Whatever tricks we might employ will likely > > be defeated by the fact that the oldschool user will fail to *include* > > recovery.conf in the main conf file. > > Well, can we merge this patch and then fight out what to do for the > transitional users as a separate patch? You seem to be saying "I don't have any good idea how to solve this problem now, but I will magically have one once this is committed". I'm not sure that works very well. In any case, the proposal upthread that we raise an error if recovery.conf is found seems sensible enough. Users will see it and they will adjust their stuff -- it's a one-time thing. It's not like they switch a version forwards one week and back the following week. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How about a option to disable autovacuum cancellation on lock conflict?
Robert Haas wrote: > On Sat, Nov 29, 2014 at 11:46 PM, Jim Nasby wrote: > > What do you mean by "never succeed"? Is it skipping a large number of pages? > > Might re-trying the locks within the same vacuum help, or are the user locks > > too persistent? > > You are confused. He's talking about the relation-level lock that > vacuum attempts to take before doing any work at all on a given table, > not the per-page cleanup locks that it takes while processing each > page. If the relation-level lock can't be acquired, the whole table > is skipped. Almost there. Autovacuum takes the relation-level lock, starts processing. Some time later, another process wants a lock that conflicts with the one autovacuum has. This is flagged by the deadlock detector, and a signal is sent to autovacuum, which commits suicide. If the table is large, the time window for this to happen is large also; there might never be a time window large enough between two lock acquisitions for one autovacuum run to complete in a table. This starves the table of vacuuming completely, until things are bad enough that an emergency vacuum is forced. By then, the bloat is disastrous. I think it's that suicide that Andres wants to disable. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
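The cancellation described above is easy to observe by hand. A minimal sketch, assuming autovacuum is currently processing a table (the table name, database name, and log text framing are illustrative):

```sql
-- In a second session, request a relation-level lock that conflicts
-- with the SHARE UPDATE EXCLUSIVE lock held by the autovacuum worker:
BEGIN;
LOCK TABLE big_table IN ACCESS EXCLUSIVE MODE;
-- After deadlock_timeout elapses, the deadlock detector sees the blocked
-- conflicting request and signals the autovacuum worker, which aborts
-- and logs something like:
--   ERROR:  canceling autovacuum task
--   CONTEXT:  automatic vacuum of table "mydb.public.big_table"
COMMIT;
```

Anti-wraparound vacuums are exempt from this cancellation, which is why, as noted above, the bloat is eventually addressed only once an emergency vacuum is forced.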
Re: [HACKERS] Turning recovery.conf into GUCs
On 12/02/2014 10:31 AM, Alvaro Herrera wrote: > Josh Berkus wrote: >> On 12/02/2014 06:25 AM, Alex Shulgin wrote: > >>> Whatever tricks we might employ will likely >>> be defeated by the fact that the oldschool user will fail to *include* >>> recovery.conf in the main conf file. >> >> Well, can we merge this patch and then fight out what to do for the >> transitional users as a separate patch? > > You seem to be saying "I don't have any good idea how to solve this > problem now, but I will magically have one once this is committed". I'm > not sure that works very well. No, I'm saying "this problem is easy to solve technically, but we have intractable arguments on this list about the best way to solve it, even though the bulk of the patch isn't in dispute". > In any case, the proposal upthread that we raise an error if > recovery.conf is found seems sensible enough. Users will see it and > they will adjust their stuff -- it's a one-time thing. It's not like > they switch a version forwards one week and back the following week. I'm OK with that solution. Apparently others aren't though. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How about a option to disable autovacuum cancellation on lock conflict?
On 12/02/2014 10:35 AM, Alvaro Herrera wrote: > If the table is large, the time window for this to happen is large also; > there might never be a time window large enough between two lock > acquisitions for one autovacuum run to complete in a table. This > starves the table from vacuuming completely, until things are bad enough > that an emergency vacuum is forced. By then, the bloat is disastrous. > > I think it's that suicide that Andres wants to disable. A much better solution for this ... and one which would solve a *lot* of other issues with vacuum and autovacuum ... would be to give vacuum a way to track which blocks an incomplete vacuum had already visited. This would be even more valuable for freeze. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How about a option to disable autovacuum cancellation on lock conflict?
On 2014-12-02 11:02:07 -0800, Josh Berkus wrote: > On 12/02/2014 10:35 AM, Alvaro Herrera wrote: > > If the table is large, the time window for this to happen is large also; > > there might never be a time window large enough between two lock > > acquisitions for one autovacuum run to complete in a table. This > > starves the table from vacuuming completely, until things are bad enough > > that an emergency vacuum is forced. By then, the bloat is disastrous. > > > > I think it's that suicide that Andres wants to disable. Correct. > A much better solution for this ... and one which would solve a *lot* of > other issues with vacuum and autovacuum ... would be to give vacuum a > way to track which blocks an incomplete vacuum had already visited. > This would be even more valuable for freeze. That's pretty much a different problem. Yes, something more persistent would be helpful - although it'd need to be *much* more than which pages it has visited - but you'd still be vulnerable to the same issue. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How about a option to disable autovacuum cancellation on lock conflict?
On 12/02/2014 11:08 AM, Andres Freund wrote: > On 2014-12-02 11:02:07 -0800, Josh Berkus wrote: >> On 12/02/2014 10:35 AM, Alvaro Herrera wrote: >>> If the table is large, the time window for this to happen is large also; >>> there might never be a time window large enough between two lock >>> acquisitions for one autovacuum run to complete in a table. This >>> starves the table from vacuuming completely, until things are bad enough >>> that an emergency vacuum is forced. By then, the bloat is disastrous. >>> >>> I think it's that suicide that Andres wants to disable. > > Correct. > >> A much better solution for this ... and one which would solve a *lot* of >> other issues with vacuum and autovacuum ... would be to give vacuum a >> way to track which blocks an incomplete vacuum had already visited. >> This would be even more valuable for freeze. > > That's pretty much a different problem. Yes, some more persistent would > be helpful - although it'd need to be *much* more than which pages it > has visited - but you'd still be vulnerable to the same issue. If we're trying to solve the problem that vacuums of large, high-update tables never complete, it's solving the same problem. And in a much better way. And yeah, doing a vacuum placeholder wouldn't be simple, but it's the only solution I can think of that's worthwhile. Just disabling the vacuum-releases-sharelock behavior puts the user in the situation of deciding between maintenance and uptime. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Removing INNER JOINs
On Sun, Nov 30, 2014 at 12:51 PM, Tom Lane wrote: > Bottom line, given all the restrictions on whether the optimization can > happen, I have very little enthusiasm for the whole idea. I do not think > the benefit will be big enough to justify the amount of mess this will > introduce. This optimization applies to a tremendous number of real-world cases, and we really need to have it. This was a huge problem for me in my previous life as a web developer. The previous work that we did to remove LEFT JOINs was an enormous help, but it's not enough; we need a way to remove INNER JOINs as well. I thought that David's original approach of doing this in the planner was a good one. That fell down because of the possibility that apparently-valid referential integrity constraints might not be valid at execution time if the triggers were deferred. But frankly, that seems like an awfully nitpicky thing for this to fall down on. Lots of web applications are going to issue only SELECT statements that run as single-statement transactions, and so that issue, so troubling in theory, will never occur in practice. That doesn't mean that we don't need to account for it somehow to make the code safe, but any argument that it abridges the use case significantly is, in my opinion, not credible. Anyway, David was undeterred by the rejection of that initial approach and rearranged everything, based on suggestions from Andres and later Simon, into the form it's reached now. Kudos to him for his persistence. But your point that we might have chosen a whole different plan if the planner had known that this join was cheaper is a good one. However, that takes us right back to square one, which is to do this at plan time. I happen to think that's probably better anyway, but I fear we're just going around in circles here. 
We can either do it at plan time and find some way of handling the fact that there might be deferred triggers that haven't fired yet; or we can do it at execution time and live with the fact that we might have chosen a plan that is not optimal, though still better than executing a completely-unnecessary join. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
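As a concrete illustration of the class of query under discussion (the schema is hypothetical, not taken from the patch):

```sql
CREATE TABLE customers (id int PRIMARY KEY);
CREATE TABLE orders (
    id          int PRIMARY KEY,
    customer_id int NOT NULL REFERENCES customers (id)
);

-- No column of "customers" appears in the target list or WHERE clause,
-- and the NOT NULL foreign key guarantees exactly one matching customer
-- per order, so the join cannot add or remove rows; in principle the
-- scan of "customers" could be dropped from the plan entirely.
-- This shape is typical of ORM-generated SQL.
SELECT o.id, o.customer_id
FROM orders o
JOIN customers c ON c.id = o.customer_id;
```

The deferred-trigger hazard mentioned above is that, in a transaction with the FK check deferred, an orders row may transiently reference a missing customer, at which point removing the join would change the result.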
Re: [HACKERS] How about a option to disable autovacuum cancellation on lock conflict?
On 2014-12-02 11:12:40 -0800, Josh Berkus wrote: > On 12/02/2014 11:08 AM, Andres Freund wrote: > > On 2014-12-02 11:02:07 -0800, Josh Berkus wrote: > >> On 12/02/2014 10:35 AM, Alvaro Herrera wrote: > >>> If the table is large, the time window for this to happen is large also; > >>> there might never be a time window large enough between two lock > >>> acquisitions for one autovacuum run to complete in a table. This > >>> starves the table from vacuuming completely, until things are bad enough > >>> that an emergency vacuum is forced. By then, the bloat is disastrous. > >>> > >>> I think it's that suicide that Andres wants to disable. > > > > Correct. > > > >> A much better solution for this ... and one which would solve a *lot* of > >> other issues with vacuum and autovacuum ... would be to give vacuum a > >> way to track which blocks an incomplete vacuum had already visited. > >> This would be even more valuable for freeze. > > > > That's pretty much a different problem. Yes, some more persistent would > > be helpful - although it'd need to be *much* more than which pages it > > has visited - but you'd still be vulnerable to the same issue. > > If we're trying to solve the problem that vacuums of large, high-update > tables never complete, it's solving the same problem. Which isn't what I'm talking about. The problem is that vacuum is cancelled if a conflicting lock request is acquired. Plain updates don't do that. But there's workloads where you need more heavyweight updates, and then it can easily happen. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sequence Access Method WIP
On 2014-11-24 13:16:24 +0200, Heikki Linnakangas wrote: > To be clear: I don't think this API is very good for its stated purpose, for > implementing global sequences for use in a cluster. For the reasons I've > mentioned before. I'd like to see two changes to this proposal: > > 1. Make the AM implementation solely responsible for remembering the "last > value". (if it's a global or remote sequence, the current value might not be > stored in the local server at all) I think that reason isn't particularly good. The practical applicability for such an implementation doesn't seem to be particularly large. > 2. Instead of the single amdata field, make it possible for the > implementation to define any number of fields with any datatype in the > tuple. That would make debugging, monitoring etc. easier. My main problem with that approach is that it pretty much nails the door shut for moving sequences into a catalog table instead of the current, pretty insane, approach of a physical file per sequence. Currently, with or without seqam, it'd not be all that hard to force it into a catalog, taking care to force each tuple onto a separate page... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How about a option to disable autovacuum cancellation on lock conflict?
On Tue, Dec 2, 2014 at 11:12 AM, Josh Berkus wrote: > On 12/02/2014 11:08 AM, Andres Freund wrote: > > On 2014-12-02 11:02:07 -0800, Josh Berkus wrote: > >> On 12/02/2014 10:35 AM, Alvaro Herrera wrote: > >>> If the table is large, the time window for this to happen is large > also; > >>> there might never be a time window large enough between two lock > >>> acquisitions for one autovacuum run to complete in a table. This > >>> starves the table from vacuuming completely, until things are bad > enough > >>> that an emergency vacuum is forced. By then, the bloat is disastrous. > >>> > >>> I think it's that suicide that Andres wants to disable. > > > > Correct. > > > >> A much better solution for this ... and one which would solve a *lot* of > >> other issues with vacuum and autovacuum ... would be to give vacuum a > >> way to track which blocks an incomplete vacuum had already visited. > >> This would be even more valuable for freeze. > > > > That's pretty much a different problem. Yes, some more persistent would > > be helpful - although it'd need to be *much* more than which pages it > > has visited - but you'd still be vulnerable to the same issue. > > If we're trying to solve the problem that vacuums of large, high-update > tables never complete, it's solving the same problem. And in a much > better way. > > And yeah, doing a vacuum placeholder wouldn't be simple, but it's the > only solution I can think of that's worthwhile. Just disabling the > vacuum releases sharelock behavior puts the user in the situation of > deciding between maintenance and uptime. > I think it would be more promising to work on downgrading lock strengths so that fewer things conflict, and it would be not much more work than what you propose. What operations are people doing on a regular basis that take locks which cancel vacuum? create index? Cheers, Jeff
Re: [HACKERS] Getting references for OID
> Hello, > > How can I get all depend objects for oid of object? For example, I have oid > of db db_id and I want to get all oids of namespaces, which contains this db. > Thank you! Hello, Dmitry. Actually, this list is for developers (as described here [0]). You should ask this question on pgsql-general@, for example, or you can do it on pgsql-ru-general@, in Russian. And if I understood you right, you can use the oid2name tool, which is part of contrib. More info about it can be found here [1]. [0] http://www.postgresql.org/list/ [1] http://www.postgresql.org/docs/current/static/oid2name.html > > -- > С уважением, Дмитрий Воронин > -- Vladimir
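For completeness, dependency information can also be queried directly from the catalogs. A sketch (the object name is illustrative):

```sql
-- Objects recorded in pg_depend as depending on a given object,
-- described in human-readable form:
SELECT classid::regclass AS catalog,
       pg_describe_object(classid, objid, objsubid) AS dependent_object
FROM pg_depend
WHERE refobjid = 'my_table'::regclass;

-- The reverse direction (what the object itself depends on) uses the
-- ref* columns: pg_describe_object(refclassid, refobjid, refobjsubid).
```

Note that dependencies involving shared objects such as databases and roles are tracked in pg_shdepend rather than pg_depend, which matters for the database-level question asked here.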
Re: [HACKERS] superuser() shortcuts
* Robert Haas (robertmh...@gmail.com) wrote: > On Tue, Dec 2, 2014 at 11:29 AM, Stephen Frost wrote: > > It includes the actual error message, unlike what we have today which is > > guidance as to what's required to get past the permission denied error. > > > > In other words, I disagree that the same amount of information is being > > conveyed. > > So, you think that someone might see the message "must be superuser or > replication role to use replication slots" and fail to understand that > they have a permissions problem? I think that the error message about a permissions problem should be that there is a permissions problem, first and foremost. We don't say 'You must have SELECT rights on relation X to select from it.' when you try to do a SELECT that you're not allowed to. We throw a permissions denied error. As pointed out up-thread, we do similar things for other cases of permissions denied and even beyond permission denied errors we tend to match up the errmsg() with the errcode(), which makes a great deal of sense to me and is why I'm advocating that position. Thanks, Stephen
Re: [HACKERS] How about a option to disable autovacuum cancellation on lock conflict?
On 2014-12-02 11:23:31 -0800, Jeff Janes wrote: > I think it would be more promising to work on downgrading lock strengths so > that fewer things conflict, and it would be not much more work than what > you propose. I think you *massively* underestimate the effort required to lower lock levels. There are some very ugly corners you have to think about to do so. Just look at how long it took to implement the lock level reductions for ALTER TABLE - and those were the simpler cases. > What operations are people doing on a regular basis that take locks > which cancel vacuum? create index? Locking tables against modifications in this case. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of GetUserId() Usage
Robert, * Robert Haas (robertmh...@gmail.com) wrote: > On Sat, Nov 29, 2014 at 3:00 PM, Stephen Frost wrote: > > * Stephen Frost (sfr...@snowman.net) wrote: > > Updated patch attached which also changes the error messages (which > > hadn't been updated in the prior versions and really should be). > > > > Barring objections, I plan to move forward with this one and get this > > relatively minor change wrapped up. As mentioned in the commit, while > > this might be an arguably back-patchable change, the lack of field > > complaints and the fact that it changes existing behavior mean it should > > go only against master, imv. > > This patch does a couple of different things: > > 1. It makes more of the crappy error message change that Andres and I > already objected to on the other thread. Whether you disagree with > those objections or not, don't make an end-run around them by putting > more of the same stuff into patches on other threads. The error message clearly needed to be updated either way or I wouldn't have touched it. I changed it to match what I feel is the prevalent and certainly more commonly seen messaging from PG when it comes to permissions errors, and drew attention to it by commenting on the fact that I changed it. Doing otherwise would have drawn similar criticism (as it did upthread, from Peter or Alvaro, I believe..) that I wasn't updating it to match the messaging which we should be using. > 2. It changes the functions in pgstatfuncs.c so that you can see the > relevant information not only for your own role, but also for roles of > which you are a member. That seems fine, but do we need a > documentation change someplace? Yes, I've added the documentation changes to my branch, just hadn't posted an update yet (travelling today). > 3. It messes around with pg_signal_backend(). 
There are currently no > cases in which pg_signal_backend() throws an error, which is good, > because it lets you write queries against pg_stat_activity() that > don't fail halfway through, even if you are missing permissions on > some things. This patch introduces such a case, which is bad. Good point, I'll move that check up into the other functions, which will allow for a more descriptive error as well. > I think it's unfathomable that you would consider anything in this > patch a back-patchable bug fix. It's clearly a straight-up behavior > change... or more properly three different changes, only one of which > I agree with. I didn't think it was back-patchable and stated as much. I anticipated that argument and provided my thoughts on it. I *do* think it's wrong to be using GetUserId() in this case and it's only very slightly mollified by being documented that way. Thanks, Stephen
Re: [HACKERS] Removing INNER JOINs
* Robert Haas (robertmh...@gmail.com) wrote: > On Sun, Nov 30, 2014 at 12:51 PM, Tom Lane wrote: > > Bottom line, given all the restrictions on whether the optimization can > > happen, I have very little enthusiasm for the whole idea. I do not think > > the benefit will be big enough to justify the amount of mess this will > > introduce. > > This optimization applies to a tremendous number of real-world cases, > and we really need to have it. This was a huge problem for me in my > previous life as a web developer. The previous work that we did to > remove LEFT JOINs was an enormous help, but it's not enough; we need a > way to remove INNER JOINs as well. For my 2c, I'm completely with Robert on this one. There are a lot of cases this could help with, particularly things coming out of ORMs (which, yes, might possibly be better written, but that's a different issue). > I thought that David's original approach of doing this in the planner > was a good one. That fell down because of the possibility that > apparently-valid referential integrity constraints might not be valid > at execution time if the triggers were deferred. But frankly, that > seems like an awfully nitpicky thing for this to fall down on. Lots > of web applications are going to issue only SELECT statements that run > as single-statement transactions, and so that issue, so troubling > in theory, will never occur in practice. That doesn't mean that we > don't need to account for it somehow to make the code safe, but any > argument that it abridges the use case significantly is, in my > opinion, not credible. Agreed with this also, deferred triggers are not commonplace in my experience and when it *does* happen, ime at least, it's because you have a long-running data load or similar where you're not going to care one bit that large, complicated JOINs aren't as fast as they might have been otherwise. 
> Anyway, David was undeterred by the rejection of that initial approach > and rearranged everything, based on suggestions from Andres and later > Simon, into the form it's reached now. Kudos to him for his > persistence. But your point that we might have chosen a whole > different plan if it had known that this join was cheaper is a good > one. However, that takes us right back to square one, which is to do > this at plan time. I happen to think that's probably better anyway, > but I fear we're just going around in circles here. We can either do > it at plan time and find some way of handling the fact that there > might be deferred triggers that haven't fired yet; or we can do it at > execution time and live with the fact that we might have chosen a plan > that is not optimal, though still better than executing a > completely-unnecessary join. Right, we can't get it wrong in the face of deferred triggers either. Have we considered only doing the optimization for read-only transactions? I'm not thrilled with that, but at least we'd get out from under this deferred triggers concern. Another way might be an option to say "use the optimization, but throw an error if you run into a deferred trigger", or perhaps save both plans and use whichever one we can when we get to execution time? That could make planning time go up too much to work, but perhaps it's worth testing.. Thanks, Stephen
Re: [HACKERS] How about a option to disable autovacuum cancellation on lock conflict?
On Tue, Dec 2, 2014 at 11:41 AM, Andres Freund wrote: > On 2014-12-02 11:23:31 -0800, Jeff Janes wrote: > > I think it would be more promising to work on downgrading lock strengths > so > > that fewer things conflict, and it would be not much more work than what > > you propose. > > I think you *massively* underestimate the effort required to lower > lock levels. There's some very ugly corners you have to think about to > do so. Just look at how long it took to implement the lock level > reductions for ALTER TABLE - and those were the simpler cases. > Or maybe I overestimate how hard it would be to make vacuum restartable. You would have to save a massive amount of state (up to a maintenance_work_mem-sized TID list, plus the block you left off at in both the table and all of the indexes in that table), and you would somehow have to validate that saved state against any changes that might have occurred to the table or the indexes while it was saved and you were not holding the lock, which seems like it would be almost as full of corner cases as weakening the lock in the first place. Aren't they logically the same thing? If we could drop the lock and take it up again later, maybe the answer is not to save the state, but just to pause the vacuum until the lock becomes free again, in effect saving the state in situ. That would allow the autovac worker to be held hostage to anyone taking a lock, though. The only easy way to do it that I see is to have it only stop at the end of an index-cleaning cycle, which probably takes too long to block for. Or record a restart point at the end of each index-cleaning cycle, and then when it yields the lock it abandons all work since the last cycle end, rather than since the beginning. That would be better than what we have, but seems like a far cry from actually restarting from any point. > > What operations are people doing on a regular basis that take locks > > which cancel vacuum? create index? > > Locking tables against modifications in this case. 
So in "share mode", then? I don't think there is any reason that there can't be a lock mode that conflicts with "ROW EXCLUSIVE" but not "SHARE UPDATE EXCLUSIVE". Basically something that conflicts with logical changes, but not with physical changes. Cheers, Jeff
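For reference, how the existing modes line up here (a sketch of the relevant rows of the lock conflict matrix; the table name is illustrative):

```sql
-- Autovacuum holds SHARE UPDATE EXCLUSIVE on the relation it is
-- processing.  Today, SHARE mode conflicts with ROW EXCLUSIVE
-- (blocking INSERT/UPDATE/DELETE) but ALSO with SHARE UPDATE
-- EXCLUSIVE, so taking it is exactly what cancels a running vacuum:
LOCK TABLE some_table IN SHARE MODE;
-- The hypothetical mode discussed here would keep the ROW EXCLUSIVE
-- conflict (blocking logical changes) while dropping the SHARE UPDATE
-- EXCLUSIVE conflict (leaving vacuum alone).  No such mode exists today.
```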
Re: [HACKERS] Review of GetUserId() Usage
* Stephen Frost (sfr...@snowman.net) wrote: > > 3. It messes around with pg_signal_backend(). There are currently no > > cases in which pg_signal_backend() throws an error, which is good, > > because it lets you write queries against pg_stat_activity() that > > don't fail halfway through, even if you are missing permissions on > > some things. This patch introduces such a case, which is bad. > > Good point, I'll move that check up into the other functions, which will > allow for a more descriptive error as well. Err, I'm missing something here, as pg_signal_backend() is a misc.c static internal function? How would you be calling it from a query against pg_stat_activity()? I'm fine making the change anyway, just curious.. Thanks, Stephen
Re: [HACKERS] Review of GetUserId() Usage
On Tue, Dec 2, 2014 at 2:50 PM, Stephen Frost wrote: >> 1. It makes more of the crappy error message change that Andres and I >> already objected to on the other thread. Whether you disagree with >> those objections or not, don't make an end-run around them by putting >> more of the same stuff into patches on other threads. > > The error message clearly needed to be updated either way or I wouldn't > have touched it. I changed it to match what I feel is the prevelant and > certainly more commonly seen messaging from PG when it comes to > permissions errors, and drew attention to it by commenting on the fact > that I changed it. Doing otherwise would have drawn similar criticism > (is it did upthread, by Peter or Alvaro, I believe..) that I wasn't > updating it to match the messaging which we should be using. OK, I guess that's a fair point. >> I think it's unfathomable that you would consider anything in this >> patch a back-patchable bug fix. It's clearly a straight-up behavior >> change... or more properly three different changes, only one of which >> I agree with. > > I didn't think it was back-patchable and stated as much. I anticipated > that argument and provided my thoughts on it. I *do* think it's wrong > to be using GetUserId() in this case and it's only very slightly > mollified by being documented that way. It's not wrong. It's just different than what you happen to prefer. It's fine to want to change things, but "not the way I would have done it" is not the same as "arguably a bug". -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How about a option to disable autovacuum cancellation on lock conflict?
On 2014-12-02 12:22:42 -0800, Jeff Janes wrote: > Or maybe I overestimate how hard it would be to make vacuum > restartable. That's a massive project. Which is why I'm explicitly *not* suggesting that. What I instead suggest is a separate threshold after which vacuum isn't going to abort automatically after a lock conflict. So after that threshold just behave like anti-wraparound vacuum already does. Maybe autovacuum_vacuum/analyze_force_threshold or similar. If set to zero, the default, that behaviour is disabled. If set to a positive value it's an absolute one, if negative it's a factor of the normal autovacuum_vacuum/analyze_threshold. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
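To visualize the proposal, a hypothetical postgresql.conf fragment (these GUCs do not exist; the names and semantics follow the sketch above, with the threshold presumably counted in dead tuples like autovacuum_vacuum_threshold):

```
# Hypothetical settings -- not actual PostgreSQL GUCs.
# Past this threshold, autovacuum stops cancelling itself on a
# conflicting lock request, like anti-wraparound vacuum already does.
# 0 = disabled (the default); positive = absolute value;
# negative = factor of the corresponding normal threshold.
autovacuum_vacuum_force_threshold = 100000
autovacuum_analyze_force_threshold = -2   # 2x autovacuum_analyze_threshold
```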
Re: [HACKERS] superuser() shortcuts
On Tue, Dec 2, 2014 at 2:35 PM, Stephen Frost wrote: > * Robert Haas (robertmh...@gmail.com) wrote: >> On Tue, Dec 2, 2014 at 11:29 AM, Stephen Frost wrote: >> > It includes the actual error message, unlike what we have today which is >> > guidance as to what's required to get past the permission denied error. >> > >> > In other words, I disagree that the same amount of information is being >> > conveyed. >> >> So, you think that someone might see the message "must be superuser or >> replication role to use replication slots" and fail to understand that >> they have a permissions problem? > > I think that the error message about a permissions problem should be > that there is a permissions problem, first and foremost. I can't disagree with that, but it's not like the current message is: ERROR: I ran into some kind of a problem but I'm not going to tell you what it is for a long time OK I guess that's enough well actually you see there is a problem and it happens to do have to do with permissions and in particular that they are denied. It's pretty clear that the current message is complaining about a permissions problem, so the fact that it doesn't specifically include the words "permission" and "denied" doesn't bother me. Let me ask the question again: what information do you think is conveyed by your proposed rewrite that is not conveyed by the current message? > We don't say 'You must have SELECT rights on relation X to select from > it.' when you try to do a SELECT that you're not allowed to. We throw a > permissions denied error. As pointed out up-thread, we do similar > things for other cases of permissions denied and even beyond permission > denied errors we tend to match up the errmsg() with the errcode(), which > makes a great deal of sense to me and is why I'm advocating that > position. I don't think that trying to match up the errmsg() with the errcode() is a useful exercise. 
That's just restricting the ways that people can write error messages in a way that is otherwise unnecessary. The errmsg() and errcode() have different purposes; the former is to provide a concise, human-readable description of the problem, and the latter is to group errors into categories so that related errors can be handled programmatically. They need not be, and in many existing cases are not, even vaguely similar; and the fact that many of our permission denied errors happen to use that phrasing is not a sufficient justification for changing the rest unless the new phrasing is a bona fide improvement. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Turning recovery.conf into GUCs
On 2014-12-02 17:25:14 +0300, Alex Shulgin wrote: > I'd be in favor of a solution that works the same way as before the > patch, without the need for extra trigger files, etc., but that doesn't > seem to be nearly possible. Whatever tricks we might employ will likely > be defeated by the fact that the oldschool user will fail to *include* > recovery.conf in the main conf file. I think removing the ability to define a trigger file is pretty much unacceptable. It's not *too* bad to rewrite recovery.conf's contents into actual valid postgresql.conf format, but it's harder to change an existing complex failover setup that relies on the existence of such a trigger. I think aiming for removal of that is a sure way to prevent the patch from getting in. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Tue, Nov 25, 2014 at 1:38 PM, Peter Geoghegan wrote: > On Tue, Nov 25, 2014 at 4:01 AM, Robert Haas wrote: >> - This appears to needlessly reindent the comments for PG_CACHE_LINE_SIZE. > > Actually, the word "only" is removed (because PG_CACHE_LINE_SIZE has a > new client now). So it isn't quite the same paragraph as before. Oy, I missed that. >> - I really don't think we need a #define in pg_config_manual.h for >> this. Please omit that. > > You'd prefer to not offer a way to disable abbreviation? Okay. I guess > that makes sense - it should work well as a general optimization. I'd prefer not to have a #define in pg_config_manual.h. Only stuff that we expect a reasonably decent number of users to need to change should be in that file, and this is too marginal for that. If anybody other than the developers of the feature is disabling this, the whole thing is going to be ripped back out. > I'm not sure about that. I'd prefer to have tuplesort (and one or two > other sites) set the "abbreviation is possible in principle" flag. > Otherwise, sortsupport needs to assume that the leading attribute is > going to be the abbreviation-applicable one, which might not always be > true. Still, it's not as if I feel strongly about it. When wouldn't that be true? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Selectivity estimation for inet operators
> I spent a fair chunk of the weekend hacking on this patch to make > it more understandable and fix up a lot of what seemed to me pretty > clear arithmetic errors in the "upper layers" of the patch. However, > I couldn't quite convince myself to commit it, because the business > around estimation for partial histogram-bucket matches still doesn't > make any sense to me. Specifically this: > > /* Partial bucket match. */ > left_divider = inet_hist_match_divider(left, query, opr_codenum); > right_divider = inet_hist_match_divider(right, query, > opr_codenum); > > if (left_divider >= 0 || right_divider >= 0) > match += 1.0 / pow(2.0, Max(left_divider, right_divider)); > > Now unless I'm missing something pretty basic about the divider > function, it returns larger numbers for inputs that are "further away" > from each other (ie, have more not-in-common significant address bits). > So the above calculation seems exactly backwards to me: if one endpoint > of a bucket is "close" to the query, or even an exact match, and the > other endpoint is further away, we completely ignore the close/exact > match and assign a bucket match fraction based only on the further-away > endpoint. Isn't that exactly backwards? You are right that the partial bucket match calculation isn't fair in some circumstances. > I experimented with logic like this: > > if (left_divider >= 0 && right_divider >= 0) > match += 1.0 / pow(2.0, Min(left_divider, right_divider)); > else if (left_divider >= 0 || right_divider >= 0) > match += 1.0 / pow(2.0, Max(left_divider, right_divider)); > > ie, consider the closer endpoint if both are valid. But that didn't seem > to work a whole lot better. I think really we need to consider both > endpoints not just one to the exclusion of the other. I have tried many combinations like this, including arithmetic, geometric, and logarithmic mean functions. I could not get good results with any of them, so I left it in a basic form. 
Max() works pretty well most of the time, because if the query matches the bucket, generally it is close to both of the endpoints. By using Max(), we are actually crediting the ones which are close to both of the endpoints. > I'm also not exactly convinced by the divider function itself, > specifically about the decision to fail and return -1 if the masklen > comparison comes out wrong. This effectively causes the masklen to be > the most significant part of the value (after the IP family), which seems > totally wrong. ISTM we ought to consider the number of leading bits in > common as the primary indicator of "how far apart" a query and a > histogram endpoint are. The partial match calculation with Max() is especially unfair to buckets where the more significant bits change, for example 63/8 and 64/8. Returning -1 instead of a high divider forces it to use the divider for the other endpoint. We consider the number of leading bits in common as the primary indicator, just for the other endpoint. I have also experimented with the count of the common bits of the endpoints of the bucket for better partial match calculation. I could not find a meaningful equation for it. > Even if the above aspects of the code are really completely right, the > comments fail to explain why. I spent a lot of time on the comments, > but so far as these points are concerned they still only explain what > is being done and not why it's a useful calculation to make. I couldn't write better comments because I don't have strong arguments about it. We can say that we don't try to make use of both of the endpoints, because we don't know how to combine them. We only use the one with matching family and masklen, and when both of them match we use the more distant one to be on the safe side. Thank you for looking at it. Comments look much better now. 
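The divider logic under discussion can be sketched as follows. This is an illustration of the behaviour being debated, not the committed code; the patch itself computes 1.0 / pow(2.0, Max(left_divider, right_divider)), which a shift reproduces for the small dividers used here.

```c
/*
 * Sketch of the partial-bucket-match credit.  A divider of k credits
 * 1/2^k of the bucket; a larger k means that endpoint is "further away"
 * from the query (more not-in-common significant bits), and -1 means the
 * endpoint is not comparable at all.  Using Max() means the credit is
 * driven by the further-away endpoint, which is the behaviour Tom
 * questions above.  Assumes dividers are small enough for a shift.
 */
static double
partial_bucket_match(int left_divider, int right_divider)
{
    int     divider = (left_divider > right_divider) ? left_divider : right_divider;

    if (left_divider >= 0 || right_divider >= 0)
        return 1.0 / (double) (1 << divider);
    return 0.0;                 /* neither endpoint comparable */
}
```

For example, an exact match on one endpoint (divider 0) paired with a divider of 2 on the other credits only a quarter of the bucket, which is exactly the asymmetry under discussion.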
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()
On Tue, Nov 25, 2014 at 7:13 PM, Peter Geoghegan wrote: > Alright, so let me summarize what I think are the next steps in > working towards getting this patch committed. I should produce a new > revision which: > > * Uses default costings. > > * Applies a generic final quality check that enforces a distance of no > greater than 50% of the total string size. (The use of default > costings removes any reason to continue to do this) > > * Work through Robert's suggestions on other aspects that need work > [1], most of which I already agreed to. Sounds good so far. > What is unclear is whether or not I should continue to charge extra > for non-matching user supplied alias (and, I think more broadly, > consider multiple RTEs iff the user did use an alias) - Robert was > skeptical, but didn't seem to have made his mind up. I still think I > should cost things based on aliases, and consider multiple RTEs even > when the user supplied an alias (the penalty should just be a distance > of 1 and not 3, though, in light of other changes to the > weighing/costing). If I don't hear anything in the next day or two, > I'll more or less preserve aliases-related aspects of the patch. Basically, the case in which I think it's helpful to issue a suggestion here is when the user has used the table name rather than the alias name. I wonder if it's worth checking for that case specifically, in lieu of what you've done here, and issuing a totally different hint in that case ("HINT: You must refer to this column as "prime_minister.id" rather than "cameron.id"). Another idea, which I think I like less well, is to check the Levenshtein distance between the allowed alias and the entered alias and, if that's within the half-the-shorter-length threshold, consider possible matches from that RTE, charging the distance between the correct alias and the entered alias as a penalty to each potential column match. 
What I think won't do is to look at a situation where the user has entered automobile.id and suggest that maybe they meant student.iq, or even student.id. The amount of difference between the names has got to matter for the RTE names, just as it does for the column names. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
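Robert's second idea can be sketched with a plain Levenshtein distance. The function names and the unit costings here are illustrative, not the patch's actual costings (which charge different costs for insertion/deletion/substitution).

```c
#include <string.h>

/*
 * Plain Levenshtein distance with unit costs -- illustration only.
 * Assumes identifiers shorter than 63 characters.
 */
static int
lev(const char *a, const char *b)
{
    int     la = (int) strlen(a);
    int     lb = (int) strlen(b);
    int     d[64][64];

    for (int i = 0; i <= la; i++)
        d[i][0] = i;
    for (int j = 0; j <= lb; j++)
        d[0][j] = j;
    for (int i = 1; i <= la; i++)
    {
        for (int j = 1; j <= lb; j++)
        {
            int     cost = (a[i - 1] != b[j - 1]);
            int     best = d[i - 1][j] + 1;        /* deletion */

            if (d[i][j - 1] + 1 < best)
                best = d[i][j - 1] + 1;            /* insertion */
            if (d[i - 1][j - 1] + cost < best)
                best = d[i - 1][j - 1] + cost;     /* substitution */
            d[i][j] = best;
        }
    }
    return d[la][lb];
}

/*
 * Sketch of the suggested scheme: only consider columns of an RTE whose
 * alias is within half-the-shorter-length of the entered alias, and
 * charge that alias distance as a penalty on each candidate column.
 * Returns -1 when the RTE should not be considered at all.
 */
static int
column_match_cost(const char *entered_alias, const char *rte_alias,
                  const char *entered_col, const char *candidate_col)
{
    int     alias_dist = lev(entered_alias, rte_alias);
    int     shorter = (int) (strlen(entered_alias) < strlen(rte_alias) ?
                             strlen(entered_alias) : strlen(rte_alias));

    if (alias_dist > shorter / 2)
        return -1;
    return alias_dist + lev(entered_col, candidate_col);
}
```

Under this sketch, a typo like "studnet.iq" still finds student.id at cost 3 (alias distance 2 plus column distance 1), while automobile.id never gets compared against student's columns at all, which is the behaviour Robert argues for in the next paragraph.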
Re: [HACKERS] superuser() shortcuts
* Robert Haas (robertmh...@gmail.com) wrote: > On Tue, Dec 2, 2014 at 2:35 PM, Stephen Frost wrote: > > * Robert Haas (robertmh...@gmail.com) wrote: > >> On Tue, Dec 2, 2014 at 11:29 AM, Stephen Frost wrote: > >> > It includes the actual error message, unlike what we have today which is > >> > guidance as to what's required to get past the permission denied error. > >> > > >> > In other words, I disagree that the same amount of information is being > >> > conveyed. > >> > >> So, you think that someone might see the message "must be superuser or > >> replication role to use replication slots" and fail to understand that > >> they have a permissions problem? > > > > I think that the error message about a permissions problem should be > > that there is a permissions problem, first and foremost. > > I can't disagree with that, but it's not like the current message is: > > ERROR: I ran into some kind of a problem but I'm not going to tell you > what it is for a long time OK I guess that's enough well actually you > see there is a problem and it happens to do have to do with > permissions and in particular that they are denied. I agree that doing this would be rather overkill. > It's pretty clear that the current message is complaining about a > permissions problem, so the fact that it doesn't specifically include > the words "permission" and "denied" doesn't bother me. Let me ask the > question again: what information do you think is conveyed by your > proposed rewrite that is not conveyed by the current message? 
I do like that it's explicitly stating that the problem is with permissions and not because of some lack-of-capability in the software, and I like the consistency of messaging (to the point where I was suggesting somewhere along the way that we update the error messaging documentation to be explicit that the errmsg and the errcode should make sense and match up -- NOT that we should use some simple mapping from one to the other, as we'd then be reducing the information provided; but I've seen a number of cases where people have the SQL error-code also returned in their psql session; I've never been a fan of that as I much prefer words over error-codes, but I wonder if they've done that because, in some cases, it *isn't* clear what the errcode is from the errmsg..). > > We don't say 'You must have SELECT rights on relation X to select from > > it.' when you try to do a SELECT that you're not allowed to. We throw a > > permissions denied error. As pointed out up-thread, we do similar > > things for other cases of permissions denied and even beyond permission > > denied errors we tend to match up the errmsg() with the errcode(), which > > makes a great deal of sense to me and is why I'm advocating that > > position. > > I don't think that trying to match up the errmsg() with the errcode() > is a useful exercise. That's just restricting the ways that people > can write error messages in a way that is otherwise unnecessary. The > errmsg() and errcode() have different purposes; the former is to > provide a concise, human-readable description of the problem, and the > latter is to group errors into categories so that related errors can > be handled programmatically. They need not be, and in many existing > cases are not, even vaguely similar; and the fact that many of our > permission denied errors happen to use that phrasing is not a > sufficient justification for changing the rest unless the new phrasing > is a bona fide improvement. 
It provides a consistency of messaging that I feel is an improvement. That they aren't related in a number of existing cases strikes me as an opportunity to improve rather than cases where we really "know better". Perhaps in some of those cases the problem is that there isn't a good errcode, and maybe we should actually add an error code instead of changing the message. Perhaps there are cases where we just know better or it's a very generic errcode without any real meaning. I agree that we want the messaging to be good, I'd like it to also be consistent with itself and with the errcode. I've not gone and tried to do a deep look at the errmsg vs errcode, and that's likely too much to bite off at once anyway, but I have gone and looked at the role attribute uses and the other permissions denied errcode cases and we're not at all consistent today and we change from release-to-release depending on changes to the permissions system (see: TRUNCATE). I specifically don't like that. If we want to say that role-attribute-related error messages should be "You need X to do Y", then I'll go make those changes, removing the 'permission denied' where those are used and be happier that we're at least consistent, but it'll be annoying if we ever make those role-attributes be GRANT'able rights (GRANT .. ON CLUSTER?) as those errmsg's will then change from "you need X" to "permission denied to do Y". Having the errdetail change bothers me a lot less. How would you feel about
Re: [HACKERS] Selectivity estimation for inet operators
> Actually, there's a second large problem with this patch: blindly > iterating through all combinations of MCV and histogram entries makes the > runtime O(N^2) in the statistics target. I made up some test data (by > scanning my mail logs) and observed the following planning times, as > reported by EXPLAIN ANALYZE: > > explain analyze select * from relays r1, relays r2 where r1.ip = r2.ip; > explain analyze select * from relays r1, relays r2 where r1.ip && r2.ip; >
>  stats target    eqjoinsel    networkjoinsel
>  100               0.27 ms           1.85 ms
>  1000              2.56 ms          167.2 ms
>  10000             56.6 ms        13987.1 ms
>
> I don't think it's necessary for network selectivity to be quite as > fast as eqjoinsel, but I doubt we can tolerate 14 seconds planning > time for a query that might need just milliseconds to execute :-( > > It seemed to me that it might be possible to reduce the runtime by > exploiting knowledge about the ordering of the histograms, but > I don't have time to pursue that right now.

I made it break the loop once we have passed the last possible match. Patch attached. It reduces the runtime by almost 50% with large histograms. We can also make it use only some elements of the MCV and histogram for join estimation. I have experimented with reducing the right and the left hand sides of the lists in the previous versions. I remember it was better to reduce only the left hand side. I think it would be enough to use log(n) elements of the right hand side MCV and histogram. I can make the change, if you think the selectivity estimation function is the right place for this optimization.

diff --git a/src/backend/utils/adt/network_selfuncs.c b/src/backend/utils/adt/network_selfuncs.c
index f854847..16f39db 100644
--- a/src/backend/utils/adt/network_selfuncs.c
+++ b/src/backend/utils/adt/network_selfuncs.c
@@ -612,20 +612,23 @@ inet_hist_value_sel(Datum *values, int nvalues, Datum constvalue,
 		return 0.0;
 
 	query = DatumGetInetPP(constvalue);
 
 	/* "left" is the left boundary value of the current bucket ... */
 	left = DatumGetInetPP(values[0]);
 	left_order = inet_inclusion_cmp(left, query, opr_codenum);
 
 	for (i = 1; i < nvalues; i++)
 	{
+		if (left_order == 256)
+			break;
+
 		/* ... and "right" is the right boundary value */
 		right = DatumGetInetPP(values[i]);
 		right_order = inet_inclusion_cmp(right, query, opr_codenum);
 
 		if (left_order == 0 && right_order == 0)
 		{
 			/* The whole bucket matches, since both endpoints do. */
 			match += 1.0;
 		}
 		else if ((left_order <= 0 && right_order >= 0) ||
@@ -854,20 +857,23 @@ inet_opr_codenum(Oid operator)
 static int
 inet_inclusion_cmp(inet *left, inet *right, int opr_codenum)
 {
 	if (ip_family(left) == ip_family(right))
 	{
 		int			order;
 
 		order = bitncmp(ip_addr(left), ip_addr(right),
 						Min(ip_bits(left), ip_bits(right)));
 
+		if (order > 0)
+			return 256;
+
 		if (order != 0)
 			return order;
 
 		return inet_masklen_inclusion_cmp(left, right, opr_codenum);
 	}
 
 	return ip_family(left) - ip_family(right);
 }
 
 /*

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Tue, Dec 2, 2014 at 1:00 PM, Robert Haas wrote: > I'd prefer not to have a #define in pg_config_manual.h. Only stuff > that we expect a reasonably decent number of users to need to change > should be in that file, and this is too marginal for that. If anybody > other than the developers of the feature is disabling this, the whole > thing is going to be ripped back out. I agree. The patch either works well in general and it goes in, or it doesn't and should be rejected. That doesn't mean that the standard applied is that regressions are absolutely unacceptable, but the standard shouldn't be too far off that. I feel pretty confident that we'll be able to meet that standard, though, because database luminary Jim Gray recommended this technique in about 1995. Even if the details of what I have here could stand to be tweaked, there is no getting around the fact that a good sort routine needs to strongly consider locality. That was apparent even in 1995, but now it's a very major trend. Incidentally, I think that an under-appreciated possible source of regressions here is abbreviating attributes that have a strong physical/logical correlation. I could see a small regression for one such case even though my cost model indicated that it should be very profitable. On the other hand, on other occasions my cost model (i.e. considering how good a proxy abbreviated key cardinality is for full key cardinality) was quite pessimistic. Although, at least it was only a small regression, even though the correlation was something like 0.95. And at least the sort will be very fast in any case. You'll recall that Heikki's test case involved correlation like that, even though it was mostly intended to make a point about the entropy in abbreviated keys. Correlation was actually the most important factor there. I think it might be generally true that it's the most important factor, in practice more important even than capturing sufficient entropy in the abbreviated key representation. 
>> I'm not sure about that. I'd prefer to have tuplesort (and one or two >> other sites) set the "abbreviation is possible in principle" flag. >> Otherwise, sortsupport needs to assume that the leading attribute is >> going to be the abbreviation-applicable one, which might not always be >> true. Still, it's not as if I feel strongly about it. > > When wouldn't that be true? It just feels a bit wrong to me. There might be a future in which we want to use the datum1 field for a non-leading attribute. For example, when it is known ahead of time that there are low cardinality integers in the leading key/attribute. Granted, that's pretty speculative, but then it's not as if I'm insisting that it must be done that way. I defer to you. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] using custom scan nodes to prototype parallel sequential scan
On Fri, Nov 14, 2014 at 02:51:32PM +1300, David Rowley wrote: > Likely for most aggregates, like count, sum, max, min, bit_and and bit_or the > merge function would be the same as the transition function, as the state type > is just the same as the input type. It would only be aggregates like avg(), > stddev*(), bool_and() and bool_or() that would need a new merge function > made... These would be no more complex than the transition functions... Which > are just a few lines of code anyway. > > We'd simply just not run parallel query if any aggregates used in the query > didn't have a merge function. > > When I mentioned this, I didn't mean to appear to be placing a road block. I was > just bringing to the table the information that COUNT(*) + COUNT(*) works ok > for merging COUNT(*)'s "sub totals", but AVG(n) + AVG(n) does not. Sorry, late reply, but, FYI, I don't think our percentile functions can be parallelized in the same way:

test=> \daS *percent*
                                 List of aggregate functions
   Schema   |      Name       |  Result data type  |            Argument data types               |             Description
 pg_catalog | percent_rank    | double precision   | VARIADIC "any" ORDER BY VARIADIC "any"       | fractional rank of hypothetical row
 pg_catalog | percentile_cont | double precision   | double precision ORDER BY double precision   | continuous distribution percentile
 pg_catalog | percentile_cont | double precision[] | double precision[] ORDER BY double precision | multiple continuous percentiles
 pg_catalog | percentile_cont | interval           | double precision ORDER BY interval           | continuous distribution percentile
 pg_catalog | percentile_cont | interval[]         | double precision[] ORDER BY interval         | multiple continuous percentiles
 pg_catalog | percentile_disc | anyelement         | double precision ORDER BY anyelement         | discrete percentile
 pg_catalog | percentile_disc | anyarray           | double precision[] ORDER BY anyelement       | multiple discrete percentiles

-- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
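David's point that AVG(n) + AVG(n) does not merge correctly comes down to what the partial state is: avg's transition state is a (sum, count) pair, and two partial states combine componentwise, whereas averaging the two averages gives the wrong answer. A minimal sketch (type and function names here are illustrative, not PostgreSQL internals):

```c
/*
 * Why avg() needs a dedicated merge function: the partial state is
 * (sum, count), and merging adds the components.  Naively averaging
 * avg_final(a) and avg_final(b) would weight both partitions equally
 * regardless of their row counts.
 */
typedef struct AvgState
{
    double  sum;
    long    count;
} AvgState;

static AvgState
avg_merge(AvgState a, AvgState b)
{
    AvgState r = { a.sum + b.sum, a.count + b.count };

    return r;
}

static double
avg_final(AvgState s)
{
    return s.count ? s.sum / s.count : 0.0;
}
```

For partitions {10.0, 4 rows} and {2.0, 1 row}, the merged average is 12/5 = 2.4, while (2.5 + 2.0) / 2 = 2.25 would be wrong. Percentile aggregates have no such small mergeable state, which is Bruce's point above.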
Re: [HACKERS] 9.2 recovery/startup problems
On Tue, Dec 2, 2014 at 11:54 AM, Jeff Janes wrote: > During abort processing after getting a SIGTERM, the back end truncates > 59288 to zero size, and unlinks all the other files (including 59288_init). > The actual removal of 59288 is left until the checkpoint. So if you SIGTERM > the backend, then take down the server uncleanly before the next checkpoint > completes, you are left with just 59288. > > Here is the strace: >
> open("base/16416/59288", O_RDWR)  = 8
> ftruncate(8, 0)                   = 0
> close(8)                          = 0
> unlink("base/16416/59288.1")      = -1 ENOENT (No such file or directory)
> unlink("base/16416/59288_fsm")    = -1 ENOENT (No such file or directory)
> unlink("base/16416/59288_vm")     = -1 ENOENT (No such file or directory)
> unlink("base/16416/59288_init")   = 0
> unlink("base/16416/59288_init.1") = -1 ENOENT (No such file or directory)

Hmm, that's not good. I guess we can either adopt your suggestion of adjusting ResetUnloggedRelationsInDbspaceDir() to cope with the possibility that the situation has changed during recovery, or else figure out how to be more stringent about the order in which forks get removed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Tue, Dec 2, 2014 at 4:21 PM, Peter Geoghegan wrote: >>> I'm not sure about that. I'd prefer to have tuplesort (and one or two >>> other sites) set the "abbreviation is possible in principle" flag. >>> Otherwise, sortsupport needs to assume that the leading attribute is >>> going to be the abbreviation-applicable one, which might not always be >>> true. Still, it's not as if I feel strongly about it. >> >> When wouldn't that be true? > > It just feels a bit wrong to me. There might be a future in which we > want to use the datum1 field for a non-leading attribute. For example, > when it is known ahead of time that there are low cardinality integers > in the leading key/attribute. Granted, that's pretty speculative, but > then it's not as if I'm insisting that it must be done that way. I > defer to you. Well, maybe you should make the updates we've agreed on and I can take another look at it. But I didn't think that I was proposing to change anything about the level at which the decision about whether to abbreviate or not was made; rather, I thought I was suggesting that we pass that flag down to the code that initializes the sortsupport object as an argument rather than through the sortsupport structure itself. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Turning recovery.conf into GUCs
Andres Freund writes: > On 2014-12-02 17:25:14 +0300, Alex Shulgin wrote: >> I'd be in favor of a solution that works the same way as before the >> patch, without the need for extra trigger files, etc., but that doesn't >> seem to be nearly possible. Whatever tricks we might employ will likely >> be defeated by the fact that the oldschool user will fail to *include* >> recovery.conf in the main conf file. > > I think removing the ability to define a trigger file is pretty much > unacceptable. It's not *too* bad to rewrite recovery.conf's contents > into actual valid postgresql.conf format, but it's harder to change an > existing complex failover setup that relies on the existance of such a > trigger. I think aiming for removal of that is a sure way to prevent > the patch from getting in. To make it clear, I was talking not about trigger_file, but about standby.enabled file which triggers the recovery/pitr/standby scenario in the current form of the patch and stands as a replacement check instead of the original check for the presense of recovery.conf. -- Alex -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Tue, Dec 2, 2014 at 2:07 PM, Robert Haas wrote: > Well, maybe you should make the updates we've agreed on and I can take > another look at it. Agreed. > But I didn't think that I was proposing to change > anything about the level at which the decision about whether to > abbreviate or not was made; rather, I thought I was suggesting that we > pass that flag down to the code that initializes the sortsupport > object as an argument rather than through the sortsupport structure > itself. The flag I'm talking about concerns the *applicability* of abbreviation, and not whether or not it will actually be used (maybe the opclass lacks support, or decides not to for some platform specific reason). Tuplesort has a contract with abbreviation + sortsupport that considers whether or not the function pointer used to abbreviate is set, which relates to whether or not abbreviation will *actually* be used. Note that for non-abbreviation-applicable attributes, btsortsupport_worker() never sets the function pointer (nor, incidentally, does it set the other abbreviation related function pointers in the struct). -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
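The contract Peter describes -- tuplesort deciding whether abbreviation is actually in use by testing whether the opclass routine set the converter pointer -- can be sketched as below. Field and function names are illustrative stand-ins, not the actual sortsupport definitions under discussion.

```c
#include <stddef.h>

/*
 * Sketch of the tuplesort/sortsupport contract: the opclass routine sets
 * abbrev_converter only when abbreviation is both applicable (the
 * caller's decision) and supported; tuplesort then tests the pointer to
 * know whether datum1 holds an abbreviated key.
 */
typedef int (*AbbrevConverter) (const char *authoritative);

typedef struct SortSupportSketch
{
    AbbrevConverter abbrev_converter;   /* NULL => abbreviation not in use */
} SortSupportSketch;

/* Example converter: first byte of the string as a crude abbreviation. */
static int
demo_abbrev_converter(const char *authoritative)
{
    return (unsigned char) authoritative[0];
}

/* What tuplesort checks to decide how to compare datum1 values. */
static int
abbreviation_in_use(const SortSupportSketch *ssup)
{
    return ssup->abbrev_converter != NULL;
}
```

The disagreement in the thread is only about how the "applicability" input reaches the routine that decides whether to set this pointer: through a field of the struct, or as a separate argument.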
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Tue, Dec 2, 2014 at 5:16 PM, Peter Geoghegan wrote: > On Tue, Dec 2, 2014 at 2:07 PM, Robert Haas wrote: >> Well, maybe you should make the updates we've agreed on and I can take >> another look at it. > > Agreed. > >> But I didn't think that I was proposing to change >> anything about the level at which the decision about whether to >> abbreviate or not was made; rather, I thought I was suggesting that we >> pass that flag down to the code that initializes the sortsupport >> object as an argument rather than through the sortsupport structure >> itself. > > The flag I'm talking about concerns the *applicability* of > abbreviation, and not whether or not it will actually be used (maybe > the opclass lacks support, or decides not to for some platform > specific reason). Tuplesort has a contract with abbreviation + > sortsupport that considers whether or not the function pointer used to > abbreviate is set, which relates to whether or not abbreviation will > *actually* be used. Note that for non-abbreviation-applicable > attributes, btsortsupport_worker() never sets the function pointer > (nor, incidentally, does it set the other abbreviation related > function pointers in the struct). Right, and what I'm saying is that maybe the "applicability" flag shouldn't be stored in the SortSupport object, but passed down as an argument. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Tue, Dec 2, 2014 at 2:21 PM, Robert Haas wrote: > Right, and what I'm saying is that maybe the "applicability" flag > shouldn't be stored in the SortSupport object, but passed down as an > argument. But then how does that information get to any given sortsupport routine? That's the place that really needs to know if abbreviation is useful. In general, they're only passed a SortSupport object. Are you suggesting revising the signature required of SortSupport routines to add that extra flag as an additional argument? -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
Peter Geoghegan writes: > On Tue, Dec 2, 2014 at 2:21 PM, Robert Haas wrote: >> Right, and what I'm saying is that maybe the "applicability" flag >> shouldn't be stored in the SortSupport object, but passed down as an >> argument. > But then how does that information get to any given sortsupport > routine? That's the place that really needs to know if abbreviation is > useful. In general, they're only passed a SortSupport object. Are you > suggesting revising the signature required of SortSupport routines to > add that extra flag as an additional argument? I think that is what he's suggesting, and I too am wondering why it's a good idea. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Nitpicky doc corrections for BRIN functions of pageinspect
On Wed, Dec 3, 2014 at 12:23 AM, Alvaro Herrera wrote: > Peter Geoghegan wrote: >> On Wed, Nov 19, 2014 at 8:02 PM, Michael Paquier >> wrote: >> > Just a small thing I noticed while looking at pageinspect.sgml, the >> > set of SQL examples related to BRIN indexes uses lower-case characters >> > for reserved keywords. This has been introduced by 7516f52. >> > Patch is attached. >> >> My nitpicky observation about pageinspect + BRIN was that the comments >> added to the pageinspect SQL file for the SQL-callable function >> brin_revmap_data() have a comment header style slightly inconsistent >> with the existing items. > > Pushed. Thanks. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Wed, Dec 3, 2014 at 2:17 AM, Robert Haas wrote:
> On Wed, Nov 26, 2014 at 11:00 PM, Michael Paquier wrote:
>> On Wed, Nov 26, 2014 at 8:27 PM, Syed, Rahila wrote:
>>> Don't we need to initialize doPageCompression similar to doPageWrites in
>>> InitXLOGAccess?
>> Yep, you're right. I missed this code path.
>>
>>> Also , in the earlier patches compression was set 'on' even when fpw GUC is
>>> 'off'. This was to facilitate compression of FPW which are forcibly written
>>> even when fpw GUC is turned off.
>>> doPageCompression in this patch is set to true only if value of fpw GUC is
>>> 'compress'. I think its better to compress forcibly written full page
>>> writes.
>> Meh? (stealing a famous quote).
>> This is backward-incompatible in the fact that forcibly-written FPWs
>> would be compressed all the time, even if FPW is set to off. The
>> documentation of the previous patches also mentioned that images are
>> compressed only if this parameter value is switched to compress.
>
> If we have a separate GUC to determine whether to do compression of
> full page writes, then it seems like that parameter ought to apply
> regardless of WHY we are doing full page writes, which might be either
> that full_pages_writes=on in general, or that we've temporarily turned
> them on for the duration of a full backup.

In the latest versions of the patch, control of compression is done within full_page_writes by assigning a new value, 'compress'. Something that I am scared of is that if we enforce compression when full_page_writes is off for forcibly-written pages, and a bug shows up in the compression/decompression algorithm at some point (that's unlikely to happen, as this has been used for years with TOAST, but let's say "if"), we may corrupt a lot of backups. Hence, why not simply have a new GUC parameter to fully control it? First versions of the patch did that, and ISTM that is better than enforcing the use of a new feature on our user base.
Now, something that has not been mentioned on this thread is to make compression the default behavior in all cases, so that we won't even need a GUC parameter. We are usually conservative about changing default behaviors, so I don't really think that's the way to go; just mentioning the possibility.

Regards,
-- 
Michael
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How about a option to disable autovacuum cancellation on lock conflict?
On 12/2/14, 2:22 PM, Jeff Janes wrote:

Or maybe I overestimate how hard it would be to make vacuum restartable. You would have to save a massive amount of state (up to a maintenance_work_mem tid list, the block you left off on, in both the table and all of the indexes in that table), and you would somehow have to validate that saved state against any changes that might have occurred to the table or the indexes while it was saved and you were not holding the lock, which seems like it would be almost as full of corner cases as weakening the lock in the first place.

Aren't they logically the same thing? If we could drop the lock and take it up again later, maybe the answer is not to save the state, but just to pause the vacuum until the lock becomes free again, in effect saving the state in situ.

That would allow the autovac worker to be held hostage to anyone taking a lock, though.

Yeah, rather than messing with any of that, I think it would make a lot more sense to split vacuum into smaller operations that don't require such a huge chunk of time.

The only easy way to do it that I see is to have it only stop at the end of an index-cleaning cycle, which probably takes too long to block for. Or record a restart point at the end of each index-cleaning cycle, and then when it yields the lock it abandons all work since the last cycle end, rather than since the beginning. That would be better than what we have, but seems like a far cry from actual restarting from any point.

Now that's not a bad idea. This would basically mean just saving a block number in pg_class after every intermediate index clean and then setting that back to zero when we're done with that relation, right?
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [v9.5] Custom Plan API
> -Original Message- > From: Simon Riggs [mailto:si...@2ndquadrant.com] > Sent: Thursday, November 27, 2014 8:48 PM > To: Kaigai Kouhei(海外 浩平) > Cc: Robert Haas; Thom Brown; Kohei KaiGai; Tom Lane; Alvaro Herrera; Shigeru > Hanada; Stephen Frost; Andres Freund; PgHacker; Jim Mlodgenski; Peter > Eisentraut > Subject: Re: [HACKERS] [v9.5] Custom Plan API > > On 27 November 2014 at 10:33, Kouhei Kaigai wrote: > > > The reason why documentation portion was not yet committed is, sorry, > > it is due to quality of documentation from the standpoint of native > > English speaker. > > Now, I'm writing up a documentation stuff according to the latest code > > base, please wait for several days and help to improve. > > Happy to help with that. > > Please post to the Wiki first so we can edit it communally. > Simon, I tried to describe how custom-scan provider interact with the core backend, and expectations to individual callbacks here. https://wiki.postgresql.org/wiki/CustomScanInterface I'd like to see which kind of description should be added, from third person's viewpoint. Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Many processes blocked at ProcArrayLock
On Tue, Dec 2, 2014 at 5:07 PM, Xiaoyulei wrote:
> Test configuration:
> Hardware:
> 4P intel server, 60 core 120 hard thread.
> Memory:512G
> SSD:2.4T
>
> PG:
> max_connections = 160 # (change requires restart)
> shared_buffers = 32GB
> work_mem = 128MB
> maintenance_work_mem = 32MB
> bgwriter_delay = 100ms # 10-1ms between rounds
> bgwriter_lru_maxpages = 200 # 0-1000 max buffers written/round
> bgwriter_lru_multiplier = 2.0 # 0-10.0 multipler on buffers scanned/round
> wal_level = minimal # minimal, archive, or hot_standby
> wal_buffers = 256MB # min 32kB, -1 sets based on shared_buffers
> autovacuum = off
> checkpoint_timeout=60min
> checkpoint_segments = 1000
> archive_mode = off
> synchronous_commit = off
> fsync = off
> full_page_writes = off
>
> We use tpcc and pgbench to test postgresql 9.4beat2 performance. And we found
> the tps/tpmc could not increase with the terminal increase. The detail
> information is in attachment.
>
> Many processes is blocked, I dump the call stack, and found these processes
> is blocked at: ProcArrayLock. 60% processes is blocked in
> ProcArrayEndTransaction with ProcArrayLock EXCLUSIVE, 20% is in
> GetSnapshotData with ProcArrayLock SHARED. Others locks like XLogFlush and
> WALInsertLock are not very heavy.
>
> Is there any way we solve this problem?

Providing complete backtraces showing in which code paths those processes are blocked would help in better understanding what may be going on.
-- 
Michael
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Tue, Dec 2, 2014 at 2:16 PM, Peter Geoghegan wrote: > On Tue, Dec 2, 2014 at 2:07 PM, Robert Haas wrote: >> Well, maybe you should make the updates we've agreed on and I can take >> another look at it. > > Agreed. Attached, revised patchset makes these updates. I continue to use the sortsupport struct to convey that a given attribute of a given sort is abbreviation-applicable (although the field is now a bool, not an enum). -- Peter Geoghegan From 865a1abeb6bfb601b1ec605afb1e339c0e444e10 Mon Sep 17 00:00:00 2001 From: Peter Geoghegan Date: Sun, 9 Nov 2014 14:38:44 -0800 Subject: [PATCH 2/2] Estimate total number of rows to be sorted Sortsupport opclasses now accept a row hint, indicating the estimated number of rows to be sorted. This gives opclasses a sense of proportion about how far along the copying of tuples is when considering aborting abbreviation. Estimates come from various sources. The text opclass now always avoids aborting abbreviation if the total number of rows to be sorted is high enough, without considering cardinality at all. 
---
 src/backend/access/nbtree/nbtree.c     |  5 ++-
 src/backend/access/nbtree/nbtsort.c    | 14 +-
 src/backend/commands/cluster.c         |  4 +-
 src/backend/executor/nodeAgg.c         |  5 ++-
 src/backend/executor/nodeSort.c        |  1 +
 src/backend/utils/adt/orderedsetaggs.c |  2 +-
 src/backend/utils/adt/varlena.c        | 80 --
 src/backend/utils/sort/tuplesort.c     | 14 --
 src/include/access/nbtree.h            |  2 +-
 src/include/utils/sortsupport.h        |  7 ++-
 src/include/utils/tuplesort.h          |  6 +--
 11 files changed, 121 insertions(+), 19 deletions(-)

diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index d881525..d26c60b 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -109,14 +109,15 @@ btbuild(PG_FUNCTION_ARGS)
 		elog(ERROR, "index \"%s\" already contains data",
 			 RelationGetRelationName(index));
 
-	buildstate.spool = _bt_spoolinit(heap, index, indexInfo->ii_Unique, false);
+	buildstate.spool = _bt_spoolinit(heap, index, indexInfo->ii_Unique,
+									 indexInfo->ii_Predicate != NIL, false);
 
 	/*
 	 * If building a unique index, put dead tuples in a second spool to keep
 	 * them out of the uniqueness check.
 	 */
 	if (indexInfo->ii_Unique)
-		buildstate.spool2 = _bt_spoolinit(heap, index, false, true);
+		buildstate.spool2 = _bt_spoolinit(heap, index, false, true, true);
 
 	/* do the heap scan */
 	reltuples = IndexBuildHeapScan(heap, index, indexInfo, true,
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 593571b..473ac54 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -73,6 +73,7 @@
 #include "storage/smgr.h"
 #include "tcop/tcopprot.h"
 #include "utils/rel.h"
+#include "utils/selfuncs.h"
 #include "utils/sortsupport.h"
 #include "utils/tuplesort.h"
 
@@ -149,10 +150,13 @@ static void _bt_load(BTWriteState *wstate,
  * create and initialize a spool structure
  */
 BTSpool *
-_bt_spoolinit(Relation heap, Relation index, bool isunique, bool isdead)
+_bt_spoolinit(Relation heap, Relation index, bool isunique, bool ispartial,
+			  bool isdead)
 {
 	BTSpool    *btspool = (BTSpool *) palloc0(sizeof(BTSpool));
 	int			btKbytes;
+	double		estRows;
+	float4		relTuples;
 
 	btspool->heap = heap;
 	btspool->index = index;
@@ -165,10 +169,16 @@ _bt_spoolinit(Relation heap, Relation index, bool isunique, bool isdead)
 	 * unique index actually requires two BTSpool objects.  We expect that the
 	 * second one (for dead tuples) won't get very full, so we give it only
 	 * work_mem.
+	 *
+	 * Certain cases will always have a relTuples of 0, such as reindexing as
+	 * part of a CLUSTER operation, or when reindexing toast tables.  This is
+	 * interpreted as "no estimate available".
 	 */
 	btKbytes = isdead ? work_mem : maintenance_work_mem;
+	relTuples = RelationGetForm(heap)->reltuples;
+	estRows = relTuples * (isdead || ispartial ?
+						   DEFAULT_INEQ_SEL : 1);
 	btspool->sortstate = tuplesort_begin_index_btree(heap, index, isunique,
-													 btKbytes, false);
+													 btKbytes, estRows, false);
 
 	return btspool;
 }
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index bc5f33f..8e5f536 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -890,7 +890,9 @@ copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex, bool verbose,
 	/* Set up sorting if wanted */
 	if (use_sort)
 		tuplesort = tuplesort_begin_cluster(oldTupDesc, OldIndex,
-											maintenance_work_mem, false);
+											maintenance_work_mem,
+											RelationGetForm(OldHeap)->reltuples,
+											false);
 	else
 		tuplesort = NULL;
 
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 89de755..95143c3 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAg
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Tue, Dec 2, 2014 at 5:28 PM, Peter Geoghegan wrote:
> Attached, revised patchset makes these updates.

Whoops. Missed some obsolete comments. Here is a third commit that makes a further small modification to one comment.
-- 
Peter Geoghegan

From 8d1aba80f95e05742047cba5bd83d8f17aa5ef37 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan
Date: Tue, 2 Dec 2014 17:42:21 -0800
Subject: [PATCH 3/3] Alter comments to reflect current naming

---
 src/include/utils/sortsupport.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/include/utils/sortsupport.h b/src/include/utils/sortsupport.h
index 659233b..f7c73b3 100644
--- a/src/include/utils/sortsupport.h
+++ b/src/include/utils/sortsupport.h
@@ -127,9 +127,9 @@ typedef struct SortSupportData
 	 * Returning zero from the alternative comparator does not indicate
 	 * equality, as with a conventional support routine 1, though -- it
 	 * indicates that it wasn't possible to determine how the two abbreviated
-	 * values compared.  A proper comparison, using "auth_comparator"/
-	 * ApplySortComparatorFull() is therefore required.  In many cases this
-	 * results in most or all comparisons only using the cheap alternative
+	 * values compared.  A proper comparison, using "abbrev_full_comparator"/
+	 * ApplySortAbbrevFullComparator() is therefore required.  In many cases
+	 * this results in most or all comparisons only using the cheap alternative
 	 * comparison func, which is typically implemented as code that compiles to
 	 * just a few CPU instructions.  CPU cache miss penalties are expensive; to
 	 * get good overall performance, sort infrastructure must heavily weigh
-- 
1.9.1

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] About xmllint checking for the validity of postgres.xml in 9.5
Hi all, Since commit 5d93ce2d, the output of xmllint is checked by passing --valid to it. Isn't that a regression with what we were doing for pre-9.4 versions? For example, with 9.4 and older versions it is possible to compile man pages even if the xml spec is not entirely valid when using docbook 4.2. Regards, -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] About xmllint checking for the validity of postgres.xml in 9.5
On Wed, Dec 3, 2014 at 12:09 PM, Michael Paquier wrote:
> Since commit 5d93ce2d, the output of xmllint is checked by passing
> --valid to it. Isn't that a regression with what we were doing for
> pre-9.4 versions? For example, with 9.4 and older versions it is
> possible to compile man pages even if the xml spec is not entirely
> valid when using docbook 4.2.

Another thing that comes to mind: why don't we simply have a variable to pass flags to xmllint, similar to xsltproc? Packagers would then be free to pass the arguments they want. (Note that in some of the environments where I build the docs, postgres.xml is found to be invalid, making the build fail for master only, not for older branches.) In any case, attached is a patch showing the idea, bringing more flexibility to the build, the default value being "--valid --noout" if the flag is not passed by the caller.

Regards,
-- 
Michael

diff --git a/doc/src/sgml/Makefile b/doc/src/sgml/Makefile
index 8bdd26c..99ce106 100644
--- a/doc/src/sgml/Makefile
+++ b/doc/src/sgml/Makefile
@@ -48,6 +48,10 @@ ifndef XMLLINT
 XMLLINT = $(missing) xmllint
 endif
 
+ifndef XMLLINTCFLAGS
+XMLLINTCFLAGS = --valid --noout
+endif
+
 ifndef XSLTPROC
 XSLTPROC = $(missing) xsltproc
 endif
@@ -82,7 +86,7 @@ override SPFLAGS += -wall -wno-unused-param -wno-empty -wfully-tagged
 man distprep-man: man-stamp
 
 man-stamp: stylesheet-man.xsl postgres.xml
-	$(XMLLINT) --noout --valid postgres.xml
+	$(XMLLINT) $(XMLLINTCFLAGS) postgres.xml
 	$(XSLTPROC) $(XSLTPROCFLAGS) $(XSLTPROC_MAN_FLAGS) $^
 	touch $@
 
@@ -259,13 +263,13 @@ endif
 xslthtml: xslthtml-stamp
 
 xslthtml-stamp: stylesheet.xsl postgres.xml
-	$(XMLLINT) --noout --valid postgres.xml
+	$(XMLLINT) $(XMLLINTCFLAGS) postgres.xml
 	$(XSLTPROC) $(XSLTPROCFLAGS) $(XSLTPROC_HTML_FLAGS) $^
 	cp $(srcdir)/stylesheet.css html/
 	touch $@
 
 htmlhelp: stylesheet-hh.xsl postgres.xml
-	$(XMLLINT) --noout --valid postgres.xml
+	$(XMLLINT) $(XMLLINTCFLAGS) postgres.xml
 	$(XSLTPROC) $(XSLTPROCFLAGS) $^
 
 %-A4.fo.tmp: stylesheet-fo.xsl %.xml
@@ -287,7 +291,7 @@ FOP = fop
 epub: postgres.epub
 
 postgres.epub: postgres.xml
-	$(XMLLINT) --noout --valid $<
+	$(XMLLINT) $(XMLLINTCFLAGS) $<
 	$(DBTOEPUB) $<

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Tue, Dec 2, 2014 at 7:16 PM, Michael Paquier wrote: > In the latest versions of the patch, control of compression is done > within full_page_writes by assigning a new value 'compress'. Something > that I am scared of is that if we enforce compression when > full_page_writes is off for forcibly-written pages and if a bug shows > up in the compression/decompression algorithm at some point (that's > unlikely to happen as this has been used for years with toast but > let's say "if"), we may corrupt a lot of backups. Hence why not simply > having a new GUC parameter to fully control it. First versions of the > patch did that but ISTM that it is better than enforcing the use of a > new feature for our user base. That's a very valid concern. But maybe it shows that full_page_writes=compress is not the Right Way To Do It, because then there's no way for the user to choose the behavior they want when full_page_writes=off but yet a backup is in progress. If we had a separate GUC, we could know the user's actual intention, instead of guessing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Wed, Dec 3, 2014 at 12:35 PM, Robert Haas wrote:
> On Tue, Dec 2, 2014 at 7:16 PM, Michael Paquier wrote:
>> In the latest versions of the patch, control of compression is done
>> within full_page_writes by assigning a new value 'compress'. Something
>> that I am scared of is that if we enforce compression when
>> full_page_writes is off for forcibly-written pages and if a bug shows
>> up in the compression/decompression algorithm at some point (that's
>> unlikely to happen as this has been used for years with toast but
>> let's say "if"), we may corrupt a lot of backups. Hence why not simply
>> having a new GUC parameter to fully control it. First versions of the
>> patch did that but ISTM that it is better than enforcing the use of a
>> new feature for our user base.
>
> That's a very valid concern. But maybe it shows that
> full_page_writes=compress is not the Right Way To Do It, because then
> there's no way for the user to choose the behavior they want when
> full_page_writes=off but yet a backup is in progress. If we had a
> separate GUC, we could know the user's actual intention, instead of
> guessing.

Note that implementing a separate parameter for this patch would not be very complicated if the core portion does not change much. What about the long name full_page_compression, or the longer name full_page_writes_compression?
-- 
Michael
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On partitioning
Hi Robert, From: Robert Haas [mailto:robertmh...@gmail.com] > > * Catalog schema: > > > > CREATE TABLE pg_catalog.pg_partitioned_rel > > ( > >partrelidoidNOT NULL, > >partkindoidNOT NULL, > >partissub bool NOT NULL, > >partkey int2vector NOT NULL, -- partitioning attributes > >partopclass oidvector, > > > >PRIMARY KEY (partrelid, partissub), > >FOREIGN KEY (partrelid) REFERENCES pg_class (oid), > >FOREIGN KEY (partopclass) REFERENCES pg_opclass (oid) > > ) > > WITHOUT OIDS ; > > So, we're going to support exactly two levels of partitioning? > partitions with partissub=false and subpartitions with partissub=true? > Why not support only one level of partitioning here but then let the > children have their own pg_partitioned_rel entries if they are > subpartitioned? That seems like a cleaner design and lets us support > an arbitrary number of partitioning levels if we ever need them. > Yeah, that's what I thought at some point in favour of dropping partissub altogether. However, not that this design solves it, there is one question - if we would want to support defining for a table both partition key and sub-partition key in advance? That is, without having defined a first level partition yet; in that case, what level do we associate sub-(sub-) partitioning key with or more to the point where do we keep it? One way is to replace partissub by partkeylevel with level 0 being the topmost-level partitioning key and so on while keeping the partrelid equal to the pg_class.oid of the parent. That brings us to next question of managing hierarchies in pg_partition_def corresponding to partkeylevel in the definition of topmost partitioned relation. But I guess those are implementation details rather than representational unless I am being too naïve. 
> > CREATE TABLE pg_catalog.pg_partition_def > > ( > >partitionid oid NOT NULL, > >partitionparentrel oidNOT NULL, > >partitionisoverflow bool NOT NULL, > >partitionvalues anyarray, > > > >PRIMARY KEY (partitionid), > >FOREIGN KEY (partitionid) REFERENCES pg_class(oid) > > ) > > WITHOUT OIDS; > > > > ALTER TABLE pg_catalog.pg_class ADD COLUMN relispartitioned; > > What is an overflow partition and why do we want that? > That would be a default partition. That is, where the tuples that don't belong elsewhere (other defined partitions) go. VALUES clause of the definition for such a partition would look like: (a range partition) ... VALUES LESS THAN MAXVALUE (a list partition) ... VALUES DEFAULT There has been discussion about whether there shouldn't be such a place for tuples to go. That is, it should generate an error if a tuple can't go anywhere (or support auto-creating a new one like in interval partitioning?) > What are you going to do if the partitioning key has two columns of > different data types? > Sorry, this totally eluded me. Perhaps, the 'values' needs some more thought. They are one of the most crucial elements of the scheme. I wonder if your suggestion of pg_node_tree plays well here. This then could be a list of CONSTs or some such... And I am thinking it's a concern only for range partitions, no? (that is, a multicolumn partition key) I think partkind switches the interpretation of the field as appropriate. Am I missing something? By the way, I had mentioned we could have two values fields each for range and list partition kind. > > * DDL syntax (no multi-column partitioning, sub-partitioning support as > > yet): > > > > -- create partitioned table and child partitions at once. > > CREATE TABLE parent (...) > > PARTITION BY [ RANGE | LIST ] (key_column) [ opclass ] > > [ ( > > PARTITION child > >{ > >VALUES LESS THAN { ... | MAXVALUE } -- for RANGE > > | VALUES [ IN ] ( { ... | DEFAULT } ) -- for LIST > >} > >[ WITH ( ... 
) ] [ TABLESPACE tbs ]
> > [, ...]
> > ) ] ;

> How are you going to dump and restore this, bearing in mind that you
> have to preserve a bunch of OIDs across pg_upgrade? What if somebody
> wants to do pg_dump --table name_of_a_partition?

Assuming everything (including the partitioned relation and partitions at all levels) gets a pg_class entry of its own, would OIDs be a problem? If they could be, what is the nature of the problem? If someone pg_dump's an individual partition as a table, we could let it be dumped as just a plain table. I am thinking we should be able to do that, or perhaps should be doing just that.

> I actually think it will be much cleaner to declare the parent first
> and then have separate CREATE TABLE statements that glue the children
> in, like CREATE TABLE child PARTITION OF parent VALUES LESS THAN (1,
> 1).

Oh, do you mean to do away with any syntax for defining partitions in CREATE TABLE parent?
Re: [HACKERS] using Core Foundation locale functions
On Fri, Nov 28, 2014 at 11:43:28AM -0500, Peter Eisentraut wrote: > In light of the recent discussions about using ICU on OS X, I looked > into the Core Foundation locale functions (Core Foundation = traditional > Mac API in OS X, as opposed to the Unix/POSIX APIs). > > Attached is a proof of concept patch that just about works for the > sorting aspects. (The ctype aspects aren't there yet and will crash, > but they could be done similarly.) It passes an appropriately adjusted > collate.linux.utf8 test, meaning that it does produce language-aware > sort orders that are equivalent to what glibc produces. > > At the moment, this is probably just an experiment that shows where > refactoring and better abstractions might be suitable if we want to > support multiple locale libraries. If we want to pursue ICU, I think > this could be a useful third option. Does this make the backend multi-threaded? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] using Core Foundation locale functions
On 12/02/2014 12:52 AM, David E. Wheeler wrote: > Gotta say, I’m thrilled to see movement on this front, and especially pleased > to see how consensus seems to be building around an abstracted interface to > keep options open. This platform-specific example really highlights the need > for it (I had no idea that there was separate and more up-to-date collation > support in Core Foundation than in the UNIX layer of OS X). It'd also potentially let us make use of Windows' native locale APIs, which AFAIK receive considerably more love on that platform than their POSIX back-country cousins. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
> On Tue, Nov 25, 2014 at 3:44 AM, Kouhei Kaigai wrote: > > Today, I had a talk with Hanada-san to clarify which can be a common > > portion of them and how to implement it. Then, we concluded both of > > features can be shared most of the infrastructure. > > Let me put an introduction of join replacement by foreign-/custom-scan > below. > > > > Its overall design intends to inject foreign-/custom-scan node instead > > of the built-in join logic (based on the estimated cost). From the > > viewpoint of core backend, it looks like a sub-query scan that > > contains relations join internally. > > > > What we need to do is below: > > > > (1) Add a hook add_paths_to_joinrel() > > It gives extensions (including FDW drivers and custom-scan providers) > > chance to add alternative paths towards a particular join of > > relations, using ForeignScanPath or CustomScanPath, if it can run instead > of the built-in ones. > > > > (2) Informs the core backend varno/varattno mapping One thing we need > > to pay attention is, foreign-/custom-scan node that performs instead > > of the built-in join node must return mixture of values come from both > > relations. In case when FDW driver fetch a remote record (also, fetch > > a record computed by external computing resource), the most reasonable > > way is to store it on ecxt_scantuple of ExprContext, then kicks > > projection with varnode that references this slot. > > It needs an infrastructure that tracks relationship between original > > varnode and the alternative varno/varattno. We thought, it shall be > > mapped to INDEX_VAR and a virtual attribute number to reference > > ecxt_scantuple naturally, and this infrastructure is quite helpful for > both of ForegnScan/CustomScan. > > We'd like to add List *fdw_varmap/*custom_varmap variable to both of plan > nodes. > > It contains list of the original Var node that shall be mapped on the > > position according to the list index. 
(e.g, the first varnode is
> > varno=INDEX_VAR and
> > varattno=1)
> >
> > (3) Reverse mapping on EXPLAIN
> > For EXPLAIN support, above varnode on the pseudo relation scan needed
> > to be solved. All we need to do is initialization of dpns->inner_tlist
> > on set_deparse_planstate() according to the above mapping.
> >
> > (4) case of scanrelid == 0
> > To skip open/close (foreign) tables, we need to have a mark to
> > introduce the backend not to initialize the scan node according to
> > table definition, but according to the pseudo varnodes list.
> > As earlier custom-scan patch doing, scanrelid == 0 is a
> > straightforward mark to show the scan node is not combined with a
> > particular real relation.
> > So, it also need to add special case handling around foreign-/custom-scan
> > code.
> >
> > We expect above changes are enough small to implement basic join
> > push-down functionality (that does not involves external computing of
> > complicated expression node), but valuable to support in v9.5.
> >
> > Please comment on the proposition above.
>
> I don't really have any technical comments on this design right at the moment,
> but I think it's an important area where PostgreSQL needs to make some
> progress sooner rather than later, so I hope that we can get something
> committed in time for 9.5.

I tried to implement the interface portion, as attached. Hanada-san may be working on postgres_fdw based on this interface definition for the next commitfest.

The overall design of this patch is identical to what I described above. It intends to allow extensions (an FDW driver or custom-scan provider) to replace a join with a foreign/custom-scan that internally contains the result set of a join computed externally. It looks like a relation scan on the pseudo relation. One thing we need to pay attention to is how setrefs.c fixes up varno/varattno, unlike for a regular join structure.
I found that IndexOnlyScan already has a similar infrastructure, which redirects references of varnodes to a certain column on ecxt_scantuple of ExprContext using a pair of INDEX_VAR and an alternative varattno.

This patch adds a new field: fdw_ps_tlist on ForeignScan, and custom_ps_tlist on CustomScan. It is the extension's role to set a pseudo-scan target list (hence, ps_tlist) on the foreign/custom-scan that replaced a join. If it is not NIL, set_plan_refs() takes another strategy to fix them up. It calls fix_upper_expr() to map varnodes of the expression list onto INDEX_VAR according to the ps_tlist; the extension is then expected to put values/isnull pairs on ss_ScanTupleSlot of the scan state according to the ps_tlist constructed beforehand.

Regarding the primary hook to add alternative foreign/custom-scan paths instead of the built-in join paths, I added the following hook in add_paths_to_joinrel():

/* Hook for plugins to get control in add_paths_to_joinrel() */
typedef void (*set_join_pathlist_hook_type) (PlannerInfo *root,
                                             RelOptInfo *joinrel,
Re: [HACKERS] using Core Foundation locale functions
On Tue, Dec 2, 2014 at 10:07 PM, Craig Ringer wrote:
> On 12/02/2014 12:52 AM, David E. Wheeler wrote:
>> Gotta say, I’m thrilled to see movement on this front, and especially
>> pleased to see how consensus seems to be building around an abstracted
>> interface to keep options open. This platform-specific example really
>> highlights the need for it (I had no idea that there was separate and more
>> up-to-date collation support in Core Foundation than in the UNIX layer of
>> OS X).
>
> It'd also potentially let us make use of Windows' native locale APIs,
> which AFAIK receive considerably more love on that platform than their
> POSIX back-country cousins.

Not to mention the fact that a MultiByteToWideChar() call could be saved, and sortsupport for text would just work on Windows.
--
Peter Geoghegan
Re: [HACKERS] using custom scan nodes to prototype parallel sequential scan
> On Fri, Nov 14, 2014 at 02:51:32PM +1300, David Rowley wrote:
> > Likely for most aggregates, like count, sum, max, min, bit_and and
> > bit_or, the merge function would be the same as the transition function,
> > as the state type is just the same as the input type. It would only be
> > aggregates like avg(), stddev*(), bool_and() and bool_or() that would
> > need a new merge function made... These would be no more complex than
> > the transition functions... which are just a few lines of code anyway.
> >
> > We'd simply just not run parallel query if any aggregates used in the
> > query didn't have a merge function.
> >
> > When I mentioned this, I didn't mean to appear to be placing a road
> > block. I was just bringing to the table the information that COUNT(*) +
> > COUNT(*) works OK for merging COUNT(*)'s "sub totals", but AVG(n) +
> > AVG(n) does not.
>
> Sorry, late reply, but, FYI, I don't think our percentile functions can
> be parallelized in the same way:
>
> test=> \daS *percent*
>                                       List of aggregate functions
>    Schema   |      Name       |  Result data type  |            Argument data types               |             Description
> ------------+-----------------+--------------------+----------------------------------------------+-------------------------------------
>  pg_catalog | percent_rank    | double precision   | VARIADIC "any" ORDER BY VARIADIC "any"       | fractional rank of hypothetical row
>  pg_catalog | percentile_cont | double precision   | double precision ORDER BY double precision   | continuous distribution percentile
>  pg_catalog | percentile_cont | double precision[] | double precision[] ORDER BY double precision | multiple continuous percentiles
>  pg_catalog | percentile_cont | interval           | double precision ORDER BY interval           | continuous distribution percentile
>  pg_catalog | percentile_cont | interval[]         | double precision[] ORDER BY interval         | multiple continuous percentiles
>  pg_catalog | percentile_disc | anyelement         | double precision ORDER BY anyelement         | discrete percentile
>  pg_catalog | percentile_disc | anyarray           | double precision[] ORDER BY anyelement       | multiple discrete percentiles
>
Yep, it seems to me that this type of aggregate function is not obvious to split into multiple partitions. I think it is valuable even if we can push down only the subset of aggregate functions that is well known by the core planner. For example, we know count(*) = sum(nrows), and we also know avg(X) can be rewritten as an enhanced avg() that takes both nrows and the partial sum of X. We can utilize this knowledge to break down aggregate functions.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei
Re: [HACKERS] printing table in asciidoc with psql
On Mon, Nov 17, 2014 at 7:48 AM, Pavel Stehule wrote:
> 2014-11-07 22:37 GMT+01:00 Alvaro Herrera :
>>
>> I did \o /tmp/tst, then
>> \dS
>> create table "eh | oh" ();
>> \dS
>>
>> and then filtered the output file to HTML. The CREATE TABLE tag ended
>> up on the same line as the title of the following table. I think
>> there's a newline missing somewhere.
>>
>> The good news is that the | in the table name was processed correctly;
>> but I noticed that the table title is not using the escaped print, so I
>> would imagine that if I put a | in the title, things would go wrong.
>> (I don't know how to put titles on tables other than editing the
>> hardcoded titles for \ commands in psql.)
>>
>> Another thing is that the spaces around the | seem gratuitous and
>> produce bad results. I tried "select * from pg_class", which results in
>> a very wide table, and then the HTML output contains some cells with the
>> values on the second line; this makes all rows taller than they need to
>> be, because some cells use the first line and other cells in the same
>> row use the second line for the text. I hand-edited the asciidoc and
>> removed the spaces around |, which makes the result nicer. (Maybe
>> removing the trailing space is enough.)
>
> I see the trailing spaces, but I don't see the described effect. Please,
> can you send some more specific test case?

This formatting problem is trivial to reproduce:
=# create table "foo" ();
CREATE TABLE
Time: 9.826 ms
=# \d
.List of relations
[options="header",cols="

Also, something a bit surprising is that this format always produces one extra newline for each command, for example in the case of a DDL:
=# create table foo ();
CREATE TABLE

I think that this extra newline should be removed as well, no?

This patch has been marked as "Waiting on Author" for a couple of weeks, and the problems mentioned before have not been completely addressed, hence I am marking this patch as returned with feedback. It would be nice to see progress in the next CF.
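For reference, a minimal sketch of the AsciiDoc table shape the format would need to emit (cell content here is illustrative, with no spaces around the | separators, per Alvaro's suggestion):

```asciidoc
.List of relations
[options="header",cols="<,<"]
|===
|Schema|Name

|public|foo
|===
```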
Regards,
--
Michael
Re: [HACKERS] printing table in asciidoc with psql
On Wed, Dec 3, 2014 at 3:52 PM, Michael Paquier wrote:
> This patch has been marked as "Waiting on Author" for a couple of
> weeks, and the problems mentioned before have not been completely
> addressed, hence marking this patch as returned with feedback. It
> would be nice to see progress for the next CF.
Btw, here is a resource showing table formatting in asciidoc:
http://asciidoc.org/newtables.html
--
Michael
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On Tue, Nov 18, 2014 at 12:33 AM, Robert Haas wrote:
> On Mon, Nov 17, 2014 at 10:31 AM, Andres Freund wrote:
>> On 2014-11-17 10:21:04 -0500, Robert Haas wrote:
>>> Andres, where are we with this patch?
>>>
>>> 1. You're going to commit it, but haven't gotten around to it yet.
>>>
>>> 2. You're going to modify it some more and repost, but haven't gotten
>>> around to it yet.
>>>
>>> 3. You're willing to see it modified if somebody else does the work,
>>> but are out of time to spend on it yourself.
>>>
>>> 4. Something else?
>>
>> I'm working on it. Amit had found a hang on PPC that I couldn't
>> reproduce on x86. Since then I've reproduced it, and I think yesterday
>> I found the problem. Unfortunately it always took a couple of hours to
>> trigger...
>>
>> I've also made some, in my opinion, cleanups to the patch since
>> then. Those have the nice side effect of making the size of struct
>> LWLock smaller, but that wasn't actually the intended effect.
>>
>> I'll repost once I've verified the problem is fixed and I've updated
>> all the commentary.
>>
>> The current problem is that I seem to have found a problem that's also
>> reproducible with master :(. After a couple of hours, a
>> pgbench -h /tmp -p 5440 scale3000 -M prepared -P 5 -c 180 -j 60 -T 2 -S
>> against a
>> -c max_connections=200 -c shared_buffers=4GB
>> cluster seems to hang on PPC, with all the backends waiting on buffer
>> mapping locks. I'm now making sure it's really master and not my patch
>> causing the problem - it's just not trivial with 180 processes involved.
>
> Ah, OK. Thanks for the update.

Ping? This patch has been in a stale state for a couple of weeks and is still marked as waiting on author for this CF.
--
Michael
Re: [HACKERS] using custom scan nodes to prototype parallel sequential scan
On Wed, Dec 3, 2014 at 3:23 PM, Kouhei Kaigai wrote:
>> On Fri, Nov 14, 2014 at 02:51:32PM +1300, David Rowley wrote:
>> > Likely for most aggregates, like count, sum, max, min, bit_and and
>> > bit_or, the merge function would be the same as the transition
>> > function, as the state type is just the same as the input type. It
>> > would only be aggregates like avg(), stddev*(), bool_and() and
>> > bool_or() that would need a new merge function made... These would be
>> > no more complex than the transition functions... which are just a few
>> > lines of code anyway.
>> >
>> > We'd simply just not run parallel query if any aggregates used in the
>> > query didn't have a merge function.
>> >
>> > When I mentioned this, I didn't mean to appear to be placing a road
>> > block. I was just bringing to the table the information that COUNT(*)
>> > + COUNT(*) works OK for merging COUNT(*)'s "sub totals", but AVG(n) +
>> > AVG(n) does not.
>>
>> Sorry, late reply, but, FYI, I don't think our percentile functions can
>> be parallelized in the same way:
>>
>> test=> \daS *percent*
>>                                       List of aggregate functions
>>    Schema   |      Name       |  Result data type  |            Argument data types               |             Description
>> ------------+-----------------+--------------------+----------------------------------------------+-------------------------------------
>>  pg_catalog | percent_rank    | double precision   | VARIADIC "any" ORDER BY VARIADIC "any"       | fractional rank of hypothetical row
>>  pg_catalog | percentile_cont | double precision   | double precision ORDER BY double precision   | continuous distribution percentile
>>  pg_catalog | percentile_cont | double precision[] | double precision[] ORDER BY double precision | multiple continuous percentiles
>>  pg_catalog | percentile_cont | interval           | double precision ORDER BY interval           | continuous distribution percentile
>>  pg_catalog | percentile_cont | interval[]         | double precision[] ORDER BY interval         | multiple continuous percentiles
>>  pg_catalog | percentile_disc | anyelement         | double precision ORDER BY anyelement         | discrete percentile
>>  pg_catalog | percentile_disc | anyarray           | double precision[] ORDER BY anyelement       | multiple discrete percentiles
>>
> Yep, it seems to me that this type of aggregate function is not obvious
> to split into multiple partitions.
> I think it is valuable even if we can push down only the subset of
> aggregate functions that is well known by the core planner.
> For example, we know count(*) = sum(nrows), and we also know avg(X) can
> be rewritten as an enhanced avg() that takes both nrows and the partial
> sum of X. We can utilize this knowledge to break down aggregate
> functions.
Postgres-XC (Postgres-XL) implemented such parallel aggregate logic some time ago, using a set of sub-functions and a finalization function to do the work. My 2c.
--
Michael