Re: [HACKERS] Abbreviated keys for Numeric
Gavin == Gavin Flower gavinflo...@archidevsys.co.nz writes: Gavin What are the standard deviations? Gavin Do the arithmetic means change much if you exclude the 2 fastest Gavin 2 slowest? Gavin How do the arithmetic means compare to their respective medians? Gavin Essentially, how consistent are the results, or how great is the Gavin noise? There may be better indicators than the ones I've Gavin suggested above. This is all rather missing the point. The relevant metric is not how much noise is introduced between runs of the same code, but rather how much noise is introduced as a result of non-consequential changes to the code. I can get variations of several percent - easily more than three sigmas of the timing of repeated runs of unchanged code - in the time taken to sort a float8 column simply from introducing varying amounts of padding into the body of a function which is never called in the test. Clearly, the only possible effect here is that the changed memory addresses of functions must be resulting in different patterns of cache misses / cache replacements, or TLB misses, or similar low-level effects which have nothing to do with the code as such. (That this is a low-level alignment effect is supported by the fact that the performance changes are not monotonic in the size of the padding; adding more padding may cause either speedups or slowdowns.) -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
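The summary statistics Gavin asks about are easy to compute even if, as Andrew argues, they only bound run-to-run noise rather than code-alignment noise. A minimal sketch (the timing values are invented for illustration):

```python
import statistics

def summarize(timings):
    """Summarize benchmark timings with the indicators suggested above:
    arithmetic mean, standard deviation, median, and a trimmed mean
    that excludes the 2 fastest and 2 slowest runs."""
    s = sorted(timings)
    trimmed = s[2:-2] if len(s) > 4 else s
    return {
        "mean": statistics.mean(s),
        "stdev": statistics.stdev(s),
        "median": statistics.median(s),
        "trimmed_mean": statistics.mean(trimmed),
    }

# Two hypothetical runs whose means differ by ~3%: per Andrew's point,
# such a gap can exceed three sigmas of repeated-run noise and still be
# an artifact of function placement rather than the code change itself.
baseline = [10.00, 10.02, 9.98, 10.05, 9.95, 10.01, 10.03, 9.99]
patched = [10.31, 10.28, 10.33, 10.30, 10.27, 10.35, 10.29, 10.32]
print(summarize(baseline))
print(summarize(patched))
```

If the mean and trimmed mean agree and the stdev is small, the runs are consistent — which, per the reply above, still says nothing about alignment effects.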
Re: [HACKERS] proposal: searching in array function - array_position
2015-02-22 3:00 GMT+01:00 Petr Jelinek p...@2ndquadrant.com:

On 28/01/15 08:15, Pavel Stehule wrote:

2015-01-28 0:01 GMT+01:00 Jim Nasby jim.na...@bluetreble.com:

On 1/27/15 4:36 AM, Pavel Stehule wrote:

It is only partially identical - I would like to use a cache for array_offset, but it is not necessary for array_offsets ... it depends on how we modify the current API to support externally cached data.

Externally cached data?

Some of these functions have their own caches to minimize access to the typcache (array_map and array_cmp, for example). In the first case I am trying to pass this information via fn_extra; in the second case I don't, because I don't expect repeated calls (and I expect the typcache to be enough).

You actually do caching via fn_extra in both cases, and I think that's the correct way; yes, that part can be moved to a common function. I also see that the documentation does not say what is returned by array_offset if nothing is found (it's documented in the code but not in the sgml).

rebased + fixed docs

Regards

Pavel

-- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

diff --git a/doc/src/sgml/array.sgml b/doc/src/sgml/array.sgml
new file mode 100644
index 9ea1068..b3630b4
*** a/doc/src/sgml/array.sgml
--- b/doc/src/sgml/array.sgml
*** SELECT * FROM sal_emp WHERE pay_by_quart
*** 600,605 ****
--- 600,614 ----
    index, as described in <xref linkend="indexes-types">.
   </para>
+
+  <para>
+   You can also search for any value in an array using the <function>array_offset</function>
+   function (it returns the position of the first occurrence of the value in the array):
+
+ <programlisting>
+ SELECT array_offset(ARRAY['sun','mon','tue','wen','thu','fri','sat'], 'mon');
+ </programlisting>
+  </para>
+
  <tip>
   <para>
    Arrays are not sets; searching for specific array elements

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
new file mode 100644
index da2ed67..459343a
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
*** SELECT NULLIF(value, '(none)') ...
*** 11480,11485 ****
--- 11480,11488 ----
     <primary>array_lower</primary>
    </indexterm>
    <indexterm>
+    <primary>array_offset</primary>
+   </indexterm>
+   <indexterm>
     <primary>array_prepend</primary>
    </indexterm>
    <indexterm>
*** SELECT NULLIF(value, '(none)') ...
*** 11598,11603 ****
--- 11601,11633 ----
    </row>
    <row>
+    <entry>
+     <literal>
+      <function>array_offset</function>(<type>anyarray</type>, <type>anyelement</type> <optional>, <type>int</type></optional>)
+     </literal>
+    </entry>
+    <entry><type>int</type></entry>
+    <entry>returns the offset of the first occurrence of the element in the array. It uses
+    the <literal>IS NOT DISTINCT FROM</literal> operator for comparison. The optional third
+    argument specifies the initial offset at which searching starts.
+    Returns NULL when the value does not occur in the array.</entry>
+    <entry><literal>array_offset(ARRAY['sun','mon','tue','wen','thu','fri','sat'], 'mon')</literal></entry>
+    <entry><literal>2</literal></entry>
+   </row>
+   <row>
+    <entry>
+     <literal>
+      <function>array_offsets</function>(<type>anyarray</type>, <type>anyelement</type>)
+     </literal>
+    </entry>
+    <entry><type>int[]</type></entry>
+    <entry>returns an array of the offsets of all occurrences of the element in the array.
+    It uses the <literal>IS NOT DISTINCT FROM</literal> operator for comparison. Returns an
+    empty array when the element does not occur in the input array.</entry>
+    <entry><literal>array_offsets(ARRAY['A','A','B','A'], 'A')</literal></entry>
+    <entry><literal>{1,2,4}</literal></entry>
+   </row>
+   <row>
    <entry>
     <literal>
      <function>array_prepend</function>(<type>anyelement</type>, <type>anyarray</type>)
     </literal>

diff --git a/src/backend/utils/adt/array_userfuncs.c b/src/backend/utils/adt/array_userfuncs.c
new file mode 100644
index 5781b95..9bf0eb5
*** a/src/backend/utils/adt/array_userfuncs.c
--- b/src/backend/utils/adt/array_userfuncs.c
***
*** 12,20 ****
--- 12,24 ----
  */
  #include "postgres.h"
+ #include "catalog/pg_type.h"
  #include "utils/array.h"
  #include "utils/builtins.h"
  #include "utils/lsyscache.h"
+ #include "utils/typcache.h"
+
+ static bool array_offset_common(FunctionCallInfo fcinfo, int *result);

*** array_agg_array_finalfn(PG_FUNCTION_ARGS
*** 644,646 ****
--- 648,911 ----
  	PG_RETURN_DATUM(result);
  }
+
+ /*
+  * array_offset - returns the offset of the given element in an array.
+  * Returns NULL when the value is not an element of the array. It allows
+  * searching for a NULL value thanks to using the IS NOT DISTINCT FROM
+  * operator.
+  *
+  * The biggest difference from width_array is that the input array is
+  * unsorted.
+  */
+ Datum
+ array_offset(PG_FUNCTION_ARGS)
+ {
+ 	int		result;
+
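The semantics described in the patch's documentation — 1-based offsets, comparison with IS NOT DISTINCT FROM (so NULL matches NULL), an optional starting offset, and an array of all matches — can be sketched in Python. This is only an illustration of the documented behavior, not the C implementation:

```python
def array_offset(arr, value, start=1):
    """Return the 1-based position of the first occurrence of `value`
    in `arr`, searching from position `start`; None (SQL NULL) if the
    value does not occur. Python `==` with an explicit None check
    stands in for IS NOT DISTINCT FROM."""
    for i in range(start - 1, len(arr)):
        if arr[i] == value or (arr[i] is None and value is None):
            return i + 1
    return None

def array_offsets(arr, value):
    """Return the 1-based positions of every occurrence of `value`
    in `arr`; an empty list if it never occurs."""
    return [i + 1 for i, v in enumerate(arr)
            if v == value or (v is None and value is None)]

# The examples from the patched documentation:
print(array_offset(['sun', 'mon', 'tue', 'wen', 'thu', 'fri', 'sat'], 'mon'))  # 2
print(array_offsets(['A', 'A', 'B', 'A'], 'A'))                                # [1, 2, 4]
```

Note that a NULL (None) search value can still be found, which is exactly why the patch uses IS NOT DISTINCT FROM rather than plain equality.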
Re: [HACKERS] SSL renegotiation
On 2015-02-22 01:27:54 +0100, Emil Lenngren wrote:

I honestly wonder why postgres uses renegotiation at all. The motivation that cryptanalysis is easier as more data is sent seems quite far-fetched.

I don't think so. There's a fair number of algorithms that can/could be attacked much more easily with lots of data available, especially if you can guess/know/control some of the data. Additionally, renegotiating regularly helps constrain a possible key leakage to a certain window of time. With backend connections often being alive for weeks at a time, that's not a bad thing.

And it's not just us. E.g. openssh also triggers renegotiations based on the amount of data sent.

Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
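The "renegotiate after N bytes" policy mentioned above (and used by openssh) amounts to a simple byte counter on the connection. A hedged sketch — the class name and threshold are illustrative, not PostgreSQL's actual `ssl_renegotiation_limit` machinery:

```python
class RenegotiationCounter:
    """Track bytes sent on a connection and signal when a fresh session
    key should be negotiated, limiting how much traffic any single key
    protects (hypothetical threshold and API, for illustration only)."""

    def __init__(self, limit_bytes):
        self.limit = limit_bytes
        self.sent = 0            # bytes sent under the current key
        self.renegotiations = 0  # how many times we have rekeyed

    def record_send(self, nbytes):
        self.sent += nbytes
        if self.sent >= self.limit:
            self.renegotiations += 1
            self.sent = 0        # counter restarts under the new key
            return True          # caller should trigger renegotiation
        return False

conn = RenegotiationCounter(limit_bytes=1000)
conn.record_send(600)            # under the limit, keep going
conn.record_send(600)            # crosses the limit: rekey now
```

For a connection that stays open for weeks, this bounds the exposure of any one key both in bytes and, indirectly, in time.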
Re: [HACKERS] Replication identifiers, take 4
On 22/02/15 09:57, Andres Freund wrote: On 2015-02-19 00:49:50 +0100, Petr Jelinek wrote: On 16/02/15 10:46, Andres Freund wrote: On 2015-02-16 11:34:10 +0200, Heikki Linnakangas wrote: At a quick glance, this basic design seems workable. I would suggest expanding the replication IDs to regular 4 byte oids. Two extra bytes is a small price to pay, to make it work more like everything else in the system. I don't know. Growing from 3 to 5 byte overhead per relevant record (or even 0 to 5 in case the padding is reused) is rather noticeable. If we later find it to be a limit (I seriously doubt that), we can still increase it in a major release without anybody really noticing. I agree that limiting the overhead is important. But I have a related thought about this - I wonder if it's worth trying to map this more directly to the CommitTsNodeId. Maybe. I'd rather go the other way round though; replication_identifier.c/h's stuff seems much more generic than CommitTsNodeId. Probably not more generic, but definitely nicer, as the nodes are named, and yes, it has a richer API. I mean it is currently using that for the actual storage, but I am thinking of having the Replication Identifiers be the public API and the CommitTsNodeId stuff be just a hidden implementation detail. This thought is based on the fact that CommitTsNodeId provides a somewhat meaningless number, while Replication Identifiers give us a nice name for that number. And if we do that, then the type should perhaps be the same for both? I'm not sure. Given that this is included in a significant number of records, I'd really rather not increase the overhead as described above. Maybe we can just limit CommitTsNodeId to the same size for now? That would make sense. I also wonder about the default nodeid infrastructure that committs provides.
I can't think of a use-case for it that isn't solved by replication identifiers - in the end, the main reason I added that infrastructure was to make it possible to write something like replication identifiers as an extension. In any case, I don't think the default nodeid can be used in parallel with replication identifiers, since one will overwrite the SLRU record for the other. Maybe it's enough if this is documented, but I think it might be better if this patch removed the default committs nodeid infrastructure. It's just a few lines of code which nobody should be using yet.

Thinking about this some more and rereading the code, I see that you are calling TransactionTreeSetCommitTsData during xlog replay, but not during transaction commit; that does not seem correct, as the local records will not have any nodeid/origin.

-- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Redesigning checkpoint_segments
I am wondering a bit about the interaction with wal_keep_segments. One thing is that wal_keep_segments is still specified as a number of segments rather than in size units; maybe it would be worth changing that too? The other thing is that, if set, wal_keep_segments is the real max_wal_size from the user's perspective (not from the perspective of the algorithm in this patch, but the user does not really care about that), which is somewhat weird given the naming.

In my opinion, keeping wal_keep_segments as a number of segments would help a lot. In my experience handling production databases, to arrive at an optimal value for wal_keep_segments we calculate the number of segments generated in the WAL archive destination (on an hourly or daily basis); this further helps us calculate how many segments to keep, considering various other factors in a replication environment, to ensure the master has enough WALs in pg_xlog when a standby comes back up after an outage. Of course, if we can calculate a number of segments, we can calculate the same in terms of size too, but calculating the number of segments is more feasible.

Regards, VBN
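The conversion VBN describes between a segment count and a size is simple arithmetic. A sketch, assuming the default 16MB WAL segment size (it is configurable at build time):

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # default 16MB segment (assumption)

def segments_to_bytes(n_segments):
    """Size occupied by n WAL segment files."""
    return n_segments * WAL_SEGMENT_SIZE

def bytes_to_segments(size_bytes):
    """Segments needed to cover size_bytes, rounding up because a
    partial segment still occupies a whole file."""
    return -(-size_bytes // WAL_SEGMENT_SIZE)

# e.g. an archive growing by 4GB per hour corresponds to:
per_hour = bytes_to_segments(4 * 1024 ** 3)  # 256 segments/hour
```

Either unit therefore carries the same information; the question in the thread is which one users find more natural to reason about.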
Re: [HACKERS] hash agg is slower on wide tables?
Pavel == Pavel Stehule pavel.steh...@gmail.com writes:

Pavel why we read all columns from t1? [...]
Pavel so it looks so hashagg doesn't eliminate source columns

well I don't think it's supposed to eliminate them. This is, if I'm understanding the planner logic right, physical-tlist optimization; it's faster for a table scan to simply return the whole row (copying nothing, just pointing to the on-disk tuple) and let hashagg pick out the columns it needs, rather than for the scan to run a projection step just to select specific columns.

If there's a Sort step, this isn't done because Sort neither evaluates its input nor projects new tuples on its output, it simply accepts the tuples it receives and returns them with the same structure. So now it's important to have the node providing input to the Sort projecting out only the minimum required set of columns.

Why it's slower on the wider table... that's less obvious.

-- Andrew (irc:RhodiumToad)
Re: [HACKERS] hash agg is slower on wide tables?
On 2015-02-22 10:33:16 +0000, Andrew Gierth wrote: This is, if I'm understanding the planner logic right, physical-tlist optimization; it's faster for a table scan to simply return the whole row (copying nothing, just pointing to the on-disk tuple) and let hashagg pick out the columns it needs, rather than for the scan to run a projection step just to select specific columns. If there's a Sort step, this isn't done because Sort neither evaluates its input nor projects new tuples on its output, it simply accepts the tuples it receives and returns them with the same structure. So now it's important to have the node providing input to the Sort projecting out only the minimum required set of columns. Why it's slower on the wider table... that's less obvious.

It's likely to just be tuple deforming. I've not tried it, but I'd bet you'll see slot_deform* very high in the profile. For the narrow table only two attributes need to be extracted; for the wider one, everything up to a11 will get extracted. I've wondered before if we shouldn't use the caching via slot->tts_values so freely - if you only use a couple of values from a wide tuple, the current implementation really sucks if those few aren't at the beginning of the tuple.

Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Enforce creation of destination folders for source files in pg_regress (Was: pg_regress writes into source tree)
On Sat, Feb 21, 2015 at 6:51 AM, Peter Eisentraut pete...@gmx.net wrote: On 2/20/15 1:56 AM, Michael Paquier wrote: We'd still need the .gitignore files somewhere. Do you want to move them one directory up? I am not sure I am getting what you are pointing to... For extensions that already have non-empty sql/ and expected/, they should have their own ignore entries as sql/.gitignore and expected/.gitignore. The point of the patch is to simplify the code tree of extensions that need to keep empty sql/ and expected/, for example to be able to run regression tests after a fresh repository clone.

The affected modules have sql/.gitignore and/or expected/.gitignore files, so the case that the directory doesn't exist and needs to be created doesn't actually happen.

What about extensions that do not have sql/ and expected/ in their code tree? I don't have an example of such an extension in mind, but I cannot assume either that they do not exist. See for example something that happened a couple of months back: dummy_seclabel was trapped by that with df761e3, fixed afterwards with 3325624 (the structure has been changed again since, but the error showed up because of those missing sql/ and expected/ directories).

You could argue that these .gitignore files don't actually belong there, but your patch doesn't change or move those files, and even modules that have non-empty sql/ or expected/ directories have .gitignore files there, so it is considered the appropriate location.

This is up to the maintainer of each extension to manage in their code tree. However, I can imagine that some people would be grateful if we allowed them to not need sql/ and expected/ directories containing only a single .gitignore file ignoring everything with *, making the code tree of their extensions cleaner.

-- Michael
Re: [HACKERS] hash agg is slower on wide tables?
Andres Freund and...@2ndquadrant.com writes: I've wondered before if we shouldn't use the caching via slot->tts_values so freely - if you only use a couple of values from a wide tuple, the current implementation really sucks if those few aren't at the beginning of the tuple.

Don't see how you expect to get a win that way. Visiting column k requires crawling over columns 1..k-1 in practically all cases. You could maybe save a cycle or so by omitting the physical store into the Datum array, assuming that you never did need the column value later ... but the extra bookkeeping for more complicated tracking of which columns had been extracted would eat that savings handily.

regards, tom lane
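Tom's point — that with variable-width columns, reaching column k means stepping over columns 1..k-1 — can be shown with a toy tuple layout. This is a deliberately simplified model (every column variable-width with a 4-byte length header; the real heap tuple format, alignment, and null bitmap are ignored):

```python
import struct

def pack_row(values):
    """Pack variable-length text columns, each prefixed with a 4-byte
    length header, like a simplified on-disk tuple with no cached
    attribute offsets."""
    out = b""
    for v in values:
        b = v.encode()
        out += struct.pack("<I", len(b)) + b
    return out

def fetch_attr(row, k):
    """Return column k (0-based) plus the number of length headers
    walked: every earlier column must be stepped over first, which is
    why extracting a11 costs far more than extracting a2."""
    off = 0
    for i in range(k + 1):
        (n,) = struct.unpack_from("<I", row, off)
        off += 4
        if i == k:
            return row[off:off + n].decode(), i + 1
        off += n

row = pack_row(["a%d" % i for i in range(12)])
val, headers_walked = fetch_attr(row, 10)  # like touching a11 in the profile
```

This is why slot_deform_tuple dominates the wide-table profile later in the thread: the cost of each extraction grows with the column's position, not with the number of columns actually used.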
Re: [HACKERS] Allow snapshot too old error, to prevent bloat
On Mon, Feb 16, 2015 at 10:49 AM, Kevin Grittner kgri...@ymail.com wrote: What this discussion has made me reconsider is the metric for considering a transaction too old. The number of transaction IDs consumed seems inferior as the unit of measure for that to LSN or time. It looks to me to be pretty trivial (on the order of maybe 30 lines of code) to specify this GUC in minutes rather than transaction IDs.

It seems to me that SQL Server also uses a similar mechanism to avoid bloat in the version store (the place to store previous versions of records). The mechanism they use is that during the shrink process, the longest-running transactions that have not yet generated row versions are marked as victims. If a transaction is marked as a victim, it can no longer read the row versions in the version store. When it attempts to read row versions, an error message is generated and the transaction is rolled back.

I am not sure how much weight it adds to the usefulness of this patch; however, I think that if other leading databases provide a way to control the bloat, it indicates that most customers with write-intensive workloads would like to see such an option.

With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Replication identifiers, take 4
On 2/15/15 7:24 PM, Andres Freund wrote: On 2015-02-16 01:21:55 +0100, Andres Freund wrote: Here's my next attempt at producing something we can agree upon. The major change that might achieve that is that I've now provided a separate method to store the origin_id of a node. I've made it conditional on !REPLICATION_IDENTIFIER_REUSE_PADDING, to show both paths. That new method uses Heikki's xlog rework to dynamically add the origin to the record if an origin is set up. That works surprisingly simply.

I'm trying to figure out what this feature is supposed to do, but the patch contains no documentation or a commit message. Where is one supposed to start?
Re: [HACKERS] Allow snapshot too old error, to prevent bloat
Amit Kapila amit.kapil...@gmail.com wrote: It seems to me that SQL Server also uses similar mechanism to avoid the bloat in version store (place to store previous versions or record). I think if other leading databases provide a way to control the bloat, it indicates that most of the customers having write-intesive workload would like to see such an option. Yeah, I think many users are surprised by the damage that can be done to a database by an idle connection or long-running read-only query, especially when coming from other databases which have protections against that. The bad news is that changing to specifying the limit in time rather than transaction IDs is far more complicated than I thought. Basically, we need to associate times in the past with the value of latestCompletedXid (or derived values like snapshot xmin) at those times. I was initially thinking that by adding a timestamp to the snapshot we could derive this from active snapshots. For any one connection, it's not that hard for it to scan the pairingheap of snapshots and pick out the right values; but the problem is that it is process-local memory and this needs to work while the connection is sitting idle for long periods of time. I've got a couple ideas on how to approach it, but neither seems entirely satisfying: (1) Use a new timer ID to interrupt the process whenever its oldest snapshot which hasn't yet crossed the old snapshot threshold does so. The problem is that this seems as though it would need to do an awful lot more work (scanning the pairing heap for the best match) than anything else is currently doing in a timeout handler. One alternative would be to keep a second pairing heap to track snapshots which have not crossed the threshold, removing items as they get old -- that would make the work needed in the timeout handler closer to other handlers. 
(2) Use a coarse enough granularity on time and a short enough maximum for the GUC to just keep a circular buffer of the mappings in memory. We might be able to make this dense enough that one-minute resolution for up to 60 days could fit in 338kB. Obviously that could be reduced with coarser granularity or a shorter maximum.

-- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
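Kevin's option (2) — a circular buffer mapping minute-granularity timestamps to the xmin observed at that minute — can be sketched as follows. The class and its API are hypothetical illustrations, not the eventual implementation; note that the size arithmetic matches the 338kB estimate above (60 days × 1440 minutes/day = 86,400 slots × 4-byte xids ≈ 337.5 kB):

```python
class OldSnapshotTimeMap:
    """Fixed-size circular buffer mapping minute numbers to the xmin
    recorded at that minute (sketch of option (2)). A real version
    would need to fill in skipped minutes and handle concurrency."""

    def __init__(self, max_minutes):
        self.size = max_minutes
        self.slots = [None] * max_minutes
        self.head_minute = None  # most recent minute recorded

    def advance(self, minute, xmin):
        """Record the xmin observed at `minute`, overwriting the slot
        that held the entry from `size` minutes ago."""
        self.slots[minute % self.size] = xmin
        self.head_minute = minute

    def lookup(self, minute):
        """Return the xmin recorded at `minute`, or None if that minute
        is outside the retained window."""
        if (self.head_minute is None
                or minute > self.head_minute
                or minute <= self.head_minute - self.size):
            return None
        return self.slots[minute % self.size]

# Size check: 60 days at one-minute resolution, 4 bytes per xid.
entries = 60 * 24 * 60            # 86,400 slots
kilobytes = entries * 4 / 1024    # ~337.5 kB, matching the estimate
```

Lookups and advances are O(1), which avoids the pairing-heap scan that makes option (1)'s timeout handler expensive.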
Re: [HACKERS] 9.5: Better memory accounting, towards memory-bounded HashAgg
On 22.2.2015 09:14, Jeff Davis wrote: On Wed, 2015-01-07 at 20:07 +0100, Tomas Vondra wrote: So I started digging in the code and I noticed this: hash_mem = MemoryContextMemAllocated(aggstate-hashcontext, true); which is IMHO obviously wrong, because that accounts only for the hashtable itself. It might be correct for aggregates with state passed by value, but it's incorrect for state passed by reference (e.g. Numeric, arrays etc.), because initialize_aggregates does this: oldContext = MemoryContextSwitchTo(aggstate-aggcontext); pergroupstate-transValue = datumCopy(peraggstate-initValue, peraggstate-transtypeByVal, peraggstate-transtypeLen); MemoryContextSwitchTo(oldContext); and it's also wrong for all the user-defined aggregates that have no access to hashcontext at all, and only get a reference to aggcontext using AggCheckCallContext(). array_agg() is a prime example.

Actually, I believe the first one is right, and the second one is wrong. If we allocate pass-by-ref transition states in aggcontext, that defeats the purpose of the patch, because we can't release the memory when we start a new batch (because we can't reset aggcontext). Instead, the transition states should be copied into hashcontext; we should count the memory usage of hashcontext; AggCheckCallContext should return hashcontext; and after each batch we should reset hashcontext.

Hmmm, maybe. I'd however suggest not stuffing everything into a single memory context, but creating a 'transvalue' context next to hashcontext, and returning that one from AggCheckCallContext(). I.e. something like this: aggcontext -> hashcontext (hash table etc.), aggcontext -> transcontext (new context for transition values). It's better to keep things separated, I guess.

It might be worth refactoring a bit to call it the groupcontext or something instead, and use it for both HashAgg and GroupAgg. That would reduce the need for conditionals.

Not sure that's worth it. The patch already seems complex enough. [ ...
problems with array_agg subcontexts ... ] and it runs indefinitely (I gave up after a few minutes). I believe this renders the proposed memory accounting approach dead. ... The array_agg() patch I submitted to this CF would fix this particular query, as it removes the child contexts (so there's no need for recursion in MemoryContextMemAllocated), but it does nothing for the user-defined aggregates out there. And it's not committed yet.

Now that your patch is committed, I don't think I'd call the memory accounting patch I proposed dead yet. We decided that creating one context per group is essentially a bug, so we don't need to optimize for that case.

Yes, but the question is how many other aggregates designed similarly to array_agg are out there (in extensions etc.), and whether we actually care about them. Those will be equally impacted by this, and we can't fix them. But maybe we don't really care, and it's the author's responsibility to test them on a new major version and fix the performance issue just as we did with array_agg().

Your approach may be better, though.

I'm not sure - it really boils down to the goal and assumptions. If we restrict ourselves to a HashAggregate-only solution, under the assumption that the number of child contexts is small (and leave the fix up to the author of the extension), then your approach is probably simpler and better than the approach I proposed. If we're aiming for a general-purpose solution that might be used elsewhere, then I'm afraid we can't make the 'small number of child contexts' assumption, and the simple solution may not be the right one. Although I believe we shouldn't be creating large numbers of 'parallel' memory contexts anyway.

Thank you for reviewing. I'll update my patches and resubmit for the next CF.

Thanks for working on this.
-- Tomas Vondra http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
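The context layout Tomas sketches — hashcontext and transcontext as siblings under aggcontext, with accounting that can either include or exclude descendants — maps onto a small tree model. A toy sketch only; the real MemoryContext machinery in mcxt.c is far more involved:

```python
class MemoryContext:
    """Toy memory-context tree: each context tracks its own allocations,
    and mem_allocated() can recurse into children, mirroring the
    recursive MemoryContextMemAllocated() discussed above."""

    def __init__(self, name, parent=None):
        self.name = name
        self.allocated = 0
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def alloc(self, nbytes):
        self.allocated += nbytes

    def reset(self):
        """Release this context's memory and delete its children."""
        self.allocated = 0
        self.children.clear()

    def mem_allocated(self, recurse=True):
        total = self.allocated
        if recurse:
            total += sum(c.mem_allocated(True) for c in self.children)
        return total

# The proposed layout: transition values live beside the hash table, so
# resetting both between batches frees them, while aggcontext survives.
aggcontext = MemoryContext("aggcontext")
hashcontext = MemoryContext("hashcontext", aggcontext)
transcontext = MemoryContext("transcontext", aggcontext)
hashcontext.alloc(1024)
transcontext.alloc(512)
```

The recursion in mem_allocated() also shows why one-context-per-group hurts: with thousands of child contexts, every accounting call walks the whole tree.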
Re: [HACKERS] hash agg is slower on wide tables?
On 2015-02-22 09:58:31 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: I've wondered before if we shouldn't use the caching via slot->tts_values so freely - if you only use a couple of values from a wide tuple, the current implementation really sucks if those few aren't at the beginning of the tuple. Don't see how you expect to get a win that way. Visiting column k requires crawling over columns 1..k-1 in practically all cases. You could maybe save a cycle or so by omitting the physical store into the Datum array, assuming that you never did need the column value later ... but the extra bookkeeping for more complicated tracking of which columns had been extracted would eat that savings handily.

Depends a bit on the specifics. In this case attcacheoff would allow you direct access, which surely is going to be more efficient. I'm not sure how frequently that happens in practice.

Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
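The attcacheoff idea Andres refers to: an attribute's offset within the tuple can be precomputed and cached only while every preceding column is fixed-width; the first variable-width column still gets a cached offset, but everything after it must be found by walking. A simplified sketch (alignment padding and the null bitmap, which also defeat caching, are ignored):

```python
def compute_cached_offsets(attlens):
    """attlens: list of attribute lengths, positive = fixed width,
    -1 = variable width (varlena). Returns the cacheable offset for
    each attribute, or None where direct access is impossible."""
    offsets = []
    off = 0  # becomes None once an offset can no longer be computed
    for attlen in attlens:
        if off is None:
            offsets.append(None)
            continue
        offsets.append(off)
        off = off + attlen if attlen > 0 else None
    return offsets

# int4, int4, text, int4: the text column itself still has a known
# offset, but the column after it must be reached by crawling.
print(compute_cached_offsets([4, 4, -1, 4]))  # [0, 4, 8, None]
```

So whether direct access helps depends, as Andres says, on the schema: a table whose frequently-used columns sit behind a varlena column gets no benefit.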
Re: [HACKERS] Replication identifiers, take 4
On 2015-02-22 10:03:59 -0500, Peter Eisentraut wrote: On 2/15/15 7:24 PM, Andres Freund wrote: On 2015-02-16 01:21:55 +0100, Andres Freund wrote: Here's my next attempt at producing something we can agree upon. The major change that might achieve that is that I've now provided a separate method to store the origin_id of a node. I've made it conditional on !REPLICATION_IDENTIFIER_REUSE_PADDING, to show both paths. That new method uses Heikki's xlog rework to dynamically add the origin to the record if an origin is set up. That works surprisingly simply. I'm trying to figure out what this feature is supposed to do, but the patch contains no documentation or a commit message. Where is one supposed to start?

For a relatively short summary: http://archives.postgresql.org/message-id/20150216002155.GI15326%40awork2.anarazel.de For a longer one: http://www.postgresql.org/message-id/20140923182422.ga15...@alap3.anarazel.de

Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] hash agg is slower on wide tables?
2015-02-22 13:22 GMT+01:00 Andres Freund and...@2ndquadrant.com: On 2015-02-22 10:33:16 +0000, Andrew Gierth wrote: This is, if I'm understanding the planner logic right, physical-tlist optimization; it's faster for a table scan to simply return the whole row (copying nothing, just pointing to the on-disk tuple) and let hashagg pick out the columns it needs, rather than for the scan to run a projection step just to select specific columns. If there's a Sort step, this isn't done because Sort neither evaluates its input nor projects new tuples on its output, it simply accepts the tuples it receives and returns them with the same structure. So now it's important to have the node providing input to the Sort projecting out only the minimum required set of columns. Why it's slower on the wider table... that's less obvious. It's likely to just be tuple deforming. I've not tried it but I'd bet you'll see slot_deform* very high in the profile. For the narrow table only two attributes need to be extracted, for the wider one everything up to a11 will get extracted. I've wondered before if we shouldn't use the caching via slot->tts_values so freely - if you only use a couple values from a wide tuple the current implementation really sucks if those few aren't at the beginning of the tuple.

The number of columns has a strong effect, but it is not the only one.
I tested the first two columns, and the bigger table is aggregated more slowly - about 30%:

postgres=# explain analyze select count(*), a1, a2 from t1 group by 3,2 order by 3,2;
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Sort  (cost=2023263.19..2023263.25 rows=24 width=4) (actual time=84073.451..84073.452 rows=24 loops=1)
   Sort Key: a2, a1
   Sort Method: quicksort  Memory: 26kB
   ->  HashAggregate  (cost=2023262.40..2023262.64 rows=24 width=4) (actual time=84073.430..84073.433 rows=24 loops=1)   -- 23700
         Group Key: a2, a1
         ->  Seq Scan on t1  (cost=0.00..1497532.80 rows=70097280 width=4) (actual time=67.325..60152.052 rows=70097280 loops=1)
 Planning time: 0.107 ms
 Execution time: 84073.534 ms
(8 rows)

postgres=# explain analyze select count(*), a1, a2 from t2 group by 3,2 order by 3,2;
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Sort  (cost=1536868.33..1536868.39 rows=24 width=4) (actual time=21963.230..21963.231 rows=24 loops=1)
   Sort Key: a2, a1
   Sort Method: quicksort  Memory: 26kB
   ->  HashAggregate  (cost=1536867.54..1536867.78 rows=24 width=4) (actual time=21963.209..21963.213 rows=24 loops=1)   -- 16000
         Group Key: a2, a1
         ->  Seq Scan on t2  (cost=0.00..1011137.88 rows=70097288 width=4) (actual time=0.063..5647.404 rows=70097280 loops=1)
 Planning time: 0.069 ms
 Execution time: 21963.340 ms
(8 rows)

Profile when the data are in the first two columns:

  7.87% postgres [.] slot_deform_tuple
  7.48% postgres [.] slot_getattr
  7.10% postgres [.] hash_search_with_hash_value
  3.74% postgres [.] execTuplesMatch
  3.68% postgres [.] ExecAgg

Profile when the data are in the first and eleventh columns:

 20.35% postgres [.] slot_deform_tuple
  6.55% postgres [.] hash_search_with_hash_value
  5.86% postgres [.] slot_getattr
  4.15% postgres [.] ExecAgg

So your hypothesis is valid.

Regards

Pavel

Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Precedence of standard comparison operators
I wrote: Attached is a draft patch to bring the precedence of comparison operators and IS tests into line with the SQL standard. I have not yet looked into producing warnings for changes in parsing decisions ...

I've made some progress on getting parse_expr.c to produce warnings by after-the-fact analysis of the raw parse tree, and I think it can be made to work. However, I've run into two stumbling blocks that greatly limit the quality of the warnings, and will need to be fixed as separate patches:

1. BooleanTest and NullTest node types don't carry location pointers, so there's no way to give an error cursor position referencing them. Simple enough addition. I think we left these out because there were no post-grammar error reports involving them, but now we need some.

2. gram.y expands BETWEEN operations into complex AND/OR nests on sight, losing the information that there was a BETWEEN there, which is important if we're to detect possible precedence changes involving BETWEEN.

What we should do about #2 is preserve BETWEEN as a distinct construct in the output of raw parsing. This is a good thing anyway, because in the long run we are going to want to fix BETWEEN's semantic issues (such as multiple evaluation of possibly-volatile input expressions) and fixing what the grammar does with it is an essential first step.

Barring objection I'll go do both of those things as separate patches, and then come back to the precedence warnings.

regards, tom lane
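The multiple-evaluation problem with the current BETWEEN expansion can be demonstrated outside the parser. A hedged Python stand-in — `volatile_expr` plays the role of a volatile SQL expression, and the two functions model expand-on-sight versus a dedicated BETWEEN node; none of this is the actual gram.y code:

```python
calls = {"n": 0}

def volatile_expr():
    """Stands in for a volatile SQL expression; counts evaluations."""
    calls["n"] += 1
    return 5

def between_expanded(x_fn, lo, hi):
    """BETWEEN rewritten on sight as `x >= lo AND x <= hi`, the way
    gram.y currently expands it: the input expression is evaluated
    twice (when the first comparison succeeds)."""
    return x_fn() >= lo and x_fn() <= hi

def between_node(x_fn, lo, hi):
    """A BETWEEN construct preserved through raw parsing can evaluate
    its input exactly once."""
    x = x_fn()
    return lo <= x <= hi

r1 = between_expanded(volatile_expr, 1, 10)
expanded_calls = calls["n"]   # 2 evaluations
calls["n"] = 0
r2 = between_node(volatile_expr, 1, 10)
node_calls = calls["n"]       # 1 evaluation
```

With a truly volatile expression (random(), a sequence nextval, a function with side effects), the two evaluations can even disagree, which is why preserving BETWEEN as a node is a prerequisite for fixing the semantics.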
Re: [HACKERS] Abbreviated keys for Numeric
On 22/02/15 22:59, Andrew Gierth wrote:
[...]
 (That this is a low-level alignment effect is supported by the fact that the performance changes are not monotonic in the size of the padding; adding more padding may cause either speedups or slowdowns.)

What you have pointed out is 'obvious' in retrospect - but still, far more extreme than I would have ever expected!

There may well be analogues of the story where a DP manager was convinced to get faster hard drives, 50% faster than the old ones, to speed up an overnight batch run. The program then took about 50% longer to run. It turns out that previously the mainframe processed one block in a little less time than the old hard drive took to rotate once. So the new hard drives ended up doing two rotations per block read, and so took longer, despite being faster!
(A modern low-end smartphone probably has at least a thousand times more processing power than that poor old mainframe!)

Cheers,
Gavin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] deparsing utility commands
On 2015-02-21 17:30:24 +0100, Andres Freund wrote:

 /*
 + * deparse_CreateFunctionStmt
 + *		Deparse a CreateFunctionStmt (CREATE FUNCTION)
 + *
 + * Given a function OID and the parsetree that created it, return the JSON
 + * blob representing the creation command.
 + *
 + * XXX this is missing the per-function custom-GUC thing.
 + */

 Subject: [PATCH 27/42] deparse: Support ALTER FUNCTION

 + * deparse_AlterFunctionStmt
 + *		Deparse a AlterFunctionStmt (ALTER FUNCTION)
 + *
 + * Given a function OID and the parsetree that created it, return the JSON
 + * blob representing the alter command.
 + *
 + * XXX this is missing the per-function custom-GUC thing.
 + */

 Hm, so my earlier patch needs to partially be copied here. I'm running out of time now though...

Updated 0003 attached that supports SET for both CREATE and ALTER. In my previous patch I'd looked at proconfig for the values. A bit further thought revealed that that's not such a great idea: it works well enough for CREATE FUNCTION, but already breaks down at CREATE OR REPLACE FUNCTION unless we want to emit RESET ALL (SET ...)+ which seems mighty ugly. Also, the code is actually a good bit easier to understand this way.

I. hate. arrays. ;)

Greetings,

Andres Freund

--
Andres Freund                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services

From bb8b0027e82e628fb098b9707f36fa5ce08b9234 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Sat, 21 Feb 2015 16:36:34 +0100
Subject: [PATCH 3/3] Support SET/RESET in CREATE/ALTER FUNCTION.
---
 src/backend/tcop/deparse_utility.c | 61 ++++++++++++++++++++++++++++++++++----
 1 file changed, 55 insertions(+), 6 deletions(-)

diff --git a/src/backend/tcop/deparse_utility.c b/src/backend/tcop/deparse_utility.c
index 35c4eca..cb52cb2 100644
--- a/src/backend/tcop/deparse_utility.c
+++ b/src/backend/tcop/deparse_utility.c
@@ -2802,14 +2802,46 @@ deparse_CreateDomain(Oid objectId, Node *parsetree)
 	return createDomain;
 }

+static ObjTree *
+deparse_FunctionSet(VariableSetKind kind, char *name, char *value)
+{
+	ObjTree    *r;
+
+	if (kind == VAR_RESET_ALL)
+	{
+		r = new_objtree_VA("RESET ALL", 0);
+	}
+	else if (value != NULL)
+	{
+		/*
+		 * Some GUC variable names are 'LIST' type and hence must not be
+		 * quoted. FIXME: shouldn't this and pg_get_functiondef() rather use
+		 * guc.c to check for GUC_LIST?
+		 */
+		if (pg_strcasecmp(name, "DateStyle") == 0
+			|| pg_strcasecmp(name, "search_path") == 0)
+			r = new_objtree_VA("SET %{set_name}I TO %{set_value}s", 0);
+		else
+			r = new_objtree_VA("SET %{set_name}I TO %{set_value}L", 0);
+
+		append_string_object(r, "set_name", name);
+		append_string_object(r, "set_value", value);
+	}
+	else
+	{
+		r = new_objtree_VA("RESET %{set_name}I", 0);
+		append_string_object(r, "set_name", name);
+	}
+
+	return r;
+}
+
 /*
  * deparse_CreateFunctionStmt
  *		Deparse a CreateFunctionStmt (CREATE FUNCTION)
  *
  * Given a function OID and the parsetree that created it, return the JSON
  * blob representing the creation command.
- *
- * XXX this is missing the per-function custom-GUC thing.
  */
 static ObjTree *
 deparse_CreateFunction(Oid objectId, Node *parsetree)
@@ -2825,6 +2857,7 @@ deparse_CreateFunction(Oid objectId, Node *parsetree)
 	char	   *probin;
 	List	   *params;
 	List	   *defaults;
+	List	   *sets = NIL;
 	ListCell   *cell;
 	ListCell   *curdef;
 	ListCell   *table_params = NULL;
@@ -3089,7 +3122,21 @@ deparse_CreateFunction(Oid objectId, Node *parsetree)
 		append_float_object(tmp, "rows", procForm->prorows);
 	append_object_object(createFunc, "rows", tmp);

-	append_array_object(createFunc, "set_options", NIL);
+	foreach(cell, node->options)
+	{
+		DefElem    *defel = (DefElem *) lfirst(cell);
+		ObjTree    *tmp = NULL;
+
+		if (strcmp(defel->defname, "set") == 0)
+		{
+			VariableSetStmt *sstmt = (VariableSetStmt *) defel->arg;
+			char	   *value = ExtractSetVariableArgs(sstmt);
+
+			tmp = deparse_FunctionSet(sstmt->kind, sstmt->name, value);
+			sets = lappend(sets, new_object_object(tmp));
+		}
+	}
+	append_array_object(createFunc, "set_options", sets);

 	if (probin == NULL)
 	{
@@ -3114,8 +3161,6 @@ deparse_CreateFunction(Oid objectId, Node *parsetree)
  *
  * Given a function OID and the parsetree that created it, return the JSON
  * blob representing the alter command.
- *
- * XXX this is missing the per-function custom-GUC thing.
  */
 static ObjTree *
 deparse_AlterFunction(Oid objectId, Node *parsetree)
@@ -3203,8 +3248,12 @@ deparse_AlterFunction(Oid objectId, Node *parsetree)
 									defGetNumeric(defel));
 		}
 		else if (strcmp(defel->defname, "set") == 0)
-			elog(ERROR, "unimplemented deparse of ALTER FUNCTION SET");
+		{
+			VariableSetStmt *sstmt = (VariableSetStmt *) defel->arg;
+			char	   *value = ExtractSetVariableArgs(sstmt);

+			tmp = deparse_FunctionSet(sstmt->kind, sstmt->name, value);
+		}
 		elems = lappend(elems, new_object_object(tmp));
 	}
--
Re: [HACKERS] Abbreviated keys for Numeric
On 22.2.2015 10:59, Andrew Gierth wrote:
[...]
 Clearly, the only possible effect here is that the changed memory addresses of functions must be resulting in different patterns of cache misses / cache replacements, or TLB misses, or similar low-level effects which have nothing to do with the code as such.

Interesting, but I think Gavin was asking about how much variation there was for each tested case (e.g. a query executed on the same code / dataset). And in those cases the padding / alignment won't change, and thus the effects you describe won't influence the results, no?

--
Tomas Vondra                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enforce creation of destination folders for source files in pg_regress (Was: pg_regress writes into source tree)
On 2/22/15 5:41 AM, Michael Paquier wrote: You could argue that these .gitignore files don't actually belong there, but your patch doesn't change or move those files, and even modules that have non-empty sql/ or expected/ directories have .gitignore files there, so it is considered the appropriate location. This is up to the maintainer of each extension to manage their code tree. However I can imagine that some people would be grateful if we allow them to not need sql/ and expected/ containing only one single .gitignore file ignoring everything with *, making the code tree of their extensions cleaner. I would like to have an extension in tree that also does this, so we have a regression test of this functionality. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replication identifiers, take 4
On 2015-02-22 04:59:30 +0100, Petr Jelinek wrote:

 Now that the issue with padding seems to no longer exist since the patch works both with and without padding, I went through the code and here are some comments I have (in no particular order).

 In CheckPointReplicationIdentifier:

 + * FIXME: Add a CRC32 to the end.

 The function already does it (I guess you forgot to remove the comment).

Yep. I locally have a WIP version that's much cleaned up and doesn't contain it anymore.

 Using max_replication_slots as the limit for replication_identifier states does not really make sense to me, as replication identifiers track remote info while slots are local, and in case of master-slave replication you need replication identifiers but don't need slots.

On the other hand, it's quite cheap if unused. Not sure if several variables are worth it.

 In bootstrap.c:

 #define MARKNOTNULL(att) \
 	((att)->attlen > 0 || \
 	 (att)->atttypid == OIDVECTOROID || \
-	 (att)->atttypid == INT2VECTOROID)
+	 (att)->atttypid == INT2VECTOROID || \
+	 strcmp(NameStr((att)->attname), "riname") == 0 \
+	)

 Huh? Can this be solved in a nicer way?

Yes. I'd mentioned that this is just a temporary hack ;). I've since pushed a more proper solution; BKI_FORCE_NOT_NULL can now be specified on the column.

 Since we call XLogFlush with local_lsn as parameter, shouldn't we check that it's actually within the valid range? Currently we'll get errors like this if set to an invalid value:

 ERROR: xlog flush request 123/123 is not satisfied --- flushed only to 0/168FB18

I think we should rather remove local_lsn from all parameters that are user callable; adding them there was a mistake. It's really only relevant for the cases where it's called by commit.

 In AdvanceReplicationIndentifier:

 +/*
 + * XXX: should we restore into a hashtable and dump into shmem only after
 + * recovery finished?
 + */

 Probably not, given that the function is also callable via the SQL interface.

Well, it's still a good idea regardless...
As I wrote in another email, I would like to integrate the RepNodeId and CommitTSNodeId into single thing. Will reply separately there. There are no docs for the new sql interfaces. Yea. The whole thing needs docs. Thanks, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] hash agg is slower on wide tables?
Hi

I did some benchmarks and I found some strange numbers.

do $$
begin
  drop table if exists t1;
  execute 'create table t1(' || array_to_string(array(select 'a' || i || ' smallint' from generate_series(1,30) g(i)), ',') || ')';
  -- special columns a2, a11
  insert into t1
    select 2008, i % 12 + 1,
           random()*20, random()*20, random()*20, random()*20,
           random()*20, random()*20, random()*20, random()*20,
           case when random() > 0.9 then 1 else 0 end,
           random()*20, random()*20, random()*20, random()*20,
           random()*20, random()*20, random()*20, random()*20,
           random()*20, random()*20, random()*20, random()*20,
           random()*20, random()*20, random()*20, random()*20,
           random()*20, random()*20, random()*20
    from generate_series(1,7009728) g(i);
  drop table if exists t2;
  create table t2 as select a2, a11 from t1;
  analyze t1;
  analyze t2;
end;
$$;

postgres=# \dt+ t*
                   List of relations
 Schema | Name | Type  | Owner |  Size  | Description
--------+------+-------+-------+--------+-------------
 public | t1   | table | pavel | 622 MB |
 public | t2   | table | pavel | 242 MB |
(2 rows)

postgres=# explain analyze select count(*), a2, a11 from t1 group by 3,2 order by 3,2;
                                                      QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Sort  (cost=202327.03..202327.09 rows=24 width=4) (actual time=2609.159..2609.161 rows=24 loops=1)
   Sort Key: a11, a2
   Sort Method: quicksort  Memory: 26kB
   ->  HashAggregate  (cost=202326.24..202326.48 rows=24 width=4) (actual time=2609.137..2609.139 rows=24 loops=1) --- grouping 1997 ms
         Group Key: a11, a2
         ->  Seq Scan on t1  (cost=0.00..149753.28 rows=7009728 width=4) (actual time=0.071..616.222 rows=7009728 loops=1)
 Planning time: 0.138 ms
 Execution time: 2609.247 ms
(8 rows)

postgres=# explain analyze select count(*), a2, a11 from t2 group by 3,2 order by 3,2;
                                                      QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Sort  (cost=153688.03..153688.09 rows=24 width=4) (actual time=2100.058..2100.059 rows=24 loops=1)
   Sort Key: a11, a2
   Sort Method: quicksort  Memory: 26kB
   ->  HashAggregate  (cost=153687.24..153687.48 rows=24 width=4) (actual time=2100.037..2100.040 rows=24 loops=1) --- grouping 1567 ms -- 25% faster
         Group Key: a11, a2
         ->  Seq Scan on t2  (cost=0.00..101114.28 rows=7009728 width=4) (actual time=0.043..532.680 rows=7009728 loops=1)
 Planning time: 0.178 ms
 Execution time: 2100.158 ms
(8 rows)

postgres=# \dt+ t*
                    List of relations
 Schema | Name | Type  | Owner |  Size   | Description
--------+------+-------+-------+---------+-------------
 public | t1   | table | pavel | 6225 MB |
 public | t2   | table | pavel | 2423 MB |
(2 rows)

postgres=# explain analyze select count(*), a2, a11 from t1 group by 3,2 order by 3,2;
                                                      QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Sort  (cost=2023263.19..2023263.25 rows=24 width=4) (actual time=99453.272..99453.274 rows=24 loops=1)
   Sort Key: a11, a2
   Sort Method: quicksort  Memory: 26kB
   ->  HashAggregate  (cost=2023262.40..2023262.64 rows=24 width=4) (actual time=99453.244..99453.249 rows=24 loops=1) --- 31891 ms
         Group Key: a11, a2
         ->  Seq Scan on t1  (cost=0.00..1497532.80 rows=70097280 width=4) (actual time=16.935..67562.615 rows=70097280 loops=1)
 Planning time: 14.526 ms
 Execution time: 99453.413 ms
(8 rows)

postgres=# explain analyze select count(*), a2, a11 from t2 group by 3,2 order by 3,2;
                                                      QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Sort  (cost=1536868.33..1536868.39 rows=24 width=4) (actual time=20656.397..20656.399 rows=24 loops=1)
   Sort Key: a11, a2
   Sort Method: quicksort  Memory: 26kB
   ->  HashAggregate  (cost=1536867.54..1536867.78 rows=24 width=4) (actual time=20656.375..20656.378 rows=24 loops=1) --- 15248 ms -- 100% faster
         Group Key: a11, a2
         ->  Seq Scan on t2  (cost=0.00..1011137.88 rows=70097288 width=4) (actual time=0.060..5408.205 rows=70097280 loops=1)
 Planning time: 0.161 ms
 Execution time: 20656.475 ms
(8 rows)

It looks like hash agg is slower when it is based on a wide table - by about 25-100%. Is that so, or am I missing something?

Regards

Pavel
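One way to see why the wide table loses: heap tuples store attributes back to back, so fetching a11 means computing the offsets of the ten attributes before it on every row unless those offsets are cached. A toy Python model of that walk (the function and layout are illustrative, not the actual heap tuple format):

```python
def deform_upto(widths, upto):
    """Compute attribute offsets by walking the tuple attribute by
    attribute, roughly the way slot_deform_tuple must when a later
    column has no cached offset. Returns the offsets walked."""
    offsets, off = [], 0
    for w in widths[:upto]:
        offsets.append(off)
        off += w
    return offsets

widths = [2] * 30          # 30 smallint columns, 2 bytes each

# A query grouping on (a2, a11) has to deform through attribute 11 on
# every row of the wide table t1 ...
assert len(deform_upto(widths, 11)) == 11
# ... while on the two-column table t2 only 2 attributes are walked.
assert len(deform_upto(widths, 2)) == 2
```

That per-row walk is consistent with slot_deform_tuple rising from ~8% to ~20% of the profile upthread once a later column is involved.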
Re: [HACKERS] Replication identifiers, take 4
On 2015-02-19 00:49:50 +0100, Petr Jelinek wrote:

 On 16/02/15 10:46, Andres Freund wrote:

 On 2015-02-16 11:34:10 +0200, Heikki Linnakangas wrote:

 At a quick glance, this basic design seems workable. I would suggest expanding the replication IDs to regular 4 byte oids. Two extra bytes is a small price to pay, to make it work more like everything else in the system.

 I don't know. Growing from 3 to 5 byte overhead per relevant record (or even 0 to 5 in case the padding is reused) is rather noticeable. If we later find it to be a limit (I seriously doubt that), we can still increase it in a major release without anybody really noticing.

 I agree that limiting the overhead is important. But I have a related thought about this - I wonder if it's worth trying to map this more directly to the CommitTsNodeId.

Maybe. I'd rather go the other way round though; replication_identifier.c/h's stuff seems much more generic than CommitTsNodeId.

 I mean it is currently using that for the actual storage, but I am thinking of having the Replication Identifiers be the public API and the CommitTsNodeId stuff be just a hidden implementation detail. This thought is based on the fact that CommitTsNodeId provides a somewhat meaningless number while Replication Identifiers give us a nice name for that number. And if we do that then the type should perhaps be the same for both?

I'm not sure. Given that this is included in a significant number of records I'd really rather not increase the overhead as described above. Maybe we can just limit CommitTsNodeId to the same size for now?

Greetings,

Andres Freund

--
Andres Freund                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.5: Better memory accounting, towards memory-bounded HashAgg
On Wed, 2015-01-07 at 20:07 +0100, Tomas Vondra wrote:

 So I started digging in the code and I noticed this:

     hash_mem = MemoryContextMemAllocated(aggstate->hashcontext, true);

 which is IMHO obviously wrong, because that accounts only for the hashtable itself. It might be correct for aggregates with state passed by value, but it's incorrect for state passed by reference (e.g. Numeric, arrays etc.), because initialize_aggregates does this:

     oldContext = MemoryContextSwitchTo(aggstate->aggcontext);
     pergroupstate->transValue = datumCopy(peraggstate->initValue,
                                           peraggstate->transtypeByVal,
                                           peraggstate->transtypeLen);
     MemoryContextSwitchTo(oldContext);

 and it's also wrong for all the user-defined aggregates that have no access to hashcontext at all, and only get a reference to aggcontext using AggCheckCallContext(). array_agg() is a prime example.

Actually, I believe the first one is right, and the second one is wrong. If we allocate pass-by-ref transition states in aggcontext, that defeats the purpose of the patch, because we can't release the memory when we start a new batch (because we can't reset aggcontext).

Instead, the transition states should be copied into hashcontext; we should count the memory usage of hashcontext; AggCheckCallContext should return hashcontext; and after each batch we should reset hashcontext. It might be worth refactoring a bit to call it the groupcontext or something instead, and use it for both HashAgg and GroupAgg. That would reduce the need for conditionals.

 [ ... problems with array_agg subcontexts ... ]

 and it runs indefinitely (I gave up after a few minutes). I believe this renders the proposed memory accounting approach dead. ... The array_agg() patch I submitted to this CF would fix this particular query, as it removes the child contexts (so there's no need for recursion in MemoryContextMemAllocated), but it does nothing to the user-defined aggregates out there. And it's not committed yet.
Now that your patch is committed, I don't think I'd call the memory accounting patch I proposed dead yet. We decided that creating one context per group is essentially a bug, so we don't need to optimize for that case. Your approach may be better, though. Thank you for reviewing. I'll update my patches and resubmit for the next CF. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
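The batch discipline described here - copy by-ref transition values into hashcontext, account for that context's allocations, and reset it between batches - can be sketched as a toy context tree in Python (all names here are hypothetical; the real machinery lives in mcxt.c/aset.c):

```python
class MemoryContext:
    """Toy model of a memory context with an allocation counter,
    roughly the accounting discussed in this thread."""
    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        self.allocated = 0
        if parent is not None:
            parent.children.append(self)

    def alloc(self, n):
        self.allocated += n

    def mem_allocated(self, recurse=True):
        # Recursing into children is what made per-group subcontexts
        # (as array_agg used to create) expensive to account for.
        total = self.allocated
        if recurse:
            total += sum(c.mem_allocated(True) for c in self.children)
        return total

    def reset(self):
        # Resetting a context releases its memory and its children's.
        self.allocated = 0
        for c in self.children:
            c.reset()

agg = MemoryContext()
hashctx = MemoryContext(parent=agg)   # resettable per-batch context
hashctx.alloc(1024)                   # hash table entry
hashctx.alloc(64)                     # by-ref transition value copied here
assert agg.mem_allocated() == 1088
hashctx.reset()                       # start of a new batch
assert agg.mem_allocated() == 0
```

The point of putting transition values in the resettable child context is visible in the last two lines: the whole batch's memory can be released without touching the long-lived parent.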
Re: [HACKERS] hash agg is slower on wide tables?
2015-02-22 9:28 GMT+01:00 Pavel Stehule pavel.steh...@gmail.com:

[...]

 It looks like hash agg is slower when it is based on a wide table - by about 25-100%. Is that so, or am I missing something?

next query? why we read all columns from t1?

postgres=# explain analyze verbose select count(*),
Re: [HACKERS] Abbreviated keys for text cost model fix
On Sun, Feb 22, 2015 at 1:30 PM, Peter Geoghegan p...@heroku.com wrote: You should try it with the data fully sorted like this, but with one tiny difference: The very last tuple is out of order. How does that look? Another thing that may be of particular interest to you as a Czech person is how various locales perform. I believe that the collation rules of Czech and Hungarian are particularly complex, with several passes often required for strcoll() (I think maybe more than 4). You should either measure that, or control for it. I was prepared to attribute the differences in the two systems to differences in compute bandwidth between the CPUs involved (which are perhaps more noticeable now, since memory latency is less of a bottleneck). However, the inconsistent use of collations could actually matter here. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New CF app deployment
Magnus Hagander mag...@hagander.net writes: On Sun, Feb 15, 2015 at 9:46 PM, Peter Geoghegan p...@heroku.com wrote: also think that it's a waste of screen space to show who within the annotation view. Granted, the old app supported this, but I tend to think that if I actually cared who added a certain annotation, I'd be happy to drill down into history. I haven't cared so far, AFAICR. Hmm. Personally, I definitely care about who made an annotation, but i guess I could be OK with going into the history as well. Anybody else have an opinion on the matter? I don't personally feel strongly either way. Hmm ... in the old system, I agree with Peter, in fact I tended to find the who annotations positively misleading, because people frequently added other peoples' messages to the CF app. It always looked to me like this had the effect of attributing person A's patch to person B, who'd actually only updated the CF entry. I think the primary thing you usually want to know is who is the author of a given email message. With the new system of auto-adding messages, that particular problem should be gone, and annotations should be mostly special events rather than routine maintenance. So maybe who made the annotation is of more interest than it often was before. I'm inclined to say you should leave it alone for the moment. We don't have enough experience with the new system to be sure how we'll use it, so why do work you might just end up reverting? We can adjust the display later if we become convinced we don't need this info. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Abbreviated keys for text cost model fix
On Sun, Feb 22, 2015 at 1:19 PM, Tomas Vondra tomas.von...@2ndquadrant.com wrote:

 In short, this fixes all the cases except for the ASC sorted data.

I haven't done any code review, but I think we want this.

 I'll use data from the i5-2500k, but it applies to the Xeon too, except that the Xeon results are more noisy and the speedups are not that significant. For the 'text' data type, and 'random' dataset, the results are these:

    scale     datum    cost-model
    ------------------------------
    10         328%       323%
    100        392%       391%
    200         96%       565%
    300         97%       572%
    400         97%       571%
    500         98%       570%

 The numbers are speedup vs. master, so 100% means exactly the same speed, 200% means twice as fast. So while with the 'datum' patch this actually caused very nice speedup for small datasets - about 3-4x speedup up to 1M rows - for larger datasets we've seen a small regression (~3% slower). With the cost model fix, we actually see a significant speedup (about 5.7x) for these cases.

Cool. I haven't verified whether this produces the same results, but if it does this is very nice.

 For the 'DESC' dataset (i.e. data sorted in reverse order), we do get even better numbers, with up to 6.5x speedup on large datasets. But for the 'ASC' dataset (i.e. already sorted data), we do get this:

    scale     datum    cost-model
    ------------------------------
    10          85%        84%
    100         87%        87%
    200         76%        96%
    300         82%        90%
    400         91%        83%
    500         93%        81%

 Ummm, not that great, I guess :-(

You should try it with the data fully sorted like this, but with one tiny difference: The very last tuple is out of order. How does that look?

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
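For readers new to the thread, the technique being costed works roughly like this: pack a prefix of the strxfrm() blob into a single machine word, compare those words first, and fall back to a full comparison only on ties. A Python sketch of that idea (locale.strxfrm stands in for the C strxfrm; widths and details differ from the actual tuplesort code):

```python
import locale

def abbreviate(s, width=8):
    """Pack the first `width` bytes of the collation-transformed
    string into an integer; equal prefixes force a tie-break."""
    blob = locale.strxfrm(s).encode("utf-8", "surrogateescape")[:width]
    return int.from_bytes(blob.ljust(width, b"\0"), "big")

def compare(a, b):
    ka, kb = abbreviate(a), abbreviate(b)
    if ka != kb:
        return -1 if ka < kb else 1               # cheap path: two ints
    fa, fb = locale.strxfrm(a), locale.strxfrm(b)  # authoritative tie-break
    return (fa > fb) - (fa < fb)

assert compare("apple", "banana") == -1
assert compare("same", "same") == 0
```

This also shows why already-sorted input is the hard case for the cost model: if most keys share long prefixes, the abbreviated comparison rarely resolves anything and the extra strxfrm() work is pure overhead - and why multi-pass collations like Czech or Hungarian make the full-comparison fallback even more expensive.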
Re: [HACKERS] New CF app deployment
On Sun, Feb 15, 2015 at 7:59 PM, Peter Eisentraut pete...@gmx.net wrote:

 On 2/14/15 7:30 AM, Magnus Hagander wrote:

 On Mon, Feb 9, 2015 at 4:56 PM, Robert Haas robertmh...@gmail.com

 Can we make it smarter, so that the kinds of things people produce intending for them to be patches are thought by the CF app to be patches?

 Doing it wouldn't be too hard, as the code right now is simply:

     # Attempt to identify the file using magic information
     mtype = mag.buffer(contents)
     if mtype.startswith('text/x-diff'):
         a.ispatch = True
     else:
         a.ispatch = False

 (where mag is the API call into the magic module)

 So we could easily add for example our own regexp parsing or so. The question is do we want to - because we'll have to maintain it as well. But I guess if we have a restricted enough set of rules, we can probably live with that.

 As I had described in http://www.postgresql.org/message-id/54dd2413.8030...@gmx.net, this is all but impossible. The above rule is certainly completely detached from the reality of what people actually send in. If you are just ignoring patches that don't match your rule set, this is not going to work very well. I think the old system where the patch submitter declared, this message contains my patch, is the only one that will work.

Would you suggest removing the automated system completely, or keep it around and just make it possible to override it (either by removing the note that something is a patch, or by making something that's not listed as a patch become marked as such)?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
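A regexp fallback of the kind floated above could be layered on top of the quoted snippet. A sketch (the `mag.buffer` call and attachment model come from the snippet; the regexp and function name are assumptions, not the CF app's actual code):

```python
import re

# Lines that typically open unified/context diffs or git patches.
DIFF_RE = re.compile(r"^(diff --git |--- |\+\+\+ |@@ |Index: )", re.MULTILINE)

def looks_like_patch(mtype, contents):
    """Combine libmagic's verdict with a simple regexp scan of the
    attachment body, so near-miss patches are still detected."""
    if mtype.startswith("text/x-diff"):
        return True
    return bool(DIFF_RE.search(contents))

assert looks_like_patch("text/plain", "--- a/f.c\n+++ b/f.c\n@@ -1 +1 @@\n")
assert not looks_like_patch("text/plain", "just an email body")
```

Even then, Peter's point stands: heuristics will misclassify some of what people actually send, so a manual override remains necessary.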
Re: [HACKERS] New CF app deployment
On Sun, Feb 15, 2015 at 9:46 PM, Peter Geoghegan p...@heroku.com wrote:

 On Sun, Feb 15, 2015 at 4:59 PM, Peter Eisentraut pete...@gmx.net wrote:

 I think the old system where the patch submitter declared, this message contains my patch, is the only one that will work.

 I tend to agree. That being said, calling out latest attachments is also useful (or highlighting that a particular mail has a particular file attached in general).

We can keep listing the attachment and just remove the automated check of what *kind* of attachment it is.

 Annotations-wise, I think that it would be nice to be able to modify an annotation, in case a typo is made (the old app supported this).

I was sort of thinking it was something that happened seldom enough that you could just delete it and add it again. But I guess if we're specifically thinking typos, it would make sense to add the ability to edit.

 also think that it's a waste of screen space to show who within the annotation view. Granted, the old app supported this, but I tend to think that if I actually cared who added a certain annotation, I'd be happy to drill down into history. I haven't cared so far, AFAICR.

Hmm. Personally, I definitely care about who made an annotation, but I guess I could be OK with going into the history as well. Anybody else have an opinion on the matter? I don't personally feel strongly either way.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] Abbreviated keys for text cost model fix
On 23.2.2015 00:16, Peter Geoghegan wrote: On Sun, Feb 22, 2015 at 1:30 PM, Peter Geoghegan p...@heroku.com wrote: You should try it with the data fully sorted like this, but with one tiny difference: The very last tuple is out of order. How does that look? I'm running that test now, I'll post the results tomorrow. Another thing that may be of particular interest to you as a Czech person is how various locales perform. I believe that the collation rules of Czech and Hungarian are particularly complex, with several passes often required for strcoll() (I think maybe more than 4). You should either measure that, or control for it. I was prepared to attribute the differences in the two systems to differences in compute bandwidth between the CPUs involved (which are perhaps more noticeable now, since memory latency is less of a bottleneck). However, the inconsistent use of collations could actually matter here. That's a good point, but all the tests were executed with en_US.UTF-8. -- Tomas Vondra http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
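As background on why locale choice matters so much here: abbreviated keys win precisely because a multi-pass strcoll() (as in Czech or Hungarian collations) is expensive, while a bytewise compare of a short prefix is cheap. A toy sketch of the idea follows (hypothetical helper, not the backend's sortsupport code, which packs the prefix into a Datum instead; the default "C" locale is assumed so the sketch runs anywhere):

```c
#include <string.h>

/* Toy illustration of the abbreviated-key idea (hypothetical helper):
 * compare a fixed-size byte prefix first, and only fall back to the
 * potentially multi-pass, locale-aware strcoll() on a prefix tie. */
static int
compare_with_abbrev(const char *a, const char *b)
{
    char    ka[8] = {0};
    char    kb[8] = {0};
    int     c;

    strncpy(ka, a, sizeof(ka));     /* "abbreviated key": first 8 bytes */
    strncpy(kb, b, sizeof(kb));

    c = memcmp(ka, kb, sizeof(ka)); /* cheap path: most comparisons end here */
    if (c != 0)
        return c;
    return strcoll(a, b);           /* authoritative, possibly multi-pass */
}
```

The benchmark-relevant observation is that the expensive branch only fires on prefix ties, so the cost of a complex collation is amortized away on well-distributed data.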
Re: [HACKERS] Redesigning checkpoint_segments
On Sat, Feb 14, 2015 at 4:43 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 02/04/2015 11:41 PM, Josh Berkus wrote: On 02/04/2015 12:06 PM, Robert Haas wrote: On Wed, Feb 4, 2015 at 1:05 PM, Josh Berkus j...@agliodbs.com wrote: Let me push max_wal_size and min_wal_size again as our new parameter names, because:
* does what it says on the tin
* new user friendly
* encourages people to express it in MB, not segments
* very different from the old name, so people will know it works differently
That's not bad. If we added a hard WAL limit in a future release, how would that fit into this naming scheme? Well, first, nobody's at present proposing a patch to add a hard limit, so I'm reluctant to choose non-obvious names to avoid conflict with a feature nobody may ever write. There's a number of reasons a hard limit would be difficult and/or undesirable. If we did add one, I'd suggest calling it wal_size_limit or something similar. However, we're most likely to only implement the limit for archives, which means that it might actually be called archive_buffer_limit or something more to the point. Ok, I don't hear any loud objections to min_wal_size and max_wal_size, so let's go with that then. Attached is a new version of this. It now comes in four patches. The first three are just GUC-related preliminary work, the first of which I posted on a separate thread today. I applied all 4 patches to the latest master successfully and performed a test with heavy continuous load. I don't see much difference in the checkpoint behaviour, and all seems to be working as expected. I did a test with the following parameter values -
max_wal_size = 1MB
min_wal_size = 1000MB
checkpoint_timeout = 5min
Upon performing a heavy load operation, the checkpoints were occurring based on timeouts. pg_xlog size fluctuated a bit (not very much). For the first few minutes pg_xlog size stayed at 3.3G, and it gradually increased to a max of 5.5G during the operation.
There was continuous fluctuation in the number of segments being removed/recycled. A part of the checkpoint logs follows -
2015-02-23 15:16:00.318 GMT-10 LOG: checkpoint starting: time
2015-02-23 15:16:53.943 GMT-10 LOG: checkpoint complete: wrote 3010 buffers (18.4%); 0 transaction log file(s) added, 0 removed, 159 recycled; write=27.171 s, sync=25.945 s, total=53.625 s; sync files=20, longest=5.376 s, average=1.297 s; distance=2748844 kB, estimate=2748844 kB
2015-02-23 15:21:00.438 GMT-10 LOG: checkpoint starting: time
2015-02-23 15:22:01.352 GMT-10 LOG: checkpoint complete: wrote 2812 buffers (17.2%); 0 transaction log file(s) added, 0 removed, 168 recycled; write=25.351 s, sync=35.346 s, total=60.914 s; sync files=34, longest=9.025 s, average=1.039 s; distance=1983318 kB, estimate=2672291 kB
2015-02-23 15:26:00.314 GMT-10 LOG: checkpoint starting: time
2015-02-23 15:26:25.612 GMT-10 LOG: checkpoint complete: wrote 2510 buffers (15.3%); 0 transaction log file(s) added, 0 removed, 121 recycled; write=22.623 s, sync=2.477 s, total=25.297 s; sync files=20, longest=1.418 s, average=0.123 s; distance=2537230 kB, estimate=2658785 kB
2015-02-23 15:31:00.477 GMT-10 LOG: checkpoint starting: time
2015-02-23 15:31:25.925 GMT-10 LOG: checkpoint complete: wrote 2625 buffers (16.0%); 0 transaction log file(s) added, 0 removed, 155 recycled; write=23.657 s, sync=1.592 s, total=25.447 s; sync files=13, longest=0.319 s, average=0.122 s; distance=2797386 kB, estimate=2797386 kB
2015-02-23 15:36:00.607 GMT-10 LOG: checkpoint starting: time
2015-02-23 15:36:52.686 GMT-10 LOG: checkpoint complete: wrote 3473 buffers (21.2%); 0 transaction log file(s) added, 0 removed, 171 recycled; write=31.257 s, sync=20.446 s, total=52.078 s; sync files=33, longest=4.512 s, average=0.619 s; distance=2153903 kB, estimate=2733038 kB
2015-02-23 15:41:00.675 GMT-10 LOG: checkpoint starting: time
2015-02-23 15:41:25.092 GMT-10 LOG: checkpoint complete: wrote 2456 buffers (15.0%); 0 transaction log file(s) added, 0 removed, 131 recycled; write=21.974 s, sync=2.282 s, total=24.417 s; sync files=27, longest=1.275 s, average=0.084 s; distance=2258648 kB, estimate=2685599 kB
2015-02-23 15:46:00.671 GMT-10 LOG: checkpoint starting: time
2015-02-23 15:46:26.757 GMT-10 LOG: checkpoint complete: wrote 2644 buffers (16.1%); 0 transaction log file(s) added, 0 removed, 138 recycled; write=23.619 s, sync=2.181 s, total=26.086 s; sync files=12, longest=0.709 s, average=0.181 s; distance=2787124 kB, estimate=2787124 kB
2015-02-23 15:51:00.509 GMT-10 LOG: checkpoint starting: time
2015-02-23 15:53:30.793 GMT-10 LOG: checkpoint complete: wrote 13408 buffers (81.8%); 0 transaction log file(s) added, 0 removed, 170 recycled; write=149.432 s, sync=0.664 s, total=150.284 s; sync files=13, longest=0.286 s, average=0.051 s; distance=1244483 kB, estimate=2632860 kB
The above checkpoint logs were generated at the time when pg_xlog size was at 5.4G. *Code Review* I had a look at the
Re: [HACKERS] Enforce creation of destination folders for source files in pg_regress (Was: pg_regress writes into source tree)
On Mon, Feb 23, 2015 at 12:00 AM, Peter Eisentraut pete...@gmx.net wrote: On 2/22/15 5:41 AM, Michael Paquier wrote: You could argue that these .gitignore files don't actually belong there, but your patch doesn't change or move those files, and even modules that have non-empty sql/ or expected/ directories have .gitignore files there, so it is considered the appropriate location. This is up to the maintainer of each extension to manage in their code tree. However, I can imagine that some people would be grateful if we allowed them to not need sql/ and expected/ directories containing only a single .gitignore file ignoring everything with *, making the code tree of their extensions cleaner. I would like to have an extension in tree that also does this, so we have a regression test of this functionality. Sure. Here is one, added as a test module in the attached patch. The name of the module is regress_dynamic. Perhaps the name could be better... -- Michael From 0b89d35f9605f866b45943d842898e30923476c3 Mon Sep 17 00:00:00 2001 From: Michael Paquier mich...@otacoo.com Date: Mon, 23 Feb 2015 15:02:14 +0900 Subject: [PATCH 1/2] Enforce creation of input and output paths in pg_regress This is particularly useful for extensions whose only regression tests live in input/ and output/ and are generated dynamically when running the tests: it keeps the code tree of such extensions clean, with no empty folders whose only content is a .gitignore ignoring everything other than its own existence.
--- src/test/regress/pg_regress.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c index 3af0e57..a7aa580 100644 --- a/src/test/regress/pg_regress.c +++ b/src/test/regress/pg_regress.c @@ -496,6 +496,7 @@ convert_sourcefiles_in(char *source_subdir, char *dest_dir, char *dest_subdir, c { char testtablespace[MAXPGPATH]; char indir[MAXPGPATH]; + char result_dir[MAXPGPATH]; struct stat st; int ret; char **name; @@ -520,6 +521,14 @@ convert_sourcefiles_in(char *source_subdir, char *dest_dir, char *dest_subdir, c /* Error logged in pgfnames */ exit(2); + /* + * Enforce creation of destination directory if it does not exist yet. + * This is useful for tests using only source files. + */ + snprintf(result_dir, MAXPGPATH, "%s/%s", dest_dir, dest_subdir); + if (!directory_exists(result_dir)) + make_directory(result_dir); + snprintf(testtablespace, MAXPGPATH, "%s/testtablespace", outputdir); #ifdef WIN32 @@ -565,7 +574,7 @@ convert_sourcefiles_in(char *source_subdir, char *dest_dir, char *dest_subdir, c /* build the full actual paths to open */ snprintf(prefix, strlen(*name) - 6, "%s", *name); snprintf(srcfile, MAXPGPATH, "%s/%s", indir, *name); - snprintf(destfile, MAXPGPATH, "%s/%s/%s.%s", dest_dir, dest_subdir, + snprintf(destfile, MAXPGPATH, "%s/%s.%s", result_dir, prefix, suffix); infile = fopen(srcfile, "r"); -- 2.3.0 From c5e58c85e743c3c7b133234588e2010612166f8f Mon Sep 17 00:00:00 2001 From: Michael Paquier mich...@otacoo.com Date: Mon, 23 Feb 2015 15:23:44 +0900 Subject: [PATCH 2/2] Add regress_dynamic as a test module This simple extension has the characteristic of containing only non-static regression test content, and is aimed at testing whether pg_regress appropriately generates input and output directories when they do not exist.
--- src/test/modules/regress_dynamic/.gitignore | 8 ++++++++ src/test/modules/regress_dynamic/Makefile | 16 ++++++++++++++++ src/test/modules/regress_dynamic/README | 6 ++++++ src/test/modules/regress_dynamic/input/basic.source | 9 +++++++++ src/test/modules/regress_dynamic/output/basic.source | 11 +++++++++++ .../modules/regress_dynamic/regress_dynamic--1.0.sql | 8 ++++++++ src/test/modules/regress_dynamic/regress_dynamic.control | 5 +++++ 7 files changed, 63 insertions(+) create mode 100644 src/test/modules/regress_dynamic/.gitignore create mode 100644 src/test/modules/regress_dynamic/Makefile create mode 100644 src/test/modules/regress_dynamic/README create mode 100644 src/test/modules/regress_dynamic/input/basic.source create mode 100644 src/test/modules/regress_dynamic/output/basic.source create mode 100644 src/test/modules/regress_dynamic/regress_dynamic--1.0.sql create mode 100644 src/test/modules/regress_dynamic/regress_dynamic.control diff --git a/src/test/modules/regress_dynamic/.gitignore b/src/test/modules/regress_dynamic/.gitignore new file mode 100644 index 0000000..122ede3 --- /dev/null +++ b/src/test/modules/regress_dynamic/.gitignore @@ -0,0 +1,8 @@ +# Generated sub-directories +/log/ +/results/ +/tmp_check/ + +# Input and output directories of regression tests +/expected/ +/sql/ diff --git a/src/test/modules/regress_dynamic/Makefile b/src/test/modules/regress_dynamic/Makefile new file mode 100644 index 0000000..2cab345 --- /dev/null +++
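For reference, the directory_exists()/make_directory() pair that the pg_regress patch calls wraps a plain stat()-then-mkdir() idiom; a standalone sketch with a hypothetical helper name (not the pg_regress code itself):

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical standalone equivalent of pg_regress's
 * directory_exists()/make_directory() pair: create the directory if it
 * is missing, and bail out on real errors. */
static void
ensure_directory(const char *path)
{
    struct stat st;

    if (stat(path, &st) == 0)
    {
        if (!S_ISDIR(st.st_mode))
        {
            fprintf(stderr, "\"%s\" exists but is not a directory\n", path);
            exit(2);
        }
        return;                     /* already present: nothing to do */
    }
    if (mkdir(path, 0777) != 0 && errno != EEXIST)
    {
        fprintf(stderr, "could not create directory \"%s\"\n", path);
        exit(2);
    }
}
```

The EEXIST check makes the helper safe against a concurrent creator, which matters when several test targets run in parallel.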
Re: [HACKERS] multiple backends attempting to wait for pincount 1
On 02/13/2015 06:27 AM, Tom Lane wrote: Two different CLOBBER_CACHE_ALWAYS critters recently reported exactly the same failure pattern on HEAD: http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=markhordt=2015-02-06%2011%3A59%3A59 http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tickdt=2015-02-12%2010%3A22%3A57 I'd say we have a problem. I'd even go so far as to say that somebody has completely broken locking, because this looks like autovacuum and manual vacuuming are hitting the same table at the same time. fwiw - looks like spoonbill (not doing CLOBBER_CACHE_ALWAYS) managed to trigger that one as well: http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=spoonbilldt=2015-02-23%2000%3A00%3A06 There are also some failures from the BETWEEN changes in that regression.diff, but those might be fallout from the above problem. Stefan
Re: [HACKERS] dblink: add polymorphic functions.
Changes in this patch:
- added polymorphic versions of dblink_fetch()
- upped dblink version # to 1.2 because of new functions
- migration 1.1 -> 1.2
- DocBook changes for dblink(), dblink_get_result(), dblink_fetch()
On Sun, Feb 22, 2015 at 11:38 PM, Corey Huinker corey.huin...@gmail.com wrote: nevermind. Found it. On Sun, Feb 22, 2015 at 11:18 PM, Corey Huinker corey.huin...@gmail.com wrote: Yes, that was it, I discovered it myself and should have posted a nevermind. Now I'm slogging through figuring out where to find elog() messages from the temporary server. It's slow, but it's progress. On Sun, Feb 22, 2015 at 10:39 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Mon, Feb 23, 2015 at 12:03 PM, Corey Huinker corey.huin...@gmail.com wrote: + ERROR: could not stat file /home/ubuntu/src/postgres/contrib/dblink/tmp_check/install/usr/local/pgsql/share/extension/dblink--1.2.sql: No such file or directory Didn't you forget to add dblink--1.2.sql to DATA in the contrib/dblink Makefile? That would explain why this file has not been included in the temporary installation deployed by pg_regress.
-- Michael diff --git a/contrib/dblink/Makefile b/contrib/dblink/Makefile index e833b92..a00466e 100644 --- a/contrib/dblink/Makefile +++ b/contrib/dblink/Makefile @@ -6,7 +6,8 @@ PG_CPPFLAGS = -I$(libpq_srcdir) SHLIB_LINK = $(libpq) EXTENSION = dblink -DATA = dblink--1.1.sql dblink--1.0--1.1.sql dblink--unpackaged--1.0.sql +DATA = dblink--1.1.sql dblink--1.0--1.1.sql dblink--unpackaged--1.0.sql \ + dblink--1.2.sql dblink--1.1--1.2.sql REGRESS = paths dblink REGRESS_OPTS = --dlpath=$(top_builddir)/src/test/regress \ diff --git a/contrib/dblink/dblink--1.1--1.2.sql b/contrib/dblink/dblink--1.1--1.2.sql index e5a0900..128611d 100644 --- a/contrib/dblink/dblink--1.1--1.2.sql +++ b/contrib/dblink/dblink--1.1--1.2.sql @@ -32,3 +32,15 @@ CREATE FUNCTION dblink_get_result(text, bool, anyelement) RETURNS SETOF anyelement AS 'MODULE_PATHNAME', 'dblink_get_result' LANGUAGE C; + +CREATE FUNCTION dblink_fetch (text, int, anyelement) +RETURNS setof anyelement +AS 'MODULE_PATHNAME','dblink_fetch' +LANGUAGE C; + +CREATE FUNCTION dblink_fetch (text, int, boolean, anyelement) +RETURNS setof anyelement +AS 'MODULE_PATHNAME','dblink_fetch' +LANGUAGE C; + + diff --git a/contrib/dblink/dblink--1.2.sql b/contrib/dblink/dblink--1.2.sql index bf5ddaa..9d31e2e 100644 --- a/contrib/dblink/dblink--1.2.sql +++ b/contrib/dblink/dblink--1.2.sql @@ -71,6 +71,19 @@ RETURNS setof record AS 'MODULE_PATHNAME','dblink_fetch' LANGUAGE C STRICT; +CREATE FUNCTION dblink_fetch (text, int, anyelement) +RETURNS setof anyelement +AS 'MODULE_PATHNAME','dblink_fetch' +LANGUAGE C; + +CREATE FUNCTION dblink_fetch (text, int, boolean, anyelement) +RETURNS setof anyelement +AS 'MODULE_PATHNAME','dblink_fetch' +LANGUAGE C; + + + + CREATE FUNCTION dblink_fetch (text, text, int) RETURNS setof record AS 'MODULE_PATHNAME','dblink_fetch' diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c index 009b877..fde750c 100644 --- a/contrib/dblink/dblink.c +++ b/contrib/dblink/dblink.c @@ -537,52 +537,39 @@ 
dblink_fetch(PG_FUNCTION_ARGS) char *curname = NULL; int howmany = 0; bool fail = true; /* default to backward compatible */ + int first_optarg; prepTuplestoreResult(fcinfo); DBLINK_INIT; - if (PG_NARGS() == 4) + if (get_fn_expr_argtype(fcinfo->flinfo, 1) == TEXTOID) { - /* text,text,int,bool */ + /* text,text,int,[bool],[anytype] */ conname = text_to_cstring(PG_GETARG_TEXT_PP(0)); curname = text_to_cstring(PG_GETARG_TEXT_PP(1)); howmany = PG_GETARG_INT32(2); - fail = PG_GETARG_BOOL(3); - + first_optarg = 3; rconn = getConnectionByName(conname); if (rconn) conn = rconn->conn; } - else if (PG_NARGS() == 3) - { - /* text,text,int or text,int,bool */ - if (get_fn_expr_argtype(fcinfo->flinfo, 2) == BOOLOID) - { - curname = text_to_cstring(PG_GETARG_TEXT_PP(0)); - howmany = PG_GETARG_INT32(1); - fail = PG_GETARG_BOOL(2); - conn = pconn->conn; - } - else - { - conname = text_to_cstring(PG_GETARG_TEXT_PP(0)); - curname = text_to_cstring(PG_GETARG_TEXT_PP(1)); - howmany = PG_GETARG_INT32(2); - - rconn = getConnectionByName(conname); - if (rconn) - conn = rconn->conn; - } - } - else if (PG_NARGS() == 2) + else { - /* text,int */ + /*
Re: [HACKERS] Expanding the use of FLEXIBLE_ARRAY_MEMBER for declarations like foo[1]
On Sun, Feb 22, 2015 at 6:57 AM, Tom Lane t...@sss.pgh.pa.us wrote: After some more hacking, the only remaining uses of foo[1] in struct declarations are: 1. A couple of places where the array is actually the only struct member; for some unexplainable reason gcc won't let you use flexible array syntax in that case. Yes. Thanks for adding notes at those places. 2. struct sqlda_struct in ecpg's sqlda-native.h. We have a problem with using [FLEXIBLE_ARRAY_MEMBER] there because (1) pg_config.h isn't (and I think shouldn't be) #included by this file, and (2) there is very possibly application code depending on sizeof() this struct; the risk of breaking such code seems to outweigh any likely benefit. Also, despite that I tried changing [1] to [] and fixing the places I could find that depended on sizeof(struct sqlda_struct), but I apparently can't find them all because the ecpg regression tests fell over :-(. Yeah, I think we can live with that. The rest of the backend code is clean now, and the allocation calculations are more understandable. Anyway, barring excitement in the buildfarm I think this project is complete. Thanks! -- Michael
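For readers following along, the conversion is mechanical; a minimal sketch with a hypothetical struct (not one of the backend's) shows the before/after shape:

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical struct showing the conversion's shape: a C99 flexible
 * array member (what FLEXIBLE_ARRAY_MEMBER expands to when available)
 * replaces the old "char data[1]" hack. */
typedef struct varbuf
{
    int     len;                /* bytes stored in data, counting the NUL */
    char    data[];             /* flexible array member; was data[1] */
} varbuf;

/* The allocation size becomes offsetof() + the real payload size,
 * with no "- 1" fudge factor for the dummy element. */
static varbuf *
varbuf_make(const char *s)
{
    size_t  n = strlen(s) + 1;
    varbuf *v = malloc(offsetof(varbuf, data) + n);

    v->len = (int) n;
    memcpy(v->data, s, n);
    return v;
}
```

This is also why the sizeof() concern for struct sqlda_struct is real: with a flexible array member, sizeof(varbuf) no longer accounts for any trailing element, so application code that allocated sizeof(struct) bytes would change behavior.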
Re: [HACKERS] pg_check_dir comments and implementation mismatch
On Fri, Feb 20, 2015 at 12:59 AM, Noah Misch n...@leadboat.com wrote: On Mon, Feb 02, 2015 at 03:48:33PM -0500, Robert Haas wrote: On Sat, Jan 31, 2015 at 8:28 AM, Marco Nenciarini marco.nenciar...@2ndquadrant.it wrote: I've attached a new version of the patch fixing the missing closedir on readdir error. If readdir() fails and closedir() succeeds, the return will be -1 but errno will be 0. Out of curiosity, have you seen a closedir() implementation behave that way? It would violate C99 ("The value of errno is zero at program startup, but is never set to zero by any library function.") and POSIX. No. Good point, I didn't think about that. I think this way is safer, though. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] pg_check_dir comments and implementation mismatch
Robert Haas robertmh...@gmail.com writes: On Fri, Feb 20, 2015 at 12:59 AM, Noah Misch n...@leadboat.com wrote: On Mon, Feb 02, 2015 at 03:48:33PM -0500, Robert Haas wrote: If readdir() fails and closedir() succeeds, the return will be -1 but errno will be 0. Out of curiosity, have you seen a closedir() implementation behave that way? It would violate C99 ("The value of errno is zero at program startup, but is never set to zero by any library function.") and POSIX. No. Good point, I didn't think about that. I think this way is safer, though. While the spec forbids library functions from setting errno to zero, there is no restriction on them changing errno in other ways despite returning success; their exit-time value of errno is only well-defined if they fail. So we do need to preserve errno explicitly across closedir(), or we may report the wrong failure from readdir(). regards, tom lane
Re: [HACKERS] restrict global access to be readonly
On Tue, Feb 17, 2015 at 4:40 AM, happy times guangzhouzh...@qq.com wrote: The first choice Tom pointed out makes sense to me: adding this as equivalent to setting all subsequent transactions as read only. It is useful enough in scenarios where the disk limit for the instance is reached and we want to block all write access (this limit is typically a soft limit, and vacuum logs or sort spills could be permitted). I previously thought of the choice of not-generating-any-WAL semantics, but now doubt that's practically useful. We are forced to restart the old master in recovery mode when switching master-slave roles, which would put it into the state of not generating any WAL. And for logical replication, it seems setting transactions as read-only could do the job of preventing logs from being shipped to the slave. One other thing to consider is which users are to be blocked. I expect this command to prevent write access even for superusers, since there may be other internal apps that connect as superuser and do writes; they are expected to be blocked too. And sometimes we may use this command to avoid any unexpected write operation. The last thing is when the command returns. I expect it to return immediately, not waiting for existing active transactions to finish. This avoids having existing long-running transactions block it, and lets the user decide whether to wait for or kill existing transactions. The use cases I can think of are:
- Opening a possibly-damaged database strictly read-only so you can inspect it without risking further damage; or forcing a damaged database that is already up and running to go read-only to prevent further damage. In this case, you'd want to prohibit all writes to the data files, even hint bit changes, but the use of temporary files for sorts or hashes would be fine.
- Forcing a master into a read-only state in preparation for a controlled switchover.
If you can completely stop WAL generation on the master, replay all WAL generated on the master prior to the switchover on the standby, and then switch over, you could make the old master a slave of the new master. I think the requirements here are similar to the previous case. I'm not 100% sure that we need to prevent hint bits getting set in this case, but it probably wouldn't hurt. Temp file writes are, again, OK.
- Taking a copy of the database for backup purposes. Same thing again.
- Accessing a cluster stored on a read-only medium, like a CD or DVD. In this case, even temporary file writes are no good.
- Your proposed use case of preventing the disk from being filled up is a little different. There's no real problem if the data files fill up the disk; at a certain point, users will get errors, but forcing the database read only is going to do that anyway (and sooner). There's a big problem if the xlog partition fills up, though. You could want a few different things here depending on the details of your use case, but preventing all WAL generation is one of them.
Based on the above, I'm inclined to think that making the system read only should (1) prevent all WAL generation and (2) prevent all data file modifications, but (3) still allow the use of temporary files.
--
It's also worth taking a look at what other systems support. SQL Server supports something like this feature using this (ugly) syntax: ALTER DATABASE [whatever] SET READ_ONLY WITH NO_WAIT I'm not sure what the semantics are, exactly. In Oracle, you can open a database read-only: ALTER DATABASE whatever OPEN READ ONLY; It's not clear to me whether you can use this to force an already-read-write database back to read only mode. Oracle also lets you make a tablespace read-only if there are no open transactions, running hot backups, etc. anywhere in the system. That syntax is just: ALTER TABLESPACE whatever READ ONLY; That forbids all future data file modifications.
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] dblink: add polymorphic functions.
Yes, that was it, I discovered it myself and should have posted a nevermind. Now I'm slogging through figuring out where to find elog() messages from the temporary server. It's slow, but it's progress. On Sun, Feb 22, 2015 at 10:39 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Mon, Feb 23, 2015 at 12:03 PM, Corey Huinker corey.huin...@gmail.com wrote: + ERROR: could not stat file /home/ubuntu/src/postgres/contrib/dblink/tmp_check/install/usr/local/pgsql/share/extension/dblink--1.2.sql: No such file or directory Didn't you forget to add dblink--1.2.sql to DATA in contrib/dblink Makefile? That would explain why this file has not been included in the temporary installation deployed by pg_regress. -- Michael
Re: [HACKERS] Precedence of standard comparison operators
Attached is an improved patch that includes optional warnings for constructs that changed parsing. It's not quite 100% but I think it's about 90% correct; the difference in size between this and the previous patch should be a pretty fair indication of what it's going to cost us to have a warning capability. What's missing from this version is that it can't tell the difference between LIKE/ILIKE/SIMILAR TO and the underlying operators, that is, it sees a LIKE b as a ~~ b because that's what the grammar emits. However, those inputs have different operator-precedence behavior. Likewise, we can't tell the difference between xmlexpr IS NOT DOCUMENT NOT (xmlexpr IS DOCUMENT) because the grammar converts the former into the latter --- and again, those two things have different precedence behavior. It wouldn't take very much additional code to fix these things by changing what the grammar emits; but I'm running out of energy for today. In any case, I thought I should put this up and see if this general approach is going to satisfy people's concerns about making such a change. regards, tom lane diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 6bcb106..22a2d32 100644 *** a/doc/src/sgml/config.sgml --- b/doc/src/sgml/config.sgml *** dynamic_library_path = 'C:\tools\postgre *** 6752,6757 --- 6752,6780 /listitem /varlistentry + varlistentry id=guc-operator-precedence-warning xreflabel=operator_precedence_warning + termvarnameoperator_precedence_warning/varname (typeboolean/type) + indexterm +primaryvarnameoperator_precedence_warning/ configuration parameter/primary + /indexterm + /term + listitem +para + When on, the parser will emit a warning for any construct that might + have changed meanings since productnamePostgreSQL/ 9.4 as a result + of changes in operator precedence. 
This is useful for auditing + applications to see if precedence changes have broken anything; but it + is not meant to be left turned on in production, since it will warn + about some perfectly valid, standard-compliant SQL code. + The default is literaloff/. +/para + +para + See xref linkend=sql-precedence for more information. +/para + /listitem + /varlistentry + varlistentry id=guc-quote-all-identifiers xreflabel=quote-all-identifiers termvarnamequote_all_identifiers/varname (typeboolean/type) indexterm diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml index 4b81b08..e7484c4 100644 *** a/doc/src/sgml/syntax.sgml --- b/doc/src/sgml/syntax.sgml *** CAST ( 'replaceablestring/replaceable *** 984,993 associativity of the operators in productnamePostgreSQL/. Most operators have the same precedence and are left-associative. The precedence and associativity of the operators is hard-wired ! into the parser. This can lead to non-intuitive behavior; for ! example the Boolean operators literallt;/ and ! literalgt;/ have a different precedence than the Boolean ! operators literallt;=/ and literalgt;=/. Also, you will sometimes need to add parentheses when using combinations of binary and unary operators. For instance: programlisting --- 984,994 associativity of the operators in productnamePostgreSQL/. Most operators have the same precedence and are left-associative. The precedence and associativity of the operators is hard-wired ! into the parser. !/para ! !para ! You will sometimes need to add parentheses when using combinations of binary and unary operators. For instance: programlisting *** SELECT (5 !) - 6; *** 1008,1014 /para table id=sql-precedence-table ! titleOperator Precedence (decreasing)/title tgroup cols=3 thead --- 1009,1015 /para table id=sql-precedence-table ! titleOperator Precedence (highest to lowest)/title tgroup cols=3 thead *** SELECT (5 !) 
- 6; *** 1063,1087 /row row !entrytokenIS/token/entry !entry/entry !entryliteralIS TRUE/, literalIS FALSE/, literalIS NULL/, etc/entry ! /row ! ! row !entrytokenISNULL/token/entry !entry/entry !entrytest for null/entry ! /row ! ! row !entrytokenNOTNULL/token/entry !entry/entry !entrytest for not null/entry ! /row ! ! row !entry(any other)/entry entryleft/entry entryall other native and user-defined operators/entry /row --- 1064,1070 /row row !entry(any other operator)/entry entryleft/entry entryall other native and user-defined operators/entry /row ***
Re: [HACKERS] a fast bloat measurement tool (was Re: Measuring relation free space)
Hi! On 28.1.2015 05:03, Abhijit Menon-Sen wrote: At 2015-01-27 17:00:27 -0600, jim.na...@bluetreble.com wrote: It would be best to get this into a commit fest so it's not lost. It's there already. -- Abhijit I looked at this patch today, so a few comments from me:
1) I believe the patch is slightly broken, because the version was bumped to 1.3 but only partially, so I get this on make install:
$ make -j9 -s install
/usr/bin/install: cannot stat ‘./pgstattuple--1.3.sql’: No such file or directory
../../src/makefiles/pgxs.mk:129: recipe for target 'install' failed
make[1]: *** [install] Error 1
Makefile:85: recipe for target 'install-pgstattuple-recurse' failed
make: *** [install-pgstattuple-recurse] Error 2
make: *** Waiting for unfinished jobs
The problem seems to be that the Makefile already lists --1.3.sql in the DATA section, but the file was not renamed (there's just --1.2.sql). After fixing that, it's also necessary to fix the version in the control file, otherwise CREATE EXTENSION fails: default_version = '1.2' -> '1.3'. At least that fixed the install for me.
2) Returning Datum from fbstat_heap, which is a static function, seems a bit strange to me; I'd use the HeapTuple directly, but meh ...
3) The way the tuple is built - by first turning the values into strings and then turning them back into Datums by calling BuildTupleFromCStrings - is more serious, I guess. Why not just keep the Datums and call heap_form_tuple directly? I see the other functions in pgstattuple.c do use the string way, so maybe there's a reason for that? Or is this just to keep the code consistent within the extension?
4) I really doubt 'fastbloat' is a good name. I propose 'pgstatbloat' to make it consistent with the rest of the pgstattuple functions. Or something similar, but 'fastbloat' is just plain confusing.
Otherwise, the code looks OK to me.
Now, there are a few features I'd like to have for production use (to minimize the impact):
1) no index support :-(
I'd like to see support for more relation types (at least btree indexes). Are there any plans for that? Do we have an idea on how to compute that?
2) sampling just a portion of the table
For example, being able to sample just 5% of blocks, making it less obtrusive, especially on huge tables. Interestingly, there's a TABLESAMPLE patch in this CF, so maybe it's possible to reuse some of the methods (e.g. functions behind SYSTEM sampling)?
3) throttling
Another feature minimizing the impact of running this on production might be some sort of throttling, e.g. saying 'limit the scan to 4 MB/s' or something along those lines.
4) prefetch
fbstat_heap is using the visibility map to skip fully-visible pages, which is nice, but if we skip too many pages it breaks readahead, similarly to a bitmap heap scan. I believe this is another place where effective_io_concurrency (i.e. prefetch) would be appropriate.
regards -- Tomas Vondra http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
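On point 2), the SYSTEM-style sampling idea boils down to visiting each block with a fixed probability while preserving physical scan order. A self-contained sketch (toy deterministic PRNG for reproducibility; PostgreSQL's actual sampler and the TABLESAMPLE patch differ in detail):

```c
#include <stdint.h>

/* Toy deterministic PRNG so the sketch is reproducible; PostgreSQL's
 * real sampler differs. */
static uint32_t
xorshift32(uint32_t *state)
{
    uint32_t x = *state;

    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
}

/* Visit each block with probability "fraction", SYSTEM-sampling style,
 * returning chosen block numbers in physical (ascending) order so the
 * scan stays mostly sequential. Hypothetical helper for illustration;
 * picks beyond max_chosen are silently dropped. */
static int
sample_blocks(uint32_t nblocks, double fraction, uint32_t seed,
              uint32_t *chosen, int max_chosen)
{
    uint32_t    state = (seed != 0) ? seed : 1;
    int         n = 0;

    for (uint32_t blk = 0; blk < nblocks; blk++)
    {
        double  u = xorshift32(&state) / 4294967296.0;

        if (u < fraction && n < max_chosen)
            chosen[n++] = blk;
    }
    return n;
}
```

Keeping the chosen blocks in ascending physical order is what makes this compatible with both the readahead concern in point 4) and a future prefetch pass over the selected pages.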
Re: [HACKERS] Redesigning checkpoint_segments
On Sat, Feb 21, 2015 at 11:29 PM, Petr Jelinek p...@2ndquadrant.com wrote:
 I am wondering a bit about interaction with wal_keep_segments. One thing is that wal_keep_segments is still specified in number of segments and not size units, maybe it would be worth to change it also? And the other thing is that, if set, wal_keep_segments is the real max_wal_size from the user perspective (not from the perspective of the algorithm in this patch, but the user does not really care about that), which is somewhat weird given the naming.

It seems like wal_keep_segments is more closely related to wal_*min*_size. The idea of both settings is that each is a minimum amount of WAL we want to keep around for some purpose. But they're not quite the same, I guess, because wal_min_size just forces us to keep that many files around - they can be overwritten whenever. wal_keep_segments is an amount of actual WAL data we want to keep around.

Would it make sense to require that wal_keep_segments <= wal_min_size?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] dblink: add polymorphic functions.
I seem to be getting tripped up in the regression test. This line was found in regression.diffs:

+ ERROR: could not stat file /home/ubuntu/src/postgres/contrib/dblink/tmp_check/install/usr/local/pgsql/share/extension/dblink--1.2.sql: No such file or directory

The file dblink--1.2.sql does exist in /home/ubuntu/src/postgres/contrib/dblink/:

~/src/postgres/contrib/dblink$ ls -la dblink--1.?.sql
-rw-rw-r-- 1 ubuntu ubuntu 5.7K Feb 22 16:02 dblink--1.1.sql
-rw-rw-r-- 1 ubuntu ubuntu 6.8K Feb 22 16:50 dblink--1.2.sql

But it evidently isn't making it into the tmp_check dir. What needs to happen for new files to be seen by the regression test harness?

On Fri, Feb 20, 2015 at 1:14 AM, Michael Paquier michael.paqu...@gmail.com wrote:
 On Fri, Feb 20, 2015 at 2:50 PM, Corey Huinker corey.huin...@gmail.com wrote:
 Thanks - completely new to this process, so I'm going to need walking through it. I promise to document what I learn and try to add it to the commitfest wiki. Where can I go for guidance about documentation format and regression tests?

 Here are some guidelines about how to submit a patch: https://wiki.postgresql.org/wiki/Submitting_a_Patch
 You can also have a look here to see how extensions like dblink are structured: http://www.postgresql.org/docs/devel/static/extend-pgxs.html
 What you need to do in the case of your patch is to add the necessary test cases in sql/dblink.sql for the new functions you have added and then update the expected output in expected/. Be sure not to break any existing things as well. After running the tests in your development environment, the output results are available in results/, and pg_regress generates a differential file in regression.diffs. For the documentation, updating dblink.sgml with the new function prototypes would be sufficient, perhaps with an example or two to show that what you want to add is useful.
 Regards,
 --
 Michael
Re: [HACKERS] dblink: add polymorphic functions.
nevermind. Found it. On Sun, Feb 22, 2015 at 11:18 PM, Corey Huinker corey.huin...@gmail.com wrote: Yes, that was it, I discovered it myself and should have posted a nevermind. Now I'm slogging through figuring out where to find elog() messages from the temporary server. It's slow, but it's progress. On Sun, Feb 22, 2015 at 10:39 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Mon, Feb 23, 2015 at 12:03 PM, Corey Huinker corey.huin...@gmail.com wrote: + ERROR: could not stat file /home/ubuntu/src/postgres/contrib/dblink/tmp_check/install/usr/local/pgsql/share/extension/dblink--1.2.sql: No such file or directory Didn't you forget to add dblink--1.2.sql to DATA in contrib/dblink Makefile? That would explain why this file has not been included in the temporary installation deployed by pg_regress. -- Michael
Re: [HACKERS] Allow snapshot too old error, to prevent bloat
On 02/22/2015 11:48 AM, Kevin Grittner wrote:
 (2) Use a coarse enough granularity on time and a short enough maximum for the GUC to just keep a circular buffer of the mappings in memory. We might be able to make this dense enough that one-minute resolution for up to 60 days could fit in 338kB. Obviously that could be reduced with coarser granularity or a shorter maximum.

This doesn't sound too bad to me. Presumably these would be tunable. I think one-minute granularity would be fine for most purposes, but I can imagine people who would want it finer, like 10 seconds, say.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch
Hi,

On 22.2.2015 18:57, Petr Jelinek wrote:
 Tomas noticed that the patch is missing an error check when TABLESAMPLE is used on a view, so here is a new version that checks it's only used against a table or matview. No other changes.

Curious question - could/should this use page prefetch, similar to what bitmap heap scan does? I believe the answer is 'yes'.

With SYSTEM that should be rather straightforward to implement, because it already works at the page level, and it's likely to give a significant performance speedup, similar to bitmap index scan: http://www.postgresql.org/message-id/CAHyXU0yiVvfQAnR9cyH=HWh1WbLRsioe=mzRJTHwtr=2azs...@mail.gmail.com

With BERNOULLI that might be more complex to implement because of the page/tuple sampling, and the benefit is probably much lower than for SYSTEM, because it's likely that at least one tuple will be sampled.

I'm not saying it has to be done in this CF (or that it makes the patch uncommittable). For example, this seems like a very nice project for GSoC (clear scope, not too large, ...).

--
Tomas Vondra    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] a fast bloat measurement tool (was Re: Measuring relation free space)
On 23.2.2015 03:20, Jim Nasby wrote:
 On 2/22/15 5:41 PM, Tomas Vondra wrote:
 Otherwise, the code looks OK to me. Now, there are a few features I'd like to have for production use (to minimize the impact):
 1) no index support :-(
 I'd like to see support for more relation types (at least btree indexes). Are there any plans for that? Do we have an idea on how to compute that?

 It'd be cleaner if we had an actual AM function for this, but see below.

 2) sampling just a portion of the table
 For example, being able to sample just 5% of blocks, making it less obtrusive, especially on huge tables. Interestingly, there's a TABLESAMPLE patch in this CF, so maybe it's possible to reuse some of the methods (e.g. the functions behind SYSTEM sampling)?
 3) throttling
 Another feature minimizing the impact of running this on production might be some sort of throttling, e.g. saying 'limit the scan to 4 MB/s' or something along those lines.
 4) prefetch
 fbstat_heap is using the visibility map to skip fully-visible pages, which is nice, but if we skip too many pages it breaks readahead, similarly to bitmap heap scan. I believe this is another place where effective_io_concurrency (i.e. prefetch) would be appropriate.

 All of those wishes are solved in one way or another by vacuum and/or analyze. If we had a hook in the tuple-scanning loop and at the end of vacuum, you could just piggy-back on it. But really, all we'd need for vacuum to be able to report this info is one more field in LVRelStats, a call to GetRecordedFreeSpace for all-visible pages, and some logic to deal with pages skipped because we couldn't get the vacuum lock. Should we just add this to vacuum instead?

Possibly. I think the ultimate goal is to be able to get this info easily and without disrupting system performance too much (which is difficult without sampling/throttling). If we can stuff that into autovacuum reasonably, and then get the info from the catalogs, I'm OK with that.
However, I'm not sure putting this into autovacuum is actually possible, because how do you merge data from multiple partial runs (when each of them skipped different pages)? Also, autovacuum is not the only place where we free space - we'd have to handle HOT, for example, I guess.

--
Tomas Vondra    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] __attribute__ for non-gcc compilers
On Tue, Feb 17, 2015 at 8:41 AM, Oskari Saarenmaa o...@ohmu.fi wrote:
 On 15.01.2015 21:58, Robert Haas wrote:
 On Wed, Jan 14, 2015 at 5:54 PM, Andres Freund and...@2ndquadrant.com wrote:
 I think I'd for now simply not define pg_attribute_aligned() on platforms where it's not supported, instead of defining it empty. If we need a softer variant we can name it pg_attribute_aligned_if_possible or something. Sounds sane?
 Yes, that sounds like a much better plan.

 Attached is an updated patch, rebased on today's git master, that never defines aligned or packed empty. This is also included in the current commitfest: https://commitfest.postgresql.org/4/115/

Is this going to play nicely with pgindent?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Redesigning checkpoint_segments
On 23/02/15 03:24, Robert Haas wrote:
 On Sat, Feb 21, 2015 at 11:29 PM, Petr Jelinek p...@2ndquadrant.com wrote:
 I am wondering a bit about interaction with wal_keep_segments. One thing is that wal_keep_segments is still specified in number of segments and not size units, maybe it would be worth to change it also? And the other thing is that, if set, wal_keep_segments is the real max_wal_size from the user perspective (not from the perspective of the algorithm in this patch, but the user does not really care about that), which is somewhat weird given the naming.

 It seems like wal_keep_segments is more closely related to wal_*min*_size. The idea of both settings is that each is a minimum amount of WAL we want to keep around for some purpose. But they're not quite the same, I guess, because wal_min_size just forces us to keep that many files around - they can be overwritten whenever. wal_keep_segments is an amount of actual WAL data we want to keep around.

Err, yes, of course - min, not max :)

 Would it make sense to require that wal_keep_segments <= wal_min_size?

It would to me; the patch as it stands is confusing in the sense that you can set min and max, but then wal_keep_segments somewhat overrides those. And BTW this brings up another point: I actually don't see a check for min_wal_size <= max_wal_size anywhere in the patch.

--
Petr Jelinek    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
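Under the semantics discussed here - and assuming the settings keep the names used in the thread - a configuration honoring both suggested orderings might look like this (values are illustrative only):

```
# sketch of a postgresql.conf fragment; not from the patch itself
min_wal_size = 512MB             # must not exceed max_wal_size
max_wal_size = 2GB
wal_keep_segments = 16           # 16 * 16MB = 256MB, kept <= min_wal_size
```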
Re: [HACKERS] dblink: add polymorphic functions.
On Mon, Feb 23, 2015 at 12:03 PM, Corey Huinker corey.huin...@gmail.com wrote: + ERROR: could not stat file /home/ubuntu/src/postgres/contrib/dblink/tmp_check/install/usr/local/pgsql/share/extension/dblink--1.2.sql: No such file or directory Didn't you forget to add dblink--1.2.sql to DATA in contrib/dblink Makefile? That would explain why this file has not been included in the temporary installation deployed by pg_regress. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
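The fix Michael suggests is a one-line change to the DATA list in the extension Makefile: pg_regress deploys the extension into tmp_check from exactly the files listed there. A sketch of the pattern (file names are illustrative of the convention, not the actual dblink Makefile):

```
# contrib/dblink/Makefile (sketch)
EXTENSION = dblink
DATA = dblink--1.1.sql dblink--1.1--1.2.sql dblink--1.2.sql   # new script added here
```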
Re: [HACKERS] a fast bloat measurement tool (was Re: Measuring relation free space)
On 2/22/15 5:41 PM, Tomas Vondra wrote:
 Otherwise, the code looks OK to me. Now, there are a few features I'd like to have for production use (to minimize the impact):
 1) no index support :-(
 I'd like to see support for more relation types (at least btree indexes). Are there any plans for that? Do we have an idea on how to compute that?

It'd be cleaner if we had an actual AM function for this, but see below.

 2) sampling just a portion of the table
 For example, being able to sample just 5% of blocks, making it less obtrusive, especially on huge tables. Interestingly, there's a TABLESAMPLE patch in this CF, so maybe it's possible to reuse some of the methods (e.g. the functions behind SYSTEM sampling)?
 3) throttling
 Another feature minimizing the impact of running this on production might be some sort of throttling, e.g. saying 'limit the scan to 4 MB/s' or something along those lines.
 4) prefetch
 fbstat_heap is using the visibility map to skip fully-visible pages, which is nice, but if we skip too many pages it breaks readahead, similarly to bitmap heap scan. I believe this is another place where effective_io_concurrency (i.e. prefetch) would be appropriate.

All of those wishes are solved in one way or another by vacuum and/or analyze. If we had a hook in the tuple-scanning loop and at the end of vacuum, you could just piggy-back on it. But really, all we'd need for vacuum to be able to report this info is one more field in LVRelStats, a call to GetRecordedFreeSpace for all-visible pages, and some logic to deal with pages skipped because we couldn't get the vacuum lock. Should we just add this to vacuum instead?

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers