Re: [HACKERS] [rfc] overhauling pgstat.stat
On Thu, Sep 5, 2013 at 10:59 AM, Alvaro Herrera <alvhe...@2ndquadrant.com> wrote:
> Satoshi Nagayasu wrote:
>> But, for now, I think we should have a real index for the statistics data because we already have several index storages, and it will allow us to minimize read/write operations. BTW, what kind of index would be preferred for this purpose? btree or hash?
>
> I find it hard to get excited about using the AM interface for this purpose. To me it makes a lot more sense to have separate, much simpler code. We don't need any transactionality, user defined types, user defined operators, or anything like that.

+1. But wouldn't rewriting a lot of existing functionality potentially lead to points of contention and/or much more effort?

Regards,

Atri
l'apprenant

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.4 regression
(2013/09/05 0:04), Andres Freund wrote:
> I'd vote for adding zeroing *after* the fallocate() first.

+1, with the FALLOC_FL_KEEP_SIZE flag. At least, fallocate with FALLOC_FL_KEEP_SIZE is faster than nothing in my in-development sorted checkpoint. I applied it to relation files, so I don't know about WAL files. fallocate with the FALLOC_FL_KEEP_SIZE flag does not change the file size and does not zero-fill; it seems to only reserve continuous physical disk space in the file system.

Here is a draft patch.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index dc47c47..85b582c 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -3409,6 +3409,10 @@ XLogFileInit(XLogSegNo logsegno, bool *use_existent, bool use_lock)
 				(errcode_for_file_access(),
 				 errmsg("could not create file \"%s\": %m", tmppath)));
 
+#if defined(HAVE_FALLOCATE)
+	fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, XLogSegSize);
+#endif
+
 	/*
 	 * Zero-fill the file.  We have to do this the hard way to ensure that all
 	 * the file space has really been allocated --- on platforms that allow
[HACKERS] Extending pg_stat_statements to expose queryid
Hello all,

I am trying to revive the discussion about exposing queryid in pg_stat_statements. I found the same request posted on -hackers at http://www.postgresql.org/message-id/CABUevExLnb6xJwS=8rTaLOfDOS-tFm09==z2m_vz5hhfkgw...@mail.gmail.com and http://www.postgresql.org/message-id/cacn56+nlmtwhg8eqqqnyzqe2q0negjokmgfiusk_aohw627...@mail.gmail.com

From the discussions I concluded:

1. The main use case for exposing queryid is that it is a better substitute for hashing the query text of a pg_stat_statements snapshot to make a candidate key. Problems occur when the hash value should differ even though the query text is the same, for example when a table referred to in a query is dropped and recreated, or when the same query text runs against different schemas and the schema name is not included in the query text.

2. Exposing queryid was proposed earlier but was not accepted. The main reason was that queryid could be unstable as well. Since queryid was derived from hashing the query tree, and the query tree could change between minor PostgreSQL releases, the queryid for the same query could differ between releases, resulting in incorrect statement statistics collection.

3. Another problem is distinguishing between queries whose statistics are continuously maintained and intermittent queries, whose statistics might be silently reset without the reporting tool being any wiser.

4. A solution to avoid misrepresenting intermittent queries as consistent queries would be to assign a unique number to each new row; once that row is discarded, the unique number cannot be reused. The drawback here is possible collision of the generated unique values.

5. A patch implementing a solution for identifying intermittent queries is at https://github.com/fdr/postgres/branches/error-prop-pg_stat_statements-v2. The solution avoids using a counter, and achieves the same result via the property that intermittent queries accumulate errors due to eviction from the hash table while consistent queries do not. Error accumulation would be the parameter by which a reporting tool can figure out whether queries were evicted between snapshots.

6. To address the problem of an unstable queryid generated from the query tree, it was proposed to eliminate any possible misunderstanding that queryid will remain the same between releases by XORing the hash from the query tree with a statistics session id. This also helps in all cases where the statistics file is reset, such as crash recovery or recovery mode, ensuring a new hash value for reset statistics. To avoid increasing the chance of collision, a longer session key can be used and the queryid padded to complete the XOR. An implementation of this is at https://github.com/fdr/postgres/branches/pg_stat_statements-identification-v3

7. The patch pg_stat_statements-identification-v3 was returned with feedback for more documentation in commitfest 2013-01.

Questions:

1. Is there a plan to re-introduce this patch? The code seems to be documented.

2. There was mention of further testing of error propagation using hooks. Could this be elaborated?

3. There was a use case that exposing queryid could be used to aggregate statistics across WAL-based replication clusters. But now that queryid is derived from a statistics session id, which is randomly generated, this use case is still not addressed. Is this correct?

Regards
Sameer
Re: [HACKERS] Analysis on backend-private memory usage (and a patch)
On 04.09.2013 23:56, Tom Lane wrote:
> Heikki Linnakangas <hlinnakan...@vmware.com> writes:
>> One fairly simple thing we could do is to teach catcache.c to resize the caches. Then we could make the initial size of all the syscaches much smaller.
>
> I think this is attractive for the *other* reason you mention, namely preserving reasonable performance when a catcache grows larger than expected; but I'm pretty skeptical of nickel-and-diming caches that are already really small. Is it really worth cutting the TSPARSER caches from 4 pointers to 2 for instance?

Yeah, that may be overdoing it. Then again, enlarging a hash table from 2 to 4 entries when needed is also very cheap.

> What concerns me about initially-undersized caches is that we'll waste space and time in the enlargement process.

Enlarging a small hash table is very cheap because, well, it's small. Enlarging a large hash table is more expensive, but if you have a lot of entries in the hash, then you also get the benefit of a larger hash when doing lookups. It does require some memory to hold the old and the new hash table while rehashing, but again, with a small hash table that's not significant, and with a large one the actual cached tuples take a lot more space than the buckets array anyway.

Per Wikipedia [1]:

> If the table size increases or decreases by a fixed percentage at each expansion, the total cost of these resizings, amortized over all insert and delete operations, is still a constant, independent of the number of entries n and of the number m of operations performed.
>
> For example, consider a table that was created with the minimum possible size and is doubled each time the load ratio exceeds some threshold. If m elements are inserted into that table, the total number of extra re-insertions that occur in all dynamic resizings of the table is at most m − 1. In other words, dynamic resizing roughly doubles the cost of each insert or delete operation.
Considering the amount of work involved in adding an entry to a catalog cache, I wouldn't worry about adding a tiny constant to the insertion time.

I did some quick testing by creating 10 tables and running a pgbench script that selects randomly from them:

\setrandom tableno 1 10
select * from foo:tableno where i = 1;

I timed the rehash operations with gettimeofday calls before and after the rehash:

LOG: rehashed catalog cache id 44 for pg_class from 256 to 512 buckets: 27 us
LOG: rehashed catalog cache id 47 for pg_statistic from 128 to 256 buckets: 14 us
LOG: rehashed catalog cache id 44 for pg_class from 512 to 1024 buckets: 54 us
LOG: rehashed catalog cache id 45 for pg_class from 256 to 512 buckets: 29 us
LOG: rehashed catalog cache id 47 for pg_statistic from 256 to 512 buckets: 30 us
LOG: rehashed catalog cache id 44 for pg_class from 1024 to 2048 buckets: 147 us
LOG: rehashed catalog cache id 7 for pg_attribute from 512 to 1024 buckets: 87 us
LOG: rehashed catalog cache id 45 for pg_class from 512 to 1024 buckets: 80 us
LOG: rehashed catalog cache id 47 for pg_statistic from 512 to 1024 buckets: 88 us
LOG: rehashed catalog cache id 44 for pg_class from 2048 to 4096 buckets: 342 us
LOG: rehashed catalog cache id 7 for pg_attribute from 1024 to 2048 buckets: 197 us
LOG: rehashed catalog cache id 45 for pg_class from 1024 to 2048 buckets: 183 us
LOG: rehashed catalog cache id 47 for pg_statistic from 1024 to 2048 buckets: 194 us
LOG: rehashed catalog cache id 44 for pg_class from 4096 to 8192 buckets: 764 us
LOG: rehashed catalog cache id 7 for pg_attribute from 2048 to 4096 buckets: 401 us
LOG: rehashed catalog cache id 45 for pg_class from 2048 to 4096 buckets: 383 us
LOG: rehashed catalog cache id 47 for pg_statistic from 2048 to 4096 buckets: 406 us
LOG: rehashed catalog cache id 44 for pg_class from 8192 to 16384 buckets: 1758 us
LOG: rehashed catalog cache id 7 for pg_attribute from 4096 to 8192 buckets: 833 us
LOG: rehashed catalog cache id 45 for pg_class from 4096 to 8192 buckets: 842 us
LOG: rehashed catalog cache id 47 for pg_statistic from 4096 to 8192 buckets: 859 us
LOG: rehashed catalog cache id 44 for pg_class from 16384 to 32768 buckets: 3564 us
LOG: rehashed catalog cache id 7 for pg_attribute from 8192 to 16384 buckets: 1769 us
LOG: rehashed catalog cache id 45 for pg_class from 8192 to 16384 buckets: 1752 us
LOG: rehashed catalog cache id 47 for pg_statistic from 8192 to 16384 buckets: 1719 us
LOG: rehashed catalog cache id 44 for pg_class from 32768 to 65536 buckets: 7538 us
LOG: rehashed catalog cache id 7 for pg_attribute from 16384 to 32768 buckets: 3644 us
LOG: rehashed catalog cache id 45 for pg_class from 16384 to 32768 buckets: 3609 us
LOG: rehashed catalog cache id 47 for pg_statistic from 16384 to 32768 buckets: 3508 us
LOG: rehashed catalog cache id 44 for pg_class from 65536 to 131072 buckets: 16457 us
LOG: rehashed catalog cache id 7 for pg_attribute from
[HACKERS] Re: Is it necessary to rewrite table while increasing the scale of datatype numeric???
On Thu, Sep 05, 2013 at 11:07:43AM +0800, wangs...@highgo.com.cn wrote:
> On 2013-09-04 19:30, Noah Misch wrote:
>> On Wed, Sep 04, 2013 at 12:08:48PM +0800, wangs...@highgo.com.cn wrote:
>>> I find that it takes a long time when I increase the scale of a numeric datatype. By checking the code, I found that's because it needs to rewrite that table's file. After checking that table's data file, I found only the parameter n_header changed, and the data in that numeric field never changed. So I think it's not necessary to rewrite the table's file in this case.
>>
>> n_header is part of the numeric field's data. That's not just pedantry: the display scale stored in n_header affects how numeric_out() formats the value.
>
> Thanks for your reply. Just because of what you said, I think increasing the scale only leads to a different display. There's no difference between 5.25 and 5.2500 in use. So there's no need to rewrite the table.

Right or wrong, our numeric type caters to applications that do care about the difference between those outputs. I grant that others do not care.

--
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com
[HACKERS] get rid of SQL_ASCII?
Can we consider getting rid of the SQL_ASCII server-side encoding? I don't see any good use for it, it's often a support annoyance, and it leaves warts all over the code. This would presumably be a multi-release effort.

As a first step in accommodating users who have existing SQL_ASCII databases, we could change SQL_ASCII into a real encoding with conversion routines to all other encodings that only convert 7-bit ASCII characters. That way, users who use SQL_ASCII as real ASCII or don't care could continue to use it. Others would be forced to either set SQL_ASCII as the client encoding or adjust the encoding on the server.

On the client side, the default libpq client encoding SQL_ASCII would be renamed to something like SAME or whatever, so the behavior would stay the same.

Other ideas? Are there legitimate uses for SQL_ASCII?
Re: [HACKERS] get rid of SQL_ASCII?
On Thu, Sep 5, 2013 at 7:47 AM, Peter Eisentraut <pete...@gmx.net> wrote:
> Can we consider getting rid of the SQL_ASCII server-side encoding? I don't see any good use for it, and it's often a support annoyance, and it leaves warts all over the code. This would presumably be a multi-release effort.
> [...]
> Other ideas? Are there legitimate uses for SQL_ASCII?

performance?

merlin
Re: [HACKERS] get rid of SQL_ASCII?
On 05.09.2013 15:47, Peter Eisentraut wrote:
> Can we consider getting rid of the SQL_ASCII server-side encoding? I don't see any good use for it, and it's often a support annoyance, and it leaves warts all over the code. This would presumably be a multi-release effort.

I think "warts all over the code" is an overstatement. There aren't that many places in the code that care about SQL_ASCII, and they're all related to encoding conversions.

> As a first step in accommodating users who have existing SQL_ASCII databases, we could change SQL_ASCII into a real encoding with conversion routines to all other encodings that only convert 7-bit ASCII characters.
> [...]
> Other ideas? Are there legitimate uses for SQL_ASCII?

One use is if you want to use some special encoding that's not supported by PostgreSQL, and you want PostgreSQL to just regurgitate any strings as is. It's not common, but it would be strange to remove that capability altogether, IMHO.

I agree it would be nice to have a real ASCII encoding, which only accepts 7-bit ASCII characters. And it would be nice if SQL_ASCII were called something else, like UNDEFINED or BYTE_PER_CHAR, to make the meaning more clear. But I'm not in favor of deprecating it altogether.

Also, during backend initialization there is a phase where client_encoding has not been set yet, and we don't do any conversions yet. That's exactly what SQL_ASCII means, so even if we get rid of SQL_ASCII, we'd still need to have some encoding value in the backend to mean that intermediate state.

- Heikki
Re: [HACKERS] get rid of SQL_ASCII?
On Thu, Sep 05, 2013 at 08:47:32AM -0400, Peter Eisentraut wrote:
> Can we consider getting rid of the SQL_ASCII server-side encoding? I don't see any good use for it, and it's often a support annoyance, and it leaves warts all over the code. This would presumably be a multi-release effort.
> [...]
> Other ideas? Are there legitimate uses for SQL_ASCII?

Hi Peter,

Yes, we have processes that insert data from a large number of locales into the same database, and we need to process the information in a locale-agnostic way, just as a range of bytes. Not to mention how much faster it can be.

Regards,
Ken
Re: [HACKERS] Proposal for XML Schema Validation
On 2013-08-09 14:14:34, Craig Ringer wrote:
> Well, if you're planning on relying on libxml in core (and it'll have to be in core if you're adding new syntax) then you'll need a solid, well researched answer to that one or an alternative XML library that's portable and doesn't have those issues.

After doing some research on this issue and going through the mailing list, we were unable to find a fix for the memory management issue with libxml when used with PostgreSQL. Can you please suggest an alternative to libxml, or redirect us to some resources?

Thanks and Regards,
Pridhvi and Vikranthsingh, IIITB
Re: [HACKERS] UNNEST with multiple args, and TABLE with multiple funcs
On Wed, Aug 28, 2013 at 12:09:05AM -0400, Peter Eisentraut wrote:
> On Tue, 2013-08-27 at 09:44 -0400, Tom Lane wrote:
>> Boszormenyi Zoltan <z...@cybertec.at> writes:
>>> When adding regression tests, can you please add intentional syntax error cases to exercise all the new ereport()s?
>>
>> Please do not add test cases merely to prove that. Yeah, you should probably have exercised each error case in devel testing, but that does not mean that every future run of the regression tests needs to do it too.
>
> I disagree. The next person who wants to hack on this feature should be given the confidence that he's not breaking behavior that the last guy put in.

+1. I wouldn't make full error-outcome test coverage a condition of patch acceptance. However, when an author chooses to submit high-quality tests with that level of detail, our source tree is the place to archive them. I share Tom's desire for a Makefile target that completes quickly and checks only those behaviors most likely to break, but not at the cost of letting deep test coverage dissipate in a mailing list attachment or in the feature author's home directory.

--
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com
Re: [HACKERS] Analysis on backend-private memory usage (and a patch)
Heikki Linnakangas <hlinnakan...@vmware.com> writes:
> I ran pgbench for ten seconds, and printed the number of tuples in each catcache after that: [ very tiny numbers ]

I find these numbers a bit suspicious. For example, we must have hit at least 13 different system catalogs, and more than that many indexes, in the course of populating the syscaches you show as initialized. How is it there are only 4 entries in the RELOID cache? I wonder if there were cache resets going on.

A larger issue is that pgbench might not be too representative. In a quick check, I find that cache 37 (OPERNAMENSP) starts out empty, and contains 1 entry after "select 2=2", which is expected since the operator-lookup code will start by looking for "int4 = int4" and will get an exact match. But after "select 2=2::numeric" there are 61 entries, as a byproduct of having thumbed through every binary operator named "=" to resolve the ambiguous match. We went so far as to install another level of caching in front of OPERNAMENSP because it was getting too expensive to deal with heavily-overloaded operators like that one.

In general, we've had to spend enough sweat on optimizing catcache searches to make me highly dubious of any claim that the caches are usually almost empty. I understand your argument that resizing is so cheap that it might not matter, but nonetheless reducing these caches as far as you're suggesting sounds to me to be penny-wise and pound-foolish. I'm okay with setting them on the small side rather than on the large side as they are now, but not with choosing sizes that are guaranteed to result in resizing cycles during startup of any real app.

> PS. Once the hashes are resized on demand, perhaps we should get rid of the move-to-the-head behavior in SearchCatCache. If all the buckets

-1. If the bucket in fact has just one member, dlist_move_head reduces to just one comparison. And again I argue that you're optimizing for the wrong case. Pure luck will result in some hash chains being (much) longer than the average, and if we don't do move-to-front we'll get hurt there.

			regards, tom lane
Re: [HACKERS] get rid of SQL_ASCII?
Joshua D. Drake wrote:
> On 09/05/2013 09:42 AM, Josh Berkus wrote:
>>> Other ideas? Are there legitimate uses for SQL_ASCII?
>>
>> Migrating from MySQL. We've had some projects where we couldn't fix MySQL's non-enforcement text garbage, and had to use SQL_ASCII on the receiving side. If it hadn't been available, the user would have given up on Postgres.
>
> iconv?

Command Prompt helped a customer normalize encodings in their data, which was a mixture of Latin1 and UTF8. PGLoader was used for this, in two stages; the first run in UTF8 saved the rejected data to a file, which was loaded in the second run as Latin1. This worked like a charm.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Where can I find the code for extern OidFunctionCall7?
arthernan wrote:
> OidFunctionCall7 is called from costsize.c, in the function cost_index. From the comments, I understand that there is an indirection mechanism here to allow for different index access methods. Looking at fmgr.h where it is declared, it is not clear to me where I would ultimately find the code. Any pointers will be greatly appreciated.

The code you're really looking for is the amcostestimate function of the AM that the index uses. So if it's a btree you want btcostestimate, for instance. You can see the AM definitions in the pg_am system catalog and src/include/catalog/pg_am.h.

	/*
	 * Call index-access-method-specific code to estimate the processing cost
	 * for scanning the index, as well as the selectivity of the index (ie,
	 * the fraction of main-table tuples we will have to retrieve) and its
	 * correlation to the main-table tuple order.
	 */
	OidFunctionCall7(index->amcostestimate,
					 PointerGetDatum(root),
					 PointerGetDatum(path),
					 Float8GetDatum(loop_count),
					 PointerGetDatum(indexStartupCost),
					 PointerGetDatum(indexTotalCost),
					 PointerGetDatum(indexSelectivity),
					 PointerGetDatum(indexCorrelation));

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
Magnus Hagander <mag...@hagander.net> writes:
> On Thu, Sep 5, 2013 at 3:01 AM, Bruce Momjian <br...@momjian.us> wrote:
>> I have developed the attached patch which implements an auto-tuned effective_cache_size which is 4x the size of shared buffers. I had to set effective_cache_size to its old 128MB default so the EXPLAIN regression tests would pass unchanged.
>
> That's not really autotuning though. ISTM that making the *default* 4 x shared_buffers might make perfect sense, but do we really need to hijack the value of -1 for that? That might be useful for some time when we have actual autotuning, that somehow inspects the system and tunes it from there.

Well, the real problem with this patch is that it documents what the auto-tuning algorithm is; without that commitment, just saying "-1 means autotune" might be fine.

Did you consider the alternative of just tweaking initdb to insert a default for effective_cache_size that's 4x whatever it picks for shared_buffers? That would probably be about 3 lines of code, and it wouldn't nail down any particular server-side behavior.

			regards, tom lane
Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE
On Wed, Sep 4, 2013 at 8:08 PM, Andres Freund <and...@2ndquadrant.com> wrote:
> You've proposed a framework and algorithm for something I'd really, really like to see. I don't think that it can work exactly as you proposed, so I roughly sketched out a solution I can see. I don't want my idea to win, I want an idea to win. I haven't fleshed out my idea to the point where I would consider it ready to implement. You're the patch author here whose plans are laid open to be scrutinized ;). If you think my idea has merit, use and adapt it to reality. If not, find another, better solution. Even if our path to that goal is confrontational at times, the goal is to find a good solution, not the argument itself.

Totally agreed.

> I haven't argued about INSERT ... DUPLICATE LOCK because the page locking scheme doesn't seem to work out for plain DUPLICATE. No need to think/argue about the fancier version in that case.

Check.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] dynamic shared memory
On Wed, Sep 4, 2013 at 6:38 PM, Jim Nasby <j...@nasby.net> wrote:
> No, I meant something that would live as long as the postmaster and die when it dies. ISTM that at some point we'll want to look at putting top-level shared memory into this system (ie: allowing dynamic resizing of GUCs that affect shared memory size).

A lot of people want that, but being able to resize the shared memory chunk itself is only the beginning of the problem. So I wouldn't hold my breath.

> Wouldn't it protect against a crash while writing the file? I realize the odds of that are pretty remote, but AFAIK it wouldn't cost that much to write a new file and do an atomic mv...

If there's an OS-level crash, we don't need the state file; the shared memory will be gone anyway. And if it's a PostgreSQL-level failure, this game neither helps nor hurts. Sure, a messed-up backend can clobber the control segment just as it can clobber anything else in shared memory. There's really no way around that problem: if the control segment has been overwritten by a memory stomp, we can't use it to clean up. There's no way around that except to not use a control segment, which wouldn't be better.

> Are we trying to protect against memory stomps when we restart after a backend dies? I thought we were just trying to ensure that all shared data structures were correct and consistent. If that's the case, then I was thinking that by using a pointer that can be updated in a CPU-atomic fashion we know we'd never end up with a corrupted entry that was in use; the partial write would be to a slot with nothing pointing at it, so it could be safely reused.

When we restart after a backend dies, shared memory contents are completely reset, from scratch. This is true of both the fixed-size shared memory segment and of the dynamic shared memory control segment. The only difference is that, with the dynamic shared memory control segment, we need to use the segment for cleanup before throwing it out and starting over. Extra caution is required because we're examining memory that could hypothetically have been stomped on; we must not let the postmaster do anything suicidal.

> Should dsm_impl_op sanity-check the arguments after op? I didn't notice checks in the type-specific code, but I also didn't read all of it... are we just depending on the OS to sanity-check?

Sanity-check for what?

> Presumably there's limits to what the arguments can be rationally set to. IIRC there's nothing down-stream that's checking them in our code, so I'm guessing we're just depending on the kernel to sanity-check.

Pretty much. It's possible more thought is needed there, but the shape of those additional thoughts is not clear to me at this time.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] strange IS NULL behaviour
On Wed, Sep 4, 2013 at 9:26 PM, Bruce Momjian <br...@momjian.us> wrote:
> On Tue, Sep 3, 2013 at 09:32:44PM -0400, Bruce Momjian wrote:
>> In this test, SELECT NULL (which internally would produce SELECT ROW(NULL)) returns TRUE, while SELECT ROW(NULL) and further nesting returns false. This has made me adjust my goal and change it so SELECT ROW(NULL) IS NULL returns true, and any further nesting returns false. Attached is a patch which accomplishes this, and a documentation update.
>
> I have not heard any feedback on this patch, so I would like to apply it to give us a nested ROW/IS NULL API we can document. It would have to be marked in the release notes as a backward incompatibility.

I don't have time to look at this in detail right now, but I think that's considerably premature. I'm not convinced that we understand all of the problems in this area yet, let alone the solutions. And I notice that you haven't substantively responded to some of Tom's concerns. So -1 from me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] improve Chinese locale performance
On Wed, Sep 4, 2013 at 11:02 PM, Quan Zongliang <quanzongli...@gmail.com> wrote:
> I think of a new idea. Add a compare-method column to pg_collation. Every collation has its own compare function, or null. When the function varstr_cmp is called, if the specified collation has a compare function, call it instead of strcoll().

I think we're going to need to have two kinds of collations: OS-derived collations (which get all of their smarts from the OS), and PG-internal collations (which use PG-aware code for everything). Which I suspect is a bit more involved than what you're imagining, but mixing and matching doesn't seem likely to end well. However, what you're proposing might serve as a useful demonstration of how much performance there is to be gained here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On Thu, Sep 5, 2013 at 3:01 AM, Bruce Momjian br...@momjian.us wrote: On Tue, Jan 8, 2013 at 08:40:44PM -0500, Andrew Dunstan wrote: On 01/08/2013 08:08 PM, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: On Tue, Jan 8, 2013 at 7:17 PM, Tom Lane t...@sss.pgh.pa.us wrote: ... And I don't especially like the idea of trying to make it depend directly on the box's physical RAM, for the same practical reasons Robert mentioned. For the record, I don't believe those problems would be particularly hard to solve. Well, the problem of finding out the box's physical RAM is doubtless solvable if we're willing to put enough sweat and tears into it, but I'm dubious that it's worth the trouble. The harder part is how to know if the box is supposed to be dedicated to the database. Bear in mind that the starting point of this debate was the idea that we're talking about an inexperienced DBA who doesn't know about any configuration knob we might provide for the purpose. I'd prefer to go with a default that's predictable and not totally foolish --- and some multiple of shared_buffers seems like it'd fit the bill. +1. That seems to be by far the biggest bang for the buck. Anything else will surely involve a lot more code for not much more benefit. I have developed the attached patch which implements an auto-tuned effective_cache_size which is 4x the size of shared buffers. I had to set effective_cache_size to its old 128MB default so the EXPLAIN regression tests would pass unchanged. That's not really autotuning though. ISTM that making the *default* 4 x shared_buffers might make perfect sense, but do we really need to hijack the value of -1 for that? That might be useful for some time when we have actual autotuning, that somehow inspects the system and tunes it from there. I also don't think it should be called autotuning, when it's just a smarter default value. I like the feature, though, just not the packaging.
-- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] lcr v5 - introduction of InvalidCommandId
On 2013-09-05 12:44:18 -0400, Robert Haas wrote: On Wed, Sep 4, 2013 at 12:07 PM, Andres Freund and...@2ndquadrant.com wrote: On 2013-09-03 11:40:57 -0400, Robert Haas wrote: 0002 wal_decoding: Introduce InvalidCommandId and declare that to be the new maximum for CommandCounterIncrement I'm still unconvinced we want this. Ok, so the reason for the existence of this patch is that currently there is no way to represent an unset CommandId. This is a problem for the following patches because we need to log the cmin, cmax of catalog rows and obviously there can be rows where cmax is unset. For heap tuples, we solve this problem by using flag bits. Why not adopt the same approach? We can, while it makes the amount of data stored/logged slightly larger and it seems to lead to less idiomatic code to me, so if there's another -1 I'll go that way. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] lcr v5 - introduction of InvalidCommandId
On Wed, Sep 4, 2013 at 12:07 PM, Andres Freund and...@2ndquadrant.com wrote: On 2013-09-03 11:40:57 -0400, Robert Haas wrote: 0002 wal_decoding: Introduce InvalidCommandId and declare that to be the new maximum for CommandCounterIncrement I'm still unconvinced we want this. Ok, so the reason for the existence of this patch is that currently there is no way to represent an unset CommandId. This is a problem for the following patches because we need to log the cmin, cmax of catalog rows and obviously there can be rows where cmax is unset. For heap tuples, we solve this problem by using flag bits. Why not adopt the same approach? The reason I chose to change the definition of CommandIds is that the other ondisk types we use like TransactionIds, XLogRecPtrs and such have an invalid value, CommandIds don't. Changing their definition to have 0 - analogous to the previous examples - as their invalid value is not a problem because CommandIds from pg_upgraded clusters may never be used for anything. Going from 2^32 to 2^32-1 possible CommandIds doesn't seem like a problem to me. Imo the CommandIds should have been defined that way from the start. Makes some sense? I don't have a problem with this if other people think it's a good idea. But I think it needs a few +1s and not too many -1s first, and so far (AFAIK) no one else has weighed in with an opinion. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] get rid of SQL_ASCII?
Peter, Other ideas? Are there legitimate uses for SQL_ASCII? Migrating from MySQL. We've had some projects where we couldn't fix MySQL's non-enforcement text garbage, and had to use SQL_ASCII on the receiving side. If it hadn't been available, the user would have given up on Postgres. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v5
Hi, On 2013-09-03 11:40:57 -0400, Robert Haas wrote: On Fri, Aug 30, 2013 at 11:19 AM, Andres Freund and...@2ndquadrant.com wrote: 0005 wal_decoding: Log xl_running_xact's at a higher frequency than checkpoints are done * benefits hot standby startup I tried to update the patch to address the comments you made. 0003 wal_decoding: Allow walsender's to connect to a specific database * biggest problem is how to specify the connection we connect to. Currently with the patch walsender connects to a database if it's not named replication (via dbname). Perhaps it's better to invent a replication_dbname parameter? I've updated the patch so it extends the replication startup parameter to not only specify a boolean but also database. In the latter case it will connect to the database specified in dbname. As discussed downthread, this patch doesn't have an immediate advantage for users until the changeset extraction patch itself is applied. Whether or not it should be applied separately is unclear. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services From 2aa39548f5990e9663e95f011f25a89a0dc8d8a1 Mon Sep 17 00:00:00 2001 From: Andres Freund and...@anarazel.de Date: Mon, 19 Aug 2013 13:24:30 +0200 Subject: [PATCH 2/9] wal_decoding: Allow walsender's to connect to a specific database Extend the existing 'replication' parameter to not only allow a boolean value but also database. If the latter is specified we connect to the database specified in 'dbname'. This is useful for future walsender commands which need database interaction, e.g. changeset extraction. 
--- doc/src/sgml/protocol.sgml | 24 +--- src/backend/postmaster/postmaster.c| 23 ++-- .../libpqwalreceiver/libpqwalreceiver.c| 4 +- src/backend/replication/walsender.c| 43 +++--- src/backend/utils/init/postinit.c | 5 +++ src/bin/pg_basebackup/pg_basebackup.c | 4 +- src/bin/pg_basebackup/pg_receivexlog.c | 4 +- src/bin/pg_basebackup/receivelog.c | 4 +- src/include/replication/walsender.h| 1 + 9 files changed, 89 insertions(+), 23 deletions(-) diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml index 0b2e60e..2ea14e5 100644 --- a/doc/src/sgml/protocol.sgml +++ b/doc/src/sgml/protocol.sgml @@ -1301,10 +1301,13 @@ <para> To initiate streaming replication, the frontend sends the -<literal>replication</literal> parameter in the startup message. This tells the -backend to go into walsender mode, wherein a small set of replication commands -can be issued instead of SQL statements. Only the simple query protocol can be -used in walsender mode. +<literal>replication</literal> parameter in the startup message. A boolean value +of <literal>true</literal> tells the backend to go into walsender mode, wherein a +small set of replication commands can be issued instead of SQL statements. Only +the simple query protocol can be used in walsender mode. +Passing a <literal>database</literal> as the value instructs walsender to connect to +the database specified in the <literal>dbname</literal> parameter which will in future +allow some additional commands to the ones specified below to be run. The commands accepted in walsender mode are: @@ -1314,7 +1317,7 @@ The commands accepted in walsender mode are: <listitem> <para> Requests the server to identify itself. Server replies with a result - set of a single row, containing three fields: + set of a single row, containing four fields: </para> <para> @@ -1356,6 +1359,17 @@ The commands accepted in walsender mode are: </listitem> </varlistentry> + <varlistentry> + <term> + dbname + </term> + <listitem> + <para> + Database connected to or NULL.
+ </para> + </listitem> + </varlistentry> + </variablelist> </para> </listitem> diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index 01d2618..a31b01d 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -1894,10 +1894,21 @@ retry1: port->cmdline_options = pstrdup(valptr); else if (strcmp(nameptr, "replication") == 0) { -if (!parse_bool(valptr, &am_walsender)) +/* + * Due to backward compatibility concerns replication is a + * hybrid beast which allows the value to be either a boolean + * or the string 'database'. The latter connects to a specific + * database which is e.g. required for changeset extraction. + */ +if (strcmp(valptr, "database") == 0) +{ + am_walsender = true; + am_db_walsender = true; +} +else if (!parse_bool(valptr, &am_walsender)) ereport(FATAL, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), - errmsg("invalid value for boolean option \"replication\""))); +
Re: [HACKERS] get rid of SQL_ASCII?
On Thu, Sep 05, 2013 at 09:42:17AM -0700, Josh Berkus wrote: Peter, Other ideas? Are there legitimate uses for SQL_ASCII? Migrating from MySQL. We've had some projects where we couldn't fix MySQL's non-enforcement text garbage, and had to use SQL_ASCII on the receiving side. If it hadn't been available, the user would have given up on Postgres. +++1 :) Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] get rid of SQL_ASCII?
On 09/05/2013 10:02 AM, Alvaro Herrera wrote: Joshua D. Drake wrote: iconv? Command Prompt helped a customer normalize encodings in their data, which was a mixture of Latin1 and UTF8. PGLoader was used for this, in two stages; the first run in UTF8 saved the rejected data to a file which was loaded in the second run as Latin1. This worked like a charm. There are certainly alternatives. But all of the alternatives increase the cost of the migration (either in staff time or in downtime), which increases the likelihood that the organization will abandon the migration. Anyway, I think we've established that there are enough legitimate uses for SQL_ASCII that we can't casually discard it. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] get rid of SQL_ASCII?
On Thu, Sep 05, 2013 at 09:53:18AM -0700, Joshua D. Drake wrote: On 09/05/2013 09:42 AM, Josh Berkus wrote: Peter, Other ideas? Are there legitimate uses for SQL_ASCII? Migrating from MySQL. We've had some projects where we couldn't fix MySQL's non-enforcement text garbage, and had to use SQL_ASCII on the receiving side. If it hadn't been available, the user would have given up on Postgres. iconv? Yes, you can use iconv but then you have to check that it generated values that do not break your system including the application logic. That can prove a major stumbling block to changing DBs. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Where can I find the code for extern OidFunctionCall7?
OidFunctionCall7 is called from costsize.c, in function void cost_index. From the comments, I understand that there is an indirection mechanism here to allow for different index access methods. Looking at fmgr.h where it is declared, it is not clear to me where I would ultimately find the code. Any pointers will be greatly appreciated. /* * Call index-access-method-specific code to estimate the processing cost * for scanning the index, as well as the selectivity of the index (ie, * the fraction of main-table tuples we will have to retrieve) and its * correlation to the main-table tuple order. */ OidFunctionCall7(index->amcostestimate, PointerGetDatum(root), PointerGetDatum(path), Float8GetDatum(loop_count), PointerGetDatum(&indexStartupCost), PointerGetDatum(&indexTotalCost), PointerGetDatum(&indexSelectivity), PointerGetDatum(&indexCorrelation)); -- View this message in context: http://postgresql.1045698.n5.nabble.com/Where-can-I-find-the-code-for-extern-OidFunctionCall7-tp5769737.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Where can I find the code for extern OidFunctionCall7?
Thank you guys! -- View this message in context: http://postgresql.1045698.n5.nabble.com/Where-can-I-find-the-code-for-extern-OidFunctionCall7-tp5769737p5769753.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Analysis on backend-private memory usage (and a patch)
On 05.09.2013 17:22, Tom Lane wrote: Heikki Linnakangas hlinnakan...@vmware.com writes: I ran pgbench for ten seconds, and printed the number of tuples in each catcache after that: [ very tiny numbers ] I find these numbers a bit suspicious. For example, we must have hit at least 13 different system catalogs, and more than that many indexes, in the course of populating the syscaches you show as initialized. How is it there are only 4 entries in the RELOID cache? I wonder if there were cache resets going on. Relcache is loaded from the init file. The lookups of those system catalogs and indexes never hit the syscache, because the entries are found in relcache. When I delete the init file and launch psql, without running any queries, I get this (caches with 0 tups left out): LOG: cache id 45 on pg_class: 7 tups LOG: cache id 32 on pg_index: 63 tups LOG: cache id 21 on pg_database: 1 tups LOG: cache id 11 on pg_authid: 1 tups LOG: cache id 10 on pg_authid: 1 tups LOG: cache id 2 on pg_am: 1 tups A larger issue is that pgbench might not be too representative. In a quick check, I find that cache 37 (OPERNAMENSP) starts out empty, and contains 1 entry after select 2=2, which is expected since the operator-lookup code will start by looking for int4 = int4 and will get an exact match. But after select 2=2::numeric there are 61 entries, as a byproduct of having thumbed through every binary operator named = to resolve the ambiguous match. We went so far as to install another level of caching in front of OPERNAMENSP because it was getting too expensive to deal with heavily-overloaded operators like that one. In general, we've had to spend enough sweat on optimizing catcache searches to make me highly dubious of any claim that the caches are usually almost empty. I understand your argument that resizing is so cheap that it might not matter, but nonetheless reducing these caches as far as you're suggesting sounds to me to be penny-wise and pound-foolish.
I'm okay with setting them on the small side rather than on the large side as they are now, but not with choosing sizes that are guaranteed to result in resizing cycles during startup of any real app. Ok, committed the attached. To choose the initial sizes, I put a WARNING into the rehash function, ran the regression suite, and adjusted the sizes so that most regression tests run without rehashing. With the attached patch, 18 regression tests cause rehashing (see regression.diffs). The ones that do are because they are exercising some parts of the system more than a typical application would do: the enum regression test for example causes rehashes of the pg_enum catalog cache, and the aggregate regression test causes rehashing of pg_aggregate, and so on. A few regression tests do a database-wide VACUUM or ANALYZE; those touch all relations, and cause a rehash of pg_class and pg_index. - Heikki diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c index cca0572..c467f11 100644 --- a/src/backend/utils/cache/catcache.c +++ b/src/backend/utils/cache/catcache.c @@ -728,21 +728,20 @@ InitCatCache(int id, int nkeys, const int *key, int nbuckets) { CatCache *cp; MemoryContext oldcxt; int i; /* - * nbuckets is the number of hash buckets to use in this catcache. - * Currently we just use a hard-wired estimate of an appropriate size for - * each cache; maybe later make them dynamically resizable? + * nbuckets is the initial number of hash buckets to use in this catcache. + * It will be enlarged later if it becomes too full. * * nbuckets must be a power of two. We check this via Assert rather than * a full runtime check because the values will be coming from constant * tables. * * If you're confused by the power-of-two check, see comments in * bitmapset.c for an explanation. 
*/ Assert(nbuckets > 0 && (nbuckets & -nbuckets) == nbuckets); @@ -769,19 +768,20 @@ InitCatCache(int id, on_proc_exit(CatCachePrintStats, 0); #endif } /* * allocate a new cache structure * * Note: we rely on zeroing to initialize all the dlist headers correctly */ - cp = (CatCache *) palloc0(sizeof(CatCache) + nbuckets * sizeof(dlist_head)); + cp = (CatCache *) palloc0(sizeof(CatCache)); + cp->cc_bucket = palloc0(nbuckets * sizeof(dlist_head)); /* * initialize the cache's relation information for the relation * corresponding to this cache, and initialize some of the new cache's * other internal fields. But don't open the relation yet. */ cp->id = id; cp->cc_relname = "(not known yet)"; cp->cc_reloid = reloid; @@ -808,18 +808,55 @@ InitCatCache(int id, /* * back to the old context before we return... */ MemoryContextSwitchTo(oldcxt); return cp; } /* + * Enlarge a catcache, doubling the number of buckets. + */ +static void +RehashCatCache(CatCache *cp) +{ + dlist_head *newbucket; + int newnbuckets; + int i; +
Re: [HACKERS] get rid of SQL_ASCII?
On 09/05/2013 09:42 AM, Josh Berkus wrote: Peter, Other ideas? Are there legitimate uses for SQL_ASCII? Migrating from MySQL. We've had some projects where we couldn't fix MySQL's non-enforcement text garbage, and had to use SQL_ASCII on the receiving side. If it hadn't been available, the user would have given up on Postgres. iconv? -- Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc For my dreams of your image that blossoms a rose in the deeps of my heart. - W.B. Yeats -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] missing SSI check in heapgettup_pagemode() ?
I was scrolling heapgettup() and heapgettup_pagemode() in parallel, and noticed that the former calls CheckForSerializableConflictOut() for tuples read, and the latter doesn't. Is there a reason for this discrepancy? If so, I think it warrants a comment about the missing check. Thanks, -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] lcr v5 - primary/candidate key in relcache
Hi Kevin, On 2013-09-03 11:40:57 -0400, Robert Haas wrote: On Fri, Aug 30, 2013 at 11:19 AM, Andres Freund and...@2ndquadrant.com wrote: 0007 wal_decoding: Add information about a table's primary key to struct RelationData * Could be used in the matview refresh code I think you and Kevin should discuss whether this is actually the right way to do this. ISTM that if logical replication and materialized views end up selecting different approaches to this problem, everybody loses. The patch we're discussing here adds a new struct RelationData field called 'rd_primary' (should possibly be renamed) which contains information about the best candidate key available for a table. From the header comments: /* * The 'best' primary or candidate key that has been found, only set * correctly if RelationGetIndexList has been called/rd_indexvalid > 0. * * Indexes are chosen in the following order: * * Primary Key * * oid index * * the first (OID order) unique, immediate, non-partial and * non-expression index over one or more NOT NULL'ed columns */ Oid rd_primary; I thought we could use that in matview.c:refresh_by_match_merge() to select a more efficient diff if rd_primary has a valid index. In that case you'd only need to compare that index's fields, which should result in a more efficient plan. Maybe it's also useful in other cases for you? If it's relevant at all, would you like to have a different priority list than the one above? Regards, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services From ee85b3bd8d8cc25fa547c004f6f6ea6bccef7c66 Mon Sep 17 00:00:00 2001 From: Andres Freund and...@anarazel.de Date: Mon, 19 Aug 2013 13:24:30 +0200 Subject: [PATCH 4/9] wal_decoding: Add information about a table's primary key to struct RelationData 'rd_primary' now contains the Oid of an index over uniquely identifying columns.
Several types of indexes are interesting and are collected in that order: * Primary Key * oid index * the first (OID order) unique, immediate, non-partial and non-expression index over one or more NOT NULL'ed columns To gather the rd_primary value, RelationGetIndexList() needs to have been called. This is helpful because for logical replication we frequently - on the sending and receiving side - need to look up that index and RelationGetIndexList already gathers all the necessary information. This could be used to replace tablecmd.c's transformFkeyGetPrimaryKey, but would change the meaning of that, so it seems to require additional discussion. --- src/backend/utils/cache/relcache.c | 52 +++--- src/include/utils/rel.h| 12 + 2 files changed, 61 insertions(+), 3 deletions(-) diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c index 66fb63b..c588c29 100644 --- a/src/backend/utils/cache/relcache.c +++ b/src/backend/utils/cache/relcache.c @@ -3465,7 +3465,9 @@ RelationGetIndexList(Relation relation) ScanKeyData skey; HeapTuple htup; List *result; - Oid oidIndex; + Oid oidIndex = InvalidOid; + Oid pkeyIndex = InvalidOid; + Oid candidateIndex = InvalidOid; MemoryContext oldcxt; /* Quick exit if we already computed the list.
*/ @@ -3522,17 +3524,61 @@ RelationGetIndexList(Relation relation) Assert(!isnull); indclass = (oidvector *) DatumGetPointer(indclassDatum); + if (!IndexIsValid(index)) + continue; + /* Check to see if it is a unique, non-partial btree index on OID */ - if (IndexIsValid(index) && - index->indnatts == 1 && + if (index->indnatts == 1 && index->indisunique && index->indimmediate && index->indkey.values[0] == ObjectIdAttributeNumber && indclass->values[0] == OID_BTREE_OPS_OID && heap_attisnull(htup, Anum_pg_index_indpred)) oidIndex = index->indexrelid; + + if (index->indisunique && + index->indimmediate && + heap_attisnull(htup, Anum_pg_index_indpred)) + { + /* always prefer primary keys */ + if (index->indisprimary) +pkeyIndex = index->indexrelid; + else if (!OidIsValid(pkeyIndex) && + !OidIsValid(oidIndex) && + !OidIsValid(candidateIndex)) + { +int key; +bool found = true; +for (key = 0; key < index->indnatts; key++) +{ + int16 attno = index->indkey.values[key]; + Form_pg_attribute attr; + /* internal column, like oid */ + if (attno <= 0) + continue; + + attr = relation->rd_att->attrs[attno - 1]; + if (!attr->attnotnull) + { + found = false; + break; + } +} +if (found) + candidateIndex = index->indexrelid; + } + } } systable_endscan(indscan); + + if (OidIsValid(pkeyIndex)) + relation->rd_primary = pkeyIndex; + /* prefer oid indexes over normal candidate ones */ + else if (OidIsValid(oidIndex)) + relation->rd_primary = oidIndex; + else if (OidIsValid(candidateIndex)) +
Re: [HACKERS] lcr v5 - introduction of InvalidCommandId
On Thu, Sep 5, 2013 at 12:59 PM, Andres Freund and...@2ndquadrant.com wrote: On 2013-09-05 12:44:18 -0400, Robert Haas wrote: On Wed, Sep 4, 2013 at 12:07 PM, Andres Freund and...@2ndquadrant.com wrote: On 2013-09-03 11:40:57 -0400, Robert Haas wrote: 0002 wal_decoding: Introduce InvalidCommandId and declare that to be the new maximum for CommandCounterIncrement I'm still unconvinced we want this. Ok, so the reason for the existance of this patch is that currently there is no way to represent a unset CommandId. This is a problem for the following patches because we need to log the cmin, cmax of catalog rows and obviously there can be rows where cmax is unset. For heap tuples, we solve this problem by using flag bits. Why not adopt the same approach? We can, while it makes the amount of data stored/logged slightly larger and it seems to lead to less idiomatic code to me, so if there's another -1 I'll go that way. OK. Consider me more of a -0 than a -1. Like I say, I don't really want to block it; I just don't feel comfortable committing it unless a few other people say something like I don't see a problem with that. Or maybe point me to relevant changeset extraction code that's going to get messier. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [HACKERS] Is it necessary to rewrite table while increasing the scale of datatype numeric?
On Wed, Sep 4, 2013 at 10:06 PM, wangs...@highgo.com.cn wrote: On 2013-09-04 23:41, Jeff Janes replied: On Tue, Sep 3, 2013 at 9:08 PM, wangs...@highgo.com.cn wrote: Hi, Hackers! I find that it takes a long time when I increase the scale of a numeric datatype. By checking the code, I found that's because it needs to rewrite that table's file. After checking that table's data file, I found only the parameter n_header changed. And, I found the data in that numeric field never changed. So I think it's not necessary to rewrite the table's file in this case. Anyone who has more ideas about this, please come talk about it! Jeff Janes jeff.ja...@gmail.com wrote: This was fixed in version 9.2. You must be using an older version. Cheers, Jeff Thanks for your reply. To declare a column of type numeric use the syntax: NUMERIC(precision, scale). What I said is this scale, not yours. You're right, I had tested a change in precision, not in scale. Sorry. In order to avoid the rewrite, the code would have to be changed to look up the column definition and if it specifies the scale, then ignore the per-row n_header, and look at the n_header only if the column is NUMERIC with no precision or scale. That should conceptually be possible, but I don't know how hard it would be to implement--it sounds pretty invasive to me. Then if the column was altered from NUMERIC with scale to be a plain NUMERIC, it would have to rewrite the table to enforce the row-wise scale to match the old column-wise scale. Whereas now that alter doesn't need a rewrite. I don't know if this would be an overall gain or not. Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Where can I find the code for extern OidFunctionCall7?
On Thu, Sep 5, 2013 at 12:38 PM, arthernan arther...@hotmail.com wrote: OidFunctionCall7 is called from costsize.c, in function void cost_index. From the comments, I understand that there is an indirection mechanism here to allow for different index access methods. Looking at fmgr.h where it is declared is not clear to me where I would ultimately find the code. Any pointers will be greatly appreciated Use 'git grep'. It's a macro defined in fmgr.h. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] missing SSI check in heapgettup_pagemode() ?
On 05.09.2013 20:30, Alvaro Herrera wrote: I was scrolling heapgettup() and heapgettup_pagemode() in parallel, and noticed that the former calls CheckForSerializableConflictOut() for tuples read, and the latter doesn't. Is there a reason for this discrepancy? If so, I think it warrants a comment about the missing check. In page-at-a-time mode, the CheckForSerializableConflictOut() is done in heapgetpage(). Same with the visibility checks. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] strange IS NULL behaviour
Robert Haas robertmh...@gmail.com writes: On Wed, Sep 4, 2013 at 9:26 PM, Bruce Momjian br...@momjian.us wrote: I have not heard any feedback on this patch, so I would like to apply it to give us a nested ROW/IS NULL API we can document. It would have to be marked in the release notes as a backward incompatibility. I don't have time to look at this in detail right now, but I think that's considerably premature. I'm not convinced that we understand all of the problems in this area yet, let alone the solutions. And I notice that you haven't substantively responded to some of Tom's concerns. In particular, I don't think it's a good idea to change eval_const_expression's behavior in an incompatible way that simply makes it differently inconsistent with other IS NULL code paths. We should leave things alone until we have a full fix, otherwise we'll just be breaking people's apps repeatedly. I would also say that allowing eval_const_expression to drive what we think the right behavior is is completely backwards, because it's about the least performance-critical case. You should be looking at execQual.c first. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] lcr v5 - introduction of InvalidCommandId
Robert Haas robertmh...@gmail.com writes: OK. Consider me more of a -0 than a -1. Like I say, I don't really want to block it; I just don't feel comfortable committing it unless a few other people say something like I don't see a problem with that. FWIW, I've always thought it was a wart that there wasn't a recognized InvalidCommandId value. It was never pressing to fix it before, but if LCR needs it, let's do so. I definitely *don't* find it cleaner to eat up another flag bit to avoid that. We don't have many to spare. Ideally I'd have made InvalidCommandId = 0 and FirstCommandId = 1, but I suppose we can't have that without an on-disk compatibility break. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] missing SSI check in heapgettup_pagemode() ?
Alvaro Herrera alvhe...@2ndquadrant.com wrote: I was scrolling heapgettup() and heapgettup_pagemode() in parallel, and noticed that the former calls CheckForSerializableConflictOut() for tuples read, and the latter doesn't. Is there a reason for this discrepancy? If so, I think it warrants a comment about the missing check. That is covered in heapgetpage(). Do you want to offer a patch to add a comment to that effect, or would you prefer that I do? -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] lcr v5 - introduction of InvalidCommandId
Hi, Thanks for weighing in. On 2013-09-05 14:21:33 -0400, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: OK. Consider me more of a -0 than a -1. Like I say, I don't really want to block it; I just don't feel comfortable committing it unless a few other people say something like I don't see a problem with that. FWIW, I've always thought it was a wart that there wasn't a recognized InvalidCommandId value. It was never pressing to fix it before, but if LCR needs it, let's do so. Yes, it's a bit anomalous relative to the other types. I definitely *don't* find it cleaner to eat up another flag bit to avoid that. We don't have many to spare. It wouldn't need to be a flag bit in any existing struct, so that's not a problem. Ideally I'd have made InvalidCommandId = 0 and FirstCommandId = 1, but I suppose we can't have that without an on-disk compatibility break. The patch actually does change it exactly that way. My argument for that being valid is that CommandIds don't play any role outside of their own transaction. Now, somebody could argue that SELECT cmin, cmax can be done outside the transaction, but: Those values are already pretty much meaningless today since cmin/cmax have been merged. They also don't check whether the field is initialized at all. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] lcr v5 - introduction of InvalidCommandId
Andres Freund and...@2ndquadrant.com writes: On 2013-09-05 14:21:33 -0400, Tom Lane wrote: Ideally I'd have made InvalidCommandId = 0 and FirstCommandId = 1, but I suppose we can't have that without an on-disk compatibility break. The patch actually does change it exactly that way. Oh. I hadn't looked at the patch, but I had (mis)read what Robert said to think that you were proposing introducing InvalidCommandId = 0xFFFFFFFF while leaving FirstCommandId alone. That would make more sense to me as (1) it doesn't change the interpretation of anything that's (likely to be) on disk; (2) it allows the check for overflow in CommandCounterIncrement to not involve recovering from an *actual* overflow. With the horsing around we've been seeing from the gcc boys lately, I don't have a warm feeling about whether they won't break that test someday on the grounds that overflow is undefined behavior. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Small catcache optimization
Hi,

Heikki's catcache rehashing stuff reminded me that I'd posted an optimization to catcache (20121220153555.gh4...@awork2.anarazel.de) some time back which I didn't have energy to pursue at that point. I've brushed the patch up a bit and verified it still gives me a performance improvement. It's still about 2% in a readonly pgbench on my elderly laptop.

There are basically two tricks in the patch:

1) Don't always copy the cache's ScanKey to the stack. Instead pass an array of arguments around. That gets rid of a good amount of memcpy()ing in the common, cached case.

2) If we have to memcpy() because we need to pass a ScanKey to systable_*, copy only cache->cc_nkeys * sizeof(ScanKeyData) instead of always copying the maximum size.

It'd be nicer to get rid of the mostly copied HeapKeyTestArg, but I don't immediately see how.

Greetings,

Andres Freund

-- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

From df40dab17e7582ab597214d3c7fcefd953dcc9f7 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Thu, 5 Sep 2013 20:30:44 +0200
Subject: [PATCH] Improve catalog cache lookup performance.

Do so by not copying the entire ScanKey upon entering SearchCatCache, SearchCatCacheList. Instead perform operations like hashing on a smaller array containing only the passed in Datums. When a lookup into the underlying relations needs to happen, copy the Datums into a local copy of the ScanKey as before, but use a smaller memcpy() if possible.
---
 src/backend/utils/cache/catcache.c | 120 +++++++++++++++++----------
 src/include/access/valid.h         |  56 ++++++++----
 2 files changed, 122 insertions(+), 54 deletions(-)

diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index c467f11..fb7eaa2 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -74,7 +74,7 @@ static CatCacheHeader *CacheHdr = NULL;

 static uint32 CatalogCacheComputeHashValue(CatCache *cache, int nkeys,
-                             ScanKey cur_skey);
+                             Datum *argument);
 static uint32 CatalogCacheComputeTupleHashValue(CatCache *cache,
                              HeapTuple tuple);
@@ -174,7 +174,7 @@ GetCCHashEqFuncs(Oid keytype, PGFunction *hashfunc, RegProcedure *eqfunc)
  *     Compute the hash value associated with a given set of lookup keys
  */
 static uint32
-CatalogCacheComputeHashValue(CatCache *cache, int nkeys, ScanKey cur_skey)
+CatalogCacheComputeHashValue(CatCache *cache, int nkeys, Datum *argument)
 {
     uint32      hashValue = 0;
     uint32      oneHash;
@@ -189,28 +189,28 @@ CatalogCacheComputeHashValue(CatCache *cache, int nkeys, ScanKey cur_skey)
         case 4:
             oneHash =
                 DatumGetUInt32(DirectFunctionCall1(cache->cc_hashfunc[3],
-                                                   cur_skey[3].sk_argument));
+                                                   argument[3]));
             hashValue ^= oneHash << 24;
             hashValue ^= oneHash >> 8;
             /* FALLTHROUGH */
         case 3:
             oneHash =
                 DatumGetUInt32(DirectFunctionCall1(cache->cc_hashfunc[2],
-                                                   cur_skey[2].sk_argument));
+                                                   argument[2]));
             hashValue ^= oneHash << 16;
             hashValue ^= oneHash >> 16;
             /* FALLTHROUGH */
         case 2:
             oneHash =
                 DatumGetUInt32(DirectFunctionCall1(cache->cc_hashfunc[1],
-                                                   cur_skey[1].sk_argument));
+                                                   argument[1]));
             hashValue ^= oneHash << 8;
             hashValue ^= oneHash >> 24;
             /* FALLTHROUGH */
         case 1:
             oneHash =
                 DatumGetUInt32(DirectFunctionCall1(cache->cc_hashfunc[0],
-                                                   cur_skey[0].sk_argument));
+                                                   argument[0]));
             hashValue ^= oneHash;
             break;
         default:
@@ -229,17 +229,14 @@ CatalogCacheComputeHashValue(CatCache *cache, int nkeys, ScanKey cur_skey)
 static uint32
 CatalogCacheComputeTupleHashValue(CatCache *cache, HeapTuple tuple)
 {
-    ScanKeyData cur_skey[CATCACHE_MAXKEYS];
+    Datum       arguments[CATCACHE_MAXKEYS];
     bool        isNull = false;

-    /* Copy pre-initialized overhead data for scankey */
-    memcpy(cur_skey, cache->cc_skey, sizeof(cur_skey));
-
     /* Now extract key fields from tuple, insert into scankey */
     switch (cache->cc_nkeys)
     {
         case 4:
-            cur_skey[3].sk_argument =
+            arguments[3] =
                 (cache->cc_key[3] == ObjectIdAttributeNumber)
                 ? ObjectIdGetDatum(HeapTupleGetOid(tuple))
                 : fastgetattr(tuple,
@@ -249,7 +246,7 @@ CatalogCacheComputeTupleHashValue(CatCache *cache, HeapTuple tuple)
             Assert(!isNull);
             /* FALLTHROUGH */
         case 3:
-            cur_skey[2].sk_argument =
+            arguments[2] =
                 (cache->cc_key[2] == ObjectIdAttributeNumber)
                 ? ObjectIdGetDatum(HeapTupleGetOid(tuple))
                 : fastgetattr(tuple,
@@ -259,7 +256,7 @@ CatalogCacheComputeTupleHashValue(CatCache *cache, HeapTuple tuple)
             Assert(!isNull);
             /* FALLTHROUGH */
         case 2:
-            cur_skey[1].sk_argument =
+            arguments[1] =
                 (cache->cc_key[1] == ObjectIdAttributeNumber)
                 ? ObjectIdGetDatum(HeapTupleGetOid(tuple))
                 : fastgetattr(tuple,
@@ -269,7 +266,7 @@
Re: [HACKERS] lcr v5 - introduction of InvalidCommandId
On Thu, Sep 5, 2013 at 11:30 AM, Andres Freund and...@2ndquadrant.com wrote: Ideally I'd have made InvalidCommandId = 0 and FirstCommandId = 1, but I suppose we can't have that without an on-disk compatibility break. The patch actually does change it exactly that way. My argument for that being valid is that CommandIds don't play any role outside of their own transaction. Right. It seems like this should probably be noted in the documentation under 5.4. System Columns -- I just realized that it isn't. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hstore: Query speedups with Gin index
Hi Peter, Thanks for checking the tests. I wasn't able to duplicate your test results. Did you run the hstore regression tests with the revised patch I attached in the thread? Attached is the output I got with the latest patch applied. Thanks! Blake On Tue, Sep 3, 2013 at 8:04 PM, Peter Eisentraut pete...@gmx.net wrote: On Tue, 2013-09-03 at 09:24 -0500, Blake Smith wrote: Thanks for the feedback everyone. I've attached the patch that we are now running in production to service our hstore include queries. We rebuilt the index to account for the on-disk incompatibility. I've submitted the patch to commitfest here: https://commitfest.postgresql.org/action/patch_view?id=1203 I'm getting the attached failure from the hstore regression test. -- Blake Smith http://blakesmith.me @blakesmith regressions.diff Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] lcr v5 - introduction of InvalidCommandId
On 2013-09-05 21:02:44 +0200, Andres Freund wrote: On 2013-09-05 14:37:01 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2013-09-05 14:21:33 -0400, Tom Lane wrote: Ideally I'd have made InvalidCommandId = 0 and FirstCommandId = 1, but I suppose we can't have that without an on-disk compatibility break. The patch actually does change it exactly that way. Oh. I hadn't looked at the patch, but I had (mis)read what Robert said to think that you were proposing introducing InvalidCommandId = 0xFFFFFFFF while leaving FirstCommandId alone. That would make more sense to me as (1) it doesn't change the interpretation of anything that's (likely to be) on disk; (2) it allows the check for overflow in CommandCounterIncrement to not involve recovering from an *actual* overflow. With the horsing around we've been seeing from the gcc boys lately Ok, I can do it that way. LCR obviously shouldn't care. It doesn't care, to the point that the patch already does exactly what you propose. It's just my memory that remembered things differently. So, a very slightly updated patch attached. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services From 0592af4ae2e5a2bb1e4919560ab2768a18d88dd4 Mon Sep 17 00:00:00 2001 From: Andres Freund and...@anarazel.de Date: Mon, 19 Aug 2013 13:24:30 +0200 Subject: [PATCH] Introduce InvalidCommandId Do so by reducing the limit of allowed CommandIds by one and declare ~0 as InvalidCommandId. This decreases the possible number of subtransactions by one which seems unproblematic. It's also not a problem for pg_upgrade because cmin/cmax are never looked at outside the context of their own transaction.
---
 src/backend/access/transam/xact.c | 4 ++--
 src/include/c.h                   | 1 +
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 31e868d..0591f3f 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -766,12 +766,12 @@ CommandCounterIncrement(void)
     if (currentCommandIdUsed)
     {
         currentCommandId += 1;
-        if (currentCommandId == FirstCommandId) /* check for overflow */
+        if (currentCommandId == InvalidCommandId)
         {
             currentCommandId -= 1;
             ereport(ERROR,
                     (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
-                     errmsg("cannot have more than 2^32-1 commands in a transaction")));
+                     errmsg("cannot have more than 2^32-2 commands in a transaction")));
         }
         currentCommandIdUsed = false;
diff --git a/src/include/c.h b/src/include/c.h
index 5961183..14bfdcd 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -368,6 +368,7 @@ typedef uint32 MultiXactOffset;
 typedef uint32 CommandId;

 #define FirstCommandId    ((CommandId) 0)
+#define InvalidCommandId  (~(CommandId)0)

 /*
  * Array indexing support
--
1.8.3.251.g1462b67

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] lcr v5 - introduction of InvalidCommandId
On 2013-09-05 14:37:01 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2013-09-05 14:21:33 -0400, Tom Lane wrote: Ideally I'd have made InvalidCommandId = 0 and FirstCommandId = 1, but I suppose we can't have that without an on-disk compatibility break. The patch actually does change it exactly that way. Oh. I hadn't looked at the patch, but I had (mis)read what Robert said to think that you were proposing introducing InvalidCommandId = 0xFFFFFFFF while leaving FirstCommandId alone. That would make more sense to me as (1) it doesn't change the interpretation of anything that's (likely to be) on disk; (2) it allows the check for overflow in CommandCounterIncrement to not involve recovering from an *actual* overflow. With the horsing around we've been seeing from the gcc boys lately Ok, I can do it that way. LCR obviously shouldn't care. I don't have a warm feeling about whether they won't break that test someday on the grounds that overflow is undefined behavior. Unsigned overflow is pretty strictly defined, so I don't see much danger there. Also, we'd feel the pain pretty definitely with xids... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [HACKERS] Is it necessary to rewrite table while increasing the scale of datatype numeric?
On Thu, Sep 5, 2013 at 6:41 PM, Jeff Janes jeff.ja...@gmail.com wrote: Then if the column was altered from NUMERIC with scale to be a plain NUMERIC, it would have to rewrite the table to enforce the row-wise scale to match the old column-wise scale. Where as now that alter doesn't need a re-write. I don't know if this would be an overall gain or not. We've talked about cases like this in the past. It's mostly a SOP and I think it may already be on the TODO. The main difficulty is that Postgres is very extensible. So to implement this you need to think bigger than NUMERIC. It should also be possible to alter a column from varchar(5) to varchar(10) for example (but not the other way around). One way to do it would be to extend pg_type to have another column that specifies a function. That function would take the old and new typmod (which is what stores the 5 in varchar(5)) and tell the server whether it's a safe change to make without rechecking. Another way might be to overload the cast functions, though they currently receive no information about the typmod. They might have the benefit of being able to handle things like varchar(5) -> text though. But it has to be that general. Any data type should be able to specify whether an old and new typmod are compatible. -- greg
Re: [HACKERS] Eliminating pg_catalog.pg_rewrite.ev_attr
Kevin Grittner kgri...@ymail.com wrote: New patch attached. Pushed (to master only). 277607d600fb71e25082b94302ca1716403cd0bc -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [HACKERS] Is it necessary to rewrite table while increasing the scale of datatype numeric?
Greg Stark wrote: The main difficulty is that Postgres is very extensible. So to implement this you need to think bigger than NUMERIC. It should also be possible to alter a column from varchar(5) to varchar(10) for example (but not the other way around). We already allow that. See commits 8f9fe6edce358f7904e0db119416b4d1080a83aa and 3cc0800829a6dda5347497337b0cf43848da4acf -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On Thu, Sep 5, 2013 at 06:14:33PM +0200, Magnus Hagander wrote: I have developed the attached patch which implements an auto-tuned effective_cache_size which is 4x the size of shared buffers. I had to set effective_cache_size to its old 128MB default so the EXPLAIN regression tests would pass unchanged. That's not really autotuning though. ISTM that making the *default* 4 x shared_buffers might make perfect sense, but do we really need to hijack the value of -1 for that? That might be useful for some time when we have actual autotuning, that somehow inspects the system and tunes it from there. I also don't think it should be called autotuning, when it's just a smarter default value. I like the feature, though, just not the packaging. That auto-tuning text came from the wal_buffers documentation, which does exactly this based on shared_buffers: "The contents of the WAL buffers are written out to disk at every transaction commit, so extremely large values are unlikely to provide a significant benefit. However, setting this value to at least a few megabytes can improve write performance on a busy server where many clients are committing at once. The auto-tuning selected by the default setting of -1 should give reasonable results in most cases." I am fine with rewording and not using -1, but we should change the wal_buffers default and documentation too then. I am not sure what other value than -1 to use? 0? I figure if we ever get better auto-tuning, we would just remove this functionality and make it better. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] strange IS NULL behaviour
On Thu, Sep 5, 2013 at 02:14:39PM -0400, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: On Wed, Sep 4, 2013 at 9:26 PM, Bruce Momjian br...@momjian.us wrote: I have not heard any feedback on this patch, so I would like to apply it to give us a nested ROW/IS NULL API we can document. It would have to be marked in the release notes as a backward incompatibility. I don't have time to look at this in detail right now, but I think that's considerably premature. I'm not convinced that we understand all of the problems in this area yet, let alone the solutions. And I notice that you haven't substantively responded to some of Tom's concerns. In particular, I don't think it's a good idea to change eval_const_expression's behavior in an incompatible way that simply makes it differently inconsistent with other IS NULL code paths.

Let me summarize what eval_const_expressions_mutator() does: it takes a ROW() IS NULL construct, and converts it to individual _value_ IS NULL clauses so they can be better optimized. It changes:

    ROW(x, ROW(y)) IS NULL

to

    x IS NULL AND ROW(y) IS NULL

By removing this one level of ROW construct, it causes ROW(ROW(NULL)) IS NULL to return true, while more nested cases do not. It also tries to short-circuit the IS NULL clause if it finds a literal NULL constant inside the row. My patch uses this short-circuit logic to return false for IS NULL if it finds an embedded ROW construct. This makes the code match the code in ExecEvalNullTest(), which already treats embedded ROW values as non-null, e.g. it already treats ROW(ROW(x)) IS NULL as false. Specifically, any code path that doesn't pass through eval_const_expressions_mutator() and gets to execQual.c will return false for this. We should leave things alone until we have a full fix, otherwise we'll just be breaking people's apps repeatedly. Yes, we don't want that.
I would also say that allowing eval_const_expression to drive what we think the right behavior is is completely backwards, because it's about the least performance-critical case. You should be looking at execQual.c first. I am making eval_const_expression match execQual.c, specifically ExecEvalNullTest(), which calls heap_attisnull(), and looks at the NULL indicator, which can't be true for an attribute containing a ROW value. I am confused why others don't see what I am seeing. Another possible fix would be to avoid the IS NULL value optimizer expansion if a ROW construct is inside a ROW(). I have attached a patch that does this for review. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +

diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
new file mode 100644
index 76c032c..54e59dc
*** a/src/backend/optimizer/util/clauses.c
--- b/src/backend/optimizer/util/clauses.c
*** eval_const_expressions_mutator(Node *nod
*** 3100,3105 ****
--- 3100,3122 ----
                  return makeBoolConst(false, false);
              continue;
          }
+
+         /* Do we have a ROW() expression in the row? */
+         if (type_is_rowtype(exprType(relem)))
+         {
+             /*
+              * If this ROW() has an embedded ROW() value,
+              * our optimization of removing a level of ROW()
+              * nesting causes ROW(ROW(NULL)) IS NULL to return
+              * true, so just return the original clause.
+              */
+             newntest = makeNode(NullTest);
+             newntest->arg = (Expr *) arg;
+             newntest->nulltesttype = ntest->nulltesttype;
+             newntest->argisrow = ntest->argisrow;
+             return (Node *) newntest;
+         }
+
          newntest = makeNode(NullTest);
          newntest->arg = (Expr *) relem;
          newntest->nulltesttype = ntest->nulltesttype;

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC] Extend namespace of valid guc names
On 2013-02-26 13:08:54 +0100, Andres Freund wrote: Just for reference, thats the grammar for SET/SHOW: var_name: ColId { $$ = $1; } | var_name '.' ColId Any arguments whether we should try to validate postgres -c, set_config and SET/SHOW the same? Any input here? Currently set_config() and postgres -c basically don't have any restrictions on the passed guc name whereas SHOW, SET and postgresql.conf do. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On Thu, Sep 5, 2013 at 12:48:54PM -0400, Tom Lane wrote: Magnus Hagander mag...@hagander.net writes: On Thu, Sep 5, 2013 at 3:01 AM, Bruce Momjian br...@momjian.us wrote: I have developed the attached patch which implements an auto-tuned effective_cache_size which is 4x the size of shared buffers. I had to set effective_cache_size to its old 128MB default so the EXPLAIN regression tests would pass unchanged. That's not really autotuning though. ISTM that making the *default* 4 x shared_buffers might make perfect sense, but do we really need to hijack the value of -1 for that? That might be useful for some time when we have actual autotuning, that somehow inspects the system and tunes it from there. Well, the real problem with this patch is that it documents what the auto-tuning algorithm is; without that commitment, just saying -1 means autotune might be fine. OK, but I did this based on wal_buffers, which has a -1 default, calls it auto-tuning, and explains how the default is computed. Did you consider the alternative of just tweaking initdb to insert a default for effective_cache_size that's 4x whatever it picks for shared_buffers? That would probably be about 3 lines of code, and it wouldn't nail down any particular server-side behavior. The problem there is that many users are told to tune shared_buffers, but don't touch effective cache size. Having initdb set the effective_cache_size value would not help there. Again, this is all based on the auto-tuning of wal_buffers. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.4 regression
On Mon, Aug 19, 2013 at 09:27:57PM -0400, Stephen Frost wrote: * Andres Freund (and...@2ndquadrant.com) wrote: I vote for adapting the patch to additionally zero out the file via write(). In your tests that seemed to perform at least as good as the old method... It also has the advantage that we can use it a little bit more as a testbed for possibly using it for heap extensions one day. We're pretty early in the cycle, so I am not worried about this too much... I dunno, I'm pretty disappointed that this doesn't actually improve things. Just following this casually, it looks like it might be some kind of locking issue in the kernel that's causing it to be slower; or at least some code path that isn't exercised terribly much and therefore hasn't been given the love that it should. Definitely interested in what Ts'o says, but if we can't figure out why it's slower *without* writing out the zeros, I'd say we punt on this until Linux and the other OS folks improve the situation. FYI, the patch has been reverted. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On 09/05/2013 02:16 PM, Bruce Momjian wrote: Well, the real problem with this patch is that it documents what the auto-tuning algorithm is; without that commitment, just saying -1 means autotune might be fine. OK, but I did this based on wal_buffers, which has a -1 default, calls it auto-tuning, and explains how the default is computed. I don't see a real problem with this. For users who have set their shared_buffers correctly, effective_cache_size should also be correct. The problem there is that many users are told to tune shared_buffers, but don't touch effective cache size. Having initdb set the effective_cache_size value would not help there. Again, this is all based on the auto-tuning of wal_buffers. Standard advice we've given in the past is 25% shared buffers, 75% effective_cache_size. Which would make EFS *3X* shared_buffers, not 4X. Maybe we're changing the conventional calculation, but I thought I'd point that out. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Where can I find the code for extern OidFunctionCall7?
The comments on this sourcebase are really good, plus it is very well structured! I read somewhere that you can actually store an execution plan by using stored procedures. If that is the case, I would imagine that variables would prevent the histograms from being used. I don't mind finding the code myself, but... Where is the code that does that switch? Clues are welcome too!! -- View this message in context: http://postgresql.1045698.n5.nabble.com/Where-can-I-find-the-code-for-extern-OidFunctionCall7-tp5769737p5769800.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Where can I find the code for extern OidFunctionCall7?
Never mind I think I found it. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Where-can-I-find-the-code-for-extern-OidFunctionCall7-tp5769737p5769801.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On Thu, Sep 5, 2013 at 5:11 PM, Josh Berkus j...@agliodbs.com wrote: On 09/05/2013 02:16 PM, Bruce Momjian wrote: Well, the real problem with this patch is that it documents what the auto-tuning algorithm is; without that commitment, just saying -1 means autotune might be fine. OK, but I did this based on wal_buffers, which has a -1 default, calls it auto-tuning, and explains how the default is computed. I don't see a real problem with this. For users who have set their shared_buffers correctly, effective_cache_size should also be correct. Agreed. I think -1 is the right setting for autotune as things stand today. If we want something else, then we should change other settings as well (like wal_buffers) and that is not in the scope of this patch. The problem there is that many users are told to tune shared_buffers, but don't touch effective cache size. Having initdb set the effective_cache_size value would not help there. Again, this is all based on the auto-tuning of wal_buffers. Standard advice we've given in the past is 25% shared buffers, 75% effective_cache_size. Which would make EFS *3X* shared_buffers, not 4X. Maybe we're changing the conventional calculation, but I thought I'd point that out. This was debated upthread. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On Thu, Sep 5, 2013 at 03:11:53PM -0700, Josh Berkus wrote: On 09/05/2013 02:16 PM, Bruce Momjian wrote: Well, the real problem with this patch is that it documents what the auto-tuning algorithm is; without that commitment, just saying -1 means autotune might be fine. OK, but I did this based on wal_buffers, which has a -1 default, calls it auto-tuning, and explains how the default is computed. I don't see a real problem with this. For users who have set their shared_buffers correctly, effective_cache_size should also be correct. The problem there is that many users are told to tune shared_buffers, but don't touch effective cache size. Having initdb set the effective_cache_size value would not help there. Again, this is all based on the auto-tuning of wal_buffers. Standard advice we've given in the past is 25% shared buffers, 75% effective_cache_size. Which would make EFS *3X* shared_buffers, not 4X. Maybe we're changing the conventional calculation, but I thought I'd point that out. Yes, I had wondered that myself, and 3x and 4x were thrown out as options. There were more people who liked 4x, but one of the reasons was that 3x sounded odd --- not sure what to make of that, but I went with the most popular. I am fine with 3x, and I do think it logically makes more sense, and is less likely to over-estimate than 4x. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PERFORM] encouraging index-only scans
On Wed, Sep 4, 2013 at 04:56:55PM -0400, Bruce Momjian wrote: Add a column pg_class.relallvisible to remember the number of pages that were all-visible according to the visibility map as of the last VACUUM (or ANALYZE, or some other operations that update pg_class.relpages). Use relallvisible/relpages, instead of an arbitrary constant, to estimate how many heap page fetches can be avoided during an index-only scan. Have I missed some nuance? I am looking back at this issue now and I think you are correct. The commit you mention (Oct 7 2011) says ANALYZE updates the visibility map, and the code matches that:

	if (!inh)
		vac_update_relstats(onerel,
							RelationGetNumberOfBlocks(onerel),
							totalrows,
				-->			visibilitymap_count(onerel),
							hasindex,
							InvalidTransactionId);

so if an index scan was not being used after an ANALYZE, it isn't a bad allvisible estimate but something else. This code was in PG 9.2. Actually, I now realize it is more complex than that, and worse. There are several questions to study to understand when pg_class.relallvisible is updated (which is used to determine if index-only scans are a good optimization choice), and when VM all-visible bits are set so heap pages can be skipped during index-only scans:

1) When are VM bits set:
	vacuum (non-full)
	analyze (only some random pages)

2) When are massive rows added but VM bits not set:
	copy

3) When are VM bits cleared:
	insert/update/delete
	vacuum (non-full)

4) When are VM map files cleared:
	vacuum full
	cluster

5) When is pg_class.relallvisible updated via a VM map file scan:
	vacuum (non-full)
	analyze

Vacuums run by autovacuum are driven by n_dead_tuples, which is incremented only by updates and deletes. Therefore, any scenario where vacuum (non-full) is never run will not have significant VM bits set. The only bits that will be set will be by pages visited randomly by analyze.
The following table activities will not set proper VM bits:
	vacuum full
	cluster
	copy
	insert-only

If updates and deletes happen, there will eventually be sufficient reason for autovacuum to vacuum the table and set proper VM bits, and pg_class.relallvisible. The calculus we should use to determine when we need to run vacuum has changed with index-only scans, and I am not sure we ever fully addressed this.

-- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] [PERFORM] encouraging index-only scans
On Thu, Sep 5, 2013 at 8:14 PM, Bruce Momjian br...@momjian.us wrote: Actually, I now realize it is more complex than that, and worse. There are several questions to study to understand when pg_class.relallvisible is updated (which is used to determine if index-only scans are a good optimization choice), and when VM all-visible bits are set so heap pages can be skipped during index-only scans: 1) When are VM bits set: vacuum (non-full) analyze (only some random pages) Analyze doesn't set visibility-map bits. It only updates statistics about how many are set. The calculus we should use to determine when we need to run vacuum has changed with index-only scans, and I am not sure we ever fully addressed this. Yeah, we didn't. I think the hard part is figuring out what behavior would be best. Counting inserts as well as updates and deletes would be a simple approach, but I don't have much confidence in it. My experience is that having vacuum or analyze kick in during a bulk-load operation is a disaster. We'd kinda like to come up with a way to make vacuum run after the bulk load is complete, maybe, but how would we identify that time, and there are probably cases where that's not right either. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_restore multiple --function options
On Tue, Aug 27, 2013 at 1:14 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Assuming no objections, I'll apply the attached patch to 9.3 and master later tonight. Just a little stylistic nitpick: could we pluralize the --help outputs for the modified options so that they make clear that multiple specifications are supported? E.g. "restore named index(es)" instead of just "restore named index". The last round of such changes tried to make such --help tweaks; it would be nice to keep 'em consistent. Josh
Re: [HACKERS] [PERFORM] encouraging index-only scans
On 06/09/13 13:10, Robert Haas wrote: On Thu, Sep 5, 2013 at 8:14 PM, Bruce Momjian br...@momjian.us wrote: Actually, I now realize it is more complex than that, and worse. There are several questions to study to understand when pg_class.relallvisible is updated (which is used to determine if index-only scans are a good optimization choice), and when VM all-visible bits are set so heap pages can be skipped during index-only scans: 1) When are VM bits set: vacuum (non-full) analyze (only some random pages) Analyze doesn't set visibility-map bits. It only updates statistics about how many are set. The calculus we should use to determine when we need to run vacuum has changed with index-only scans, and I am not sure we ever fully addressed this. Yeah, we didn't. I think the hard part is figuring out what behavior would be best. Counting inserts as well as updates and deletes would be a simple approach, but I don't have much confidence in it. My experience is that having vacuum or analyze kick in during a bulk-load operation is a disaster. We'd kinda like to come up with a way to make vacuum run after the bulk load is complete, maybe, but how would we identify that time, and there are probably cases where that's not right either.

How about a 'VACUUM AFTER' command (part of the 'BEGIN' transaction syntax?) that would:
1. only be valid in a transaction
2. initiate a vacuum after the current transaction completed
3. defer any vacuum triggered due to other criteria

If the transaction was rolled back, then if there was a pending vacuum due to other reasons, it would then be actioned. On normal transaction completion, if there was a pending vacuum it would be combined with the one in the transaction. Still would need some method of ensuring any pending vacuum was done if the transaction hung, or took too long. Cheers, Gavin
Re: [HACKERS] [PERFORM] encouraging index-only scans
On Thu, Sep 5, 2013 at 09:10:06PM -0400, Robert Haas wrote: On Thu, Sep 5, 2013 at 8:14 PM, Bruce Momjian br...@momjian.us wrote: Actually, I now realize it is more complex than that, and worse. There are several questions to study to understand when pg_class.relallvisible is updated (which is used to determine if index-only scans are a good optimization choice), and when VM all-visible bits are set so heap pages can be skipped during index-only scans: 1) When are VM bits set: vacuum (non-full) analyze (only some random pages) Analyze doesn't set visibility-map bits. It only updates statistics about how many are set. Sorry, yes you are correct. The calculus we should use to determine when we need to run vacuum has changed with index-only scans, and I am not sure we ever fully addressed this. Yeah, we didn't. I think the hard part is figuring out what behavior would be best. Counting inserts as well as updates and deletes would be a simple approach, but I don't have much confidence in it. My experience is that having vacuum or analyze kick in during a bulk-load operation is a disaster. We'd kinda like to come up with a way to make vacuum run after the bulk load is complete, maybe, but how would we identify that time, and there are probably cases where that's not right either. I am unsure how we have gone a year with index-only scans and I am just now learning that it only works well with update/delete workloads or by running vacuum manually. I only found this out going back over January emails. Did other people know this? Was it not considered a serious problem? Well, our logic has been that vacuum is only for removing expired rows. I think we either need to improve that, or somehow make sequential scans update the VM map, and then find a way to trigger update of relallvisible even without inserts. Ideas - I think we need to detect tables that do not have VM bits set and try to determine if they should be vacuumed. 
If a table has most of its VM bits set, there is no need to vacuum it for VM bit setting. Autovacuum knows how many pages are in the table via its file size, and it can scan the VM map to see how many pages are _not_ marked all-visible. If the VM map has many pages that are _not_ marked as all-visible, and the change count since last vacuum is low, those pages might now be all-visible and vacuum might find them. One problem is that a long-running transaction is not going to update relallvisible until commit, so you might be vacuuming a table that is being modified, e.g. bulk loads. Do we have any way of detecting if a backend is modifying a table? -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] Is it necessary to rewrite table while increasing the scale of datatype numeric?
On Thu, Sep 05, 2013 at 10:41:25AM -0700, Jeff Janes wrote: In order to avoid the rewrite, the code would have to be changed to look up the column definition and if it specifies the scale, then ignore the per-row n_header, and look at the n_header only if the column is NUMERIC with no precision or scale. That should conceptually be possible, but I don't know how hard it would be to implement--it sounds pretty invasive to me. Then if the column was altered from NUMERIC with scale to be a plain NUMERIC, it would have to rewrite the table to enforce the row-wise scale to match the old column-wise scale. Whereas now that alter doesn't need a rewrite. I don't know if this would be an overall gain or not. Invasive indeed. The type-supplementary data would need to reach essentially everywhere we now convey a type OID. Compare the invasiveness of adding collation support. However, this is not the first time it would have been useful. We currently store a type OID in every array and composite datum. That's wasteful and would be unnecessary if we reliably marshalled similar information to all the code needing it. Given a few more use cases, the effort would perhaps start to look credible relative to the benefits. -- Noah Misch EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On 09/05/2013 03:30 PM, Merlin Moncure wrote: Standard advice we've given in the past is 25% shared buffers, 75% effective_cache_size. Which would make EFS *3X* shared_buffers, not 4X. Maybe we're changing the conventional calculation, but I thought I'd point that out. This was debated upthread. Actually, no, it wasn't. Tom threw out a suggestion that we use 4X for historical reasons. That's all, there was no discussion. So, my point stands: our historical advice has been to set EFS to 75% of RAM. Maybe we're changing that advice, but if so, let's change it. Otherwise 3X makes more sense. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PERFORM] encouraging index-only scans
Bruce Momjian escribió: Ideas - I think we need to detect tables that do not have VM bits set and try to determine if they should be vacuumed. If a table has most of its VM bits set, there is no need to vacuum it for VM bit setting. I think it's shortsighted to keep thinking of autovacuum as just a way to run VACUUM and ANALYZE. We have already discussed work items that need to be done separately, such as truncating the last few empty pages on a relation that was vacuumed recently. We also need to process a GIN index's pending insertion list; and with minmax indexes I will want to run summarization of heap page ranges. So maybe instead of trying to think of VM bit setting as part of vacuum, we could just keep stats about how many pages we might need to scan because of possibly needing to set the bit, and then do that in autovacuum, independently from actually vacuuming the relation. I'm not sure if we need to expose all these new maintenance actions as SQL commands. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services