Re: [HACKERS] [rfc] overhauling pgstat.stat
Sent from my iPad

On 04-Sep-2013, at 10:54, Satoshi Nagayasu sn...@uptime.jp wrote:

> Hi,
>
> (2013/09/04 12:52), Atri Sharma wrote:
>> On Wed, Sep 4, 2013 at 6:40 AM, Satoshi Nagayasu sn...@uptime.jp wrote:
>>> Hi,
>>>
>>> I'm considering overhauling pgstat.stat, and would like to know how
>>> many people are interested in this topic.
>>>
>>> As you may know, this file can be hundreds of MB in size, because
>>> pgstat.stat holds all access statistics in each database, and the
>>> entire pgstat.stat needs to be read/written frequently. As a result,
>>> pgstat.stat often generates massive I/O, particularly when a database
>>> has a large number of tables.
>>>
>>> To support multi-tenancy, or just a large number of tables (up to 10k
>>> tables in a single database), I think pgstat.stat needs to be
>>> overhauled. I think using a heap and a btree in pgstat.stat would be
>>> preferable, to reduce read/write volume and to allow updating access
>>> statistics for specific tables in the pgstat.stat file. Is this good
>>> for us?
>>
>> Hi,
>>
>> Nice thought. I/O reduction in pgstat can be really helpful.
>>
>> I am trying to think of our aim here. Would we be looking to split
>> pgstat per table, so that the I/O write happens for only a portion of
>> pgstat? Or reduce the I/O in general?
>
> I prefer the latter.
>
> Under the current implementation, DBAs need to split a single database
> into many smaller databases, taking the access locality of the tables
> into account. That's difficult, and it could change in the future.
>
> And splitting the statistics data into many files (per table, for
> example) would cause another performance issue when collecting/showing
> statistics all at once. Just my guess though.
>
> So, I'm looking for a new way to reduce I/O for the statistics data in
> general.
>
> Regards,
>
>> If the latter, how would using a BTree help us? I would rather go for
>> a range tree or something. But again, I may be completely wrong.
>> Please elaborate a bit more on the solution we are trying to achieve.
>> It seems really interesting.
>>
>> Regards,
>>
>> Atri

Right, thanks. How would using a heap and a BTree help here? Are we
looking at a priority queue which supports the main storage system of
the stats?

Regards,

Atri
Re: [HACKERS] [9.3 doc fix] clarification of Solaris versions
From: Robert Haas robertmh...@gmail.com

> OK, patch committed and back-patched to 9.3. The patch file turned out
> to be sorta garbled. I'm not sure if a broken version of diff was used
> to generate this or whether MauMau hand-edited it after the fact, but
> the number of lines indicated in the control lines didn't match the
> actual hunks, and patch threw up. So it took me 20 minutes to do what
> should have taken 5, but now it's done.

Thank you, and I'm sorry I caused you so much trouble. I edited the
patch by hand, and was careless about the messy effect. I'll avoid
hand-editing.

Regards
MauMau
Re: [HACKERS] [rfc] overhauling pgstat.stat
(2013/09/04 15:23), Atri Sharma wrote:
> [... thread history trimmed; see the first message above ...]
>
> Right, thanks. How would using a heap and a BTree help here? Are we
> looking at a priority queue which supports the main storage system of
> the stats?

For example, when you read only a single block from your table, you
then need to write all the values in your database statistics. That
often generates a large amount of I/O.

However, if random access is allowed in the statistics, you can update
only a single record for the specific table which you read. It would be
less than 100 bytes for each table.

I have no idea how a priority queue could work here so far. However, if
the statistics are overhauled, PostgreSQL would be able to host a much
larger number of customers more easily.

Regards,
--
Satoshi Nagayasu sn...@uptime.jp
Uptime Technologies, LLC. http://www.uptime.jp
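[To make the access pattern concrete, here is a minimal C sketch of the
single-record update Satoshi describes. The fixed-size per-table entry
and the index lookup routine (stats_index_lookup) are hypothetical
illustrations, not actual PostgreSQL code; the point is only that an
index over fixed-size records turns a whole-file rewrite into an
in-place update of ~100 bytes.]

    #include <stdio.h>
    #include <stdint.h>

    typedef struct StatsEntry
    {
        uint32_t    table_oid;      /* key */
        uint64_t    blocks_read;    /* example counters */
        uint64_t    tuples_fetched;
    } StatsEntry;

    /* Hypothetical index lookup: byte offset of the entry, or -1. */
    extern long stats_index_lookup(uint32_t table_oid);

    /* statfile must be opened in update mode, e.g. fopen(path, "r+b"). */
    static int
    stats_update_one(FILE *statfile, uint32_t table_oid, uint64_t blocks_delta)
    {
        long        off = stats_index_lookup(table_oid);
        StatsEntry  ent;

        if (off < 0)
            return -1;

        /* Read, modify, and rewrite one fixed-size record in place. */
        if (fseek(statfile, off, SEEK_SET) != 0 ||
            fread(&ent, sizeof(ent), 1, statfile) != 1)
            return -1;

        ent.blocks_read += blocks_delta;

        if (fseek(statfile, off, SEEK_SET) != 0 ||
            fwrite(&ent, sizeof(ent), 1, statfile) != 1)
            return -1;

        return 0;
    }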
[HACKERS] Re: Is it necessary to rewrite table while increasing the scale of datatype numeric???
On Wed, Sep 04, 2013 at 12:08:48PM +0800, wangs...@highgo.com.cn wrote:
> I find that it takes a long time when I increase the scale of a
> numeric datatype. By checking the code, I found that's because it
> needs to rewrite that table's file. After checking that table's data
> file, I found only the parameter n_header changed. And I found the
> data in that numeric field never changed. So I think it's not
> necessary to rewrite the table's file in this case.

n_header is part of the numeric field's data. That's not just pedantry:
the display scale stored in n_header affects how numeric_out() formats
the value.

--
Noah Misch EnterpriseDB http://www.enterprisedb.com
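[As a loose illustration of Noah's point: the stored digits are
unchanged, but the display scale in the header decides how many
fractional digits numeric_out() emits, so two values with identical
digits format differently. The C below is an analogy only -- numeric
actually stores base-10000 digits, not doubles -- with dscale playing
the role of printf's precision.]

    #include <stdio.h>

    static void
    print_with_dscale(double value, int dscale)
    {
        printf("%.*f\n", dscale, value);
    }

    int
    main(void)
    {
        print_with_dscale(1.5, 1);   /* "1.5"   -- like NUMERIC(10,1) */
        print_with_dscale(1.5, 3);   /* "1.500" -- like NUMERIC(10,3) */
        return 0;
    }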
Re: [HACKERS] [rfc] overhauling pgstat.stat
Satoshi,

* Satoshi Nagayasu (sn...@uptime.jp) wrote:
> (2013/09/04 13:07), Alvaro Herrera wrote:
>> We already changed it:
>>
>> commit 187492b6c2e8cafc5b39063ca3b67846e8155d24
>> Author: Alvaro Herrera alvhe...@alvh.no-ip.org
>> Date: Mon Feb 18 17:56:08 2013 -0300
>>
>>     Split pgstat file in smaller pieces
>
> Thanks for the comments. I forgot to mention that.
>
> Yes, we have already split the single pgstat.stat file into several
> pieces. However, we still need to read/write a large amount of
> statistics data when we have a large number of tables in a single
> database, or multiple databases being accessed. Right?

Would simply also splitting per tablespace help?

> I think the issue here is that it is necessary to write/read
> statistics data even if it's not actually changed. So, I'm wondering
> how we can minimize read/write operations on these statistics data
> files by using a heap and a btree.

It does sound like an interesting idea to use a heap/btree instead, but
I wonder about the effort involved, particularly around coordinating
access. We wouldn't want to end up introducing additional contention
points by doing this..

Thanks,

Stephen
Re: [HACKERS] [rfc] overhauling pgstat.stat
Sent from my iPad

On 04-Sep-2013, at 15:39, Satoshi Nagayasu sn...@uptime.jp wrote:

> [...]
>
> For example, when you read only a single block from your table, you
> then need to write all the values in your database statistics. That
> often generates a large amount of I/O.
>
> However, if random access is allowed in the statistics, you can update
> only a single record for the specific table which you read. It would
> be less than 100 bytes for each table.
>
> I have no idea how a priority queue could work here so far. However,
> if the statistics are overhauled, PostgreSQL would be able to host a
> much larger number of customers

Ah, now I get it. Thanks a ton for the detailed explanation.

Yes, a BTree will sufficiently isolate per-table stats here and allow
for random access.

Another thing I can think of is having a write-back cache which could
probably be used as a buffer before the actual stats write. I am just
musing here though.

Regards,

Atri
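[A sketch of Atri's write-back idea, with hypothetical names
throughout: per-table deltas accumulate in a small direct-mapped
in-memory cache and reach the stats file only on a slot collision or at
flush time -- the same policy a direct-mapped write-back CPU cache
uses. flush_slot() stands in for whatever actually persists an entry.]

    #include <stdint.h>

    typedef struct CacheSlot
    {
        uint32_t    table_oid;
        uint64_t    delta;      /* counts accumulated since last flush */
        int         dirty;
    } CacheSlot;

    #define CACHE_SLOTS 256
    static CacheSlot cache[CACHE_SLOTS];

    /* Hypothetical: writes the entry to the stats file, clears dirty. */
    extern void flush_slot(CacheSlot *slot);

    static void
    cache_bump(uint32_t table_oid, uint64_t n)
    {
        CacheSlot  *s = &cache[table_oid % CACHE_SLOTS];

        /* Write back the previous occupant before reusing the slot. */
        if (s->dirty && s->table_oid != table_oid)
            flush_slot(s);

        if (!s->dirty)
        {
            s->table_oid = table_oid;
            s->delta = 0;
            s->dirty = 1;
        }
        s->delta += n;
    }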
Re: [HACKERS] [rfc] overhauling pgstat.stat
2013/9/4 Atri Sharma atri.j...@gmail.com:
> [...]
>
> Ah, now I get it. Thanks a ton for the detailed explanation.
>
> Yes, a BTree will sufficiently isolate per-table stats here and allow
> for random access.
>
> Another thing I can think of is having a write-back cache which could
> probably be used as a buffer before the actual stats write. I am just
> musing here though.

We very successfully use a tmpfs volume for pgstat files (using a
backport of multiple statfiles from 9.3 to 9.1).

Regards

Pavel

> Regards,
>
> Atri
Re: [HACKERS] logical changeset generation v5
On 2013-09-04 10:02:05 -0400, Robert Haas wrote:
> On Tue, Sep 3, 2013 at 7:10 PM, Andres Freund and...@2ndquadrant.com wrote:
>>> I don't think it particularly needs to be configurable, but I wonder
>>> if we can't be a bit smarter about when we do it. For example,
>>> suppose we logged it every 15 s but only until we log a
>>> non-overflowed snapshot.
>>
>> There's actually more benefits than just overflowed snapshots
>> (pruning of the known xids machinery, exclusive lock cleanup).
>
> I know that, but I thought the master and slave could only lose sync
> on those things after a master crash, and that once per checkpoint
> cycle was enough for those other benefits. Am I wrong?

The xid tracking can keep track without the additional records, but it
sometimes needs a good bit more memory to do so if the primary burns
through xids quite fast. Every time we see a running-xacts record we
can do cleanup (that's the ExpireOldKnownAssignedTransactionIds() in
ProcArrayApplyRecoveryInfo()).

>> The problem with using dbname=replication as a trigger for anything
>> is that we actually allow databases to be created with that name.
>> Perhaps that was a design mistake.
>
> It seemed like a good idea at the time, but maybe it wasn't. I'm not
> sure where to go with it at this point; a forcible backward
> compatibility break would probably screw things up for a lot of
> people.

Yes, breaking things now doesn't seem like a good idea.

>> I wondered about turning replication from a boolean into something
>> like off|0, on|1, database. dbname= gets only used in the latter
>> variant. That would be compatible with previous versions and would
>> even support using old tools (since all of them seem to do
>> replication=1).
>
> I don't love that, but I don't hate it, either.

Ok. Will update the patch that way. Seems better than its current
state.

> But it still doesn't answer the following question, which I think is
> important: if I (or someone else) commits this patch, how will that
> make things better for users? At the moment it's just adding a knob
> that doesn't do anything for you when you twist it.

I am not sure it's a good idea to commit it before we're sure we're
going to commit the changeset extraction. It's an independently
reviewable and testable piece of code that's simple enough to
understand quickly, in contrast to the large changeset extraction
patch. That's why I kept it separate.

On the other hand, as you know, it's not without precedent to commit
pieces of infrastructure that aren't really useful for the end user
(think FDW).

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
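[A sketch of the option parsing Andres proposes, with illustrative
names rather than the actual libpq/walsender code: replication= stays
backward compatible as a boolean (old tools send replication=1), and a
new "database" value is the only mode in which dbname= is honored.]

    #include <string.h>

    typedef enum
    {
        REPL_OFF,
        REPL_PHYSICAL,      /* the pre-existing boolean "on" behavior */
        REPL_DATABASE       /* new: connect to a database for changeset
                             * extraction; dbname= is used only here */
    } ReplMode;

    static int
    parse_replication_option(const char *val, ReplMode *mode)
    {
        if (strcmp(val, "off") == 0 || strcmp(val, "0") == 0)
            *mode = REPL_OFF;
        else if (strcmp(val, "on") == 0 || strcmp(val, "1") == 0)
            *mode = REPL_PHYSICAL;
        else if (strcmp(val, "database") == 0)
            *mode = REPL_DATABASE;
        else
            return -1;      /* unrecognized value */
        return 0;
    }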
Re: [HACKERS] The PostgreSQL License requires LICENSE file?
On 09/03/2013 11:20 PM, Tomonari Katsumata wrote:
> I understand that the PostgreSQL license does not require any specific
> file name, and that a missing LICENSE file is not a problem.
> I'm sorry for the stupid question.

No, thank *you* for pointing it out! None of us had actually looked at
that page.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: [HACKERS] [9.4] Make full_page_writes only settable on server start?
On Tue, 2013-09-03 at 15:42 -0400, Robert Haas wrote:
> Although this is true, the administrator's estimate of whether that
> guarantee is or is not provided might not be as consistent as the
> hardware behavior itself.
>
> I am generally of the feeling that having to restart the server to
> change a setting sucks, and while it surely sucks less for this
> setting than some, mostly because few people change this setting in
> the first place, I'm still not convinced that this is moving in the
> right direction.

I think code complexity matters quite a lot. If we can eliminate some
complex code in a complex area, and all we give up is a feature with
essentially no use case, that sounds like we're moving in the right
direction to me.

I suppose some might be using it as a hack when they really just want
to temporarily disable WAL during a load or something. Seems like a
blunt tool though, and I haven't heard of anyone doing that or
suggesting it. And it doesn't store the page hole anyway, so the FPI
during a load is ordinarily quite small.

>> Then, I intend to write another patch to make the full-page writes
>> for checksums honor the full_page_writes setting. That will be
>> easier to write once it's a PGC_POSTMASTER.
>
> I don't think I know exactly what this means.

XLogSaveBufferForHint() calls XLogCheckBuffer() but doesn't also look
at the full_page_writes setting (like XLogInsert() does). That means
that if checksums are enabled and full_page_writes is off, we'll still
write some full-page images for checksums. I'd like to remedy that.

Regards,
Jeff Davis
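[A minimal sketch of the guard Jeff describes adding, using simplified
stand-in names rather than the real xlog.c routines and signatures: the
hint-bit FPI path would additionally test full_page_writes, which is
only safe to treat as fixed for a backend's lifetime once the GUC is
PGC_POSTMASTER.]

    #include <stdbool.h>

    /* Hypothetical stand-ins for the real XLog routines. */
    extern bool buffer_needs_backup(void *buffer);
    extern void emit_full_page_image(void *buffer);

    static void
    save_buffer_for_hint(void *buffer,
                         bool data_checksums,
                         bool full_page_writes)
    {
        /*
         * Current behavior consults only data_checksums; the proposed
         * change is the additional full_page_writes test, so hint-bit
         * FPIs are skipped when full-page writes are disabled.
         */
        if (data_checksums && full_page_writes &&
            buffer_needs_backup(buffer))
            emit_full_page_image(buffer);
    }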
Re: [HACKERS] [tiny doc fix] statistics are not retained across immediate shutdown
On 2013-09-04 10:53:19 -0400, Tom Lane wrote:
> Tsunakawa, Takayuki tsunakawa.ta...@jp.fujitsu.com writes:
>> I propose a tiny fix to clarify this. Please find the attached patch.
>
> That's not an accurate description of what happens, though. AFAIR, we
> do not throw away pg_stats files as a result of recovery.

StartupXLOG() does a pgstat_reset_all() in the if (InRecovery) branch.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] logical changeset generation v5
On Tue, Sep 3, 2013 at 7:10 PM, Andres Freund and...@2ndquadrant.com wrote:
>> I don't think it particularly needs to be configurable, but I wonder
>> if we can't be a bit smarter about when we do it. For example,
>> suppose we logged it every 15 s but only until we log a
>> non-overflowed snapshot.
>
> There's actually more benefits than just overflowed snapshots (pruning
> of the known xids machinery, exclusive lock cleanup).

I know that, but I thought the master and slave could only lose sync on
those things after a master crash, and that once per checkpoint cycle
was enough for those other benefits. Am I wrong?

> The patch as-is only writes if there has been WAL written since the
> last time it logged a running_xacts. I think it's not worth building
> more smarts than that?

Hmm, maybe.

>> Because I don't see any reason to believe that this WAL record is any
>> more important or urgent than any other WAL record that might get
>> logged.
>
> I can't follow the logic behind that statement. Just about all WAL
> records are either pretty immediately flushed afterwards or are done
> in the context of a transaction where we flush (or do a
> XLogSetAsyncXactLSN) at transaction commit. XLogBackgroundFlush()
> won't necessarily flush the running_xacts record.

OK, this was the key point I was missing.

>> It seems we need some more design there. Perhaps entering replication
>> mode could be triggered by writing either dbname=replication or
>> replication=yes. But then, do the replication commands simply become
>> SQL commands? I've certainly seen hackers use them that way. And I
>> can imagine that being a sensible approach, but this patch seems like
>> it's only covering a fairly small fraction of what really ought to be
>> a single commit.
>
> Yes. I think you're right that we need more input/design here. I've
> previously started threads about it, but nobody replied :(.
>
> The problem with using dbname=replication as a trigger for anything is
> that we actually allow databases to be created with that name. Perhaps
> that was a design mistake.

It seemed like a good idea at the time, but maybe it wasn't. I'm not
sure where to go with it at this point; a forcible backward
compatibility break would probably screw things up for a lot of people.

> I wondered about turning replication from a boolean into something
> like off|0, on|1, database. dbname= gets only used in the latter
> variant. That would be compatible with previous versions and would
> even support using old tools (since all of them seem to do
> replication=1).

I don't love that, but I don't hate it, either. But it still doesn't
answer the following question, which I think is important: if I (or
someone else) commits this patch, how will that make things better for
users? At the moment it's just adding a knob that doesn't do anything
for you when you twist it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [v9.4] row level security
On Wed, Sep 4, 2013 at 10:50 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> Robert Haas robertmh...@gmail.com writes:
>> On Wed, Sep 4, 2013 at 10:45 AM, Tom Lane t...@sss.pgh.pa.us wrote:
>>> Well, the security-barrier view stuff did not present itself as a
>>> 100% solution. But perhaps more to the point, it was conceptually
>>> simple to implement, ie don't flatten views if they have this bit
>>> set, and don't push down quals into such sub-selects unless they're
>>> marked leakproof.
>>
>> Right. IMHO, this new feature should be similarly simple: when an
>> unprivileged user references a table, treat that as a reference to a
>> leakproof view over the table, with the RLS qual injected into the
>> view.
>
> And for insert/update/delete, we do what exactly?

The same mechanism will prevent UPDATE and DELETE from seeing any rows
the user shouldn't be able to touch. Simon and Greg are arguing that
when an INSERT or UPDATE happens, we ought to also check that the NEW
row matches the RLS qual. I don't find that to be terribly important,
because you can accomplish the same thing with a per-row trigger today;
and I also don't think everyone will want that behavior. Some people
will, I'm pretty sure, want to let users give away rows, either
unconditionally or subject to defined restrictions. Perhaps it's worth
having, but it's a separate feature, IMHO.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] 9.4 regression
On 2013-09-04 08:01:30 -0700, Jeff Davis wrote:
> On Thu, 2013-08-08 at 17:42 -0400, Tom Lane wrote:
>> [...]
>>
>>> Should this feature be reconsidered?
>>
>> Well, ext4 isn't the whole world, but it's sure a big part of it. The
>> real point though is that obviously we didn't do enough performance
>> testing, so we'd better do more before deciding what to do.
>
> Greg Smith, do you have some thoughts about this? More tests we should
> run? We may need to revert the patch.

I'd vote for adding zeroing *after* the fallocate() first. That's
what's suggested by kernel hackers and what several other large
applications do. Also, it looks like it's what we would have to do if
we ever get to use fallocate for relation extension, where we would
have actual benefits from it.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [v9.4] row level security
Robert Haas robertmh...@gmail.com writes:
> On Fri, Aug 30, 2013 at 3:43 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>> I think it's entirely sensible to question whether we should reject
>> (not hold up) RLS if it has major covert-channel problems.
>
> We've already had this argument before, about the security_barrier
> view stuff, and that code got committed and is already released. So
> the horse is already out of the barn and no amount of wishing will put
> it back in.

Well, the security-barrier view stuff did not present itself as a 100%
solution. But perhaps more to the point, it was conceptually simple to
implement, ie don't flatten views if they have this bit set, and don't
push down quals into such sub-selects unless they're marked leakproof.

> I haven't reviewed this patch in a long time, but I would expect that
> it's basically just reusing that same infrastructure; in fact, I'd
> expect that it's little more than syntactic sugar around that
> infrastructure.

I've not read it in great detail, but it isn't that. It's whacking the
planner around in ways that I have no confidence in, and probably still
wouldn't have any confidence in if they'd been adequately documented.

regards, tom lane
Re: [HACKERS] [9.4] Make full_page_writes only settable on server start?
On 2013-09-04 07:57:15 -0700, Jeff Davis wrote:
> XLogSaveBufferForHint() calls XLogCheckBuffer() but doesn't also look
> at the full_page_writes setting (like XLogInsert() does). That means
> that if checksums are enabled and full_page_writes is off, we'll
> still write some full-page images for checksums. I'd like to remedy
> that.

I don't think that's really as easy as it sounds without removing the
ability to do base backups with full_page_writes = off. The
interlocking that would require makes things complex...

Personally I'd rather forbid enabling checksums in combination with
full_page_writes = off. That combination doesn't seem like a good idea
to me, and I am far from convinced it's actually going to work in all
corner cases.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [v9.4] row level security
On Sun, Sep 1, 2013 at 11:47 PM, Greg Smith g...@2ndquadrant.com wrote:
> And if someone can INSERT values that they can't actually see once
> they're committed, that's a similarly bad thing we should describe.

This is desirable in some cases but not others. If the goal is
compartmentalization, then it's sensible to prevent this. But you might
also have a drop-box environment - e.g. a student submits coursework to
a professor, and can't access the submitted work after it's submitted.
FWIW, my CS classes in college had a tool that worked just this way.

Or maybe an analyst writes a report and is then permitted to give away
the document to his boss for revisions. Once the ownership of the
document has changed, the analyst can't see it any more, because he can
only see the documents he owns. And maybe he's not permitted to give
away documents to just anyone (polluting their sandbox), but he can
give them to his boss (who expects to receive them).

The point is that we should be in the business of providing mechanism,
not policy.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [v9.4] row level security
2013/9/3 Bruce Momjian br...@momjian.us:
> On Sun, Sep 1, 2013 at 11:05:58AM -0700, Josh Berkus wrote:
>>> The security community also concludes it is an unavoidable nature of
>>> systems, as long as humans can observe system behavior and infer
>>> something; thus, security evaluation criteria do not require
>>> eliminating covert channels, and do not pay attention to covert
>>> channels for products installed in an environment with basic
>>> robustness (that means non-military, regular enterprise usage).
>>
>> To be completely blunt, the security community does not understand
>> databases. At all. I'd think if anything had become clear through the
>> course of work on SEPostgres, it would be that.
>
> Agreed. The security community realizes these covert channels exist,
> but doesn't really have any recommendations on how to avoid them. You
> could argue that avoiding them is too tied to specific database
> implementations, but there are general channels, like insert with a
> unique key, that should at least have well-defined solutions.

The security community also provides an extreme solution, but I don't
think it is suitable for the flexible security policy PostgreSQL wants.
Their extreme solution manipulates the definition of the PK, which
automatically becomes a combined key of the user-given key and the
security level, applied mandatorily. Thus it does not conflict even if
two users with different security levels try to insert a row with the
same primary key. This technology is called polyinstantiation:
http://en.wikipedia.org/wiki/Polyinstantiation

However, of course, I'm not in favor of porting this technology to the
PostgreSQL world. Its side-effects are too large for the problem being
solved.

Thanks,
--
KaiGai Kohei kai...@kaigai.gr.jp
Re: [HACKERS] 9.4 regression
On Thu, 2013-08-08 at 17:42 -0400, Tom Lane wrote:
> Jon Nelson jnelson+pg...@jamponi.net writes:
>> At this point I'm convinced that the issue is a pathological case in
>> ext4. The performance impact disappears as soon as the unwritten
>> extent(s) are written to with real data. Thus, even though allocating
>> files with posix_fallocate is - frequently - orders of magnitude
>> quicker than doing it with write(2), the subsequent re-write can be
>> more expensive. At least, that's what I'm gathering from the various
>> threads. Why this issue didn't crop up in earlier testing, and why I
>> can't seem to make test_fallocate do it (even when I modify
>> test_fallocate to write to the newly-allocated file in a
>> mostly-random fashion), has me baffled.
>
> Does your test program use all the same writing options that the real
> WAL writes do (like O_DIRECT)?
>
>> Should this feature be reconsidered?
>
> Well, ext4 isn't the whole world, but it's sure a big part of it. The
> real point though is that obviously we didn't do enough performance
> testing, so we'd better do more before deciding what to do.

Greg Smith, do you have some thoughts about this? More tests we should
run? We may need to revert the patch.

Regards,
Jeff Davis
Re: [HACKERS] [v9.4] row level security
On Fri, Aug 30, 2013 at 3:43 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> I think it's entirely sensible to question whether we should reject
> (not hold up) RLS if it has major covert-channel problems.

We've already had this argument before, about the security_barrier view
stuff, and that code got committed and is already released. So the
horse is already out of the barn and no amount of wishing will put it
back in.

I haven't reviewed this patch in a long time, but I would expect that
it's basically just reusing that same infrastructure; in fact, I'd
expect that it's little more than syntactic sugar around that
infrastructure. (If it's instead introducing a whole new mechanism,
then I think that's reason enough to reject it right there.) My main
question about this is whether that syntactic sugar is really worth
having given that it doesn't add any real new functionality, not about
the covert channel issues, which are already a done deal.

And frankly, I'm with the group that says the covert channel issues are
not really a big deal. In many real-world cases, the user can control
only the values that get subbed into queries that get sent to the
database, not the queries themselves, which eliminates a large category
of attacks. Real-world example, from my last job: sales reps only got
to see their own accounts, not the accounts of other sales reps. They
could input new accounts (with sales_rep_id set to their ID) and they
could query the list of accounts (limited to those where sales_rep_id
matched their ID) - pulling either all of them or searching by account
name, both through a web application. Yeah, a sales rep could have
launched a timing attack through the web interface, and they could
also have polled for the existence of accounts by trying to create
accounts with names that might already exist in the system, to see
whether a duplicate got flagged. But neither attack had enough useful
bandwidth to matter; a sales rep wishing to learn our full account list
(so that he could try to poach those accounts after leaving the
company) could have learned a lot more a lot faster via social
engineering, and with less risk of being caught doing something that
would have resulted in his or her immediate termination.

The point is, I don't think RLS needs to solve every problem. What it
needs to do is solve one problem well, so that it can be used in
combination with other techniques to achieve a certain set of security
objectives. If, in a particular environment, EXPLAIN is an issue, then
it can be blocked in that environment via a variety of well-understood
techniques. That's not very hard to do even without core support, and
if we want to add core support, fine, but that's a separate patch. This
is a patch to add row-level security, and it deserves to be judged on
how well or poorly it does that, not on whether it solves covert
channel problems that represent a mostly orthogonal set of technical
issues.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] 9.4 regression
* Andres Freund (and...@2ndquadrant.com) wrote:
> I'd vote for adding zeroing *after* the fallocate() first. That's
> what's suggested by kernel hackers and what several other large
> applications do. Also, it looks like it's what we would have to do if
> we ever get to use fallocate for relation extension, where we would
> have actual benefits from it.

Does that actually end up doing anything different from what we were
doing pre-patch here? At best, it *might* end up using a larger extent,
but unless we can actually be confident that it does, I'm not convinced
the additional complexity is worth it and would rather see this simply
reverted. One might ask why the kernel guys aren't doing this
themselves, or figuring out why it's necessary, to make it worthwhile.

Thanks,

Stephen
Re: [HACKERS] [v9.4] row level security
Robert Haas robertmh...@gmail.com writes:
> On Wed, Sep 4, 2013 at 10:50 AM, Tom Lane t...@sss.pgh.pa.us wrote:
>> Robert Haas robertmh...@gmail.com writes:
>>> Right. IMHO, this new feature should be similarly simple: when an
>>> unprivileged user references a table, treat that as a reference to a
>>> leakproof view over the table, with the RLS qual injected into the
>>> view.
>>
>> And for insert/update/delete, we do what exactly?
>
> The same mechanism will prevent UPDATE and DELETE from seeing any rows
> the user shouldn't be able to touch.

No, it won't, because we don't support direct update/delete on views
(and if you look, you'll notice the auto-updatable-view stuff doesn't
think a security-barrier view is auto-updatable). AFAICT, to deal with
update/delete the RLS patch needs to constrain the order of qual
application without the crutch of having a separate level of subquery;
and it's that behavior that I have zero confidence in, either as to
whether it works as submitted or as to our odds of not breaking it in
the future.

regards, tom lane
Re: [HACKERS] 9.4 regression
Hi,

On 2013-09-04 11:15:37 -0400, Stephen Frost wrote:
> * Andres Freund (and...@2ndquadrant.com) wrote:
>> I'd vote for adding zeroing *after* the fallocate() first. That's
>> what's suggested by kernel hackers and what several other large
>> applications do.
>
> Does that actually end up doing anything different from what we were
> doing pre-patch here? At best, it *might* end up using a larger
> extent, but unless we can actually be confident that it does, I'm not
> convinced the additional complexity is worth it and would rather see
> this simply reverted.

It does end up with larger extents. That's why e.g. several of the
virtualization solutions do it for container files.

I was sceptical of the idea of using fallocate() for WAL file
allocation from the get-go and voiced that repeatedly, but the argument
was made that it's a convenient testcase to familiarize ourselves with
fallocate(). As that argument won, we should stick to it and learn the
ropes. Part of that is zeroing, so we should do that. It's a single
write() plus error checking, so I don't really see much complexity that
way.

> One might ask why the kernel guys aren't doing this themselves, or
> figuring out why it's necessary, to make it worthwhile.

So, if I understood Ted correctly, the reason it's slower is that
fallocate() only marks the extents as containing 'zero' without
actually touching the data. When a partial write into the area is then
done, the rest of the page has to be actively zeroed on disk. I am not
sure that explains all of the slowdown, but given that, according to
tests published in this thread, the additional zeroing gets us back to
the former performance plus a small win...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
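[A sketch of the allocate-then-zero pattern under discussion, written
as a standalone C function rather than PostgreSQL's actual WAL-segment
creation code: posix_fallocate() reserves a (hopefully contiguous)
extent, and an explicit zero-fill then writes every block, so ext4
never has to zero the remainder of a page on the fly during later
partial writes.]

    #include <fcntl.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define SEGMENT_SIZE (16 * 1024 * 1024)   /* 16 MB WAL segment */
    #define ZERO_CHUNK   (64 * 1024)

    static int
    allocate_wal_segment(int fd)
    {
        char   *zeros;
        off_t   written;

        /* Reserve the extent up front; returns an errno on failure. */
        if (posix_fallocate(fd, 0, SEGMENT_SIZE) != 0)
            return -1;

        /* Then write real zeros so no extent stays "unwritten". */
        zeros = calloc(1, ZERO_CHUNK);
        if (zeros == NULL)
            return -1;

        for (written = 0; written < SEGMENT_SIZE; written += ZERO_CHUNK)
        {
            if (write(fd, zeros, ZERO_CHUNK) != ZERO_CHUNK)
            {
                free(zeros);
                return -1;
            }
        }
        free(zeros);

        /* Make the data and metadata durable. */
        if (fsync(fd) != 0)
            return -1;
        return 0;
    }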
Re: [HACKERS] getting rid of maintainer-check
On 2013-09-03 22:41:17 -0400, Peter Eisentraut wrote:
> The maintainer-check target never really caught on, I think. Most
> people don't run it, and that in turn annoys those who do. Also, it
> doesn't provide much functionality. I propose that we get rid of it
> and roll the functionality into the regular build. Specifically:
>
> - Running duplicate_oids during the regular build was already
>   discussed elsewhere recently. There are some details to be resolved
>   there, but it's doable.

Maybe we should also badger cpluspluscheck into a state where it can be
run as part of a normal build if a C++ compiler was detected? I think
it misses vpath support, and it might be dependent on some bashisms.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] getting rid of maintainer-check
On Tue, Sep 3, 2013 at 10:41 PM, Peter Eisentraut pete...@gmx.net wrote:
> The maintainer-check target never really caught on, I think. Most
> people don't run it, and that in turn annoys those who do. Also, it
> doesn't provide much functionality. I propose that we get rid of it
> and roll the functionality into the regular build. Specifically:
>
> - Running duplicate_oids during the regular build was already
>   discussed elsewhere recently. There are some details to be resolved
>   there, but it's doable.
>
> - Checking for tabs in SGML files can be run during the regular
>   documentation build without problems.
>
> - The NLS checks can also be run during the regular NLS-enabled build.
>
> That's it. Any concerns?

I can't speak for anyone else, but personally I think that sounds like
a significant improvement.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [v9.4] row level security
* Robert Haas (robertmh...@gmail.com) wrote:
> On Sun, Sep 1, 2013 at 11:47 PM, Greg Smith g...@2ndquadrant.com wrote:
>> And if someone can INSERT values that they can't actually see once
>> they're committed, that's a similarly bad thing we should describe.
>
> This is desirable in some cases but not others. If the goal is
> compartmentalization, then it's sensible to prevent this. But you
> might also have a drop-box environment - e.g. a student submits
> coursework to a professor, and can't access the submitted work after
> it's submitted. FWIW, my CS classes in college had a tool that worked
> just this way.

Agreed, and part of the discussion that I had w/ KaiGai and Simon was
that we should provide a way to let the user pick which they'd like.
This is the concept around 'insert privileges' being different from
'select privileges' wrt RLS.

> The point is that we should be in the business of providing mechanism,
> not policy.

++

Thanks,

Stephen
Re: [HACKERS] [v9.4] row level security
Robert Haas robertmh...@gmail.com wrote:
> Tom Lane t...@sss.pgh.pa.us wrote:
>> Robert Haas robertmh...@gmail.com writes:
>>> IMHO, this new feature should be similarly simple: when an
>>> unprivileged user references a table, treat that as a reference to a
>>> leakproof view over the table, with the RLS qual injected into the
>>> view.
>>
>> And for insert/update/delete, we do what exactly?
>
> The same mechanism will prevent UPDATE and DELETE from seeing any rows
> the user shouldn't be able to touch.

+1

> Simon and Greg are arguing that when an INSERT or UPDATE happens, we
> ought to also check that the NEW row matches the RLS qual. I don't
> find that to be terribly important because you can accomplish the same
> thing with a per-row trigger today; and I also don't think everyone
> will want that behavior.

As an example from my Wisconsin Courts days: source documents come in
which need to be entered, and which may contain a Social Security
Number, and most of the Clerk of Courts staff should be authorized to
enter that into the database. Once it is entered, most people should
not be allowed to view it, including many of the people who were
authorized to enter it initially.

It's one thing for a line staff person to have a handful of documents
with SSNs come across their desk on a given day; it's quite another if
they could dump a list of names, addresses, dates of birth, and SSNs
for the entire database on demand. In a way this issue is similar to
the covert channel issue -- volume matters.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [tiny doc fix] statistics are not retained across immediate shutdown
Tsunakawa, Takayuki tsunakawa.ta...@jp.fujitsu.com writes:
> I propose a tiny fix to clarify this. Please find the attached patch.

That's not an accurate description of what happens, though. AFAIR, we
do not throw away pg_stats files as a result of recovery.

regards, tom lane
Re: [HACKERS] [v9.4] row level security
Robert Haas robertmh...@gmail.com writes:
> On Wed, Sep 4, 2013 at 10:45 AM, Tom Lane t...@sss.pgh.pa.us wrote:
>> Well, the security-barrier view stuff did not present itself as a
>> 100% solution. But perhaps more to the point, it was conceptually
>> simple to implement, ie don't flatten views if they have this bit
>> set, and don't push down quals into such sub-selects unless they're
>> marked leakproof.
>
> Right. IMHO, this new feature should be similarly simple: when an
> unprivileged user references a table, treat that as a reference to a
> leakproof view over the table, with the RLS qual injected into the
> view.

And for insert/update/delete, we do what exactly?

regards, tom lane
Re: [HACKERS] 9.4 regression
Stephen Frost sfr...@snowman.net writes:
> * Andres Freund (and...@2ndquadrant.com) wrote:
>> I'd vote for adding zeroing *after* the fallocate() first. That's
>> what's suggested by kernel hackers and what several other large
>> applications do. Also, it looks like it's what we would have to do if
>> we ever get to use fallocate for relation extension, where we would
>> have actual benefits from it.
>
> Does that actually end up doing anything different from what we were
> doing pre-patch here? At best, it *might* end up using a larger
> extent, but unless we can actually be confident that it does, I'm not
> convinced the additional complexity is worth it and would rather see
> this simply reverted. One might ask why the kernel guys aren't doing
> this themselves, or figuring out why it's necessary, to make it
> worthwhile.

The larger picture is that that isn't the committed behavior, but a
different one, one which would need performance testing. At this point,
I vote for reverting the patch and allowing it to be resubmitted for a
fresh round of testing with the zeroing added. And this time we'll need
to do the testing more carefully.

regards, tom lane
Re: [HACKERS] [9.4] Make full_page_writes only settable on server start?
Jeff Davis pg...@j-davis.com writes:
> On Tue, 2013-09-03 at 15:42 -0400, Robert Haas wrote:
>> Although this is true, the administrator's estimate of whether that
>> guarantee is or is not provided might not be as consistent as the
>> hardware behavior itself. I am generally of the feeling that having
>> to restart the server to change a setting sucks, and while it surely
>> sucks less for this setting than some, mostly because few people
>> change this setting in the first place, I'm still not convinced that
>> this is moving in the right direction.
>
> I think code complexity matters quite a lot. If we can eliminate some
> complex code in a complex area, and all we give up is a feature with
> essentially no use case, that sounds like we're moving in the right
> direction to me.

Isn't this whole discussion academic in view of Andres' point?
http://www.postgresql.org/message-id/20130903121935.gb5...@awork2.anarazel.de

regards, tom lane
Re: [HACKERS] [v9.4] row level security
2013/9/4 Tom Lane t...@sss.pgh.pa.us:
> Robert Haas robertmh...@gmail.com writes:
>> On Wed, Sep 4, 2013 at 10:45 AM, Tom Lane t...@sss.pgh.pa.us wrote:
>>> Well, the security-barrier view stuff did not present itself as a
>>> 100% solution. But perhaps more to the point, it was conceptually
>>> simple to implement, ie don't flatten views if they have this bit
>>> set, and don't push down quals into such sub-selects unless they're
>>> marked leakproof.
>>
>> Right. IMHO, this new feature should be similarly simple: when an
>> unprivileged user references a table, treat that as a reference to a
>> leakproof view over the table, with the RLS qual injected into the
>> view.
>
> And for insert/update/delete, we do what exactly?

This patch does not touch INSERT, because that check would be applied
around the place where we usually fire before-row-insert triggers; it
is not related to the planner.

Regarding UPDATE and DELETE, this patch is also enhanced so that the
update or delete mechanism can take a sub-query on top of the table
scan plan. So, its EXPLAIN output shows as follows:

postgres=# EXPLAIN (costs off) UPDATE customer SET email = 'al...@example.com';
                  QUERY PLAN
----------------------------------------------
 Update on customer
   ->  Subquery Scan on customer
         ->  Seq Scan on customer customer_1
               Filter: (current_user() = uname)

You can see this UPDATE has a Subquery Scan instead of a regular
relation scan.

Thanks,
--
KaiGai Kohei kai...@kaigai.gr.jp
Re: [HACKERS] [v9.4] row level security
2013/9/1 Greg Stark st...@mit.edu:
> On Sun, Sep 1, 2013 at 8:31 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
>> Or, any other criteria even though? My (current) preference is plan
>> (c): we will be able to fix up *this* covert channel with reasonable
>> effort on the explain code. Probably, we can close it if we don't
>> print filtered rows under a sub-query with the security-barrier
>> attribute.
>
> I think the criteria being discussed in this thread are too strict. It
> may be the case that Postgres cannot make a strong general case that
> it protects against covert channels. However, it may still be able to
> make the much weaker case that it is *possible* to arrange your
> database such that the covert channels are kept under control.

Yes. I have to admit it is difficult to determine a strict and regular
rule to handle the covert-channel scenario.

Sorry, the latter half of this sentence is unclear to me. Are you
saying that even if we could have a strict rule, we may still have many
possible covert channels for information leakage?

> So I think up above Tom is wrong about why it's OK that INSERT leaks
> keys when it reports a unique key violation. The reason why it's OK
> that there's a covert channel there is that the DBA can avoid that
> covert channel by being careful when creating unique constraints. He
> or she should be aware that creating a unique constraint implicitly
> provides a kind of limited access to data to users who have INSERT
> privilege even if they lack the real SELECT privilege.

IIRC, we discussed this a few years ago and concluded that the above
information leakage scenario should be described in the documentation,
along with the way to avoid leaking valuable information by using
alternative keys.

> Likewise, as long as the covert channels in RLS are things the DBA has
> even a modicum of control over I wouldn't be too worried. Afaict from
> skimming the thread it looks like creating any indexes on columns
> leaks what values of the index key exist in the table. Is it the case
> that non-indexed columns do not get leaked?

According to the scenario reported by Korotkov, he could find the
number of rows being filtered by a given qualifier, which implies the
existence of a row with a value in a particular range. Its solution is
what I noted above, and I'm working on it now.

Thanks,
--
KaiGai Kohei kai...@kaigai.gr.jp
Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.
Boguk, Maksym maks...@fast.au.fujitsu.com writes: Hi, my task is implementing ANSI NATIONAL character string types as part of PostgreSQL core. No, that's not a given. You have a problem to solve, ie store some UTF8 strings in a database that's mostly just 1-byte data. It is not clear that NATIONAL CHARACTER is the best solution to that problem. And I don't think that you're going to convince anybody that this is an improvement in spec compliance, because there's too much gap between what you're doing here and what it says in the spec. Both of these approaches require a dump/restore of the whole database, which is not always an option. That's a disadvantage, agreed, but it's not a large enough one to reject the approach, because what you want to do also has very significant disadvantages. I think it is extremely likely that we will end up rejecting a patch based on NATIONAL CHARACTER altogether. It will require too much duplicative code, it requires too many application-side changes to make use of the functionality, and it will break any applications that are relying on the current behavior of that syntax. But the real problem is that you're commandeering syntax defined in the SQL spec for what is in the end quite a narrow usage. I agree that the use-case will be very handy for some applications ... but if we were ever to try to achieve real spec compliance for the SQL features around character sets, this doesn't look like a step on the way to that. I think you'd be well advised to take a hard look at the specialized-database-encoding approach. From here it looks like a 99% solution for about 1% of the effort; and since it would be quite uninvasive to the system as a whole, it's unlikely that such a patch would get rejected. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
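At the SQL level, one reading of the specialized-database-encoding approach (this is my illustration, not a design spelled out in the thread) is simply to give the multi-byte data its own database created with the desired encoding:

    -- illustrative names; template0 is needed when overriding locale settings
    CREATE DATABASE app_utf8
        TEMPLATE = template0
        ENCODING = 'UTF8'
        LC_COLLATE = 'C'
        LC_CTYPE = 'C';

The encoding then applies database-wide, with no per-column character set machinery required.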
Re: [HACKERS] [v9.4] row level security
On Wed, Sep 4, 2013 at 10:45 AM, Tom Lane t...@sss.pgh.pa.us wrote: Well, the security-barrier view stuff did not present itself as a 100% solution. But perhaps more to the point, it was conceptually simple to implement, ie don't flatten views if they have this bit set, and don't push down quals into such sub-selects unless they're marked leakproof. Right. IMHO, this new feature should be similarly simple: when an unprivileged user references a table, treat that as a reference to a leakproof view over the table, with the RLS qual injected into the view. I haven't reviewed this patch in a long time, but I would expect that it's basically just reusing that same infrastructure; in fact, I'd expect that it's little more than syntactic sugar around that infrastructure. I've not read it in great detail, but it isn't that. It's whacking the planner around in ways that I have no confidence in, and probably still wouldn't have any confidence in if they'd been adequately documented. If that's the case, then I agree that we should not accept it, at least in its present form. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [HACKERS] Is it necessary to rewrite table while increasing the scale of datatype numeric?
On Tue, Sep 3, 2013 at 9:08 PM, wangs...@highgo.com.cn wrote: Hi, Hackers! I find that it takes a long time when I increase the scale of a numeric datatype. By checking the code, I found that's because it needs to rewrite that table's file. After checking that table's data file, I found that only the parameter n_header changed. And I found the data in that numeric field never changed. So I think it's not necessary to rewrite the table's file in this case. If anyone has more ideas about this, please come talk about it! This was fixed in version 9.2. You must be using an older version. Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
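For readers who want to check this on their own build, here is a quick test; the table is made up, and the case I'd expect to skip the rewrite on 9.2+ is the one that only widens the typmod (increasing precision at the same scale -- a scale change may still force a rewrite):

    CREATE TABLE t (n numeric(10,2));
    SELECT relfilenode FROM pg_class WHERE relname = 't';   -- note the value
    ALTER TABLE t ALTER COLUMN n TYPE numeric(14,2);        -- widen precision only
    SELECT relfilenode FROM pg_class WHERE relname = 't';   -- unchanged => no rewrite

If the relfilenode changed, the ALTER rewrote the table.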
Re: [HACKERS] Re: Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)
On 7/19/13 11:59 AM, Atri Sharma wrote: Hi all, This is our current work-in-progress patch for WITHIN GROUP. This patch needs to be rebased. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [v9.4] row level security
2013/9/4 Kevin Grittner kgri...@ymail.com: Robert Haas robertmh...@gmail.com wrote: Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: IMHO, this new feature should be similarly simple: when an unprivileged user references a table, treat that as a reference to a leakproof view over the table, with the RLS qual injected into the view. And for insert/update/delete, we do what exactly? The same mechanism will prevent UPDATE and DELETE from seeing any rows the user shouldn't be able to touch. +1 Simon and Greg are arguing that when an INSERT or UPDATE happens, we ought to also check that the NEW row matches the RLS qual. I don't find that to be terribly important because you can accomplish the same thing with a per-row trigger today; and I also don't think everyone will want that behavior. As an example from my Wisconsin Courts days, source documents come in which need to be entered, which may contain a Social Security Number, and most of the Clerk of Courts staff should be authorized to enter that into the database. Once it is entered, most people should not be allowed to view it, including many of the people who were authorized to enter it initially. It's one thing for a line staff person to have a handful of documents come across their desk with SSN on a given day; it's quite another if they could dump a list of names, addresses, dates of birth, and SSNs for the entire database on demand. In a way this issue is similar to the covert channel issue -- volume matters. I think an important property of this behavior is that it is configurable. In cases where the reader and writer sides need the same security policy, one configuration can apply a consistent policy both to fetching rows from the table and to writing rows to it. If users don't want the security policy checked on the writer side, all they need to do is set a security policy for SELECT only -- even though that functionality is not implemented yet. Thanks, -- KaiGai Kohei kai...@kaigai.gr.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
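Robert's point that the write-side check is already achievable with a per-row trigger could look like this; a rough sketch reusing the (current_user() = uname) policy from the EXPLAIN example upthread:

    CREATE FUNCTION enforce_write_policy() RETURNS trigger AS $$
    BEGIN
        -- reject rows the writer would not be allowed to see afterward
        IF NEW.uname <> current_user THEN
            RAISE EXCEPTION 'new row violates row security policy';
        END IF;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER customer_write_policy
        BEFORE INSERT OR UPDATE ON customer
        FOR EACH ROW EXECUTE PROCEDURE enforce_write_policy();

Installations that want Kevin's enter-but-not-read behavior simply omit the trigger and define the reader-side policy alone.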
Re: [HACKERS] Eliminating pg_catalog.pg_rewrite.ev_attr
Is this transformation correct? If I read this correctly, you're missing the rangeTableEntry_used() condition, no?

*** a/src/backend/rewrite/rewriteHandler.c
--- b/src/backend/rewrite/rewriteHandler.c
*************** matchLocks(CmdType event,
*** 1273,1287 ****
  			}
  		}

! 		if (oneLock->event == event)
! 		{
! 			if (parsetree->commandType != CMD_SELECT ||
! 				(oneLock->attrno == -1 ?
! 				 rangeTableEntry_used((Node *) parsetree, varno, 0) :
! 				 attribute_used((Node *) parsetree,
! 								varno, oneLock->attrno, 0)))
! 				matching_locks = lappend(matching_locks, oneLock);
! 		}
  	}

  	return matching_locks;
--- 1273,1280 ----
  			}
  		}

! 		if (oneLock->event == event && parsetree->commandType != CMD_SELECT)
! 			matching_locks = lappend(matching_locks, oneLock);
  	}

  	return matching_locks;

-- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [9.4] Make full_page_writes only settable on server start?
On Wed, 2013-09-04 at 11:32 -0400, Tom Lane wrote: Jeff Davis pg...@j-davis.com writes: I think code complexity matters quite a lot. If we can eliminate some complex code in a complex area, and all we give up is a feature with essentially no use case, that sounds like we're moving in the right direction to me. Isn't this whole discussion academic in view of Andres' point? Maybe "complex code" was an overstatement. We'd be able to eliminate the XLOG_FPW_CHANGE record, UpdateFullPageWrites(), and one of the members of XLogCtlInsert; and make xlog.c slightly shorter in the process. The first time I looked at doing the patch to honor full_page_writes=off when checksums are on, the fact that fullPageWrites was changeable was a distraction. Since I saw little or no value in what the code offered, my instinct was to see if we could get rid of it. It looks like Simon went to significant effort to maintain full_page_writes as PGC_SIGHUP: http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=8366c780 Maybe he has the best perspective on the value versus complexity? Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)
Sent from my iPad On 04-Sep-2013, at 21:38, Peter Eisentraut pete...@gmx.net wrote: On 7/19/13 11:59 AM, Atri Sharma wrote: Hi all, This is our current work-in-progress patch for WITHIN GROUP. This patch needs to be rebased. This version of the patch is quite old. We will be sending an updated patch before the start of the September commitfest, with all the points you mentioned taken care of. Thanks for the points. Regards, Atri -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Eliminating pg_catalog.pg_rewrite.ev_attr
This was previously discussed here: http://www.postgresql.org/message-id/flat/24836.1370713...@sss.pgh.pa.us#24836.1370713...@sss.pgh.pa.us The attached patch implements what I think we agreed on. To recap, ev_attr was present in pg_rewrite at the point that Postgres95 version 1.01 source code was imported to version control, with a default of -1 to mean all columns. It became obsolete in 2002 with commit 95ef6a344821655ce4d0a74999ac49dd6af6d342, which went into PostgreSQL version 7.3, removing the ability to define a rule on a specific column; however, this column and over 100 lines of vestigial code were left behind. Later code was written as though 0 was used to mean all columns, as is done elsewhere in the code, although pre-existing code was not changed to match. That inconsistency didn't much matter since there was no way to define anything which exercised the code, short of hacking the system tables directly. The patch removes the obsolete column from pg_rewrite, and all the vestigial code I was able to find. The justification for the patch is to eliminate over 100 lines of code from an area which is confusing enough without it. Unless someone has an objection or thinks this needs to go through the CF process, I will commit it tomorrow, with a catversion bump. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

*** a/doc/src/sgml/catalogs.sgml
--- b/doc/src/sgml/catalogs.sgml
***************
*** 5039,5052 ****
      </row>

      <row>
-      <entry><structfield>ev_attr</structfield></entry>
-      <entry><type>int2</type></entry>
-      <entry></entry>
-      <entry>The column this rule is for (currently, always -1 to
-       indicate the whole table)</entry>
-     </row>
-
-     <row>
       <entry><structfield>ev_type</structfield></entry>
       <entry><type>char</type></entry>
       <entry></entry>
--- 5039,5044 ----

*** a/src/backend/rewrite/rewriteDefine.c
--- b/src/backend/rewrite/rewriteDefine.c
***************
*** 58,64 ****
  static Oid InsertRule(char *rulname,
  		   int evtype,
  		   Oid eventrel_oid,
- 		   AttrNumber evslot_index,
  		   bool evinstead,
  		   Node *event_qual,
  		   List *action,
--- 58,63 ----
*************** InsertRule(char *rulname,
*** 86,92 ****
  	namestrcpy(&rname, rulname);
  	values[Anum_pg_rewrite_rulename - 1] = NameGetDatum(&rname);
  	values[Anum_pg_rewrite_ev_class - 1] = ObjectIdGetDatum(eventrel_oid);
- 	values[Anum_pg_rewrite_ev_attr - 1] = Int16GetDatum(evslot_index);
  	values[Anum_pg_rewrite_ev_type - 1] = CharGetDatum(evtype + '0');
  	values[Anum_pg_rewrite_ev_enabled - 1] = CharGetDatum(RULE_FIRES_ON_ORIGIN);
  	values[Anum_pg_rewrite_is_instead - 1] = BoolGetDatum(evinstead);
--- 85,90 ----
*************** InsertRule(char *rulname,
*** 117,123 ****
  		 * When replacing, we don't need to replace every attribute
  		 */
  		MemSet(replaces, false, sizeof(replaces));
- 		replaces[Anum_pg_rewrite_ev_attr - 1] = true;
  		replaces[Anum_pg_rewrite_ev_type - 1] = true;
  		replaces[Anum_pg_rewrite_is_instead - 1] = true;
  		replaces[Anum_pg_rewrite_ev_qual - 1] = true;
--- 115,120 ----
*************** DefineQueryRewrite(char *rulename,
*** 238,244 ****
  				   List *action)
  {
  	Relation	event_relation;
- 	int			event_attno;
  	ListCell   *l;
  	Query	   *query;
  	bool		RelisBecomingView = false;
--- 235,240 ----
*************** DefineQueryRewrite(char *rulename,
*** 495,501 ****
  	/*
  	 * This rule is allowed - prepare to install it.
  	 */
- 	event_attno = -1;

  	/* discard rule if it's null action and not INSTEAD; it's a no-op */
  	if (action != NIL || is_instead)
--- 491,496 ----
*************** DefineQueryRewrite(char *rulename,
*** 503,509 ****
  		ruleId = InsertRule(rulename,
  							event_type,
  							event_relid,
- 							event_attno,
  							is_instead,
  							event_qual,
  							action,
--- 498,503 ----

*** a/src/backend/rewrite/rewriteHandler.c
--- b/src/backend/rewrite/rewriteHandler.c
*************** matchLocks(CmdType event,
*** 1273,1287 ****
  			}
  		}

! 		if (oneLock->event == event)
! 		{
! 			if (parsetree->commandType != CMD_SELECT ||
! 				(oneLock->attrno == -1 ?
! 				 rangeTableEntry_used((Node *) parsetree, varno, 0) :
! 				 attribute_used((Node *) parsetree,
! 								varno, oneLock->attrno, 0)))
! 				matching_locks = lappend(matching_locks, oneLock);
! 		}
  	}

  	return matching_locks;
--- 1273,1280 ----
  			}
  		}

! 		if (oneLock->event == event && parsetree->commandType != CMD_SELECT)
! 			matching_locks = lappend(matching_locks, oneLock);
  	}

  	return matching_locks;
*************** static Query *
*** 1295,1301 ****
  ApplyRetrieveRule(Query *parsetree,
  				  RewriteRule *rule,
  				  int rt_index,
- 				  bool relation_level,
  				  Relation relation,
  				  List *activeRIRs,
  				  bool forUpdatePushedDown)
--- 1288,1293 ----
*************** ApplyRetrieveRule(Query *parsetree,
*** 1309,1316 ****
  		elog(ERROR, "expected just one rule action");
  	if (rule->qual != NULL)
Re: [HACKERS] [v9.4] row level security
2013/9/4 Tom Lane t...@sss.pgh.pa.us: Robert Haas robertmh...@gmail.com writes: On Wed, Sep 4, 2013 at 10:50 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: Right. IMHO, this new feature should be similarly simple: when an unprivileged user references a table, treat that as a reference to a leakproof view over the table, with the RLS qual injected into the view. And for insert/update/delete, we do what exactly? The same mechanism will prevent UPDATE and DELETE from seeing any rows the user shouldn't be able to touch. No, it won't, because we don't support direct update/delete on views (and if you look, you'll notice the auto-updatable-view stuff doesn't think a security-barrier view is auto-updatable). AFAICT, to deal with update/delete the RLS patch needs to constrain order of qual application without the crutch of having a separate level of subquery; and it's that behavior that I have zero confidence in, either as to whether it works as submitted or as to our odds of not breaking it in the future. Are you suggesting that we rewrite UPDATE / DELETE statements to filter out unprivileged rows from manipulation? Yes, I also thought that would be a simple solution, one that does not need an additional enhancement to allow UPDATE / DELETE to take a sub-query on top of the reader-side plan. For example, if the security policy is (t1.owner = current_user) and the given query was:

    UPDATE t1 SET value = value || '_updated' WHERE value LIKE '%abc%';

this query could be rewritten as follows:

    UPDATE t1 SET value = value || '_updated'
    WHERE tid IN (SELECT tid FROM t1 WHERE t1.owner = current_user)
      AND value LIKE '%abc%';

This approach makes the implementation simple, but it has to scan the relation twice, so its performance is not ideal, according to the past discussion. Thanks, -- KaiGai Kohei kai...@kaigai.gr.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [9.4] Make full_page_writes only settable on server start?
On Wed, 2013-09-04 at 17:00 +0200, Andres Freund wrote: On 2013-09-04 07:57:15 -0700, Jeff Davis wrote: XLogSaveBufferForHint() calls XLogCheckBuffer() but doesn't also look at the full page writes setting (like XLogInsert() does). That means, if checksums are enabled and full_page_writes is off, we'll still write some full page images for checksums. I'd like to remedy that. I don't think that's really as easy as it sounds without removing the ability to do base backups with full_page_writes = off. The interlocking that would require makes things complex... I didn't dig into that part yet. I was mostly distracted by the code to support changing full_page_writes with SIGHUP. One option would be to have XLogInsert return early if full_page_writes is off, it's an XLOG_FPI record, and forcePageWrites is off. Personally I'd rather forbid enabling checksums in combination with full_page_writes = off. That doesn't seem like a good idea to me and I am far from convinced it's actually going to work in all corner cases. Hmm. It's good to be cautious when deploying on a less-common configuration. However, I don't think it's a good idea to reject seemingly valid combinations that are supposed to work due to a lack of confidence in the review/testing process. Might be an area warranting some further review and testing; I'll take a look, but feel free to tell me if you can think of specific problem areas. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
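The early return Jeff describes might look roughly like the following. This is purely an illustration of the idea, not the committed code; the variable names are approximations, and XLOG_FPI is the hint-bit full-page-image record he mentions:

    /*
     * Hypothetical sketch of an early exit in XLogInsert(): a record that
     * exists only to carry a full-page image is useless when neither
     * full_page_writes nor forcePageWrites (base backup running) demands one.
     */
    if (rmid == RM_XLOG_ID && info == XLOG_FPI &&
        !fullPageWrites && !Insert->forcePageWrites)
        return InvalidXLogRecPtr;   /* caller treats this as "no WAL written" */

This would keep base backups working, since forcePageWrites stays set for the duration of a backup.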
Re: [HACKERS] Eliminating pg_catalog.pg_rewrite.ev_attr
Kevin Grittner kgri...@ymail.com writes: Unless someone has an objection or thinks this needs to go through the CF process, I will commit it tomorrow, with a catversion bump. Shouldn't attribute_used() be removed from rewriteManip.h? I was a bit surprised by your removal of the rangeTableEntry_used() test in the hunk at rewriteHandler.c:1273ff. That's probably all right, but it takes this out of the realm of a mechanical change. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [v9.4] row level security
On Wed, Sep 4, 2013 at 11:22 AM, Tom Lane t...@sss.pgh.pa.us wrote: The same mechanism will prevent UPDATE and DELETE from seeing any rows the user shouldn't be able to touch. No, it won't, because we don't support direct update/delete on views (and if you look, you'll notice the auto-updatable-view stuff doesn't think a security-barrier view is auto-updatable). AFAICT, to deal with update/delete the RLS patch needs to constrain order of qual application without the crutch of having a separate level of subquery; and it's that behavior that I have zero confidence in, either as to whether it works as submitted or as to our odds of not breaking it in the future. I don't really see why. AIUI, the ModifyTable node just needs to get the right TIDs. It's not like that has to be stacked directly on top of a scan; indeed, in cases like UPDATE .. FROM and DELETE .. USING it already isn't. Maybe there's some reason why the intervening level can be a Join but not a SubqueryScan, but if so I'd expect we could find some way of lifting that limitation without suffering too much pain. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ENABLE/DISABLE CONSTRAINT NAME
On Tue, Sep 3, 2013 at 3:13 AM, wangs...@highgo.com.cn wrote: On 2013-09-03 08:15, David Johnston replied: Jeff Davis-8 wrote Is there any semantic difference between marking a constraint as DISABLED and simply dropping it? Or does it just make it easier to re-add it later? David Johnston wrote: I cannot answer the question but if there is none then the main concern I'd have is capturing meta-information about WHY such a constraint has been disabled instead of dropped. Dropping/re-creating and disabling/enabling a constraint have no fundamental difference, and can achieve the same purpose. What I did is also more convenient for the user: recording the disabled constraints is easier than re-creating all the constraints. What's more, a lot of people have asked about turning off constraints, and SQL:2008 supports this. So I think it's necessary in some ways. Please add your patch to the upcoming CommitFest so we don't forget about it. https://commitfest.postgresql.org/action/commitfest_view/open Please see also https://wiki.postgresql.org/wiki/Submitting_a_Patch -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
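For context, the feature being proposed reads roughly like this (my sketch of the syntax implied by the thread title; I have not checked the patch's exact grammar):

    -- proposed: temporarily disable, then re-enable, a named constraint
    ALTER TABLE orders DISABLE CONSTRAINT orders_qty_check;
    -- ... bulk load ...
    ALTER TABLE orders ENABLE CONSTRAINT orders_qty_check;

versus what users must do today:

    ALTER TABLE orders DROP CONSTRAINT orders_qty_check;
    -- ... bulk load ...
    ALTER TABLE orders ADD CONSTRAINT orders_qty_check CHECK (qty > 0);

As the thread notes, the two achieve the same end; the DISABLE form just spares the user from re-stating (and remembering) the constraint definition.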
Re: [HACKERS] Improving avg performance for numeric
On 7/8/13 10:05 AM, Pavel Stehule wrote: I am testing your code, and it increases the speed of sum by about 24% over the original implementation. This patch needs to be rebased (and/or the later version registered in the commit fest). -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Eliminating pg_catalog.pg_rewrite.ev_attr
Tom Lane t...@sss.pgh.pa.us wrote: Shouldn't attribute_used() be removed from rewriteManip.h? Yeah, I don't know how I missed that. Thanks. I was a bit surprised by your removal of the rangeTableEntry_used() test in the hunk at rewriteHandler.c:1273ff. That's probably all right, but it takes this out of the realm of a mechanical change. [ also questioned by Álvaro ] I'll leave that as it was -- it can be discussed separately from the mechanical changes. New patch attached. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

*** a/doc/src/sgml/catalogs.sgml
--- b/doc/src/sgml/catalogs.sgml
***************
*** 5039,5052 ****
      </row>

      <row>
-      <entry><structfield>ev_attr</structfield></entry>
-      <entry><type>int2</type></entry>
-      <entry></entry>
-      <entry>The column this rule is for (currently, always -1 to
-       indicate the whole table)</entry>
-     </row>
-
-     <row>
       <entry><structfield>ev_type</structfield></entry>
       <entry><type>char</type></entry>
       <entry></entry>
--- 5039,5044 ----

*** a/src/backend/rewrite/rewriteDefine.c
--- b/src/backend/rewrite/rewriteDefine.c
***************
*** 58,64 ****
  static Oid InsertRule(char *rulname,
  		   int evtype,
  		   Oid eventrel_oid,
- 		   AttrNumber evslot_index,
  		   bool evinstead,
  		   Node *event_qual,
  		   List *action,
--- 58,63 ----
*************** InsertRule(char *rulname,
*** 86,92 ****
  	namestrcpy(&rname, rulname);
  	values[Anum_pg_rewrite_rulename - 1] = NameGetDatum(&rname);
  	values[Anum_pg_rewrite_ev_class - 1] = ObjectIdGetDatum(eventrel_oid);
- 	values[Anum_pg_rewrite_ev_attr - 1] = Int16GetDatum(evslot_index);
  	values[Anum_pg_rewrite_ev_type - 1] = CharGetDatum(evtype + '0');
  	values[Anum_pg_rewrite_ev_enabled - 1] = CharGetDatum(RULE_FIRES_ON_ORIGIN);
  	values[Anum_pg_rewrite_is_instead - 1] = BoolGetDatum(evinstead);
--- 85,90 ----
*************** InsertRule(char *rulname,
*** 117,123 ****
  		 * When replacing, we don't need to replace every attribute
  		 */
  		MemSet(replaces, false, sizeof(replaces));
- 		replaces[Anum_pg_rewrite_ev_attr - 1] = true;
  		replaces[Anum_pg_rewrite_ev_type - 1] = true;
  		replaces[Anum_pg_rewrite_is_instead - 1] = true;
  		replaces[Anum_pg_rewrite_ev_qual - 1] = true;
--- 115,120 ----
*************** DefineQueryRewrite(char *rulename,
*** 238,244 ****
  				   List *action)
  {
  	Relation	event_relation;
- 	int			event_attno;
  	ListCell   *l;
  	Query	   *query;
  	bool		RelisBecomingView = false;
--- 235,240 ----
*************** DefineQueryRewrite(char *rulename,
*** 495,501 ****
  	/*
  	 * This rule is allowed - prepare to install it.
  	 */
- 	event_attno = -1;

  	/* discard rule if it's null action and not INSTEAD; it's a no-op */
  	if (action != NIL || is_instead)
--- 491,496 ----
*************** DefineQueryRewrite(char *rulename,
*** 503,509 ****
  		ruleId = InsertRule(rulename,
  							event_type,
  							event_relid,
- 							event_attno,
  							is_instead,
  							event_qual,
  							action,
--- 498,503 ----

*** a/src/backend/rewrite/rewriteHandler.c
--- b/src/backend/rewrite/rewriteHandler.c
*************** matchLocks(CmdType event,
*** 1276,1285 ****
  		if (oneLock->event == event)
  		{
  			if (parsetree->commandType != CMD_SELECT ||
! 				(oneLock->attrno == -1 ?
! 				 rangeTableEntry_used((Node *) parsetree, varno, 0) :
! 				 attribute_used((Node *) parsetree,
! 								varno, oneLock->attrno, 0)))
  				matching_locks = lappend(matching_locks, oneLock);
  		}
  	}
--- 1276,1282 ----
  		if (oneLock->event == event)
  		{
  			if (parsetree->commandType != CMD_SELECT ||
! 				rangeTableEntry_used((Node *) parsetree, varno, 0))
  				matching_locks = lappend(matching_locks, oneLock);
  		}
  	}
*************** static Query *
*** 1295,1301 ****
  ApplyRetrieveRule(Query *parsetree,
  				  RewriteRule *rule,
  				  int rt_index,
- 				  bool relation_level,
  				  Relation relation,
  				  List *activeRIRs,
  				  bool forUpdatePushedDown)
--- 1292,1297 ----
*************** ApplyRetrieveRule(Query *parsetree,
*** 1309,1316 ****
  		elog(ERROR, "expected just one rule action");
  	if (rule->qual != NULL)
  		elog(ERROR, "cannot handle qualified ON SELECT rule");
- 	if (!relation_level)
- 		elog(ERROR, "cannot handle per-attribute ON SELECT rule");

  	if (rt_index == parsetree->resultRelation)
  	{
--- 1305,1310 ----
*************** fireRIRrules(Query *parsetree, List *activeRIRs, bool forUpdatePushedDown)
*** 1632,1645 ****
  			if (rule->event != CMD_SELECT)
  				continue;

- 			if (rule->attrno > 0)
- 			{
- 				/* per-attr rule; do we need it? */
- 				if (!attribute_used((Node *) parsetree, rt_index,
- 									rule->attrno, 0))
- 					continue;
- 			}
-
  			locks = lappend(locks, rule);
  		}
--- 1626,1631 ----
*************** fireRIRrules(Query *parsetree, List *activeRIRs, bool forUpdatePushedDown)
*** 1664,1670 ****
  			parsetree = ApplyRetrieveRule(parsetree,
  										  rule,
Re: [HACKERS] Improving avg performance for numeric
2013/9/4 Peter Eisentraut pete...@gmx.net On 7/8/13 10:05 AM, Pavel Stehule wrote: I am testing your code, and it increases the speed of sum by about 24% over the original implementation. This patch needs to be rebased (and/or the later version registered in the commit fest). I updated the commitfest info. Regards Pavel
Re: [HACKERS] [9.4] Make full_page_writes only settable on server start?
On 2013-09-04 09:23:20 -0700, Jeff Davis wrote: On Wed, 2013-09-04 at 11:32 -0400, Tom Lane wrote: Jeff Davis pg...@j-davis.com writes: I think code complexity matters quite a lot. If we can eliminate some complex code in a complex area, and all we give up is a feature with essentially no use case, that sounds like we're moving in the right direction to me. Isn't this whole discussion academic in view of Andres' point? Maybe "complex code" was an overstatement. We'd be able to eliminate the XLOG_FPW_CHANGE record, UpdateFullPageWrites(), and one of the members of XLogCtlInsert; and make xlog.c slightly shorter in the process. That path is also executed during a normal restart and during promotion. Check the invocation of UpdateFullPageWrites() in StartupXLOG(). Note that a standby needs to be able to follow the primary's full_page_writes setting during a promotion. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [rfc] overhauling pgstat.stat
On 4.9.2013 07:24, Satoshi Nagayasu wrote: Hi, (2013/09/04 12:52), Atri Sharma wrote: On Wed, Sep 4, 2013 at 6:40 AM, Satoshi Nagayasu sn...@uptime.jp wrote: Hi, I'm considering overhauling pgstat.stat, and would like to know how many people are interested in this topic. As you may know, this file can be hundreds of MB in size, because pgstat.stat holds all access statistics in each database, and it needs to read/write the entire pgstat.stat frequently. As a result, pgstat.stat often generates massive I/O operations, particularly when having a large number of tables in the database. To support multi-tenancy or just a large number of tables (up to 10k tables in a single database), I think pgstat.stat needs to be overhauled. I think using heap and btree in pgstat.stat would be preferred to reduce read/write and to allow updating access statistics for specific tables in the pgstat.stat file. Is this good for us? Hi, Nice thought. I/O reduction in pgstat can be really helpful. I am trying to think of our aim here. Would we be looking to split pgstat per table, so that the I/O write happens for only a portion of pgstat? Or reduce the I/O in general? I prefer the latter. Under the current implementation, DBAs need to split a single database into many smaller databases, taking the access locality of the tables into consideration. That's difficult, and it could change in the future. And splitting the statistics data into many files (per table, for example) would cause another performance issue when collecting/showing statistics at once. Just my guess though. So, I'm looking for a new way to reduce I/O for the statistics data in general. Hi, as one of the authors of the 9.3 patch (per-database stats), I planned to work on this a bit more in the 9.4 cycle. So a few comments / ideas. I'm not entirely sure splitting the stats per table would be that bad. After all, we already have per-relation data files, so either the users already have serious problems (so this won't make it noticeably worse) or it will work fine. But I'm not saying it's the right choice either. My idea was to keep the per-database stats, but allow some sort of random access - updating / deleting the records in place, adding records etc. The simplest way I could think of was adding a simple index - a mapping of OID to position in the stat file. I.e. a simple array of (oid, offset) pairs, stored in oid.stat.index or something like that. This would make it quite simple to access an existing record:

1: get position from the index
2: read sizeof(Entry) from the file
3: if it's an update, just overwrite the bytes; for a delete, set an isdeleted flag (needs to be added to all entries)

or to read all the records (just read the whole file as today). regards Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
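To make the format concrete, here is a minimal sketch of the proposed index and the in-place update (all names are hypothetical and error handling is omitted):

    /* One slot in the hypothetical oid.stat.index file. */
    typedef struct StatIndexEntry
    {
        Oid     oid;        /* table OID */
        off_t   offset;     /* byte position of its record in the stat file */
    } StatIndexEntry;

    /* Steps 1 and 3 from the list above: find the offset, overwrite in place. */
    static bool
    stat_update_in_place(FILE *fp, StatIndexEntry *index, int nentries,
                         Oid oid, const void *rec, size_t recsize)
    {
        int     i;

        for (i = 0; i < nentries; i++)      /* step 1: scan the simple array */
        {
            if (index[i].oid == oid)
            {
                fseeko(fp, index[i].offset, SEEK_SET);
                fwrite(rec, recsize, 1, fp);    /* step 3: overwrite the bytes */
                return true;
            }
        }
        return false;   /* unknown OID: append a new record and index entry */
    }

Each update then touches one small record instead of rewriting the whole file.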
Re: [HACKERS] [v9.4] row level security
Kohei KaiGai kai...@kaigai.gr.jp writes: 2013/9/4 Tom Lane t...@sss.pgh.pa.us: And for insert/update/delete, we do what exactly? Regarding to update/delete, this patch also enhanced to allow update or delete mechanism allows to take a sub-query on top of the table scan plan. So, its explain output shows as follows: postgres= EXPLAIN (costs off) UPDATE customer SET email = 'al...@example.com'; QUERY PLAN -- Update on customer - Subquery Scan on customer - Seq Scan on customer customer_1 Filter: (current_user() = uname) You can see this update has Subquery plan instead of regular relation scan. Really? That wasn't apparent from reading the patch. (Have I mentioned it's desperately underdocumented? Aside from needing a lot more in-code comments than it's got, it would benefit from having an overview section added to optimizer/README explaining stuff at the level of this discussion.) I'm a bit surprised that setrefs.c doesn't eliminate the Subquery Scan as being a no-op, given that no quals end up getting applied at that level. You might look into why not, since if that plan node were eliminated at the end, it'd fix any efficiency complaints about this approach. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [rfc] overhauling pgstat.stat
On 4.9.2013 14:43, Pavel Stehule wrote: 2013/9/4 Atri Sharma atri.j...@gmail.com mailto:atri.j...@gmail.com [...] Ah, now I get it. Thanks a ton for the detailed explanation. Yes, a BTree will sufficiently isolate per-table stats here and allow for random access. Another thing I can think of is having a write-back cache which could probably be used as a buffer before the actual stats write. I am just musing here though. we very successfully use a tmpfs volume for pgstat files (we use a backport of multiple statfiles from 9.3 to 9.1) It works quite well as long as you have the objects (tables, indexes, functions) spread across multiple databases. Once you have one database with a very large number of objects, tmpfs is not as effective. It's going to help with stats I/O, but it's not going to help with high CPU usage (you're reading and parsing the stat files over and over), and every rewrite creates a copy of the file. So if you have 400MB of stats, you will need 800MB of tmpfs + some slack (say, 200MB). That means you'll use ~1GB of tmpfs although 400MB would be just fine. And this 600MB won't be used for page cache etc. OTOH, it's true that if you have that many objects, 600MB of RAM is not going to help you anyway. Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
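For anyone who wants to replicate the tmpfs setup being discussed, the usual recipe is a tmpfs mount plus the stats_temp_directory setting; the size below just follows the 400MB example above, and the path is illustrative:

    # /etc/fstab -- leave headroom for the rewrite copy (~2x stats size + slack)
    tmpfs  /var/lib/pgsql/pg_stat_tmp  tmpfs  size=1G,uid=postgres,gid=postgres,mode=0700  0 0

    # postgresql.conf
    stats_temp_directory = '/var/lib/pgsql/pg_stat_tmp'

At a clean shutdown the collector still writes a permanent copy under the data directory, so nothing is lost across a restart.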
Re: [HACKERS] [rfc] overhauling pgstat.stat
we very successfully use a tmpfs volume for pgstat files (we use a backport of multiple statfiles from 9.3 to 9.1) It works quite well as long as you have the objects (tables, indexes, functions) spread across multiple databases. Once you have one database with a very large number of objects, tmpfs is not as effective. It's going to help with stats I/O, but it's not going to help with high CPU usage (you're reading and parsing the stat files over and over), and every rewrite creates a copy of the file. So if you have 400MB of stats, you will need 800MB of tmpfs + some slack (say, 200MB). That means you'll use ~1GB of tmpfs although 400MB would be just fine. And this 600MB won't be used for page cache etc. OTOH, it's true that if you have that many objects, 600MB of RAM is not going to help you anyway. And just an idea - could we use a database for storing these files? It could use unlogged tables. Second idea - hold one bgworker as a persistent in-memory key-value database and keep the data in memory with some optimizations - using anti-caching and similar in-memory database features. Pavel Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [rfc] overhauling pgstat.stat
Tomas Vondra wrote: My idea was to keep the per-database stats, but allow some sort of random access - updating / deleting the records in place, adding records etc. The simplest way I could think of was adding a simple index - a mapping of OID to position in the stat file. I.e. a simple array of (oid, offset) pairs, stored in oid.stat.index or something like that. This would make it quite simple to access an existing record: 1: get position from the index 2: read sizeof(Entry) from the file 3: if it's an update, just overwrite the bytes; for a delete, set an isdeleted flag (needs to be added to all entries) or to read all the records (just read the whole file as today). Sounds reasonable. However, I think the index should be a real index, i.e. have a tree structure that can be walked down, not just a plain array. If you have a 400 MB stat file, then you must have about 4 million tables, and you will not want to scan such a large array every time you want to find an entry. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Further XLogInsert scaling tweaking
On 03.09.2013 16:22, Merlin Moncure wrote: On Mon, Sep 2, 2013 at 10:32 PM, Bruce Momjian br...@momjian.us wrote: On Mon, Sep 2, 2013 at 10:14:03AM +0300, Heikki Linnakangas wrote:

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 39c58d0..28e62ea 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -428,8 +428,14 @@ typedef struct XLogCtlInsert
 	uint64		CurrBytePos;
 	uint64		PrevBytePos;
 
-	/* insertion slots, see above for details */
-	XLogInsertSlotPadded *insertSlots;
+	/*
+	 * Make sure the above heavily-contended spinlock and byte positions are
+	 * on their own cache line. In particular, the RedoRecPtr and full page
+	 * write variables below should be on a different cache line. They are
+	 * read on every WAL insertion, but updated rarely, and we don't want
+	 * those reads to steal the cache line containing Curr/PrevBytePos.
+	 */
+	char		pad[128];

Do we adjust for cache line lengths anywhere else? PGPROC? Should it be a global define? +1 -- that is, I think it should be. Ok, committed that way. No, we don't adjust for cache line lengths anywhere else. As Alvaro noted, LWLocks are padded, but that's just to keep them from crossing cache line boundaries, not to keep two lwlocks on separate cache lines. Thanks! - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
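The global define Merlin is +1'ing might look like this; the name is hypothetical (nothing like it existed in the tree at the time of this thread), and 128 matches the pad size used in the committed hunk:

    /* Hypothetical tree-wide constant for cache-line padding. */
    #define PG_CACHE_LINE_SIZE  128

    typedef struct XLogCtlInsert
    {
        slock_t     insertpos_lck;  /* protects CurrBytePos and PrevBytePos */
        uint64      CurrBytePos;
        uint64      PrevBytePos;

        /* keep the rarely-updated fields below off the hot cache line */
        char        pad[PG_CACHE_LINE_SIZE];

        XLogRecPtr  RedoRecPtr;     /* read on every insertion, updated rarely */
        bool        forcePageWrites;
        bool        fullPageWrites;
        /* ... */
    } XLogCtlInsert;

A named constant would also let PGPROC and other contended structures share one documented assumption about cache line size instead of scattering magic 128s around.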
Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE
On Sat, Aug 31, 2013 at 2:34 PM, Andres Freund and...@2ndquadrant.com wrote: After some thinking I don't think any solution primarily based on holding page level locks across other index operations is going to scale ok. I'd like to chime in with a large +1 for this sentiment and pretty much everything else Andres said further downthread. The operations across which you're proposing to hold buffer locks seem at least an order of magnitude too complex to get away with something like that. Concurrent readers will block in a non-interruptible wait if they try to access a buffer, and that's a situation that will be intolerable if, for example, it can persist across a disk I/O. And I don't see any way to avoid that. One possible alternative to inserting promises into the index pages themselves might be to use some kind of heavyweight lock. The way that SIREAD locks work is not entirely dissimilar to what's needed here, I think. Of course, the performance implications of checking for lots of extra locks during inserts could be pretty bad, so you'd probably need some way of avoiding that in common cases, which I don't know exactly how to do, but maybe there's a way. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Improving avg performance for numeric
On 9/4/13 2:26 PM, Pavel Stehule wrote: 2013/9/4 Peter Eisentraut pete...@gmx.net mailto:pete...@gmx.net On 7/8/13 10:05 AM, Pavel Stehule wrote: I am testing your code, and It increase speed of sum about 24% faster then original implementation. This patch needs to be rebased (and/or the later version registered in the commit fest). I updated a commit fest info The new patch also needs to be rebased. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] lcr v5 - introduction of InvalidCommandId
On 2013-09-03 11:40:57 -0400, Robert Haas wrote: 0002 wal_decoding: Introduce InvalidCommandId and declare that to be the new maximum for CommandCounterIncrement I'm still unconvinced we want this. Ok, so the reason for the existence of this patch is that currently there is no way to represent an unset CommandId. This is a problem for the following patches because we need to log the cmin, cmax of catalog rows, and obviously there can be rows where cmax is unset. The reason I chose to change the definition of CommandIds is that the other on-disk types we use, like TransactionIds, XLogRecPtrs and such, have an invalid value; CommandIds don't. Changing their definition to have 0 - analogous to the previous examples - as their invalid value is not a problem because CommandIds from pg_upgraded clusters may never be used for anything. Going from 2^32 to 2^32-1 possible CommandIds doesn't seem like a problem to me. Imo the CommandIds should have been defined that way from the start. Makes some sense? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] De-normalization optimizer research project
I am working on a research project that looks at an entire database access layer written against a normalized database and suggests de-normalizations to be done to the database structure. In effect looking at de-normalizations as a kind of cache. I am looking for any code I can leverage to do this work. As documentation for the plan optimizer I found the document below. Is this still relatively accurate? http://www.postgresql.org/files/developer/optimizer.pdf I have looked at the postgresql optimizer code years ago. And it was not too hard to follow. I wonder if it is fairly independent from the rest of the system. Or if anyone has suggestions of using a different optimizer codebase. Here is a discussion I found here from years ago. http://postgresql.1045698.n5.nabble.com/About-method-of-PostgreSQL-s-Optimizer-td1933859.html -- View this message in context: http://postgresql.1045698.n5.nabble.com/De-normalization-optimizer-research-project-tp5769655.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [rfc] overhauling pgstat.stat
On 4.9.2013 20:59, Alvaro Herrera wrote: Tomas Vondra wrote: My idea was to keep the per-database stats, but allow some sort of random access - updating / deleting the records in place, adding records etc. The simplest way I could think of was adding a simple index - a mapping of OID to position in the stat file. I.e. a simple array of (oid, offset) pairs, stored in oid.stat.index or something like that. This would make it quite simple to access existing record 1: get position from the index 2: read sizeof(Entry) from the file 3: if it's update, just overwrite the bytes, for delete set isdeleted flag (needs to be added to all entries) or reading all the records (just read the whole file as today). Sounds reasonable. However, I think the index should be a real index, i.e. have a tree structure that can be walked down, not just a plain array. If you have a 400 MB stat file, then you must have about 4 million tables, and you will not want to scan such a large array every time you want to find an entry. I was thinking about a sorted array, so a bisection would be a simple and fast way to search. New items could be added to another small unsorted array (say, 1000 elements) and this would be extended and resorted only when this small one gets full. Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
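A sketch of that two-tier lookup (illustrative only): bisection over the big sorted array, then a linear pass over the small unsorted overflow array:

    typedef struct StatIndexEntry
    {
        Oid     oid;        /* key: table OID */
        off_t   offset;     /* position of the record in the stat file */
    } StatIndexEntry;

    static StatIndexEntry *
    stat_index_lookup(StatIndexEntry *sorted, int nsorted,
                      StatIndexEntry *recent, int nrecent, Oid oid)
    {
        int     lo = 0;
        int     hi = nsorted - 1;
        int     i;

        /* bisection over the sorted bulk of the index */
        while (lo <= hi)
        {
            int     mid = lo + (hi - lo) / 2;

            if (sorted[mid].oid == oid)
                return &sorted[mid];
            else if (sorted[mid].oid < oid)
                lo = mid + 1;
            else
                hi = mid - 1;
        }

        /* fall back to the small (say, 1000-entry) unsorted overflow array */
        for (i = 0; i < nrecent; i++)
        {
            if (recent[i].oid == oid)
                return &recent[i];
        }

        return NULL;    /* caller appends to 'recent', merging/resorting when full */
    }

With 4 million entries that is roughly 22 comparisons for the sorted part plus at most 1000 for the overflow, versus 4 million for a naive scan.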
Re: [HACKERS] [rfc] overhauling pgstat.stat
On 4.9.2013 14:13, Stephen Frost wrote: * Satoshi Nagayasu (sn...@uptime.jp) wrote: Yes, we have already split the single pgstat.stat file into several pieces. However, we still need to read/write a large amount of statistics data when we have a large number of tables in a single database or multiple databases being accessed. Right? Would simply also splitting per tablespace help? I don't think that's a good solution. Forcing the users to use tablespaces just to minimize the stats overhead is not that far from forcing them to use multiple databases. Tablespaces have disadvantages too, so I'd suggest looking for a solution meeting the "just works" criterion. I think the issue here is that it is necessary to write/read statistics data even when it hasn't actually changed. So, I'm wondering how we can minimize read/write operations on these statistics data files using heap and btree. It does sound like an interesting idea to use heap/btree instead but I wonder about the effort involved, particularly around coordinating access. We wouldn't want to end up introducing additional contention points by doing this.. Yes, this is definitely an important thing to keep in mind. Maintaining the index (no matter what kind of index will be used) will cause some contention, but I believe it will be manageable with a bit of effort. Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PERFORM] encouraging index-only scans
On Thu, Dec 13, 2012 at 03:31:06PM +0000, Peter Geoghegan wrote: On 13 December 2012 03:51, Tom Lane t...@sss.pgh.pa.us wrote: ANALYZE does not set that value, and is not going to start doing so, because it doesn't scan enough of the table to derive a trustworthy value. I'm slightly surprised by your remarks here, because the commit message where the relallvisible column was added (commit a2822fb9337a21f98ac4ce850bb4145acf47ca27) says: Add a column pg_class.relallvisible to remember the number of pages that were all-visible according to the visibility map as of the last VACUUM (or ANALYZE, or some other operations that update pg_class.relpages). Use relallvisible/relpages, instead of an arbitrary constant, to estimate how many heap page fetches can be avoided during an index-only scan. Have I missed some nuance? I am looking back at this issue now and I think you are correct. The commit you mention (Oct 7 2011) says ANALYZE updates the visibility map statistics, and the code matches that:

    if (!inh)
        vac_update_relstats(onerel,
                            RelationGetNumberOfBlocks(onerel),
                            totalrows,
                            visibilitymap_count(onerel),   /* <-- */
                            hasindex,
                            InvalidTransactionId);

so if an index-only scan was not being used after an ANALYZE, it isn't a bad relallvisible estimate but something else. This code was in PG 9.2. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
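For anyone diagnosing a similar index-only-scan complaint, the estimate in question can be inspected directly in the catalog (9.2+); substitute the table name of interest:

    SELECT relname, relpages, relallvisible,
           round(100.0 * relallvisible / GREATEST(relpages, 1), 1) AS pct_all_visible
    FROM pg_class
    WHERE relname = 'your_table';

A low ratio right after a bulk load (before VACUUM has run) makes the planner assume most index probes will still need heap fetches, which discourages index-only scans.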
Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE
On Wed, Sep 4, 2013 at 1:26 PM, Robert Haas robertmh...@gmail.com wrote: Concurrent readers will block in a non-interruptible wait if they try to access a buffer, and that's a situation that will be intolerable if, for example, it can persist across a disk I/O. And I don't see any way to avoid that. Then I have some bad news for you - that's already quite possible. _bt_insertonpg() is called with the very same buffer exclusive locked, and is where we do btree page splits. The first thing that _bt_split does is this: /* Acquire a new page to split into */ rbuf = _bt_getbuf(rel, P_NEW, BT_WRITE); (Obviously this may ultimately result in the storage manager extending the index relation). Plus the split is WAL-logged immediately afterwards, which could result in us blocking on someone else's I/O under contention (granted, the XLogInsert scaling patch has now considerably ameliorated that general problem). All the while holding an exclusive lock on the same buffer. Note also that _bt_insertonpg() is called again recursively after a page split. And of course we WAL-log btree index tuple insertion proper all the time. Let me be clear about something, though: I am not at all dismissive of Andres' concerns. I was concerned about many of the same things before I posted the patch. I think that Andres and I ought to re-frame this discussion a little bit. Right now, the presumption we seem to be making, perhaps without even realizing it, is this is about providing functionality equivalent to MySQL's INSERT IGNORE; insert tuples proposed for insertion where possible, otherwise do not. However, Andres and I, not to mention almost every Postgres user, are actually much more interested in something like INSERT...ON DUPLICATE KEY LOCK FOR UPDATE. That's what this mechanism has to support, even if it is *technically* possible to commit just INSERT...ON DUPLICATE KEY IGNORE and punt on these other questions. It was silly of me not to do that up-front. So I'm going to try and produce a patch that does this as well for my next revision. Maybe this will enable Andres to refute my position that the buffer locking approach to speculative insertion/value locking may actually be acceptable. If that gets us closer to having the feature committed in some form, then I welcome it. I fully expect to be held to this new standard - it would be insane to do anything less. I don't want to throw out an old IGNORE value locking mechanism and invent a whole new one for upserting a little bit down the line. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Analysis on backend-private memory usage (and a patch)
I received a complaint that each backend consumes a lot of backend-private memory, even if it's completely idle. "A lot" is of course very subjective, and how much memory is actually used depends heavily on the application. In this case, the database is fairly small, but they have 250 connections. 'top' output says that each backend is consuming roughly 3MB of memory (RES - SHR). That's 750 MB of backend-private memory, which is a significant chunk of total RAM. So I spent some time analyzing backend memory usage, looking for any low-hanging fruit. This isn't *that* big an issue, so I don't think we'd want to do any big rearchitecting for this. On my laptop, just starting psql, the backend uses 1632 KB of private memory. Running a simple query like "select * from foo where i = 1" makes no noticeable difference, but after \d (which I'm using to represent a somewhat more complicated query), it goes up to 1960 KB. The largest consumer of that memory is the relcache and syscaches. After starting psql, without running any queries, MemoryContextStats says: CacheMemoryContext: 817840 total in 20 blocks; 134824 free (4 chunks); 683016 used plus there is one sub-memorycontext for each index in the relcache, each using about 1KB. After \d: CacheMemoryContext: 1342128 total in 21 blocks; 517472 free (1 chunks); 824656 used Another thing that can consume a lot of memory is the PrivateRefCount lookup table. It's an array with one int32 for each shared buffer, ie. 512 KB for each GB of shared_buffers. See previous discussion here: http://www.postgresql.org/message-id/flat/1164624036.3778.107.ca...@silverbirch.site. That discussion didn't lead to anything, but I think there's some potential in turning PrivateRefCount into a tiny hash table or simply a linear array. Or even simpler, change it from int32 to int16, and accept that you will get an error if you try to hold more than 2^16 pins on a buffer in one backend. One fairly simple thing we could do is to teach catcache.c to resize the caches. Then we could make the initial size of all the syscaches much smaller. At the moment, we use fairly large caches for catalogs like pg_enum (256 entries) and pg_user_mapping (128), even though most databases don't use those features at all. If they could be resized on demand, we could easily allocate them initially with just, say, 4 entries. Attached is a patch for that. That saves about 300 KB, for a backend that does nothing. Resizing the caches on demand also has the benefit that if you have a lot more objects of some type than usual, lookups won't be bogged down by a too small cache. I haven't tried to measure that, though. - Heikki

diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index cca0572..36fbc67 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -734,9 +734,8 @@ InitCatCache(int id,
 	int			i;
 
 	/*
-	 * nbuckets is the number of hash buckets to use in this catcache.
-	 * Currently we just use a hard-wired estimate of an appropriate size for
-	 * each cache; maybe later make them dynamically resizable?
+	 * nbuckets is the initial number of hash buckets to use in this catcache.
+	 * It will be enlarged later if it becomes too full.
 	 *
 	 * nbuckets must be a power of two.  We check this via Assert rather than
 	 * a full runtime check because the values will be coming from constant
@@ -775,7 +774,8 @@ InitCatCache(int id,
 	 *
 	 * Note: we rely on zeroing to initialize all the dlist headers correctly
 	 */
-	cp = (CatCache *) palloc0(sizeof(CatCache) + nbuckets * sizeof(dlist_head));
+	cp = (CatCache *) palloc0(sizeof(CatCache));
+	cp->cc_bucket = palloc0(nbuckets * sizeof(dlist_head));
 
 	/*
 	 * initialize the cache's relation information for the relation
@@ -814,6 +814,44 @@ InitCatCache(int id,
 }
 
 /*
+ * Enlarge a catcache, doubling the number of buckets.
+ */
+static void
+RehashCatCache(CatCache *cp)
+{
+	dlist_head *newbucket;
+	int			newnbuckets;
+	int			i;
+
+	elog(DEBUG1, "rehashing cache with id %d for %s; %d tups, %d buckets",
+		 cp->id, cp->cc_relname, cp->cc_ntup, cp->cc_nbuckets);
+
+	/* Allocate a new, larger, hash table. */
+	newnbuckets = cp->cc_nbuckets * 2;
+	newbucket = (dlist_head *) MemoryContextAllocZero(CacheMemoryContext, newnbuckets * sizeof(dlist_head));
+
+	/* Move all entries from old hash table to new. */
+	for (i = 0; i < cp->cc_nbuckets; i++)
+	{
+		while (!dlist_is_empty(&cp->cc_bucket[i]))
+		{
+			dlist_node *node = dlist_pop_head_node(&cp->cc_bucket[i]);
+			CatCTup    *ct = dlist_container(CatCTup, cache_elem, node);
+			int			hashIndex;
+
+			hashIndex = HASH_INDEX(ct->hash_value, newnbuckets);
+
+			dlist_push_head(&newbucket[hashIndex], &ct->cache_elem);
+		}
+	}
+
+	/* Switch to the new array */
+	pfree(cp->cc_bucket);
+	cp->cc_nbuckets = newnbuckets;
+	cp->cc_bucket = newbucket;
+}
+
+/*
  * CatalogCacheInitializeCache
  *
  * This function does final initialization of a
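To illustrate the PrivateRefCount direction Heikki mentions (a tiny per-backend structure instead of one slot per shared buffer), a hypothetical sketch - this is not part of the attached patch:

    /* Hypothetical: track only the few buffers this backend actually has pinned. */
    typedef struct PrivateRefCountEntry
    {
        Buffer  buffer;     /* which shared buffer */
        int32   refcount;   /* how many pins this backend holds on it */
    } PrivateRefCountEntry;

    #define NUM_PRIVATE_REFCOUNTS   64      /* illustrative; backends rarely pin many */

    static PrivateRefCountEntry PrivateRefCounts[NUM_PRIVATE_REFCOUNTS];
    static int  nPrivateRefCounts = 0;

    static PrivateRefCountEntry *
    GetPrivateRefCountEntry(Buffer buffer)
    {
        int     i;

        for (i = 0; i < nPrivateRefCounts; i++)
        {
            if (PrivateRefCounts[i].buffer == buffer)
                return &PrivateRefCounts[i];
        }
        return NULL;    /* overflow could spill to a small hash table */
    }

At 8 bytes per entry this is half a kilobyte per backend, versus the 512 KB per GB of shared_buffers that the current array costs.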
Re: [HACKERS] dynamic shared memory
On 8/31/13 7:17 AM, Robert Haas wrote: On Thu, Aug 29, 2013 at 8:12 PM, Jim Nasby j...@nasby.net wrote: On 8/13/13 8:09 PM, Robert Haas wrote: is removed, the segment automatically goes away (we could allow for server-lifespan segments as well with only trivial changes, but I'm not sure whether there are compelling use cases for that). To clarify... you're talking about something that would intentionally survive postmaster restart? I don't see use for that either... No, I meant something that would live as long as the postmaster and die when it dies. ISTM that at some point we'll want to look at putting top-level shared memory into this system (ie: allowing dynamic resizing of GUCs that affect shared memory size). But as you said, it'd be trivial to add that later. Other comments... + * If the state file is empty or the contents are garbled, it probably means + * that the operating system rebooted before the data written by the previous + * postmaster made it to disk. In that case, we can just ignore it; any shared + * memory from before the reboot should be gone anyway. I'm a bit concerned about this; I know it was possible in older versions for the global shared memory context to be left behind after a crash and need to be cleaned up by hand. Dynamic shared mem potentially multiplies that by 100 or more. I think it'd be worth changing dsm_write_state_file so it always writes a new file and then does an atomic mv (or something similar). I agree that the possibilities for leftover shared memory segments are multiplied with this new facility, and I've done my best to address that. However, I don't agree that writing the state file in a different way would improve anything. Wouldn't it protect against a crash while writing the file? I realize the odds of that are pretty remote, but AFAIK it wouldn't cost that much to write a new file and do an atomic mv... + * If some other backend exited uncleanly, it might have corrupted the + * control segment while it was dying. In that case, we warn and ignore + * the contents of the control segment. This may end up leaving behind + * stray shared memory segments, but there's not much we can do about + * that if the metadata is gone. Similar concern... in this case, would it be possible to always write updates to an un-used slot and then atomically update a pointer? This would be more work than what I suggested above, so maybe just a TODO for now... Though... is there anything a dying backend could do that would corrupt the control segment to the point that it would screw up segments allocated by other backends and not related to the dead backend? Like marking a slot as not used when it is still in use and isn't associated with the dead backend? Sure. A messed-up backend can clobber the control segment just as it can clobber anything else in shared memory. There's really no way around that problem. If the control segment has been overwritten by a memory stomp, we can't use it to clean up. There's no way around that problem except to not use the control segment, which wouldn't be better. Are we trying to protect against memory stomps when we restart after a backend dies? I thought we were just trying to ensure that all shared data structures were correct and consistent. If that's the case, then I was thinking that by using a pointer that can be updated in a CPU-atomic fashion we know we'd never end up with a corrupted entry that was in use; the partial write would be to a slot with nothing pointing at it, so it could be safely reused. 
Like I said before though, it may not be worth worrying about this case right now. Should dsm_impl_op sanity check the arguments after op? I didn't notice checks in the type-specific code but I also didn't read all of it... are we just depending on the OS to sanity-check? Sanity-check for what? Presumably there's limits to what the arguments can be rationally set to. IIRC there's nothing down-stream that's checking them in our code, so I'm guessing we're just depending on the kernel to sanity-check. -- Jim C. Nasby, Data Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
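For what it's worth, the write-then-atomic-rename pattern suggested above is cheap and well understood; here is a rough standalone sketch (the file names and the helper are invented - this is not the actual dsm code):

#include <stdio.h>
#include <unistd.h>

static int
write_state_file_atomically(const char *contents, size_t len)
{
	const char *tmpname = "pg_dynshmem/state.tmp";
	const char *finalname = "pg_dynshmem/state";
	FILE	   *fp;

	fp = fopen(tmpname, "w");
	if (fp == NULL)
		return -1;
	if (fwrite(contents, 1, len, fp) != len ||
		fflush(fp) != 0 ||
		fsync(fileno(fp)) != 0)		/* data must be durable before rename */
	{
		fclose(fp);
		return -1;
	}
	if (fclose(fp) != 0)
		return -1;

	/* POSIX rename() atomically replaces any previous state file */
	return rename(tmpname, finalname);
}

A crash at any point leaves either the old file or the new file in place, never a torn one.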
Re: [HACKERS] Analysis on backend-private memory usage (and a patch)
Heikki Linnakangas hlinnakan...@vmware.com writes: One fairly simple thing we could do is to teach catcache.c to resize the caches. Then we could make the initial size of all the syscaches much smaller. I think this is attractive for the *other* reason you mention, namely preserving reasonable performance when a catcache grows larger than expected; but I'm pretty skeptical of nickel-and-diming caches that are already really small. Is it really worth cutting the TSPARSER caches from 4 pointers to 2 for instance? What concerns me about initially-undersized caches is that we'll waste space and time in the enlargement process. I'd suggest trying to get some numbers about the typical size of each cache in a backend that's done a few things (not merely started up --- we should not be optimizing for the case of connections that get abandoned without running any queries). Then set the initial size to the next larger power of 2. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
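For concreteness, the enlargement rule being discussed could be as simple as the check below. This is an assumption about what a final patch might do (the fill factor of 2 is arbitrary), not something taken from the posted patch; note that doubling keeps cc_nbuckets a power of two, which Tom's round-up-to-a-power-of-2 suggestion for the initial sizes would also preserve.

static void
CatCacheConsiderResize(CatCache *cp)
{
	/* grow once the average hash chain exceeds two entries */
	if (cp->cc_ntup > cp->cc_nbuckets * 2)
		RehashCatCache(cp);
}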
Re: [HACKERS] 9.4 regression
On Wed, Sep 4, 2013 at 11:26 AM, Tom Lane t...@sss.pgh.pa.us wrote: Stephen Frost sfr...@snowman.net writes: * Andres Freund (and...@2ndquadrant.com) wrote: I'd vote for adding zeroing *after* the fallocate() first. That's what's suggested by kernel hackers and what several other large applications do. As it looks like it's what we would have to do if we ever get to use fallocate for relation extension where we would have actual benefits from it. Does that actually end up doing anything different from what we were doing pre-patch here? At best, it *might* end up using a larger extent, but unless we can actually be confident that it does, I'm not convinced the additional complexity is worth it and would rather see this simply reverted. One might ask why the kernel guys aren't doing this themselves or figuring out why it's necessary to make it worthwhile. The larger picture is that that isn't the committed behavior, but a different one, one which would need performance testing. At this point, I vote for reverting the patch and allowing it to be resubmitted for a fresh round of testing with the zeroing added. And this time we'll need to do the testing more carefully. +1. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
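For reference, the "zeroing after the fallocate()" behavior under discussion would look something like this standalone sketch (the constants and the function are invented; this is not the committed, or to-be-reverted, patch):

#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define SEG_SIZE	(16 * 1024 * 1024)	/* WAL segment size */
#define BLK_SIZE	8192				/* XLOG block size */

static int
allocate_wal_segment(int fd)
{
	char		zbuffer[BLK_SIZE];
	off_t		written;

	/* reserve the space first, so ENOSPC surfaces up front */
	if (posix_fallocate(fd, 0, SEG_SIZE) != 0)
		return -1;

	/*
	 * Then overwrite with zeros anyway, as suggested, so that later WAL
	 * writes don't pay for unwritten-extent conversion in the filesystem.
	 */
	memset(zbuffer, 0, sizeof(zbuffer));
	if (lseek(fd, 0, SEEK_SET) < 0)
		return -1;
	for (written = 0; written < SEG_SIZE; written += BLK_SIZE)
	{
		if (write(fd, zbuffer, BLK_SIZE) != BLK_SIZE)
			return -1;
	}
	return 0;
}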
Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE
On 2013-09-04 15:01:57 -0700, Peter Geoghegan wrote: On Wed, Sep 4, 2013 at 1:26 PM, Robert Haas robertmh...@gmail.com wrote: Concurrent readers will block in a non-interruptible wait if they try to access a buffer, and that's a situation that will be intolerable if, for example, it can persist across a disk I/O. And I don't see any way to avoid that. Then I have some bad news for you - that's already quite possible. _bt_insertonpg() is called with the very same buffer exclusive locked, and is where we do btree page splits. The first thing that _bt_split does is this: /* Acquire a new page to split into */ rbuf = _bt_getbuf(rel, P_NEW, BT_WRITE); (Obviously this may ultimately result in the storage manager extending the index relation). I don't think that's an argument for much, TBH. Those operations are much less heavyweight than the ones you're proposing to hold the pages locked over, and there actually is a forward-progress guarantee. And it's very hard to avoid locking a page exclusively once you've decided that you need to split the page. You cannot just release the lock while you look for a victim buffer. I think that Andres and I ought to re-frame this discussion a little bit. Right now, the presumption we seem to be making, perhaps without even realizing it, is this is about providing functionality equivalent to MySQL's INSERT IGNORE; insert tuples proposed for insertion where possible, otherwise do not. However, Andres and I, not to mention almost every Postgres user, are actually much more interested in something like INSERT...ON DUPLICATE KEY LOCK FOR UPDATE. Yes, the promises approach gets more advantageous if you think about UPSERT, because most of the work will be paid off when the UPDATE occurs. Maybe this will enable Andres to refute my position that the buffer locking approach to speculative insertion/value locking may actually be acceptable. Sorry to be harsh here, but I don't think I need to do that. I've explained most of the reasons I see that that approach won't work out, and so far I don't see those refuted. And to me those issues seem to be fatal for the approach. If you find a solution to the problems noted above - great. So far it seems neither Robert nor I see how that is possible, but that obviously doesn't mean it's impossible that you find a way. But why should I argue further until you prove me wrong (a newer patch or an explanation of changed algorithms)? If you don't think my arguments are valid, well, I've brought up those I see as relevant and that's it. Can't do much further. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE
On Wed, Sep 4, 2013 at 3:39 PM, Andres Freund and...@2ndquadrant.com wrote: Sorry to be harsh here, but I don't think I need to do that. I've explained most of the reasons I see that that approach won't work out and so far I don't see those refuted. And to me those issues seem to be fatal for the approach. If you find a solution to the problems noted uppon - great. So far it seems neither Robert nor me see how that is possible, but that obviously doesn't mean it's impossible that you find a way. It seems you've misunderstood. My position was that I don't think it's terribly useful to have a discussion about approaches to value locking without considering how that needs to fit in with row locking too. So it makes sense as an immediate goal to introduce that into the patch. Any scheme is going to be constrained by having to think about the interplay with value locking and row locking going forward. For example, in my scheme, I couldn't block on locking the row if that meant that buffer locks would be held indefinitely. There are also deadlock hazards there for either scheme that must be carefully considered. What possible objection could you have? Suppose it was the case that I was dead set on using buffer locking like this, because I'm stubborn or whatever. I've just made life harder for myself, while probably not also putting the same degree of burden on alternative proposals. Maybe I am stubborn, but I don't think I'm stubborn about the basic approach taken in this particular patch. I've merely been pointing out, as I feel is my role as a participant in the community's adversarial system of reaching agreement, the problems that exist with your proposal, and some inconsistencies in your objections to mine. Obviously the basic approach will remain the most difficult and probably controversial part of this. Even if I threw my hands up and immediately accepted everything you said, that would still be true. We need to get all of the constraints in place sooner rather than later. But why should I argue further until you proof me wrong (newer patch or explaining changed algorithms)? I didn't ask you to. You shouldn't. If you don't think my arguments are valid, well, I've brought those up I see as relevant and that's it. Can't do much further. Uh, I just said that I thought your arguments were totally valid. I couldn't have been clearer about that. Actually, I'm pretty surprised that you haven't been the one insisting that I add a row locking component from quite early on for exactly these reasons. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE
Hi, We seem to be miscommunicating a bit. You've proposed a framework and algorithm for something I'd really, really like to see. I don't think that it can work exactly as you proposed, so I roughly sketched out a solution I can see. I don't want my idea to win, I want an idea to win. I haven't fleshed out my idea to the point where I would consider it ready to implement. You're the patch author here, whose plans are laid open to be scrutinized ;). If you think my idea has merit, use and adapt it to reality. If not, find another, better solution. Even if our path to that goal is confrontational at times, the goal is to find a good solution, not the argument itself. I haven't argued about INSERT ... DUPLICATE LOCK because the page locking scheme doesn't seem to work out for plain DUPLICATE. No need to think/argue about the fancier version in that case. Regards, Andres Please excuse brevity and formatting - I am writing this on my mobile phone. Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE
On Wed, Sep 4, 2013 at 5:08 PM, Andres Freund and...@2ndquadrant.com wrote: I don't want my idea to win, I want an idea to win. I know. I want the same thing. You're the patch author here whose plans are laid open to be scrutinized ;). If you think my idea has merit, use and adapt it to reality. If not, find another, better, solution. Sure. Even if our path to that goal is confrontational at times, the goal is to find a good solution, not the argument itself. Agreed. I haven't argued about INSERT ... DUPLICATE LOCK because the page locking scheme doesn't seem to work out for plain DUPLICATE. No need to think/argue about the fancier version in that case. I see where you're coming from, but my point is precisely that adding a row locking component *isn't* fancier. I've come to realize that it's an integral part of the patch, and that my previous omission of row locking - and the subsequent defence of that decision I made in passing - was ridiculous. In a world where IGNORE/not locking is a feature we support, it can only exist as an adjunct to ON DUPLICATE KEY LOCK - certainly not the other way around. The tail cannot be allowed to wag the dog. In posting the patch with a row locking component, I'll only be asking you to consider that aspect separately. You may find that seeing the problems I encounter and how I handle them will make you (or others) re-assess your thoughts on value locking in a direction that nobody expects right now. Equally, I myself may reassess things. Now, I don't guarantee that that's the case, but it certainly seems very possible. And so even if I were to concede right now that the buffer locking approach is not workable, I feel it would be a little premature to seriously get down to talking about the alternatives in detail. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE
On 05/09/13 08:26, Robert Haas wrote: On Sat, Aug 31, 2013 at 2:34 PM, Andres Freund and...@2ndquadrant.com wrote: After some thinking I don't think any solution primarily based on holding page level locks across other index operations is going to scale ok. I'd like to chime in with a large +1 for this sentiment and pretty much everything else Andres said further downthread. The operations across which you're proposing to hold buffer locks seem at least an order of magnitude too complex to get away with something like that. Concurrent readers will block in a non-interruptible wait if they try to access a buffer, and that's a situation that will be intolerable if, for example, it can persist across a disk I/O. And I don't see any way to avoid that. One possible alternative to inserting promises into the index pages themselves might be to use some kind of heavyweight lock. The way that SIREAD locks work is not entirely dissimilar to what's needed here, I think. Of course, the performance implications of checking for lots of extra locks during inserts could be pretty bad, so you'd probably need some way of avoiding that in common cases, which I don't know exactly how to do, but maybe there's a way. How about an 'Expensive bit' (of course, renamed to sound more professional and to better indicate what it does!) - if the bit is set, then do the expensive processing. This should have minimal impact in the common case, so extensive checking would only be required when lots of locks need to be checked. I strongly suspect that the situation is way more complicated than I imply above - but possibly a more sophisticated version of the above might help?
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On Tue, Jan 8, 2013 at 08:40:44PM -0500, Andrew Dunstan wrote: On 01/08/2013 08:08 PM, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: On Tue, Jan 8, 2013 at 7:17 PM, Tom Lane t...@sss.pgh.pa.us wrote: ... And I don't especially like the idea of trying to make it depend directly on the box's physical RAM, for the same practical reasons Robert mentioned. For the record, I don't believe those problems would be particularly hard to solve. Well, the problem of finding out the box's physical RAM is doubtless solvable if we're willing to put enough sweat and tears into it, but I'm dubious that it's worth the trouble. The harder part is how to know if the box is supposed to be dedicated to the database. Bear in mind that the starting point of this debate was the idea that we're talking about an inexperienced DBA who doesn't know about any configuration knob we might provide for the purpose. I'd prefer to go with a default that's predictable and not totally foolish --- and some multiple of shared_buffers seems like it'd fit the bill. +1. That seems to be by far the biggest bang for the buck. Anything else will surely involve a lot more code for not much more benefit. I have developed the attached patch which implements an auto-tuned effective_cache_size which is 4x the size of shared buffers. I had to set effective_cache_size to its old 128MB default so the EXPLAIN regression tests would pass unchanged. I considered a new available_ram variable but that just gives us another variable, and in a way shared_buffers is a fixed amount, while effective_cache_size is an estimate, so I thought driving everything from shared_buffers made sense.

-- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
new file mode 100644
index 23ebc11..de2374b
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
*************** include 'filename'
*** 2758,2764 ****
     <para>
      Random access to mechanical disk storage is normally much more expensive
!     than four-times sequential access.  However, a lower default is used
      (4.0) because the majority of random accesses to disk, such as indexed
      reads, are assumed to be in cache.  The default value can be thought of
      as modeling random access as 40 times slower than sequential, while
--- 2758,2764 ----
     <para>
      Random access to mechanical disk storage is normally much more expensive
!     than four times sequential access.  However, a lower default is used
      (4.0) because the majority of random accesses to disk, such as indexed
      reads, are assumed to be in cache.  The default value can be thought of
      as modeling random access as 40 times slower than sequential, while
*************** include 'filename'
*** 2841,2849 ****
      <listitem>
       <para>
        Sets the planner's assumption about the effective size of the
!       disk cache that is available to a single query.  This is
!       factored into estimates of the cost of using an index; a
!       higher value makes it more likely index scans will be used, a
        lower value makes it more likely sequential scans will be
        used. When setting this parameter you should consider both
        <productname>PostgreSQL</productname>'s shared buffers and the
--- 2841,2857 ----
      <listitem>
       <para>
        Sets the planner's assumption about the effective size of the
!       disk cache that is available to a single query.  The default
!       setting of -1 selects a size equal to four times the size of
!       <xref linkend="guc-shared-buffers">, but not less than the size of one
!       shared buffer page, typically <literal>8kB</literal>.  This value
!       can be set manually if the automatic choice is too large or too
!       small.
!      </para>
!
!      <para>
!       This value is factored into estimates of the cost of using an index;
!       a higher value makes it more likely index scans will be used, a
        lower value makes it more likely sequential scans will be
        used. When setting this parameter you should consider both
        <productname>PostgreSQL</productname>'s shared buffers and the
*************** include 'filename'
*** 2855,2862 ****
        memory allocated by <productname>PostgreSQL</productname>, nor
        does it reserve kernel disk cache; it is used only for estimation
        purposes.  The system also does not assume data remains in
!       the disk cache between queries.  The default is 128 megabytes
!       (<literal>128MB</>).
       </para>
      </listitem>
     </varlistentry>
--- 2863,2872 ----
        memory allocated by <productname>PostgreSQL</productname>, nor
        does it reserve kernel disk cache; it is used only for
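The auto-tuning rule itself is tiny; the helper below restates it in code form (the function is invented for illustration and paraphrases the description above, with values measured in BLCKSZ-sized blocks as for the GUC):

static int
auto_effective_cache_size(int effective_cache_size, int NBuffers)
{
	if (effective_cache_size == -1)
	{
		int			derived = NBuffers * 4;	/* 4x shared_buffers */

		/* but never less than one shared buffer page */
		return (derived < 1) ? 1 : derived;
	}
	return effective_cache_size;
}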
Re: [HACKERS] Frontend/backend protocol improvements proposal (request).
On Fri, Jun 21, 2013 at 12:37:32PM +0400, Dmitriy Igrishin wrote: 2013/6/21 Albe Laurenz laurenz.a...@wien.gv.at Dmitriy Igrishin wrote: While developing a C++ client library for Postgres I felt the lack of extra information in command tags in the CommandComplete (B) message for the following commands: PREPARE; DEALLOCATE; DECLARE; CLOSE; LISTEN; UNLISTEN; SET; RESET. Namely, for example, users of my library can prepare statements by using the protocol directly or via the PREPARE command. Since the protocol does not support prepared statement deallocation, I wrote a wrapper over the DEALLOCATE command. The library knows about all prepared statements and invalidates them automatically when the user performs the deallocate() wrapper. But users can go with the DEALLOCATE command directly, and in these cases I need to query the database to get the list of currently prepared statements whenever a CommandComplete message with the DEALLOCATE command tag is consumed. Moreover, I need to do it *synchronously*, and this breaks the asynchronous API. I propose to include the name of the object in the CommandComplete (B) message for the above commands. That would be a change in the protocol, so it's not likely to happen soon. There is a page where proposed changes to the wire protocol are collected: http://wiki.postgresql.org/wiki/Todo#Wire_Protocol_Changes Well, even if this proposal moves to the TODO, it would be nice. That's worth at least considering when we start to revise the protocol, so I have added it to the TODO list. -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
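To illustrate the proposal concretely, the backend side could be as small as the fragment below (an invented fragment, not a patch; completionTag and EndCommand() follow the existing backend conventions). Today the client just sees the bare tag "DEALLOCATE"; with the object name included, a driver could invalidate its cached statement without the extra synchronous query described above.

	char		completionTag[COMPLETION_TAG_BUFSIZE];

	/* proposed: "DEALLOCATE my_stmt" instead of the bare "DEALLOCATE" */
	snprintf(completionTag, sizeof(completionTag),
			 "DEALLOCATE %s", stmt->name);
	EndCommand(completionTag, dest);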
Re: [HACKERS] strange IS NULL behaviour
On Tue, Sep 3, 2013 at 09:32:44PM -0400, Bruce Momjian wrote: In this test, SELECT NULL (which internally would produce SELECT ROW(NULL)), returns TRUE, while SELECT ROW(NULL) and further nesting return false. This has made me adjust my goal and change it so SELECT ROW(NULL) IS NULL returns true, and any further nesting returns false. Attached is a patch which accomplishes this, and a documentation update. I have not heard any feedback on this patch, so I would like to apply it to give us a nested ROW/IS NULL API we can document. It would have to be marked in the release notes as a backward incompatibility. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] improve Chinese locale performance
On 07/23/2013 09:42 PM, Craig Ringer wrote: (Replying on phone, please forgive bad quoting) Isn't this pretty much what adopting ICU is supposed to give us? OS-independent collations? I'd be interested in seeing the test data for this performance report, partly as I'd like to see how ICU collations would compare when ICU is crudely hacked into place for testing. I have a new idea: add a compare-method column to pg_collation, so that every collation has its own compare function, or NULL. When varstr_cmp() is called, if the specified collation has a compare function, call it instead of strcoll(). How about this? Regards. Quan Zongliang -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
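As a rough sketch of that dispatch (the lookup function and the function-pointer type are hypothetical - pg_collation has no such column today - while varstr_cmp() itself is the existing function):

typedef int (*collation_cmp_fn) (const char *arg1, int len1,
								 const char *arg2, int len2);

static int
varstr_cmp_with_method(const char *arg1, int len1,
					   const char *arg2, int len2, Oid collid)
{
	/* hypothetical lookup of the proposed pg_collation column */
	collation_cmp_fn cmp = lookup_collation_compare_method(collid);

	if (cmp != NULL)
		return cmp(arg1, len1, arg2, len2);	/* locale-specific fast path */

	/* otherwise fall back to the existing strcoll()-based path */
	return varstr_cmp((char *) arg1, len1, (char *) arg2, len2, collid);
}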
Re: [HACKERS] [tiny doc fix] statistics are not retained across immediate shutdown
On Wed, Sep 4, 2013 at 11:56 PM, Andres Freund and...@2ndquadrant.com wrote: On 2013-09-04 10:53:19 -0400, Tom Lane wrote: Tsunakawa, Takayuki tsunakawa.ta...@jp.fujitsu.com writes: I propose a tiny fix to clarify this. Please find the attached patch. That's not an accurate description of what happens, though. AFAIR, we do not throw away pg_stats files as a result of recovery. StartupXLOG() does a pgstat_reset_all() in the if (InRecovery) branch. Yes. And this causes a problem: the statistics data is reset at startup even after a clean shutdown of the standby server. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: Is it necessary to rewrite table while increasing the scale of datatype numeric???
On 2013-09-04 19:30, Noah Misch wrote: On Wed, Sep 04, 2013 at 12:08:48PM +0800, wangs...@highgo.com.cn wrote: I find that it takes a long time when I increase the scale of a numeric datatype. By checking the code, I found that's because it needs to rewrite that table's file. After checking that table's data file, I found only parameter n_header changed. And, I found the data in that numeric field never changed. So I think it's not necessary to rewrite the table's file in this case. Noah Misch n...@leadboat.com wrote: n_header is part of the numeric field's data. That's not just pedantry: the display scale stored in n_header affects how numeric_out() formats the value. Thanks for your reply. Precisely because of what you said, I think increasing the scale only leads to a different display. There's no difference between 5.25 and 5.2500 in use. So there's no need to rewrite the table. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
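For what it's worth, the display scale really is user-visible: numeric_out() consults the scale stored in the header when formatting. A tiny illustrative fragment (based on my reading of utils/adt/numeric.c; treat it as a sketch, not authoritative):

	/* d1 holds 5.25 stored as numeric(10,2); d2 the same digits as numeric(10,4) */
	char	   *s1 = DatumGetCString(DirectFunctionCall1(numeric_out, d1));	/* "5.25" */
	char	   *s2 = DatumGetCString(DirectFunctionCall1(numeric_out, d2));	/* "5.2500" */

So whether "5.25" versus "5.2500" matters "in use" is exactly the point of disagreement here.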
Re: [HACKERS] [rfc] overhauling pgstat.stat
(2013/09/05 3:50), Pavel Stehule wrote: we very successfully use a tmpfs volume for pgstat files (using a backport of multiple statfiles from 9.3 to 9.1). It works quite well as long as you have the objects (tables, indexes, functions) spread across multiple databases. Once you have one database with a very large number of objects, tmpfs is not as effective. It's going to help with stats I/O, but it's not going to help with high CPU usage (you're reading and parsing the stat files over and over), and every rewrite creates a copy of the file. So if you have 400MB stats, you will need 800MB tmpfs + some slack (say, 200MB). That means you'll use ~1GB tmpfs although 400MB would be just fine. And this 600MB won't be used for page cache etc. OTOH, it's true that if you have that many objects, 600MB of RAM is not going to help you anyway. And just an idea - can we use a database for storing these files? The data could be kept in unlogged tables. Second idea - have one bgworker act as a persistent in-memory key-value database, holding the data in memory with some optimizations - using anti-caching and similar in-memory database features. Yeah, I'm interested in this idea too. If the stat collector has a dedicated connection to the backend in order to store statistics into dedicated tables, we can easily take advantage of index (btree, or hash?) and heap storage. Is this worth trying? Regards, -- Satoshi Nagayasu sn...@uptime.jp Uptime Technologies, LLC. http://www.uptime.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ENABLE/DISABLE CONSTRAINT NAME
On 2013-09-05 01:56, Robert Haas wrote: On Tue, Sep 3, 2013 at 3:13 AM, wangs...@highgo.com.cn wrote: On 2013-09-03 08:15, David Johnston wrote: Jeff Davis-8 wrote Is there any semantic difference between marking a constraint as DISABLED and simply dropping it? Or does it just make it easier to re-add it later? David Johnston wrote: I cannot answer the question but if there is none then the main concern I'd have is capturing meta-information about WHY such a constraint has been disabled instead of dropped. Dropping/rebuilding and disabling/enabling a constraint have no fundamental difference, and achieve the same purpose. What I did is also more convenient for the user. Recording the disabled constraints is easier than recording all the constraints. What's more, a lot of people have asked about turning off constraints, and SQL:2008 supports this. So I think it's necessary in some ways. Please add your patch to the upcoming CommitFest so we don't forget about it. https://commitfest.postgresql.org/action/commitfest_view/open Please see also https://wiki.postgresql.org/wiki/Submitting_a_Patch -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company I have added the patch to the Server Features section (https://commitfest.postgresql.org/action/commitfest_view/open). Is this right? If not, please give me more advice, thanks! Wang Shuo HighGo Software Co.,Ltd. September 5, 2013 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [rfc] overhauling pgstat.stat
(2013/09/05 3:59), Alvaro Herrera wrote: Tomas Vondra wrote: My idea was to keep the per-database stats, but allow some sort of random access - updating / deleting the records in place, adding records etc. The simplest way I could think of was adding a simple index - a mapping of OID to position in the stat file. I.e. a simple array of (oid, offset) pairs, stored in oid.stat.index or something like that. This would make it quite simple to access an existing record:

1: get the position from the index
2: read sizeof(Entry) from the file
3: if it's an update, just overwrite the bytes; for a delete, set an isdeleted flag (needs to be added to all entries)

or to read all the records (just read the whole file as today). Sounds reasonable. However, I think the index should be a real index, i.e. have a tree structure that can be walked down, not just a plain array. If you have a 400 MB stat file, then you must have about 4 million tables, and you will not want to scan such a large array every time you want to find an entry. I thought of an array structure at first. But, for now, I think we should have a real index for the statistics data, because we already have several index implementations, and it will allow us to minimize read/write operations. BTW, what kind of index would be preferred for this purpose? btree or hash? If we use btree, do we need range scans on the statistics tables? I have no idea so far. Regards, -- Satoshi Nagayasu sn...@uptime.jp Uptime Technologies, LLC. http://www.uptime.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
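To make the simple variant concrete, here is a minimal sketch of what update-in-place through a flat (oid, offset) index could look like. All names except PgStat_StatTabEntry and Oid are invented for illustration, the append and delete paths are omitted, and whether the index should instead be a tree, as suggested above, is exactly the open question.

#include <stdio.h>

typedef struct StatIndexEntry
{
	Oid			oid;			/* table oid */
	off_t		offset;			/* byte position of its entry in oid.stat */
} StatIndexEntry;

static int
update_table_stats(FILE *statfile, const StatIndexEntry *index, int nentries,
				   Oid tableoid, const PgStat_StatTabEntry *newentry)
{
	int			i;

	/* linear scan; a sorted array could use a binary search instead */
	for (i = 0; i < nentries; i++)
	{
		if (index[i].oid == tableoid)
		{
			if (fseeko(statfile, index[i].offset, SEEK_SET) != 0)
				return -1;
			/* overwrite just this entry, typically well under 100 bytes */
			if (fwrite(newentry, sizeof(*newentry), 1, statfile) != 1)
				return -1;
			return 0;
		}
	}
	return -1;					/* not found; append path omitted */
}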
Re: [HACKERS] [rfc] overhauling pgstat.stat
Sent from my iPad On 05-Sep-2013, at 8:58, Satoshi Nagayasu sn...@uptime.jp wrote: (2013/09/05 3:59), Alvaro Herrera wrote: Tomas Vondra wrote: My idea was to keep the per-database stats, but allow some sort of random access - updating / deleting the records in place, adding records etc. The simplest way I could think of was adding a simple index - a mapping of OID to position in the stat file. I.e. a simple array of (oid, offset) pairs, stored in oid.stat.index or something like that. This would make it quite simple to access existing record 1: get position from the index 2: read sizeof(Entry) from the file 3: if it's update, just overwrite the bytes, for delete set isdeleted flag (needs to be added to all entries) or reading all the records (just read the whole file as today). Sounds reasonable. However, I think the index should be a real index, i.e. have a tree structure that can be walked down, not just a plain array. If you have a 400 MB stat file, then you must have about 4 million tables, and you will not want to scan such a large array every time you want to find an entry. I thought an array structure at first. But, for now, I think we should have a real index for the statistics data because we already have several index storages, and it will allow us to minimize read/write operations. BTW, what kind of index would be preferred for this purpose? btree or hash? If we use btree, do we need range scan thing on the statistics tables? I have no idea so far. The thing I am interested in is range scan. That is the reason I wish to explore range tree usage here, maybe as a secondary index. Regards, Atri -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [9.4] Make full_page_writes only settable on server start?
On Thu, Sep 5, 2013 at 1:55 AM, Andres Freund and...@2ndquadrant.com wrote: On 2013-09-04 09:23:20 -0700, Jeff Davis wrote: On Wed, 2013-09-04 at 11:32 -0400, Tom Lane wrote: Jeff Davis pg...@j-davis.com writes: I think code complexity matters quite a lot. If we can eliminate some complex code in a complex area, and all we give up is a feature with essentially no use case, that sounds like we're moving in the right direction to me. Isn't this whole discussion academic in view of Andres' point? Maybe complex code was an overstatement. We'd be able to eliminate the XLOG_FPW_CHANGE, UpdateFullPageWrites(), and one of the members of XLogCtlInsert; and make xlog.c slightly shorter in the process. That path is also executed during a normal restart and during promotion. Check the invocation of UpdateFullPageWrites() in StartupXLOG(). Note that a standby needs to be able to follow a primary's full_page_writes setting during a promotion. Yes, this is required for taking a backup from the standby. If we change the GUC context to PGC_POSTMASTER, I think that we can remove XLOG_FPW_CHANGE and treat full_page_writes the same way as wal_level and max_connections, i.e., as a parameter which CheckRequiredParameterValues() handles. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
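If full_page_writes did become PGC_POSTMASTER, the standby-side check could plausibly follow the existing pattern in CheckRequiredParameterValues(). The fragment below is purely a sketch of that idea: pg_control does not currently carry a fullPageWrites field, so both the field and the exact condition are assumptions for illustration.

	/* hypothetical addition to CheckRequiredParameterValues() in xlog.c */
	if (InArchiveRecovery && EnableHotStandby)
	{
		/* assumed new pg_control field recording the primary's setting */
		if (!ControlFile->fullPageWrites)
			ereport(ERROR,
					(errmsg("hot standby is not possible because full_page_writes was off on the primary")));
	}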
Re: [HACKERS] Is it necessary to rewrite table while increasing the scale of datatype numeric?
On 2013-09-04 23:41, Jeff Janes wrote: On Tue, Sep 3, 2013 at 9:08 PM, wangs...@highgo.com.cn wrote: Hi, Hackers! I find that it takes a long time when I increase the scale of a numeric datatype. By checking the code, I found that's because it needs to rewrite that table's file. After checking that table's data file, I found only parameter n_header changed. And, I found the data in that numeric field never changed. So I think it's not necessary to rewrite the table's file in this case. Anyone who has more ideas about this, please come talk about it! Jeff Janes jeff.ja...@gmail.com wrote: This was fixed in version 9.2. You must be using an older version. Cheers, Jeff

Thanks for your reply. To declare a column of type numeric, use the syntax: NUMERIC(precision, scale). What I am talking about is this scale, not the case you mean. I made a test on PG 9.2, as follows:

postgres=# select version();
                                                   version
---------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.2.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.7.2 20121109 (Red Hat 4.7.2-8), 64-bit
(1 row)

postgres=# create table tt(t1 numeric(10,5));
CREATE TABLE
postgres=# insert into tt values (2.56);
INSERT 0 1
postgres=# select pg_relation_filepath('tt');
 pg_relation_filepath
----------------------
 base/12914/16384
(1 row)

postgres=# alter table tt alter COLUMN t1 type numeric(10,6);
ALTER TABLE
postgres=# select pg_relation_filepath('tt');
 pg_relation_filepath
----------------------
 base/12914/16387
(1 row)

The relfilenode changed, so the table was still rewritten on 9.2, even though, as I argued above, there's no need to rewrite the table.

Wang Shuo HighGo Software Co.,Ltd. September 5, 2013 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [rfc] overhauling pgstat.stat
Satoshi Nagayasu wrote: But, for now, I think we should have a real index for the statistics data because we already have several index storages, and it will allow us to minimize read/write operations. BTW, what kind of index would be preferred for this purpose? btree or hash? I find it hard to get excited about using the AM interface for this purpose. To me it makes a lot more sense to have separate, much simpler code. We don't need any transactionality, user defined types, user defined operators, or anything like that. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers