Re: [HACKERS] [PATCH] Use MAP_HUGETLB where supported (v3)
On Tue, Oct 29, 2013 at 9:31 PM, Tom Lane wrote: > Sergey Konoplev writes: >> On Wed, Oct 23, 2013 at 11:03 PM, Abhijit Menon-Sen >> wrote: >>> This is a slightly reworked version of the patch submitted by Richard >>> Poole last month, which was based on Christian Kruse's earlier patch. > >> Is it possible that this patch will be included in a minor version of >> 9.3? IMHO hugepages is a very important ability that postgres lost in >> 9.3, and it would be great to have it back ASAP. > > Say what? There's never been any hugepages support in Postgres. There was an ability to back shared memory with hugepages when using <=9.2. I have been using it on ~30 servers for several years, and it brings an 8-17% performance improvement depending on the memory size. You will find several paragraphs describing how to do it at https://github.com/grayhemp/pgcookbook/blob/master/database_server_configuration.md; just search for the word 'hugepages' on the page. -- Kind regards, Sergey Konoplev PostgreSQL Consultant and DBA http://www.linkedin.com/in/grayhemp +1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979 gray...@gmail.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Something fishy happening on frogmouth
On Wed, Oct 30, 2013 at 12:42 AM, Tom Lane wrote: > The last two buildfarm runs on frogmouth have failed in initdb, > like this: > > creating directory > d:/mingw-bf/root/HEAD/pgsql.2492/src/test/regress/./tmp_check/data ... ok > creating subdirectories ... ok > selecting default max_connections ... 100 > selecting default shared_buffers ... 128MB > selecting dynamic shared memory implementation ... windows > creating configuration files ... ok > creating template1 database in > d:/mingw-bf/root/HEAD/pgsql.2492/src/test/regress/./tmp_check/data/base/1 ... > FATAL: could not open shared memory segment "Global/PostgreSQL.851401618": > Not enough space > child process exited with exit code 1 In the Windows implementation of dynamic shared memory, the size calculation for creating a dynamic shared memory segment assumes that the requested size is uint64. That type was changed by commit d2aecae, so we need to change the calculation as well. Please find the attached patch to fix this problem. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com bug_size_calc_dsm_win.patch Description: Binary data
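The attached patch is not shown here, but the hazard it fixes is easy to illustrate. On Windows, the segment size is handed to CreateFileMapping() as two 32-bit halves, and the split is only well-defined on a genuinely 64-bit value. A hypothetical sketch (split_size() is an illustrative name, not a PostgreSQL function):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical illustration, not the actual patch: the segment size
 * must be split into 32-bit halves while it is still a 64-bit value.
 * If the request size has silently become a 32-bit Size on a 32-bit
 * build, "size >> 32" is undefined behaviour in C and the high half
 * is garbage.
 */
static void
split_size(uint64_t request, uint32_t *size_high, uint32_t *size_low)
{
    *size_high = (uint32_t) (request >> 32);
    *size_low = (uint32_t) (request & 0xFFFFFFFFu);
}
```

On a 32-bit build, the widening cast to a 64-bit type has to happen before the shift; that is the kind of calculation the patch presumably adjusts.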
Re: [HACKERS] [PATCH] Use MAP_HUGETLB where supported (v3)
At 2013-10-24 19:00:28 +0200, and...@2ndquadrant.com wrote: > > I think we should log when we tried to use hugepages but fell back to > plain mmap, currently it's hard to see whether they are used. Good idea, thanks. I'll do this in the next patch I post (which will be after we reach some consensus about how to handle the rounding problem). -- Abhijit
Re: [HACKERS] [PATCH] Use MAP_HUGETLB where supported (v3)
At 2013-10-24 16:06:19 +0300, hlinnakan...@vmware.com wrote: > > Let's get rid of the rounding. I share Andres's concern that the bug is present in various recent kernels that are going to stick around for quite some time. Given the rather significant performance gain, I think it's worth doing something, though I'm not a big fan of the directory-scanning code myself. As a compromise, perhaps we can unconditionally round the size up to be a multiple of 2MB? That way, we can use huge pages more often, but also avoid putting a lot of code and effort into the workaround, wasting only a little space (if any at all). > Other comments: > > * guc.c doesn't actually need sys/mman.h for anything. Getting rid > of the #include also lets you remove the configure test. You're right, guc.c doesn't use it any more; I've removed the #include. sysv_shmem.c does use it (MAP_*, PROT_*), however, so I've left the test in configure alone. I see that sys/mman.h is included elsewhere with an #ifdef WIN32 or HAVE_SHM_OPEN guard, but HAVE_SYS_MMAN_H seems better. > * the documentation should perhaps mention that the setting only has > an effect if POSIX shared memory is used. As Robert said, this is not correct, so I haven't changed anything. -- Abhijit
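The rounding compromise is simple to state in code. A minimal sketch, assuming a fixed 2MB huge page size (a real implementation might query the kernel instead, e.g. via /proc/meminfo); round_up_to_huge_page() is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Sketch of the proposed compromise: unconditionally round the shared
 * memory request up to a multiple of the huge page size, so that an
 * mmap(..., MAP_HUGETLB) attempt can succeed more often.  The 2MB
 * constant is an assumption; Linux also supports other huge page sizes.
 */
#define HUGE_PAGE_SIZE ((size_t) 2 * 1024 * 1024)

static size_t
round_up_to_huge_page(size_t size)
{
    /* Add one page minus one byte, then clear the low-order bits. */
    return (size + HUGE_PAGE_SIZE - 1) & ~(HUGE_PAGE_SIZE - 1);
}
```

The wasted space is bounded by one huge page per segment, which is the "only a little space" trade-off described above.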
Re: [HACKERS] [PATCH] Use MAP_HUGETLB where supported (v3)
Sergey Konoplev writes: > On Wed, Oct 23, 2013 at 11:03 PM, Abhijit Menon-Sen > wrote: >> This is a slightly reworked version of the patch submitted by Richard >> Poole last month, which was based on Christian Kruse's earlier patch. > Is it possible that this patch will be included in a minor version of > 9.3? IMHO hugepages is a very important ability that postgres lost in > 9.3, and it would be great to have it back ASAP. Say what? There's never been any hugepages support in Postgres. regards, tom lane
Re: [HACKERS] [PATCH] Use MAP_HUGETLB where supported (v3)
Hi, On Wed, Oct 23, 2013 at 11:03 PM, Abhijit Menon-Sen wrote: > This is a slightly reworked version of the patch submitted by Richard > Poole last month, which was based on Christian Kruse's earlier patch. Is it possible that this patch will be included in a minor version of 9.3? IMHO hugepages is a very important ability that postgres lost in 9.3, and it would be great to have it back ASAP. Thank you. -- Kind regards, Sergey Konoplev PostgreSQL Consultant and DBA http://www.linkedin.com/in/grayhemp +1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979 gray...@gmail.com
Re: [HACKERS] Re: Using indexes for ORDER BY and PARTITION BY clause in windowing functions
Hello, > > With this index, you will get a different plan like this, > > > Exactly my point, can we look at making windowing functions > smart and make use of available indexes? I might have guessed.. > > Does this satisfies your needs? > > > Not exactly. If I have missed to mention, this is not a > production issue for me. I am trying to see if PostgreSQL > planner produces best plans for Data Warehouse and mining > oriented queries. I agree to the point. > I think Hashes can be efficiently used for sorting (and I > believe they are used for joins too when a pre-sorted data set > is not available via indexes). This again could my > misinterpretation. That is true if 'sorting' means 'key classification without ordering'. I'm convinced that hashes should always appear on the inner side of a join. The 'ordered' nature is not required if the outer side is already ordered; if not, a separate sort will be needed. > I lost you somewhere here. My be this is above my pay-grade :-) Sorry for my clumsy English :-< > Well, at least with Oracle and DB2 planners I have seen that > the plan produced with dense_rank performs better than a series > of nested SELECT MAX(). I see your point, although I don't know what plans they generate, and I don't see how to do ordering and ranking without sorting. Could you let me see what they look like? # Nevertheless, I'm not confident that I can be of much help.. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [HACKERS] How should row-security affects ON UPDATE RESTRICT / CASCADE ?
On 10/30/2013 11:25 AM, Kohei KaiGai wrote: > 2013/10/30 Craig Ringer : >> On 10/30/2013 10:50 AM, Tom Lane wrote: >>> Craig Ringer writes: > I'd kind of like to see FK constraints affected by RLS for > non-superusers, at least as an option. >>> I think that's a complete nonstarter. Aside from the fact that such a >>> constraint will have no definable semantics, even the possibility that a >>> constraint doesn't mean what it appears to mean will prevent us from >>> making use of FK constraints for optimization --- something that's >>> pretty high on the planner to-do list. >> >> Good point. That's another good argument for FK constraints to disregard >> RLS. In which case, is there actually any way to determine when an SPI >> query is being invoked directly from an FK constraint? We'll need a way >> to tell so RLS can skip adding the row-security check predicate. >> > For your reference, my implementation patches on ri_PerformCheck() > as follows. It didn't skip all the case (only when PK is modified), however, > its overall idea can be common. That makes plenty of sense. The only concern that comes immediately to mind for me there is what happens when the RI trigger, running with SECURITY_ROW_LEVEL_DISABLED context, does a cascade UPDATE or DELETE that results in the invocation of user-defined triggers. Otherwise an RLS-constrained user who owns a table could add a trigger to that table that, when executed via cascade from an RI check, leaks information about other tables it queries while RLS is disabled. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] How should row-security affects ON UPDATE RESTRICT / CASCADE ?
2013/10/30 Craig Ringer : > On 10/30/2013 10:50 AM, Tom Lane wrote: >> Craig Ringer writes: >>> > I'd kind of like to see FK constraints affected by RLS for >>> > non-superusers, at least as an option. >> I think that's a complete nonstarter. Aside from the fact that such a >> constraint will have no definable semantics, even the possibility that a >> constraint doesn't mean what it appears to mean will prevent us from >> making use of FK constraints for optimization --- something that's >> pretty high on the planner to-do list. > > Good point. That's another good argument for FK constraints to disregard > RLS. In which case, is there actually any way to determine when an SPI > query is being invoked directly from an FK constraint? We'll need a way > to tell so RLS can skip adding the row-security check predicate. > For your reference, my implementation patches ri_PerformCheck() as follows. It doesn't skip every case (only when the PK is modified); however, its overall idea can be made common.

--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -3008,6 +3008,7 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
 	int			spi_result;
 	Oid			save_userid;
 	int			save_sec_context;
+	int			temp_sec_context;
 	Datum		vals[RI_MAX_NUMKEYS * 2];
 	char		nulls[RI_MAX_NUMKEYS * 2];
@@ -3087,8 +3088,18 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
 	/* Switch to proper UID to perform check as */
 	GetUserIdAndSecContext(&save_userid, &save_sec_context);
+
+	/*
+	 * Row-level security should be disabled in the case when the
+	 * foreign-key relation is queried to check the existence of tuples
+	 * that reference the primary key being modified.
+	 */
+	temp_sec_context = save_sec_context | SECURITY_LOCAL_USERID_CHANGE;
+	if (source_is_pk)
+		temp_sec_context |= SECURITY_ROW_LEVEL_DISABLED;
+
 	SetUserIdAndSecContext(RelationGetForm(query_rel)->relowner,
-						   save_sec_context | SECURITY_LOCAL_USERID_CHANGE);
+						   temp_sec_context);
 	/* Finally we can run the query. */
 	spi_result = SPI_execute_snapshot(qplan,

-- KaiGai Kohei
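To answer Craig's question about detecting FK-origin queries: with a patch along those lines, the row-security code would only need to consult the security context before adding its predicate. A hypothetical sketch (the flag values are assumptions, and SECURITY_ROW_LEVEL_DISABLED is the new flag proposed by the patch, not an existing PostgreSQL constant):

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values are assumptions for illustration only. */
#define SECURITY_LOCAL_USERID_CHANGE  0x0001
#define SECURITY_ROW_LEVEL_DISABLED   0x0004   /* proposed by the patch */

/*
 * Hypothetical helper: the row-security code would skip adding its
 * check predicate whenever the current security context shows that an
 * RI check has set SECURITY_ROW_LEVEL_DISABLED.
 */
static bool
row_security_applies(int sec_context)
{
    return (sec_context & SECURITY_ROW_LEVEL_DISABLED) == 0;
}
```

This keeps the decision local to the code that builds the query, without the planner needing to know anything about FK triggers.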
Re: [HACKERS] UNION ALL on partitioned tables won't use indices.
Hello, > Please add your patches to the currently-open CommitFest so that we > don't lose track of them. > > https://commitfest.postgresql.org/action/commitfest_view/open > > I'm not sure which approach to this problem is best, but I agree that > it is worth solving. Thank you, I've registered this on CF3. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [HACKERS] How should row-security affects ON UPDATE RESTRICT / CASCADE ?
On 10/30/2013 10:50 AM, Tom Lane wrote: > Craig Ringer writes: >> > I'd kind of like to see FK constraints affected by RLS for >> > non-superusers, at least as an option. > I think that's a complete nonstarter. Aside from the fact that such a > constraint will have no definable semantics, even the possibility that a > constraint doesn't mean what it appears to mean will prevent us from > making use of FK constraints for optimization --- something that's > pretty high on the planner to-do list. Good point. That's another good argument for FK constraints to disregard RLS. In which case, is there actually any way to determine when an SPI query is being invoked directly from an FK constraint? We'll need a way to tell so RLS can skip adding the row-security check predicate. Users who want FK-constraint-like behaviour can DIY with triggers, getting whatever behaviour they need in the face of RLS. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] How should row-security affects ON UPDATE RESTRICT / CASCADE ?
Craig Ringer writes: > I'd kind of like to see FK constraints affected by RLS for > non-superusers, at least as an option. I think that's a complete nonstarter. Aside from the fact that such a constraint will have no definable semantics, even the possibility that a constraint doesn't mean what it appears to mean will prevent us from making use of FK constraints for optimization --- something that's pretty high on the planner to-do list. regards, tom lane
Re: [HACKERS] How should row-security affects ON UPDATE RESTRICT / CASCADE ?
On 10/29/2013 11:21 PM, Kohei KaiGai wrote: > My vote is, system should keep referencial integrity as if RLS policy is > not configured. It is more fundamental stuff than RLS policy per user > basis. > I agree, and right now that is not how it works, causing some pretty confusing behaviour. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] How should row-security affects ON UPDATE RESTRICT / CASCADE ?
On 10/29/2013 10:01 PM, Tom Lane wrote: > As I recall, I've been saying since day one that row-level security cannot > sensibly coexist with foreign-key constraints, and I've been told that the > potential users of such a feature don't care. I'm glad to see somebody > else complaining. I'm concerned, rather than complaining. It seems other DBMS vendors just say "FK constraints are exempt from RLS". In the absence of a more consistent way to do it this might be the best option. My concern is that right now the superuser is still affected by RLS because triggers on tables owned by non-superusers run with the rights (and therefore RLS visibility) of those users even in response to operations invoked by the superuser. If the superuser can see a row, but when they insert a row into another table that references it they get an error, that seems just plain wrong to me. I'd kind of like to see FK constraints affected by RLS for non-superusers, at least as an option. It'd be really handy when you do have consistent RLS visibility rules across a set of tables. Problems only arise when the RLS visibility rules _differ_ between referrer and referee. > Here's another example wherein there basically isn't a sensible solution: > suppose you have delete rights on table A, and there is a table B > with a foreign-key reference to A, and RLS says that there are rows in > B that you can't see. You try to delete some row in A that is referenced > by an invisible-to-you row in B. There are only two possible outcomes: > the system refuses your request, and thereby exposes to you the fact that > a referencing row exists; or the system allows the FK constraint to be > violated. Yep, that's the flip-side of the ON DELETE CASCADE. > As far as the points you're making go, I think we must say that RLS checks > are not applied during FK trigger queries, ie the FK triggers can always > see everything even though they don't run as superuser. I think that's the sane way to go for now. If we can come up with a way of making FK constraints or some RLS-aware variant of them work, I tend to think that's a separate job from implementing the core of RLS. > Otherwise you're > going to end up with constraint violations, and as a database weenie > I consider that unacceptable. Yeah, and we can't re-check FKs for every combination of users for every FK reference whenever any FK or RLS rule changes; not sanely, anyway. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Something fishy happening on frogmouth
On 10/29/2013 03:47 PM, Andrew Dunstan wrote: On 10/29/2013 03:12 PM, Tom Lane wrote: It may not be unrelated that this machine was happy before commit d2aecae went in. I'll try a run with that reverted just to see if that's it. This is a 32 bit compiler on a 32 bit (virtual) machine, so the change to Size is definitely more than cosmetic here. And with this reverted it's perfectly happy. cheers andrew
Re: [HACKERS] Fast insertion indexes: why no developments
> Hmm, you realise Alvaro is working on MinMax indexes in this release? > They are very efficient with regard to index inserts and specially > designed for use on large tables. > > Prior work by Heikki on Grouped Item Tuples was a way of reducing the > size of indexes, yet still allowing uniqueness checks. That is > implemented in SQLServer already and is very useful. Ah! I didn't know that! > Your comment about the lack of development in indexes seems counter to > the literature that I've seen. The main problem is people keep > patenting things, making it fairly difficult for everyone. Mmh, maybe I wasn't clear: I meant the lack of development (maybe I should have said "implementation"?) in PostgreSQL and in the other SQL databases of the fast-insertion indexes you can find in the literature.
Re: [HACKERS] Something fishy happening on frogmouth
On 10/29/2013 03:12 PM, Tom Lane wrote: The last two buildfarm runs on frogmouth have failed in initdb, like this: creating directory d:/mingw-bf/root/HEAD/pgsql.2492/src/test/regress/./tmp_check/data ... ok creating subdirectories ... ok selecting default max_connections ... 100 selecting default shared_buffers ... 128MB selecting dynamic shared memory implementation ... windows creating configuration files ... ok creating template1 database in d:/mingw-bf/root/HEAD/pgsql.2492/src/test/regress/./tmp_check/data/base/1 ... FATAL: could not open shared memory segment "Global/PostgreSQL.851401618": Not enough space child process exited with exit code 1 It shouldn't be failing like that, considering that we just finished probing for acceptable max_connections and shared_buffers without hitting any apparent limit. I suppose it's possible that the final shm segment size is a bit larger than what was tested at the shared_buffer step, but that doesn't seem very likely to be the explanation. What seems considerably more probable is that the probe for a shared memory implementation is screwing up the system state somehow. It may not be unrelated that this machine was happy before commit d2aecae went in. I'll try a run with that reverted just to see if that's it. This is a 32 bit compiler on a 32 bit (virtual) machine, so the change to Size is definitely more than cosmetic here. cheers andrew
Re: [HACKERS] Fast insertion indexes: why no developments
On 29 October 2013 07:53, Leonardo Francalanci wrote: > I don't see much interest in insert-efficient indexes. Hmm, you realise Alvaro is working on MinMax indexes in this release? They are very efficient with regard to index inserts and specially designed for use on large tables. Prior work by Heikki on Grouped Item Tuples was a way of reducing the size of indexes, yet still allowing uniqueness checks. That is implemented in SQLServer already and is very useful. Your comment about the lack of development in indexes seems counter to the literature that I've seen. The main problem is people keep patenting things, making it fairly difficult for everyone. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] Something fishy happening on frogmouth
The last two buildfarm runs on frogmouth have failed in initdb, like this: creating directory d:/mingw-bf/root/HEAD/pgsql.2492/src/test/regress/./tmp_check/data ... ok creating subdirectories ... ok selecting default max_connections ... 100 selecting default shared_buffers ... 128MB selecting dynamic shared memory implementation ... windows creating configuration files ... ok creating template1 database in d:/mingw-bf/root/HEAD/pgsql.2492/src/test/regress/./tmp_check/data/base/1 ... FATAL: could not open shared memory segment "Global/PostgreSQL.851401618": Not enough space child process exited with exit code 1 It shouldn't be failing like that, considering that we just finished probing for acceptable max_connections and shared_buffers without hitting any apparent limit. I suppose it's possible that the final shm segment size is a bit larger than what was tested at the shared_buffer step, but that doesn't seem very likely to be the explanation. What seems considerably more probable is that the probe for a shared memory implementation is screwing up the system state somehow. It may not be unrelated that this machine was happy before commit d2aecae went in. regards, tom lane
Re: [HACKERS] Fast insertion indexes: why no developments
Jeff Janes writes: > Robert removed the lmgr lock on the meta page by using a retry loop with > lightweight locks. I've outlined how to remove the heavyweight lock on the > bucket page as well, but it would require an on-disk change to the index so > that each page knows how far the bucket it is in has been split, and it > also might abuse the intention of lightweight locks a bit. FWIW, I don't think that on-disk changes to hash indexes would be a showstopper problem at this point. We could force people to reindex them by means of changing the index version number on the metapage. The reindex downtime would be annoying for production situations --- but given the lack of WAL support, who'd be using one in production anyway? > But I'm > reluctant to put much time into that without there being any prospects of > solving the problem of how to WAL bucket splits when buckets can have an > unbounded number of overflow pages. Agreed, if we don't see how to implement WAL logging then it's improbable they'll ever get to production quality :-(. ISTM the issue here is that we'd need to acknowledge incompletely-split buckets as a valid state, no? But that could be a good thing anyway, if it'd mean that we don't have to completely lock a bucket while splitting it. All the other index types have comparable situations where a maintenance operation might be only partly done. Not that I'm volunteering to put any time into this myself. regards, tom lane
Re: [HACKERS] Fast insertion indexes: why no developments
On Tue, Oct 29, 2013 at 8:16 AM, Tom Lane wrote: > Leonardo Francalanci writes: > >> Before getting too excited about some new academic index type, it's > worth > >> noting the sad state in which hash indexes have languished for years. > > > Aren't hash indexes in a poor state because they are not faster than > btree in every condition? > > They should, in theory, be faster than btrees -- O(1) not O(log N) page > fetches per lookup. However, all but one or two of those page fetches are almost surely cached, so if the problem is IO then the benefits are not likely to be seen. > In practice they don't seem to be faster, and > nobody's bothered to find out exactly why. We know why, more or less. Hash indexes use lmgr locks to protect against bucket splits conflicting with ordinary operations, and that destroys performance even in isolation, and destroys it even more in concurrent situations. Robert removed the lmgr lock on the meta page by using a retry loop with lightweight locks. I've outlined how to remove the heavyweight lock on the bucket page as well, but it would require an on-disk change to the index so that each page knows how far the bucket it is in has been split, and it also might abuse the intention of lightweight locks a bit. But I'm reluctant to put much time into that without there being any prospects of solving the problem of how to WAL bucket splits when buckets can have an unbounded number of overflow pages. (Once each page knows its own split level, we could also remove the need for even a light-weight lock on the metapage for most operations by stuffing some of the key info from that into the relcache.) Cheers, Jeff
Re: [HACKERS] logical changeset generation v6.2
On Tue, Oct 29, 2013 at 11:43 AM, Andres Freund wrote: >> I think modifying GetNewRelFileNode() is attacking the problem from >> the wrong end. The point is that when a table is dropped, that fact >> can be communicated to the same machinery that's been tracking >> the CTID->CTID mappings. Instead of saying "hey, the tuples that were >> in relfilenode 12345 are now in relfilenode 67890 in these new >> positions", it can say "hey, the tuples that were in relfilenode 12345 >> are now GONE". > > Unfortunately I don't understand what you're suggesting. What I am > worried about is something like: > > <- decoding is here > VACUUM FULL pg_class; -- rewrites filenode 1 to 2 > VACUUM FULL pg_class; -- rewrites filenode 2 to 3 > VACUUM FULL pg_class; -- rewrites filenode 3 to 1 > <- now decode up to here > > In this case there are two possible (cmin,cmax) values for a specific > tuple. One from the original filenode 1 and one for the one generated > from 3. > Now that will only happen if there's an oid wraparound which hopefully > shouldn't happen very often, but I'd like to not rely on that. Ah, OK. I didn't properly understand the scenario you were concerned about. There's only a potential problem here if we get behind by more than 4 billion relfilenodes, which seems remote, but maybe not: http://www.pgcon.org/2013/schedule/events/595.en.html This still seems to me to be basically an accounting problem. At any given time, we should *know* where the catalog tuples are located. We can't be decoding changes that require a given system catalog while that system catalog is locked, so any given decoding operation happens either before or after, not during, the rewrite of the corresponding catalog. As long as that VACUUM FULL operation is responsible for updating the logical decoding metadata, we should be fine. Any relcache entries referencing the old relfilenode need to be invalidated, and any CTID->[cmin,cmax] maps we're storing for those old relfilenodes need to be invalidated, too. >> >> Completely aside from this issue, what >> >> keeps a relation from being dropped before we've decoded all of the >> >> changes made to its data before the point at which it was dropped? (I >> >> hope the answer isn't "nothing".) >> > >> > Nothing. But there's no need to prevent it, it'll still be in the >> > catalog and we don't ever access a non-catalog relation's data during >> > decoding. >> >> Oh, right. But what about a drop of a user-catalog table? > > Currently nothing prevents that. I am not sure it's worth worrying about > it, do you think we should? Maybe. Depends partly on how ugly things get if it happens, I suppose. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] CLUSTER FREEZE
On Tue, Oct 29, 2013 at 11:37 AM, Andres Freund wrote: > On 2013-10-29 11:29:24 -0400, Robert Haas wrote: >> On Tue, Oct 29, 2013 at 10:32 AM, Andres Freund >> wrote: >> > On 2013-10-25 09:26:29 -0400, Robert Haas wrote: >> >> > In any case, it's very far from obvious to me that CLUSTER ought >> >> > to throw away information by default, which is what you're proposing. >> >> >> >> I find it odd to referring to this as throwing away information. I >> >> know that you have a general concern about throwing away XIDs that are >> >> still needed for forensic purposes, but that is clearly the ONLY >> >> purpose that those XIDs serve, and the I/O advantages of freezing by >> >> default could be massive for many of our users. What's going to >> >> happen in practice is that experienced users will simply recommend >> >> CLUSTER FREEZE rather than plain CLUSTER, and you won't have the >> >> forensic information *anyway*. >> > >> > I think we should just apply your "preserve forensic information when >> > freezing" patch. Then we're good to go without big arguments ;) >> >> Well, I'm happy with that, too. But you wanted it significantly >> reworked and I haven't had time to do that. > > I did? I only seem to remember suggesting to introduce > HeapTupleHeaderGetRawXmin() and some bugfix around rewriteheap.c? I > think the RawXmin() thing is a judgement call... Well every place that currently gets the xmin will have to be changed to get the raw-xmin instead, with the exception of hunks like this:

-	targetxmin = HeapTupleHeaderGetXmin(tuple->t_data);
+	if (HeapTupleHeaderXminFrozen(tuple->t_data))
+		targetxmin = FrozenTransactionId;
+	else
+		targetxmin = HeapTupleHeaderGetXmin(tuple->t_data);

...which will instead need to be reverted. The rename is mostly mechanical, but going through and looking for places where the difference between Xmin() and RawXmin() means that other hunks can be reverted is less so. I suppose it wouldn't take more than a few hours; I've just been up to my ears in parallelism stuff. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
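The idea behind the "preserve forensic information when freezing" patch can be shown with a toy model (not PostgreSQL's actual headers; the struct and bit value here are illustrative assumptions): freezing sets an infomask bit instead of overwriting xmin, the visibility-facing accessor reports FrozenTransactionId, and the raw accessor still returns what is physically on disk.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define FrozenTransactionId   ((TransactionId) 2)
#define TOY_XMIN_FROZEN       0x0001   /* assumed bit, for illustration */

/* Toy stand-in for HeapTupleHeaderData. */
typedef struct
{
    uint16_t      infomask;
    TransactionId xmin;
} ToyTupleHeader;

/* Raw accessor: what is physically stored, kept for forensics. */
static TransactionId
toy_get_raw_xmin(const ToyTupleHeader *tup)
{
    return tup->xmin;
}

/* Visibility accessor: frozen tuples look like FrozenTransactionId. */
static TransactionId
toy_get_xmin(const ToyTupleHeader *tup)
{
    return (tup->infomask & TOY_XMIN_FROZEN) ? FrozenTransactionId
                                             : tup->xmin;
}

/* Freezing flips the bit but leaves the stored xmin intact. */
static void
toy_freeze(ToyTupleHeader *tup)
{
    tup->infomask |= TOY_XMIN_FROZEN;
}
```

This is why hunks like the one above become unnecessary once the Xmin() accessor itself hides the frozen state: callers that want the visibility answer keep calling Xmin(), and only forensic code switches to RawXmin().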
Re: [HACKERS] Fast insertion indexes: why no developments
> I bet you've mis-diagnosed the problem. Btrees don't have a problem > keeping up with 50m records; your problem is that after a certain > point your page cache can't keep up with the pseudo-random i/o > patterns and you start seeing faults to storage. > [...] This has nothing to do with the btree algorithm except to the > extent it affects i/o patterns. Of course; that's why those "different" index types aim to use more sequential than random writes. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fast insertion indexes: why no developments
On Tue, Oct 29, 2013 at 7:53 AM, Leonardo Francalanci wrote: > I don't see much interest in insert-efficient indexes. Presumably someone will get around to implementing a btree index insertion buffer one day. I think that would be a particularly compelling optimization for us, because we could avoid ever inserting index tuples that are already dead when the deferred insertion actually occurs. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fast insertion indexes: why no developments
On Tue, Oct 29, 2013 at 10:49 AM, Leonardo Francalanci wrote: >> Another point to add: I don't really see btree as a barrier to >> performance for most of the problems I face. The real barriers to >> database performance are storage, contention, and query planning. > > Ehm, that's true for regular OLTP stuff, which I understand is what most > (95%?) of people use/need. But if you try to insert rows into a 50M table > with a couple of indexes, btrees just can't keep up. > Of course, you can't have it all: fast at big table insertion, good > contention, good query times... > >> Postgres btree indexes are pretty fast and for stuff like bulk >> insertions there are some optimization techniques available (such as >> sharding or create index concurrently). > > > At the moment I'm relying on partitioning + creating indexes in bulk on > "latest" table (the partitioning is based on time). But that means K*log(N) > search times (where K is the number of partitions). > That's why I took a look at these different indexing mechanisms. I bet you've mis-diagnosed the problem. Btrees don't have a problem keeping up with 50m records; your problem is that after a certain point your page cache can't keep up with the pseudo-random i/o patterns and you start seeing faults to storage. Disk storage is several orders of magnitude slower than memory, and thus performance collapses. This has nothing to do with the btree algorithm except to the extent it affects i/o patterns. With the advances in storage over the last several years, such that commodity-priced SSD is available, I think a lot of the assumptions underlying these trade-offs will change. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fast insertion indexes: why no developments
On Tue, Oct 29, 2013 at 1:10 PM, Peter Geoghegan wrote: > On Tue, Oct 29, 2013 at 7:53 AM, Leonardo Francalanci > wrote: > > I don't see much interest in insert-efficient indexes. > > Presumably someone will get around to implementing a btree index > insertion buffer one day. I think that would be a particularly > compelling optimization for us, because we could avoid ever inserting > index tuples that are already dead when the deferred insertion > actually occurs. Well, that should be relatively easy the way web mining does it (with inverted indexes). Have a small (presumably RAM-fitting) staging index where inserts take place, and regularly dump it into the "master index". The rationale is that, by the time you dump it, for one it'll be more efficient to do many inserts at once, and for two there will be lots of dead tuples you don't even have to consider. And when I say relatively easy, I mean it in the sense that it only needs careful juggling of existing data structures.
Re: [HACKERS] How should row-security affects ON UPDATE RESTRICT / CASCADE ?
Tom Lane-2 wrote > Craig Ringer < > craig@ > > writes: >> During my testing of Kohei KaiGai's row-security patches I've been >> looking into how foreign keys should be and are handled. There are some >> interesting wrinkles around FK cascades, the rights under which FK >> checks execute, and about the consistency effects of changing or >> applying an RLS policy. > > As I recall, I've been saying since day one that row-level security cannot > sensibly coexist with foreign-key constraints, and I've been told that the > potential users of such a feature don't care. I'm glad to see somebody > else complaining. > > As far as the points you're making go, I think we must say that RLS checks > are not applied during FK trigger queries, ie the FK triggers can always > see everything even though they don't run as superuser. Is there some way to enforce that the PK- and FK-hosting tables have compatible RLS definitions? The examples that come to mind are: 1) both tables have RLS filters on at least one of the FK relationship columns, so in a multi-tenant situation a given user is likely (hard to enforce perfectly) to be restricted to checking only the subset of rows in the PK table that belong to their tenant. 2) the PK table has no filter AND the FK table does not have an RLS filter on any of the columns being used in the FK. This covers shared lookup tables. I see no serious problem with DELETE FK-triggers, but the ability to probe the PK by inserting into an FK table does seem to need limitation. Of course the normal direct-insert RLS checks will help (and maybe totally cover) #1 above. The other question is whether such a hidden relationship constitutes a mis-configuration of RLS. This goes back to compatibility: is there some algorithm that can be applied to FK constraints and the associated tables that can measure compatibility and generate warnings when a constraint or RLS definition is added or changed on those tables? An error is probably too severe, especially at first. 
Lacking a use-case for when two incompatible tables need to have a FK-PK relationship, I'm more inclined to force the application of RLS across the relationship constraint and consider these trigger errors to be symptoms of a mis-configuration of the RLS policy that need to be fixed by the DBA. In the presence of a mis-configured policy the ability to provide security guarantees is shot, and the examples so far all prove that. Table "B" should have the PK record visible for corresponding visible FK records on table "A"; otherwise there would have been no way to insert the table "A" rows initially, which means there was a time when an (invalid) constraint was added that broke the relationship, and at that point an error should have been raised. Hopefully this all sparks some thoughts from others much more familiar with RLS than I. David J. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fast insertion indexes: why no developments
> They should, in theory, be faster than btrees -- O(1) not O(log N) page > fetches per lookup. In practice they don't seem to be faster, and > nobody's bothered to find out exactly why. Again, this isn't a terribly > encouraging precedent for implementing some other index type that's > supposed to (sometimes) be faster than btrees. Yes, I understand. Which is also why I was curious to know if the "claims" those papers (and the databases using them) make were real... Thank you everybody for your replies. Leonardo -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fast insertion indexes: why no developments
> Another point to add: I don't really see btree as a barrier to > performance for most of the problems I face. The real barriers to > database performance are storage, contention, and query planning. Ehm, that's true for regular OLTP stuff, which I understand is what most (95%?) of people use/need. But if you try to insert rows into a 50M table with a couple of indexes, btrees just can't keep up. Of course, you can't have it all: fast at big table insertion, good contention, good query times... > Postgres btree indexes are pretty fast and for stuff like bulk > insertions there are some optimization techniques available (such as > sharding or create index concurrently). At the moment I'm relying on partitioning + creating indexes in bulk on the "latest" table (the partitioning is based on time). But that means K*log(N) search times (where K is the number of partitions). That's why I took a look at these different indexing mechanisms. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6.2
On 2013-10-29 11:28:44 -0400, Robert Haas wrote: > On Tue, Oct 29, 2013 at 10:47 AM, Andres Freund > wrote: > > On 2013-10-28 11:54:31 -0400, Robert Haas wrote: > >> > There's one snag I currently can see, namely that we actually need to > >> > prevent that a formerly dropped relfilenode is getting reused. Not > >> > entirely sure what the best way for that is. > >> > >> I'm not sure in detail, but it seems to me that this is all part of the > >> same picture. If you're tracking changed relfilenodes, you'd better > >> track dropped ones as well. > > > > What I am thinking about is the way GetNewRelFileNode() checks for > > preexisting relfilenodes. It uses SnapshotDirty to scan for existing > > relfilenodes for a newly created oid. Which means already dropped > > relations could be reused. > > I guess it could be as simple as using SatisfiesAny (or even better a > > wrapper around SatisfiesVacuum that knows about recently dead tuples). > > I think modifying GetNewRelFileNode() is attacking the problem from > the wrong end. The point is that when a table is dropped, that fact > can be communicated to the same machinery that's been tracking > the CTID->CTID mappings. Instead of saying "hey, the tuples that were > in relfilenode 12345 are now in relfilenode 67890 in these new > positions", it can say "hey, the tuples that were in relfilenode 12345 > are now GONE". Unfortunately I don't understand what you're suggesting. What I am worried about is something like: <- decoding is here VACUUM FULL pg_class; -- rewrites filenode 1 to 2 VACUUM FULL pg_class; -- rewrites filenode 2 to 3 VACUUM FULL pg_class; -- rewrites filenode 3 to 1 <- now decode up to here In this case there are two possible (cmin,cmax) values for a specific tuple: one from the original filenode 1 and one from the filenode generated from 3. Now, that will only happen if there's an oid wraparound, which hopefully shouldn't happen very often, but I'd like to not rely on that. 
> >> Completely aside from this issue, what > >> keeps a relation from being dropped before we've decoded all of the > >> changes made to its data before the point at which it was dropped? (I > >> hope the answer isn't "nothing".) > > > > Nothing. But there's no need to prevent it, it'll still be in the > > catalog and we don't ever access a non-catalog relation's data during > > decoding. > > Oh, right. But what about a drop of a user-catalog table? Currently nothing prevents that. I am not sure it's worth worrying about it, do you think we should? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CLUSTER FREEZE
On 2013-10-29 11:29:24 -0400, Robert Haas wrote: > On Tue, Oct 29, 2013 at 10:32 AM, Andres Freund > wrote: > > On 2013-10-25 09:26:29 -0400, Robert Haas wrote: > >> > In any case, it's very far from obvious to me that CLUSTER ought > >> > to throw away information by default, which is what you're proposing. > >> > >> I find it odd to referring to this as throwing away information. I > >> know that you have a general concern about throwing away XIDs that are > >> still needed for forensic purposes, but that is clearly the ONLY > >> purpose that those XIDs serve, and the I/O advantages of freezing by > >> default could be massive for many of our users. What's going to > >> happen in practice is that experienced users will simply recommend > >> CLUSTER FREEZE rather than plain CLUSTER, and you won't have the > >> forensic information *anyway*. > > > > I think we should just apply your "preserve forensic information when > > freezing" patch. Then we're good to go without big arguments ;) > > Well, I'm happy with that, too. But you wanted it significantly > reworked and I haven't had time to do that. I did? I only seem to remember suggesting to introduce HeapTupleHeaderGetRawXmin() and some bugfix around rewriteheap.c? I think the RawXmin() thing is a judgement call... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CLUSTER FREEZE
On Tue, Oct 29, 2013 at 10:32 AM, Andres Freund wrote: > On 2013-10-25 09:26:29 -0400, Robert Haas wrote: >> > In any case, it's very far from obvious to me that CLUSTER ought >> > to throw away information by default, which is what you're proposing. >> >> I find it odd to referring to this as throwing away information. I >> know that you have a general concern about throwing away XIDs that are >> still needed for forensic purposes, but that is clearly the ONLY >> purpose that those XIDs serve, and the I/O advantages of freezing by >> default could be massive for many of our users. What's going to >> happen in practice is that experienced users will simply recommend >> CLUSTER FREEZE rather than plain CLUSTER, and you won't have the >> forensic information *anyway*. > > I think we should just apply your "preserve forensic information when > freezing" patch. Then we're good to go without big arguments ;) Well, I'm happy with that, too. But you wanted it significantly reworked and I haven't had time to do that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6.2
On Tue, Oct 29, 2013 at 10:47 AM, Andres Freund wrote: > On 2013-10-28 11:54:31 -0400, Robert Haas wrote: >> > There's one snag I currently can see, namely that we actually need to >> > prevent that a formerly dropped relfilenode is getting reused. Not >> > entirely sure what the best way for that is. >> >> I'm not sure in detail, but it seems to me that this is all part of the >> same picture. If you're tracking changed relfilenodes, you'd better >> track dropped ones as well. > > What I am thinking about is the way GetNewRelFileNode() checks for > preexisting relfilenodes. It uses SnapshotDirty to scan for existing > relfilenodes for a newly created oid. Which means already dropped > relations could be reused. > I guess it could be as simple as using SatisfiesAny (or even better a > wrapper around SatisfiesVacuum that knows about recently dead tuples). I think modifying GetNewRelFileNode() is attacking the problem from the wrong end. The point is that when a table is dropped, that fact can be communicated to the same machinery that's been tracking the CTID->CTID mappings. Instead of saying "hey, the tuples that were in relfilenode 12345 are now in relfilenode 67890 in these new positions", it can say "hey, the tuples that were in relfilenode 12345 are now GONE". >> Completely aside from this issue, what >> keeps a relation from being dropped before we've decoded all of the >> changes made to its data before the point at which it was dropped? (I >> hope the answer isn't "nothing".) > > Nothing. But there's no need to prevent it, it'll still be in the > catalog and we don't ever access a non-catalog relation's data during > decoding. Oh, right. But what about a drop of a user-catalog table? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] stats for network traffic WIP
> > So, for now, the counters only track sockets created from an inbound > (client to server) connection. here's v3 of the patch (rebase and cleanup). -nigel. *** a/src/backend/catalog/system_views.sql --- b/src/backend/catalog/system_views.sql *** *** 586,592 CREATE VIEW pg_stat_activity AS S.state_change, S.waiting, S.state, ! S.query FROM pg_database D, pg_stat_get_activity(NULL) AS S, pg_authid U WHERE S.datid = D.oid AND S.usesysid = U.oid; --- 586,594 S.state_change, S.waiting, S.state, ! S.query, ! S.bytes_sent, ! S.bytes_received FROM pg_database D, pg_stat_get_activity(NULL) AS S, pg_authid U WHERE S.datid = D.oid AND S.usesysid = U.oid; *** *** 601,606 CREATE VIEW pg_stat_replication AS --- 603,610 S.client_hostname, S.client_port, S.backend_start, + S.bytes_sent, + S.bytes_received, W.state, W.sent_location, W.write_location, *** *** 634,639 CREATE VIEW pg_stat_database AS --- 638,645 pg_stat_get_db_deadlocks(D.oid) AS deadlocks, pg_stat_get_db_blk_read_time(D.oid) AS blk_read_time, pg_stat_get_db_blk_write_time(D.oid) AS blk_write_time, + pg_stat_get_db_bytes_sent(D.oid) AS bytes_sent, + pg_stat_get_db_bytes_received(D.oid) AS bytes_received, pg_stat_get_db_stat_reset_time(D.oid) AS stats_reset FROM pg_database D; *** a/src/backend/libpq/be-secure.c --- b/src/backend/libpq/be-secure.c *** *** 74,80 #include "libpq/libpq.h" #include "tcop/tcopprot.h" #include "utils/memutils.h" ! #ifdef USE_SSL --- 74,80 #include "libpq/libpq.h" #include "tcop/tcopprot.h" #include "utils/memutils.h" ! 
#include "pgstat.h" #ifdef USE_SSL *** *** 307,312 rloop: --- 307,318 n = recv(port->sock, ptr, len, 0); client_read_ended(); + + if (n > 0) + { + /* we received data from the socket that needs to be reported */ + pgstat_report_commreceived(n); + } } return n; *** *** 441,447 wloop: --- 447,460 } else #endif + { n = send(port->sock, ptr, len, 0); + if (n > 0) + { + /* we sent data over the socket that needs to be reported */ + pgstat_report_commsent(n); + } + } return n; } *** *** 488,493 my_sock_read(BIO *h, char *buf, int size) --- 501,512 client_read_ended(); + if (res > 0) + { + /* we received data from the socket that needs to be reported */ + pgstat_report_commreceived(res); + } + return res; } *** *** 504,509 my_sock_write(BIO *h, const char *buf, int size) --- 523,533 BIO_set_retry_write(h); } } + else + { + /* we sent data over the socket that needs to be reported */ + pgstat_report_commsent(res); + } return res; } *** a/src/backend/postmaster/pgstat.c --- b/src/backend/postmaster/pgstat.c *** *** 298,303 static void pgstat_recv_funcpurge(PgStat_MsgFuncpurge *msg, int len); --- 298,305 static void pgstat_recv_recoveryconflict(PgStat_MsgRecoveryConflict *msg, int len); static void pgstat_recv_deadlock(PgStat_MsgDeadlock *msg, int len); static void pgstat_recv_tempfile(PgStat_MsgTempFile *msg, int len); + static void pgstat_recv_commsent(PgStat_MsgComm *msg, int len); + static void pgstat_recv_commreceived(PgStat_MsgComm *msg, int len); /* * Public functions called from postmaster follow *** *** 1249,1259 pgstat_reset_shared_counters(const char *target) if (strcmp(target, "bgwriter") == 0) msg.m_resettarget = RESET_BGWRITER; else ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("unrecognized reset target: \"%s\"", target), ! 
errhint("Target must be \"bgwriter\"."))); pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_RESETSHAREDCOUNTER); pgstat_send(&msg, sizeof(msg)); --- 1251,1263 if (strcmp(target, "bgwriter") == 0) msg.m_resettarget = RESET_BGWRITER; + else if (strcmp(target, "socket") == 0) + msg.m_resettarget = RESET_SOCKET; else ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("unrecognized reset target: \"%s\"", target), ! errhint("Target must be \"bgwriter\" or \"socket\"."))); pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_RESETSHAREDCOUNTER); pgstat_send(&msg, sizeof(msg)); *** *** 2531,2536 pgstat_bestart(void) --- 2535,2542 beentry->st_clienthostname[NAMEDATALEN - 1] = '\0'; beentry->st_appname[NAMEDATALEN - 1] = '\0'; beentry->st_activity[pgstat_t
Re: [HACKERS] How should row-security affects ON UPDATE RESTRICT / CASCADE ?
2013/10/29 Tom Lane : > Craig Ringer writes: >> During my testing of Kohei KaiGai's row-security patches I've been >> looking into how foreign keys should be and are handled. There are some >> interesting wrinkles around FK cascades, the rights under which FK >> checks execute, and about the consistency effects of changing or >> applying an RLS policy. > > As I recall, I've been saying since day one that row-level security cannot > sensibly coexist with foreign-key constraints, and I've been told that the > potential users of such a feature don't care. I'm glad to see somebody > else complaining. > Not only with RLS: it is unavoidable that someone can estimate invisible records using FK constraints, even if either the referencing or the referenced records are protected by column-level database privileges. I don't remember how many times we have discussed this topic. The conclusion was that access control by itself cannot prevent an information leak (one bit: whether a particular key exists or not) through an FK constraint; however, the feature as a whole still makes sense as long as the user's environment where RLS+PostgreSQL is installed can tolerate such a small fraction of information leakage. When the environment cannot tolerate leaking even one bit, RLS is not a reasonable solution, though I don't know of a "reasonable solution" under those prerequisites. All the other commercial databases stand on the same assumption. Even if their promotional white papers don't say so, their solutions of course have the same weakness and may leak something. > Here's another example wherein there basically isn't a sensible solution: > suppose you have delete rights on table A, and there is a table B > with a foreign-key reference to A, and RLS says that there are rows in > B that you can't see. You try to delete some row in A that is referenced > by an invisible-to-you row in B. 
There are only two possible outcomes: > the system refuses your request, and thereby exposes to you the fact that > a referencing row exists; or the system allows the FK constraint to be > violated. > My vote is that the system should keep referential integrity as if no RLS policy were configured. That is more fundamental than any per-user RLS policy. > As far as the points you're making go, I think we must say that RLS checks > are not applied during FK trigger queries, ie the FK triggers can always > see everything even though they don't run as superuser. > My existing implementation does just that. If a record is referenced by invisible records, its deletion will fail, in spite of the information leakage. > Otherwise you're > going to end up with constraint violations, and as a database weenie > I consider that unacceptable. This will mean that a poorly-chosen FK > arrangement will allow some leakage of row-existence info, but I don't > believe that that can be avoided anyway, per the above example. > OK, let's drop table-level and column-level privileges too: they can also leak the existence of invisible records, even if the user doesn't have the privilege to reference them. :-) Any tool has an expected usage and situations where it is suitable. What matters is to use a feature while understanding its purpose and limitations. As everybody knows, we have no silver bullets for security, but useful tools can help us, depending on the situation. Thanks, -- KaiGai Kohei -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fast insertion indexes: why no developments
Leonardo Francalanci writes: >> Before getting too excited about some new academic index type, it's worth >> noting the sad state in which hash indexes have languished for years. > Aren't hash indexes in a poor state because they are not faster than btree in > every condition? They should, in theory, be faster than btrees -- O(1) not O(log N) page fetches per lookup. In practice they don't seem to be faster, and nobody's bothered to find out exactly why. Again, this isn't a terribly encouraging precedent for implementing some other index type that's supposed to (sometimes) be faster than btrees. None of this is meant to discourage you from trying to write an index type if you have the time and motivation to pursue it. Just trying to answer your question as to why nobody's done it already. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fast insertion indexes: why no developments
On Tue, Oct 29, 2013 at 02:53:37PM +0000, Leonardo Francalanci wrote: > > Before getting too excited about some new academic index type, it's worth > > noting the sad state in which hash indexes have languished for years. > > Nobody's bothered to add WAL support, let alone do any other real work > > on them. The non-btree index types that have been getting love are the > > ones that offer the ability to index queries that btree can't. I think > > a new index type whose only benefit is the claim to be faster in a narrow > > use-case is likely to end up like hash, not getting used enough to be > > properly maintained. > > regards, tom lane > > Aren't hash indexes in a poor state because they are not faster than btree in > every condition? > Hi Leonardo, If there were ONE perfect index, better in every condition, postgres would be using it. As in everything else, each type has its strengths and weaknesses. The hash index allows equality searches for very large key lengths using a relatively very small index size. As has been mentioned before, we still do not have WAL logging for hash indexes. But even so, for I/O-bound systems hash index searches are twice as fast as the btree equivalent. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fast insertion indexes: why no developments
Leonardo Francalanci wrote: > > Before getting too excited about some new academic index type, it's worth > > noting the sad state in which hash indexes have languished for years. > > Nobody's bothered to add WAL support, let alone do any other real work > > on them. The non-btree index types that have been getting love are the > > ones that offer the ability to index queries that btree can't. I think > > a new index type whose only benefit is the claim to be faster in a narrow > > use-case is likely to end up like hash, not getting used enough to be > > properly maintained. > > Aren't hash indexes in a poor state because they are not faster than > btree in every condition? Chicken and egg. Maybe they can be made faster than btrees (in some situations) with enough tweaks, but because there are so many outstanding problems, no one wants to do the huge amount of legwork to even get to the point where such tweaks can be made in the first place. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fast insertion indexes: why no developments
> Before getting too excited about some new academic index type, it's worth > noting the sad state in which hash indexes have languished for years. > Nobody's bothered to add WAL support, let alone do any other real work > on them. The non-btree index types that have been getting love are the > ones that offer the ability to index queries that btree can't. I think > a new index type whose only benefit is the claim to be faster in a narrow > use-case is likely to end up like hash, not getting used enough to be > properly maintained. > regards, tom lane Aren't hash indexes in a poor state because they are not faster than btree in every condition? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6.2
On 2013-10-28 11:54:31 -0400, Robert Haas wrote: > > There's one snag I currently can see, namely that we actually need to > > prevent that a formerly dropped relfilenode is getting reused. Not > > entirely sure what the best way for that is. > > I'm not sure in detail, but it seems to me that this all part of the > same picture. If you're tracking changed relfilenodes, you'd better > track dropped ones as well. What I am thinking about is the way GetNewRelFileNode() checks for preexisting relfilenodes. It uses SnapshotDirty to scan for existing relfilenodes for a newly created oid. Which means already dropped relations could be reused. I guess it could be as simple as using SatisfiesAny (or even better a wrapper around SatisfiesVacuum that knows about recently dead tuples). > Completely aside from this issue, what > keeps a relation from being dropped before we've decoded all of the > changes made to its data before the point at which it was dropped? (I > hope the answer isn't "nothing".) Nothing. But there's no need to prevent it, it'll still be in the catalog and we don't ever access a non-catalog relation's data during decoding. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CLUSTER FREEZE
On 2013-10-25 09:26:29 -0400, Robert Haas wrote: > > In any case, it's very far from obvious to me that CLUSTER ought > > to throw away information by default, which is what you're proposing. > > I find it odd to refer to this as throwing away information. I > know that you have a general concern about throwing away XIDs that are > still needed for forensic purposes, but that is clearly the ONLY > purpose that those XIDs serve, and the I/O advantages of freezing by > default could be massive for many of our users. What's going to > happen in practice is that experienced users will simply recommend > CLUSTER FREEZE rather than plain CLUSTER, and you won't have the > forensic information *anyway*. I think we should just apply your "preserve forensic information when freezing" patch. Then we're good to go without big arguments ;) Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fast insertion indexes: why no developments
On Tue, Oct 29, 2013 at 2:53 AM, Leonardo Francalanci wrote: > Hi, > > > I don't see much interest in insert-efficient indexes. These are the ones > I've found: > > - LSM-tree (used by Cassandra and SQLite4?) > - Y-Tree > (http://www.bossconsulting.com/oracle_dba/white_papers/DW%20in%20oracle/P23%20(ytree%20index%20structure%20for%20DWs).pdf > ) > - Fractal indexes (TokuDB, patented) > > While I understand that b*trees are still the best compromise in > insertion/search speed, disk size, concurrency, and more in general in OLTP > workloads, they are useless when it comes to insertion in big data tables > (>50M rows) of random values (not ordered values). > > I would like to know if the lack of development in this area (not only in > PostgreSQL, but in databases in general) is due to: > > 1) complex implementation > 2) poor search performance > 3) poor concurrency performance > 4) not interesting for most users > 5) something else??? > > I thought this was going to change due to the fast-insertion speed needs of > "Social Applications", but TokuDB seems to be the only "successful" > player in the area (I don't know how much of it is due to good marketing). > Most other DB technology claims faster insertion speed (MongoDB and the > like...) but in the end they rely on the old b*tree + sharding instead of > using different indexing mechanisms (with the exception of Cassandra). Another point to add: I don't really see btree as a barrier to performance for most of the problems I face. The real barriers to database performance are storage, contention, and query planning. Postgres btree indexes are pretty fast and for stuff like bulk insertions there are some optimization techniques available (such as sharding or create index concurrently). Stuff I'd like to see in terms of postgres indexing: *) faster wal logged hash index *) composite gist/gin *) faster gist/gin (to the extent that it's possible). 
merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
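A sketch of the bulk-load optimization Merlin alludes to (table, column and file names are made up): dropping the index before a large load replaces many random index insertions with one sequential index build, and CREATE INDEX CONCURRENTLY performs the rebuild without taking a lock that blocks concurrent writes:

```sql
-- Drop the index so the bulk load only writes heap pages:
DROP INDEX IF EXISTS big_table_val_idx;

COPY big_table FROM '/path/to/data.csv' WITH (FORMAT csv);

-- Rebuild the index afterwards; CONCURRENTLY avoids blocking
-- other sessions' inserts/updates during the (longer) build.
CREATE INDEX CONCURRENTLY big_table_val_idx ON big_table (val);
```

This pattern only helps batch loads, of course; it does nothing for the steady-stream random-key insertion workload Leonardo is asking about.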
Re: [HACKERS] Fast insertion indexes: why no developments
Craig Ringer writes: > On 10/29/2013 03:53 PM, Leonardo Francalanci wrote: >> 5) something else??? > Quite likely nobody has had the enthusiasm and interest to implement a > viable, quality implementation and stick with it long enough to get it > committed. > There are a great many good ideas for improvements to Pg that just don't > have the people and time behind them to make them happen. Before getting too excited about some new academic index type, it's worth noting the sad state in which hash indexes have languished for years. Nobody's bothered to add WAL support, let alone do any other real work on them. The non-btree index types that have been getting love are the ones that offer the ability to index queries that btree can't. I think a new index type whose only benefit is the claim to be faster in a narrow use-case is likely to end up like hash, not getting used enough to be properly maintained. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Creating partial index on a relation
Hello, I am doing a small project in Postgres where I have to achieve the following: suppose I know the index name (let's say index1) and the relation (table1) on which the partial index has to be built. I was looking through the code and found that IndexStmt->whereClause is the field I need to populate manually. So far I have succeeded in creating an index by calling DefineIndex in indexcmds.c, which creates an index on the relation, but this index is not a partial index. I found that DefineIndex takes Expr *predicate as a parameter, which specifies that a partial index has to be built. This predicate is nothing but IndexStmt->whereClause. So is there a way to populate IndexStmt->whereClause manually? An example of the partial index I need: if my table1 schema is (a int, b int, c int), then the index is on c where a is null, b is null and c is not null. Thanks in advance Regards -- View this message in context: http://postgresql.1045698.n5.nabble.com/Creating-partial-index-on-a-relation-tp5776230.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
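For reference, the SQL-level equivalent of what the poster is trying to construct through DefineIndex is simply:

```sql
CREATE TABLE table1 (a int, b int, c int);

-- The WHERE clause below is what ends up in IndexStmt->whereClause:
CREATE INDEX index1 ON table1 (c)
  WHERE a IS NULL AND b IS NULL AND c IS NOT NULL;
```

When the parser handles this statement, the raw WHERE clause is stored in IndexStmt->whereClause and then transformed (in transformIndexStmt, parse_utilcmd.c) into the expression tree that DefineIndex receives as its predicate, so building that expression tree by hand, or running a raw clause through the same transform, is the way to populate it manually.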
Re: [HACKERS] How should row-security affects ON UPDATE RESTRICT / CASCADE ?
Craig Ringer writes: > During my testing of Kohei KaiGai's row-security patches I've been > looking into how foreign keys should be and are handled. There are some > interesting wrinkles around FK cascades, the rights under which FK > checks execute, and about the consistency effects of changing or > applying an RLS policy. As I recall, I've been saying since day one that row-level security cannot sensibly coexist with foreign-key constraints, and I've been told that the potential users of such a feature don't care. I'm glad to see somebody else complaining. Here's another example wherein there basically isn't a sensible solution: suppose you have delete rights on table A, and there is a table B with a foreign-key reference to A, and RLS says that there are rows in B that you can't see. You try to delete some row in A that is referenced by an invisible-to-you row in B. There are only two possible outcomes: the system refuses your request, and thereby exposes to you the fact that a referencing row exists; or the system allows the FK constraint to be violated. As far as the points you're making go, I think we must say that RLS checks are not applied during FK trigger queries, ie the FK triggers can always see everything even though they don't run as superuser. Otherwise you're going to end up with constraint violations, and as a database weenie I consider that unacceptable. This will mean that a poorly-chosen FK arrangement will allow some leakage of row-existence info, but I don't believe that that can be avoided anyway, per the above example. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
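Tom's example can be written out concretely (the schema is illustrative, and at the time of this thread RLS was still an out-of-tree patch, so the setup of the row-security policy itself is elided):

```sql
CREATE TABLE a (id int PRIMARY KEY);
CREATE TABLE b (id int PRIMARY KEY, a_id int NOT NULL REFERENCES a);

INSERT INTO a VALUES (1);
INSERT INTO b VALUES (100, 1);  -- assume an RLS policy hides this row
                                -- from user 'alice'

-- alice has DELETE rights on a, but cannot see b's row (100, 1):
DELETE FROM a WHERE id = 1;
-- Outcome 1: an FK-violation error, which reveals to alice that a
--            referencing row exists in b (existence leak), or
-- Outcome 2: the delete succeeds and the FK constraint is violated.
-- There is no third option, which is Tom's point.
```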
Re: [HACKERS] Fast insertion indexes: why no developments
On 10/29/2013 03:53 PM, Leonardo Francalanci wrote: > 5) something else??? Quite likely nobody has had the enthusiasm and interest to implement a viable, quality implementation and stick with it long enough to get it committed. There are a great many good ideas for improvements to Pg that just don't have the people and time behind them to make them happen. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How should row-security affects ON UPDATE RESTRICT / CASCADE ?
On 10/29/2013 04:09 PM, Craig Ringer wrote: > Problem is, that won't necessarily happen, because the FK check is run > with the rights of the table owner. Some further reading suggests that another vendor's implementation ignores row security policy for foreign key constraint checks. So FK constraint checks can be used to probe for data, but there are no wacky inconsistencies with FKs. I'm not sure I'm thrilled with that answer, but on the other hand a better one isn't leaping out at me right now. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] How should row-security affects ON UPDATE RESTRICT / CASCADE ?
During my testing of Kohei KaiGai's row-security patches I've been looking into how foreign keys should be and are handled. There are some interesting wrinkles around FK cascades, the rights under which FK checks execute, and about the consistency effects of changing or applying an RLS policy. It seems clear that if a user tries to INSERT a tuple into a table that they would not be able to see if they tried to read it, the insert should fail with a permission denied error, at least by default or as an easy option. That stops users probing for keys by looking for unique constraint errors, i.e. key probing. This works for inserts directly into a table, but gets complicated with foreign keys. If the user tries to insert a row into table A and table A has a FK to table B, if the user cannot see the referenced value in B then they should not be able to insert the row into A, instead getting the same error as if the tuple in B really didn't exist. Otherwise, again, they can probe for the existence of keys by using foreign key relationships. Problem is, that won't necessarily happen, because the FK check is run with the rights of the table owner. So you may find that you can't insert a row that references a foreign row you can see; the FK constraint check will fail even though you can clearly see the row. Similarly, you _can_ insert a row that references a row you cannot see. It gets even weirder when you're a superuser because you're exempt from RLS checks when you query a table directly but when you add a FK constraint the check is run with the table owner's rights - so it might fail even with the rows obviously visible to you. 
Take the following fairly nonsensical session, which is based on the RLS test suite:

test=# \dt rls_regress_schema.*
                 List of relations
       Schema       |   Name   | Type  |       Owner
--------------------+----------+-------+-------------------
 rls_regress_schema | category | table | rls_regress_user0
 rls_regress_schema | document | table | rls_regress_user0

test=# select * from rls_regress_schema.category;
 cid |      cname
-----+-----------------
  11 | novel
  22 | science fiction
  33 | technology
  44 | manga
(4 rows)

test=# INSERT INTO rls_regress_schema.document (did, cid, dlevel, dtitle) VALUES (9, 22, 0, 'blah');
ERROR:  insert or update on table "document" violates foreign key constraint "document_cid_fkey"
DETAIL:  Key (cid)=(22) is not present in table "category".

Um. WTF? As Kohei KaiGai pointed out when I asked him about it, this is because the foreign key check trigger runs as the table owner, in this case rls_regress_user0. rls_regress_user0 is set up so it can't see any rows in the 'category' table even though it's the owner, so no foreign key pointing to this table can ever succeed. I don't think this is a usable situation, but I don't have any easy answers. We can't run the trigger as the current_user because it might not then have the required GRANTs for table-level access, and because it could cause malicious functions to run in the security context of the invoking user. Yet running it in the context of the owning user seems to get rid of half the point of RLS by making it incredibly hard to use it with foreign keys sanely. Thoughts? -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
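The key-probing concern raised earlier in this message can be sketched as follows (table name hypothetical): an attacker with INSERT rights but no row visibility can still learn which keys exist from constraint errors, unless the error is masked:

```sql
-- Without RLS-aware error handling, a unique violation leaks existence:
INSERT INTO accounts (id, owner) VALUES (42, 'attacker');
-- ERROR:  duplicate key value violates unique constraint "accounts_pkey"
-- ...tells the attacker that id 42 already exists, even though a SELECT
-- would never have shown that row. The same probing works through FK
-- references to a row-secured table.
```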
[HACKERS] Fast insertion indexes: why no developments
Hi, I don't see much interest in insert-efficient indexes. These are the ones I've found: - LSM-tree (used by Cassandra and SQLite4?) - Y-Tree (http://www.bossconsulting.com/oracle_dba/white_papers/DW%20in%20oracle/P23%20(ytree%20index%20structure%20for%20DWs).pdf ) - Fractal indexes (TokuDB, patented) While I understand that b*trees are still the best compromise in insertion/search speed, disk size, concurrency, and more in general in OLTP workloads, they are useless when it comes to insertion in big data tables (>50M rows) of random values (not ordered values). I would like to know if the lack of development in this area (not only in PostgreSQL, but in databases in general) is due to: 1) complex implementation 2) poor search performance 3) poor concurrency performance 4) not interesting for most users 5) something else??? I thought this was going to change due to the fast-insertion speed needs of "Social Applications", but TokuDB seems to be the only "successful" player in the area (I don't know how much of it is due to good marketing). Most other DB technology claims faster insertion speed (MongoDB and the like...) but in the end they rely on the old b*tree + sharding instead of using different indexing mechanisms (with the exception of Cassandra). Thank you in advance Leonardo -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
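To illustrate the workload Leonardo describes: with a random key such as a UUID, each insert lands on an effectively random btree leaf page, so once the index outgrows RAM every insert tends to incur a random read plus a random write. A minimal sketch (gen_random_uuid() comes from the pgcrypto extension in this era):

```sql
CREATE EXTENSION IF NOT EXISTS pgcrypto;
CREATE TABLE events (id uuid PRIMARY KEY, payload text);

-- Random keys scatter insertions across the whole index; ordered keys
-- (a sequence or timestamp) would keep them on the rightmost leaf page.
INSERT INTO events
SELECT gen_random_uuid(), 'x'
FROM generate_series(1, 1000);
```

LSM-trees and fractal trees attack exactly this access pattern by buffering insertions and writing them out sequentially.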
Re: [HACKERS] PostgreSQL Service on Windows does not start. ~ "is not a valid Win32 application"
Hi Sandeep > I think, you should change the subject line to "Unquoted service path > containing space is vulnerable and can be exploited on Windows" to get the > attention.. :) Thank you for the advice! I'll try to post to pgsql-bugs again. > BTW, in your case, the file "Program" should be an exe and not just any other > file to exploit this vulnerability. Right? Yes, "Program" is a simple file I made. Best Regards, Naoya > Hi Naoya > > I think, you should change the subject line to "Unquoted service path > containing space is vulnerable and can be exploited on Windows" to get the > attention.. :) > > BTW, in your case, the file "Program" should be an exe and not just any other > file to exploit this vulnerability. Right? > > > On Tue, Oct 29, 2013 at 11:34 AM, Naoya Anzai > wrote: > > > Hi, Sandeep > > Thanks. > > Sorry, there was a mistake in what I said. > > I said > > > Not only "pg_ctl.exe" but "postgres.exe" also has the same > problem. > > but, to say it correctly, > > "postgres.exe" does not have the problem; > the only source file that contains the problem is "pg_ctl.c". > > > > So, this is not an installer issue. Is this bug raised to the > PostgreSQL community? If yes, you should submit the patch there. > > Yes, I already submitted it there, but nobody has responded yet. > > > http://postgresql.1045698.n5.nabble.com/PostgreSQL-Service-on-Windows-does-not-start-td5774206.html > > Regards, > Naoya > > > > So, this is not an installer issue. Is this bug raised to the > PostgreSQL community? If yes, you should submit the patch there. > > > > > > On Tue, Oct 29, 2013 at 6:23 AM, Naoya Anzai > wrote: > > > > > > Hi, Asif > > > > Thank you for providing my patch (pg_ctl.c.patch) to Sandeep on > my behalf. > > > > > > > Good finding. I have attached another version of the patch > (pg_ctl.c_windows_vulnerability.patch) that has fewer lines of code > changes; can you please take a look? Thanks. > > > > > > I think your patch is not sufficient to fix it. 
> > Not only "pg_ctl.exe" but "postgres.exe" also has the same > problem. > > Even with your patch applied, > > the path of "postgres.exe" passed to CreateRestrictedProcess is > not enclosed in quotes (see pgwin32_ServiceMain in pg_ctl.c). > > > > So, the quoting should be done in both places. > > > > > > Regards, > > Naoya > > > > --- > > Naoya Anzai > > Engineering Department > > NEC Soft, Ltd. > > E-Mail: anzai-na...@mxu.nes.nec.co.jp > > --- > > > > > > > Hi Sandeep, > > > > > > PFA Naoya's patch (pg_ctl.c.patch). > > > > > > Hi Naoya, > > > > > > Good finding. I have attached another version of the patch > (pg_ctl.c_windows_vulnerability.patch) that has fewer lines of code > changes; can you please take a look? Thanks. > > > > > > Best Regards, > > > Asif Naeem > > > > > > > > > On Mon, Oct 28, 2013 at 4:46 PM, Sandeep Thakkar > wrote: > > > > > > > > > Hi Dave > > > > > > We register the service using pg_ctl. When I manually > executed the following on the command prompt, I saw that the service path of > the registered service did not have the pg_ctl.exe path in quotes. Maybe it > should be handled in the pg_ctl code. > > > > > > c:\Users\Sandeep Thakkar\Documents>"c:\Program Files\PostgreSQL\9.3\bin\pg_ctl.exe" register -N "pg-9.3" -U "NT AUTHORITY\NetworkService" -D "c:\Program Files\PostgreSQL\9.3\data" -w > > > > > > Naoya, I could not find your patch here. Can you > please share it again? > > > > > > > > > > > > On Mon, Oct 28, 2013 at 2:53 PM, Dave Page > wrote: > > > > > > > > > Sandeep, can you look at this please? Thanks. > > > > > > On Mon, Oct 28, 2013 at 8:18 AM, Asif Naeem > wrote: > > > > It is related to the Windows unquoted service > path vulnerability: the > > > > installer creates the service path without > quotes, which makes services.exe > > > > look in an undesirable path for the executable. > > > > > > >