Re: [HACKERS] Feature Request: pg_replication_master()
On 19 December 2012 06:10, Magnus Hagander mag...@hagander.net wrote:

> This sounds like my previous suggestion of returning the primary_conninfo
> value, but with just the IP. That one came with a pretty bad patch, and was
> later postponed until we folded recovery.conf into the main configuration
> file parsing. I'm not really sure what happened to that project? (the
> configuration file one)

It stalled because the patch author decided not to implement the request to
detect recovery.conf in the data directory, which allows backwards
compatibility. I proposed a solution for how to do that, so we can move
forward if people have time.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal - assign result of query to psql variable
2012/12/19 Shigeru Hanada shigeru.han...@gmail.com:

> On Tue, Dec 18, 2012 at 2:52 AM, Pavel Stehule pavel.steh...@gmail.com wrote:
>> Attached updated patch
>
> Seems fine. I'll mark this as ready for committer. Nice work!
>
> -- 
> Shigeru HANADA

Thank you very much.

Regards

Pavel
Re: [HACKERS] system administration functions with hardcoded superuser checks
On 19 December 2012 06:34, Magnus Hagander mag...@hagander.net wrote:

>> Granting executability on pg_read_xyz is pretty darn close to granting
>> superuser, without explicitly asking for it.
>
> Well, you get read-only superuser. If we want to make that step as easy as
> just GRANT, we really need to write some *very* strong warnings in the
> documentation so that people realize this. I doubt most people will realize
> it unless we do that (and those who don't read the docs, which is probably
> a majority, never will).

Good point.

Can we do that explicitly with fine-grained superuser-ness?

  GRANT SUPERUSER ON FUNCTION TO foo;

> If you use SECURITY DEFINER, you can limit the functions to *the specific
> files that you want to grant read on*. Which makes it possible to actually
> make it secure. E.g. you *don't* have to give full read to your entire
> database.

Even better point.

> If you're comparing it to a blanket SECURITY DEFINER with no checks, then
> yes, it's a simpler way to fire the cannon into your own foot. But it also
> gives you a way that makes it more likely that you don't *realize* that
> you're about to fire a cannon into your foot.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Cascading replication: should we detect/prevent cycles?
On 12/18/2012 11:57 PM, Simon Riggs wrote:
> On 19 December 2012 03:03, Josh Berkus j...@agliodbs.com wrote:
>> So, my question is:
>> 1. should we detect replication cycles? *Can* we?
>> 2. should we warn the user, or refuse to start up?
>
> Why not just monitor the config you just created? Anybody that actually
> tests their config would spot this.

I think you are being optimistic. We should probably have some logic that
prevents circular replication.
Re: [HACKERS] Cascading replication: should we detect/prevent cycles?
On 19 December 2012 08:11, Joshua D. Drake j...@commandprompt.com wrote:
> On 12/18/2012 11:57 PM, Simon Riggs wrote:
>> On 19 December 2012 03:03, Josh Berkus j...@agliodbs.com wrote:
>>> So, my question is:
>>> 1. should we detect replication cycles? *Can* we?
>>> 2. should we warn the user, or refuse to start up?
>>
>> Why not just monitor the config you just created? Anybody that actually
>> tests their config would spot this.
>
> I think you are being optimistic. We should probably have some logic that
> prevents circular replication.

My logic is that if you run a one-minute test you will notice your mistake,
which is glaringly obvious. That is sufficient to prevent that mistake, IMHO.
If you don't test your config and don't monitor either, good luck with HA.

Patches welcome, if you think this is important enough.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
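[Neither message sketches what the prevention logic would look like. For
illustration only - plain Python over a hypothetical standby-to-upstream
map, not PostgreSQL code - detecting circular replication reduces to cycle
detection over the chain of upstream links:]

```python
def find_replication_cycle(upstream):
    """Given a mapping standby -> upstream server, return the looping
    portion of the topology if one exists, else None.  Walks each chain
    of upstream links; revisiting a node within one walk means the
    topology loops back on itself (e.g. r1 -> r2 -> r1)."""
    for start in upstream:
        seen = []
        node = start
        while node in upstream:
            if node in seen:
                return seen[seen.index(node):]  # the cycle itself
            seen.append(node)
            node = upstream[node]
    return None
```

[A healthy chain such as master--r1--r2 returns None; connecting the chain
into a loop returns the servers involved, which a startup check could
report before streaming begins.]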
Re: [HACKERS] [GENERAL] trouble with pg_upgrade 9.0 - 9.1
Can you post the full definition of the table on this public email list?
Also, why did the error think this was in the public schema? Any idea?

---

18.12.2012, 19:38, Bruce Momjian br...@momjian.us:
> On Mon, Dec 17, 2012 at 09:21:59PM -0500, Bruce Momjian wrote:
>> Mismatch of relation names: database "database",
>> old rel public.lob.ВерсияВнешнегоДокумента$Документ_pkey,
>> new rel public.plob.ВерсияВнешнегоДокумента$Документ
>> Failure, exiting

[snip]

It's all I could find about this table:

--
-- Name: lob.ВерсияВнешнегоДокумента$Документ; Type: TABLE; Schema: public; Owner: postgres; Tablespace:
--

CREATE TABLE lob.ВерсияВнешнегоДокумента$Документ (
    @Файл integer NOT NULL,
    Страница integer NOT NULL,
    Данные bytea
);

ALTER TABLE public.lob.ВерсияВнешнегоДокумента$Документ OWNER TO postgres;

--
-- Name: plob.ВерсияВнешнегоДокумента$Документ; Type: CONSTRAINT; Schema: public; Owner: postgres; Tablespace:
--

ALTER TABLE ONLY lob.ВерсияВнешнегоДокумента$Документ
    ADD CONSTRAINT plob.ВерсияВнешнегоДокумента$Документ PRIMARY KEY (@Файл, Страница);

--
-- Name: rlob.ВерсияВнешнегоДокумента$Документ-@Файл; Type: FK CONSTRAINT; Schema: public; Owner: postgres
--

ALTER TABLE ONLY lob.ВерсияВнешнегоДокумента$Документ
    ADD CONSTRAINT rlob.ВерсияВнешнегоДокумента$Документ-@Файл FOREIGN KEY (@Файл)
    REFERENCES ВерсияВнешнегоДокумента$Документ(@Файл)
    ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE;

--
-- Name: lob.ВерсияВнешнегоДокумента$Документ; Type: ACL; Schema: public; Owner: postgres
--

REVOKE ALL ON TABLE lob.ВерсияВнешнегоДокумента$Документ FROM PUBLIC;
REVOKE ALL ON TABLE lob.ВерсияВнешнегоДокумента$Документ FROM postgres;
GRANT ALL ON TABLE lob.ВерсияВнешнегоДокумента$Документ TO postgres;
GRANT SELECT ON TABLE lob.ВерсияВнешнегоДокумента$Документ TO view_user;

There is another table, ВерсияВнешнегоДокумента$Документ (without the
leading "lob."). It is referenced by a foreign key
(rlob.ВерсияВнешнегоДокумента$Документ-@Файл). But as I understand it, the
problem is with the primary key.
Re: [HACKERS] configure.in and setproctitle/optreset problem
Re: Tom Lane 2012-12-18 8705.1355845...@sss.pgh.pa.us
>> The correct fix, IMO/IIRC, is to add LDFLAGS=-Wl,--as-needed before
>> running most of the configure checks, instead of after.
>
> Meh. It's not clear to me at all that that fixes the issue here, or at
> least that it does so in any way that's reliable. The proposal to add
> --as-needed during configure was made to fix a different problem, namely
> making the wrong decision about whether libintl needs to be pulled in
> explicitly. We don't seem to have done anything about that.

To me, twiddling with --as-needed sounds like trading one set of possible
problems for a different one; configure checks should be as deterministic
as possible. It might still be that Peter's --as-needed suggestion is a
good fix, but I believe the issue I reported should also be fixed by the
patch Tom sent.

Reiterating a point from my original message: why is -l{readline,edit}
included in these configure checks at all? Most (if not all) of the
function checks in that block are not related to input editing anyway.

Kind regards,
Christoph Berg
-- 
Tel.: +49 (0)21 61 / 46 43-187
credativ GmbH, HRB Mönchengladbach 12080
Hohenzollernstr. 133, 41061 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz
Re: [HACKERS] [WIP] pg_ping utility
On Wed, Dec 12, 2012 at 12:06 AM, Bruce Momjian br...@momjian.us wrote:
> On Sat, Dec 8, 2012 at 08:59:00AM -0500, Phil Sorber wrote:
>> On Sat, Dec 8, 2012 at 7:50 AM, Michael Paquier
>> michael.paqu...@gmail.com wrote:
>>> Bruce mentioned that pg_isready could be used directly by pg_ctl -w.
>>> Default as being non-verbose would make sense. What are the other
>>> tools you are thinking about? Some utilities in core?
>>
>> I think Bruce meant that PQping() is used by pg_ctl -w, not that he
>> would use pg_isready.
>
> Right.

OK, cool. If you have some spare time to write a new version with the
verbose option as default, I'll be pleased to review it and then mark it
as ready for committer.

-- 
Michael Paquier
http://michael.otacoo.com
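[For reference, PQping() distinguishes four outcomes, and pg_isready reports
them as its exit status. The sketch below (plain Python, not libpq) encodes
the PGPing values as libpq declares them; the message strings are only an
illustration of what a verbose-by-default tool could print, not the tool's
actual output:]

```python
# PGPing outcomes from libpq (declaration order gives the values);
# pg_isready uses the same numbers for its exit status.
PQPING_OK = 0           # server is accepting connections
PQPING_REJECT = 1       # server is alive but rejecting connections
PQPING_NO_RESPONSE = 2  # server could not be contacted
PQPING_NO_ATTEMPT = 3   # no attempt made (e.g. bad connection parameters)

def describe(status):
    """Illustrative one-line messages for each PQping-style status."""
    return {
        PQPING_OK: "accepting connections",
        PQPING_REJECT: "rejecting connections",
        PQPING_NO_RESPONSE: "no response",
        PQPING_NO_ATTEMPT: "no attempt",
    }[status]
```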
Re: [HACKERS] Makefiles don't seem to remember to rebuild everything anymore
On Wed, Dec 19, 2012 at 12:22 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> Robert Haas robertmh...@gmail.com writes:
>> On Mon, Dec 17, 2012 at 1:34 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>>> This is definitely not per make's contract, either. I think maybe the
>>> "Don't rebuild the list if only the OBJS have changed" hack in
>>> common.mk is a brick or two shy of a load, but I don't know how to fix
>>> that.
>>
>> I feel like it's been this way for a while - at least I feel like I've
>> noticed this before. I think there is some inevitable kludginess around
>> having one makefile per subdirectory that leads to these kinds of
>> issues. Maybe we should get rid of all the makefiles under src/backend
>> except for the top-level one and just do everything there.
>
> I mentioned this paper last year, but maybe it's time to start taking it
> seriously: http://aegis.sourceforge.net/auug97.pdf

+1 from me. I don't know that just fixing src/backend will do a whole lot
to improve build times in and of itself, but I do think it might reduce
the amount of alchemy required to keep things working.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Switching timeline over streaming replication
On 19.12.2012 04:57, Josh Berkus wrote:
> Heikki,
>
> I ran into an unexpected issue while testing. I just wanted to fire up a
> chain of 5 replicas to see if I could connect them in a loop. However, I
> ran into a weird issue when starting up r3: it refused to come out of
> "the database is starting up" mode until I did a write on the master.
> Then it came up fine.
>
> master--r1--r2--r3--r4
>
> I tried doing the full replication sequence (basebackup, startup, test)
> with it twice and got the exact same results each time. This is very
> strange because I did not encounter the same issues with r2 or r4. Nor
> have I seen this before in my tests.

Ok.. I'm going to need some more details on how to reproduce this; I'm not
seeing that when I set up four standbys.

> I'm also seeing Thom's spurious error message now. Each of r2, r3 and r4
> has the following message once in its logs:
>
> LOG: database system was interrupted while in recovery at log time
> 2012-12-19 02:49:34 GMT
> HINT: If this has occurred more than once some data might be corrupted
> and you might need to choose an earlier recovery target.
>
> This message doesn't seem to signify anything.

Yep. You get that message when you start up the system from a base backup
that was taken from a standby server. It's just noise; it would be nice if
we could dial it down somehow. In general, streaming replication and
backups tend to be awfully noisy. I've been meaning to go through all the
messages that get printed during normal operation and think carefully
about which ones are really necessary, and which ones could be merged into
more compact messages. But I haven't gotten around to it; that would be a
great project for someone who actually sets up these systems regularly in
production.

- Heikki
Re: [HACKERS] Parser Cruft in gram.y
On Tue, Dec 18, 2012 at 10:33:12AM +0100, Dimitri Fontaine wrote:
> Robert Haas robertmh...@gmail.com writes:
>> And on the other hand, if you could get a clean split between the two
>> grammars, then regardless of exactly what the split was, it might seem
>> a win. But it seemed to me when I looked at this that you'd have to
>> duplicate a lot of stuff and the small parser still wouldn't end up
>> being very small, which I found hard to get excited about.
>
> I think the goal is not so much about getting a much smaller parser, but
> more about having a separate parser that you don't care about the bloat
> of, so that you can improve DDL without fearing main parser performance
> regressions.

In addition to this, there could well be uses for a more modular parser.
For example, if it turns out to be possible to do our parser in a way
that is exportable and (can be converted to a type that) looks forward,
client code could do a lot of interesting (and provably correct) things.

> If we come out with no regression and no gain on the main query parser,
> I say it is still worth the separation effort. And anyway we only add
> things to the main parser (queries) when the standard saith we have to.
>
> Tom Lane t...@sss.pgh.pa.us writes:
>> I'm not sure what other tool might be better, though. I looked through
>> http://en.wikipedia.org/wiki/Comparison_of_parser_generators#Deterministic_context-free_languages
>> but it seems our options are a bit limited if we want something that
>> produces C. It's not clear to me that any of the likely options are as
>> mature as bison, let alone likely to substantially outperform it. (For
>> instance, Hyacc sounded pretty promising until I got to the part about
>> "it doesn't yet support %union or %type".) Still, I didn't spend much
>> time on this --- maybe somebody else would like to do a bit more
>> research.
>
> I did spend a very little time on it too, with a different search angle,
> and did find that experimental thing that might be worth looking at, or
> maybe not.
>
> http://en.wikipedia.org/wiki/Parsing_expression_grammar
> http://piumarta.com/software/peg/

More angles:

http://wwwiti.cs.uni-magdeburg.de/~rosenmue/publications/SKS+08.pdf

Might anyone here wish to investigate this, given some kind of resources
for the initial study?

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] [PERFORM] Slow query: bitmap scan troubles
On Tue, Dec 18, 2012 at 5:05 PM, Jeff Janes jeff.ja...@gmail.com wrote:

Sorry for the malformed and duplicate post. I was not trying to be
emphatic; I was testing out gmail offline. Clearly the test didn't go too
well.

Jeff
[HACKERS] FDW: ForeignPlan and parameterized paths
Hello.

I've noticed that, when implementing an FDW, it is difficult to use a plan
whose best path is a parameterized path. This comes from the fact that the
parameterized clause is not easily available at plan time.

This is what I understood from how it works:

- The clauses coming from the best path's restrictinfo are not available
  in the scan_clauses argument to the GetForeignPlan function.
- They are, however, directly available on the path, but at this point the
  clauses are of the form "InnerVar OPERATOR OuterVar". The outer Var node
  is then replaced by a Param node, using the replace_nestloop_params
  function.

It could be useful to make the parameterized version of the clause (in the
form "InnerVar OPERATOR Param") available to the FDW at plan time.

Could this be possible? Maybe by replacing the clauses on the restrictinfo
nodes from the path's param info with the parameterized clauses, and then
adding these to the scan clauses passed to GetForeignPlan?

Regards,
-- 
Ronan Dunklau
Re: [HACKERS] Set visibility map bit after HOT prune
On Sat, Dec 15, 2012 at 4:48 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> On the other hand, the HOT prune operation itself is worthless when
> we're running a SELECT. The only reason we do it that way is that we
> have to prune before the query starts to use the page, else pruning
> might invalidate pointers-to-tuples that are being held within the query
> plan tree. Maybe it's time to look at what it'd take for the low-level
> scan operations to know whether they're scanning the target relation of
> an UPDATE query, so that we could skip pruning altogether except when a
> HOT update could conceivably ensue. I think this was discussed back when
> HOT went in, but nobody wanted to make the patch more invasive than it
> had to be.

I think it's wrong to assume that HOT pruning has no value except in this
case. Truncating dead tuples to line pointers and collapsing HOT chains
speeds up future page scans, and if we were able to set the all-visible
bit, that would help even more. The problem is that this is all somewhat
prospective: HOT pruning the page doesn't help the *current* scan - in
fact, it can sometimes slow it down considerably - but it can be a great
help to the next scan that comes through. We say, oh, don't worry, VACUUM
will take care of it, but there are plenty of cases where a page can be
scanned a very large number of times before VACUUM comes along, and you
can lose a lot of performance in those cases.

That having been said, I agree with the concerns expressed elsewhere in
this thread that setting the visibility map bit too aggressively would be
a waste. If the page is about to get dirtied again we surely don't want to
go there. Aside from the obvious problem of doing work that may not be
necessary, it figures to create buffer-lock contention on the visibility
map page. One of the strengths of the current design is that we avoid that
pretty effectively.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
[HACKERS] Documentation bug for LDAP authentication
While playing with LDAP authentication, I discovered two documentation
bugs.

First, the user and password for the first step in the two-step
authentication mode are ldapbinddn and ldapbindpasswd, not ldapbinduser
and ldapbinddn. This bug has been there since 8.4.

The second one is new in 9.3 with the URL syntax: it is not possible to
specify user and password in the LDAP URL.

The first hunk should be backpatched.

Yours,
Laurenz Albe

Attachment: ldapdoc.patch
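[The two-step ("search+bind") mode the first fix refers to is configured in
pg_hba.conf. A sketch with the corrected option names - the server name,
base DN, bind DN, and password here are made up for illustration:]

```
# search+bind: first bind as ldapbinddn/ldapbindpasswd, look the user up via
# ldapsearchattribute under ldapbasedn, then re-bind as the user found
host all all 10.0.0.0/8 ldap ldapserver=ldap.example.net ldapbasedn="dc=example,dc=net" ldapsearchattribute=uid ldapbinddn="cn=search,dc=example,dc=net" ldapbindpasswd=secret
```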
Re: [HACKERS] Switching timeline over streaming replication
On 19.12.2012 15:55, Heikki Linnakangas wrote:
> On 19.12.2012 04:57, Josh Berkus wrote:
>> Heikki,
>>
>> I ran into an unexpected issue while testing. I just wanted to fire up
>> a chain of 5 replicas to see if I could connect them in a loop.
>> However, I ran into a weird issue when starting up r3: it refused to
>> come out of "the database is starting up" mode until I did a write on
>> the master. Then it came up fine.
>>
>> master--r1--r2--r3--r4
>>
>> I tried doing the full replication sequence (basebackup, startup, test)
>> with it twice and got the exact same results each time. This is very
>> strange because I did not encounter the same issues with r2 or r4. Nor
>> have I seen this before in my tests.
>
> Ok.. I'm going to need some more details on how to reproduce this; I'm
> not seeing that when I set up four standbys.

Ok, I managed to reproduce this now. The problem seems to be a timing
problem that occurs when a standby switches to follow a new timeline. Four
is not a magic number here; it can happen with just one cascading standby
too.

When the timeline switch happens - for example, when the standby changes
its recovery target timeline from 1 to 2 at WAL position 0/30002D8 - it
has all the WAL up to that position. However, it only has that WAL in file
00010003, corresponding to timeline 1, and not in file
00020003, corresponding to the new timeline.

When a cascaded standby connects, it requests to start streaming from
point 0/300 at timeline 2 (we always start streaming from the beginning of
a segment, to avoid leaving partially-filled segments in pg_xlog). The
walsender in the 1st standby tries to read that from file
00020003, which does not exist yet.

The problem goes away after some time, once the 1st standby has streamed
the contents of 00020003 and written it to disk, and the cascaded
standby reconnects. But it would be nice to avoid that situation. I'm not
sure how to do that yet; we might need to track the timeline we're
currently receiving/sending more carefully. Or perhaps we need to copy the
previous WAL segment to the new name when switching recovery target
timeline, like we do when a server is promoted. I'll try to come up with
something...

- Heikki
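[The file names being discussed follow a fixed scheme: a WAL segment name
is three zero-padded 8-digit hex fields - timeline ID, log number, and
segment number. A small Python sketch of that naming rule, mirroring what
the 9.2-era XLogFileName() macro produces:]

```python
def xlog_file_name(tli, log, seg):
    """WAL segment file name: timeline ID, log number, and segment
    number, each rendered as a zero-padded 8-digit uppercase hex field
    (24 characters total)."""
    return "%08X%08X%08X" % (tli, log, seg)

# Across a timeline switch, the same segment exists under two names:
old_name = xlog_file_name(1, 0, 3)  # segment 3 on timeline 1
new_name = xlog_file_name(2, 0, 3)  # segment 3 on timeline 2
```

[This is why the walsender can hold all the WAL up to the switch point yet
still fail the read: the bytes live under the timeline-1 name, while the
cascaded standby asks for the timeline-2 name.]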
Re: [HACKERS] Switching timeline over streaming replication
On 19.12.2012 17:27, Heikki Linnakangas wrote:
> On 19.12.2012 15:55, Heikki Linnakangas wrote:
>> Ok.. I'm going to need some more details on how to reproduce this; I'm
>> not seeing that when I set up four standbys.
>
> Ok, I managed to reproduce this now.

Hmph, no I didn't; I replied to the wrong email. The problem I managed to
reproduce was the one where you get "requested WAL segment
00020003 has already been removed" errors, reported by Thom.

- Heikki
Re: [HACKERS] [ADMIN] Problems with enums after pg_upgrade
Hello again,

well, still everything is working. What information do you need to dig
into this issue? So far I can say that we don't use ALTER TYPE ... ADD
VALUE. We use somewhat modified enum_add and enum_del functions (which are
appended at the end) to be able to change enums within transactions. And
the reason this happened on the beta server and not on the staging server
might be that we sometimes have to drop the whole stuff on staging because
of failures we made, so old enum values will not be persistent in old
indexes.

If you need more info, just ask. :)

regards
Bernhard

SET check_function_bodies = false;

CREATE OR REPLACE FUNCTION enum_add (
    enum_name character varying,
    enum_elem character varying
) RETURNS void AS
$body$
DECLARE
    _enum_typid INTEGER;
    version_int INTEGER;
    _highest_enumsortorder REAL;
BEGIN
    -- get enumtypid
    SELECT oid FROM pg_type
        WHERE typtype = 'e' AND typname = enum_name
        INTO _enum_typid;

    SELECT INTO version_int setting
        FROM pg_settings WHERE name = 'server_version_num';

    -- postgres 9.2 or higher
    IF version_int >= 90200 THEN
        SELECT MAX(enumsortorder) FROM pg_enum
            WHERE enumtypid = _enum_typid
            INTO _highest_enumsortorder;

        -- check if elem already exists in enum
        IF NOT EXISTS (SELECT * FROM pg_enum
                WHERE enumlabel = enum_elem AND enumtypid = _enum_typid) THEN
            INSERT INTO pg_enum (enumtypid, enumlabel, enumsortorder)
                VALUES (_enum_typid, enum_elem, _highest_enumsortorder + 1);
        END IF;
    ELSE
        -- check if elem already exists in enum
        IF NOT EXISTS (SELECT * FROM pg_enum
                WHERE enumlabel = enum_elem AND enumtypid = _enum_typid) THEN
            INSERT INTO pg_enum (enumtypid, enumlabel)
                VALUES (_enum_typid, enum_elem);
        END IF;
    END IF;
END;
$body$
LANGUAGE plpgsql;

--
-- Definition for function enum_del:
--

CREATE OR REPLACE FUNCTION enum_del (
    enum_name character varying,
    enum_elem character varying
) RETURNS void AS
$body$
DECLARE
    type_oid INTEGER;
    rec RECORD;
    sql VARCHAR;
    ret INTEGER;
BEGIN
    SELECT pg_type.oid FROM pg_type
        WHERE typtype = 'e' AND typname = enum_name
        INTO type_oid;

    -- check if enum exists
    IF NOT EXISTS (SELECT * FROM pg_enum WHERE enumtypid = type_oid) THEN
        RETURN;
    END IF;

    -- check if element in enum exists
    IF NOT FOUND THEN
        RAISE EXCEPTION 'Cannot find a enum: %', enum_name;
    END IF;

    -- Check column DEFAULT value references.
    SELECT * FROM pg_attrdef
        JOIN pg_attribute ON attnum = adnum AND atttypid = type_oid
        JOIN pg_class ON pg_class.oid = attrelid
        JOIN pg_namespace ON pg_namespace.oid = relnamespace
        WHERE adsrc = quote_literal(enum_elem) || '::' || quote_ident(enum_name)
        LIMIT 1
        INTO rec;
    IF FOUND THEN
        RAISE EXCEPTION
            'Cannot delete the ENUM element %.%: column %.%.% has DEFAULT value of ''%''',
            quote_ident(enum_name), quote_ident(enum_elem),
            quote_ident(rec.nspname), quote_ident(rec.relname),
            rec.attname, quote_ident(enum_elem);
    END IF;

    -- Check data references.
    FOR rec IN
        SELECT * FROM pg_attribute
            JOIN pg_class ON pg_class.oid = attrelid
            JOIN pg_namespace ON pg_namespace.oid = relnamespace
            WHERE atttypid = type_oid AND relkind = 'r'
    LOOP
        sql := 'SELECT 1 FROM ONLY '
            || quote_ident(rec.nspname) || '.'
            || quote_ident(rec.relname) || ' '
            || ' WHERE ' || quote_ident(rec.attname)
            || ' = ' || quote_literal(enum_elem)
            || ' LIMIT 1';
        EXECUTE sql INTO ret;
        IF ret IS NOT NULL THEN
            RAISE EXCEPTION
                'Cannot delete the ENUM element %.%: column %.%.% contains references',
                quote_ident(enum_name), quote_ident(enum_elem),
                quote_ident(rec.nspname), quote_ident(rec.relname),
                rec.attname;
        END IF;
    END LOOP;

    -- OK. We may delete.
    DELETE FROM pg_enum
        WHERE enumtypid = type_oid AND enumlabel = enum_elem;
END;
$body$
LANGUAGE plpgsql;

--
Bernhard Schrader
System Administration

InnoGames GmbH
Harburger Schloßstraße 28 (Channel 4) - 21079 Hamburg - Germany
Tel +49 40 7889335-53
Fax +49 40 7889335-22

Managing Directors: Hendrik Klindworth, Eike Klindworth, Michael Zillmer
VAT-ID: DE264068907
Amtsgericht Hamburg, HRB 108973

http://www.innogames.com -- bernhard.schra...@innogames.de
Re: [HACKERS] [ADMIN] Problems with enums after pg_upgrade
On 2012-12-19 16:51:32 +0100, Bernhard Schrader wrote:
> Hello again,
>
> well, still everything is working. What information do you need to dig
> into this issue? So far I can say that we don't use ALTER TYPE ... ADD
> VALUE. We use somewhat modified enum_add and enum_del functions (which
> are appended at the end) to be able to change enums within transactions.

That explains everything. You *CANNOT* do that. There are some pretty
fundamental reasons why you are not allowed to add enum values in a
transaction, and even more reasons why deleting from enums isn't allowed
at all.

Greetings,

Andres Freund

-- 
Andres Freund                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
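[For context, the supported way to extend an enum - the one the custom
functions above bypass - is ALTER TYPE ... ADD VALUE, available since 9.1;
in this era it cannot be run inside a transaction block, for exactly the
safety reasons Andres alludes to. The type and label names here are made
up:]

```sql
-- Supported: append or insert a new label (must run outside a
-- transaction block on 9.1/9.2).
ALTER TYPE mood ADD VALUE 'ecstatic' AFTER 'happy';

-- There is no supported way to delete a label from an enum type;
-- the workaround is to create a new type and migrate the columns.
```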
Re: [HACKERS] Set visibility map bit after HOT prune
On Wed, Dec 19, 2012 at 8:32 PM, Robert Haas robertmh...@gmail.com wrote:
> On Sat, Dec 15, 2012 at 4:48 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>> On the other hand, the HOT prune operation itself is worthless when
>> we're running a SELECT. The only reason we do it that way is that we
>> have to prune before the query starts to use the page, else pruning
>> might invalidate pointers-to-tuples that are being held within the
>> query plan tree. Maybe it's time to look at what it'd take for the
>> low-level scan operations to know whether they're scanning the target
>> relation of an UPDATE query, so that we could skip pruning altogether
>> except when a HOT update could conceivably ensue. I think this was
>> discussed back when HOT went in, but nobody wanted to make the patch
>> more invasive than it had to be.
>
> I think it's wrong to assume that HOT pruning has no value except in
> this case. Truncating dead tuples to line pointers and collapsing HOT
> chains speeds up future page scans, and if we were able to set the
> all-visible bit, that would help even more.

Good point.

> The problem is that this is all somewhat prospective: HOT pruning the
> page doesn't help the *current* scan - in fact, it can sometimes slow it
> down considerably - but it can be a great help to the next scan that
> comes through. We say, oh, don't worry, VACUUM will take care of it, but
> there are plenty of cases where a page can be scanned a very large
> number of times before VACUUM comes along, and you can lose a lot of
> performance in those cases.

Also, since we discount the number of tuples pruned by HOT pruning while
tracking the number of dead tuples in a table, in a perfectly stable
system autovacuum may not ever pick the table for vacuuming, slowly
stopping index-only scans from working. Soon we will have a situation
where all VM bits are clear, but autovacuum would still fail to pick the
table.

Tom had a good suggestion to periodically count VM bits to choose tables
for vacuuming even if there are no dead tuples or dead line pointers to
remove. I'm not sure though if the extra vacuum will be better than
setting the bit after HOT prune. Also, deciding when to count the bits can
be tricky. Do it every vacuum cycle? Or after every 5/10 cycles? I don't
have the answer.

> That having been said, I agree with the concerns expressed elsewhere in
> this thread that setting the visibility map bit too aggressively will be
> a waste. If the page is about to get dirtied again we surely don't want
> to go there.

Yeah, I agree. If we could figure out that we are soon going to UPDATE a
tuple on the page again, it would be worthless to set the bit. But
predicting that could also turn out to be tricky. Even if we could somehow
tell that the scan is happening on the result relation of an UPDATE
operation, not every page may receive updates because of WHERE quals etc.
So we may get lots of false positives.

> Aside from the obvious problem of doing work that may not be necessary,
> it figures to create buffer-lock contention on the visibility map page.
> One of the strengths of the current design is that we avoid that pretty
> effectively.

It's a valid concern, though my limited pgbench tests did not show any
drop in the numbers. But that's hardly any proof. We could possibly
mitigate this with a conditional update of the VM bit: do it only if we
get a conditional exclusive lock on the buffer page.

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee
[HACKERS] ThisTimeLineID in checkpointer and bgwriter processes
In both checkpointer.c and bgwriter.c, we do this before entering the main
loop:

    /*
     * Use the recovery target timeline ID during recovery
     */
    if (RecoveryInProgress())
        ThisTimeLineID = GetRecoveryTargetTLI();

That seems reasonable. However, since it's only done once, when the
process starts up, ThisTimeLineID is never updated in those processes,
even though the startup process changes the recovery target timeline. That
actually seems harmless to me, and I also haven't heard of any complaints
of misbehavior in 9.1 or 9.2 caused by it.

I'm not sure why we bother to set ThisTimeLineID in those processes in the
first place. I think we did it when streaming replication was introduced
because it was an easy thing to do, because back then the recovery target
timeline never changed after recovery was started. But now that it can
change, I don't think that makes sense anymore.

So, I propose that we simply remove those lines, and leave ThisTimeLineID
at zero in the checkpointer and bgwriter processes while we're still in
recovery. ThisTimeLineID is updated anyway before performing the first
checkpoint after finishing recovery, and AFAICS that's the first time the
value is needed.

- Heikki
Re: [HACKERS] [ADMIN] Problems with enums after pg_upgrade
On 12/19/2012 10:56 AM, Andres Freund wrote: On 2012-12-19 16:51:32 +0100, Bernhard Schrader wrote: Hello again, well, still everything is working. What information do you need to get into this issue? Well, so far i can say, we dont use ALTER TYPE ADD VALUE. We use some more or less changed enum_add and enum_del (Which are appended at the end) to be able to change enums within transactions. That explains everything. You *CANNOT* do that. There are some pretty fundamental reasons why you are not allowed to add enums in a transaction. And even more reasons why deleting from enums isn't allowed at all. Yes, this is exactly what I referred to in my recent reply to Tom. This is a recipe for database corruption. Hacking the catalog generally is something to be done only with the most extreme caution, IMNSHO. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Set visibility map bit after HOT prune
Robert Haas robertmh...@gmail.com writes: On Sat, Dec 15, 2012 at 4:48 PM, Tom Lane t...@sss.pgh.pa.us wrote: Maybe it's time to look at what it'd take for the low-level scan operations to know whether they're scanning the target relation of an UPDATE query, so that we could skip pruning altogether except when a HOT update could conceivably ensue. I think this was discussed back when HOT went in, but nobody wanted to make the patch more invasive than it had to be. I think it's wrong to assume that HOT pruning has no value except in this case. Truncating dead tuples to line pointers and collapsing HOT chains speeds up future page scans, and if we were able to set the all-visible bit, that would help even more. The problem is that this is all somewhat prospective: HOT pruning the page doesn't help the *current* scan - in fact, it can sometimes slow it down considerably - but it can be a great help to the next scan that comes through. Well, no. The problem with the way we do it now is that doing it every time a query scan arrives at a page is too often, resulting in a lot of wasted work. That wasted work is somewhat tolerable as long as it only involves looking at the current page and ending up not actually changing it. If we start generating a lot of useless WAL activity and I/O as a result of thrashing the all-visible bit, it won't be so tolerable anymore. But the problem is not so much the desire to set the bit as that we're doing this whole activity at the wrong place and time. Perhaps doing it every time an UPDATE arrives at the page is too far in the other direction, and we need to look for some other mechanism entirely. I think my core point still stands: the way that HOT pruning is done now is an artifact of having wanted to shoehorn it into the system with minimum changes. Which was reasonable at the time given the experimental status of the feature, but now it's time to reconsider. 
regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [ADMIN] Problems with enums after pg_upgrade
Andres Freund and...@2ndquadrant.com writes: On 2012-12-19 16:51:32 +0100, Bernhard Schrader wrote: Well, so far i can say, we dont use ALTER TYPE ADD VALUE. We use some more or less changed enum_add and enum_del (Which are appended at the end) to be able to change enums within transactions. That explains everything. You *CANNOT* do that. Yeah. So this was not pg_upgrade's fault at all: that code would have created problems in 9.1 or later even without using pg_upgrade. For the record, the reason you can't safely do this is exactly what we saw here: it's possible for deleted/never-committed values of the type to remain behind in upper levels of btree indexes. Since we now need to be able to consult pg_enum to know how to compare values of an enum type, deleted values are uncomparable. enum_add is all right as long as you are careful to commit its transaction before inserting the new value anywhere. enum_del is quite unsafe unless you REINDEX all indexes on columns of the type after making sure the value is gone from the tables. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Set visibility map bit after HOT prune
On Wed, Dec 19, 2012 at 9:51 PM, Tom Lane t...@sss.pgh.pa.us wrote: If we start generating a lot of useless WAL activity and I/O as a result of thrashing the all-visible bit, it won't be so tolerable anymore. What if we wrap that into the WAL generated by HOT prune itself? Would that address your concerns about extra WAL logging? I also suggested doing it conditionally to avoid contention on the VM buffer. (I actually wonder why we WAL-log the set operation at all except for HS to be able to do IOS, but that's a topic for a separate thread, maybe.) Also, if extra WAL-logging is really worrisome, maybe we should again seriously reconsider my idea of *not* clearing the bit at HOT update. My apologies for repeating myself. But that seems very important in a steady system where almost every update is a HOT update. So you don't clear the bit at HOT update and so don't need to set it back either, thus saving two WAL records. You definitely don't need any vacuum in this case if pruning keeps reclaiming dead space at the appropriate time and makes it available for the next update. More so, IOS still works great. Why is this so bad? I haven't forgotten your complaints about the changed meaning of the bit, but I tried to explain that we can read it in a slightly different way and still show it as an invariant. I think my core point still stands: the way that HOT pruning is done now is an artifact of having wanted to shoehorn it into the system with minimum changes. Which was reasonable at the time given the experimental status of the feature, but now it's time to reconsider. ISTM that you already have concrete ideas about the places where HOT prune would be more effective. My worry is that changing anything there is going to be a lot trickier and will require heavy testing. Our initial work has served us well so far. Of course, I've no problem changing that if it's going to benefit users.
Thanks, Pavan -- Pavan Deolasee http://www.linkedin.com/in/pavandeolasee -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [ADMIN] Problems with enums after pg_upgrade
On 12/19/2012 11:31 AM, Tom Lane wrote: enum_add is all right as long as you are careful to commit its transaction before inserting the new value anywhere. It's not really all right for post-9.0 versions. For example, this is wrong:

    -- postgres 9.2 or higher
    IF version_int >= 90200 THEN

It should really be:

    IF version_int >= 90100 THEN

What is even worse is that this procedure doesn't take any care at all of the ordering rule for even-numbered enum OIDs. We could have OID wraparound to an even-numbered OID and it would break the rule. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cascading replication: should we detect/prevent cycles?
On 12/19/2012 12:34 AM, Simon Riggs wrote: My logic is that if you make a 1 minute test you will notice your mistake, which is glaringly obvious. That is sufficient to prevent that mistake, IMHO. If you don't test your config and don't monitor either, good luck with HA. I am not arguing whether you are right. I am arguing whether or not we want to shoot all but our expert users in the foot. People make mistakes; when reasonable, we should help them not make those mistakes. JD -- Command Prompt, Inc. - http://www.commandprompt.com/ PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC @cmdpromptinc - 509-416-6579 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Set visibility map bit after HOT prune
On Wed, Dec 19, 2012 at 11:49 AM, Pavan Deolasee pavan.deola...@gmail.com wrote: Also, if extra WAL-logging is really worrisome, may be we should again seriously reconsider my idea of *not* clearing the bit at HOT update. My apologies for repeating myself. But that seems very important in a steady system when almost every update is a HOT update. So you don't clear the bit at HOT update and so don't need to set it back either, thus saving two WAL activity. You definitely don't need any vacuum in this case if pruning keeps reclaiming dead space at appropriate time and make it available for the next update. More so, IOS still works great. Why is this so bad ? It's bad because then sequential scans will return wrong answers, unless we also rip out the optimization that uses PD_ALL_VISIBLE as an excuse to skip all visibility checks for the page. That optimization is worth a significant amount of performance. It's also bad because then vacuum won't visit the page, and it really should. It's much better to have vacuum prune the page in the background than to have some query do it in the foreground, although the latter is still better than not doing it at all. We could potentially have two or three bits per page to suit these different needs: (1) page can benefit from a vacuum, (2) page is safe for IOS purposes, and (3) page is safe for seqscan purposes. But I think that might be overengineering. IMHO, the goal here should be to have some method of setting the visibility map, in some set of circumstances, outside of vacuum. Figuring out which set of circumstances is appropriate is the hard part. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [ADMIN] Problems with enums after pg_upgrade
Andrew Dunstan and...@dunslane.net writes: what is even worse is that this procedure doesn't take any care at all of the ordering rule for even numbered enum oids. Good point. You really should use ALTER TYPE ADD VALUE, on versions where that's available. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of Row Level Security
On Tue, Dec 18, 2012 at 3:39 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote: postgres= INSERT INTO t1 VALUES (4,'ddd'); INSERT 0 1 postgres= INSERT INTO t1 VALUES (5,'eee'); ERROR: new row for relation t1 violates row-secirity DETAIL: Failing row contains (5, eee). I've argued against this before - and maybe I should drop my objection, because a number of people seem to be on the other side. But I still think there will be some people who don't want this behavior. Right now, for example, you can give someone INSERT but not SELECT permission on a table, and they will then be able to put rows into the table that they cannot read back. Similarly, in the RLS case, it is not necessarily undesirable for a user to be able to insert a row that they can't read back; or for them to be able to update a row from a value that they can see to one that they cannot. Some people will want to prohibit that, while others will not. Previously, I suggested that we handle this by enforcing row-level security only on data read from the table - the OLD row, so to speak - and not on data written to the table - the NEW row, so to speak - because the latter case can be handled well enough by triggers. (The OLD case cannot, because not seeing the row is different from erroring out when you do see it.) There are other alternatives, like allowing the user to specify which behavior they want. But I think that simply decreeing that the policy will apply not only to rows read but also rows written in all cases will be less flexible than we will ultimately want to be. YMMV, of course. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Set visibility map bit after HOT prune
On Wed, Dec 19, 2012 at 10:40 PM, Robert Haas robertmh...@gmail.com wrote: On Wed, Dec 19, 2012 at 11:49 AM, Pavan Deolasee pavan.deola...@gmail.com wrote: Also, if extra WAL-logging is really worrisome, may be we should again seriously reconsider my idea of *not* clearing the bit at HOT update. My apologies for repeating myself. But that seems very important in a steady system when almost every update is a HOT update. So you don't clear the bit at HOT update and so don't need to set it back either, thus saving two WAL activity. You definitely don't need any vacuum in this case if pruning keeps reclaiming dead space at appropriate time and make it available for the next update. More so, IOS still works great. Why is this so bad ? It's bad because then sequential scans will return wrong answers, unless we also rip out the optimization that uses PD_ALL_VISIBLE as an excuse to skip all visibility checks for the page. That optimization is worth a significant amount of performance. This can be handled by breaking the 1-to-1 mapping between the VM bit and the PD_ALL_VISIBLE bit. So seq scans will only look at PD_ALL_VISIBLE. It was proposed by Andres upthread, but shot down by Tom and Simon. But I still feel that was an overreaction and there is a lot of merit in the idea. As I said elsewhere, it will also help the case when there are DEAD line pointers in a page. Today we can't mark such pages all-visible, but if we break this mapping, we can do that. I would like to run some pgbench tests where we get the system into a steady state such that all/most updates are HOT updates (not an entirely unlikely scenario for many real-life cases). And then try running some concurrent queries which can be executed via IOS. My gut feeling is that today we will see a slow and continuous drop in performance for these queries, because IOS will slowly stop working. It's also bad because then vacuum won't visit the page, and it really should.
It's much better to have vacuum prune the page in the background than to have some query do it in the foreground, although the latter is still better than not doing it at all. Hmm. This is a good point and I don't have an easy answer. I'm not sure how this will pan out in real-life cases though. We definitely made great progress by having HOT, though the same concerns were raised even then that we are moving work from background to foreground. But I think generally HOT made a great difference to the system as a whole, maybe at the cost of a slowdown for some read-only SELECT queries. And HOT prune is not the only operation that we do in the foreground. We also set hint bits and make buffers dirty in otherwise read-only queries. IMHO, the goal here should be to have some method of setting the visibility map, in some set of circumstances, outside of vacuum. Figuring out which set of circumstances is appropriate is the hard part. Yeah, if we can figure that out conclusively, I'm sure we might be able to auto-tune the system even further. Thanks, Pavan -- Pavan Deolasee http://www.linkedin.com/in/pavandeolasee -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Switching timeline over streaming replication
Heikki, The problem goes away after some time, after the 1st standby has streamed the contents of 00020003 and written it to disk, and the cascaded standby reconnects. But it would be nice to avoid that situation. I'm not sure how to do that yet, we might need to track the timeline we're currently receiving/sending more carefully. Or perhaps we need to copy the previous WAL segment to the new name when switching recovery target timeline, like we do when a server is promoted. I'll try to come up with something... Would it be accurate to say that this issue only happens when all of the replicated servers have no traffic? --Josh -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Set visibility map bit after HOT prune
On 19 December 2012 16:21, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Sat, Dec 15, 2012 at 4:48 PM, Tom Lane t...@sss.pgh.pa.us wrote: Maybe it's time to look at what it'd take for the low-level scan operations to know whether they're scanning the target relation of an UPDATE query, so that we could skip pruning altogether except when a HOT update could conceivably ensue. I think this was discussed back when HOT went in, but nobody wanted to make the patch more invasive than it had to be. I think it's wrong to assume that HOT pruning has no value except in this case. Truncating dead tuples to line pointers and collapsing HOT chains speeds up future page scans, and if we were able to set the all-visible bit, that would help even more. The problem is that this is all somewhat prospective: HOT pruning the page doesn't help the *current* scan - in fact, it can sometimes slow it down considerably - but it can be a great help to the next scan that comes through. Well, no. The problem with the way we do it now is that doing it every time a query scan arrives at a page is too often, resulting in a lot of wasted work. That wasted work is somewhat tolerable as long as it only involves looking at the current page and ending up not actually changing it. If we start generating a lot of useless WAL activity and I/O as a result of thrashing the all-visible bit, it won't be so tolerable anymore. But the problem is not so much the desire to set the bit as that we're doing this whole activity at the wrong place and time. Perhaps doing it every time an UPDATE arrives at the page is too far in the other direction, and we need to look for some other mechanism entirely. The benefit of saying that only UPDATEs clean the block is that this penalises only the workload making the mess, rather than everybody cleaning up repeatedly over one messy guy. 
Having a random SELECT clean the block causes delay in the non-UPDATEing process, contention, and additional writes. We definitely know we write too often; this has been measured and discussed over a period of years. It would be useful to have a table-level option of hot_cleanup = SELECT | UPDATE | NONE to allow people to minimise cleanup and test the difference this makes. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Switching timeline over streaming replication
Heikki, The next time I get the issue, and I'm not paying for 5 cloud servers by the hour, I'll give you a login. --Josh - Original Message - On 19.12.2012 17:27, Heikki Linnakangas wrote: On 19.12.2012 15:55, Heikki Linnakangas wrote: On 19.12.2012 04:57, Josh Berkus wrote: Heikki, I ran into an unexpected issue while testing. I just wanted to fire up a chain of 5 replicas to see if I could connect them in a loop. However, I ran into a weird issue when starting up r3: it refused to come out of the database is starting up mode until I did a write on the master. Then it came up fine. master--r1--r2--r3--r4 I tried doing the full replication sequence (basebackup, startup, test) with it twice and got the exact same results each time. This is very strange because I did not encounter the same issues with r2 or r4. Nor have I seen this before in my tests. Ok.. I'm going to need some more details on how to reproduce this, I'm not seeing that when I set up four standbys. Ok, I managed to reproduce this now. Hmph, no I didn't, I replied to wrong email. The problem I managed to reproduce was the one where you get requested WAL segment 00020003 has already been removed errors, reported by Thom. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of Row Level Security
On 19 December 2012 17:25, Robert Haas robertmh...@gmail.com wrote: On Tue, Dec 18, 2012 at 3:39 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote: postgres= INSERT INTO t1 VALUES (4,'ddd'); INSERT 0 1 postgres= INSERT INTO t1 VALUES (5,'eee'); ERROR: new row for relation t1 violates row-secirity DETAIL: Failing row contains (5, eee). I've argued against this before - and maybe I should drop my objection, because a number of people seem to be on the other side. But I still think there will be some people who don't want this behavior. Right now, for example, you can give someone INSERT but not SELECT permission on a table, and they will then be able to put rows into the table that they cannot read back. Similarly, in the RLS case, it is not necessarily undesirable for a user to be able to insert a row that they can't read back; or for them to be able to update a row from a value that they can see to one that they cannot. Some people will want to prohibit that, while others will not. I can see a use case for not having security apply for users who have *only* INSERT privilege. This would allow people to run bulk loads of data into a table with row security. We should add that. That is not the common case, so with proper documentation that should be a useful feature without relaxing default security. Never applying security for INSERT and then forcing them to add BEFORE triggers if they want full security is neither secure nor performant. Previously, I suggested that we handle this by enforcing row-level security only on data read from the table - the OLD row, so to speak - and not on data written to the table - the NEW row, so to speak - because the latter case can be handled well enough by triggers. (The OLD case cannot, because not seeing the row is different from erroring out when you do see it.) There are other alternatives, like allowing the user to specify which behavior they want. 
But I think that simply decreeing that the policy will apply not only to rows read but also rows written in all cases will be less flexible than we will ultimately want to be. As discussed, we should add a security feature that is secure by default. Adding options to make it less secure can follow initial commit. We might even make it in this release if the review of the main feature goes well. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] trouble with pg_upgrade 9.0 - 9.1
Groshev Andrey wrote: Mismatch of relation names: database database, old rel public.lob.ВерсияВнешнегоДокумента$Документ_pkey, new rel public.plob.ВерсияВнешнегоДокумента$Документ There is a limit on identifiers of 63 *bytes* (not characters) after which the name is truncated. In UTF8 encoding, the underscore would be in the 64th position. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of Row Level Security
Robert Haas robertmh...@gmail.com writes: On Tue, Dec 18, 2012 at 3:39 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote: postgres= INSERT INTO t1 VALUES (4,'ddd'); INSERT 0 1 postgres= INSERT INTO t1 VALUES (5,'eee'); ERROR: new row for relation t1 violates row-secirity DETAIL: Failing row contains (5, eee). I've argued against this before - and maybe I should drop my objection, because a number of people seem to be on the other side. But I still think there will be some people who don't want this behavior. Right now, for example, you can give someone INSERT but not SELECT permission on a table, and they will then be able to put rows into the table that they cannot read back. There is also precedent for your opinion in the spec-mandated behavior of updatable views: it is perfectly possible to INSERT a row that you can't read back via the view, or UPDATE it to a state you can't see via the view. The RLS patch's current behavior corresponds to a view created WITH CHECK OPTION --- which we don't support yet. Whether we add that feature soon or not, what seems important for the current debate is that the SQL spec authors chose not to make it the default behavior. This seems to weigh heavily against making it the default, much less only, behavior for RLS cases. I'd also suggest that throw an error is not the only response that people are likely to want for attempts to insert/update non-compliant rows, so hard-wiring that choice is insufficiently flexible even if you grant that local policy is to not allow such updates. (As an example, they might prefer to log the attempt and substitute some other value.) Previously, I suggested that we handle this by enforcing row-level security only on data read from the table - the OLD row, so to speak - and not on data written to the table - the NEW row, so to speak - because the latter case can be handled well enough by triggers. +1. 
I'm less than excited about RLS in the first place, so the less complexity we have to put into the core system for it the better IMO. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] trouble with pg_upgrade 9.0 - 9.1
Kevin Grittner kgri...@mail.com writes: Groshev Andrey wrote: Mismatch of relation names: database database, old rel public.lob.ВерсияВнешнегоДокумента$Документ_pkey, new rel public.plob.ВерсияВнешнегоДокумента$Документ There is a limit on identifiers of 63 *bytes* (not characters) after which the name is truncated. In UTF8 encoding, the underscore would be in the 64th position. Hmm ... that is a really good point, except that you are not counting the lob. or plob. part, which we previously saw is part of the relation name not the schema name. Counting that part, it's already overlimit, which seems to be proof that Andrey isn't using UTF8 but some single-byte encoding. Anyway, that would only explain the issue if pg_upgrade were somehow changing the database encoding, which surely we'd have heard complaints about already? Or maybe this has something to do with pg_upgrade's client-side encoding rather than the server encoding... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Set visibility map bit after HOT prune
Pavan Deolasee pavan.deola...@gmail.com writes: On Wed, Dec 19, 2012 at 9:51 PM, Tom Lane t...@sss.pgh.pa.us wrote: If we start generating a lot of useless WAL activity and I/O as a result of thrashing the all-visible bit, it won't be so tolerable anymore. What if we wrap that into the WAL generated by HOT prune itself ? What WAL? The case we're worried about here is that there's nothing else for HOT prune to do. I think my core point still stands: the way that HOT pruning is done now is an artifact of having wanted to shoehorn it into the system with minimum changes. Which was reasonable at the time given the experimental status of the feature, but now it's time to reconsider. ISTM that you already have concret ideas about what are those places where HOT prune would be more effective. No, I don't; I'm just suggesting that we ought to think outside the box of the way it's being done now. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of Row Level Security
On 19 December 2012 18:40, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Tue, Dec 18, 2012 at 3:39 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote: postgres= INSERT INTO t1 VALUES (4,'ddd'); INSERT 0 1 postgres= INSERT INTO t1 VALUES (5,'eee'); ERROR: new row for relation t1 violates row-secirity DETAIL: Failing row contains (5, eee). I've argued against this before - and maybe I should drop my objection, because a number of people seem to be on the other side. But I still think there will be some people who don't want this behavior. Right now, for example, you can give someone INSERT but not SELECT permission on a table, and they will then be able to put rows into the table that they cannot read back. There is also precedent for your opinion in the spec-mandated behavior of updatable views: it is perfectly possible to INSERT a row that you can't read back via the view, or UPDATE it to a state you can't see via the view. The RLS patch's current behavior corresponds to a view created WITH CHECK OPTION --- which we don't support yet. Whether we add that feature soon or not, what seems important for the current debate is that the SQL spec authors chose not to make it the default behavior. This seems to weigh heavily against making it the default, much less only, behavior for RLS cases. This is security, not spec compliance. By default, we need full security. Nobody has argued that it should be the only behaviour, only that it is the most typically requested behaviour and the most secure, therefore the one we should do first. I'd also suggest that throw an error is not the only response that people are likely to want for attempts to insert/update non-compliant rows, so hard-wiring that choice is insufficiently flexible even if you grant that local policy is to not allow such updates. (As an example, they might prefer to log the attempt and substitute some other value.) 
Previously, I suggested that we handle this by enforcing row-level security only on data read from the table - the OLD row, so to speak - and not on data written to the table - the NEW row, so to speak - because the latter case can be handled well enough by triggers. +1. I'm less than excited about RLS in the first place, so the less complexity we have to put into the core system for it the better IMO. Agree with the need for less complexity, but that decision increases complexity for the typical user and does very little to the complexity of the patch. Treating a security rule as a check constraint is natural and obvious, so there are no core system problems here. If we don't enforce rules on INSERT the user has to specifically add a trigger, which makes things noticeably slower. There is more maintenance work for the average user, less performance and more mistakes to make. The way to do this is by adding an option to allow users to specify INSERT should be exempt from the security rule, which Kaigai and I agreed on list some weeks back should come after the initial patch, to no other comment. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FDW: ForeignPlan and parameterized paths
Ronan Dunklau rdunk...@gmail.com writes: I've noticed that, when implementing a FDW, it is difficult to use a plan which best path is a parameterized path. This comes from the fact that the parameterized clause is not easily available at plan time. This is what I understood from how it works: - The clauses coming from the best path restrictinfo are not available in the scan_clauses argument to the GetForeignPlan function. - They are, however, directly available on the path, but at this point the clauses are of the form InnerVar OPERATOR OuterVar. The outer Var node is then replaced by a Param node, using the replace_nestloop_params function. It could be useful to make the parameterized version of the clause (in the form InnerVar OPERATOR Param) available to the fdw at plan time. Could this be possible ? I intentionally did the nestloop_params substitution after calling GetForeignPlan not before. It's not apparent to me why it would be useful to do it before, because the FDW is going to have no idea what those params represent. (Note that they represent values coming from some other, probably local, relation; not from the foreign table.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of Row Level Security
Simon Riggs wrote: This is security, not spec compliance. By default, we need full security. But you are arguing that users should not be able to make something secure if they, and everyone with the same permissions, could not later access it. I thought about situations where I've seen a need for something like this, and probably the best fit that I've seen is the ability of a judge to order that something is sealed. There are various levels where that can happen, but I'll focus on just one which Wisconsin Courts have implemented at the application level, but which would be nice to be able to support at the database level. Let's say we're talking about Milwaukee County, where hundreds of people work for the courts and related organizations with some rights to view the court data. Let's say a battered wife has moved to a new address she wants to keep secret for safety. She files a case with the court for a temporary restraining order, prohibiting the husband from coming near her. The court needs her address for the official record, but the judge will order the address sealed so that only people with a certain authority can see it. The authority is very limited, for obvious reasons. It is quite likely that the person initially entering the address and flagging it as sealed will not have authority to see the address if they go back and look up the case. Neither will the dozens of other people making the same kind of entries in the county. Obviously, if the person doing the initial entry is a friend of the husband, the data is compromised; but not allowing entry of the data in a state sealed by people without authority to look it up exposes the data to every other person with entry authority, with fairly obvious risks. The more secure behavior is to allow entry of data which will not be visible by the person doing the entry. 
-Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of Row Level Security
On 19 December 2012 19:46, Kevin Grittner kgri...@mail.com wrote: But you are arguing that users should not be able to make something secure if they, and everyone with the same permissions, could not later access it. Not exactly, no. I've argued that row security should apply to ALL commands by default. Which is exactly the same default as Oracle, as well as being the obvious common sense understanding of row security, which does not cause a POLA violation. I have no objection to an option to allow row security to not apply to inserts, if people want that. I do object to the idea that row security for inserts/updates should only happen via triggers, which is an ugly and non-performant route, as well as complicating security. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of Row Level Security
On 2012-12-19 14:46:18 -0500, Kevin Grittner wrote: Simon Riggs wrote: This is security, not spec compliance. By default, we need full security. But you are arguing that users should not be able to make something secure if they, and everyone with the same permissions, could not later access it. I thought about situations where I've seen a need for something like this, and probably the best fit that I've seen is the ability of a judge to order that something is sealed. There are various levels where that can happen, but I'll focus on just one which Wisconsin Courts have implemented at the application level, but which would be nice to be able to support at the database level. Let's say we're talking about Milwaukee County, where hundreds of people work for the courts and related organizations with some rights to view the court data. Let's say a battered wife has moved to a new address she wants to keep secret for safety. She files a case with the court for a temporary restraining order, prohibiting the husband from coming near her. The court needs her address for the official record, but the judge will order the address sealed so that only people with a certain authority can see it. The authority is very limited, for obvious reasons. It is quite likely that the person initially entering the address and flagging it as sealed will not have authority to see the address if they go back and look up the case. Neither will the dozens of other people making the same kind of entries in the county. Obviously, if the person doing the initial entry is a friend of the husband, the data is compromised; but not allowing entry of the data in a state sealed by people without authority to look it up exposes the data to every other person with entry authority, with fairly obvious risks. The more secure behavior is to allow entry of data which will not be visible by the person doing the entry. I don't think it is that simple. 
Allowing inserts without regard for row level restrictions makes it far easier to probe for data. E.g. by inserting rows and checking for unique violations. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
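Andres' probing concern can be made concrete with a short sketch. The table, column, and constraint here are purely illustrative (not from the patch under review); the point is only that a unique constraint on a sensitive column turns INSERT into a read channel:

```sql
-- Sensitive column protected by row security, but also under a unique
-- constraint (all names here are illustrative):
CREATE TABLE addresses (
    case_id int,
    address text UNIQUE   -- sealed addresses are hidden from most readers
);

-- A user with INSERT rights but no read access to sealed rows can still
-- probe: a duplicate-key error reveals that the hidden row exists.
INSERT INTO addresses VALUES (999, '123 Main St');
-- a unique_violation here means someone's sealed address is '123 Main St'
```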
Re: [HACKERS] Review of Row Level Security
Simon Riggs wrote: I've argued that row security should apply to ALL commands by default. Which is exactly the same default as Oracle, as well as being the obvious common sense understanding of row security, which does not cause a POLA violation. I have no objection to an option to allow row security to not apply to inserts, if people want that. I do object to the idea that row security for inserts/updates should only happen via triggers, which is an ugly and non-performant route, as well as complicating security. In the software where I've seen this managed at an application level, an address (for example) can be sealed by an update to the sealed column or inserted with the sealed column set to true. I'm not sure why you would want to allow one and not the other. Now, I can envision a situation where it could make sense to use the same predicate for limiting what a role could read and what a role could write, but I'm not buying that that is more secure. In fact, I see cases where you cannot achieve decent security without the ability for both INSERT and UPDATE to write rows which security excludes on reads. Functionally, I don't see anything which can't be accomplished with just the read security and triggers. I do agree that it would be nice if there were an easy way to specify that the same predicate applies to both reads and writes. I hope we can leave the syntax for this feature open to such specification, even if the initial implementation only supports limiting reads. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of Row Level Security
Andres Freund wrote: I don't think it is that simple. Allowing inserts without regard for row level restrictions makes it far easier to probe for data. E.g. by inserting rows and checking for unique violations. Unless you want to go to a military-style security level system where people at each security level have a separate version of the same data, primary keys (and I think other unique constraints) can leak. It seems clear enough that sensitive data should not be used for such constraints. That doesn't even require completely meaningless numeric keys. Court cases in Wisconsin have been numbered within county by year, case type, a sequential portion, and an optional alpha suffix since before things were computerized -- you may know that there is a 2010 case in Dane County for mental commitment number 45, but that doesn't leak any sensitive data. In the sealed address example, if that were moved to the database layer, someone might be able to test whether address number 5 existed on a case by seeing whether an insert succeeded; but if it did, there would be a record of that insert with their user ID that they could not retract in any way -- they would know very little, and be exposed as having done something inappropriate. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of Row Level Security
On 19 December 2012 20:23, Kevin Grittner kgri...@mail.com wrote: I hope we can leave the syntax for this feature open to such specification, even if the initial implementation only supports limiting reads. Well, I hope the opposite: that we can support simple full security by default, while leaving syntax open. The basic model for this is complete separation of data between customers/people. They can't see my data, I can't see theirs. Simple privacy. Obvious. Sure, more complex applications exist, but forcing the simple/common usage to adopt triggers because of that is not a sensible way forwards. Simple basic functionality, with an option for more advanced cases is what we need. Setting a status flag so that the current user no longer sees the row is a good example of more complex workflows in secure applications, I agree, but it's not the common case by any means. When we have these discussions about priority, it seems people think this means "don't do it ever". It doesn't; it means do the most important things first and then do other stuff later. I always wish to do both, but circumstances teach me that hard cutoffs and deadlines mean we can't always have everything if debates overrun and decisions aren't forthcoming. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of Row Level Security
The more secure behavior is to allow entry of data which will not be visible by the person doing the entry. I don't think it is that simple. Allowing inserts without regard for row level restrictions makes it far easier to probe for data. E.g. by inserting rows and checking for unique violations. So the PK column(s) are not as secure as, say, the address-related column. Vice versa, I may know that someone lives at a given address (because my attempt to place someone else there failed) but I would have no way of knowing who that other person is. My recourse would be to escalate the data-entry request to someone with higher security permissions who could read and write to the appropriate tables and resolve the conflict. In both cases the direct write-only situation necessitates that some level of exposure occurs. The work-around if that is unacceptable would be to accept all data but any entries that cannot be directly inserted into the table would remain in a staging area that someone with higher security would have to monitor and clear as needed. The same intervention is required but in the first situation you can at least avoid coding the special logic and instead trade security for ease-of-use. As a default level of security we could throw a generic "secure DML rejected for ROW(...)" error and not tell the user anything about the cause. If that person knows all unique indexes and constraints defined on the table they could use trial-and-error to discover information about stored records, but even then if they get an error on two different columns they still have no way of knowing if those errors belong to the same record. Beyond that level you provide the user with some information as to the cause so that they have a reasonable chance to catch typos and other mistakes instead of escalating a benign issue. Lastly is the custom solution whereby the developers accept ALL data entered as being correct but saved to a staging table. 
A review process by someone with higher security clearances would then process and clear out that table as necessary. If the user is write-only then regardless of whether the entry succeeded or failed they are considered to be done with their task at that point and no meaningful results from the system can be supplied to them. None of these options disallows the presence of non-security related check constraints to be checked, enforced, and communicated to the user. I've probably lost sight of the bigger picture as my response is mostly informed by these last couple of messages. David J. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of Row Level Security
On 19 December 2012 20:37, Kevin Grittner kgri...@mail.com wrote: Andres Freund wrote: I don't think it is that simple. Allowing inserts without regard for row level restrictions makes it far easier to probe for data. E.g. by inserting rows and checking for unique violations. Unless you want to go to a military-style security level system where people at each security level have a separate version of the same data, primary keys (and I think other unique constraints) can leak. It seems clear enough that sensitive data should not be used for such constraints. But there is the more obvious case where you shouldn't be able to insert medical history for a patient you have no responsibility for. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] too much pgbench init output
On 19.12.2012 06:30, Jeevan Chalke wrote: On Mon, Dec 17, 2012 at 5:37 AM, Tomas Vondra t...@fuzzy.cz mailto:t...@fuzzy.cz wrote: Hi, attached is a new version of the patch that (a) converts the 'log_step_seconds' variable to a constant (and does not allow changing it using a command-line option etc.) (b) keeps the current logging as a default (c) adds a -q switch that enables the new logging with a 5-second interval I'm still not convinced there should be yet another knob for tuning the log interval - opinions? It seems that you have generated a patch over your earlier version and due to that it is not cleanly applying on fresh sources. Please generate a patch on fresh sources. Seems you're right - I've attached the proper patch against current master. However, I have absolutely no issues with the design. Also code review is already done and looks good to me. I think to move forward on this we need someone from the core team. So I am planning to change its status to ready-for-committer. Before that please provide an updated patch for final code review. 
thanks Tomas

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index e376452..f3953a7 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -39,6 +39,7 @@
 #include "portability/instr_time.h"
 
 #include <ctype.h>
+#include <math.h>
 
 #ifndef WIN32
 #include <sys/time.h>
@@ -102,6 +103,7 @@ extern int optind;
 #define MAXCLIENTS      1024
 #endif
 
+#define LOG_STEP_SECONDS        5       /* seconds between log messages */
 #define DEFAULT_NXACTS  10      /* default nxacts */
 
 int         nxacts = 0;         /* number of transactions per client */
@@ -150,6 +152,7 @@ char       *index_tablespace = NULL;
 #define naccounts       100000
 
 bool        use_log;            /* log transaction latencies to a file */
+bool        use_quiet;          /* quiet logging onto stderr */
 bool        is_connect;         /* establish connection for each transaction */
 bool        is_latencies;       /* report per-command latencies */
 int         main_pid;           /* main process id used in log filename */
@@ -389,6 +392,7 @@ usage(void)
            "  -v           vacuum all four standard tables before tests\n"
            "\nCommon options:\n"
            "  -d           print debugging output\n"
+           "  -q           quiet logging (a message each 5 seconds)\n"
            "  -h HOSTNAME  database server host or socket directory\n"
            "  -p PORT      database server port number\n"
            "  -U USERNAME  connect as specified database user\n"
@@ -1362,6 +1366,11 @@ init(bool is_no_vacuum)
     char        sql[256];
     int         i;
 
+    /* used to track elapsed time and estimate of the remaining time */
+    instr_time  start, diff;
+    double      elapsed_sec, remaining_sec;
+    int         log_interval = 1;
+
     if ((con = doConnect()) == NULL)
         exit(1);
@@ -1430,6 +1439,8 @@ init(bool is_no_vacuum)
     }
     PQclear(res);
 
+    INSTR_TIME_SET_CURRENT(start);
+
     for (i = 0; i < naccounts * scale; i++)
     {
         int         j = i + 1;
@@ -1441,10 +1452,33 @@ init(bool is_no_vacuum)
             exit(1);
         }
 
-        if (j % 100000 == 0)
+        /* If we want to stick with the original logging, print a message each
+         * 100k inserted rows. */
+        if ((!use_quiet) && (j % 100000 == 0))
             fprintf(stderr, "%d of %d tuples (%d%%) done.\n",
-                    j, naccounts * scale,
-                    (int) (((int64) j * 100) / (naccounts * scale)));
+                    j, naccounts * scale,
+                    (int) (((int64) j * 100) / (naccounts * scale)));
+        /* let's not call the timing for each row, but only each 100 rows */
+        else if (use_quiet && (j % 100 == 0))
+        {
+            INSTR_TIME_SET_CURRENT(diff);
+            INSTR_TIME_SUBTRACT(diff, start);
+
+            elapsed_sec = INSTR_TIME_GET_DOUBLE(diff);
+            remaining_sec = (scale * naccounts - j) * elapsed_sec / j;
+
+            /* have we reached the next interval (or end)? */
+            if ((j == scale * naccounts) || (elapsed_sec >= log_interval * LOG_STEP_SECONDS)) {
+
+                fprintf(stderr, "%d of %d tuples (%d%%) done (elapsed %.2f s, remaining %.2f s).\n",
+                        j, naccounts * scale,
+
Re: [HACKERS] system administration functions with hardcoded superuser checks
On 19.12.2012 07:34, Magnus Hagander wrote: On Wed, Dec 19, 2012 at 1:58 AM, Tomas Vondra t...@fuzzy.cz wrote: On 18.12.2012 18:38, Pavel Stehule wrote: 2012/12/18 Peter Eisentraut pete...@gmx.net: There are some system administration functions that have hardcoded superuser checks, specifically: pg_reload_conf pg_rotate_logfile Some of these are useful in monitoring or maintenance tools, and the hardcoded superuser checks require that these tools run with maximum privileges. Couldn't we just install these functions without default privileges and allow users to grant privileges as necessary? Isn't it too strong a gun for some people? I believe someone can decrease necessary rights and it opens doors to the system. No one was speaking about making them executable by a wider group of users by default (i.e. decreasing necessary rights). Today, when you need to provide the EXECUTE privilege on those functions, you have three options Given how limited these functions are in scope, I don't see a problem here. pg_read_file pg_read_file_all pg_read_binary_file pg_read_binary_file_all pg_stat_file pg_ls_dir is relatively dangerous and I am not for opening these functions. A power user can simply write an extension, but he knows what he does. I see only dangers that are already present. Granting executability on pg_read_xyz is pretty darn close to granting superuser, without explicitly asking for it. Well, you get read only superuser. If we want to make that step as easy as just GRANT, we really need to write some *very* strong warnings in the documentation so that people realize this. I doubt most people will realize it unless we do that (and those who don't read the docs, which is probably a majority, never will). Yup, that's what I meant by possibility to perform additional parameter values checks ;-) Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
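Magnus' SECURITY DEFINER point -- granting read access to specific files rather than to the whole filesystem -- can be sketched like this (the file path and role name are examples only, not from the thread):

```sql
-- Wrapper created by a superuser; it exposes exactly one fixed file
-- and takes no path from the caller (path and role are illustrative).
CREATE FUNCTION read_current_log() RETURNS text AS $$
    SELECT pg_read_file('pg_log/postgresql.log');
$$ LANGUAGE sql SECURITY DEFINER;

REVOKE ALL ON FUNCTION read_current_log() FROM PUBLIC;
GRANT EXECUTE ON FUNCTION read_current_log() TO monitoring;
```

Because the path is hard-coded inside the SECURITY DEFINER body, the grantee gets the one file and nothing else -- unlike a direct GRANT on pg_read_file, which is the read-only-superuser cannon described above.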
Re: [HACKERS] Review of Row Level Security
Simon Riggs wrote: Kevin Grittner kgri...@mail.com wrote: I hope we can leave the syntax for this feature open to such specification, even if the initial implementation only supports limiting reads. Well, I hope the opposite: that we can support simple full security by default, while leaving syntax open. The basic model for this is complete separation of data between customers/people. They can't see my data, I can't see theirs. Simple privacy. Obvious. And something we already can handle several different ways. Inheritance, schemas, etc. Allowing data to be fully secured from prying eyes on entry, regardless of whether the role allowing the entry has rights to see the data does not yet have any built-in support. Sure, more complex applications exist, but forcing the simple/common usage to adopt triggers because of that is not a sensible way forwards. Simple basic functionality, with an option for more advanced cases is what we need. Setting a status flag so that the current user no longer sees the row is a good example of more complex workflows in secure applications, I agree, but its not the common case by any means. When we have these discussions about priority, it seems people think this means don't do it ever. It doesn't, it means do the most important things first and then do other stuff later. I always wish to do both, but circumstances teach me that hard cutoffs and deadlines mean we can't always have everything if debates overrun and decisions aren't forthcoming. Well, it seems we have different views of what is intuitively obvious here. What you suggest does not look to me to be more secure (making full security a misnomer), simpler, nor more important. Perhaps we can avoid divisive discussions on which is more important if we can manage both in the initial implementation. I guess the usual rule if we can't manage it is that a tie goes to the author, which is neither you nor I. 
-Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of Row Level Security
On 2012-12-19 18:25, Robert Haas wrote: On Tue, Dec 18, 2012 at 3:39 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote: postgres= INSERT INTO t1 VALUES (4,'ddd'); INSERT 0 1 postgres= INSERT INTO t1 VALUES (5,'eee'); ERROR: new row for relation t1 violates row-secirity DETAIL: Failing row contains (5, eee). I've argued against this before - and maybe I should drop my objection, because a number of people seem to be on the other side. But I still think there will be some people who don't want this behavior. Right now, for example, you can give someone INSERT but not SELECT permission on a table, and they will then be able to put rows into the table that they cannot read back. Similarly, in the RLS case, it is not necessarily undesirable for a user to be able to insert a row that they can't read back; or for them to be able to update a row from a value that they can see to one that they cannot. Some people will want to prohibit that, while others will not. Maybe it is an idea to provide different RLS expressions for read and write. I remember reading a scenario (it might be well known in security land) where it is possible to write to authorization levels >= the user's level, and read from levels <= the user's level. In this setup Kevin's address example is possible: a user could write to e.g. the highest level, but then not read it anymore if his own level was lower than the highest. This setup also shows that to implement it, one would need a different expression for read and write (or the RLS expression should know the query's command type). regards, Yeb -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
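The scheme Yeb describes is essentially the classic "read down, write up" model from security land. With purely hypothetical per-command syntax (no such clause exists in the patch under discussion, and current_user_level() is an invented helper), it might be expressed as:

```sql
-- Hypothetical illustration only: separate predicates per command type.
ALTER TABLE case_file SET ROW SECURITY
    FOR SELECT USING (sensitivity <= current_user_level())  -- read down
    FOR INSERT USING (sensitivity >= current_user_level()); -- write up
```

A clerk at level 1 could then insert a level-5 sealed row but not read it back, which is exactly Kevin's sealed-address workflow.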
[HACKERS] strange OOM errors with EXECUTE in PL/pgSQL
Hi, one of our local users reported he's getting OOM errors on 9.2, although on 9.1 the code worked fine. Attached is a simple test-case that should give you an OOM error almost immediately. What it does: 1) creates a simple table called test with one text column. 2) creates a plpgsql function with one parameter, and all that function does is passing the parameter to EXECUTE 3) calls the function with a string containing many INSERTs into the test table The way the EXECUTE is used is a bit awkward, but the failures seem a bit strange to me. The whole script is ~500kB and most of that is about 11k of very simple INSERT statements: insert into test(value) values (''aa''); all of them are exactly the same. Yet when it fails with OOM, the log contains memory context stats like these: TopMemoryContext: 5303376 total in 649 blocks; 2648 free ... PL/pgSQL function context: 8192 total in 1 blocks; 3160 free ... TopTransactionContext: 8192 total in 1 blocks; 6304 free ... ExecutorState: 8192 total in 1 blocks; 7616 free ... ExprContext: 8192 total in 1 blocks; 8160 free ... SPI Exec: 33554432 total in 14 blocks; 6005416 free ... CachedPlanSource: 3072 total in 2 blocks; 1856 free ... CachedPlanSource: 538688 total in 3 blocks; 1744 free ... CachedPlanQuery: 3072 total in 2 blocks; 1648 free ... CachedPlanSource: 538688 total in 3 blocks; 1744 free ... CachedPlanQuery: 3072 total in 2 blocks; 1648 free ... CachedPlanSource: 538688 total in 3 blocks; 1744 free ... CachedPlanQuery: 3072 total in 2 blocks; 1648 free ... CachedPlanSource: 538688 total in 3 blocks; 1744 free ... CachedPlanQuery: 3072 total in 2 blocks; 1648 free ... CachedPlanSource: 538688 total in 3 blocks; 1744 free ... ... There is ~9500 of these CachedPlanSource + CachedPlanQuery row pairs (see the attached log). That seems a bit strange to me, because all the queries are exactly the same in this test case. 
The number of queries needed to get OOM is inversely proportional to the query length - by using a longer text (instead of 'aaa') you may use much less queries. I am no expert in this area, but it seems to me that the code does not expect that many INSERTs in EXECUTE and does not release the memory for some reason (e.g. because the plans are allocated in SPI Exec memory context, etc.). regards Tomas pg-oom.log.gz Description: application/gzip test2.sql.gz Description: application/gzip -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
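A condensed sketch of the reported scenario (the original script is ~500 kB of literal INSERT statements; repeat() stands in for that list here, and run_sql is an illustrative name):

```sql
-- 1) the target table
CREATE TABLE test (value text);

-- 2) a function that just forwards its parameter to EXECUTE
CREATE FUNCTION run_sql(q text) RETURNS void AS $$
BEGIN
    EXECUTE q;
END;
$$ LANGUAGE plpgsql;

-- 3) one call carrying ~11k identical INSERT statements; on 9.2 this
--    reportedly exhausts memory via per-statement CachedPlanSource entries
SELECT run_sql(
    repeat('insert into test(value) values (''aaa'');', 11000)
);
```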
Re: [HACKERS] Cascading replication: should we detect/prevent cycles?
Simon, My logic is that if you make a 1 minute test you will notice your mistake, which is glaringly obvious. That is sufficient to prevent that mistake, IMHO. What would such a test look like? It's not obvious to me that there's any rapid way for a user to detect this situation, without checking each server individually. If there's a quick and easy way to test for cycles from the user side, we should put it in documentation somewhere. --Josh -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
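Josh's point can be illustrated: there is no built-in view of the whole topology, so a DBA would first have to collect each standby's upstream (e.g. from primary_conninfo on every node) into a table by hand. Given such an assumed hand-built table, a recursive query could flag a cycle:

```sql
-- Assumed hand-collected topology, one row per standby:
-- CREATE TABLE replication_edges (standby text, upstream text);

WITH RECURSIVE walk(node, path, is_cycle) AS (
    SELECT standby, ARRAY[standby], false
    FROM replication_edges
  UNION ALL
    SELECT e.upstream, w.path || e.upstream, e.upstream = ANY(w.path)
    FROM walk w
    JOIN replication_edges e ON e.standby = w.node
    WHERE NOT w.is_cycle
)
SELECT DISTINCT path FROM walk WHERE is_cycle;
```

The need to assemble replication_edges manually from every server is precisely the "checking each server individually" burden the message complains about.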
Re: [HACKERS] Feature Request: pg_replication_master()
It stalled because the patch author decided not to implement the request to detect recovery.conf in data directory, which allows backwards compatibility. Well, I don't think we had agreement on how important backwards compatibility for recovery.conf was, particularly not on the whole recovery.conf/recovery.done functionality and the weird formatting of recovery.conf. However, with include_if_exists directives in postgresql.conf, or include_dir, that would be easy to work around. Don't we have something like that planned for SET PERSISTENT? --Josh Berkus -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Feature Request: pg_replication_master()
This sounds like my previous suggestion of returning the primary conninfo value, but with just ip. That one came with a pretty bad patch, and was later postponed until we folded recovery.conf into the main configuration file parsing. I'm not really sure what happened to that project? (the configuration file one) Hmmm, good point. Just having primary_conninfo it in pg_settings would help a lot. --Josh -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Feature Request: pg_replication_master()
On 19 December 2012 22:19, Joshua Berkus j...@agliodbs.com wrote: It stalled because the patch author decided not to implement the request to detect recovery.conf in data directory, which allows backwards compatibility. Well, I don't think we had agreement on how important backwards compatibility for recovery.conf was, particularly not on the whole recovery.conf/recovery.done functionality and the wierd formatting of recovery.conf. As ever, we spent much energy on debating backwards compatibility rather than just solving the problem it posed, which is fairly easy to solve. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On Tue, 2012-12-04 at 01:03 -0800, Jeff Davis wrote: 4. We need some general performance testing to show whether this is insane or not. I ran a few tests. Test 1 - find worst-case overhead for the checksum calculation on write: fsync = off bgwriter_lru_maxpages = 0 shared_buffers = 1024MB checkpoint_segments = 64 autovacuum = off The idea is to prevent interference from the bgwriter or autovacuum. Also, I turn off fsync so that it's measuring the calculation overhead, not the effort of actually writing to disk. drop table foo; create table foo(i int, j int) with (fillfactor=50); create index foo_idx on foo(i); insert into foo select g%25, -1 from generate_series(1,1000) g; checkpoint; -- during the following sleep, issue an OS sync -- to make test results more consistent select pg_sleep(30); \timing on update foo set j=-1 where i = 0; select pg_sleep(2); checkpoint; update foo set j=-1 where i = 0; select pg_sleep(2); checkpoint; update foo set j=-1 where i = 0; select pg_sleep(2); checkpoint; \timing off I am measuring the time of the CHECKPOINT command, not the update. The update is just to dirty all of the pages (they should all be HOT updates). Without checksums, it takes about 400ms. With checksums, it takes about 500ms. That overhead is quite low, considering that the bottleneck is almost always somewhere else (like actually writing to disk). Test 2 - worst-case overhead for calculating checksum while reading data Same configuration as above. This time, just load a big table: drop table foo; create table foo(i int, j int) with (fillfactor=50); insert into foo select g%25, -1 from generate_series(1,1000) g; -- make sure hint bits and PD_ALL_VISIBLE are set everywhere select count(*) from foo; vacuum; vacuum; vacuum; select relfilenode from pg_class where relname='foo'; Then shut down the server and restart it. Then do a cat data/base/12055/* > /dev/null to get the table loaded into the OS buffer cache. 
Then do: \timing on SELECT COUNT(*) FROM foo; So, shared buffers are cold, but OS cache is warm. This should test the overhead of going from the OS to shared buffers, which requires the checksum calculation. Without checksums is around 820ms; with checksums around 970ms. Again, this is quite reasonable, because I would expect the bottleneck to be reading from the disk rather than the calculation itself. Test 3 - worst-case WAL overhead For this test, I also left fsync off, because I didn't want to test the effort to flush WAL (which shouldn't really be required for this test, anyway). This was simpler: drop table foo; create table foo(i int, j int) with (fillfactor=50); insert into foo select g%25, -1 from generate_series(1,1000) g; checkpoint; select pg_sleep(1); checkpoint; select pg_sleep(30); -- do an OS sync while this is running \timing on SELECT COUNT(*) FROM foo; Without checksums, it takes about 1000ms. With checksums, about 2350ms. I also tested with checksums but without the CHECKPOINT commands above, and it was also 1000ms. This test is more plausible than the other two, so it's more likely to be a real problem. So, the biggest cost of checksums is, by far, the extra full-page images in WAL, which matches our expectations. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Set visibility map bit after HOT prune
On 19 December 2012 17:26, Pavan Deolasee pavan.deola...@gmail.com wrote: We definitely made great progress by having HOT Yes, definitely. Great work. That is not for debate. But I think generally HOT made a great difference to the system as a whole, maybe at the cost of a slowdown for some read-only, select queries. And HOT prune is not the only operation that we do in the foreground. We also set hint bits and make buffers dirty in otherwise read-only queries. And those last things are being debated hotly. We definitely need to ask whether the way things are now can be tweaked to be better. The major mechanics need not be reviewed, but the tradeoffs and balances? Definitely. Anything we do in the foreground needs evaluation. Assuming eager actions give a good payoff is not always a useful thought. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Set visibility map bit after HOT prune
On Thu, Dec 20, 2012 at 12:22 AM, Tom Lane t...@sss.pgh.pa.us wrote: Pavan Deolasee pavan.deola...@gmail.com writes: What if we wrap that into the WAL generated by HOT prune itself? What WAL? The case we're worried about here is that there's nothing else for HOT prune to do. Does such a case exist? Or at least, is it that common? I mean, we have enough checks in place to ensure that HOT prune is attempted only when there is something interesting in the page to be done. Otherwise we don't even attempt getting a cleanup lock on the page. Of course, they are just hints, but they serve very well. Prune XID is what I have in mind in particular. So if the visibility map bit is cleared, it could be because at least one tuple in the page was updated, deleted or inserted. The first two would have set the prune XID and will trigger a HOT action, and HOT prune will indeed do something useful. I think an aborted non-HOT update may create a scenario that you're talking about, i.e. HOT prune will have nothing to do, but the page has again turned all-visible. Being an abort path, I wonder if it's really that common though. That leaves us with the inserts, which will clear the VM bit but may not have anything for HOT prune to do. But we don't set the prune XID for inserts either. So we won't get into hot_page_prune() for such pages. So my point is, for fairly large and common cases, often we will set the bit only when HOT prune did something useful, though not every useful HOT prune will necessarily set the bit. And even if we slip through all the safety nets on HOT prune, we can choose to set the bit only if HOT did something useful to avoid any extra WAL logging, assuming we are still worried about those corner cases. Thanks, Pavan -- Pavan Deolasee http://www.linkedin.com/in/pavandeolasee -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Set visibility map bit after HOT prune
On Thu, Dec 20, 2012 at 5:05 AM, Simon Riggs si...@2ndquadrant.com wrote: And those last things are being debated hotly. We definitely need to ask whether the way things are now can be tweaked to be better. The major mechanics need not be reviewed, but the tradeoffs and balances? Definitely. I have zero objection to doing that, just that I don't have solid ideas right now. And it's not because I haven't thought hard enough. Anything we do in foreground needs evaluation. Assuming eager actions give a good payoff is not always a useful thought. I don't disagree. Your field experience is much larger than mine, but I have spent hours testing PostgreSQL's performance, so can talk with some degree of conviction. I think when we do things that can reduce read/write IO or bloat of a large table in general, the system as a whole benefits, maybe at the cost of some genuinely good guy doing a simple SELECT in this case. Often the SELECTs also benefit, because one of their good siblings helped us reduce bloat of the table and hence seq scans had to scan an order of magnitude fewer blocks. I just thought that we can fairly easily limit the damage if we are really worried about SELECTs being penalised. What if we set a configurable limit on *extra* things that a query may do which are otherwise not very useful for the query itself, but are useful to keep the system healthy and steady? HOT prune definitely counts as one of them, and maybe even the setting of hint bits. (This is a topic for a separate thread though) Thanks, Pavan -- Pavan Deolasee http://www.linkedin.com/in/pavandeolasee -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] system administration functions with hardcoded superuser checks
Tomas Vondra t...@fuzzy.cz writes: On 19.12.2012 07:34, Magnus Hagander wrote: Granting executability on pg_read_xyz is pretty darn close to granting superuser, without explicitly asking for it. Well, you get read-only superuser. If we want to make that step as easy as just GRANT, we really need to write some *very* strong warnings in the documentation so that people realize this. I doubt most people will realize it unless we do that (and those who don't read the docs, which is probably a majority, never will). Yup, that's what I meant by possibility to perform additional parameter values checks ;-) Yeah, which is easily done if you've written a wrapper function and not so easily otherwise. Between that and the point about how pg_dump wouldn't preserve GRANTs done directly on system functions, I think this proposal isn't going anywhere anytime soon. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cascading replication: should we detect/prevent cycles?
On Tue, Dec 18, 2012 at 7:03 PM, Josh Berkus j...@agliodbs.com wrote: 2. should we warn the user, or refuse to start up? One nice property of allowing cyclicity is that it's easier to syndicate application of WAL to a series of standbys before promotion of exactly one to act as a primary (basically, to perform catch-up). One could imagine someone wanting a configuration that was like:

+--> r2
|    |
r1 <-+

This is only one step before: r1 -> r2 or r2 -> r1 (and, most importantly, after the cycle quiesces one can choose either one) For my use, I'm not convinced that such checks and warnings are useful if delivered by default, and I think outright rejection of cyclicity is harmful. -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of Row Level Security
On Wed, Dec 19, 2012 at 12:54 PM, Simon Riggs si...@2ndquadrant.com wrote: I can see a use case for not having security apply for users who have *only* INSERT privilege. This would allow people to run bulk loads of data into a table with row security. We should add that. That is not the common case, so with proper documentation that should be a useful feature without relaxing default security. Never applying security for INSERT and then forcing them to add BEFORE triggers if they want full security is neither secure nor performant. I think INSERT vs. not-INSERT is not the relevant distinction, because the question also arises for UPDATE. In the UPDATE case, the question is whether the RLS qual should be checked only against the OLD tuple (to make sure that we can see the tuple to modify it) or also against the NEW tuple (to make sure that we're not modifying it to a form that we can no longer see). In other words, the question is not "do we support all of the commands?" but rather "do we check not only the tuple read but also the tuple written?". For INSERT, we only write a tuple, without reading. For SELECT and DELETE, we only read a tuple, without writing a new one. UPDATE does both a read and a write. Previously, I suggested that we handle this by enforcing row-level security only on data read from the table - the OLD row, so to speak - and not on data written to the table - the NEW row, so to speak - because the latter case can be handled well enough by triggers. (The OLD case cannot, because not seeing the row is different from erroring out when you do see it.) There are other alternatives, like allowing the user to specify which behavior they want. But I think that simply decreeing that the policy will apply not only to rows read but also rows written in all cases will be less flexible than we will ultimately want to be. As discussed, we should add a security feature that is secure by default. Adding options to make it less secure can follow initial commit.
We might even make it in this release if the review of the main feature goes well. Saying that something is or is not secure is not meaningful without defining what you want to be secure against. There's nothing insecure about checking only the tuples read; it's just a different (and useful) threat model. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of Row Level Security
On Wed, Dec 19, 2012 at 1:58 PM, Simon Riggs si...@2ndquadrant.com wrote: If we don't enforce rules on INSERT the user has to specifically add a trigger, which makes things noticeably slower. There is more maintenance work for the average user, less performance and more mistakes to make. Well, again, only if that's the behavior they want. Also, it's worth noting that, even if we assume that it is in fact the behavior that users will want, the contention that it is faster than a trigger is thus far unsubstantiated by any actual benchmarks. It may indeed be faster ... but I don't know without testing whether it's slightly faster or a whole lot faster. That might be a good thing to find out, because if it is a whole lot faster, that would certainly strengthen the case for including a mode that works that way, whether or not we also provide other options. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Feature Request: pg_replication_master()
On Wed, Dec 19, 2012 at 5:34 PM, Simon Riggs si...@2ndquadrant.com wrote: As ever, we spent much energy on debating backwards compatibility rather than just solving the problem it posed, which is fairly easy to solve. I'm still of the opinion (as were a lot of people on the previous thread, IIRC) that just making them GUCs and throwing backward compatibility under the bus is acceptable in this case. Changes that break application code are anathema to me, because people can have a LOT of application code and updating it can be REALLY hard. The same cannot be said about recovery.conf - you have at most one of those per standby, and if it needs to be changed in some way, you can do it with a very small Perl script. Yes, third-party tools will need to be updated; that is surely a downside, but I think it might be a tolerable one in this case. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Set visibility map bit after HOT prune
On Wed, Dec 19, 2012 at 12:26 PM, Pavan Deolasee pavan.deola...@gmail.com wrote: This can be handled by breaking the 1-to-1 mapping between the VM bit and the PD_ALL_VISIBLE bit. So seq scans will only look at PD_ALL_VISIBLE. It was proposed by Andres upthread, but shot down by Tom and Simon. But I still feel that was an overreaction and there is a lot of merit in the idea. As I said elsewhere, it will also help the case when there are DEAD line pointers in a page. Today we can't mark such pages all-visible, but if we break this mapping, we can do that. Sure, but you're zipping rather blithely past the disadvantages of such an approach. Jeff Davis recently proposed getting rid of PD_ALL_VISIBLE, and Tom and I both expressed considerable skepticism about that; this proposal has the same problems. One of the major benefits of PD_ALL_VISIBLE is that, when it isn't set, inserts, updates, and deletes to the page can ignore the visibility map. That means that a server under heavy concurrency is much less likely to encounter contention on the visibility map blocks. Now, maybe that's not really a problem, but I sure haven't seen enough evidence to make me believe it. If it's really true that PD_ALL_VISIBLE needn't fill this role, then Heikki wasted an awful lot of time implementing it, and I wasted an awful lot of time keeping it working when I made the visibility map crash-safe for IOS. That could be true, but I tend to think it isn't. I would like to run some pgbench tests where we get the system in a steady state such that all/most updates are HOT updates (not an entirely unlikely scenario for many real-life cases). And then try running some concurrent queries which can be executed via IOS. My gut feel is that, today, we will see a slow and continuous drop in performance for these queries because IOS will slowly stop working. If there are no vacuums, I agree.
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Set visibility map bit after HOT prune
On Wed, Dec 19, 2012 at 12:39 PM, Simon Riggs si...@2ndquadrant.com wrote: The benefit of saying that only UPDATEs clean the block is that this penalises only the workload making the mess, rather than everybody cleaning up repeatedly over one messy guy. Right, but there are plenty of situations where having everybody clean up after the messy guy is better than waiting around and hoping that Mom (aka vacuum) will do it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] operator dependency of commutator and negator, redux
Bug #7758 seems to be a rediscovery of the behavior that Itagaki-san complained of a couple years ago: http://archives.postgresql.org/pgsql-hackers/2010-09/msg02035.php While reconsidering the various not-too-satisfactory fixes we thought of back then, I had a sudden thought. Instead of having a COMMUTATOR or NEGATOR forward reference create a shell operator and link to it, why not simply *ignore* such references? Then when the second operator is defined, go ahead and fill in both links? The only case where this could result in an unsatisfactory outcome is if the second operator's CREATE command fails to include the converse COMMUTATOR or NEGATOR reference ... but that doesn't work very nicely today anyway, as you end up with a unidirectional reference, hardly a desirable state of affairs. Not only does this solve the problem complained of, but it allows for much stronger error checking, as there is no longer any need to allow inconsistent catalog states even transiently. We could start treating commutator/negator references as true dependencies, permanently preventing dangling references. We could probably even get rid of the notion of shell operators altogether. Thoughts? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] strange OOM errors with EXECUTE in PL/pgSQL
Tomas Vondra t...@fuzzy.cz writes: What it does: 1) creates a simple table called test with one text column. 2) creates a plpgsql function with one parameter, and all that function does is passing the parameter to EXECUTE 3) calls the function with a string containing many INSERTs into the test table The way the EXECUTE is used is a bit awkward, but the failures seem a bit strange to me. The whole script is ~500kB and most of that is about 11k of very simple INSERT statements: insert into test(value) values (''aa''); The reason this fails is that you've got a half-megabyte source string, and each of the 11000 plans that are due to be created from it saves its own copy of the source string. Hence, 5500 megabytes needed just for source strings. We could possibly fix this by inventing some sort of reference-sharing arrangement (which'd be complicated and fragile) or by not storing the source strings with the plans (which'd deal a serious blow to our ability to provide helpful error messages). Neither answer seems appealing. I think it would be a better idea to adopt a less brain-dead way of processing the data. Can't you convert this to a single INSERT with a lot of VALUES rows? Or split it into multiple EXECUTE chunks? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: optimized DROP of multiple tables within a transaction
On 19.12.2012 02:18, Andres Freund wrote: On 2012-12-17 00:31:00 +0100, Tomas Vondra wrote: I think except for the temp buffer issue mentioned below it's ready.

-DropRelFileNodeAllBuffers(RelFileNodeBackend rnode)
+DropRelFileNodeAllBuffers(RelFileNodeBackend *rnodes, int nnodes)
 {
-	int i;
+	int i, j;
+
+	/* sort the list of rnodes */
+	pg_qsort(rnodes, nnodes, sizeof(RelFileNodeBackend), rnode_comparator);

 	/* If it's a local relation, it's localbuf.c's problem. */
-	if (RelFileNodeBackendIsTemp(rnode))
+	for (i = 0; i < nnodes; i++)
 	{
-		if (rnode.backend == MyBackendId)
-			DropRelFileNodeAllLocalBuffers(rnode.node);
-		return;
+		if (RelFileNodeBackendIsTemp(rnodes[i]))
+		{
+			if (rnodes[i].backend == MyBackendId)
+				DropRelFileNodeAllLocalBuffers(rnodes[i].node);
+		}
 	}

While you deal with local buffers here you don't anymore in the big loop over shared buffers. That wasn't needed earlier since we just returned after noticing we have a local relation, but that's not the case anymore. Hmm, but that would require us to handle the temp relations explicitly wherever we call DropRelFileNodeAllBuffers. Currently there are two such places - smgrdounlink() and smgrdounlinkall(). By placing it into DropRelFileNodeAllBuffers() this code is shared and I think it's a good thing. But that does not mean the code is perfect - it was based on the assumption that if there's a mix of temp and regular relations, the temp relations will be handled in the first part and the rest in the second one. Maybe it'd be better to improve it so that the temp relations are removed from the array after the first part (and skip the second one if there are no remaining relations).

 	for (i = 0; i < NBuffers; i++)
 	{
+		RelFileNodeBackend *rnode = NULL;
 		volatile BufferDesc *bufHdr = BufferDescriptors[i];
-
+
 		/*
 		 * As in DropRelFileNodeBuffers, an unlocked precheck should be safe
 		 * and saves some cycles.
 		 */
-		if (!RelFileNodeEquals(bufHdr->tag.rnode, rnode.node))
+
+		/*
+		 * For low number of relations to drop just use a simple walk through,
+		 * to save the bsearch overhead. The BSEARCH_LIMIT is rather a guess
+		 * than an exactly determined value, as it depends on many factors (CPU
+		 * and RAM speeds, amount of shared buffers etc.).
+		 */
+		if (nnodes <= BSEARCH_LIMIT)

I think that's a sensible plan. It makes sense that for a small number of relations a sequential scan of the rnodes array is faster than a bsearch, and 10 sounds like a good value although I would guess the optimal value is slightly higher on most machines. But if it works fine without regressions that's pretty good... I think it's pointless to look for the optimal value in this case, given how many factors it depends on. We could use 20 instead of 10, but I wouldn't go higher probably.

+
+/*
+ * Used to sort relfilenode array (ordered by [relnode, dbnode, spcnode]), so
+ * that it's suitable for bsearch.
+ */
+static int
+rnode_comparator(const void *p1, const void *p2)
+{
+	RelFileNodeBackend n1 = *(RelFileNodeBackend *) p1;
+	RelFileNodeBackend n2 = *(RelFileNodeBackend *) p2;
+
+	if (n1.node.relNode < n2.node.relNode)
+		return -1;
+	else if (n1.node.relNode > n2.node.relNode)
+		return 1;
+
+	if (n1.node.dbNode < n2.node.dbNode)
+		return -1;
+	else if (n1.node.dbNode > n2.node.dbNode)
+		return 1;
+
+	if (n1.node.spcNode < n2.node.spcNode)
+		return -1;
+	else if (n1.node.spcNode > n2.node.spcNode)
+		return 1;
+	else
+		return 0;
+}

Still surprised this is supposed to be faster than a memcmp, but as you seem to have measured it earlier.. It surprised me too. These are the numbers with the current patch:

1) one by one
=============

            0   2   4   6   8   10   12   14   16   18   20
  ---------------------------------------------------------
  current  15  22  28  34  41   75   77   82   92   99  106
  memcmp   16  23  29  36  44  122  125  128  153  154  158

Until the number of indexes reaches ~10, the numbers are almost exactly the same. Then the bsearch branch kicks in and it's clear how much slower the memcmp comparator is.
2) batches of 100
=================

            0  2   4   6   8  10  12  14  16  18  20
  --------------------------------------------------
  current   3  5   8  10  12  15  17  21  23  27  31
  memcmp    4  7  10  13  16  19  22  28  30  32  36

Here the difference is much smaller, but even here the memcmp is consistently a bit slower. My
[HACKERS] discarding duplicate indexes
I recently came across a scenario like this (tested on git head): CREATE TABLE test (id int); CREATE INDEX test_idx1 ON test (id); CREATE INDEX test_idx2 ON test (id); CREATE TABLE test_copycat (LIKE test INCLUDING ALL); \d test_copycat Why do we end up with only one index on test_copycat? The culprit seems to be transformIndexConstraints(), which explains: * Scan the index list and remove any redundant index specifications. This * can happen if, for instance, the user writes UNIQUE PRIMARY KEY. A * strict reading of SQL92 would suggest raising an error instead, but * that strikes me as too anal-retentive. - tgl 2001-02-14 and this code happily throws out the second index statement in this example, since its properties are identical to the first. (Side note: some index properties, such as tablespace specification and comment, are ignored when determining duplicates). This behavior does seem like a minor POLA violation to me -- if we do not forbid duplicate indexes on the original table, it seems surprising to do so silently with INCLUDING INDEXES. There was consideration of similar behavior when this patch was proposed[1], so perhaps the behavior is as-designed, and I guess no one else has complained. IMO this behavior should at least be documented under the LIKE source_table section of CREATE TABLE's doc page. Josh [1] http://archives.postgresql.org/pgsql-patches/2007-07/msg00173.php -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] strange OOM errors with EXECUTE in PL/pgSQL
On 20.12.2012 02:29, Tom Lane wrote: Tomas Vondra t...@fuzzy.cz writes: What it does: 1) creates a simple table called test with one text column. 2) creates a plpgsql function with one parameter, and all that function does is passing the parameter to EXECUTE 3) calls the function with a string containing many INSERTs into the test table The way the EXECUTE is used is a bit awkward, but the failures seem a bit strange to me. The whole script is ~500kB and most of that is about 11k of very simple INSERT statements: insert into test(value) values (''aa''); The reason this fails is that you've got a half-megabyte source string, and each of the 11000 plans that are due to be created from it saves its own copy of the source string. Hence, 5500 megabytes needed just for source strings. We could possibly fix this by inventing some sort of reference-sharing arrangement (which'd be complicated and fragile) or by not storing the source strings with the plans (which'd deal a serious blow to our ability to provide helpful error messages). Neither answer seems appealing. Thanks for the explanation, it didn't occur to me that each plan keeps a copy of the whole source string. I think it would be a better idea to adopt a less brain-dead way of processing the data. Can't you convert this to a single INSERT with a lot of VALUES rows? Or split it into multiple EXECUTE chunks? Well, it's not my app but I'll recommend it to them. Actually I already did, but I didn't have an explanation of why it behaves like this. The really annoying bit is that in 9.1 this works fine (the approach is just as crazy, but it does not end with an OOM error). Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parser Cruft in gram.y
On Tue, Dec 18, 2012 at 10:44 PM, Robert Haas robertmh...@gmail.com wrote: Yeah, that's why I don't know how to make it work. It feels like this is partly an artifact of the tool, though. I mean, suppose we haven't read anything yet. Then, the next token can't be an IDENT, so if we see an unreserved keyword, we know we're not going to convert it to an IDENT. OTOH, if we've seen CREATE TABLE, the next token cannot be an unreserved keyword that is intended as a keyword; it has to be something that will reduce to ColId. I guess the problem situation is where we can shift the keyword and then use the following token to decide whether to reduce it to ColId/type_function_name/ColLabel or use some other rule instead; the CREATE INDEX CONCURRENTLY productions might be such a case. It seems to me the avenue for simplifying the transition table would be keywords that can never be used in the same place. That is, if we replaced all the elements of such a set with a single token then the grammar would be unambiguous and we could insert a check that the right actual token was present in each place it's used. I'm thinking of the various noise words that the SQL standard introduces which are all going to be reduced to IDENT except for a few places each of which will only admit one such noise word anyways. I think doing this manually would be unmaintainable since every time we modified the grammar it would introduce random unpredictable conflicts which would be hard to debug. But I wonder if we could preparse the transitions table, find any such large sets and rewrite either the transition table or regenerate the grammar and rerun bison on it. Alternately we could just replace the transition table with a representation that is less wasteful, such as a list of perfect hash tables just large enough to hold the valid transitions. Or even a single very large perfect hash table where the key is (state, token). But I'm not entirely convinced any of this is actually useful.
Just because the transition table is large doesn't mean it's inefficient. Most of these bytes are being wasted on transitions which can never occur or which can never occur in syntactically valid SQL. The transitions which can occur will still be present in any condensed representation we come up with. The L2 cache might still be large enough to hold these hot transitions, which might not be a very large subset at all. valgrind comes with a tool called cachegrind which can emulate the cache algorithm on some variants of various CPUs and produce reports. Can it be made to produce a report for a specific block of memory? -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] trouble with pg_upgrade 9.0 -> 9.1
On Wed, Dec 19, 2012 at 01:51:08PM +0400, Groshev Andrey wrote: Can you post the full definition of the table on this public email list? Also, why did the error think this was in the public schema? Any idea? --- 18.12.2012, 19:38, Bruce Momjian br...@momjian.us: On Mon, Dec 17, 2012 at 09:21:59PM -0500, Bruce Momjian wrote: Mismatch of relation names: database database, old rel public.lob.ВерсияВнешнегоДокумента$Документ_pkey, new rel public.plob.ВерсияВнешнегоДокумента$Документ Failure, exiting .. snip It's all what I'm found about this table. -- -- Name: lob.ВерсияВнешнегоДокумента$Документ; Type: TABLE; Schema: public; Owner: postgres; Tablespace: -- CREATE TABLE lob.ВерсияВнешнегоДокумента$Документ ( @Файл integer NOT NULL, Страница integer NOT NULL, Данные bytea ); ALTER TABLE public.lob.ВерсияВнешнегоДокумента$Документ OWNER TO postgres; -- -- Name: plob.ВерсияВнешнегоДокумента$Документ; Type: CONSTRAINT; Schema: public; Owner: postgres; Tablespace: -- ALTER TABLE ONLY lob.ВерсияВнешнегоДокумента$Документ ADD CONSTRAINT plob.ВерсияВнешнегоДокумента$Документ PRIMARY KEY (@Файл, Страница); -- -- Name: rlob.ВерсияВнешнегоДокумента$Документ-@Файл; Type: FK CONSTRAINT; Schema: public; Owner: postgres -- ALTER TABLE ONLY lob.ВерсияВнешнегоДокумента$Документ ADD CONSTRAINT rlob.ВерсияВнешнегоДокумента$Документ-@Файл FOREIGN KEY (@Файл) REFERENCES ВерсияВнешнегоДокумента$Документ(@Файл) ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE; -- -- Name: lob.ВерсияВнешнегоДокумента$Документ; Type: ACL; Schema: public; Owner: postgres -- REVOKE ALL ON TABLE lob.ВерсияВнешнегоДокумента$Документ FROM PUBLIC; REVOKE ALL ON TABLE lob.ВерсияВнешнегоДокумента$Документ FROM postgres; GRANT ALL ON TABLE lob.ВерсияВнешнегоДокумента$Документ TO postgres; GRANT SELECT ON TABLE lob.ВерсияВнешнегоДокумента$Документ TO view_user; There is another table ВерсияВнешнегоДокумента$Документ (without ^lob.) 
It is referenced by a foreign key (rlob.ВерсияВнешнегоДокумента$Документ-@Файл) But as I understand it, the problem is with the primary key. [ Sorry I have not been replying promptly. I have been sick with the flu for the past four days, and while I read the email promptly, my brain isn't sharp enough to send email out for everyone to read. I am better today so hopefully I will be 100% soon. ] OK, this tells me that the period is in the table name: -- Name: lob.ВерсияВнешнегоДокумента$Документ; Type: TABLE; Schema: public; Owner: postgres; Tablespace: I needed to check that the period wasn't a symptom of a bug. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parser Cruft in gram.y
Greg Stark st...@mit.edu writes: But I'm not entirely convinced any of this is actually useful. Just becuase the transition table is large doesn't mean it's inefficient. That's a fair point. However, I've often noticed base_yyparse() showing up rather high on profiles --- higher than seemed plausible at the time, given that its state-machine implementation is pretty tight. Now I'm wondering whether that isn't coming from cache stalls from trying to touch all the requisite parts of the transition table. valgrind comes with a tool called cachegrind which can emulate the cache algorithm on some variants of various cpus and produce reports. Can it be made to produce a report for a specific block of memory? I believe that oprofile can be persuaded to produce statistics about where in one's code are the most cache misses, not just the most wall-clock ticks; which would shed a lot of light on this question. However, my oprofile-fu doesn't quite extend to actually persuading it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] trouble with pg_upgrade 9.0 - 9.1
On Wed, Dec 19, 2012 at 12:56:05PM -0500, Kevin Grittner wrote: Groshev Andrey wrote: Mismatch of relation names: database database, old rel public.lob.ВерсияВнешнегоДокумента$Документ_pkey, new rel public.plob.ВерсияВнешнегоДокумента$Документ There is a limit on identifiers of 63 *bytes* (not characters) after which the name is truncated. In UTF8 encoding, the underscore would be in the 64th position. OK, Kevin is certainly pointing out a bug in the pg_upgrade code, though I am unclear how it would exhibit the mismatch error reported. pg_upgrade uses NAMEDATALEN for database, schema, and relation name storage lengths. While NAMEDATALEN works fine in the backend, it is possible that a frontend client, like pg_upgrade, could retrieve a name in the client encoding whose length exceeds NAMEDATALEN, if the client encoding did not match the database encoding (or is it the cluster encoding for system tables?). This would cause truncation of these values. The truncation would not cause crashes, but might cause failures by not being able to connect to overly-long database names, and it weakens the checking of relation/schema names --- the same check that is reported above. (I believe initdb.c also erroneously uses NAMEDATALEN.) For this to be the cause of the user's report, there would have to be different truncation behavior for the old and new clusters when you restore the dump. Did we change this somehow between 9.0 and 9.1? In summary, we are getting closer to a fix, but we are not there yet. I can supply a patch that removes the use of NAMEDATALEN and you can test that, but again, I don't see how that alone can cause this. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
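[Editorial sketch] The truncation hazard Bruce describes can be demonstrated outside of C. The snippet below (Python, purely illustrative; pg_upgrade's real code paths are in C) chops a multibyte identifier at NAMEDATALEN - 1 = 63 *bytes*, as a client sizing its buffers with NAMEDATALEN would, and silently loses the `_pkey` suffix of the very name from the error message:

```python
# Illustrative sketch: what happens when a client truncates an identifier
# at NAMEDATALEN - 1 = 63 *bytes* while the name arrives in a multibyte
# client encoding. This is not pg_upgrade's actual code.

NAMEDATALEN = 64  # PostgreSQL default; 63 usable bytes plus a NUL


def truncate_name(name: str, encoding: str = "utf-8") -> str:
    """Keep only the first NAMEDATALEN - 1 bytes of the encoded name."""
    raw = name.encode(encoding)[: NAMEDATALEN - 1]
    return raw.decode(encoding, errors="ignore")


pkey = "lob.ВерсияВнешнегоДокумента$Документ_pkey"

print(len(pkey))                  # 41 characters
print(len(pkey.encode("utf-8")))  # 72 bytes in UTF-8: over the 63-byte limit
print(truncate_name(pkey))        # the '_pkey' suffix is silently lost
```

In a single-byte encoding the same name is only 41 bytes and survives intact, which is why behavior can differ between clusters whose client and server encodings disagree.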
Re: [HACKERS] [GENERAL] trouble with pg_upgrade 9.0 - 9.1
On Wed, Dec 19, 2012 at 01:51:08PM +0400, Groshev Andrey wrote: Can you post the full definition of the table on this public email list? Also, why did the error think this was in the public schema? Any idea? --- 18.12.2012, 19:38, Bruce Momjian br...@momjian.us: On Mon, Dec 17, 2012 at 09:21:59PM -0500, Bruce Momjian wrote: Mismatch of relation names: database database, old rel public.lob.ВерсияВнешнегоДокумента$Документ_pkey, new rel public.plob.ВерсияВнешнегоДокумента$Документ Failure, exiting .. snip It's all what I'm found about this table. -- -- Name: lob.ВерсияВнешнегоДокумента$Документ; Type: TABLE; Schema: public; Owner: postgres; Tablespace: -- CREATE TABLE lob.ВерсияВнешнегоДокумента$Документ ( @Файл integer NOT NULL, Страница integer NOT NULL, Данные bytea ); ALTER TABLE public.lob.ВерсияВнешнегоДокумента$Документ OWNER TO postgres; -- -- Name: plob.ВерсияВнешнегоДокумента$Документ; Type: CONSTRAINT; Schema: public; Owner: postgres; Tablespace: -- ALTER TABLE ONLY lob.ВерсияВнешнегоДокумента$Документ ADD CONSTRAINT plob.ВерсияВнешнегоДокумента$Документ PRIMARY KEY (@Файл, Страница); -- -- Name: rlob.ВерсияВнешнегоДокумента$Документ-@Файл; Type: FK CONSTRAINT; Schema: public; Owner: postgres -- ALTER TABLE ONLY lob.ВерсияВнешнегоДокумента$Документ ADD CONSTRAINT rlob.ВерсияВнешнегоДокумента$Документ-@Файл FOREIGN KEY (@Файл) REFERENCES ВерсияВнешнегоДокумента$Документ(@Файл) ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE; -- -- Name: lob.ВерсияВнешнегоДокумента$Документ; Type: ACL; Schema: public; Owner: postgres -- REVOKE ALL ON TABLE lob.ВерсияВнешнегоДокумента$Документ FROM PUBLIC; REVOKE ALL ON TABLE lob.ВерсияВнешнегоДокумента$Документ FROM postgres; GRANT ALL ON TABLE lob.ВерсияВнешнегоДокумента$Документ TO postgres; GRANT SELECT ON TABLE lob.ВерсияВнешнегоДокумента$Документ TO view_user; There is another table ВерсияВнешнегоДокумента$Документ (without ^lob.) 
It is referenced by a foreign key (rlob.ВерсияВнешнегоДокумента$Документ-@Файл), but as I understand it, the problem is with the primary key. Does the old database have a table with the prefix plob., called plob.ВерсияВнешнегоДокумента$Документ? If not, and you run pg_dumpall --schema-only --binary-upgrade, is there a table with that name mentioned? -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] Set visibility map bit after HOT prune
On Thu, Dec 20, 2012 at 6:12 AM, Robert Haas robertmh...@gmail.com wrote: On Wed, Dec 19, 2012 at 12:26 PM, Pavan Deolasee pavan.deola...@gmail.com wrote: This can be handled by breaking the 1-to-1 mapping between the VM bit and the PD_ALL_VISIBLE bit, so seq scans will only look at PD_ALL_VISIBLE. It was proposed by Andres upthread, but shot down by Tom and Simon. But I still feel that was an overreaction and there is a lot of merit in the idea. As I said elsewhere, it will also help the case when there are DEAD line pointers in a page. Today we can't mark such pages all-visible, but if we break this mapping, we can do that. Sure, but you're zipping rather blithely past the disadvantages of such an approach. Hmm. You're right. I did not think about the disadvantages, and now that you mention them, I feel they are important. Jeff Davis recently proposed getting rid of PD_ALL_VISIBLE, and Tom and I both expressed considerable skepticism about that; this proposal has the same problems. One of the major benefits of PD_ALL_VISIBLE is that, when it isn't set, inserts, updates, and deletes to the page can ignore the visibility map. That means that a server under heavy concurrency is much less likely to encounter contention on the visibility map blocks. Now, maybe that's not really a problem, but I sure haven't seen enough evidence to make me believe it. If it's really true that PD_ALL_VISIBLE needn't fill this role, then Heikki wasted an awful lot of time implementing it, and I wasted an awful lot of time keeping it working when I made the visibility map crash-safe for IOS. That could be true, but I tend to think it isn't. Yeah, VM buffer contention can become prominent if we break the invariant that the page-level bit status implies the VM bit status, at least when it is clear. OTOH, IMHO we need some mechanism to address the issue of aggressive clearing of the VM bits paired with a very lame corresponding set operation.
Today we don't have much contention on the VM page, but we must be sacrificing its usability in return. IOS, as well as vacuum optimizations using the VM, will turn out not so useful for many workloads. I'm very reluctant to suggest that we can solve this by setting aside another page-level bit to track visibility of tuples for heap scans, or even a bit in the tuple header itself, to avoid repeated visibility checks for a tuple which is known to be visible to all current and future transactions. I would like to run some pgbench tests where we get the system into a steady state such that all/most updates are HOT updates (not an entirely unlikely scenario for many real-life cases), and then try running some concurrent queries which can be executed via IOS. My gut feeling is that today we will see a slow and continuous drop in performance for these queries, because IOS will slowly stop working. If there are no vacuums, I agree. And we expect vacuums to be rare or absent. AFAIR in pgbench it now takes hours for the accounts table to get chosen for vacuum, and we should be happy about that. But IOS is almost impossible for pgbench-style workloads today because of our aggressive strategy of clearing the VM bits. Thanks, Pavan -- Pavan Deolasee http://www.linkedin.com/in/pavandeolasee
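[Editorial sketch] Robert's contention argument above can be made concrete with a toy model (Python, purely illustrative; the `Page` and `VisibilityMap` classes are invented for the sketch and bear no resemblance to PostgreSQL's actual page layout or buffer locking):

```python
# Toy model of why the page-level PD_ALL_VISIBLE bit reduces visibility-map
# traffic: with the bit, only the first write to an all-visible page has to
# touch the VM; without it, every write must consult/clear the VM.
# All names here are invented for illustration.

class Page:
    def __init__(self):
        self.pd_all_visible = True  # start all-visible, as after a vacuum


class VisibilityMap:
    def __init__(self, npages):
        self.bits = [True] * npages
        self.accesses = 0  # stand-in for lock/pin traffic on VM buffers


def modify_tuple(page_no, pages, vm, use_page_bit):
    """One write (insert/update/delete) hitting the given heap page."""
    page = pages[page_no]
    if use_page_bit:
        # With the invariant: the VM is touched only on the write that
        # transitions the page out of the all-visible state.
        if page.pd_all_visible:
            page.pd_all_visible = False
            vm.bits[page_no] = False
            vm.accesses += 1
    else:
        # Without a page-level bit, every write must clear the VM bit.
        vm.bits[page_no] = False
        vm.accesses += 1


def simulate(use_page_bit, writes=1000):
    pages, vm = [Page()], VisibilityMap(1)
    for _ in range(writes):
        modify_tuple(0, pages, vm, use_page_bit)
    return vm.accesses


print(simulate(use_page_bit=True))   # 1: only the first write touches the VM
print(simulate(use_page_bit=False))  # 1000: every write touches the VM
```

Under heavy concurrent write traffic, the second mode turns the VM page into a shared hot spot, which is the contention Robert is worried about.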
Re: [HACKERS] [GENERAL] trouble with pg_upgrade 9.0 - 9.1
On Wed, Dec 19, 2012 at 10:35:11PM -0500, Bruce Momjian wrote: There is another table ВерсияВнешнегоДокумента$Документ (without ^lob.) It is referenced by a foreign key (rlob.ВерсияВнешнегоДокумента$Документ-@Файл) But as I understand it, the problem with the primary key. Does the old database have a table with prefix plob., called plob.ВерсияВнешнегоДокумента$Документ? If not, if you do pg_dumpall --schema-only --binary-upgrade, is there a table with that name mentioned? Also, when you say rlob above, is the 'r' a Latin letter that merely looks like a Russian 'р' in the error message? (In Cyrillic, a letter that looks like a Latin 'p' sounds like a Latin 'r'.) -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] Feature Request: pg_replication_master()
On Thursday, December 20, 2012 3:50 AM, Joshua Berkus wrote: It stalled because the patch author decided not to implement the request to detect recovery.conf in the data directory, which allows backwards compatibility. Well, I don't think we had agreement on how important backwards compatibility for recovery.conf was, particularly not on the whole recovery.conf/recovery.done functionality and the weird formatting of recovery.conf. However, with include_if_exists directives in postgresql.conf, or include_dir, that would be easy to work around. Don't we have something like that planned for SET PERSISTENT? Yes, the SET PERSISTENT patch uses include_dir. I wonder why we can't get this information from the WalRcvData structure? It has the required information. With Regards, Amit Kapila.
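[Editorial sketch] For readers following the thread, the postgresql.conf directives mentioned above would be used roughly like this (a sketch only; the file name recovery.conf and the conf.d directory are assumptions, and include_dir was still a pending patch when this thread was written):

```
# Pull in an optional file; silently skipped if it does not exist,
# which is what makes a drop-in recovery.conf workable.
include_if_exists = 'recovery.conf'

# Read every *.conf file in the given directory, in filename order;
# this is the mechanism the SET PERSISTENT patch relies on.
include_dir = 'conf.d'
```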
Re: [HACKERS] [GENERAL] trouble with pg_upgrade 9.0 - 9.1
No, the old database does not use a table plob.; only the primary key: -- -- Name: plob.ВерсияВнешнегоДокумента$Документ; Type: CONSTRAINT; Schema: public; Owner: postgres; Tablespace: -- -- For binary upgrade, must preserve pg_class oids SELECT binary_upgrade.set_next_index_pg_class_oid('786665369'::pg_catalog.oid); ALTER TABLE ONLY lob.ВерсияВнешнегоДокумента$Документ ADD CONSTRAINT plob.ВерсияВнешнегоДокумента$Документ PRIMARY KEY (@Файл, Страница); 20.12.2012, 06:35, Bruce Momjian br...@momjian.us: [quoted message trimmed]
Re: [HACKERS] Set visibility map bit after HOT prune
On Thursday, December 20, 2012 6:14 AM, Robert Haas wrote: On Wed, Dec 19, 2012 at 12:39 PM, Simon Riggs si...@2ndquadrant.com wrote: The benefit of saying that only UPDATEs clean the block is that this penalises only the workload making the mess, rather than everybody cleaning up repeatedly after one messy guy. Right, but there are plenty of situations where having everybody clean up after the messy guy is better than waiting around and hoping that Mom (aka vacuum) will do it. If we look at the similar situation in indexes: during an index scan, we just mark the tuple as DEAD without taking an exclusive lock, and then during a split (when we already hold an exclusive lock) the actual space is freed. So I am not sure it's a good idea to take an exclusive lock for cleanup during a heap scan, where write operations happen more frequently and have a better chance of doing the cleanup. With Regards, Amit Kapila.
Re: [HACKERS] [GENERAL] trouble with pg_upgrade 9.0 - 9.1
No; a person might confuse similar-looking letters, but these names are generated by a computer. Unfortunately, I have not been able to find the developer of this database, but as I understand it the logic was: plob - primary key (lob ~ BLOB); rlob - reference key (lob ~ BLOB). Maybe if I describe the task this part of the database handles, the problem will become clear. We need to maintain external documents (binary scans, stored per page). Therefore, there is a table to store the headers and a table to store the binary data. To make it easier to read, I have replaced all the Russian words with translated ones. This is the table for storing the headers: -- Table: VersionOfTheExternalDocument$Document -- DROP TABLE VersionOfTheExternalDocument$Document; CREATE TABLE VersionOfTheExternalDocument$Document ( @File integer NOT NULL DEFAULT nextval((pg_get_serial_sequence('public.VersionOfTheExternalDocument$Document'::text, '@File'::text))::regclass), GUID uuid, DataTime timestamp without time zone DEFAULT (now())::timestamp without time zone, Name character varying, Size integer, CONSTRAINT VersionOfTheExternalDocument$Document_pkey PRIMARY KEY (@File) ) WITH ( OIDS=FALSE ); ALTER TABLE VersionOfTheExternalDocument$Document OWNER TO postgres; GRANT ALL ON TABLE VersionOfTheExternalDocument$Document TO postgres; GRANT SELECT ON TABLE VersionOfTheExternalDocument$Document TO view_user; -- Index: iVersionOfTheExternalDocument$Document-blb_header -- DROP INDEX iVersionOfTheExternalDocument$Document-blb_header; CREATE INDEX iVersionOfTheExternalDocument$Document-blb_header ON VersionOfTheExternalDocument$Document USING btree (GUID, @File, DataTime) WHERE GUID IS NOT NULL; --- And this is the one for the data.
-- Table: lob.VersionOfTheExternalDocument$Document -- DROP TABLE lob.VersionOfTheExternalDocument$Document; CREATE TABLE lob.VersionOfTheExternalDocument$Document ( @File integer NOT NULL, Page integer NOT NULL, Data bytea, CONSTRAINT lob.VersionOfTheExternalDocument$Document_pkey PRIMARY KEY (@File, Page), CONSTRAINT rlob.VersionOfTheExternalDocument$Document-@File FOREIGN KEY (@File) REFERENCES VersionOfTheExternalDocument$Document (@File) MATCH SIMPLE ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE INITIALLY IMMEDIATE ) WITH ( OIDS=FALSE ); ALTER TABLE lob.VersionOfTheExternalDocument$Document OWNER TO postgres; GRANT ALL ON TABLE lob.VersionOfTheExternalDocument$Document TO postgres; GRANT SELECT ON TABLE lob.VersionOfTheExternalDocument$Document TO view_user; 20.12.2012, 07:12, Bruce Momjian br...@momjian.us: [quoted message trimmed]
Re: [HACKERS] [GENERAL] trouble with pg_upgrade 9.0 - 9.1
I initialized the data dir using ru_RU.UTF8, but this database uses CP1251, i.e. one byte per character. 19.12.2012, 21:47, Tom Lane t...@sss.pgh.pa.us: Kevin Grittner kgri...@mail.com writes: Groshev Andrey wrote: Mismatch of relation names: database database, old rel public.lob.ВерсияВнешнегоДокумента$Документ_pkey, new rel public.plob.ВерсияВнешнегоДокумента$Документ There is a limit on identifiers of 63 *bytes* (not characters) after which the name is truncated. In UTF8 encoding, the underscore would be in the 64th position. Hmm ... that is a really good point, except that you are not counting the lob. or plob. part, which we previously saw is part of the relation name, not the schema name. Counting that part, it's already over the limit, which seems to be proof that Andrey isn't using UTF8 but some single-byte encoding. Anyway, that would only explain the issue if pg_upgrade were somehow changing the database encoding, which surely we'd have heard complaints about already? Or maybe this has something to do with pg_upgrade's client-side encoding rather than the server encoding... regards, tom lane
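[Editorial sketch] Tom's arithmetic is easy to verify. A quick check (Python, illustrative only) of the two identifiers from the error message confirms that both names fit within 63 bytes under a single-byte Cyrillic encoding such as CP1251, but both exceed the limit under UTF-8:

```python
# Verify the byte counts behind Tom's argument: the full relation names
# exceed the 63-byte identifier limit in UTF-8 (Cyrillic = 2 bytes/char),
# but fit comfortably in the single-byte CP1251 encoding.

NAME_LIMIT = 63  # NAMEDATALEN - 1

names = [
    "lob.ВерсияВнешнегоДокумента$Документ_pkey",
    "plob.ВерсияВнешнегоДокумента$Документ",
]

for name in names:
    utf8_len = len(name.encode("utf-8"))
    cp1251_len = len(name.encode("cp1251"))
    print(name, utf8_len, cp1251_len)
    # Both names are over NAME_LIMIT in UTF-8 and under it in CP1251,
    # so unmangled names like these imply a single-byte server encoding.
```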
Re: [HACKERS] FDW: ForeignPlan and parameterized paths
I intentionally did the nestloop_params substitution after calling GetForeignPlan, not before. It's not apparent to me why it would be useful to do it before, because the FDW is going to have no idea what those params represent. (Note that they represent values coming from some other, probably local, relation; not from the foreign table.) Even if the FDW has no idea what they represent, it can identify a clause of the form Var Operator Param, which allows it to store the param reference (paramid) for retrieving the param value at execution time. If the chosen best path is a parameterized path that has been built by the FDW, this allows it to push down the restriction. If this isn't possible, the only way I found to use those clauses would be at scan time. Let's assume atable is a local relation and aftable is a foreign table, and the query looks like this: select * from atable t1 inner join aftable t2 on t1.c1 = t2.c1 The FDW identifies the join clause on its column c1 and builds a parameterized path on this column (maybe because this column is unique and indexed on the remote side). The planner chooses this path, building a nested loop that rescans the foreign table with the parameter value reflecting the outer relation's value (maybe because the local relation is much smaller than the remote relation). In that case, it seems particularly important to have access to the clause, so that the nested loop can work as intended: avoiding a full seqscan on the remote side. Or is there another way to achieve the same goal? Regards, -- Ronan Dunklau
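[Editorial sketch] The cost difference Ronan is describing can be sketched with a toy model (Python, purely illustrative; the names remote_seqscan and remote_param_lookup are invented, and a real FDW works through the planner/executor hooks, not like this):

```python
# Toy model of the two ways a nested loop can drive a foreign scan:
# an unparameterized path (full remote seqscan on every rescan) versus a
# parameterized path (push 'c1 = $1' down and fetch a single row).
# All names are invented for illustration.

remote_aftable = {i: ("row", i) for i in range(1000)}  # stand-in for aftable
rows_fetched = {"count": 0}


def remote_seqscan():
    """Unparameterized path: every rescan pulls the whole remote table."""
    rows_fetched["count"] += len(remote_aftable)
    return list(remote_aftable.items())


def remote_param_lookup(c1):
    """Parameterized path: the restriction is pushed to the remote side."""
    rows_fetched["count"] += 1
    return remote_aftable.get(c1)


atable = [1, 7, 42]  # the small local outer relation

# Nested loop with the parameterized inner path: one remote lookup per
# outer row, instead of len(atable) full scans of the remote table.
joined = [(c1, remote_param_lookup(c1)) for c1 in atable]
print(rows_fetched["count"])  # 3 rows fetched instead of 3 * 1000
```

This is exactly why losing access to the Var Operator Param clause forces the remote side back to full seqscans on every rescan.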
Re: [HACKERS] operator dependency of commutator and negator, redux
On 20 December 2012 11:51, Tom Lane t...@sss.pgh.pa.us wrote: While reconsidering the various not-too-satisfactory fixes we thought of back then, I had a sudden thought. Instead of having a COMMUTATOR or NEGATOR forward reference create a shell operator and link to it, why not simply *ignore* such references? Then when the second operator is defined, go ahead and fill in both links? Ignore with warning sounds pretty good. So it would go something like this? # CREATE OPERATOR (... COMMUTATOR ); WARNING: COMMUTATOR (foo, foo) undefined, ignoring. CREATE OPERATOR # CREATE OPERATOR (... COMMUTATOR ); CREATE OPERATOR Cheers, BJ