Re: [HACKERS] Feature Request: pg_replication_master()
On 19 December 2012 06:10, Magnus Hagander mag...@hagander.net wrote:

> This sounds like my previous suggestion of returning the primary_conninfo
> value, but with just the IP. That one came with a pretty bad patch, and was
> later postponed until we folded recovery.conf into the main configuration
> file parsing. I'm not really sure what happened to that project? (the
> configuration file one)

It stalled because the patch author decided not to implement the request to
detect recovery.conf in the data directory, which allows backwards
compatibility. I proposed a solution for how to do that, so we can move
forward if people have time.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal - assign result of query to psql variable
2012/12/19 Shigeru Hanada shigeru.han...@gmail.com:

> On Tue, Dec 18, 2012 at 2:52 AM, Pavel Stehule pavel.steh...@gmail.com wrote:
>> Attached updated patch
>
> Seems fine. I'll mark this as ready for committer. Nice work!
>
> -- 
> Shigeru HANADA

Thank you very much.

Regards

Pavel
Re: [HACKERS] system administration functions with hardcoded superuser checks
On 19 December 2012 06:34, Magnus Hagander mag...@hagander.net wrote:

>> Granting executability on pg_read_xyz is pretty darn close to granting
>> superuser, without explicitly asking for it.
>
> Well, you get read-only superuser. If we want to make that step as easy as
> just GRANT, we really need to write some *very* strong warnings in the
> documentation so that people realize this. I doubt most people will realize
> it unless we do that (and those who don't read the docs, which is probably
> a majority, never will).

Good point.

Can we do that explicitly with fine-grained superuser-ness?

  GRANT SUPERUSER ON FUNCTION TO foo;

> If you use SECURITY DEFINER, you can limit the functions to *the specific
> files that you want to grant read on*. Which makes it possible to actually
> make it secure. E.g. you *don't* have to give full read to your entire
> database.

Even better point.

> If you're comparing it to a blanket SECURITY DEFINER with no checks, then
> yes, it's a simpler way to fire the cannon into your own foot. But it also
> gives you a way that makes it more likely that you don't *realize* that
> you're about to fire a cannon into your foot.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Cascading replication: should we detect/prevent cycles?
On 12/18/2012 11:57 PM, Simon Riggs wrote:
> On 19 December 2012 03:03, Josh Berkus j...@agliodbs.com wrote:
>> So, my question is:
>> 1. should we detect replication cycles? *Can* we?
>> 2. should we warn the user, or refuse to start up?
>
> Why not just monitor the config you just created? Anybody that actually
> tests their config would spot this.

I think you are being optimistic. We should probably have some logic that
prevents circular replication.
Re: [HACKERS] Cascading replication: should we detect/prevent cycles?
On 19 December 2012 08:11, Joshua D. Drake j...@commandprompt.com wrote:
> On 12/18/2012 11:57 PM, Simon Riggs wrote:
>> On 19 December 2012 03:03, Josh Berkus j...@agliodbs.com wrote:
>>> So, my question is:
>>> 1. should we detect replication cycles? *Can* we?
>>> 2. should we warn the user, or refuse to start up?
>>
>> Why not just monitor the config you just created? Anybody that actually
>> tests their config would spot this.
>
> I think you are being optimistic. We should probably have some logic that
> prevents circular replication.

My logic is that if you run a one-minute test you will notice your mistake,
which is glaringly obvious. That is sufficient to prevent that mistake, IMHO.
If you don't test your config and don't monitor either, good luck with HA.

Patches welcome, if you think this is important enough.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
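[Neither message sketches what the prevention logic would look like. For
illustration only - plain Python over a hypothetical standby-to-upstream
map, not PostgreSQL code - detecting circular replication reduces to cycle
detection over the chain of upstream links:]

```python
def find_replication_cycle(upstream):
    """Given a mapping standby -> upstream server, return the looping
    portion of the topology if one exists, else None.  Walks each chain
    of upstream links; revisiting a node within one walk means the
    topology loops back on itself (e.g. r1 -> r2 -> r1)."""
    for start in upstream:
        seen = []
        node = start
        while node in upstream:
            if node in seen:
                return seen[seen.index(node):]  # the cycle itself
            seen.append(node)
            node = upstream[node]
    return None
```

[A healthy chain such as master--r1--r2 returns None; connecting the chain
into a loop returns the servers involved, which a startup check could
report before streaming begins.]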
Re: [HACKERS] [GENERAL] trouble with pg_upgrade 9.0 - 9.1
Can you post the full definition of the table on this public email list?
Also, why did the error think this was in the public schema? Any idea?

---

18.12.2012, 19:38, Bruce Momjian br...@momjian.us:
> On Mon, Dec 17, 2012 at 09:21:59PM -0500, Bruce Momjian wrote:
>> Mismatch of relation names: database "database",
>> old rel public.lob.ВерсияВнешнегоДокумента$Документ_pkey,
>> new rel public.plob.ВерсияВнешнегоДокумента$Документ
>> Failure, exiting

[snip]

It's all I could find about this table:

--
-- Name: lob.ВерсияВнешнегоДокумента$Документ; Type: TABLE; Schema: public; Owner: postgres; Tablespace:
--

CREATE TABLE lob.ВерсияВнешнегоДокумента$Документ (
    @Файл integer NOT NULL,
    Страница integer NOT NULL,
    Данные bytea
);

ALTER TABLE public.lob.ВерсияВнешнегоДокумента$Документ OWNER TO postgres;

--
-- Name: plob.ВерсияВнешнегоДокумента$Документ; Type: CONSTRAINT; Schema: public; Owner: postgres; Tablespace:
--

ALTER TABLE ONLY lob.ВерсияВнешнегоДокумента$Документ
    ADD CONSTRAINT plob.ВерсияВнешнегоДокумента$Документ PRIMARY KEY (@Файл, Страница);

--
-- Name: rlob.ВерсияВнешнегоДокумента$Документ-@Файл; Type: FK CONSTRAINT; Schema: public; Owner: postgres
--

ALTER TABLE ONLY lob.ВерсияВнешнегоДокумента$Документ
    ADD CONSTRAINT rlob.ВерсияВнешнегоДокумента$Документ-@Файл FOREIGN KEY (@Файл)
    REFERENCES ВерсияВнешнегоДокумента$Документ(@Файл)
    ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE;

--
-- Name: lob.ВерсияВнешнегоДокумента$Документ; Type: ACL; Schema: public; Owner: postgres
--

REVOKE ALL ON TABLE lob.ВерсияВнешнегоДокумента$Документ FROM PUBLIC;
REVOKE ALL ON TABLE lob.ВерсияВнешнегоДокумента$Документ FROM postgres;
GRANT ALL ON TABLE lob.ВерсияВнешнегоДокумента$Документ TO postgres;
GRANT SELECT ON TABLE lob.ВерсияВнешнегоДокумента$Документ TO view_user;

There is another table, ВерсияВнешнегоДокумента$Документ (without the
leading "lob."). It is referenced by a foreign key
(rlob.ВерсияВнешнегоДокумента$Документ-@Файл). But as I understand it, the
problem is with the primary key.
Re: [HACKERS] configure.in and setproctitle/optreset problem
Re: Tom Lane 2012-12-18 8705.1355845...@sss.pgh.pa.us
>> The correct fix, IMO/IIRC, is to add LDFLAGS=-Wl,--as-needed before
>> running most of the configure checks, instead of after.
>
> Meh. It's not clear to me at all that that fixes the issue here, or at
> least that it does so in any way that's reliable. The proposal to add
> --as-needed during configure was made to fix a different problem, namely
> making the wrong decision about whether libintl needs to be pulled in
> explicitly. We don't seem to have done anything about that.

To me, twiddling with --as-needed sounds like trading one set of possible
problems for a different one; configure checks should be as deterministic
as possible. It might still be that Peter's --as-needed suggestion is a
good fix, but I believe the issue I reported should also be fixed by the
patch Tom sent.

Reiterating a point from my original message: why is -l{readline,edit}
included in these configure checks at all? Most (if not all) of the
function checks in that block are not related to input editing anyway.

Kind regards,
Christoph Berg
-- 
Tel.: +49 (0)21 61 / 46 43-187
credativ GmbH, HRB Mönchengladbach 12080
Hohenzollernstr. 133, 41061 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz
Re: [HACKERS] [WIP] pg_ping utility
On Wed, Dec 12, 2012 at 12:06 AM, Bruce Momjian br...@momjian.us wrote:
> On Sat, Dec 8, 2012 at 08:59:00AM -0500, Phil Sorber wrote:
>> On Sat, Dec 8, 2012 at 7:50 AM, Michael Paquier
>> michael.paqu...@gmail.com wrote:
>>> Bruce mentioned that pg_isready could be used directly by pg_ctl -w.
>>> Default as being non-verbose would make sense. What are the other
>>> tools you are thinking about? Some utilities in core?
>>
>> I think Bruce meant that PQping() is used by pg_ctl -w, not that he
>> would use pg_isready.
>
> Right.

OK, cool. If you have some spare time to write a new version with the
verbose option as default, I'll be pleased to review it and then mark it
as ready for committer.

-- 
Michael Paquier
http://michael.otacoo.com
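[For reference, PQping() distinguishes four outcomes, and pg_isready reports
them as its exit status. The sketch below (plain Python, not libpq) encodes
the PGPing values as libpq declares them; the message strings are only an
illustration of what a verbose-by-default tool could print, not the tool's
actual output:]

```python
# PGPing outcomes from libpq (declaration order gives the values);
# pg_isready uses the same numbers for its exit status.
PQPING_OK = 0           # server is accepting connections
PQPING_REJECT = 1       # server is alive but rejecting connections
PQPING_NO_RESPONSE = 2  # server could not be contacted
PQPING_NO_ATTEMPT = 3   # no attempt made (e.g. bad connection parameters)

def describe(status):
    """Illustrative one-line messages for each PQping-style status."""
    return {
        PQPING_OK: "accepting connections",
        PQPING_REJECT: "rejecting connections",
        PQPING_NO_RESPONSE: "no response",
        PQPING_NO_ATTEMPT: "no attempt",
    }[status]
```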
Re: [HACKERS] Makefiles don't seem to remember to rebuild everything anymore
On Wed, Dec 19, 2012 at 12:22 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> Robert Haas robertmh...@gmail.com writes:
>> On Mon, Dec 17, 2012 at 1:34 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>>> This is definitely not per make's contract, either. I think maybe the
>>> "Don't rebuild the list if only the OBJS have changed" hack in
>>> common.mk is a brick or two shy of a load, but I don't know how to fix
>>> that.
>>
>> I feel like it's been this way for a while - at least I feel like I've
>> noticed this before. I think there is some inevitable kludginess around
>> having one makefile per subdirectory that leads to these kinds of
>> issues. Maybe we should get rid of all the makefiles under src/backend
>> except for the top-level one and just do everything there.
>
> I mentioned this paper last year, but maybe it's time to start taking it
> seriously: http://aegis.sourceforge.net/auug97.pdf

+1 from me. I don't know that just fixing src/backend will do a whole lot
to improve build times in and of itself, but I do think it might reduce
the amount of alchemy required to keep things working.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Switching timeline over streaming replication
On 19.12.2012 04:57, Josh Berkus wrote:
> Heikki,
>
> I ran into an unexpected issue while testing. I just wanted to fire up a
> chain of 5 replicas to see if I could connect them in a loop. However, I
> ran into a weird issue when starting up r3: it refused to come out of
> "the database is starting up" mode until I did a write on the master.
> Then it came up fine.
>
> master--r1--r2--r3--r4
>
> I tried doing the full replication sequence (basebackup, startup, test)
> with it twice and got the exact same results each time. This is very
> strange because I did not encounter the same issues with r2 or r4. Nor
> have I seen this before in my tests.

Ok.. I'm going to need some more details on how to reproduce this; I'm not
seeing that when I set up four standbys.

> I'm also seeing Thom's spurious error message now. Each of r2, r3 and r4
> has the following message once in its logs:
>
> LOG: database system was interrupted while in recovery at log time
> 2012-12-19 02:49:34 GMT
> HINT: If this has occurred more than once some data might be corrupted
> and you might need to choose an earlier recovery target.
>
> This message doesn't seem to signify anything.

Yep. You get that message when you start up the system from a base backup
that was taken from a standby server. It's just noise; it would be nice if
we could dial it down somehow. In general, streaming replication and
backups tend to be awfully noisy. I've been meaning to go through all the
messages that get printed during normal operation and think carefully
about which ones are really necessary, and which ones could be merged into
more compact messages. But I haven't gotten around to it; that would be a
great project for someone who actually sets up these systems regularly in
production.

- Heikki
Re: [HACKERS] Parser Cruft in gram.y
On Tue, Dec 18, 2012 at 10:33:12AM +0100, Dimitri Fontaine wrote:
> Robert Haas robertmh...@gmail.com writes:
>> And on the other hand, if you could get a clean split between the two
>> grammars, then regardless of exactly what the split was, it might seem
>> a win. But it seemed to me when I looked at this that you'd have to
>> duplicate a lot of stuff and the small parser still wouldn't end up
>> being very small, which I found hard to get excited about.
>
> I think the goal is not so much about getting a much smaller parser, but
> more about having a separate parser that you don't care about the bloat
> of, so that you can improve DDL without fearing main parser performance
> regressions.

In addition to this, there could well be uses for a more modular parser.
For example, if it turns out to be possible to do our parser in a way
that is exportable and (can be converted to a type that) looks forward,
client code could do a lot of interesting (and provably correct) things.

> If we come out with no regression and no gain on the main query parser,
> I say it is still worth the separation effort. And anyway we only add
> things to the main parser (queries) when the standard saith we have to.
>
> Tom Lane t...@sss.pgh.pa.us writes:
>> I'm not sure what other tool might be better, though. I looked through
>> http://en.wikipedia.org/wiki/Comparison_of_parser_generators#Deterministic_context-free_languages
>> but it seems our options are a bit limited if we want something that
>> produces C. It's not clear to me that any of the likely options are as
>> mature as bison, let alone likely to substantially outperform it. (For
>> instance, Hyacc sounded pretty promising until I got to the part about
>> "it doesn't yet support %union or %type".) Still, I didn't spend much
>> time on this --- maybe somebody else would like to do a bit more
>> research.
>
> I did spend a very little time on it too, with a different search angle,
> and did find that experimental thing that might be worth looking at, or
> maybe not.
>
> http://en.wikipedia.org/wiki/Parsing_expression_grammar
> http://piumarta.com/software/peg/

More angles:

http://wwwiti.cs.uni-magdeburg.de/~rosenmue/publications/SKS+08.pdf

Might anyone here wish to investigate this, given some kind of resources
for the initial study?

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] [PERFORM] Slow query: bitmap scan troubles
On Tue, Dec 18, 2012 at 5:05 PM, Jeff Janes jeff.ja...@gmail.com wrote:

Sorry for the malformed and duplicate post. I was not trying to be
emphatic; I was testing out gmail offline. Clearly the test didn't go too
well.

Jeff
[HACKERS] FDW: ForeignPlan and parameterized paths
Hello.

I've noticed that, when implementing an FDW, it is difficult to use a plan
whose best path is a parameterized path. This comes from the fact that the
parameterized clause is not easily available at plan time.

This is what I understood from how it works:

- The clauses coming from the best path's restrictinfo are not available
  in the scan_clauses argument to the GetForeignPlan function.
- They are, however, directly available on the path, but at this point the
  clauses are of the form "InnerVar OPERATOR OuterVar". The outer Var node
  is then replaced by a Param node, using the replace_nestloop_params
  function.

It could be useful to make the parameterized version of the clause (in the
form "InnerVar OPERATOR Param") available to the FDW at plan time.

Could this be possible? Maybe by replacing the clauses on the restrictinfo
nodes from the path's param info with the parameterized clauses, and then
adding these to the scan clauses passed to GetForeignPlan?

Regards,
-- 
Ronan Dunklau
Re: [HACKERS] Set visibility map bit after HOT prune
On Sat, Dec 15, 2012 at 4:48 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> On the other hand, the HOT prune operation itself is worthless when
> we're running a SELECT. The only reason we do it that way is that we
> have to prune before the query starts to use the page, else pruning
> might invalidate pointers-to-tuples that are being held within the query
> plan tree. Maybe it's time to look at what it'd take for the low-level
> scan operations to know whether they're scanning the target relation of
> an UPDATE query, so that we could skip pruning altogether except when a
> HOT update could conceivably ensue. I think this was discussed back when
> HOT went in, but nobody wanted to make the patch more invasive than it
> had to be.

I think it's wrong to assume that HOT pruning has no value except in this
case. Truncating dead tuples to line pointers and collapsing HOT chains
speeds up future page scans, and if we were able to set the all-visible
bit, that would help even more. The problem is that this is all somewhat
prospective: HOT pruning the page doesn't help the *current* scan - in
fact, it can sometimes slow it down considerably - but it can be a great
help to the next scan that comes through. We say, oh, don't worry, VACUUM
will take care of it, but there are plenty of cases where a page can be
scanned a very large number of times before VACUUM comes along, and you
can lose a lot of performance in those cases.

That having been said, I agree with the concerns expressed elsewhere in
this thread that setting the visibility map bit too aggressively would be
a waste. If the page is about to get dirtied again we surely don't want to
go there. Aside from the obvious problem of doing work that may not be
necessary, it figures to create buffer-lock contention on the visibility
map page. One of the strengths of the current design is that we avoid that
pretty effectively.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
[HACKERS] Documentation bug for LDAP authentication
While playing with LDAP authentication, I discovered two documentation
bugs.

First, the user and password for the first step in the two-step
authentication mode are ldapbinddn and ldapbindpasswd, not ldapbinduser
and ldapbinddn. This bug has been there since 8.4.

The second one is new in 9.3 with the URL syntax: it is not possible to
specify user and password in the LDAP URL.

The first hunk should be backpatched.

Yours,
Laurenz Albe

Attachment: ldapdoc.patch
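[The two-step ("search+bind") mode the first fix refers to is configured in
pg_hba.conf. A sketch with the corrected option names - the server name,
base DN, bind DN, and password here are made up for illustration:]

```
# search+bind: first bind as ldapbinddn/ldapbindpasswd, look the user up via
# ldapsearchattribute under ldapbasedn, then re-bind as the user found
host all all 10.0.0.0/8 ldap ldapserver=ldap.example.net ldapbasedn="dc=example,dc=net" ldapsearchattribute=uid ldapbinddn="cn=search,dc=example,dc=net" ldapbindpasswd=secret
```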
Re: [HACKERS] Switching timeline over streaming replication
On 19.12.2012 15:55, Heikki Linnakangas wrote:
> On 19.12.2012 04:57, Josh Berkus wrote:
>> Heikki,
>>
>> I ran into an unexpected issue while testing. I just wanted to fire up
>> a chain of 5 replicas to see if I could connect them in a loop.
>> However, I ran into a weird issue when starting up r3: it refused to
>> come out of "the database is starting up" mode until I did a write on
>> the master. Then it came up fine.
>>
>> master--r1--r2--r3--r4
>>
>> I tried doing the full replication sequence (basebackup, startup, test)
>> with it twice and got the exact same results each time. This is very
>> strange because I did not encounter the same issues with r2 or r4. Nor
>> have I seen this before in my tests.
>
> Ok.. I'm going to need some more details on how to reproduce this; I'm
> not seeing that when I set up four standbys.

Ok, I managed to reproduce this now. The problem seems to be a timing
problem that occurs when a standby switches to follow a new timeline. Four
is not a magic number here; it can happen with just one cascading standby
too.

When the timeline switch happens - for example, when the standby changes
its recovery target timeline from 1 to 2 at WAL position 0/30002D8 - it
has all the WAL up to that position. However, it only has that WAL in file
00010003, corresponding to timeline 1, and not in file
00020003, corresponding to the new timeline.

When a cascaded standby connects, it requests to start streaming from
point 0/300 at timeline 2 (we always start streaming from the beginning of
a segment, to avoid leaving partially-filled segments in pg_xlog). The
walsender in the 1st standby tries to read that from file
00020003, which does not exist yet.

The problem goes away after some time, once the 1st standby has streamed
the contents of 00020003 and written it to disk, and the cascaded
standby reconnects. But it would be nice to avoid that situation. I'm not
sure how to do that yet; we might need to track the timeline we're
currently receiving/sending more carefully. Or perhaps we need to copy the
previous WAL segment to the new name when switching recovery target
timeline, like we do when a server is promoted. I'll try to come up with
something...

- Heikki
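[The file names being discussed follow a fixed scheme: a WAL segment name
is three zero-padded 8-digit hex fields - timeline ID, log number, and
segment number. A small Python sketch of that naming rule, mirroring what
the 9.2-era XLogFileName() macro produces:]

```python
def xlog_file_name(tli, log, seg):
    """WAL segment file name: timeline ID, log number, and segment
    number, each rendered as a zero-padded 8-digit uppercase hex field
    (24 characters total)."""
    return "%08X%08X%08X" % (tli, log, seg)

# Across a timeline switch, the same segment exists under two names:
old_name = xlog_file_name(1, 0, 3)  # segment 3 on timeline 1
new_name = xlog_file_name(2, 0, 3)  # segment 3 on timeline 2
```

[This is why the walsender can hold all the WAL up to the switch point yet
still fail the read: the bytes live under the timeline-1 name, while the
cascaded standby asks for the timeline-2 name.]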
Re: [HACKERS] Switching timeline over streaming replication
On 19.12.2012 17:27, Heikki Linnakangas wrote:
> On 19.12.2012 15:55, Heikki Linnakangas wrote:
>> Ok.. I'm going to need some more details on how to reproduce this; I'm
>> not seeing that when I set up four standbys.
>
> Ok, I managed to reproduce this now.

Hmph, no I didn't; I replied to the wrong email. The problem I managed to
reproduce was the one where you get "requested WAL segment
00020003 has already been removed" errors, reported by Thom.

- Heikki
Re: [HACKERS] [ADMIN] Problems with enums after pg_upgrade
Hello again,

well, still everything is working. What information do you need to dig
into this issue? So far I can say that we don't use ALTER TYPE ... ADD
VALUE. We use somewhat modified enum_add and enum_del functions (which are
appended at the end) to be able to change enums within transactions. And
the reason this happened on the beta server and not on the staging server
might be that we sometimes have to drop the whole stuff on staging because
of failures we made, so old enum values will not be persistent in old
indexes.

If you need more info, just ask. :)

regards
Bernhard

SET check_function_bodies = false;

CREATE OR REPLACE FUNCTION enum_add (
    enum_name character varying,
    enum_elem character varying
) RETURNS void AS
$body$
DECLARE
    _enum_typid INTEGER;
    version_int INTEGER;
    _highest_enumsortorder REAL;
BEGIN
    -- get enumtypid
    SELECT oid FROM pg_type
        WHERE typtype = 'e' AND typname = enum_name
        INTO _enum_typid;

    SELECT INTO version_int setting
        FROM pg_settings WHERE name = 'server_version_num';

    -- postgres 9.2 or higher
    IF version_int >= 90200 THEN
        SELECT MAX(enumsortorder) FROM pg_enum
            WHERE enumtypid = _enum_typid
            INTO _highest_enumsortorder;

        -- check if elem already exists in enum
        IF NOT EXISTS (SELECT * FROM pg_enum
                WHERE enumlabel = enum_elem AND enumtypid = _enum_typid) THEN
            INSERT INTO pg_enum (enumtypid, enumlabel, enumsortorder)
                VALUES (_enum_typid, enum_elem, _highest_enumsortorder + 1);
        END IF;
    ELSE
        -- check if elem already exists in enum
        IF NOT EXISTS (SELECT * FROM pg_enum
                WHERE enumlabel = enum_elem AND enumtypid = _enum_typid) THEN
            INSERT INTO pg_enum (enumtypid, enumlabel)
                VALUES (_enum_typid, enum_elem);
        END IF;
    END IF;
END;
$body$
LANGUAGE plpgsql;

--
-- Definition for function enum_del:
--

CREATE OR REPLACE FUNCTION enum_del (
    enum_name character varying,
    enum_elem character varying
) RETURNS void AS
$body$
DECLARE
    type_oid INTEGER;
    rec RECORD;
    sql VARCHAR;
    ret INTEGER;
BEGIN
    SELECT pg_type.oid FROM pg_type
        WHERE typtype = 'e' AND typname = enum_name
        INTO type_oid;

    -- check if enum exists
    IF NOT EXISTS (SELECT * FROM pg_enum WHERE enumtypid = type_oid) THEN
        RETURN;
    END IF;

    -- check if element in enum exists
    IF NOT FOUND THEN
        RAISE EXCEPTION 'Cannot find a enum: %', enum_name;
    END IF;

    -- Check column DEFAULT value references.
    SELECT * FROM pg_attrdef
        JOIN pg_attribute ON attnum = adnum AND atttypid = type_oid
        JOIN pg_class ON pg_class.oid = attrelid
        JOIN pg_namespace ON pg_namespace.oid = relnamespace
        WHERE adsrc = quote_literal(enum_elem) || '::' || quote_ident(enum_name)
        LIMIT 1
        INTO rec;
    IF FOUND THEN
        RAISE EXCEPTION
            'Cannot delete the ENUM element %.%: column %.%.% has DEFAULT value of ''%''',
            quote_ident(enum_name), quote_ident(enum_elem),
            quote_ident(rec.nspname), quote_ident(rec.relname),
            rec.attname, quote_ident(enum_elem);
    END IF;

    -- Check data references.
    FOR rec IN
        SELECT * FROM pg_attribute
            JOIN pg_class ON pg_class.oid = attrelid
            JOIN pg_namespace ON pg_namespace.oid = relnamespace
            WHERE atttypid = type_oid AND relkind = 'r'
    LOOP
        sql := 'SELECT 1 FROM ONLY '
            || quote_ident(rec.nspname) || '.'
            || quote_ident(rec.relname) || ' '
            || ' WHERE ' || quote_ident(rec.attname)
            || ' = ' || quote_literal(enum_elem)
            || ' LIMIT 1';
        EXECUTE sql INTO ret;
        IF ret IS NOT NULL THEN
            RAISE EXCEPTION
                'Cannot delete the ENUM element %.%: column %.%.% contains references',
                quote_ident(enum_name), quote_ident(enum_elem),
                quote_ident(rec.nspname), quote_ident(rec.relname),
                rec.attname;
        END IF;
    END LOOP;

    -- OK. We may delete.
    DELETE FROM pg_enum
        WHERE enumtypid = type_oid AND enumlabel = enum_elem;
END;
$body$
LANGUAGE plpgsql;

--
Bernhard Schrader
System Administration

InnoGames GmbH
Harburger Schloßstraße 28 (Channel 4) - 21079 Hamburg - Germany
Tel +49 40 7889335-53
Fax +49 40 7889335-22

Managing Directors: Hendrik Klindworth, Eike Klindworth, Michael Zillmer
VAT-ID: DE264068907
Amtsgericht Hamburg, HRB 108973

http://www.innogames.com -- bernhard.schra...@innogames.de
Re: [HACKERS] [ADMIN] Problems with enums after pg_upgrade
On 2012-12-19 16:51:32 +0100, Bernhard Schrader wrote:
> Hello again,
>
> well, still everything is working. What information do you need to dig
> into this issue? So far I can say that we don't use ALTER TYPE ... ADD
> VALUE. We use somewhat modified enum_add and enum_del functions (which
> are appended at the end) to be able to change enums within transactions.

That explains everything. You *CANNOT* do that. There are some pretty
fundamental reasons why you are not allowed to add enum values in a
transaction, and even more reasons why deleting from enums isn't allowed
at all.

Greetings,

Andres Freund

-- 
Andres Freund                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
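[For context, the supported way to extend an enum - the one the custom
functions above bypass - is ALTER TYPE ... ADD VALUE, available since 9.1;
in this era it cannot be run inside a transaction block, for exactly the
safety reasons Andres alludes to. The type and label names here are made
up:]

```sql
-- Supported: append or insert a new label (must run outside a
-- transaction block on 9.1/9.2).
ALTER TYPE mood ADD VALUE 'ecstatic' AFTER 'happy';

-- There is no supported way to delete a label from an enum type;
-- the workaround is to create a new type and migrate the columns.
```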
Re: [HACKERS] Set visibility map bit after HOT prune
On Wed, Dec 19, 2012 at 8:32 PM, Robert Haas robertmh...@gmail.com wrote:
> On Sat, Dec 15, 2012 at 4:48 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>> On the other hand, the HOT prune operation itself is worthless when
>> we're running a SELECT. The only reason we do it that way is that we
>> have to prune before the query starts to use the page, else pruning
>> might invalidate pointers-to-tuples that are being held within the
>> query plan tree. Maybe it's time to look at what it'd take for the
>> low-level scan operations to know whether they're scanning the target
>> relation of an UPDATE query, so that we could skip pruning altogether
>> except when a HOT update could conceivably ensue. I think this was
>> discussed back when HOT went in, but nobody wanted to make the patch
>> more invasive than it had to be.
>
> I think it's wrong to assume that HOT pruning has no value except in
> this case. Truncating dead tuples to line pointers and collapsing HOT
> chains speeds up future page scans, and if we were able to set the
> all-visible bit, that would help even more.

Good point.

> The problem is that this is all somewhat prospective: HOT pruning the
> page doesn't help the *current* scan - in fact, it can sometimes slow it
> down considerably - but it can be a great help to the next scan that
> comes through. We say, oh, don't worry, VACUUM will take care of it, but
> there are plenty of cases where a page can be scanned a very large
> number of times before VACUUM comes along, and you can lose a lot of
> performance in those cases.

Also, since we discount the number of tuples pruned by HOT pruning while
tracking the number of dead tuples in a table, in a perfectly stable
system autovacuum may not ever pick the table for vacuuming, slowly
stopping index-only scans from working. Soon we will have a situation
where all VM bits are clear, but autovacuum would still fail to pick the
table.

Tom had a good suggestion to periodically count VM bits to choose tables
for vacuuming even if there are no dead tuples or dead line pointers to
remove. I'm not sure though if the extra vacuum will be better than
setting the bit after HOT prune. Also, deciding when to count the bits can
be tricky. Do it every vacuum cycle? Or after every 5/10 cycles? I don't
have the answer.

> That having been said, I agree with the concerns expressed elsewhere in
> this thread that setting the visibility map bit too aggressively will be
> a waste. If the page is about to get dirtied again we surely don't want
> to go there.

Yeah, I agree. If we could figure out that we are soon going to UPDATE a
tuple on the page again, it would be worthless to set the bit. But
predicting that could also turn out to be tricky. Even if we could somehow
tell that the scan is happening on the result relation of an UPDATE
operation, not every page may receive updates because of WHERE quals etc.
So we may get lots of false positives.

> Aside from the obvious problem of doing work that may not be necessary,
> it figures to create buffer-lock contention on the visibility map page.
> One of the strengths of the current design is that we avoid that pretty
> effectively.

It's a valid concern, though my limited pgbench tests did not show any
drop in the numbers. But that's hardly any proof. We could possibly
mitigate this with a conditional update of the VM bit: do it only if we
get a conditional exclusive lock on the buffer page.

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee
[HACKERS] ThisTimeLineID in checkpointer and bgwriter processes
In both checkpointer.c and bgwriter.c, we do this before entering the main
loop:

    /*
     * Use the recovery target timeline ID during recovery
     */
    if (RecoveryInProgress())
        ThisTimeLineID = GetRecoveryTargetTLI();

That seems reasonable. However, since it's only done once, when the
process starts up, ThisTimeLineID is never updated in those processes,
even though the startup process changes the recovery target timeline. That
actually seems harmless to me, and I also haven't heard of any complaints
of misbehavior in 9.1 or 9.2 caused by it.

I'm not sure why we bother to set ThisTimeLineID in those processes in the
first place. I think we did it when streaming replication was introduced
because it was an easy thing to do, because back then the recovery target
timeline never changed after recovery was started. But now that it can
change, I don't think that makes sense anymore.

So, I propose that we simply remove those lines, and leave ThisTimeLineID
at zero in the checkpointer and bgwriter processes while we're still in
recovery. ThisTimeLineID is updated anyway before performing the first
checkpoint after finishing recovery, and AFAICS that's the first time the
value is needed.

- Heikki
Re: [HACKERS] [ADMIN] Problems with enums after pg_upgrade
On 12/19/2012 10:56 AM, Andres Freund wrote: On 2012-12-19 16:51:32 +0100, Bernhard Schrader wrote: Hello again, well, still everything is working. What information do you need to get into this issue? Well, so far i can say, we dont use ALTER TYPE ADD VALUE. We use some more or less changed enum_add and enum_del (Which are appended at the end) to be able to change enums within transactions. That explains everything. You *CANNOT* do that. There are some pretty fundamental reasons why you are not allowed to add enums in a transaction. And even more reasons why deleting from enums isn't allowed at all. Yes, this is exactly what I referred to in my recent reply to Tom. This is a recipe for database corruption. Hacking the catalog generally is something to be done only with the most extreme caution, IMNSHO. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Set visibility map bit after HOT prune
Robert Haas robertmh...@gmail.com writes: On Sat, Dec 15, 2012 at 4:48 PM, Tom Lane t...@sss.pgh.pa.us wrote: Maybe it's time to look at what it'd take for the low-level scan operations to know whether they're scanning the target relation of an UPDATE query, so that we could skip pruning altogether except when a HOT update could conceivably ensue. I think this was discussed back when HOT went in, but nobody wanted to make the patch more invasive than it had to be. I think it's wrong to assume that HOT pruning has no value except in this case. Truncating dead tuples to line pointers and collapsing HOT chains speeds up future page scans, and if we were able to set the all-visible bit, that would help even more. The problem is that this is all somewhat prospective: HOT pruning the page doesn't help the *current* scan - in fact, it can sometimes slow it down considerably - but it can be a great help to the next scan that comes through. Well, no. The problem with the way we do it now is that doing it every time a query scan arrives at a page is too often, resulting in a lot of wasted work. That wasted work is somewhat tolerable as long as it only involves looking at the current page and ending up not actually changing it. If we start generating a lot of useless WAL activity and I/O as a result of thrashing the all-visible bit, it won't be so tolerable anymore. But the problem is not so much the desire to set the bit as that we're doing this whole activity at the wrong place and time. Perhaps doing it every time an UPDATE arrives at the page is too far in the other direction, and we need to look for some other mechanism entirely. I think my core point still stands: the way that HOT pruning is done now is an artifact of having wanted to shoehorn it into the system with minimum changes. Which was reasonable at the time given the experimental status of the feature, but now it's time to reconsider. 
regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [ADMIN] Problems with enums after pg_upgrade
Andres Freund and...@2ndquadrant.com writes: On 2012-12-19 16:51:32 +0100, Bernhard Schrader wrote: Well, so far i can say, we dont use ALTER TYPE ADD VALUE. We use some more or less changed enum_add and enum_del (Which are appended at the end) to be able to change enums within transactions. That explains everything. You *CANNOT* do that. Yeah. So this was not pg_upgrade's fault at all: that code would have created problems in 9.1 or later even without using pg_upgrade. For the record, the reason you can't safely do this is exactly what we saw here: it's possible for deleted/never-committed values of the type to remain behind in upper levels of btree indexes. Since we now need to be able to consult pg_enum to know how to compare values of an enum type, deleted values are uncomparable. enum_add is all right as long as you are careful to commit its transaction before inserting the new value anywhere. enum_del is quite unsafe unless you REINDEX all indexes on columns of the type after making sure the value is gone from the tables. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Set visibility map bit after HOT prune
On Wed, Dec 19, 2012 at 9:51 PM, Tom Lane t...@sss.pgh.pa.us wrote: If we start generating a lot of useless WAL activity and I/O as a result of thrashing the all-visible bit, it won't be so tolerable anymore. What if we wrap that into the WAL generated by HOT prune itself? Would that address your concerns about extra WAL logging? I also suggested doing it conditionally to avoid contention on the VM buffer. (I actually wonder why we WAL-log the set operation at all except for HS to be able to do IOS, but that's a topic for a separate thread, maybe.) Also, if extra WAL-logging is really worrisome, maybe we should again seriously reconsider my idea of *not* clearing the bit at HOT update. My apologies for repeating myself. But that seems very important in a steady system where almost every update is a HOT update. So you don't clear the bit at HOT update and so don't need to set it back either, thus saving two WAL records. You definitely don't need any vacuum in this case if pruning keeps reclaiming dead space at the appropriate time and makes it available for the next update. More so, IOS still works great. Why is this so bad? I haven't forgotten your complaints about the changed meaning of the bit, but I tried to explain that we can read it in a slightly different way and still show it as an invariant. I think my core point still stands: the way that HOT pruning is done now is an artifact of having wanted to shoehorn it into the system with minimum changes. Which was reasonable at the time given the experimental status of the feature, but now it's time to reconsider. ISTM that you already have concrete ideas about the places where HOT prune would be more effective. My worry is that changing anything there is going to be a lot trickier and will require heavy testing. Our initial work has served us well so far. Of course, I've no problem changing that if it's going to benefit users.
Thanks, Pavan -- Pavan Deolasee http://www.linkedin.com/in/pavandeolasee -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [ADMIN] Problems with enums after pg_upgrade
On 12/19/2012 11:31 AM, Tom Lane wrote: enum_add is all right as long as you are careful to commit its transaction before inserting the new value anywhere. It's not really all right for post-9.0 versions. For example, this is wrong:

    -- postgres 9.2 or higher
    IF version_int >= 90200 THEN

It should really be:

    IF version_int >= 90100 THEN

What is even worse is that this procedure doesn't take any care at all of the ordering rule for even-numbered enum OIDs. We could have OID wraparound to an even-numbered OID and it would break the rule. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cascading replication: should we detect/prevent cycles?
On 12/19/2012 12:34 AM, Simon Riggs wrote: My logic is that if you make a 1 minute test you will notice your mistake, which is glaringly obvious. That is sufficient to prevent that mistake, IMHO. If you don't test your config and don't monitor either, good luck with HA. I am not arguing whether you are right. I am arguing whether or not we want to shoot all but our expert users in the foot. People make mistakes; when reasonable, we should help them not make those mistakes. JD -- Command Prompt, Inc. - http://www.commandprompt.com/ PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC @cmdpromptinc - 509-416-6579 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Set visibility map bit after HOT prune
On Wed, Dec 19, 2012 at 11:49 AM, Pavan Deolasee pavan.deola...@gmail.com wrote: Also, if extra WAL-logging is really worrisome, may be we should again seriously reconsider my idea of *not* clearing the bit at HOT update. My apologies for repeating myself. But that seems very important in a steady system when almost every update is a HOT update. So you don't clear the bit at HOT update and so don't need to set it back either, thus saving two WAL activity. You definitely don't need any vacuum in this case if pruning keeps reclaiming dead space at appropriate time and make it available for the next update. More so, IOS still works great. Why is this so bad ? It's bad because then sequential scans will return wrong answers, unless we also rip out the optimization that uses PD_ALL_VISIBLE as an excuse to skip all visibility checks for the page. That optimization is worth a significant amount of performance. It's also bad because then vacuum won't visit the page, and it really should. It's much better to have vacuum prune the page in the background than to have some query do it in the foreground, although the latter is still better than not doing it at all. We could potentially have two or three bits per page to suit these different needs: (1) page can benefit from a vacuum, (2) page is safe for IOS purposes, and (3) page is safe for seqscan purposes. But I think that might be overengineering. IMHO, the goal here should be to have some method of setting the visibility map, in some set of circumstances, outside of vacuum. Figuring out which set of circumstances is appropriate is the hard part. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [ADMIN] Problems with enums after pg_upgrade
Andrew Dunstan and...@dunslane.net writes: what is even worse is that this procedure doesn't take any care at all of the ordering rule for even numbered enum oids. Good point. You really should use ALTER TYPE ADD VALUE, on versions where that's available. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of Row Level Security
On Tue, Dec 18, 2012 at 3:39 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote: postgres= INSERT INTO t1 VALUES (4,'ddd'); INSERT 0 1 postgres= INSERT INTO t1 VALUES (5,'eee'); ERROR: new row for relation t1 violates row-secirity DETAIL: Failing row contains (5, eee). I've argued against this before - and maybe I should drop my objection, because a number of people seem to be on the other side. But I still think there will be some people who don't want this behavior. Right now, for example, you can give someone INSERT but not SELECT permission on a table, and they will then be able to put rows into the table that they cannot read back. Similarly, in the RLS case, it is not necessarily undesirable for a user to be able to insert a row that they can't read back; or for them to be able to update a row from a value that they can see to one that they cannot. Some people will want to prohibit that, while others will not. Previously, I suggested that we handle this by enforcing row-level security only on data read from the table - the OLD row, so to speak - and not on data written to the table - the NEW row, so to speak - because the latter case can be handled well enough by triggers. (The OLD case cannot, because not seeing the row is different from erroring out when you do see it.) There are other alternatives, like allowing the user to specify which behavior they want. But I think that simply decreeing that the policy will apply not only to rows read but also rows written in all cases will be less flexible than we will ultimately want to be. YMMV, of course. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Set visibility map bit after HOT prune
On Wed, Dec 19, 2012 at 10:40 PM, Robert Haas robertmh...@gmail.com wrote: On Wed, Dec 19, 2012 at 11:49 AM, Pavan Deolasee pavan.deola...@gmail.com wrote: Also, if extra WAL-logging is really worrisome, may be we should again seriously reconsider my idea of *not* clearing the bit at HOT update. My apologies for repeating myself. But that seems very important in a steady system when almost every update is a HOT update. So you don't clear the bit at HOT update and so don't need to set it back either, thus saving two WAL activity. You definitely don't need any vacuum in this case if pruning keeps reclaiming dead space at appropriate time and make it available for the next update. More so, IOS still works great. Why is this so bad ? It's bad because then sequential scans will return wrong answers, unless we also rip out the optimization that uses PD_ALL_VISIBLE as an excuse to skip all visibility checks for the page. That optimization is worth a significant amount of performance. This can be handled by breaking the 1-to-1 mapping between the VM bit and the PD_ALL_VISIBLE bit. So seq scans will only look at PD_ALL_VISIBLE. It was proposed by Andres upthread, but shot down by Tom and Simon. But I still feel that was an overreaction and there is a lot of merit in the idea. As I said elsewhere, it will also help the case when there are DEAD line pointers in a page. Today we can't mark such pages all-visible, but if we break this mapping, we can do that. I would like to run some pgbench tests where we get the system into a steady state such that all/most updates are HOT updates (not an entirely unlikely scenario for many real-life cases). And then try running some concurrent queries which can be executed via IOS. My gut feeling is that today we will see a slow and continuous drop in performance for these queries, because IOS will slowly stop working. It's also bad because then vacuum won't visit the page, and it really should.
It's much better to have vacuum prune the page in the background than to have some query do it in the foreground, although the latter is still better than not doing it at all. Hmm. This is a good point and I don't have an easy answer. I'm not sure how this will pan out in real-life cases though. We definitely made great progress by having HOT, though the same concerns were raised even then that we are moving work from background to foreground. But I think generally HOT made a great difference to the system as a whole, maybe at the cost of a slowdown for some read-only SELECT queries. And HOT prune is not the only operation that we do in the foreground. We also set hint bits and make buffers dirty in otherwise read-only queries. IMHO, the goal here should be to have some method of setting the visibility map, in some set of circumstances, outside of vacuum. Figuring out which set of circumstances is appropriate is the hard part. Yeah, if we can figure that out conclusively, I'm sure we might be able to auto-tune the system even further. Thanks, Pavan -- Pavan Deolasee http://www.linkedin.com/in/pavandeolasee -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Switching timeline over streaming replication
Heikki, The problem goes away after some time, after the 1st standby has streamed the contents of 00020003 and written it to disk, and the cascaded standby reconnects. But it would be nice to avoid that situation. I'm not sure how to do that yet, we might need to track the timeline we're currently receiving/sending more carefully. Or perhaps we need to copy the previous WAL segment to the new name when switching recovery target timeline, like we do when a server is promoted. I'll try to come up with something... Would it be accurate to say that this issue only happens when all of the replicated servers have no traffic? --Josh -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Set visibility map bit after HOT prune
On 19 December 2012 16:21, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Sat, Dec 15, 2012 at 4:48 PM, Tom Lane t...@sss.pgh.pa.us wrote: Maybe it's time to look at what it'd take for the low-level scan operations to know whether they're scanning the target relation of an UPDATE query, so that we could skip pruning altogether except when a HOT update could conceivably ensue. I think this was discussed back when HOT went in, but nobody wanted to make the patch more invasive than it had to be. I think it's wrong to assume that HOT pruning has no value except in this case. Truncating dead tuples to line pointers and collapsing HOT chains speeds up future page scans, and if we were able to set the all-visible bit, that would help even more. The problem is that this is all somewhat prospective: HOT pruning the page doesn't help the *current* scan - in fact, it can sometimes slow it down considerably - but it can be a great help to the next scan that comes through. Well, no. The problem with the way we do it now is that doing it every time a query scan arrives at a page is too often, resulting in a lot of wasted work. That wasted work is somewhat tolerable as long as it only involves looking at the current page and ending up not actually changing it. If we start generating a lot of useless WAL activity and I/O as a result of thrashing the all-visible bit, it won't be so tolerable anymore. But the problem is not so much the desire to set the bit as that we're doing this whole activity at the wrong place and time. Perhaps doing it every time an UPDATE arrives at the page is too far in the other direction, and we need to look for some other mechanism entirely. The benefit of saying that only UPDATEs clean the block is that this penalises only the workload making the mess, rather than everybody cleaning up repeatedly over one messy guy. 
Having a random SELECT clean the block causes delay in the non-UPDATEing process, contention, and additional writes. We definitely know we write too often; this has been measured and discussed over a period of years. It would be useful to have a table-level option of hot_cleanup = SELECT | UPDATE | NONE to allow people to minimise cleanup and test the difference this makes. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Switching timeline over streaming replication
Heikki, The next time I get the issue, and I'm not paying for 5 cloud servers by the hour, I'll give you a login. --Josh - Original Message - On 19.12.2012 17:27, Heikki Linnakangas wrote: On 19.12.2012 15:55, Heikki Linnakangas wrote: On 19.12.2012 04:57, Josh Berkus wrote: Heikki, I ran into an unexpected issue while testing. I just wanted to fire up a chain of 5 replicas to see if I could connect them in a loop. However, I ran into a weird issue when starting up r3: it refused to come out of the database is starting up mode until I did a write on the master. Then it came up fine. master--r1--r2--r3--r4 I tried doing the full replication sequence (basebackup, startup, test) with it twice and got the exact same results each time. This is very strange because I did not encounter the same issues with r2 or r4. Nor have I seen this before in my tests. Ok.. I'm going to need some more details on how to reproduce this, I'm not seeing that when I set up four standbys. Ok, I managed to reproduce this now. Hmph, no I didn't, I replied to wrong email. The problem I managed to reproduce was the one where you get requested WAL segment 00020003 has already been removed errors, reported by Thom. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of Row Level Security
On 19 December 2012 17:25, Robert Haas robertmh...@gmail.com wrote: On Tue, Dec 18, 2012 at 3:39 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote: postgres= INSERT INTO t1 VALUES (4,'ddd'); INSERT 0 1 postgres= INSERT INTO t1 VALUES (5,'eee'); ERROR: new row for relation t1 violates row-secirity DETAIL: Failing row contains (5, eee). I've argued against this before - and maybe I should drop my objection, because a number of people seem to be on the other side. But I still think there will be some people who don't want this behavior. Right now, for example, you can give someone INSERT but not SELECT permission on a table, and they will then be able to put rows into the table that they cannot read back. Similarly, in the RLS case, it is not necessarily undesirable for a user to be able to insert a row that they can't read back; or for them to be able to update a row from a value that they can see to one that they cannot. Some people will want to prohibit that, while others will not. I can see a use case for not having security apply for users who have *only* INSERT privilege. This would allow people to run bulk loads of data into a table with row security. We should add that. That is not the common case, so with proper documentation that should be a useful feature without relaxing default security. Never applying security for INSERT and then forcing them to add BEFORE triggers if they want full security is neither secure nor performant. Previously, I suggested that we handle this by enforcing row-level security only on data read from the table - the OLD row, so to speak - and not on data written to the table - the NEW row, so to speak - because the latter case can be handled well enough by triggers. (The OLD case cannot, because not seeing the row is different from erroring out when you do see it.) There are other alternatives, like allowing the user to specify which behavior they want. 
But I think that simply decreeing that the policy will apply not only to rows read but also rows written in all cases will be less flexible than we will ultimately want to be. As discussed, we should add a security feature that is secure by default. Adding options to make it less secure can follow initial commit. We might even make it in this release if the review of the main feature goes well. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] trouble with pg_upgrade 9.0 - 9.1
Groshev Andrey wrote: Mismatch of relation names: database database, old rel public.lob.ВерсияВнешнегоДокумента$Документ_pkey, new rel public.plob.ВерсияВнешнегоДокумента$Документ There is a limit on identifiers of 63 *bytes* (not characters) after which the name is truncated. In UTF8 encoding, the underscore would be in the 64th position. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of Row Level Security
Robert Haas robertmh...@gmail.com writes: On Tue, Dec 18, 2012 at 3:39 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote: postgres= INSERT INTO t1 VALUES (4,'ddd'); INSERT 0 1 postgres= INSERT INTO t1 VALUES (5,'eee'); ERROR: new row for relation t1 violates row-secirity DETAIL: Failing row contains (5, eee). I've argued against this before - and maybe I should drop my objection, because a number of people seem to be on the other side. But I still think there will be some people who don't want this behavior. Right now, for example, you can give someone INSERT but not SELECT permission on a table, and they will then be able to put rows into the table that they cannot read back. There is also precedent for your opinion in the spec-mandated behavior of updatable views: it is perfectly possible to INSERT a row that you can't read back via the view, or UPDATE it to a state you can't see via the view. The RLS patch's current behavior corresponds to a view created WITH CHECK OPTION --- which we don't support yet. Whether we add that feature soon or not, what seems important for the current debate is that the SQL spec authors chose not to make it the default behavior. This seems to weigh heavily against making it the default, much less only, behavior for RLS cases. I'd also suggest that throw an error is not the only response that people are likely to want for attempts to insert/update non-compliant rows, so hard-wiring that choice is insufficiently flexible even if you grant that local policy is to not allow such updates. (As an example, they might prefer to log the attempt and substitute some other value.) Previously, I suggested that we handle this by enforcing row-level security only on data read from the table - the OLD row, so to speak - and not on data written to the table - the NEW row, so to speak - because the latter case can be handled well enough by triggers. +1. 
I'm less than excited about RLS in the first place, so the less complexity we have to put into the core system for it the better IMO. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] trouble with pg_upgrade 9.0 - 9.1
Kevin Grittner kgri...@mail.com writes: Groshev Andrey wrote: Mismatch of relation names: database database, old rel public.lob.ВерсияВнешнегоДокумента$Документ_pkey, new rel public.plob.ВерсияВнешнегоДокумента$Документ There is a limit on identifiers of 63 *bytes* (not characters) after which the name is truncated. In UTF8 encoding, the underscore would be in the 64th position. Hmm ... that is a really good point, except that you are not counting the lob. or plob. part, which we previously saw is part of the relation name not the schema name. Counting that part, it's already overlimit, which seems to be proof that Andrey isn't using UTF8 but some single-byte encoding. Anyway, that would only explain the issue if pg_upgrade were somehow changing the database encoding, which surely we'd have heard complaints about already? Or maybe this has something to do with pg_upgrade's client-side encoding rather than the server encoding... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Set visibility map bit after HOT prune
Pavan Deolasee pavan.deola...@gmail.com writes: On Wed, Dec 19, 2012 at 9:51 PM, Tom Lane t...@sss.pgh.pa.us wrote: If we start generating a lot of useless WAL activity and I/O as a result of thrashing the all-visible bit, it won't be so tolerable anymore. What if we wrap that into the WAL generated by HOT prune itself ? What WAL? The case we're worried about here is that there's nothing else for HOT prune to do. I think my core point still stands: the way that HOT pruning is done now is an artifact of having wanted to shoehorn it into the system with minimum changes. Which was reasonable at the time given the experimental status of the feature, but now it's time to reconsider. ISTM that you already have concret ideas about what are those places where HOT prune would be more effective. No, I don't; I'm just suggesting that we ought to think outside the box of the way it's being done now. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of Row Level Security
On 19 December 2012 18:40, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Tue, Dec 18, 2012 at 3:39 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote: postgres= INSERT INTO t1 VALUES (4,'ddd'); INSERT 0 1 postgres= INSERT INTO t1 VALUES (5,'eee'); ERROR: new row for relation t1 violates row-secirity DETAIL: Failing row contains (5, eee). I've argued against this before - and maybe I should drop my objection, because a number of people seem to be on the other side. But I still think there will be some people who don't want this behavior. Right now, for example, you can give someone INSERT but not SELECT permission on a table, and they will then be able to put rows into the table that they cannot read back. There is also precedent for your opinion in the spec-mandated behavior of updatable views: it is perfectly possible to INSERT a row that you can't read back via the view, or UPDATE it to a state you can't see via the view. The RLS patch's current behavior corresponds to a view created WITH CHECK OPTION --- which we don't support yet. Whether we add that feature soon or not, what seems important for the current debate is that the SQL spec authors chose not to make it the default behavior. This seems to weigh heavily against making it the default, much less only, behavior for RLS cases. This is security, not spec compliance. By default, we need full security. Nobody has argued that it should be the only behaviour, only that it is the most typically requested behaviour and the most secure, therefore the one we should do first. I'd also suggest that throw an error is not the only response that people are likely to want for attempts to insert/update non-compliant rows, so hard-wiring that choice is insufficiently flexible even if you grant that local policy is to not allow such updates. (As an example, they might prefer to log the attempt and substitute some other value.) 
Previously, I suggested that we handle this by enforcing row-level security only on data read from the table - the OLD row, so to speak - and not on data written to the table - the NEW row, so to speak - because the latter case can be handled well enough by triggers. +1. I'm less than excited about RLS in the first place, so the less complexity we have to put into the core system for it the better IMO. Agree with the need for less complexity, but that decision increases complexity for the typical user and does very little to the complexity of the patch. Treating a security rule as a check constraint is natural and obvious, so there are no core system problems here. If we don't enforce rules on INSERT the user has to specifically add a trigger, which makes things noticeably slower. There is more maintenance work for the average user, less performance and more mistakes to make. The way to do this is by adding an option to allow users to specify INSERT should be exempt from the security rule, which Kaigai and I agreed on list some weeks back should come after the initial patch, to no other comment. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FDW: ForeignPlan and parameterized paths
Ronan Dunklau rdunk...@gmail.com writes: I've noticed that, when implementing a FDW, it is difficult to use a plan which best path is a parameterized path. This comes from the fact that the parameterized clause is not easily available at plan time. This is what I understood from how it works: - The clauses coming from the best path restrictinfo are not available in the scan_clauses argument to the GetForeignPlan function. - They are, however, directly available on the path, but at this point the clauses are of the form InnerVar OPERATOR OuterVar. The outer Var node is then replaced by a Param node, using the replace_nestloop_params function. It could be useful to make the parameterized version of the clause (in the form InnerVar OPERATOR Param) available to the fdw at plan time. Could this be possible ? I intentionally did the nestloop_params substitution after calling GetForeignPlan not before. It's not apparent to me why it would be useful to do it before, because the FDW is going to have no idea what those params represent. (Note that they represent values coming from some other, probably local, relation; not from the foreign table.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of Row Level Security
Simon Riggs wrote: This is security, not spec compliance. By default, we need full security. But you are arguing that users should not be able to make something secure if they, and everyone with the same permissions, could not later access it. I thought about situations where I've seen a need for something like this, and probably the best fit that I've seen is the ability of a judge to order that something is sealed. There are various levels where that can happen, but I'll focus on just one which Wisconsin Courts have implemented at the application level, but which would be nice to be able to support at the database level. Let's say we're talking about Milwaukee County, where hundreds of people work for the courts and related organizations with some rights to view the court data. Let's say a battered wife has moved to a new address she wants to keep secret for safety. She files a case with the court for a temporary restraining order, prohibiting the husband from coming near her. The court needs her address for the official record, but the judge will order the address sealed so that only people with a certain authority can see it. The authority is very limited, for obvious reasons. It is quite likely that the person initially entering the address and flagging it as sealed will not have authority to see the address if they go back and look up the case. Neither will the dozens of other people making the same kind of entries in the county. Obviously, if the person doing the initial entry is a friend of the husband, the data is compromised; but not allowing entry of the data in a state sealed by people without authority to look it up exposes the data to every other person with entry authority, with fairly obvious risks. The more secure behavior is to allow entry of data which will not be visible by the person doing the entry. 
-Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of Row Level Security
On 19 December 2012 19:46, Kevin Grittner kgri...@mail.com wrote: But you are arguing that users should not be able to make something secure if they, and everyone with the same permissions, could not later access it. Not exactly, no. I've argued that row security should apply to ALL commands by default. Which is exactly the same default as Oracle, as well as being the obvious common sense understanding of row security, which does not cause a POLA violation. I have no objection to an option to allow row security to not apply to inserts, if people want that. I do object to the idea that row security for inserts/updates should only happen via triggers, which is an ugly and non-performant route, as well as complicating security. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of Row Level Security
On 2012-12-19 14:46:18 -0500, Kevin Grittner wrote: Simon Riggs wrote: This is security, not spec compliance. By default, we need full security. But you are arguing that users should not be able to make something secure if they, and everyone with the same permissions, could not later access it. I thought about situations where I've seen a need for something like this, and probably the best fit that I've seen is the ability of a judge to order that something is sealed. There are various levels where that can happen, but I'll focus on just one which Wisconsin Courts have implemented at the application level, but which would be nice to be able to support at the database level. Let's say we're talking about Milwaukee County, where hundreds of people work for the courts and related organizations with some rights to view the court data. Let's say a battered wife has moved to a new address she wants to keep secret for safety. She files a case with the court for a temporary restraining order, prohibiting the husband from coming near her. The court needs her address for the official record, but the judge will order the address sealed so that only people with a certain authority can see it. The authority is very limited, for obvious reasons. It is quite likely that the person initially entering the address and flagging it as sealed will not have authority to see the address if they go back and look up the case. Neither will the dozens of other people making the same kind of entries in the county. Obviously, if the person doing the initial entry is a friend of the husband, the data is compromised; but not allowing entry of the data in a state sealed by people without authority to look it up exposes the data to every other person with entry authority, with fairly obvious risks. The more secure behavior is to allow entry of data which will not be visible by the person doing the entry. I don't think it is that simple. 
Allowing inserts without regard for row level restrictions makes it far easier to probe for data. E.g. by inserting rows and checking for unique violations. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
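Andres' probing concern can be made concrete with a short sketch. The table, column, and constraint here are purely illustrative (not from the patch under review); the point is only that a unique constraint on a sensitive column turns INSERT into a read channel:

```sql
-- Sensitive column protected by row security, but also under a unique
-- constraint (all names here are illustrative):
CREATE TABLE addresses (
    case_id int,
    address text UNIQUE   -- sealed addresses are hidden from most readers
);

-- A user with INSERT rights but no read access to sealed rows can still
-- probe: a duplicate-key error reveals that the hidden row exists.
INSERT INTO addresses VALUES (999, '123 Main St');
-- a unique_violation here means someone's sealed address is '123 Main St'
```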
Re: [HACKERS] Review of Row Level Security
Simon Riggs wrote: I've argued that row security should apply to ALL commands by default. Which is exactly the same default as Oracle, as well as being the obvious common sense understanding of row security, which does not cause a POLA violation. I have no objection to an option to allow row security to not apply to inserts, if people want that. I do object to the idea that row security for inserts/updates should only happen via triggers, which is an ugly and non-performant route, as well as complicating security. In the software where I've seen this managed at an application level, an address (for example) can be sealed by an update to the sealed column or inserted with the sealed column set to true. I'm not sure why you would want to allow one and not the other. Now, I can envision a situation where it could make sense to use the same predicate for limiting what a role could read and what a role could write, but I'm not buying that that is more secure. In fact, I see cases where you cannot achieve decent security without the ability for both INSERT and UPDATE to write rows which security excludes on reads. Functionally, I don't see anything which can't be accomplished with just the read security and triggers. I do agree that it would be nice if there were an easy way to specify that the same predicate applies to both reads and writes. I hope we can leave the syntax for this feature open to such specification, even if the initial implementation only supports limiting reads. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of Row Level Security
Andres Freund wrote: I don't think it is that simple. Allowing inserts without regard for row level restrictions makes it far easier to probe for data. E.g. by inserting rows and checking for unique violations. Unless you want to go to a military-style security level system where people at each security level have a separate version of the same data, primary keys (and I think other unique constraints) can leak. It seems clear enough that sensitive data should not be used for such constraints. That doesn't even require completely meaningless numeric keys. Court cases in Wisconsin have been numbered within county by year, case type, a sequential portion, and an optional alpha suffix since before things were computerized -- you may know that there is a 2010 case in Dane County for mental commitment number 45, but that doesn't leak any sensitive data. In the sealed address example, if that were moved to the database layer, someone might be able to test whether address number 5 existed on a case by seeing whether an insert succeeded; but if it did, there would be a record of that insert with their user ID that they could not retract in any way -- they would know very little, and be exposed as having done something inappropriate. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of Row Level Security
On 19 December 2012 20:23, Kevin Grittner kgri...@mail.com wrote: I hope we can leave the syntax for this feature open to such specification, even if the initial implementation only supports limiting reads. Well, I hope the opposite: that we can support simple full security by default, while leaving syntax open. The basic model for this is complete separation of data between customers/people. They can't see my data, I can't see theirs. Simple privacy. Obvious. Sure, more complex applications exist, but forcing the simple/common usage to adopt triggers because of that is not a sensible way forwards. Simple basic functionality, with an option for more advanced cases is what we need. Setting a status flag so that the current user no longer sees the row is a good example of more complex workflows in secure applications, I agree, but it's not the common case by any means. When we have these discussions about priority, it seems people think this means "don't do it ever". It doesn't; it means do the most important things first and then do other stuff later. I always wish to do both, but circumstances teach me that hard cutoffs and deadlines mean we can't always have everything if debates overrun and decisions aren't forthcoming. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of Row Level Security
The more secure behavior is to allow entry of data which will not be visible by the person doing the entry. I don't think it is that simple. Allowing inserts without regard for row level restrictions makes it far easier to probe for data. E.g. by inserting rows and checking for unique violations. So the PK column(s) are not as secure as, say, the address-related column. Vice versa, I may know that someone lives at a given address (because my attempt to place someone else there failed) but I would have no way of knowing who that other person is. My recourse would be to escalate the data-entry request to someone with higher security permissions who could read and write to the appropriate tables and resolve the conflict. In both cases the direct write-only situation necessitates that some level of exposure occurs. The work-around if that is unacceptable would be to accept all data but any entries that cannot be directly inserted into the table would remain in a staging area that someone with higher security would have to monitor and clear as needed. The same intervention is required but in the first situation you can at least avoid coding the special logic and instead trade security for ease-of-use. As a default level of security we could throw a generic "secure DML rejected for ROW(...)" error and not tell the user anything about the cause. If that person knows all unique indexes and constraints defined on the table they could use trial-and-error to discover information about stored records, but even then if they get an error on two different columns they still have no way of knowing if those errors belong to the same record. Beyond that level you provide the user with some information as to the cause so that they have a reasonable chance to catch typos and other mistakes instead of escalating a benign issue. Lastly is the custom solution whereby the developers accept ALL data entered as being correct but saved to a staging table. 
A review process by someone with higher security clearances would then process and clear out that table as necessary. If the user is write-only then regardless of whether the entry succeeded or failed they are considered to be done with their task at that point and no meaningful results from the system can be supplied to them. None of these options disallows the presence of non-security related check constraints to be checked, enforced, and communicated to the user. I've probably lost sight of the bigger picture as my response is mostly informed by these last couple of messages. David J. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of Row Level Security
On 19 December 2012 20:37, Kevin Grittner kgri...@mail.com wrote: Andres Freund wrote: I don't think it is that simple. Allowing inserts without regard for row level restrictions makes it far easier to probe for data. E.g. by inserting rows and checking for unique violations. Unless you want to go to a military-style security level system where people at each security level have a separate version of the same data, primary keys (and I think other unique constraints) can leak. It seems clear enough that sensitive data should not be used for such constraints. But there is the more obvious case where you shouldn't be able to insert medical history for a patient you have no responsibility for. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] too much pgbench init output
On 19.12.2012 06:30, Jeevan Chalke wrote: On Mon, Dec 17, 2012 at 5:37 AM, Tomas Vondra t...@fuzzy.cz mailto:t...@fuzzy.cz wrote: Hi, attached is a new version of the patch that (a) converts the 'log_step_seconds' variable to a constant (and does not allow changing it using a command-line option etc.) (b) keeps the current logging as a default (c) adds a -q switch that enables the new logging with a 5-second interval I'm still not convinced there should be yet another knob for tuning the log interval - opinions? It seems that you have generated a patch over your earlier version and due to that it is not cleanly applying on fresh sources. Please generate a patch on fresh sources. Seems you're right - I've attached the proper patch against current master. However, I have absolutely no issues with the design. Also code review is already done and looks good to me. I think to move forward on this we need someone from the core team. So I am planning to change its status to ready-for-committer. Before that please provide an updated patch for final code review. 
thanks Tomas

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index e376452..f3953a7 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -39,6 +39,7 @@
 #include "portability/instr_time.h"
 
 #include <ctype.h>
+#include <math.h>
 
 #ifndef WIN32
 #include <sys/time.h>
@@ -102,6 +103,7 @@ extern int optind;
 #define MAXCLIENTS      1024
 #endif
 
+#define LOG_STEP_SECONDS        5       /* seconds between log messages */
 #define DEFAULT_NXACTS  10      /* default nxacts */
 
 int         nxacts = 0;         /* number of transactions per client */
@@ -150,6 +152,7 @@ char       *index_tablespace = NULL;
 #define naccounts       100000
 
 bool        use_log;            /* log transaction latencies to a file */
+bool        use_quiet;          /* quiet logging onto stderr */
 bool        is_connect;         /* establish connection for each transaction */
 bool        is_latencies;       /* report per-command latencies */
 int         main_pid;           /* main process id used in log filename */
@@ -389,6 +392,7 @@ usage(void)
            "  -v           vacuum all four standard tables before tests\n"
            "\nCommon options:\n"
            "  -d           print debugging output\n"
+           "  -q           quiet logging (a message each 5 seconds)\n"
            "  -h HOSTNAME  database server host or socket directory\n"
            "  -p PORT      database server port number\n"
            "  -U USERNAME  connect as specified database user\n"
@@ -1362,6 +1366,11 @@ init(bool is_no_vacuum)
     char        sql[256];
     int         i;
 
+    /* used to track elapsed time and estimate of the remaining time */
+    instr_time  start, diff;
+    double      elapsed_sec, remaining_sec;
+    int         log_interval = 1;
+
     if ((con = doConnect()) == NULL)
         exit(1);
@@ -1430,6 +1439,8 @@ init(bool is_no_vacuum)
     }
     PQclear(res);
 
+    INSTR_TIME_SET_CURRENT(start);
+
     for (i = 0; i < naccounts * scale; i++)
     {
         int         j = i + 1;
@@ -1441,10 +1452,33 @@ init(bool is_no_vacuum)
             exit(1);
         }
 
-        if (j % 100000 == 0)
+        /* If we want to stick with the original logging, print a message each
+         * 100k inserted rows. */
+        if ((!use_quiet) && (j % 100000 == 0))
             fprintf(stderr, "%d of %d tuples (%d%%) done.\n",
-                    j, naccounts * scale,
-                    (int) (((int64) j * 100) / (naccounts * scale)));
+                    j, naccounts * scale,
+                    (int) (((int64) j * 100) / (naccounts * scale)));
+        /* let's not call the timing for each row, but only each 100 rows */
+        else if (use_quiet && (j % 100 == 0))
+        {
+            INSTR_TIME_SET_CURRENT(diff);
+            INSTR_TIME_SUBTRACT(diff, start);
+
+            elapsed_sec = INSTR_TIME_GET_DOUBLE(diff);
+            remaining_sec = (scale * naccounts - j) * elapsed_sec / j;
+
+            /* have we reached the next interval (or end)? */
+            if ((j == scale * naccounts) || (elapsed_sec >= log_interval * LOG_STEP_SECONDS)) {
+
+                fprintf(stderr, "%d of %d tuples (%d%%) done (elapsed %.2f s, remaining %.2f s).\n",
+                        j, naccounts * scale,
+
Re: [HACKERS] system administration functions with hardcoded superuser checks
On 19.12.2012 07:34, Magnus Hagander wrote: On Wed, Dec 19, 2012 at 1:58 AM, Tomas Vondra t...@fuzzy.cz wrote: On 18.12.2012 18:38, Pavel Stehule wrote: 2012/12/18 Peter Eisentraut pete...@gmx.net: There are some system administration functions that have hardcoded superuser checks, specifically: pg_reload_conf pg_rotate_logfile Some of these are useful in monitoring or maintenance tools, and the hardcoded superuser checks require that these tools run with maximum privileges. Couldn't we just install these functions without default privileges and allow users to grant privileges as necessary? Isn't it too strong a gun for some people? I believe someone can decrease necessary rights and it opens doors to the system. No one was speaking about making them executable by a wider group of users by default (i.e. decreasing necessary rights). Today, when you need to provide the EXECUTE privilege on those functions, you have three options Given how limited these functions are in scope, I don't see a problem here. pg_read_file pg_read_file_all pg_read_binary_file pg_read_binary_file_all pg_stat_file pg_ls_dir is relatively dangerous and I am not for opening these functions. A power user can simply write an extension, but he knows what he does. I see only dangers that are already present. Granting executability on pg_read_xyz is pretty darn close to granting superuser, without explicitly asking for it. Well, you get read only superuser. If we want to make that step as easy as just GRANT, we really need to write some *very* strong warnings in the documentation so that people realize this. I doubt most people will realize it unless we do that (and those who don't read the docs, which is probably a majority, never will). Yup, that's what I meant by possibility to perform additional parameter values checks ;-) Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
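Magnus' SECURITY DEFINER point -- granting read access to specific files rather than to the whole filesystem -- can be sketched like this (the file path and role name are examples only, not from the thread):

```sql
-- Wrapper created by a superuser; it exposes exactly one fixed file
-- and takes no path from the caller (path and role are illustrative).
CREATE FUNCTION read_current_log() RETURNS text AS $$
    SELECT pg_read_file('pg_log/postgresql.log');
$$ LANGUAGE sql SECURITY DEFINER;

REVOKE ALL ON FUNCTION read_current_log() FROM PUBLIC;
GRANT EXECUTE ON FUNCTION read_current_log() TO monitoring;
```

Because the path is hard-coded inside the SECURITY DEFINER body, the grantee gets the one file and nothing else -- unlike a direct GRANT on pg_read_file, which is the read-only-superuser cannon described above.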
Re: [HACKERS] Review of Row Level Security
Simon Riggs wrote: Kevin Grittner kgri...@mail.com wrote: I hope we can leave the syntax for this feature open to such specification, even if the initial implementation only supports limiting reads. Well, I hope the opposite: that we can support simple full security by default, while leaving syntax open. The basic model for this is complete separation of data between customers/people. They can't see my data, I can't see theirs. Simple privacy. Obvious. And something we already can handle several different ways. Inheritance, schemas, etc. Allowing data to be fully secured from prying eyes on entry, regardless of whether the role allowing the entry has rights to see the data does not yet have any built-in support. Sure, more complex applications exist, but forcing the simple/common usage to adopt triggers because of that is not a sensible way forwards. Simple basic functionality, with an option for more advanced cases is what we need. Setting a status flag so that the current user no longer sees the row is a good example of more complex workflows in secure applications, I agree, but its not the common case by any means. When we have these discussions about priority, it seems people think this means don't do it ever. It doesn't, it means do the most important things first and then do other stuff later. I always wish to do both, but circumstances teach me that hard cutoffs and deadlines mean we can't always have everything if debates overrun and decisions aren't forthcoming. Well, it seems we have different views of what is intuitively obvious here. What you suggest does not look to me to be more secure (making full security a misnomer), simpler, nor more important. Perhaps we can avoid divisive discussions on which is more important if we can manage both in the initial implementation. I guess the usual rule if we can't manage it is that a tie goes to the author, which is neither you nor I. 
-Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of Row Level Security
On 2012-12-19 18:25, Robert Haas wrote: On Tue, Dec 18, 2012 at 3:39 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote: postgres= INSERT INTO t1 VALUES (4,'ddd'); INSERT 0 1 postgres= INSERT INTO t1 VALUES (5,'eee'); ERROR: new row for relation t1 violates row-secirity DETAIL: Failing row contains (5, eee). I've argued against this before - and maybe I should drop my objection, because a number of people seem to be on the other side. But I still think there will be some people who don't want this behavior. Right now, for example, you can give someone INSERT but not SELECT permission on a table, and they will then be able to put rows into the table that they cannot read back. Similarly, in the RLS case, it is not necessarily undesirable for a user to be able to insert a row that they can't read back; or for them to be able to update a row from a value that they can see to one that they cannot. Some people will want to prohibit that, while others will not. Maybe it is an idea to provide different RLS expressions for read and write. I remember reading a scenario (it might be well known in security land) where it is possible to write to authorization levels >= the user's level, and read from levels <= the user's level. In this setup Kevin's address example is possible: a user could write to e.g. the highest level, but then not read it anymore if his own level was lower than the highest. This setup also shows that to implement it, one would need a different expression for read and write (or the RLS expression should know the query's command type). regards, Yeb -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
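The scheme Yeb describes is essentially the classic "read down, write up" model from security land. With purely hypothetical per-command syntax (no such clause exists in the patch under discussion, and current_user_level() is an invented helper), it might be expressed as:

```sql
-- Hypothetical illustration only: separate predicates per command type.
ALTER TABLE case_file SET ROW SECURITY
    FOR SELECT USING (sensitivity <= current_user_level())  -- read down
    FOR INSERT USING (sensitivity >= current_user_level()); -- write up
```

A clerk at level 1 could then insert a level-5 sealed row but not read it back, which is exactly Kevin's sealed-address workflow.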
[HACKERS] strange OOM errors with EXECUTE in PL/pgSQL
Hi, one of our local users reported he's getting OOM errors on 9.2, although on 9.1 the code worked fine. Attached is a simple test-case that should give you an OOM error almost immediately. What it does: 1) creates a simple table called test with one text column. 2) creates a plpgsql function with one parameter, and all that function does is passing the parameter to EXECUTE 3) calls the function with a string containing many INSERTs into the test table The way the EXECUTE is used is a bit awkward, but the failures seem a bit strange to me. The whole script is ~500kB and most of that is about 11k of very simple INSERT statements: insert into test(value) values (''aa''); all of them are exactly the same. Yet when it fails with OOM, the log contains memory context stats like these: TopMemoryContext: 5303376 total in 649 blocks; 2648 free ... PL/pgSQL function context: 8192 total in 1 blocks; 3160 free ... TopTransactionContext: 8192 total in 1 blocks; 6304 free ... ExecutorState: 8192 total in 1 blocks; 7616 free ... ExprContext: 8192 total in 1 blocks; 8160 free ... SPI Exec: 33554432 total in 14 blocks; 6005416 free ... CachedPlanSource: 3072 total in 2 blocks; 1856 free ... CachedPlanSource: 538688 total in 3 blocks; 1744 free ... CachedPlanQuery: 3072 total in 2 blocks; 1648 free ... CachedPlanSource: 538688 total in 3 blocks; 1744 free ... CachedPlanQuery: 3072 total in 2 blocks; 1648 free ... CachedPlanSource: 538688 total in 3 blocks; 1744 free ... CachedPlanQuery: 3072 total in 2 blocks; 1648 free ... CachedPlanSource: 538688 total in 3 blocks; 1744 free ... CachedPlanQuery: 3072 total in 2 blocks; 1648 free ... CachedPlanSource: 538688 total in 3 blocks; 1744 free ... ... There is ~9500 of these CachedPlanSource + CachedPlanQuery row pairs (see the attached log). That seems a bit strange to me, because all the queries are exactly the same in this test case. 
The number of queries needed to get OOM is inversely proportional to the query length - by using a longer text (instead of 'aaa') you may use much less queries. I am no expert in this area, but it seems to me that the code does not expect that many INSERTs in EXECUTE and does not release the memory for some reason (e.g. because the plans are allocated in SPI Exec memory context, etc.). regards Tomas pg-oom.log.gz Description: application/gzip test2.sql.gz Description: application/gzip -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
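A condensed sketch of the reported scenario (the original script is ~500 kB of literal INSERT statements; repeat() stands in for that list here, and run_sql is an illustrative name):

```sql
-- 1) the target table
CREATE TABLE test (value text);

-- 2) a function that just forwards its parameter to EXECUTE
CREATE FUNCTION run_sql(q text) RETURNS void AS $$
BEGIN
    EXECUTE q;
END;
$$ LANGUAGE plpgsql;

-- 3) one call carrying ~11k identical INSERT statements; on 9.2 this
--    reportedly exhausts memory via per-statement CachedPlanSource entries
SELECT run_sql(
    repeat('insert into test(value) values (''aaa'');', 11000)
);
```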
Re: [HACKERS] Cascading replication: should we detect/prevent cycles?
Simon, My logic is that if you make a 1 minute test you will notice your mistake, which is glaringly obvious. That is sufficient to prevent that mistake, IMHO. What would such a test look like? It's not obvious to me that there's any rapid way for a user to detect this situation, without checking each server individually. If there's a quick and easy way to test for cycles from the user side, we should put it in documentation somewhere. --Josh -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
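Josh's point can be illustrated: there is no built-in view of the whole topology, so a DBA would first have to collect each standby's upstream (e.g. from primary_conninfo on every node) into a table by hand. Given such an assumed hand-built table, a recursive query could flag a cycle:

```sql
-- Assumed hand-collected topology, one row per standby:
-- CREATE TABLE replication_edges (standby text, upstream text);

WITH RECURSIVE walk(node, path, is_cycle) AS (
    SELECT standby, ARRAY[standby], false
    FROM replication_edges
  UNION ALL
    SELECT e.upstream, w.path || e.upstream, e.upstream = ANY(w.path)
    FROM walk w
    JOIN replication_edges e ON e.standby = w.node
    WHERE NOT w.is_cycle
)
SELECT DISTINCT path FROM walk WHERE is_cycle;
```

The need to assemble replication_edges manually from every server is precisely the "checking each server individually" burden the message complains about.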
Re: [HACKERS] Feature Request: pg_replication_master()
It stalled because the patch author decided not to implement the request to detect recovery.conf in data directory, which allows backwards compatibility. Well, I don't think we had agreement on how important backwards compatibility for recovery.conf was, particularly not on the whole recovery.conf/recovery.done functionality and the weird formatting of recovery.conf. However, with include_if_exists directives in postgresql.conf, or include_dir, that would be easy to work around. Don't we have something like that planned for SET PERSISTENT? --Josh Berkus -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Feature Request: pg_replication_master()
This sounds like my previous suggestion of returning the primary conninfo value, but with just ip. That one came with a pretty bad patch, and was later postponed until we folded recovery.conf into the main configuration file parsing. I'm not really sure what happened to that project? (the configuration file one) Hmmm, good point. Just having primary_conninfo it in pg_settings would help a lot. --Josh -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Feature Request: pg_replication_master()
On 19 December 2012 22:19, Joshua Berkus j...@agliodbs.com wrote: It stalled because the patch author decided not to implement the request to detect recovery.conf in data directory, which allows backwards compatibility. Well, I don't think we had agreement on how important backwards compatibility for recovery.conf was, particularly not on the whole recovery.conf/recovery.done functionality and the wierd formatting of recovery.conf. As ever, we spent much energy on debating backwards compatibility rather than just solving the problem it posed, which is fairly easy to solve. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On Tue, 2012-12-04 at 01:03 -0800, Jeff Davis wrote: 4. We need some general performance testing to show whether this is insane or not. I ran a few tests. Test 1 - find worst-case overhead for the checksum calculation on write: fsync = off bgwriter_lru_maxpages = 0 shared_buffers = 1024MB checkpoint_segments = 64 autovacuum = off The idea is to prevent interference from the bgwriter or autovacuum. Also, I turn off fsync so that it's measuring the calculation overhead, not the effort of actually writing to disk. drop table foo; create table foo(i int, j int) with (fillfactor=50); create index foo_idx on foo(i); insert into foo select g%25, -1 from generate_series(1,1000) g; checkpoint; -- during the following sleep, issue an OS sync -- to make test results more consistent select pg_sleep(30); \timing on update foo set j=-1 where i = 0; select pg_sleep(2); checkpoint; update foo set j=-1 where i = 0; select pg_sleep(2); checkpoint; update foo set j=-1 where i = 0; select pg_sleep(2); checkpoint; \timing off I am measuring the time of the CHECKPOINT command, not the update. The update is just to dirty all of the pages (they should all be HOT updates). Without checksums, it takes about 400ms. With checksums, it takes about 500ms. That overhead is quite low, considering that the bottleneck is almost always somewhere else (like actually writing to disk). Test 2 - worst-case overhead for calculating checksum while reading data Same configuration as above. This time, just load a big table: drop table foo; create table foo(i int, j int) with (fillfactor=50); insert into foo select g%25, -1 from generate_series(1,1000) g; -- make sure hint bits and PD_ALL_VISIBLE are set everywhere select count(*) from foo; vacuum; vacuum; vacuum; select relfilenode from pg_class where relname='foo'; Then shut down the server and restart it. Then do a cat data/base/12055/* > /dev/null to get the table loaded into the OS buffer cache. 
Then do: \timing on SELECT COUNT(*) FROM foo; So, shared buffers are cold, but OS cache is warm. This should test the overhead of going from the OS to shared buffers, which requires the checksum calculation. Without checksums is around 820ms; with checksums around 970ms. Again, this is quite reasonable, because I would expect the bottleneck to be reading from the disk rather than the calculation itself. Test 3 - worst-case WAL overhead For this test, I also left fsync off, because I didn't want to test the effort to flush WAL (which shouldn't really be required for this test, anyway). This was simpler: drop table foo; create table foo(i int, j int) with (fillfactor=50); insert into foo select g%25, -1 from generate_series(1,1000) g; checkpoint; select pg_sleep(1); checkpoint; select pg_sleep(30); -- do an OS sync while this is running \timing on SELECT COUNT(*) FROM foo; Without checksums, it takes about 1000ms. With checksums, about 2350ms. I also tested with checksums but without the CHECKPOINT commands above, and it was also 1000ms. This test is more plausible than the other two, so it's more likely to be a real problem. So, the biggest cost of checksums is, by far, the extra full-page images in WAL, which matches our expectations. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Set visibility map bit after HOT prune
On 19 December 2012 17:26, Pavan Deolasee pavan.deola...@gmail.com wrote: We definitely made great progress by having HOT Yes, definitely. Great work. That is not for debate. But I think generally HOT made a great difference to the system as a whole, maybe at the cost of a slowdown for some read-only, select queries. And HOT prune is not the only operation that we do in the foreground. We also set hint bits and make buffers dirty in otherwise read-only queries. And those last things are being debated hotly. We definitely need to ask whether the way things are now can be tweaked to be better. The major mechanics need not be reviewed, but the tradeoffs and balances? Definitely. Anything we do in the foreground needs evaluation. Assuming eager actions give a good payoff is not always a useful thought. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Set visibility map bit after HOT prune
On Thu, Dec 20, 2012 at 12:22 AM, Tom Lane t...@sss.pgh.pa.us wrote: Pavan Deolasee pavan.deola...@gmail.com writes: What if we wrap that into the WAL generated by HOT prune itself? What WAL? The case we're worried about here is that there's nothing else for HOT prune to do. Does such a case exist? Or at least, is it that common? I mean, we have enough checks in place to ensure that HOT prune is attempted only when there is something interesting in the page to be done. Otherwise we don't even attempt getting a cleanup lock on the page. Of course, they are just hints, but they serve very well. Prune XID is what I have in mind in particular. So if the visibility map bit is cleared, it could be because at least one tuple in the page was updated, deleted or inserted. The first two would have set the prune XID and will trigger a HOT action, and HOT prune will indeed do something useful. I think an aborted non-HOT update may create a scenario that you're talking about, i.e. HOT prune will have nothing to do, but the page has again turned all-visible. Being an abort path, I wonder if it's really that common though. That leaves us with the inserts, which will clear the VM bit but may not have anything for HOT prune to do. But we don't set the prune XID for inserts either. So we won't get into hot_page_prune() for such pages. So my point is, for fairly large and common cases, often we will set the bit only when HOT prune did something useful, though not every useful HOT prune will necessarily set the bit. And even if we slip through all the safety nets on HOT prune, we can choose to set the bit only if HOT did something useful to avoid any extra WAL logging, assuming we are still worried about those corner cases. Thanks, Pavan -- Pavan Deolasee http://www.linkedin.com/in/pavandeolasee -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Set visibility map bit after HOT prune
On Thu, Dec 20, 2012 at 5:05 AM, Simon Riggs si...@2ndquadrant.com wrote: And those last things are being debated hotly. We definitely need to ask whether the way things are now can be tweaked to be better. The major mechanics need not be reviewed, but the tradeoffs and balances? Definitely. I have zero objection to doing that, just that I don't have solid ideas right now. And it's not because I haven't thought hard enough. Anything we do in foreground needs evaluation. Assuming eager actions give a good payoff is not always a useful thought. I don't disagree. Your field experience is much larger than mine, but I have spent hours testing PostgreSQL's performance, so can talk with some degree of conviction. I think when we do things that can reduce read/write IO or bloat of a large table in general, the system as a whole benefits, maybe at the cost of some genuinely good guy doing a simple SELECT in this case. Often the SELECTs also benefit, because one of their good siblings helped us reduce bloat of the table and hence seq scans had to scan an order of magnitude fewer blocks. I just thought that we can fairly easily limit the damage if we are really worried about SELECTs being penalised. What if we set a configurable limit on *extra* things that a query may do which are otherwise not very useful for the query itself, but are useful to keep the system healthy and steady? HOT prune definitely counts as one of them, and maybe even the setting of hint bits. (This is a topic for a separate thread though) Thanks, Pavan -- Pavan Deolasee http://www.linkedin.com/in/pavandeolasee -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] system administration functions with hardcoded superuser checks
Tomas Vondra t...@fuzzy.cz writes: On 19.12.2012 07:34, Magnus Hagander wrote: Granting executability on pg_read_xyz is pretty darn close to granting superuser, without explicitly asking for it. Well, you get read-only superuser. If we want to make that step as easy as just GRANT, we really need to write some *very* strong warnings in the documentation so that people realize this. I doubt most people will realize it unless we do that (and those who don't read the docs, which is probably a majority, never will). Yup, that's what I meant by possibility to perform additional parameter values checks ;-) Yeah, which is easily done if you've written a wrapper function and not so easily otherwise. Between that and the point about how pg_dump wouldn't preserve GRANTs done directly on system functions, I think this proposal isn't going anywhere anytime soon. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cascading replication: should we detect/prevent cycles?
On Tue, Dec 18, 2012 at 7:03 PM, Josh Berkus j...@agliodbs.com wrote: 2. should we warn the user, or refuse to start up? One nice property of allowing cyclicity is that it's easier to syndicate application of WAL to a series of standbys before promotion of exactly one to act as a primary (basically, to perform catch-up). One could imagine someone wanting a configuration that was like:

+--> r2
|    |
r1 <-+

This is only one step before: r1 -> r2 or r2 -> r1 (and, most importantly, after the cycle quiesces one can choose either one) For my use, I'm not convinced that such checks and warnings are useful if delivered by default, and I think outright rejection of cyclicity is harmful. -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of Row Level Security
On Wed, Dec 19, 2012 at 12:54 PM, Simon Riggs si...@2ndquadrant.com wrote: I can see a use case for not having security apply for users who have *only* INSERT privilege. This would allow people to run bulk loads of data into a table with row security. We should add that. That is not the common case, so with proper documentation that should be a useful feature without relaxing default security. Never applying security for INSERT and then forcing them to add BEFORE triggers if they want full security is neither secure nor performant. I think INSERT vs. not-INSERT is not the relevant distinction, because the question also arises for UPDATE. In the UPDATE case, the question is whether the RLS qual should be checked only against the OLD tuple (to make sure that we can see the tuple to modify it) or also against the NEW tuple (to make sure that we're not modifying it to a form that we can no longer see). In other words, the question is not "do we support all of the commands?" but rather "do we check not only the tuple read but also the tuple written?". For INSERT, we only write a tuple, without reading. For SELECT and DELETE, we only read a tuple, without writing a new one. UPDATE does both a read and a write. Previously, I suggested that we handle this by enforcing row-level security only on data read from the table - the OLD row, so to speak - and not on data written to the table - the NEW row, so to speak - because the latter case can be handled well enough by triggers. (The OLD case cannot, because not seeing the row is different from erroring out when you do see it.) There are other alternatives, like allowing the user to specify which behavior they want. But I think that simply decreeing that the policy will apply not only to rows read but also rows written in all cases will be less flexible than we will ultimately want to be. As discussed, we should add a security feature that is secure by default. Adding options to make it less secure can follow initial commit.
We might even make it in this release if the review of the main feature goes well. Saying that something is or is not secure is not meaningful without defining what you want to be secure against. There's nothing insecure about checking only the tuples read; it's just a different (and useful) threat model. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of Row Level Security
On Wed, Dec 19, 2012 at 1:58 PM, Simon Riggs si...@2ndquadrant.com wrote: If we don't enforce rules on INSERT the user has to specifically add a trigger, which makes things noticeably slower. There is more maintenance work for the average user, less performance and more mistakes to make. Well, again, only if that's the behavior they want. Also, it's worth noting that, even if we assume that it is in fact the behavior that users will want, the contention that it is faster than a trigger is thus far unsubstantiated by any actual benchmarks. It may indeed be faster ... but I don't know without testing whether it's slightly faster or a whole lot faster. That might be a good thing to find out, because if it is a whole lot faster, that would certainly strengthen the case for including a mode that works that way, whether or not we also provide other options. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Feature Request: pg_replication_master()
On Wed, Dec 19, 2012 at 5:34 PM, Simon Riggs si...@2ndquadrant.com wrote: As ever, we spent much energy on debating backwards compatibility rather than just solving the problem it posed, which is fairly easy to solve. I'm still of the opinion (as were a lot of people on the previous thread, IIRC) that just making them GUCs and throwing backward compatibility under the bus is acceptable in this case. Changes that break application code are anathema to me, because people can have a LOT of application code and updating it can be REALLY hard. The same cannot be said about recovery.conf - you have at most one of those per standby, and if it needs to be changed in some way, you can do it with a very small Perl script. Yes, third-party tools will need to be updated; that is surely a downside, but I think it might be a tolerable one in this case. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Set visibility map bit after HOT prune
On Wed, Dec 19, 2012 at 12:26 PM, Pavan Deolasee pavan.deola...@gmail.com wrote: This can be handled by breaking the 1-to-1 mapping between the VM bit and the PD_ALL_VISIBLE bit. So seq scans will only look at PD_ALL_VISIBLE. It was proposed by Andres upthread, but shot down by Tom and Simon. But I still feel that was an overreaction and there is a lot of merit in the idea. As I said elsewhere, it will also help the case when there are DEAD line pointers in a page. Today we can't mark such pages all-visible, but if we break this mapping, we can do that. Sure, but you're zipping rather blithely past the disadvantages of such an approach. Jeff Davis recently proposed getting rid of PD_ALL_VISIBLE, and Tom and I both expressed considerable skepticism about that; this proposal has the same problems. One of the major benefits of PD_ALL_VISIBLE is that, when it isn't set, inserts, updates, and deletes to the page can ignore the visibility map. That means that a server under heavy concurrency is much less likely to encounter contention on the visibility map blocks. Now, maybe that's not really a problem, but I sure haven't seen enough evidence to make me believe it. If it's really true that PD_ALL_VISIBLE needn't fill this role, then Heikki wasted an awful lot of time implementing it, and I wasted an awful lot of time keeping it working when I made the visibility map crash-safe for IOS. That could be true, but I tend to think it isn't. I would like to run some pgbench tests where we get the system in a steady state such that all/most updates are HOT updates (not an entirely unlikely scenario for many real-life cases). And then try running some concurrent queries which can be executed via IOS. My gut feel is that, today, we will see a slow and continuous drop in performance for these queries because IOS will slowly stop working. If there are no vacuums, I agree.
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Set visibility map bit after HOT prune
On Wed, Dec 19, 2012 at 12:39 PM, Simon Riggs si...@2ndquadrant.com wrote: The benefit of saying that only UPDATEs clean the block is that this penalises only the workload making the mess, rather than everybody cleaning up repeatedly over one messy guy. Right, but there are plenty of situations where having everybody clean up after the messy guy is better than waiting around and hoping that Mom (aka vacuum) will do it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] operator dependency of commutator and negator, redux
Bug #7758 seems to be a rediscovery of the behavior that Itagaki-san complained of a couple years ago: http://archives.postgresql.org/pgsql-hackers/2010-09/msg02035.php While reconsidering the various not-too-satisfactory fixes we thought of back then, I had a sudden thought. Instead of having a COMMUTATOR or NEGATOR forward reference create a shell operator and link to it, why not simply *ignore* such references? Then when the second operator is defined, go ahead and fill in both links? The only case where this could result in an unsatisfactory outcome is if the second operator's CREATE command fails to include the converse COMMUTATOR or NEGATOR reference ... but that doesn't work very nicely today anyway, as you end up with a unidirectional reference, hardly a desirable state of affairs. Not only does this solve the problem complained of, but it allows for much stronger error checking, as there is no longer any need to allow inconsistent catalog states even transiently. We could start treating commutator/negator references as true dependencies, permanently preventing dangling references. We could probably even get rid of the notion of shell operators altogether. Thoughts? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] strange OOM errors with EXECUTE in PL/pgSQL
Tomas Vondra t...@fuzzy.cz writes: What it does: 1) creates a simple table called test with one text column. 2) creates a plpgsql function with one parameter, and all that function does is passing the parameter to EXECUTE 3) calls the function with a string containing many INSERTs into the test table The way the EXECUTE is used is a bit awkward, but the failures seem a bit strange to me. The whole script is ~500kB and most of that is about 11k of very simple INSERT statements: insert into test(value) values (''aa''); The reason this fails is that you've got a half-megabyte source string, and each of the 11000 plans that are due to be created from it saves its own copy of the source string. Hence, 5500 megabytes needed just for source strings. We could possibly fix this by inventing some sort of reference-sharing arrangement (which'd be complicated and fragile) or by not storing the source strings with the plans (which'd deal a serious blow to our ability to provide helpful error messages). Neither answer seems appealing. I think it would be a better idea to adopt a less brain-dead way of processing the data. Can't you convert this to a single INSERT with a lot of VALUES rows? Or split it into multiple EXECUTE chunks? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: optimized DROP of multiple tables within a transaction
On 19.12.2012 02:18, Andres Freund wrote: On 2012-12-17 00:31:00 +0100, Tomas Vondra wrote: I think except for the temp buffer issue mentioned below it's ready.

-DropRelFileNodeAllBuffers(RelFileNodeBackend rnode)
+DropRelFileNodeAllBuffers(RelFileNodeBackend *rnodes, int nnodes)
 {
-	int i;
+	int i, j;
+
+	/* sort the list of rnodes */
+	pg_qsort(rnodes, nnodes, sizeof(RelFileNodeBackend), rnode_comparator);

 	/* If it's a local relation, it's localbuf.c's problem. */
-	if (RelFileNodeBackendIsTemp(rnode))
+	for (i = 0; i < nnodes; i++)
 	{
-		if (rnode.backend == MyBackendId)
-			DropRelFileNodeAllLocalBuffers(rnode.node);
-		return;
+		if (RelFileNodeBackendIsTemp(rnodes[i]))
+		{
+			if (rnodes[i].backend == MyBackendId)
+				DropRelFileNodeAllLocalBuffers(rnodes[i].node);
+		}
 	}

While you deal with local buffers here you don't anymore in the big loop over shared buffers. That wasn't needed earlier since we just returned after noticing we have a local relation, but that's not the case anymore. Hmm, but that would require us to handle the temp relations explicitly wherever we call DropRelFileNodeAllBuffers. Currently there are two such places - smgrdounlink() and smgrdounlinkall(). By placing it into DropRelFileNodeAllBuffers() this code is shared and I think it's a good thing. But that does not mean the code is perfect - it was based on the assumption that if there's a mix of temp and regular relations, the temp relations will be handled in the first part and the rest in the second one. Maybe it'd be better to improve it so that the temp relations are removed from the array after the first part (and skip the second one if there are no remaining relations).

 	for (i = 0; i < NBuffers; i++)
 	{
+		RelFileNodeBackend *rnode = NULL;
 		volatile BufferDesc *bufHdr = BufferDescriptors[i];
-
+
 		/*
 		 * As in DropRelFileNodeBuffers, an unlocked precheck should be safe
 		 * and saves some cycles.
 		 */
-		if (!RelFileNodeEquals(bufHdr->tag.rnode, rnode.node))
+
+		/*
+		 * For low number of relations to drop just use a simple walk through,
+		 * to save the bsearch overhead. The BSEARCH_LIMIT is rather a guess
+		 * than an exactly determined value, as it depends on many factors (CPU
+		 * and RAM speeds, amount of shared buffers etc.).
+		 */
+		if (nnodes <= BSEARCH_LIMIT)

I think that's a sensible plan. It makes sense that for a small number of relations a sequential scan of the rnodes array is faster than a bsearch, and 10 sounds like a good value although I would guess the optimal value is slightly higher on most machines. But if it works fine without regressions that's pretty good... I think it's pointless to look for the optimal value in this case, given how many factors it depends on. We could use 20 instead of 10, but I wouldn't go higher probably.

+
+/*
+ * Used to sort relfilenode array (ordered by [relnode, dbnode, spcnode]), so
+ * that it's suitable for bsearch.
+ */
+static int
+rnode_comparator(const void *p1, const void *p2)
+{
+	RelFileNodeBackend n1 = *(RelFileNodeBackend *) p1;
+	RelFileNodeBackend n2 = *(RelFileNodeBackend *) p2;
+
+	if (n1.node.relNode < n2.node.relNode)
+		return -1;
+	else if (n1.node.relNode > n2.node.relNode)
+		return 1;
+
+	if (n1.node.dbNode < n2.node.dbNode)
+		return -1;
+	else if (n1.node.dbNode > n2.node.dbNode)
+		return 1;
+
+	if (n1.node.spcNode < n2.node.spcNode)
+		return -1;
+	else if (n1.node.spcNode > n2.node.spcNode)
+		return 1;
+	else
+		return 0;
+}

Still surprised this is supposed to be faster than a memcmp, but as you seem to have measured it earlier.. It surprised me too. These are the numbers with the current patch:

1) one by one
=============

            0   2   4   6   8   10   12   14   16   18   20
  ---------------------------------------------------------
  current  15  22  28  34  41   75   77   82   92   99  106
  memcmp   16  23  29  36  44  122  125  128  153  154  158

Until the number of indexes reaches ~10, the numbers are almost exactly the same. Then the bsearch branch kicks in and it's clear how much slower the memcmp comparator is.
2) batches of 100
=================

            0  2   4   6   8  10  12  14  16  18  20
  --------------------------------------------------
  current   3  5   8  10  12  15  17  21  23  27  31
  memcmp    4  7  10  13  16  19  22  28  30  32  36

Here the difference is much smaller, but even here the memcmp is consistently a bit slower. My
[HACKERS] discarding duplicate indexes
I recently came across a scenario like this (tested on git head): CREATE TABLE test (id int); CREATE INDEX test_idx1 ON test (id); CREATE INDEX test_idx2 ON test (id); CREATE TABLE test_copycat (LIKE test INCLUDING ALL); \d test_copycat Why do we end up with only one index on test_copycat? The culprit seems to be transformIndexConstraints(), which explains: * Scan the index list and remove any redundant index specifications. This * can happen if, for instance, the user writes UNIQUE PRIMARY KEY. A * strict reading of SQL92 would suggest raising an error instead, but * that strikes me as too anal-retentive. - tgl 2001-02-14 and this code happily throws out the second index statement in this example, since its properties are identical to the first. (Side note: some index properties, such as tablespace specification and comment, are ignored when determining duplicates). This behavior does seem like a minor POLA violation to me -- if we do not forbid duplicate indexes on the original table, it seems surprising to do so silently with INCLUDING INDEXES. There was consideration of similar behavior when this patch was proposed[1], so perhaps the behavior is as-designed, and I guess no one else has complained. IMO this behavior should at least be documented under the LIKE source_table section of CREATE TABLE's doc page. Josh [1] http://archives.postgresql.org/pgsql-patches/2007-07/msg00173.php -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] strange OOM errors with EXECUTE in PL/pgSQL
On 20.12.2012 02:29, Tom Lane wrote: Tomas Vondra t...@fuzzy.cz writes: What it does: 1) creates a simple table called test with one text column. 2) creates a plpgsql function with one parameter, and all that function does is passing the parameter to EXECUTE 3) calls the function with a string containing many INSERTs into the test table The way the EXECUTE is used is a bit awkward, but the failures seem a bit strange to me. The whole script is ~500kB and most of that is about 11k of very simple INSERT statements: insert into test(value) values (''aa''); The reason this fails is that you've got a half-megabyte source string, and each of the 11000 plans that are due to be created from it saves its own copy of the source string. Hence, 5500 megabytes needed just for source strings. We could possibly fix this by inventing some sort of reference-sharing arrangement (which'd be complicated and fragile) or by not storing the source strings with the plans (which'd deal a serious blow to our ability to provide helpful error messages). Neither answer seems appealing. Thanks for the explanation, it didn't occur to me that each plan keeps a copy of the whole source string. I think it would be a better idea to adopt a less brain-dead way of processing the data. Can't you convert this to a single INSERT with a lot of VALUES rows? Or split it into multiple EXECUTE chunks? Well, it's not my app but I'll recommend it to them. Actually I already did, but I didn't have an explanation of why it behaves like this. The really annoying bit is that in 9.1 this works fine (the approach is just as crazy, but it does not end with an OOM error). Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parser Cruft in gram.y
On Tue, Dec 18, 2012 at 10:44 PM, Robert Haas robertmh...@gmail.com wrote: Yeah, that's why I don't know how to make it work. It feels like this is partly an artifact of the tool, though. I mean, suppose we haven't read anything yet. Then, the next token can't be an IDENT, so if we see an unreserved keyword, we know we're not going to convert it to an IDENT. OTOH, if we've seen CREATE TABLE, the next token cannot be an unreserved keyword that is intended as a keyword; it has to be something that will reduce to ColId. I guess the problem situation is where we can shift the keyword and then use the following token to decide whether to reduce it to ColId/type_function_name/ColLabel or use some other rule instead; the CREATE INDEX CONCURRENTLY productions might be such a case. It seems to me the avenue for simplifying the transition table would be keywords that can never be used in the same place. That is, if we replaced all the elements of such a set with a single token then the grammar would be unambiguous and we could insert a check that the right actual token was present in each place it's used. I'm thinking of the various noise words that the SQL standard introduces which are all going to be reduced to IDENT except for a few places each of which will only admit one such noise word anyways. I think doing this manually would be unmaintainable since every time we modified the grammar it would introduce random unpredictable conflicts which would be hard to debug. But I wonder if we could preparse the transitions table, find any such large sets and rewrite either the transition table or regenerate the grammar and rerun bison on it. Alternately we could just replace the transition table with a representation that is less wasteful, such as a list of perfect hash tables just large enough to hold the valid transitions. Or even a single very large perfect hash table where the key is (state, token). But I'm not entirely convinced any of this is actually useful.
Just because the transition table is large doesn't mean it's inefficient. Most of these bytes are being wasted on transitions which can never occur or which can never occur in syntactically valid SQL. The transitions which can occur will still be present in any condensed representation we come up with. The L2 cache might still be large enough to hold these hot transitions, which might not be a very large subset at all. valgrind comes with a tool called cachegrind which can emulate the cache algorithm on some variants of various CPUs and produce reports. Can it be made to produce a report for a specific block of memory? -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] trouble with pg_upgrade 9.0 -> 9.1
On Wed, Dec 19, 2012 at 01:51:08PM +0400, Groshev Andrey wrote: Can you post the full definition of the table on this public email list? Also, why did the error think this was in the public schema? Any idea? --- 18.12.2012, 19:38, Bruce Momjian br...@momjian.us: On Mon, Dec 17, 2012 at 09:21:59PM -0500, Bruce Momjian wrote: Mismatch of relation names: database database, old rel public.lob.ВерсияВнешнегоДокумента$Документ_pkey, new rel public.plob.ВерсияВнешнегоДокумента$Документ Failure, exiting .. snip It's all what I'm found about this table. -- -- Name: lob.ВерсияВнешнегоДокумента$Документ; Type: TABLE; Schema: public; Owner: postgres; Tablespace: -- CREATE TABLE lob.ВерсияВнешнегоДокумента$Документ ( @Файл integer NOT NULL, Страница integer NOT NULL, Данные bytea ); ALTER TABLE public.lob.ВерсияВнешнегоДокумента$Документ OWNER TO postgres; -- -- Name: plob.ВерсияВнешнегоДокумента$Документ; Type: CONSTRAINT; Schema: public; Owner: postgres; Tablespace: -- ALTER TABLE ONLY lob.ВерсияВнешнегоДокумента$Документ ADD CONSTRAINT plob.ВерсияВнешнегоДокумента$Документ PRIMARY KEY (@Файл, Страница); -- -- Name: rlob.ВерсияВнешнегоДокумента$Документ-@Файл; Type: FK CONSTRAINT; Schema: public; Owner: postgres -- ALTER TABLE ONLY lob.ВерсияВнешнегоДокумента$Документ ADD CONSTRAINT rlob.ВерсияВнешнегоДокумента$Документ-@Файл FOREIGN KEY (@Файл) REFERENCES ВерсияВнешнегоДокумента$Документ(@Файл) ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE; -- -- Name: lob.ВерсияВнешнегоДокумента$Документ; Type: ACL; Schema: public; Owner: postgres -- REVOKE ALL ON TABLE lob.ВерсияВнешнегоДокумента$Документ FROM PUBLIC; REVOKE ALL ON TABLE lob.ВерсияВнешнегоДокумента$Документ FROM postgres; GRANT ALL ON TABLE lob.ВерсияВнешнегоДокумента$Документ TO postgres; GRANT SELECT ON TABLE lob.ВерсияВнешнегоДокумента$Документ TO view_user; There is another table ВерсияВнешнегоДокумента$Документ (without ^lob.) 
It is referenced by a foreign key (rlob.ВерсияВнешнегоДокумента$Документ-@Файл) But as I understand it, the problem is with the primary key. [ Sorry I have not been replying promptly. I have been sick with the flu for the past four days, and while I read the email promptly, my brain isn't sharp enough to send email out for everyone to read. I am better today so hopefully I will be 100% soon. ] OK, this tells me that the period is in the table name: -- Name: lob.ВерсияВнешнегоДокумента$Документ; Type: TABLE; Schema: public; Owner: postgres; Tablespace: I needed to check that the period wasn't a symptom of a bug. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parser Cruft in gram.y
Greg Stark st...@mit.edu writes: But I'm not entirely convinced any of this is actually useful. Just becuase the transition table is large doesn't mean it's inefficient. That's a fair point. However, I've often noticed base_yyparse() showing up rather high on profiles --- higher than seemed plausible at the time, given that its state-machine implementation is pretty tight. Now I'm wondering whether that isn't coming from cache stalls from trying to touch all the requisite parts of the transition table. valgrind comes with a tool called cachegrind which can emulate the cache algorithm on some variants of various cpus and produce reports. Can it be made to produce a report for a specific block of memory? I believe that oprofile can be persuaded to produce statistics about where in one's code are the most cache misses, not just the most wall-clock ticks; which would shed a lot of light on this question. However, my oprofile-fu doesn't quite extend to actually persuading it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] trouble with pg_upgrade 9.0 - 9.1
On Wed, Dec 19, 2012 at 12:56:05PM -0500, Kevin Grittner wrote: Groshev Andrey wrote: Mismatch of relation names: database database, old rel public.lob.ВерсияВнешнегоДокумента$Документ_pkey, new rel public.plob.ВерсияВнешнегоДокумента$Документ There is a limit on identifiers of 63 *bytes* (not characters) after which the name is truncated. In UTF8 encoding, the underscore would be in the 64th position. OK, Kevin is certainly pointing out a bug in the pg_upgrade code, though I am unclear how it would exhibit the mismatch error reported. pg_upgrade uses NAMEDATALEN for database, schema, and relation name storage lengths. While NAMEDATALEN works fine in the backend, it is possible that a frontend client, like pg_upgrade, could retrieve a name in the client encoding whose length exceeds NAMEDATALEN, if the client encoding did not match the database encoding (or is it the cluster encoding for system tables?). This would cause truncation of these values. The truncation would not cause crashes, but might cause failures by not being able to connect to overly-long database names, and it weakens the checking of relation/schema names --- the same check that is reported above. (I believe initdb.c also erroneously uses NAMEDATALEN.) For this to be the cause of the user's report, there would have to be different truncation behavior for the old and new clusters when you restore the dump. Did we change this somehow between 9.0 and 9.1? In summary, we are getting closer to a fix, but we are not there yet. I can supply a patch that removes the use of NAMEDATALEN and you can test that, but again, I don't see how that alone can cause this. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
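[Editorial sketch] The truncation hazard Bruce describes can be demonstrated outside of C. The snippet below (Python, purely illustrative; pg_upgrade's real code paths are in C) chops a multibyte identifier at NAMEDATALEN - 1 = 63 *bytes*, as a client sizing its buffers with NAMEDATALEN would, and silently loses the `_pkey` suffix of the very name from the error message:

```python
# Illustrative sketch: what happens when a client truncates an identifier
# at NAMEDATALEN - 1 = 63 *bytes* while the name arrives in a multibyte
# client encoding. This is not pg_upgrade's actual code.

NAMEDATALEN = 64  # PostgreSQL default; 63 usable bytes plus a NUL


def truncate_name(name: str, encoding: str = "utf-8") -> str:
    """Keep only the first NAMEDATALEN - 1 bytes of the encoded name."""
    raw = name.encode(encoding)[: NAMEDATALEN - 1]
    return raw.decode(encoding, errors="ignore")


pkey = "lob.ВерсияВнешнегоДокумента$Документ_pkey"

print(len(pkey))                  # 41 characters
print(len(pkey.encode("utf-8")))  # 72 bytes in UTF-8: over the 63-byte limit
print(truncate_name(pkey))        # the '_pkey' suffix is silently lost
```

In a single-byte encoding the same name is only 41 bytes and survives intact, which is why behavior can differ between clusters whose client and server encodings disagree.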
Re: [HACKERS] [GENERAL] trouble with pg_upgrade 9.0 - 9.1
On Wed, Dec 19, 2012 at 01:51:08PM +0400, Groshev Andrey wrote: Can you post the full definition of the table on this public email list? Also, why did the error think this was in the public schema? Any idea? --- 18.12.2012, 19:38, Bruce Momjian br...@momjian.us: On Mon, Dec 17, 2012 at 09:21:59PM -0500, Bruce Momjian wrote: Mismatch of relation names: database database, old rel public.lob.ВерсияВнешнегоДокумента$Документ_pkey, new rel public.plob.ВерсияВнешнегоДокумента$Документ Failure, exiting .. snip It's all what I'm found about this table. -- -- Name: lob.ВерсияВнешнегоДокумента$Документ; Type: TABLE; Schema: public; Owner: postgres; Tablespace: -- CREATE TABLE lob.ВерсияВнешнегоДокумента$Документ ( @Файл integer NOT NULL, Страница integer NOT NULL, Данные bytea ); ALTER TABLE public.lob.ВерсияВнешнегоДокумента$Документ OWNER TO postgres; -- -- Name: plob.ВерсияВнешнегоДокумента$Документ; Type: CONSTRAINT; Schema: public; Owner: postgres; Tablespace: -- ALTER TABLE ONLY lob.ВерсияВнешнегоДокумента$Документ ADD CONSTRAINT plob.ВерсияВнешнегоДокумента$Документ PRIMARY KEY (@Файл, Страница); -- -- Name: rlob.ВерсияВнешнегоДокумента$Документ-@Файл; Type: FK CONSTRAINT; Schema: public; Owner: postgres -- ALTER TABLE ONLY lob.ВерсияВнешнегоДокумента$Документ ADD CONSTRAINT rlob.ВерсияВнешнегоДокумента$Документ-@Файл FOREIGN KEY (@Файл) REFERENCES ВерсияВнешнегоДокумента$Документ(@Файл) ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE; -- -- Name: lob.ВерсияВнешнегоДокумента$Документ; Type: ACL; Schema: public; Owner: postgres -- REVOKE ALL ON TABLE lob.ВерсияВнешнегоДокумента$Документ FROM PUBLIC; REVOKE ALL ON TABLE lob.ВерсияВнешнегоДокумента$Документ FROM postgres; GRANT ALL ON TABLE lob.ВерсияВнешнегоДокумента$Документ TO postgres; GRANT SELECT ON TABLE lob.ВерсияВнешнегоДокумента$Документ TO view_user; There is another table ВерсияВнешнегоДокумента$Документ (without ^lob.) 
It is referenced by a foreign key (rlob.ВерсияВнешнегоДокумента$Документ-@Файл), but as I understand it, the problem is with the primary key. Does the old database have a table with the prefix plob., called plob.ВерсияВнешнегоДокумента$Документ? If not, and you run pg_dumpall --schema-only --binary-upgrade, is there a table with that name mentioned? -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] Set visibility map bit after HOT prune
On Thu, Dec 20, 2012 at 6:12 AM, Robert Haas robertmh...@gmail.com wrote: On Wed, Dec 19, 2012 at 12:26 PM, Pavan Deolasee pavan.deola...@gmail.com wrote: This can be handled by breaking the 1-to-1 mapping between the VM bit and the PD_ALL_VISIBLE bit, so seq scans will only look at PD_ALL_VISIBLE. It was proposed by Andres upthread, but shot down by Tom and Simon. But I still feel that was an overreaction and there is a lot of merit in the idea. As I said elsewhere, it will also help the case when there are DEAD line pointers in a page. Today we can't mark such pages all-visible, but if we break this mapping, we can do that. Sure, but you're zipping rather blithely past the disadvantages of such an approach. Hmm. You're right. I did not think about the disadvantages, and now that you mention them, I feel they are important. Jeff Davis recently proposed getting rid of PD_ALL_VISIBLE, and Tom and I both expressed considerable skepticism about that; this proposal has the same problems. One of the major benefits of PD_ALL_VISIBLE is that, when it isn't set, inserts, updates, and deletes to the page can ignore the visibility map. That means that a server under heavy concurrency is much less likely to encounter contention on the visibility map blocks. Now, maybe that's not really a problem, but I sure haven't seen enough evidence to make me believe it. If it's really true that PD_ALL_VISIBLE needn't fill this role, then Heikki wasted an awful lot of time implementing it, and I wasted an awful lot of time keeping it working when I made the visibility map crash-safe for IOS. That could be true, but I tend to think it isn't. Yeah, VM buffer contention can become prominent if we break the invariant that the page-level bit status implies the VM bit status, at least when it is clear. OTOH, IMHO we need some mechanism to address the issue of aggressive clearing of the VM bits paired with a very lame corresponding set operation.
Today we don't have much contention on the VM page, but we must be sacrificing its usability in return. IOS, as well as vacuum optimizations using the VM, will turn out not so useful for many workloads. I'm very reluctant to suggest that we can solve this by setting aside another page-level bit to track visibility of tuples for heap scans, or even a bit in the tuple header itself, to avoid repeated visibility checks for a tuple which is known to be visible to all current and future transactions. I would like to run some pgbench tests where we get the system into a steady state such that all/most updates are HOT updates (not an entirely unlikely scenario for many real-life cases), and then try running some concurrent queries which can be executed via IOS. My gut feeling is that today we will see a slow and continuous drop in performance for these queries, because IOS will slowly stop working. If there are no vacuums, I agree. And we expect vacuums to be rare or absent. AFAIR in pgbench it now takes hours for the accounts table to get chosen for vacuum, and we should be happy about that. But IOS is almost impossible for pgbench-style workloads today because of our aggressive strategy of clearing the VM bits. Thanks, Pavan -- Pavan Deolasee http://www.linkedin.com/in/pavandeolasee
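[Editorial sketch] Robert's contention argument above can be made concrete with a toy model (Python, purely illustrative; the `Page` and `VisibilityMap` classes are invented for the sketch and bear no resemblance to PostgreSQL's actual page layout or buffer locking):

```python
# Toy model of why the page-level PD_ALL_VISIBLE bit reduces visibility-map
# traffic: with the bit, only the first write to an all-visible page has to
# touch the VM; without it, every write must consult/clear the VM.
# All names here are invented for illustration.

class Page:
    def __init__(self):
        self.pd_all_visible = True  # start all-visible, as after a vacuum


class VisibilityMap:
    def __init__(self, npages):
        self.bits = [True] * npages
        self.accesses = 0  # stand-in for lock/pin traffic on VM buffers


def modify_tuple(page_no, pages, vm, use_page_bit):
    """One write (insert/update/delete) hitting the given heap page."""
    page = pages[page_no]
    if use_page_bit:
        # With the invariant: the VM is touched only on the write that
        # transitions the page out of the all-visible state.
        if page.pd_all_visible:
            page.pd_all_visible = False
            vm.bits[page_no] = False
            vm.accesses += 1
    else:
        # Without a page-level bit, every write must clear the VM bit.
        vm.bits[page_no] = False
        vm.accesses += 1


def simulate(use_page_bit, writes=1000):
    pages, vm = [Page()], VisibilityMap(1)
    for _ in range(writes):
        modify_tuple(0, pages, vm, use_page_bit)
    return vm.accesses


print(simulate(use_page_bit=True))   # 1: only the first write touches the VM
print(simulate(use_page_bit=False))  # 1000: every write touches the VM
```

Under heavy concurrent write traffic, the second mode turns the VM page into a shared hot spot, which is the contention Robert is worried about.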
Re: [HACKERS] [GENERAL] trouble with pg_upgrade 9.0 - 9.1
On Wed, Dec 19, 2012 at 10:35:11PM -0500, Bruce Momjian wrote: There is another table ВерсияВнешнегоДокумента$Документ (without ^lob.) It is referenced by a foreign key (rlob.ВерсияВнешнегоДокумента$Документ-@Файл) But as I understand it, the problem with the primary key. Does the old database have a table with prefix plob., called plob.ВерсияВнешнегоДокумента$Документ? If not, if you do pg_dumpall --schema-only --binary-upgrade, is there a table with that name mentioned? Also, when you say rlob above, is the 'r' a Latin letter that merely looks like a Russian 'р' in the error message? (In Cyrillic, a letter that looks like a Latin 'p' sounds like a Latin 'r'.) -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] Feature Request: pg_replication_master()
On Thursday, December 20, 2012 3:50 AM, Joshua Berkus wrote: It stalled because the patch author decided not to implement the request to detect recovery.conf in the data directory, which allows backwards compatibility. Well, I don't think we had agreement on how important backwards compatibility for recovery.conf was, particularly not on the whole recovery.conf/recovery.done functionality and the weird formatting of recovery.conf. However, with include_if_exists directives in postgresql.conf, or include_dir, that would be easy to work around. Don't we have something like that planned for SET PERSISTENT? Yes, the SET PERSISTENT patch uses include_dir. I wonder why we can't get this information from the WalRcvData structure? It has the required information. With Regards, Amit Kapila.
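[Editorial sketch] For readers following the thread, the postgresql.conf directives mentioned above would be used roughly like this (a sketch only; the file name recovery.conf and the conf.d directory are assumptions, and include_dir was still a pending patch when this thread was written):

```
# Pull in an optional file; silently skipped if it does not exist,
# which is what makes a drop-in recovery.conf workable.
include_if_exists = 'recovery.conf'

# Read every *.conf file in the given directory, in filename order;
# this is the mechanism the SET PERSISTENT patch relies on.
include_dir = 'conf.d'
```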
Re: [HACKERS] [GENERAL] trouble with pg_upgrade 9.0 - 9.1
No, the old database does not use a table plob.; only the primary key: -- -- Name: plob.ВерсияВнешнегоДокумента$Документ; Type: CONSTRAINT; Schema: public; Owner: postgres; Tablespace: -- -- For binary upgrade, must preserve pg_class oids SELECT binary_upgrade.set_next_index_pg_class_oid('786665369'::pg_catalog.oid); ALTER TABLE ONLY lob.ВерсияВнешнегоДокумента$Документ ADD CONSTRAINT plob.ВерсияВнешнегоДокумента$Документ PRIMARY KEY (@Файл, Страница); 20.12.2012, 06:35, Bruce Momjian br...@momjian.us: [quoted message trimmed]
Re: [HACKERS] Set visibility map bit after HOT prune
On Thursday, December 20, 2012 6:14 AM, Robert Haas wrote: On Wed, Dec 19, 2012 at 12:39 PM, Simon Riggs si...@2ndquadrant.com wrote: The benefit of saying that only UPDATEs clean the block is that this penalises only the workload making the mess, rather than everybody cleaning up repeatedly after one messy guy. Right, but there are plenty of situations where having everybody clean up after the messy guy is better than waiting around and hoping that Mom (aka vacuum) will do it. If we look at the similar situation in indexes: during an index scan, we just mark the tuple as DEAD without taking an exclusive lock, and then during a split (when we already hold an exclusive lock) the actual space is freed. So I am not sure it's a good idea to take an exclusive lock for cleanup during a heap scan, where write operations happen more frequently and have a better chance of doing the cleanup. With Regards, Amit Kapila.
Re: [HACKERS] [GENERAL] trouble with pg_upgrade 9.0 - 9.1
No; a person might confuse similar-looking letters, but these names are generated by a computer. Unfortunately, I have not been able to find the developer of this database, but as I understand it the logic was: plob - primary key (lob ~ BLOB); rlob - reference key (lob ~ BLOB). Maybe if I describe the task this part of the database handles, the problem will become clear. We need to maintain external documents (binary scans, stored per page). Therefore, there is a table to store the headers and a table to store the binary data. To make it easier to read, I have replaced all the Russian words with translated ones. This is the table for storing the headers: -- Table: VersionOfTheExternalDocument$Document -- DROP TABLE VersionOfTheExternalDocument$Document; CREATE TABLE VersionOfTheExternalDocument$Document ( @File integer NOT NULL DEFAULT nextval((pg_get_serial_sequence('public.VersionOfTheExternalDocument$Document'::text, '@File'::text))::regclass), GUID uuid, DataTime timestamp without time zone DEFAULT (now())::timestamp without time zone, Name character varying, Size integer, CONSTRAINT VersionOfTheExternalDocument$Document_pkey PRIMARY KEY (@File) ) WITH ( OIDS=FALSE ); ALTER TABLE VersionOfTheExternalDocument$Document OWNER TO postgres; GRANT ALL ON TABLE VersionOfTheExternalDocument$Document TO postgres; GRANT SELECT ON TABLE VersionOfTheExternalDocument$Document TO view_user; -- Index: iVersionOfTheExternalDocument$Document-blb_header -- DROP INDEX iVersionOfTheExternalDocument$Document-blb_header; CREATE INDEX iVersionOfTheExternalDocument$Document-blb_header ON VersionOfTheExternalDocument$Document USING btree (GUID, @File, DataTime) WHERE GUID IS NOT NULL; --- And this is the one for the data.
-- Table: lob.VersionOfTheExternalDocument$Document -- DROP TABLE lob.VersionOfTheExternalDocument$Document; CREATE TABLE lob.VersionOfTheExternalDocument$Document ( @File integer NOT NULL, Page integer NOT NULL, Data bytea, CONSTRAINT lob.VersionOfTheExternalDocument$Document_pkey PRIMARY KEY (@File, Page), CONSTRAINT rlob.VersionOfTheExternalDocument$Document-@File FOREIGN KEY (@File) REFERENCES VersionOfTheExternalDocument$Document (@File) MATCH SIMPLE ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE INITIALLY IMMEDIATE ) WITH ( OIDS=FALSE ); ALTER TABLE lob.VersionOfTheExternalDocument$Document OWNER TO postgres; GRANT ALL ON TABLE lob.VersionOfTheExternalDocument$Document TO postgres; GRANT SELECT ON TABLE lob.VersionOfTheExternalDocument$Document TO view_user; 20.12.2012, 07:12, Bruce Momjian br...@momjian.us: [quoted message trimmed]
Re: [HACKERS] [GENERAL] trouble with pg_upgrade 9.0 - 9.1
I initialized the data dir using ru_RU.UTF8, but this database uses CP1251, i.e. one byte per character. 19.12.2012, 21:47, Tom Lane t...@sss.pgh.pa.us: Kevin Grittner kgri...@mail.com writes: Groshev Andrey wrote: Mismatch of relation names: database database, old rel public.lob.ВерсияВнешнегоДокумента$Документ_pkey, new rel public.plob.ВерсияВнешнегоДокумента$Документ There is a limit on identifiers of 63 *bytes* (not characters) after which the name is truncated. In UTF8 encoding, the underscore would be in the 64th position. Hmm ... that is a really good point, except that you are not counting the lob. or plob. part, which we previously saw is part of the relation name, not the schema name. Counting that part, it's already over the limit, which seems to be proof that Andrey isn't using UTF8 but some single-byte encoding. Anyway, that would only explain the issue if pg_upgrade were somehow changing the database encoding, which surely we'd have heard complaints about already? Or maybe this has something to do with pg_upgrade's client-side encoding rather than the server encoding... regards, tom lane
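[Editorial sketch] Tom's arithmetic is easy to verify. A quick check (Python, illustrative only) of the two identifiers from the error message confirms that both names fit within 63 bytes under a single-byte Cyrillic encoding such as CP1251, but both exceed the limit under UTF-8:

```python
# Verify the byte counts behind Tom's argument: the full relation names
# exceed the 63-byte identifier limit in UTF-8 (Cyrillic = 2 bytes/char),
# but fit comfortably in the single-byte CP1251 encoding.

NAME_LIMIT = 63  # NAMEDATALEN - 1

names = [
    "lob.ВерсияВнешнегоДокумента$Документ_pkey",
    "plob.ВерсияВнешнегоДокумента$Документ",
]

for name in names:
    utf8_len = len(name.encode("utf-8"))
    cp1251_len = len(name.encode("cp1251"))
    print(name, utf8_len, cp1251_len)
    # Both names are over NAME_LIMIT in UTF-8 and under it in CP1251,
    # so unmangled names like these imply a single-byte server encoding.
```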
Re: [HACKERS] FDW: ForeignPlan and parameterized paths
I intentionally did the nestloop_params substitution after calling GetForeignPlan, not before. It's not apparent to me why it would be useful to do it before, because the FDW is going to have no idea what those params represent. (Note that they represent values coming from some other, probably local, relation; not from the foreign table.) Even if the FDW has no idea what they represent, it can identify a clause of the form Var Operator Param, which allows it to store the param reference (paramid) for retrieving the param value at execution time. If the chosen best path is a parameterized path that has been built by the FDW, this allows it to push down the restriction. If this isn't possible, the only way I found to use those clauses would be at scan time. Let's assume atable is a local relation and aftable is a foreign table, and the query looks like this: select * from atable t1 inner join aftable t2 on t1.c1 = t2.c1 The FDW identifies the join clause on its column c1 and builds a parameterized path on this column (maybe because this column is unique and indexed on the remote side). The planner chooses this path, building a nested loop that rescans the foreign table with the parameter value reflecting the outer relation's value (maybe because the local relation is much smaller than the remote relation). In that case, it seems particularly important to have access to the clause, so that the nested loop can work as intended: avoiding a full seqscan on the remote side. Or is there another way to achieve the same goal? Regards, -- Ronan Dunklau
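[Editorial sketch] The cost difference Ronan is describing can be sketched with a toy model (Python, purely illustrative; the names remote_seqscan and remote_param_lookup are invented, and a real FDW works through the planner/executor hooks, not like this):

```python
# Toy model of the two ways a nested loop can drive a foreign scan:
# an unparameterized path (full remote seqscan on every rescan) versus a
# parameterized path (push 'c1 = $1' down and fetch a single row).
# All names are invented for illustration.

remote_aftable = {i: ("row", i) for i in range(1000)}  # stand-in for aftable
rows_fetched = {"count": 0}


def remote_seqscan():
    """Unparameterized path: every rescan pulls the whole remote table."""
    rows_fetched["count"] += len(remote_aftable)
    return list(remote_aftable.items())


def remote_param_lookup(c1):
    """Parameterized path: the restriction is pushed to the remote side."""
    rows_fetched["count"] += 1
    return remote_aftable.get(c1)


atable = [1, 7, 42]  # the small local outer relation

# Nested loop with the parameterized inner path: one remote lookup per
# outer row, instead of len(atable) full scans of the remote table.
joined = [(c1, remote_param_lookup(c1)) for c1 in atable]
print(rows_fetched["count"])  # 3 rows fetched instead of 3 * 1000
```

This is exactly why losing access to the Var Operator Param clause forces the remote side back to full seqscans on every rescan.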
Re: [HACKERS] operator dependency of commutator and negator, redux
On 20 December 2012 11:51, Tom Lane t...@sss.pgh.pa.us wrote: While reconsidering the various not-too-satisfactory fixes we thought of back then, I had a sudden thought. Instead of having a COMMUTATOR or NEGATOR forward reference create a shell operator and link to it, why not simply *ignore* such references? Then when the second operator is defined, go ahead and fill in both links? Ignore with warning sounds pretty good. So it would go something like this? # CREATE OPERATOR (... COMMUTATOR ); WARNING: COMMUTATOR (foo, foo) undefined, ignoring. CREATE OPERATOR # CREATE OPERATOR (... COMMUTATOR ); CREATE OPERATOR Cheers, BJ