Re: [HACKERS] Timing events WIP v1

2012-12-12 Thread Pavel Stehule
2012/12/12 Greg Smith g...@2ndquadrant.com:
 On 11/20/12 8:02 PM, Craig Ringer wrote:

 On 11/20/2012 04:48 PM, Pavel Stehule wrote:


 I don't like to see current hstore in core - It doesn't support
 nesting, it doesn't support different datatypes, it is not well
 supported by plpgsql


 ... or by many client libraries, though converting hstore values to json
 notation for output usually takes care of that.


 The more I look at this, the less comfortable I am moving forward with hand
 waving this issue away.  The obvious but seemingly all troublesome options:

 -Store things internally using hstore.  This seems to require pulling hstore
 fully into core, which has this list of issues.  Outputting through a JSON
 filter seems like a lot of overhead.  And even if the output concerns are
 addressed, if there's a problem processing the raw data with PL/pgSQL that
 would be bad.  I didn't think about that at all until Pavel brought it up.

 -Go back to a simpler composite type idea for storing this data.  It feels
 wrong to create something that looks a lot like hstore data, but isn't.
 There's probably subtle display/client/processing issues in there I haven't
 thought of too.

 -Try to store the data in a JSON friendly way in the first place.

 That last one seems to lead right to...

 I can't help but suspect that hstore will be subsumed by native storage
 and indexing of json objects, since these are already widely supported
 in KV or document stores. Of course, that requires that we come to some
 agreement on how to represent literal scalars for interaction with json,
 which hasn't seen much progress.


 If this is the Right Way to solve this problem, that puts a serious snag
 in doing something useful with Timing Events in the near future.  I know
 better than to try and fight against following the correct path once it's
 identified.  I can't claim any good expertise on client encoding/transfer
 issues though; that's not a strong area for me.


I haven't watched this discussion the whole time, but I don't understand why
you need a complex data type.

Maybe you can normalize the data for storage, and the complex data types can live only in memory.

Regards

Pavel

 --
 Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
 PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP PATCH] for Performance Improvement in Buffer Management

2012-12-12 Thread Amit Kapila
On Wednesday, December 12, 2012 5:23 AM Greg Smith wrote:
 On 11/23/12 5:57 AM, Amit kapila wrote:
  Let us try to see by example:
  Total RAM - 22G
  Database size - 16G
 ...
  Case -2 (Shared Buffers - 10G)
  a. Load all the files in OS buffers. In best case OS buffers can
 contain 10-12G data as OS has 12G of memory available.
  b. Try to load all in Shared buffers. Last 10G will be there in shared
 buffers.
  c. Now, as there is no direct correlation of data between Shared
 Buffers and OS buffers, whenever PG has to access any data
  which is not there in Shared Buffers, there is a good chance that
 it will lead to IO.
 
 I don't think either of these examples is very representative of
 real-world behavior.  The idea of "load all the files in OS buffers"
 assumes someone has used a utility like pg_prewarm or pgfincore.  It's
 not something that happens in normal use.  Being able to re-populate all
 of RAM using those utilities isn't realistic anyway.  Anyone who tries
 to load more than (memory - shared_buffers) that way is likely to be
 disappointed by the result.

True, I also think nobody will directly try to do it this way, but similar
situations can arise after a long run, for example if we assume the most
used pages fall within the range of RAM.

 
 Similarly, the problem you're describing here has been described as the
 "double buffering" one for a while now.  The old suggestion that
 shared_buffers not be set above 25% of RAM comes from this sort of
 concern.  If triggering a problem requires doing that, essentially
 misconfiguring the server, it's hard to get too excited about it.
 
 Anyway, none of that impacts on me mixing testing for this into what I'm
 working on.  The way most pgbench tests happen, it's hard to *not* have
 the important data in cache.  Once you run the init step, you have to
 either reboot or drop the OS cache to get those pages out of RAM.  That
 means the sort of cached setup you're using pg_prewarm to
 simulate--things are in the OS cache, but not the PostgreSQL one--is one
 that anyone running an init/test pair will often create.  You don't need
 pg_prewarm to do it.  

The way I have run the tests is to try to simulate scenarios where
invalidating buffers by bgwriter/checkpoint can have an advantage.

With Regards,
Amit Kapila.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical decoding exported base snapshot

2012-12-12 Thread Andres Freund
On 2012-12-11 21:05:51 -0500, Joachim Wieland wrote:
 On Tue, Dec 11, 2012 at 6:52 PM, Andres Freund and...@2ndquadrant.com wrote:
  One problem I see is that while exporting a snapshot solves the
  visibility issues of the table's contents, it does not protect against
  schema changes. I am not sure whether that's a problem.
 
  If somebody runs a CLUSTER or something like that, the table's contents
  will be preserved including MVCC semantics. That's fine.
  The more problematic cases I see are TRUNCATE, DROP and ALTER
  TABLE.

 This is why the pg_dump master process executes a

 lock table <table> in access share mode

 for every table, so your commands would all block.

 In fact it's even more complicated because the pg_dump worker
 processes also need to lock the table. They try to get a similar lock
 in NOWAIT mode right before dumping the table. If they don't get the
 lock that means that somebody else is waiting for an exclusive lock
 (this is the case you describe) and the backup will fail.

[Tom explains why this is problematic way better than I do, see his
email]

A trivial example - you can play nastier games than that though:

S1: CREATE TABLE test(id int);
S1: INSERT INTO test VALUES(1), (2);
S1: BEGIN;
S1: ALTER TABLE test RENAME COLUMN id TO id2;
S2: pg_dump
S2: waits
S1: COMMIT;
pg_dump: [archiver (db)] query failed: ERROR:  column "id" of relation "test" does not exist
pg_dump: [archiver (db)] query was: COPY public.test (id) TO stdout;

  Problem 2:
 
  To be able to do a SET TRANSACTION SNAPSHOT the source transaction
  needs to be alive. That's currently solved by exporting the snapshot in
  the walsender connection that did the INIT_LOGICAL_REPLICATION. The
  question is how long should we preserve that snapshot?

 You lost me on this one after the first sentence... But in general the
 snapshot isn't so much a magical thing: as long as the exporting
 transaction is open, it guarantees that there is a transaction out
 there that is holding off vacuum from removing data and it's also
 guaranteeing that the snapshot as is has existed at some time in the
 past.

 Once it is applied to another transaction you now have two
 transactions that will hold off vacuum because they share the same
 xmin,xmax values. You could also just end the exporting transaction at
 that point.

I should probably have given a bit more context here... All this is in
the context of decoding WAL back into something useful for the purpose
of replication. This is done in the already existing walsender process,
using the commands I explained in the original post.
What we do there is walk the WAL from some point until we can assemble
a snapshot for exactly that LSN. Several preconditions need to be met
there (like a low enough xmin horizon, so the old catalog entries are
still around). Using that snapshot we can now decode the entries stored
in the WAL.
If you set up a new replica though, getting all changes from one point
where you're consistent isn't all that interesting if you cannot also
get a backup from exactly that point in time, because otherwise it's very
hard to start with something sensible on the standby without locking
everything down. So the idea is that once we have found that consistent
snapshot we also export it. Only that we are in a walsender connection,
which doesn't expose transactional semantics, so the snapshot cannot be
deallocated by COMMITing.

So the question is basically about how to design the user interface of
that deallocation.

Makes slightly more sense now?

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] bulk_multi_insert infinite loops with large rows and small fill factors

2012-12-12 Thread David Gould

COPY IN loops in heap_multi_insert() extending the table until it fills the
disk when trying to insert a wide row into a table with a low fill-factor.
Internally fill-factor is implemented by reserving some space on a
page. For large enough rows and small enough fill-factor bulk_multi_insert()
can't fit the row even on a new empty page, so it keeps allocating new pages
but is never able to place the row. It should always put at least one row on
an empty page.

In the excerpt below saveFreeSpace is the reserved space for the fill-factor.

while (ndone < ntuples)
{ ...
/*
 * Find buffer where at least the next tuple will fit.  If the page is
 * all-visible, this will also pin the requisite visibility map page.
 */
buffer = RelationGetBufferForTuple(relation, heaptuples[ndone]->t_len,
...
/* Put as many tuples as fit on this page */
for (nthispage = 0; ndone + nthispage < ntuples; nthispage++)
{
HeapTuple   heaptup = heaptuples[ndone + nthispage];

if (PageGetHeapFreeSpace(page) < MAXALIGN(heaptup->t_len) + saveFreeSpace)
break;

RelationPutHeapTuple(relation, buffer, heaptup);
}
...Do a bunch of dirtying and logging etc ...
 }
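
To make the failure mode concrete, here is a back-of-the-envelope sketch of that
check for an empty page with fillfactor=10 (not PostgreSQL code; it assumes the
default BLCKSZ of 8192, a 24-byte page header, 8-byte MAXALIGN, and ignores
line-pointer overhead):

#include <stdio.h>

#define BLCKSZ 8192                 /* assumed default block size */
#define PAGE_HEADER_SZ 24           /* assumed page header size */
#define MAXALIGN(x) (((x) + 7) & ~((size_t) 7))

int main(void)
{
    int     fillfactor = 10;
    size_t  free_on_empty_page = BLCKSZ - PAGE_HEADER_SZ;         /* 8168 */
    size_t  saveFreeSpace = BLCKSZ * (100 - fillfactor) / 100;     /* 7372 */
    size_t  tuple_len = 2000;                                      /* a "wide" row */

    /* The check in the loop above: the tuple is skipped unless it still
     * leaves saveFreeSpace free, even though the page is completely empty. */
    if (free_on_empty_page < MAXALIGN(tuple_len) + saveFreeSpace)
        printf("tuple of %zu bytes never fits: %zu < %zu, so no tuple is placed\n",
               tuple_len, free_on_empty_page,
               MAXALIGN(tuple_len) + saveFreeSpace);
    return 0;
}

Since the inner loop then places zero tuples, ndone never advances and the outer
while loop keeps asking for yet another new page.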

This was introduced in 9.2 as part of the bulk insert speedup.

One more point, in the case where we don't insert any rows, we still do all the
dirtying and logging work even though we did not modify the page. I have tried
to skip all this if no rows are added (nthispage == 0), but my access method foo
is sadly out of date, so someone should take a skeptical look at that.

A test case and patch against 9.2.2 is attached. It fixes the problem and passes
make check. Most of the diff is just indentation changes. Whoever tries this will
want to test this on a small partition by itself.

-dg

-- 
David Gould  510 282 0869 da...@sonic.net
If simplicity worked, the world would be overrun with insects.
\set ECHO all
\timing on
drop table if exists big_sparse;
CREATE TABLE big_sparse (
u_id integer NOT NULL,
c_id integer DEFAULT 0 NOT NULL,
a_id integer DEFAULT 0 NOT NULL,
cr_id integer DEFAULT 0 NOT NULL,
b integer,
c_t_u text,
l_date abstime DEFAULT now(),
c_b double precision DEFAULT 0.0,
d_b double precision,
c_s text,
c_s_u abstime,
c_e_u abstime,
co_b_r_ts abstime,
e_a_s integer,
c_c_i_l text,
c_c_e_l text,
ag_c_i_l text,
ag_c_e_l text,
c_c_t_l text,
c_r_t_l text,
c_m_t_l text,
c_ci_t_l text,
ag_c_t_l text,
ag_r_t_l text,
ag_m_t_l text,
ag_ci_t_l text,
c_l_t_l text,
ag_l_t_l text,
r_s_id smallint DEFAULT 1 NOT NULL,
r_s text NOT NULL,
i_u_l text,
e_u_l text,
f_l text,
f smallint,
e_s_l_e text,
r_h_f_n text,
cr_h smallint,
cr_w smallint,
n_b integer,
cr_a_l text,
e_id bigint NOT NULL,
a_ex text,
ca_b_a real[],
su smallint DEFAULT 0 NOT NULL,
h_b_f real[],
p_id integer,
ca_b_aj real[],
ca_b_aj_i integer[]
)
WITH (fillfactor=10);

COPY big_sparse FROM stdin;
1688	700032834	700580073	704810483	25	foobar.grotxbarb/xdexyzzyen.xyzzyv=xxixsrc=201002RAA9	2011-11-12 05:00:01+00	0	500	i	2010-10-31 04:00:00+00	\N	2011-11-12 05:00:01+00	113	\N	\N	\N	\N	US	\N	\N	\N	\N	\N	\N	\N	en	\N	5	Rxyzzyn	\N	foobar./|foobar.xyzzyatxyzzy.xyzzyk/|foobar.grotxyzzyroupbarb/|grotxyzzxyzzybarb|foobar.grotxyxyzzybarb/|foobar.xyzzyxyzzyily.xyzzy/|grotxyzzyancebarb|grotxyzzydog|foobar.grotxyzzyxyzzysbarbdog/|foobar.grotxyzzyaobarb/|grotxyzzxyzzywsbarb|foobar.xyzzyun.xyzzy/|foobar.grotogejbarb/|foobar.grotxyzzxyzzybarb/|foobar.grotogcdbarbdog/|grotxyzzyetbarb|foobar.grotxyzzyipobarb/|foobar.grotxyzzyaobarb/|grotxyzzyxyzzyiabarb|grotxyzzybarbdog	xyzzyign	2	1:1291|0|0 ABC DEF 36|0|0 OR 1290|0|0 ABC DEF 36|0|0 OR 13|0|0 ABC DEF 36|0|0 OR 84|2592000|0 ABC DEF 36|0|0 OR 83|2592000|0 ABC DEF 36|0|0 OR 82|2592000|0 ABC DEF 36|0|0 OR 12|0|0 ABC DEF 36|0|0	\N	0	0	25	\N	0	\N	\N	0	{1,1,1,1,1,0.20003,0.40006,0.60024,0.80012,1,1,1,1,1,1,1!
 ,1,1,1,1,1,1,1,1}	\N	\N	\N
\.

*** postgresql-9.2.2/src/backend/access/heap/heapam.c	2012-12-03 12:16:10.0 -0800
--- postgresql-9.2.2dg/src/backend/access/heap/heapam.c	2012-12-12 01:55:58.174653706 -0800
***
*** 2158,2163 
--- 2158,2164 
  		Buffer		buffer;
  		Buffer		vmbuffer = InvalidBuffer;
  		bool		all_visible_cleared = false;
+ 		bool		page_is_empty;
  		int			nthispage;
  
  		/*
***
*** 2173,2299 
  		START_CRIT_SECTION();
  
  		/* Put as many tuples as fit on this page */
  		for (nthispage = 0; ndone + nthispage < ntuples; nthispage++)
  		{
  			HeapTuple	heaptup = heaptuples[ndone + nthispage];
  
! 			if (PageGetHeapFreeSpace(page) < MAXALIGN(heaptup->t_len) + saveFreeSpace)
  break;
! 
  			RelationPutHeapTuple(relation, buffer, heaptup);
  		}
 

Re: [HACKERS] Logical decoding exported base snapshot

2012-12-12 Thread Andres Freund
On 2012-12-11 22:20:18 -0500, Tom Lane wrote:
 Joachim Wieland j...@mcknight.de writes:
  On Tue, Dec 11, 2012 at 6:52 PM, Andres Freund and...@2ndquadrant.com 
  wrote:
  One problem I see is that while exporting a snapshot solves the
  visibility issues of the table's contents, it does not protect against
  schema changes. I am not sure whether that's a problem.
 
  If somebody runs a CLUSTER or something like that, the table's contents
  will be preserved including MVCC semantics. That's fine.
  The more problematic cases I see are TRUNCATE, DROP and ALTER
  TABLE.

  This is why the pg_dump master process executes a
  lock table table in access share mode
  for every table, so your commands would all block.

 A lock doesn't protect against schema changes made before the lock was
 taken.  The reason that the described scenario is problematic is that
 pg_dump is going to be expected to work against a snapshot made before
 it gets a chance to take those table locks.  Thus, there's a window
 where DDL is dangerous, and will invalidate the dump --- perhaps without
 any warning.

Exactly.

 Now, we have this problem today, in that pg_dump has to read pg_class
 before it can take table locks so some window exists already.

Yea. And if somebody else already has conflicting locks it's pretty damn
easy to hit, as shown in my answer to Joachim... I am pretty sure there
are lots of nasty silent variants.

 What's
 bothering me about what Andres describes is that the window for trouble
 seems to be getting much bigger.

Me too. If it only were clearly visible errors I wouldn't be bothered
too much, but ...

This morning I wondered whether we couldn't protect against that by
acquiring share locks on the catalog rows pg_dump reads; that would
result in "could not serialize access due to concurrent update" type
errors, which would be easily enough discernible/translatable.
While pretty damn ugly, that should take care of most of those issues,
shouldn't it?

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical decoding exported base snapshot

2012-12-12 Thread Andres Freund
On 2012-12-11 22:39:14 -0500, Steve Singer wrote:
 On 12-12-11 06:52 PM, Andres Freund wrote:
 Hi,

 
 Problem 1:
 
 One problem I see is that while exporting a snapshot solves the
  visibility issues of the table's contents, it does not protect against
  schema changes. I am not sure whether that's a problem.
 
 If somebody runs a CLUSTER or something like that, the table's contents
 will be preserved including MVCC semantics. That's fine.
 The more problematic cases I see are TRUNCATE, DROP and ALTER
 TABLE. Imagine the following:
 
 S1: INIT_LOGICAL_REPLICATION
 S1: get snapshot: 0333-1
 S2: ALTER TABLE foo ALTER COLUMN blub TYPE text USING (blub::text);
 S3: pg_dump --snapshot 0333-1
 S1: START_LOGICAL_REPLICATION
 
 In that case the pg_dump would dump foo using the schema *after* the
 ALTER TABLE but showing only rows visible to our snapshot. After
 START_LOGICAL_REPLICATION all changes after the xlog position from
 INIT_LOGICAL_REPLICATION will be returned though - including all the
 tuples from the ALTER TABLE and potentially - if some form of schema
 capturing was in place - the ALTER TABLE itself. The copied schema would
 have the new format already though.
 
  Does anybody see that as a problem or is it just a case of PEBKAC? One
  argument for the latter is that that's already a problematic case for
  normal pg_dumps. It's just that the window is a bit larger here.

 Is there any way to detect this situation as part of the pg_dump?  If I could
 detect this, abort my pg_dump and then go and get a new snapshot, then I don't
 see this as a big deal.  I can live with telling users "don't do DDL-like
 things while subscribing a new node; if you do, the subscription will
 restart." I am less keen on telling users "don't do DDL-like things while
 subscribing a new node or the results will be unpredictable."

I am trying to think of an unintrusive way to detect this.

 I'm trying to convince myself that I will be able to take a pg_dump from an
 exported snapshot, plus the changes made in between the snapshot and
 some later time, and turn the results into a consistent database.  What if
 something like this comes along

 ...

 I'm worried that it will be difficult to programmatically stitch together the
 inconsistent snapshot from the pg_dump plus the logical records generated in
 between the snapshot and the dump (along with any record of the DDL if it
 exists).

I think trying to solve that in the replication solution is a bad
idea. There are too many possible scenarios and some of them very subtle
and hard to detect.

So I think it's either:
1) find something that tests for this and abort if so
2) acquire locks earlier, preventing DDL altogether until pg_dump starts
3) don't care. The problem exists today and not many people have
   complained.

 Problem 2:
 
 Control Flow.
 
 To be able to do a SET TRANSACTION SNAPSHOT the source transaction
 needs to be alive. That's currently solved by exporting the snapshot in
 the walsender connection that did the INIT_LOGICAL_REPLICATION. The
 question is how long should we preserve that snapshot?
 
 There is no requirement - and there *cannot* be one in the general case,
 the slot needs to be usable after a restart - that
 START_LOGICAL_REPLICATION has to be run in the same replication
 connection as INIT.
 
 Possible solutions:
 1) INIT_LOGICAL_REPLICATION waits for an answer from the client that
 confirms that logical replication initialization is finished. Before
 that the walsender connection cannot be used for anything else.
 
 2) we remove the snapshot as soon as any other command is received; this
 way the replication connection stays usable, e.g. to issue a
 START_LOGICAL_REPLICATION in parallel to the initial data dump. In that
 case the snapshot would have to be imported *before* the next command
 was received as SET TRANSACTION SNAPSHOT requires the source transaction
 to be still open.

 Option 2 sounds more flexible.  Is it more difficult to implement?

No, I don't think so. It's a bit more intrusive in that it requires
knowledge about logical replication in more parts of walsender, but it
should be ok.

Note, btw, that my description of 1) was easy to misunderstand. The
"that" in "Before that the walsender connection cannot be used for
anything else." is the answer from the client, not the usage of the
exported snapshot. Once the snapshot has been imported into other
session(s) the source doesn't need to be alive anymore.
Does that explanation change anything?

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: bulk_multi_insert infinite loops with large rows and small fill factors

2012-12-12 Thread Andres Freund
On 2012-12-12 03:04:19 -0800, David Gould wrote:

 COPY IN loops in heap_multi_insert() extending the table until it fills the
 disk when trying to insert a wide row into a table with a low fill-factor.
 Internally fill-factor is implemented by reserving some space on a
 page. For large enough rows and small enough fill-factor bulk_multi_insert()
 can't fit the row even on a new empty page, so it keeps allocating new pages
 but is never able to place the row. It should always put at least one row on
 an empty page.

Heh. Nice one. Did you hit that in practice?

 One more point, in the case where we don't insert any rows, we still do all 
 the
 dirtying and logging work even though we did not modify the page. I have tried
 to skip all this if no rows are added (nthispage == 0), but my access method foo
 is sadly out of date, so someone should take a skeptical look at that.

 A test case and patch against 9.2.2 is attached. It fixes the problem and passes
 make check. Most of the diff is just indentation changes. Whoever tries this will
 want to test this on a small partition by itself.

ISTM this would be fixed with a smaller footprint by just making

if (PageGetHeapFreeSpace(page) < MAXALIGN(heaptup->t_len) + saveFreeSpace)

if (!PageIsEmpty(page) &&
    PageGetHeapFreeSpace(page) < MAXALIGN(heaptup->t_len) + saveFreeSpace)

I think that should work?

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: bulk_multi_insert infinite loops with large rows and small fill factors

2012-12-12 Thread Heikki Linnakangas

On 12.12.2012 13:27, Andres Freund wrote:

On 2012-12-12 03:04:19 -0800, David Gould wrote:

One more point, in the case where we don't insert any rows, we still do all the
dirtying and logging work even though we did not modify the page. I have tried
to skip all this if no rows are added (nthispage == 0), but my access method foo
is sadly out of date, so someone should take a skeptical look at that.

A test case and patch against 9.2.2 is attached. It fixes the problem and passes
make check. Most of the diff is just indentation changes. Whoever tries this will
want to test this on a small partition by itself.


ISTM this would be fixed with a smaller footprint by just making

if (PageGetHeapFreeSpace(page) < MAXALIGN(heaptup->t_len) + saveFreeSpace)

if (!PageIsEmpty(page) &&
    PageGetHeapFreeSpace(page) < MAXALIGN(heaptup->t_len) + saveFreeSpace)

I think that should work?


Yeah, seems that it should, although PageIsEmpty() is no guarantee that 
the tuple fits, because even though PageIsEmpty() returns true, there 
might be dead line pointers consuming so much space that the tuple at 
hand doesn't fit. However, RelationGetBufferForTuple() won't return such 
a page, it guarantees that the first tuple does indeed fit on the page 
it returns. For the same reason, the later check that at least one tuple 
was actually placed on the page is not necessary.


I committed a slightly different version, which unconditionally puts the 
first tuple on the page, and only applies the freespace check to the 
subsequent tuples. Since RelationGetBufferForTuple() guarantees that the 
first tuple fits, we can trust that, like heap_insert does.


--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2172,8 +2172,12 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,

/* NO EREPORT(ERROR) from here till changes are logged */
START_CRIT_SECTION();

-   /* Put as many tuples as fit on this page */
-   for (nthispage = 0; ndone + nthispage < ntuples; nthispage++)
+   /*
+* RelationGetBufferForTuple has ensured that the first tuple fits.
+* Put that on the page, and then as many other tuples as fit.
+*/
+   RelationPutHeapTuple(relation, buffer, heaptuples[ndone]);
+   for (nthispage = 1; ndone + nthispage < ntuples; nthispage++)
{
HeapTuple   heaptup = heaptuples[ndone + nthispage];


Thanks for the report!

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: bulk_multi_insert infinite loops with large rows and small fill factors

2012-12-12 Thread David Gould
On Wed, 12 Dec 2012 12:27:11 +0100
Andres Freund and...@2ndquadrant.com wrote:

 On 2012-12-12 03:04:19 -0800, David Gould wrote:
 
  COPY IN loops in heap_multi_insert() extending the table until it fills the

 Heh. Nice one. Did you hit that in practice?

Yeah, with a bunch of hosts that run postgres on a ramdisk, and that copy
happens late in the initial setup script for new hosts. The first batch of
new hosts to be set up with 9.2 filled the ramdisk, OOMed and fell over
within a minute. Since the script sets up a lot of stuff we had no idea
at first who OOMed.

 ISTM this would be fixed with a smaller footprint by just making
 
 if (PageGetHeapFreeSpace(page) < MAXALIGN(heaptup->t_len) + saveFreeSpace)
 
 if (!PageIsEmpty(page) &&
     PageGetHeapFreeSpace(page) < MAXALIGN(heaptup->t_len) + saveFreeSpace)
 
 I think that should work?

I like PageIsEmpty() better (and would have used it if I had known), but I'm not
so crazy about the negation.

-dg

-- 
David Gould  510 282 0869 da...@sonic.net
If simplicity worked, the world would be overrun with insects.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: bulk_multi_insert infinite loops with large rows and small fill factors

2012-12-12 Thread Heikki Linnakangas

On 12.12.2012 14:17, David Gould wrote:

On Wed, 12 Dec 2012 12:27:11 +0100
Andres Freundand...@2ndquadrant.com  wrote:


On 2012-12-12 03:04:19 -0800, David Gould wrote:


COPY IN loops in heap_multi_insert() extending the table until it fills the



Heh. Nice one. Did you hit that in practice?


Yeah, with a bunch of hosts that run postgres on a ramdisk, and that copy
happens late in the initial setup script for new hosts. The first batch of
new hosts to be set up with 9.2 filled the ramdisk, OOMed and fell over
within a minute. Since the script sets up a lot of stuff we had no idea
at first who OOMed.


The bug's been fixed now, but note that huge tuples like this will 
always cause the table to be extended. Even if there are completely 
empty pages in the table, after a vacuum. Even a completely empty 
existing page is not considered spacious enough in this case, because 
it's still too small when you take fillfactor into account, so the 
insertion will always extend the table. If you regularly run into this 
situation, you might want to raise your fillfactor..


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: bulk_multi_insert infinite loops with large rows and small fill factors

2012-12-12 Thread David Gould
On Wed, 12 Dec 2012 13:56:08 +0200
Heikki Linnakangas hlinnakan...@vmware.com wrote:

 However, RelationGetBufferForTuple() won't return such 
 a page, it guarantees that the first tuple does indeed fit on the page 
 it returns. For the same reason, the later check that at least one tuple 
 was actually placed on the page is not necessary.
 
 I committed a slightly different version, which unconditionally puts the 
 first tuple on the page, and only applies the freespace check to the 
 subsequent tuples. Since RelationGetBufferForTuple() guarantees that the 
 first tuple fits, we can trust that, like heap_insert does.
 
 --- a/src/backend/access/heap/heapam.c
 +++ b/src/backend/access/heap/heapam.c
 @@ -2172,8 +2172,12 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
   /* NO EREPORT(ERROR) from here till changes are logged */
   START_CRIT_SECTION();
 
 - /* Put as many tuples as fit on this page */
 - for (nthispage = 0; ndone + nthispage < ntuples; nthispage++)
 + /*
 +  * RelationGetBufferForTuple() has ensured that the first tuple fits.
 +  * Put that on the page, and then as many other tuples as fit.
 +  */
 + RelationPutHeapTuple(relation, buffer, heaptuples[ndone]);
 + for (nthispage = 1; ndone + nthispage < ntuples; nthispage++)
   {
   HeapTuple   heaptup = heaptuples[ndone + nthispage];

I don't know if this is the same thing. At least in the comments I was
reading trying to figure this out there was some concern that someone
else could change the space on the page. Does RelationGetBufferForTuple()
guarantee against this too?

-dg

-- 
David Gould  510 282 0869 da...@sonic.net
If simplicity worked, the world would be overrun with insects.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: bulk_multi_insert infinite loops with large rows and small fill factors

2012-12-12 Thread Heikki Linnakangas

On 12.12.2012 14:24, David Gould wrote:

I don't know if this is the same thing. At least in the comments I was
reading trying to figure this out there was some concern that someone
else could change the space on the page. Does RelationGetBufferForTuple()
guarantee against this too?


Yeah, RelationGetBufferForTuple grabs a lock on the page before 
returning it. For comparison, plain heap_insert does simply this:



buffer = RelationGetBufferForTuple(relation, heaptup->t_len,
                                   InvalidBuffer, options, bistate,
                                   &vmbuffer, NULL);

/* NO EREPORT(ERROR) from here till changes are logged */
START_CRIT_SECTION();

RelationPutHeapTuple(relation, buffer, heaptup);


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: bulk_multi_insert infinite loops with large rows and small fill factors

2012-12-12 Thread David Gould
On Wed, 12 Dec 2012 14:23:12 +0200
Heikki Linnakangas hlinnakan...@vmware.com wrote:
 
 The bug's been fixed now, but note that huge tuples like this will 
 always cause the table to be extended. Even if there are completely 
 empty pages in the table, after a vacuum. Even a completely empty 
 existing page is not considered spacious enough in this case, because 
 it's still too small when you take fillfactor into account, so the 
 insertion will always extend the table. If you regularly run into this 
 situation, you might want to raise your fillfactor..

Actually, we'd like it lower. Ideally, one row per page.

We lose noticeable performance when we raise fill-factor above 10. Even 20 is
slower.

During busy times these hosts sometimes fall into a stable state
with very high cpu use mostly in s_lock() and LWLockAcquire() and I think
PinBuffer plus very high system cpu in the scheduler (I don't have the perf
trace in front of me so take this with a grain of salt). In this mode they
fall from the normal 7000 queries per second to below 3000. Once in this
state they tend to stay that way. If we turn down the number of incoming
requests they go back to normal. Our conjecture is that most requests are
for only a few keys and so we have multiple sessions contending for few
pages and convoying in the buffer manager. The table is under 20k rows, but
the hot items are probably only a couple hundred different rows. The busy
processes are doing reads only, but there is some update activity on this
table too.

Ah, found an email with the significant part of the perf output:

 ... set number of client threads = number of postgres backends = 70. That way
 all my threads have constant access to a backend and they just spin in a tight
 loop running the same query over and over (with different values). ... this
 seems to have tapped into 9.2's resonant frequency, right now we're spending
 almost all our time spin locking.
...
 762377.00 71.0%  s_lock          /usr/local/bin/postgres
  22279.00  2.1%  LWLockAcquire   /usr/local/bin/postgres
  18916.00  1.8%  LWLockRelease   /usr/local/bin/postgres

I was trying to resurrect the pthread s_lock() patch to see if that helps,
but it did not apply at all and I have not had time to pursue it.

We have tried lots of different numbers of processes and get the best result with
about ten fewer active postgresql backends than HT cores. The system is 128GB
with:
 
processor   : 79
vendor_id   : GenuineIntel
cpu family  : 6
model   : 47
model name  : Intel(R) Xeon(R) CPU E7-L8867  @ 2.13GHz
stepping: 2
cpu MHz : 2128.478
cache size  : 30720 KB

-dg

-- 
David Gould  510 282 0869 da...@sonic.net
If simplicity worked, the world would be overrun with insects.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Use of systable_beginscan_ordered in event trigger patch

2012-12-12 Thread Dimitri Fontaine
Tom Lane t...@sss.pgh.pa.us writes:
 Yeah, I had forgotten about that loose end, but we definitely need
 something of the sort.  Committed with additional documentation.
 (I put something in the CREATE EVENT TRIGGER ref page, but do we
 need anything about it in chapter 37?)

Thanks!

I guess we could add a note at the end of the "Overview of Event Trigger
Behavior" section. Then maybe we should explain in that section that you
can't set an event trigger to fire on event trigger commands.

 BTW, is the example in the CREATE EVENT TRIGGER ref page meant as an
 IQ test for unwary readers?  You do know there are people who will
 copy-and-paste just about any example that's put in front of them.
 Perhaps we just want to make sure they internalize the advice about how
 to get out of a broken-event-trigger situation.

For those following at home, the example is:

CREATE OR REPLACE FUNCTION abort_any_command()
  RETURNS event_trigger
 LANGUAGE plpgsql
  AS $$
BEGIN
  RAISE EXCEPTION 'command % is disabled', tg_tag;
END;
$$;

CREATE EVENT TRIGGER abort_ddl ON ddl_command_start
   EXECUTE PROCEDURE abort_any_command();

Maybe we could just append to it how to remove such an event trigger,
which is easy to do because you can't target an event trigger related
command with event triggers, so the following is not disabled:

DROP EVENT TRIGGER abort_ddl;

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [v9.3] writable foreign tables

2012-12-12 Thread Erik Rijkers
On Wed, December 12, 2012 14:45, Kohei KaiGai wrote:

 pgsql-v9.3-writable-fdw-poc.v8.part-2.patch  151 k
 pgsql-v9.3-writable-fdw-poc.v8.part-1.patch   70 k

I wanted to have a look at this, and tried to apply part 1 and then part 2 on
top of that (that's the idea, right?)

part 1 applies and then part 2 does not.



Erik Rijkers

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [v9.3] writable foreign tables

2012-12-12 Thread Albe Laurenz
Erik Rijkers wrote:
 pgsql-v9.3-writable-fdw-poc.v8.part-2.patch  151 k
 pgsql-v9.3-writable-fdw-poc.v8.part-1.patch   70 k
 
 I wanted to have a look at this, and tried to apply part 1 and then part 2 on
 top of that (that's the idea, right?)
 
 part 1 applies and then part 2 does not.

Part 2 needs this prerequisite:
http://archives.postgresql.org/message-id/CAEZqfEcQtxn1JSjhC5usqhL4_n+Zck3mqo=rzedfpz+dawf...@mail.gmail.com

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Use of systable_beginscan_ordered in event trigger patch

2012-12-12 Thread Tom Lane
Dimitri Fontaine dimi...@2ndquadrant.fr writes:
 Tom Lane t...@sss.pgh.pa.us writes:
 BTW, is the example in the CREATE EVENT TRIGGER ref page meant as an
 IQ test for unwary readers?

 Maybe we could just append to it how to remove such an event trigger,
 which is easy to do because you can't target an event trigger related
 command with event triggers, so the following is not disabled:
 DROP EVENT TRIGGER abort_ddl;

Oh really?  The explanation of the example certainly doesn't say that.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Multiple --table options for other commands

2012-12-12 Thread Karl O. Pinc
On 12/11/2012 10:25:43 PM, Josh Kupershmidt wrote:
 On Tue, Dec 11, 2012 at 11:46 AM, Karl O. Pinc k...@meme.com wrote:

  Yes, the current pg_restore silently
  ignores multiple --table arguments, and seems to use the last
  one.  You are introducing a backwards incompatible
  change here. 

 Agreed with Robert that this change should be reasonable in a major
 version (i.e. 9.3).

Good by me.  Seemed worth a mention.

  I believe you need ellipses behind --table in the syntax summaries
  of the command reference docs.
 
 Hrm, I was following pg_dump's lead here for the .sgml docs, and
 didn't see anywhere that pg_dump makes the multiple --table syntax
 explicit other than in the explanatory text underneath the option.

Yes.  I see.  I didn't look at all the commands' reference pages
but did happen to look at clusterdb, which does have --table
in the syntax summary.  I just checked and you need to fix
clusterdb, reindexdb, and vacuumdb.

  I also note that the pg_dump --help output says "table(s)" so
  you probably want to have pg_restore say the same now that it
  takes multiple tables.
 
 Good catch, will fix, and ditto reindexdb's --index help output. (It
 is possible that the --help output for pg_dump was worded to say
 "table(s)" because one can use a pattern --table specification with
 pg_dump, though IMO it's helpful to mention "table(s)" in the --help
 output for the rest of these programs as well, as a little reminder to
 the user.)

Agreed.



Karl k...@meme.com
Free Software:  You don't pay back, you pay forward.
 -- Robert A. Heinlein



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUG?] lag of minRecoveryPont in archive recovery

2012-12-12 Thread Fujii Masao
On Wed, Dec 12, 2012 at 2:03 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 On 11.12.2012 17:04, Fujii Masao wrote:

 The patch looks good. I confirmed that the PANIC error didn't happen,
 with the patch.


 Ok, committed, and also moved the CheckRecoveryConsistency call. Please take
 a look to double-check that I didn't miss anything.

Looks good to me. Thanks a lot!

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: bulk_multi_insert infinite loops with large rows and small fill factors

2012-12-12 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 The bug's been fixed now, but note that huge tuples like this will 
 always cause the table to be extended. Even if there are completely 
 empty pages in the table, after a vacuum. Even a completely empty 
 existing page is not considered spacious enough in this case, because 
 it's still too small when you take fillfactor into account, so the 
 insertion will always extend the table.

Seems like that's a bug in itself: there's no reason to reject an empty
existing page.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PageIsAllVisible()'s trustworthiness in Hot Standby

2012-12-12 Thread Pavan Deolasee
On Wed, Dec 12, 2012 at 1:35 AM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Dec 4, 2012 at 12:10 PM, Pavan Deolasee
 pavan.deola...@gmail.com wrote:
 Hmm. Yeah, I do not have guts to prove that either. I'll probably write up a
 comment for your consideration to explain why we don't trust PD_ALL_VISIBLE
 in Hot standby for seq scans, but still trust VM for index-only scans.

 Sure.


Here is a small patch that adds comments to heap scan code explaining
why we don't trust the all-visible flag in the page, but still continue to
support index-only scans on hot standby.

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


hot-standby-pd-all-visible-heapscan.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] PL/Python: Add spidata to all spiexceptions

2012-12-12 Thread Jan Urbański
Though not the original patch author, I did modify and submit it to the
CF app, so I'll reply here :)


On 10/12/12 19:20, Karl O. Pinc wrote:

On 12/09/2012 10:33:59 PM, Karl O. Pinc wrote:



I've gone ahead and signed up to review this patch.


Thanks!


There were 2 outstanding issue raised:

Is this useful enough/proceeding in the right direction to commit
now?


I believe the process would be to mark it as "Ready for Committer" if
you feel like it's ready, and then a committer would make the final call.



Should some of the logic be moved out of a subroutine and into the
calling code?


I think I structured the PLy_get_spi_sqlerrcode to accept the same kind 
of arguments as PLy_get_spi_error_data, which means a SPIException 
object and a pointer to whatever values it can fill in.


That said, I haven't really thought about that too much, so I'm 
perfectly fine with moving that bit of logic to the caller.


Cheers,
Jan


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] PRIVATE columns

2012-12-12 Thread Simon Riggs
Currently, ANALYZE collects data on all columns and stores these
samples in pg_statistic where they can be seen via the view pg_stats.

In some cases we have data that is private and we do not wish others
to see it, such as patient names. This becomes more important when we
have row security.

Perhaps that data can be protected, but it would be even better if we
simply didn't store value-revealing statistic data at all. Such
private data is seldom the target of searches, or if it is, it is
mostly evenly distributed anyway.

It would be good if we could collect the overall stats
* NULL fraction
* average width
* ndistinct
yet without storing either the MFVs or histogram.
Doing that would avoid inadvertent leaking of potentially private information.

SET STATISTICS 0
simply skips collection of statistics altogether

To implement this, one way would be to allow

ALTER TABLE foo
  ALTER COLUMN foo1 SET STATISTICS PRIVATE;

Or we could use another magic value like -2 to request this case.

(Yes, I am aware we could use a custom datatype with a custom
typanalyze for this, but that breaks other things)

Thoughts?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PRIVATE columns

2012-12-12 Thread Jan Wieck

On 12/12/2012 1:12 PM, Simon Riggs wrote:

Currently, ANALYZE collects data on all columns and stores these
samples in pg_statistic where they can be seen via the view pg_stats.

In some cases we have data that is private and we do not wish others
to see it, such as patient names. This becomes more important when we
have row security.

Perhaps that data can be protected, but it would be even better if we
simply didn't store value-revealing statistic data at all. Such
private data is seldom the target of searches, or if it is, it is
mostly evenly distributed anyway.


Would protecting it the same way we protect the passwords in pg_authid
be sufficient?



Jan



It would be good if we could collect the overall stats
* NULL fraction
* average width
* ndistinct
yet without storing either the MFVs or histogram.
Doing that would avoid inadvertent leaking of potentially private information.

SET STATISTICS 0
simply skips collection of statistics altogether

To implement this, one way would be to allow

ALTER TABLE foo
   ALTER COLUMN foo1 SET STATISTICS PRIVATE;

Or we could use another magic value like -2 to request this case.

(Yes, I am aware we could use a custom datatype with a custom
typanalyze for this, but that breaks other things)

Thoughts?




--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] PL/Python: Add spidata to all spiexceptions

2012-12-12 Thread Karl O. Pinc
Hi Jan and Oskari,

On 12/12/2012 11:36:54 AM, Jan Urbański wrote:
 On 10/12/12 19:20, Karl O. Pinc wrote:
  On 12/09/2012 10:33:59 PM, Karl O. Pinc wrote:

  There were 2 outstanding issue raised:
 
  Is this useful enough/proceeding in the right direction to commit
  now?
 
 I believe the process would be to mark it as Ready for Committer if 
 you feel like it's ready and a then a committer would make the final
 call.

It looks acceptable to me.  My concern is that there's nothing 
introduced here that precludes attaching additional attributes
to the exception object to carry message, detail, and hint.
I don't see a problem, and can't see how there could be a problem
but also feel like a novice.  I was looking for some assurance from
you that there's no concern here, but am confident enough regardless
to pass this aspect through to a committer for final review. 

 
  Should some of the logic be moved out of a subroutine and into the
  calling code?
 
 I think I structured the PLy_get_spi_sqlerrcode to accept the same
 kind 
 of arguments as PLy_get_spi_error_data, which means a SPIException 
 object and a pointer to whatever values it can fill in.
 
 That said, I haven't really thought about that too much, so I'm 
 perfectly fine with moving that bit of logic to the caller.

I can see arguments to be made for both sides.  I'm asking that you
think it through to the extent you deem necessary and make
changes or not.  At that point it should be ready to send
to a committer.  (I'll re-test first, if you make any changes.)

Regards,

Karl k...@meme.com
Free Software:  You don't pay back, you pay forward.
 -- Robert A. Heinlein



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PRIVATE columns

2012-12-12 Thread Simon Riggs
On 12 December 2012 19:13, Jan Wieck janwi...@yahoo.com wrote:
 On 12/12/2012 1:12 PM, Simon Riggs wrote:

 Currently, ANALYZE collects data on all columns and stores these
 samples in pg_statistic where they can be seen via the view pg_stats.

 In some cases we have data that is private and we do not wish others
 to see it, such as patient names. This becomes more important when we
 have row security.

 Perhaps that data can be protected, but it would be even better if we
 simply didn't store value-revealing statistic data at all. Such
 private data is seldom the target of searches, or if it is, it is
 mostly evenly distributed anyway.


 Would protecting it the same way, we protect the passwords in pg_authid, be
 sufficient?

The user backend does need to be able to access the stats data during
optimization. It's hard to have data accessible and yet impose limits
on the uses to which that can be put. If we have row security on the
table but no equivalent capability on the stats, then we'll have
leakage. e.g. set statistics 1, ANALYZE, then leak 1 credit
card numbers.

Selectivity functions are not marked leakproof, nor do people think
they can easily be made so. Which means the data might be leaked by
various means through error messages, plan selection, skullduggery
etc.

If it ain't in the bucket, the bucket can't leak it.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PRIVATE columns

2012-12-12 Thread Joshua D. Drake


On 12/12/2012 12:12 PM, Simon Riggs wrote:


Would protecting it the same way, we protect the passwords in pg_authid, be
sufficient?


The user backend does need to be able to access the stats data during
optimization. It's hard to have data accessible and yet impose limits
on the uses to which that can be put. If we have row security on the
table but no equivalent capability on the stats, then we'll have
leakage. e.g. set statistics 1, ANALYZE, then leak 1 credit
card numbers.

Selectivity functions are not marked leakproof, nor do people think
they can easily be made so. Which means the data might be leaked by
various means through error messages, plan selection, skullduggery
etc..

If it ain't in the bucket, the bucket can't leak it.



I accidentally responded to Simon off-list on this. I understand the
need and think it would be a good thing to have. However, the real
opportunity here is to make statistics non-user-visible. I can't think
of any reason that they need to be visible to the standard user. Even if,
when we set the statistics private, it makes just that column non-visible.


Sincerely,

Joshua D. Drake



--
Command Prompt, Inc. - http://www.commandprompt.com/
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC
@cmdpromptinc - 509-416-6579


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Use of systable_beginscan_ordered in event trigger patch

2012-12-12 Thread Dimitri Fontaine
Tom Lane t...@sss.pgh.pa.us writes:
 Maybe we could just append to it how to remove such an event trigger,
 which is easy to do because you can't target an event trigger related
 command with event triggers, so the following is not disabled:
 DROP EVENT TRIGGER abort_ddl;

 Oh really?  The explanation of the example certainly doesn't say that.

I remember that in my proposals somewhere we had support for a very
limited form of command triggers for any command in the system. Of
course, as that included transaction control commands, we made that
better. I'm quite tired so maybe my memory is playing tricks on me, but
I kind of remember that we also had quite a discussion about just that
example and its phrasing in the docs and did come out with something
satisfactory.

Robert, does that ring a bell to you? I'm going to crawl the archives
tomorrow if not…

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PRIVATE columns

2012-12-12 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 Currently, ANALYZE collects data on all columns and stores these
 samples in pg_statistic where they can be seen via the view pg_stats.

Only if you have appropriate privileges.

 In some cases we have data that is private and we do not wish others
 to see it, such as patient names. This becomes more important when we
 have row security.

 Perhaps that data can be protected, but it would be even better if we
 simply didn't store value-revealing statistic data at all.

SET STATISTICS 0 seems like a sufficient solution for people who don't
trust the have_column_privilege() protection in the pg_stats view.

In practice I think this is a waste of time, though.  Anyone who can
bypass the view restriction can probably just read the original table.

(I suppose we could consider marking pg_stats as a security_barrier
view to make this even safer.  Not sure it's worth the trouble though;
the interesting columns are anyarray so it's hard to do much with them
mechanically.)

 It would be good if we could collect the overall stats
 * NULL fraction
 * average width
 * ndistinct
 yet without storing either the MFVs or histogram.

Do you have any evidence whatsoever that that's worth the trouble?
I'd bet against it.  And if we're being paranoid, who's to say that
those numbers couldn't reveal useful data in themselves?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Use gcc built-in atomic inc/dec in lock.c

2012-12-12 Thread Mikko Tiihonen

Hi,

I noticed an "XXX: It might be worth considering using an atomic fetch-and-add
instruction here, on architectures where that is supported." comment in lock.c.

Here is my first try at using it. The patch adds a configure check for
gcc 4.7 __atomic_add_fetch as well as the older __sync_add_and_fetch built-ins.
If either is available they are used instead of the mutex.

Changes do the following:
- declare atomic_inc and atomic_dec inline functions (implemented either using
  GCC built-in functions or the old mutex code) in new atomics.h
- change lock.c to use atomic_* functions instead of explicit mutex
- moved all other assignments inside the mutex to occur before the atomic
  operation so that the barrier of the atomic operation can guarantee the
  stores are visible when the function ends
- removed one assert that could not easily be implemented with atomic_dec
  in RemoveLocalLock
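
For readers who want a concrete picture of the atomic_inc/atomic_dec wrappers
described in the first item above, here is a minimal compilable sketch (the real
patch selects the built-ins via configure checks and falls back to the existing
spinlock; the compiler-version test and fallback below are illustrative
assumptions, not the actual atomics.h):

#include <stdint.h>
#include <stdio.h>

typedef uint32_t pg_atomic_uint32;

static inline uint32_t
atomic_inc(pg_atomic_uint32 *ptr)
{
#if defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 7))
    return __atomic_add_fetch(ptr, 1, __ATOMIC_SEQ_CST);   /* gcc >= 4.7 */
#elif defined(__GNUC__)
    return __sync_add_and_fetch(ptr, 1);                   /* older gcc */
#else
    return ++(*ptr);    /* real code would keep using the spinlock here */
#endif
}

static inline uint32_t
atomic_dec(pg_atomic_uint32 *ptr)
{
#if defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 7))
    return __atomic_sub_fetch(ptr, 1, __ATOMIC_SEQ_CST);
#elif defined(__GNUC__)
    return __sync_sub_and_fetch(ptr, 1);
#else
    return --(*ptr);
#endif
}

int main(void)
{
    pg_atomic_uint32 counter = 0;

    atomic_inc(&counter);
    atomic_inc(&counter);
    atomic_dec(&counter);
    printf("counter = %u\n", (unsigned) counter);   /* prints 1 */
    return 0;
}

Both built-ins act as full memory barriers, which is what allows the plain
stores to be moved out from under the mutex as described above.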


Using method AbortStrongLockAcquire as an example.
When compiling with Fedora GCC 4.7.2 the following assembly code is generated

Original code before the patch: 136 bytes
Mutex code with the patch: 124 bytes
Code with gcc-built-ins: 56 bytes

I think moving the extra assignments outside the mutex has allowed the
compiler to optimize the code more even when the mutex is used.


Questions:
1) Is it safe to move the assignments to locallock->holdsStrongLockCount
   and StrongLockInProgress outside the FastPathStrongRelationLocks mutex?
2) With the built-ins the FastPathStrongRelationLockData becomes uint32[1024];
   should we add back some padding to reduce the cacheline collisions (see
   the sketch after this list)? For a modern cpu using 64-byte cache lines
   there can be only max 16 concurrent updates at a time, and the
   probability of collision is quite high.
3) What kind of pgbench test would utilise the code path the most?
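
Regarding question 2, the usual trick is to pad each counter out to a cache
line. A minimal sketch (CACHE_LINE_SIZE and PaddedCounter are illustrative
names, not from the patch; 64-byte lines are assumed):

#include <stdint.h>
#include <stdio.h>

#define CACHE_LINE_SIZE 64

typedef union
{
    uint32_t count;
    char     pad[CACHE_LINE_SIZE];   /* keep each counter on its own line */
} PaddedCounter;

/* 1024 counters: 4 kB unpadded vs. 64 kB padded -- the space/contention
 * trade-off the question is asking about. */
static PaddedCounter counters[1024];

int main(void)
{
    printf("unpadded: %zu bytes, padded: %zu bytes\n",
           (size_t) (sizeof(uint32_t) * 1024), sizeof(counters));
    return 0;
}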

TODO:
1) add check in configure.in to ensure that the built-ins are not converted
   to external calls by gcc (on architectures that use the gcc generic version)
2) other architectures / compilers


--

Original code before the patch:

0e10 <AbortStrongLockAcquire>:
 e10:   48 89 5c 24 f0          mov    %rbx,-0x10(%rsp)
 e15:   48 89 6c 24 f8          mov    %rbp,-0x8(%rsp)
 e1a:   48 83 ec 18             sub    $0x18,%rsp
 e1e:   48 8b 1d 00 00 00 00    mov    0x0(%rip),%rbx        # e25 <AbortStrongLockAcquire+0x15>
 e25:   48 85 db                test   %rbx,%rbx
 e28:   74 41                   je     e6b <AbortStrongLockAcquire+0x5b>
 e2a:   8b 6b 28                mov    0x28(%rbx),%ebp
 e2d:   b8 01 00 00 00          mov    $0x1,%eax
 e32:   48 8b 15 00 00 00 00    mov    0x0(%rip),%rdx        # e39 <AbortStrongLockAcquire+0x29>
 e39:   81 e5 ff 03 00 00       and    $0x3ff,%ebp
 e3f:   f0 86 02                lock xchg %al,(%rdx)
 e42:   84 c0                   test   %al,%al
 e44:   75 3a                   jne    e80 <AbortStrongLockAcquire+0x70>
 e46:   48 8b 05 00 00 00 00    mov    0x0(%rip),%rax        # e4d <AbortStrongLockAcquire+0x3d>
 e4d:   48 c7 05 00 00 00 00    movq   $0x0,0x0(%rip)        # e58 <AbortStrongLockAcquire+0x48>
 e54:   00 00 00 00
 e58:   83 6c a8 04 01          subl   $0x1,0x4(%rax,%rbp,4)
 e5d:   c6 43 40 00             movb   $0x0,0x40(%rbx)
 e61:   48 8b 05 00 00 00 00    mov    0x0(%rip),%rax        # e68 <AbortStrongLockAcquire+0x58>
 e68:   c6 00 00                movb   $0x0,(%rax)
 e6b:   48 8b 5c 24 08          mov    0x8(%rsp),%rbx
 e70:   48 8b 6c 24 10          mov    0x10(%rsp),%rbp
 e75:   48 83 c4 18             add    $0x18,%rsp
 e79:   c3                      retq
 e7a:   66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
 e80:   48 8b 3d 00 00 00 00    mov    0x0(%rip),%rdi        # e87 <AbortStrongLockAcquire+0x77>
 e87:   ba a8 05 00 00          mov    $0x5a8,%edx
 e8c:   be 00 00 00 00          mov    $0x0,%esi
 e91:   e8 00 00 00 00          callq  e96 <AbortStrongLockAcquire+0x86>
 e96:   eb ae                   jmp    e46 <AbortStrongLockAcquire+0x36>
 e98:   0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
 e9f:   00

Mutex code with the patch:
0e00 <AbortStrongLockAcquire>:
 e00:   48 8b 05 00 00 00 00    mov    0x0(%rip),%rax        # e07 <AbortStrongLockAcquire+0x7>
 e07:   48 85 c0                test   %rax,%rax
 e0a:   74 55                   je     e61 <AbortStrongLockAcquire+0x61>
 e0c:   48 89 5c 24 f0          mov    %rbx,-0x10(%rsp)
 e11:   48 89 6c 24 f8          mov    %rbp,-0x8(%rsp)
 e16:   48 83 ec 18             sub    $0x18,%rsp
 e1a:   8b 68 28                mov    0x28(%rax),%ebp
 e1d:   c6 40 40 00             movb   $0x0,0x40(%rax)
 e21:   b8 01 00 00 00          mov    $0x1,%eax
 e26:   48 c7 05 00 00 00 00    movq   $0x0,0x0(%rip)

Re: [HACKERS] Use gcc built-in atomic inc/dec in lock.c

2012-12-12 Thread Peter Geoghegan
On 12 December 2012 22:11, Mikko Tiihonen
mikko.tiiho...@nitorcreations.com wrote:
 noticed a "XXX: It might be worth considering using an atomic fetch-and-add
 instruction here, on architectures where that is supported." comment in lock.c

 Here is my first try at using it.

That's interesting, but I have to wonder if there is any evidence that
this *is* actually helpful to performance.

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services




Re: [HACKERS] Enabling Checksums

2012-12-12 Thread Greg Smith

On 12/5/12 6:49 PM, Simon Riggs wrote:

* Zeroing pages, making pages all 1s
* Transposing pages
* Moving chunks of data sideways in a block
* Flipping bits randomly
* Flipping data endianness
* Destroying particular catalog tables or structures


I can take this on, as part of the QA around checksums working as 
expected.  The result would be a Python program; I don't have quite 
enough time to write this in C or re-learn Perl to do it right now.  But 
this won't be a lot of code.  If it's tossed one day as simply a 
prototype for something more permanent, I think it's still worth doing now.


The UI I'm thinking of for what I'm going to call pg_corrupt is a CLI 
that asks for:


-A relation name
-Corruption type (an entry from this list)
-How many blocks to touch

I'll just loop based on the count, randomly selecting a block each time 
and messing with it in that way.


The randomness seed should be printed as part of the output, so that 
it's possible to re-create the damage exactly later.  If the server doesn't 
handle it correctly, we'll want to be able to replicate the exact condition 
it choked on, just based on the tool's log output.
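
To make the reproducibility idea concrete, the core loop might look
something like the following.  This is only a sketch in C (the actual tool
is planned as a Python program), and the 8K block size, bit-flip mode, and
function name are illustrative assumptions rather than a design.

/*
 * Sketch: flip one random bit in each of N randomly chosen blocks,
 * printing the seed first so the identical damage can be re-created
 * later by supplying the same seed.
 */
#include <stdio.h>
#include <stdlib.h>

#define BLCKSZ 8192

static void
corrupt_random_bits(FILE *relfile, long nblocks, int blocks_to_touch,
                    unsigned int seed)
{
    printf("random seed: %u\n", seed);  /* log it so the run can be replayed */
    srandom(seed);

    for (int i = 0; i < blocks_to_touch; i++)
    {
        long    blkno = random() % nblocks;
        long    offset = blkno * (long) BLCKSZ + random() % BLCKSZ;
        int     bit = random() % 8;
        unsigned char byte;

        if (fseek(relfile, offset, SEEK_SET) != 0 ||
            fread(&byte, 1, 1, relfile) != 1)
            continue;           /* skip unreadable spots in this sketch */

        byte ^= (unsigned char) (1 << bit);     /* flip one bit */

        fseek(relfile, offset, SEEK_SET);
        fwrite(&byte, 1, 1, relfile);

        printf("flipped bit %d at byte offset %ld (block %ld)\n",
               bit, offset, blkno);
    }
}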


Any other requests?

--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




Re: [HACKERS] Logical decoding exported base snapshot

2012-12-12 Thread Andres Freund
On 2012-12-12 12:13:44 +0100, Andres Freund wrote:
 On 2012-12-11 22:20:18 -0500, Tom Lane wrote:
  Joachim Wieland j...@mcknight.de writes:
   On Tue, Dec 11, 2012 at 6:52 PM, Andres Freund and...@2ndquadrant.com 
   wrote:
   One problem I see is that while exporting a snapshot solves the
   visibility issues of the table's contents, it does not protect against
   schema changes. I am not sure whether that's a problem.
  
   If somebody runs a CLUSTER or something like that, the table's contents
   will be preserved including MVCC semantics. That's fine.
   The more problematic cases I see are TRUNCATE, DROP and ALTER
   TABLE.
 
   This is why the pg_dump master process executes a
   lock table table in access share mode
   for every table, so your commands would all block.
 
  A lock doesn't protect against schema changes made before the lock was
  taken.  The reason that the described scenario is problematic is that
  pg_dump is going to be expected to work against a snapshot made before
  it gets a chance to take those table locks.  Thus, there's a window
  where DDL is dangerous, and will invalidate the dump --- perhaps without
  any warning.

  Now, we have this problem today, in that pg_dump has to read pg_class
  before it can take table locks so some window exists already.

  What's
  bothering me about what Andres describes is that the window for trouble
  seems to be getting much bigger.

 This morning I wondered whether we couldn't protect against that by
 acquiring share locks on the catalog rows pg_dump reads; that would
 result in "could not serialize access due to concurrent update" type
 errors, which would be easy enough to discern/translate.
 While pretty damn ugly, that should take care of most of those issues,
 shouldn't it?

After a quick look it doesn't look too hard to add this; does anybody
have an opinion on whether it's something worthwhile? And can anybody
suggest an option name? I can't come up with anything better than
--recheck-locks or something, but that's awful.

I don't think there's too much point in locking anything but pg_class,
pg_attribute, and pg_type; nearly everything else is read using an MVCC
snapshot anyway or isn't all that critical. Other candidates?

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] Logical decoding exported base snapshot

2012-12-12 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2012-12-12 12:13:44 +0100, Andres Freund wrote:
 This morning I wondered whether we couldn't protect against that by
 acquiring share locks on the catalog rows pg_dump reads; that would
 result in "could not serialize access due to concurrent update" type
 errors, which would be easy enough to discern/translate.
 While pretty damn ugly, that should take care of most of those issues,
 shouldn't it?

How would it fix anything?  The problem is with DDL that's committed and
gone before pg_dump ever gets to the table's pg_class row.  Once it
does, and takes AccessShareLock on the relation, it's safe.  Adding a
SELECT FOR SHARE step just adds more time before we can get that lock.

Also, locking the pg_class row doesn't provide protection against DDL
that doesn't modify the relation's pg_class row, of which there is
plenty.

regards, tom lane




Re: [HACKERS] Logical decoding exported base snapshot

2012-12-12 Thread Andres Freund
On 2012-12-12 18:52:33 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2012-12-12 12:13:44 +0100, Andres Freund wrote:
  This morning I wondered whether we couldn't protect against that by
  acquiring share locks on the catalog rows pg_dump reads; that would
  result in "could not serialize access due to concurrent update" type
  errors, which would be easy enough to discern/translate.
  While pretty damn ugly, that should take care of most of those issues,
  shouldn't it?

 How would it fix anything?  The problem is with DDL that's committed and
 gone before pg_dump ever gets to the table's pg_class row.  Once it
 does, and takes AccessShareLock on the relation, it's safe.  Adding a
 SELECT FOR SHARE step just adds more time before we can get that lock.

Getting a FOR SHARE lock ought to error out with a serialization failure
if the row was updated since our snapshot started, as pg_dump uses
repeatable read/serializable. Now that obviously doesn't fix the
situation, but making it detectable in a safe way seems to be good
enough for me.

 Also, locking the pg_class row doesn't provide protection against DDL
 that doesn't modify the relation's pg_class row, of which there is
 plenty.

Well, that's why I thought of pg_class, pg_attribute, and pg_type. Maybe
that list needs to get extended a bit, but I think just those 3 should
detect most dangerous situations.

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] logical changeset generation v3 - git repository

2012-12-12 Thread Peter Geoghegan
On 9 December 2012 19:14, Andres Freund and...@2ndquadrant.com wrote:
 I pushed a new version which

 - is rebased on top of master
 - is based on top of the new xlogreader (biggest part)
 - is based on top of the new binaryheap.h
 - some fixes
 - some more comments

I decided to take another look at this, following my earlier reviews
of a substantial subset of earlier versions of this patch, including
my earlier reviews of WAL decoding [1] and focused review of snapshot
building [2] (itself a subset of WAL decoding). I think it was the
right time to consolidate your multiple earlier patches, because some
of the earlier BDR patches were committed (including "Rearrange
storage of data in xl_running_xacts" [3], "Basic binary heap
implementation" [4], "Embedded list interface" [5], and, though it
isn't touched on here and is technically entirely distinct
functionality, "Background worker processes" [6]). Furthermore, now
that we've gotten past some early rounds of reviewing, it makes sense
to build a *perfectly* (rather than just approximately) atomic unit,
as we work towards something that is actually committable.

So what's the footprint of this big, newly rebased feature branch?
Well, though some of these changes are uncommitted stuff from Heikki
(i.e. XLogReader, which you've modified), and some of this is README
documentation, the footprint is very large. I merged master with your
dev branch (last commit of yours,
743f3af081209f784a30270bdf49301e9e242b78, made on Mon 10th Dec 15:35),
and the stats are:

 91 files changed, 9736 insertions(+), 956 deletions(-)

Note that there is a relatively large number of files affected in part
because the tqual interface was bashed around a bit for the benefit of
logical decoding - a lot of the changes to each of those 91 files are
completely trivial.

I'm very glad that you followed my earlier recommendation of splitting
your demo logical changeset consumer into a contrib module, in the
spirit of contrib/spi, etc. This module, test_decoding, represents a
logical entry point, if you will, for the entire patch. As unwieldy as
it may appear to be, the patch is (or at least *should be*) ultimately
reducible to some infrastructural changes to core to facilitate this
example logical change-set consumer.

test_decoding contrib module


 contrib/Makefile   |1 +
 contrib/test_decoding/Makefile |   16 +
 contrib/test_decoding/test_decoding.c  |  192 

Once again, because test_decoding is a kind of entry point, it gives
me a nice point to continually refer back to when talking about this
patch. (Incidentally, maybe test_decoding should be called
pg_decoding?).

The regression tests pass, though this isn't all that surprising,
since frankly the test coverage of this patch appears to be quite low.
I know that you're working with Abhijit on improvements to the
isolation tester to verify the correctness of the patch as it relates
to supporting actual, practical logical replication systems. I would
very much welcome any such test coverage (even if it wasn't in a
committable state), since in effect you're asking me to take a leap of
faith in respect of how well this infrastructure will support such
systems – previously, I obliged you and didn't focus on concurrency
and serializability concerns (it was sufficient to print out values/do
some decoding in a toy function), but it's time to take a closer look
at those now, I feel. test_decoding is a client of the logical
change-set producing infrastructure, and there appears to be broad
agreement that that infrastructure needs to treat such consumers in a
way that is maximally abstract. My question is, just how abstract does
this interface have to be, really? How well are you going to support
the use-case of a real logical replication system?

Now, maybe it's just that I haven't been paying attention (in
particular, to the discussion surrounding [3] – though that commit
doesn't appear to have been justified in terms of commit ordering in
BDR at all), but I would like you to be more demonstrative of certain
things, like:

1. Just what does a logical change-set consumer look like? What things
are always true of one, and never true of one?
2. Please describe in as much detail as possible the concurrency
issues with respect to logical replication systems. Please make
verifiable, testable claims as to how well these issues are considered
here, perhaps with reference to the previous remarks of subject-matter
experts like Chris Browne [7], Steve Singer [8] and Kevin Grittner [9]
following my earlier review.

I'm not all that impressed with where test_decoding is at right now.
There is still essentially no documentation. I think it's notable that
you don't really touch the ReorderBufferTXN passed by the core system
in the test_decoding plugin.

test_decoding and pg_receivellog


I surmised that the way that the test_decoding module is intended to
be used is as a client of 

Re: [HACKERS] [PERFORM] encouraging index-only scans

2012-12-12 Thread Bruce Momjian
On Wed, Dec 12, 2012 at 05:27:39PM -0500, Andrew Dunstan wrote:
 
 On 12/12/2012 05:12 PM, Andrew Dunstan wrote:
 
 On 12/12/2012 04:32 PM, Tom Lane wrote:
 Andrew Dunstan and...@dunslane.net writes:
 A client is testing a migration from 9.1 to 9.2, and has found that a
 large number of queries run much faster if they use index-only scans.
 However, the only way he has found to get such a plan is by increasing
 the seq_page_cost to insanely high levels (3.5). Is there any approved
 way to encourage such scans that's a bit less violent than this?
 Is the pg_class.relallvisible estimate for the table realistic? They
 might need a few more VACUUM and ANALYZE cycles to get it into the
 neighborhood of reality, if not.
 
 That was the problem - I didn't know this hadn't been done.
 
 
 Actually, the table had been analysed but not vacuumed, so this
 kinda begs the question what will happen to this value on
 pg_upgrade? Will people's queries suddenly get slower until
 autovacuum kicks in on the table?

[ moved to hackers list.]

Yes, this does seem like a problem for upgrades from 9.2 to 9.3?  We can
have pg_dump --binary-upgrade set these, or have ANALYZE set it.   I
would prefer the latter.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] [PERFORM] encouraging index-only scans

2012-12-12 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 On Wed, Dec 12, 2012 at 05:27:39PM -0500, Andrew Dunstan wrote:
 Actually, the table had been analysed but not vacuumed, so this
 kinda begs the question what will happen to this value on
 pg_upgrade? Will people's queries suddenly get slower until
 autovacuum kicks in on the table?

 [ moved to hackers list.]

 Yes, this does seem like a problem for upgrades from 9.2 to 9.3?  We can
 have pg_dump --binary-upgrade set these, or have ANALYZE set it.   I
 would prefer the latter.

ANALYZE does not set that value, and is not going to start doing so,
because it doesn't scan enough of the table to derive a trustworthy
value.

It's been clear for some time that pg_upgrade ought to do something
about transferring the statistics columns in pg_class to the new
cluster.  This is just another example of why.

regards, tom lane




Re: [HACKERS] PRIVATE columns

2012-12-12 Thread Jan Wieck

On 12/12/2012 3:12 PM, Simon Riggs wrote:

On 12 December 2012 19:13, Jan Wieck janwi...@yahoo.com wrote:

On 12/12/2012 1:12 PM, Simon Riggs wrote:


Currently, ANALYZE collects data on all columns and stores these
samples in pg_statistic where they can be seen via the view pg_stats.

In some cases we have data that is private and we do not wish others
to see it, such as patient names. This becomes more important when we
have row security.

Perhaps that data can be protected, but it would be even better if we
simply didn't store value-revealing statistic data at all. Such
private data is seldom the target of searches, or if it is, it is
mostly evenly distributed anyway.



Would protecting it the same way we protect the passwords in pg_authid be
sufficient?


The user backend does need to be able to access the stats data during
optimization. It's hard to have data accessible and yet impose limits
on the uses to which that can be put. If we have row security on the
table but no equivalent capability on the stats, then we'll have
leakage. e.g. set statistics 1, ANALYZE, then leak 1 credit
card numbers.


Like for the encrypted password column of pg_authid, I don't see any 
reason why the values in the stats columns need to be readable for 
anyone but a superuser at all. Do you?



Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin




Re: [HACKERS] [PERFORM] encouraging index-only scans

2012-12-12 Thread Pavan Deolasee
On Thu, Dec 13, 2012 at 9:21 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Bruce Momjian br...@momjian.us writes:
 On Wed, Dec 12, 2012 at 05:27:39PM -0500, Andrew Dunstan wrote:
 Actually, the table had been analysed but not vacuumed, so this
 kinda begs the question what will happen to this value on
 pg_upgrade? Will people's queries suddenly get slower until
 autovacuum kicks in on the table?

 [ moved to hackers list.]

 Yes, this does seem like a problem for upgrades from 9.2 to 9.3?  We can
 have pg_dump --binary-upgrade set these, or have ANALYZE set it.   I
 would prefer the latter.

 ANALYZE does not set that value, and is not going to start doing so,
 because it doesn't scan enough of the table to derive a trustworthy
 value.


Should we do that though? i.e. scan the entire map and count the
number of bits at the end of ANALYZE, like we do at the end of VACUUM?
I recently tried to optimize that code path by not recounting at the
end of the vacuum and instead tracking the number of all-visible bits
while scanning them in the earlier phases of vacuum. But it turned out
that it's so fast to count even a million bits that it's probably not
worth doing so.
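
For what it's worth, the arithmetic supports that: with one bit per heap
page, even a 1 GB table has only about 16 kB of visibility map, so a
counting loop along the following lines is essentially free.  This is a
sketch for illustration, not the actual visibilitymap.c code.

/*
 * Sketch: count the set (all-visible) bits in a visibility-map buffer.
 * One bit per heap page means a 1 GB table needs only ~16 kB of map.
 * __builtin_popcount is the GCC built-in population count.
 */
#include <stddef.h>
#include <stdint.h>

static uint64_t
count_visible_bits(const uint8_t *map, size_t len)
{
    uint64_t    count = 0;

    for (size_t i = 0; i < len; i++)
        count += (uint64_t) __builtin_popcount(map[i]);

    return count;
}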

 It's been clear for some time that pg_upgrade ought to do something
 about transferring the statistics columns in pg_class to the new
 cluster.  This is just another example of why.


+1.

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee




Re: [HACKERS] Multiple --table options for other commands

2012-12-12 Thread Josh Kupershmidt
On Wed, Dec 12, 2012 at 8:14 AM, Karl O. Pinc k...@meme.com wrote:
 On 12/11/2012 10:25:43 PM, Josh Kupershmidt wrote:
 On Tue, Dec 11, 2012 at 11:46 AM, Karl O. Pinc k...@meme.com wrote:
  I believe you need ellipses behind --table in the syntax summaries
  of the command reference docs.

 Hrm, I was following pg_dump's lead here for the .sgml docs, and
 didn't see anywhere that pg_dump makes the multiple --table syntax
 explicit other than in the explanatory text underneath the option.

 Yes.  I see.  I didn't look at all the command's reference pages
 but did happen to look at clusterdb, which does have --table
 in the syntax summary.  I just checked and you need to fix
 clusterdb, reindexdb, and vacuumdb.

OK, I made some changes to the command synopses for these three
commands. For some reason, reindexdb.sgml was slightly different from
the other two, in that the --table and --index bits were underneath a
<group> node instead of <arg>. I've changed that one to be like the
other two, and made them all have:
 <arg choice="opt" rep="repeat">

to include the ellipses after the table -- I hope that matches what
you had in mind.

Josh


multiple_tables.v3.diff
Description: Binary data



Re: [HACKERS] Multiple --table options for other commands

2012-12-12 Thread Karl O. Pinc
Hi Josh,

The good news is that there's only this little bit of
documentation formatting to fix

On 12/12/2012 11:04:53 PM, Josh Kupershmidt wrote:
 On Wed, Dec 12, 2012 at 8:14 AM, Karl O. Pinc k...@meme.com wrote:
  On 12/11/2012 10:25:43 PM, Josh Kupershmidt wrote:
  On Tue, Dec 11, 2012 at 11:46 AM, Karl O. Pinc k...@meme.com
 wrote:
   I believe you need ellipses behind --table in the syntax
 summaries
   of the command reference docs.


  Yes.  I see.  I didn't look at all the command's reference pages
  but did happen to look at clusterdb, which does have --table
  in the syntax summary.  I just checked and you need to fix
  clusterdb, reindexdb, and vacuumdb.
 
 OK, I made some changes to the command synopses for these three
 commands. For some reason, reindexdb.sgml was slightly different from
 the other two, in that the --table and --index bits were underneath a
 <group> node instead of <arg>. I've changed that one to be like the
 other two, and made them all have:
  <arg choice="opt" rep="repeat">
 
 to include the ellipses after the table -- I hope that matches what
 you had in mind.

Sadly, the ... is in the wrong place in all 3.  Yours looks like (for 
2 examples):

vacuumdb [connection-option...] [option...] [ --table | -t table  
[( column [,...] )] ...] [dbname]

reindexdb [connection-option...] [ --table | -t table  ...] [ --index | 
-i index  ...] [dbname]



I want the ... outside the square braces, because the --table (or -t)
must repeat for each table specified.  Like:

vacuumdb [connection-option...] [option...] [ --table | -t table  
[( column [,...] )] ]... [dbname]

reindexdb [connection-option...] [ --table | -t table ]... [ --index | 
-i index ]... [dbname]

Sorry, I don't see any examples in the existing docs of how this is 
done.

It seems like what you have should work, although perhaps
somehow that's the difference between the arg and group
and you need to switch them back to how reindex used to
be.  Otherwise, there's the docs at docbook.org, and
the docbook mailing lists.  Or maybe the stylesheets
are broken.  (Always blame the tool!  It's never _our_
fault!  ;-)

Have you had trouble getting this to work?
Maybe someone who knows what they're doing will step
in and give us the answer.

Regards,

Karl k...@meme.com
Free Software:  You don't pay back, you pay forward.
 -- Robert A. Heinlein


