Re: [HACKERS] COUNT(*) and index-only scans

2011-10-11 Thread Jeff Davis
On Tue, 2011-10-11 at 13:22 -0400, Robert Haas wrote:
> The real issue is that the costing estimates need to be accurate, and
> that's where the rubber hits the road.  Otherwise, even if we pick the
> right way to scan the table, we may do silly things up the line when
> we go to start constructing the join order.  I think we need to beef
> up ANALYZE to gather statistics on the fraction of the pages that are
> marked all-visible, or maybe VACUUM should gather that information.
> The trouble is that if we VACUUM and then ANALYZE, we'll often get
> back a value very close to 100%, but then the real value may diminish
> quite a bit before the next auto-analyze fires.  I think if we can
> figure out what to do about that problem we'll be well on our way...

Can you send stats messages to keep track of when a bit is unset in the
VM? That might allow the estimate to be more up-to-date.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Overhead cost of Serializable Snapshot Isolation

2011-10-11 Thread Heikki Linnakangas

On 11.10.2011 23:21, Simon Riggs wrote:

If the normal default_transaction_isolation = read committed and all
transactions that require serializable are explicitly marked in the
application then there is no way to turn off SSI without altering the
application. That is not acceptable, since it causes changes in
application behaviour and possibly also performance issues.


I don't get that. If all the transactions that require serializability 
are marked as such, why would you disable SSI for them? That would just 
break the application, since the transactions would no longer be 
serializable.


If they don't actually need serializability, but repeatable read is 
enough, then mark them that way.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Dumping roles improvements?

2011-10-11 Thread Tom Lane
Josh Berkus  writes:
> The reason I want to have the dependent roles created as part of a
> database dump is so that we can ship around dump files as a single file,
> and restore them with a single command.  This is considerably simpler
> than the current requirements, which are:

> 1. pg_dumpall the roles
> 2. pg_dump the database
> 3. tar both files
> 4. ship file
> 5. untar both files
> 6. psql the role file
> 7. pg_restore the database file

I don't find this terribly convincing.  I can see the rationales for two
endpoint cases: (1) restore these objects into exactly the same
ownership/permissions environment that existed before, and (2) restore
these objects with the absolute minimum of ownership/permissions
assumptions.  The latter case seems to me to be covered already by
--no-owner --no-privileges.  Cases in between those endpoints seem
pretty special-purpose, and I don't want to buy into the assumption that
we should fix them by creating a plethora of --do-it-joshs-way switches.
Can we invent something comparable to the --list/--use-list mechanism,
that can handle a range of use cases with a bit more manual effort?

regards, tom lane



Re: [HACKERS] Index only scan paving the way for "auto" clustered tables?

2011-10-11 Thread Robert Haas
On Tue, Oct 11, 2011 at 4:21 PM, Kääriäinen Anssi
 wrote:
> This is probably a silly idea, but I have been wondering about the
> following idea: Instead of having visibility info in the row header,
> have a couple of row visibility slots in the page header. These slots
> could be shared between rows in the page, so that if you do a bulk
> insert/update/delete you would only use one slot. If the slots
> overflow, you would use external slots buffer.
>
> When the row is all visible, no slot would be used at all.

I've thought about stuff like this, too.  One idea would be to make
the slots just be items on the page.  Each tuple includes a 2-byte
field which points to the item number (on the same page) where the
XMAX, CID, and CTID information for the update/delete of that tuple is
stored.  If the tuple isn't updated or deleted, it points back to the
tuple's own item ID.  When a tuple is updated or deleted, add an item
to the page with the necessary XMAX/CMAX/CTID and set the pointer to
the new item ID.  If the page is full, allocate an "overflow block"
and store the overflow block number in the page header.  Add the new
item to the overflow block and set the 2-byte field to point to it,
setting the high bit or something like that to indicate that the data
is in the overflow block rather than the data block.  The main
objection to this idea I see is that if the overflow blocks get much
use, the overall performance might end up being poor.
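For illustration, here is a toy Python model of the item-pointer scheme sketched above. All names, sizes, and structures are invented for the sketch; this is not how PostgreSQL pages are actually laid out.

```python
# Toy model of the scheme described above: each tuple carries a small
# "visibility pointer" to the item (on the same page) holding its XMAX
# info; an unmodified tuple points back at its own item ID, and a full
# page spills new visibility items into an overflow block, flagged via
# the high bit.  All names and sizes are invented; not PostgreSQL code.

OVERFLOW_FLAG = 0x8000   # high bit set => item lives in the overflow block
PAGE_CAPACITY = 3        # items per page, tiny on purpose to force overflow

class Page:
    def __init__(self):
        self.items = []      # in-page items: tuples and visibility entries
        self.overflow = []   # "overflow block", allocated lazily

    def add_tuple(self, data):
        self.items.append({"data": data, "vis": None})
        idx = len(self.items) - 1
        self.items[idx]["vis"] = idx          # points back to its own item ID
        return idx

    def delete_tuple(self, idx, xmax):
        vis = {"xmax": xmax}                  # would also carry CMAX/CTID
        if len(self.items) < PAGE_CAPACITY:   # room on the page itself
            self.items.append(vis)
            self.items[idx]["vis"] = len(self.items) - 1
        else:                                 # page full: use overflow block
            self.overflow.append(vis)
            self.items[idx]["vis"] = OVERFLOW_FLAG | (len(self.overflow) - 1)

    def visibility_of(self, idx):
        ptr = self.items[idx]["vis"]
        if ptr & OVERFLOW_FLAG:
            return self.overflow[ptr & ~OVERFLOW_FLAG]
        return self.items[ptr]

page = Page()
a = page.add_tuple("row-a")
b = page.add_tuple("row-b")
page.delete_tuple(a, xmax=1001)   # fits on the page
page.delete_tuple(b, xmax=1002)   # page now full -> goes to overflow
```

The toy version makes the performance worry concrete: every lookup that lands in the overflow block is an extra indirection, which on disk would be an extra block fetch.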

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] SET variable - Permission issues

2011-10-11 Thread Tom Lane
Joe Conway  writes:
> On 10/11/2011 02:07 PM, Kevin Grittner wrote:
>> I would certainly vote for enforcing on the SET and not causing an
>> error on the attempt to change the limit. ...
>> What problems do you see with that?

> Yeah, I don't know why it needs to be handled any differently than, say,
>   ALTER DATABASE foo SET config_param TO value
> or
>   ALTER ROLE foo SET config_param TO value
> These cases do not affect already existing processes either.

It's not the same thing.  Those operations are documented as providing
the initial default value for subsequently-started sessions.  The
proposed change in limit values is different because the GUC range
limits have always before been immutable and continuously enforced
for the life of a database instance.

It may be that Kevin's proposal is adequate.  But I find that far
from obvious.  The trend of everything we've done with GUC for the last
ten years is to cause settings changes to apply immediately on-demand
and without "oh, but that's obvious if you know the implementation"
special cases.  I'm not real sure why this should get a free exemption
from that expectation ... or to put it more plainly, I *am* sure that
we'll be expected to fix it later, just like we had to fix the behavior
around removal of postgresql.conf entries, and some other things that
people didn't find as obvious as all that.

regards, tom lane



Re: [HACKERS] Index only scan paving the way for "auto" clustered tables?

2011-10-11 Thread Robert Haas
On Tue, Oct 11, 2011 at 4:00 PM, Florian Pflug  wrote:
> On Oct11, 2011, at 21:27 , Robert Haas wrote:
>> Alternatively, we could try to graft the concept of a self-clustering
>> table on top of the existing heap implementation.  But I'm having
>> trouble seeing how that would work.  The TODO describes it as
>> something like "maintain CLUSTER ordering", but that's a gross
>> oversimplification, because we have no structure that would allow us
>> to sensibly do any such thing...  the current heap implementation is
>> just that: a pile of stuff.
>
> We could still be smarter about where we insert new rows in a clustered
> table, though.
>
> Upon INSERT and UPDATE, we'd need to lookup the leaf page where the new
> tuple will eventually go in the index we're supposed to maintain CLUSTER
> for. Then we'd check if any of the pages referenced there contains enough
> space, and if so place the new tuple there. If not it'd go at the end.

That's an interesting idea, but my guess is that the "if not" clause
would trigger frequently enough to make this not work very well.

Of course, we'd need to test it to know for sure.
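A minimal sketch of the placement logic Florian describes, with plain Python lists standing in for the heap and the cluster index (purely illustrative, not PostgreSQL code):

```python
# Minimal sketch of the idea quoted above: before inserting, look up the
# heap pages referenced by the neighboring cluster-index entries; reuse
# one if it has room, otherwise the tuple "goes at the end".  The heap
# and index here are plain Python lists, not PostgreSQL structures.
import bisect

PAGE_SIZE = 2        # tuples per heap page, tiny on purpose
heap = []            # list of pages, each a list of keys
index = []           # sorted (key, page_no) pairs, i.e. the cluster index

def insert_clustered(key):
    # where would this key land in the cluster index?
    pos = bisect.bisect_left(index, (key,))
    # heap pages referenced by the neighboring index entries
    neighbors = {index[i][1] for i in (pos - 1, pos) if 0 <= i < len(index)}
    for page_no in sorted(neighbors):
        if len(heap[page_no]) < PAGE_SIZE:   # enough free space nearby?
            heap[page_no].append(key)
            bisect.insort(index, (key, page_no))
            return page_no
    heap.append([key])                       # "if not it'd go at the end"
    bisect.insort(index, (key, len(heap) - 1))
    return len(heap) - 1

for k in (10, 20, 30, 15):
    insert_clustered(k)
# heap ends up as [[10, 20], [30], [15]]: key 15 found no room near its
# neighbors, illustrating the fallback case discussed above.
```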

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] COUNT(*) and index-only scans

2011-10-11 Thread Robert Haas
On Tue, Oct 11, 2011 at 3:18 PM, Josh Berkus  wrote:
>
>> The trouble is that if we VACUUM and then ANALYZE, we'll often get
>> back a value very close to 100%, but then the real value may diminish
>> quite a bit before the next auto-analyze fires.  I think if we can
>> figure out what to do about that problem we'll be well on our way...
>
> It's not so much an issue of when the last auto-analyze was as an issue
> of the number of rows in write transactions against that table in the
> last X minutes.  This is where it really hurts us that
> pg_stat_user_tables is not time-based.

The number of write transactions in the last X minutes seems pretty
much irrelevant.

What matters is the number of previously-all-visible pages written
since the last vacuum.
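A toy sketch of the bookkeeping this implies: VACUUM snapshots the all-visible page count, and a counter, fed by something like the stats messages Jeff suggested elsewhere in the thread, derates it as visibility-map bits get cleared. Names are invented for illustration; this is not PostgreSQL code.

```python
# VACUUM records the all-visible fraction; each cleared visibility-map
# bit bumps a counter, so the planner can derate the stale fraction
# instead of trusting the post-VACUUM value.  Illustrative names only.

class VisibilityStats:
    def __init__(self, total_pages):
        self.total_pages = total_pages
        self.all_visible_at_vacuum = 0   # snapshot taken at VACUUM time
        self.bits_cleared_since = 0      # fed by per-backend stats messages

    def vacuum(self, all_visible_pages):
        self.all_visible_at_vacuum = all_visible_pages
        self.bits_cleared_since = 0

    def clear_vm_bit(self):
        # a previously-all-visible page was just written to
        self.bits_cleared_since += 1

    def estimated_all_visible_fraction(self):
        live = self.all_visible_at_vacuum - self.bits_cleared_since
        return max(live, 0) / self.total_pages

stats = VisibilityStats(total_pages=1000)
stats.vacuum(all_visible_pages=990)     # ~99% right after VACUUM
for _ in range(200):                    # subsequent writes dirty 200 pages
    stats.clear_vm_bit()
print(stats.estimated_all_visible_fraction())   # 0.79, not the stale 0.99
```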

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Online base backup from the hot-standby

2011-10-11 Thread Jun Ishiduka

> Some testing notes
> --
> select pg_start_backup('x');
> ERROR: full_page_writes on master is set invalid at least once since
> latest checkpoint
> 
> I think this error should be rewritten as
> ERROR: full_page_writes on master has been off at some point since
> latest checkpoint
> 
> We should be using 'off' instead of 'invalid' since that is what
> the user sets it to.

Sure.


> I switched full_page_writes=on , on the master
> 
> did a pg_start_backup() on the slave1.
> 
> Then I switched full_page_writes=off on the master, did a reload +
> checkpoint.
> 
> I was able to then do my backup of slave1, copy the control file, and
> pg_stop_backup().
>
> When I did the test slave2 started okay, but is this safe? Do we need a
> warning from pg_stop_backup() that is printed if it is detected that
> full_page_writes was turned off on the master during the backup period?

I was also able to reproduce this.

pg_stop_backup() fails in most cases.
However, it succeeds if both of the following conditions are true:
  * a checkpoint is done before walwriter receives SIGHUP.
  * slave1 has not yet received the WAL carrying the 'off' setting.



> Minor typo above at 'CHECKPOTNT'

Yes.


> If my concern about full page writes being switched to off in the middle
> of a backup is unfounded then I think this patch is ready for a
> committer. They can clean the two editorial changes when they apply the
> patches.

Yes. I'll clean these up when I address the comments.


> If do_pg_stop_backup is going to need some logic to recheck the full
> page write status then an updated patch is required.

It already contains that logic.


Regards.


Jun Ishizuka
NTT Software Corporation
TEL:045-317-7018
E-Mail: ishizuka@po.ntts.co.jp






Re: [HACKERS] Dumping roles improvements?

2011-10-11 Thread Josh Berkus

> There seems to be agreement on something for (2), and it won't be hard.
> (1) would probably be much more complex. Essentially we'd need to add a
> new object type for roles, I think. But won't (2) give you most of what
> you need for (1) anyway? AIUI, your problem is that the roles might not
> exist, and so some or all of the dump will fail. But if you have (2) and
> dump the roles without passwords and restore them before running
> pg_restore that wouldn't happen. It won't be one command but it will be
> two or three pretty easy commands.

These serve two different purposes.

The reason I want to have the dependent roles created as part of a
database dump is so that we can ship around dump files as a single file,
and restore them with a single command.  This is considerably simpler
than the current requirements, which are:

1. pg_dumpall the roles
2. pg_dump the database
3. tar both files
4. ship file
5. untar both files
6. psql the role file
7. pg_restore the database file

Since the above is something I'm doing on around 11 different machines
between once a day and once a week, eliminating the 4 extra steps would
be really nice.
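For concreteness, the seven steps above look roughly like this when scripted. Database, file, and host names are hypothetical; the helper only assembles the commands, which a real script would hand one by one to subprocess.run().

```python
# The seven-step ship-a-database workflow, written out as the commands
# one typically ends up scripting today.  Names are hypothetical.

def ship_database_commands(db, host):
    dump = f"{db}.dump"
    return [
        ["pg_dumpall", "--roles-only", "-f", "roles.sql"],       # 1. roles
        ["pg_dump", "-Fc", "-f", dump, db],                      # 2. database
        ["tar", "czf", "bundle.tar.gz", "roles.sql", dump],      # 3. tar
        ["scp", "bundle.tar.gz", f"{host}:"],                    # 4. ship
        ["tar", "xzf", "bundle.tar.gz"],                         # 5. untar (on host)
        ["psql", "-f", "roles.sql", "postgres"],                 # 6. restore roles
        ["pg_restore", "-d", db, dump],                          # 7. restore database
    ]

cmds = ship_database_commands("appdb", "stagehost")
```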

However, we'd also need CREATE OR REPLACE ROLE to really integrate
shipping database copies.  Without that, including roles in the database
dump doesn't help that much.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] SET variable - Permission issues

2011-10-11 Thread Joe Conway
On 10/11/2011 02:07 PM, Kevin Grittner wrote:
> Tom Lane  wrote:
>  
>> This isn't exactly a trivial matter.  What happens for instance if
>> you try to change the limit, and there are already active values
>> outside the limit in some processes?
>  
> I would certainly vote for enforcing on the SET and not causing an
> error on the attempt to change the limit.  (Maybe a notice?)  At the
> time they set the GUC, they were allowed to do so.  It's a bit like
> revoking a user's right to create a table in a schema -- what if
> they've already done so?  You leave the table and you don't let them
> create another.
>  
> What problems do you see with that?

Yeah, I don't know why it needs to be handled any differently than, say,

  ALTER DATABASE foo SET config_param TO value
or
  ALTER ROLE foo SET config_param TO value

These cases do not affect already existing processes either.

Joe


-- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support



Re: [HACKERS] Formatting Curmudgeons WAS: MMAP Buffers

2011-10-11 Thread Bruce Momjian
Andrew Dunstan wrote:
> Attached are two patches, one to remove some infelicity in the entab 
> makefile, and the other to allow skipping specifying the typedefs file 

I have applied the 'entab' Makefile fix.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Overhead cost of Serializable Snapshot Isolation

2011-10-11 Thread Bruce Momjian
Simon Riggs wrote:
> On Tue, Oct 11, 2011 at 10:30 PM, Florian Pflug  wrote:
> 
> > That experience has taught me that backwards compatibility, while very
> > important in a lot of cases, has the potential to do just as much harm
> > if overdone.
> 
> Agreed. Does my suggestion represent overdoing it? I ask for balance,
> not an extreme.

Well, balance is looking at what everyone else in the group is
suggesting, and realizing you might not have all the answers, and
listening.  As far as I can see, you are the _only_ one who thinks it
needs an option.  In that light, your suggestion seems extreme, not
balanced.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Online base backup from the hot-standby

2011-10-11 Thread Steve Singer
On 11-10-11 11:17 AM, Jun Ishiduka wrote:
> Done.
>
> Updated patch attached.
>

I have taken Jun's latest patch and applied it on top of Fujii's most
recent patch. I did some testing with the result, but nothing thorough
enough to stumble on any race conditions.

Some testing notes
--
select pg_start_backup('x');
ERROR: full_page_writes on master is set invalid at least once since
latest checkpoint

I think this error should be rewritten as
ERROR: full_page_writes on master has been off at some point since
latest checkpoint

We should be using 'off' instead of 'invalid' since that is what
the user sets it to.


I switched full_page_writes=on , on the master

did a pg_start_backup() on the slave1.

Then I switched full_page_writes=off on the master, did a reload +
checkpoint.

I was able to then do my backup of slave1, copy the control file, and
pg_stop_backup().
When I did the test slave2 started okay, but is this safe? Do we need a
warning from pg_stop_backup() that is printed if it is detected that
full_page_writes was turned off on the master during the backup period?


Code Notes
-
*** 6865,6870 
--- 6871,6886 
/* Pre-scan prepared transactions to find out the range of XIDs present */
oldestActiveXID = PrescanPreparedTransactions(NULL, NULL);

+ /*
+ * The startup updates FPW in shaerd-memory after REDO. However, it must
+ * perform before writing the WAL of the CHECKPOINT. The reason is that
+ * it uses a value of fpw in shared-memory when it writes a WAL of its
+ * CHECKPOTNT.
+ */

Minor typo above at 'CHECKPOTNT'



If my concern about full page writes being switched to off in the middle
of a backup is unfounded then I think this patch is ready for a
committer. They can clean the two editorial changes when they apply the
patches.

If do_pg_stop_backup is going to need some logic to recheck the full
page write status then an updated patch is required.





> Regards.
>
> 
> Jun Ishizuka
> NTT Software Corporation
> TEL:045-317-7018
> E-Mail: ishizuka@po.ntts.co.jp
> 
>
>
>



Re: [HACKERS] Overhead cost of Serializable Snapshot Isolation

2011-10-11 Thread Bruce Momjian
Simon Riggs wrote:
> On Tue, Oct 11, 2011 at 10:22 PM, Tom Lane  wrote:
> > Simon Riggs  writes:
> >> How could I change the viewpoint of the group without making rational
> >> arguments when it matters?
> >
> > Well, you make your arguments, and you see if you convince anybody.
> > On these specific points, you've failed to sway the consensus AFAICT,
> > and at some point you have to accept that you've lost the argument.
> 
> I'm happy to wait more than 4 hours before trying to judge any
> meaningful consensus.
> 
> Rushing judgement on such points is hardly likely to encourage people
> to speak up, even assuming they have the opportunity.

This is an issue you have been pushing for a very long time on many
fronts --- four hours is not going to change anything.

Have you considered developing a super-backward-compatible version of
Postgres, or a patch set which does this, and seeing if there is any
interest from users?  If you could get major uptake, it would certainly
bolster your argument.  But, until I see that, I am unlikely to listen
to further protestations.

I am much more likely to just ignore your suggestions as "Oh, it is just
Simon on the backward-compatibility bandwagon again", and it causes me
to just mentally filter your ideas, which isn't productive, and I am
sure others will do the same.

Read the MS-SQL post about backward compatibility knobs as proof that
many of us know the risks of too much backward compatibility.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Overhead cost of Serializable Snapshot Isolation

2011-10-11 Thread Simon Riggs
On Tue, Oct 11, 2011 at 10:30 PM, Florian Pflug  wrote:

> That experience has taught me that backwards compatibility, while very
> important in a lot of cases, has the potential to do just as much harm
> if overdone.

Agreed. Does my suggestion represent overdoing it? I ask for balance,
not an extreme.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Overhead cost of Serializable Snapshot Isolation

2011-10-11 Thread Kevin Grittner
Simon Riggs  wrote:
 
> They ask comparative questions like "What is the risk of
> upgrade?", "How much testing is required?"
 
> I never met a customer yet that has an automated test suite
> designed to stress the accuracy of results under concurrent
> workloads
 
I'll try to provide some information here to help you answer those
questions.  I hope it is helpful.
 
The only behavioral difference an unchanged application could see
(short of some as-yet-undiscovered bug) is that they could get more
serialization failures when using serializable isolation level than
they previously got, and that there could be a performance impact. 
It really does nothing except run exactly what serializable mode was
before, while monitoring for conditions which are present when a
race condition between transactions might cause odd results, and
generate a serialization failure as needed to prevent that.
 
It kind of "hangs off the side" of legacy behavior and watches
things.  No new blocking.  No new deadlocks.  No chance of results
you didn't get before.
 
It might also be worth reviewing this page:
 
http://www.postgresql.org/docs/9.1/interactive/transaction-iso.html
 
-Kevin



Re: [HACKERS] Dumping roles improvements?

2011-10-11 Thread Andrew Dunstan



On 10/11/2011 03:50 PM, Josh Berkus wrote:

According to the docs, I assume -r is only roles, while -g includes
tablespace, so what you want is already available in pg_dumpall.

No, it's not.  You don't seem to be actually reading any of my proposals.

(1) I cannot produce a single file in custom dump format which includes
both a single database and all of the roles I need to build that database.

(2) I cannot dump a set of roles without md5 passwords.

Both of these are things I need to support dev/stage/testing integration
at multiple sites.



There seems to be agreement on something for (2), and it won't be hard. 
(1) would probably be much more complex. Essentially we'd need to add a 
new object type for roles, I think. But won't (2) give you most of what 
you need for (1) anyway? AIUI, your problem is that the roles might not 
exist, and so some or all of the dump will fail. But if you have (2) and 
dump the roles without passwords and restore them before running 
pg_restore that wouldn't happen. It won't be one command but it will be 
two or three pretty easy commands.


cheers

andrew



Re: [HACKERS] pg_ctl restart - behaviour based on wrong instance

2011-10-11 Thread Bruce Momjian
Robert Haas wrote:
> On Wed, Mar 23, 2011 at 1:48 AM, Fujii Masao  wrote:
> > On Sat, Mar 19, 2011 at 10:20 AM, Robert Haas  wrote:
> >> On Fri, Mar 18, 2011 at 1:19 PM, Erik Rijkers  wrote:
> >>> This is OK and expected.  But then it continues (in the logfile) with:
> >>>
> >>> FATAL:  lock file "postmaster.pid" already exists
> >>> HINT:  Is another postmaster (PID 20519) running in data directory
> >>> "/var/data1/pg_stuff/pg_installations/pgsql.vanilla_1/data"?
> >>>
> >>> So, complaints about the *other* instance.  It doesn't happen once a
> >>> successful start (with pg_ctl
> >>> start) has happened.
> >>
> >> I'm guessing that leftover postmaster.pid contents might be
> >> responsible for this?
> >
> > The cause is that "pg_ctl restart" uses the postmaster.opts which was
> > created in the primary. Since its content was something like
> > "pg_ctl -D vanilla_1/data", vanilla_1/data/postmaster.pid was checked
> > wrongly.
> >
> > The simple workaround is to exclude postmaster.opts from the backup
> > as well as postmaster.pid. But when postmaster.opts doesn't exist,
> > "pg_ctl restart" cannot start up the server. We might also need to change
> > the code of "pg_ctl restart" so that it does just "pg_ctl start" when
> > postmaster.opts doesn't exist.
> 
> Sounds reasonable.

I looked over this issue and I don't think having pg_ctl restart fall
back to 'start' is a good solution.  I am concerned about cases where we
start a different server without shutting down the old server, for some
reason.  When they say 'restart', I think we have to assume they want a
restart.

What I did do was to document that not backing up postmaster.pid and
postmaster.opts might help prevent pg_ctl from getting confused.

Patch applied and backpatched to 9.1.X.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
new file mode 100644
index b8daedc..737355a
*** a/doc/src/sgml/backup.sgml
--- b/doc/src/sgml/backup.sgml
*** SELECT pg_stop_backup();
*** 869,875 
  of mistakes when restoring.  This is easy to arrange if
  pg_xlog/ is a symbolic link pointing to someplace outside
  the cluster directory, which is a common setup anyway for performance
! reasons.
 
  
 
--- 869,879 
  of mistakes when restoring.  This is easy to arrange if
  pg_xlog/ is a symbolic link pointing to someplace outside
  the cluster directory, which is a common setup anyway for performance
! reasons.  You might also want to exclude postmaster.pid
! and postmaster.opts, which record information
! about the running postmaster, not about the
! postmaster which will eventually use this backup.
! (These files can confuse pg_ctl.)
 
  
 



Re: [HACKERS] Overhead cost of Serializable Snapshot Isolation

2011-10-11 Thread Simon Riggs
On Tue, Oct 11, 2011 at 10:22 PM, Tom Lane  wrote:
> Simon Riggs  writes:
>> How could I change the viewpoint of the group without making rational
>> arguments when it matters?
>
> Well, you make your arguments, and you see if you convince anybody.
> On these specific points, you've failed to sway the consensus AFAICT,
> and at some point you have to accept that you've lost the argument.

I'm happy to wait more than 4 hours before trying to judge any
meaningful consensus.

Rushing judgement on such points is hardly likely to encourage people
to speak up, even assuming they have the opportunity.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Overhead cost of Serializable Snapshot Isolation

2011-10-11 Thread Florian Pflug
On Oct11, 2011, at 22:55 , Simon Riggs wrote:
> Probably as a matter of policy all new features that effect semantics
> should have some kind of compatibility or off switch, if easily
> possible.

There's a huge downside to that, though. After a while, you end up
with a gazillion settings, each influencing behaviour in non-obvious,
subtle ways. Plus, every new code we add would have to be tested against
*all* combinations of these switches. Or, maybe, we'd punt and make
some features work only with "reasonable" settings. And by doing so
cause much frustration of the kind "I need to set X to Y to use feature
Z, but I can't because our app requires X to be set to Y2".

I've recently had to use Microsoft SQL Server for a project, and they
fell into *precisely* this trap. Nearly *everything* is a setting there,
like whether various things follow the ANSI standard (NULLS, CHAR types,
one setting for each), whether identifiers are double-quoted or put between
square brackets, whether loss of precision is an error, ...

And, some of their very own features depend on specific combination of
these settings. Sometimes on the values in effect when the object was
created, sometimes when it's used. For example, their flavour of
materialized views (called "indexed views") requires a bunch of options
to be set correctly to be able to create such an object. Some of these
must even be in effect to update the view's base tables, once the view
is created...

That experience has taught me that backwards compatibility, while very
important in a lot of cases, has the potential to do just as much harm
if overdone.

best regards,
Florian Pflug




Re: [HACKERS] Overhead cost of Serializable Snapshot Isolation

2011-10-11 Thread Simon Riggs
On Tue, Oct 11, 2011 at 10:00 PM, Kevin Grittner
 wrote:
> Simon Riggs  wrote:
>
>> "You'll have to retest your apps" just isn't a good answer
>
> For which major PostgreSQL releases have you recommended that people
> deploy their apps without retesting?

None. People don't always follow my advice, regrettably. They ask
comparative questions like "What is the risk of upgrade?", "How much
testing is required?"

I never met a customer yet that has an automated test suite designed
to stress the accuracy of results under concurrent workloads, so the
inability to control the way a new feature operates makes such
questions more likely to be given an answer that indicates greater
effort and higher risk. That is exactly what I personally would wish
to avoid.

An off switch encourages people to use new features. It is not a
punishment or an admonition to the developer.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Overhead cost of Serializable Snapshot Isolation

2011-10-11 Thread Tom Lane
Simon Riggs  writes:
> How could I change the viewpoint of the group without making rational
> arguments when it matters?

Well, you make your arguments, and you see if you convince anybody.
On these specific points, you've failed to sway the consensus AFAICT,
and at some point you have to accept that you've lost the argument.

regards, tom lane



Re: [HACKERS] Overhead cost of Serializable Snapshot Isolation

2011-10-11 Thread Simon Riggs
On Tue, Oct 11, 2011 at 9:53 PM, Bruce Momjian  wrote:
> Simon Riggs wrote:
>> > Simon seems to value backward-compatibility more than the average
>> > hackers poster.  The lack of complaints about 9.1 I think means that the
>> > hackers decision of _not_ providing a switch was the right one.
>>
>> So its been out 1 month and you think that is sufficient time for us
>> to decide that there are no user complaints about SSI? I doubt it.
>> Longer term I have every confidence that it will be appreciated.
>>
>> I'm keen to ensure people enjoy the possibility of upgrading to the
>> latest release. The continual need to retest applications mean that
>> very few users upgrade quickly or with anywhere near the frequency
>> with which we put out new releases. What is the point of rushing out
>> software that nobody can use? pg_upgrade doesn't change your
>> applications, so there isn't a fast path to upgrade in the way you
>> seem to think.
>
> Simon, I basically think you are swimming up-stream on this issue, and
> on the recovery.conf thread as well.  You can keep arguing that backward
> compatibility warrants more effort, but until there is more general
> agreement in the group, you are going to lose these arguments, and
> frankly, the arguments are getting tiring.

I speak when it is important that someone does so, and only on
specific, real issues.

When I speak, I do so on behalf of my clients and other Postgres users
that suffer the problems created by those issues. I've never given a
viewpoint on list that I know to be the opposite of the majority of
people I represent.

How could I change the viewpoint of the group without making rational
arguments when it matters?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] SET variable - Permission issues

2011-10-11 Thread Kevin Grittner
Tom Lane  wrote:
 
> This isn't exactly a trivial matter.  What happens for instance if
> you try to change the limit, and there are already active values
> outside the limit in some processes?
 
I would certainly vote for enforcing on the SET and not causing an
error on the attempt to change the limit.  (Maybe a notice?)  At the
time they set the GUC, they were allowed to do so.  It's a bit like
revoking a user's right to create a table in a schema -- what if
they've already done so?  You leave the table and you don't let them
create another.
 
What problems do you see with that?
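 
For illustration, the enforcement model Kevin describes can be sketched in a few lines of Python: values are checked against the limit at SET time, and tightening the limit merely flags values that are already out of range rather than erroring. All names here are invented for the sketch; this is not PostgreSQL's actual GUC machinery.

```python
# Toy model (not PostgreSQL internals) of the proposal: enforce a limit
# when a value is SET, and when the limit itself is lowered, keep
# existing out-of-range values but report them (e.g. for a NOTICE).

class GucWithLimit:
    def __init__(self, max_value):
        self.max_value = max_value
        self.session_values = {}        # session id -> current value

    def set_value(self, session, value):
        # Checked at SET time, like revoking CREATE on a schema only
        # prevents *future* table creation.
        if value > self.max_value:
            raise PermissionError("value exceeds the allowed limit")
        self.session_values[session] = value

    def change_limit(self, new_max):
        # Lowering the limit succeeds; sessions already above it keep
        # their values and are merely flagged.
        flagged = [s for s, v in self.session_values.items() if v > new_max]
        self.max_value = new_max
        return flagged

guc = GucWithLimit(max_value=100)
guc.set_value("s1", 80)
print(guc.change_limit(50))   # ['s1'] -- kept at 80, just flagged
```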
 
-Kevin



Re: [HACKERS] Index only scan paving the way for "auto" clustered tables?

2011-10-11 Thread Dimitri Fontaine
Robert Haas  writes:
> Alternatively, we could try to graft the concept of a self-clustering
> table on top of the existing heap implementation.  But I'm having
> trouble seeing how that would work.  The TODO describes it as
> something like "maintain CLUSTER ordering", but that's a gross
> oversimplification, because we have no structure that would allow us
> to sensibly do any such thing...  the current heap implementation is
> just that: a pile of stuff.

I currently think that's the way to go, with some coarser granularity
than tuple or page.  Picture HOT inserts, if you will.  That would be at
the page level, but do we need that level of precision?

I'm thinking that we need something more like segment based here, or
maybe some intermediate value would be good between a page of 8Kb and a
segment of 1GB, but I'm not so sure.  We would have to track the bounds
of each segment for the indexed columns, and maintain them, and the
planner would have to exercise pruning at the segment level.

So going down too much in granularity would have negative impacts on
planning performance (too much data to play with), and anyway a server
that needs that kind of optimization can certainly handle a couple of GB
in its file system cache.

So, it's quite hand-wavy still, but Segment Exclusion has been discussed
here already, and it seems to me that's the next thing we need. Call it
partial Seq Scan and HOT inserts if new names are your thing :)
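 
As a rough illustration of the pruning idea (the numbers and names are hypothetical, and this is ordinary Python rather than backend code), segment exclusion boils down to comparing per-segment min/max bounds against the query predicate:

```python
# Sketch of segment exclusion: each segment of a table tracks min/max
# bounds for an indexed column; a range predicate lets the planner (or
# executor) skip whole segments whose bounds cannot match.

def segments_to_scan(bounds, lo):
    """Indexes of segments that may hold rows with value >= lo."""
    return [i for i, (seg_min, seg_max) in enumerate(bounds)
            if seg_max >= lo]

# A mostly append-only table: older segments hold older values, so a
# query over recent values touches only the tail of the table.
bounds = [(0, 40), (35, 70), (60, 100), (90, 130)]
print(segments_to_scan(bounds, 95))   # [2, 3]: half the table skipped
```

The maintenance cost is keeping those bounds accurate as rows are inserted and updated, which is where the HOT-insert analogy comes in.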

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] Overhead cost of Serializable Snapshot Isolation

2011-10-11 Thread Simon Riggs
On Tue, Oct 11, 2011 at 9:37 PM, Kevin Grittner
 wrote:

> It would certainly be a trivial change to
> implement; the problem is convincing others that it's a good idea.

I don't want it; I just think we need it now. "You'll have to retest
your apps" just isn't a good answer, and we should respect the huge
cost that imposes on our users.

Probably as a matter of policy all new features that affect semantics
should have some kind of compatibility or off switch, if easily
possible.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] index-only scans

2011-10-11 Thread Tom Lane
Alexander Korotkov  writes:
> On Wed, Oct 12, 2011 at 12:35 AM, Tom Lane  wrote:
>> Maybe, instead of a simple constant amcanreturn column, we need an AM
>> API function that says whether the index can return data.

> I like the idea of such an AM API function. Since a single multicolumn
> index can use multiple opclasses, the AM API function should also say
> *what* data the index can return.

I was thinking more like "amcanreturn(index, column_number) returns bool"
which says if the index can return the data for that column.  The AM
would still have to return a full IndexTuple at runtime, but it'd be
allowed to insert nulls or garbage for columns it hadn't promised to
return.
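 
To make the shape of that API concrete, here is a toy Python model of the per-column check (the function names and the name_ops case are illustrative only; the real entry point would live in each AM's C code):

```python
# Toy model of a per-column amcanreturn-style check: the planner asks,
# column by column, whether the index can hand back the original value,
# and only columns that pass can feed an index-only scan.

def btree_canreturn(opclasses, attno):
    # Illustrative rule: every btree column is returnable except ones
    # whose opclass stores a transformed value (here, name_ops, per the
    # name-versus-cstring issue discussed in this thread).
    return opclasses[attno] != "name_ops"

def returnable_columns(canreturn, opclasses):
    return [attno for attno in range(len(opclasses))
            if canreturn(opclasses, attno)]

cols = ["text_ops", "name_ops", "int4_ops"]
print(returnable_columns(btree_canreturn, cols))   # [0, 2]
```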

BTW, if we do this, I'm rather strongly tempted to get rid of the
name-versus-cstring hack (see index_descriptor_hack() in HEAD) by
defining btree name_ops as not capable of returning data.  I don't
trust that hack much at all.

regards, tom lane



Re: [HACKERS] Overhead cost of Serializable Snapshot Isolation

2011-10-11 Thread Kevin Grittner
Simon Riggs  wrote:
 
> "You'll have to retest your apps" just isn't a good answer
 
For which major PostgreSQL releases have you recommended that people
deploy their apps without retesting?
 
-Kevin



Re: [HACKERS] Overhead cost of Serializable Snapshot Isolation

2011-10-11 Thread Bruce Momjian
Simon Riggs wrote:
> > Simon seems to value backward-compatibility more than the average
> > hackers poster.  The lack of complaints about 9.1 I think means that the
> > hackers decision of _not_ providing a switch was the right one.
> 
> So it's been out 1 month and you think that is sufficient time for us
> to decide that there are no user complaints about SSI? I doubt it.
> Longer term I have every confidence that it will be appreciated.
> 
> I'm keen to ensure people enjoy the possibility of upgrading to the
> latest release. The continual need to retest applications means that
> very few users upgrade quickly or with anywhere near the frequency
> with which we put out new releases. What is the point of rushing out
> software that nobody can use? pg_upgrade doesn't change your
> applications, so there isn't a fast path to upgrade in the way you
> seem to think.

Simon, I basically think you are swimming up-stream on this issue, and
on the recovery.conf thread as well.  You can keep arguing that backward
compatibility warrants more effort, but until there is more general
agreement in the group, you are going to lose these arguments, and
frankly, the arguments are getting tiring.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] index-only scans

2011-10-11 Thread Dimitri Fontaine
Tom Lane  writes:
> I haven't thought as far ahead as how we might get the information
> needed for a per-opclass flag.  A syntax addition to CREATE OPERATOR
> CLASS might be the only way.

It looks to me like it's related to the RECHECK property.  Maybe it's
just too late, though.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] Overhead cost of Serializable Snapshot Isolation

2011-10-11 Thread Kevin Grittner
Greg Sabino Mullane  wrote: 
 
> Kevin Grittner:
>  
>> Did these transactions write anything?  If not, were they
>> declared to be READ ONLY?  If they were, in fact, only reading,
>> it would be interesting to see what the performance looks like if
>> the recommendation to use the READ ONLY attribute is followed.
> 
> Yes, I'll definitely look into that, but the great majority of
> the things done in this case are read/write.
 
But it is precisely *because* those were fully cached read-only
transactions that the numbers came out so bad.  As Robert pointed
out, in other loads the difference in time per transaction could be
lost in the noise.
 
Now, I know SSI won't be good fit for all applications, but you
might not want to write it off on performance grounds for an
application where "the great majority of the things done ... are
read/write" based on a test which ran only read-only transactions
without declaring them READ ONLY.
 
-Kevin



Re: [HACKERS] unite recovery.conf and postgresql.conf

2011-10-11 Thread Bruce Momjian
Simon Riggs wrote:
> On Tue, Oct 11, 2011 at 9:28 PM, Bruce Momjian  wrote:
> 
> > Standard conforming strings
> > was tricky because it was more user-facing, or certainly SQL-facing.
> 
> Why is SQL more important than backup?

Because the percentage of database users it affects is different. 
Administrators know when they are installing a new version of Postgres
and already are probably changing these configuration files. 
Application binaries and perhaps application developers are not as aware
of a change, and there are a far higher percentage of them in an
organization than administrators.

> There is no good reason to do this so quickly.

I just gave you a reason above, and as I said, doing backward
compatibility can make the system more complex.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Overhead cost of Serializable Snapshot Isolation

2011-10-11 Thread Tom Lane
Simon Riggs  writes:
> There is no off switch and there should be.

As Greg said, that ship has sailed.  I believe that we specifically
discussed the notion of an "off switch" via a GUC or similar during
9.1 development, and rejected it on the grounds that GUCs changing
fundamental transactional behavior are dangerous.  I don't believe that
you've made a case for changing that decision, and even if you had,
it's too late; 9.1 is what it is.  Can we end this subthread please,
and concentrate on something actually useful, like improving SSI's
performance?

regards, tom lane



Re: [HACKERS] Dumping roles improvements?

2011-10-11 Thread Josh Berkus

> Acording the docs, I assume -r is only roles, while -g includes
> tablespace, so what you want is already available in pg_dumpall.

No, it's not.  You don't seem to be actually reading any of my proposals.

(1) I cannot produce a single file in custom dump format which includes
both a single database and all of the roles I need to build that database.

(2) I cannot dump a set of roles without md5 passwords.

Both of these are things I need to support dev/stage/testing integration
at multiple sites.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] Overhead cost of Serializable Snapshot Isolation

2011-10-11 Thread Simon Riggs
On Tue, Oct 11, 2011 at 9:32 PM, Bruce Momjian  wrote:
> Greg Sabino Mullane wrote:
> -- Start of PGP signed section.
>> > If the normal default_transaction_isolation = read committed and all
>> > transactions that require serializable are explicitly marked in the
>> > application then there is no way to turn off SSI without altering the
>> > application. That is not acceptable, since it causes changes in
>> > application behaviour and possibly also performance issues.
>>
>> Performance, perhaps. What application behavior changes? Fewer
>> serialization conflicts?

If you change default_transaction_isolation then the behaviour of the
application will change.


>> > We should provide a mechanism to allow people to upgrade to 9.1+
>> > without needing to change the meaning and/or performance of their
>> > apps.
>>
>> That ship has sailed.
>
> Simon seems to value backward-compatibility more than the average
> hackers poster.  The lack of complaints about 9.1 I think means that the
> hackers decision of _not_ providing a switch was the right one.

So it's been out 1 month and you think that is sufficient time for us
to decide that there are no user complaints about SSI? I doubt it.
Longer term I have every confidence that it will be appreciated.

I'm keen to ensure people enjoy the possibility of upgrading to the
latest release. The continual need to retest applications means that
very few users upgrade quickly or with anywhere near the frequency
with which we put out new releases. What is the point of rushing out
software that nobody can use? pg_upgrade doesn't change your
applications, so there isn't a fast path to upgrade in the way you
seem to think.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] ALTER EXTENSION .. ADD/DROP weirdness

2011-10-11 Thread Dimitri Fontaine
Robert Haas  writes:
> On Mon, Oct 10, 2011 at 2:52 PM, Tom Lane  wrote:
>> The underlying issue here is whether objects dependent on an extension
>> member should have direct dependencies on the extension too, and if not,
>> how do we prevent that?  The recordDependencyOnCurrentExtension calls
>> don't have enough information to know what to do, I think.

I think the original patch, which didn't have the DEPENDENCY_EXTENSION
tracking but relied on the INTERNAL stuff, recorded only first-level
objects as dependencies.  Given the way INTERNAL dependency following
is done, that kind of worked in a limited set of cases.

> Well, I'm not an expert on this code, but from a user perspective, I
> think it would be nicer if only the view ended up being a member of
> the extension, and the generated types did not.  Otherwise, writing an
> extension upgrade script requires detailed knowledge of what other
> objects are going to be generated internally.  In fact, it doesn't
> seem implausible that the set of internally generated objects from a
> given DDL command could change between releases, which would really be
> rather ugly here.

The reason the original patch got changed by Tom is, of course, that
it failed to work properly in some interesting cases. Specifically,
handling both your use case and extension dependencies (earthdistance
depends on cube) is not so easy. Knowing when you're crossing a
dependency unit while recursing in pg_depend is a nice exercise if you
want to become very familiar with WITH RECURSIVE catalog queries.  Been
there, done that :)

The main test case is DROP EXTENSION earthdistance; adding CASCADE is
easier because you then don't care about stopping at the right place.

Of course I'm just trying to help you figure out why the problem is not
already solved, please feel free to come back with a design that makes it
simple enough :)
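 
For what it's worth, the stopping problem can be illustrated with a toy graph walk (the object names and ownership map below are made up; the real traversal runs against pg_depend):

```python
# Toy illustration of bounded dependency traversal: collect an
# extension's objects by walking a dependency graph, stopping whenever
# the walk crosses into an object owned by a different extension
# (e.g. earthdistance depending on cube).

def extension_members(graph, roots, owner, owners):
    seen, stack = set(), list(roots)
    while stack:
        obj = stack.pop()
        if obj in seen or owners.get(obj) != owner:
            continue   # stop at another extension's territory
        seen.add(obj)
        stack.extend(graph.get(obj, []))
    return seen

graph = {"earth_view": ["earth_type", "cube_type"], "earth_type": []}
owners = {"earth_view": "earthdistance", "earth_type": "earthdistance",
          "cube_type": "cube"}
print(sorted(extension_members(graph, ["earth_view"], "earthdistance",
                               owners)))   # cube's objects are not visited
```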

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] index-only scans

2011-10-11 Thread Alexander Korotkov
On Wed, Oct 12, 2011 at 12:35 AM, Tom Lane  wrote:
>
> Hm.  I had been supposing that lossless compress functions would just be
> no-ops.  If that's not necessarily the case then we might need something
> different from the opclass's decompress function to get back the
> original data.  However, that doesn't really solve the problem I'm
> concerned about, because the existence and use of such a function would
> be entirely internal to GiST.  There still needs to be a way for the
> planner to know which opclasses support data retrieval.  And I do *not*
> want to see us hard-wire "the presence of opclass function 8 means a
> GiST opclass can return data" into the planner.
>
> Maybe, instead of a simple constant amcanreturn column, we need an AM
> API function that says whether the index can return data.
>
I like the idea of such an AM API function. Since a single multicolumn
index can use multiple opclasses, the AM API function should also say
*what* data the index can return.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] Overhead cost of Serializable Snapshot Isolation

2011-10-11 Thread Greg Sabino Mullane
On Tue, Oct 11, 2011 at 04:32:45PM -0400, Bruce Momjian wrote:
...
> Simon seems to value backward-compatibility more than the average
> hackers poster.  The lack of complaints about 9.1 I think means that the
> hackers decision of _not_ providing a swich was the right one.

I wouldn't go that far: 9.1 is very new. Certainly the release notes do 
not explain the change enough: part of the reason I wrote:

http://blog.endpoint.com/2011/09/postgresql-allows-for-different.html

Simon has a point, but I think that having applications switch from
serializable to repeatable read is a price people should pay when
going to 9.1, rather than adding some switch now.

-- 
Greg Sabino Mullane g...@endpoint.com
End Point Corporation
PGP Key: 0x14964AC8




Re: [HACKERS] SET variable - Permission issues

2011-10-11 Thread Tom Lane
"Kevin Grittner"  writes:
> Dimitri Fontaine  wrote:
>> Adding the information visible at the right places is a fun
>> project in itself, too :)
 
> I was thinking a couple new columns in pg_settings (and what backs
> it) would be the main thing, but I haven't searched the source code
> yet.  Does something else leap to mind for you?

This isn't exactly a trivial matter.  What happens for instance if you
try to change the limit, and there are already active values outside the
limit in some processes?

regards, tom lane



Re: [HACKERS] Overhead cost of Serializable Snapshot Isolation

2011-10-11 Thread Kevin Grittner
Simon Riggs  wrote:
> Greg Sabino Mullane  wrote:
 
>> Eh? It has an off switch: repeatable read.
> 
> You mean: if we recode the application and retest it, we can get
> it to work the same way as it used to.
> 
> To most people that is the same thing as "it doesn't work with
> this release"; ask any application vendor.
> 
> There is no off switch and there should be.
 
This was discussed at some length, and nobody seemed to favor a
behavior-changing GUC.  One example of such a thread is here:
 
http://archives.postgresql.org/pgsql-hackers/2009-05/msg01165.php
 
It came up at least a couple other times, and the outcome was always
the same -- after discussion, nobody was in favor of a GUC to make
the semantics of these statements variable.  I'm sorry if you missed
those discussions.  It would certainly be a trivial change to
implement; the problem is convincing others that it's a good idea.
 
-Kevin



Re: [HACKERS] unite recovery.conf and postgresql.conf

2011-10-11 Thread Simon Riggs
On Tue, Oct 11, 2011 at 9:28 PM, Bruce Momjian  wrote:

> Standard conforming strings
> was tricky because it was more user-facing, or certainly SQL-facing.

Why is SQL more important than backup?

There is no good reason to do this so quickly.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Dumping roles improvements?

2011-10-11 Thread Bruce Momjian
Josh Berkus wrote:
> 
> > Acording the docs, I assume -r is only roles, while -g includes
> > tablespace, so what you want is already available in pg_dumpall.
> 
> No, it's not.  You don't seem to be actually reading any of my proposals.
> 
> (1) I cannot produce a single file in custom dump format which includes
> both a single database and all of the roles I need to build that database.
> 
> (2) I cannot dump a set of roles without md5 passwords.
> 
> Both of these are things I need to support dev/stage/testing integration
> at multiple sites.

We are not writing this software for you.  Please submit a clear
proposal.  I am sure you have 10k customers who want this.  :-|

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] index-only scans

2011-10-11 Thread Tom Lane
Alexander Korotkov  writes:
> On Tue, Oct 11, 2011 at 5:22 PM, Tom Lane  wrote:
>> I haven't thought as far ahead as how we might get the information
>> needed for a per-opclass flag.  A syntax addition to CREATE OPERATOR
>> CLASS might be the only way.
>> 
> Shouldn't it be implemented through an additional interface function? There
> are situations when restoring the original tuple requires some transformation.
> For example, in point_ops we store a box in the leaf index tuple, while the
> point can be easily restored from the box.

Hm.  I had been supposing that lossless compress functions would just be
no-ops.  If that's not necessarily the case then we might need something
different from the opclass's decompress function to get back the
original data.  However, that doesn't really solve the problem I'm
concerned about, because the existence and use of such a function would
be entirely internal to GiST.  There still needs to be a way for the
planner to know which opclasses support data retrieval.  And I do *not*
want to see us hard-wire "the presence of opclass function 8 means a
GiST opclass can return data" into the planner.

Maybe, instead of a simple constant amcanreturn column, we need an AM
API function that says whether the index can return data.

regards, tom lane



Re: [HACKERS] Overhead cost of Serializable Snapshot Isolation

2011-10-11 Thread Bruce Momjian
Simon Riggs wrote:
> On Tue, Oct 11, 2011 at 6:44 PM, Kevin Grittner
>  wrote:
> 
> >> If you alter the default_transaction_isolation then you will break
> >> applications like this, so it is not a valid way to turn off SSI.
> >
> > I don't follow you here.  What would break?  In what fashion?  Since
> > the standard allows any isolation level to provide more strict
> > transaction isolation than required, it would be conforming to
> > *only* support serializable transactions, regardless of the level
> > requested.  Not a good idea for some workloads from a performance
> > perspective, but it would be conforming, and any application which
> > doesn't work correctly with that is not written to the standard.
> 
> If the normal default_transaction_isolation = read committed and all
> transactions that require serializable are explicitly marked in the
> application then there is no way to turn off SSI without altering the
> application. That is not acceptable, since it causes changes in
> application behaviour and possibly also performance issues.
> 
> We should provide a mechanism to allow people to upgrade to 9.1+
> without needing to change the meaning and/or performance of their
> apps.
> 
> I strongly support the development of SSI, but I don't support
> application breakage. We can have SSI without breaking anything for
> people that can't or don't want to use it.

The problem is that all the switches needed to allow for "no application
breakage" make configuration of the server and the source code more
complicated.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Overhead cost of Serializable Snapshot Isolation

2011-10-11 Thread Bruce Momjian
Greg Sabino Mullane wrote:
-- Start of PGP signed section.
> > If the normal default_transaction_isolation = read committed and all
> > transactions that require serializable are explicitly marked in the
> > application then there is no way to turn off SSI without altering the
> > application. That is not acceptable, since it causes changes in
> > application behaviour and possibly also performance issues.
> 
> Performance, perhaps. What application behavior changes? Fewer
> serialization conflicts?
> 
> > We should provide a mechanism to allow people to upgrade to 9.1+
> > without needing to change the meaning and/or performance of their
> > apps.
> 
> That ship has sailed.

Simon seems to value backward-compatibility more than the average
hackers poster.  The lack of complaints about 9.1 I think means that the
hackers decision of _not_ providing a switch was the right one.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] COUNT(*) and index-only scans

2011-10-11 Thread Josh Berkus

> The trouble is that if we VACUUM and then ANALYZE, we'll often get
> back a value very close to 100%, but then the real value may diminish
> quite a bit before the next auto-analyze fires.  I think if we can
> figure out what to do about that problem we'll be well on our way...

It's not so much an issue of when the last auto-analyze ran as of how
many rows were written to that table in the last X minutes.  This is
where it really hurts us that pg_stat_user_tables is not time-based.
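 
To make the point concrete, a time-based write counter would let the planner discount a stale all-visible fraction with something as simple as the following (purely illustrative arithmetic, not a proposed formula):

```python
# Illustrative decay of the all-visible fraction: assume, pessimistically,
# that each row written since the last ANALYZE may clear one page's
# visibility-map bit.

def discounted_visible_fraction(measured, total_pages, rows_written):
    dirtied = min(rows_written, total_pages)
    return max(0.0, measured - dirtied / total_pages)

# Measured 98% right after VACUUM, but 300 rows written to a
# 1000-page table since then:
print(discounted_visible_fraction(0.98, 1000, 300))   # roughly 0.68
```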

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] Overhead cost of Serializable Snapshot Isolation

2011-10-11 Thread Simon Riggs
On Tue, Oct 11, 2011 at 9:21 PM, Greg Sabino Mullane  wrote:

> Simon Riggs:
>> Most apps use mixed mode serializable/repeatable read and therefore
>> can't be changed by simple parameter. Rewriting the application isn't
>> a sensible solution.
>>
>> I think it's clear that SSI should have had and still needs an "off
>> switch" for cases that cause performance problems.
>
> Eh? It has an off switch: repeatable read.

You mean: if we recode the application and retest it, we can get it to
work the same way as it used to.

To most people that is the same thing as "it doesn't work with this
release"; ask any application vendor.

There is no off switch and there should be.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Dumping roles improvements?

2011-10-11 Thread Andrew Dunstan



On 10/11/2011 04:19 PM, Tom Lane wrote:
> Andrew Dunstan  writes:
>> On 10/11/2011 12:40 PM, Bruce Momjian wrote:
>>> Josh Berkus wrote:
>>>> pg_dumpall -r --no-passwords which would dump the roles but without
>>>> CREATE PASSWORD statements.  This would be useful for cloning databases
>>>> for use in Dev, Test and Staging, where you don't want to copy the md5s
>>>> of passwords for possible cracking.
>>> What would this do that pg_dumpall --globals-only doesn't?
>> As stated, it would not export the passwords.
> I can see some possible use in a --no-passwords option that's orthogonal
> to all else.  The rest of this seems rather confused...

Yes, I also think that will meet the case.

cheers

andrew



Re: [HACKERS] unite recovery.conf and postgresql.conf

2011-10-11 Thread Bruce Momjian
Simon Riggs wrote:
> On Tue, Oct 11, 2011 at 3:29 PM, Bruce Momjian  wrote:
> 
> > As much as I appreciate Simon's work in this area, I think we are still
> > unclear if keeping backward-compatibility is worth the complexity
> > required for future users.  Historically we have been bold in changing
> > postgresql.conf settings to improve clarity, and that approach has
> > served us well.
> 
> You raise a good point. First, thank you for the respectful comment;
> my viewpoint is not formed from resistance to change per se, even if it
> may appear to be so.  Thank you for raising that possibility to allow
> me to explain and refute that.
> 
> I am genuinely concerned that we show respect to downstream software
> that uses our APIs and have no personal or corporate ulterior motive.
> 
> Most people are used to the 3-year development cycle on which
> SQL Server and Oracle have now standardised. Our 1-year cycle provides
> a considerable benefit in agility, but it also brings 3x the
> complexity in release management and a continual temptation to change
> for no good reason. I want to encourage people to adopt our APIs, not
> give them a headache for attempting to do so. We know that software
> exists that follows the previous API and we should take steps to
> deprecate that across multiple releases, with appropriate notice, just
> as we do in other cases, such as standard conforming strings where our
> lack of boldness is appropriate.

Well, let me be specific.  Around 2003 to 2006, we added many new
configuration parameters for logging, which required renaming or
removing older parameters.  There really wasn't a smooth way to allow
for this to be done without impacting users, and the current system we
have enjoyed since 2006 is logical only because we made the changes
necessary.

We can look at trying to phase changes in, but often the phasing becomes
more complicated than just making the change.  Logging parameter changes
were easier because it was assumed logging was an admin-only task, as I
assume PITR and replication are as well.  Standard conforming strings
was tricky because it was more user-facing, or certainly SQL-facing.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] unite recovery.conf and postgresql.conf

2011-10-11 Thread Simon Riggs
On Tue, Oct 11, 2011 at 3:29 PM, Bruce Momjian  wrote:

> As much as I appreciate Simon's work in this area, I think we are still
> unclear if keeping backward-compatibility is worth the complexity
> required for future users.  Historically we have been bold in changing
> postgresql.conf settings to improve clarity, and that approach has
> served us well.

You raise a good point. First, thank you for the respectful comment;
my viewpoint is not formed from resistance to change per se, even if it
may appear to be so.  Thank you for raising that possibility to allow
me to explain and refute that.

I am genuinely concerned that we show respect to downstream software
that uses our APIs and have no personal or corporate ulterior motive.

Most people are used to the 3 year cycle of development on which
SQLServer and Oracle have now standardised. Our 1 year cycle provides
a considerable benefit in agility, but it also provides for x3
complexity in release management and a continual temptation to change
for no good reason. I want to encourage people to adopt our APIs, not
give them a headache for attempting to do so. We know that software
exists that follows the previous API and we should take steps to
deprecate that across multiple releases, with appropriate notice, just
as we do in other cases, such as standard conforming strings where our
lack of boldness is appropriate.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] COUNT(*) and index-only scans

2011-10-11 Thread Simon Riggs
On Tue, Oct 11, 2011 at 5:45 AM, Greg Stark  wrote:
> On Mon, Oct 10, 2011 at 9:17 PM, Tom Lane  wrote:
>> My intention was to allow it to consider any covering index.  You're
>> thinking about the cost estimate, which is really entirely different.
>>
>
> Is there any reason to consider more than one? I would have expected
> the narrowest one to be the best choice. There's something to be said
> for using the same index consistently but we already have that problem
> and make no attempt to do that. And partial indexes might be better
> but then we would already be considering them if their constraints are
> satisfied.

You raise a fantastic idea: use an index's frequency of use as a
factor in the cost of optimising a query.

We have previously discussed the idea of using the RAM residency of an
index to control the cost. That is difficult to judge.

Using the long term prevalence of usage as a weighting factor makes a
great deal of sense for queries that could potentially utilise
multiple indexes. That information is readily available and directly
applicable. The prevalence of use directly drives RAM residency, so it
makes sense to use the causal factor as input to the cost.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Overhead cost of Serializable Snapshot Isolation

2011-10-11 Thread Greg Sabino Mullane
> If the normal default_transaction_isolation = read committed and all
> transactions that require serializable are explicitly marked in the
> application then there is no way to turn off SSI without altering the
> application. That is not acceptable, since it causes changes in
> application behaviour and possibly also performance issues.

Performance, perhaps. What application behavior changes? Fewer
serialization conflicts?

> We should provide a mechanism to allow people to upgrade to 9.1+
> without needing to change the meaning and/or performance of their
> apps.

That ship has sailed.

-- 
Greg Sabino Mullane g...@endpoint.com
End Point Corporation
PGP Key: 0x14964AC8




Re: [HACKERS] Index only scan paving the way for "auto" clustered tables?

2011-10-11 Thread Kääriäinen Anssi
Robert Haas wrote:
"""
And it seems to me that there could easily be format changes that
would make sense for particular cases, but not across the board,
like:

- index-organized tables (heap is a btree, and secondary indexes
reference the PK rather than the TID; this is how MySQL does it, and
Oracle offers it as an option)
- WORM tables (no updates or deletes, and no inserts after creating
transaction commits, allowing a much smaller tuple header)
- non-transactional tables (tuples visible as soon as they're written,
again allowing for smaller tuple header; useful for internal stuff and
perhaps for insert-only log tables)
"""

This is probably a silly idea, but I have been wondering about the
following idea: Instead of having visibility info in the row header,
have a couple of row visibility slots in the page header. These slots
could be shared between rows in the page, so that if you do a bulk
insert/update/delete you would only use one slot. If the slots
overflow, you would use an external slots buffer.

When the row is all visible, no slot would be used at all.

The xmin, xmax and cid would be in the slots. ctid would have its
current meaning, except when the external slots would be used,
then ctid would point to the external slot, and it would have the real
row header. I don't know if there would be any other row header
parts which could be shared.

The external slots buffer would then contain xmin, xmax, cid and
the real ctid.

Updates would write the new rows to another page in the heap,
and old rows would stay in place, just as now. So there would not
be any redo log like configuration. Also, the external slots buffer
would be small (18 bytes per row), so it would not get out of
cache too easily.

The performance would suck if you had lots of small updates, or
long running transactions. On the other hand in data warehousing,
where bulk loads are normal, and there are a lot of small rows,
this could actually work.

As said, this is probably a silly idea. But as pluggable heap types
came up, I thought to ask whether this could actually work. If this kind of
wondering post is inappropriate for this list, please tell me so
that I can avoid them in the future.

 - Anssi Kääriäinen



Re: [HACKERS] Overhead cost of Serializable Snapshot Isolation

2011-10-11 Thread Greg Sabino Mullane
Robert Haas:
> Serializable mode is much slower on this test, though.  On
> REL9_1_STABLE, it's about 8% slower with a single client.  At 8
> clients, the difference rises to 43%, and at 32 clients, it's 51%
> slower.

Bummer. Thanks for putting some numbers out there; glad I was able 
to jump start a deeper look at this. Based on this thread so far, 
I am probably going to avoid serializable in this particular case, 
and stick to repeatable read. Once things are in place, perhaps I'll 
be able to try switching to serializable and get some measurements, 
but I wanted to see if the impact was minor enough to safely start 
with serializable. Seems not. :) Keep in mind this is not even a 
formal proposal yet for our client, so any benchmarks from me may 
be quite a while in coming.

Kevin Grittner:

> Did these transactions write anything?  If not, were they declared
> to be READ ONLY?  If they were, in fact, only reading, it would be
> interesting to see what the performance looks like if the
> recommendation to use the READ ONLY attribute is followed.

Yes, I'll definitely look into that, but the great majority of the 
things done in this case are read/write.

Simon Riggs:
> Most apps use mixed mode serializable/repeatable read and therefore
> can't be changed by simple parameter. Rewriting the application isn't
> a sensible solution.
> 
> I think it's clear that SSI should have had and still needs an "off
> switch" for cases that cause performance problems.

Eh? It has an off switch: repeatable read.


Thanks to all who replied to this thread; it's been very helpful.


-- 
Greg Sabino Mullane g...@endpoint.com
End Point Corporation
PGP Key: 0x14964AC8




Re: [HACKERS] Overhead cost of Serializable Snapshot Isolation

2011-10-11 Thread Simon Riggs
On Tue, Oct 11, 2011 at 6:44 PM, Kevin Grittner
 wrote:

>> If you alter the default_transaction_isolation then you will break
>> applications like this, so it is not a valid way to turn off SSI.
>
> I don't follow you here.  What would break?  In what fashion?  Since
> the standard allows any isolation level to provide more strict
> transaction isolation than required, it would be conforming to
> *only* support serializable transactions, regardless of the level
> requested.  Not a good idea for some workloads from a performance
> perspective, but it would be conforming, and any application which
> doesn't work correctly with that is not written to the standard.

If the normal default_transaction_isolation = read committed and all
transactions that require serializable are explicitly marked in the
application then there is no way to turn off SSI without altering the
application. That is not acceptable, since it causes changes in
application behaviour and possibly also performance issues.

We should provide a mechanism to allow people to upgrade to 9.1+
without needing to change the meaning and/or performance of their
apps.

I strongly support the development of SSI, but I don't support
application breakage. We can have SSI without breaking anything for
people that can't or don't want to use it.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Dumping roles improvements?

2011-10-11 Thread Tom Lane
Andrew Dunstan  writes:
> On 10/11/2011 12:40 PM, Bruce Momjian wrote:
>> Josh Berkus wrote:
>>> pg_dumpall -r --no-passwords which would dump the roles but without
>>> CREATE PASSWORD statements.  This would be useful for cloning databases
>>> for use in Dev, Test and Staging, where you don't what to copy the md5s
>>> of passwords for possible cracking.

>> What would this do that pg_dumpall --globals-only doesn't?

> As stated, it would not export the passwords.

I can see some possible use in a --no-passwords option that's orthogonal
to all else.  The rest of this seems rather confused...

regards, tom lane



Re: [HACKERS] SET variable - Permission issues

2011-10-11 Thread Dimitri Fontaine
"Kevin Grittner"  writes:
> Well, we've identified a few people who like the idea, but I'm not
> sure we have the degree of consensus we normally look for before
> putting something on the TODO list.  After the discussion on this
> thread, are there still any *objections* to allowing bounds or
> subsets to be SUSET to limit GUC values more strictly than the
> limits hard-coded in C?

No objection here, I like this whole idea.  Making the information
visible in the right places is a fun project in itself, too :)

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] Index only scan paving the way for "auto" clustered tables?

2011-10-11 Thread Florian Pflug
On Oct11, 2011, at 21:27 , Robert Haas wrote:
> Alternatively, we could try to graft the concept of a self-clustering
> table on top of the existing heap implementation.  But I'm having
> trouble seeing how that would work.  The TODO describes it as
> something like "maintain CLUSTER ordering", but that's a gross
> oversimplification, because we have no structure that would allow us
> to sensibly do any such thing...  the current heap implementation is
> just that: a pile of stuff.

We could still be smarter about where we insert new rows in a clustered
table, though.

Upon INSERT and UPDATE, we'd need to lookup the leaf page where the new
tuple will eventually go in the index we're supposed to maintain CLUSTER
for. Then we'd check if any of the pages referenced there contains enough
space, and if so place the new tuple there. If not it'd go at the end.

best regards,
Florian Pflug




Re: [HACKERS] B-tree parent pointer and checkpoints

2011-10-11 Thread Bruce Momjian
Heikki Linnakangas wrote:
> On 11.03.2011 19:41, Tom Lane wrote:
> > Heikki Linnakangas  writes:
> >> On 11.03.2011 17:59, Tom Lane wrote:
> >>> But that will be fixed during WAL replay.
> >
> >> Not under the circumstances that started the original thread:
> >
> >> 1. Backend splits a page
> >> 2. Checkpoint starts
> >> 3. Checkpoint runs to completion
> >> 4. Crash
> >> (5. Backend never got to insert the parent pointer)
> >
> >> WAL replay starts at the checkpoint redo pointer, which is after the
> >> page split record, so WAL replay won't insert the parent pointer. That's
> >> an incredibly tight window to hit in practice, but it's possible in theory.
> >
> > Hmm.  It's not so improbable that checkpoint would start inside that
> > window, but that the parent insertion is still pending by the time the
> > checkpoint finishes is pretty improbable.
> >
> > How about just reducing the deletion-time ERROR for missing downlink to a 
> > LOG?
> 
> Well, the code that follows expects to have a valid parent page locked, 
> so you can't literally do just that. But yeah, LOG and aborting the page 
> deletion seems fine to me.

Added to TODO:

Fix problem with btree page splits during checkpoints

http://archives.postgresql.org/pgsql-hackers/2010-11/msg00052.php
http://archives.postgresql.org/pgsql-hackers/2011-09/msg00184.php 

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-11 Thread Tom Lane
Florian Pflug  writes:
> On Oct11, 2011, at 14:43 , David Fetter wrote:
>> I'd recoil at not having ranges default to left-closed, right-open.
>> The use case for that one is so compelling that I'm OK with making it
>> the default from which deviations need to be specified.

I agree with David on this.

> The downside of that is that, as Tom pointed out upthread, we cannot
> make [) the canonical representation of ranges.

Yeah, we certainly *can* do that, we just have to allow ranges that
include the last element of the domain to be corner cases that require
special handling.  If we don't want to just fail, we have to
canonicalize them to closed instead of open ranges.  It does not follow
that the default on input has to be closed.

Note that the INT_MAX case is probably not the worst issue in practice.
What is going to be an issue is ranges over enum types, where having the
last element being part of the range is a much more likely use-case.

regards, tom lane



Re: [HACKERS] Dumping roles improvements?

2011-10-11 Thread Bruce Momjian
Josh Berkus wrote:
> 
> > What is the logic for not dumping passwords but the CREATE ROLE
> > statement?  I don't see how anyone would recognize that behavior as
> > logical.  If you want to add a --no-passwords option to pg_dumpall, that
> > seems more logical to me.
> 
> That's what I'm suggesting.
> 
> Incidentally, what's the difference between -g and -r in terms of actual
> output, anyway?

According to the docs, I assume -r is only roles, while -g includes
tablespace, so what you want is already available in pg_dumpall.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Intermittent regression test failure from index-only scans patch

2011-10-11 Thread Tom Lane
Magnus Hagander  writes:
> On Sun, Oct 9, 2011 at 06:34, Tom Lane  wrote:
>> Robert Haas  writes:
>>> Should we have another counter for heap fetches avoided?  Seems like that 
>>> could be useful to know.

>> Hm.  I'm hesitant to add another per-table (or per index?) statistics
>> counter because of the resultant bloat in the stats file.

> We certainly need *some* way to figure out if this has been used,
> IMHO. So yeah, if the current way doesn't scale enough, we need to
> think of some other way. But I'm not sure one more counter would
really bloat it that bad? OTOH, repeating that reasoning enough times
> will eventually make it enough to care about...

You can already tell whether it's happening by comparing idx_tup_read
versus idx_tup_fetch.  Now that measure does conflate some things, like
whether the tuple was not read at all or was read and rejected as not
visible, but I'm not at all convinced that another counter is worth its
weight.  If invisible tuples are a significant part of the table then
index-only scanning isn't going to be very useful to you anyway.

regards, tom lane



Re: [HACKERS] Dumping roles improvements?

2011-10-11 Thread Josh Berkus

> What is the logic for not dumping passwords but the CREATE ROLE
> statement?  I don't see how anyone would recognize that behavior as
> logical.  If you want to add a --no-passwords option to pg_dumpall, that
> seems more logical to me.

That's what I'm suggesting.

Incidentally, what's the difference between -g and -r in terms of actual
output, anyway?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] Intermittent regression test failure from index-only scans patch

2011-10-11 Thread Magnus Hagander
On Sun, Oct 9, 2011 at 06:34, Tom Lane  wrote:
> Robert Haas  writes:
>> On Oct 8, 2011, at 11:04 AM, Tom Lane  wrote:
>>> I'm inclined to fix this by changing the test to examine idx_tup_read
>>> not idx_tup_fetch.  Alternatively, we could have the test force
>>> enable_indexonlyscan off.  Thoughts?
>
>> No preference.
>
> I ended up doing it the second way (ie enable_indexonlyscan = off)
> because it turns out that pg_stat_user_tables doesn't have the
> idx_tup_read column --- we track that count per index, not per table.
> I could have complicated the test's stats queries some more, but it
> seemed quite not relevant to the goals of the test.
>
>> Should we have another counter for heap fetches avoided?  Seems like that 
>> could be useful to know.
>
> Hm.  I'm hesitant to add another per-table (or per index?) statistics
> counter because of the resultant bloat in the stats file.  But it
> wouldn't be a bad idea for somebody to take two steps back and rethink
> what we're counting in this area.  The current counter definitions are
> mostly backwards-compatible with pre-8.1 behavior, and it seems like the
> goalposts have moved enough that maybe it's time to break compatibility.

We certainly need *some* way to figure out if this has been used,
IMHO. So yeah, if the current way doesn't scale enough, we need to
think of some other way. But I'm not sure one more counter would
really bloat it that bad? OTOH, repeating that reasoning enough times
will eventually make it enough to care about...

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Index only scan paving the way for "auto" clustered tables?

2011-10-11 Thread Robert Haas
On Tue, Oct 11, 2011 at 3:02 PM, Kevin Grittner
 wrote:
> Robert Haas  wrote:
>>> [implement "clustered index" as a covering index with all columns
>>> which are present in the heap]
>> I guess we could do that, but I'm not convinced there would be
>> much benefit.
>
> The "traditional" way to implement a clustered index is to have the
> leaf level of the index contain the tuples rather than pointers to
> the tuples.  If we're going to do clustered tables, we might want to
> jump all the way to that, rather than a half-way solution which
> stores everything twice.

Not a bad thought.

Actually, I've been thinking for a while now that we might need a
pluggable heapam, similar to the pluggable indexams we already have.
Our current heap format has served us pretty well, but there are any
number of things that we can't really do without changing it.  Of
course, if we came up with a new format that was better in every case,
across the board, then perhaps we'd be willing to just replace the
current format outright -- though even then, that would break
pg_upgrade, which would be painful, to put it mildly.  And it seems to
me that there could easily be format changes that would make sense for
particular cases, but not across the board, like:

- index-organized tables (heap is a btree, and secondary indexes
reference the PK rather than the TID; this is how MySQL does it, and
Oracle offers it as an option)
- WORM tables (no updates or deletes, and no inserts after creating
transaction commits, allowing a much smaller tuple header)
- non-transactional tables (tuples visible as soon as they're written,
again allowing for smaller tuple header; useful for internal stuff and
perhaps for insert-only log tables)

Alternatively, we could try to graft the concept of a self-clustering
table on top of the existing heap implementation.  But I'm having
trouble seeing how that would work.  The TODO describes it as
something like "maintain CLUSTER ordering", but that's a gross
oversimplification, because we have no structure that would allow us
to sensibly do any such thing...  the current heap implementation is
just that: a pile of stuff.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] unite recovery.conf and postgresql.conf

2011-10-11 Thread Josh Berkus
On 10/10/11 9:53 PM, Fujii Masao wrote:
> Or you think that, to keep the backward compatibility completely,
> recovery.conf should be used as not only a configuration file but also a
> recovery trigger one and it should be renamed to recovery.done at
> the end of recovery?

That's precisely my point.  The trigger file nature of recovery.conf is
a problem in itself, and I don't see any way to support that and fix it
at the same time.  Maybe Simon can?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] Index only scan paving the way for "auto" clustered tables?

2011-10-11 Thread Kevin Grittner
Robert Haas  wrote:
 
>> [implement "clustered index" as a covering index with all columns
>> which are present in the heap]

> I guess we could do that, but I'm not convinced there would be
> much benefit. 
 
The "traditional" way to implement a clustered index is to have the
leaf level of the index contain the tuples rather than pointers to
the tuples.  If we're going to do clustered tables, we might want to
jump all the way to that, rather than a half-way solution which
stores everything twice.
 
-Kevin



Re: [HACKERS] SET variable - Permission issues

2011-10-11 Thread Joe Conway
On 10/11/2011 11:53 AM, Kevin Grittner wrote:
> Bruce Momjian  wrote:
>> Is this a TODO?  We might not want to make work_mem SUSET, but it 
>> would allow administrators to control this.
> 
> Well, we've identified a few people who like the idea, but I'm not 
> sure we have the degree of consensus we normally look for before 
> putting something on the TODO list.

That's pretty much what I was thinking.

> After the discussion on this thread, are there still any *objections*
> to allowing bounds or subsets to be SUSET to limit GUC values more
> strictly than the limits hard-coded in C?

No objections here.

Joe

-- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support



Re: [HACKERS] Index only scan paving the way for "auto" clustered tables?

2011-10-11 Thread Robert Haas
On Tue, Oct 11, 2011 at 7:08 AM, Royce Ausburn  wrote:
> I wonder, could the recent work on index only scans pave the way for auto 
> clustered tables?  Consider a wide, mostly insert table with some subset of 
> columns that I'd like to cluster on.  I'm after locality of tuples that are 
> very frequently fetched together, but not keen on the downtime for a cluster, 
> nor the maintenance that it requires.  Would it be a stretch to have an index 
> that branches on the subset of "cluster" columns, but still stores all the 
> columns, making it a covering index?  Given that we can already index 
> concurrently, such an index would not require downtime, and would be self 
> maintaining.  From my understanding of the index-only scan implementation, I 
> suspect that such an index would effectively give locality, with some caveats…

I guess we could do that, but I'm not convinced there would be much
benefit.  The only thing you'd be saving would be the cost of keeping
the tuples sorted by only the high-order columns rather than all of
them, and I doubt that's significant.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] SET variable - Permission issues

2011-10-11 Thread Kevin Grittner
Bruce Momjian  wrote:
> Kevin Grittner wrote:
>> Joe Conway  wrote:
>>> On 10/10/2011 01:52 PM, Gurjeet Singh wrote:
>>  
 ALTER USER novice SET MIN_VAL OF statement_timeout TO '1';
 -- So that the user cannot turn off the timeout
 
 ALTER DATABASE super_reliable SET ENUM_VALS OF
   synchronous_commit TO 'on';
 -- So that the user cannot change the synchronicity of
 transactions against this database.
>>> 
>>> I like this better than GRANT/REVOKE on SET.
>>  
>> +1
>>  
>> I would really like a way to prevent normal users from switching
>> from the default transaction isolation level I set.  This seems
>> like a good way to do that.  Putting sane bounds on some other
>> settings, more to protect against the accidental bad settings
>> than malicious mischief, would be a good thing, too.
> 
> Is this a TODO?  We might not want to make work_mem SUSET, but it
> would allow administrators to control this.
 
Well, we've identified a few people who like the idea, but I'm not
sure we have the degree of consensus we normally look for before
putting something on the TODO list.  After the discussion on this
thread, are there still any *objections* to allowing bounds or
subsets to be SUSET to limit GUC values more strictly than the
limits hard-coded in C?
 
-Kevin



Re: [HACKERS] Proposal: casts row to array and array to row

2011-10-11 Thread Merlin Moncure
On Tue, Oct 11, 2011 at 8:36 AM, Tom Lane  wrote:
> Pavel Stehule  writes:
>> 2011/10/11 Robert Haas :
>>> On Tue, Oct 11, 2011 at 4:40 AM, Pavel Stehule  
>>> wrote:
 What do you think about this idea?
>
> It's a bad one.
>
>>> Well, a ROW can contain values of different types; an ARRAY can't.
>
>> yes, I know - but it should be problem only in few cases - when is not
>> possible to cast a row field to array field.
>
> This idea is basically the same as "data types don't matter", which is
> not SQL-ish and certainly not Postgres-ish.

hm. I agree, but if it were possible to create sql/plpgsql functions
accepting 'record', then you could at least rig the cast in userland
around hstore without resorting to hacky text manipulation and/or
flattening the record to text before doing the operation.

merlin



Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-11 Thread David Fetter
On Tue, Oct 11, 2011 at 12:18:18PM -0400, Robert Haas wrote:
> On Tue, Oct 11, 2011 at 12:12 PM, David Fetter  wrote:
> > Nothing's bad about '[]' per se.  What's better, but possibly out
> > of the reach of our current lexing and parsing system, would be
> > things like:
> >
> > [1::int, 10)
> 
> That's been discussed before.  Aside from the parser issues (which
> are formidable) it would break brace-matching in most if not all
> commonly used editors.

That being the situation, ubiquitous support for the natural syntax
looks like it's a decade away, minimum. :(

Trying to be cheery,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-11 Thread Tom Lane
Jeff Davis  writes:
> Tom made an observation about '[1,INT_MAX]' throwing an error because
> canonicalization would try to increment INT_MAX. But I'm not
> particularly disturbed by it. If you want a bigger range, use int8range
> or numrange -- the same advice we give to people who want unsigned
> types. Or, for people who really need the entire range of signed int4
> exactly, they can easily make their own range type that canonicalizes to
> '[]'.

I agree we shouldn't contort the entire design to avoid that corner
case.  We should, however, make sure that the increment *does* throw
an error, and not just silently overflow.

regards, tom lane



Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-11 Thread Jeff Davis
On Tue, 2011-10-11 at 12:40 -0400, Robert Haas wrote:
> I think using '[)' is fine.  At some level, this is just a question of
> expectations.  If you expect that int4range(1,4) will create a range
> that includes 4, well, you're wrong.  Once you get used to it, it will
> seem normal, and you'll know that you need to write
> int4range(1,4,'[]') if that's what you want.  As long as the system is
> designed around a set of consistent and well-thought-out principles,
> people will get used to the details.  I don't see that the idea of a
> half-open range over a discrete-valued type is particularly confusing
> - we use them all the time in programming, when we make the end
> pointer point to the byte following the end of the array, rather than
> the last element - but even if it is, overall design consistency
> trumps what someone may find to be the absolutely perfect behavior in
> some particular case.  And saving typing is nearly always good -
> unless it creates a LOT more confusion than I think this will.

That sounds very reasonable to me.

Tom made an observation about '[1,INT_MAX]' throwing an error because
canonicalization would try to increment INT_MAX. But I'm not
particularly disturbed by it. If you want a bigger range, use int8range
or numrange -- the same advice we give to people who want unsigned
types. Or, for people who really need the entire range of signed int4
exactly, they can easily make their own range type that canonicalizes to
'[]'.

Regards,
Jeff Davis




Re: [HACKERS] Overhead cost of Serializable Snapshot Isolation

2011-10-11 Thread Kevin Grittner
Simon Riggs  wrote:
 
> It's common to find applications that have some transactions
> explicitly coded to use SERIALIZABLE mode, while the rest are in
> the default mode READ COMMITTED. So common that TPC-E benchmark
> has been written as a representation of such workloads.
 
I would be willing to bet that any such implementations assume S2PL,
and would not prevent anomalies as expected unless all transactions
are serializable.
 
> The reason this is common is that some transactions require
> SERIALIZABLE as a "fix" for transaction problems.
 
That is a mode of thinking which doesn't work if you only assume
serializable provides the guarantees required by the standard.  Many
people assume otherwise.  It does *not* guarantee blocking on
conflicts, and it does not require that transactions appear to have
executed in the order of successful commit.  It requires only that
the result of concurrently running any mix of serializable
transactions produce a result consistent with some one-at-a-time
execution of those transactions.  Rollback of transactions to
prevent violations of that guarantee is allowed.  I don't see any
guarantees about how serializable transactions interact with
non-serializable transactions beyond each transaction not seeing any
of the phenomena prohibited for its isolation level.
 
> If you alter the default_transaction_isolation then you will break
> applications like this, so it is not a valid way to turn off SSI.
 
I don't follow you here.  What would break?  In what fashion?  Since
the standard allows any isolation level to provide more strict
transaction isolation than required, it would be conforming to
*only* support serializable transactions, regardless of the level
requested.  Not a good idea for some workloads from a performance
perspective, but it would be conforming, and any application which
doesn't work correctly with that is not written to the standard.
 
-Kevin



Re: [HACKERS] Overhead cost of Serializable Snapshot Isolation

2011-10-11 Thread Kevin Grittner
Robert Haas  wrote:
> Kevin Grittner  wrote:
 
>> Did these transactions write anything?  If not, were they
>> declared to be READ ONLY?  If they were, in fact, only reading,
>> it would be interesting to see what the performance looks like if
>> the recommendation to use the READ ONLY attribute is followed.
> 
> pgbench -S doesn't do any writes, or issue any transaction control
> statements.  It just fires off SELECT statements against a single
> table as fast as it can, retrieving values from rows chosen at
> random.  Each SELECT implicitly begins and ends a transaction.
 
So that test could be accomplished by setting
default_transaction_read_only to on.  That's actually what we're
doing, because we have a lot more of them than of read-write
transactions.  But, with the scripts I can confirm the performance
of that on this end.  It should be indistinguishable from the
repeatable read line; if not, there's something to look at there.
 
> Possibly the system could gaze upon the SELECT statement and infer
> that the one-statement transaction induced thereby can't possibly
> write any tuples, and mark it read-only automatically, but I'm
> actually not that excited about that approach
 
I wasn't intending to suggest that.  In fact I hadn't really thought
of it.  It might be a fun optimization, although it would be well
down my list, and it wouldn't be trivial because you couldn't use it
for any statements with volatile functions -- so the statement would
need to be planned far enough to know whether that was the case
before making this decision.  In fact, I'm not sure the community
would want to generate an error if a user marked a function other
than volatile and ran it in this way.  Definitely not something to
even look at any time soon.
 
> trying to fix the lwlock contention that's causing the headache in
> the first place seems like a better use of time, assuming it's
> possible to make some headway there.
 
Absolutely.  I just thought the timings with READ ONLY would make
for an interesting data point.  For one thing, it might reassure
people that even this artificial use case doesn't perform that
badly if the advice in the documentation is heeded.  For another, a
result slower than repeatable read would be a surprise that might
point more directly to the problem.
 
> My general observation is that, on this machine, a lightweight
> lock that is taken in exclusive mode by a series of lockers in
> quick succession seems to max out around 16-20 clients, and the
> curve starts to bend well before that.
 
OK, I will keep that in mind.
 
Thanks,
 
-Kevin



Re: [HACKERS] Overhead cost of Serializable Snapshot Isolation

2011-10-11 Thread Simon Riggs
On Tue, Oct 11, 2011 at 6:14 PM, Robert Haas  wrote:
> On Tue, Oct 11, 2011 at 1:11 PM, Simon Riggs  wrote:
>> On Mon, Oct 10, 2011 at 11:31 PM, Kevin Grittner
>>  wrote:
>>> Simon Riggs  wrote:
>>>
 How do we turn it on/off to allow the overhead to be measured?
>>>
>>> User REPEATABLE READ transactions or SERIALIZABLE transactions.  The
>>> easiest way, if you're doing it for all transactions (which I
>>> recommend) is to set default_transaction_isolation.
>>
>> Most apps use mixed mode serializable/repeatable read and therefore
>> can't be changed by simple parameter. Rewriting the application isn't
>> a sensible solution.
>>
>> I think it's clear that SSI should have had and still needs an "off
>> switch" for cases that cause performance problems.
>
> Is it possible that you are confusing the default level, which is READ
> COMMITTED, with REPEATABLE READ?  I can't see why anyone would code up
> their application to use REPEATABLE READ for some things and
> SERIALIZABLE for other things unless they were explicitly trying to
> turn SSI off for a subset of their transactions.  In all releases
> prior to 9.0, REPEATABLE READ and SERIALIZABLE behaved identically, so
> there wouldn't be any reason for a legacy app to mix-and-match between
> the two.

Yes, I mistyped "read" when I meant "committed". You are right to
point out there is no problem if people were using repeatable read and
serializable.

Let me retype, so there is no confusion:

It's common to find applications that have some transactions
explicitly coded to use SERIALIZABLE mode, while the rest are in the
default mode READ COMMITTED. So common that TPC-E benchmark has been
written as a representation of such workloads. The reason this is
common is that some transactions require SERIALIZABLE as a "fix" for
transaction problems.

If you alter the default_transaction_isolation then you will break
applications like this, so it is not a valid way to turn off SSI.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] COUNT(*) and index-only scans

2011-10-11 Thread Robert Haas
On Tue, Oct 11, 2011 at 12:43 PM, Bruce Momjian  wrote:
> Greg Stark wrote:
>> On Mon, Oct 10, 2011 at 9:17 PM, Tom Lane  wrote:
>> > My intention was to allow it to consider any covering index.  You're
>> > thinking about the cost estimate, which is really entirely different.
>> >
>>
>> Is there any reason to consider more than one? I would have expected
>> the narrowest one to be the best choice. There's something to be said
>> for using the same index consistently but we already have that problem
>> and make no attempt to do that. And partial indexes might be better
>> but then we would already be considering them if their constraints are
>> satisfied.
>
> Actually, I think the smallest non-partial one on disk might be the best
> --- that is very easy to find out.

I doubt there is any need to write special-purpose code to decide
which index ought to be used for a full table scan.  We can just throw
all of the otherwise-useless indexes into the costing machinery with
empty pathkeys, and let them duke it out.  All but the best one will
be instantly discarded, and the best one will either beat or lose to a
sequential scan.  All of this will happen before we start trying to
build join paths, so there's no combinatorial explosion in planning
time - it'll just be a straightforward cost comparison between plans
with identical pathkeys, and should be quite fast.
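As a toy illustration of that comparison (the numbers below are
invented, not real planner costs):

```python
def cheapest_scan(paths):
    """Keep only the lowest-cost candidate, as described above: each
    otherwise-useless covering index is offered with empty pathkeys,
    so the decision reduces to a plain cost comparison."""
    return min(paths, key=lambda p: p["cost"])

candidates = [
    {"scan": "seqscan",          "cost": 1000.0},
    {"scan": "idx_small (IOS)",  "cost": 180.0},   # invented costs
    {"scan": "idx_wide (IOS)",   "cost": 420.0},
]
print(cheapest_scan(candidates)["scan"])  # idx_small (IOS)
```

All but the cheapest candidate are discarded before join planning
starts, so the extra candidates add no combinatorial cost.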

The real issue is that the costing estimates need to be accurate, and
that's where the rubber hits the road.  Otherwise, even if we pick the
right way to scan the table, we may do silly things up the line when
we go to start constructing the join order.  I think we need to beef
up ANALYZE to gather statistics on the fraction of the pages that are
marked all-visible, or maybe VACUUM should gather that information.
The trouble is that if we VACUUM and then ANALYZE, we'll often get
back a value very close to 100%, but then the real value may diminish
quite a bit before the next auto-analyze fires.  I think if we can
figure out what to do about that problem we'll be well on our way...
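The statistic itself is simple: the fraction of heap pages whose
all-visible bit is set.  A toy sketch (one flag per heap page here;
the real visibility map packs these bits densely):

```python
def all_visible_fraction(vm_bits):
    """Fraction of heap pages marked all-visible (0.0 if no pages)."""
    if not vm_bits:
        return 0.0
    return sum(vm_bits) / len(vm_bits)

# Right after VACUUM nearly everything is all-visible...
print(all_visible_fraction([1] * 98 + [0] * 2))   # 0.98
# ...but writes clear bits, so a vacuum-time snapshot goes stale:
print(all_visible_fraction([1] * 60 + [0] * 40))  # 0.6
```

The hard part is not computing the number but keeping it from going
stale between vacuum/analyze runs, which is the problem described above.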

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Overhead cost of Serializable Snapshot Isolation

2011-10-11 Thread Robert Haas
On Tue, Oct 11, 2011 at 1:11 PM, Simon Riggs  wrote:
> On Mon, Oct 10, 2011 at 11:31 PM, Kevin Grittner
>  wrote:
>> Simon Riggs  wrote:
>>
>>> How do we turn it on/off to allow the overhead to be measured?
>>
>> User REPEATABLE READ transactions or SERIALIZABLE transactions.  The
>> easiest way, if you're doing it for all transactions (which I
>> recommend) is to set default_transaction_isolation.
>
> Most apps use mixed mode serializable/repeatable read and therefore
> can't be changed by simple parameter. Rewriting the application isn't
> a sensible solution.
>
> I think it's clear that SSI should have had and still needs an "off
> switch" for cases that cause performance problems.

Is it possible that you are confusing the default level, which is READ
COMMITTED, with REPEATABLE READ?  I can't see why anyone would code up
their application to use REPEATABLE READ for some things and
SERIALIZABLE for other things unless they were explicitly trying to
turn SSI off for a subset of their transactions.  In all releases
prior to 9.0, REPEATABLE READ and SERIALIZABLE behaved identically, so
there wouldn't be any reason for a legacy app to mix-and-match between
the two.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Overhead cost of Serializable Snapshot Isolation

2011-10-11 Thread Robert Haas
On Tue, Oct 11, 2011 at 12:46 PM, Kevin Grittner
 wrote:
> Robert Haas  wrote:
>> I ran my good old pgbench -S, scale factor 100, shared_buffers =
>> 8GB test on Nate Boley's box.  I ran it on both 9.1 and 9.2dev,
>> and at all three isolation levels.  As usual, I took the median of
>> three 5-minute runs, which I've generally found adequate to
>> eliminate the noise. On both 9.1 and 9.2dev, read committed and
>> repeatable read have basically identical performance; if anything,
>> repeatable read may be slightly better - which would make sense,
>> if it cuts down the number of snapshots taken.
>
> Right.  Thanks for running this.  Could you give enough details to
> allow reproducing on this end (or point to a previous post with the
> details)?

Sure, it's pretty much just a vanilla pgbench -S run, but the scripts
I used are attached here.  I build the head of each branch using the
"test-build" script and then used the "runtestiso" script to drive the
test runs.  These scripts are throwaway so they're not really
documented, but hopefully it's clear enough what it's doing.  The
server itself is a 32-core AMD 6128.

>> Data are attached, in text form and as a plot.  I excluded the
>> repeatable read results from the plot as they just clutter it up -
>> they're basically on top of the read committed results.
>
> That was kind, but really the REPEATABLE READ results are probably
> the more meaningful comparison, even if they are more embarrassing.
> :-(

They're neither more nor less embarrassing - they're pretty much not
different at all.  I just didn't see any point in making a graph with
6 lines on it when you could only actually see 4 of them.

> Did these transactions write anything?  If not, were they declared
> to be READ ONLY?  If they were, in fact, only reading, it would be
> interesting to see what the performance looks like if the
> recommendation to use the READ ONLY attribute is followed.

pgbench -S doesn't do any writes, or issue any transaction control
statements.  It just fires off SELECT statements against a single
table as fast as it can, retrieving values from rows chosen at random.
 Each SELECT implicitly begins and ends a transaction.  Possibly the
system could gaze upon the SELECT statement and infer that the
one-statement transaction induced thereby can't possibly write any
tuples, and mark it read-only automatically, but I'm actually not that
excited about that approach - trying to fix the lwlock contention
that's causing the headache in the first place seems like a better use
of time, assuming it's possible to make some headway there.

My general observation is that, on this machine, a lightweight lock
that is taken in exclusive mode by a series of lockers in quick
succession seems to max out around 16-20 clients, and the curve starts
to bend well before that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


[Attachment: test-build (binary data)]
[Attachment: runtestiso (binary data)]



Re: [HACKERS] Overhead cost of Serializable Snapshot Isolation

2011-10-11 Thread Simon Riggs
On Mon, Oct 10, 2011 at 11:31 PM, Kevin Grittner
 wrote:
> Simon Riggs  wrote:
>
>> How do we turn it on/off to allow the overhead to be measured?
>
> User REPEATABLE READ transactions or SERIALIZABLE transactions.  The
> easiest way, if you're doing it for all transactions (which I
> recommend) is to set default_transaction_isolation.

Most apps use mixed mode serializable/repeatable read and therefore
can't be changed by simple parameter. Rewriting the application isn't
a sensible solution.

I think it's clear that SSI should have had and still needs an "off
switch" for cases that cause performance problems.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Dumping roles improvements?

2011-10-11 Thread Bruce Momjian
Andrew Dunstan wrote:
> 
> 
> On 10/11/2011 12:40 PM, Bruce Momjian wrote:
> > Josh Berkus wrote:
> >> It occurs to me that we could really use two things to make it easier to
> >> move copies of database stuff around:
> >>
> >> pg_dump -r, which would include a CREATE ROLE for all roles needed to
> >> restore the database (probably without passwords), and
> >>
> >> pg_dumpall -r --no-passwords which would dump the roles but without
> >> CREATE PASSWORD statements.  This would be useful for cloning databases
> >> for use in Dev, Test and Staging, where you don't want to copy the md5s
> >> of passwords for possible cracking.
> > What would this do that pg_dumpall --globals-only doesn't?
> >
> 
> As stated, it would not export the passwords.

What is the logic for dumping the CREATE ROLE statements but not the
passwords?  I don't see how anyone would recognize that behavior as
logical.  If you want to add a --no-passwords option to pg_dumpall, that
seems more logical to me.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Overhead cost of Serializable Snapshot Isolation

2011-10-11 Thread Kevin Grittner
Robert Haas  wrote: 
 
> I ran my good old pgbench -S, scale factor 100, shared_buffers =
> 8GB test on Nate Boley's box.  I ran it on both 9.1 and 9.2dev,
> and at all three isolation levels.  As usual, I took the median of
> three 5-minute runs, which I've generally found adequate to
> eliminate the noise. On both 9.1 and 9.2dev, read committed and
> repeatable read have basically identical performance; if anything,
> repeatable read may be slightly better - which would make sense,
> if it cuts down the number of snapshots taken.
 
Right.  Thanks for running this.  Could you give enough details to
allow reproducing on this end (or point to a previous post with the
details)?
 
> Serializable mode is much slower on this test, though.  On
> REL9_1_STABLE, it's about 8% slower with a single client.  At 8
> clients, the difference rises to 43%, and at 32 clients, it's 51%
> slower.  On 9.2devel, raw performance is somewhat higher (e.g.
> +51% at 8 clients) but the performance when not using SSI has
> improved so much that the performance gap between serializable and
> the other two isolation levels is now huge: with 32 clients, in
> serializable mode, the median result was 21114.577645 tps; in read
> committed, 218748.929692 tps - that is, read committed is running
> more than ten times faster than serializable.
 
Yeah.  I was very excited to see your numbers as you worked on that,
but I've been concerned that with the "Performance Whack A Mole"
nature of things (to borrow a term from Josh Berkus), SSI
lightweight locks might be popping their heads up.
 
> Data are attached, in text form and as a plot.  I excluded the
> repeatable read results from the plot as they just clutter it up -
> they're basically on top of the read committed results.
 
That was kind, but really the REPEATABLE READ results are probably
the more meaningful comparison, even if they are more embarrassing. 
:-(
 
> I haven't run this with LWLOCK_STATS, but my seat-of-the-pants
> guess is that there's a single lightweight lock that everything is
> bottlenecking on.
 
The lock in question is SerializableXactHashLock.  A secondary
problem is SerializableFinishedListLock, which is used for
protecting cleanup of old transactions.  This is per Dan's reports;
he had a better look at it on a 16-core machine, but it is consistent
with what I saw on fewer cores.
 
Early in development we had a bigger problem with
SerializablePredicateLockListLock, but Dan added a local map to
eliminate contention during the lock promotion decision, and I reworked
that lock from the SHARED read and EXCLUSIVE write approach to the
SHARED for accessing your own data and EXCLUSIVE for accessing data
for another process technique.  Combined, that made the problems
with that negligible.
 
> One possible difference between this test case and the ones you
> may have used is that this case involves lots and lots of really
> short transactions that don't do much.
 
I did some tests like that, but not on a box with that many
processors, and I probably didn't try using a thread count more than
double the core count, so I probably never ran into the level of
contention you're seeing.  The differences at the low connection
counts are surprising to me.  Maybe it will make more sense when I
see the test case.  There's also some chance that late elimination
of some race conditions found in testing affected this, and I didn't
re-run those tests late enough to see that.  Not sure.
 
> The effect of anything that only happens once or a few times per
> transaction is really magnified in this type of workload (which is
> why the locking changes make so much of a difference here - in a
> longer or heavier-weight transaction that stuff would be lost in
> the noise).
 
Did these transactions write anything?  If not, were they declared
to be READ ONLY?  If they were, in fact, only reading, it would be
interesting to see what the performance looks like if the
recommendation to use the READ ONLY attribute is followed.  That's
at the top of the list of performance tips for SSI at:
 
http://www.postgresql.org/docs/9.1/interactive/transaction-iso.html#XACT-SERIALIZABLE
 
Anyway, this isolates a real issue, even if the tests exaggerate it
beyond what anyone is likely to see in production.  Once this CF is
over, I'll put a review of this at the top of my PG list.
 
-Kevin



Re: [HACKERS] Dumping roles improvements?

2011-10-11 Thread Andrew Dunstan



On 10/11/2011 12:40 PM, Bruce Momjian wrote:

Josh Berkus wrote:

It occurs to me that we could really use two things to make it easier to
move copies of database stuff around:

pg_dump -r, which would include a CREATE ROLE for all roles needed to
restore the database (probably without passwords), and

pg_dumpall -r --no-passwords which would dump the roles but without
CREATE PASSWORD statements.  This would be useful for cloning databases
for use in Dev, Test and Staging, where you don't want to copy the md5s
of passwords for possible cracking.

What would this do that pg_dumpall --globals-only doesn't?



As stated, it would not export the passwords.

cheers

andrew



Re: [HACKERS] COUNT(*) and index-only scans

2011-10-11 Thread Bruce Momjian
Greg Stark wrote:
> On Mon, Oct 10, 2011 at 9:17 PM, Tom Lane  wrote:
> > My intention was to allow it to consider any covering index.  You're
> > thinking about the cost estimate, which is really entirely different.
> >
> 
> Is there any reason to consider more than one? I would have expected
> the narrowest one to be the best choice. There's something to be said
> for using the same index consistently but we already have that problem
> and make no attempt to do that. And partial indexes might be better
> but then we would already be considering them if their constraints are
> satisfied.

Actually, I think the smallest non-partial one on disk might be the best
--- that is very easy to find out.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Dumping roles improvements?

2011-10-11 Thread Bruce Momjian
Josh Berkus wrote:
> It occurs to me that we could really use two things to make it easier to
> move copies of database stuff around:
> 
> pg_dump -r, which would include a CREATE ROLE for all roles needed to
> restore the database (probably without passwords), and
> 
> pg_dumpall -r --no-passwords which would dump the roles but without
> CREATE PASSWORD statements.  This would be useful for cloning databases
> for use in Dev, Test and Staging, where you don't want to copy the md5s
> of passwords for possible cracking.

What would this do that pg_dumpall --globals-only doesn't?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-11 Thread Robert Haas
On Tue, Oct 11, 2011 at 12:30 PM, Jeff Davis  wrote:
> On Tue, 2011-10-11 at 12:09 -0400, Robert Haas wrote:
>> The cure seems worse than the disease.  What is so bad about '[]'?
>
> OK, so we stick with the 3-argument form. Do we have a default for the
> third argument, or do we scrap it to avoid confusion?
>
> There were some fairly strong objections to using '[]' as the default or
> having the default vary between types. So, the only real option
> remaining, if we do have a default, is '[)'.

I think using '[)' is fine.  At some level, this is just a question of
expectations.  If you expect that int4range(1,4) will create a range
that includes 4, well, you're wrong.  Once you get used to it, it will
seem normal, and you'll know that you need to write
int4range(1,4,'[]') if that's what you want.  As long as the system is
designed around a set of consistent and well-thought-out principles,
people will get used to the details.  I don't see that the idea of a
half-open range over a discrete-valued type is particularly confusing
- we use them all the time in programming, when we make the end
pointer point to the byte following the end of the array, rather than
the last element - but even if it is, overall design consistency
trumps what someone may find to be the absolutely perfect behavior in
some particular case.  And saving typing is nearly always good -
unless it creates a LOT more confusion than I think this will.
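The half-open convention alluded to above pays off because ranges
compose without off-by-one corrections, as a small Python sketch shows:

```python
def length(r):
    lo, hi = r
    return hi - lo                 # no "+ 1" adjustment needed

def adjacent(a, b):
    # [a0, a1) and [b0, b1) tile exactly when one ends where the other begins
    return a[1] == b[0] or b[1] == a[0]

h1, h2 = (0, 6), (6, 12)           # two half-open halves of [0, 12)
print(length(h1) + length(h2))     # 12; the boundary 6 is not double-counted
print(adjacent(h1, h2))            # True
```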

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-11 Thread Jeff Davis
On Tue, 2011-10-11 at 06:28 -0700, David Fetter wrote:
> > Certainly not the end of the world, but is the convenience of being
> > able to write somerange(a, b) instead of somerange(a, b, '[)')
> > really worth it? I kind of doubt that...
> 
> You're making a persuasive argument for the latter based solely on the
> clarity.  If people see that 3rd element in the DDL, or need to
> provide it, it's *very* obvious what's going on.

That was how I originally thought, but we're also providing built-in
range types like tsrange and daterange. I could see how if the former
excluded the endpoint and the latter included it, it could be confusing.

We could go back to having different constructor names for different
inclusivity; e.g. int4range_cc(1,10). That at least removes the
awkwardness of typing (and seeing) '[]'.

Regards,
Jeff Davis




Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-11 Thread Jeff Davis
On Tue, 2011-10-11 at 12:09 -0400, Robert Haas wrote:
> The cure seems worse than the disease.  What is so bad about '[]'?

OK, so we stick with the 3-argument form. Do we have a default for the
third argument, or do we scrap it to avoid confusion?

There were some fairly strong objections to using '[]' as the default or
having the default vary between types. So, the only real option
remaining, if we do have a default, is '[)'.

Regards,
Jeff Davis





Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-11 Thread Robert Haas
On Tue, Oct 11, 2011 at 12:12 PM, David Fetter  wrote:
> Nothing's bad about '[]' per se.  What's better, but possibly out of
> the reach of our current lexing and parsing system, would be things
> like:
>
> [1::int, 10)

That's been discussed before.  Aside from the parser issues (which are
formidable) it would break brace-matching in most if not all commonly
used editors.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-11 Thread David Fetter
On Tue, Oct 11, 2011 at 12:09:01PM -0400, Robert Haas wrote:
> On Tue, Oct 11, 2011 at 12:03 PM, Jeff Davis  wrote:
> > On Tue, 2011-10-11 at 06:28 -0700, David Fetter wrote:
> >> > Certainly not the end of the world, but is the convenience of being
> >> > able to write somerange(a, b) instead of somerange(a, b, '[)')
> >> > really worth it? I kind of doubt that...
> >>
> >> You're making a persuasive argument for the latter based solely on the
> >> clarity.  If people see that 3rd element in the DDL, or need to
> >> provide it, it's *very* obvious what's going on.
> >
> > That was how I originally thought, but we're also providing built-in
> > range types like tsrange and daterange. I could see how if the former
> > excluded the endpoint and the latter included it, it could be confusing.
> >
> > We could go back to having different constructor names for different
> > inclusivity; e.g. int4range_cc(1,10). That at least removes the
> > awkwardness of typing (and seeing) '[]'.
> 
> The cure seems worse than the disease.  What is so bad about '[]'?

Nothing's bad about '[]' per se.  What's better, but possibly out of
the reach of our current lexing and parsing system, would be things
like:

[1::int, 10)

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-11 Thread Robert Haas
On Tue, Oct 11, 2011 at 12:03 PM, Jeff Davis  wrote:
> On Tue, 2011-10-11 at 06:28 -0700, David Fetter wrote:
>> > Certainly not the end of the world, but is the convenience of being
>> > able to write somerange(a, b) instead of somerange(a, b, '[)')
>> > really worth it? I kind of doubt that...
>>
>> You're making a persuasive argument for the latter based solely on the
>> clarity.  If people see that 3rd element in the DDL, or need to
>> provide it, it's *very* obvious what's going on.
>
> That was how I originally thought, but we're also providing built-in
> range types like tsrange and daterange. I could see how if the former
> excluded the endpoint and the latter included it, it could be confusing.
>
> We could go back to having different constructor names for different
> inclusivity; e.g. int4range_cc(1,10). That at least removes the
> awkwardness of typing (and seeing) '[]'.

The cure seems worse than the disease.  What is so bad about '[]'?
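[Editor's note: for readers skimming the thread, the two constructor styles under discussion look roughly like this. The bounds-flag syntax follows the patch being discussed; the suffixed constructor names are the hypothetical alternative floated upthread, and neither is guaranteed to match what finally ships.]

-- Bounds-flag style: a third text argument spells out inclusivity.
SELECT int4range(1, 10, '[)');   -- includes 1, excludes 10
SELECT int4range(1, 10, '[]');   -- includes both endpoints

-- Suffix style: one constructor per inclusivity combination.
-- SELECT int4range_cc(1, 10);   -- hypothetical "closed-closed" variant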

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Overhead cost of Serializable Snapshot Isolation

2011-10-11 Thread Robert Haas
On Mon, Oct 10, 2011 at 3:59 PM, Kevin Grittner
 wrote:
> I do have some concern about whether the performance improvements
> from reduced LW locking contention elsewhere in the code may (in
> whack-a-mole fashion) cause the percentages to go higher in SSI.
> The biggest performance issues in some of the SSI benchmarks were on
> LW lock contention, so those may become more noticeable as other
> contention is reduced.  I've been trying to follow along on the
> threads regarding Robert's work in that area, with hopes of applying
> some of the same techniques to SSI, but it's not clear whether I'll
> have time to work on that for the 9.2 release.  (It's actually
> looking improbable at this point.)

I ran my good old pgbench -S, scale factor 100, shared_buffers = 8GB
test on Nate Boley's box.  I ran it on both 9.1 and 9.2dev, and at all
three isolation levels.  As usual, I took the median of three 5-minute
runs, which I've generally found adequate to eliminate the noise.  On
both 9.1 and 9.2dev, read committed and repeatable read have basically
identical performance; if anything, repeatable read may be slightly
better - which would make sense, if it cuts down the number of
snapshots taken.
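[Editor's note: for anyone reproducing this comparison, the isolation level under test can be selected per session or per transaction with standard SQL; the exact pgbench invocation is described in the surrounding text.]

-- Per-session default, e.g. before running the pgbench SELECT workload:
SET default_transaction_isolation TO 'serializable';

-- Or per transaction:
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT 1;
COMMIT;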

Serializable mode is much slower on this test, though.  On
REL9_1_STABLE, it's about 8% slower with a single client.  At 8
clients, the difference rises to 43%, and at 32 clients, it's 51%
slower.  On 9.2devel, raw performance is somewhat higher (e.g. +51% at
8 clients) but the performance when not using SSI has improved so much
that the performance gap between serializable and the other two
isolation levels is now huge: with 32 clients, in serializable mode,
the median result was 21114.577645 tps; in read committed,
218748.929692 tps - that is, read committed is running more than ten
times faster than serializable.  Data are attached, in text form and
as a plot.  I excluded the repeatable read results from the plot as
they just clutter it up - they're basically on top of the read
committed results.

I haven't run this with LWLOCK_STATS, but my seat-of-the-pants guess
is that there's a single lightweight lock that everything is
bottlenecking on.  One possible difference between this test case and
the ones you may have used is that this case involves lots and lots of
really short transactions that don't do much.  The effect of anything
that only happens once or a few times per transaction is really
magnified in this type of workload (which is why the locking changes
make so much of a difference here - in a longer or heavier-weight
transaction that stuff would be lost in the noise).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
# clients  PG91.read-committed  PG91.repeatable-read  PG91.serializable  PG92dev.read-committed  PG92dev.repeatable-read  PG92dev.serializable
1          4309.029470          4309.786279           3997.033815        4420.435075             4432.991835              4158.783631
8          26881.573137         27105.962897          15604.313328       33369.189146            33744.015354             23607.824697
16         44578.177731         44347.193447          21687.135630       62690.803360            63391.583338             23308.801395
32         38725.876808         38773.523347          17995.772382       218748.929692           218928.089856            21114.577645


Re: [HACKERS] Online base backup from the hot-standby

2011-10-11 Thread Jun Ishiduka

> > I can't see a reason why we would use a new WAL record for this,
> > rather than modify the XLOG_PARAMETER_CHANGE record type which was
> > created for a very similar reason.
> > The code would be much simpler if we just extend
> > XLOG_PARAMETER_CHANGE, so please can we do that?
> 
> Sure.
> 
> > The log message "full_page_writes on master is set invalid more than
> > once during online backup" should read "at least once" rather than
> > "more than once".
> 
> Yes.
> 
> > lastFpwDisabledLSN needs to be initialized.
> 
> I don't think it's needed, because all values in XLogCtl are
> initialized to 0.
> 
> > Is there a reason to add lastFpwDisabledLSN onto the Control file? If
> > we log parameters after every checkpoint then we'll know the values
> > when we startup. If we keep logging parameters this way we'll end up
> > with a very awkward and large control file. I would personally prefer
> > to avoid that, but that thought could go either way. Let's see if
> > anyone else thinks that also.
> 
> Yes. I add to CreateCheckPoint().
> 
> Image:
>   CreateCheckPoint()
>   {
>  if (!shutdown && XLogStandbyInfoActive())
>  {
> LogStandbySnapshot()
> XLogReportParameters()
>  }
>}
> 
>   XLogReportParameters()
>   {
>  if (fpw == 'off' || ... )
>  XLOGINSERT()
>   }
> 
> However, it'll write XLOG_PARAMETER_CHANGE at every checkpoint when FPW is 
> 'off'.
> (It will increase the amount of WAL.)
> Is it OK?

Done.

Updated patch attached.

Regards.


Jun Ishizuka
NTT Software Corporation
TEL:045-317-7018
E-Mail: ishizuka@po.ntts.co.jp



standby_online_backup_09base-02fpw.patch
Description: Binary data



Re: [HACKERS] SET variable - Permission issues

2011-10-11 Thread Bruce Momjian
Kevin Grittner wrote:
> Joe Conway  wrote:
> > On 10/10/2011 01:52 PM, Gurjeet Singh wrote:
>  
> >> ALTER USER novice SET MIN_VAL OF statement_timeout TO '1';
> >> -- So that the user cannot turn off the timeout
> >> 
> >> ALTER DATABASE super_reliable SET ENUM_VALS OF synchronous_commit
> >>   TO 'on';
> >> -- So that the user cannot change the synchronicity of
> >> transactions against this database.
> > 
> > I like this better than GRANT/REVOKE on SET.
>  
> +1
>  
> I would really like a way to prevent normal users from switching
> from the default transaction isolation level I set.  This seems like
> a good way to do that.  Putting sane bounds on some other settings,
> more to protect against the accidental bad settings than malicious
> mischief, would be a good thing, too.

Is this a TODO?  We might not want to make work_mem SUSET, but it would
allow administrators to control this.
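[Editor's note: what exists today is only a per-database or per-role default, which any user can override in-session; the quoted proposals would additionally pin the value. A sketch - the MIN_VAL syntax is Gurjeet's hypothetical proposal, not implemented:]

-- Today: sets a default, but a user can still SET it back in-session.
ALTER DATABASE super_reliable
  SET default_transaction_isolation TO 'serializable';

-- Proposed (hypothetical syntax from upthread):
-- ALTER USER novice SET MIN_VAL OF statement_timeout TO '1';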

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] unite recovery.conf and postgresql.conf

2011-10-11 Thread Bruce Momjian
Fujii Masao wrote:
> On Tue, Oct 11, 2011 at 6:37 AM, Simon Riggs  wrote:
> > On Mon, Oct 10, 2011 at 6:52 PM, Josh Berkus  wrote:
> >
> >>> Tatsuo/Josh/Robert also discussed how recovery.conf can be used to
> >>> provide parameters solely for recovery. That is difficult to do
> >>> without causing all downstream tools to make major changes in the ways
> >>> they supply parameters.
> >>
> >> Actually, this case is easily solved by an "include recovery.conf"
> >> parameter. ?So it's a non-issue.
> >
> > That is what I've suggested and yes, doing that is straightforward.
> 
> Even if we do that, you still need to modify the tool so that it can handle
> the recovery trigger file. recovery.conf is used as just a configuration file
> (not recovery trigger file at all). It's not renamed to recovery.done at the
> end of recovery. If the tool depends on the renaming from recovery.conf
> to recovery.done, it also would need to be modified. If the tool needs to
> be changed anyway, why do you hesitate in changing it so that it adds
> "include recovery.conf" into postgresql.conf automatically?
> 
> Or you think that, to keep the backward compatibility completely,
> recovery.conf should be used as not only a configuration file but also a
> recovery trigger one and it should be renamed to recovery.done at
> the end of recovery?

As much as I appreciate Simon's work in this area, I think we are still
unclear if keeping backward-compatibility is worth the complexity
required for future users.  Historically we have been bold in changing
postgresql.conf settings to improve clarity, and that approach has
served us well.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Proposal: casts row to array and array to row

2011-10-11 Thread Tom Lane
Pavel Stehule  writes:
> 2011/10/11 Robert Haas :
>> On Tue, Oct 11, 2011 at 4:40 AM, Pavel Stehule  
>> wrote:
>>> What do you think about this idea?

It's a bad one.

>> Well, a ROW can contain values of different types; an ARRAY can't.

> yes, I know - but it should be problem only in few cases - when is not
> possible to cast a row field to array field.

This idea is basically the same as "data types don't matter", which is
not SQL-ish and certainly not Postgres-ish.

regards, tom lane



Re: [HACKERS] index-only scans

2011-10-11 Thread PostgreSQL - Hans-Jürgen Schönig
On Oct 7, 2011, at 8:47 PM, Joshua D. Drake wrote:

> 
> On 10/07/2011 11:40 AM, Tom Lane wrote:
>> Robert Haas  writes:
>>> Please find attached a patch implementing a basic version of
>>> index-only scans.
>> 
>> I'm making some progress with this, but I notice what seems like a
>> missing feature: there needs to be a way to turn it off.  Otherwise
>> performance comparisons will be difficult to impossible.
>> 
>> The most obvious solution is a planner control GUC, perhaps
>> "enable_indexonlyscan".  Anyone object, or want to bikeshed the name?
> 
> enable_onlyindexscan
> 
> I'm kidding.
> 
> +1 on Tom's proposed name.


+1 ...
definitely an important thing to do.
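[Editor's note: assuming the GUC lands under Tom's proposed name, usage would presumably mirror the existing enable_* planner settings - a sketch, since the setting did not exist at the time of writing:]

SET enable_indexonlyscan = off;  -- disable for a performance comparison
EXPLAIN SELECT count(*) FROM pgbench_accounts;
SET enable_indexonlyscan = on;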

regards,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de




Re: [HACKERS] [v9.2] DROP statement reworks

2011-10-11 Thread Robert Haas
On Mon, Oct 10, 2011 at 1:38 PM, Kohei KaiGai  wrote:
> I'm sorry again. I thought it was obvious from the filenames.

I guess I got confused because you re-posted part 2 without the other
parts, and I got mixed up and thought you were reposting part one.

I've committed a stripped-down version of the part one patch, which
had several mistakes even in just the part I committed - e.g., you
forgot TABLESPACEOID.  I also did some renaming for clarity.

I'm going to throw this back to you for rebasing at this point, which
I realize is going to be somewhat of an unenjoyable task given the way
I cut up your changes to objectaddress.c, but I wasn't very confident
that all of the entries were correct (the one for attributes seemed
clearly wrong to me, for example), and I didn't want to commit a bunch
of stuff that wasn't going to be exercised.  I suggest that you merge
the remainder of the part-one changes into part-two.  On the flip
side, I think you should take the stuff that deals with dropping
relations OUT of part two.  I don't see what good it does us to try to
centralize the drop logic if we still have to have special cases for
relations, so let's just leave that separate for now until we figure
out a better approach, or at least split it off as a separate patch so
that it doesn't hold up all the other changes.

I think get_object_namespace() needs substantial revision.  Instead of
passing the object type and the object address, why not just pass the
object address?  You should be able to use the classId in the address
to figure out everything you need to know.  Since this function is
private to objectaddress.c, there's no reason for it to use those
accessor functions - it can just iterate through the array just as
object_exists() does.  That way you also avoid iterating through the
array multiple times.  I also think that we probably ought to revise
AlterObjectNamespace() to make use of this new machinery, instead of
making the caller pass in all the same information that
objectaddress.c is now learning how to provide.  That would possibly
open the way to a bunch more consolidation of the SET SCHEMA code; in
fact, we might want to clean that up first, before dealing with the
DROP stuff.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Proposal: casts row to array and array to row

2011-10-11 Thread Pavel Stehule
2011/10/11 Merlin Moncure :
> On Tue, Oct 11, 2011 at 8:18 AM, Pavel Stehule  
> wrote:
>> 2011/10/11 Merlin Moncure :
>>> On Tue, Oct 11, 2011 at 3:40 AM, Pavel Stehule  
>>> wrote:
 Hello

 A current limits of dynamic work with row types in PL/pgSQL can be
 decreased with a possible casts between rows and arrays. Now we have a
 lot of tools for arrays, and these tools should be used for rows too.

 postgres=# \d mypoint
 Composite type "public.mypoint"
  Column │  Type   │ Modifiers
 ────────┼─────────┼───────────
  a      │ integer │
  b      │ integer │

 postgres=# select cast(mypoint '(10,20)' as int[]);
   array
 ─────────
  {10,20}
 (1 row)

 postgres=# select cast(ARRAY[10,20] AS mypoint);
  mypoint
 ─
  (10,20)
 (1 row)

 What do you think about this idea?
>>>
>>> Not sure what it buys you over the syntax we already have:
>>>
>>> select row(foo[1], bar[2]);
>>> select array[(bar).a, (bar).b];
>>
>> You can do it manually for known combinations of rowtype and
>> arraytype. But proposed casts do it generally - what has sense mainly
>> for plpgsql functions or some sql functions.
>>
>>>
>>> Also, in my coding of composite types, homogeneously typed rows don't
>>> really come up that often...
>>
>> you can use the text type everywhere.
>>
>> When I wrote 
>> http://stackoverflow.com/questions/7711432/how-to-set-value-of-composite-variable-field-using-dynamic-sql/7722575#7722575
>> then I had to do a lot of string operations. The proposed casts would
>> make this significantly simpler - and they are general enough for
>> general usage.
>
> How does your approach compare to hstore?  hstore to me is just
> enhanced generic container type which supports the operations you are
> trying to do.  It can be trivially (as of 9.0) moved in an out of both
> arrays and record types:

for replacing some value, hstore is OK, but a cast to arrays is more
general - you can do tricks like table transposition, and you can use
all the tricks that we have for arrays.
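[Editor's note: a concrete illustration of the kind of generic manipulation the proposed cast would enable. The manual field-by-field version works today; the cast itself is the proposal and does not exist. Assumes the mypoint type from upthread.]

-- Today: fields must be listed by hand for each row type.
SELECT unnest(ARRAY[(p).a, (p).b]) AS elem
FROM (VALUES ((10,20)::mypoint)) AS v(p);

-- Proposed (hypothetical): the same, generically, for any row type.
-- SELECT unnest(p::int[]) FROM ... ;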

>
> postgres=# create type foo_t as (a int, b text, c float);
> CREATE TYPE
>
> postgres=# select row(1, 'abc', 1.0)::foo_t  #= '"b"=>"def"';
>  ?column?
> ---
>  (1,def,1)
> (1 row)

In some future version I would like to have general indexable types - and
then we could have a hash (hstore) in core, but casts to arrays or to
hashes can be useful - in higher-level languages like PL/pgSQL or SQL.

Pavel

>
> merlin
>


