Re: [HACKERS] estimating # of distinct values

2011-01-20 Thread Heikki Linnakangas

On 20.01.2011 04:36, Robert Haas wrote:

... Even better, the
code changes would be confined to ANALYZE rather than spread out all
over the system, which has positive implications for robustness and
likelihood of commit.


Keep in mind that the administrator can already override the ndistinct 
estimate with ALTER TABLE. If he needs to manually run a special ANALYZE 
command to make it scan the whole table, he might as well just use ALTER 
TABLE to tell the system what the real (or good enough) value is. A DBA 
should have a pretty good feeling of what the distribution of his data 
is like.
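
For reference, a minimal sketch of that existing override (the table and
column names are made up; a negative n_distinct is treated as a fraction of
the row count, and the override takes effect the next time ANALYZE runs on
the table):

  # illustration only; object names are hypothetical
  psql -c "ALTER TABLE orders ALTER COLUMN customer_id SET (n_distinct = 50000)"
  psql -c "ANALYZE orders"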


And how good does the estimate need to be? For a single column, it's 
usually not that critical, because if the column has only a few distinct 
values then we'll already estimate that pretty well, and OTOH if 
ndistinct is large, it doesn't usually affect the plans much whether it's 
10% of the number of rows or 90%.


It seems that the suggested multi-column selectivity estimator would be 
more sensitive to ndistinct of the individual columns. Is that correct? 
How is it biased? If we routinely under-estimate ndistinct of individual 
columns, for example, does the bias accumulate or cancel itself in the 
multi-column estimate?


I'd like to see some testing of the suggested selectivity estimator with 
the ndistinct estimates we have. Who knows, maybe it works fine in practice.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Extending opfamilies for GIN indexes

2011-01-20 Thread Dimitri Fontaine
Tom Lane t...@sss.pgh.pa.us writes:
 Actually the other way around.  An opclass is the subset of an opfamily
 that is tightly bound to an index.  The build methods have to be
 associatable with an index, so they're part of the index's opclass.
 The query methods could be loose in the opfamily.

I had understood your proposal to change that for GIN.  Thinking about it
again, keeping opfamily and opclass as they are now: an opclass is the
code we run to build and scan the index, and an opfamily is a way to use
the same index data and code in more contexts than are strictly covered by
an opclass.

 The planner's not the problem here --- what's missing is the rule for
 the index AM to look up the right support functions to call at runtime.

 The trick is to associate the proper query support methods with any
 given query operator (which'd also be loose in the family, probably).
 The existing schema for pg_amop and pg_amproc is built on the assumption
 that the amoplefttype/amoprighttype are sufficient for making this
 association; but that seems to fall down if we would like to allow
 contrib modules to add new query operators that coincidentally take the
 same input types as an existing opfamily member.

Well, the opfamily machinery allows giving query support to any index
whose opclass is in the family.  That is, the same set of operators is
covered by more than one opclass.

What we want to add is that more than one set of operators can find data
support in more than one kind of index.  But you still want to run
specific search code here.  So it seems to me we shouldn't attack the
problem at the operator left- and right-type level, but rather model the
fact that we need another level of flexibility, somewhat separating the
building and maintenance of the index data from the code that's used to
access it.

The example that we're working from seems to be covered if we are able to
instruct PostgreSQL that a set of opclasses is binary coercible, if that's
the right term here.

Then the idea would be to have PostgreSQL able to figure out that a
given index can be used with any binary coercible opclass, rather than
only the one used to maintain it.  What do you think?

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] WIP: RangeTypes

2011-01-20 Thread Jeff Davis
New patch. I added a lot of generic range functions, and a lot of
operators.

There is still more work to do; this is just an updated patch. The
latest can also be seen in the git repository:

http://git.postgresql.org/gitweb?p=users/jdavis/postgres.git;a=log;h=refs/heads/rangetypes

Regards,
Jeff Davis


rangetypes-20110119.gz
Description: GNU Zip compressed data



Re: [HACKERS] REVIEW: WIP: plpgsql - foreach in

2011-01-20 Thread Pavel Stehule
2011/1/20 Stephen Frost sfr...@snowman.net:
 * Robert Haas (robertmh...@gmail.com) wrote:
 On Wed, Jan 19, 2011 at 6:04 PM, Stephen Frost sfr...@snowman.net wrote:
  I'm going to mark this "returned to author with feedback".

 That implies you don't think it should be considered further for this
 CommitFest.  Perhaps you mean "Waiting on Author"?

 I did, actually, and that's what I actually marked it as in the CF.
 Sorry for any confusion.  When I went to mark it in CF, I realized my
 mistake.


OK :), I'll look at it tomorrow.

regards

Pavel

        Thanks,

                Stephen






[HACKERS] REPLICATION privilege and shutdown

2011-01-20 Thread Fujii Masao
Hi,

Both a user with the REPLICATION privilege and a superuser can
call pg_stop_backup. But only a superuser can connect to the server
to cancel an online backup during shutdown; a non-superuser with
the REPLICATION privilege cannot. Is this behavior intentional, or just
an oversight?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] REPLICATION privilege and shutdown

2011-01-20 Thread Heikki Linnakangas

On 20.01.2011 11:41, Fujii Masao wrote:

Hi,

Both the user with REPLICATION privilege and the superuser can
call pg_stop_backup. But only superuser can connect to the server
to cancel online backup during shutdown. The non-superuser with
REPLICATION privilege cannot. Is this behavior intentional? Or just
oversight?


I think we need to consider the situation after the multiple streaming 
base backups patch goes in. After that we can change pg_stop_backup() 
so that you need superuser privileges to run it again - the replication 
privilege is enough to do a streaming base backup, but that no longer 
interferes with the pg_start/stop_backup() admin functions.


At the moment, a streaming base backup and manual pg_start/stop_backup() 
use the same machinery, so it's possible e.g. to run pg_stop_backup() 
while a streaming base backup is running, causing it to fail at the end. 
Or worse, you can run pg_stop_backup()+pg_start_backup(), and the 
streaming base backup will seemingly succeed, but the produced backup is 
potentially corrupt.


The multiple base backups patch will fix that too.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] REPLICATION privilege and shutdown

2011-01-20 Thread Magnus Hagander
On Thu, Jan 20, 2011 at 10:50, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 20.01.2011 11:41, Fujii Masao wrote:

 Hi,

 Both the user with REPLICATION privilege and the superuser can
 call pg_stop_backup. But only superuser can connect to the server
 to cancel online backup during shutdown. The non-superuser with
 REPLICATION privilege cannot. Is this behavior intentional? Or just
 oversight?

 I think we need to consider the situation after the multiple streaming base
 backups patch goes in. After that we can change pg_stop_backup() so that
 you need superuser privileges to run it again - replication privileges is
 enough to do a streaming base backup, but that no longer interferes with the
 pg_start/stop_backup() admin functions.

 At the moment, a streaming base backup and manual pg_start/stop_backup() use
 the same machinery, so it's possible e.g to run pg_stop_backup() while a
 streaming base backup is running, causing it to fail at the end. Or worse,
 you can run pg_stop_backup()+pg_start_backup(), and the streaming base
 backup will seemingly succeed, but the produced backup is potentially
 corrupt.

 The multiple base backups patch will fix that too.

Yeah; I've been avoiding even thinking about that one pending the
multiple base backups patch, on the assumption that it goes in before
release. If it doesn't, for some reason, that needs to be revisited,
obviously.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] REPLICATION privilege and shutdown

2011-01-20 Thread Fujii Masao
On Thu, Jan 20, 2011 at 6:50 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 I think we need to consider the situation after the multiple streaming base
 backups patch goes in. After that we can change pg_stop_backup() so that
 you need superuser privileges to run it again - replication privileges is
 enough to do a streaming base backup, but that no longer interferes with the
 pg_start/stop_backup() admin functions.

Fair enough.

 At the moment, a streaming base backup and manual pg_start/stop_backup() use
 the same machinery, so it's possible e.g to run pg_stop_backup() while a
 streaming base backup is running, causing it to fail at the end. Or worse,
 you can run pg_stop_backup()+pg_start_backup(), and the streaming base
 backup will seemingly succeed, but the produced backup is potentially
 corrupt.

 The multiple base backups patch will fix that too.

OK.

BTW, I found this behavior when I read your patch ;)

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] ALTER TABLE ... REPLACE WITH

2011-01-20 Thread Simon Riggs
On Wed, 2011-01-19 at 17:46 -0500, Noah Misch wrote:

 First, I'd like to note that the thread for this patch had *four* me-too
 responses to the use case.  That's extremely unusual; the subject is
 definitely compelling to people.  It addresses the bad behavior of natural
 attempts to atomically swap two tables in the namespace:
 
   psql -c "CREATE TABLE t AS VALUES ('old'); CREATE TABLE new_t AS VALUES ('new')"
   psql -c 'SELECT pg_sleep(2) FROM t' &   # block the ALTER or DROP briefly
   sleep 1                                 # give prev time to take AccessShareLock
 
   # Do it this way, and the next SELECT gets data from the old table.
   #psql -c 'ALTER TABLE t RENAME TO old_t; ALTER TABLE new_t RENAME TO t' &
 
   # Do it this way, and get: ERROR:  could not open relation with OID 41380
   psql -c 'DROP TABLE t; ALTER TABLE new_t RENAME TO t' &
 
   psql -c 'SELECT * FROM t'               # I get 'old' or an error, never 'new'.
   psql -c 'DROP TABLE IF EXISTS t, old_t, new_t'
 
 by letting you do this instead:
 
   psql -c "CREATE TABLE t AS VALUES ('old'); CREATE TABLE new_t AS VALUES ('new')"
   psql -c 'SELECT pg_sleep(2) FROM t' &   # block the ALTER or DROP briefly
   sleep 1                                 # give prev time to take AccessShareLock
 
   psql -c 'EXCHANGE TABLE new_t TO t' &
 
   psql -c 'SELECT * FROM t'               # I get 'new', finally!
   psql -c 'DROP TABLE IF EXISTS t, new_t'
 
 I find Heikki's (4d07c6ec.2030...@enterprisedb.com) suggestion from the thread
 interesting: can we just make the first example work?  Even granting that the
 second syntax may be a useful addition, the existing behavior of the first
 example is surely worthless, even actively harmful.  I tossed together a
 proof-of-concept patch, attached, that makes the first example DTRT.  Do you
 see any value in going down that road?

As I said previously on the thread you quote, having this happen
implicitly is not a good thing, and IMHO, definitely not the right
thing.

Heikki's suggestion, and your patch, contain no checking to see whether
the old and new tables are similar. If they are not similar then we have
all the same problems raised by my patch. SQL will suddenly fail because
columns have ceased to exist, FKs suddenly disappear, etc.

I don't see how having a patch helps at all. I didn't think it was the
right way before you wrote it, and I still disagree now that you've
written it.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: [HACKERS] Transaction-scope advisory locks

2011-01-20 Thread Marko Tiikkaja

On 2011-01-17 9:28 AM +0200, Itagaki Takahiro wrote:

Here is a short review for Transaction scoped advisory locks:
https://commitfest.postgresql.org/action/patch_view?id=518


Thanks for reviewing!


== Features ==
The patch adds pg_[try_]advisory_xact_lock[_shared] functions.
The function names follow the past discussion -- it's better than
a bool isXact argument or changing the existing behavior.

== Coding ==
I expect documentation will come soon.


I'm sorry about this; I have been occupied with other stuff.  I'm going 
to work on this tonight.



There is no regression test, but we have no regression test for
advisory locks even now. Tests for lock conflicts might be difficult,
but we could have a single-threaded test for lock/unlock and the pg_locks view.


Seems useful.


== Questions ==
I have a question about unlocking transaction-scope advisory locks.
We cannot unlock them with pg_advisory_unlock(), but we can unlock them
with pg_advisory_unlock_all(). That's inconsistent behavior.
Furthermore, I wonder whether we should allow unlocking transaction-scope
locks at all -- we have LOCK TABLE but don't have UNLOCK TABLE.


I guess we could add new pg_advisory_txn_unlock() functions to unlock 
transaction-scope locks, but I do share your doubt on whether or not we 
want to allow this at all.  On the other hand, the reasons why we don't 
allow non-advisory locks to be released before the end of the transaction 
are a lot clearer than they are for the issue at hand.  I have no strong 
opinion on this.


Another thing I now see is this:

BEGIN;
SELECT pg_advisory_xact_lock(1);

-- do something here

-- upgrade to session lock
SELECT pg_advisory_lock(1);
COMMIT;


This seems useful, since the xact lock would be automatically released 
if an error happens during "do something here", so you wouldn't need 
to worry about releasing the lock elsewhere.  But I'm not sure this is 
safe.  Can anyone see a problem with it?



Regards,
Marko Tiikkaja



Re: [HACKERS] pg_dump directory archive format / parallel pg_dump

2011-01-20 Thread Heikki Linnakangas

On 19.01.2011 16:01, Joachim Wieland wrote:

On Wed, Jan 19, 2011 at 7:47 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:

Here are the latest patches, all of them also rebased to current HEAD.
Will update the commitfest app as well.


What's the idea of storing the file sizes in the toc file? It looks like
it's not used for anything.


It's part of the overall idea to make sure files are not inadvertently
exchanged between different backups and that a file is not truncated.
In the future I'd also like to add a checksum to the TOC so that a
backup can be checked for integrity. This will cost performance but
with the parallel backup it can be distributed to several processors.


Ok. I'm going to leave out the filesize. I can see some value in that, 
and the CRC, but I don't want to add stuff that's not used at this point.



It would be nice to have this format match the tar format. At the moment,
there's a couple of cosmetic differences:

* TOC file is called TOC, instead of toc.dat

* blobs TOC file is called BLOBS.TOC instead of blobs.toc

* each blob is stored as blobs/oid.dat, instead of blob_oid.dat


That can be done easily...


The only significant difference is that in the directory archive format,
each data file has a header in the beginning.



What are the benefits of the data file header? Would it be better to leave
it out, so that the format would be identical to the tar format? You could
then just tar up the directory to get a tar archive, or vice versa.


The header is there to identify a file; it contains the header that
every other pgdump file contains, including the internal version
number and the unique backup id.

The tar format doesn't support compression, so going from one to the
other would only work for an uncompressed archive, and special care
must be taken to get the order of the tar file right.


Hmm, the tar format doesn't support compression, but it looks like the file 
format issue has been thought of already: there's still code there to 
add a .gz suffix for compressed files. How about adopting that convention 
in the directory format too? That would make an uncompressed directory 
format compatible with the tar format.


That seems pretty attractive anyway, because you can then dump to a 
directory, and manually gzip the data files later.
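
For illustration, what that could look like (directory and file names are
made up, and this assumes the .gz-suffix convention discussed above gets
adopted):

  # illustration only; paths and file names are hypothetical
  gzip /backups/mydb.dir/*.dat    # compress the data files of an existing directory-format dump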


Now that we have an API for compression in compress_io.c, it probably 
wouldn't be very hard to implement the missing compression support for the 
tar format either.



If you want to drop the header altogether, fine with me, but if it's
just for the tar -> directory conversion, then I am failing to see
what the use case of that would be.

A tar archive has the advantage that you can postprocess the dump data
with other tools, but for this we could also add an option that gives
you only the data part of a dump file (and uncompresses it at the same
time if compressed). Once we have that, however, the question is what
anybody would then still want to use the tar format for...


I don't know how popular it'll be in practice, but it seems very nice to 
me if you can do things like parallel pg_dump in directory format first, 
and then tar it up to a file for archival.
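
For instance, something along these lines (paths are made up, and this
assumes the directory format ends up being selected with -F d):

  # illustration only; paths are hypothetical
  pg_dump -F d -f /backups/mydb.dir mydb
  tar cf /backups/mydb.tar -C /backups mydb.dir    # plain archival copy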


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] pg_basebackup for streaming base backups

2011-01-20 Thread Magnus Hagander
On Thu, Jan 20, 2011 at 05:23, Fujii Masao masao.fu...@gmail.com wrote:
 On Wed, Jan 19, 2011 at 9:37 PM, Magnus Hagander mag...@hagander.net wrote:
 Great. Thanks for the quick update!

 Here are another comments:

 Here are comments against the documents. The other code looks good.

Thanks!

 It's helpful to document what to set to allow pg_basebackup connection.
 That is not only the REPLICATION privilege but also max_wal_senders and
 pg_hba.conf.

Hmm. Yeah, I guess that wouldn't hurt. Will add that.
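
As a rough sketch of what such documentation might describe (all names,
addresses and values below are made up; the client role needs the
REPLICATION privilege, the server needs a free walsender slot, and
pg_hba.conf needs a matching replication entry):

  # illustration only; role name, address and values are hypothetical
  psql -c "CREATE ROLE backup_user LOGIN REPLICATION PASSWORD 'secret'"
  echo "wal_level = archive"  >> $PGDATA/postgresql.conf
  echo "max_wal_senders = 2"  >> $PGDATA/postgresql.conf
  echo "host  replication  backup_user  192.0.2.10/32  md5" >> $PGDATA/pg_hba.conf
  pg_ctl -D $PGDATA restart   # wal_level and max_wal_senders require a restart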


 + refsect1
 +  titleOptions/title

 Can we list the descriptions of option in the same order as
 pg_basebackup --help does?

 It's helpful to document that the target directory must be specified and
 it must be empty.

Yeah, that's on the list - I just wanted to make any other changes
first before I did that. Based on (no further) feedback and a few
extra questions, I'm going to change it per your suggestion to use -D
<dir> -F <format> instead of -D/-T, which will change that stuff
anyway. So I'll reorder them at that time.
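
Under that proposed scheme an invocation might look roughly like the
following; the option spellings are still being settled, so treat this as
illustrative only:

  # illustrative only; host, role and paths are hypothetical
  pg_basebackup -D /backups/base -F tar -h dbserver -U backup_user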


 +  para
 +   The backup will include all files in the data directory and tablespaces,
 +   including the configuration files and any additional files placed in the
 +   directory by third parties. Only regular files and directories are allowed
 +   in the data directory, no symbolic links or special device files.

 The latter sentence means that the backup of the database cluster
 created by initdb -X is not supported? Because the symlink to the
 actual WAL directory is included in it.

No, it's not. pg_xlog is specifically excluded, and sent as an empty
directory, so upon restore you will have an empty pg_xlog directory.


 OTOH, I found the following source code comments:

 + * Receive a tar format stream from the connection to the server, and unpack
 + * the contents of it into a directory. Only files, directories and
 + * symlinks are supported, no other kinds of special files.

 This says that symlinks are supported. Which is true? Is the symlink
 supported only in tar format?

That's actually a *backend* side restriction. If there is a symlink
anywhere other than pg_tblspc in the data directory, we simply won't
send it across (with a warning).

The frontend code supports creating symlinks, both in directory format
and in tar format (actually, in tar format it doesn't do anything, of
course, it just lets it through)

It wouldn't actually be hard to allow the inclusion of symlinks in the
backend side. But it would make verification a lot harder - for
example, if someone symlinked out pg_clog (as an example), we'd back
up the symlink but not the actual files since they're not actually
registered as a tablespace.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Include WAL in base backup

2011-01-20 Thread Magnus Hagander
On Thu, Jan 20, 2011 at 05:03, Stephen Frost sfr...@snowman.net wrote:
 Greetings,

 * Magnus Hagander (mag...@hagander.net) wrote:
 For now, you need to set wal_keep_segments to make it work properly,
 but if you do the idea is that the tar file/stream generated in the
 base backup will include all the required WAL files.

 Is there some reason to not ERROR outright if we're asked to provide WAL
 and wal_keep_segments isn't set..?  I'd rather do that than only ERROR
 when a particular WAL is missing..  That could lead to transient backup
 errors that an inexperienced sysadmin or DBA might miss or ignore.
 They'll notice if it doesn't work the first time they try it and spits
 out a hint about wal_keep_segments.

Well, in a smallish database you can easily do the full backup
before you run out of segments in the data directory, even when you
haven't set wal_keep_segments. If we error out, we force extra work on
the user in the trivial case.

I'd rather not change that, but instead (as Fujii-san has also
mentioned is needed anyway) put some more effort into documenting in
which cases you need to set it.


 I've got some refactoring I want to do around the
 SendBackupDirectory() function after this, but a review of the
 functionality first would be good. And obviously, documentation is
 still necessary.

 mkay, I'm not going to try to make this ready for committer, but will
 provide my comments on it overall.  Bit difficult to review when someone
 else is reviewing the base patch too. :/

Heh, yeah.


 Here goes:

 - I'm not a huge fan of the whole 'closetar' option, that feels really
  rather wrong to me.  Why not just open it and close it in
  perform_base_backup(), unconditionally?

Yeah, we could move the whole thing up there. Or, as I mentioned in an
IM conversation with Heikki, just get rid of SendBackupDirectory()
completely and inline it inside the loop in perform_base_backup().
Given that it's basically just 5 lines + a call to sendDir()..


 - I wonder if you're not getting to a level where you should be using a
  struct to pass the relevant information to perform_base_backup()
  instead of adding more arguments on...  That's going to get unwieldy at
  some point.

Yeah, probably.

We *could* pass the BaseBackupCmd struct from the parser all the way
in - or is that cheating too much on abstractions? It seems that if we
don't, we're just going to have a copy of that struct without the
NodeTag member...


 - Why not initialize logid and logseg like so?:

        int logid = startptr.xlogid;
        int logseg = startptr.xrecoff / XLogSegSize;

  Then use those in your elog?  Seems cleaner to me.

Hmm. Yes. Agreed.


 - A #define right in the middle of a while loop...?  Really?

Haha, yeah, that was a typo. I didn't remember the name of the
variable so I stuck it there for testing and forgot it. It should be
ThisTimeLineID, and no #define at all.


 - The grammar changes strike me as..  odd.  Typically, you would have an
  'option' production that you can then have a list of and then let each
  option be whatever the OR'd set of options is.  Wouldn't the current
  grammar require that you put the options in a specific order?  That'd
  blow.

It does require them in a specific order. The advantage is that it
makes the code easier, and it's not like end users are expected to run
them anyway...

Now, I'm no bison expert, so it might be an easy fix. But as far as I
could figure, to make them order-independent I have to basically collect
them up together as a List instead of just as a struct, and then loop
through that list to build a struct later.

If someone who knows Bison better can tell me a neater way to do that,
I'll be happy to change :-)


 @@ -687,7 +690,7 @@ BaseBackup()
                 * once since it can be relocated, and it will be checked before we do
                 * anything anyway.
                 */
 -               if (basedir != NULL && i > 0)
 +               if (basedir != NULL && !PQgetisnull(res, i, 1))
                        verify_dir_is_empty_or_create(PQgetvalue(res, i, 1));
        }

 - Should the 'i > 0' conditional above still be there..?

No. That's a cheat-check that assumes the base directory is always
sent first. Which is not true anymore - with this patch we always send
it *last* so we can include the WAL in it.


 So, that's my review from just reading the source code and the thread..
 Hope it's useful, sorry it's not more. :/

Thanks - it certainly is!

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] pg_basebackup for streaming base backups

2011-01-20 Thread Magnus Hagander
On Thu, Jan 20, 2011 at 12:42, Magnus Hagander mag...@hagander.net wrote:
 On Thu, Jan 20, 2011 at 05:23, Fujii Masao masao.fu...@gmail.com wrote:
 It's helpful to document what to set to allow pg_basebackup connection.
 That is not only the REPLICATION privilege but also max_wal_senders and
 pg_hba.conf.

 Hmm. Yeah, I guess that wouldn't hurt. Will add that.

Added, see github branch.


 + refsect1
 +  titleOptions/title

 Can we list the descriptions of option in the same order as
 pg_basebackup --help does?

 It's helpful to document that the target directory must be specified and
 it must be empty.

 Yeah, that's on the list - I just wanted to make any other changes
 first before I did that. Based on (no further) feedback and a few
 extra questions, I'm going to change it per your suggestion to use -D
 <dir> -F <format> instead of -D/-T, which will change that stuff
 anyway. So I'll reorder them at that time.

Updated on github.


 +  para
 +   The backup will include all files in the data directory and tablespaces,
 +   including the configuration files and any additional files placed in the
 +   directory by third parties. Only regular files and directories are 
 allowed
 +   in the data directory, no symbolic links or special device files.

 The latter sentence means that the backup of the database cluster
 created by initdb -X is not supported? Because the symlink to the
 actual WAL directory is included in it.

 No, it's not. pg_xlog is specifically excluded, and sent as an empty
 directory, so upon restore you will have an empty pg_xlog directory.

Actually, when I verified that statement, I found a bug where we sent
the wrong thing if pg_xlog was a symlink, leading to a corrupt
tarfile! Patch is in the github branch.


 OTOH, I found the following source code comments:

 + * Receive a tar format stream from the connection to the server, and unpack
 + * the contents of it into a directory. Only files, directories and
 + * symlinks are supported, no other kinds of special files.

 This says that symlinks are supported. Which is true? Is the symlink
 supported only in tar format?

 That's actually a *backend* side restriction. If there is a symlink
 anywhere other than pg_tblspc in the data directory, we simply won't
 send it across (with a warning).

 The frontend code supports creating symlinks, both in directory format
 and in tar format (actually, in tar format it doesn't do anything, of
 course, it just lets it through)

 It wouldn't actually be hard to allow the inclusion of symlinks in the
 backend side. But it would make verification a lot harder - for
 example, if someone symlinked out pg_clog (as an example), we'd back
 up the symlink but not the actual files since they're not actually
 registered as a tablespace.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] bug in SignalSomeChildren

2011-01-20 Thread Bernd Helmle



--On 22 December 2010 15:51:09 +0900 Fujii Masao masao.fu...@gmail.com 
wrote:



How about doing the target != ALL test at the head for the most common case
(target == ALL)?


That's an idea, but the test you propose implements it incorrectly.


Thanks! I revised the patch.


I had a look at this for the current CF and the patch looks reasonable to 
me. Some testing shows that the changes are working as intended (at least, 
the WAL sender now actually receives signals from SignalSomeChildren(), as 
far as the DEBUG4 output shows). Maybe we should put in a small comment 
explaining why we special-case BACKEND_TYPE_ALL (following Tom's comment 
about expensive shared memory access and IsPostmasterChildWalSender()). I 
marked it as "Ready for Committer".


Question for my understanding:

While reading the small patch, I realized that there's no 
BACKEND_TYPE_WALRECV or similar. If I understand correctly there's no need 
to handle it this way, since there's only one WAL receiver process per 
instance?


--
Thanks

Bernd



Re: [HACKERS] SQL/MED - file_fdw

2011-01-20 Thread Itagaki Takahiro
On Wed, Jan 19, 2011 at 00:34, Shigeru HANADA han...@metrosystems.co.jp wrote:
 Attached patch requires FDW API patches and copy_export-20110114.patch.

Some minor comments:

* Can you pass slot->tts_values and slot->tts_isnull directly to NextCopyFrom()?
It won't allocate the arrays; it just fills the array buffers.

* You can pass NULL for the 4th argument of NextCopyFrom().
| Oid tupleoid; /* just for required parameter */

* file_fdw_validator still has code duplicated from BeginCopy,
but I have no idea how to share the validation code in a clean way...

* Try strVal() instead of DefElem->val.str
* FdwEPrivate seems too abbreviated for me. How about FileFdwPrivate?
* private is a bad identifier name because it's a C++ keyword.
We should rename FdwExecutionState->private.


 In that message, you also pointed out that an FDW must generate
 explainInfo in every PlanRelScan call even if the planning is not for
 EXPLAIN.  I'll try to defer generating explainInfo until EXPLAIN
 VERBOSE really uses it.  It might need a new hook point in explain.c,
 though.

I complained about the overhead, but it won't be a problem for
file_fdw and pgsql_fdw. file_fdw can easily generate the text,
and pgsql_fdw needs to generate a SQL query anyway.

My concern is that the explainInfo interface is not ideal for the purpose
and will therefore be an unstable interface. If we support nested plans
in FDWs, each FDW should receive the tree writer used internally in
explain.c. explainInfo, which is a plan text, is not enough for complex
FdwPlans. However, since we don't have any better solution for now,
we could have the variable for 9.1. It's much better than nothing.

-- 
Itagaki Takahiro



Re: [HACKERS] pg_dump directory archive format / parallel pg_dump

2011-01-20 Thread Joachim Wieland
On Thu, Jan 20, 2011 at 6:07 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 It's part of the overall idea to make sure files are not inadvertently
 exchanged between different backups and that a file is not truncated.
 In the future I'd also like to add a checksum to the TOC so that a
 backup can be checked for integrity. This will cost performance but
 with the parallel backup it can be distributed to several processors.

 Ok. I'm going to leave out the filesize. I can see some value in that, and
 the CRC, but I don't want to add stuff that's not used at this point.

Okay.

 The header is there to identify a file, it contains the header that
 every other pgdump file contains, including the internal version
 number and the unique backup id.

 The tar format doesn't support compression so going from one to the
 other would only work for an uncompressed archive and special care
 must be taken to get the order of the tar file right.

 Hmm, tar format doesn't support compression, but looks like the file format
 issue has been thought of already: there's still code there to add .gz
 suffix for compressed files. How about adopting that convention in the
 directory format too? That would make an uncompressed directory format
 compatible with the tar format.

So what you could do is dump in the tar format, untar it, and restore in
the directory format. I see that this sounds nice, but I am still not
sure why someone would dump to the tar format in the first place.

But you still cannot go back from the directory archive to the tar
archive because the standard command line tar will not respect the
order of the objects that pg_restore expects in a tar format, right?


 That seems pretty attractive anyway, because you can then dump to a
 directory, and manually gzip the data files later.

The command line gzip will probably add its own header to the file
that pg_restore would need to strip off...

This is a valid use case for people who are concerned with a fast
dump: usually they would dump uncompressed and later compress the
archive. However, once we have parallel pg_dump, this advantage
vanishes.


 Now that we have an API for compression in compress_io.c, it probably
 wouldn't be very hard to implement the missing compression support to tar
 format either.

True, but the question of the advantage of the tar format remains :-)


 A tar archive has the advantage that you can postprocess the dump data
 with other tools  but for this we could also add an option that gives
 you only the data part of a dump file (and uncompresses it at the same
 time if compressed). Once we have that however, the question is what
 anybody would then still want to use the tar format for...

 I don't know how popular it'll be in practice, but it seems very nice to me
 if you can do things like parallel pg_dump in directory format first, and
 then tar it up to a file for archival.

Yes, but you cannot pg_restore the archive then if it was created with
standard tar, right?


Joachim



Re: [HACKERS] [GENERAL] streaming replication feature request

2011-01-20 Thread Bruce Momjian
Scott Ribe wrote:
 How about supporting something like:
 
 wal_keep_segments = '7d'

[ moved to hackers]

Sorry for the late reply.  That is a very interesting idea.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Moving test_fsync to /contrib?

2011-01-20 Thread Bruce Momjian
Greg Smith wrote:
 Alvaro Herrera wrote:
  I don't understand why it would be overkill.  Are you saying people
  would complain because you installed a 25 kB executable that they might
  not want to use?  Just for fun I checked /usr/bin and noticed that I
  have a pandoc executable, weighing 17 MB, that I have never used and I
  have no idea what it is for.

 
 It's for converting between the various types of text-like markup, i.e. 
 reST, LaTeX, Markdown, etc.
 
 Anyway, just because the rest of the world has no standards anymore 
 doesn't mean we shouldn't.  The changes Bruce has made recently have 
 gotten this tool to where its output is starting to be readable and 
 reliable.  The sort of people who want to run this will certainly be 
 fine with installing contrib to do it, because they may want to have 
 things like pgbench too.  There's really not enough demand for this to 
 pollute the default server install footprint with any overhead from this 
 tool, either in bytes or increased tool name squatting.  And the fact 
 that it's still a little rough around the edges nudges away from the 
 standard server package too.
 
 Install in contrib as pg_test_fsync and I think you'll achieve the 
 optimal subset of people who can be made happy here.  Not having it 
 packaged at all before wasn't a big deal, because it was so hard to 
 collect good data from that only developer-level people were doing it 
 anyway.  Now that it is starting to be more useful in that role for less 
 experienced users, we need to make it easier for more people to run it, 
 to collect feedback toward further improving its quality.

OK, I am ready to move test_fsync to /contrib.  Is pg_test_fsync the
best name?  pg_check_fsync?  pg_fsync_performance?  pg_verify_fsync?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Moving test_fsync to /contrib?

2011-01-20 Thread Robert Haas
On Thu, Jan 20, 2011 at 9:13 AM, Bruce Momjian br...@momjian.us wrote:
 OK, I am ready to move test_fsync to /contrib.  Is pg_test_fsync the
 best name?  pg_check_fsync?  pg_fsync_performance?  pg_verify_fsync?

I don't see too much reason to rename it more than necessary, so how
about pg_test_fsync?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] ALTER TYPE 1: recheck index-based constraints

2011-01-20 Thread Robert Haas
On Thu, Jan 20, 2011 at 12:57 AM, Noah Misch n...@leadboat.com wrote:
 There are two distinct questions here:

Agreed.

 (1) Should reindex_relation receive boolean facts from its callers by way
 of two boolean arguments, or by way of one flags vector?

 The former seems best when you want every caller to definitely think about
 which answer is right, and the latter when that's not so necessary.  (For a
 very long list of options, the flags might be chosen on other grounds.)  As
 framed, I'd lean toward keeping distinct arguments, as these are important
 questions.

My main beef with the Boolean flags is that this kind of thing is not too clear:

   reindex_relation(myrel, false, false, true, true, false, true,
false, false, true);

Unless you have an excellent memory, you can't tell what the heck
that's doing without flipping back and forth between the function
definition and the call site.  With a bit-field, it's a lot easier to
glance at the call site and have a clue what's going on.  We're of
course not quite to the point of that exaggerated example yet.

 However, suppose we inverted both flags, say REINDEX_SKIP_CONSTRAINT_CHECKS
 and REINDEX_ALLOW_OLD_INDEX_USE.  Then, flags = 0 can hurt performance but
 not correctness.  That's looking like a win.

I prefer the positive sense for those flags because I think it's more
clear.  There aren't so many call sites or so many people using this
that we have to worry about what people are going to do in new calling
locations; getting it right in any new code shouldn't be a
consideration.

 (2) Should reindex_relation frame its boolean arguments in terms of what
 the caller did (heap_rebuilt, tuples_changed), or in terms of what
 reindex_relation will be doing (check_constraints, suppress_index_use)?

Yeah, I know we're doing the former now, but I think it's just getting
confusing for exactly the reasons you state:

 I agree that both heap_rebuilt and tuples_changed are bad abstractions.
 TRUNCATE is lying about the former, and the latter, as you say, is never
 really correct.  column_values_changed, perhaps.
 tuples_could_now_violate_constraints would be correct, but that's just a
 bad spelling for REINDEX_CHECK_CONSTRAINTS.  I guess the equivalent
 long-winded improvement for heap_rebuilt would be
 indexes_still_valid_for_snapshotnow.  Eh.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] ToDo List Item - System Table Index Clustering

2011-01-20 Thread Robert Haas
On Wed, Jan 19, 2011 at 4:27 PM, Simone Aiken sai...@ulfheim.net wrote:
 In my experience size increases related to documentation are almost always
 worth it.  So I'm prejudiced right out of the gate.  I was wondering if
 every pg_ table gets copied out to every database ... if there is already a
 mechanism for not replicating all of them, we could utilize views or
 rewrite rules to merge a single copy of catalog comments in a separate
 table with each deployed database's pg_descriptions.

All of them get copied, except for a handful of so-called shared
catalogs.  Changing that would be difficult.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] sepgsql contrib module

2011-01-20 Thread Alvaro Herrera
Excerpts from Robert Haas's message of Thu Jan 20 00:10:55 -0300 2011:

 You have to write it with a line of dashes on the first and last
 lines, if you don't want it reformatted as a paragraph.  It might be
 worth actually running pgindent over contrib/selinux and then check
 over the results.

Hmm, I don't think pgindent is run regularly on contrib as it is on the
core code.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] pg_basebackup for streaming base backups

2011-01-20 Thread Bruce Momjian
Magnus Hagander wrote:
 On Mon, Jan 17, 2011 at 16:27, Simon Riggs si...@2ndquadrant.com wrote:
  On Mon, 2011-01-17 at 16:20 +0100, Magnus Hagander wrote:
  On Mon, Jan 17, 2011 at 16:18, Robert Haas robertmh...@gmail.com wrote:
   On Mon, Jan 17, 2011 at 8:55 AM, Magnus Hagander mag...@hagander.net 
   wrote:
   Hmm. I don't like those names at all :(
  
    I agree.  I don't think your original names are bad, as long as
    they're well-documented.  I sympathize with Simon's desire to make it
    clear that these use the replication framework, but I really don't
    want the command names to be that long.
 
  Actually, after some IM chats, I think pg_streamrecv should be
  renamed, probably to pg_walstream (or pg_logstream, but pg_walstream
  is a lot more specific than that)
 
  pg_stream_log
  pg_stream_backup
 
 Those seem better.
 
 Tom, would those solve your concerns about it being clear which side
 they are on? Or do you think you'd still risk reading them as the
 sending side?

It seems pg_create_backup would be the most natural because we already
have pg_start_backup and pg_stop_backup.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] sepgsql contrib module

2011-01-20 Thread Robert Haas
On Thu, Jan 20, 2011 at 9:59 AM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Excerpts from Robert Haas's message of Thu Jan 20 00:10:55 -0300 2011:

 You have to write it with a line of dashes on the first and last
 lines, if you don't want it reformatted as a paragraph.  It might be
 worth actually running pgindent over contrib/selinux and then check
 over the results.

 Hmm, I don't think pgindent is run regularly on contrib as it is on the
 core code.

I went back and looked at commit
239d769e7e05e0a5ef3bd6828e93e22ef3962780 and it touches both src and
contrib.  But even if we don't always do that, it seems like a good
idea to fix whatever we're committing so that a hypothetical future
pgindent run won't mangle it.

Incidentally, I thought that running pgindent twice in the 9.0 cycle,
once after the end of CF4 and again just before the branch, worked
well.  Anyone up for doing it that way again this time?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] pg_basebackup for streaming base backups

2011-01-20 Thread Robert Haas
On Thu, Jan 20, 2011 at 10:01 AM, Bruce Momjian br...@momjian.us wrote:
 Magnus Hagander wrote:
 On Mon, Jan 17, 2011 at 16:27, Simon Riggs si...@2ndquadrant.com wrote:
  On Mon, 2011-01-17 at 16:20 +0100, Magnus Hagander wrote:
  On Mon, Jan 17, 2011 at 16:18, Robert Haas robertmh...@gmail.com wrote:
   On Mon, Jan 17, 2011 at 8:55 AM, Magnus Hagander mag...@hagander.net 
   wrote:
   Hmm. I don't like those names at all :(
  
    I agree.  I don't think your original names are bad, as long as
    they're well-documented.  I sympathize with Simon's desire to make it
    clear that these use the replication framework, but I really don't
    want the command names to be that long.
 
  Actually, after some IM chats, I think pg_streamrecv should be
  renamed, probably to pg_walstream (or pg_logstream, but pg_walstream
  is a lot more specific than that)
 
  pg_stream_log
  pg_stream_backup

 Those seem better.

 Tom, would those solve your concerns about it being clear which side
 they are on? Or do you think you'd still risk reading them as the
 sending side?

 It seems pg_create_backup would be the most natural because we already
 have pg_start_backup and pg_stop_backup.

Uh, wow.  That's really mixing apples and oranges.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] estimating # of distinct values

2011-01-20 Thread Csaba Nagy
Hi Tomas,

On Wed, 2011-01-19 at 23:13 +0100, Tomas Vondra wrote:
 No, the multi-column statistics do not require constant updating. There
 are cases where a sampling is perfectly fine, although you may need a
 bit larger sample. Generally if you can use a multi-dimensional
 histogram, you don't need to scan the whole table.

In the cases where sampling is enough, you can do that for the updates
too: sample the changes, so that only every Nth change makes it to the
estimator. If you can also dynamically tune N, growing it as the
statistics stabilize and lowering it if you detect high variance, even
better.

If the analyze process could be decoupled from the backends, and maybe
just get the data passed over to be processed asynchronously, then that
could be a feasible way to have always-up-to-date statistics when the
bottleneck is I/O and CPU power is in excess. If that then leads to
better plans, it could really be a win exceeding the overhead.

If this analyze process (or more of them) could also just get the data
from the modified buffers in a cyclic way, so that the backends need to do
nothing extra, then I don't see any performance disadvantage other than
possible extra locking contention on the buffers and non-determinism in
the actual time when a change makes it to the statistics. Then you just
need to get more CPU power and higher memory bandwidth to pay for the
accurate statistics.

Cheers,
Csaba.





Re: [HACKERS] pg_basebackup for streaming base backups

2011-01-20 Thread Bruce Momjian
Robert Haas wrote:
 On Thu, Jan 20, 2011 at 10:01 AM, Bruce Momjian br...@momjian.us wrote:
  Magnus Hagander wrote:
  On Mon, Jan 17, 2011 at 16:27, Simon Riggs si...@2ndquadrant.com wrote:
   On Mon, 2011-01-17 at 16:20 +0100, Magnus Hagander wrote:
   On Mon, Jan 17, 2011 at 16:18, Robert Haas robertmh...@gmail.com 
   wrote:
On Mon, Jan 17, 2011 at 8:55 AM, Magnus Hagander 
mag...@hagander.net wrote:
Hmm. I don't like those names at all :(
   
 I agree.  I don't think your original names are bad, as long as
 they're well-documented.  I sympathize with Simon's desire to make it
 clear that these use the replication framework, but I really don't
 want the command names to be that long.
  
   Actually, after some IM chats, I think pg_streamrecv should be
   renamed, probably to pg_walstream (or pg_logstream, but pg_walstream
   is a lot more specific than that)
  
   pg_stream_log
   pg_stream_backup
 
  Those seem better.
 
  Tom, would those solve your concerns about it being clear which side
  they are on? Or do you think you'd still risk reading them as the
  sending side?
 
  It seems pg_create_backup would be the most natural because we already
  have pg_start_backup and pg_stop_backup.
 
 Uh, wow.  That's really mixing apples and oranges.

I read the description as:

+You can also use the xref linkend=app-pgbasebackup tool to take
+the backup, instead of manually copying the files. This tool will take
+care of the functionpg_start_backup()/, copy and
+functionpg_stop_backup()/ steps automatically, and transfers the
+backup over a regular productnamePostgreSQL/productname connection
+using the replication protocol, instead of requiring filesystem level
+access.

so I thought, well it does pg_start_backup and pg_stop_backup, and also
creates the data directory.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] sepgsql contrib module

2011-01-20 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 Hmm, I don't think pgindent is run regularly on contrib as it is on the
 core code.

News to me.

regards, tom lane



[HACKERS] Is there a way to build PostgreSQL client libraries with MinGW

2011-01-20 Thread XiaoboGu
Hi,
We are using R to work with the 64-bit PostgreSQL client libraries, and
to avoid compiler compatibility issues the R development community suggests
using the same compiler for both the main application and the DLLs. So do
you have any experience building libpq.dll using 64-bit MinGW? Thanks.


Xiaobo Gu




Re: [HACKERS] pg_basebackup for streaming base backups

2011-01-20 Thread Robert Haas
On Thu, Jan 20, 2011 at 10:15 AM, Bruce Momjian br...@momjian.us wrote:
 Robert Haas wrote:
 On Thu, Jan 20, 2011 at 10:01 AM, Bruce Momjian br...@momjian.us wrote:
  Magnus Hagander wrote:
  On Mon, Jan 17, 2011 at 16:27, Simon Riggs si...@2ndquadrant.com wrote:
   On Mon, 2011-01-17 at 16:20 +0100, Magnus Hagander wrote:
   On Mon, Jan 17, 2011 at 16:18, Robert Haas robertmh...@gmail.com 
   wrote:
On Mon, Jan 17, 2011 at 8:55 AM, Magnus Hagander 
mag...@hagander.net wrote:
Hmm. I don't like those names at all :(
   
 I agree.  I don't think your original names are bad, as long as
 they're well-documented.  I sympathize with Simon's desire to make it
 clear that these use the replication framework, but I really don't
 want the command names to be that long.
  
   Actually, after some IM chats, I think pg_streamrecv should be
   renamed, probably to pg_walstream (or pg_logstream, but pg_walstream
   is a lot more specific than that)
  
   pg_stream_log
   pg_stream_backup
 
  Those seem better.
 
  Tom, would those solve your concerns about it being clear which side
  they are on? Or do you think you'd still risk reading them as the
  sending side?
 
  It seems pg_create_backup would be the most natural because we already
  have pg_start_backup and pg_stop_backup.

 Uh, wow.  That's really mixing apples and oranges.

 I read the description as:

 +    You can also use the xref linkend=app-pgbasebackup tool to take
 +    the backup, instead of manually copying the files. This tool will take
 +    care of the functionpg_start_backup()/, copy and
 +    functionpg_stop_backup()/ steps automatically, and transfers the
 +    backup over a regular productnamePostgreSQL/productname connection
 +    using the replication protocol, instead of requiring filesystem level
 +    access.

 so I thought, well it does pg_start_backup and pg_stop_backup, and also
 creates the data directory.

Yeah, but pg_start_backup() and pg_stop_backup() are server functions,
and this is an application.

Also, it won't actually work unless the server has replication
configured (wal_level != minimal, max_wal_senders > 0, and possibly some
setting for wal_keep_segments), which has been the main point of the
naming discussion thus far.  Now, you know what would be REALLY cool?
Making this work without any special advance configuration.  Like if
we somehow figured out a way to make max_wal_senders unnecessary, and
a way to change wal_level without bouncing the server, so that we
could temporarily boost the WAL level from minimal to archive if
someone's running a backup.

That, however, is not going to happen for 9.1... but it would be *really* cool.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] pg_dump directory archive format / parallel pg_dump

2011-01-20 Thread Heikki Linnakangas

On 20.01.2011 15:46, Joachim Wieland wrote:

On Thu, Jan 20, 2011 at 6:07 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:

The header is there to identify a file, it contains the header that
every other pgdump file contains, including the internal version
number and the unique backup id.

The tar format doesn't support compression so going from one to the
other would only work for an uncompressed archive and special care
must be taken to get the order of the tar file right.


Hmm, tar format doesn't support compression, but it looks like the file format
issue has been thought of already: there's still code there to add .gz
suffix for compressed files. How about adopting that convention in the
directory format too? That would make an uncompressed directory format
compatible with the tar format.


So what you could do is dump in the tar format, untar and restore in
the directory format. I see that this sounds nice but still I am not
sure why someone would dump to the tar format in the first place.


I'm not sure either. Maybe you want to pipe the output of pg_dump -F t 
via an ssh tunnel to another host, where you untar it, producing a 
directory format dump. You can then edit the directory format dump, and 
restore it back to the database without having to tar it again.


It gives you a lot of flexibility if the formats are compatible, which 
is generally good.



But you still cannot go back from the directory archive to the tar
archive because the standard command line tar will not respect the
order of the objects that pg_restore expects in a tar format, right?


Hmm, I didn't realize pg_restore requires the files to be in a certain 
order in the tar file. There's no mention of that in the docs either; we 
should add that. It doesn't actually require that if you read from a 
file, but from stdin it does.


You can put files in the archive in a certain order if you list them 
explicitly in the tar command line, like "tar cf backup.tar toc.dat ...". 
It's hard to know the right order, though. In practice you would 
need to do "tar tf backup.tar > files" before untarring, and use "files" 
to tar them again in the right order.



That seems pretty attractive anyway, because you can then dump to a
directory, and manually gzip the data files later.


The command line gzip will probably add its own header to the file
that pg_restore would need to strip off...


Yeah, we should write the header too. That's not hard, e.g. gzopen will 
do that automatically, or you can pass a flag to deflateInit2.
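
For illustration, a minimal standalone zlib sketch of both options (an invented example, not code from the patch; error handling omitted): gzopen() writes the gzip header by itself, while the raw deflate API produces a gzip wrapper when deflateInit2() is called with windowBits = 15 + 16.

#include <stdio.h>
#include <string.h>
#include <zlib.h>

int
main(void)
{
	const char *data = "hello, pg_dump";

	/* Option 1: gzopen/gzwrite emit the gzip header automatically. */
	gzFile		gz = gzopen("data.gz", "wb9");

	gzwrite(gz, data, (unsigned) strlen(data));
	gzclose(gz);

	/* Option 2: raw deflate; windowBits = 15 + 16 requests a gzip wrapper. */
	z_stream	strm;
	unsigned char out[128];

	memset(&strm, 0, sizeof(strm));
	deflateInit2(&strm, Z_BEST_COMPRESSION, Z_DEFLATED,
				 15 + 16,		/* 15-bit window, +16 = gzip header/trailer */
				 8, Z_DEFAULT_STRATEGY);
	strm.next_in = (Bytef *) data;
	strm.avail_in = (uInt) strlen(data);
	strm.next_out = out;
	strm.avail_out = sizeof(out);
	deflate(&strm, Z_FINISH);
	deflateEnd(&strm);
	/* "out" now holds strm.total_out bytes of gzip-framed data. */
	return 0;
}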



A tar archive has the advantage that you can postprocess the dump data
with other tools  but for this we could also add an option that gives
you only the data part of a dump file (and uncompresses it at the same
time if compressed). Once we have that however, the question is what
anybody would then still want to use the tar format for...


I don't know how popular it'll be in practice, but it seems very nice to me
if you can do things like parallel pg_dump in directory format first, and
then tar it up to a file for archival.


Yes, but you cannot pg_restore the archive then if it was created with
standard tar, right?


See above, you can unless you try to pipe it to pg_restore. In fact, 
that's listed as an advantage of the tar format over other formats in 
the pg_dump documentation.


(I'm working on this, no need to submit a new patch)

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Moving test_fsync to /contrib?

2011-01-20 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Thu, Jan 20, 2011 at 9:13 AM, Bruce Momjian br...@momjian.us wrote:
 OK, I am ready to move test_fsync to /contrib.  Is pg_test_fsync the
 best name?  pg_check_fsync?  pg_fsync_performance?  pg_verify_fsync?

 I don't see too much reason to rename it more than necessary, so how
 about pg_test_fsync?

Yeah, there's no reason to try to confuse people about whether it's
the same program or not.

regards, tom lane



Re: [HACKERS] Include WAL in base backup

2011-01-20 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 - Why not initialize logid and logseg like so?:
 
        int logid = startptr.xlogid;
        int logseg = startptr.xrecoff / XLogSegSize;
 
  Then use those in your elog?  Seems cleaner to me.

 Hmm. Yes. Agreed.

Marginal complaint here: int is the wrong type, I'm pretty sure.

regards, tom lane



Re: [HACKERS] pg_dump directory archive format / parallel pg_dump

2011-01-20 Thread Florian Pflug
On Jan20, 2011, at 16:22 , Heikki Linnakangas wrote:
 You can put files in the archive in a certain order if you list them 
 explicitly in the tar command line, like "tar cf backup.tar toc.dat ...". 
 It's hard to know the right order, though. In practice you would need to do 
 "tar tf backup.tar > files" before untarring, and use "files" to tar them 
 again in the right order.

Hm, could we create a file in the backup directory which lists the files in the 
right order?

best regards,
Florian Pflug




Re: [HACKERS] Transaction-scope advisory locks

2011-01-20 Thread Tom Lane
Marko Tiikkaja marko.tiikk...@cs.helsinki.fi writes:
 Another thing I now see is this:

 BEGIN;
 SELECT pg_advisory_xact_lock(1);

 -- do something here

 -- upgrade to session lock
 SELECT pg_advisory_lock(1);
 COMMIT;


 This seems useful, since the xact lock would be automatically released 
 if an error happens during -- do something here so you wouldn't need 
 to worry about releasing the lock elsewhere.  But I'm not sure this is 
 safe.  Can anyone see a problem with it?

I think the POLA dictates that the behavior of that should be that you
now have both a transactional and a nontransactional hold on the lock;
and only the transactional hold goes away at commit.

regards, tom lane



Re: [HACKERS] pg_basebackup for streaming base backups

2011-01-20 Thread Bruce Momjian
Robert Haas wrote:
  I read the description as:
 
  +    You can also use the <xref linkend="app-pgbasebackup"> tool to take
  +    the backup, instead of manually copying the files. This tool will take
  +    care of the <function>pg_start_backup()</>, copy and
  +    <function>pg_stop_backup()</> steps automatically, and transfers the
  +    backup over a regular <productname>PostgreSQL</productname> connection
  +    using the replication protocol, instead of requiring filesystem level
  +    access.
 
  so I thought, well it does pg_start_backup and pg_stop_backup, and also
  creates the data directory.
 
 Yeah, but pg_start_backup() and pg_stop_backup() are server functions,
 and this is an application.
 
 Also, it won't actually work unless the server has replication
  configured (wal_level!=minimal, max_wal_senders>0, and possibly some
 setting for wal_keep_segments), which has been the main point of the
 naming discussion thus far.  Now, you know what would be REALLY cool?
 Making this work without any special advance configuration.  Like if
 we somehow figured out a way to make max_wal_senders unnecessary, and
 a way to change wal_level without bouncing the server, so that we
 could temporarily boost the WAL level from minimal to archive if
 someone's running a backup.
 
 That, however, is not going to happen for 9.1... but it would be
 *really* cool.

Well, when we originally implemented PITR, we could have found a way to
avoid using SQL commands to start/stop backup, but we envisioned that we
would want to hook things on to those commands so we created a stable
API that we could improve, and we have.

Do we envision pg_basebackup as something we will enhance, and if so,
should we consider a generic name?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] pg_basebackup for streaming base backups

2011-01-20 Thread Tom Lane
Fujii Masao masao.fu...@gmail.com writes:
 On Thu, Jan 20, 2011 at 10:53 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I'm not sure why that's the right solution. Why do you think that we should
 not create the tablespace under the $PGDATA directory? I'm not surprised
 that people mounts the filesystem on $PGDATA/mnt and creates the
 tablespace on it.

 No?  Usually, having a mount point in a non-root-owned directory is
 considered a Bad Thing.

 Hmm.. but ISTM we can have a root-owned mount point in $PGDATA
 and create a tablespace there.

Nonsense.  The more general statement is that it's a security hole
unless the mount point *and everything above it* is root owned.
In the case you sketch, there would be nothing to stop the (non root)
postgres user from renaming $PGDATA/mnt to something else and then
inserting his own trojan-horse directories.

Given that nobody except postgres and root could get to the mount point,
maybe there wouldn't be any really serious problems caused that way ---
but I still say that it's bad practice that no competent sysadmin would
accept.

Moreover, I see no positive *good* reason to do it.  There isn't
anyplace under $PGDATA that users should be randomly creating
directories, much less mount points.

regards, tom lane



Re: [HACKERS] REVIEW: EXPLAIN and nfiltered

2011-01-20 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Wed, Jan 19, 2011 at 10:16 PM, Stephen Frost sfr...@snowman.net wrote:
 This patch looked good, in general, to me.  I added a few documentation
 updates and a comment, but it's a very straight-forward patch as far as
 I can tell.  Passes all regressions and my additional testing.

 I have not looked at the code for this patch at all as yet, but just
 as a general user comment - I really, really want this.  It's one of
 about, uh, two pieces of information that the EXPLAIN output doesn't
 give you that is incredibly important for troubleshooting.

What's the other one?

The main problem I've got with this patch is that there's no place to
shoehorn the information into the textual EXPLAIN format without
breaking a lot of expectations (and hence code --- it's insane to
imagine that any significant amount of client-side code has been
rewritten to make use of xml/json output yet).  It would be nice to know
what other requests are likely to be coming down the pike before we
decide exactly how we're going to break things.

regards, tom lane



Re: [HACKERS] ALTER TABLE ... REPLACE WITH

2011-01-20 Thread Simon Riggs
On Wed, 2011-01-19 at 22:16 -0500, Robert Haas wrote:

 That's another way of saying the patch is not anywhere close to being done.

My patch is materially incomplete. Certainly we may see that as grounds
for rejection, which I would not and could not argue with. It is a
popular feature, so I submitted anyway.

When I said Noah's patch was trivial, I was referring to the amount of
work expended on it so far; no insult intended. I think the amount of
code to finish either is fairly low as well.

If we wish to continue in this release then we must decide how. What I
was trying to indicate in my earlier comments was that my focus is on
achieving the required functionality in this release, or put another
way, I would accept Noah's patch rather than end with nothing.

The main requirement, as I see it, is error checking. We need to do the
same checking however we do it; neither patch currently does it.

If Noah's patch had error checking, then it would at least be safe to
recommend people do that. Then it is a simple matter of whether we think
implicit is OK, or whether it needs an explicit command. My patch does
it explicitly because that was the consensus from the earlier
discussion; I am in favour of the explicit route, which is why I wrote
the patch that way, and not merely because that is how I happened to write it.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: [HACKERS] pg_basebackup for streaming base backups

2011-01-20 Thread Magnus Hagander
On Thu, Jan 20, 2011 at 16:45, Bruce Momjian br...@momjian.us wrote:
 Robert Haas wrote:
  I read the description as:
 
  +    You can also use the <xref linkend="app-pgbasebackup"> tool to take
  +    the backup, instead of manually copying the files. This tool will take
  +    care of the <function>pg_start_backup()</>, copy and
  +    <function>pg_stop_backup()</> steps automatically, and transfers the
  +    backup over a regular <productname>PostgreSQL</productname> connection
  +    using the replication protocol, instead of requiring filesystem level
  +    access.
 
  so I thought, well it does pg_start_backup and pg_stop_backup, and also
  creates the data directory.

 Yeah, but pg_start_backup() and pg_stop_backup() are server functions,
 and this is an application.

 Also, it won't actually work unless the server has replication
  configured (wal_level!=minimal, max_wal_senders>0, and possibly some
 setting for wal_keep_segments), which has been the main point of the
 naming discussion thus far.  Now, you know what would be REALLY cool?
 Making this work without any special advance configuration.  Like if
 we somehow figured out a way to make max_wal_senders unnecessary, and
 a way to change wal_level without bouncing the server, so that we
 could temporarily boost the WAL level from minimal to archive if
 someone's running a backup.

 That, however, is not going to happen for 9.1... but it would be
 *really* cool.

 Well, when we originally implemented PITR, we could have found a way to
 avoid using SQL commands to start/stop backup, but we envisioned that we
 would want to hook things on to those commands so we created a stable
 API that we could improve, and we have.

Yeah, we're certainly not taking those *away*.


 Do we envision pg_basebackup as something we will enhance, and if so,
 should we consider a generic name?

Well, it's certainly going to be enhanced. I think there are two main
uses for it - backups, and setting up replication slaves. I can't see
it expanding beyond those, really.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] REVIEW: EXPLAIN and nfiltered

2011-01-20 Thread Robert Haas
On Thu, Jan 20, 2011 at 11:10 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Wed, Jan 19, 2011 at 10:16 PM, Stephen Frost sfr...@snowman.net wrote:
 This patch looked good, in general, to me.  I added a few documentation
 updates and a comment, but it's a very straight-forward patch as far as
 I can tell.  Passes all regressions and my additional testing.

 I have not looked at the code for this patch at all as yet, but just
 as a general user comment - I really, really want this.  It's one of
 about, uh, two pieces of information that the EXPLAIN output doesn't
 give you that is incredibly important for troubleshooting.

 What's the other one?

In the following sort of plan:

rhaas=# explain analyze select * from bob b, sally s where b.a = s.a;
QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..117890.00 rows=1000 width=8) (actual
time=0.036..533.372 rows=1000 loops=1)
   ->  Seq Scan on sally s  (cost=0.00..5770.00 rows=40 width=4)
(actual time=0.014..46.469 rows=40 loops=1)
   ->  Index Scan using bob_pkey on bob b  (cost=0.00..0.27 rows=1
width=4) (actual time=0.001..0.001 rows=0 loops=40)
 Index Cond: (a = s.a)
 Total runtime: 533.935 ms
(5 rows)

...you cannot really tell how many rows the index scan was expected to
match, or actually did match.  The answer to the latter question
certainly isn't 0.  We previously discussed making the rows= line go
out to three decimal places when used in an inner-index-scan context,
which would help a lot - you could then multiply rows by loops to get
an approximate answer.  My preferred fix would be just to remove the
unhelpful division-by-nloops code that gets applied in this case, but
that's less backward-compatible.

 The main problem I've got with this patch is that there's no place to
 shoehorn the information into the textual EXPLAIN format without
 breaking a lot of expectations (and hence code --- it's insane to
 imagine that any significant amount of client-side code has been
 rewritten to make use of xml/json output yet).  It would be nice to know
 what other requests are likely to be coming down the pike before we
 decide exactly how we're going to break things.

It's hard to predict the nature of future feature requests, but this
and the above are at the top of my list of ongoing gripes, and there
isn't a close third.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] REVIEW: EXPLAIN and nfiltered

2011-01-20 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
 The main problem I've got with this patch is that there's no place to
 shoehorn the information into the textual EXPLAIN format without
 breaking a lot of expectations (and hence code --- it's insane to
 imagine that any significant amount of client-side code has been
 rewritten to make use of xml/json output yet).  It would be nice to know
 what other requests are likely to be coming down the pike before we
 decide exactly how we're going to break things.

While I agree completely about the general "if you're going to break,
break it big" approach, I don't particularly care for holding output
strings from EXPLAIN to the same level that we do the wireline protocol.
This is going into a new major version, not something which is being
back-patched, and users now have a way in a released version to get away
from relying on the string output.

Have we worried about adding new plan nodes due to breakage in the
explain output..?  It strikes me that we've actually changed it with
some regularity, in one aspect or another, over a couple of releases.
Maybe my memory is bad though.

Thanks,

Stephen




Re: [HACKERS] pg_basebackup for streaming base backups

2011-01-20 Thread Dimitri Fontaine
Robert Haas robertmh...@gmail.com writes:
 Also, it won't actually work unless the server has replication
 configured (wal_level!=minimal, max_wal_senders>0, and possibly some
 setting for wal_keep_segments), which has been the main point of the
 naming discussion thus far.  Now, you know what would be REALLY cool?
 Making this work without any special advance configuration.  Like if
 we somehow figured out a way to make max_wal_senders unnecessary, and
 a way to change wal_level without bouncing the server, so that we
 could temporarily boost the WAL level from minimal to archive if
 someone's running a backup.

Not using max_wal_senders we're on our way, you just have to use the
external walreceiver that Magnus wrote the code for already.  WAL level, I
don't know that we have that already, but a big part of what this base
backup tool is useful for is preparing a standby… so certainly you want
to change that setup there.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] REVIEW: EXPLAIN and nfiltered

2011-01-20 Thread Robert Haas
On Thu, Jan 20, 2011 at 11:55 AM, Stephen Frost sfr...@snowman.net wrote:
 * Tom Lane (t...@sss.pgh.pa.us) wrote:
 The main problem I've got with this patch is that there's no place to
 shoehorn the information into the textual EXPLAIN format without
 breaking a lot of expectations (and hence code --- it's insane to
 imagine that any significant amount of client-side code has been
 rewritten to make use of xml/json output yet).  It would be nice to know
 what other requests are likely to be coming down the pike before we
 decide exactly how we're going to break things.

 While I agree completely about the general "if you're going to break,
 break it big" approach, I don't particularly care for holding output
 strings from EXPLAIN to the same level that we do the wireline protocol.
 This is going into a new major version, not something which is being
 back-patched, and users now have a way in a released version to get away
 from relying on the string output.

I agree; we make bigger changes than this all the time.  At the risk
of putting words in Tom's mouth, I think part of the concern here may
be that the EXPLAIN output is quite verbose already, and adding a few
more details is going to make it harder to read in the cases where you
*don't* care about this additional information.  That's a valid
concern, but I don't know what to do about it - not having this
information available isn't better.  It's tempting to propose moving
the actual numbers down to the next line, so that the lines aren't
so ridiculously long.

Looking at the patch, I have to say I had hoped this was going to show
nfiltered in both the estimated and actual cases, which it doesn't.
Now maybe that's more work than we want to put in, but it would be
nice to have.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] CommitFest wrap-up

2011-01-20 Thread Robert Haas
On Tue, Dec 21, 2010 at 10:49 PM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Dec 21, 2010 at 11:12 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Wed, Dec 15, 2010 at 11:29 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 - Writeable CTEs - I think we need Tom to pick this one up.
 - Fix snapshot taking inconsistencies - Ready for committer. Can any
 committer pick this up?

 Will take a look at these two also.

 Tom, what is your time frame on this?  I think we should wrap up the
 CF without these and bundle 9.1alpha3 unless you plan to get to this
 in the next day or two.

 We probably shouldn't hold up the alpha for these, if there are no
 other items outstanding.

 OK.  I've moved them to the next CommitFest and marked this one closed.

Tom, are you still planning to pick these two up?  They've been
basically collecting dust for over two months now, or in one case
three months, and we're running out of time.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] REVIEW: EXPLAIN and nfiltered

2011-01-20 Thread Marko Tiikkaja

On 2011-01-20 7:07 PM +0200, Robert Haas wrote:

Looking at the patch, I have to say I had hoped this was going to show
nfiltered in both the estimated and actual cases, which it doesn't.
Now maybe that's more work than we want to put in, but it would be
nice to have.


That would be fantastical, if we can make it happen.


Regards,
Marko Tiikkaja



Re: [HACKERS] REVIEW: EXPLAIN and nfiltered

2011-01-20 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Thu, Jan 20, 2011 at 11:55 AM, Stephen Frost sfr...@snowman.net wrote:
 While I agree completely about the general "if you're going to break,
 break it big" approach, I don't particularly care for holding output
 strings from EXPLAIN to the same level that we do the wireline protocol.

 I agree; we make bigger changes than this all the time.

No, we don't.  It's true that a client that wants to truly *understand*
the plan has to know a lot of things, but the fundamental format of
EXPLAIN ANALYZE output has been real stable for a real long time:

 node name  (cost=xxx.xx..xxx.xx rows=xxx width=xxx) (actual 
time=xxx.xxx..xxx.xxx rows=xxx loops=xxx)
   detail line: something or other
   ->  subnode name  ... more of the same ...

This level of understanding seems plenty sufficient for something like
explain.depesz.com, to name just one popular tool.  The last format
change of any kind we made in this skeleton was to increase the number
of decimal places in the actual time numbers from 2 to 3 (wow).
That was in 7.4.  Modulo that detail, this basic contract has been valid
since EXPLAIN ANALYZE was invented, in 7.2.  As proposed, this patch
will break it.

It might be interesting for somebody to go look at Hubert's code and see
just how much it really knows about the EXPLAIN output format, and how
much it's had to change across PG releases.

regards, tom lane



Re: [HACKERS] CommitFest wrap-up

2011-01-20 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, Dec 21, 2010 at 10:49 PM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Dec 21, 2010 at 11:12 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Wed, Dec 15, 2010 at 11:29 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 - Writeable CTEs - I think we need Tom to pick this one up.
 - Fix snapshot taking inconsistencies - Ready for committer. Can any
 committer pick this up?

 Tom, are you still planning to pick these two up?  They've been
 basically collecting dust for over two months now, or in one case
 three months, and we're running out of time.

Yes, I will get to them.  I haven't yet put my head down into full
commit fest mode...

regards, tom lane



Re: [HACKERS] pg_basebackup for streaming base backups

2011-01-20 Thread Robert Haas
On Thu, Jan 20, 2011 at 11:59 AM, Dimitri Fontaine
dimi...@2ndquadrant.fr wrote:
 Robert Haas robertmh...@gmail.com writes:
 Also, it won't actually work unless the server has replication
 configured (wal_level!=minimal, max_wal_senders>0, and possibly some
 setting for wal_keep_segments), which has been the main point of the
 naming discussion thus far.  Now, you know what would be REALLY cool?
 Making this work without any special advance configuration.  Like if
 we somehow figured out a way to make max_wal_senders unnecessary, and
 a way to change wal_level without bouncing the server, so that we
 could temporarily boost the WAL level from minimal to archive if
 someone's running a backup.

 Not using max_wal_senders we're on our way, you just have to use the
 external walreceiver that Magnus wrote the code for already.  WAL level, I
 don't know that we have that already, but a big part of what this base
 backup tool is useful for is preparing a standby… so certainly you want
 to change that setup there.

Well, yeah, but it would be nice to also use it just to take a regular
old backup on a system that doesn't otherwise need replication.

I think that the basic problem with wal_level is that to increase it
you need to somehow ensure that all the backends have the new setting,
and then checkpoint.  Right now, the backends get the value through
the GUC machinery, and so there's no particular bound on how long it
could take for them to pick up the new value.  I think if we could
find some way of making sure that the backends got the new value in a
reasonably timely fashion, we'd be pretty close to being able to do
this.  But it's hard to see how to do that.

I had some vague idea of creating a mechanism for broadcasting
critical parameter changes.  You'd make a structure in shared memory
containing the canonical values of wal_level and all other critical
variables, and the structure would also contain a 64-bit counter.
Whenever you want to make a parameter change, you lock the structure,
make your change, bump the counter, and release the lock.  Then,
there's a second structure, also in shared memory, where backends
report the value that the counter had the last time they updated their
local copies of the structure from the shared structure.  You can
watch that to find out when everyone's guaranteed to have the new
value.  If someone doesn't respond quickly enough, you could send them
a signal to get them moving.  What would really be ideal is if you
could actually make this safe enough that the interrupt service
routine could do all the work, rather than just setting a flag.  Or
maybe CHECK_FOR_INTERRUPTS().  If you can't make it safe enough to put
it in someplace pretty low-level like that, the whole idea might fall
apart, because it wouldn't be useful to have a way of doing this that
mostly works except sometimes it just sits there and hangs for a
really long time.
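
In outline, the shared pieces might look something like this very rough single-process mock-up (the structure names and the fixed backend count are invented for illustration, not PostgreSQL code; locking, signalling, and the shared-memory allocation are all elided):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define N_BACKENDS 4

typedef struct CriticalParams
{
	int			wal_level;			/* canonical value */
	uint64_t	change_counter;		/* bumped on every change */
} CriticalParams;

static CriticalParams shared_params;		/* would live in shared memory */
static uint64_t backend_seen[N_BACKENDS];	/* last counter each backend applied */

/* Writer side: publish a new value; returns the counter value to wait for. */
static uint64_t
publish_wal_level(int new_level)
{
	shared_params.wal_level = new_level;
	return ++shared_params.change_counter;
}

/* Backend side: would run from CHECK_FOR_INTERRUPTS() or an interrupt handler. */
static void
backend_absorb(int backend_id, int *local_wal_level)
{
	*local_wal_level = shared_params.wal_level;
	backend_seen[backend_id] = shared_params.change_counter;
}

/* Writer side: true once every backend has caught up to "target". */
static bool
all_caught_up(uint64_t target)
{
	for (int i = 0; i < N_BACKENDS; i++)
		if (backend_seen[i] < target)
			return false;		/* real code could signal the laggard here */
	return true;
}

int
main(void)
{
	int			local_wal_level[N_BACKENDS] = {0};
	uint64_t	target = publish_wal_level(2);	/* e.g. minimal -> archive */

	for (int i = 0; i < N_BACKENDS; i++)		/* pretend each backend absorbs */
		backend_absorb(i, &local_wal_level[i]);

	printf("all caught up: %d\n", all_caught_up(target));
	return 0;
}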

All pie in the sky at this point...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] REVIEW: EXPLAIN and nfiltered

2011-01-20 Thread Robert Haas
On Thu, Jan 20, 2011 at 12:57 PM, Magnus Hagander mag...@hagander.net wrote:

 On Jan 20, 2011 6:43 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Robert Haas robertmh...@gmail.com writes:
  On Thu, Jan 20, 2011 at 11:55 AM, Stephen Frost sfr...@snowman.net
  wrote:
  While I agree completely about the general "if you're going to break,
  break it big" approach, I don't particularly care for holding
  output
  strings from EXPLAIN to the same level that we do the wireline
  protocol.

  I agree; we make bigger changes than this all the time.

 No, we don't.  It's true that a client that wants to truly *understand*
 the plan has to know a lot of things, but the fundamental format of
 EXPLAIN ANALYZE output has been real stable for a real long time:

  node name  (cost=xxx.xx..xxx.xx rows=xxx width=xxx) (actual
 time=xxx.xxx..xxx.xxx rows=xxx loops=xxx)
   detail line: something or other
   ->  subnode name  ... more of the same ...

 This level of understanding seems plenty sufficient for something like
 explain.depesz.com, to name just one popular tool.  The last format
 change of any kind we made in this skeleton was to increase the number
 of decimal places in the actual time numbers from 2 to 3 (wow).
 That was in 7.4.  Modulo that detail, this basic contract has been valid
 since EXPLAIN ANALYZE was invented, in 7.2.  As proposed, this patch
 will break it.

 It might be interesting for somebody to go look at Hubert's code and see
 just how much it really knows about the EXPLAIN output format, and how
 much it's had to change across PG releases.


 Haven't looked at what changes with this patch, but don't forget PgAdmin that
 also parses the output. Though if the format changes enough to affect it,
 that might be the driving force to have it use xml format instead, which is
 the one that is intended for machine parsing after all..

How much has that code been updated from one release to the next?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] REVIEW: EXPLAIN and nfiltered

2011-01-20 Thread Magnus Hagander
On Jan 20, 2011 6:43 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Robert Haas robertmh...@gmail.com writes:
  On Thu, Jan 20, 2011 at 11:55 AM, Stephen Frost sfr...@snowman.net
wrote:
  While I agree completely about the general "if you're going to break,
  break it big" approach, I don't particularly care for holding
output
  strings from EXPLAIN to the same level that we do the wireline
protocol.

  I agree; we make bigger changes than this all the time.

 No, we don't.  It's true that a client that wants to truly *understand*
 the plan has to know a lot of things, but the fundamental format of
 EXPLAIN ANALYZE output has been real stable for a real long time:

  node name  (cost=xxx.xx..xxx.xx rows=xxx width=xxx) (actual
time=xxx.xxx..xxx.xxx rows=xxx loops=xxx)
   detail line: something or other
   ->  subnode name  ... more of the same ...

 This level of understanding seems plenty sufficient for something like
 explain.depesz.com, to name just one popular tool.  The last format
 change of any kind we made in this skeleton was to increase the number
 of decimal places in the actual time numbers from 2 to 3 (wow).
 That was in 7.4.  Modulo that detail, this basic contract has been valid
 since EXPLAIN ANALYZE was invented, in 7.2.  As proposed, this patch
 will break it.

 It might be interesting for somebody to go look at Hubert's code and see
 just how much it really knows about the EXPLAIN output format, and how
 much it's had to change across PG releases.


Haven't looked at what changes with this patch, but don't forget PgAdmin that
also parses the output. Though if the format changes enough to affect it,
that might be the driving force to have it use xml format instead, which is
the one that is intended for machine parsing after all..

/Magnus


Re: [HACKERS] ALTER TABLE ... REPLACE WITH

2011-01-20 Thread Noah Misch
On Thu, Jan 20, 2011 at 10:07:23AM +, Simon Riggs wrote:
 On Wed, 2011-01-19 at 17:46 -0500, Noah Misch wrote:
 
  First, I'd like to note that the thread for this patch had *four* me-too
  responses to the use case.  That's extremely unusual; the subject is 
  definitely
  compelling to people.  It addresses the bad behavior of natural attempts to
  atomically swap two tables in the namespace:
  
  psql -c "CREATE TABLE t AS VALUES ('old'); CREATE TABLE new_t AS VALUES ('new')"
  psql -c 'SELECT pg_sleep(2) FROM t' &     # block the ALTER or DROP briefly
  sleep 1                                   # give prev time to take AccessShareLock
  
  # Do it this way, and the next SELECT gets data from the old table.
  #psql -c 'ALTER TABLE t RENAME TO old_t; ALTER TABLE new_t RENAME TO t' &
  
  # Do it this way, and get: ERROR:  could not open relation with OID 41380
  psql -c 'DROP TABLE t; ALTER TABLE new_t RENAME TO t' &
  
  psql -c 'SELECT * FROM t'                 # I get 'old' or an error, never 'new'.
  psql -c 'DROP TABLE IF EXISTS t, old_t, new_t'
  
  by letting you do this instead:
  
  psql -c "CREATE TABLE t AS VALUES ('old'); CREATE TABLE new_t AS VALUES ('new')"
  psql -c 'SELECT pg_sleep(2) FROM t' &     # block the ALTER or DROP briefly
  sleep 1                                   # give prev time to take AccessShareLock
  
  psql -c 'EXCHANGE TABLE new_t TO t' &
  
  psql -c 'SELECT * FROM t'                 # I get 'new', finally!
  psql -c 'DROP TABLE IF EXISTS t, new_t'
  
  I find Heikki's (4d07c6ec.2030...@enterprisedb.com) suggestion from the 
  thread
  interesting: can we just make the first example work?  Even granting that 
  the
  second syntax may be a useful addition, the existing behavior of the first
  example is surely worthless, even actively harmful.  I tossed together a
  proof-of-concept patch, attached, that makes the first example DTRT.  Do 
  you see
  any value in going down that road?
 
 As I said previously on the thread you quote, having this happen
 implicitly is not a good thing, and IMHO, definitely not the right
 thing.

When DDL has taken AccessExclusiveLock and a query waits for it, it's the Right
Thing for that query to wake up and proceed based on the complete, final state
of that committed DDL.  Aside from the waiting itself, the query should behave
as though it started after the DDL completed.

In my example, the SELECT silently reads data from a table named old_t.  What
if that were an INSERT?  The data falls in the wrong table.

 Heikki's suggestion, and your patch, contain no checking to see whether
 the old and new tables are similar. If they are not similar then we have
 all the same problems raised by my patch. SQL will suddenly fail because
 columns have ceased to exist, FKs suddenly disappear etc..

Indeed, Heikki's suggestion and my patch would not do such verification.  I
can't see detecting and blocking some patterns of ALTER TABLE RENAME or DROP
...; CREATE ...; any more than we allow today.  Those need to stay open-ended, with the
user responsible for choosing well.  So, what's the right concurrent behavior
around use of those statements?  I answer that above.

That said, I see utility in a feature that compares two tables, swaps them if
similar, and fixes up foreign keys.  Having such a feature does not justify
wrong concurrent behavior around ALTER TABLE RENAME.  Having right concurrent
behavior around ALTER TABLE RENAME does not remove the utility of this feature.
We should do both.

 I don't see how having a patch helps at all. I didn't think it was the
 right way before you wrote it and I still disagree now you've written
 it.

Perhaps it helped me more than anyone else, and I should have kept it to myself.
Heikki's suggestion seemed straightforward, so much so that I couldn't figure
why nobody had done it.  That would usually mean I'm missing something.  So, I
implemented it in an effort to discover what I had missed, failing to do so.
Then, I sent it with the review in case you might spot what I had missed.
Failure to add some kind of table similarity check was intentional, per above.

nm



Re: [HACKERS] REVIEW: EXPLAIN and nfiltered

2011-01-20 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
 Robert Haas robertmh...@gmail.com writes:
  I agree; we make bigger changes than this all the time.
 
 No, we don't.

Alright, do we want to go down the road of adding new things to the
XML/JSON/YAML/Whatever-else format that isn't displayed in the TEXT
version, to avoid this concern?

Stephen




Re: [HACKERS] Moving test_fsync to /contrib?

2011-01-20 Thread Josh Berkus
On 1/20/11 6:15 AM, Robert Haas wrote:
 On Thu, Jan 20, 2011 at 9:13 AM, Bruce Momjian br...@momjian.us wrote:
 OK, I am ready to move test_fsync to /contrib.  Is pg_test_fsync the
 best name?  pg_check_fsync?  pg_fsync_performance?  pg_verify_fsync?
 
 I don't see too much reason to rename it more than necessary, so how
 about pg_test_fsync?

+1.


-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] pg_basebackup for streaming base backups

2011-01-20 Thread Dimitri Fontaine
Robert Haas robertmh...@gmail.com writes:
 I think that the basic problem with wal_level is that to increase it
 you need to somehow ensure that all the backends have the new setting,
 and then checkpoint.  Right now, the backends get the value through
 the GUC machinery, and so there's no particular bound on how long it
 could take for them to pick up the new value.  I think if we could
 find some way of making sure that the backends got the new value in a
 reasonably timely fashion, we'd be pretty close to being able to do
 this.  But it's hard to see how to do that.

Well, you just said when to force the reload to take effect: at
checkpoint time.  IIRC we already multiplex SIGUSR1, is that possible to
add that behavior here?  And signal every backend at checkpoint time
when wal_level has changed?

 I had some vague idea of creating a mechanism for broadcasting
 critical parameter changes.  You'd make a structure in shared memory
 containing the canonical values of wal_level and all other critical
 variables, and the structure would also contain a 64-bit counter.
 Whenever you want to make a parameter change, you lock the structure,
 make your change, bump the counter, and release the lock.  Then,
 there's a second structure, also in shared memory, where backends
 report the value that the counter had the last time they updated their
 local copies of the structure from the shared structure.  You can
 watch that to find out when everyone's guaranteed to have the new
 value.  If someone doesn't respond quickly enough, you could send them
 a signal to get them moving.  What would really be ideal is if you
 could actually make this safe enough that the interrupt service
 routine could do all the work, rather than just setting a flag.  Or
 maybe CHECK_FOR_INTERRUPTS().  If you can't make it safe enough to put
 it in someplace pretty low-level like that, the whole idea might fall
 apart, because it wouldn't be useful to have a way of doing this that
 mostly works except sometimes it just sits there and hangs for a
 really long time.

 All pie in the sky at this point...

Unless we manage to simplify the idea enough to make wal_level a SIGHUP parameter.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] ALTER TYPE 1: recheck index-based constraints

2011-01-20 Thread Noah Misch
On Thu, Jan 20, 2011 at 09:26:29AM -0500, Robert Haas wrote:
 My main beef with the Boolean flags is that this kind of thing is not too 
 clear:
 
reindex_relation(myrel, false, false, true, true, false, true,
 false, false, true);
 
 Unless you have an excellent memory, you can't tell what the heck
 that's doing without flipping back and forth between the function
 definition and the call site.  With a bit-field, it's a lot easier to
 glance at the call site and have a clue what's going on.  We're of
 course not quite to the point of that exaggerated example yet.

Agreed.
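
To make that concrete, here is a tiny invented stub (not the patched function itself; the real signature and flag values are in the patch attached below) showing how a flags-style call site documents the behaviors it requests:

#include <stdbool.h>

#define REINDEX_CHECK_CONSTRAINTS	0x01
#define REINDEX_SUPPRESS_INDEX_USE	0x02

static bool
reindex_relation_stub(unsigned int relid, bool toast_too, int flags)
{
	/* stand-ins for the real work */
	bool		check = (flags & REINDEX_CHECK_CONSTRAINTS) != 0;
	bool		suppress = (flags & REINDEX_SUPPRESS_INDEX_USE) != 0;

	(void) relid;
	(void) toast_too;
	return check || suppress;
}

int
main(void)
{
	/* The requested behaviors are named right at the call site. */
	reindex_relation_stub(1234, true,
						  REINDEX_CHECK_CONSTRAINTS | REINDEX_SUPPRESS_INDEX_USE);
	return 0;
}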

  However, suppose we inverted both flags, say REINDEX_SKIP_CONSTRAINT_CHECKS 
  and
  REINDEX_ALLOW_OLD_INDEX_USE.  Then, flags = 0 can hurt performance but not
  correctness.  That's looking like a win.
 
 I prefer the positive sense for those flags because I think it's more
 clear.  There aren't so many call sites or so many people using this
 that we have to worry about what people are going to do in new calling
 locations; getting it right in any new code shouldn't be a
 consideration.

Okay.  I've attached a new patch version based on that strategy.
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 1c9df98..75e7055 100644
*** a/src/backend/catalog/index.c
--- b/src/backend/catalog/index.c
***
*** 2533,2558  reindex_index(Oid indexId, bool skip_constraint_checks)
   * reindex_relation - This routine is used to recreate all indexes
   * of a relation (and optionally its toast relation too, if any).
   *
!  * If heap_rebuilt is true, then the relation was just completely rebuilt by
!  * an operation such as VACUUM FULL or CLUSTER, and therefore its indexes are
!  * inconsistent with it.  This makes things tricky if the relation is a system
!  * catalog that we might consult during the reindexing.  To deal with that
!  * case, we mark all of the indexes as pending rebuild so that they won't be
!  * trusted until rebuilt.  The caller is required to call us *without* having
!  * made the rebuilt versions visible by doing CommandCounterIncrement; we'll
!  * do CCI after having collected the index list.  (This way we can still use
!  * catalog indexes while collecting the list.)
   *
!  * We also skip rechecking uniqueness/exclusion constraint properties if
!  * heap_rebuilt is true.  This avoids likely deadlock conditions when doing
!  * VACUUM FULL or CLUSTER on system catalogs.  REINDEX should be used to
!  * rebuild an index if constraint inconsistency is suspected.
   *
   * Returns true if any indexes were rebuilt.  Note that a
   * CommandCounterIncrement will occur after each index rebuild.
   */
  bool
! reindex_relation(Oid relid, bool toast_too, bool heap_rebuilt)
  {
Relationrel;
Oid toast_relid;
--- 2533,2561 
   * reindex_relation - This routine is used to recreate all indexes
   * of a relation (and optionally its toast relation too, if any).
   *
!  * flags can include REINDEX_SUPPRESS_INDEX_USE and REINDEX_CHECK_CONSTRAINTS.
   *
!  * If flags has REINDEX_SUPPRESS_INDEX_USE, the relation was just completely
!  * rebuilt by an operation such as VACUUM FULL or CLUSTER, and therefore its
!  * indexes are inconsistent with it.  This makes things tricky if the relation
!  * is a system catalog that we might consult during the reindexing.  To deal
!  * with that case, we mark all of the indexes as pending rebuild so that they
!  * won't be trusted until rebuilt.  The caller is required to call us *without*
!  * having made the rebuilt versions visible by doing CommandCounterIncrement;
!  * we'll do CCI after having collected the index list.  (This way we can still
!  * use catalog indexes while collecting the list.)
!  *
!  * To avoid deadlocks, VACUUM FULL or CLUSTER on a system catalog must omit the
!  * REINDEX_CHECK_CONSTRAINTS flag.  REINDEX should be used to rebuild an index
!  * if constraint inconsistency is suspected.  For optimal performance, other
!  * callers should include the flag only after transforming the data in a manner
!  * that risks a change in constraint validity.
   *
   * Returns true if any indexes were rebuilt.  Note that a
   * CommandCounterIncrement will occur after each index rebuild.
   */
  bool
! reindex_relation(Oid relid, bool toast_too, int flags)
  {
Relationrel;
Oid toast_relid;
***
*** 2608,2614  reindex_relation(Oid relid, bool toast_too, bool heap_rebuilt)
List   *doneIndexes;
ListCell   *indexId;
  
!   if (heap_rebuilt)
{
 		/* Suppress use of all the indexes until they are rebuilt */
SetReindexPending(indexIds);
--- 2611,2617 
List   *doneIndexes;
ListCell   *indexId;
  
!   if (flags & REINDEX_SUPPRESS_INDEX_USE)
{
/* Suppress 

Re: [HACKERS] REVIEW: EXPLAIN and nfiltered

2011-01-20 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
 How much has that code been updated from one release to the next?

Just an FYI, I talked to depesz on IRC (please chime in if you disagree
with any of this) and he indicated that he's had to update the code
from time to time, mostly because the parser was too strict.

He also mentioned that he didn't feel it was terribly complicated or
that it'd be difficult to update for this.  Looking over the code, it's
got a simple regex for matching that line which would have to be
updated, but I don't think it'd require much more than that.

Thanks,

Stephen




Re: [HACKERS] REVIEW: EXPLAIN and nfiltered

2011-01-20 Thread Robert Haas
On Thu, Jan 20, 2011 at 1:47 PM, Stephen Frost sfr...@snowman.net wrote:
 * Tom Lane (t...@sss.pgh.pa.us) wrote:
 Robert Haas robertmh...@gmail.com writes:
  I agree; we make bigger changes than this all the time.

 No, we don't.

 Alright, do we want to go down the road of adding new things to the
 XML/JSON/YAML/Whatever-else format that isn't displayed in the TEXT
 version, to avoid this concern?

No, because, for one thing, the text output is what people are going
to send me when they want me to fix their crap.  If the information
isn't there, I lose.  And no, I don't want them to send me the XML.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] pg_basebackup for streaming base backups

2011-01-20 Thread Robert Haas
On Thu, Jan 20, 2011 at 2:10 PM, Dimitri Fontaine
dimi...@2ndquadrant.fr wrote:
 Robert Haas robertmh...@gmail.com writes:
 I think that the basic problem with wal_level is that to increase it
 you need to somehow ensure that all the backends have the new setting,
 and then checkpoint.  Right now, the backends get the value through
 the GUC machinery, and so there's no particular bound on how long it
 could take for them to pick up the new value.  I think if we could
 find some way of making sure that the backends got the new value in a
 reasonably timely fashion, we'd be pretty close to being able to do
 this.  But it's hard to see how to do that.

 Well, you just said when to force the reload to take effect: at
 checkpoint time.  IIRC we already multiplex SIGUSR1, is that possible to
 add that behavior here?  And signal every backend at checkpoint time
 when wal_level has changed?

Sending them a signal seems like a promising approach, but the trick
is guaranteeing that they've actually acted on it before you start the
checkpoint.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] REVIEW: EXPLAIN and nfiltered

2011-01-20 Thread hubert depesz lubaczewski
On Thu, Jan 20, 2011 at 02:48:59PM -0500, Stephen Frost wrote:
 * Robert Haas (robertmh...@gmail.com) wrote:
  How much has that code been updated from one release to the next?
 
 Just an FYI, I talked to depesz on IRC (please chime in if you disagree
 with any of this) and he indicated that he's had to update the code
 from time to time, mostly because the parser was too strict.
 
 He also mentioned that he didn't feel it was terribly complicated or
 that it'd be difficult to update for this.  Looking over the code, it's
 got a simple regex for matching that line which would have to be
 updated, but I don't think it'd require much more than that.

i'll be happy to update the Pg::Explain to handle new elements of
textual plans, so if this would be of concern - please don't treat
compatibility with explain.depesz.com as your responsibility/problem.

I'll fix the parser (have to add json/xml parsing too anyway), and I,
too, would love to get more information.

Best regards,

depesz

-- 
Linkedin: http://www.linkedin.com/in/depesz  /  blog: http://www.depesz.com/
jid/gtalk: dep...@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007



Re: [HACKERS] pg_basebackup for streaming base backups

2011-01-20 Thread Dimitri Fontaine
Robert Haas robertmh...@gmail.com writes:
 Sending them a signal seems like a promising approach, but the trick
 is guaranteeing that they've actually acted on it before you start the
 checkpoint.

How much would using a latch here help?  Or would it be overkill?

-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] pg_basebackup for streaming base backups

2011-01-20 Thread Heikki Linnakangas

On 20.01.2011 22:15, Dimitri Fontaine wrote:

Robert Haasrobertmh...@gmail.com  writes:

Sending them a signal seems like a promising approach, but the trick
is guaranteeing that they've actually acted on it before you start the
checkpoint.


How much would using a latch here help?  Or would it be overkill?


A latch doesn't give you an acknowledgment from the backends that 
they've received and acted on the guc change. You could use it as a 
building block to construct that, though.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] exceptions not present in plpy with Python 3

2011-01-20 Thread Peter Eisentraut
On lör, 2010-12-18 at 18:56 +0100, Jan Urbański wrote:
 there seems to be a problem in the way we add exceptions to the plpy
 module in PL/Python compiled with Python 3k.
 
 Try this: DO $$ plpy.SPIError $$ language plpython3u;
 
 I'm not a Python 3 expert, but I nicked some code from the Internet and
 came up with this patch (passes regression tests on both Python 2 and 3).

It looks like the PyModule_AddObject() approach also works in Python 2.
Anyone see an issue with using that uniformly?
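
For illustration, a rough sketch of that approach against the CPython C API (the function and variable names below are made up for the example, not the actual PL/Python patch). PyErr_NewException() creates the exception class, and PyModule_AddObject() steals a reference on success, hence the extra Py_INCREF to keep a usable module-level pointer:

#include <Python.h>

static PyObject *PLy_exc_spi_error;

static void
add_spi_error(PyObject *plpy_module)
{
	PLy_exc_spi_error = PyErr_NewException("plpy.SPIError", NULL, NULL);
	if (PLy_exc_spi_error == NULL)
		return;					/* real code would report the failure */

	Py_INCREF(PLy_exc_spi_error);	/* PyModule_AddObject steals a reference */
	if (PyModule_AddObject(plpy_module, "SPIError", PLy_exc_spi_error) < 0)
		Py_DECREF(PLy_exc_spi_error);
}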





Re: [HACKERS] pl/python refactoring

2011-01-20 Thread Peter Eisentraut
On ons, 2011-01-19 at 10:06 +0900, Hitoshi Harada wrote:
 - This is not in the patch, but around line 184 "vis versa" in a comment
 seems like a typo.

Fixed.

 - A line break should be added before PLy_add_exception() after static void

I'll add that when I get to the patch.

 - This is also not in the patch, but the comment
 /* these should only be called once at the first call
  * of plpython_call_handler.  initialize the python interpreter
  * and global data.
  */
 is bogus. PLy_init_interp() is called in _PG_init().

Fixed.





Re: [HACKERS] REVIEW: EXPLAIN and nfiltered

2011-01-20 Thread Tom Lane
hubert depesz lubaczewski dep...@depesz.com writes:
 On Thu, Jan 20, 2011 at 02:48:59PM -0500, Stephen Frost wrote:
 He also mentioned that he didn't feel it was terribly complicated or
 that it'd be difficult to update for this.  Looking over the code, it's
 got a simple regex for matching that line which would have to be
 updated, but I don't think it'd require much more than that.

 i'll be happy to update the Pg::Explain to handle new elements of
 textual plans, so if this would be of concern - please don't treat
 compatibility with explain.depesz.com as your responsibility/problem.

The point isn't whether it'd be terribly difficult to update client
side EXPLAIN-parsing code ... it's whether we should break it in the
first place.  I don't find the proposed format so remarkably
well-designed that it's worth creating compatibility problems for.

The main functional problem I see with this format is that it assumes
there is one and only one filter step associated with every plan node.
That is just plain wrong.  Many don't have any, and there are important
cases where there are two.  I'm thinking in particular that it might be
useful to distinguish the effects of the recheck and the filter
conditions of a bitmap heap scan.  Maybe it'd also be interesting to
separate the join and non-join filter clauses of a join node, though
I'm less sure about the usefulness of that.

So the line I'm thinking we should pursue is to visually associate the
new counter with the filter condition, either like

Filter Cond: (x > 42)  (nfiltered = 123)

or

Filter Cond: (x > 42)
Rows Filtered: 123

The latter is less ambiguous, but takes more vertical space.  The former
is very unlikely to break any client code, because I doubt there is any
that inquires into the details of what a filter condition expression
really means.  The latter *might* break code depending on how much
it assumes about the number of detail lines attached to a plan node
... but as Robert pointed out, we've added new detail lines before.

BTW, is it just me, or is the terminology "number filtered" pretty
confusing/ambiguous in itself?  It doesn't seem at all clear to me
whether that's the number of rows passed by the filter condition or
the number of rows rejected.  Perhaps "nremoved" would be clearer.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] REVIEW: EXPLAIN and nfiltered

2011-01-20 Thread Robert Haas
On Thu, Jan 20, 2011 at 3:47 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 The main functional problem I see with this format is that it assumes
 there is one and only one filter step associated with every plan node.
 That is just plain wrong.  Many don't have any, and there are important
 cases where there are two.  I'm thinking in particular that it might be
 useful to distinguish the effects of the recheck and the filter
 conditions of a bitmap heap scan.

If it's not too hard to do that, I'm all in favor.

 Maybe it'd also be interesting to
 separate the join and non-join filter clauses of a join node, though
 I'm less sure about the usefulness of that.

That would also be extremely useful.

 So the line I'm thinking we should pursue is to visually associate the
 new counter with the filter condition, either like

        Filter Cond: (x > 42)  (nfiltered = 123)

 or

        Filter Cond: (x > 42)
        Rows Filtered: 123

 The latter is less ambiguous, but takes more vertical space.  The former
 is very unlikely to break any client code, because I doubt there is any
 that inquires into the details of what a filter condition expression
 really means.  The latter *might* break code depending on how much
 it assumes about the number of detail lines attached to a plan node
 ... but as Robert pointed out, we've added new detail lines before.

I like the idea of putting it on the same line as the filter
condition, but your proposal for how to do that doesn't wow me - the
parentheses look too similar to the ones around the qual itself.

 BTW, is it just me, or is the terminology number filtered pretty
 confusing/ambiguous in itself?  It doesn't seem at all clear to me
 whether that's the number of rows passed by the filter condition or
 the number of rows rejected.  Perhaps nremoved would be clearer.

I think filtered is pretty clear and like it...  removed sounds like
you deleted something.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] REVIEW: EXPLAIN and nfiltered

2011-01-20 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 
 I think filtered is pretty clear and like it...
 
I find it ambiguous.  [Takes sip of filtered water.]  How about
excluded?
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Orphaned statements issue

2011-01-20 Thread Josh Berkus
Hackers,

One of our clients is seeing an unusual issue with statements which are
waiting going into sleep forever and never completing.   This first
e-mail is for a has anyone else seen this problem before? while we try
to collect additional information for a diagnosis.

This problem only happens under load and only when performing load tests
that insert large data (250 KiB) into bytea columns.  It takes a
couple of hours, but we've been able to reproduce the issue with a 100%
success rate.  Sometimes the locked query shows up as 'BIND' in
pg_stat_activity, sometimes it's a complicated query using a multi-table
left outer join, other times it's a simple select.  The only thing in
common is that there is never a corresponding entry for that statement
in the pg_locks table, and if you drop the connection the query goes away.

An strace on the process shows it to be in RECV, and otherwise doing
nothing.  We have not been able to run GDB because it takes a couple
hours of running a heavy load test to cause the issue.  While memory is
heavily used during the test, there is no swapping during the test which
would indicate Linux memory management as the culprit.

We can reproduce the issue on 8.1.11 and 8.1.23.  Currently we are
working on testing it on 9.0 and seeing if we can reproduce the issue.

We compiled the postgres from the Redhat source RPM.  The only
modification that we make is the config file.

The OS is Centos 5.4 32bit.

Hardware:
IBM 3650
2 x Dual Core Intel Xeon 5160 @ 3.00 GHz
16 GB memory
6 x 146 GB SAS 10K RPM in RAID-5

Please note that while we can reproduce the issue, access to the test
system is fairly restrictive and test runs take a while, so I'd like to
get requests for additional information-collecting all at once if possible.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup for streaming base backups

2011-01-20 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Thu, Jan 20, 2011 at 2:10 PM, Dimitri Fontaine
 dimi...@2ndquadrant.fr wrote:
 Robert Haas robertmh...@gmail.com writes:
 I think that the basic problem with wal_level is that to increase it
 you need to somehow ensure that all the backends have the new setting,
 and then checkpoint.
 
 Well, you just said when to force the reload to take effect: at
 checkpoint time.  IIRC we already multiplex SIGUSR1, is that possible to
 add that behavior here?  And signal every backend at checkpoint time
 when wal_level has changed?

 Sending them a signal seems like a promising approach, but the trick
 is guaranteeing that they've actually acted on it before you start the
 checkpoint.

Have the backends show their current wal_level in their PGPROC entries.
Sleep till they're all reporting the right thing, then fire checkpoint.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Orphaned statements issue

2011-01-20 Thread Alvaro Herrera
Excerpts from Josh Berkus's message of Thu Jan 20 18:05:15 -0300 2011:

 
 One of our clients is seeing an unusual issue with statements which are
 waiting going into sleep forever and never completing.   This first
 e-mail is for a has anyone else seen this problem before? while we try
 to collect additional information for a diagnosis.

I have seen it -- on 8.1 too.  In our case it was caused by an insert
that was doing lots of toast insertions, so it needed to grab the
extension lock frequently for the toast table; and this was slowed
down by a largish shared_buffers setting, somehow (8.1 doesn't have lock
partitioning, so this was expensive).  I don't recall details on why
these were related.  If this is your case too, I doubt you'd be able to
reproduce it in 9.0 (not even in 8.2 which is when lock partitioning was
introduced).

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... REPLACE WITH

2011-01-20 Thread Tom Lane
Noah Misch n...@leadboat.com writes:
 Heikki's suggestion seemed straightforward, so much so that I couldn't figure
 why nobody had done it.  That would usually mean I'm missing something.

If you're willing to substitute an incompatible table, it's not clear
why you don't just do

begin;
drop table t;
alter table t_new rename to t;
commit;

There are some implementation issues with this: concurrent accesses are
likely to end up failing with "relation with OID nnn doesn't exist",
because backends translate the table's name to OID before acquiring
lock.  But you'd have to solve those issues anyway to make an ALTER
REPLACE WITH work as transparently as you seem to hope it would.
Unless the idea here is to also have t_new acquire t's OID, and that
is an absolute complete won't-happen if you're not enforcing a pretty
thorough level of compatibility between the two tables.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] estimating # of distinct values

2011-01-20 Thread Tomas Vondra
On 20.1.2011 03:06, Nathan Boley wrote:
 And actually it does not depend on ndistinct for the columns only, it
 depends on ndistinct estimates for the combination of columns. So
 improving the ndistinct estimates for columns is just a necessary first
 step (and only if it works reasonably well, we can do the next step).
 
 I think that any approach which depends on precise estimates of
 ndistinct is not practical.

I'm not aware of any other approach to the 'discrete fail case' (where
the multi-dimensional histograms are not applicable). If someone finds a
better solution, I'll be the first one to throw away this stuff.

 I am very happy that you've spent so much time on this, and I'm sorry
 if my previous email came off as combative. My point was only that
 simple heuristics have served us well in the past and, before we go to
 the effort of new, complicated schemes, we should see how well similar
 heuristics work in the multiple column case. I am worried that if the
 initial plan is too complicated then nothing will happen and, even if
 something does happen, it will be tough to get it committed ( check
 the archives for cross column stat threads - there are a lot ).

If I've learned one thing over the years in IT, it's not to take critique
personally. All the problems mentioned in this thread are valid
concerns, pointing out weak points of the approach. And I'm quite happy
to receive this feedback - that's why I started it.

On the other hand - Jara Cimrman (a famous Czech fictional character,
depicted as the best scientist/poet/teacher/traveller/... - see [1])
once said that you can't be really sure you don't get gold by blowing
cigarette smoke into a basin drain, until you actually try it. So I'm
blowing cigarette smoke into the drain ...

It may very well happen that this will be a dead end, but I'll do my best to
fix all the issues or to prove that the pros outweigh the cons. And
even if it will be eventually rejected, I hope to get -1 from TL to be
eligible for that t-shirt ...

[1] http://en.wikipedia.org/wiki/Jara_Cimrman

regards
Tomas

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] REVIEW: EXPLAIN and nfiltered

2011-01-20 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Thu, Jan 20, 2011 at 3:47 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 BTW, is it just me, or is the terminology number filtered pretty
 confusing/ambiguous in itself?  It doesn't seem at all clear to me
 whether that's the number of rows passed by the filter condition or
 the number of rows rejected.  Perhaps nremoved would be clearer.

 I think filtered is pretty clear and like it...  removed sounds like
 you deleted something.

Well, you did delete something, no?  There are rows that aren't in the
output that would have been there if not for the filter condition.

And, btw, one person thinking it's clear doesn't make it so.  There
are actually three numbers that could be involved here: the number of
rows arriving at the filter, the number passed by it, and the number
rejected by it.  I think that nfiltered could possibly mean any of
those three.  A non-native speaker of English would be even less
likely to be sure of what was meant.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Orphaned statements issue

2011-01-20 Thread Josh Berkus

 I have seen it -- on 8.1 too.  On our case it was caused by an insert
 that was doing lots of toast insertions, so it needed to grab the
 extension lock frequently for the toast table; and this was slowed
 down by a largish shared_buffers setting, somehow (8.1 doesn't have lock
 partitioning, so this was expensive).  I don't recall details on why
 these were related.  If this is your case too, I doubt you'd be able to
 reproduce it in 9.0 (not even in 8.2 which is when lock partitioning was
 introduced).

Thanks, I'll bet that's the case.  This is happening on machines with
more RAM, so they've increased shared_buffers.

Now, to get them off 8.1.  Been trying for over a year now ...

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... REPLACE WITH

2011-01-20 Thread Simon Riggs
On Thu, 2011-01-20 at 13:14 -0500, Noah Misch wrote:

 When DDL has taken AccessExclusiveLock and a query waits for it, it's the 
 Right
 Thing for that query to wake up and proceed based on the complete, final state
 of that committed DDL.  Aside from the waiting itself, the query should behave
 as though it started after the DDL completed.
 
 In my example, the SELECT silently reads data from a table named old_t.  
 What
 if that were an INSERT?  The data falls in the wrong table.
 
  Heikki's suggestion, and your patch, contain no checking to see whether
  the old and new tables are similar. If they are not similar then we have
  all the same problems raised by my patch. SQL will suddenly fail because
  columns have ceased to exist, FKs suddenly disappear etc..
 
 Indeed, Heikki's suggestion and my patch would not do such verification.  I
 can't see detecting and blocking some patterns of ALTER TABLE RENAME or DROP
 ...; CREATE ...; than we allow today.  Those need to stay open-ended, with the
 user responsible for choosing well.  So, what's the right concurrent behavior
 around use of those statements?  I answer that above.
 
 That said, I see utility in a feature that compares two tables, swaps them if
 similar, and fixes up foreign keys.  Having such a feature does not justify
 wrong concurrent behavior around ALTER TABLE RENAME.  Having right concurrent
 behavior around ALTER TABLE RENAME does not remove the utility of this 
 feature.
 We should do both.

I agree that the DDL behaviour is wrong and should be fixed. Thank you
for championing that alternative view.

Swapping based upon names only works and is very flexible, much more so
than EXCHANGE could be.

A separate utility might be worth it, but the feature set of that should
be defined in terms of correctly-working DDL behaviour. It's possible
that no further requirement exists. I remove my own patch from
consideration for this release.

I'll review your patch and commit it, problems or objections excepted. I
haven't looked at it in any detail.

Having said that, writing the patch did nothing to convince me this was
the correct approach. Reviews should be reviews, they are not an
opportunity to provide your own alternate version of a patch. That just
confuses things and creates a competitive, not a cooperative
environment. Authors do need to listen to reviewers, so I hope I'm
demonstrating that here. 

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Orphaned statements issue

2011-01-20 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 One of our clients is seeing an unusual issue with statements which are
 waiting going into sleep forever and never completing.   This first
 e-mail is for a has anyone else seen this problem before? while we try
 to collect additional information for a diagnosis.

 An strace on the process shows it to be in RECV, and otherwise doing
 nothing.

I would take that to mean that it's waiting on the client.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ToDo List Item - System Table Index Clustering

2011-01-20 Thread Simone Aiken
After playing with this in benchmarks and researching the weird results I
got I'm going to advise dropping the todo for now unless something happens
to change how postgres handles clustering.  You guys probably already
grokked this so I am just recording it for the list archives.  
  

The primary factor here is that postgres doesn't maintain clustered indexes.
Clustering is a one-time operation that clusters the table at this current
point in time.  Basically, there really isn't any such thing in postgres as
a clustered index.  There is an operation - Cluster - which takes an index
and a table as input and re-orders the table according to the index.   But
it is borderline fiction to call the index used clustered because the next
row inserted will pop in at the end of the table instead of slipping into
the middle of the table per the desired ordering.  
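
For reference, the one-shot operation being described is just (table and
index names here are placeholders):

    CLUSTER my_table USING my_index;
    -- rewrites my_table in my_index order once; rows inserted afterwards
    -- are appended at the end, so the physical ordering decays over time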

All the pg_table cluster candidates are candidates because they have a row
per table column and we expect that a query will want to get several of
these rows at once.  These rows are naturally clustered because the scripts
that create them insert their information into the catalog contiguously.
When you create a catalog table the pg_attribute rows for its columns are
inserted together.  When you then create all its triggers they too are put
into pg_trigger one after the other.  So calling the Cluster operation
after dbinit doesn't help anything.

Over time table alterations can fragment this information.   If a user loads
a bunch of tables, then alters them over time the columns added later on
will have their metadata stored separately from the columns created
originally. 

Which gets us to the down and dirty of how the Cluster function works.  It
puts an access exclusive lock on the entire table - blocking all attempts to
read and write to the table - creates a copy of the table in the desired
order, drops the original, and renames the copy.  Doing this to a catalog
table that is relevant to queries pretty much brings everything else in the
database to a halt while the system table is locked up.  And the brute force
logic makes this time consuming even if the table is perfectly ordered
already.  Additionally, snapshots taken of the table during the Cluster
operation make the table appear to be empty which introduces the possibility
of system table corruption if transactions are run concurrently with a
Cluster operation.

So basically, the Cluster operation in its current form is not something you
want running automatically on a bunch of system tables as it is currently
implemented.  It gives your system the hiccups.  You would only want to run
it manually during downtime.  And you can do that just as easily with or
without any preparation during dbinit.


Thanks everyone,

-Simone Aiken





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] estimating # of distinct values

2011-01-20 Thread Tomas Vondra
On 20.1.2011 03:36, Robert Haas wrote:
 On Wed, Jan 19, 2011 at 5:13 PM, Tomas Vondra t...@fuzzy.cz wrote:
 Regarding the crash scenario - if the commit fails, just throw away the
 local estimator copy, it's not needed. I'm not sure how to take care of
 the case when commit succeeds and the write of the merged estimator
 fails, but I think it might be possible to write the estimator to xlog
 or something like that. So it would be replayed during recovery etc. Or
 is it a stupid idea?

 It's not stupid, in the sense that that is what you'd need to do if
 you want to avoid ever having to rescan the table.  But it is another
 thing that I think is going to be way too expensive.

 Way too expensive? All you need to put into the logfile is a copy of the
 estimator, which is a few kBs. How is that 'way too expensive'?
 
 At this point, this is all a matter of religion, right?  Neither of us
 has a working implementation we've benchmarked.  But yes, I believe
 you're going to find that implementing some kind of streaming
 estimator is going to impose a...  pulls number out of rear end 6%
 performance penalty, even after you've optimized the living daylights
 out of it.  Now you might say... big deal, it improves my problem
 queries by 100x.  OK, but if you could get the same benefit by doing
 an occasional full table scan during off hours, you could have the
 same performance with a *0%* performance penalty.  Even better, the
 code changes would be confined to ANALYZE rather than spread out all
 over the system, which has positive implications for robustness and
 likelihood of commit.

Good point. What I was trying to do was to continuously update the
estimator with new data - that was the whole idea behind the collecting
of new values (which might lead to problems with memory etc. as you've
pointed out) and updating a local copy of the estimator (which is a good
idea I think).

But this might be another option - let the user decide if he wants to
continuously update the estimates (and pay the price) or do that during off
hours (and pay almost nothing). That sounds like a very good solution to me.

 I'm not trying to argue you out of working on this.  It's obviously
 your time to spend, and if works better than I think it will, great!
 I'm merely offering you an opinion on what will probably happen if you
 go this route - namely, it'll carry an unpalatable run-time penalty.
 That opinion may be worth no more than what you paid for it, but there
 you have it.

Yes, and I appreciate all feedback. But I still believe this can be done
so that users that don't need the feature don't pay for it.

regards
Tomas

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Orphaned statements issue

2011-01-20 Thread Josh Berkus

 I would take that to mean that it's waiting on the client.

You mean that the client timed out and isn't accepting data from the
query anymore?  Shouldn't Postgres time out on that after a while?  In
one case, the orphaned statement was 16 hours old before we killed it.

If it's relevant, the client connection is from a C application via
libpq on localhost.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] estimating # of distinct values

2011-01-20 Thread Tomas Vondra
On 20.1.2011 09:10, Heikki Linnakangas wrote:
 It seems that the suggested multi-column selectivity estimator would be
 more sensitive to ndistinct of the individual columns. Is that correct?
 How is it biased? If we routinely under-estimate ndistinct of individual
 columns, for example, does the bias accumulate or cancel itself in the
 multi-column estimate?
 
 I'd like to see some testing of the suggested selectivity estimator with
 the ndistinct estimates we have. Who knows, maybe it works fine in
 practice.

The estimator for two columns and query 'A=a AND B=b' is about

 0.5 * (dist(A)/dist(A,B) * Prob(A=a) + dist(B)/dist(A,B) * Prob(B=b))

so it's quite simple. It's not that sensitive to errors in the ndistinct
estimates for individual columns, but the problem is in the multi-column
ndistinct estimates. It's very likely that with dependent columns (e.g.
with the ZIP codes / cities) the distribution is so pathological that
the sampling-based estimate will be very off.
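
Just to illustrate the formula with made-up numbers: for dist(A) = 100,
dist(B) = 50, dist(A,B) = 500, Prob(A=a) = 0.01 and Prob(B=b) = 0.02 it
gives 0.5 * (100/500 * 0.01 + 50/500 * 0.02) = 0.5 * (0.002 + 0.002) =
0.002, i.e. about 0.2% selectivity for the combined condition.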

I guess this was a way too short analysis, but if you can provide more
details of the expected tests etc. I'll be happy to provide that.

regards
Tomas

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SSI and Hot Standby

2011-01-20 Thread Kevin Grittner
Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote:
 On 20.01.2011 03:05, Kevin Grittner wrote:
 If we don't do something like this, do we just provide REPEATABLE
 READ on the standby as the strictest level of transaction
 isolation?  If so, do we generate an error on a request for
 SERIALIZABLE, warn and provide degraded behavior, or just quietly
 give them REPEATABLE READ behavior?
 
 +1 for generating an error.
 
Before I go do that, I want to be sure everyone is clear about the
state of things.
 
If SSI is used to provide data integrity on the master, it will
prevent any serialization anomalies from being persisted on any hot
standby *long term*.  For example, whenever the standby is
at a point in the transaction stream where there were no read/write
transactions active, no anomalies can be observed.  (That isn't the
*only* time; it's just the simplest one to describe as an example.) 
Queries on the standby can, however, see *transient* anomalies when
they run queries which would cause a serialization failure if run on
the master at the same point in the transaction stream.  This can
only occur when, of two concurrent transactions, the one which
*appears* to run second because the other can't read what it wrote,
*commits* first.
 
The most common and alarming situation where this occurs, in my
opinion, is batch processing.  This is extremely common in financial
applications, and tends to show up in a lot of other places, too. 
(The receipting query set is an instance of this type of problem,
but I'm going to keep it more general in hopes that people can see
where it impacts them.)  Imagine an application which has some small
control record in a table, and inserts to some other table are
assigned to a batch based on the control record.  The batches are
normally identified by ascending dates or serial numbers. 
Periodically a new batch is opened and the old batch is closed by
updating a current batch id column in the control table.  If the
batch ID is updated and the transaction in which that update was
executed commits while a transaction which read the old batch ID is
still in flight, a read of the database will show that the batch is
closed, but if you look at the detail of the batch, it will not yet
be complete.
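
A minimal sketch of that scenario (schema and names invented just for
illustration):

    -- the control record and the batch detail
    CREATE TABLE control (current_batch_id int NOT NULL);
    CREATE TABLE receipt (batch_id int NOT NULL, amount numeric NOT NULL);

    -- T1: insert into the batch that was open when it looked
    BEGIN ISOLATION LEVEL SERIALIZABLE;
    INSERT INTO receipt (batch_id, amount)
      SELECT current_batch_id, 100.00 FROM control;
    -- ... still in flight ...

    -- T2: close the batch, committing before T1 does
    BEGIN ISOLATION LEVEL SERIALIZABLE;
    UPDATE control SET current_batch_id = current_batch_id + 1;
    COMMIT;

    -- T3: list the "closed" batch; without SSI this can miss T1's row
    SELECT * FROM receipt
     WHERE batch_id = (SELECT current_batch_id - 1 FROM control);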
 
Under SSI, one of these transactions will be canceled to prevent
this.  Our implementation will always allow the update which closes
the batch to complete, and either the insert or the select of the
detail will be rolled back with a serialization failure, depending
on the timing of the actions inside those transactions.  If the insert
fails, it can be retried, and will land in the new batch -- making
the listing of the batch (which omits it) OK.  If the listing of the
batch details is canceled, it will be because the insert into the
old batch committed before it recognized the problem, so an
immediate retry of the select will see the complete batch contents.
 
A hot standby can't really take part in the predicate locking and
transaction cancellation on the master.
 
Dan and I have both come to the conclusion that the only reasonable
way to allow hot standby to work with SSI is for the WAL (when
wal_level = hot_standby) to contain information about which
snapshots develop which won't see such a state.  In the above
example, barring some throttling mechanism skipping these particular
snapshots, or other problematic conflicts around the same time, the
master would tell the standby that the snapshot before either of the
two problem transactions was OK, and then it would tell them that
the snapshot after both had committed was OK.  It would not suggest
using the snapshot available between the commit of the control
record update and the commit of the insert into the batch.
 
This seems to me to be not completely unrelated to the snapshot
synchronization patch.  It is clearly closely related to the READ
ONLY DEFERRABLE mode, which also looks for a snapshot which is
immune to serialization anomalies without predicate locking,
conflict detection, transaction cancellation, etc.  Melding these
two things with hot standby seems to be beyond what can reasonably
happen for 9.1 without delaying the release.
 
If someone is using one feature and not the other, they really don't
have a problem.  Like anyone else, if a hot standby user has been
using SERIALIZABLE mode under 9.0 or earlier, they will need to
switch to REPEATABLE READ.  A SERIALIZABLE user who doesn't set up
hot standby has no issue.  Opinions so far seem to be in favor of
reporting an error on the standby if SERIALIZABLE is requested, so
that people don't silently get less protection than they expect. 
The most annoying thing about that is that if the user would *like*
to use truly serializable transactions on the standby, and will do
so when they get it in 9.2, they must switch to REPEATABLE READ now,
and switch back to SERIALIZABLE with the next release.
 
So, based on a more complete description of the issues, any more
opinions on whether to generate the error, as suggested by Heikki?
Does anyone think this justifies the compatibility GUC as suggested
by Jeff?
 
-Kevin

Re: [HACKERS] estimating # of distinct values

2011-01-20 Thread Tomas Vondra
On 20.1.2011 11:05, Csaba Nagy wrote:
 Hi Tomas,
 
 On Wed, 2011-01-19 at 23:13 +0100, Tomas Vondra wrote:
 No, the multi-column statistics do not require constant updating. There
 are cases where a sampling is perfectly fine, although you may need a
 bit larger sample. Generally if you can use a multi-dimensional
 histogram, you don't need to scan the whole table.
 
 In the cases where sampling is enough, you can do that to the updates
 too: do a sampling on the changes, in that you only process every Nth
 change to make it to the estimator. If you can also dynamically tune the
 N to grow it as the statistics stabilize, and lower it if you detect
 high variance, even better.
 
 If the analyze process could be decoupled from the backends, and maybe
 just get the data passed over to be processed asynchronously, then that
 could be a feasible way to have always up to date statistics when the
 bottleneck is IO and CPU power is in excess. If that then leads to
 better plans, it could really be a win exceeding the overhead.

OK, this sounds interesting. I'm not sure how to do that but it might be
a good solution. What about transactions? If the client inserts data
(and it will be sent asynchronously to update the estimator) and then
rolls back, is the estimator 'rolled back' or what happens?

This was exactly the reason why I initially wanted to collect all the
data at the backend (and send them to the estimator at commit time).
Which was then replaced by the idea to keep a local estimator copy and
merge it back to the original estimator at commit time.

 If this analyze process (or more of them) could also just get the data
 from the modified buffers in a cyclic way, so that backends need nothing
 extra to do, then I don't see any performance disadvantage other than
 possible extra locking contention on the buffers and non-determinism of
 the actual time when a change makes it to the statistics. Then you just
 need to get more CPU power and higher memory bandwidth to pay for the
 accurate statistics.

Well, the possible locking contention sounds like a quite significant
problem to me :-(

The lag between an update and a change to the stats is not that big a deal
I guess - we have the same behaviour with the rest of the stats (updated
by autovacuum every once in a while).

Tomas

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SSI and Hot Standby

2011-01-20 Thread Josh Berkus
Kevin,

 So, based on a more complete description of the issues, any more
 opinions on whether to generate the error, as suggested by Heikki? 

If it's a choice between generating an error and letting users see
inconsistent data, I'll take the former.

 Does anyone think this justifies the compatibility GUC as suggested
 by Jeff?  

I think it might, yes.  Since someone could simply turn on the backwards
compatibility flag for 9.1 and turn it off for 9.2, rather than trying
to mess with transaction states which might be set in application code.

Unfortunately, people have not responded to our survey :-(
http://www.postgresql.org/community/survey.77

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Orphaned statements issue

2011-01-20 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 I would take that to mean that it's waiting on the client.

 You mean that the client timed out and isn't accepting data from the
 query anymore?

No, if the backend is in RECV state, it's waiting for the client to
*send* it something.

(Although if this is an SSL connection, it's a bit harder to be sure
about what the logical state of the connection is.)

 Shouldn't Postgres time out on that after a while?

Not if the problem is the client is confused.  As long as the remote-end
kernel doesn't indicate the connection is dead, we'll wait for the
client to wake up and send us a command.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] One Role, Two Passwords

2011-01-20 Thread Daniel Farina
Hello list,

I wanted to test the waters on how receptive people might be to an
extension that would allow Postgres to support two passwords for a
given role. I have recently encountered a case where this would be
highly useful when performing rolling password upgrades across many
client applications and/or application instances.

It is possible (as far as I know) to get around some of the stickier
parts of this with some teeth gnashing, using some CREATE ROLE ... IN
ROLE dancing, but I wanted to see if there was any interest in
supporting this for real.
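
(For concreteness, the kind of dance I have in mind looks roughly like
this, with made-up names:

    CREATE ROLE app NOLOGIN;
    CREATE ROLE app_key1 LOGIN PASSWORD 'secret1' IN ROLE app;
    CREATE ROLE app_key2 LOGIN PASSWORD 'secret2' IN ROLE app;
    -- grant privileges to "app" only; rotate a credential by dropping one
    -- key role and creating a replacement

plus having the applications SET ROLE app, or otherwise act as the shared
role.)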

This design is not uncommon, one example is Amazon Web Services (e.g.
EC2, S3), whereby one identification key can have many, independently
revokable secret keys.

I haven't given much thought to the mechanism yet, rather, I am just
trying to assess gut reactions on the principle.

--
fdr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] One Role, Two Passwords

2011-01-20 Thread Andrew Dunstan



On 01/20/2011 05:28 PM, Daniel Farina wrote:

Hello list,

I wanted to test the waters on how receptive people might be to an
extension that would allow Postgres to support two passwords for a
given role. I have recently encountered a case where this would be
highly useful when performing rolling password upgrades across many
client applications and/or application instances.

It is possible (as far as I know) to get around some of the stickier
parts of this with some teeth gnashing, using some CREATE ROLE ... IN
ROLE dancing, but I wanted to see if there was any interest in
supporting this for real.

This design is not uncommon, one example is Amazon Web Services (e.g.
EC2, S3), whereby one identification key can have many, independently
revokable secret keys.

I haven't given much thought to the mechanism yet, rather, I am just
trying to assess gut reactions on the principle.


Have you thought of trying to use an external auth source like LDAP for 
such a scheme?



cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... REPLACE WITH

2011-01-20 Thread Robert Haas
On Thu, Jan 20, 2011 at 4:24 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Noah Misch n...@leadboat.com writes:
 Heikki's suggestion seemed straightforward, so much so that I couldn't figure
 why nobody had done it.  That would usually mean I'm missing something.

 If you're willing to substitute an incompatible table, it's not clear
 why you don't just do

                begin;
                drop table t;
                alter table t_new rename to t;
                commit;

Because the whole source of this problem is dependency hell.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] One Role, Two Passwords

2011-01-20 Thread Daniel Farina
On Thu, Jan 20, 2011 at 2:45 PM, Andrew Dunstan and...@dunslane.net wrote:
 Have you thought of trying to use an external auth source like LDAP for such
 a scheme?

I have thought about that, although LDAP is the only one that came to
mind (I don't know a whole lot of systems in detail, only by name...so
suggestions welcome for low-administrative-overhead variants). I also
briefly considered investigating what hooks I could exploit for auth &
auth; I do not know these very well right now. It would be ideal to
not have to run another full-bore set of services to support phased
password rotation, though -- in this case it would still appear to be
better, but frustrating to use the CREATE ROLE ... IN ROLE dance.

--
fdr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SSI and Hot Standby

2011-01-20 Thread Simon Riggs
On Wed, 2011-01-19 at 19:05 -0600, Kevin Grittner wrote:

 The idea is that whenever we see a valid snapshot which would yield
 a truly serializable view of the data for a READ ONLY transaction,
 we add a WAL record with that snapshot information. 

You haven't explained why this approach is the way forwards. What other
options have been ruled out, and why. The above approach doesn't sound
particularly viable to me.

It's not clear to me what the reason is that this doesn't just work on
HS already. If you started there it might help.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] Large object corruption during 'piped' pg_restore

2011-01-20 Thread Tom Lane
Bosco Rama postg...@boscorama.com writes:
 If 'standard_conforming_strings = on' is set in our DB (which is required 
 for
 our app) then the piped restore method (e.g. pg_restore -O backup.dat | 
 psql)
 results in the large objects being corrupted.

 All servers and client tools involved are PG 8.4.6 on Ubuntu Server 10.04.1 
 LTS
 with all current updates applied.

I've been able to replicate this in 8.4; it doesn't happen in 9.0
(but probably does in all 8.x versions).

The problem is that pg_dump (or in this case really pg_restore) is
relying on libpq's PQescapeBytea() to format the bytea literal that
will be given as argument to lowrite() during the restore.  When
pg_dump is producing SQL directly, or when pg_restore is connected
to a database, PQescapeBytea() mooches the standard_conforming_strings
value from the active libpq connection and gets the right answer.
In the single case where pg_restore is producing SQL without ever
opening a database connection, PQescapeBytea doesn't know what to do
and defaults to the old non-standard-strings behavior.  Unfortunately
pg_restore set standard_conforming_strings=on earlier in the script
(if it was set in the original source database) so you get the wrong
thing.
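
The effect is easy to see by hand (an illustrative snippet, not actual
pg_restore output):

    SET standard_conforming_strings = on;
    SELECT octet_length('\001'::bytea);   -- 1 byte, as intended
    SELECT octet_length('\\001'::bytea);  -- 4 bytes: backslash, '0', '0', '1'

The second literal is what the old-style escaping emits, and under
standard-conforming strings it no longer decodes to the original octet.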

The bottom line is that pg_dump can't depend on libpq's PQescapeBytea,
but needs its own copy.  We have in fact done that as of 9.0, which is
what I was vaguely remembering:

Author: Tom Lane t...@sss.pgh.pa.us
Branch: master Release: REL9_0_BR [b1732111f] 2009-08-04 21:56:09 +

Fix pg_dump to do the right thing when escaping the contents of large 
objects.

The previous implementation got it right in most cases but failed in one:
if you pg_dump into an archive with standard_conforming_strings enabled, 
then
pg_restore to a script file (not directly to a database), the script will 
set
standard_conforming_strings = on but then emit large object data as
nonstandardly-escaped strings.

At the moment the code is made to emit hex-format bytea strings when dumping
to a script file.  We might want to change to old-style escaping for 
backwards
compatibility, but that would be slower and bulkier.  If we do, it's just a
matter of reimplementing appendByteaLiteral().

This has been broken for a long time, but given the lack of field complaints
I'm not going to worry about back-patching.

I'm not sure whether this new complaint is enough reason to reconsider
back-patching.  We cannot just backport the 9.0 patch, since it assumes
it can do bytea hex output --- we'd need to emit old style escaped
output instead.  So it's a bit of work, and more to the point would
involve pushing poorly-tested code into stable branches.  I doubt it
would go wrong, but in the worst-case scenario we might create failures
for blob-restore cases that work now.

So I'm not sure whether to fix it, or leave it as a known failure case
in old branches.  Comments?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... REPLACE WITH

2011-01-20 Thread Noah Misch
On Thu, Jan 20, 2011 at 09:36:11PM +, Simon Riggs wrote:
 I agree that the DDL behaviour is wrong and should be fixed. Thank you
 for championing that alternative view.
 
 Swapping based upon names only works and is very flexible, much more so
 than EXCHANGE could be.
 
 A separate utility might be worth it, but the feature set of that should
 be defined in terms of correctly-working DDL behaviour. It's possible
 that no further requirement exists. I remove my own patch from
 consideration for this release.
 
 I'll review your patch and commit it, problems or objections excepted. I
 haven't looked at it in any detail.

Thanks.  I wouldn't be very surprised if that patch is even the wrong way to
achieve these semantics, but it's great that we're on the same page as to which
semantics they are.

 Having said that, writing the patch did nothing to convince me this was
 the correct approach. Reviews should be reviews, they are not an
 opportunity to provide your own alternate version of a patch. That just
 confuses things and creates a competitive, not a cooperative
 environment. Authors do need to listen to reviewers, so I hope I'm
 demonstrating that here. 

Understood.  I can see now that posting a second code patch, however framed, in
the same thread creates a presumption of aggression that is difficult to dispel.
I will have a lot to think about before doing that again.  Thanks for giving
this discussion, which started poorly due to my actions, a second chance.

nm

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... REPLACE WITH

2011-01-20 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Thu, Jan 20, 2011 at 4:24 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 If you're willing to substitute an incompatible table, it's not clear
 why you don't just do
 
begin;
drop table t;
alter table t_new rename to t;
commit;

 Because the whole source of this problem is dependency hell.

Well, if you want to preserve dependencies, you can *not* just blindly
substitute an incompatible table.  You must ensure that views and
foreign keys referencing the table are still valid.  So I'm not sure
where anybody got the idea that an implementation that fails to check
all that is even worth presenting.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SSI and Hot Standby

2011-01-20 Thread Kevin Grittner
Simon Riggs si...@2ndquadrant.com wrote:
 On Wed, 2011-01-19 at 19:05 -0600, Kevin Grittner wrote:
 
 The idea is that whenever we see a valid snapshot which would
 yield a truly serializable view of the data for a READ ONLY
 transaction, we add a WAL record with that snapshot information. 
 
 You haven't explained why this approach is the way forwards. What
 other options have been ruled out, and why. The above approach
 doesn't sound particularly viable to me.
 
Why not?  We already generate appropriate snapshots for this in SSI,
so is the problem in getting the appropriate information into the
WAL stream or in having a request for a snapshot within a
serializable transaction while running in hot standby the problem?
 
 It's not clear to me what the reason is that this doesn't just
 work on HS already. If you started there it might help.
 
Because the standby would need to bombard the server with a stream
of predicate lock information, we would need to allow transactions
on the master to be canceled due in part to activity on the standby,
and I don't even know how you would begin to track read/write
conflicts between transactions on two different clusters.
 
If any of that didn't make sense, it would probably be more
efficient for everyone involved if those interested browsed the
Overview section of the Wiki page than to have me duplicate its
contents in a post.
 
http://wiki.postgresql.org/wiki/Serializable
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] REVIEW: EXPLAIN and nfiltered

2011-01-20 Thread Marko Tiikkaja

On 1/20/2011 12:47 PM, Tom Lane wrote:

So the line I'm thinking we should pursue is to visually associate the
new counter with the filter condition, either like

Filter Cond: (x > 42)  (nfiltered = 123)

or

Filter Cond: (x > 42)
Rows Filtered: 123


I'd prefer the latter.  Sometimes the Filter Cond is very complex and 
finding the nfiltered information would be easier if it always had its 
own row.



Regards,
Marko Tiikkaja

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SSI and Hot Standby

2011-01-20 Thread Florian Pflug
On Jan 21, 2011, at 00:11, Simon Riggs wrote:
 It's not clear to me what the reason is that this doesn't just work on
 HS already. If you started there it might help.


The problem is that snapshots taken on the master sometimes represent a
state of the database which cannot occur under any (valid) serial schedule.
Hence, if you use that snapshot to read the *whole* database, you've
surely violated serializability. If you read only parts of the database,
things may or may not be fine, depending on the parts you read.

To have the same stringent guarantees that SERIALIZABLE provides on the
master also for queries run against the slave, you somehow need to prevent
this. The easiest way is to only use snapshots on the slave which *cannot*
produce such anomalies. We already know now to generate such snapshots -
SERIALIZABLE READ ONLY DEFERRABLE does exactly that. So the open question
is mainly how to transfer such snapshots to the slave, and how often we
transmit a new one.
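
(On the master, this is what

    BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE READ ONLY DEFERRABLE;

does in the SSI patch: it waits, if necessary, until a snapshot is
available that cannot see any serialization anomaly.)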

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SSI and Hot Standby

2011-01-20 Thread Kevin Grittner
I wrote:
 
 Why not?  We already generate appropriate snapshots for this in
 SSI, so is the problem in getting the appropriate information into
 the WAL stream or in having a request for a snapshot within a
 serializable transaction while running in hot standby the problem?
 
I dropped a few words.  
 
That was supposed to ask whether the problem was in getting hot
standby to *use such a snapshot*.
 
I'm open to other suggestions on how else we might do this.  I don't
see any alternatives, but maybe you're seeing some possibility that
eludes me.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Orphaned statements issue

2011-01-20 Thread Josh Berkus
On 1/20/11 2:26 PM, Tom Lane wrote:
 Josh Berkus j...@agliodbs.com writes:
 I would take that to mean that it's waiting on the client.
 
 You mean that the client timed out and isn't accepting data from the
 query anymore?
 
 No, if the backend is in RECV state, it's waiting for the client to
 *send* it something.

I don't think that's consistent with what we're seeing except maybe in
the BIND case.  In the other cases, there's a query supposedly
executing.  But, will look for that possibility.


-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] One Role, Two Passwords

2011-01-20 Thread Tom Lane
Daniel Farina drfar...@acm.org writes:
 I wanted to test the waters on how receptive people might be to an
 extension that would allow Postgres to support two passwords for a
 given role.

Not very.  Why don't you just put two roles in the same group?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SSI and Hot Standby

2011-01-20 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On Wed, 2011-01-19 at 19:05 -0600, Kevin Grittner wrote:
 The idea is that whenever we see a valid snapshot which would yield
 a truly serializable view of the data for a READ ONLY transaction,
 we add a WAL record with that snapshot information. 

 You haven't explained why this approach is the way forwards. What other
 options have been ruled out, and why. The above approach doesn't sound
 particularly viable to me.

I'm pretty concerned about the performance implications, too.  In
particular that sounds like you could get an unbounded amount of WAL
emitted from a *purely read only* transaction flow.  Which is not
going to fly.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

