[HACKERS] Implementing pg_dump_sort.c topological sorting in sql/plpgsql/plperl?
Hi hackers, The project I'm currently working on, fsnapshot[1], is written in plain plpgsql and I need to sort all the oids in their creatable/droppable order. This has already been properly implemented in pg_dump_sort.c using Knuth's algorithm for topological sorting, with some special magic to find and break dependency loops. It's not possible to use a plain recursive query to do the trick (due to 'i' bidirectional dependencies and dependency loops). I need a general approach, only making use of pg_depend. The function should take no input arguments and the output argument should be oid[], containing a list of the oids in creatable/droppable order. It doesn't matter if it is left-to-right or least-number-of-edges-first; any valid topological sort will do. I'm sure it's possible to implement it in plpgsql or plperl, but I wanted to check first if anyone has already made such a function, to hopefully save some time? Thanks a lot! [1] https://github.com/gluefinance/fsnapshot -- Best regards, Joel Jacobson Glue Finance -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
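As a starting point for a plpgsql or plperl port, here is a minimal sketch of the standard topological sort (Kahn's algorithm, essentially the Algorithm T that pg_dump_sort.c implements) over pg_depend-style (objid, refobjid) edges. Python is used only for brevity; the function name and edge representation are my own, and the loop-breaking magic of pg_dump_sort.c is deliberately left out. Translating the loop into plpgsql with an oid[] accumulator should be mechanical:

```python
from collections import defaultdict, deque

def topo_sort(oids, depends_on):
    """Kahn's algorithm.  depends_on is a list of (objid, refobjid)
    pairs, as in pg_depend: objid depends on refobjid.  Returns the
    oids in creatable order: every object appears after the objects
    it needs.  Raises on dependency loops instead of breaking them."""
    indegree = {oid: 0 for oid in oids}
    dependents = defaultdict(list)
    for objid, refobjid in depends_on:
        indegree[objid] += 1
        dependents[refobjid].append(objid)
    # Objects that depend on nothing can be created first.
    queue = deque(oid for oid in oids if indegree[oid] == 0)
    order = []
    while queue:
        oid = queue.popleft()
        order.append(oid)
        for dep in dependents[oid]:
            indegree[dep] -= 1
            if indegree[dep] == 0:
                queue.append(dep)
    if len(order) != len(oids):
        raise ValueError("dependency loop among remaining oids")
    return order
```

Reversing the result gives a droppable order; handling loops the way pg_dump_sort.c does (finding and breaking them) is the part that still needs real work.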
Re: [HACKERS] Extension upgrade, patch v0: debug help needed
Robert Haas robertmh...@gmail.com writes: Committed. Thanks! -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Upgrading Extension, version numbers
David E. Wheeler da...@kineticode.com writes: Just so long as you're aware that you might get more challenges on this going forward. Sure, thanks for the reminder. That said I also remember the reaction when I used to scan the SHARE/contrib directory to find the extension control file having the right name property, and I don't see scanning the same directory in order to find out which upgrade file to consider depending on several parts of its name as so different. Current code allows you to use the same upgrade script for more than one source version, and does so in a way that it's easy to determine which upgrade file to seek for. I don't see the benefit of having the 'relocatable' property optional in the control file, but I see a huge drawback. Requiring it will force extension authors to at least have a glance at the docs to understand how to set it. It's important not to overlook it. I guess. I'll have to think about how to support it in PGXN, though. And the upgrade keys if they stay in. Disclaimer: the following is based on my understanding of how you want to bundle things, from several discussions we had together at pubs or on IRC, so please don't read further if you've changed your mind about generating the control file from your PGXN YAML specification. Well, I think you're having a dependency inversion problem here. PGXN depends on extensions, not the other way round. Also, I really expect the extension facility to be mainly used for internal proprietary code, mainly procedure collections, and only occasionally for publishing Open Source components. So you should be considering the control file as an input to your processes, a source file, not something that your service will hide from extension authors: there's no benefit that I can see in doing so.
Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] regclass without error?
A long time ago, I proposed a regclass-like function which does not throw an error if the table is not found. Instead I want to let it return InvalidOid or NULL. Tatsuo Ishii is...@postgresql.org writes: Is there any way to use regclass without having ERROR? pgpool-II needs to find the oid from the table name, and for that purpose it issues something like SELECT 'table_name'::regclass::oid. Problem is, if the table does not exist, an error occurs and the transaction aborts. Ideally, if the table does not exist, the SELECT returns 0 (InvalidOid). I don't think the cast should act that way, but I could see providing a separate conversion function that returns 0 ... or perhaps better NULL ... if no match. Such a function would be very helpful. Great! I made a pretty simple function for this. The essential part is something like this: Datum pgpool_regclass(PG_FUNCTION_ARGS) { char *pro_name_or_oid = PG_GETARG_CSTRING(0); Oid result; PG_TRY(); { result = DirectFunctionCall1(regclassin, CStringGetDatum(pro_name_or_oid)); } PG_CATCH(); { result = InvalidOid; } PG_END_TRY(); PG_RETURN_OID(result); } IMO this implementation is the least invasive but not so elegant. Before proposing more complete patches, I would like to hear comments: which way should I go? The least invasive one like the above? Or should I refactor regclassin, for example implementing a regclassin_guts which does the essential job, and making wrapper functions, one being the existing regclass, and the other acting as the new one? -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
Re: [HACKERS] back branches vs. VS 2008
On Tue, Jan 4, 2011 at 04:49, Andrew Dunstan and...@dunslane.net wrote: On 01/03/2011 12:15 PM, I wrote: The following patch allows me to build the 8.3 and 8.4 branches using Visual Studio 2008, once the build system is patched. But I don't really know why. HEAD and 9.0 build fine without it. But those branches fail with a complaint about IPPROTO_IPV6 being undefined. The patch seems harmless enough. But I'd like to know why it's happening. Does anyone have a clue? -#ifdef IPV6_V6ONLY +#if defined(IPV6_V6ONLY) && defined(IPPROTO_IPV6) if (addr->ai_family == AF_INET6) { if (setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, OK, what's going on here is that, in the newer SDK, IPV6_V6ONLY is defined unconditionally, but IPPROTO_IPV6 is only defined if _WIN32_WINNT is set to 0x0501 or higher. We defined _WIN32_WINNT as 0x0500 until 9.0, when we changed it specifically to allow use of the right IPV6 settings. I wonder if anything else changed with that #define, though. This seems to me like a clear error in the MS headers. I don't think it makes any sense to define the settings constant but not the context constant. The fix I have suggested above doesn't seem unreasonable or terribly unsafe in these circumstances. The code clearly contemplates the setsockopt() call in question not having been run, as shown in this comment: Yeah, it seems reasonable - I assume you tested it and it doesn't fail in some *different* way than the one we expect in the code? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Re: new patch of MERGE (merge_204) a question about duplicated ctid
Heikki Linnakangas wrote: You can of course LOCK TABLE as a work-around, if that's what you want. What I was trying to suggest upthread is that while there are other possible ways around this problem, the only one that has any hope of shipping with 9.1 is to do just that. So from my perspective, the rest of the discussion about the right way to proceed is moot for now. For some reason it didn't hit me until you said this that I could do the locking manually in my test case, without even touching the server-side code yet. Attached are a new pair of scripts where each pgbench UPDATE statement executes an explicit LOCK TABLE. Here's the result of a sample run here: $ pgbench -f update-merge.sql -T 60 -c 16 -j 4 -s 2 pgbench starting vacuum...end. transaction type: Custom query scaling factor: 2 query mode: simple number of clients: 16 number of threads: 4 duration: 60 s number of transactions actually processed: 84375 tps = 1405.953672 (including connections establishing) tps = 1406.137456 (excluding connections establishing) $ psql -c 'select count(*) as updated FROM pgbench_accounts WHERE NOT abalance=0' -d pgbench updated - 68897 (1 row) $ psql -c 'select count(*) as inserted FROM pgbench_accounts WHERE aid > 10' -d pgbench inserted -- 34497 (1 row) No assertion crashes, no duplicate key failures. All the weird stuff I was running into is gone, so that's decent evidence the worst of the problems were all because the heavy lock I was expecting just wasn't integrated into the patch. Congratulations to Boxuan: for the first time this is starting to act like a viable feature addition to me, just one with a moderately long list of limitations and performance issues. 1400 TPS worth of UPSERT on my modest 8-core desktop (single drive with cheating fsync) isn't uselessly slow. If I add SET TRANSACTION ISOLATION LEVEL SERIALIZABLE; just after the BEGIN;, I don't see any serialization errors, and performance is exactly the same.
Run a straight UPDATE over only the existing range of keys, and I get 7000 TPS instead. So the locking etc. is reducing performance to 20% of its normal rate, on this assertion+debug build. I can run this tomorrow (err, later today I guess looking at the time) on a proper system with BBWC and without assertions to see if the magnitude of the difference changes, but I don't think that's the main issue here. Presuming the code quality issues and other little quirks I've documented (and new ones yet to be discovered) can get resolved here, and that's a sizeable open question, I could see shipping this with the automatic heavy LOCK TABLE in there. Then simple UPSERT could work out of the box via a straightforward MERGE. We'd need a big warning disclaiming that concurrent performance is very limited in this first release of the feature, but I don't know that this is at an unacceptable level of slow for smaller web apps and such. Until proper fine-grained concurrency is implemented, I think it would be PR suicide to release a version of this without a full table lock happening automatically though. The idea Robert advocated, that it would be possible for advanced users to use even this rough feature in a smarter way to avoid conflicts and not suffer the full performance penalty, is true. But if you consider the main purpose here to be making it easier to get smaller MySQL apps and the like ported to PostgreSQL (which is what I see as goal #1), putting that burden on the user is just going to reinforce the old "PostgreSQL is so much harder than MySQL" stereotype. I'd much prefer to see everyone have a slow but simple-to-use UPSERT via MERGE available initially, rather than to worry about optimizing for the advanced user in a way that makes life harder for the newbies. The sort of people who must have optimal performance already have trigger functions available to them, that they can write and tweak for best performance.
-- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services and Supportwww.2ndQuadrant.us PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books test-merge.sh Description: Bourne shell script \set nbranches :scale \set ntellers 10 * :scale \set naccounts 10 * :scale \setrandom aid 1 :naccounts \setrandom bid 1 :nbranches \setrandom tid 1 :ntellers \setrandom delta -5000 5000 BEGIN; -- Optional mode change -- SET TRANSACTION ISOLATION LEVEL SERIALIZABLE; LOCK TABLE pgbench_accounts; MERGE INTO pgbench_accounts t USING (SELECT :aid,1+(:aid / 100)::integer,:delta,'') AS s(aid,bid,balance,filler) ON s.aid=t.aid WHEN MATCHED THEN UPDATE SET abalance=abalance + s.balance WHEN NOT MATCHED THEN INSERT VALUES(s.aid,s.bid,s.balance,s.filler); COMMIT; -- This syntax worked with MERGE v203 patch, but isn't compatible with v204 --MERGE INTO pgbench_accounts t USING (VALUES (:aid,1+(:aid / 100)::integer,:delta,'')) AS
Re: [HACKERS] pg_dump --split patch
On 28.12.2010 22:44, Joel Jacobson wrote: Sent from my iPhone On 28 dec 2010, at 21:45, Gurjeet Singh singh.gurj...@gmail.com wrote: The problem I see with suffixing a sequence id to the objects with name collision is that one day the dump may name myfunc(int) as myfunc.sql and after an overloaded version is created, say myfunc(char, int), then the same myfunc(int) may be dumped in myfunc-2.sql, which again is non-deterministic. I agree, good point! Perhaps abbreviations are preferable, e.g., myfunc_i, myfunc_i_c, etc., to reduce the need to truncate filenames. Also, it is a project policy that we do not introduce new features in back branches, so spending time on an 8.4.6 patch may not be the best use of your time. My company is using 8.4 and needs this feature, so I'll have to patch it anyway :) Start the easy way, by writing a (python|perl) filter for pg_dump -s output. Once this is done, convert it into a patch for pg_dump. Hannu Krosing http://www.2ndQuadrant.com/books/
Re: [HACKERS] pg_dump --split patch
On 28.12.2010 23:51, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: On 12/28/2010 04:44 PM, Joel Jacobson wrote: Perhaps abbreviations are preferable, e.g., myfunc_i, myfunc_i_c, etc., to reduce the need to truncate filenames. I think that's just horrible. Does the i stand for integer or inet? And it will get *really* ugly for type names with spaces in them ... You think spaces are bad, try slashes ;-) Not to mention the need for including schemas in typenames sometimes. I think you're going to have a real problem trying to fully describe a function's signature in a file name of reasonable max length. Something like funcname_number-of-arguments_hash_of_argument_type_list.sql seems like a reasonable compromise - you can find the function you are looking for without too much searching, even when overloaded, and the uniqueness is still guaranteed. Hannu Krosing http://www.2ndQuadrant.com/books/
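Hannu's funcname_number-of-arguments_hash_of_argument_type_list.sql scheme could be sketched like this (the helper name, hash length, and separator here are my own arbitrary choices, not anything from a patch):

```python
import hashlib

def split_filename(funcname, argtypes):
    """Build funcname_nargs_hash.sql: unique per signature while
    staying greppable by function name.  argtypes is the list of
    full (schema-qualified, if needed) type names."""
    digest = hashlib.md5(",".join(argtypes).encode()).hexdigest()[:8]
    return "%s_%d_%s.sql" % (funcname, len(argtypes), digest)
```

Two overloads of the same function then land in distinct, stable filenames regardless of the order in which they were created, which addresses the myfunc.sql vs. myfunc-2.sql non-determinism complained about upthread.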
Re: [HACKERS] pg_dump --split patch
On 28.12.2010 17:00, Joel Jacobson wrote: Dear fellow hackers, Problem: A normal diff of two slightly different schema dump files (pg_dump -s) will not produce a user-friendly diff, as you get all changes in the same file. Another Solution: I have used a python script for splitting pg_dump -s output into a directory structure for years. A structure that seems to work well is 1. database_name.load.sql - file containing \i for all other files 2. tree of schema1\ +- objtype1\ |+ obj_of_type1_name1.sql |+ ... +-objtype2\ ... That is, first have one directory per schema on the top level, then one directory for each type of object inside it, and one file per object. Some things which may have non-unique names, like functions, need extra uniquefication, like adding an underscore (or colon) separated list of argument types at the end. Most times you want to leave out the comment lines with OIDs so that you can diff the files against another version. -- Hannu Krosing http://www.2ndQuadrant.com/books/
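A rough sketch of such a splitter, matching the directory layout described above (the header regex and function name are my own; real pg_dump comment headers carry more fields, and overloaded functions still need the extra uniquefication mentioned):

```python
import os
import re

def split_schema_dump(dump_text, outdir):
    """Naive splitter for pg_dump -s output: a new file starts at each
    '-- Name: ...; Type: ...; Schema: ...' comment, and each object is
    written to outdir/<schema>/<type>/<name>.sql."""
    header = re.compile(r'-- Name: ([^;]+); Type: ([^;]+); Schema: ([^;]+)')
    files = {}
    path = None
    for line in dump_text.splitlines(keepends=True):
        m = header.search(line)
        if m:
            name = m.group(1).split('(')[0].strip()   # drop function args
            objtype = m.group(2).strip().lower().replace(' ', '_')
            schema = m.group(3).strip()
            path = os.path.join(outdir, schema, objtype, name + '.sql')
        if path is not None:
            files.setdefault(path, []).append(line)
    for p, lines in files.items():
        os.makedirs(os.path.dirname(p), exist_ok=True)
        with open(p, 'w') as f:
            f.writelines(lines)
    return sorted(files)
```

Everything before the first header comment is simply dropped here; a production version would also want to filter out OID-bearing comment lines so two dump trees diff cleanly.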
Re: [HACKERS] Re: new patch of MERGE (merge_204) a question about duplicated ctid
Robert Haas wrote: And even if it isn't, the MERGE syntax is insane if what you really want to do is insert or update ONE record. If all we have is MERGE, people will keep doing it with a PL/pgsql stored procedure or some crummy application logic just so that they don't have to spend several days trying to understand the syntax. Heck, I understand the syntax (or I think I do) and I still think it's more trouble than it's worth. I hoped that the manual would have a clear example of "this is how you do UPSERT with MERGE", preferably cross-linked to the existing Example 39-2 (Exceptions with UPDATE/INSERT), the trigger implementation that's been the reference implementation for this for a long time, so people can see both alternatives. New users will cut and paste that example into their code, and in the beginning neither know nor care how MERGE actually works, so long as the example does what it claims. I would wager the majority of PL/pgsql implementations of this requirement start the exact same way. I don't think the learning curve there is really smaller, it's just that you've already been through it. I've been purposefully ignoring the larger applications of MERGE in the interest of keeping focus on a manageable subset. But the more general feature set is in fact enormously useful for some types of data warehouse applications. Build REPLACE, and you've built REPLACE. Build MERGE that is REPLACE now and eventually full high-performance MERGE, and you've done something with a much brighter future. I don't think the concurrency hurdles here are unique to this feature either, as shown by the regular overlap noted with the other serialization work.
-- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books
Re: [HACKERS] SSPI client authentication in non-Windows builds
* Robert Haas wrote: On Mon, Jan 3, 2011 at 8:11 AM, Christian Ullrich ch...@chrullrich.net wrote: this patch adds support for connecting to servers running on Windows and requesting SSPI authentication. It does this by treating AUTH_REQ_SSPI the same as AUTH_REQ_GSS if no native SSPI support is available. I have to confess that I don't know whether this is a good idea or a bad idea. Both GSS and SSPI have advantages and disadvantages. To use SSPI, your backends must run as a dedicated domain account, so if you use the binary installer, you have to change permissions on the data directory and reconfigure the service. On the other hand, you do not need a keytab. To use GSS, you need a keytab, but not a domain user account, and thus no domain at all. Earlier, I had the concern that using client-side GSSAPI (or the Kerberos SSPI package) to connect to a server using SSPI Negotiate (as the backend currently does) was a violation of the published protocol, but that the Negotiate SSP handled this by falling back to Kerberos. I would have been reluctant to rely on this behavior, which I thought undocumented. However, I just found the documentation that says this is all right: http://msdn.microsoft.com/en-us/library/aa378748(v=VS.85).aspx "A server that uses the Negotiate package is able to respond to client applications that specifically select either the Kerberos or NTLM security provider." This covers the case where the client is running Windows, because then libpq will actually use SSPI instead of GSSAPI, satisfying the letter of the documentation. By implication, because SSPI Kerberos is wire-level compatible with GSSAPI, it also covers the case where the client is running on another platform and uses native GSSAPI libraries to authenticate. If it were not for supporting NTLM through SSPI, it would even be possible to simply get rid of AUTH_REQ_SSPI entirely and let the server send AUTH_REQ_GSS for sspi lines in pg_hba.conf.
By doing this, no patches to the client libraries are needed, because both libpq and the JDBC driver will automatically do the right thing on all platforms. -- Christian Ullrich
[HACKERS] ALTER EXTENSION UPGRADE patch v1
Hi, Please find attached the patch to support ALTER EXTENSION UPGRADE, following exchanges made on this list — we can't really speak about decisions here, apparently, until commit is done :) The documentation is available online for easy browsing here: http://pgsql.tapoueh.org/extensions/doc/html/extend-extension.html The things we might want to discuss are, AFAIUI: - support for upgrading from pre-9.1 extensions, or to get from a bunch of related pl code to a proper extension in your database I've done the code for that to work; the idea is pretty simple and relies on the following points. Please note that the attached patch only implements support for upgrading some contrib modules from pre-9.1; that's because I'm willing to get some approval before continuing the effort — I foresee no roadblocks here. - CREATE WRAPPER EXTENSION If you want to upgrade an extension, it first has to exist as an extension. When you want to switch to an extension, you need a way to bootstrap your extension object, because you want an entry in the catalogs and you don't want to run the script: it has already been run before you decided this would make for an extension. This command allows you to register an extension in the catalogs, bypassing any script and forcing the version to NULL. I'm not sold on the 'WRAPPER' keyword here, I just used an existing one in the grammar that looked like a good candidate. This was the best I could find, but IANANS (I am not a native speaker). - ALTER EXTENSION ... UPGRADE This allows you to upgrade an existing extension, wrapper included. The extension author needs to prepare the upgrade scripts, and to set up in the control file which to use. The new extension's version is static, as the upgrade files you distribute depend on this (think packaging here), and so you pick the upgrade script file to execute depending only on the currently installed version.
This picking is done by regular expression matching (using the code backing the ~(text, text) operator), and supports null as a special case. The first match wins; other candidates are bypassed. - CREATE EXTENSION ... WITH SYSID ... Thinking about upgrades, I added support for that too. This might be necessary for pg_upgrade, but I'm not so sure. It was an experiment, but I thought we might as well talk about it before I remove it from the patch :) - ALTER OBJECT ... SET EXTENSION ... This implements the magic behind upgrading from null, as it allows users to add objects to an existing extension. Internally, this will only record an INTERNAL dependency between the object and the extension. This is mainly for use in upgrade scripts when you support pre-9.1 extension upgrades, or for internal PL code that grows into an extension after the fact. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support upgrade_extension.v1.patch.gz Description: ALTER EXTENSION UPGRADE, patch version 1
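The script-picking rule described above (ordered candidates from the control file, regex matching against the installed version, null special-cased, first match wins) could be sketched like this; the function name and the (pattern, script) pair representation are illustrative only:

```python
import re

def pick_upgrade_script(installed_version, candidates):
    """candidates is an ordered list of (version_pattern, script)
    pairs.  A pattern of None matches only a NULL installed version
    (i.e. a wrapper extension); otherwise the installed version is
    matched against the pattern, as the ~(text, text) operator would.
    The first match wins; later candidates are bypassed."""
    for pattern, script in candidates:
        if pattern is None:
            if installed_version is None:
                return script
        elif installed_version is not None and re.search(pattern, installed_version):
            return script
    return None  # no applicable upgrade script
```

Because matching stops at the first hit, the ordering of candidates in the control file is significant: a catch-all pattern like `.*` only fires when nothing more specific matched before it.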
Re: [HACKERS] regclass without error?
On 04.01.2011 11:07, Tatsuo Ishii wrote: Datum pgpool_regclass(PG_FUNCTION_ARGS) { char *pro_name_or_oid = PG_GETARG_CSTRING(0); Oid result; PG_TRY(); { result = DirectFunctionCall1(regclassin, CStringGetDatum(pro_name_or_oid)); } PG_CATCH(); { result = InvalidOid; } PG_END_TRY(); PG_RETURN_OID(result); } IMO this implementation is the least invasive but not so elegant. It's not generally safe to suppress errors like that. You could leak locks or tuple descriptors etc. And if the error is not "no such table", but e.g. out of memory, you don't want to suppress it anyway. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] keeping a timestamp of the last stats reset (for a db, table and function)
t...@fuzzy.cz wrote: - I really am not sure about the changes made in pg_proc.h. I'm not sure how to assign OIDs to the new functions (I've simply chosen values that were not used in this file), and I'm not sure about the other columns (I've copied and modified another function with the same parameter/return types) The description of the columns is at the beginning of pg_proc.h, the part that begins with CATALOG(pg_proc,1255)... The descriptions of the first 11 fields are mostly straightforward. The first fun part is that how many times the information expected in the second VARIABLE LENGTH FIELDS section repeats varies based on the parameters listed. The other thing that's usually confusing is that the types for the values are all expressed as type OID numbers. For example, if you see 25, that's the OID of the text type. You can see the whole list with: select oid,typname from pg_type; And if you go back to the file with that list in hand, a lot more of it should make sense. If you see a multiple parameter type list like 23 21, that's a function whose input values are of types (int4,int2). As for getting a new OID, if you go into src/include/catalog/ and run the unused_oids script, it will give you some help in figuring out which have been used and which not. It's not worth getting too stressed about the number you choose in the patch submission, because commits between when you got your free number and when your patch is considered for commit can make your choice worthless anyway. There's a process referred to as a catversion bump, where the database catalog version gets updated to reflect things like new pg_proc information, that committers take care of as one of the last adjustments before final commit. Doing a final correction to the OID choice is a part every committer knows to look at.
I wrote a talk that covered some additional trivia in this area, as well as other things people tend to get confused about in the source code, that you can find at http://www.pgcon.org/2010/schedule/attachments/142_HackingWithUDFs.pdf ; that might be helpful for some other things you might wonder about eventually. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books
Re: [HACKERS] regclass without error?
It's not generally safe to suppress errors like that. You could leak locks or tuple descriptors etc. And if the error is not "no such table", but e.g. out of memory, you don't want to suppress it anyway. Thanks. I will create a more invasive patch. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
Re: [HACKERS] keeping a timestamp of the last stats reset (for a db, table and function)
Tomas Vondra wrote: OK, so here goes the simplified patch - it tracks one reset timestamp for the background writer and for each database. Adding timestamps like this was something I wanted to do after adding pg_stat_reset_shared to 9.0, so since you've beaten me to it I'll review your patch and make sure it all works the way I was hoping instead. The whole per-table idea was never going to fly given how few people touch this area at all, but the way you're tracking things now seems reasonable. When you post an updated version of a patch that's already being tracked in the CommitFest app, please try to remember to add that update to the tracker there. I just did that for this 12/23 update, so that's covered already. Next problem is that the preferred method for submitted patches uses context diffs. See http://wiki.postgresql.org/wiki/Working_with_Git for some information about the somewhat annoying way you have to set up git to generate those. Don't worry about that for this round though. I don't care about the diff formatting given the code involved, but it's something you should sort out if you do another update. PS: I've noticed Magnus posted a patch to track recovery conflicts, adding a new view pg_stat_database_conflicts. I have not checked it yet but it should not influence this patch. I need to do some testing of that anyway, so I'll take a look at any potential clash as part of my review. I want to check how this interacts with track_functions resets too. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books
Re: [HACKERS] pg_dump --split patch
Joel Jacobson wrote: To understand a change to my database functions, I would start by looking at the top level, only focusing on the names of the functions modified/added/removed. At this stage, you want as little information as possible about each change, such as only the names of the functions. To do this, i.e. to get a list of changed functions, you cannot compare two full schema plain text dumps using diff, as it would only reveal the lines changed, not the names of the functions, unless you are lucky enough to get the name of the function within the (by default) 3 lines of copied context. While you could increase the number of copied lines of context to a value which would ensure you would see the name of the function in the diff, that is not feasible if you want to quickly get a picture of the code areas modified, since you would then need to read through even more lines of diff output. I can agree on some use cases you've outlined, where there's merit to the general idea of your patch. But as an aside, you really should launch an investigation into some better diff tools if this is how you're doing this type of work. Last week I reviewed 3K lines worth of changes from two versions of a 12K line schema dump I'd never seen before in a couple of hours using kdiff3. I'd have killed myself before finishing if I had to do the same job with traditional diff as you're describing it here. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books
Re: [HACKERS] texteq/byteaeq: avoid detoast
On Mon, Jan 03, 2011 at 10:23:03PM -0500, Robert Haas wrote: Can you add this to the currently-open CommitFest, so we don't lose track of it? https://commitfest.postgresql.org/action/commitfest_view/open Done.
Re: [HACKERS] WIP: Range Types
2011/1/4 Jeff Davis pg...@j-davis.com: I have been updating my work in progress here: http://git.postgresql.org/gitweb?p=users/jdavis/postgres.git;a=log;h=refs/heads/rangetypes Right now, it's not in a reviewable state, but those interested can glance through the code. Quick synopsis (for illustration purposes only; don't expect much from the current code): CREATE TYPE numrange AS RANGE (SUBTYPE=numeric, SUBTYPE_CMP=numeric_cmp); I am interested in how you define increment/decrement operations on range values for discrete types. Window functions with PARTITION also want to represent RANGE, but there's no clear solution. Sorry if it's already been discussed, since I didn't track the threads. Regards -- Hitoshi Harada
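On the increment/decrement question: one common answer for discrete subtypes is not to expose increment/decrement at all, but to canonicalize bounds using the subtype's step, so that every range has a single normal form. A hedged sketch (my own function name, an integer-like step assumed, ranges as plain bound pairs):

```python
def canonicalize(lower, upper, lower_inc, upper_inc, step=1):
    """Normalize a discrete range to the half-open convention
    [lower, upper): an exclusive lower bound moves up by the
    subtype's step, and an inclusive upper bound likewise.
    E.g. (1,4] over integers becomes [2,5).  Continuous subtypes
    like numeric have no step and thus no such canonical form."""
    if not lower_inc:
        lower += step
    if upper_inc:
        upper += step
    return (lower, upper)
```

With a canonical form like this, equality and overlap tests can compare bounds directly; whether the subtype is discrete then reduces to whether a step (or canonicalization function) was supplied when the range type was created.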
Re: [HACKERS] texteq/byteaeq: avoid detoast
Hello, I looked at the patch. Does toast_raw_datum_size work correctly on packed varlena? regards Pavel Stehule 2011/1/4 Noah Misch n...@leadboat.com: On Mon, Jan 03, 2011 at 10:23:03PM -0500, Robert Haas wrote: Can you add this to the currently-open CommitFest, so we don't lose track of it? https://commitfest.postgresql.org/action/commitfest_view/open Done. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Range Types
* Jeff Davis: 4. For the GiST penalty function, and perhaps some picksplit algorithms, it might be nice to know the length of a range, or do some other kinds of math. It introduces a lot of complexity to try to define math functions for each subtype, and try to make sure they behave sanely. So I was thinking that the user might need to specify a function that converts the subtype into a float that approximates a value's position in the total order. Doesn't the eqsel hint already provide this information? -- Florian Weimerfwei...@bfk.de BFK edv-consulting GmbH http://www.bfk.de/ Kriegsstraße 100 tel: +49-721-96201-1 D-76133 Karlsruhe fax: +49-721-96201-99 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] texteq/byteaeq: avoid detoast
Hi Pavel, On Tue, Jan 04, 2011 at 03:13:11PM +0100, Pavel Stehule wrote: I looked at the patch Thanks. Does toast_raw_datum_size work correctly on packed varlena? Yes, as best I can tell. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] back branches vs. VS 2008
On 01/04/2011 04:43 AM, Magnus Hagander wrote: OK, what's going on here is that, in the newer SDK, IPV6_V6ONLY is defined unconditionally, but IPPROTO_IPV6 is only defined if _WIN32_WINNT is set to 0x0501 or higher. We defined _WIN32_WINNT as 0x0500 until 9.0, when we changed it specifically to allow use of the right IPV6 settings. I wonder if anything else changed with that #define, though. Probably. I'm not going to suggest turning it on at this stage. There are lots of references to this specific OS level in the headers. This seems to me like a clear error in the MS headers. I don't think it makes any sense to define the settings constant but not the context constant. The fix I have suggested above doesn't seem unreasonable or terribly unsafe in these circumstances. The code clearly contemplates the setsockopt() call in question not having been run, as shown in this comment: Yeah, it seems reasonable - I assume you tested it and it doesn't fail in some *different* way than the one we expect in the code? Yes, I enabled IPV6 and set listen_addresses to * and no untoward events appeared. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] regclass without error?
Tatsuo Ishii is...@postgresql.org writes: It's not generally safe to suppress errors like that. You could leak locks or tuple descriptors etc. And if the error is not no such table but e.g. out of memory, you don't want to suppress it anyway. Thanks. I will create a more invasive patch. Why is any of this necessary? It sure looks like you are solving a problem at the wrong level. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Vacuum of newly activated 8.3.12 standby receives warnings page xxx is uninitialized --- fixing
On Fri, 2010-12-31 at 11:11 +1300, Mark Kirkwood wrote: I note that this uninitialized-pages-with-standbys issue has cropped up from time to time - I wonder if in most/all of the cases folk were using Pitrtools? I deployed Pitrtools a lot when I was working for CMD, and I haven't seen any issues with that. It is just a wrapper, nothing else... Regards, -- Devrim GÜNDÜZ PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer PostgreSQL RPM Repository: http://yum.pgrpms.org Community: devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr http://www.gunduz.org Twitter: http://twitter.com/devrimgunduz
Re: [HACKERS] regclass without error?
Why is any of this necessary? It sure looks like you are solving a problem at the wrong level. Please read upthread. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] regclass without error?
Tatsuo Ishii is...@postgresql.org writes: Why is any of this necessary? It sure looks like you are solving a problem at the wrong level. Please read upthread. You haven't made any argument why this shouldn't be solvable at the client side, or at worst with a plpgsql DO block; either of which answer would have the considerable merit of working against existing server releases. But in any case I see no reason to mess with the regclass code. The C code you want is just names = stringToQualifiedNameList(class_name); result = RangeVarGetRelid(makeRangeVarFromNameList(names), true); and there is no way that refactoring is going to yield a solution more elegant than just duplicating those two lines --- especially since regclassin has other cases for which zero is a non-error result. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: new patch of MERGE (merge_204) a question about duplicated ctid
On Tue, Jan 04, 2011 at 04:44:32AM -0500, Greg Smith wrote: Heikki Linnakangas wrote: You can of course LOCK TABLE as a work-around, if that's what you want. Presuming the code quality issues and other little quirks I've documented (and new ones yet to be discovered) can get resolved here, and that's a sizeable open question, I could see shipping this with the automatic heavy LOCK TABLE in there. Then simple UPSERT could work out of the box via a straightforward MERGE. How about implementing an UPSERT command as take the lock, do the merge? That way, we'd have both the simplicity for the simpler cases and a way to relax consistency guarantees for those who would like to do so. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Range Types
Jeff Davis pg...@j-davis.com writes: 2. We need to use the subtype's IO functions, but those may not be immutable. So, rather than create new IO functions for each range type, I was thinking that I'd use just three (anyrange_i_in, anyrange_s_in, and anyrange_v_in), and select the right one at definition time, based on the subtype's IO functions' volatility. That seems like a bit of a hack -- any better ideas? You should just do what we do for arrays and records, ie, mark the I/O functions stable. There is no reason for anyrange to have a more complicated approach to this than the existing composite-type structures do. See discussion thread here http://archives.postgresql.org/pgsql-hackers/2010-07/msg00932.php and commit here http://archives.postgresql.org/pgsql-committers/2010-07/msg00307.php 3. Right now I allow user-defined parse/deparse functions to be specified. In almost all cases, I would think that we want the text format to be something like: [ 2010-01-01, 2011-01-01 ) where the brackets denote inclusivity, and the left and right sides can be optionally double-quoted. Is it even worth having these parse/deparse functions, or should we just force the obvious format? +1 for forcing a single consistent format. I compare this to the Berkeley-era decision to let types specify nondefault array delimiters --- that was flexibility that didn't help anybody, just resulted in over-complicated code (or code that would fall over if someone tried to actually use a delimiter other than comma...) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
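To make the single consistent format being argued for concrete, here is a rough Python sketch of a parser for range literals like [ 2010-01-01, 2011-01-01 ), where brackets denote inclusive bounds and parentheses exclusive ones. The function name is hypothetical, and the sketch deliberately ignores commas inside quoted bounds, infinite/missing bounds, and escaping:

```python
def parse_range_literal(text):
    """Split a literal like '[ 2010-01-01, 2011-01-01 )' into
    (lower, upper, lower_inclusive, upper_inclusive).

    Naive sketch: does not handle commas inside quoted bounds,
    empty or infinite bounds, or escape sequences.
    """
    text = text.strip()
    if text[0] not in '[(' or text[-1] not in ')]':
        raise ValueError("range literal must look like [a,b), (a,b], etc.")
    lower, upper = text[1:-1].split(',', 1)
    # Bound values may optionally be double-quoted.
    return (lower.strip().strip('"'),
            upper.strip().strip('"'),
            text[0] == '[',     # '[' means the lower bound is inclusive
            text[-1] == ']')    # ']' means the upper bound is inclusive
```

The point of forcing one format is precisely that this little parser is all anyone would ever need, rather than per-type parse/deparse hooks.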
Re: [HACKERS] Implementing pg_dump_sort.c topological sorting in sql/plpgsql/plperl?
Joel Jacobson j...@gluefinance.com writes: It's not possible to use a plain recursive query to do the trick (due to 'i' bidirectional dependencies and dependency loops). Well, I came up with this while working on some fun extension-related dependency problems; I guess it could help you:

~:5490=# WITH RECURSIVE depends AS (
    select 16385 as nsp, objid, refobjid, array[refobjid] as deps
      from pg_depend
     where refobjid = 16854 and deptype != 'p'
    UNION ALL
    select p.nsp, p.objid, d.refobjid, deps || d.refobjid
      from pg_depend d JOIN depends p ON d.objid = p.objid
     where d.deptype != 'p' and not d.refobjid = any(deps)
)
select * from depends;
  nsp  | objid | refobjid |        deps
-------+-------+----------+---------------------
 16385 | 16851 |    16854 | {16854}
 16385 | 16852 |    16854 | {16854}
 16385 | 16853 |    16854 | {16854}
 16385 | 16851 |     2200 | {16854,2200}
 16385 | 16852 |     2200 | {16854,2200}
 16385 | 16852 |    16851 | {16854,16851}
 16385 | 16853 |     2200 | {16854,2200}
 16385 | 16852 |     2200 | {16854,16851,2200}
 16385 | 16852 |    16851 | {16854,2200,16851}
(9 rows)

Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Implementing pg_dump_sort.c topological sorting in sql/plpgsql/plperl?
On Tue, Jan 04, 2011 at 09:29:55AM +0100, Joel Jacobson wrote: Hi hackers, The project I'm currently working with fsnapshot[1], is written in plain plpgsql and I need to sort all the oids in their creatable/droppable order. This has already been properly implemented in pg_dump_sort.c using Knuth's algorithm for topological sorting, with some special magic to find and break dependency loops. It's not possible to use a plain recursive query to do the trick (due to 'i' bidirectional dependencies and dependency loops). I believe it is possible. I'll try to do it this evening. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
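For readers following along, here is a rough Python sketch of the kind of topological sort pg_dump_sort.c performs over pg_depend edges (this uses Kahn's formulation of the classic algorithm; pg_dump_sort.c follows Knuth). The `creation_order` name and the edge representation are inventions of this sketch, and the special loop-breaking magic pg_dump_sort.c applies when a cycle remains is deliberately left out:

```python
from collections import defaultdict, deque

def creation_order(nodes, depends_on):
    """Return nodes in a valid creation order.

    depends_on is a list of (objid, refobjid) pairs meaning objid
    depends on refobjid, so refobjid must be created first.
    """
    dependents = defaultdict(list)    # refobjid -> objids depending on it
    indegree = {n: 0 for n in nodes}  # number of unmet dependencies
    for obj, ref in depends_on:
        dependents[ref].append(obj)
        indegree[obj] += 1

    ready = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for dep in dependents[n]:     # n is created; release its dependents
            indegree[dep] -= 1
            if indegree[dep] == 0:
                ready.append(dep)

    if len(order) != len(nodes):
        # A dependency loop remains; pg_dump_sort.c breaks such loops
        # using knowledge of the object types involved, which this
        # sketch does not attempt.
        raise ValueError("dependency loop among remaining objects")
    return order
```

Feeding it the oids and edges from the recursive-query example elsewhere in this thread yields a valid creation order (every referenced object before its dependents); reversing the result gives a droppable order.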
Re: [HACKERS] ALTER EXTENSION UPGRADE patch v1
On Tue, Jan 04, 2011 at 12:31:55PM +0100, Dimitri Fontaine wrote: Hi, Please find attached the patch to support ALTER EXTENSION UPGRADE, Do you plan to have ALTER EXTENSION ... UPGRADE TO VERSION ... , or the more general, ALTER EXTENSION ... ALTER VERSION TO ... ? I get that this might not be a 9.1 feature, but it's sure to be one people who need to deploy software want. A lot. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: new patch of MERGE (merge_204) a question about duplicated ctid
On 2011-01-04 6:27 PM, David Fetter wrote: On Tue, Jan 04, 2011 at 04:44:32AM -0500, Greg Smith wrote: Heikki Linnakangas wrote: You can of course LOCK TABLE as a work-around, if that's what you want. Presuming the code quality issues and other little quirks I've documented (and new ones yet to be discovered) can get resolved here, and that's a sizeable open question, I could see shipping this with the automatic heavy LOCK TABLE in there. Then simple UPSERT could work out of the box via a straightforward MERGE. How about implementing an UPSERT command as take the lock, do the merge? That way, we'd have both the simplicity for the simpler cases and a way to relax consistency guarantees for those who would like to do so. That, unfortunately, won't work so well in REPEATABLE READ :-( But I, too, am starting to think that we should have a separate, optimized command to do UPSERT/INSERT .. IGNORE efficiently and correctly while making MERGE's correctness the user's responsibility. Preferably with huge warning signs on the documentation page. Regards, Marko Tiikkaja -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: new patch of MERGE (merge_204) a question about duplicated ctid
On Tue, Jan 04, 2011 at 07:02:54PM +0200, Marko Tiikkaja wrote: On 2011-01-04 6:27 PM, David Fetter wrote: On Tue, Jan 04, 2011 at 04:44:32AM -0500, Greg Smith wrote: Heikki Linnakangas wrote: You can of course LOCK TABLE as a work-around, if that's what you want. Presuming the code quality issues and other little quirks I've documented (and new ones yet to be discovered) can get resolved here, and that's a sizeable open question, I could see shipping this with the automatic heavy LOCK TABLE in there. Then simple UPSERT could work out of the box via a straightforward MERGE. How about implementing an UPSERT command as take the lock, do the merge? That way, we'd have both the simplicity for the simpler cases and a way to relax consistency guarantees for those who would like to do so. That, unfortunately, won't work so well in REPEATABLE READ :-( There are caveats all over READ COMMITTED/REPEATABLE READ/SNAPSHOT. The only really intuitively obvious behavior is SERIALIZABLE, which we'll have available in 9.1. :) But I, too, am starting to think that we should have a separate, optimized command to do UPSERT/INSERT .. IGNORE efficiently and correctly while making MERGE's correctness the user's responsibility. Preferably with huge warning signs on the documentation page. +1 for the HUGE WARNING SIGNS :) Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Range Types
On Tue, Jan 4, 2011 at 2:29 AM, Jeff Davis pg...@j-davis.com wrote: I liked Robert's suggestion here: http://archives.postgresql.org/message-id/aanlktiks_x93_k82b4f_ga634wci0oeb9ftrurf28...@mail.gmail.com which says that the user can just define a canonicalize function that will take a range as input (or perhaps the logical pieces of a range) and put it into an appropriate canonical representation. For instance, int4range_canonical might take (1,4] and turn it into [2,4]. This is similar to a few other ideas, but Robert's idea seems to require the least effort by the person defining the range type, because postgresql can still handle representation. It doesn't allow for all of the suggested features. In particular, it would not allow granules to be specified for discrete ranges. But on balance, it seems like this is the most conceptually simple and I think it satisfies the primary use cases. Maybe I'm missing something, but it seems like this approach could support granules. You just have to define the canonicalize function in terms of the granule. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Upgrading Extension, version numbers
On Jan 4, 2011, at 12:46 AM, Dimitri Fontaine wrote: David E. Wheeler da...@kineticode.com writes: Just so long as you're aware that you might get more challenges on this going forward. Sure, thanks for the reminder. That said I also remember the reaction when I used to scan the SHARE/contrib directory to find the extension control file having the right name property, and I don't see scanning the same directory in order to find out which upgrade file to consider depending on several parts of its name as so different. Silly programmer! You don't have to do that yourself! You can teach the computer to do it for you. It's very good at that sort of thing! Current code allows you to use the same upgrade script for more than one source version, and does so in a way that makes it easy to determine which upgrade file to seek for. As Tom pointed out, you can do the same with naming conventions by having scripts \i each other as appropriate. I guess. I'll have to think about how to support it in PGXN, though. And the upgrade keys if they stay in. Disclaimer: the following is based on my understanding of how you want to bundle things, from several discussions we had together at pubs or on IRC; please don't read further if you've changed your mind about generating the control file from your PGXN YAML specification. s/YAML/JSON/, and okay. :-) Well, I think you're having a dependency inversion problem here. PGXN depends on extensions, not the other way round. What? That makes no sense, so I must be misunderstanding what you're trying to say. Also, I really expect the extension facility to be mainly used for internal proprietary code, mainly procedure collections, and only occasionally for publishing Open Source components. This is because you're not a Perl programmer. See CPAN. So you should be considering the control file as an input to your processes, a source file, not something that your service will hide from extension authors: there's no benefit that I can see in doing so. 
I know, but then you're not a CPAN guy. You're a Debian package guy. It's hardly surprising that we'll have inverted views of this sort of thing. Frankly, I think that you might find StackBuilder a better fit with your world view. http://pgfoundry.org/projects/stackbuilder/ Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] system views for walsender activity
On Tue, 2011-01-04 at 15:51 +0900, Itagaki Takahiro wrote: On Tue, Dec 28, 2010 at 22:17, Magnus Hagander mag...@hagander.net wrote: We definitely need the very basic level for 9.1, and we can always improve on it later :-) pg_stat_walsender. It would then only need the columns for procpid, usesysid, usename, client_addr, client_port, and the WALsender specific fields. Yeah, agreed. backend_start is probably the best one Here are patches for pg_stat_walsender. I split the feature into two pieces: * get_host_and_port.patch It separates the host and port formatter into a subroutine from pg_stat_activity. In addition, it makes the pg_stat_get_backend_client_addr/port() functions use the subroutine to reduce duplicated code. * pg_stat_walsender.patch It adds the pg_stat_walsender system view. It has a subset of the columns of pg_stat_activity and only one additional piece of WAL-sender-specific information via WALSndStatus(). I named the column sending location because standby servers might not have received the WAL record; if we had synchronous replication, a new sent location would be added. But the naming is still an open question. Comments welcome. There is O(max_wal_senders^2) complexity in the view, but I think it is not a serious problem because we can expect max_wal_senders to be 10 or so at most.

CREATE VIEW pg_stat_walsender AS
    SELECT S.procpid,
           S.usesysid,
           U.rolname AS usename,
           S.client_addr,
           S.client_port,
           S.backend_start,
           S.xlog_sending
      FROM pg_stat_get_walsender(NULL) AS S, pg_authid U
     WHERE S.usesysid = U.oid;

I just saw you started working on this again. Very good. I enclose some snippets of code I was working on, which I am removing from my patch in favour of your work as a separate commit. The way I coded it was a new SRF that joins to the existing pg_stat_activity. So no initdb required, and this can also easily be included as an external module for 9.0. Please notice also that my coding of the new SRF does not have the O^2 issue you mention, which I was keen to avoid. 
The sent pointer is needed whether or not we have sync rep. We should also include application name, since the user may set that in the standby for all the same reasons it is set elsewhere. Small point: please let's not call this pg_stat_walsender? pg_stat_replication_sent and pg_stat_replication_received would be easier for normal humans to understand. I would very much appreciate it if one of you could complete something here and commit in the next few days. That would then allow me to extend the view with sync-rep-specific info for monitoring and patch testing. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services

diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 346eaaf..75419b7 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -488,6 +488,21 @@ CREATE VIEW pg_stat_activity AS
         WHERE S.datid = D.oid AND
               S.usesysid = U.oid;
 
+CREATE VIEW pg_stat_replication_activity AS
+    SELECT
+            S.procpid,
+            S.usesysid,
+            U.rolname AS usename,
+            S.application_name,
+            S.client_addr,
+            S.client_port,
+            S.backend_start,
+            R.sent_location
+    FROM pg_stat_get_activity(NULL) AS S, pg_authid U,
+            pg_stat_get_replication_activity() AS R
+    WHERE S.usesysid = U.oid AND
+            S.procpid = R.procpid;
+
 CREATE VIEW pg_stat_database AS
     SELECT
             D.oid AS datid,

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index e9d8847..4422f5a 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -37,6 +37,7 @@
 #include <signal.h>
 #include <unistd.h>
+#include "funcapi.h"
 #include "access/xlog_internal.h"
 #include "catalog/pg_type.h"
 #include "libpq/libpq.h"
@@ -49,6 +50,7 @@
 #include "storage/ipc.h"
 #include "storage/pmsignal.h"
 #include "tcop/tcopprot.h"
+#include "utils/builtins.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
@@ -1122,6 +1124,91 @@ WalSndWakeup(void)
 }
 
 /*
+ * Returns the Send position of walsenders with given pid.
+ * Or InvalidXLogRecPtr if none.
+ */
+Datum
+pg_stat_get_replication_activity(PG_FUNCTION_ARGS)
+{
+	FuncCallContext *funcctx;
+
+	if (SRF_IS_FIRSTCALL())
+	{
+		MemoryContext oldcontext;
+		TupleDesc	tupdesc;
+
+		funcctx = SRF_FIRSTCALL_INIT();
+
+		oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+#define PG_STAT_GET_REP_ACTIVITY_COLS	2
+		tupdesc = CreateTemplateTupleDesc(PG_STAT_GET_REP_ACTIVITY_COLS, false);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 1, "procpid", INT4OID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 2, "sent_location", TEXTOID, -1, 0);
+
+		funcctx->tuple_desc =
Re: [HACKERS] can shared cache be swapped to disk?
On Sun, Dec 19, 2010 at 6:14 AM, Martijn van Oosterhout klep...@svana.org wrote: On Sat, Dec 18, 2010 at 11:59:33PM -0800, Jeff Janes wrote: On Sat, Dec 18, 2010 at 10:11 PM, flyusa2010 fly flyusa2...@gmail.com wrote: hi, folks! I see that shared cache is implemented by system v shared memory. I wonder whether data in this area can be swapped out to disk. Isn't it bad that we read data from disk, put data in shared cache, and finally data in shared cache is swapped to disk again! Why not use shmctl(..SHM_LOCK..) to pin data in main memory? Thanks! I've tried that on a recent linux kernel, to see if it would allow shared_buffers to usefully be a large fraction of total memory. It didn't help. So either swapping wasn't the problem in the first place, or the kernel ignores the order. Correct. The kernel ignores locking requests because it's a great way to DOS a machine. For example, mlock() of large blocks of memory is also not permitted for similar reasons. Does it ignore such requests in general, or only under certain situations? If the latter, do you know what those situations are? If the former, that seems incredibly bogus. There are plenty of ways to DOS a machine. The main way you prevent DOS by your own authorized users (other than firing them) on linux is by setrlimit, not by claiming to implement a feature you haven't actually implemented, or by implementing a feature but rendering it completely useless for the purpose it was intended for. RLIMIT_MEMLOCK exists, it has a small default hard limit, and only root can increase that. If root has gone out of its way to grant the postgres user a higher limit, the kernel should respect that, at least up until the situation becomes truly desperate. However, I don't have any evidence it is being ignored. I just know that locking the shared memory did not improve things, but I didn't verify that shared memory getting swapped out was the problem in the first place. 
The way you make sure shared memory doesn't get swapped out is to make sure it gets used (i.e. don't give 2GB shared memory when your database is 100MB). And don't make your shared memory so large that you're creating significant memory pressure, otherwise the kernel might choose to swap out your shared memory rather than, say, the webserver. Your shared memory should be reasonably sized, but you should make sure the kernel has enough cache memory it can throw away first. Unfortunately it is hard to know what the kernel considers to be significant memory pressure. My experience (from mostly non-pgsql work) is that the kernel has what I would consider enough cache memory to throw away, but for some reason doesn't throw it away and does more counterproductive things instead. Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Range Types
On Tue, 2011-01-04 at 12:21 -0500, Robert Haas wrote: It doesn't allow for all of the suggested features. In particular, it would not allow granules to be specified for discrete ranges. But on balance, it seems like this is the most conceptually simple and I think it satisfies the primary use cases. Maybe I'm missing something, but it seems like this approach could support granules. You just have to define the canonicalize function in terms of the granule. I meant that it doesn't support them as an explicit, user-visible concept. The main drawback here is that only a select group of people will be defining discrete range types at all, because it would require them to define a function first. Perhaps that's for the best, because, (as Tom pointed out) we don't want someone using floats and then specifying a granule of '0.01'. While we're talking about it, one question I had is: should the canonicalize function be: /* works on the deserialized information right before serialization */ canonical(flags, lower_bound, upper_bound) or /* works on the serialized form right after serialization */ range = canonical(range) I would lean toward the latter because it's simpler on the user (and allows non-C functions). But perhaps an efficiency argument could be made for the former because it could avoid one round of deserialize/reserialize when the representation is not already in canonical form. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
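As a concrete illustration of the logical-pieces variant of the canonicalize function, here is a hedged Python sketch for a discrete integer range, following the int4range_canonical example above where (1,4] becomes [2,4]. The function name and the (lower, upper, lower_inc, upper_inc) tuple are inventions of this sketch, not the proposed API:

```python
def int_range_canonical(lower, upper, lower_inc, upper_inc):
    """Normalize an integer range so both bounds are inclusive,
    e.g. (1,4] -> [2,4] as in the int4range_canonical example."""
    if not lower_inc:
        lower += 1            # (1, ...  becomes  [2, ...
        lower_inc = True
    if not upper_inc:
        upper -= 1            # ..., 4)  becomes  ..., 3]
        upper_inc = True
    if lower > upper:
        # A real implementation would presumably produce an empty
        # range here; raising keeps the sketch short.
        raise ValueError("range normalizes to empty")
    return lower, upper, lower_inc, upper_inc
```

A granule other than 1 would simply round the bounds in the same two branches, which is how the canonicalize approach can subsume explicit granules without making them a user-visible concept.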
Re: [HACKERS] WIP: Range Types
On Tue, 2011-01-04 at 14:18 +, Florian Weimer wrote: * Jeff Davis: 4. For the GiST penalty function, and perhaps some picksplit algorithms, it might be nice to know the length of a range, or do some other kinds of math. It introduces a lot of complexity to try to define math functions for each subtype, and try to make sure they behave sanely. So I was thinking that the user might need to specify a function that converts the subtype into a float that approximates a value's position in the total order. Doesn't the eqsel hint already provide this information? Can you clarify what you mean? I don't know what the eqsel hint is. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sync Rep Design
All, This is a pointless argument. Eventually, we will be implementing all possible sync rep configurations, because different users *need* different configurations. Some users care more about durability, some more about availability, and some more about response time. And you can't have all three, which was my point about A, D, R (also the point of CAP). For that matter, any single configuration will be useful to a large number of users, and an even larger number will be able to work around it while they wait for 9.2. Further, the knowledge we gain by having some kind of synch rep in the field will allow us to implement the different configurations correctly, which *no* amount of arguing on e-mail will. The perfect is the enemy of the good. The relevant question is: which configuration(s) can we have ready for the next CommitFest and alpha release? -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sync Rep Design
On 1/2/11 12:35 AM, Heikki Linnakangas wrote: Very likely. A synchronous standby can bring the master to a halt, while an asynchronous one is rather harmless. If I were a DBA, and the data wasn't very sensitive, I would liberally hand out async privileges to my colleagues to set up reporting standbys, test servers etc. But I would *not* give them synchronous privileges, because sooner or later one would go hmm, I wonder what happens if I make this synchronous, or haphazardly copy the config file from a synchronous standby. That would either bring down the master, or act as a fake standby, acknowledging commits before they're flushed to the real synchronous standby. Either one would be bad. That's not very likely. Shops with enough DBAs to actually have a hierarchy of database access are rare indeed; the average business has trouble employing *one* fulltime DBA. Also, you're mistaken if you think that DBAs wouldn't restrict the ability of development users to set up asynch connections; those have a significant impact on the production server as well. However, the possibility of *accidental* DOS is a bit more likely, per accidentally using the wrong config file. I don't think you can guard against that with permissions, though; if a DBA is copying the config file from a synchronous server accidentally, presumably he's copying the .pgpass file as well, and then the new server is identical to the legit synch rep servers as far as the master is concerned. So, again, I don't agree that a separate synchrep permission is useful, or warranted. However, your arguments *do* make me backpedal on the issue of having a list of synch rep roles on the master. I can easily imagine a DBA needing to rapidly disable synch rep if replication is failing, without taking the time to log in to several separate standbys, one or more of which might be improperly configured and inaccessible. I can't think of a simpler way to do that than having a synch_rep_roles configuration on the master. 
That would also handle control issues for the senior DBA, since you'd need superuser access to the master to modify it. --Josh Berkus -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] system views for walsender activity
On Tue, Jan 4, 2011 at 12:48 PM, Simon Riggs si...@2ndquadrant.com wrote: The sent pointer is needed whether or not we have sync rep. We should also include application name, since the user may set that in the standby for all the same reasons it is set elsewhere. Small point: please let's not call this pg_stat_walsender? pg_stat_replication_sent and pg_stat_replication_received would be easier for normal humans to understand. Eh... I may be showing my status as a non-normal human, but I know exactly what pg_stat_walsender is (it's the view that shows you the status of the WAL senders you've allowed by configuring max_wal_senders > 0) but I have no idea what pg_stat_replication_sent and pg_stat_replication_received are supposed to be. Why two views? *scratches head in confusion* -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] system views for walsender activity
Eh... I may be showing my status as a non-normal human, but I know exactly what pg_stat_walsender is (it's the view that shows you the status of the WAL senders you've allowed by configuring max_wal_senders > 0) but I have no idea what pg_stat_replication_sent and pg_stat_replication_received are supposed to be. Why two views? *scratches head in confusion* How about one view, called pg_stat_replication? This would be clear even to newbies, which none of the other view names would ... -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Range Types
On Tue, 2011-01-04 at 23:04 +0900, Hitoshi Harada wrote: 2011/1/4 Jeff Davis pg...@j-davis.com: I have been updating my work in progress here: http://git.postgresql.org/gitweb?p=users/jdavis/postgres.git;a=log;h=refs/heads/rangetypes Right now, it's not in a reviewable state, but those interested can glance through the code. Quick synopsis (for illustration purposes only; don't expect much from the current code): CREATE TYPE numrange AS RANGE (SUBTYPE=numeric, SUBTYPE_CMP=numeric_cmp); I am interested in how you define increment/decrement operation of range value in discrete types. The window functions and PARTITION also want to represent RANGE but there's no clear solution. Sorry if it's already been discussed since I didn't track the threads. The user would specify a canonical function like: CREATE TYPE int4range AS RANGE (SUBTYPE=int4, SUBTYPE_CMP=btint4cmp, CANONICAL=my_int4range_canonical); That function would be called when constructing ranges on input or after a computation, and could change something like (1,4] into [2,4] if you prefer the latter form. So the range types would not have increments, decrements, granules, or knowledge about the difference type (e.g. interval is the difference type for timestamp). What support do you need/want from range types to help with new window function features? Also, partitioning might have some use for range types to represent range partitions. Comments are welcome. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
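Jeff's canonicalization idea is easy to illustrate outside the backend. Here is a minimal Python sketch (purely illustrative; the real CANONICAL function would be attached to the range type in C or SQL) of normalizing discrete integer-range bounds so that (1,4] becomes [2,4]:

```python
def canonical(lo, hi, lo_inc, hi_inc):
    """Normalize an integer range to closed-closed form, the way a
    CANONICAL function could turn (1,4] into [2,4]."""
    if not lo_inc:   # open lower bound: bump it up one step
        lo += 1
    if not hi_inc:   # open upper bound: pull it down one step
        hi -= 1
    return lo, hi, True, True
```

For example, canonical(1, 4, False, True) and canonical(2, 5, True, False) both yield (2, 4, True, True), which is exactly why textually different ranges can compare equal after canonicalization, without the range type needing any built-in notion of increments or granules.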
Re: [HACKERS] Sync Rep Design
On Tue, 2011-01-04 at 10:28 -0800, Josh Berkus wrote: The relevant question is: which configuration(s) can we have ready for the next CommitFest and alpha release? Based upon that series of conversations, I've reworked the design so that there is (currently) only a single standby offering sync rep at any one time. Other standbys can request to be sync standbys but they only become the sync standby if the first one fails. Which was simple to do and bridges the challenges of an exactly identified sync standby and the fragility of too closely specifying the config. I think you're right that trying to please everyone is not going to be possible in this release, because of the explosion of parameter combinations that require testing and because of the explosion in my head that causes. I'm not feeling well now, so I'm going to go to bed, not just to avoid snapping at people. Even given that short interlude, I see no problem about delivery. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sync Rep Design
On Sun, Jan 2, 2011 at 4:19 PM, Simon Riggs si...@2ndquadrant.com wrote: On Sun, 2011-01-02 at 18:54 +0200, Heikki Linnakangas wrote: I believe we all agree that there's different use cases that require different setups. Both first-past-the-post and wait-for-all-to-ack have their uses. Robert's analysis is that first-past-the-post doesn't actually improve the durability guarantee (according to his calcs). Which means that 1 primary, 2 sync standbys with first-past-the-post is actually worse than 1 primary, 1 sync and 1 async standby in terms of its durability guarantees. So ISTM that Robert does not agree that both have their uses. I think it depends on what failure modes you want to protect against. If you have a primary in New York, a secondary in Los Angeles, and another secondary in London, you might decide that the chances of two standbys being taken out by the same event are negligible, or alternatively that if one event does take out both of them, it'll be something like a meteor where you'll have bigger things to worry about than lost transactions. In that case, requiring one ACK but not two is pretty sensible. If the primary goes down, you'll look at the two remaining machines (which, by presumption, will still be up) and promote whichever one is ahead. In this setup, you get a performance benefit from waiting for either ACK rather than both ACKs, and you haven't compromised any of the cases you care about. However, if you have the traditional close/far setup, things are different. Suppose you have a primary and a secondary in New York and another secondary in Los Angeles. Now it has to be viewed as a reasonable possibility that you could lose the New York site. If that happens, you need to be able to promote the LA standby *without reference to the NY standby*. So you really can't afford to do the 1-of-2 thing, because then when NY goes away you're not sure whether the LA standby is safe to promote. So, IMHO, it just depends on what you want to do. 
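Robert's durability argument can be made concrete with a toy model (my own back-of-envelope arithmetic, not figures from the thread): assume each server fails independently with probability p during the window of interest, and compare requiring one ack versus all acks from the standbys.

```python
from math import comb

def loss_probability(p, acked_standbys):
    """A committed transaction is lost only if the primary and every
    standby known to hold the commit all fail."""
    return p ** (1 + acked_standbys)

def commit_block_probability(p, required_acks, n_standbys):
    """A commit cannot complete if fewer than required_acks standbys
    are up: sum over the number of surviving standbys k < required_acks."""
    return sum(comb(n_standbys, k) * (1 - p) ** k * p ** (n_standbys - k)
               for k in range(required_acks))
```

With p = 0.01 and two standbys, 1-of-2 acks gives a loss probability of about 1e-4, the same guarantee as a single sync standby, which is Robert's point; 2-of-2 acks improves loss to about 1e-6 but blocks commits roughly 2% of the time. The numbers change with the failure-correlation assumptions (the New York/London scenario), which is exactly why no single configuration fits everyone.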
I'm not sure what the point of such a timeout in general is, but people have requested that. Again, this sounds like you think a timeout has no measurable benefit, other than to please some people's perceived needs. The wait-for-all-to-ack looks a lot less ridiculous if you also configure a timeout and don't wait for disconnected standbys. Does it? Do Robert, Stefan and Aidan agree? What are the availability and durability percentages if we do that? Based on those, we may decide to do that instead. But I'd like to see some analysis of your ideas, not just a we could. Since nobody has commented on my analysis, let's see someone else's. Here's my take on this point. I think there is a use case for waiting for a disconnected standby and a use case for not waiting for a disconnected standby. The danger of NOT waiting for a disconnected standby is that if you then go on to irretrievably lose the primary, you lose transactions. But on the other hand, if you do wait, you've made the primary unavailable. I don't know that there's one right answer here. For some people, if they can't be certain of recording the transaction in two places, then it may be better to not process any transactions at all. For other people, it may be better to process transactions unprotected for a while while you get a new standby up. It's not for us to make that judgment; we're here to provide options. Having said that, I am OK with whichever one we want to implement first so long as we keep the door open to doing the other one later. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sync Rep Design
I'm not feeling well now, so I'm going to go to bed, not just to avoid snapping at people. Even given that short interlude, I see no problem about delivery. Cool! Thanks Simon. Feel better. Joshua D. Drake -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579 Consulting, Training, Support, Custom Development, Engineering http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] system views for walsender activity
On Tue, 2011-01-04 at 10:50 -0800, Josh Berkus wrote: Eh... I may be showing my status as a non-normal human, but I know exactly what pg_stat_walsender is (it's the view that shows you the status of the WAL senders you've allowed by configuring max_wal_senders0) but I have no idea what pg_stat_replication_sent and pg_stat_replication_received are supposed to be. Why two views? *scratches head in confusion* How about one view, called pg_stat_replication? This would be clear even to newbies, which none of the other view names would ... hmmm I think pg_stat_standby might be more relevant but I definitely agree something more newbie appropriate is in order. Joshua D. Drake -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579 Consulting, Training, Support, Custom Development, Engineering http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] system views for walsender activity
hmmm I think pg_stat_standby might be more relevant but I definitely agree something more newbie appropriate is in order. I'd be fine with that name, too. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] system views for walsender activity
On Tue, Jan 4, 2011 at 20:28, Josh Berkus j...@agliodbs.com wrote: hmmm I think pg_stat_standby might be more relevant but I definitely agree something more newbie appropriate is in order. I'd be fine with that name, too. That seems kind of backwards though - given that the view only contains data on the master... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER EXTENSION UPGRADE patch v1
David Fetter da...@fetter.org writes: Do you plan to have ALTER EXTENSION ... UPGRADE TO VERSION ... , or the more general, ALTER EXTENSION ... ALTER VERSION TO ... ? Well why not, but I'm not sure I understand what you have in mind here. I don't exactly see how to install more than one version on the server, at the filesystem level, with the current infrastructure. The way debian solves it for multi-version packages, such as PostgreSQL for example, is to add the (major) version number in the package name, so that you have independent packages such as: postgresql-8.3 postgresql-8.4 postgresql-9.0 Are you thinking about doing something similar with extensions? The only other way I'm able to think about your request would be for the command to simply ERROR out when the premises given by the users don't match the reality of what's installed both in the catalogs and on the file system. If that's what you're talking about, we still can make it material for 9.1, I believe. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] system views for walsender activity
On Tue, Jan 4, 2011 at 2:31 PM, Magnus Hagander mag...@hagander.net wrote: On Tue, Jan 4, 2011 at 20:28, Josh Berkus j...@agliodbs.com wrote: hmmm I think pg_stat_standby might be more relevant but I definitely agree something more newbie appropriate is in order. I'd be fine with that name, too. That seems kind of backwards though - given that the view only contains data on the master... I think pg_stat_replication is better than pg_stat_standby, but I'm still not convinced we shouldn't go with the obvious pg_stat_walsenders. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Upgrading Extension, version numbers
David E. Wheeler da...@kineticode.com writes: As Tom pointed out, you can do the same with naming conventions by having scripts \i each other as appropriate. This is a deprecated idea, though. We're talking about the pg_execute_from_file() patch that has been applied, but without the pg_execute_sql_file() function. So that part is internal to the backend extension code and not available from SQL anymore. There's no consensus to publish a backend \i-like function. So there's no support for the upgrade-script organization you're promoting. Unless the consensus changes again (but a commit has been done). Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sync Rep Design
On 01/04/2011 07:51 PM, Simon Riggs wrote: On Tue, 2011-01-04 at 10:28 -0800, Josh Berkus wrote: The relevant question is: which configuration(s) can we have ready for the next CommitFest and alpha release? Based upon that series of conversations, I've reworked the design so that there is (currently) only a single standby offering sync rep at any one time. Other standbys can request to be sync standbys but they only become the sync standby if the first one fails. Which was simple to do and bridges the challenges of an exactly identified sync standby and the fragility of too closely specifying the config. ah cool - like that approach for 9.1! Thanks for working on that! Stefan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Range Types
On Tue, Jan 4, 2011 at 1:18 PM, Jeff Davis pg...@j-davis.com wrote: On Tue, 2011-01-04 at 12:21 -0500, Robert Haas wrote: It doesn't allow for all of the suggested features. In particular, it would not allow granules to be specified for discrete ranges. But on balance, it seems like this is the most conceptually simple and I think it satisfies the primary use cases. Maybe I'm missing something, but it seems like this approach could support granules. You just have to define the canonicalize function in terms of the granule. I meant that it doesn't support them as an explicit, user-visible concept. The main drawback here is that only a select group of people will be defining discrete range types at all, because it would require them to define a function first. Perhaps that's for the best, because, (as Tom pointed out) we don't want someone using floats and then specifying a granule of '0.01'. While we're talking about it, one question I had is: should the canonicalize function be: /* works on the deserialized information right before serialization */ canonical(flags, lower_bound, upper_bound) or /* works on the serialized form right after serialization */ range = canonical(range) I would lean toward the latter because it's simpler on the user (and allows non-C functions). Yeah, me too. But perhaps an efficiency argument could be made for the former because it could avoid one round of deserialize/reserialize when the representation is not already in canonical form. I believe this might be an appropriate time to apply Knuth's Law. I'm not thrilled with the amount of palloc overhead we have in the backend, but absent some evidence that this case is going to be particularly significant, I'd be disinclined to contort the interface. I suspect that if you run oprofile this won't be the bottleneck. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Upgrading Extension, version numbers
On Jan 4, 2011, at 11:48 AM, Dimitri Fontaine wrote: As Tom pointed out, you can do the same with naming conventions by having scripts \i each other as appropriate. This is a deprecated idea, though. We're talking about the pg_execute_from_file() patch that has been applied, but without the pg_execute_sql_file() function. So that part is internal to the backend extension code and not available from SQL anymore. There's no consensus to publish a backend \i-like function. So there's no support for the upgrade-script organization you're promoting. Unless the consensus changes again (but a commit has been done). To be clear, consensus was not reached, by my reading. It may be that it makes sense to restore pg_execute_sql_file(), perhaps to run only in the context of ALTER EXTENSION. Just to be clear where I'm coming from, as an extension developer, I would like PostgreSQL extensions to: * Prefer convention over configuration * Not make me do more work than the computer can do Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sync Rep Design
On Tue, Jan 4, 2011 at 2:50 PM, Stefan Kaltenbrunner ste...@kaltenbrunner.cc wrote: On 01/04/2011 07:51 PM, Simon Riggs wrote: On Tue, 2011-01-04 at 10:28 -0800, Josh Berkus wrote: The relevant question is: which configuration(s) can we have ready for the next CommitFest and alpha release? Based upon that series of conversations, I've reworked the design so that there is (currently) only a single standby offering sync rep at any one time. Other standbys can request to be sync standbys but they only become the sync standby if the first one fails. Which was simple to do and bridges the challenges of an exactly identified sync standby and the fragility of too closely specifying the config. ah cool - like that approach for 9.1! Yeah, I like that idea too, on first blush. I think we should think it over and see whether we're committing ourselves to any design decisions we may later regret - what parameters will we need to add from that point to get all the configs we ultimately want to support? But it seems a reasonable starting point, and we can argue about the rest once we have working code. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] system views for walsender activity
On 04.01.2011 21:43, Robert Haas wrote: On Tue, Jan 4, 2011 at 2:31 PM, Magnus Hagander mag...@hagander.net wrote: On Tue, Jan 4, 2011 at 20:28, Josh Berkus j...@agliodbs.com wrote: hmmm I think pg_stat_standby might be more relevant but I definitely agree something more newbie appropriate is in order. I'd be fine with that name, too. That seems kind of backwards though - given that the view only contains data on the master... I think pg_stat_replication is better than pg_stat_standby, but I'm still not convinced we shouldn't go with the obvious pg_stat_walsenders. How about pg_stat_replication_activity? If I understood correctly, the view is similar to pg_stat_activity, but displays information about connected standbys rather than regular backends. It's a bit of a long name, though. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] system views for walsender activity
On Tue, Jan 04, 2011 at 10:50:12AM -0800, Josh Berkus wrote: Eh... I may be showing my status as a non-normal human, but I know exactly what pg_stat_walsender is (it's the view that shows you the status of the WAL senders you've allowed by configuring max_wal_senders > 0) but I have no idea what pg_stat_replication_sent and pg_stat_replication_received are supposed to be. Why two views? *scratches head in confusion* How about one view, called pg_stat_replication? This would be clear even to newbies, which none of the other view names would ... Wait. We can't do that. We'd be breaking a decades-old tradition of terrible names! ;) Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Upgrading Extension, version numbers
David E. Wheeler da...@kineticode.com writes: * Prefer convention over configuration The previous idea about the convention is not flying well with the very recent proposal of ALTER EXTENSION ... UPGRADE TO VERSION ..., because it would certainly require that the extension's name include its major version number, like debian is doing for a number of packages. Also, how are PostGIS 1.4 and 1.5 (and 2.0) packaged nowadays? * Not make me do more work than the computer can do No computer will guess reliably which upgrade file to apply given the currently installed version and the newer one, as soon as the same file can get used for more than a single combination of those two strings. I much prefer to avoid shipping that many files, and think that even in the worst case where you have to add a setup line per supported upgrade setup, the control file support for that is better. Now I perfectly understand that there's more to this world than my eyes can see, that's why we're talking about alternatives. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Tracking latest timeline in standby mode
On 02.11.2010 07:15, Fujii Masao wrote: On Mon, Nov 1, 2010 at 8:32 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Yeah, that's one approach. Another is to validate the TLI in the xlog page header, it should always match the current timeline we're on. That would feel more robust to me. Yeah, that seems better. I finally got around to look at this. I wrote a patch to validate that the TLI on xlog page header matches ThisTimeLineID during recovery, and noticed quickly in testing that it doesn't catch all the cases I'd like to catch :-(. The problem scenario is this:

    TLI 1 ---+C---+---   Standby
             .
             .
    TLI 2    +C---+---

The two horizontal lines represent two timelines. TLI 2 forks off from TLI 1, because of a failover to a not-completely up-to-date standby server, for example. The plus-signs represent WAL segment boundaries and C's represent checkpoint records. Another standby server has replayed all the WAL on TLI 1. Its latest restartpoint is C. The checkpoint records on the different timelines are at the same location, at the beginning of the WAL files - not all that impossible if you have archive_timeout set, for example. Now, if you stop and restart the standby, it will try to recover to the latest timeline, which is TLI 2. But before the restart, it had already replayed the WAL from TLI 1, so it's wrong to replay the WAL from the parallel universe of TLI 2. At the moment, it will go ahead and do it, and you end up with an inconsistent database. I planned to fix that by checking the TLI on the xlog page header, but that alone isn't enough in the above scenario. The TLI on the page headers on timeline 2 are what's expected; the first page on the segment has TLI==1, because it was just forked off from timeline 1, and the subsequent pages have TLI==2, as they should after the checkpoint record. So we have to remember which timeline we were on before the restart. 
We already remember how far we had replayed, that's the minRecoveryPoint we store in the control file, but we have to record the timeline along with it. On reflection, your idea of checking the history file before replaying anything seems much easier. We'll still need to add the timeline alongside minRecoveryPoint to do the checking, but it's a lot easier to do against the history file. And we can validate the TLIs on page headers against the information from the history file as we read in the WAL. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
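The history-file check described above boils down to a simple rule, sketched here in Python (a simplified model, not the actual recovery code): the TLI stamped on each WAL page must match the timeline that the history says was in effect at that page's position.

```python
def tli_at(history, lsn):
    """history: list of (tli, switch_lsn) pairs in ascending order; the
    last entry has switch_lsn None (the current timeline). Returns the
    timeline in effect at the given WAL position."""
    for tli, switch_lsn in history:
        if switch_lsn is None or lsn < switch_lsn:
            return tli

def page_header_ok(history, page_lsn, page_tli):
    """Validate a WAL page header's TLI against the timeline history."""
    return page_tli == tli_at(history, page_lsn)
```

In the problem scenario, a page from TLI 2's fork segment correctly carries TLI 1 before the switch point and TLI 2 after it, so this check alone passes; hence the need to also remember which timeline was being replayed before the restart.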
Re: [HACKERS] Upgrading Extension, version numbers
On Jan 4, 2011, at 12:05 PM, Dimitri Fontaine wrote: David E. Wheeler da...@kineticode.com writes: * Prefer convention over configuration The previous idea about the convention is not flying well with the very recent proposal of ALTER EXTENSION ... UPGRADE TO VERSION ..., because it would certainly require that the extension's name include its major version number, like debian is doing for a number of packages. No, just the file. Also, how are PostGIS 1.4 and 1.5 (and 2.0) packaged nowadays? Tarballs. * Not make me do more work than the computer can do No computer will guess reliably which upgrade file to apply given the currently installed version and the newer one, as soon as the same file can get used for more than a single combination of those two strings. Why not? Version numbers would have to be part of the file names. The only wrinkle is being able to properly order version numbers, and we could address that by requiring a specific version format. Tom suggested integers; I suggested semantic versions. I much prefer to avoid shipping that many files, and think that even in the worst case where you have to add a setup line per supported upgrade setup, the control file support for that is better. Well, for a version that requires no upgrade script, there just wouldn't be one. It's a matter of taste. Now I perfectly understand that there's more to this world than my eyes can see, that's why we're talking about alternatives. You are? Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
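The versions-in-filenames idea implies that the upgrade file to apply is mechanically discoverable. A sketch of that (the `ext--FROM--TO.sql` naming here is purely illustrative, not a convention agreed in the thread) which chains upgrade scripts by graph search:

```python
import re
from collections import deque

def upgrade_path(filenames, src, dst):
    """Find a chain of upgrade scripts named ext--FROM--TO.sql leading
    from version src to version dst; returns None if no chain exists."""
    edges = {}
    for f in filenames:
        m = re.match(r'^.+?--(.+?)--(.+?)\.sql$', f)
        if m:  # record an upgrade edge FROM -> TO provided by this file
            edges.setdefault(m.group(1), []).append((m.group(2), f))
    queue, seen = deque([(src, [])]), {src}
    while queue:  # breadth-first search yields a shortest script chain
        version, path = queue.popleft()
        if version == dst:
            return path
        for nxt, f in edges.get(version, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [f]))
    return None
```

This is the convention-over-configuration side of the argument: given 'myext--1.0--1.1.sql' and 'myext--1.1--2.0.sql', the computer can chain 1.0 to 2.0 without any control-file setup lines. Dimitri's counterpoint, that one script may serve several source versions, is precisely what this scheme cannot express without shipping duplicate files.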
[HACKERS] Fixing GIN for empty/null/full-scan cases
I've been thinking about how to fix GIN's assorted corner-case problems, as has been discussed several times, most recently here: http://archives.postgresql.org/pgsql-hackers/2010-10/msg00521.php See also http://wiki.postgresql.org/wiki/Todo#GIN There are basically three related issues: 1. GIN doesn't store anything in the index for a NULL item. 2. GIN doesn't store anything in the index for an empty (zero-key) item. 3. GIN can't deal with NULL key values. (An item is a composite value to be indexed, such as a tsvector or array. A key is an individual indexable value, such as a lexeme or array element.) Because of #1 and #2, GIN can't handle full-index scans. This is not because the code can't scan all of the index, but because doing so wouldn't necessarily return a TID for every existing heap row. We have to fix #1 and #2, and then get rid of the prohibition on full-index scans, in order to deal with the complaints that appear in our TODO list. The problem with NULL key values is somewhat less pressing, because most GIN-indexable operators are strict enough to not care about null keys, so we could perhaps just ignore nulls. But I'm inclined to think that it'd be best to fix that now while we're whacking the code and opclass APIs around, rather than probably having to go back for yet another round later. A concrete example of a hit we'll take if we don't index nulls is that array-is-contained-in would have to be treated as a lossy rather than lossless index search, since there'd be no way to tell from the index whether the array item contains any nulls (rendering it not contained in anything). As far as storage in the index goes, NULL key values seem perfectly simple to deal with: just allow a null to get stored for the key value of a GIN index entry. What we discussed doing to fix #1 and #2 was to store a dummy index entry, rather than no entries at all. The least complicated way to do that would be to store a NULL key. 
That would mean that, for example with integer arrays, all three of these item values would have identical index entries: NULL::int[] '{}'::int[] '{NULL}'::int[] So any searches that need to yield different answers for these cases (ie find some but not all of them) would have to be treated as lossy. That's not necessarily a show-stopper, but I'm inclined to think that it is worth working a bit harder so that we can distinguish them. It's already true that GIN requires special-case code to construct its index entries (look at GinFormTuple). What I'm thinking we could do without too much additional ugliness is store either the key value (for the normal, non-null key case) or an int16 representing a category: 1 = null key value 2 = placeholder for zero-key item 3 = placeholder for null item The index entry's IndexTupleHasNulls flag would be sufficient to distinguish whether a key or a category flag is present, since there are no other potentially-null values in a GIN index entry. There is room in this scheme for more categories if we ever need any, though I can't think of what they'd be. The other sticky problem is how to extend the GIN opclass support function API definitions for all this. I propose the following: compare(): doesn't really need any changes. Just as in btree, we only need to call the key comparison function for non-null keys. The various categories of nulls can have hard-wired comparison behavior. extractValue(): needs an extension so it can return null key values. I propose adding a third argument: Datum *extractValue(Datum inputValue, int32 *nkeys, bool **nullFlags) If the function wants to return any nulls, it has to palloc an array of nkeys bools and store a pointer to it at *nullFlags. We can initialize *nullFlags to NULL (implying no null keys) for backwards compatibility with existing functions that aren't aware of the third argument. 
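The proposed entry encoding can be modeled in a few lines (a Python simulation of the idea, not the actual GinFormTuple changes); it shows how the three troublesome items above map to distinct index entries:

```python
# The three key categories from the proposal (hard-wired sort order).
NULL_KEY, EMPTY_ITEM, NULL_ITEM = 1, 2, 3

def gin_entries(item):
    """Index entries for an array-like item: each entry is either a real
    key or a category placeholder, per the proposed scheme."""
    if item is None:                      # NULL::int[]
        return [('category', NULL_ITEM)]
    if not item:                          # '{}'::int[]
        return [('category', EMPTY_ITEM)]
    return [('category', NULL_KEY) if k is None else ('key', k)
            for k in item]               # '{...}' with possible NULL elements
```

With this encoding, NULL::int[], '{}'::int[] and '{NULL}'::int[] produce three distinct entry sets, so searches that must treat them differently need not fall back to lossy behavior.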
In the case of a null item value, we needn't call the function at all, we can just generate the dummy entry directly. Zero-key items can be handled compatibly with the current behavior: if nkeys is returned as zero, we'll generate a dummy entry instead of generating nothing at all.

extractQuery(): likewise, needs to be able to return null query element values. I propose adding a sixth argument:

    Datum *extractQuery(Datum query, int32 *nkeys, StrategyNumber n, bool **pmatch, Pointer **extra_data, bool **nullFlags)

As above, we can initialize *nullFlags to NULL for backwards compatibility. We will continue to assume that a null query value means an unsatisfiable query, so we don't need to be able to call extractQuery with a null input query. We'll keep the current convention that returning nkeys = -1 means an unsatisfiable query while returning zero requests a full index scan.

consistent(): needs to be able to deal with possible nulls in the extracted query
Re: [HACKERS] Fixing GIN for empty/null/full-scan cases
On Tue, Jan 4, 2011 at 4:09 PM, Tom Lane t...@sss.pgh.pa.us wrote: * Existing GIN indexes are upwards compatible so far as on-disk storage goes, but they will of course be missing entries for empty, null, or null-containing items. Users who want to do searches that should find such items will need to reindex after updating to 9.1. This is the only part of this proposal that bothers me a little bit. It would be nice if the system could determine whether a GIN index was upgraded from 9.0 or earlier and thus doesn't contain these entries - and avoid trying to use the index for these sorts of queries in cases where it might return wrong answers. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fixing GIN for empty/null/full-scan cases
Robert Haas robertmh...@gmail.com writes: On Tue, Jan 4, 2011 at 4:09 PM, Tom Lane t...@sss.pgh.pa.us wrote: * Existing GIN indexes are upwards compatible so far as on-disk storage goes, but they will of course be missing entries for empty, null, or null-containing items. Users who want to do searches that should find such items will need to reindex after updating to 9.1. This is the only part of this proposal that bothers me a little bit. It would be nice if the system could determine whether a GIN index is upgraded from 9.0 or earlier and thus doesn't contain these entries - and avoid trying to use the index for these sorts of queries in cases where it might return wrong answers. I don't think it's really worth the trouble. The GIN code has been broken for these types of queries since day one, and yet we've had only maybe half a dozen complaints about it. Moreover there's no practical way to avoid trying to use the index, since in many cases the fact that a query requires a full-index scan isn't determinable at plan time. The best we could really do is throw an error at indexscan start, and that doesn't seem all that helpful. But it probably wouldn't take much code either, if you're satisfied with that answer. (I'm envisioning adding a version ID to the GIN metapage and then checking that before proceeding with a full-index scan.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Vacuum of newly activated 8.3.12 standby receives warnings page xxx is uninitialized --- fixing
On 05/01/11 04:43, Devrim GÜNDÜZ wrote: On Fri, 2010-12-31 at 11:11 +1300, Mark Kirkwood wrote: I note that this issue of uninitialized pages with standbys has cropped up from time to time - I wonder if in most/all the cases folk were using Pitrtools? I deployed Pitrtools a lot when I was working for CMD, and I haven't seen any issues with that. It is just a wrapper, nothing else... Note that if you perform the backup step when the system is quiet then there is typically no problem with the standby - it is only when the backup coincides with any level of activity that you can elicit the previously mentioned problems. Obviously the case I have included in the first mail shows up the problem immediately, but in fact it is reasonably hard to trigger and I spent quite a while getting to the state of being able to trigger the uninitialized pages on demand. While Pitrtools is a wrapper, its use of --exclude=backup_label is incorrect and needs to be removed. regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sync Rep Design
Josh Berkus j...@postgresql.org writes: So, again, I don't agree that a separate synchrep permission is useful, or warranted. +1 However, your arguments *do* make me backpedal on the issue of having a list of synch rep roles on the master. I can easily imagine a DBA needing to rapidly disable synch rep if replication is failing, without taking the time to log in to several separate standbys, one or more of which might be improperly configured and inaccessible. I can't think of a simpler way to do that than having a synch_rep_roles configuration on the master. That would also handle control issues for the senior DBA, since you'd need superuser access to the master to modify it. What about the HBA here? Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sync Rep Design
What about the HBA here? Hmmm. That's tempting; a synchronous HBA option instead of a GUC? But that doesn't solve the problem of "standby #6 is failing, I want to kick it off synch rep." I'd be opposed to having a GUC *and* an HBA setting; making DBAs set things independently in two places just frustrates our users. --Josh -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sync Rep Design
Josh Berkus j...@postgresql.org writes: What about the HBA here? Hmmm. That's tempting; a synchronous HBA instead of a GUC? But that doesn't solve the problem of "standby #6 is failing, I want to kick it off synch rep." I'd be opposed to having a GUC *and* an HBA. Making DBAs set things independently in two places just frustrates our users. I was just thinking that you could prepend a reject line at the right place in the file. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
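For illustration, the prepend-a-reject-line idea might look like this in pg_hba.conf (addresses and role name invented; the point is that the earlier line matches first and shuts out one standby without touching anything else):

```
# kick standby #6 (192.0.2.6) out of replication; first match wins
host  replication  repuser  192.0.2.6/32  reject
# the normal rule that all other standbys continue to match
host  replication  repuser  192.0.2.0/24  md5
```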
Re: [HACKERS] can shared cache be swapped to disk?
On Tue, Jan 04, 2011 at 09:51:05AM -0800, Jeff Janes wrote: Correct. The kernel ignores locking requests because it's a great way to DOS a machine. For example, mlock() of large blocks of memory is also not permitted for similar reasons. Does it ignore such requests in general, or only under certain situations? If the latter, do you know what those situations are? Well, not in general, but for shared memory it's ignored (not sure whether that holds if you're root). It used to be that shared memory was always locked, which sounds like a great idea, until people started abusing it. So now shared memory is on the same footing as other memory. Not sure where I read this, I know it came up several years ago. I think it changed back in 2.0 times. RLIMIT_MEMLOCK exists, it has a small default hard limit, and only root can increase that. If root has gone out of its way to grant the postgres user a higher limit, the kernel should respect that, at least up until the situation becomes truly desperate. Like I said, not sure about how it works for root. Unfortunately it is hard to know what the kernel considers to be significant memory pressure. My experience (from mostly non-pgsql work) is that the kernel has what I would consider enough cache memory to throw away, but for some reason it doesn't throw it away and instead does more counterproductive things. Possibly. Everyone always considers their memory to be more important than all other memory on the system, but the kernel has a much better idea of what's going on than the user. That doesn't mean it's without fault or couldn't be improved. But if there's a bunch of shared memory not being accessed very often and the kernel thinks it's better used somewhere else, it may be right. Repeatable test cases in this area are really hard. Have a nice day, -- Martijn van Oosterhout klep...@svana.org http://svana.org/kleptog/ Patriotism is when love of your own people comes first; nationalism, when hate for people other than your own comes first. 
- Charles de Gaulle signature.asc Description: Digital signature
Re: [HACKERS] Fixing GIN for empty/null/full-scan cases
On Tue, Jan 4, 2011 at 4:49 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Tue, Jan 4, 2011 at 4:09 PM, Tom Lane t...@sss.pgh.pa.us wrote: * Existing GIN indexes are upwards compatible so far as on-disk storage goes, but they will of course be missing entries for empty, null, or null-containing items. Users who want to do searches that should find such items will need to reindex after updating to 9.1. This is the only part of this proposal that bothers me a little bit. It would be nice if the system could determine whether a GIN index is upgraded from 9.0 or earlier and thus doesn't contain these entries - and avoid trying to use the index for these sorts of queries in cases where it might return wrong answers. I don't think it's really worth the trouble. The GIN code has been broken for these types of queries since day one, and yet we've had only maybe half a dozen complaints about it. Moreover there's no practical way to avoid trying to use the index, since in many cases the fact that a query requires a full-index scan isn't determinable at plan time. The best we could really do is throw an error at indexscan start, and that doesn't seem all that helpful. But it probably wouldn't take much code either, if you're satisfied with that answer. (I'm envisioning adding a version ID to the GIN metapage and then checking that before proceeding with a full-index scan.) I'd be satisfied with that answer. It at least makes it a lot more clear when you've got a problem. If this were a more common scenario, I'd probably advocate for a better solution, but the one you propose seems adequate given the frequency of the problem as you describe it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fixing GIN for empty/null/full-scan cases
On 1/4/11 1:49 PM, Tom Lane wrote: I don't think it's really worth the trouble. The GIN code has been broken for these types of queries since day one, and yet we've had only maybe half a dozen complaints about it. Moreover there's no practical way to avoid trying to use the index, since in many cases the fact that a query requires a full-index scan isn't determinable at plan time. Actually, there's been a *lot* of complaining about the GIN issues. It's just that most of that complaining doesn't reach -hackers. The common pattern I've seen in our practice and on IRC is:

1) user has GiST indexes
2) user tries converting them to GIN
3) user gets full index scan errors
4) user switches back and gives up

I agree that backwards compatibility should not be a priority; it is sufficient to tell users to reindex. For one thing, anyone who *is* using GIN presently will have written their application code to avoid full index scans. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
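The failure pattern at step 3 can be reproduced with a sketch along these lines (table, index, and data are invented; before 9.1 the final query fails with an error like "GIN indexes do not support whole-index scans"):

```sql
CREATE TABLE docs (tags int[]);
CREATE INDEX docs_tags_gin ON docs USING gin (tags);
INSERT INTO docs VALUES ('{1,2}'), ('{}'), (NULL), ('{NULL}');

-- The last three rows get no GIN entries at all before 9.1, so a
-- "contained in" search, which must consider them, needs a full-index
-- scan and errors out instead of returning '{1,2}' and '{}'.
SELECT * FROM docs WHERE tags <@ '{1,2,3}';
```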
Re: [HACKERS] Sync Rep Design
I was just thinking that you could prepend a reject line at the right place in the file. Hmmm, that's worth thinking about. How do others feel about this? -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER EXTENSION UPGRADE patch v1
On Tue, Jan 04, 2011 at 08:31:19PM +0100, Dimitri Fontaine wrote: David Fetter da...@fetter.org writes: Do you plan to have ALTER EXTENSION ... UPGRADE TO VERSION ... , or the more general, ALTER EXTENSION ... ALTER VERSION TO ... ? Well why not, but I'm not sure I understand what you have in mind here. One could imagine that an extension was updated more quickly than PostgreSQL major versions come out, or at least not at the exact same time. I don't exactly see how to install more than one version on the server, at the filesystem level, with the current infrastructure. It's probably not worth worrying about the multiple version issue for 9.1, but it probably *is* for 9.2. The way debian solves it for multi-versions packages, such as PostgreSQL for example, is to add the (major) version number in the package name, so that you have independent packages such as: postgresql-8.3 postgresql-8.4 postgresql-9.0 Are you thinking about doing something similar with extensions? Vaguely. See above. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Range Types
On 1/4/11 10:18 AM, Jeff Davis wrote: The main drawback here is that only a select group of people will be defining discrete range types at all, because it would require them to define a function first. Perhaps that's for the best, because (as Tom pointed out) we don't want someone using floats and then specifying a granule of '0.01'. Frankly, I'm still not convinced that *anyone* will really need discrete range types -- as opposed to continuous range types, which I'm already using in production a la temporal. So I'm completely OK with making discrete range types hard to use, as long as continuous range types are easy to use. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] system views for walsender activity
On Wed, Jan 5, 2011 at 04:56, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: I think pg_stat_replication is better than pg_stat_standby, but I'm still not convinced we shouldn't go with the obvious pg_stat_walsenders. How about pg_stat_replication_activity? If I understood correctly, the view is similar to pg_stat_activity, but displays information about connected standbys rather than regular backends. It's a bit of a long name, though. The view currently discussed is for *master* servers. We might have some views for replication activity in *standby* servers. So, I'd like to choose consistent and symmetric names for them -- for example, pg_stat_replication_master and pg_stat_replication_standby. I expected they would be pg_stat_wal_[senders|receivers] when I was writing the patch, but any other better names are welcome. However, we have the max_wal_senders GUC parameter, so users still need to know what wal_senders is. -- Itagaki Takahiro -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] can shared cache be swapped to disk?
On Tue, Jan 4, 2011 at 2:52 PM, Martijn van Oosterhout klep...@svana.org wrote: On Tue, Jan 04, 2011 at 09:51:05AM -0800, Jeff Janes wrote: Correct. The kernel ignores locking requests because it's a great way to DOS a machine. For example, mlock() of large blocks of memory is also not permitted for similar reasons. Does it ignore such requests in general, or only under certain situations? If the latter, do you know what those situations are? Well, not in general, but for shared memory it's ignored (not sure about if you're root). It used to be that shared memory was always locked, which sounds like a great idea, until people started abusing it. So now shared memory is on ethe same footing as other memory. Not sure where I read this, I know it came up several years ago. I think it changed back in 2.0 times. RLIMIT_MEMLOCK exists, it has a small default hard limit, and only root can increase that. If root has gone out of its way to grant the postgres user a higher limit, the kernel should respect that, at least up until the situation become truly desperate. Like I said, not sure about how it works for root. I mean that root can increase it for *other* users. I've done the experiment on kernel 2.6.31.5, as a non-root user, and it looks like the kernel is respecting the SHM_LOCK. On a 2GB machine I set shared_buffers to 1200MB and run pgbench -S with scale of 80, and run it until it seems to be fully cached. (top doesn't distinguish between memory that has been requested but never accessed, versus memory that has been accessed and then truly swapped out to disk. So unless you first let it run to steady-state before applying pressure, it is hard to interpret the results.) Then I start up a Perl program that just perpetually loops through ~1.1 GB of memory. If I SHM_LOCK postgres's memory, then only perl starts swapping. If I don't lock it, then both perl and postgres start swapping. 
Obviously there is a lot of territory not covered here, but it looks like locking memory is respected in general. It still doesn't let you benefit from using shared_buffers that are a large portion of RAM (other than in silly test cases), and I don't know why that is, but I'm now pretty sure it isn't due to swapping out the shared memory. Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sync Rep Design
Simon Riggs wrote: Based upon that series of conversations, I've reworked the design so that there is (currently) only a single standby offering sync rep at any one time. Other standbys can request to be sync standbys but they only become the sync standby if the first one fails. Which was simple to do and bridges the challenges of an exactly identified sync standby and the fragility of too closely specifying the config. That seems like a good enough starting point to cover a lot of cases. Presuming the "two servers each at two sites" config that shows up in a lot of these discussions, people in the "I need sync to a remote spot" camp can get that, and if that site is unavailable for long enough to be kicked out they'll smoothly degrade to sync on a secondary local copy. And those who want high performance local sync and best-effort for the remote site can configure for that too, and if the local secondary dies they'll just degrade to the slow remote commits. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: new patch of MERGE (merge_204) a question about duplicated ctid
Kevin Grittner wrote: Greg Smith wrote: I could see shipping this with the automatic heavy LOCK TABLE in there. How would you handle or document behavior in REPEATABLE READ isolation? The lock doesn't do much good unless you acquire it before you get your snapshot, right? Hand-wave and hope you offer a suggested implementation? I haven't gotten to thinking about this part just yet--am still assimilating toward a next move after the pleasant surprise that this is actually working to some degree now. You're right that the high-level idea of "just lock the table" has to be mapped onto exact snapshot mechanics and pitfalls before moving in that direction will get very far. I'm probably not the right person to answer just exactly how feasible that is this week though. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books
Re: [HACKERS] Re: new patch of MERGE (merge_204) a question about duplicated ctid
David Fetter wrote: How about implementing an UPSERT command as "take the lock, do the merge"? That way, we'd have both the simplicity for the simpler cases and a way to relax consistency guarantees for those who would like to do so. The main argument against is that this path leads to a permanent non-standard wart to support forever, just to work around what should be a short-term problem. And I'm not sure whether reducing the goals to only this actually improves the ability to ship something in the near term too much. Many of the hard problems people are bothered by don't go away; it just makes the decision about which side of the speed/complexity trade-off you're more interested in more obvious. What I've been advocating is making that decision go away altogether by only worrying about the "simple to use and slow" path for now, but that's a highly debatable viewpoint and I appreciate the resistance to it, if it's possible to do at all. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] making an unlogged table logged
Somebody asked about this on Depesz's blog today, and I think it's come up here before too, so I thought it might be worth my writing up a few comments on this. I don't think I'm going to have time to work on this any time soon, but if someone else wants to work up a patch, I'm game to review. I think it'd clearly be a good feature.

Generally, to do this, it would be necessary to do the following things (plus anything I'm forgetting):

1. Take an AccessExclusiveLock on the target table. You might think that concurrent selects could be allowed, but I believe that's not the case. Read on.

2. Verify that there are no foreign keys referencing other unlogged tables, because if that were the case then after the change we'd have a permanent table referencing an unlogged table, which would violate referential integrity. (Note that unlogged referencing permanent is OK, but permanent referencing unlogged is a no-no, so what matters when upgrading is outbound foreign keys.)

3. Write out all shared buffers for the target table, and drop them. This ensures that there are no buffers floating around for the target relation that are marked BM_UNLOGGED, which would be a bad thing. Or maybe it's possible to just clear the BM_UNLOGGED flag, instead of dropping them. This is the step that makes me think we need an access exclusive lock - otherwise, somebody else might read in a buffer and, seeing that the relation is unlogged (which is true, since we haven't committed yet), mark it BM_UNLOGGED.

4. fsync() any segments of the target relation - of any fork except the init fork - that might have dirty pages not on disk.

5. Arrange for the appropriate file deletions at commit or abort, by updating pendingDeletes. On commit, we want to delete the init fork for the table and all its indexes. 
On abort, we want to delete everything else, but only for pretend; that is, the abort record should reflect the deletions since they'll need to happen on any standbys, but we shouldn't actually perform them on the master since we don't want to obliterate the contents of the table for no reason. There's a subtle problem here I'm not quite sure how to deal with: what happens if we *crash* without writing an abort record? It seems like that could leave a stray file around on a standby, because the current code only cleans things up on the standby at the start of recovery; to make this bullet-proof, I suppose it'd need to repeat that every time a crash happens on the master, but I don't know how to do that. Note also that if wal_level is minimal, then we need only worry about the commit case; the abort case can be a no-op.

6. If wal_level != minimal, XLOG every page of every fork except the init fork, for both the table and the associated indexes. (Note that this step also requires an AccessExclusiveLock rather than some weaker lock, because of the arbitrary rule that only AccessExclusiveLocks are sent to standbys. If we held only ShareRowExclusiveLock on the master, for example, a Hot Standby backend might read the table while it's only been half-copied.)

7. Update pg_class.relpersistence from 'u' to 'p', for both the table and the associated indexes.

Going the other direction ought to be possible too, although it seems somewhat less useful. For that, you'd need to flip around the check in step #2 (i.e. check for a reference FROM a permanent table), perform step #3, skip step #4, do step #5 backwards (create and log init forks, arranging for them to be removed on abort - this too has an issue with crashes that don't leave abort records behind - and arrange for the rest of the forks to be removed on commit on any standby without doing it on the master), skip step #6, and do step #7 backwards. 
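The unlogged-to-logged direction of the steps above can be outlined in pseudocode (all function names here are illustrative, not actual PostgreSQL routines):

```
# Hypothetical outline of steps 1-7 for making an unlogged table logged
make_table_logged(rel):
    lock(rel, AccessExclusiveLock)                          # step 1
    for fk in outbound_foreign_keys(rel):                   # step 2
        if is_unlogged(fk.referenced_table):
            error("permanent table may not reference unlogged table")
    flush_and_drop_buffers(rel)       # clear BM_UNLOGGED state, step 3
    fsync_all_forks_except_init(rel)                        # step 4
    schedule_delete_at_commit(init_fork(rel))               # step 5, plus indexes
    if wal_level != minimal:                                # step 6
        for page in every_fork_except_init(rel):
            xlog_full_page(page)
    set_relpersistence(rel, 'p')                            # step 7, plus indexes
```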
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SQL/MED - core functionality
On Sat, Jan 1, 2011 at 11:54 PM, Robert Haas robertmh...@gmail.com wrote: Hanada-san, can you rebase the fdw_scan patch over what I committed and post an updated version ASAP? It'd be better for Heikki or Tom to work on that part of this than me, since they have a better understanding of the executor than I do, but I'm sure that they will not want to work from the previously posted patches as the changes I made are fairly extensive. Is anyone working on this? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming replication as a separate permissions
On Mon, Jan 3, 2011 at 5:50 PM, Magnus Hagander mag...@hagander.net wrote: On Mon, Jan 3, 2011 at 17:23, Robert Haas robertmh...@gmail.com wrote: On Mon, Jan 3, 2011 at 11:20 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On the other hand, the REPLICATION privilege is denying you the right to perform an operation *even though you already are authenticated as a superuser*. I don't think there's anywhere else in the system where we allow a privilege to non-super-users but deny that same privilege to super-users, and I don't think we should be starting now. You might want to reflect on rolcatupdate a bit before asserting that there are no cases where privileges are ever denied to superusers. Oh, huh. I wasn't aware of that. However, that precedent would suggest that the default should be to grant the replication bit to superusers. Yes it would. Just to be clear: are we saying that CREATE ROLE foo SUPERUSER should grant both superuser and replication, as well as the default postgres user also having replication as well? I think that's what we're saying. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
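The behavior being agreed on might look like this at the SQL level (hedged sketch: role names are invented and the final 9.1 syntax may differ):

```sql
-- Superusers would implicitly carry the replication privilege,
-- matching the rolcatupdate-style default discussed above.
CREATE ROLE admin SUPERUSER;

-- A dedicated non-superuser role for streaming replication connections.
CREATE ROLE walreader REPLICATION LOGIN;

-- An ordinary role: no replication connections allowed.
CREATE ROLE app_user LOGIN;
```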
Re: [HACKERS] making an unlogged table logged
On Tue, Jan 4, 2011 at 7:41 PM, Robert Haas robertmh...@gmail.com wrote: Somebody asked about this on Depesz's blog today, and I think it's come up here before too, so I thought it might be worth my writing up a few comments on this. I don't think I'm going to have time to work on this any time soon, but if someone else wants to work up a patch, I'm game to review. I think it'd clearly be a good feature. Generally, to do this, it would be necessary to do the following things (plus anything I'm forgetting): 1. Take an AccessExclusiveLock on the target table. You might think that concurrent selects could be allowed, but I believe that's not the case. Read on. 2. Verify that there are no foreign keys referencing other unlogged tables, because if that were the case then after the change we'd have a permanent table referencing an unlogged table, which would violate referential integrity. (Note that unlogged referencing permanent is OK, but permanent referencing unlogged is a no-no, so what matters when upgrading is outbound foreign keys.) 3. Write out all shared buffers for the target table, and drop them. This ensures that there are no buffers floating around for the target relation that are marked BM_UNLOGGED, which would be a bad thing. Or maybe it's possible to just clear the BM_UNLOGGED flag, instead of dropping them. This is the step that makes me think we need an access exclusive lock - otherwise, somebody else might read in a buffer and, seeing that the relation is unlogged (which is true, since we haven't committed yet), mark it BM_UNLOGGED. 4. fsync() any segments of the target relation - of any fork except that init fork - that might have dirty pages not on disk. 5. Arrange for the appropriate file deletions at commit or abort, by updating pendingDeletes. On commit, we want to delete the init fork for the table and all its indexes. 
On abort, we want to delete everything else, but only for pretend; that is, the abort record should reflect the deletions since they'll need to happen on any standbys, but we shouldn't actually perform them on the master since we don't want to obliterate the contents of the table for no reason. There's a subtle problem here I'm not quite sure how to deal with: what happens if we *crash* without writing an abort record? It seems like that could leave a stray file around on a standby, because the current code only cleans things up on the standby at the start of recovery; to make this bullet-proof, I suppose it'd need to repeat that every time a crash happens on the master, but I don't know how to do that. Note also that if wal_level is minimal, then we need only worry about the commit case; the abort case can be a no-op. 6. If wal_level != minimal, XLOG every page of every fork except the init fork, for both the table and the associated indexes. (Note that this step also requires an AccessExclusiveLock rather than some weaker lock, because of the arbitrary rule that only AccessExclusiveLocks are sent to standbys. If we held only ShareRowExclusiveLock on the master, for example, a Hot Standby backend might read the table while it's only been half-copied.) 7. Update pg_class.relpersistence from 'u' to 'p', for both the table and the associated indexes. Going the other direction ought to be possible too, although it seems somewhat less useful. For that, you'd need to flip around the check in step #2 (i.e. check for a reference FROM a permanent table), perform step #3, skip step #4, do step #5 backwards (create and log init forks, arranging for them to be removed on abort - this too has an issue with crashes that don't leave abort records behind); and arrange for the rest of the forks to be removed on commit on any standby without doing it on the master), skip step #6, and do step #7 backwards. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

A couple thoughts:

1. Could making a table logged use a non-exclusive lock if the ALTER is allowed to take a full checkpoint?

2. Unlogged to logged has a giant use case.

3. In MySQL I have had to ALTER tables to engine BLACKHOLE because they held data that was not vital, but the server was out of I/O. Going logged -> unlogged has a significant place, I think.

-- Rob Wultsch wult...@gmail.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
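[Editor's sketch: the verification in Robert's step 2 - rejecting the conversion when the table has an outbound foreign key into a table that is still unlogged - can be expressed as a catalog query. The table name and the exact query are illustrative, not from any posted patch:

    -- Hypothetical check for step 2: list outbound foreign keys from the
    -- table being made logged whose referenced table is still unlogged.
    -- Any rows returned mean the conversion must be rejected, since it
    -- would leave a permanent table referencing an unlogged one.
    SELECT con.conname,
           ref.relname AS referenced_table
    FROM pg_constraint con
    JOIN pg_class ref ON ref.oid = con.confrelid
    WHERE con.conrelid = 'my_unlogged_table'::regclass  -- invented name
      AND con.contype = 'f'              -- foreign key constraints only
      AND ref.relpersistence = 'u';      -- referenced table is unlogged

relpersistence here is the 9.1 pg_class column Robert mentions in step 7.]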
Re: [HACKERS] SQL/MED - core functionality
On Tue, 4 Jan 2011 22:16:26 -0500 Robert Haas robertmh...@gmail.com wrote: On Sat, Jan 1, 2011 at 11:54 PM, Robert Haas robertmh...@gmail.com wrote: Hanada-san, can you rebase the fdw_scan patch over what I committed and post an updated version ASAP? It'd be better for Heikki or Tom to work on that part of this than me, since they have a better understanding of the executor than I do, but I'm sure that they will not want to work from the previously posted patches, as the changes I made are fairly extensive. Is anyone working on this?

Sorry for the late reply; I'm working on this item. I will post a rebased fdw_scan patch soon.

Regards,
-- Shigeru Hanada
Re: [HACKERS] making an unlogged table logged
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

 A couple thoughts: 1. Could making a table logged use a non-exclusive lock if the ALTER is allowed to take a full checkpoint?

If possible, that would certainly be better. If the bgwriter is doing what it is supposed to, it would be relatively painless.

 2. Unlogged to logged has a giant use case.

Agreed.

JD

-- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579 Consulting, Training, Support, Custom Development, Engineering http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
Re: [HACKERS] SQL/MED - core functionality
On Sat, 1 Jan 2011 23:54:05 -0500 Robert Haas robertmh...@gmail.com wrote: On Mon, Dec 27, 2010 at 10:16 PM, Robert Haas robertmh...@gmail.com wrote: On Sat, Dec 25, 2010 at 11:52 PM, Robert Haas robertmh...@gmail.com wrote: I'm working on getting a first chunk of this committed. OK, here's the patch. I've now committed a version of this with a bunch of further revisions, corrections, and cleanup. It looks to me as though this patch was written based on the 9.0 code and not thoroughly updated for some of the 9.1 changes, but I think I cleaned most of that up. With a patch of this size, I am sure there are a few things I overlooked, so please point 'em out and I'll try to fix them promptly.

While testing the commit, I found that CREATE FOREIGN TABLE requires an unnecessary USAGE privilege on the FOREIGN DATA WRAPPER. The SQL/MED standard requires only USAGE on the SERVER, as follows:

    1) If the foreign table definition is contained in an SQL-client module, then the enabled authorization identifiers shall include A.
    2) The applicable privileges shall include the USAGE privilege on the foreign server identified by the foreign server name.
    3) Additional privileges, if any, necessary to execute the foreign table definition are implementation-defined.

Sorry, this problem comes from the original patch. OTOH, the documentation of this behavior on the GRANT page is correct.

Regards,
-- Shigeru Hanada

20110105-no_fdw_check.patch Description: Binary data
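[Editor's sketch: under the standard text quoted above, a server-level grant alone should entitle a role to run CREATE FOREIGN TABLE. Server, wrapper, and role names here are invented for illustration:

    -- Per the SQL/MED rule quoted above, this should be sufficient:
    GRANT USAGE ON FOREIGN SERVER srv_remote TO bob;

    -- The extra check the patch removes would additionally have demanded:
    --   GRANT USAGE ON FOREIGN DATA WRAPPER some_fdw TO bob;

]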
Re: [HACKERS] WIP: Range Types
On Tue, 2011-01-04 at 16:45 -0800, Josh Berkus wrote: On 1/4/11 10:18 AM, Jeff Davis wrote: The main drawback here is that only a select group of people will be defining discrete range types at all, because it would require them to define a function first. Perhaps that's for the best, because (as Tom pointed out) we don't want someone using floats and then specifying a granule of '0.01'. Frankly, I'm still not convinced that *anyone* will really need discrete range types

Well, *need* is a standard that can never be met. But with something like a date range, it's very possible that a discrete version matches the real-world problem more closely than a continuous one. If you use only continuous ranges, then be careful to stick with exactly one convention, or you will likely get wrong results (I think this point has already been established). That sounds easy, but consider:

* If you want to know whether two ranges are adjacent (a common requirement), then you need to use [ ) or ( ].
* If you need to map a single point into a range, the only thing that makes sense is [ ].
* If your query contains current_date, you'll probably want ranges that are either in ( ] or [ ] form.
* If you are mixing data sets, they may use different conventions.

You can work around all of these problems by making the query more complex (and more error-prone). But I wouldn't like to give up on discrete ranges for types where it really makes sense (dates, IPs, integers).

Regards, Jeff Davis
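[Editor's sketch: to make the adjacency point concrete, here is how it looks with the range syntax that eventually shipped in core - daterange and the -|- adjacency operator are from the released feature, not necessarily the WIP patch under discussion:

    -- With the half-open '[)' convention, adjacency reduces to testing
    -- whether one range's upper bound equals the other's lower bound:
    SELECT daterange('2011-01-01', '2011-02-01', '[)')
       -|- daterange('2011-02-01', '2011-03-01', '[)');   -- adjacent

    -- With closed-closed '[]' ranges over a continuous domain there is
    -- no shared boundary point, so an adjacency test cannot be written
    -- without assuming a granule -- which is exactly the discrete case.

]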
Re: [HACKERS] We need to log aborted autovacuums
Josh Berkus wrote: I've been trying to diagnose in a production database why certain tables never get autovacuumed despite having a substantial % of updates. The obvious reason is locks blocking autovacuum from vacuuming the table ...

Missed this discussion when it popped up, but I have plenty to say about it now. What I do here is look for such anomalies using pg_stat_user_tables: whether the dead rows number has exceeded the vacuum threshold. That comparison is logged in the code at DEBUG3:

    elog(DEBUG3, "%s: vac: %.0f (threshold %.0f), anl: %.0f (threshold %.0f)",
         NameStr(classForm->relname),
         vactuples, vacthresh, anltuples, anlthresh);

But a rough computation isn't too hard to derive in a report, so long as you haven't customized per-table parameters. As you suggested in your message, the really bad cases here stick out a whole lot. If you pay the slightest amount of attention to the dead row percentages, they jump right out at you. This all works easily on any version back to 8.3. Not having as much relevant data stored in pg_stat_user_tables makes the problem cases less obvious to spot in older versions. If I start seeing these badly maintained tables and suspect locking is getting in the way, I then dump traces from pg_locks + pg_stat_activity often enough that I can estimate how often someone has an interfering lock and what they're doing.

Should the log level on this message go up from DEBUG3? I could see rewriting it so that it logs at DEBUG1 instead when Log_autovacuum_min_duration is set *and* the trigger threshold is crossed, and at DEBUG3 the rest of the time. Given that you can derive this with a bit of work in userland, I don't see this even being justified as an INFO or LOG level message. Anytime I can script a SQL-level monitor for something that's easy to tie into Nagios or something, I greatly prefer that to log file scraping anyway.
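[Editor's sketch: the rough userland computation Greg describes might look like the following, assuming the default autovacuum settings (threshold 50, scale factor 0.2) and no per-table overrides; the query itself is illustrative:

    -- Approximate the autovacuum trigger test from pg_stat_user_tables:
    --   vacthresh = autovacuum_vacuum_threshold
    --             + autovacuum_vacuum_scale_factor * reltuples
    SELECT s.relname,
           s.n_dead_tup,
           round(50 + 0.2 * c.reltuples) AS vac_threshold,
           s.n_dead_tup > 50 + 0.2 * c.reltuples AS over_threshold
    FROM pg_stat_user_tables s
    JOIN pg_class c ON c.oid = s.relid
    ORDER BY s.n_dead_tup DESC;

Tables persistently showing over_threshold = true with no recent last_autovacuum are the anomalies worth investigating.]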
What I'd like to do is add some logging code to autovacuum.c so that if log_autovacuum is any value other than -1, failure to vacuum due to locks gets logged. Does this make sense?

The general idea is interesting and probably more productive for the situation you theorize is happening than messing with the logging discussed above. But that's not where the code needs to go--the lock isn't taken until much further down the function call stack. The attached quickie (and only tested for compilation) patch probably does what you want here. Since this would eliminate the messy follow-up step I sometimes have gone through, dumping pg_locks data to confirm or rule out locking issues messing with AV processing, I can see some potential that it may have simplified situations I've run into in the past. And it's not out of line with the logging verbosity of similar failure mode tests that follow it. Right now failure to acquire a lock is just not considered a log-worthy issue, and I agree that it's worth considering whether it should be. If you could gather more info on whether this logging catches the problem cases you're seeing, that would really be the right test for the patch's usefulness. I'd give you solid 50/50 odds that you've correctly diagnosed the issue, and knowing for sure would make advocating for this logging a pretty easy sell to me at least.
-- Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support   www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books

diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 2f68df4..2124e25 100644
*** a/src/backend/commands/vacuum.c
--- b/src/backend/commands/vacuum.c
*** vacuum_rel(Oid relid, VacuumStmt *vacstm
*** 851,856 ****
--- 851,864 ----
  	{
  		PopActiveSnapshot();
  		CommitTransactionCommand();
+
+ 		if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
+ 		{
+ 			ereport(INFO,
+ 					(errmsg("skipping \"%s\" --- cannot open or obtain lock",
+ 							RelationGetRelationName(onerel))));
+ 		}
+
  		return;
  	}
Re: [HACKERS] Re: new patch of MERGE (merge_204) a question about duplicated ctid
On Tue, Jan 04, 2011 at 09:27:10PM -0500, Greg Smith wrote: David Fetter wrote: How about implementing an UPSERT command as take the lock, do the merge? That way, we'd have both the simplicity for the simpler cases and a way to relax consistency guarantees for those who would like to do so.

The main argument against is that that path leads to a permanent non-standard wart to support forever, just to work around what should be a short-term problem. And I'm not sure whether reducing the goals to only this actually improves the ability to ship something in the near term too much.

I think I haven't communicated clearly what I'm suggesting, which is that we ship with both an UPSERT and a MERGE, the former being ugly, crude and simple, and the latter festooned with dire warnings about isolation levels and locking. If shipping with a wart, as you term it, isn't acceptable, then I'd advocate going with just MERGE and documenting it inside and out, including one or more clearly written UPSERT and/or REPLACE INTO recipes.

Cheers, David.

-- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
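[Editor's sketch: one of the classic UPSERT recipes David alludes to is the retry loop shown in the PostgreSQL documentation; this is adapted from that well-known example, with invented table and column names:

    -- Retry-loop UPSERT in PL/pgSQL.  The counters table with columns
    -- key (unique) and n is invented for illustration.
    CREATE FUNCTION upsert_counter(k text, delta integer) RETURNS void AS $$
    BEGIN
        LOOP
            -- First try to update an existing row.
            UPDATE counters SET n = n + delta WHERE key = k;
            IF found THEN
                RETURN;
            END IF;
            -- Not there: try to insert.  If another session inserted the
            -- same key concurrently, catch the error and retry the UPDATE.
            BEGIN
                INSERT INTO counters(key, n) VALUES (k, delta);
                RETURN;
            EXCEPTION WHEN unique_violation THEN
                -- Do nothing, and loop to try the UPDATE again.
            END;
        END LOOP;
    END;
    $$ LANGUAGE plpgsql;

Note the recipe is only safe for the single-unique-key case, which is part of why it stays a recipe rather than a general MERGE.]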