Re: [HACKERS] Vacuum rate limit in KBps

2012-01-23 Thread Benedikt Grundmann
On 19/01/12 17:39, Greg Smith wrote:
 On 1/19/12 1:10 PM, Robert Haas wrote:
 I have to say that I find that intensely counterintuitive.  The
 current settings are not entirely easy to tune correctly, but at least
 they're easy to explain.
 
 If there's anyone out there who has run a larger PostgreSQL database
 and not at some point been extremely frustrated with how the current
 VACUUM settings are controlled, please speak up and say I'm wrong
 about this. I thought it was well understood the UI was near unusably
 bad, it just wasn't obvious what to do about it.
 
We are frustrated, but mostly our frustration is not about the
somewhat inscrutable knobs but about the inscrutable meters, or
lack thereof.

Postgres (auto or manual, for that matter) vacuuming and analyzing
is essentially a performance tuning problem without a good way to
measure the current performance; the fact that the knobs to turn
are confusing as well is secondary.

What I think is missing is a clear way to know if you are vacuuming 
(and analyzing) enough, and how much you are paying for that.  

At the moment we are basically changing the knobs blindly based on
some back-of-the-envelope calculations and hearsay.  Then sometimes,
months later, we find out that oops, we haven't been analyzing enough
and that's why on that particular table the planner is now picking
a bad plan.

What I want is for that page

http://www.postgresql.org/docs/8.4/static/routine-vacuuming.html

to start with "Here is how you know if you are vacuuming enough..."

In an ideal world one would like some meter in a statistics table
or similar that returns a percentage: 100% means just enough, 50%
means you have to double, 150% means 50% too much (i.e. wasted)...
But I could do with a boolean as well.  A complicated extension,
or even the recommendation to install 3 different extensions, would
be better than what is there right now, but only very barely.  Of
course a meter wouldn't tell you whether you would still keep up if
traffic doubled; for that you need a complicated calculation (or
you just keep looking at the meter and adjusting).

But at the moment there is no such meter (at least I don't know
of it) and that is the actual problem.
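
For what it's worth, a crude approximation can be hacked up today from
pg_stat_user_tables, assuming the default autovacuum settings (threshold
50, scale factor 0.2).  This is only a sketch of the idea, not the
built-in meter I am asking for:

    SELECT relname,
           n_dead_tup,
           50 + 0.2 * n_live_tup AS autovacuum_trigger_point,
           round(n_dead_tup::numeric
                 / nullif(50 + 0.2 * n_live_tup, 0) * 100) AS pct_of_trigger,
           last_autovacuum,
           last_autoanalyze
    FROM pg_stat_user_tables
    ORDER BY pct_of_trigger DESC NULLS LAST
    LIMIT 20;

Tables that sit permanently far above 100% of their trigger point suggest
that autovacuum is not keeping up.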

My 2cents,

Bene




Re: [HACKERS] Inline Extension

2012-01-23 Thread Daniel Farina
On Sun, Jan 22, 2012 at 8:42 PM, Robert Haas robertmh...@gmail.com wrote:
 On Sun, Jan 22, 2012 at 3:20 PM, Daniel Farina dan...@heroku.com wrote:
 A few anecdotes do not constitute evidence, but it does look like
 some people pay attention to any additional versioning foothold they
 can get.

 Sure, but just because some people do it doesn't make it a good idea.

True, I did not mean to suggest that this is clearly the best
mechanism, only to express support for the general idea that people
are appreciative and willing to experiment with any way to make
version management better, and people who use those features are
*probably* going to try to use them correctly.  It's up to us to make
the right thing easy, too; otherwise I fear we will see too many lousy
version numbers creeping about in the wild.

 Dimitri's proposal was to neuter the pg_dump
 support that is the raison d'être of the extension mechanism.  That's
 clearly necessary if you don't want to end up with an unreloadable
 database, but it raises the question (which no one's really answered
 AFAICT) of what good the extension mechanism is without that feature.

Oh, no, non-reloadability is a really bad thing -- I'd say a pretty
bad deal-breaker -- but as Tom wrote, it does seem like it should
somehow be a tractable problem.

Is it such a bad idea to store the literal text of the extension's
pieces (control file and corresponding SQL program) in catalogs?  I'm
not sure if I understand why everyone is so interested in a special
interaction with the file system in some way.  By the same token,
extensions can be dumped in the literal syntax -- even the ones that
were installed from a file.

 There are certainly easier ways to remember a version number than
 building support for it into core.  If people create their own
 versioning mechanisms, they can create something which is tailor-made
 for their particular requirements, rather than relying on decisions
 which we made in core that may or may not be right for them (e.g. the
 lack of version ordering, or even that we have versions rather than
 some more general type of control table).

I understand the desire to avoid investing in something that is not
what people want.  However, in the interest of scoping the discussion
to the inline extension support, I can't seem to understand the
objection to supporting what is basically a different transport for
precisely the same semantic operation as having to ssh into a machine
and untar some files, except available without the bizarre
side-channel of ssh and file system mangling when one is loading
trustable operators, itself a raft of usability issues if one wishes
to enable more software reuse.

-- 
fdr



Re: [HACKERS] New replication mode: write

2012-01-23 Thread Fujii Masao
On Mon, Jan 23, 2012 at 4:58 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Mon, Jan 16, 2012 at 12:45 PM, Fujii Masao masao.fu...@gmail.com wrote:

 Please add the Apply mode.

 OK, will do.

 Done. Attached is the updated version of the patch.

 I notice that the Apply mode isn't fully implemented. I had in mind
 that you would add the latch required to respond more quickly when
 only the Apply pointer has changed.

 Is there a reason not to use WaitLatchOrSocket() in WALReceiver? Or
 was there another reason for not implementing that?

I agree that the feature you pointed out is useful for the Apply mode. But
I'm afraid that implementing it is not easy and would make the patch big
and complicated, so I didn't implement it in the Apply mode at first.

To make the walreceiver call WaitLatchOrSocket(), we would need to
merge it and libpq_select() into one function. But the former is a backend
function and the latter is a frontend one, and right now I have no good
idea for merging them cleanly.

If we send back a reply as soon as the Apply pointer changes, I'm
afraid quite a lot of reply messages would be sent frequently, which might
cause a performance problem. This is also one of the reasons why I didn't
implement the quick-response feature. To address this problem, we might
need to change the master so that it sends the Wait pointer to the standby,
and change the standby so that it replies whenever the Apply pointer
catches up with the Wait one. This could reduce the number of useless
replies from the standby about the Apply pointer.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Re: Add minor version to v3 protocol to allow changes without breaking backwards compatibility

2012-01-23 Thread Florian Weimer
* Ants Aasma:

 I had a run-in with this. JDBC driver versions < 9.0 with the default
 configuration resulted in silent data corruption. The fix was easy, but not
 having a useful error was what really bothered me.

Same for the DBD::Pg driver.

In this particular case, I knew that the change was coming and could
push updated Java and Perl client libraries well before the server-side
change hit our internal repository, but I really don't want to have to
pay attention to such details.

-- 
Florian Weimer       fwei...@bfk.de
BFK edv-consulting GmbH   http://www.bfk.de/
Kriegsstraße 100  tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99



Re: [HACKERS] New replication mode: write

2012-01-23 Thread Simon Riggs
On Mon, Jan 23, 2012 at 9:02 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Mon, Jan 23, 2012 at 4:58 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Mon, Jan 16, 2012 at 12:45 PM, Fujii Masao masao.fu...@gmail.com wrote:

 Please add the Apply mode.

 OK, will do.

 Done. Attached is the updated version of the patch.

 I notice that the Apply mode isn't fully implemented. I had in mind
 that you would add the latch required to respond more quickly when
 only the Apply pointer has changed.

 Is there a reason not to use WaitLatchOrSocket() in WALReceiver? Or
 was there another reason for not implementing that?

 I agree that the feature you pointed out is useful for the Apply mode. But
 I'm afraid that implementing it is not easy and would make the patch big
 and complicated, so I didn't implement it in the Apply mode at first.

 To make the walreceiver call WaitLatchOrSocket(), we would need to
 merge it and libpq_select() into one function. But the former is a backend
 function and the latter is a frontend one, and right now I have no good
 idea for merging them cleanly.

We can wait on the socket wherever it comes from. poll/select doesn't
care how we got the socket.

So we just need a common handler that calls either the walreceiver or the
libpqwalreceiver function, as required, to handle the wakeup.
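
In other words, something like this (a rough sketch with made-up names,
using the new WL_* latch API, and glossing over the backend/frontend
split that makes this interesting):

    #include "postgres.h"
    #include "storage/latch.h"
    #include "libpq-fe.h"

    static void
    wait_for_wal_or_latch(PGconn *conn, Latch *latch, long naptime_ms)
    {
        int rc = WaitLatchOrSocket(latch,
                                   WL_LATCH_SET | WL_SOCKET_READABLE |
                                   WL_TIMEOUT,
                                   PQsocket(conn),
                                   naptime_ms);

        if (rc & WL_LATCH_SET)
        {
            ResetLatch(latch);
            /* Apply pointer moved: send a reply immediately. */
        }
        if (rc & WL_SOCKET_READABLE)
        {
            /* Incoming WAL or a keepalive: consume it as usual. */
        }
    }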


 If we send back a reply as soon as the Apply pointer changes, I'm
 afraid quite a lot of reply messages would be sent frequently, which might
 cause a performance problem. This is also one of the reasons why I didn't
 implement the quick-response feature. To address this problem, we might
 need to change the master so that it sends the Wait pointer to the standby,
 and change the standby so that it replies whenever the Apply pointer
 catches up with the Wait one. This could reduce the number of useless
 replies from the standby about the Apply pointer.

We send back one reply per incoming message. The incoming messages
don't know the request state, and checking that has a cost which I don't
think is an appropriate price to pay, since we only need this info when
the link goes quiet.

When the link goes quiet we still need to send replies if we have
apply mode, but we only need to send apply messages if the lsn has
changed because of a commit. That will considerably reduce the
messages sent so I don't see a problem.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Finer Extension dependencies

2012-01-23 Thread Hitoshi Harada
On Sat, Jan 21, 2012 at 9:20 AM, Dimitri Fontaine
dimi...@2ndquadrant.fr wrote:
 Hi,

 Thank you for reviewing this patch!

 Hitoshi Harada umi.tan...@gmail.com writes:
 The patch applies with one reject, which I could fix easily. The make
 check passed.

 Bitrot happens fast in this season… I will produce another version of the
 patch.

   Table "pg_catalog.pg_extension_feature"
    Column   | Type | Modifiers
 ------------+------+-----------
  extoid     | oid  | not null
  extfeature | name | not null
 Indexes:
     "pg_extension_feature_index" UNIQUE, btree (extoid, extfeature)
     "pg_extension_feature_oid_index" UNIQUE, btree (oid)

 * I'm not quite sure why pg_extension_feature_index needs the extoid column.

 That allows managing features per extension: you need to know which
 extension is providing which feature to be able to solve dependencies.

Do you mean you want a UNIQUE constraint via this index? I found that the
usage is to search for a feature by (only) its name, so I wondered whether
extoid is necessary at all.

 * I have a big question about adding a two-column catalog. I don't mind the
 actual number of columns, but if the table has only two columns, it
 implies the design may be bad. The only other two-column catalog is
 pg_largeobject_metadata.

 We need each feature to be a full PostgreSQL object so that we can use
 the dependency tracking.  That allows us to manage DROP EXTENSION foo and
 cascade to extensions that depend on feature(s) provided by foo.

I guess if we spend more time, we'll figure out what a feature actually
is, and then we will see what kind of columns/attributes are needed
to represent it. Although I agree we can add them later, again, this
may imply the design is premature. (It's OK if I am the only person
who thinks so.)

 Next, some questions:
 - Why is the finer dependency needed? Do you have a tangible example
 that struggles with the dependency granularity? As a developer of
 several extensions, I feel fine about the existing dependency on the
 extension as a whole.

 The problem is not yet very apparent only because extensions are very
 new. The main thing we address with this patch is depending on a feature
 that appeared while developing an extension or that gets removed down
 the line. It allows depending on features and avoids the need to compare
 version numbers and maintain a list of which version number provides
 which feature.

 This feature has been asked for by several extension users, beginning even
 before 9.1 got released.

 - What happens if DROP EXTENSION ... CASCADE? Does it work?

 It should, what happens when you try? :)

I just tried DROP EXTENSION now, and found it broken :(

db1=# create extension kmeans;
CREATE EXTENSION
db1=# drop extension kmeans;
ERROR:  cannot drop extension kmeans because extension feature kmeans
requires it
HINT:  You can drop extension feature kmeans instead.
db1=# drop extension kmeans cascade;
ERROR:  cannot drop extension kmeans because extension feature kmeans
requires it
HINT:  You can drop extension feature kmeans instead.

Am I missing something? I'm confused why this happens.


Thanks,
-- 
Hitoshi Harada



Re: [HACKERS] Finer Extension dependencies

2012-01-23 Thread Dimitri Fontaine
Hitoshi Harada umi.tan...@gmail.com writes:
     pg_extension_feature_index UNIQUE, btree (extoid, extfeature)
 Do you mean you want UNIQUE constraint by this index? I found the
 usage is to search feature by (only) its name, so I wondered if extoid
 is not necessary.

I guess you're right; that's something I just left in when I should
have cleaned it up.  We need to find which extension is providing which
feature, and we need feature names to be globally unique.  I'll remove
extoid from this index in the next revision of the patch.
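
In SQL terms, the uniqueness we actually need would then be just this
(a sketch; the real definition lives in the catalog headers):

    CREATE UNIQUE INDEX pg_extension_feature_name_index
        ON pg_extension_feature (extfeature);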

I'm not in a position to provide that next revision just now; it should
happen before the end of the week, though.

 I guess if we spend more time, we'll figure out what a feature actually
 is, and then we will see what kind of columns/attributes are needed
 to represent it. Although I agree we can add them later, again, this
 may imply the design is premature. (It's OK if I am the only person
 who thinks so.)

You might be right that a feature is more than just a unique name, but as
things are, that's its only useful property.

 - What happens if DROP EXTENSION ... CASCADE? Does it work?

 It should, what happens when you try? :)

 I just tried DROP EXTENSION now, and found it broken :(

 db1=# create extension kmeans;
 CREATE EXTENSION
 db1=# drop extension kmeans;
 ERROR:  cannot drop extension kmeans because extension feature kmeans
 requires it
 HINT:  You can drop extension feature kmeans instead.

Can you provide me the test case you've been using?  That looks like a
bug I need to fix, indeed (unless the problem lies in the test case,
which would mean I need to tighten things some more).

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] New replication mode: write

2012-01-23 Thread Fujii Masao
On Mon, Jan 23, 2012 at 6:28 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Mon, Jan 23, 2012 at 9:02 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Mon, Jan 23, 2012 at 4:58 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Mon, Jan 16, 2012 at 12:45 PM, Fujii Masao masao.fu...@gmail.com wrote:

 Please add the Apply mode.

 OK, will do.

 Done. Attached is the updated version of the patch.

 I notice that the Apply mode isn't fully implemented. I had in mind
 that you would add the latch required to respond more quickly when
 only the Apply pointer has changed.

 Is there a reason not to use WaitLatchOrSocket() in WALReceiver? Or
 was there another reason for not implementing that?

 I agree that the feature you pointed out is useful for the Apply mode. But
 I'm afraid that implementing it is not easy and would make the patch big
 and complicated, so I didn't implement it in the Apply mode at first.

 To make the walreceiver call WaitLatchOrSocket(), we would need to
 merge it and libpq_select() into one function. But the former is a backend
 function and the latter is a frontend one, and right now I have no good
 idea for merging them cleanly.

 We can wait on the socket wherever it comes from. poll/select doesn't
 care how we got the socket.

 So we just need a common handler that calls either the walreceiver or the
 libpqwalreceiver function, as required, to handle the wakeup.

I'm afraid I could not understand your idea. Could you explain it in
more detail?

 If we send back a reply as soon as the Apply pointer changes, I'm
 afraid quite a lot of reply messages would be sent frequently, which might
 cause a performance problem. This is also one of the reasons why I didn't
 implement the quick-response feature. To address this problem, we might
 need to change the master so that it sends the Wait pointer to the standby,
 and change the standby so that it replies whenever the Apply pointer
 catches up with the Wait one. This could reduce the number of useless
 replies from the standby about the Apply pointer.

 We send back one reply per incoming message. The incoming messages
 don't know the request state, and checking that has a cost which I don't
 think is an appropriate price to pay, since we only need this info when
 the link goes quiet.

 When the link goes quiet we still need to send replies if we have
 apply mode, but we only need to send apply messages if the lsn has
 changed because of a commit. That will considerably reduce the
 messages sent so I don't see a problem.

Do you mean to change the meaning of apply_location? Currently it indicates
the end + 1 of the last replayed WAL record, regardless of whether it's
a commit record or not. So too many replies can be sent per incoming
message, because it might contain many WAL records. Or do you mean to
update apply_location only when a commit record is replayed?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] pgstat wait timeout

2012-01-23 Thread pratikchirania
Hi,

Any ideas on this?




Re: [HACKERS] Online base backup from the hot-standby

2012-01-23 Thread Fujii Masao
On Fri, Jan 20, 2012 at 11:34 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Fri, Jan 20, 2012 at 12:54 PM, Fujii Masao masao.fu...@gmail.com wrote:
 Thanks for the review!

 On Fri, Jan 20, 2012 at 8:15 PM, Simon Riggs si...@2ndquadrant.com wrote:
 I'm looking at this patch and wondering why we're doing so many
 press-ups to ensure the full_page_writes parameter is on. This will still
 fail if you use a utility that removes the full page writes, but fail
 silently.

 I think it would be beneficial to explicitly check that all WAL
 records have full page writes actually attached to them until we
 achieve consistency.

 I agree that it's worth adding such a safeguard. That can be a self-contained
 feature, so I'll submit a separate patch for that, to keep each patch small.

 Maybe, but you mean do this now as well? Not sure I like silent errors.

If many people think the patch is not acceptable without such a safeguard,
I will do that right now. Otherwise, I'd like to take more time to do that,
i.e., add it to the 9.2dev Open Items list.

I've not come up with a good idea. An ugly idea is to keep track of all
replays of full-page writes for every buffer page (i.e., prepare a
1-bit-per-buffer-page table and set the specified bit to 1 when a full-page
write is applied), and then check whether a full-page write has already been
applied when replaying a normal WAL record... Do you have any better idea?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Inline Extension

2012-01-23 Thread Cédric Villemain
 Is it such a bad idea to store the literal text of the extension's
 pieces (control file and corresponding SQL program) in catalogs?  I'm
 not sure if I understand why everyone is so interested in a special
 interaction with the file system in some way.  By the same token,
 extensions can be dumped in the literal syntax -- even the ones that
 were installed from a file.

 There are certainly easier ways to remember a version number than
 building support for it into core.  If people create their own
 versioning mechanisms, they can create something which is tailor-made
 for their particular requirements, rather than relying on decisions
 which we made in core that may or may not be right for them (e.g. the
 lack of version ordering, or even that we have versions rather than
 some more general type of control table).

 I understand the desire to avoid investing in something that is not
 what people want.  However, in the interest of scoping the discussion
 to the inline extension support, I can't seem to understand the
 objection to supporting what is basically a different transport for
 precisely the same semantic operation as having to ssh into a machine
 and untar some files, except available without the bizarre
 side-channel of ssh and file system mangling when one is loading
 trustable operators, itself a raft of usability issues if one wishes
 to enable more software reuse.

Or with adminpack, and without ssh, but still with interaction with the
filesystem.  Filesystem read-write access is a pain for the DBA. There are
hacks possible to get rid of that (but not completely: mount -o ro
partitions, for example...).
I am in favor of being able to create extensions directly in plain SQL,
without file creation or access or system administrator privileges.
Why wouldn't we want that?!

-- 
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation



Re: [HACKERS] Inline Extension

2012-01-23 Thread Dimitri Fontaine
Tom Lane t...@sss.pgh.pa.us writes:
 On reflection it seems like this patch is simply offering the wrong
 solution for the problem.  I agree that it could be useful to install
 extensions without having direct access to the server's filesystem,
 but it doesn't seem to follow that we must lobotomize existing extension
 features in order to have that.  I pointed out earlier that you could
 get such functionality via contrib/adminpack, though people not
 unreasonably complained that that was pretty ugly and low-level.
 But couldn't we define some SQL-level operations to allow installing
 extension control and script files?

Yeah, that's what I was trying to do…

 Probably the worst issue with that is that in typical installations,
 the share/extension/ directory would be read-only to the server, and a
 lot of people might be uncomfortable with making it writable.  Not sure
 whether we should consider inventing another place to keep
 SQL-command-installed extensions, or just say if you want this
 functionality you have to make share/extension/ writable.

So I've been wondering about storing the script content in the catalogs,
but you would have to store all the update scripts too and that's
useless because you want to dump the current state of the system, which
pg_dump is doing just fine.

Back to using the file system on the server when handed the script
content over the protocol, we could get there with a new GUC telling the
server where to find and store “inline” extensions, right?

  extension_inline_directory = '/path/to/some/writable/place'

Then creating an extension would look in both the system extension
directory and the inline one, which covers dump and restore.  Creating
an inline extension means creating the .control and the .sql files in
the extension_inline_directory, then running the current code.

I can adapt the patch to this behavior this week.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



[HACKERS] database schema quality survey postgresql pitfalls

2012-01-23 Thread Fabien COELHO


Dear pgdevs,

I've just completed the final version of a survey of database schema quality in
open source software. The survey covers 512 projects which use MySQL and/or
PostgreSQL for storing their data. Automatic analyses are performed by querying
the information schema. Statistical validations are computed on the results.

Here are some findings of specific interest to the list.


About MySQL vs PostgreSQL usage:

 - MySQL is much more often used than PostgreSQL... not a surprise.

 - Even for projects which seem to support PostgreSQL, this support is often
   an afterthought and not necessarily functional. For instance, you may find
   MySQL-specific syntax in the PostgreSQL-specific scripts, which were
   clearly never tested.

 - Projects which use PostgreSQL statistically have better quality than
   projects which use MySQL. However, the difference is mostly due to
   issues with MySQL.

 - Projects which use PostgreSQL are more often maintained than projects
   with MySQL.


Some features or default behavior of PostgreSQL seem especially error-prone:

 - SERIAL attributes seem to be assumed to act automatically as a primary
   key, so that the primary key declaration is often forgotten (see the
   first example after this list).

   This suggests that:

   * the documentation should insist on the potential issue.

   * a WARNING should be displayed when SERIAL is used without an associated
     PRIMARY KEY, or possibly UNIQUE. It should be very rare to desire a
     SERIAL which is not a PK, so this is worth a warning.

 - when loading a schema definition from an SQL script, the default behavior
   of psql is to ignore errors and go on.

   This leads to failing declarations being silently ignored, because the
   ERROR is lost in the flow of WARNING and NOTICE messages. If a table is
   missing, the error will be detected because the project is not
   functional, but if a constraint is missing, it will just be lost. This
   occurs in about 10% of pg projects!

   In order to avoid this behavior, one must do a \set ON_ERROR_STOP 1
   at the beginning of the script (see the second example after this list).
   However, this is never done. Moreover, there is no simple way to trigger
   the safer behavior from the command line other than the rather long
   -v ON_ERROR_STOP=1.

   It seems to me that:

   * the documentation should suggest using an explicit stop-on-error
     setting in every script.

   * a psql -C foo.sql (check?) or equivalent short option would help?

   * The current client default verbosity is counterproductive for quality.
     It should be reduced to WARNING and above, but should *not* include
     NOTICE, which adds a lot of noise ignored by the user and hides more
     important messages. For instance, a PK implies an INDEX, a SERIAL
     implies a SEQUENCE, fine, but what is the point of telling it over and
     over? So I suggest choosing a default of client_min_messages = warning.
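
Two quick illustrations of the above (table and file names are of course
hypothetical):

    -- SERIAL alone does not create a primary key; the second form is
    -- what projects usually intended:
    CREATE TABLE t_wrong (id SERIAL, payload text);
    CREATE TABLE t_right (id SERIAL PRIMARY KEY, payload text);

    -- at the top of every schema script: stop on the first error and
    -- silence the NOTICE chatter
    \set ON_ERROR_STOP 1
    SET client_min_messages = warning;

The same stop-on-error behavior can be requested from the command line with
psql -v ON_ERROR_STOP=1 -f schema.sql mydb.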


Finally, some issues were found and already reported some time ago about
PostgreSQL's implementation of the information schema. For instance,
auto-generated constraint names are not unique as they should be, which
makes having a standard information schema a little bit pointless, as
querying it returns wrong results :-( The summary of the answer was "do not
use the information schema, or give unique names", which does not make much
sense for me, since I am analysing existing projects and most constraint
names are generated automatically by PostgreSQL.


For those interested in more details about the survey, a preprint of the paper
is available here:

http://www.cri.ensmp.fr/classement/doc/A-488.pdf

And the tool used for analysing the projects is available at:

http://coelho.net/salix/

Comments are welcome.

--
Fabien.



Re: [HACKERS] Inline Extension

2012-01-23 Thread Cédric Villemain
Le 23 janvier 2012 11:53, Dimitri Fontaine dimi...@2ndquadrant.fr a écrit :
 Tom Lane t...@sss.pgh.pa.us writes:
 On reflection it seems like this patch is simply offering the wrong
 solution for the problem.  I agree that it could be useful to install
 extensions without having direct access to the server's filesystem,
 but it doesn't seem to follow that we must lobotomize existing extension
 features in order to have that.  I pointed out earlier that you could
 get such functionality via contrib/adminpack, though people not
 unreasonably complained that that was pretty ugly and low-level.
 But couldn't we define some SQL-level operations to allow installing
 extension control and script files?

 Yeah, that's what I was trying to do…

 Probably the worst issue with that is that in typical installations,
 the share/extension/ directory would be read-only to the server, and a
 lot of people might be uncomfortable with making it writable.  Not sure
 whether we should consider inventing another place to keep
 SQL-command-installed extensions, or just say if you want this
 functionality you have to make share/extension/ writable.

 So I've been wondering about storing the script content in the catalogs,
 but you would have to store all the update scripts too and that's
 useless because you want to dump the current state of the system, which
 pg_dump is doing just fine.

 Back to using the file system on the server when handed the script
 content over the protocol, we could get there with a new GUC telling the
 server where to find and store “inline” extensions, right?

  extension_inline_directory = '/path/to/some/writable/place'

 Then creating an extension would look in both the system extension
 directory and the inline one, which covers dump and restore.  Creating
 an inline extension means creating the .control and the .sql files in
 the extension_inline_directory, then running the current code.

 I can adapt the patch to this behavior this week.

if we agree to have that per cluster, then it can be in the
$pgdata/pg_extension or something similar...

-- 
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation



Re: [HACKERS] Finer Extension dependencies

2012-01-23 Thread Hitoshi Harada
On Mon, Jan 23, 2012 at 2:00 AM, Dimitri Fontaine
dimi...@2ndquadrant.fr wrote:
 Hitoshi Harada umi.tan...@gmail.com writes:
 - What happens if DROP EXTENSION ... CASCADE? Does it work?

 It should, what happens when you try? :)

 I just tried DROP EXTENSION now, and found it broken :(

 db1=# create extension kmeans;
 CREATE EXTENSION
 db1=# drop extension kmeans;
 ERROR:  cannot drop extension kmeans because extension feature kmeans
 requires it
 HINT:  You can drop extension feature kmeans instead.

 Can you provide me the test case you've been using?  That looks like a
 bug I need to fix, indeed (unless the problem lies in the test case,
 which would mean I need to tighten things some more).

The test case is just above; createdb db1, then create and drop an
extension. The kmeans extension is on PGXN. I tried my small test
extension named ext1, which contains only one plpgsql function, and
created it then dropped it; the problem reproduced.

Thanks,
-- 
Hitoshi Harada



Re: [HACKERS] Speed dblink using alternate libpq tuple storage

2012-01-23 Thread Marko Kreen
On Sat, Jan 21, 2012 at 1:52 PM, Marc Mamin m.ma...@intershop.de wrote:
 
  c. Refine error handling of dblink.c. I think it preserves the
     previous behavior for column number mismatch and type
     conversion exception.

 Hello,

 I don't know if this covers the following issue. I just mention it in case
 you didn't notice it and would like to handle this rather cosmetic issue
 as well.

 http://archives.postgresql.org/pgsql-bugs/2011-08/msg00113.php

It is not relevant to this thread, but it does seem a good idea to implement.
It should be a simple matter of creating a handler that uses
dblink_res_error() to report the notice.

Perhaps you could create and submit the patch yourself?

For reference, here is the full flow in PL/Proxy:

1) PQsetNoticeReceiver:
https://github.com/markokr/plproxy-dev/blob/master/src/execute.c#L422
2) handle_notice:
https://github.com/markokr/plproxy-dev/blob/master/src/execute.c#L370
3) plproxy_remote_error:
https://github.com/markokr/plproxy-dev/blob/master/src/main.c#L82
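
A minimal sketch of such a handler, adapted to dblink (hedged: the receiver
name is made up, and where dblink would call dblink_res_error() this uses a
plain ereport() to stay self-contained):

    #include "postgres.h"
    #include "libpq-fe.h"

    /* Forward remote NOTICEs to the local client. */
    static void
    remote_notice_receiver(void *arg, const PGresult *res)
    {
        const char *msg = PQresultErrorField(res, PG_DIAG_MESSAGE_PRIMARY);

        ereport(NOTICE,
                (errmsg("remote notice: %s", msg ? msg : "(none)")));
    }

    /* ... installed once per connection: */
    /* PQsetNoticeReceiver(conn, remote_notice_receiver, NULL); */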

-- 
marko



Re: [HACKERS] Patch: Allow SQL-language functions to reference parameters by parameter name

2012-01-23 Thread Hitoshi Harada
On Thu, Jan 19, 2012 at 1:58 AM, Hitoshi Harada umi.tan...@gmail.com wrote:
 On Wed, Jan 18, 2012 at 1:11 AM, Hitoshi Harada umi.tan...@gmail.com wrote:
 On Sat, Jan 14, 2012 at 8:10 AM, Matthew Draper matt...@trebex.net wrote:

 I just remembered to make time to advance this from WIP to proposed
 patch this week... and then worked out I'm rudely dropping it into the
 last commitfest at the last minute. :/

 The patch applies cleanly against master but compiles with warnings:
 functions.c: In function ‘prepare_sql_fn_parse_info’:
 functions.c:212: warning: unused variable ‘argnum’
 functions.c: In function ‘sql_fn_post_column_ref’:
 functions.c:341: warning: implicit declaration of function ‘ParseFuncOrColumn’
 functions.c:345: warning: assignment makes pointer from integer without a cast

 (Now it occurred to me that forgetting the #include parse_func.h might
 hit this breakage..., so I'll fix it here and continue to test, but if
 you'll fix it yourself, let me know)

 I fixed it here and it now works in my environment. The regression
 tests pass and the feature seems to work as intended, but it seems to me
 that it needs more test cases and documentation. For the tests, I
 believe we at least need the ambiguous case given upthread, so that we
 can ensure compatibility is kept. For the documentation, it should
 describe the name resolution rule, as stated in the patch comment.

 Aside from them, I wondered at first what if the function is
 schema-qualified. Say,

 CREATE FUNCTION s.f(a int) RETURNS int AS $$
  SELECT b FROM t WHERE a = s.f.a
 $$ LANGUAGE sql;

 It actually errors out, since function-name-qualified parameter only
 accepts function name without schema name, but it looked weird to me
 at first. No better idea from me at the moment, though.

 I mark this Waiting on Author for now.

It's been a few days since my last comment; are you planning to send a new
patch? If there's no reply, I'll mark it Returned with Feedback.

Thanks,
-- 
Hitoshi Harada



Re: [HACKERS] Multithread Query Planner

2012-01-23 Thread Yeb Havinga

On 2012-01-13 21:14, Frederico wrote:

Hi folks.

Is there any restriction on creating and starting threads inside Postgres?

I'm trying to develop a multithreaded planner, and sometimes a memory access
exception is raised.

I'm debugging the code to see if it is a bug in the planner, but so far I
have not found one. I tried using the same memory context as the root
process, and also creating a new context for each new thread, but neither
worked.


Any tips?


Not sure if it is of any use to you, but the VLDB paper 'Parallelizing
Query Optimization' (http://www.vldb.org/pvldb) describes an experimental
implementation in PostgreSQL.


regards,
Yeb




Re: [HACKERS] WAL Restore process during recovery

2012-01-23 Thread Fujii Masao
On Fri, Jan 20, 2012 at 7:50 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Fri, Jan 20, 2012 at 7:38 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Fri, Jan 20, 2012 at 3:43 AM, Fujii Masao masao.fu...@gmail.com wrote:

 Requested update

 Thanks! Will review.

In StartChildProcess(), the code which emits an error when the fork of
walrestore fails is required.

In reaper(), the following comment needs to be updated because an unexpected
exit (including FATAL) is treated as a crash in the patch.

        /*
         * Was it the wal restore?  If exit status is zero (normal) or one
         * (FATAL exit), we assume everything is all right just like normal
         * backends.
         */
        if (pid == WalRestorePID)

Why does walrestore need to be invoked even when restore_command is
not specified? It seems useless. We invoke walreceiver only when
primary_conninfo is specified now. Similarly, shouldn't we invoke
walrestore only when restore_command is specified?

When I set up a file-based log-shipping environment using pg_standby,
ran pgbench -i -s2, waited for walrestore to restore at least one WAL
file, and created the trigger file, I encountered the following error in
the standby:
sby LOG:  startup process requests 00010003 from archive
trigger file found: smart failover
sby LOG:  startup process sees last file was 00010003
sby FATAL:  could not rename file pg_xlog/RECOVERYXLOG to
pg_xlog/00010003: No such file or directory
sby LOG:  startup process (PID 11079) exited with exit code 1
sby LOG:  terminating any other active server processes

When I set up streaming replication with restore_command set, I got the
following messages repeatedly. The WAL file name was always
.

sby1 LOG:  walrestore checking for next file to restore
sby1 LOG:  restore of  is already complete, so sleep

In PostmasterStateMachine(), the following comment needs to mention WALRestore.

 * PM_WAIT_READONLY state ends when we have no regular backends that
 * have been started during recovery.  We kill the startup and
 * walreceiver processes and transition to PM_WAIT_BACKENDS.  Ideally,

In walrestore.c, the following comments seem to be incorrect; at least,
an unexpected exit of WALRestore doesn't start a recovery cycle in the
patch.

 * If the WAL restore exits unexpectedly, the postmaster treats that the same
 * as a backend crash: shared memory may be corrupted, so remaining backends
 * should be killed by SIGQUIT and then a recovery cycle started.

In walrestore.c
+ * Main entry point for walrestore process
+ *
+ * This is invoked from BootstrapMain, which has already created the basic
+ * execution environment, but not enabled signals yet.

BootstrapMain() doesn't exist; it should be changed to
AuxiliaryProcessMain(). This is not a fault of the patch: the same typos
exist in bgwriter.c, walwriter.c and checkpointer.c.

In walrestore.c
+    * SIGUSR1 is presently unused; keep it spare in case someday we want this
+    * process to participate in ProcSignal signalling.

The above comment is incorrect because SIGUSR1 is presently used.

+   /*
+    * From here on, elog(ERROR) should end with exit(1), not send
+    * control back to the sigsetjmp block above
+    */
+   ExitOnAnyError = true;

The above is not required because sigsetjmp is not used in walrestore.c

+   /* Normal exit from the walwriter is here */
+   proc_exit(0);   /* done */

Typo: s/walwriter/walrestore

I've not reviewed the patch enough yet. I will review it again tomorrow.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] New replication mode: write

2012-01-23 Thread Simon Riggs
On Mon, Jan 23, 2012 at 10:03 AM, Fujii Masao masao.fu...@gmail.com wrote:

 To make the walreceiver call WaitLatchOrSocket(), we would need to
 merge it and libpq_select() into one function. But the former is a backend
 function and the latter is a frontend one, and right now I have no good
 idea for merging them cleanly.

 We can wait on the socket wherever it comes from. poll/select doesn't
 care how we got the socket.

 So we just need a common handler that calls either the walreceiver or the
 libpqwalreceiver function, as required, to handle the wakeup.

 I'm afraid I could not understand your idea. Could you explain it in
 more detail?

We either tell libpqwalreceiver about the latch, or we tell
walreceiver about the socket used by libpqwalreceiver.

In either case we share a pointer from one module to another.

 If we send back a reply as soon as the Apply pointer changes, I'm
 afraid quite a lot of reply messages would be sent frequently, which might
 cause a performance problem. This is also one of the reasons why I didn't
 implement the quick-response feature. To address this problem, we might
 need to change the master so that it sends the Wait pointer to the standby,
 and change the standby so that it replies whenever the Apply pointer
 catches up with the Wait one. This could reduce the number of useless
 replies from the standby about the Apply pointer.

 We send back one reply per incoming message. The incoming messages
 don't know the request state, and checking that has a cost which I don't
 think is an appropriate price to pay, since we only need this info when
 the link goes quiet.

 When the link goes quiet we still need to send replies if we have
 apply mode, but we only need to send apply messages if the lsn has
 changed because of a commit. That will considerably reduce the
 messages sent so I don't see a problem.

 You mean to change the meaning of apply_location? Currently it indicates
 the end + 1 of the last replayed WAL record, regardless of whether it's
 a commit record or not. So too many replies can be sent per incoming
 message because it might contain many WAL records. But you mean to
 change apply_location only when a commit record is replayed?

There is no change to the meaning of apply_location. The only change
is that we send that message only when it carries an updated value of
the committed LSN.
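
i.e., roughly this (a sketch with made-up variable names, using the
XLByteLT() comparison macro):

    /*
     * Only refresh the advertised apply position, and force a reply,
     * when a commit record has moved the replayed-commit LSN.
     */
    if (XLByteLT(reply.apply, latestCommitLSN))
    {
        reply.apply = latestCommitLSN;
        send_reply = true;
    }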

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Inline Extension

2012-01-23 Thread Robert Haas
On Mon, Jan 23, 2012 at 5:53 AM, Dimitri Fontaine
dimi...@2ndquadrant.fr wrote:
 Probably the worst issue with that is that in typical installations,
 the share/extension/ directory would be read-only to the server, and a
 lot of people might be uncomfortable with making it writable.  Not sure
 whether we should consider inventing another place to keep
 SQL-command-installed extensions, or just say if you want this
 functionality you have to make share/extension/ writable.

 So I've been wondering about storing the script content in the catalogs,
 but you would have to store all the update scripts too and that's
 useless because you want to dump the current state of the system, which
 pg_dump is doing just fine.

I'm not convinced that's useless.  It would meet Dan's requirement to
be able to manage the whole thing via the FE-BE protocol, and we could
make the CREATE EXTENSION mechanism transparently search both the
catalog and the filesystem when an extension is installed.  I'm
imagining that we'd create a catalog that would act as a sort of
virtual directory - e.g. CREATE TABLE pg_extension_virtualdir
(filename text, content text) which would be modifiable by the DBA and
would be searched either before or after the filesystem itself.  This
catalog wouldn't be dumped by pg_dump, and there would be no changes
to how extensions whose files are loaded from this catalog are dumped
vs. those whose files are loaded from the filesystem.  Rather, just as
now, it would be the DBA's responsibility to make sure that the
extensions needed to reload a given dump file are present on the new
system - except now they'd have two choices of where to put the related
files: on the file system, or in the per-database virtual directory.
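
To make that concrete, the virtual directory might be used roughly like
this (everything here is hypothetical, including the extension name):

    CREATE TABLE pg_extension_virtualdir (filename text PRIMARY KEY,
                                          content  text);

    INSERT INTO pg_extension_virtualdir VALUES
      ('myext.control',  $$default_version = '1.0'$$),
      ('myext--1.0.sql',
       $$CREATE FUNCTION myfunc() RETURNS int LANGUAGE sql AS 'SELECT 42';$$);

    -- CREATE EXTENSION myext;  -- would now look in the catalog first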

 Back to using the file system on the server when handed the script
 content over the protocol, we could get there with a new GUC telling the
 server where to find and store “inline” extensions, right?

  extension_inline_directory = '/path/to/some/writable/place'

 Then creating an extension would look in both the system extension
 directory and the inline one, which covers dump and restore.  Creating
 an inline extension means creating the .control and the .sql files in
 the extension_inline_directory, then running the current code.

This is another possible approach, but it requires a bit more
configuration, and we'd better think carefully about what a malicious
non-superuser DBA can do by changing that GUC.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] WAL Restore process during recovery

2012-01-23 Thread Simon Riggs
On Mon, Jan 23, 2012 at 12:23 PM, Fujii Masao masao.fu...@gmail.com wrote:

 I've not reviewed the patch enough yet. Will review the patch tomorrow again.

Thanks very much. I'm sure that's enough to keep me busy a few days.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Removing freelist (was Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?)

2012-01-23 Thread Robert Haas
On Mon, Jan 23, 2012 at 12:12 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Sat, Jan 21, 2012 at 5:29 PM, Jim Nasby j...@nasby.net wrote:
 We should also look at having the freelist do something useful, instead of 
 just dropping it completely. Unfortunately that's probably more work...

 That's kinda my feeling as well.  The free list in its current form is
 pretty much useless, but I don't think we'll save much by getting rid
 of it, because that's just a single test.  The expensive part of what
 we do while holding BufFreelistLock is, I think, iterating through
 buffers taking and releasing a spinlock on each one (!).

 Yeah ... spinlocks that, by definition, will be uncontested.

What makes you think that they are uncontested?  Or for that matter,
that even an uncontested spinlock operation is cheap enough to do
while holding a badly contended LWLock?

 So I think
 it would be advisable to prove rather than just assume that that's a
 problem.

It's pretty trivial to prove that there is a very serious problem with
BufFreelistLock.  I'll admit I can't prove what the right fix is just
yet, and certainly measurement is warranted.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] PG-Strom - A GPU optimized asynchronous executor module

2012-01-23 Thread Robert Haas
On Mon, Jan 23, 2012 at 1:38 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 What options are available to see how the workload breaks down among the
 components of a particular query?

I usually use oprofile, though I'm given to understand it's been
superseded by a new tool called perf.  I haven't had a chance to
experiment with perf yet, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Online base backup from the hot-standby

2012-01-23 Thread Simon Riggs
On Mon, Jan 23, 2012 at 10:29 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Fri, Jan 20, 2012 at 11:34 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Fri, Jan 20, 2012 at 12:54 PM, Fujii Masao masao.fu...@gmail.com wrote:
 Thanks for the review!

 On Fri, Jan 20, 2012 at 8:15 PM, Simon Riggs si...@2ndquadrant.com wrote:
 I'm looking at this patch and wondering why we're doing so many
 press-ups to ensure the full_page_writes parameter is on. This will still
 fail if you use a utility that removes the full page writes, but fail
 silently.

 I think it would be beneficial to explicitly check that all WAL
 records have full page writes actually attached to them until we
 achieve consistency.

 I agree that it's worth adding such a safeguard. That can be a
 self-contained feature, so I'll submit a separate patch for that, to keep
 each patch small.

 Maybe, but you mean do this now as well? Not sure I like silent errors.

 If many people think the patch is not acceptable without such a safeguard,
 I will do that right now. Otherwise, I'd like to take more time to do that,
 i.e., add it to the 9.2dev Open Items list.

 I've not come up with a good idea. An ugly idea is to keep track of all
 replays of full-page writes for every buffer page (i.e., prepare a
 1-bit-per-buffer-page table and set the specified bit to 1 when a full-page
 write is applied), and then check whether a full-page write has already
 been applied when replaying a normal WAL record... Do you have any better
 idea?

Not sure.

I think the only possible bug here is one introduced by an outside utility.

In that case, I don't think it should be the job of the backend to go
too far to protect against such an atypical error. So if we can't solve
it fairly easily and with no overhead, then I'd say let's skip it. We
could easily introduce a bug here just by having faulty checking code.

So let's add it to the 9.2 open items as a non-priority item. I'll proceed
to commit this now.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Online base backup from the hot-standby

2012-01-23 Thread Robert Haas
On Mon, Jan 23, 2012 at 5:29 AM, Fujii Masao masao.fu...@gmail.com wrote:
 If many people think the patch is not acceptable without such a safeguard,
 I will do that right now.

That's my view.  I think we ought to resolve this issue before commit,
especially since it seems unclear that we know how to fix it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Online base backup from the hot-standby

2012-01-23 Thread Robert Haas
On Mon, Jan 23, 2012 at 8:11 AM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Jan 23, 2012 at 5:29 AM, Fujii Masao masao.fu...@gmail.com wrote:
 If many people think the patch is not acceptable without such a safeguard,
 I will do that right now.

 That's my view.  I think we ought to resolve this issue before commit,
 especially since it seems unclear that we know how to fix it.

Actually, never mind.  On reading this more carefully, I'm not too
concerned about the possibility of people breaking it with pg_lesslog
or similar.  But it should be solid if you use only the functionality
built into core.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Removing freelist (was Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?)

2012-01-23 Thread Simon Riggs
On Mon, Jan 23, 2012 at 1:06 PM, Robert Haas robertmh...@gmail.com wrote:

 It's pretty trivial to prove that there is a very serious problem with
 BufFreelistLock.  I'll admit I can't prove what the right fix is just
 yet, and certainly measurement is warranted.

I agree there is a problem with BufFreelistLock (so please share your
results with Heikki, who seems not to agree).

As a result, I've published patches which reduce contention on that
lock in various ways, all of which seem valid to me.

There are lots of things we could have done for 9.2 but didn't; yet we
have some things that did get done on the table right now, so it would
be useful to push those through immediately, or at least defer
discussion of other things until we get back around to this.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Inline Extension

2012-01-23 Thread Dimitri Fontaine
Robert Haas robertmh...@gmail.com writes:
 virtual directory - e.g. CREATE TABLE pg_extension_virtualdir
 (filename text, content text) which would be modifiable by the DBA and
 would be searched either before or after the filesystem itself.  This
 catalog wouldn't be dumped by pg_dump, and there would be no changes
 to how extensions whose files are loaded from this catalog are dumped
 vs. those whose files are loaded from the filesystem.  Rather, just as

That's the thing I don't like in this approach. Maybe it's just me, but
the primary intention in working on extensions was to make dump and
restore do the right thing all by themselves.

Now if we have “inline” (SQL-only) extensions, the right thing happens
to be very different from when you're dealing with contrib-like ones;
namely, I would want the script to be dumped.

  extension_inline_directory = '/path/to/some/writable/place'

 This is another possible approach, but it requires a bit more
 configuration, and we'd better think carefully about what a malicious
 non-superuser DBA can do by changing that GUC.

I think Cédric nailed it down upthread, proposing that we just use a
PGDATA subdirectory called 'pg_extension'. In fact, that would need to
be a per-database subdirectory. Then there's nothing to set up, nothing
to abuse.

Also remember that we're limiting this feature to SQL-only extensions
(because we don't want to be loading our .so files from just anywhere in
the system, and forcing them into a place owned by root gives confidence,
IIUC).  With SQL-only extensions, it's all non-superuser land anyway.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] PG-Strom - A GPU optimized asynchronous executor module

2012-01-23 Thread Simon Riggs
On Sun, Jan 22, 2012 at 3:48 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote:

 I tried to implement an FDW module that is designed to utilize GPU
 devices to execute the qualifiers of sequential scans on foreign tables
 managed by this module.

 It was named PG-Strom, and the following wikipage gives a brief
 overview of this module.
    http://wiki.postgresql.org/wiki/PGStrom

 In our measurements, it achieves roughly a 10x speedup on sequential
 scans with complex qualifiers; of course, this depends heavily on the
 type of workload.

Very cool. Someone's been busy.

I see you've introduced 3 new features here at the same time
* GPU access
* column store
* compiled WHERE clauses

It would be useful to see if we can determine which of those gives the
most benefit and whether other directions emerge.

Also, the query you mention is probably the best-performing query you
can come up with. It looks like a GIS query, yet isn't. Would it be
possible to run tests on the TPC-H suite and do a full comparison of
strengths/weaknesses so we can understand the breadth of applicability
of the techniques?

This is a very interesting line of discussion, but please can we hold
off further posts about it until after the CF is over?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



[HACKERS] basic pgbench runs with various performance-related patches

2012-01-23 Thread Robert Haas
There was finally some time available on Nate Boley's server, which he
has been kind enough to make highly available for performance testing
throughout this cycle, and I got a chance to run some benchmarks
against a bunch of the performance-related patches in the current
CommitFest.  Specifically, I did my usual pgbench tests: 3 runs at
scale factor 100, with various client counts.  I realize that this is
not the only or even most interesting thing to test, but I felt it
would be useful to have this information as a baseline before
proceeding to more complicated testing.  I have another set of tests
running now with a significantly different configuration that will
hopefully provide some useful feedback on some of the things this test
fails to capture, and will post the results of the tests (and the
details of the test configuration) as soon as those results are in.

For the most part, I only tested each patch individually, but in one
case I also tested two patches together (buffreelistlock-reduction-v1
with freelist-ok-v2).  Results are the median of three five-minute
test runs, with one exception: buffreelistlock-reduction-v1 crapped
out during one of the test runs with the following errors, so I've
shown the results for both of the successful runs (though I'm not sure
how relevant the numbers are given the errors, as I expect there is a
bug here somewhere):

log.ws.buffreelistlock-reduction-v1.1.100.300:ERROR:  could not read
block 0 in file base/20024/11780: read only 0 of 8192 bytes
log.ws.buffreelistlock-reduction-v1.1.100.300:CONTEXT:  automatic
analyze of table rhaas.public.pgbench_branches
log.ws.buffreelistlock-reduction-v1.1.100.300:ERROR:  could not read
block 0 in file base/20024/11780: read only 0 of 8192 bytes
log.ws.buffreelistlock-reduction-v1.1.100.300:CONTEXT:  automatic
analyze of table rhaas.public.pgbench_tellers
log.ws.buffreelistlock-reduction-v1.1.100.300:ERROR:  could not read
block 0 in file base/20024/11780: read only 0 of 8192 bytes
log.ws.buffreelistlock-reduction-v1.1.100.300:CONTEXT:  automatic
analyze of table rhaas.pg_catalog.pg_database
log.ws.buffreelistlock-reduction-v1.1.100.300:ERROR:  could not read
block 0 in file base/20024/11780: read only 0 of 8192 bytes
log.ws.buffreelistlock-reduction-v1.1.100.300:STATEMENT:  vacuum
analyze pgbench_branches
log.ws.buffreelistlock-reduction-v1.1.100.300:ERROR:  could not read
block 0 in file base/20024/11780: read only 0 of 8192 bytes
log.ws.buffreelistlock-reduction-v1.1.100.300:STATEMENT:  select
count(*) from pgbench_branches

Just for grins, I ran the same set of tests against REL9_1_STABLE, and
the results of those tests are also included below.  It's worth
grinning about: on this test, at 32 clients, 9.2devel (as of commit
4f42b546fd87a80be30c53a0f2c897acb826ad52, on which all of these tests
are based) is 25% faster on permanent tables, 109% faster on unlogged
tables, and 474% faster on a SELECT-only test.

Here's the test configuration:

shared_buffers = 8GB
maintenance_work_mem = 1GB
synchronous_commit = off
checkpoint_segments = 300
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9
wal_writer_delay = 20ms

And here are the results.  For everything against master, I've also
included the percentage speedup or slowdown vs. the same test run
against master.  Many of these numbers are likely not statistically
significant, though some clearly are.

** pgbench, permanent tables, scale factor 100, 300 s **
1 master 686.038059
8 master 4425.79
16 master 7808.389490
24 master 13276.472813
32 master 11920.691220
80 master 12560.803169
1 REL9_1_STABLE 627.879523 -8.5%
8 REL9_1_STABLE 4188.731855 -5.4%
16 REL9_1_STABLE 7433.309556 -4.8%
24 REL9_1_STABLE 10496.411773 -20.9%
32 REL9_1_STABLE 9547.804833 -19.9%
80 REL9_1_STABLE 7197.655050 -42.7%
1 background-clean-slru-v2 629.518668 -8.2%
8 background-clean-slru-v2 4794.662182 +8.3%
16 background-clean-slru-v2 8062.151120 +3.2%
24 background-clean-slru-v2 13275.834722 -0.0%
32 background-clean-slru-v2 12024.410625 +0.9%
80 background-clean-slru-v2 12113.589954 -3.6%
1 buffreelistlock-reduction-v1 512.828482 -25.2%
8 buffreelistlock-reduction-v1 4765.576805 +7.7%
16 buffreelistlock-reduction-v1 8030.477792 +2.8%
24 buffreelistlock-reduction-v1 13118.481248 -1.2%
32 buffreelistlock-reduction-v1 11895.847998 -0.2%
80 buffreelistlock-reduction-v1 12015.291045 -4.3%
1 buffreelistlock-reduction-v1-freelist-ok-v2 621.960997 -9.3%
8 buffreelistlock-reduction-v1-freelist-ok-v2 4650.200642 +5.1%
16 buffreelistlock-reduction-v1-freelist-ok-v2 7999.167629 +2.4%
24 buffreelistlock-reduction-v1-freelist-ok-v2 13070.123153 -1.6%
32 buffreelistlock-reduction-v1-freelist-ok-v2 11808.986473 -0.9%
80 buffreelistlock-reduction-v1-freelist-ok-v2 12136.960028 -3.4%
1 freelist-ok-v2 629.832419 -8.2%
8 freelist-ok-v2 4800.267011 +8.5%
16 freelist-ok-v2 8018.571815 +2.7%
24 freelist-ok-v2 13122.167158 -1.2%
32 freelist-ok-v2 12004.261737 +0.7%
80 freelist-ok-v2 12188.211067 -3.0%
1 

Re: [HACKERS] Inline Extension

2012-01-23 Thread Robert Haas
On Mon, Jan 23, 2012 at 8:25 AM, Dimitri Fontaine
dimi...@2ndquadrant.fr wrote:
 Robert Haas robertmh...@gmail.com writes:
 virtual directory - e.g. CREATE TABLE pg_extension_virtualdir
 (filename text, content text) which would be modifiable by the DBA and
 would be searched either before or after the filesystem itself.  This
 catalog wouldn't be dumped by pg_dump, and there would be no changes
 to how extensions whose files are loaded from this catalog are dumped
 vs. those whose files are loaded from the filesystem.  Rather, just as

 That's the thing I don't like in this approach. Maybe it's just me but
 the primary intention on working on extension was to make dump and
 restore do the right thing all by itself.

Well, fair enough: there's no accounting for taste.  You could make my
solution work with pg_dump in a fully automated fashion if you dumped
out the virtual directory contents before dumping any CREATE
EXTENSION statements, but I'm not going to get up on my soapbox and
say that's the world's best design, so if you don't like it, fine!

I am pretty concerned that we find a design that does not involve
pg_dump needing to dump out the extension contents, though: that seems
to me to be missing the point of having extensions in the first place.

 Now if we have “inline” (SQL only) extensions, the right thing happens
 to be very different from when you're dealing with contrib like ones,
 namely I would want the script to be dumped.

  extension_inline_directory = '/path/to/some/writable/place'

 This is another possible approach, but it requires a bit more
 configuration, and we'd better think carefully about what a malicious
 non-superuser DBA can do by changing that GUC.

 I think Cédric nailed it down upthread, proposing that we just use a
 PGDATA sub directory called 'pg_extension'. In fact, that would need to
 be a per-database sub directory. Then there's nothing to setup, nothing
 to abuse.

Hmm, that might have something going for it.  It seems comparatively
easy to implement, and it also seems to do a pretty good job hiding
the complexity under the hood where users don't have to worry about
it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Re: Add minor version to v3 protocol to allow changes without breaking backwards compatibility

2012-01-23 Thread Robert Haas
On Mon, Jan 23, 2012 at 4:03 AM, Florian Weimer fwei...@bfk.de wrote:
 * Ants Aasma:
 I had a run-in with this. JDBC driver versions < 9.0 with the default
 configuration resulted in silent data corruption. The fix was easy, but not
 having a useful error was what really bothered me.

 Same for the DBD::Pg driver.

 In this particular case, I knew that the change was coming and could
 push updated Java and Perl client libraries well before the server-side
 change hit our internal repository, but I really don't want to have to
 pay attention to such details.

But if we *don't* turn this on by default, then chances are very good
that it will get much less use.  That doesn't seem good either.  If
it's important enough to do it at all, then IMHO it's important enough
for it to be turned on by default.  We have never made any guarantee
that the binary format won't change from release to release.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] basic pgbench runs with various performance-related patches

2012-01-23 Thread Simon Riggs
On Mon, Jan 23, 2012 at 1:53 PM, Robert Haas robertmh...@gmail.com wrote:

 Results are the median of three five-minute test runs

 checkpoint_timeout = 15min

Test duration is important for tests that don't relate to pure
contention reduction, which is every patch apart from XLogInsert.
We've discussed that before, so I'm not sure what value you assign to
these results. Very little, is my view, so I'm a little disappointed
to see this post and the associated comments.

I'm very happy to see that your personal work has resulted in gains
and these results are valid tests of that work, IMHO. If you only
measure throughput you're only measuring half of what users care
about. We've not yet seen any tests that confirm that other important
issues have not been made worse.

Before commenting on individual patches, it's clear that the tests
you've run aren't even designed to highlight the BufFreelistLock
contention that is present in different configs, so that alone is
sufficient to throw most of this away.

On particular patches

* background-clean-slru-v2 relates very directly to reducing the
response time spikes you showed us in your last set of results. Why
not repeat those same tests?

* removebufmgrfreelist-v1 relates to the impact of dropping
tables/indexes/databases, so given the variability of the results, that
at least shows it has no effect in the general case.

 And here are the results.  For everything against master, I've also
 included the percentage speedup or slowdown vs. the same test run
 against master.  Many of these numbers are likely not statistically
 significant, though some clearly are.

 with one exception: buffreelistlock-reduction-v1 crapped
 out during one of the test runs with the following errors

That patch comes with the proviso, stated in comments:
"We didn't get the lock, but read the value anyway on the assumption
that reading this value is atomic."
So we seem to have proved that reading it without the lock isn't safe.

The remaining patch you tested was withdrawn and not submitted to the CF.

Sigh.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Re: Add minor version to v3 protocol to allow changes without breaking backwards compatibility

2012-01-23 Thread Florian Weimer
* Robert Haas:

 In this particular case, I knew that the change was coming and could
 push updated Java and Perl client libraries well before the server-side
 change hit our internal repository, but I really don't want to have to
 pay attention to such details.

 But if we *don't* turn this on by default, then chances are very good
 that it will get much less use.  That doesn't seem good either.  If
 it's important enough to do it at all, then IMHO it's important enough
 for it to be turned on by default.  We have never made any guarantee
 that the binary format won't change from release to release.

The problem here is libpq-style drivers which expose the binary format
to the application.  The Java driver doesn't do that, but the Perl
driver does.  (However, both transparently decode BYTEA values received
in text format, which led to the compatibility issue.)

If you've got version negotiation and you don't expose the binary
format, then all clients can use the most efficient format
automatically.  If I understand things correctly, this is where the JDBC
driver is heading (that is, automatic use of the binary format).

-- 
Florian Weimerfwei...@bfk.de
BFK edv-consulting GmbH   http://www.bfk.de/
Kriegsstraße 100  tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99



Re: [HACKERS] PG-Strom - A GPU optimized asynchronous executor module

2012-01-23 Thread Kohei KaiGai
2012/1/23 Simon Riggs si...@2ndquadrant.com:
 On Sun, Jan 22, 2012 at 3:48 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote:

 I tried to implement an FDW module that is designed to utilize GPU
 devices to execute the qualifiers of sequential scans on foreign tables
 managed by this module.

 It was named PG-Strom, and the following wikipage gives a brief
 overview of this module.
    http://wiki.postgresql.org/wiki/PGStrom

 In our measurements, it achieves roughly a 10x speedup on sequential
 scans with complex qualifiers; of course, this depends heavily on the
 type of workload.

 Very cool. Someone's been busy.

 I see you've introduced 3 new features here at the same time
 * GPU access
 * column store
 * compiled WHERE clauses

 It would be useful to see if we can determine which of those gives the
 most benefit and whether other directions emerge.

 Also, the query you mention is probably the best-performing query you
 can come up with. It looks like a GIS query, yet isn't. Would it be
 possible to run tests on the TPC-H suite and do a full comparison of
 strengths/weaknesses so we can understand the breadth of applicability
 of the techniques?

DBT-2 might be a good alternative, since TPC-H is expensive to run.

 This is a very interesting line of discussion, but please can we hold
 off further posts about it until after the CF is over?

Yep, I agree.
We should handle the existing patches first, then new features for v9.3.

I'll get back to reviewing pgsql_fdw.

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp



Re: [HACKERS] PG-Strom - A GPU optimized asynchronous executor module

2012-01-23 Thread Simon Riggs
On Mon, Jan 23, 2012 at 2:49 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote:

 Also, the query you mention is probably the best performing query you
 can come up with. It looks like a GIS query, yet isn't. Would it be
 possible to run tests on the TPC-H suite and do a full comparison of
 strengths/weaknesses so we can understand the breadth of applicability
 of the techniques.

 DBT-2 might be a good alternative, since TPC-H is expensive to run.

DBT-2 is an OLTP test, not a DSS/DW test.

I'm not interested in the full TPC-H test, just a query-by-query
comparison of how well this stacks up. If there are other tests that
are also balanced/representative, I'd like to see those also. Just so
we can see the benefit envelope.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements

2012-01-23 Thread Marko Kreen
On Sun, Jan 22, 2012 at 11:47 PM, Mikko Tiihonen
mikko.tiiho...@nitorcreations.com wrote:
 * introduced a new GUC variable array_output copying the current
  bytea_output type, with values full (old value) and
  smallfixed (new default)
 * added documentation for the new GUC variable

If this variable changes protocol-level layout
and is user-settable, shouldn't it be GUC_REPORT?

Now that I think about it, same applies to bytea_output?

You could say the problem does not appear if the
client always accepts the server default.  But how can
the client know the default?  If the client is required
to do SHOW before it can talk to the server, then that
seems to hint that those vars should be GUC_REPORT.

Same story when clients are always expected to set
the vars to their preferred values.  Then you get
clients with different settings on one server.
This breaks transaction-pooling setups (pgbouncer).
Again, such protocol-changing tunables should be
GUC_REPORT.
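
For illustration, here is the extra dance a client has to do today; the
GUC and its values are real (9.0+), and the pooler hazard is the point:

-- Without GUC_REPORT, a client that wants to decode bytea reliably must
-- either spend an extra round trip asking for the setting...
SHOW bytea_output;            -- 'hex' by default on 9.0+; the GUC does
                              -- not exist on older servers
-- ...or force its own value, which behind a transaction pooler can
-- silently change under you when the server session is reused:
SET bytea_output = 'escape';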

-- 
marko



Re: [HACKERS] patch: ALTER TABLE IF EXISTS

2012-01-23 Thread Simon Riggs
On Tue, Jan 3, 2012 at 7:49 PM, Pavel Stehule pavel.steh...@gmail.com wrote:

 I changed the patch, and now ALTER TABLE, ALTER INDEX, ALTER SEQUENCE and
 ALTER VIEW have an IF EXISTS clause

Patch no longer applies. Pls update.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Inline Extension

2012-01-23 Thread Dimitri Fontaine
Robert Haas robertmh...@gmail.com writes:
 I am pretty concerned that we find a design that does not involve
 pg_dump needing to dump out the extension contents, though: that seems
 to me to be missing the point of having extensions in the first place.

I was just trying to explain where I'm coming from; I'm not wedded to
the idea, though, all the more so since I think we're reaching a much
better one.

 I think Cédric nailed it down upthread, proposing that we just use a
 PGDATA sub directory called 'pg_extension'. In fact, that would need to
 be a per-database sub directory. Then there's nothing to setup, nothing
 to abuse.

 Hmm, that might have something going for it.  It seems comparatively
 easy to implement, and it also seems to do a pretty good job hiding
 the complexity under the hood where users don't have to worry about
 it.

And then basebackup and pg_upgrade would just work, and for dump and
restore we still need to find something not violating the POLA.

I think that would mean offering a backend function that lists all files
from a given extension and their content, including the control files,
and a query that stores that output for only “inline” extensions.  The
content of the query result is formatted as a series of create extension
and alter extension update (in the right order) in the dump file so that
it just transparently re-creates the files for you on the new database.

Or do you still want to insist that dump/restore shouldn't care about
any extension, inline or not, and so you're given the responsibility to
do the exact same thing yourself on the client side?

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] basic pgbench runs with various performance-related patches

2012-01-23 Thread Robert Haas
On Mon, Jan 23, 2012 at 9:31 AM, Simon Riggs si...@2ndquadrant.com wrote:
 Test duration is important for tests that don't relate to pure
 contention reduction, which is every patch apart from XLogInsert.

Yes, I know.  I already said that I was working on more tests to
address other use cases.

 I'm very happy to see that your personal work has resulted in gains
 and these results are valid tests of that work, IMHO. If you only
 measure throughput you're only measuring half of what users care
 about. We've not yet seen any tests that confirm that other important
 issues have not been made worse.

I personally think throughput is awfully important, but clearly
latency matters as well, and that is why *even as we speak* I am
running more tests.  If there are other issues with which you are
concerned besides latency and throughput, please say what they are.

 On particular patches

 * background-clean-slru-v2 related very directly to reducing the
 response time spikes you showed us in your last set of results. Why
 not repeat those same tests??

I'm working on it.  Actually, I'm attempting to improve my previous
test configuration by making some alterations per some of your
previous suggestions.  I plan to post the results of those tests once
I have run them.

 * removebufmgrfreelist-v1 related to the impact of dropping
 tables/index/databases, so given the variability of the results, that
 at least shows it has no effect in the general case.

I think it needs some tests with a larger scale factor before drawing
any general conclusions, since this test, as you mentioned above,
doesn't involve much buffer eviction.  As it turns out, I am working
on running such tests.

 That patch comes with the proviso, stated in comments:
 We didn't get the lock, but read the value anyway on the assumption
 that reading this value is atomic.
 So we seem to have proved that reading it without the lock isn't safe.

I am not sure what's going on with that patch, but clearly something
isn't working right.  I don't know whether it's that or something
else, but it does look like there's a bug.

 The remaining patch you tested was withdrawn and not submitted to the CF.

Oh.  Which one was that?  I thought all of these were in play.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] patch: ALTER TABLE IF EXISTS

2012-01-23 Thread Robert Haas
On Tue, Jan 3, 2012 at 2:49 PM, Pavel Stehule pavel.steh...@gmail.com wrote:
 jup, we can continue in enhancing step by step.

 I changed the patch, and now ALTER TABLE, ALTER INDEX, ALTER SEQUENCE and
 ALTER VIEW have an IF EXISTS clause

ALTER FOREIGN TABLE should be parallel as well, I think.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Inline Extension

2012-01-23 Thread Robert Haas
On Mon, Jan 23, 2012 at 10:04 AM, Dimitri Fontaine
dimi...@2ndquadrant.fr wrote:
 And then basebackup and pg_upgrade would just work, and for dump and
 restore we still need to find something not violating the POLA.

 I think that would mean offering a backend function that lists all files
 from a given extension and their content, including the control files,
 and a query that stores that output for only “inline” extensions.  The
 content of the query result is formatted as a series of create extension
 and alter extension update (in the right order) in the dump file so that
 it just transparently re-creates the files for you on the new database.

Hmm.  But CREATE EXTENSION / ALTER EXTENSION doesn't seem right,
because the files in the directory correspond to *available*
extensions, not already-created ones.  We need some way of dumping and
restoring the files themselves, not the extension that can be created
from them.  I suspect internal functions (pg_whatever) make more sense
than new SQL syntax, since this is really only to make pg_dump happy.

 Or do you still want to insist that dump/restore shouldn't care about
 any extension, inline or not, and so you're given the responsibility to
 do the exact same thing yourself on the client side?

How about adding a new pg_dump option to suppress this part of the dump?

It seems to me that there will be people who want to do that; for
example, it might be that all (or some, or one) of the extensions that
were installed this way on the old server are installed globally on
the new server.  We need some way to cope with that.  Having a new
pg_dump option to suppress this output is not terribly granular but
maybe it'd be enough for round one.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Inline Extension

2012-01-23 Thread Dimitri Fontaine
Robert Haas robertmh...@gmail.com writes:
 Hmm.  But CREATE EXTENSION / ALTER EXTENSION doesn't seem right,
 because the files in the directory correspond to *available*
 extensions, not already-created ones.  We need some way of dumping and

I would have limited the dump query to only known installed extensions,
right.  The update scripts are still needed because with inline
extensions you typically never see a 1.2 script but a 1.0 then 1.0--1.1
and then a 1.1--1.2.

 restoring the files themselves, not the extension that can be created
 from them.  I suspect internal functions (pg_whatever) make more sense
 than new SQL syntax, since this is really only to make pg_dump happy.

That could well be, yes, but what would this function do that the
commands are not doing?  I'm OK with not inventing specific syntax to
solve that problem, I just think that we should already have all we need :)

 Or do you still want to insist that dump/restore shouldn't care about
 any extension, inline or not, and so you're given the responsibility to
 do the exact same thing yourself on the client side?

 How about adding a new pg_dump option to suppress this part of the dump?

Makes sense, indeed.  Well, one could of course manually filter the dump
object list too…

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] Re: Add minor version to v3 protocol to allow changes without breaking backwards compatibility

2012-01-23 Thread Robert Haas
On Mon, Jan 23, 2012 at 9:36 AM, Florian Weimer fwei...@bfk.de wrote:
 * Robert Haas:
 In this particular case, I knew that the change was coming and could
 push updated Java and Perl client libraries well before the server-side
 change hit our internal repository, but I really don't want to have to
 pay attention to such details.

 But if we *don't* turn this on by default, then chances are very good
 that it will get much less use.  That doesn't seem good either.  If
 it's important enough to do it at all, then IMHO it's important enough
 for it to be turned on by default.  We have never made any guarantee
 that the binary format won't change from release to release.

 The problem here is libpq-style drivers which expose the binary format
 to the application.  The Java driver doesn't do that, but the Perl
 driver does.  (However, both transparently decode BYTEA values received
 in text format, which led to the compatibility issue.)

I can see where that could cause some headaches... but it seems to me
that if we take that concern seriously, it brings us right back to
square one.  If breaking the binary format causes too much pain to
actually go do it, then we shouldn't change it until we're breaking
everything else anyway (i.e. new protocol version, as Tom suggested).

Even if you have version negotiation, it doesn't help that much in
this scenario.  Drivers that know about the new protocol can
theoretically negotiate upward, but if they expose the binary format
to their users in turn then it's a can of worms: they then need to
negotiate with their client applications to see what version the
client is OK with, and then turn around and negotiate with the server
to that level and not higher.  That strikes me as a lot of work for a
lot of people given the amount of improvement we can expect out of
this one change.

I think it's also worth noting that a GUC is not really a good
mechanism for negotiating the protocol version.  GUCs can be changed
by the local administrator on a server-wide (or per-user, or
per-database, or per-user-and-database) basis, and that's not what you
want for a protocol negotiation.  Every client will have to query the
server version and then decide whether to try to change it, and handle
errors if that fails.  All of that is going to add start-up overhead
to connections, which will need to make multiple round trips to get
everything straightened out.
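
Concretely, the start-up dance would look something like this (a sketch;
array_output is the GUC proposed in this thread, not a released one):

SHOW server_version;              -- round trip 1: is the server new
                                  -- enough for the new format?
SET array_output = 'smallfixed';  -- round trip 2: can fail on servers
                                  -- that lack the GUC, so the client
                                  -- must handle the error as well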

I think if the only way to make this change without excessive pain is
by having a good mechanism for negotiating the protocol version, then
we need to defer it to a future release.  Now is not the time to
invent entirely new mechanisms, especially around just one example.
I'm OK with breaking it and adding a GUC for backward-compatibility,
but so far the other suggestions strike me as being not convincingly
well-enough engineered to stand the test of time, and whatever we do
here is going to be with us for quite a while.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Inline Extension

2012-01-23 Thread Robert Haas
On Mon, Jan 23, 2012 at 10:26 AM, Dimitri Fontaine
dimi...@2ndquadrant.fr wrote:
 Robert Haas robertmh...@gmail.com writes:
 Hmm.  But CREATE EXTENSION / ALTER EXTENSION doesn't seem right,
 because the files in the directory correspond to *available*
 extensions, not already-created ones.  We need some way of dumping and

 I would have limited the dump query to only known installed extensions,
 right.  The update scripts are still needed because with inline
 extensions you typically never see a 1.2 script but a 1.0 then 1.0--1.1
 and then a 1.1--1.2.

Hmm, I don't think I like that design.  I think we should view this as
a way to embed the SQL and control files needed by the extension in
the server, rather than a separate thing called an inline extension.
If pg_dump is going to dump those files, it ought to dump them all,
not just some subset of them.

 restoring the files themselves, not the extension that can be created
 from them.  I suspect internal functions (pg_whatever) make more sense
 than new SQL syntax, since this is really only to make pg_dump happy.

 That could well be, yes, but what would this function do that the
 commands are not doing?  I'm OK with not inventing specific syntax to
 solve that problem, I just think that we should already have all we need :)

I was thinking of something like
pg_write_extension_file('foobar.control', 'content goes here');
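
Under that scheme, the relevant part of a dump might look something like
this (purely hypothetical: pg_write_extension_file does not exist, and
the file names and contents are only illustrative):

-- Recreate the embedded extension files, then the extension itself.
SELECT pg_write_extension_file('foobar.control',
       'default_version = ''1.0''');
SELECT pg_write_extension_file('foobar--1.0.sql',
       'CREATE FUNCTION foobar() RETURNS int LANGUAGE sql AS $$ SELECT 1 $$;');
CREATE EXTENSION foobar;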

 Or do you still want to insist that dump/restore shouldn't care about
 any extension, inline or not, and so you're given the responsibility to
 do the exact same thing yourself on the client side?

 How about adding a new pg_dump option to suppress this part of the dump?

 Makes sense, indeed.  Well, one could of course manually filter the dump
 object list too…

True.  I guess that's another reason why a global flag to shut it all
off is probably sufficient, but I'm still in favor of having at least
that much.  I think the average user is much more likely to find
pg_dump --skip-whatever than they are to understand how to do dump
object filtering correctly, and even the advanced user may appreciate
the shortcut on occasion.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] basic pgbench runs with various performance-related patches

2012-01-23 Thread Simon Riggs
On Mon, Jan 23, 2012 at 3:09 PM, Robert Haas robertmh...@gmail.com wrote:

 I'm working on it.

Good, thanks for the update.


 The remaining patch you tested was withdrawn and not submitted to the CF.

 Oh.  Which one was that?  I thought all of these were in play.

freelist_ok was a prototype for testing/discussion, which contained an
arguable heuristic. I guess that means it's also in play, but I
wasn't thinking we'd be able to assemble clear evidence for 9.2.

The other patches have clearer and more specific roles without
heuristics (mostly), so they are at least viable for 9.2, though still
requiring agreement.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Inline Extension

2012-01-23 Thread Dimitri Fontaine
Robert Haas robertmh...@gmail.com writes:
 Hmm, I don't think I like that design.  I think we should view this as
 a way to embed the SQL and control files needed by the extension in
 the server, rather than a separate thing called an inline extension.
 If pg_dump is going to dump those files, it ought to dump them all,
 not just some subset of them.

Ok, but then, what about .so files?  Wouldn't it make sense to be able
to ship also the executable modules needed, and if not, why not?

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] basic pgbench runs with various performance-related patches

2012-01-23 Thread Robert Haas
On Mon, Jan 23, 2012 at 10:35 AM, Simon Riggs si...@2ndquadrant.com wrote:
 freelist_ok was a prototype for testing/discussion, which contained an
 arguable heuristic. I guess that means it's also in play, but I
 wasn't thinking we'd be able to assemble clear evidence for 9.2.

OK, that one is still in the test runs I am doing right now, but I
will drop it from future batches to save time and energy that can be
better spent on things we have a chance of getting done for 9.2.

 The other patches have clearer and specific roles without heuristics
 (mostly), so are at least viable for 9.2, though still requiring
 agreement.

I think we must also drop removebufmgrfreelist-v1 from consideration,
unless you want to go over it some more and try to figure out a fix
for whatever caused it to crap out on these tests.  IIUC, that
corresponds to this CommitFest entry:

https://commitfest.postgresql.org/action/patch_view?id=744

Whatever is wrong must be something that happens pretty darn
infrequently, since it only happened on one test run out of 54, which
also means that if you do want to pursue that one we'll have to go
over it pretty darn carefully to make sure that we've fixed that issue
and don't have any others.  I have to admit my personal preference is
for postponing that one to 9.3 anyway, since there are some related
issues I'd like to experiment with.  But let me know how you'd like to
proceed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Inline Extension

2012-01-23 Thread Robert Haas
On Mon, Jan 23, 2012 at 10:46 AM, Dimitri Fontaine
dimi...@2ndquadrant.fr wrote:
 Robert Haas robertmh...@gmail.com writes:
 Hmm, I don't think I like that design.  I think we should view this as
 a way to embed the SQL and control files needed by the extension in
 the server, rather than a separate thing called an inline extension.
 If pg_dump is going to dump those files, it ought to dump them all,
 not just some subset of them.

 Ok, but then, what about .so files?  Wouldn't it make sense to be able
 to ship also the executable modules needed, and if not, why not?

Sure, that would be as useful as any other part of this feature.  We'd
have to think carefully about how to make it secure, though: obviously
it's no good to let a non-superuser database owner install compiled C
code that gets loaded into the backend!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements

2012-01-23 Thread Robert Haas
On Mon, Jan 23, 2012 at 9:59 AM, Marko Kreen mark...@gmail.com wrote:
 On Sun, Jan 22, 2012 at 11:47 PM, Mikko Tiihonen
 mikko.tiiho...@nitorcreations.com wrote:
 * introduced a new GUC variable array_output copying the current
  bytea_output type, with values full (old value) and
  smallfixed (new default)
 * added documentation for the new GUC variable

 If this variable changes protocol-level layout
 and is user-settable, shouldn't it be GUC_REPORT?

 Now that I think about it, same applies to bytea_output?

 You could say the problem does not appear if the
 client always accepts the server default.  But how can
 the client know the default?  If the client is required
 to do SHOW before it can talk to the server, then that
 seems to hint that those vars should be GUC_REPORT.

 Same story when clients are always expected to set
 the vars to their preferred values.  Then you get
 clients with different settings on one server.
 This breaks transaction-pooling setups (pgbouncer).
 Again, such protocol-changing tunables should be
 GUC_REPORT.

Probably so.  But I think we need not introduce quite so many new
threads on this patch.  This is, I think, at least thread #4, and
that's making the discussion hard to follow.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Collect frequency statistics for arrays

2012-01-23 Thread Noah Misch
On Mon, Jan 23, 2012 at 01:21:20AM +0400, Alexander Korotkov wrote:
 Updated patch is attached. I've updated the comment
 of mcelem_array_contained_selec with a more detailed description of
 the probability distribution assumption. Also, I found that the rest
 behaviour is better described by a Poisson distribution; relevant
 changes were made.

Thanks.  That makes more of the math clear to me.  I do not follow all of it,
but I feel that the comments now have enough information that I could go about
doing so.

 + /* Take care about events with low probabilities. */
 + if (rest > DEFAULT_CONTAIN_SEL)
 + {

Why the change from rest > 0 to this in the latest version?

 + /* emit some statistics for debug purposes */
 + elog(DEBUG3, "array: target # mces = %d, bucket width = %d, "
 +      "# elements = %llu, hashtable size = %d, usable entries = %d",
 +      num_mcelem, bucket_width, element_no, i, track_len);

That should be UINT64_FMT.  (I introduced that error in v0.10.)


I've attached a new version that includes the UINT64_FMT fix, some edits of
your newest comments, and a rerun of pgindent on the new files.  I see no
other issues precluding commit, so I am marking the patch Ready for Committer.
If I made any of the comments worse, please post another update.

Thanks,
nm


arrayanalyze-0.13.patch.gz
Description: application/gunzip



Re: Removing freelist (was Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?)

2012-01-23 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Mon, Jan 23, 2012 at 12:12 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 The expensive part of what
 we do while holding BufFreelistLock is, I think, iterating through
 buffers taking and releasing a spinlock on each one (!).

 Yeah ... spinlocks that, by definition, will be uncontested.

 What makes you think that they are uncontested?

Ah, never mind.  I was thinking that we'd only be touching buffers that
were *on* the freelist, but of course this is incorrect.  The real
problem there is that BufFreelistLock is also used to protect the
clock sweep pointer.  I think basically we gotta find a way to allow
multiple backends to run clock sweeps concurrently.  Or else fix
things so that the freelist never (well, hardly ever) runs dry.

regards, tom lane



Re: Removing freelist (was Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?)

2012-01-23 Thread Robert Haas
On Mon, Jan 23, 2012 at 11:01 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Mon, Jan 23, 2012 at 12:12 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 The expensive part of what
 we do while holding BufFreelistLock is, I think, iterating through
 buffers taking and releasing a spinlock on each one (!).

 Yeah ... spinlocks that, by definition, will be uncontested.

 What makes you think that they are uncontested?

 Ah, never mind.  I was thinking that we'd only be touching buffers that
 were *on* the freelist, but of course this is incorrect.  The real
 problem there is that BufFreelistLock is also used to protect the
 clock sweep pointer.  I think basically we gotta find a way to allow
 multiple backends to run clock sweeps concurrently.  Or else fix
 things so that the freelist never (well, hardly ever) runs dry.

I'd come to the same conclusion myself.  :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Re: Add minor version to v3 protocol to allow changes without breaking backwards compatibility

2012-01-23 Thread Noah Misch
On Mon, Jan 23, 2012 at 10:34:12AM -0500, Robert Haas wrote:
 On Mon, Jan 23, 2012 at 9:36 AM, Florian Weimer fwei...@bfk.de wrote:
  * Robert Haas:
  In this particular case, I knew that the change was coming and could
  push updated Java and Perl client libraries well before the server-side
  change hit our internal repository, but I really don't want to have to
  pay attention to such details.
 
  But if we *don't* turn this on by default, then chances are very good
  that it will get much less use.  That doesn't seem good either.  If
  it's important enough to do it at all, then IMHO it's important enough
  for it to be turned on by default.  We have never made any guarantee
  that the binary format won't change from release to release.

  The problem here is libpq-style drivers which expose the binary format
  to the application.  The Java driver doesn't do that, but the Perl
  driver does.  (However, both transparently decode BYTEA values received
  in text format, which led to the compatibility issue.)
 
 I can see where that could cause some headaches... but it seems to me
 that if we take that concern seriously, it brings us right back to
 square one.  If breaking the binary format causes too much pain to
 actually go do it, then we shouldn't change it until we're breaking
 everything else anyway (i.e. new protocol version, as Tom suggested).

As I said upthread, and you appeared to agree, the protocol is independent of
individual data type send/recv formats.  Even if we were already adding
protocol v4 to PostgreSQL 9.2, having array_send() change its behavior in
response to the active protocol version would be wrong.



Re: [HACKERS] Inline Extension

2012-01-23 Thread Dimitri Fontaine
Robert Haas robertmh...@gmail.com writes:
 Ok, but then, what about .so files?  Wouldn't it make sense to be able
 to ship also the executable modules needed, and if not, why not?

 Sure, that would be as useful as any other part of this feature.  We'd
 have to think carefully about how to make it secure, though: obviously
 it's no good to let a non-superuser database owner install compiled C
 code that gets loaded into the backend!

The big problem is that .so files are installed in directories where the
system postgres user usually is not granted permission to write; that's
root business.

It has already been asked before whether PostgreSQL could load .so files
from a non-root location, and I think that to be consistent with your
view of the world it would be good to add that feature. Then there's no
such beast as an “inline” extension so much as a way to install an
extension over the protocol, without file-system-level access to the
production server. This would need to be superuser-only, of course.

That opens up the possibility of shipping the modules to any standby
server too: the situation now is unfortunate in that running CREATE
EXTENSION hstore on the primary and then using it in some indexes means
the standby has no way to use those indexes until you notice and install
the same extension files there.

The module itself could be accepted as a bytea value and written in the
right place, where the server knows to load it from.

Now you can dump/restore any extension fully, and we can even ship any
extension in the WAL stream (a new message is needed though).
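
Something like this, say (entirely hypothetical: no such function exists
today, and the truncated bytea literal stands in for the module's full
contents):

-- Accept the compiled module over the protocol as a bytea value and
-- write it where the server knows to load it from (superuser only;
-- \x7f454c46 is just the ELF magic number, the rest is elided).
SELECT pg_install_extension_module('myext.so', '\x7f454c46'::bytea);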

The only remaining issue would be the default policy as far as including
or avoiding an extension in the dump is concerned, and I would be ok
with a default of including none as of now and a pg_dump switch that you
can repeat to include whichever extension you wish to.  The effect is to
transport the files and install them in the expected place on the target
server where you restore.

And now we really process all extensions the same and you can update
inline an extension that you installed the 9.1 way, and vice versa.

Don't let me speak about extension distribution facilities just now :)

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements

2012-01-23 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Mon, Jan 23, 2012 at 9:59 AM, Marko Kreen mark...@gmail.com wrote:
 Now that I think about it, same applies to bytea_output?

 Probably so.  But I think we need not introduce quite so many new
 threads on this patch.  This is, I think, at least thread #4, and
 that's making the discussion hard to follow.

Well, this is independent of the proposed patch, so I think a separate
thread is okay.  The question is shouldn't bytea_output be marked
GUC_REPORT?  I think that probably it should be, though I wonder
whether we're not too late.  Clients relying on it to be transmitted are
not going to work with existing 9.0 or 9.1 releases; so maybe changing
it to be reported going forward would just make things worse.

regards, tom lane



Re: [HACKERS] Re: Add minor version to v3 protocol to allow changes without breaking backwards compatibility

2012-01-23 Thread Marko Kreen
On Mon, Jan 23, 2012 at 5:34 PM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Jan 23, 2012 at 9:36 AM, Florian Weimer fwei...@bfk.de wrote:
 * Robert Haas:
 In this particular case, I knew that the change was coming and could
 push updated Java and Perl client libraries well before the server-side
 change hit our internal repository, but I really don't want to have to
 pay attention to such details.

 But if we *don't* turn this on by default, then chances are very good
 that it will get much less use.  That doesn't seem good either.  If
 it's important enough to do it at all, then IMHO it's important enough
 for it to be turned on by default.  We have never made any guarantee
 that the binary format won't change from release to release.

 The problem here is libpq-style drivers which expose the binary format
 to the application.  The Java driver doesn't do that, but the Perl
 driver does.  (However, both transparently decode BYTEA values received
 in text format, which led to the compatibility issue.)

 I can see where that could cause some headaches... but it seems to me
 that if we take that concern seriously, it brings us right back to
 square one.  If breaking the binary format causes too much pain to
 actually go do it, then we shouldn't change it until we're breaking
 everything else anyway (i.e. new protocol version, as Tom suggested).

My suggestion - please avoid per-session protocol state.  Either
something is Postgres version-dependent or it can be toggled/tracked per
request.  That means any data can either be passed through, or you need
to understand the formats of Postgres version X.Y.

This kind of hints at a per-request gimme-formats-from-version-x.y
flag for an ExecuteV4 packet.  Or some equivalent of it.


Anything that cannot be processed without tracking per-session
state over the whole stack (poolers/client frameworks) is a major pain
to maintain.

-- 
marko



Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements

2012-01-23 Thread Marko Kreen
On Mon, Jan 23, 2012 at 11:20:52AM -0500, Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
  On Mon, Jan 23, 2012 at 9:59 AM, Marko Kreen mark...@gmail.com wrote:
  Now that I think about it, same applies to bytea_output?
 
  Probably so.  But I think we need not introduce quite so many new
  threads on this patch.  This is, I think, at least thread #4, and
  that's making the discussion hard to follow.
 
 Well, this is independent of the proposed patch, so I think a separate
 thread is okay.  The question is shouldn't bytea_output be marked
 GUC_REPORT?  I think that probably it should be, though I wonder
 whether we're not too late.  Clients relying on it to be transmitted are
 not going to work with existing 9.0 or 9.1 releases; so maybe changing
 it to be reported going forward would just make things worse.


Well, in a complex setup it can change under you at will,
but as clients can process the data without knowing the
server state, maybe it's not a big problem.  (Unless there
are old clients in the mix...)

Perhaps we can leave it as-is?

But this leaves the question of future policy for
data format changes in the protocol.  Note I'm talking
about both text and binary formats here together,
although we could have different policies for them.

Also note that any kind of per-session flag is basically a GUC.


Question 1 - how does client know about which format data is?

1) new format is detectable from lossy GUC
2) new format is detectable from GUC_REPORT
3) new format is detectable from Postgres version
4) new format was requested in query (V4 proto)
5) new format is detectable from data (\x in bytea)

1. obviously does not work.
2. works, but requires changes across all infrastructure.
3. works and is simple, but painful.
4. is good, but in the future
5. is good, now


Question 2 - how does client request new format?

1) Postgres new version forces it.
2) GUC_REPORT + non-detectable data
3) Lossy GUC + autodetectable data
4) GUC_REPORT + autodetectable data
5) Per-request data (V4 proto)

1. is painful.
2. is painful - all infra components need to know about the GUC.
3. and 4. are both ugly and non-maintainable in the long term.  The only
   difference is that with 3) the infrastructure can give slight
   guarantees that it does not change under the client.
5. seems good...


Btw, it does not seem that a per-request metainfo change requires a
major version.  The client can just send an extra metainfo packet
before bind+execute, if it knows the server version is good enough.
For older servers it can simply skip the extra info.  [Oh yeah,
that requires that the data format is always autodetectable.]

My conclusions:

1. Any change in data format should be compatible with old data.
   IOW - if a client requested the new data format, it should always
   accept the old format too.

2. Can we postpone minor data format changes on the wire until there
   is a proper way for clients to request on-the-wire formats?

-- 
marko




Re: [HACKERS] Collect frequency statistics for arrays

2012-01-23 Thread Alexander Korotkov
On Mon, Jan 23, 2012 at 7:58 PM, Noah Misch n...@leadboat.com wrote:

  + /* Take care about events with low probabilities. */
  + if (rest > DEFAULT_CONTAIN_SEL)
  + {

 Why the change from rest > 0 to this in the latest version?

Earlier, the addition of the rest distribution required O(m) time. Now
there is a more accurate and proven estimate, but it takes O(m^2) time.
It doesn't make the overall asymptotic time worse, but it is
significant. That's why I decided to skip it for low values of rest,
which don't change the distribution significantly.
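
For reference, the Poisson assumption mentioned above would model the
number of occurrences of the remaining low-frequency elements as (a
sketch of the stated assumption; as I read the thread, λ is the summed
per-element probability mass in rest):

P(X = k) = \frac{\lambda^{k} e^{-\lambda}}{k!}, \quad k = 0, 1, 2, \ldots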



  + /* emit some statistics for debug purposes */
  + elog(DEBUG3, "array: target # mces = %d, bucket width = %d, "
  +      "# elements = %llu, hashtable size = %d, usable entries = %d",
  +      num_mcelem, bucket_width, element_no, i, track_len);

 That should be UINT64_FMT.  (I introduced that error in v0.10.)


 I've attached a new version that includes the UINT64_FMT fix, some edits of
 your newest comments, and a rerun of pgindent on the new files.  I see no
 other issues precluding commit, so I am marking the patch Ready for
 Committer.

Great!


 If I made any of the comments worse, please post another update.

Changes looks reasonable for me. Thanks!

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] Multithread Query Planner

2012-01-23 Thread Merlin Moncure
On Fri, Jan 13, 2012 at 2:29 PM, Christopher Browne cbbro...@gmail.com wrote:
 On Fri, Jan 13, 2012 at 3:14 PM, Frederico zepf...@gmail.com wrote:
 Hi folks.

 Is there any restriction on creating and starting threads inside Postgres?

 I'm trying to develop a multithreaded planner, and sometimes a memory
 access exception is raised.

 I'm debugging the code to see if there is a bug in the planner, but so far
 I haven't found one. I tried using the same memory context as the root
 process, and creating a new context for each new thread, but neither worked.


 Any tips?

 Yes, don't try to use threads.

 http://wiki.postgresql.org/wiki/Developer_FAQ#Why_don.27t_you_use_threads.2C_raw_devices.2C_async-I.2FO.2C_.3Cinsert_your_favorite_wizz-bang_feature_here.3E.3F

 ... threads are not currently used instead of multiple processes for
 backends because:

Yes, but OP is proposing to use multiple threads inside the forked
execution process.  That's a completely different beast.  Many other
databases support parallel execution of a single query and it might
very well be better/easier to do that with threads.

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements

2012-01-23 Thread Tom Lane
Marko Kreen mark...@gmail.com writes:
 [ bytea_output doesn't need to be GUC_REPORT because format is autodetectable 
 ]

Fair enough.  Anyway we're really about two years too late to revisit that.

 Btw, it does not seem that a per-request metainfo change requires a
 major version.  The client can just send an extra metainfo packet
 before bind+execute, if it knows the server version is good enough.

That is nonsense.  You're changing the protocol, and then saying
that clients should consult the server version instead of the
protocol version to know what to do.

 2. Can we postpone minor data format changes on the wire until there
    is a proper way for clients to request on-the-wire formats?

I think that people are coming around to that position, ie, we need
a well-engineered solution to the versioning problem *first*, and
should not accept incompatible minor improvements until we have that.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Measuring relation free space

2012-01-23 Thread Alvaro Herrera

Excerpts from Noah Misch's message of vie ene 20 22:33:30 -0300 2012:
 On Fri, Jan 20, 2012 at 07:03:22PM -0500, Jaime Casanova wrote:
  On Wed, Jan 18, 2012 at 7:01 PM, Noah Misch n...@leadboat.com wrote:
   On Wed, Jan 18, 2012 at 09:46:20AM -0500, Jaime Casanova wrote:
  
   ignoring all non-leaf pages still gives a considerable difference
   between pgstattuple and relation_free_space()
  
   pgstattuple() counts the single B-tree meta page as always-full, while
   relation_free_space() skips it for all purposes.  For tiny indexes, that
   can shift the percentage dramatically.
  
  
  ok, i will reformulate the question. why is it fine to ignore non-leaf
  pages but not fine to ignore the meta page?
 
 pgstattuple() figures the free_percent by adding up all space available to
 hold tuples and dividing that by the simple size of the relation.  Non-leaf
 pages and the meta page get identical treatment: both never hold tuples, so
 they do not contribute to the free space.

Hm.  Leaf pages hold as many tuples as non-leaf pages, no?  I mean
for each page element there's a value and a CTID.  In non-leaf pages those
CTIDs point to other index pages, one level down the tree; in leaf pages
they point to the heap.

The metapage is special in that it is not used to store any user data,
just a pointer to the root page.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements

2012-01-23 Thread A.M.

On Jan 23, 2012, at 2:49 PM, Tom Lane wrote:

 Marko Kreen mark...@gmail.com writes:
 [ bytea_output doesn't need to be GUC_REPORT because format is 
 autodetectable ]
 
 Fair enough.  Anyway we're really about two years too late to revisit that.
 
  Btw, it does not seem that a per-request metainfo change requires a
  major version.  The client can just send an extra metainfo packet
  before bind+execute, if it knows the server version is good enough.
 
 That is nonsense.  You're changing the protocol, and then saying
 that clients should consult the server version instead of the
 protocol version to know what to do.
 
  2. Can we postpone minor data format changes on the wire until there
    is a proper way for clients to request on-the-wire formats?
 
 I think that people are coming around to that position, ie, we need
 a well-engineered solution to the versioning problem *first*, and
 should not accept incompatible minor improvements until we have that.

One simple way clients could detect the binary encoding at startup would be to 
pass known test parameters and match against the returned values. If the client 
cannot match the response, then it should choose the text representation.
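
As a rough sketch of that probe (run once at startup; the function below
is made up for illustration, not proposed libpq API):

#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>
#include <libpq-fe.h>

/*
 * Request a value whose binary encoding this client was built to
 * understand; if the server's bytes don't match, fall back to text.
 */
static int
binary_int4_understood(PGconn *conn)
{
	PGresult   *res;
	int			ok = 0;

	res = PQexecParams(conn, "SELECT 42::int4",
					   0, NULL, NULL, NULL, NULL,
					   1);		/* ask for binary result format */

	if (PQresultStatus(res) == PGRES_TUPLES_OK &&
		PQntuples(res) == 1 &&
		PQgetlength(res, 0, 0) == 4)
	{
		uint32_t	v;

		memcpy(&v, PQgetvalue(res, 0, 0), sizeof(v));
		ok = (ntohl(v) == 42);	/* int4 arrives in network byte order */
	}
	PQclear(res);
	return ok;
}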

Alternatively, the 16-bit int in the Bind and RowDescription messages could be 
incremented to indicate a new format and then clients can specify the highest 
version of the binary format which they support.

Cheers,
M 
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] JSON for PG 9.2

2012-01-23 Thread Peter Eisentraut
On sön, 2012-01-22 at 11:43 -0500, Andrew Dunstan wrote:
 Actually, given recent discussion I think that test should just be 
 removed from json.c. We don't actually have any test that the code
 point is valid (e.g. that it doesn't refer to an unallocated code
 point). We don't do that elsewhere either - the unicode_to_utf8()
 function the scanner uses to turn \u escapes into utf8 doesn't
 look for unallocated code points. I'm not sure how much other
 validation we should do - for example on correct use of surrogate
 pairs.

We do check the correctness of surrogate pairs elsewhere.  Search for
"surrogate" in scan.l; it should be easy to copy.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Client Messages

2012-01-23 Thread Jim Mlodgenski
On Wed, Jan 18, 2012 at 9:19 AM, Jim Mlodgenski jimm...@gmail.com wrote:
 On Wed, Jan 18, 2012 at 3:08 AM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 On 18.01.2012 07:49, Fujii Masao wrote:

 On Fri, Jan 6, 2012 at 1:38 AM, Jim Mlodgenski jimm...@gmail.com  wrote:

 I have a need to send banner messages to a psql client: messages that I can
 set on the server and that will be displayed by any psql client that
 connects to the database. This would be mostly used as an additional
 indicator of which database you are connecting to, but could also be used
 by people to force their users to see a security message when connecting
 to the database. The attached patch will allow you to execute

 ALTER DATABASE postgres SET client_message =
   E'\nBEWARE: You are connecting to a production database. If you do anything to\n        bring this server down, you will be destroyed by your supreme overlord.\n\n';

 And then when you connect to psql, you will see:

 [e3@workstation bin]$ ./psql -U user1 postgres
 psql (9.2devel)

 
 BEWARE: You are connecting to a production database. If you do anything
 to
        bring this server down, you will be destroyed by your supreme
 overlord.

 

 Type "help" for help.

 postgres=>


 Any feedback is welcome.


 Adding a new GUC parameter only for the purpose of warning psql users
 seems overkill to me.  Basically we try to reduce the number of GUC
 parameters to make configuration easier for users, so I don't think it's
 a good idea to add a new GUC for such a small benefit.


 It seems quite useful to me...


 Instead, how
 about using the .psqlrc file and writing a warning message in it using
 the \echo command?


 That's not the same thing at all. Each client would need to put the warning
 in that file, and you'd get it regardless of the database you connect to.


 Anyway, I found one problem in the patch. The patch defines client_message
 as a PGC_USERSET parameter, which means that any psql user can falsify a
 warning message, e.g., by setting the environment variable PGOPTIONS
 to -c client_message=hoge. This seems to be something to avoid from a
 security point of view.


 I don't think that's a problem, it's just a free-form message to display.
 But it also doesn't seem very useful to have it PGC_USERSET: if it's only
 displayed at connect time, there's no point in changing it after connecting.
 Should we make it PGC_BACKEND?


 The only security problem that I can think of is a malicious server
 (man-in-the-middle perhaps) that sends a banner that confuses the user.

 Docs for PQparameterStatus() need adjustment, now that client_message is
 also one of the settings automatically reported to the client.
 I'll add the docs for that..


 The placement of the banner in psql currently looks like this:

 $ psql postgres

 psql (9.2devel)
 Hello world!
 Type "help" for help.


 or

 postgres=# \c postgres
 Hello world!
 You are now connected to database "postgres" as user "heikki".


 Are we happy with that? I think it would be better to print the banner just
 before the prompt:
 I like that better. I'll make that change as well.

Here is the revised patch based on the feedback.



 psql (9.2devel)
 Type "help" for help.

 Hello world!

 postgres=# \c postgres
 You are now connected to database "postgres" as user "heikki".

 Hello world!
 postgres=#

 Should we prefix the banner with something that makes it clear that it's a
 message coming from the server? Something like:
 I don't think the default prefix adds much for the user. If the
 administrator wants to let the user know that it's from the server, he
 can add it to the message.


 psql (9.2devel)
 Type "help" for help.

 Notice from server: Hello world!

 postgres=# \c postgres
 You are now connected to database "postgres" as user "heikki".
 Notice from server: Hello world!
 postgres=#

 --
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e55b503..04bc671 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5324,6 +5324,19 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
   </listitem>
  </varlistentry>
 
+ <varlistentry id="guc-client-message" xreflabel="client_message">
+  <term><varname>client_message</varname> (<type>string</type>)</term>
+  <indexterm>
+   <primary><varname>client_message</varname> configuration parameter</primary>
+  </indexterm>
+  <listitem>
+   <para>
+    The <varname>client_message</varname> can be any string that will be
+    displayed to the user in the banner of psql.
+   </para>
+  </listitem>
+ </varlistentry>
+
  </variablelist>
 </sect2>
</sect1>
diff --git 

Re: [HACKERS] Re: Add minor version to v3 protocol to allow changes without breaking backwards compatibility

2012-01-23 Thread Robert Haas
On Mon, Jan 23, 2012 at 11:15 AM, Noah Misch n...@leadboat.com wrote:
 As I said upthread, and you appeared to agree, the protocol is independent of
 individual data type send/recv formats.  Even if we were already adding
 protocol v4 to PostgreSQL 9.2, having array_send() change its behavior in
 response to the active protocol version would be wrong.

Sure, it's not directly related to the active protocol version, but my
point is that we need to decide whether we need an autonegotiation
mechanism of some kind, or not.  We can reasonably decide that:

1. It's OK to change the binary format incompatibly, and clients must
be prepared to deal with that, possibly assisted by a
backward-compatibility GUC.

-or else-

2. It's not OK to change the binary format incompatibly, and
therefore we need some kind of negotiation mechanism to make sure that
we give the new and improved format only to clients that can cope with
it.

Not being responsible for the maintenance of any PostgreSQL drivers
whatsoever, I don't have a strong feeling about which of these is the
case, and I'd like us to hear from the people who do.  What I do think
is that we can't look at a GUC (however named) as a poor man's
replacement for #2.  It's not gonna work, or at least not very well.
If the default is off, then clients have to go through a round-trip to
turn it on, which means that every client will have to decide whether
to pay the price of turning it on (and possibly not recoup the
investment) or whether to leave it off (and possibly get hosed if many
large arrays that would have met the criteria for the optimization are
transferred).  Furthermore, if we turn it on by default, drivers and
applications that haven't been updated will deliver corrupted results.
 None of that sounds very good to me.  If there are enough
dependencies on the details of the binary format that we can't afford
to just change it, then we'd better have a cheap and reliable way for
clients to negotiate upward - and a GUC is not going to give us that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements

2012-01-23 Thread Merlin Moncure
On Mon, Jan 23, 2012 at 2:00 PM, A.M. age...@themactionfaction.com wrote:
 One simple way clients could detect the binary encoding at startup would be 
 to pass known test parameters and match against the returned values. If the 
 client cannot match the response, then it should choose the text 
 representation.

 Alternatively, the 16-bit int in the Bind and RowDescription messages could 
 be incremented to indicate a new format and then clients can specify the 
 highest version of the binary format which they support.

Prefer the version.  But why send this over and over with each bind?
Wouldn't you negotiate that when connecting? Most likely, optionally,
doing as much as you can from the server version?  Personally I'm not
really enthusiastic about a solution that adds a non-avoidable penalty
to all queries.

Also, a small nit: this problem is not specific to binary formats.
Text formats can and do change, albeit rarely, with predictable
headaches for the client.  I see no reason to deal with text/binary
differently.  The only difference between text/binary wire formats in
my eyes are that the text formats are documented.

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PL/pgSQL return value in after triggers

2012-01-23 Thread Pavel Stehule
Hello Peter

I checked the code, and I don't think this is good.

The design with an optional NULL leads to inconsistent syntax.

RETURN (OLD, NEW, NULL, /* nothing */) is not consistent.

But my main argument is the unintuitive behavior of BEFORE triggers
after this change.

When somebody writes a BEFORE trigger function like:

BEGIN
  RAISE NOTICE '%', NEW.x;
  RETURN;
END;

they don't expect all rows to be lost.

The preferred default return value for a BEFORE INSERT or UPDATE trigger
should be NEW, and for a DELETE trigger it should be OLD - not NULL.

And because we cannot distinguish between BEFORE and AFTER triggers in
the parser, I propose not to change the current behavior. The current
behavior is not too friendly - but it is consistent with simple rules.
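
For illustration, the conventional form of the function above under the
current rules (plain PL/pgSQL; only the function name is invented):

CREATE FUNCTION note_x() RETURNS trigger AS $$
BEGIN
    RAISE NOTICE '%', NEW.x;
    RETURN NEW;  -- in a BEFORE trigger, RETURN NULL would discard the row
END;
$$ LANGUAGE plpgsql;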

Regards

Pavel


2012/1/2 Peter Eisentraut pete...@gmx.net:
 On mån, 2011-02-28 at 19:07 +0200, Peter Eisentraut wrote:
 PL/pgSQL trigger functions currently require a value to be returned,
 even though that value is not used for anything in case of a trigger
 fired AFTER.  I was wondering if we could relax that.  It would make
 things a bit more robust and produce clearer PL/pgSQL code.  The
 specific case I'm concerned about is that a trigger function could
 accidentally be run in a BEFORE trigger even though it was not meant for
 that.  It is common practice that trigger functions for AFTER triggers
 return NULL, which would have unpleasant effects if used in a BEFORE
 trigger.

 I think it is very uncommon to have the same function usable for BEFORE
 and AFTER triggers, so it would be valuable to have coding support
 specifically for AFTER triggers.  We could just allow RETURN without
 argument, or perhaps no RETURN at all.

 Here is a patch for that.

 One thing that I'm concerned about with this is that it treats a plain
 RETURN in a BEFORE trigger as RETURN NULL, whereas arguably it should be
 an error.  I haven't found a good way to handle that yet, but I'll keep
 looking.



 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Add minor version to v3 protocol to allow changes without breaking backwards compatibility

2012-01-23 Thread Merlin Moncure
On Mon, Jan 23, 2012 at 3:06 PM, Robert Haas robertmh...@gmail.com wrote:
 Not being responsible for the maintenance of any PostgreSQL drivers
 whatsoever, I don't have a strong feeling about which of these is the
 case, and I'd like us to hear from the people who do.

I'm just gonna come right out and say that GUC-based solutions are not
the way -- the bytea encoding change is a perfect example of that.
IMSNHO, there are only two plausible paths ahead:

1) document the binary formats and continue with 'go at your own risk'
2) full auto-negotiation for all wire formats (text and binary).

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Add minor version to v3 protocol to allow changes without breaking backwards compatibility

2012-01-23 Thread Marko Kreen
On Mon, Jan 23, 2012 at 11:06 PM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Jan 23, 2012 at 11:15 AM, Noah Misch n...@leadboat.com wrote:
 As I said upthread, and you appeared to agree, the protocol is independent of
 individual data type send/recv formats.  Even if we were already adding
 protocol v4 to PostgreSQL 9.2, having array_send() change its behavior in
 response to the active protocol version would be wrong.

 Sure, it's not directly related to the active protocol version, but my
 point is that we need to decide whether we need an autonegotiation
 mechanism of some kind, or not.  We can reasonably decide that:

 1. It's OK to change the binary format incompatibly, and clients must
 be prepared to deal with that, possibly assisted by a
 backward-compatibility GUC.

 -or else-

 2. It's not OK to change the binary format incompatibly, and
 therefore we need some kind of negotiation mechanism to make sure that
 we give the new and improved format only to clients that can cope with
 it.

 Not being responsible for the maintenance of any PostgreSQL drivers
 whatsoever, I don't have a strong feeling about which of these is the
 case, and I'd like us to hear from the people who do.  What I do think
 is that we can't look at a GUC (however named) as a poor man's
 replacement for #2.  It's not gonna work, or at least not very well.
 If the default is off, then clients have to go through a round-trip to
 turn it on, which means that every client will have to decide whether
 to pay the price of turning it on (and possibly not recoup the
 investment) or whether to leave it off (and possibly get hosed if many
 large arrays that would have met the criteria for the optimization are
 transferred).  Furthermore, if we turn it on by default, drivers and
 applications that haven't been updated will deliver corrupted results.
  None of that sounds very good to me.  If there are enough
 dependencies on the details of the binary format that we can't afford
 to just change it, then we'd better have a cheap and reliable way for
 clients to negotiate upward - and a GUC is not going to give us that.

Trying to solve it with startup-time negotiation, or some GUC,
is a dead end, in the sense that it will actively discourage any
kind of pass-through protocol processing.  If a simple protocol
processor (~pgbouncer) needs to know about some GUC
and tune it on the fly, it's not a payload feature, it's a protocol feature.

Instead this should be solved by extending the per-query text/bin
flag to include version info and maybe also type groups -
some way of saying "numerics:9.0-bin, the-rest:8.4-text".

The groups would also solve the problem that there is no way
to turn on binary formats for result columns safely.

The flags could be text (~HTTP Accept), or maybe integers
for more efficiency (group code: group format version).
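
A rough sketch of the shape such a flag could take; every name and code
below is invented for illustration:

#include <stdint.h>

/*
 * Hypothetical replacement for the bare int16 text/binary format code:
 * one entry per type group, each carrying its own wire-format version
 * ("numerics:9.0-bin, the-rest:8.4-text").
 */
typedef struct FormatRequest
{
	uint16_t	group_code;		/* e.g. 0 = everything else, 1 = numerics */
	uint16_t	format;			/* 0 = text, 1 = binary */
	uint16_t	format_version;	/* wire-format version for that group */
} FormatRequest;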

-- 
marko

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements

2012-01-23 Thread A.M.

On Jan 23, 2012, at 4:45 PM, Merlin Moncure wrote:

 On Mon, Jan 23, 2012 at 2:00 PM, A.M. age...@themactionfaction.com wrote:
 One simple way clients could detect the binary encoding at startup would be 
 to pass known test parameters and match against the returned values. If the 
 client cannot match the response, then it should choose the text 
 representation.
 
 Alternatively, the 16-bit int in the Bind and RowDescription messages could 
 be incremented to indicate a new format and then clients can specify the 
 highest version of the binary format which they support.
 
 Prefer the version.  But why send this over and over with each bind?
 Wouldn't you negotiate that when connecting? Most likely, optionally,
 doing as much as you can from the server version?  Personally I'm not
 really enthusiastic about a solution that adds a non-avoidable penalty
 to all queries.
 
 Also, a small nit: this problem is not specific to binary formats.
 Text formats can and do change, albeit rarely, with predictable
 headaches for the client.  I see no reason to deal with text/binary
 differently.  The only difference between text/binary wire formats in
 my eyes are that the text formats are documented.
 
 merlin


In terms of backwards compatibility (to support the widest range of clients), 
wouldn't it make sense to freeze each format option? That way, an updated text 
version could also assume a new int16 format identifier. The client would 
simply pass its preferred format. This could also allow for multiple in-flight 
formats; for example, if a client anticipates a large in-bound bytea column, it 
could specify format X which indicates the server should gzip the result 
before sending. That same format may not be preferable on a different request.

Cheers,
M




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] JSON for PG 9.2

2012-01-23 Thread Merlin Moncure
On Sun, Jan 15, 2012 at 10:08 AM, Andrew Dunstan and...@dunslane.net wrote:
 Here's an update that adds row_to_json, plus a bit more cleanup.

why not call all these functions 'to_json' and overload them?

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] JSON for PG 9.2

2012-01-23 Thread Andrew Dunstan



On 01/23/2012 05:21 PM, Merlin Moncure wrote:

On Sun, Jan 15, 2012 at 10:08 AM, Andrew Dunstan and...@dunslane.net  wrote:

Here's an update that adds row_to_json, plus a bit more cleanup.

why not call all these functions 'to_json' and overload them?




I don't honestly feel that advances clarity much. And we might want to 
overload each at some stage with options that are specific to the datum 
type. We have various foo_to_xml() functions now.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] JSON for PG 9.2

2012-01-23 Thread Pavel Stehule
2012/1/23 Merlin Moncure mmonc...@gmail.com:
 On Sun, Jan 15, 2012 at 10:08 AM, Andrew Dunstan and...@dunslane.net wrote:
 Here's an update that adds row_to_json, plus a bit more cleanup.

 why not call all these functions 'to_json' and overload them?

-1

older proposal is more consistent with xml functions

Pavel


 merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements

2012-01-23 Thread Merlin Moncure
On Mon, Jan 23, 2012 at 4:12 PM, A.M. age...@themactionfaction.com wrote:
 On Jan 23, 2012, at 4:45 PM, Merlin Moncure wrote:
 Prefer the version.  But why send this over and over with each bind?
 Wouldn't you negotiate that when connecting? Most likely, optionally,
 doing as much as you can from the server version?  Personally I'm not
 really enthusiastic about a solution that adds a non-avoidable penalty
 to all queries.

 In terms of backwards compatibility (to support the widest range of clients), 
 wouldn't it make sense to freeze each format option? That way, an updated 
 text version could also assume a new int16 format identifier. The client 
 would simply pass its preferred format. This could also allow for multiple 
 in-flight formats; for example, if a client anticipates a large in-bound 
 bytea column, it could specify format X which indicates the server should 
 gzip the result before sending. That same format may not be preferable on a 
 different request.

hm.  well, I'd say that you're much better off if you can hold to the
principle that newer versions of the format are always better and
should be used if both the application and the server agree.  Using
your example, since you can already do something like:

select zlib_compress(byteacol) from foo;

I'm not sure that you're getting anything with that user-facing
complexity.  The only realistic case I can see for explicit control of
wire formats chosen is to defend your application from format changes
in the server when upgrading the server and/or libpq.   This isn't a
"let's get better compression" problem, this is an "I upgraded my
database and my application broke" problem.

Fixing this problem in a non-documentation fashion is going to require a
full protocol change, period.  It's the only way we can safely get all
the various players (libpq, jdbc, etc) on the same page without
breaking/recompiling millions of lines of old code that is currently
in production.  The new protocol should *require* at minimum the
application, not libpq, to explicitly send the version of the database
it was coded against.  That's just not getting sent now, and without
that information there's no realistic way to prevent application
breakage -- depending on libpq versions is useless since it can be
upgraded and there's always jdbc to deal with.
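
Until there is such a protocol, the closest approximation is for the
application to compare the server's version against the one it was
built for; PQserverVersion() is real libpq API, but the policy around
it is only a sketch:

#include <stdio.h>
#include <libpq-fe.h>

#define BUILT_AGAINST_PG_VERSION 90200	/* 9.2.0, in PQserverVersion() form */

static void
warn_if_server_newer(PGconn *conn)
{
	/* The application, not libpq, knows what it was coded against. */
	if (PQserverVersion(conn) > BUILT_AGAINST_PG_VERSION)
		fprintf(stderr, "warning: server is newer than this application; "
				"wire formats may have changed\n");
}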

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Measuring relation free space

2012-01-23 Thread Noah Misch
On Mon, Jan 23, 2012 at 04:56:24PM -0300, Alvaro Herrera wrote:
 Excerpts from Noah Misch's message of vie ene 20 22:33:30 -0300 2012:
  pgstattuple() figures the free_percent by adding up all space available to
  hold tuples and dividing that by the simple size of the relation.  Non-leaf
  pages and the meta page get identical treatment: both never hold tuples, so
  they do not contribute to the free space.
 
 Hm.  Leaf pages hold as much tuples as non-leaf pages, no?  I mean
 for each page element there's a value and a CTID.  In non-leaf those
 CTIDs point to other index pages, one level down the tree; in leaf pages
 they point to the heap.

That distinction seemed important when I sent my last message, but now I agree
that it's largely irrelevant for free space purposes.  If someone feels like
doing it, +1 for making pgstattuple() count non-leaf free space.

Thanks,
nm

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] basic pgbench runs with various performance-related patches

2012-01-23 Thread Simon Riggs
On Mon, Jan 23, 2012 at 3:49 PM, Robert Haas robertmh...@gmail.com wrote:

 The other patches have clearer and specific roles without heuristics
 (mostly), so are at least viable for 9.2, though still requiring
 agreement.

 I think we must also drop removebufmgrfreelist-v1 from consideration,
...

I think you misidentify the patch. Earlier you said that
buffreelistlock-reduction-v1 "crapped out", and I already said that
the assumption in the code clearly doesn't hold, implying the patch
was dropped.

The removebufmgrfreelist and its alternate patch is still valid, with
applicability to special cases.

I've written another patch to assist with testing/assessment of the
problems, attached.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index 3e62448..36b0160 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -17,6 +17,7 @@
 
 #include "storage/buf_internals.h"
 #include "storage/bufmgr.h"
+#include "utils/timestamp.h"
 
 
 /*
@@ -41,6 +42,21 @@ typedef struct
 	 */
 	uint32		completePasses; /* Complete cycles of the clock sweep */
 	uint32		numBufferAllocs;	/* Buffers allocated since last reset */
+
+	/*
+	 * Wait Statistics
+	 */
+	long	waitBufferAllocSecs;
+	int		waitBufferAllocUSecs;
+	int		waitBufferAlloc;
+
+	long	waitBufferFreeSecs;
+	int		waitBufferFreeUSecs;
+	int		waitBufferFree;
+
+	long	waitSyncStartSecs;
+	int		waitSyncStartUSecs;
+	int		waitSyncStart;
 } BufferStrategyControl;
 
 /* Pointers to shared state */
@@ -125,7 +141,29 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
 
 	/* Nope, so lock the freelist */
 	*lock_held = true;
-	LWLockAcquire(BufFreelistLock, LW_EXCLUSIVE);
+	if (!LWLockConditionalAcquire(BufFreelistLock, LW_EXCLUSIVE))
+	{
+		TimestampTz waitStart = GetCurrentTimestamp();
+		TimestampTz waitEnd;
+		long		wait_secs;
+		int			wait_usecs;
+
+		LWLockAcquire(BufFreelistLock, LW_EXCLUSIVE);
+
+		waitEnd = GetCurrentTimestamp();
+
+		TimestampDifference(waitStart, waitEnd,
+							&wait_secs, &wait_usecs);
+
+		StrategyControl->waitBufferAllocSecs += wait_secs;
+		StrategyControl->waitBufferAllocUSecs += wait_usecs;
+		if (StrategyControl->waitBufferAllocUSecs >= 1000000)
+		{
+			StrategyControl->waitBufferAllocUSecs -= 1000000;
+			StrategyControl->waitBufferAllocSecs += 1;
+		}
+		StrategyControl->waitBufferAlloc++;
+	}
 
 	/*
 	 * We count buffer allocation requests so that the bgwriter can estimate
@@ -223,7 +261,29 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
 void
 StrategyFreeBuffer(volatile BufferDesc *buf)
 {
-	LWLockAcquire(BufFreelistLock, LW_EXCLUSIVE);
+	if (!LWLockConditionalAcquire(BufFreelistLock, LW_EXCLUSIVE))
+	{
+		TimestampTz waitStart = GetCurrentTimestamp();
+		TimestampTz waitEnd;
+		long		wait_secs;
+		int			wait_usecs;
+
+		LWLockAcquire(BufFreelistLock, LW_EXCLUSIVE);
+
+		waitEnd = GetCurrentTimestamp();
+
+		TimestampDifference(waitStart, waitEnd,
+							&wait_secs, &wait_usecs);
+
+		StrategyControl->waitBufferFreeSecs += wait_secs;
+		StrategyControl->waitBufferFreeUSecs += wait_usecs;
+		if (StrategyControl->waitBufferFreeUSecs >= 1000000)
+		{
+			StrategyControl->waitBufferFreeUSecs -= 1000000;
+			StrategyControl->waitBufferFreeSecs += 1;
+		}
+		StrategyControl->waitBufferFree++;
+	}
 
 	/*
 	 * It is possible that we are told to put something in the freelist that
@@ -256,7 +316,30 @@ StrategySyncStart(uint32 *complete_passes, uint32 *num_buf_alloc)
 {
 	int			result;
 
-	LWLockAcquire(BufFreelistLock, LW_EXCLUSIVE);
+	if (!LWLockConditionalAcquire(BufFreelistLock, LW_EXCLUSIVE))
+	{
+		TimestampTz waitStart = GetCurrentTimestamp();
+		TimestampTz waitEnd;
+		long		wait_secs;
+		int			wait_usecs;
+
+		LWLockAcquire(BufFreelistLock, LW_EXCLUSIVE);
+
+		waitEnd = GetCurrentTimestamp();
+
+		TimestampDifference(waitStart, waitEnd,
+							&wait_secs, &wait_usecs);
+
+		StrategyControl->waitSyncStartSecs += wait_secs;
+		StrategyControl->waitSyncStartUSecs += wait_usecs;
+		if (StrategyControl->waitSyncStartUSecs >= 1000000)
+		{
+			StrategyControl->waitSyncStartUSecs -= 1000000;
+			StrategyControl->waitSyncStartSecs += 1;
+		}
+		StrategyControl->waitSyncStart++;
+	}
+
 	result = StrategyControl->nextVictimBuffer;
 	if (complete_passes)
 		*complete_passes = StrategyControl->completePasses;
@@ -265,7 +348,59 @@ StrategySyncStart(uint32 *complete_passes, uint32 *num_buf_alloc)
 		*num_buf_alloc = StrategyControl->numBufferAllocs;
 		StrategyControl->numBufferAllocs = 0;
 	}
+	else
+	{
+		long	waitBufferAllocSecs;
+		int		waitBufferAllocUSecs;
+		int		waitBufferAlloc;
+
+		long	waitBufferFreeSecs;
+		int		waitBufferFreeUSecs;
+		int		waitBufferFree;
+
+		long	waitSyncStartSecs;
+		int		waitSyncStartUSecs;
+		int		waitSyncStart;
+
+		waitBufferAllocSecs = StrategyControl->waitBufferAllocSecs;
+		waitBufferAllocUSecs = 

Re: [HACKERS] Inline Extension

2012-01-23 Thread Daniel Farina
On Mon, Jan 23, 2012 at 8:17 AM, Dimitri Fontaine
dimi...@2ndquadrant.fr wrote:
 Robert Haas robertmh...@gmail.com writes:
 Ok, but then, what about .so files?  Wouldn't it make sense to be able
 to ship also the executable modules needed, and if not, why not?

 Now you can dump/restore any extension fully, and we can even ship any
 extension in the WAL stream (a new message is needed though).

Things are still a bit ugly in the more complex cases: consider
PostGIS's linkage against libproj and other libraries.  In order to
cover all cases, I feel that what I need is an optional hook (for the
sake of argument, let's say it's another command-type hook, like
archive_command) to be executed when extension (un)installation
occurs on a primary or is replayed on a standby, whereby I can acquire
the necessary dependencies for an extension or signal some kind of
error (as to exactly how that interfaces with the server is delicate,
should one want to supply good error messages to the user).

I think that hook could be useful for a number of reasons:

* Extension distribution (as long as we're downloading dependent
libraries, why not get the extension too?)

  * Extension distribution on standbys, too -- by much the same mechanism

* Extension whitelisting for non-superusers (are you allowed/can you
even have that extension?)

And, more to the point, if one wants to make replication and
extensions work nicely together, I don't really see an option outside
such a hook other than insisting on a form of packaging whereby all
dependencies are declared to Postgres and Postgres becomes the
dependency management system for all dependent binary assets.  That
could lead to a more cohesive system, but is also a pretty hefty
burden, both on this project and others.

But getting back to in-line extensions: I think it makes sense to dump
all extensions in their in-line representation even if in some
situations carrying a copy of the extension in the backup is not
strictly necessary.  The only bloat is including the literal
source text of the extension in the dump.  With an
extension-installation hook, the literal version of the extension
could be supplied but ignored if a side-channel mechanism for getting
the extension makes sense.
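
For concreteness, the kind of knob being described, modeled on
archive_command; the parameter name and its placeholders are purely
hypothetical:

# postgresql.conf sketch: run a command whenever an extension is
# (un)installed, or when the event is replayed on a standby.
# %e and %v (extension name and version) are invented placeholders.
extension_install_command = '/usr/local/bin/fetch-extension-deps %e %v'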

-- 
fdr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] basic pgbench runs with various performance-related patches

2012-01-23 Thread Robert Haas
On Mon, Jan 23, 2012 at 7:52 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Mon, Jan 23, 2012 at 3:49 PM, Robert Haas robertmh...@gmail.com wrote:

 The other patches have clearer and specific roles without heuristics
 (mostly), so are at least viable for 9.2, though still requiring
 agreement.

 I think we must also drop removebufmgrfreelist-v1 from consideration,
 ...

 I think you misidentify the patch. Earlier you said that
 buffreelistlock-reduction-v1 "crapped out", and I already said that
 the assumption in the code clearly doesn't hold, implying the patch
 was dropped.

Argh.  I am clearly having a senior moment here, a few years early.
So is it correct to say that both of the patches associated with
the message attached to the following CommitFest entry are now off the
table for 9.2?

https://commitfest.postgresql.org/action/patch_view?id=743

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Allow SQL-language functions to reference parameters by parameter name

2012-01-23 Thread Matthew Draper
On 19/01/12 20:28, Hitoshi Harada wrote:
 (Now it occurred to me that forgetting the #include parse_func.h might
 hit this breakage..., so I'll fix it here and continue to test, but if
 you'll fix it yourself, let me know)
 
 I fixed it here and it now works with my environment.

Well spotted; that's exactly what I'd done. :/


 The regression tests pass, and the feature seems to work as intended, but
 it seems to me that it needs more test cases and documentation. For the
 tests, I believe at least we need the ambiguous case given upthread, so
 that we can ensure we keep compatibility. For the documentation, it should
 describe the name resolution rule, as stated in the patch comment.

Attached are a new pair of patches, fixing the missing include (and the
other warning), plus addressing the above.

I'm still not sure whether to just revise (almost) all the SQL function
examples to use parameter names, and declare them the right choice; as
it's currently written, named parameters still seem rather second-class.


 Aside from that, I wondered at first what happens if the function is
 schema-qualified. Say,
 
 CREATE FUNCTION s.f(a int) RETURNS int AS $$
   SELECT b FROM t WHERE a = s.f.a
 $$ LANGUAGE sql;
 
 It actually errors out, since a function-name-qualified parameter
 reference only accepts the function name without the schema name, but it
 looked weird to me at first. No better idea from me at the moment, though.

By my reading of (a draft of) the spec, Subclause 6.6, "identifier
chain", Syntax Rules 8.b.i-iii, the current behaviour is correct.
But I join you in wondering whether we should permit the function name
to be schema-qualified anyway.


Matthew


-- 
matt...@trebex.net

diff --git a/doc/src/sgml/xfunc.sgml b/doc/src/sgml/xfunc.sgml
new file mode 100644
index 7064312..cc5b5ef
*** a/doc/src/sgml/xfunc.sgml
--- b/doc/src/sgml/xfunc.sgml
*************** SELECT getname(new_emp());
*** 538,556 ****
  <programlisting>
  CREATE FUNCTION tf1 (acct_no integer, debit numeric) RETURNS numeric AS $$
  UPDATE bank
! SET balance = balance - $2
! WHERE accountno = $1
  RETURNING balance;
  $$ LANGUAGE SQL;
  </programlisting>
  
   Here the first parameter has been given the name <literal>acct_no</>,
   and the second parameter the name <literal>debit</>.
!  So far as the SQL function itself is concerned, these names are just
!  decoration; you must still refer to the parameters as <literal>$1</>,
!  <literal>$2</>, etc within the function body.  (Some procedural
!  languages let you use the parameter names instead.)  However,
!  attaching names to the parameters is useful for documentation purposes.
   When a function has many parameters, it is also useful to use the names
   while calling the function, as described in
   <xref linkend="sql-syntax-calling-funcs">.
--- 538,580 ----
  <programlisting>
  CREATE FUNCTION tf1 (acct_no integer, debit numeric) RETURNS numeric AS $$
  UPDATE bank
! SET balance = balance - debit
! WHERE accountno = acct_no
  RETURNING balance;
  $$ LANGUAGE SQL;
  </programlisting>
  
   Here the first parameter has been given the name <literal>acct_no</>,
   and the second parameter the name <literal>debit</>.
!  Named parameters can still be referenced as
!  <literal>$<replaceable>n</></>; in this example, the second
!  parameter can be referenced as <literal>$2</>, <literal>debit</>,
!  or <literal>tf1.debit</>.
! </para>
! 
! <para>
!  If a parameter is given the same name as a column that is available
!  in the query, the name will refer to the column. To explicitly
!  refer to the parameter, you can qualify its name with the name of
!  the containing function. For example,
! 
! <programlisting>
! CREATE FUNCTION tf1 (accountno integer, debit numeric) RETURNS numeric AS $$
! UPDATE bank
! SET balance = balance - debit
! WHERE accountno = tf1.accountno
! RETURNING balance;
! $$ LANGUAGE SQL;
! </programlisting>
! 
!  This time, the first parameter has been given the ambiguous name
!  <literal>accountno</>.
!  Notice that inside the function body, <literal>accountno</> still
!  refers to <literal>bank.accountno</>, so <literal>tf1.accountno</>
!  must be used to refer to the parameter.
! </para>
! 
! <para>
   When a function has many parameters, it is also useful to use the names
   while calling the function, as described in
   <xref linkend="sql-syntax-calling-funcs">.

diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
new file mode 100644
index 5642687..fe87990
*** a/src/backend/executor/functions.c
--- b/src/backend/executor/functions.c
***************
*** 23,28 ****
--- 23,29 ----
  #include "nodes/makefuncs.h"
  #include "nodes/nodeFuncs.h"
  #include "parser/parse_coerce.h"
+ #include "parser/parse_func.h"
  #include "tcop/utility.h"
  #include "utils/builtins.h"
  #include "utils/datum.h"
*************** typedef SQLFunctionCache *SQLFunctionCac
*** 115,121 ****
--- 116,124 ----
   */
  

Re: [HACKERS] Next steps on pg_stat_statements normalisation

2012-01-23 Thread Peter Geoghegan
On 22 January 2012 05:30, Peter Geoghegan pe...@2ndquadrant.com wrote:
 The syntax for constants is sufficiently simple that I think that a
 good set of regression tests should make this entirely practicable,
 covering all permutations of relevant factors affecting how the
 implementation should act, including for example
 standard_conforming_strings. There's no reason to think that the SQL
 syntax rules for constants should need to change very frequently, or
 even at all, so we should be fine with just knowing the starting
 position. It's quicker and easier to do it this way than to argue the
 case for my original implementation, so that's what I'll do. Whatever
 overhead this independent, pg_stat_statements-internal const parsing
 may impose, it will at least only be paid once per query, when we
 first require a canonicalised representation of the query for the
 pg_stat_statements view.

I've decided that a better approach to this problem is to use the
low-level scanner API (declared in scanner.h) which is currently
exclusively used for plpgsql. This seems to work well, as I'm using
the authoritative scanner to scan constants. Obviously this does not
imply that everyone must pay any overhead, so this seems like the best
of both worlds.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Publish checkpoint timing and sync files summary data to pg_stat_bgwriter

2012-01-23 Thread Greg Smith

Robert Haas wrote:

On Sat, Jan 21, 2012 at 6:32 PM, Jeff Janes jeff.ja...@gmail.com wrote:
  

I'm finding the backend_writes column pretty unfortunate.  The only
use I know of for it is to determine if the bgwriter is lagging
behind.  Yet it doesn't serve even this purpose because it lumps
together the backend writes due to lagging background writes, and the
backend writes that happen by design due to the use of a buffer access
strategy during bulk inserts.



+1 for separating those.
  


I'm tied up with some on-site things until Friday; I can rev the patch
with this in mind (and clear some of my overall review work pile) then.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Vacuum rate limit in KBps

2012-01-23 Thread Greg Smith

Benedikt Grundmann wrote:
What I think is missing is a clear way to know if you are vacuuming 
(and analyzing) enough, and how much you are paying for that.  
  


Any good way to measure if you're vacuuming a particular table enough 
needs to note how much free space is in that table and its indexes over 
time.  That's why one of the other building blocks in the submission 
queue for 9.2 is a function to make that easier to do.  It's one of the 
important dependencies to get settled before we can move very far toward 
answering "am I vacuuming enough?".
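
For example, with the contrib module that already exists today (the
table name is a placeholder):

CREATE EXTENSION pgstattuple;
SELECT free_percent FROM pgstattuple('some_big_table');

though, as discussed in the "Measuring relation free space" thread,
pgstattuple's notion of free space has its own quirks.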


A second piece to that is recording a history of that information over 
time.  Can't predict the future need for something without some record 
of its past to extrapolate from.  That's probably a job better suited 
for an external tool.  The way you'd want to audit it most easily is to 
graph it over time, which isn't the sort of thing PostgreSQL is likely 
to build in.  Also, the proof of whether a suggested implementation for 
a vacuum meter was useful or not would be easiest to validate that 
way.  No sense in doing the difficult work of building one until there's 
a working prototype, which is possible to do more quickly in languages 
other than C.


A simple meter might not be possible to create even with some better 
building blocks to base it on.  There is a lot of difference in this 
area that is workload dependent, and there are many types of database 
workloads out there.  The two hardest systems to tune vacuum for that I 
work on have settled on completely different approaches to the problem.  
The only thing I've found so far that is true about both of them is that 
they'd really appreciate easier controls on the maximum rate.



At the moment we are basically changing the knobs blindly based on
some back-of-the-envelope calculations and hearsay.  Then sometimes,
months later, we find out that oops, we haven't been analyzing enough
and that's why on that particular table the planner is now picking
a bad plan.
  


Unlike VACUUM, ANALYZE is so cheap to run that it's possible to improve 
this situation more easily--just do it a lot more.  Lowering 
autovacuum_analyze_scale_factor is the easiest way.  By default that is 
0.10, requiring approximately a 10% change in the table size before a 
new ANALYZE is done.  I think the lowest production setting I have for 
that somewhere is 0.03 on a roughly terabyte scale database.  There a 
10% change in one the larger tables is well over the point of impacting 
query plans badly.


If your data changes its character quite frequently based on new 
information, I wouldn't be afraid in that case to drop as low as 0.01 
here.  That would give you an ANALYZE that happens 10X as often as it does 
now.  You'll waste a moderate amount of CPU and disk resources, but a 
tuning error that leans toward analyzing too frequently isn't that 
expensive.
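
In configuration terms, that advice is just (the table name here is a
placeholder):

-- Analyze this table roughly 10x more often than the 0.10 default,
-- without touching the global setting in postgresql.conf:
ALTER TABLE frequently_shifting_table
    SET (autovacuum_analyze_scale_factor = 0.01);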


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page Checksums

2012-01-23 Thread Robert Treat
On Sat, Jan 21, 2012 at 6:12 PM, Jim Nasby j...@nasby.net wrote:
 On Jan 10, 2012, at 3:07 AM, Simon Riggs wrote:
 I think we could add an option to check the checksum immediately after
 we pin a block for the first time but it would be very expensive and
 sounds like we're re-inventing hardware or OS features again. Work on
 50% performance drain, as an estimate.

 That is a level of protection no other DBMS offers, so that is either
 an advantage or a warning. Jim, if you want this, please do the
 research and work out what the probability of losing shared buffer
 data in your ECC RAM really is so we are doing it for quantifiable
 reasons (via old Google memory academic paper) and to verify that the
 cost/benefit means you would actually use it if we built it. Research
 into requirements is at least as important and time consuming as
 research on possible designs.

 Maybe I'm just dense, but it wasn't clear to me how you could use the 
 information in the google paper to extrapolate data corruption probability.

 I can say this: we have seen corruption from bad memory, and our Postgres 
 buffer pool (8G) is FAR smaller than
 available memory on all of our servers (192G or 512G). So at least in our 
 case, CRCs that protect the filesystem
 cache would protect the vast majority of our memory (96% or 98.5%).

Would it be unfair to assert that people who want checksums but aren't
willing to pay the cost of running a filesystem that provides
checksums aren't going to be willing to make the cost/benefit trade
off that will be asked for? Yes, it is unfair of course, but it's
interesting how small the camp of those using checksummed filesystems
is.

Robert Treat
conjecture: xzilla.net
consulting: omniti.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Vacuum rate limit in KBps

2012-01-23 Thread Robert Treat
On Mon, Jan 23, 2012 at 3:21 AM, Benedikt Grundmann
bgrundm...@janestreet.com wrote:
 On 19/01/12 17:39, Greg Smith wrote:
 On 1/19/12 1:10 PM, Robert Haas wrote:
 I have to say that I find that intensely counterintuitive.  The
 current settings are not entirely easy to tune correctly, but at least
 they're easy to explain.

 If there's anyone out there who has run a larger PostgreSQL database
 and not at some point been extremely frustrated with how the current
 VACUUM settings are controlled, please speak up and say I'm wrong
 about this. I thought it was well understood the UI was near unusably
 bad, it just wasn't obvious what to do about it.

 We are frustrated, but mostly our frustration is not about the
 somewhat inscrutable knobs but about the inscrutable meters, or lack
 thereof.


I keep thinking Greg has mistaken happiness with the MB-based info in
the vacuum patch as being happiness with the output format, when really
it is all about increased visibility. (For the record, we've
backpatched that initial change to a large number of our customers,
just because we're a bit zealous about monitoring.)

 Postgres (auto or manual for that matter) vacuuming and analyzing
 is essentially a performance tuning problem without a good way to
 measure the current performance, the fact that the knobs to turn
 are confusing as well is secondary.

 What I think is missing is a clear way to know if you are vacuuming
 (and analyzing) enough, and how much you are paying for that.

 At the moment we are basically changing the knobs blindly based on
 some back-of-the-envelope calculations and hearsay.  Then sometimes,
 months later, we find out that oops, we haven't been analyzing enough
 and that's why on that particular table the planner is now picking
 a bad plan.


Hmm, I've always thought the answer here is just a systematic approach
to operations. We monitor free space across the system (along with a
bunch of other stuff) so that we know when we're not vacuuming /
analyzing enough.

 What I want is that page

 http://www.postgresql.org/docs/8.4/static/routine-vacuuming.html

 to start with Here is how you know if you are vacuuming enough...

 In an ideal world one would like some meter in a statistics table
 or similar that returns a percentage: 100% means just enough, 50%
 means you have to double, 150% means 50% too much (e.g. wasted)...
 But I could do with a boolean as well.  A complicated extension
 and the recommendation to install 3 different extensions would
 be better than what is there right now but only very barely. Of
 course a meter wouldn't tell you that if traffic doubled you would
 still keep up and for that you need a complicated calculation or
 (you just keep looking at the meter and adjust).

 But at the moment there is no such meter (at least I don't know
 of it) and that is the actual problem.


These pieces are out there. I guess I'd say they are crude, but you
can get a handle on it. Of course, if your problem is with analyze,
that's cheap enough that you should probably just do it more. We're
probably a lot more aggressive on our vacuum / analyze scale settings
than some people (we cut the defaults in half as a matter of course),
and I come from the "don't limit stuff" camp too, but by and large
what we do works, even if it's more black magic than people would
like. :-)

Robert Treat
conjecture: xzilla.net
consulting: omniti.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] basic pgbench runs with various performance-related patches

2012-01-23 Thread Tatsuo Ishii
 ** pgbench, permanent tables, scale factor 100, 300 s **
 1 group-commit-2012-01-21 614.425851 -10.4%
 8 group-commit-2012-01-21 4705.129896 +6.3%
 16 group-commit-2012-01-21 7962.131701 +2.0%
 24 group-commit-2012-01-21 13074.939290 -1.5%
 32 group-commit-2012-01-21 12458.962510 +4.5%
 80 group-commit-2012-01-21 12907.062908 +2.8%

Interesting. Comparing with this:
http://archives.postgresql.org/pgsql-hackers/2012-01/msg00804.php
you achieved a very small enhancement. Can you think of any reason for
the difference?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] basic pgbench runs with various performance-related patches

2012-01-23 Thread Peter Geoghegan
On 24 January 2012 06:26, Tatsuo Ishii is...@postgresql.org wrote:
 ** pgbench, permanent tables, scale factor 100, 300 s **
 1 group-commit-2012-01-21 614.425851 -10.4%
 8 group-commit-2012-01-21 4705.129896 +6.3%
 16 group-commit-2012-01-21 7962.131701 +2.0%
 24 group-commit-2012-01-21 13074.939290 -1.5%
 32 group-commit-2012-01-21 12458.962510 +4.5%
 80 group-commit-2012-01-21 12907.062908 +2.8%

 Interesting. Comparing with this:
 http://archives.postgresql.org/pgsql-hackers/2012-01/msg00804.php
 you achieved very small enhancement. Do you think of any reason which
 makes the difference?

Presumably this system has a battery-backed cache, whereas my numbers
were obtained on my laptop.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page Checksums

2012-01-23 Thread Florian Weimer
* Robert Treat:

 Would it be unfair to assert that people who want checksums but aren't
 willing to pay the cost of running a filesystem that provides
 checksums aren't going to be willing to make the cost/benefit trade
 off that will be asked for? Yes, it is unfair of course, but it's
 interesting how small the camp of those using checksummed filesystems
 is.

Don't checksumming file systems currently come bundled with other
features you might not want (such as certain vendors)?

-- 
Florian Weimerfwei...@bfk.de
BFK edv-consulting GmbH   http://www.bfk.de/
Kriegsstraße 100  tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers