Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)

2010-10-20 Thread Itagaki Takahiro
On Wed, Oct 20, 2010 at 6:39 AM, Terry Laurenzo t...@laurenzo.org wrote:
 The answer may be to have both a jsontext and jsonbinary type as each will
 be optimized for a different case.

I want to choose one format for JSON rather than having two types.
It should be more efficient than the other formats in many cases,
and not much worse in the rest.

I think the discussion started from the observation that
  what BSON could represent was a subset of what JSON could represent.
So any binary format is acceptable as long as it has enough
representational power compared with the text format.

For example, a sequence of length-prefixed text chunks could reduce
CPU cycles spent on reparsing while holding all of the input as-is except
for ignorable whitespace. It is not BSON, but it is a binary format.

Or, if we want to store numbers in binary form, I think the format
should be the numeric type in Postgres. It has high precision, and we
never need more precision than that to compare two numbers. Even if
we used the BSON format, we would need to extend it to store all
numeric values, whose precision goes up to 10^1000.

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)

2010-10-20 Thread Terry Laurenzo
Good points.  In addition, any binary format needs to support object
property traversal without having to do a deep scan of all descendants.
BSON handles this with explicit lengths for document types (objects and
arrays) so that entire parts of the tree can be skipped during sibling
traversal.
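
To make that concrete, here is a minimal sketch (a hypothetical node
layout, not the actual BSON wire format) of how an explicit length prefix
lets a reader hop across siblings without parsing their children:

#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical node layout:
 *   [1-byte type tag][4-byte payload length][payload bytes ...]
 * Objects and arrays store their children, as whole nodes, inside the
 * payload, so a container's payload length covers its entire subtree.
 */
static size_t
next_sibling(const uint8_t *buf, size_t off)
{
    uint32_t payload_len;

    /* read the length that follows the type tag */
    memcpy(&payload_len, buf + off + 1, sizeof(payload_len));

    /* the next sibling starts right after tag + length + payload */
    return off + 1 + sizeof(uint32_t) + payload_len;
}

/* Visit each top-level member of a container without descending further. */
static void
for_each_child(const uint8_t *payload, size_t payload_len,
               void (*visit)(const uint8_t *node))
{
    size_t off = 0;

    while (off < payload_len)
    {
        visit(payload + off);
        off = next_sibling(payload, off);
    }
}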

It would also be nice to make sure that we store fully parsed strings.
There are lots of escape options that simply do not need to be preserved (C
escapes, unicode, octal, hex sequences) and that hinder the ability to do direct
comparisons.  BSON also makes a small extra effort to ensure that object
property names are encoded in a way that is easily comparable, as these will
be the most frequently compared items.
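
For illustration, a rough sketch of what storing fully parsed strings could
look like; it resolves only a handful of escapes and leaves out \u sequences
and error handling entirely, so treat it as an outline only:

#include <stddef.h>

/*
 * Copy a JSON string body (the part between the quotes) into out,
 * resolving common backslash escapes so that equal strings always end up
 * with identical stored bytes and can be compared with memcmp().
 * Returns the number of bytes written; out must hold at least len bytes.
 */
static size_t
normalize_json_string(const char *in, size_t len, char *out)
{
    size_t  i = 0;
    size_t  o = 0;

    while (i < len)
    {
        if (in[i] == '\\' && i + 1 < len)
        {
            switch (in[i + 1])
            {
                case 'n':  out[o++] = '\n'; break;
                case 't':  out[o++] = '\t'; break;
                case 'r':  out[o++] = '\r'; break;
                case '"':  out[o++] = '"';  break;
                case '/':  out[o++] = '/';  break;
                case '\\': out[o++] = '\\'; break;
                default:   out[o++] = in[i + 1]; break;  /* left as-is */
            }
            i += 2;
        }
        else
            out[o++] = in[i++];
    }
    return o;
}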

I'm still going to write up a proposed grammar that takes these items into
account - just ran out of time tonight.

Terry

On Wed, Oct 20, 2010 at 12:46 AM, Itagaki Takahiro 
itagaki.takah...@gmail.com wrote:

 On Wed, Oct 20, 2010 at 6:39 AM, Terry Laurenzo t...@laurenzo.org wrote:
  The answer may be to have both a jsontext and jsonbinary type as each
 will
  be optimized for a different case.

 I want to choose one format for JSON rather than having two types.
 It should be more efficient than the other formats in many cases,
 and not much worse in the rest.

 I think the discussion started from the observation that
  what BSON could represent was a subset of what JSON could represent.
 So any binary format is acceptable as long as it has enough
 representational power compared with the text format.

 For example, a sequence of length-prefixed text chunks could reduce
 CPU cycles spent on reparsing while holding all of the input as-is except
 for ignorable whitespace. It is not BSON, but it is a binary format.

 Or, if we want to store numbers in binary form, I think the format
 should be the numeric type in Postgres. It has high precision, and we
 never need more precision than that to compare two numbers. Even if
 we used the BSON format, we would need to extend it to store all
 numeric values, whose precision goes up to 10^1000.

 --
 Itagaki Takahiro



Re: [HACKERS] Extensions, this time with a patch

2010-10-20 Thread Dimitri Fontaine
Tom Lane t...@sss.pgh.pa.us writes:
 Robert Haas robertmh...@gmail.com writes:
 It seems good to do this in the normal case, but (1) if
 client_min_messages was already set higher than WARNING, we probably
 should not lower it and (2) we might want to have a way to lower it
 for troubleshooting purposes.

 I think the standard way of troubleshooting would be to run the
 extension's script by hand, no?  So while I agree with (1),
 I'm not sure we need to sweat about (2).

Will go and fix (1), but like Tom I think (2) is handled by the
extension's author running pg_execute_from_file() rather than CREATE
EXTENSION for debugging purposes: then the settings are not changed.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_rawdump

2010-10-20 Thread Stephen R. van den Berg
Tom Lane wrote:
Stephen R. van den Berg s...@cuci.nl writes:
 In order to simplify recovery at this point (enormously), it would
 have been very helpful (at almost negligible cost), to have the name
 of the table, the name of the columns, and the types of the
 columns available.

 Why don't we insert that data into the first page of a regular table
 file, in the special data area?

(1) it wouldn't necessarily fit

Three viable options (pick any one, depending on other criteria):
a. If it doesn't fit at first, drop the column names and try again.
b. If necessary, extend it into the special data area of the following page
   (repeat until everything is stored).
c. Simply put in what fits and discard the rest.

Please note that the data in there is purely informational
and of a best-effort nature.  It is not required for regular operation.
It should have close to no performance impact in the normal use case.
It is meant to help with forensics if the catalog is damaged or lost
and you still want to attempt to recover most of the data contained
in the table file.
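
For illustration, the kind of header being discussed could be as small as
this (a sketch only; the names are made up, and whatever does not fit in
the special space would simply be truncated or dropped):

#include <stdint.h>

/*
 * Hypothetical layout for the special space of a table's first page.
 * Purely informational: never consulted during normal operation, only by
 * forensic tools when the system catalogs are damaged or lost.
 */
typedef struct RawDumpColumnInfo
{
    uint32_t    atttypid;       /* OID of the column's type */
    int32_t     atttypmod;      /* type modifier, -1 if none */
    char        attname[64];    /* column name, NUL-terminated, may be cut */
} RawDumpColumnInfo;

typedef struct RawDumpTableInfo
{
    uint32_t    magic;          /* identifies this structure */
    uint16_t    natts;          /* number of column entries that follow */
    char        relname[64];    /* table name, NUL-terminated, may be cut */
    RawDumpColumnInfo atts[1];  /* variable-length array of column entries */
} RawDumpTableInfo;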

(2) what are you going to do to maintain it during ALTER TABLE?

Simply pick a point in time where it will be blatantly overwritten:
either at reception of the command or at commit time, whichever
is more convenient to implement.  It's a best-effort service; it
is not a problem if we get it wrong sometimes due to wildly
overlapping ALTER TABLE/commit/rollback sequences.

(3) if there are any custom types involved, you're still lost.

Yes, this is not a complete solution; it's a valid attempt at
making forensics a lot easier in the common case, at virtually
no cost to the running system.
-- 
Stephen.

Life is that brief interlude between nothingness and eternity.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Installer Fix on some Windows 7 64-bit Systems

2010-10-20 Thread Humair Mohammed
On some Windows machines the command processor is not invoked properly when
running the WScript.Run method. The same PostgreSQL 9.0.1-1 one-click installer
that worked perfectly fine on one Windows 7 64-bit machine did not work as
expected on another machine with the same OS. Adding the command processor
explicitly:

%comspec% /c

to the first parameter of all objShell.Run calls in the installer WScript files
below resolves this issue!

For the installruntimes.vbs file, I used the following mod:
iRet = objShell.Run("%comspec% /c " & strPackage & " /q:a /c:msiexec /i vcredist.msi /qb!", 0, True)

For initcluster.vbs, loadmodules.vbs and startupcfg.vbs, I used the following mod:
DoCmd = objShell.Run("%comspec% /c " & objTempFolder.Path & "\" & strBatchFile, 0, True)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_rawdump

2010-10-20 Thread Stephen R. van den Berg
Roberto Mello wrote:
On Tue, Oct 19, 2010 at 6:13 PM, Stephen R. van den Berg s...@cuci.nl wrote:
 Greg Stark wrote:
premise this on the idea that you've lost everything in the catalog
but not the data in other tables. Which seems like a narrow use case.

 It happens, more often than you'd think.  My client had it, I've
 seen numerous google hits which show the same.

It happened to us recently when a customer had disk issues, and we

It usually happens when there are disk issues; that's exactly what this is for.

A tool like Stephen is proposing would most likely have helped us
recover at least some or most of the data, I would hope.

Well, because the customer could recreate (within reason) the original
table definitions, we were able to recover all of his data (12 tables,
including some toasted/compressed).

It's just that matching table and file, and subsequently figuring out
some missing columns which may have been added/removed later,
can be rather time-consuming and could be made a lot easier (not necessarily
perfect) if that information had been present in the first page of
a file.
-- 
Stephen.

Life is that brief interlude between nothingness and eternity.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ISN patch that applies cleanly with git apply

2010-10-20 Thread Peter Geoghegan
 I have committed the patch and the text proposed above.

Can I take it that there is no need for a formal review, where I
answer various questions per
http://wiki.postgresql.org/wiki/Reviewing_a_Patch?


-- 
Regards,
Peter Geoghegan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simplifying replication

2010-10-20 Thread Dimitri Fontaine
Josh Berkus j...@agliodbs.com writes:
 Well, one thing to be addressed is separating the PITR functionality from
 replication.  PITR needs a lot of features -- timelines, recovery stop
 points, etc. -- which replication doesn't need or want.  I think that
 focussing on streaming replication functionality and ignoring the archive
 logs case is probably the best way to logically separate these two.
 Presumably anyone who needs archive logs as well will be a professional DBA.

So, I've been thinking some more about this.

We now have two modes of operation when starting up a PostgreSQL
cluster: either it's a primary/master, or it's a standby (in recovery or
hot_standby). What I think would make sense here is to add another
mode of operation: archiving.

A cluster that is archiving will accept replication connections and will
accept WAL files streamed to it, which it will put in its archive
directory, defaulting to $PGDATA/pg_xlog_archive. It should also be able
to take a base backup from its primary server, maybe with a new pg_ctl
command. The base backup location defaults to
$PGDATA/pg_basebackup/$label.

Then, it would also accept replication connections the other way
around, starting a walsender to publish its archive or its base backup,
so that you could prepare a new standby against the archive server, then
switch the primary_conninfo to the master and that's it.

Now, the archive and restore commands in this setup would be the internal
commands pg_walsender and pg_walreceiver. That would mean we need to
extend those special backends to be able to send and receive a full WAL
file, but it sounds simple enough, right?

Then we could add support for those to compress the WAL files before
streaming; as we're talking about internal commands here it sounds easy
enough (we're already linking against gzip, I think).

The base backup streaming support would be there for the archiving mode,
but also for the normal standby's initial startup from an empty $PGDATA.

Comments?
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ISN patch that applies cleanly with git apply

2010-10-20 Thread Robert Haas
On Wed, Oct 20, 2010 at 6:12 AM, Peter Geoghegan
peter.geoghega...@gmail.com wrote:
 I have committed the patch and the text proposed above.

 Can I take it that there is no need for a formal review, where I
 answer various questions per
 http://wiki.postgresql.org/wiki/Reviewing_a_Patch?

That is correct.  That page is just a list of things to think about,
anyway; a review needn't be in exactly that format (in fact I'm not
sure it was even intended to be used that way).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-10-20 Thread Robert Haas
On Wed, Oct 20, 2010 at 6:22 AM, Dimitri Fontaine
dimi...@2ndquadrant.fr wrote:
 In the v6 patch, should client_min_messages or log_min_messages be lower
 than WARNING, they get set to WARNING for the script install context. We
 still dump the extension's script at each WARNING, but you can set your
 client_min_messages (and log_min_messages) to ERROR beforehand.

I would vote for overriding client_min_messages but not log_min_messages.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ISN patch that applies cleanly with git apply

2010-10-20 Thread Peter Eisentraut
On ons, 2010-10-20 at 11:12 +0100, Peter Geoghegan wrote:
 Can I take it that there is no need for a formal review, where I
 answer various questions per
 http://wiki.postgresql.org/wiki/Reviewing_a_Patch?

The short answer is no.  But note that there is no such thing as a
formal review.  The page you link to is only a guideline if you set
out to review something and don't know what to do.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] max_wal_senders must die

2010-10-20 Thread Robert Haas
On Wed, Oct 20, 2010 at 1:06 AM, Greg Smith g...@2ndquadrant.com wrote:
 Josh Berkus wrote:

 Well, now that you mention it, I also think that hot standby should be
 the default.  Yes, I know about the overhead, but I also think that the
 number of our users who want easy replication *far* outnumber the users
 who care about an extra 10% WAL overhead.

 I think this whole situation is similar to the resistance to increasing
 default_statistics_target.  There's additional overhead added by enabling
 both of these settings, in return for making it more likely for the average
 person to see useful behavior by default.  If things are rejiggered so the
 advanced user has to turn things off in order to get optimal performance
 when not using these features, in return for the newbie being more likely to
 get things working in basic form, that's probably a net win for PostgreSQL
 advocacy.  But much like default_statistics_target, there needs to be some
 more formal work done on quantifying just how bad each of these overheads
 really are first.  We lost 3-7% on multiple simple benchmarks between 8.3
 and 8.4[1] because of that retuning for ease of use on real-world workloads,
 and that's not something you want to repeat too often.

Exactly.  It doesn't take many 3-7% slowdowns to add up to being 50%
or 100% slower, and that sucks.  In fact, I'm still not convinced that
we were wise to boost default_statistics_target as much as we did.  I
argued for a smaller boost at the time.

Actually, I think the best thing for default_statistics_target might
be to scale the target based on the number of rows in the table, e.g.
given N rows:

10 + (N / 1000), if N < 40,000
46 + (N / 10000), if 50,000 < N < 3,540,000
400, if N > 3,540,000
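
In code form, that is simply (a sketch; the gap between 40,000 and 50,000
in the ranges above is resolved here by letting the middle rule start at
40,000):

#include <stdint.h>

/* Return a statistics target scaled to the table size, following the
 * piecewise rule above. */
static int
scaled_statistics_target(int64_t nrows)
{
    if (nrows < 40000)
        return 10 + (int) (nrows / 1000);       /* 10 up to ~50  */
    else if (nrows < 3540000)
        return 46 + (int) (nrows / 10000);      /* ~50 up to 400 */
    else
        return 400;                             /* capped at 400 */
}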

Consider a table with 2,000 rows.  With default_statistics_target =
100, we can store up to 100 MCVs; and we break the remaining ~1900
values up into 100 buckets with 19 values/bucket.  In most cases, that
is probably overkill.  Where you tend to run into problems with
inadequate statistics is with the values that are not quite common
enough to be an MCV, but are still significantly more common than
their companions in the same bucket.  However, with only 19 values in
a bucket, you're probably not going to have that problem.  If you
scale the table down to 1000 rows you now have 9 values in a bucket,
which makes it *really* unlikely you're going to have that problem.
On the other hand, on a table with 4 million rows, it is entirely
likely that there could be more than 100 values whose frequencies are
worth tracking individually, and odds are good also that even if the
planning time is a little longer to no purpose, it'll still be small
relative to the query execution time.  It's unfortunately
impractical for the size of the MCV list to track linearly with the
size of the table, because there are O(n^2) algorithms in use, but I
think some kind of graduated scheme might enable us to buy back some
of that lost performance without damaging real workloads very much.
Possibly even helping real workloads, because you may very well join
large fact tables against small dimension tables, and odds are good
that under the present scheme the fact tables have more statistics
than they really need.

As to replication, I don't believe the contention that most people
will want to use replication.  Many will, and that is fine, but many
also won't.  The world is full of development and test machines where
replication is a non-issue, and some people won't even run it in
production because the nature of their application makes the data on
that box non-essential, or because they replicate with Bucardo or
Slony.  I completely agree that we should make it easier to get
replication set up without (multiple) server restarts, but imposing a
performance overhead on untuned systems is not the right way to do it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL and HugePage

2010-10-20 Thread Robert Haas
On Tue, Oct 19, 2010 at 11:30 PM, daveg da...@sonic.net wrote:
 On Wed, Oct 20, 2010 at 04:08:37PM +1300, Mark Kirkwood wrote:
 On 20/10/10 16:05, Mark Kirkwood wrote:
 
 
 shmget and friends are hugetlbpage aware, so it seems it should 'just
 work'.
 

 Heh - provided you specify

 SHM_HUGETLB


 in the relevant call that is :-)

 I had a patch for this against 8.3 that I could update if there is any
 interest. I suspect it is helpful.

I think it would be a good feature.  Of course, we would need
appropriate documentation, and some benchmarks showing that it really
works.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-10-20 Thread Dimitri Fontaine
Robert Haas robertmh...@gmail.com writes:
 I would vote for overriding client_min_messages but not log_min_messages.

Well it defaults to WARNING so I see your point. Then again, we're
talking about hundreds of lines (3197 lines of isn, 531 lines for
hstore) of output per message, containing a script that you're not
maintaining. Do we really want that amount of extra logging?

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-10-20 Thread Robert Haas
On Wed, Oct 20, 2010 at 9:33 AM, Dimitri Fontaine
dimi...@2ndquadrant.fr wrote:
 Robert Haas robertmh...@gmail.com writes:
 I would vote for overriding client_min_messages but not log_min_messages.

 Well it defaults to WARNING so I see your point. Then again, we're
 talking about hundreds of lines (3197 lines of isn, 531 lines for
 hstore) of output per message, containing a script that you're not
 maintaining. Do we really want that amount of extra logging?

Well, my thought was that it makes sense to override the user's
logging preferences because, after all, if they wanted the extra
logging, they could run the script by hand.  But what gets logged to
the system log is server policy, not user preference, and I'm
reluctant to think we should second-guess whatever the admin has
configured.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] leaky views, yet again

2010-10-20 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 I get the impression that you think that there's a problem not only
 with the approach but with any approach whatsoever to that underlying
 problem.

Let's just say that the approaches proposed so far have performance
and/or functionality and/or code maintenance penalties that are utterly
unacceptable from the standpoint of people who don't need RLS.  I don't
know if there is a workable solution, but I do know I've not seen one.

 With respect to selectivity estimation, do we have a live bug there
 now?

No, I don't believe so.  Given that you'd like to get the planner to
call function XYZ, you could create an operator using XYZ and attach to
it one of the estimation functions that will actually call the
underlying function --- but you have to have call permission on the
function in order to create the operator.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Domains versus arrays versus typmods

2010-10-20 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, Oct 19, 2010 at 9:17 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 We've already accepted the cost of doing getBaseTypeAndTypmod() in a
 whole lot of performance-critical parsing paths, on the off chance that
 the target datatype might be a domain.  It's not apparent to me that
 array subscripting is so important as to deserve an exemption from that.
 Especially when not doing so doesn't work.

 Hmm... so are there no cases where zeroing out the typelem will cost
 us an otherwise-unnecessary syscache lookup?

My point is that anyplace that is relying on the surface typelem,
without drilling down to see what the base type is, is wrong.
So yeah, those lookups are (will be) necessary.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL and HugePage

2010-10-20 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, Oct 19, 2010 at 11:30 PM, daveg da...@sonic.net wrote:
 On Wed, Oct 20, 2010 at 04:08:37PM +1300, Mark Kirkwood wrote:
 Heh - provided you specify
 SHM_HUGETLB
 in the relevant call that is :-)

 I had a patch for this against 8.3 that I could update if there is any
 interest. I suspect it is helpful.

 I think it would be a good feature.  Of course, we would need
 appropriate documentation, and some benchmarks showing that it really
 works.

I believe that for the equivalent Solaris option, we just automatically
enable it when available.  So there'd be no need for user documentation.
However, I definitely *would* like to see some benchmarks proving that
the change actually does something useful.  I've always harbored the
suspicion that this is just a knob to satisfy people who need knobs to
frob.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-10-20 Thread Tom Lane
Itagaki Takahiro itagaki.takah...@gmail.com writes:
 On Wed, Oct 20, 2010 at 12:58 PM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
 Lets rename the directory.

 Hmmm, but we call it 'xml2' in the doc. There is no 'pgxml' at all in it.
 http://developer.postgresql.org/pgdocs/postgres/xml2.html

 However, I don't think we can change the module name, because pg_upgrade
 will fail if the module (.so) name is changed. So a reasonable compromise
 might be to keep both names until we deprecate it completely.

If the extensions manager is dependent on the assumption that a module's
name matches the name of the directory it's built in, that assumption
needs to be removed anyway.  There are too many use-cases where that
wouldn't hold, even if we try to force the standard contrib modules to
follow such a rule.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] max_wal_senders must die

2010-10-20 Thread Tom Lane
Greg Smith g...@2ndquadrant.com writes:
 Josh Berkus wrote:
 Well, now that you mention it, I also think that hot standby should be
 the default.  Yes, I know about the overhead, but I also think that the
 number of our users who want easy replication *far* outnumber the users
 who care about an extra 10% WAL overhead.

 ... But much like 
 default_statistics_target, there needs to be some more formal work done 
 on quantifying just how bad each of these overheads really are first.  

Quite.  Josh, have you got any evidence showing that the penalty is
only 10%?  There are cases, such as COPY and ALTER TABLE, where
you'd be looking at 2X or worse penalties, because of the existing
optimizations that avoid writing WAL at all for operations where a
single final fsync can serve the purpose.  I'm not sure what the
penalty for typical workloads is, partly because I'm not sure what
should be considered a typical workload for this purpose.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-10-20 Thread Dimitri Fontaine
Tom Lane t...@sss.pgh.pa.us writes:
 If the extensions manager is dependent on the assumption that a module's
 name matches the name of the directory it's built in

It is not. There's some magic for simple cases so that contrib mostly
works with no editing, but of course, that's only mostly.

The version Itagaki-san worked with had not a single change to the
contrib sources; I've only begun to change things there (in v6) with the
spi case, which now produces 5 extension control files out of a single
Makefile, thanks to this single new line:

CONTROL = $(addsuffix .control, $(MODULES))

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_rawdump

2010-10-20 Thread Tom Lane
Stephen R. van den Berg s...@cuci.nl writes:
 It's just that matching table and file, and subsequently figuring out
 some missing columns which may have been added/removed later,
 can be rather time-consuming and could be made a lot easier (not necessarily
 perfect) if that information had been present in the first page of
 a file.

So you've already moved the goalposts from what was claimed in your
prior message.  If the data is not maintained (with 100% reliability)
during ALTER TABLE, how are you going to do something like figure out
missing columns?

I can see the potential usefulness of a self-documenting table storage
format, but this proposal isn't that; it's just an unreliable kluge.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How to reliably detect if it's a promoting standby

2010-10-20 Thread Tatsuo Ishii
 pg_is_in_recovery() returns a bool, are you proposing to change that?
 
 No. I just thought about adding more condition when it returns true.

Here is the patch. Comments are welcome!

*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***
*** 5604,5610  GetLatestXTime(void)
  Datum
  pg_is_in_recovery(PG_FUNCTION_ARGS)
  {
!   PG_RETURN_BOOL(RecoveryInProgress());
  }
  
  /*
--- 5604,5613 
  Datum
  pg_is_in_recovery(PG_FUNCTION_ARGS)
  {
!   /* use volatile pointer to prevent code rearrangement */
!   volatile WalRcvData *walrcv = WalRcv;
! 
!   PG_RETURN_BOOL(RecoveryInProgress() && walrcv->walRcvState ==
WALRCV_RUNNING);
  }
  
  /*

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-10-20 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Wed, Oct 20, 2010 at 6:22 AM, Dimitri Fontaine
 dimi...@2ndquadrant.fr wrote:
 In the v6 patch, should client_min_messages or log_min_messages be lower
 than WARNING, they get set to WARNING for the script install context. We
 still dump the extension's script at each WARNING, but you can set your
 client_min_messages (and log_min_messages) to ERROR beforehand.

 I would vote for overriding client_min_messages but not log_min_messages.

Why?  The problem with unreasonably bulky messages is just as
objectionable for the log.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL and HugePage

2010-10-20 Thread Kenneth Marshall
On Wed, Oct 20, 2010 at 10:10:00AM -0400, Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
  On Tue, Oct 19, 2010 at 11:30 PM, daveg da...@sonic.net wrote:
  On Wed, Oct 20, 2010 at 04:08:37PM +1300, Mark Kirkwood wrote:
  Heh - provided you specify
  SHM_HUGETLB
  in the relevant call that is :-)
 
  I had a patch for this against 8.3 that I could update if there is any
  interest. I suspect it is helpful.
 
  I think it would be a good feature.  Of course, we would need
  appropriate documentation, and some benchmarks showing that it really
  works.
 
 I believe that for the equivalent Solaris option, we just automatically
 enable it when available.  So there'd be no need for user documentation.
 However, I definitely *would* like to see some benchmarks proving that
 the change actually does something useful.  I've always harbored the
 suspicion that this is just a knob to satisfy people who need knobs to
 frob.
 
   regards, tom lane
 

Oracle apparently uses huge pages, if they are available, by first trying
shmget() with the SHM_HUGETLB option. If that fails, it reissues the call
without that option. This article does mention some of the benefits
of larger page sizes with large shared memory regions:

http://appcrawler.com/wordpress/?p=686
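
A minimal sketch of that try-then-fall-back pattern (Linux-specific;
SHM_HUGETLB comes from the kernel headers, and error handling here is
reduced to the retry itself):

#include <stddef.h>
#include <stdio.h>
#include <sys/ipc.h>
#include <sys/shm.h>

#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000       /* Linux value; not available everywhere */
#endif

/*
 * Try to create the segment backed by huge pages first; if the kernel
 * refuses (no huge pages reserved, or no support), retry without the flag.
 */
static int
create_shm_segment(key_t key, size_t size)
{
    int shmid = shmget(key, size, IPC_CREAT | 0600 | SHM_HUGETLB);

    if (shmid < 0)
    {
        perror("shmget with SHM_HUGETLB failed, retrying without it");
        shmid = shmget(key, size, IPC_CREAT | 0600);
    }
    return shmid;
}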

Regards,
Ken

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] max_wal_senders must die

2010-10-20 Thread Heikki Linnakangas

On 20.10.2010 17:19, Tom Lane wrote:

Greg Smithg...@2ndquadrant.com  writes:

Josh Berkus wrote:

Well, now that you mention it, I also think that hot standby should be
the default.  Yes, I know about the overhead, but I also think that the
number of our users who want easy replication *far* outnumber the users
who care about an extra 10% WAL overhead.



... But much like
default_statistics_target, there needs to be some more formal work done
on quantifying just how bad each of these overheads really are first.


Quite.  Josh, have you got any evidence showing that the penalty is
only 10%?  There are cases, such as COPY and ALTER TABLE, where
you'd be looking at 2X or worse penalties, because of the existing
optimizations that avoid writing WAL at all for operations where a
single final fsync can serve the purpose.  I'm not sure what the
penalty for typical workloads is, partly because I'm not sure what
should be considered a typical workload for this purpose.


Going from wal_level='minimal' to 'archive' incurs the penalty on 
WAL-logging COPY etc. That's a big penalty. However, the difference 
between wal_level='archive' and wal_level='hot_standby' should be tiny.


The big reason for separating those two in 9.0 was that it's all new 
code with new ways to fail and, yes, new bugs. It's not smart to expose 
people who are not interested in using hot standby to those issues. But 
maybe we feel more comfortable merging 'archive' and 'hot_standby' 
levels in 9.1.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How to reliably detect if it's a promoting standby

2010-10-20 Thread Heikki Linnakangas

On 20.10.2010 17:32, Tatsuo Ishii wrote:

pg_is_in_recovery() returns a bool, are you proposing to change that?


No. I just thought about adding more condition when it returns true.


Here is the patch. Comments are welcome!
...
   Datum
   pg_is_in_recovery(PG_FUNCTION_ARGS)
   {
!   /* use volatile pointer to prevent code rearrangement */
!   volatile WalRcvData *walrcv = WalRcv;
!
!   PG_RETURN_BOOL(RecoveryInProgress() && walrcv->walRcvState ==
WALRCV_RUNNING);
   }



This returns 'false' if you're in hot standby mode running against an 
archive. That seems wrong, I don't think the walreceiver state should 
play any role in this.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] max_wal_senders must die

2010-10-20 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 Going from wal_level='minimal' to 'archive' incurs the penalty on 
 WAL-logging COPY etc. That's a big penalty. However, the difference 
 between wal_level='archive' and wal_level='hot_standby' should be tiny.

I'm not sure I believe that either, because of the costs associated with
logging lock acquisitions.

We really need some actual benchmarks in this area, rather than
handwaving ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_rawdump

2010-10-20 Thread Stephen R. van den Berg
Tom Lane wrote:
Stephen R. van den Berg s...@cuci.nl writes:
 It's just that matching table and file, and subsequently figuring out
 some missing columns which may have been added/removed later,
 can be rather time-consuming and could be made a lot easier (not necessarily
 perfect) if that information had been present in the first page of
 a file.

So you've already moved the goalposts from what was claimed in your
prior message.  If the data is not maintained (with 100% reliability)
during ALTER TABLE, how are you going to do something like figure out
missing columns?

Most alter table operations are well thought through and rarely undone
(at least not on production databases).  This means that most tables
can be restored.

I can see the potential usefulness of a self-documenting table storage
format, but this proposal isn't that; it's just an unreliable kluge.

Restoring tables/databases from table storage only is, by definition,
an unreliable kludge.  I'm not opposed to making the definition storage
more robust, but since the records in the table have already lost their
relation to the pg_clog records, it is *already* uncertain
which records were deleted and/or have the wrong number of columns; so it
seems a needless waste of time and energy to provide more reliable
information about the column structure.

I know for a fact that those who have lost data in such a way, when
faced with the choice of having this unreliable, kludgy information
available now or waiting months or years until a reliable solution
is present, would (in every single case) opt for the former and
get at least some (if not all) of their data back in a shorter amount
of time.
-- 
Stephen.

Life is that brief interlude between nothingness and eternity.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-10-20 Thread Alvaro Herrera
Excerpts from Dimitri Fontaine's message of mié oct 20 07:22:53 -0300 2010:
 Itagaki Takahiro itagaki.takah...@gmail.com writes:

   CREATE EXTENSION command 
  * Environment could be modified by the installer script.
  =# SHOW search_path; => "$user",public
  =# CREATE EXTENSION dblink;
  =# SHOW search_path; => public
  because almost all of the modules have SET search_path in the scripts:
-- Adjust this setting to control where the objects get created.
SET search_path = public;
 
  Is this the intended behavior? Personally, I want the installer to run in
  a sandbox.
  One idea is to rewrite module scripts to use BEGIN - SET LOCAL - COMMIT,
  but if we do so, we cannot execute CREATE EXTENSION inside a transaction.
 
 Using SPI to execute the extension's script already means that it
 cannot contain explicit BEGIN and COMMIT commands. Now, is it possible to
 force a reset of all GUCs after the script's execution?

Would it work to force a new transaction internally in CREATE EXTENSION,
and use the equivalent of SET LOCAL in the CREATE EXTENSION code?

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-10-20 Thread Tom Lane
Dimitri Fontaine dimi...@2ndquadrant.fr writes:
 Tom Lane t...@sss.pgh.pa.us writes:
 If the extensions manager is dependent on the assumption that a module's
 name matches the name of the directory it's built in

 It is not. There's some magic for simple cases so that contrib mostly
 works with no editing, but of course, that's only mostly.

 The version Itagaki-san worked with had not a single change to the
 contrib sources,

I don't think that no changes to the makefiles is a requirement,
or even a wish-list item, for this.  I think it's perfectly reasonable
for the makefile to have to specify the module name; far better that
than that we get the name by some magic or other.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Domains versus arrays versus typmods

2010-10-20 Thread Robert Haas
On Wed, Oct 20, 2010 at 10:03 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Tue, Oct 19, 2010 at 9:17 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 We've already accepted the cost of doing getBaseTypeAndTypmod() in a
 whole lot of performance-critical parsing paths, on the off chance that
 the target datatype might be a domain.  It's not apparent to me that
 array subscripting is so important as to deserve an exemption from that.
 Especially when not doing so doesn't work.

 Hmm... so are there no cases where zeroing out the typelem will cost
 us an otherwise-unnecessary syscache lookup?

 My point is that anyplace that is relying on the surface typelem,
 without drilling down to see what the base type is, is wrong.
 So yeah, those lookups are (will be) necessary.

OK.  In that case, +1 from me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] leaky views, yet again

2010-10-20 Thread Robert Haas
On Wed, Oct 20, 2010 at 10:00 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 I get the impression that you think that there's a problem not only
 with the approach but with any approach whatsoever to that underlying
 problem.

 Let's just say that the approaches proposed so far have performance
 and/or functionality and/or code maintenance penalties that are utterly
 unacceptable from the standpoint of people who don't need RLS.  I don't
 know if there is a workable solution, but I do know I've not seen one.

 With respect to selectivity estimation, do we have a live bug there
 now?

 No, I don't believe so.  Given that you'd like to get the planner to
 call function XYZ, you could create an operator using XYZ and attach to
 it one of the estimation functions that will actually call the
 underlying function --- but you have to have call permission on the
 function in order to create the operator.

But suppose the operator already exists, but I don't have permission
to call the underlying function... can I get the planner to call the
function for me by EXPLAIN-ing the right query?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How to reliably detect if it's a promoting standby

2010-10-20 Thread Robert Haas
On Wed, Oct 20, 2010 at 10:35 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 20.10.2010 17:32, Tatsuo Ishii wrote:

 pg_is_in_recovery() returns a bool, are you proposing to change that?

 No. I just thought about adding more condition when it returns true.

 Here is the patch. Comments are welcome!
 ...
   Datum
   pg_is_in_recovery(PG_FUNCTION_ARGS)
   {
 !       /* use volatile pointer to prevent code rearrangement */
 !       volatile WalRcvData *walrcv = WalRcv;
 !
!       PG_RETURN_BOOL(RecoveryInProgress() && walrcv->walRcvState ==
 WALRCV_RUNNING);
   }


 This returns 'false' if you're in hot standby mode running against an
 archive. That seems wrong, I don't think the walreceiver state should play
 any role in this.

I think what we need is a second function, not a change to the
definition of this one.
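
Something along these lines, perhaps (the name is invented; it just
exposes the walreceiver test from the patch above as its own function
instead of folding it into pg_is_in_recovery()):

/* Hypothetical companion to pg_is_in_recovery(): true only while recovery
 * is being fed by a running walreceiver, i.e. streaming replication. */
Datum
pg_is_receiving_wal(PG_FUNCTION_ARGS)
{
    /* use volatile pointer to prevent code rearrangement */
    volatile WalRcvData *walrcv = WalRcv;

    PG_RETURN_BOOL(RecoveryInProgress() && walrcv->walRcvState == WALRCV_RUNNING);
}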

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] max_wal_senders must die

2010-10-20 Thread Alvaro Herrera
Excerpts from Robert Haas's message of mié oct 20 10:29:04 -0300 2010:

 Actually, I think the best thing for default_statistics_target might
 be to scale the target based on the number of rows in the table, e.g.
 given N rows:
 
 10 + (N / 1000), if N < 40,000
 46 + (N / 10000), if 50,000 < N < 3,540,000
 400, if N > 3,540,000
 
 Consider a table with 2,000 rows.  With default_statistics_target =
 100, we can store up to 100 MCVs; and we break the remaining ~1900
 values up into 100 buckets with 19 values/bucket.

Maybe what should be done about this is to have separate sizes for the
MCV list and the histogram, where the MCV list is automatically sized
during ANALYZE.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-10-20 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 Excerpts from Dimitri Fontaine's message of mié oct 20 07:22:53 -0300 2010:
 Using SPI to execute the extension's script already means that it
 cannot contain explicit BEGIN and COMMIT commands. Now, is it possible to
 force a reset of all GUCs after the script's execution?

 Would it work to force a new transaction internally in CREATE EXTENSION,

No, but maybe a savepoint?

 and use the equivalent of SET LOCAL in the CREATE EXTENSION code?

I had assumed that that was how he was doing it ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] max_wal_senders must die

2010-10-20 Thread Robert Haas
On Wed, Oct 20, 2010 at 10:53 AM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Excerpts from Robert Haas's message of mié oct 20 10:29:04 -0300 2010:

 Actually, I think the best thing for default_statistics_target might
 be to scale the target based on the number of rows in the table, e.g.
 given N rows:

 10 + (N / 1000), if N < 40,000
 46 + (N / 10000), if 50,000 < N < 3,540,000
 400, if N > 3,540,000

 Consider a table with 2,000 rows.  With default_statistics_target =
 100, we can store up to 100 MCVs; and we break the remaining ~1900
 values up into 100 buckets with 19 values/bucket.

 Maybe what should be done about this is to have separate sizes for the
 MCV list and the histogram, where the MCV list is automatically sized
 during ANALYZE.

I thought about that, but I'm not sure there's any particular
advantage.  Automatically scaling the histogram seems just as useful
as automatically scaling the MCV list - both things will tend to
reduce the estimation error.   For a table with 2,000,000 rows,
automatically setting the statistics target from 100 to the value that
would be computed by the above formula, which happens to be 246, will
help the 101st-246th most common values, because they will now be
MCVs.   It will also help all the remaining values, both because
you've pulled 146 fairly common values out of the histogram buckets
and also because each bucket now contains ~8130 values rather than
~20,000.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How to reliably detect if it's a promoting standby

2010-10-20 Thread Tatsuo Ishii
 This returns 'false' if you're in hot standby mode running against an
 archive. That seems wrong, I don't think the walreceiver state should
 play any role in this.

Apart from this, I wonder why walsender/walreceiver do not transfer archive
logs as well.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Serializable snapshot isolation patch

2010-10-20 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 On Tue, Oct 19, 2010 at 6:28 PM, Kevin Grittner
 kevin.gritt...@wicourts.gov wrote:
 One thing that would work, but I really don't think I like it, is
 that a request for a snapshot for such a transaction would not
 only block until it could get a clean snapshot (no overlapping
 serializable non-read-only transactions which overlap
 serializable transactions which wrote data and then committed in
 time to be visible to the snapshot being acquired), but it would
 *also* block *other* serializable transactions, if they were
 non-read-only, on an attempt to acquire a snapshot.
 
 This seems pretty close to guaranteeing serializability by running
 transactions one at a time (i.e. I don't think it's likely to be
 acceptable from a performance standpoint).
 
It absolutely makes no sense except for long-running read-only
transactions, and would only be used when explicitly requested; and
like I said, I really don't think I like it even on that basis --
just putting it out there as the only alternative I've found so far
to either tolerating possible serialization anomalies in pg_dump
output (albeit only when compared to the state the database reached
after the dump's snapshot) or waiting indefinitely for a clean
snapshot to become available.
 
FWIW from a brainstorming perspective, while waiting for problem
transactions to clear so we could get a clean snapshot for the dump
I think it would work even better to block the *commit* of
serializable transactions which *had done* writes than to block
snapshot acquisition for serializable transactions which were not
read-only.  Still pretty icky, though.  I am loath to compromise
the no-new-blocking promise of SSI.
 
[thinks]
 
Actually, maybe we can reduce the probability of needing to retry
at each iteration of the non-blocking alternative by checking the
conflict information for the problem transactions after they commit.
Any transaction which didn't *actually* generate a read-write
conflict out to a transaction visible to the dump's candidate
snapshot could not cause an anomaly.  If none of the problem
transactions actually generates a rw-conflict we can use the
candidate snapshot.  Adding that logic to the non-blocking
alternative might actually work pretty well.
 
There might be some workloads where conflicts would be repeatedly
generated, but there would be a lot where they wouldn't.  If we add
a switch to pg_dump to allow users to choose, I think this algorithm
works.  It never affects a transaction unless it has explicitly
requested SERIALIZABLE READ ONLY DEFERRABLE, and the only impact is
that startup may be deferred until a snapshot can be acquired which
ensures serializable behavior without worrying about SIRead locks.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-10-20 Thread Dimitri Fontaine
Tom Lane t...@sss.pgh.pa.us writes:
 I don't think that no changes to the makefiles is a requirement,
 or even a wish-list item, for this.  I think it's perfectly reasonable
 for the makefile to have to specify the module name; far better that
 than that we get the name by some magic or other.

It seemed easy to get a reasonable approach requiring very few edits in
contribs so I favoured that. Now, it's still entirely possible to hand
adjust. Determining the extension name automatically from DATA_built or
DATA is only done where EXTENSION has not been provided, and guessing
the CONTROL file name from the EXTENSION name only occurs when CONTROL
has not been provided.

Of course, if those changes (inlined hereafter) are seen as a bad idea,
then I will change all contrib Makefiles to add EXTENSION, EXTVERSION
(which always is MAJORVERSION here) and CONTROL (which almost always is
EXTENSION.control).

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

# create extension support
ifndef CONTROL
ifndef EXTENSION
ifdef DATA_built
EXTENSION = $(basename $(notdir $(firstword $(DATA_built))))
else ifdef DATA
EXTENSION = $(basename $(notdir $(firstword $(DATA))))
endif # DATA_built
endif # EXTENSION
ifndef EXTVERSION
EXTVERSION = $(MAJORVERSION)
endif
ifdef EXTENSION
CONTROL = $(EXTENSION).control
endif # EXTENSION
endif # CONTROL

control:
# create .control to keep track that we created the control file(s)
	@for file in $(CONTROL); do \
	  test -f `basename $$file .control`.sql -a ! -f $$file && touch .control || true ; \
	  if [ -f .control ]; then \
	    if [ -n "$(EXTENSION)" ]; then \
	      (echo "name = '$(EXTENSION)'"; echo "version = '$(EXTVERSION)'") > $$file ; \
	    else \
	      (echo "name = '`basename $$file .control`'"; echo "version = '$(EXTVERSION)'") > $$file ; \
	    fi ; \
	    if [ -n "$(EXTCOMMENT)" ]; then echo "comment = '$(EXTCOMMENT)'" >> $$file ; fi ; \
	  fi ; \
	done

install: all installdirs control
ifneq (,$(DATA)$(DATA_built)$(CONTROL))
	@for file in $(addprefix $(srcdir)/, $(DATA)) $(DATA_built) $(CONTROL); do \
	  echo "$(INSTALL_DATA) $$file '$(DESTDIR)$(datadir)/$(datamoduledir)'"; \
	  $(INSTALL_DATA) $$file '$(DESTDIR)$(datadir)/$(datamoduledir)'; \
	done
endif # DATA

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How to reliably detect if it's a promoting standby

2010-10-20 Thread Heikki Linnakangas

On 20.10.2010 18:06, Tatsuo Ishii wrote:

This returns 'false' if you're in hot standby mode running against an
archive. That seems wrong, I don't think the walreceiver state should
play any role in this.


 Apart from this, I wonder why walsender/walreceiver do not transfer archive
logs as well.


What do you mean?

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-10-20 Thread Dimitri Fontaine
Tom Lane t...@sss.pgh.pa.us writes:
 and use the equivalent of SET LOCAL in the CREATE EXTENSION code?

 I had assumed that that was how he was doing it ...

I'm currently doing:
SetConfigOption("client_min_messages", "warning", PGC_SUSET,
PGC_S_SESSION);

And then manually reverting to what was there before the command:
SetConfigOption("client_min_messages", old_cmsgs, PGC_SUSET,
PGC_S_SESSION);

The thing is that CREATE EXTENSION can be part of a transaction, so even
SET LOCAL ain't going to work here; we need to reset before continuing
the transaction. I don't know whether SET LOCAL is reset after a savepoint,
so we would still need to take care of that by hand, right?
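
For what it's worth, a sketch of the mechanism the backend already uses for
per-function SET clauses, which scopes GUC changes without SET LOCAL or
manual restoring (assuming the guc.c entry points of that era; whether this
is acceptable from the CREATE EXTENSION code path is exactly the question):

/* Push a new GUC nesting level, apply the override for the duration of the
 * script, then pop the level so the previous values come back by themselves,
 * even when CREATE EXTENSION runs inside a larger transaction. */
int save_nestlevel = NewGUCNestLevel();

(void) set_config_option("client_min_messages", "warning",
                         PGC_SUSET, PGC_S_SESSION,
                         GUC_ACTION_SAVE, true);

/* ... run the extension's script here ... */

AtEOXact_GUC(true, save_nestlevel);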

-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-10-20 Thread Tom Lane
Dimitri Fontaine dimi...@2ndquadrant.fr writes:
 Tom Lane t...@sss.pgh.pa.us writes:
 I don't think that no changes to the makefiles is a requirement,
 or even a wish-list item, for this.  I think it's perfectly reasonable
 for the makefile to have to specify the module name; far better that
 than that we get the name by some magic or other.

 It seemed easy to get a reasonable approach requiring very few edits in
 contribs so I favoured that. Now, it's still entirely possible to hand
 adjust. Determining the extension name automatically from DATA_built or
 DATA is only done where EXTENSION has not been provided,

That is simply a horrid idea.  Just make it specify EXTENSION.

 and guessing
 the CONTROL file name from the EXTENSION name only occurs when CONTROL
 has not been provided.

Here, on the other hand, I'm wondering why have two variables at all.
Is there any sane use-case for the control file to not be named the same
as the extension?  It seems like that would accomplish little except to
sow confusion.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-10-20 Thread Dimitri Fontaine
Tom Lane t...@sss.pgh.pa.us writes:
 That is simply a horrid idea.  Just make it specify EXTENSION.

Black magic it is, will remove in v7.

 Is there any sane use-case for the control file to not be named the same
 as the extension?  It seems like that would accomplish little except to
 sow confusion.

The goal of the 3 variables EXTENSION, EXTVERSION, EXTCOMMENT is to
prepare the control file with 3 lines formatted variable = 'value'. If
you specify CONTROL instead, that should be the file name you're
providing directly.
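
For example, the control file generated that way for a module would contain
nothing more than (values hypothetical):

name = 'hstore'
version = '9.1devel'
comment = 'data type for storing sets of (key, value) pairs'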

It then grew into being a directive to produce the right file set for
spi, but the simplest thing would be to hand-prepare the files there. If
you agree with that, that's what I'll do in v7.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] Creation of temporary tables on read-only standby servers

2010-10-20 Thread Tom Lane
Greg Stark gsst...@mit.edu writes:
 On Tue, Oct 19, 2010 at 4:12 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I think they're more alike than you think.  If we had the infrastructure
 to do local temp tables this way, it'd be pretty easy to use that to
 instantiate per-backend copies of global temp tables.  (The global
 entities would be templates, not actual tables.)

 Sure, but I think the idea was that you should be able to create
 temporary tables from whole cloth on the slave. Since they're local to
 the backend they never have to be stored on disk so logically from the
 user's point of view it seems like it should be possible.

The way I'm envisioning this working is that you instantiate temporary
child tables of all the system catalogs that are needed to describe
tables.  Being system catalogs, their schemas never change, so you don't
have a problem there.  Then you use these children to store the catalog
entries describing user temp tables.  Whether those temp tables are
instantiations of spec-style global temp tables, or our current flavor
of local temp tables, won't matter.

I think it's pointless to speculate about whether we might have divvied
up the meta-information about tables differently if we'd foreseen
wanting to do this.  It is what it is, and there is *way* too much code
depending on it, both inside the backend and in clients.  Any
reimplementation of temp tables will still have to expose largely the
same catalog information that exists for tables now.  We can probably
get away with marginal changes like redefining relfilenode, but we can't
avoid providing catalog entries that describe the schema and statistics
of a temp table.

regards, tom lane



[HACKERS] [PATCH] pgcrypto: Test for NULL before dereferencing pointer

2010-10-20 Thread Marti Raudsepp
Hi pgsql-hackers,

Currently contrib/pgcrypto/pgp-pubenc.c contains code like:

uint8 algo = pk->algo;
if (pk == NULL)
...

However, if pk was NULL, then the if() condition would never be
reached because the pk->algo dereference would segfault.

This patch moves the dereference to below the condition which was the
intended behavior.

Regards,
Marti
From a2500cda9e0e82883854a412ea12942e174e3dd2 Mon Sep 17 00:00:00 2001
From: Marti Raudsepp ma...@juffo.org
Date: Wed, 20 Oct 2010 18:32:17 +0300
Subject: [PATCH] pgcrypto: Test for NULL before dereferencing pointer

If pk is NULL, the backend would segfault when accessing -algo and the
following NULL check was never reached.

This problem was found by Coccinelle (null_ref.cocci from coccicheck)
---
 contrib/pgcrypto/pgp-pubenc.c |4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/contrib/pgcrypto/pgp-pubenc.c b/contrib/pgcrypto/pgp-pubenc.c
index 4b4d1bf..943d2e4 100644
--- a/contrib/pgcrypto/pgp-pubenc.c
+++ b/contrib/pgcrypto/pgp-pubenc.c
@@ -199,7 +199,7 @@ pgp_write_pubenc_sesskey(PGP_Context *ctx, PushFilter *dst)
 	PGP_PubKey *pk = ctx->pub_key;
 	uint8		ver = 3;
 	PushFilter *pkt = NULL;
-	uint8		algo = pk->algo;
+	uint8		algo;
 
 	if (pk == NULL)
 	{
@@ -207,6 +207,8 @@ pgp_write_pubenc_sesskey(PGP_Context *ctx, PushFilter *dst)
 		return PXE_BUG;
 	}
 
+	algo = pk->algo;
+
 	/*
 	 * now write packet
 	 */
-- 
1.7.3.1




Re: [HACKERS] pg_rawdump

2010-10-20 Thread Aidan Van Dyk
On Wed, Oct 20, 2010 at 10:28 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Stephen R. van den Berg s...@cuci.nl writes:
 It's just that matching table and file, and subsequently figuring out
 some missing columns which may have been added/removed later,
can be rather time-consuming and could be made a lot easier (not necessarily
 perfect) if that information would have been present in the first page of
 a file.

 So you've already moved the goalposts from what was claimed in your
 prior message.  If the data is not maintained (with 100% reliability)
 during ALTER TABLE, how are you going to do something like figure out
 missing columns?

 I can see the potential usefulness of a self-documenting table storage
 format, but this proposal isn't that; it's just an unreliable kluge.

If we're looking to have any sort of out of catalog documentation of
table storage format, what about just having a new relation fork that
just appends each and every change made to the table formats,
including ones rolled back, etc.  Make this relation fork append only,
and dump a completely new set of metadata to it on each and every ALTER
TABLE.  This fork would never need to be read by PG, so a relation
fork might even be too much.  All you really need is a file you can
tie to a relation, and blindly append data to on create/alter
statements.

Sure, it will have more information than *needed*, but I can't see it
ever growing too big, and people doing forensics rarely complain about
having *too much* information available.
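
As a strawman, the mechanics could be as dumb as the sketch below -- plain
append-only writes to a per-relation text file, with the path and line
format invented purely for illustration, nothing tied to real relation
forks:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Append one human-readable description line; returns 0 on success. */
static int
append_table_metadata(const char *path, const char *description)
{
	char	buf[8192];
	int		fd;
	int		len;

	len = snprintf(buf, sizeof(buf), "%s\n", description);
	if (len < 0 || len >= (int) sizeof(buf))
		return -1;
	fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0600);
	if (fd < 0)
		return -1;
	if (write(fd, buf, len) != len)
	{
		close(fd);
		return -1;
	}
	return close(fd);
}

Nothing ever reads it back except a human doing forensics, so there is no
need to worry about its format being stable or even pretty.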



-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] Simplifying replication

2010-10-20 Thread Robert Treat
On Tue, Oct 19, 2010 at 11:16 AM, Greg Smith g...@2ndquadrant.com wrote:

 Josh Berkus wrote:

 It is critical that we make replication easier to set up, administrate and
 monitor than it currently is.  In my conversations with people, this is more
 important to our users and the adoption of PostgreSQL than synchronous
 replication is.


snip


 I find this launch into a new round of bike-shedding a bit distracting.  If
 you want this to be easier to use, which it's obvious to any observer it
 should be because what's delivered in 9.0 is way too complicated, please
 work on finding development resources to assign to that problem.  Because
 that's the bottleneck on simplifying things, not ideas about what to do.  I
 would recommend finding or assigning a developer to work on integrating base
 backup in to the streaming protocol as the biggest single thing that would
 improve the built-in replication.  All of the rest of the trivia about what
 knobs to set and such are tiny details that make for only a minor
 improvement until that's taken care of.


Yeah, I'm sure we all think it should be easier, but figuring out what that
means is certainly a moving target. The idea of being able to create a base
backup automagically sounds good, but comparatively it's
not significantly more difficult than what many other systems make you do,
and actually if done incorrectly could be something rather limiting. On the
whole the customers we are talking with are far more concerned about things
like managing failover scenarios when you have multiple slaves, and it's the
lack of capabilities around those kinds of things that hurts postgres
adoption much more than it being hard to set up.


Robert Treat
play: http://www.xzilla.net
work: http://omniti.com/is/hiring


Re: [HACKERS] pg_rawdump

2010-10-20 Thread Tom Lane
Aidan Van Dyk ai...@highrise.ca writes:
 On Wed, Oct 20, 2010 at 10:28 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I can see the potential usefulness of a self-documenting table storage
 format, but this proposal isn't that; it's just an unreliable kluge.

 If we're looking to have any sort of out of catalog documentation of
 table storage format, what about just having a new relation fork that
 just appends each and every change made to the table formats,
 including ones rolled back, etc.

I thought of that too, but I'm not sure if it's going to help enough.
The trouble is that the information is only tied to the table itself
via file names.  In a fsck-recovery scenario, you may not have the
correct file names.  (One of the multiple problems with Stephen's
proposal is that the metadata would only be physically tied to the
first segment of a large table, and unless you can trust the file
names you won't be able to match it to the other segments.)

[ thinks for a bit... ]  Perhaps we could stick some sort of unique
ID into tables, which could be correlated to the same unique ID
appearing in a metadata fork.

[ thinks some more... ]  Of course, at this point you have to start
asking why the metadata fork should be thought to be any more reliable
than the system catalogs.

 Make this relation fork append only,
 and dump a completely new set of metadata to it each and every ALTER
 TABLE.

You can bet that there'll be somebody out there who whines because their
app does lots and lots of repeated ALTER TABLEs, and so the metadata fork
grows forever.  I think we'd have to be a bit smarter than this.

regards, tom lane



Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)

2010-10-20 Thread Florian Weimer
* Terry Laurenzo:

 Agreed.  BSON was born out of implementations that either lacked
 arbitrary precision numbers or had a strong affinity to an
 int/floating point way of thinking about numbers.  I believe that if
 BSON had an arbitrary precision number type, it would be a proper
 superset of JSON.

But JSON has only double-precision numbers!?



Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)

2010-10-20 Thread Andrew Dunstan



On 10/20/2010 01:15 PM, Florian Weimer wrote:

* Terry Laurenzo:


Agreed.  BSON was born out of implementations that either lacked
arbitrary precision numbers or had a strong affinity to an
int/floating point way of thinking about numbers.  I believe that if
BSON had an arbitrary precision number type, it would be a proper
superset of JSON.

But JSON has only double-precision numbers!?


AFAICT the JSON spec says nothing at all about the precision of numbers. 
It just provides a syntax for them. We should not confuse what can be 
allowed in JSON with what can be handled by some consumers of JSON such 
as ECMAScript.


However, since we would quite reasonably require that any JSON 
implementation be able to handle arbitrary precision numbers, that 
apparently rules out BSON as a storage engine for it, since BSON can not 
handle such things.
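
A trivial standalone illustration of the problem, independent of the patch
itself: feed a longish JSON number through a C double and the low-order
digits simply vanish, which is exactly what a numeric-capable storage
format has to avoid.

#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
	const char *json_number = "12345678901234567890123456789";
	double		d = strtod(json_number, NULL);

	/* only ~15-16 significant decimal digits survive the round trip */
	printf("%.17g\n", d);
	return 0;
}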



cheers

andrew



Re: [HACKERS] pg_rawdump

2010-10-20 Thread Aidan Van Dyk
On Wed, Oct 20, 2010 at 1:09 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 [ thinks for a bit... ]  Perhaps we could stick some sort of unique
 ID into tables, which could be correlated to the same unique ID
 appearing in a metadata fork.

It would be awfully nice if the table name was in the first few bytes
of the segments ;-)

 [ thinks some more... ]  Of course, at this point you have to start
 asking why the metadata fork should be thought to be any more reliable
 than the system catalogs.

It's not.  But it's about data duplication.  If you've lost some
files, and others are found in lost+found, etc.  Or maybe you lost
only the main tablespace, and you have your big-honking-files from
another table-space around, etc.

The more copies of the data around, the more chance you have of being
able to get something usable from of it.

But we already have WAL archive as an external safe backup copy of *everything*.

Maybe the cost of those extra forks/duplicates is too much.  Maybe it
would be cheaper to try and parse the WAL archive, and just
specifically rebuild the system catalogs for the couple of times
people actually need this type of data?

Or maybe a query/function that ultra paranoid types can run daily,
etc, which would show the system catalog information about table
storage format that they could save safely instead of munging GB of WAL
logs when they want it...

 You can bet that there'll be somebody out there who whines because their
 app does lots and lots of repeated ALTER TABLEs, and so the metadata fork
 grows forever.  I think we'd have to be a bit smarter than this.

Well, for bad habits, we have an equally bad solution ;-)
BEGIN;
ALTER TABLE bad_design RENAME TO bad_design_too_old;
CREATE TABLE bad_design (LIKE bad_design_too_old
    INCLUDING DEFAULTS INCLUDING CONSTRAINTS INCLUDING INDEXES);
INSERT INTO bad_design SELECT * FROM bad_design_too_old;
DROP TABLE bad_design_too_old;
COMMIT;

It's all about how much extra stuff do we want around to make
forensic/reconstruction type work easier when they can't go to the
documented, tried, tested, normal restore from backup/WAL.

None?  Or as much as possible?  And what are the tradeoffs.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



[HACKERS] pg_upgrade performance with 150k tables

2010-10-20 Thread Bruce Momjian
I received a private email report yesterday from someone using
pg_upgrade with PG 9.0 who found it took five hours for pg_upgrade to
upgrade a database with 150k tables.  Yes, that is a lot of tables, but
pg_upgrade should be able to do better than that.

I have modified pg_upgrade in git master to cache scandir() and reduce
array lookups and the time is down to 38 minutes.  (He prototyped a hash
implementation that was 30 minutes but it was too much code for my
taste.)

I don't think this is reasonable to backpatch.   If anyone else sees
cases for pg_upgrade improvement, please let me know.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



[HACKERS] pg_upgrade patch application process, and move to /bin?

2010-10-20 Thread Bruce Momjian
Bruce Momjian wrote:
 I received a private email report yesterday from someone using
 pg_upgrade with PG 9.0 who found it took five hours for pg_upgrade to
 upgrade a database with 150k tables.  Yes, that is a lot of tables, but
 pg_upgrade should be able to do better than that.
 
 I have modified pg_upgrade in git master to cache scandir() and reduce
 array lookups and the time is down to 38 minutes.  (He prototyped a hash
 implementation that was 30 minutes but it was too much code for my
 taste.)
 
 I don't think this is reasonable to backpatch.   If anyone else sees
 cases for pg_upgrade improvement, please let me know.

One more question --- should I be sending pg_upgrade patches to the list
for approval?  The restructuring patch was large and didn't seem
necessary to post, and the speedups were tested by the bug reporter, so
I figured those were OK to apply.

Oh, and do we want to move pg_upgrade into /bin for 9.1?  There was
discussion about that six months ago.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] WIP: extensible enums

2010-10-20 Thread David Fetter
On Tue, Oct 19, 2010 at 08:51:16PM -0400, Robert Haas wrote:
 On Tue, Oct 19, 2010 at 5:42 PM, Andrew Dunstan and...@dunslane.net wrote:
  Well a bit more testing shows some benefit. I've sorted out a few kinks, so
  this seems to work. In particular, with the above tables, the version
  imported from 9.0 can have an index created in about the same time as
  on the fresh table (identical data, but all even numbered Oids).
 
  Of course, with lots of odd numbered Oids, if a label gets added the
  imported version will degrade in performance much more quickly.
 
 I'm quite impressed by the amount of time and thought being put into
 optimizing this.  I didn't realize people cared so much about enum
 performance; but it's good that they do.
 
 I hope to see more such efforts in other parts of the system.

Which parts of the system, in particular, do you have in mind?  Other
people from EDB have mentioned that slimming down the on-disk
representation was one such target.  What other ones would you see as
needing such attention?

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] PostgreSQL and HugePage

2010-10-20 Thread Greg Stark
On Wed, Oct 20, 2010 at 7:10 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I believe that for the equivalent Solaris option, we just automatically
 enable it when available.  So there'd be no need for user documentation.
 However, I definitely *would* like to see some benchmarks proving that
 the change actually does something useful.  I've always harbored the
 suspicion that this is just a knob to satisfy people who need knobs to
 frob.

Well saving a few megabytes of kernel space memory isn't a bad thing.
But I think the major effect is on forking new processes. Having to
copy that page map is a major cost when you're talking about very
large memory footprints. While machine memory has gotten larger the 4k
page size hasn't. I don't think it's a big cost once all the processes
have been forked if you're reusing them beyond perhaps slightly more
efficient cache usage.


-- 
greg



Re: [HACKERS] pg_upgrade patch application process, and move to /bin?

2010-10-20 Thread Robert Haas
On Wed, Oct 20, 2010 at 3:01 PM, Bruce Momjian br...@momjian.us wrote:
 One more question --- should I be sending pg_upgrade patches to the list
 for approval?  The restructuring patch was large and didn't seem
 necessary to post, and the speedups were tested by the bug reporter, so
 I figured those were OK to apply.

I think it would be good to do that.  At least give people a chance to
comment, if they care.

 Oh, and do we want to move pg_upgrade into /bin for 9.1?  There was
 discussion about that six months ago.

I would be inclined to leave it in contrib for a few more releases.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] PostgreSQL and HugePage

2010-10-20 Thread Greg Stark
On Wed, Oct 20, 2010 at 12:17 PM, Greg Stark gsst...@mit.edu wrote:
 I don't think it's a big cost once all the processes
 have been forked if you're reusing them beyond perhaps slightly more
 efficient cache usage.

Hm, this site claims to get a 13% win just from the reduced tlb misses
using a preload hack with Pg 8.2. That would be pretty substantial.

http://oss.linbit.com/hugetlb/


-- 
greg



Re: [HACKERS] pg_upgrade patch application process, and move to /bin?

2010-10-20 Thread Magnus Hagander
On Wed, Oct 20, 2010 at 21:28, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Oct 20, 2010 at 3:01 PM, Bruce Momjian br...@momjian.us wrote:
 One more question --- should I be sending pg_upgrade patches to the list
 for approval?  The restructuring patch was large and didn't seem
 necessary to post, and the speedups were tested by the bug reporter, so
 I figured those were OK to apply.

 I think it would be good to do that.  At least give people a chance to
 comment, if they care.

+1. It's also a good way for people to get a bit more involved in the code.



-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] pg_upgrade patch application process, and move to /bin?

2010-10-20 Thread Bruce Momjian
Magnus Hagander wrote:
 On Wed, Oct 20, 2010 at 21:28, Robert Haas robertmh...@gmail.com wrote:
  On Wed, Oct 20, 2010 at 3:01 PM, Bruce Momjian br...@momjian.us wrote:
  One more question --- should I be sending pg_upgrade patches to the list
   for approval? The restructuring patch was large and didn't seem
  necessary to post, and the speedups were tested by the bug reporter, so
  I figured those were OK to apply.
 
  I think it would be good to do that. At least give people a chance to
  comment, if they care.
 
 +1. It's also a good way for people to get a bit more involved in the code.

OK.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Review: Fix snapshot taking inconsistencies

2010-10-20 Thread Alvaro Herrera
Excerpts from Tom Lane's message of lun oct 04 10:31:26 -0400 2010:

 In the particular case at hand here, I rather wonder why SQL functions
 are depending on postgres.c at all.  It might be better to just
 duplicate a bit of code to make them independent.  pg_parse_and_rewrite
 would then be dead code and could be deleted.

This idea doesn't work, unless pushed a lot further.  Attached are two
separate patches, extracted from the last patch version posted by Marko
(git commit --interactive helped here).  The first one does what you
suggest above: remove pg_parse_and_rewrite and inline it into the
callers.  The other patch is the remainder.

Read on for the details of the first patch.  As for the second patch,
which is Marko's original intention anyway, I don't see that it needs to
be delayed because of the first one.  So while I haven't reviewed it, I
think it should be considered separately.


The problem with this patch (0001) is that the inlined versions of the
code that used to be pg_parse_and_rewrite are still depending on
functions in postgres.c.  These are pg_parse_query and
pg_analyze_and_rewrite.  pg_parse_query is just a layer on top of
raw_parser.  pg_analyze_and_rewrite is a layer on top of parse_analyze
plus pg_rewrite_query (also on postgres.c).

Now, the only difference between those functions and the ones that
underlie them is that they have the bunch of tracing macros and
log_parser_stats reporting.  So one solution would be to have SQL
functions (pg_proc.c and executor/functions.c) call the raw parser.c and
analyze.c functions directly, without invoking the tracing/logging code.  

The other idea would be to move the whole of those functions out of
postgres.c and into their own modules, i.e. move pg_parse_query into
parser.c and pg_analyze_and_rewrite and pg_rewrite_query into
rewriteHandler.c.  (This actually requires a bit more effort because we
should also move pg_analyze_and_rewrite_params out of postgres.c,
following pg_analyze_and_rewrite).

Note that pg_analyze_and_rewrite and its params siblings are being
called from copy.c, spi.c, optimizer/util/clauses.c, and plancache.c.
So it does make certain sense to move them out of postgres.c, if we want
to think of postgres.c as a module only concerned with client
interaction.

The only quarrel I have with this code shuffling is that
pg_rewrite_query is being called from exec_parse_message.  Since it's
now a static function, it would have to stop being static so that it can
be called from both places (postgres.c and rewriteHandler.c)

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


0001-Separate-SQL-function-processing-from-postgres.c.patch
Description: Binary data


0002-The-remainder-of-Marko-s-patch.patch
Description: Binary data



Re: [HACKERS] [PATCH] pgcrypto: Test for NULL before dereferencing pointer

2010-10-20 Thread Heikki Linnakangas

On 20.10.2010 18:44, Marti Raudsepp wrote:

Hi pgsql-hackers,

Currently contrib/pgcrypto/pgp-pubenc.c contains code like:

uint8 algo = pk->algo;
if (pk == NULL)
...

However, if pk was NULL, then the if() condition would never be
reached because the pk->algo dereference would segfault.

This patch moves the dereference to below the condition which was the
intended behavior.


Thanks, applied. Did coccicheck find anything else interesting?

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] PostgreSQL and HugePage

2010-10-20 Thread Alvaro Herrera
Excerpts from Greg Stark's message of mié oct 20 16:28:25 -0300 2010:
 On Wed, Oct 20, 2010 at 12:17 PM, Greg Stark gsst...@mit.edu wrote:
  I don't think it's a big cost once all the processes
  have been forked if you're reusing them beyond perhaps slightly more
  efficient cache usage.
 
 Hm, this site claims to get a 13% win just from the reduced tlb misses
 using a preload hack with Pg 8.2. That would be pretty substantial.
 
 http://oss.linbit.com/hugetlb/

Wow, is there no other way to get the huge page size other than opening
and reading /proc/meminfo?
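
The only way I can see is something like this minimal sketch, which just
scans for the Hugepagesize line and returns bytes, or 0 if it cannot tell:

#include <stdio.h>

static long
get_huge_page_size(void)
{
	FILE	   *f = fopen("/proc/meminfo", "r");
	char		line[128];
	long		kb = 0;

	if (f == NULL)
		return 0;
	while (fgets(line, sizeof(line), f) != NULL)
	{
		if (sscanf(line, "Hugepagesize: %ld kB", &kb) == 1)
			break;
	}
	fclose(f);
	return kb * 1024;
}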

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] max_wal_senders must die

2010-10-20 Thread Greg Stark
On Wed, Oct 20, 2010 at 6:29 AM, Robert Haas robertmh...@gmail.com wrote:
 Exactly.  It doesn't take many 3-7% slowdowns to add up to being 50%
 or 100% slower, and that sucks.  In fact, I'm still not convinced that
 we were wise to boost default_statistics_target as much as we did.  I
 argued for a smaller boost at the time.

Well we don't want to let ourselves be paralyzed by FUD so it was
important to identify specific concerns and then tackle those
concerns. Once we identified the worst-case planning cases we profiled
them and found that the inflection point of the curve was fairly
clearly above 100 but that there were cases where values below 1,000
caused problems. So I'm pretty happy with the evidence-based approach.

The problem with being overly conservative is that it gives free rein
to the folks who were shouting that we should just set the default to
1,000. They weren't wrong that the 10 was overly conservative and in
the absence of evidence 1,000 was just as reasonable.


 Actually, I think the best thing for default_statistics_target might
 be to scale the target based on the number of rows in the table, e.g.
 given N rows:

The number of buckets needed isn't related to the population size --
it's related to how wide the ranges you'll be estimating selectivity
for are. That is, with our current code, if you're selecting tuples
within a range a..b and that range happens to be the same size as the
bucket size then you'll get an accurate estimate with a fixed 95th
percentile precision regardless of the size of the table (to an
approximation).

I'm not sure how our selectivity works at all for the degenerate case
of selecting for specific values. I don't understand how histograms
are useful for such estimates at all. I think the MCV lists are
basically an attempt to overcome this problem and as you point out I'm
not sure the statistics target is really the right thign to control
them -- but since I don't think there's any real statistics behind
them I'm not sure there's any right way to control them.

-- 
greg



[HACKERS] pg_upgrade cleanup

2010-10-20 Thread Bruce Momjian
I have applied the attached patch to clean up pg_upgrade cache lookup
code and remove useless NULL pointer tests.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/contrib/pg_upgrade/info.c b/contrib/pg_upgrade/info.c
index 257f5a8..67528b0 100644
*** a/contrib/pg_upgrade/info.c
--- b/contrib/pg_upgrade/info.c
*** gen_db_file_maps(DbInfo *old_db, DbInfo 
*** 74,81 
  		num_maps++;
  
  		/*
! 		 * so much for the mapping of this relation. Now we need a mapping for
! 		 * its corresponding toast relation if any.
  		 */
  		if (oldrel->toastrelid > 0)
  		{
--- 74,81 
  		num_maps++;
  
  		/*
! 		 * So much for mapping this relation;  now we need a mapping
! 		 * for its corresponding toast relation, if any.
  		 */
  		if (oldrel->toastrelid > 0)
  		{
*** gen_db_file_maps(DbInfo *old_db, DbInfo 
*** 117,122 
--- 117,123 
  	 newrel->reloid);
  
  			/* look them up in their respective arrays */
+ 			/* we lose our cache location here */
  			old_toast = relarr_lookup_rel(old_db->rel_arr,
  		  "pg_toast", old_name, CLUSTER_OLD);
  			new_toast = relarr_lookup_rel(new_db->rel_arr,
*** get_rel_infos(const DbInfo *dbinfo, RelI
*** 385,391 
  
  	relarr->rels = relinfos;
  	relarr->nrels = num_rels;
! 	relarr->cache_name_rel = 0;
  }
  
  
--- 386,392 
  
  	relarr->rels = relinfos;
  	relarr->nrels = num_rels;
! 	relarr->last_relname_lookup = 0;
  }
  
  
*** dbarr_lookup_db(DbInfoArr *db_arr, const
*** 399,407 
  {
  	int			dbnum;
  
- 	if (!db_arr || !db_name)
- 		return NULL;
- 
  	for (dbnum = 0; dbnum < db_arr->ndbs; dbnum++)
  	{
  		if (strcmp(db_arr->dbs[dbnum].db_name, db_name) == 0)
--- 400,405 
*** relarr_lookup_rel(RelInfoArr *rel_arr, c
*** 424,439 
  {
  	int			relnum;
  
- 	if (!rel_arr || !relname)
- 		return NULL;
- 
  	/* Test next lookup first, for speed */
! 	if (rel_arr->cache_name_rel + 1 < rel_arr->nrels &&
! 		strcmp(rel_arr->rels[rel_arr->cache_name_rel + 1].nspname, nspname) == 0 &&
! 		strcmp(rel_arr->rels[rel_arr->cache_name_rel + 1].relname, relname) == 0)
  	{
! 		rel_arr->cache_name_rel++;
! 		return rel_arr->rels[rel_arr->cache_name_rel];
  	}
  
  	for (relnum = 0; relnum < rel_arr->nrels; relnum++)
--- 422,434 
  {
  	int			relnum;
  
  	/* Test next lookup first, for speed */
! 	if (rel_arr->last_relname_lookup + 1 < rel_arr->nrels &&
! 		strcmp(rel_arr->rels[rel_arr->last_relname_lookup + 1].nspname, nspname) == 0 &&
! 		strcmp(rel_arr->rels[rel_arr->last_relname_lookup + 1].relname, relname) == 0)
  	{
! 		rel_arr->last_relname_lookup++;
! 		return rel_arr->rels[rel_arr->last_relname_lookup];
  	}
  
  	for (relnum = 0; relnum < rel_arr->nrels; relnum++)
*** relarr_lookup_rel(RelInfoArr *rel_arr, c
*** 441,447 
  		if (strcmp(rel_arr->rels[relnum].nspname, nspname) == 0 &&
  			strcmp(rel_arr->rels[relnum].relname, relname) == 0)
  		{
! 			rel_arr->cache_name_rel = relnum;
  			return rel_arr->rels[relnum];
  		}
  	}
--- 436,442 
  		if (strcmp(rel_arr->rels[relnum].nspname, nspname) == 0 &&
  			strcmp(rel_arr->rels[relnum].relname, relname) == 0)
  		{
! 			rel_arr->last_relname_lookup = relnum;
  			return rel_arr->rels[relnum];
  		}
  	}
*** relarr_lookup_reloid(RelInfoArr *rel_arr
*** 464,472 
  {
  	int			relnum;
  
- 	if (!rel_arr || !oid)
- 		return NULL;
- 
  	for (relnum = 0; relnum < rel_arr->nrels; relnum++)
  	{
  		if (rel_arr->rels[relnum].reloid == oid)
--- 459,464 
*** relarr_free(RelInfoArr *rel_arr)
*** 483,489 
  {
  	pg_free(rel_arr->rels);
  	rel_arr->nrels = 0;
! 	rel_arr->cache_name_rel = 0;
  }
  
  
--- 475,481 
  {
  	pg_free(rel_arr->rels);
  	rel_arr->nrels = 0;
! 	rel_arr->last_relname_lookup = 0;
  }
  
  
diff --git a/contrib/pg_upgrade/pg_upgrade.h b/contrib/pg_upgrade/pg_upgrade.h
index e40f58d..439c0a7 100644
*** a/contrib/pg_upgrade/pg_upgrade.h
--- b/contrib/pg_upgrade/pg_upgrade.h
*** typedef struct
*** 78,84 
  {
  	RelInfo*rels;
  	int			nrels;
! 	int			cache_name_rel;	/* cache of last lookup location */
  } RelInfoArr;
  
  /*
--- 78,84 
  {
  	RelInfo*rels;
  	int			nrels;
! 	int			last_relname_lookup;	/* cache of last lookup location */
  } RelInfoArr;
  
  /*



Re: [HACKERS] max_wal_senders must die

2010-10-20 Thread Robert Haas
On Wed, Oct 20, 2010 at 3:40 PM, Greg Stark gsst...@mit.edu wrote:
 On Wed, Oct 20, 2010 at 6:29 AM, Robert Haas robertmh...@gmail.com wrote:
 Exactly.  It doesn't take many 3-7% slowdowns to add up to being 50%
 or 100% slower, and that sucks.  In fact, I'm still not convinced that
 we were wise to boost default_statistics_target as much as we did.  I
 argued for a smaller boost at the time.

 Well we don't want to let ourselves be paralyzed by FUD so it was
 important to identify specific concerns and then tackle those
 concerns. Once we identified the worst-case planning cases we profiled
 them and found that the inflection point of the curve was fairly
 clearly above 100 but that there were cases where values below 1,000
 caused problems. So I'm pretty happy with the evidence-based approach.

The inflection point of the curve was certainly a good thing for us to
look at but the fact remains that we took a hit on a trivial
benchmark, and we can't afford to take too many of those.

 Actually, I think the best thing for default_statistics_target might
 be to scale the target based on the number of rows in the table, e.g.
 given N rows:

 The number of buckets needed isn't related to the population size --
 it's related to how wide the ranges you'll be estimating selectivity
 for are. That is, with our current code, if you're selecting tuples
 within a range a..b and that range happens to be the same size as the
 bucket size then you'll get an accurate estimate with a fixed 95th
 percentile precision regardless of the size of the table (to an
 approximation).

If you have a WHERE clause of the form WHERE x > some_constant, then
the effects vary depending on how that constant is chosen.  If it's
the median value, then as you say the statistics target doesn't matter
much at all; but that's not necessarily representative of real life.
For example, suppose x is a date and the constant is Monday of the
current week.  As the table grows, the present week's data becomes a
smaller and smaller fraction of the table data.  When it gets to be a
tiny fraction of the very last histogram bucket, the estimates start
to get progressively worse.  At some point you have to give up and
partition the table for other reasons anyway, but having to do it
because the statistics are off is inexcusable.  We've seen people hit
this precise issue on -performance a few times.

 I'm not sure how our selectivity works at all for the degenerate case
 of selecting for specific values. I don't understand how histograms
 are useful for such estimates at all. I think the MCV lists are
 basically an attempt to overcome this problem and as you point out I'm
 not sure the statistics target is really the right thing to control
 them -- but since I don't think there's any real statistics behind
 them I'm not sure there's any right way to control them.

If you have a WHERE clause of the form WHERE x = some_constant, then
you get a much better estimate if some_constant is an MCV.  If the
constant is not an MCV, however, you still get better estimates,
because the estimation code knows that no non-MCV can occur more
frequently than any MCV, so increasing the number of MCVs pushes those
estimates closer to reality.  It is especially bad when the frequency
falls off a cliff at a certain point in the distribution e.g. if
there are 243 values that occur much more frequently than any others,
a stats target of 250 will do much better than 225.  But even if
that's not an issue, it still helps.  The bottom line here is that I
can't remember any message, ever, on -performance, or any incident
within my personal experience, where it was necessary to increase the
statistics target beyond 50-100 on a table with 10K rows.  However,
there are certainly cases where we've recommended that for big tables,
which means there are also people out there who have a performance
problem on a big table but haven't asked for help and therefore
haven't gotten that advice.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] PostgreSQL and HugePage

2010-10-20 Thread Robert Haas
On Wed, Oct 20, 2010 at 3:47 PM, daveg da...@sonic.net wrote:
 On Wed, Oct 20, 2010 at 12:28:25PM -0700, Greg Stark wrote:
 On Wed, Oct 20, 2010 at 12:17 PM, Greg Stark gsst...@mit.edu wrote:
  I don't think it's a big cost once all the processes
  have been forked if you're reusing them beyond perhaps slightly more
  efficient cache usage.

 Hm, this site claims to get a 13% win just from the reduced tlb misses
 using a preload hack with Pg 8.2. That would be pretty substantial.

 http://oss.linbit.com/hugetlb/

 That was my motivation in trying a patch. TLB misses can be a substantial
 overhead. I'm not current on the state of play, but when I was working at
 Sun's benchmark lab on a DB TPC-B benchmark for the first generation of MP
 systems, something like 30% of all bus traffic was TLB misses. The next
 iteration of the hardware had a much larger TLB.

 I have a client with 512GB memory systems, currently with 128GB configured
 as the postgresql buffer cache. That is 32M TLB entries trying to fit in
 the few dozen CPU TLB slots, so I suspect there may be some contention.

 I'll benchmark of course.

Do you mean 128GB shared buffers, or shared buffers + OS cache?  I
think that the general wisdom is that performance tails off beyond
8-10GB of shared buffers anyway, so a performance improvement on 128GB
shared buffers might not mean much unless you can also show that 128GB
shared buffers actually performs better than some smaller amount.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Creation of temporary tables on read-only standby servers

2010-10-20 Thread Greg Stark
On Wed, Oct 20, 2010 at 8:37 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I think it's pointless to speculate about whether we might have divvied
 up the meta-information about tables differently if we'd foreseen
 wanting to do this.  It is what it is, and there is *way* too much code
 depending on it, both inside the backend and in clients.  Any
 reimplementation of temp tables will still have to expose largely the
 same catalog information that exists for tables now.  We can probably
 get away with marginal changes like redefining relfilenode, but we can't
 avoid providing catalog entries that describe the schema and statistics
 of a temp table.

I agree about the schema -- that's the whole point of the catalog tables.

I felt like the statistics were pretty marginal to begin with. There
may be a large number of places but there's no complex structure of
relationships to other tables or complex data structures going on
here. Surely they can all be coded to look up the relpages from
somewhere else just as easily?

But I'm not about to start working on this area so my judgement on how
much work that would be isn't very important here.

And your point that if we have a complete local copy of the entire
catalog schema then we can create temporary tables from whole cloth on a
read-only database just as easily is attractive.

-- 
greg



Re: [HACKERS] WIP: extensible enums

2010-10-20 Thread Robert Haas
On Wed, Oct 20, 2010 at 3:16 PM, David Fetter da...@fetter.org wrote:
 On Tue, Oct 19, 2010 at 08:51:16PM -0400, Robert Haas wrote:
 On Tue, Oct 19, 2010 at 5:42 PM, Andrew Dunstan and...@dunslane.net wrote:
  Well a bit more testing shows some benefit. I've sorted out a few kinks, so
  this seems to work. In particular, with the above tables, the version
   imported from 9.0 can have an index created in about the same time as
   on the fresh table (identical data, but all even numbered Oids).
 
  Of course, with lots of odd numbered Oids, if a label gets added the
  imported version will degrade in performance much more quickly.

 I'm quite impressed by the amount of time and thought being put into
 optimizing this.  I didn't realize people cared so much about enum
 performance; but it's good that they do.

 I hope to see more such efforts in other parts of the system.

 Which parts of the system, in particular, do you have in mind?  Other
 people from EDB have mentioned that slimming down the on-disk
 representation was one such target.  What other ones would you see as
 needing such attention?

On-disk footprint.
WAL volume.
COPY speed.
Checkpoint I/O.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [PATCH] pgcrypto: Test for NULL before dereferencing pointer

2010-10-20 Thread Marti Raudsepp
On Wed, Oct 20, 2010 at 22:34, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Did coccicheck find anything else interesting?

There's a file descriptor leak in psql/command.c function
process_file() -- on errors it just returns without closing the file.
But since it's quitting anyway, there's no practical impact. Should I
submit a patch for this as well?

Then there are a few more cases found by null_ref (same check as the
original patch). But on closer inspection, these are false positives,
because the variable is actually modified in between dereferencing and
the NULL check.

Then there's the 'badzero' check that finds a dozen cases where
pointers are compared to a literal 0, not a NULL. This is only a
coding style check, as far as I can tell, so I thought it's not worth
it.

Regards,
Marti



Re: [HACKERS] Creation of temporary tables on read-only standby servers

2010-10-20 Thread Tom Lane
Greg Stark gsst...@mit.edu writes:
 On Wed, Oct 20, 2010 at 8:37 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I think it's pointless to speculate about whether we might have divvied
 up the meta-information about tables differently if we'd foreseen
 wanting to do this.  It is what it is, and there is *way* too much code
 depending on it, both inside the backend and in clients.  Any
 reimplementation of temp tables will still have to expose largely the
 same catalog information that exists for tables now.  We can probably
 get away with marginal changes like redefining relfilenode, but we can't
 avoid providing catalog entries that describe the schema and statistics
 of a temp table.

 I agree about the schema -- that's the whole point of the catalog tables.

 I felt like the statistics were pretty marginal to begin with.

I'm thinking more of pg_statistic than the stuff in pg_class --- I agree
that we could probably kluge some other approach for relpages and
reltuples, but that doesn't scale to the real statistics.

regards, tom lane



Re: [HACKERS] max_wal_senders must die

2010-10-20 Thread Greg Stark
On Wed, Oct 20, 2010 at 1:12 PM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Oct 20, 2010 at 3:40 PM, Greg Stark gsst...@mit.edu wrote:
 On Wed, Oct 20, 2010 at 6:29 AM, Robert Haas robertmh...@gmail.com wrote:

 Actually, I think the best thing for default_statistics_target might
 be to scale the target based on the number of rows in the table, e.g.
 given N rows:

 The number of buckets needed isn't related to the population size --
 it's related to how wide the ranges you'll be estimating selectivity
 for are.

 As the table grows, the present week's data becomes a
 smaller and smaller fraction of the table data.

That's an interesting point. I wonder if we can expose this in some
way that allows users to specify the statistics target in something
more meaningful for them that doesn't change as the ranges in the
table grow. Or even gather stats on the size of the ranges being
queried.


 If you have a WHERE clause of the form WHERE x = some_constant, then
 you get a much better estimate if some_constant is an MCV.  If the
 constant is not an MCV, however, you still get better estimates,
 because the estimation code knows that no non-MCV can occur more
 frequently than any MCV, so increasing the number of MCVs pushes those
 estimates closer to reality.  It is especially bad when the frequency
 falls off a cliff at a certain point in the distribution e.g. if
 there are 243 values that occur much more frequently than any others,
 a stats target of 250 will do much better than 225.

It sounds like what we really need here is some way to characterize the
distribution of frequencies. Instead of just computing an upper bound
we should have a kind of histogram showing how many values occur
precisely once, how many occur twice, three times, etc. Or perhaps we
only need to know the most common frequency per bucket. Or, hm...

-- 
greg



Re: [HACKERS] [PATCH] pgcrypto: Test for NULL before dereferencing pointer

2010-10-20 Thread Tom Lane
Marti Raudsepp ma...@juffo.org writes:
 There's a file descriptor leak in psql/command.c function
 process_file() -- on errors it just returns without closing the file.
 But since it's quitting anyway, there's no practical impact. Should I
 submit a patch for this as well?

Might as well.  It's the kind of thing that could turn into a real
bug given some rearrangement of the code.

 Then there's the 'badzero' check that finds a dozen cases where
 pointers are compared to a literal 0, not a NULL. This is a only a
 coding style check, as far as I can tell, so I thought it's not worth
 it.

I'd be in favor of fixing those too.  I have no particular problem with
either if (ptr) or if (ptr != NULL), but I think that if (ptr != 0)
gives the reader entirely the wrong impression about the datatype of ptr.
Just because C fails to distinguish doesn't make it good style.

regards, tom lane



Re: [HACKERS] PostgreSQL and HugePage

2010-10-20 Thread daveg
On Wed, Oct 20, 2010 at 12:28:25PM -0700, Greg Stark wrote:
 On Wed, Oct 20, 2010 at 12:17 PM, Greg Stark gsst...@mit.edu wrote:
  I don't think it's a big cost once all the processes
  have been forked if you're reusing them beyond perhaps slightly more
  efficient cache usage.
 
 Hm, this site claims to get a 13% win just from the reduced tlb misses
 using a preload hack with Pg 8.2. That would be pretty substantial.
 
 http://oss.linbit.com/hugetlb/

That was my motivation in trying a patch. TLB misses can be a substantial
overhead. I'm not current on the state of play, but when I was working at
Sun's benchmark lab on a DB TPC-B benchmark for the first generation of MP
systems, something like 30% of all bus traffic was TLB misses. The next
iteration of the hardware had a much larger TLB.

I have a client with 512GB memory systems, currently with 128GB configured
as the postgresql buffer cache. That is 32M TLB entries trying to fit in
the few dozen CPU TLB slots, so I suspect there may be some contention.

I'll benchmark of course.
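
The kernel-facing side is small, at least in a sketch of the idea -- this
is not the actual patch, it is Linux-specific, and real code would also
round the requested size up to a multiple of the huge page size:

#include <sys/ipc.h>
#include <sys/shm.h>

#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000		/* Linux-specific shmget() flag */
#endif

static int
create_shm_segment(key_t key, size_t size)
{
	int			shmid;

	/* try huge pages first, fall back to ordinary pages */
	shmid = shmget(key, size, IPC_CREAT | IPC_EXCL | SHM_HUGETLB | 0600);
	if (shmid < 0)
		shmid = shmget(key, size, IPC_CREAT | IPC_EXCL | 0600);
	return shmid;
}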

-dg

-- 
David Gould   da...@sonic.net  510 536 1443510 282 0869
If simplicity worked, the world would be overrun with insects.



Re: [HACKERS] pg_rawdump

2010-10-20 Thread Stephen R. van den Berg
Tom Lane wrote:
Aidan Van Dyk ai...@highrise.ca writes:
 If we're looking to have any sort of out of catalog documentation of
 table storage format, what about just having a new relation fork that
 just appends each and every change made to the table formats,
 including ones rolled back, etc.

I presume that means that all table changes are appended to a single
central file per database?  That would be a bad idea, because in the
typical problem scenario, losing this new catalogue basically creates
the same problem.  It would be preferable to keep the information
tied in with the actual table(file) it concerns.

[ thinks for a bit... ]  Perhaps we could stick some sort of unique
ID into tables, which could be correlated to the same unique ID
appearing in a metadata fork.

Ideal would be: put the table-oid inside the header of each page
(either in the official header, or in the special area).
This way even lost blocks can be correlated to the same table.
I'd still vote for the latest known table definition in the first
page.  It's by no means perfect, but it will help 99% of all
salvage attempts by an order of magnitude.
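
Purely to illustrate the idea (none of this exists today; the struct and
the function are hypothetical), the special-area variant could look like:

#include "postgres.h"
#include "miscadmin.h"			/* MyDatabaseId */
#include "storage/bufpage.h"	/* Page, PageInit, PageGetSpecialPointer */
#include "utils/rel.h"			/* Relation, RelationGetRelid */

/* Hypothetical forensic stamp kept in each page's special space */
typedef struct ForensicPageStamp
{
	Oid			relid;			/* pg_class OID of the owning relation */
	Oid			dbid;			/* database containing the relation */
} ForensicPageStamp;

static void
forensic_page_init(Page page, Relation rel)
{
	ForensicPageStamp *stamp;

	PageInit(page, BLCKSZ, MAXALIGN(sizeof(ForensicPageStamp)));
	stamp = (ForensicPageStamp *) PageGetSpecialPointer(page);
	stamp->relid = RelationGetRelid(rel);
	stamp->dbid = MyDatabaseId;
}

Every block would then carry enough identity to be matched back to its
table, at the cost of a few bytes of special space per heap page.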

[ thinks some more... ]  Of course, at this point you have to start
asking why the metadata fork should be thought to be any more reliable
than the system catalogs.

Quite.  Which is why I wanted the best-effort latest version of the
table description in the first page of the tablefile instead.

 Make this relation fork append only,
 and dump a completely new set of metadata to it each and every ALTER
 TABLE.

You can bet that there'll be somebody out there who whines because their
app does lots and lots of repeated ALTER TABLEs, and so the metadata fork
grows forever.  I think we'd have to be a bit smarter than this.

Which means we come full circle and have to conclude that doing anything
comprehensive is too invasive for normal operations; best-effort is
all a forensics operation wants or can hope for.
-- 
Stephen.

Life is that brief interlude between nothingness and eternity.



[HACKERS] psql: Don't close stdin, don't leak file descriptor with ON_ERROR_STOP

2010-10-20 Thread Marti Raudsepp
Hi list,

Here's the second patch from my coccicheck run. Originally it flagged
the fact that the opened file in psql's process_file() wasn't being
closed in the ON_ERROR_STOP path, but there seem to be two more
unintended behaviors here.

(1) In the error path, the value of pset.inputfile wasn't being
properly restored. The caller does free(fname) on line 786, so
pset.inputfile would point to unallocated memory.

(2) The more significant issue is that stdin *was* closed in the
success return path. So when you run a script with two \i - lines,
the first \q would close stdin and the next one would fail with:
psql:-:0: could not read from input file: Bad file descriptor

In fact, this means that stdin was being accessed after being
fclose()d, which is undefined behavior per ANSI C, though it seems to
work just fine on Linux.

The new behavior requires one \q per execution of '-', because stdin is
never closed.

Regards,
Marti
From 43b7595fdcc69cc9db0d066a53f53c5e71c965aa Mon Sep 17 00:00:00 2001
From: Marti Raudsepp ma...@juffo.org
Date: Wed, 20 Oct 2010 23:44:36 +0300
Subject: [PATCH] psql: Don't close stdin, don't leak file descriptor with ON_ERROR_STOP

Changes three things:
1. Don't close stdin when returning success if input was stdin
2. Don't leak file descriptor when exiting due to ON_ERROR_STOP
3. pset.inputfile wasn't being restored with ON_ERROR_STOP, yet the
memory was freed by the caller

(1) changes the behavior of \q on stdin. Previously multiple
inclusions of stdin would be terminated with a single quit, now a
separate quit is needed for each invocation. Previous behavior also
accessed stdin after it was fclose()d, which is undefined behavior per
ANSI C.

(2) and (3) should have no practical impact, because the process would
quit immediately afterwards anyway.
---
 src/bin/psql/command.c |   15 ---
 1 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index e6d703a..45a145c 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -1987,7 +1987,10 @@ process_file(char *filename, bool single_txn)
 		if ((res = PSQLexec("BEGIN", false)) == NULL)
 		{
 			if (pset.on_error_stop)
-				return EXIT_USER;
+			{
+				result = EXIT_USER;
+				goto error;
+			}
 		}
 		else
 			PQclear(res);
@@ -2000,13 +2003,19 @@ process_file(char *filename, bool single_txn)
 		if ((res = PSQLexec("COMMIT", false)) == NULL)
 		{
 			if (pset.on_error_stop)
-				return EXIT_USER;
+			{
+				result = EXIT_USER;
+				goto error;
+			}
 		}
 		else
 			PQclear(res);
 	}
 
-	fclose(fd);
+error:
+	if(fd != stdin)
+		fclose(fd);
+
 	pset.inputfile = oldfilename;
 	return result;
 }
-- 
1.7.3.1




Re: [HACKERS] Extensions, this time with a patch

2010-10-20 Thread Dimitri Fontaine
Tom Lane t...@sss.pgh.pa.us writes:
 That is simply a horrid idea.  Just make it specify EXTENSION.

And VERSION too, finally.

So any extension 


 and guessing
 the CONTROL file name from the EXTENSION name only occurs when CONTROL
 has not been provided.

 Here, on the other hand, I'm wondering why have two variables at all.
 Is there any sane use-case for the control file to not be named the same
 as the extension?  It seems like that would accomplish little except to
 sow confusion.

   regards, tom lane

-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




Re: [HACKERS] Review: Fix snapshot taking inconsistencies

2010-10-20 Thread Alvaro Herrera
Excerpts from Alvaro Herrera's message of mié oct 20 16:33:12 -0300 2010:

 The only quarrel I have with this code shuffling is that
 pg_rewrite_query is being called from exec_parse_message.  Since it's
 now a static function, it would have to stop being static so that it can
 be called from both places (postgres.c and rewriteHandler.c)

Actually, I just noticed that the remainder patch uses pg_plan_query,
which is also in postgres.c.  This function along with its sibling
pg_plan_queries is also called from other internal places, like the
PREPARE code, SPI and the plan cache.

It strikes me that if we really want to restructure things to divide
client interaction from other query-processing routines, we should
create another file, say src/backend/tcop/queries.c; this would have
stuff like pg_plan_query, pg_plan_queries, pg_rewrite_query, and the
other things that the patch was evicting from postgres.c (plus, I
imagine, a bunch of other stuff that I may be missing).  In fact, if we
go down this route, there would be no point in removing
pg_parse_and_rewrite; we would just move it to this new module.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposed Windows-specific change: Enable crash dumps (like core files)

2010-10-20 Thread Bruce Momjian
Craig Ringer wrote:
 On 4/10/2010 8:06 PM, Andrew Dunstan wrote:
 
 
  On 10/04/2010 07:50 AM, Craig Ringer wrote:
 
  - If the crash dump handler is enabled by setting the GUC,
  all backends register the handler during startup or (if it
  proves practical) when the GUC is changed.
 
  - When the handler is triggered by the OS trapping an unhandled
  exception, it loads dbghelp.dll, writes the appropriate dump
  format to the hardcoded path, and terminates the process.
 
 
 
  What is the performance impact of doing that? Specifically, how does it
  affect backend startup time?
 
 Without testing I can't say for sure.
 
 My expectation based on how the handler works would be: near-zero, about 
 as expensive as registering a signal handler, plus the cost of reading 
 the GUC and doing one string compare to test the value. When disabled, 
 it's just the GUC test.
 
 Is there a better mechanism to use for features that're going to be 
 unused the great majority of the time? Perhaps something that does 
 require a server restart, but doesn't have any cost at all when disabled?

We definitely had trouble producing crash dumps caused by the 128 return
code problem, so I can see great value in this, if it can be done
simply.  I wonder if the 128-exit would have produced a crash file.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] default_statistics_target WAS: max_wal_senders must die

2010-10-20 Thread Josh Berkus

 Maybe what should be done about this is to have separate sizes for the
 MCV list and the histogram, where the MCV list is automatically sized
 during ANALYZE.

It's been suggested multiple times that we should base our sample size
on a % of the table, or at least offer that as an option.  I've pointed
out (with math, which Simon wrote a prototype for) that doing
block-based sampling instead of random-row sampling would allow us to
collect, say, 2% of a very large table without more I/O than we're doing
now.

Nathan Boley has also shown that we could get tremendously better
estimates without additional sampling if our statistics collector
recognized common patterns such as normal, linear and geometric
distributions.  Right now our whole stats system assumes a completely
random distribution.

So, I think we could easily be quite a bit smarter than just increasing
the MCV.  Although that might be a nice start.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Fix snapshot taking inconsistencies

2010-10-20 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 It strikes me that if we really want to restructure things to divide
 client interaction from other query-processing routines, we should
 create another file, say src/backend/tcop/queries.c; this would have
 stuff like pg_plan_query, pg_plan_queries, pg_rewrite_query, and the
 other things that the patch was evicting from postgres.c (plus, I
 imagine, a bunch of other stuff that I may be missing).  In fact, if we
 go down this route, there would be no point in removing
 pg_parse_and_rewrite; we would just move it to this new module.

Yeah, possibly that would be a good idea.

To my mind, the first thing that has to be resolved before messing
around in this area is whether or not we want the logging/statistics
behavior of these functions to apply when they are used in contexts
other than interactive queries.  Personally I would vote no, mainly
because I don't think that behavior is very sensible in nested
execution.  If that is the decision, then probably these functions
should stay where they are and as they are, and we just deprecate
outside use of them.  I'm not sure whether there's enough left after
deleting the logging/statistics behavior to justify making exported
versions, as opposed to just having the callers call the next-layer-down
functionality directly.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] max_wal_senders must die

2010-10-20 Thread Josh Berkus

 Quite.  Josh, have you got any evidence showing that the penalty is
 only 10%?  There are cases, such as COPY and ALTER TABLE, where
 you'd be looking at 2X or worse penalties, because of the existing
 optimizations that avoid writing WAL at all for operations where a
 single final fsync can serve the purpose.  I'm not sure what the
 penalty for typical workloads is, partly because I'm not sure what
 should be considered a typical workload for this purpose.

If we could agree on some workloads, I could run some benchmarks.  I'm
not sure what those would be though, given that COPY and ALTER TABLE
aren't generally included in most benchmarks.  I could see how
everything else is affected, though.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_rawdump

2010-10-20 Thread Robert Haas
On Wed, Oct 20, 2010 at 5:30 PM, Stephen R. van den Berg s...@cuci.nl wrote:
 Ideal would be: put the table-oid inside the header of each page
 (either in the official header, or in the special area).
 This way even lost blocks can be correlated to the same table.
 I'd still vote for the latest known table definition in the first
 page.  It's by no means perfect, but it will help 99% of all
 salvage attempts by an order of magnitude.

I don't think we should shrink the amount of usable space by 4 bytes
per block and break on-disk compatibility just to make it easier to
reconstruct corrupted tables.

I'm pretty dubious about the proposal to stuff an otherwise-useless
metapage in every heap, too.  If you have many small tables, you just
doubled your disk utilization - worse than that, maybe, if some of
them are empty.  If we needed a metapage anyway and had extra space to
play with, stuffing some useful forensic information in there might be
worthwhile, but I have a hard time thinking that forensics alone is a
sufficient justification for such a change.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] default_statistics_target WAS: max_wal_senders must die

2010-10-20 Thread Greg Stark
On Wed, Oct 20, 2010 at 3:15 PM, Josh Berkus j...@agliodbs.com wrote:

 Maybe what should be done about this is to have separate sizes for the
 MCV list and the histogram, where the MCV list is automatically sized
 during ANALYZE.

 It's been suggested multiple times that we should base our sample size
 on a % of the table, or at least offer that as an option.

Why? Afaict this has been suggested multiple times by people who don't
justify it in any way except with handwavy -- larger samples are
better. The sample size is picked based on what sample statistics
tells us we need to achieve a given 95th percentile confidence
interval for the bucket size given.
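
For anyone who hasn't looked at that code lately, the sample size falls
out of the statistics target roughly like this; the snippet below is a
paraphrase of std_typanalyze() in src/backend/commands/analyze.c (not
the exact source), where the 300-rows-per-target constant comes from
the Chaudhuri/Motwani/Narasayya histogram-construction bound and is
independent of the table size:

/* Illustration only: paraphrase of how ANALYZE sizes its sample today
 * (see std_typanalyze in analyze.c; not verbatim).  The point is that
 * the result depends on the statistics target, not on the number of
 * rows in the table. */
#include <stdio.h>

static int
sample_rows_for_target(int statistics_target)
{
	return 300 * statistics_target;
}

int
main(void)
{
	/* default_statistics_target = 100 => 30,000 sampled rows,
	 * whether the table has 1e5 or 1e9 rows */
	printf("%d\n", sample_rows_for_target(100));
	return 0;
}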

Robert pointed out one reason we would want smaller buckets for larger
tables but nobody has explained why we would want smaller confidence
intervals for the same size buckets. That amounts to querying larger
tables for the same percentage of the table but wanting more precise
estimates than you want for smaller tables.

  I've pointed
 out (with math, which Simon wrote a prototype for) that doing
 block-based sampling instead of random-row sampling would allow us to
 collect, say, 2% of a very large table without more I/O than we're doing
 now.

Can you explain when this would and wouldn't bias the sample for the
users so they can decide whether to use it or not?

 Nathan Boley has also shown that we could get tremendously better
 estimates without additional sampling if our statistics collector
 recognized common patterns such as normal, linear and geometric
 distributions.  Right now our whole stats system assumes a completely
 random distribution.

That's interesting, I hadn't seen that.

 So, I think we could easily be quite a bit smarter than just increasing
 the MCV.  Although that might be a nice start.

I think increasing the MCV is too simplistic since we don't really
have any basis for any particular value. I think what we need are some
statistics nerds to come along and say here's this nice tool from
which you can make the following predictions and understand how
increasing or decreasing the data set size affects the accuracy of the
predictions.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: extensible enums

2010-10-20 Thread Merlin Moncure
On Tue, Oct 19, 2010 at 9:15 PM, Andrew Dunstan and...@dunslane.net wrote:
 Efficiency has  always been one of the major reasons for using enums, so
 it's important that we make them extensible without badly affecting
 performance.

on that note is it worthwhile backpatching recent versions to allocate
enums with even numbered oids? That way people binary upgrading can
get the benefit of the optimization they should qualify for...

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] default_statistics_target WAS: max_wal_senders must die

2010-10-20 Thread Robert Haas
On Wed, Oct 20, 2010 at 6:38 PM, Greg Stark gsst...@mit.edu wrote:
 On Wed, Oct 20, 2010 at 3:15 PM, Josh Berkus j...@agliodbs.com wrote:
 Maybe what should be done about this is to have separate sizes for the
 MCV list and the histogram, where the MCV list is automatically sized
 during ANALYZE.

 It's been suggested multiple times that we should base our sample size
 on a % of the table, or at least offer that as an option.

 Why? Afaict this has been suggested multiple times by people who don't
 justify it in any way except with handwavy -- larger samples are
 better. The sample size is picked based on what sample statistics
 tells us we need to achieve a given 95th percentile confidence
 interval for the bucket size given.

 Robert pointed out one reason we would want smaller buckets for larger
 tables but nobody has explained why we would want smaller confidence
 intervals for the same size buckets. That amounts to querying larger
 tables for the same percentage of the table but wanting more precise
 estimates than you want for smaller tables.

Yes, I think a percentage of the table is going to break down either
at the high end or the low end.  Hand-waving (but based on
experience), for a 1000 row table a statistics target of 10 is
probably approximately right and 100 is too much and 1 is too little.
But for a 1,000,000 row table 10,000 is probably too much and even
1,000 is pushing it.  So using a constant percentage of table rows
doesn't feel right.  I had a thought today that it might make sense to
use an exponential curve, like min(2 * N^(1/3), 10).  I can't really
justify that mathematically, but that doesn't mean it won't work well
in practice.
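
Just to make the shape of that concrete, here is a throwaway sketch of
a sublinear heuristic; the cube-root growth mirrors the idea above, but
the constants and the clamps are made up for illustration, not proposed
values:

/* Toy sketch of a sublinear statistics-target heuristic (cube-root
 * growth with a floor and a cap).  All constants here are invented
 * for illustration only. */
#include <math.h>
#include <stdio.h>

static int
suggested_stats_target(double reltuples)
{
	int		target = (int) ceil(2.0 * cbrt(reltuples));

	if (target < 10)
		target = 10;			/* floor for tiny tables */
	if (target > 1000)
		target = 1000;			/* cap for very large tables */
	return target;
}

int
main(void)
{
	printf("1e3 rows -> %d\n", suggested_stats_target(1e3));	/* 20 */
	printf("1e6 rows -> %d\n", suggested_stats_target(1e6));	/* 200 */
	printf("1e9 rows -> %d\n", suggested_stats_target(1e9));	/* capped at 1000 */
	return 0;
}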

 So, I think we could easily be quite a bit smarter than just increasing
 the MCV.  Although that might be a nice start.

 I think increasing the MCV is too simplistic since we don't really
 have any basis for any particular value. I think what we need are some
 statistics nerds to come along and say here's this nice tool from
 which you can make the following predictions and understand how
 increasing or decreasing the data set size affects the accuracy of the
 predictions.

I'm not sure that's realistic, because everything depends on what
queries you're running, and you can get arbitrary answers by
postulating arbitrary queries.  However, this does not make me excited
about doing nothing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: extensible enums

2010-10-20 Thread Robert Haas
On Wed, Oct 20, 2010 at 6:54 PM, Merlin Moncure mmonc...@gmail.com wrote:
 On Tue, Oct 19, 2010 at 9:15 PM, Andrew Dunstan and...@dunslane.net wrote:
 Efficiency has  always been one of the major reasons for using enums, so
 it's important that we make them extensible without badly affecting
 performance.

 on that note is it worthwhile backpatching recent versions to allocate
 enums with even numbered oids? That way people binary upgrading can
 get the benefit of the optimization they should qualify for...

Uh, -1 from me.  This is not a bug fix, and it will only help people
who create new enums between the time they upgrade to the relevant
minor release and the time they upgrade to 9.1.  We are not into the
business of back-patching marginal performance enhancements.  If we
want to have a 9.0R2 release, or whatever, then so be it, but let's
not be modifying behavior in stable branches unless there's a *bug*.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] default_statistics_target WAS: max_wal_senders must die

2010-10-20 Thread Greg Stark
On Wed, Oct 20, 2010 at 3:54 PM, Robert Haas robertmh...@gmail.com wrote:
 Yes, I think a percentage of the table is going to break down either
 at the high end or the low end.  Hand-waving (but based on
 experience), for a 1000 row table a statistics target of 10 is
 probably approximately right and 100 is too much and 1 is too little.
 But for a 1,000,000 row table 10,000 is probably too much and even
 1,000 is pushing it.  So using a constant percentage of table rows
 doesn't feel right.  I had a thought today that it might make sense to
 use an exponential curve, like min(2 * N^(1/3), 10).  I can't really
 justify that mathematically, but that doesn't mean it won't work well
 in practice.

Well we can analyze it but as you said later, it all depends on what
queries you're running. If we want to aim for the same confidence
interval at all times, ie that the estimated frequency is accurate to
within +/- x% 95% of the time then:

If we're querying ranges a..b which represent a constant percentage of
the table we need a fixed number of buckets and a sample size that
varies very little with respect to the size of the table (effectively
constant).

If we're querying ranges a..b which are constant sized and therefore
represent a smaller percentage of the table as it grows then we need a
number of buckets that's proportional to the size of the table. The
sample size is proportional to the number of buckets (ie, it's a
constant sized sample per bucket).

If we're querying for a specific value which isn't one of the most
common values then I'm not clear how to characterize the accuracy or
precision of our current estimates let alone how they would vary if we
changed our sample sizes.

If we need to estimate ndistinct then we clearly need a sample of the
table the size of which is proportional to the size of the table. And
in practice to get accurate results it has to be a fairly high
percentage -- effectively meaning we should read the whole table.

 I think increasing the MCV is too simplistic since we don't really
 have any basis for any particular value. I think what we need are some
 statistics nerds to come along and say here's this nice tool from
 which you can make the following predictions and understand how
 increasing or decreasing the data set size affects the accuracy of the
 predictions.

 I'm not sure that's realistic, because everything depends on what
 queries you're running, and you can get arbitrary answers by
 postulating arbitrary queries.  However, this does not make me excited
 about doing nothing.

Well our planner only needs to answer specific questions. We just
need stats capable of answering "how many occurrences of x are there"
and "how many values are in the range x..y" for the normal estimation
functions. We have the latter, but if there's a stat we're missing for
calculating the former more robustly, that would be great. We also


-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-10-20 Thread David E. Wheeler
On Oct 20, 2010, at 3:12 PM, Dimitri Fontaine wrote:

 So, the idea is that $(EXTENSION) is a list of extensions you're
 providing from the Makefile (most often, a list of one extension, but
 contrib/spi is an exception here). Each extension in the list must have
 a corresponding $EXTENSION.control file.
 
 This control file contains at minimum a single line for the name of the
 extension, but it's better already with a comment for users. I've been
 filling them for our extensions, pasting from the documentation:

Might I suggest instead a META.json file like PGXN requires? Here's a simple 
example:

{
   "name": "pair",
   "abstract": "A key/value pair data type",
   "version": "0.1.0",
   "maintainer": "David E. Wheeler da...@justatheory.com",
   "license": "postgresql"
}

They can have a lot more information, too. Here's the one I actually shipped 
with pair:

  http://github.com/theory/kv-pair/blob/master/META.json

The meta spec is here:

  http://github.com/theory/pgxn/wiki/PGXN-Meta-Spec

Anyway, the point is that it might be useful for us to sync on this format. I 
went with JSON for a few reasons:

* CPAN is switching to it (from YAML)
* It's extremely widespread
* It's useful for ad-hoc REST-style requests
* The format will likely be in 9.1.

Thoughts?

BTW, really excited that you're finally getting EXTENSION done, Dim. This is 
going to be *great* for PostgreSQL developers. I'll have to work it into my 
talk at West.

  
https://www.postgresqlconference.org/content/building-and-distributing-postgresql-extensions-without-learning-c

Best,

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Issues with Quorum Commit

2010-10-20 Thread Bruce Momjian
Tom Lane wrote:
 Greg Smith g...@2ndquadrant.com writes:
  I don't see this as needing any implementation any more complicated than 
  the usual way such timeouts are handled.  Note how long you've been 
  trying to reach the standby.  Default to -1 for forever.  And if you hit 
  the timeout, mark the standby as degraded and force them to do a proper 
  resync when they disconnect.  Once that's done, then they can re-enter 
  sync rep mode again, via the same process a new node would have done so.
 
 Well, actually, that's *considerably* more complicated than just a
 timeout.  How are you going to mark the standby as degraded?  The
 standby can't keep that information, because it's not even connected
 when the master makes the decision.  ISTM that this requires
 
 1. a unique identifier for each standby (not just role names that
 multiple standbys might share);
 
 2. state on the master associated with each possible standby -- not just
 the ones currently connected.
 
 Both of those are perhaps possible, but the sense I have of the
 discussion is that people want to avoid them.
 
 Actually, #2 seems rather difficult even if you want it.  Presumably
 you'd like to keep that state in reliable storage, so it survives master
 crashes.  But how you gonna commit a change to that state, if you just
 lost every standby (suppose master's ethernet cable got unplugged)?
 Looks to me like it has to be reliable non-replicated storage.  Leaving
 aside the question of how reliable it can really be if not replicated,
 it's still the case that we have noplace to put such information given
 the WAL-is-across-the-whole-cluster design.

I assumed we would have a parameter called sync_rep_failure that would
take a command and the command would be called when communication to the
slave was lost.  If you restart, it tries again and might call the
function again.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] default_statistics_target WAS: max_wal_senders must die

2010-10-20 Thread Josh Berkus

 Why? Afaict this has been suggested multiple times by people who don't
 justify it in any way except with handwavy -- larger samples are
 better. The sample size is picked based on what sample statistics
 tells us we need to achieve a given 95th percentile confidence
 interval for the bucket size given.

I also just realized that I confused myself ... we don't really want
more MCVs.  What we want is more *samples* to derive a small number of
MCVs.  Right now # of samples and number of MCVs are inextricably bound,
and they shouldn't be.  On larger tables, you're correct that we don't
necessarily want more MCVs, we just need more samples to figure out
those MCVs accurately.

 Can you explain when this would and wouldn't bias the sample for the
 users so they can decide whether to use it or not?

Sure.  There's some good math in various ACM papers for this.  The
basics are that block-based sampling should be accompanied by an
increased sample size, or you are lowering your confidence level.  But
since block-based sampling allows you to increase your sample size
without increasing I/O or RAM usage, you *can* take a larger sample ...
a *much* larger sample if you have small rows.

The algorithms for deriving stats from a block-based sample are a bit
more complex, because the code needs to determine the level of physical
correlation in the blocks sampled and skew the stats based on that.  So
there would be an increase in CPU time.  As a result, we'd probably give
some advice like random sampling for small tables, block-based for
large ones.

 I think increasing the MCV is too simplistic since we don't really
 have any basis for any particular value. I think what we need are some
 statistics nerds to come along and say here's this nice tool from
 which you can make the following predictions and understand how
 increasing or decreasing the data set size affects the accuracy of the
 predictions.

Agreed.

Nathan?

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] default_statistics_target WAS: max_wal_senders must die

2010-10-20 Thread Greg Stark
On Wed, Oct 20, 2010 at 6:03 PM, Josh Berkus j...@agliodbs.com wrote:
 I also just realized that I confused myself ... we don't really want
 more MCVs.  What we want is more *samples* to derive a small number of
 MCVs.  Right now # of samples and number of MCVs are inextricably bound,
 and they shouldn't be.  On larger tables, you're correct that we don't
 necessarily want more MCVs, we just need more samples to figure out
 those MCVs accurately.

I don't see why the MCVs would need a particularly large sample size
to calculate accurately. Have you done any tests on the accuracy of
the MCV list?

Robert explained why having more MCVs might be useful because we use
the frequency of the least common MCV as an upper bound on the
frequency of any value in the MCV. That seems logical but it's all
about the number of MCV entries not the accuracy of them. And mostly
what it tells me is that we need a robust statistical method and the
data structures it requires for estimating the frequency of a single
value.

 Binding the length of the MCV list to the size of the histogram is
arbitrary but so would any other value and I haven't seen anyone
propose any rationale for any particular value. The only rationale I
can see is that we probably want to to take roughly the same amount of
space as the existing stats -- and that means we probably want it to
be roughly the same size.




-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] lazy snapshots?

2010-10-20 Thread Robert Haas
I had the following idea for an optimization.  Feel free to tell me I'm nuts.

Would it be possible to postpone the operation of taking a snapshot
until we encounter an in-doubt tuple - that is, a tuple whose XMIN or
XMAX is committed but not all-visible?  It seems to me that there are
many transactions that probably never look at any recently-modified
data, and that the overhead (and contention) of scanning the ProcArray
could be avoided for such transactions.  At the time when we currently
take a snapshot, we could instead record an estimate of the oldest XID
still running; I'll call this value the threshold XID.  Ideally, this
would be something we could read from shared memory in O(1) time.
Subsequently, when we examine XMIN or XMAX, we may find that it's
aborted (in which case we don't need a snapshot to decide what to do)
or that the XID we're examining precedes the threshold XID (in which
case we don't need a snapshot to decide what to do) or that the XID
we're examining is our own (in which case we again don't need a
snapshot to decide what to do).  If none of those conditions hold, we
take a snapshot.  (Possibly, we could try rereading the threshold XID
from shared memory, because it might have advanced far enough to get
us out of the woods.)
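
To restate that decision sequence in code form, here is a purely
hypothetical sketch; none of these names exist in the tree, and a real
implementation would compare XIDs with TransactionIdPrecedes() to cope
with wraparound rather than a plain "<":

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;			/* stand-in for the real typedef */

/* hypothetical state this scheme would keep instead of a snapshot */
extern TransactionId threshold_xid;		/* oldest running XID, noted up front */
extern TransactionId my_xid;			/* our own transaction's XID */

extern bool xid_did_abort(TransactionId xid);	/* stand-in for a clog lookup */

/*
 * Returns true if the visibility decision for this XMIN/XMAX can be made
 * without materializing a snapshot, per the three cases above; false means
 * the backend must finally take (or re-check for) a real snapshot.
 */
static bool
xid_decidable_without_snapshot(TransactionId xid)
{
	if (xid_did_abort(xid))
		return true;			/* aborted: no snapshot needed */
	if (xid < threshold_xid)	/* wraparound ignored in this sketch */
		return true;			/* precedes the threshold XID */
	if (xid == my_xid)
		return true;			/* our own work */
	return false;				/* in-doubt tuple: need a real snapshot */
}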

It's necessary to convince ourselves not only that this has some
performance benefit but that it's actually correct.  It's easy to see
that, if we never take a snapshot, all the tuple visibility decisions
we make will be exactly identical to the ones that we would have made
with a snapshot; the choice of snapshot in that case is arbitrary.
But if we do eventually take a snapshot, we'll likely make different
tuple visibility decisions than we would have made had we taken the
snapshot earlier. However, the decisions that we make prior to taking
the snapshot will be consistent with the snapshot, and we will
certainly see the effects of all transactions that committed before we
started.  We may also see the effects of some transactions that commit
after we started, but that is OK: it is just as if our whole
transaction had been started slightly later and then executed more
quickly thereafter.  It would be bad if we saw the effect of
transaction A but not transaction B where transaction B committed
after transaction A, but the way snapshots are taken prevents that
regardless of exactly when we do it.

VACUUM can't remove any tuples with committed XMINs unless their XMAX
precedes our threshold XID, but I think that's not any worse under
this proposal than it is anyway.  If we took a full snapshot instead
of just writing down a threshold XID, we'd have the same problem.

OK, that's it.  Comments?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] default_statistics_target WAS: max_wal_senders must die

2010-10-20 Thread Josh Berkus

 I don't see why the MCVs would need a particularly large sample size
 to calculate accurately. Have you done any tests on the accuracy of
 the MCV list?

Yes, although I don't have them at my fingertips.  In sum, though, you
can't take 10,000 samples from a 1b row table and expect to get a
remotely accurate MCV list.

A while back I did a fair bit of reading on ndistinct and large tables
from the academic literature.  The consensus of many papers was that it
took a sample of at least 3% (or 5% for block-based) of the table in
order to have 95% confidence that the ndistinct estimate is within a
factor of 3.  I can't imagine that
MCV is easier than this.

 And mostly
 what it tells me is that we need a robust statistical method and the
 data structures it requires for estimating the frequency of a single
 value.

Agreed.

  Binding the length of the MCV list to the size of the histogram is
 arbitrary but so would any other value and I haven't seen anyone
 propose any rationale for any particular value.

histogram size != sample size.  It is in our code, but that's a bug and
not a feature.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] default_statistics_target WAS: max_wal_senders must die

2010-10-20 Thread Robert Haas
On Wed, Oct 20, 2010 at 7:13 PM, Greg Stark gsst...@mit.edu wrote:
 On Wed, Oct 20, 2010 at 3:54 PM, Robert Haas robertmh...@gmail.com wrote:
 Yes, I think a percentage of the table is going to break down either
 at the high end or the low end.  Hand-waving (but based on
 experience), for a 1000 row table a statistics target of 10 is
 probably approximately right and 100 is too much and 1 is too little.
 But for a 1,000,000 row table 10,000 is probably too much and even
 1,000 is pushing it.  So using a constant percentage of table rows
 doesn't feel right.  I had a thought today that it might make sense to
 use an exponential curve, like min(2 * N^(1/3), 10).  I can't really
 justify that mathematically, but that doesn't mean it won't work well
 in practice.

 Well we can analyze it but as you said later, it all depends on what
 queries you're running. If we want to aim for the same confidence
 interval at all times, ie that the estimated frequency is accurate to
 within +/- x% 95% of the time then:

 If we're querying ranges a..b which represent a constant percentage of
 the table we need a fixed number of buckets and a sample size that
 varies very little with respect to the size of the table (effectively
 constant).

 If we're querying ranges a..b which are constant sized and therefore
 represent a smaller percentage of the table as it grows then we need a
 number of buckets that's proportional to the size of the table. The
 sample size is proportional to the number of buckets (ie, it's a
 constant sized sample per bucket).

 If we're querying for a specific value which isn't one of the most
 common values then I'm not clear how to characterize the accuracy or
 precision of our current estimates let alone how they would vary if we
 changed our sample sizes.

I think that sums it up pretty well.  There's no one right formula.  I
think this problem needs an empirical approach rather than a
statistical analysis.  We know that it's impractical for the stats
target to be linear in the table size.  We also know that constant
values are excessive for small tables and sometimes inadequate for
large ones.  Therefore, we should pick something that grows, but
sublinearly.  Discuss.  :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] default_statistics_target WAS: max_wal_senders must die

2010-10-20 Thread Nathan Boley
 Robert explained why having more MCVs might be useful because we use
 the frequency of the least common MCV as an upper bound on the
 frequency of any value in the MCV.

Where is that being used? The only non-MCV frequency estimate that I
recall seeing is ( nrows - n_ndistinct_rows  )/ndistinct. Obviously
changing the number of mcv's affects this by lowering
n_ndistinct_rows, but it's always pretty coarse estimate.

  Binding the length of the MCV list to the size of the histogram is
 arbitrary but so would any other value

Wouldn't the best approach be to stop adding MCV's/histogram buckets
when adding new ones doesn't decrease your prediction error
'substantially'?

One very hacky threshold heuristic is to stop adding MCV's when a
simple equality select ( SELECT col FROM table WHERE col = VALUE )
would switch the plan from an index to a sequential scan ( or vice
versa, although with the current code this would never happen ). i.e.,
if the non-MCV frequency estimate is 0.1% ( producing an index scan ),
but adding the MCV gives us an estimate of 5% ( probably producing a seq
scan ) then add that value as an MCV. More sophisticated variations
might also consider plan changes to very suboptimal joins; even more
sophisticated would be to stop when the MAX( curr - optimal plan /
optimal plan ) was below some threshold, say 20%, over a bunch of
recently executed queries.

A similar approach would work for histogram bins, except the queries
of interest are inequality rather than equality selections.
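
If it helps to see the shape of that stopping rule, here is a toy
sketch.  It abstracts the "would the plan flip" test into a simple
ratio threshold; every name and constant in it is invented, and nothing
like this exists in ANALYZE today:

#include <stdbool.h>

/*
 * Keep promoting a value into the MCV list only while its measured
 * frequency differs from the residual (non-MCV) estimate by enough to
 * plausibly change a plan.  plan_flip_ratio is an invented knob;
 * residual_freq is assumed to be > 0.
 */
static bool
worth_adding_mcv(double mcv_freq, double residual_freq, double plan_flip_ratio)
{
	double		ratio = mcv_freq / residual_freq;

	return ratio > plan_flip_ratio || ratio < 1.0 / plan_flip_ratio;
}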

-Nathan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] max_wal_senders must die

2010-10-20 Thread Robert Haas
On Wed, Oct 20, 2010 at 6:17 PM, Josh Berkus j...@agliodbs.com wrote:
 Quite.  Josh, have you got any evidence showing that the penalty is
 only 10%?  There are cases, such as COPY and ALTER TABLE, where
 you'd be looking at 2X or worse penalties, because of the existing
 optimizations that avoid writing WAL at all for operations where a
 single final fsync can serve the purpose.  I'm not sure what the
 penalty for typical workloads is, partly because I'm not sure what
 should be considered a typical workload for this purpose.

 If we could agree on some workloads, I could run some benchmarks.  I'm
 not sure what those would be though, given that COPY and ALTER TABLE
 aren't generally included in most benchmarks.  I could see how
 everything else is affected, though.

I think this whole thing is a complete non-starter.  Are we seriously
talking about shipping a configuration that will slow down COPY by 2X
or more, just so that someone who wants replication can do it by
changing one fewer parameter?  I find it impossible to believe that's
a good decision, and IMHO we should be focusing on how to make the
parameters PGC_SIGHUP rather than PGC_POSTMASTER, which would give us
most of the same benefits without throwing away hard-won performance.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-10-20 Thread Itagaki Takahiro
On Thu, Oct 21, 2010 at 7:12 AM, Dimitri Fontaine
dimi...@2ndquadrant.fr wrote:
 This control file contains at minimum a single line for the name of the
 extension, but it's better already with a comment for users. I've been
 filling them for our extensions, pasting from the documentation:

         name        | version  |
 +--+
  fuzzystrmatch      | 9.1devel |
  hstore             | 9.1      |

Why does only hstore have version '9.1'? All other modules have '9.1devel'.

 If you provide a $(VERSION) variable, then a line in the control file is
 automatically added at make install (version = '$(VERSION)'), in order
 to make life easier for extension authors.

In v7, a line of version = '...' is added at make install, and removed
at make clean. Also, if we run make install multiple times, version
lines are added repeatedly. I don't think that is a good idea; we should
not modify source files stored in the git repo when we build them.

How about having *.control.in and replace magic keywords in them at make?
make install won't modify files at all, and make clean just removes
*.control. It is the way we're using for *.sql.in and MODULE_PATHNAME.

 Some extensions are missing here because they fail to compile on my
 workstation where all the libs aren't installed --- ossp, xml2, etc

I found xml2/pgxml.control should have 'pgxml' for the name.

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] default_statistics_target WAS: max_wal_senders must die

2010-10-20 Thread Robert Haas
On Wed, Oct 20, 2010 at 9:53 PM, Nathan Boley npbo...@gmail.com wrote:
 Robert explained why having more MCVs might be useful because we use
 the frequency of the least common MCV as an upper bound on the
 frequency of any value in the MCV.

 Where is that being used?

var_eq_const

 The only non-MCV frequency estimate that I
 recall seeing is ( nrows - n_ndistinct_rows  )/ndistinct. Obviously
 changing the number of mcv's affects this by lowering
 n_ndistinct_rows, but it's always pretty coarse estimate.

That one's used, too, but the other is used as an upper bound.
n_distinct tends to come out too small on large tables, so that
formula is prone to overestimation.  Actually, both formulas are prone
to overestimation.
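
For anyone who hasn't read that code path, the gist of the non-MCV
branch of var_eq_const() is roughly the following; this is a heavily
simplified paraphrase of selfuncs.c, not the exact source (the real
code also clamps the result and handles the no-stats cases):

/*
 * Simplified paraphrase of the non-MCV estimate in var_eq_const().
 * The least common MCV frequency acts as an upper bound on the
 * estimate, which is the behavior being discussed above.
 */
static double
non_mcv_selectivity(double sumcommon,		/* total frequency of all MCVs */
					double nullfrac,
					double ndistinct,
					int nmcv,
					double least_common_mcv_freq)
{
	double		selec = 1.0 - sumcommon - nullfrac;
	double		otherdistinct = ndistinct - nmcv;

	/* spread the remaining probability over the remaining distinct values */
	if (otherdistinct > 1.0)
		selec /= otherdistinct;

	/* cross-check: no non-MCV value should beat the least common MCV */
	if (nmcv > 0 && selec > least_common_mcv_freq)
		selec = least_common_mcv_freq;

	return selec;
}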

  Binding the length of the MCV list to the size of the histogram is
 arbitrary but so would any other value

 Wouldn't the best approach be to stop adding MCV's/histogram buckets
 when adding new ones doesn't decrease your prediction error
 'substantially'?

 One very hacky threshold heuristic is to stop adding MCV's when a
 simple equality select (  SELECT col FROM table WHERE col == VALUE )
 would switch the plan from an index to a sequential scan ( or vice
 versa, although with the current code this would never happen ). ie,
 if the non_mcv frequency estimate is 0.1% ( producing an index scan ),

When this happens depends on the values of a whole boat-load of GUCs...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] lazy snapshots?

2010-10-20 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 It's necessary to convince ourselves not only that this has some
 performance benefit but that it's actually correct.  It's easy to see
 that, if we never take a snapshot, all the tuple visibility decisions
 we make will be exactly identical to the ones that we would have made
 with a snapshot; the choice of snapshot in that case is arbitrary.
 But if we do eventually take a snapshot, we'll likely make different
 tuple visibility decisions than we would have made had we taken the
 snapshot earlier. However, the decisions that we make prior to taking
 the snapshot will be consistent with the snapshot, and we will
 certainly see the effects of all transactions that committed before we
 started.  We may also see the effects of some transactions that commit
 after we started, but that is OK: it is just as if our whole
 transaction had been started slightly later and then executed more
 quickly thereafter.

I don't think this is going to be acceptable at all.  You're assuming
that clients have no independent means of determining what order
transactions execute in, which isn't the case.  It would be quite
possible, for example, for a query submitted to one backend to see the
effects of a transaction that was submitted to another backend long
after the first query started.  If the two clients involved interact
at all, they're not going to be happy.  Even if they just compare
transaction timestamps, they're not going to be happy.

I'm less than convinced by the hypothesis that most transactions would
avoid taking snapshots in this regime, anyway.  It would only hold up
if there's little locality of reference in terms of which tuples are
getting examined/modified by concurrent transactions, and that's a
theory that seems much more likely to be wrong than right.

I wonder whether we could do something involving WAL properties --- the
current tuple visibility logic was designed before WAL existed, so it's
not exploiting that resource at all.  I'm imagining that the kernel of a
snapshot is just a WAL position, ie the end of WAL as of the time you
take the snapshot (easy to get in O(1) time).  Visibility tests then
reduce to "did this transaction commit with a WAL record located before
the specified position?".  You'd need some index data structure that made
it reasonably cheap to find out the commit locations of recently
committed transactions, where "recent" means back to recentGlobalXmin.
That seems possibly do-able, though I don't have a concrete design in
mind.
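
A purely hypothetical sketch of how that might look: the snapshot
degenerates to an LSN, and the commit-LSN lookup (xid_commit_lsn below)
is exactly the index structure that still needs designing, so every
name here is invented:

#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;		/* stand-in for the real WAL pointer type */
typedef uint32_t TransactionId;

extern XLogRecPtr current_end_of_wal(void);	/* assumed O(1) to read */

/* the undesigned part: where, if anywhere, was this XID's commit record
 * written?  Returns false if the XID has not committed. */
extern bool xid_commit_lsn(TransactionId xid, XLogRecPtr *commit_lsn);

typedef struct LsnSnapshot
{
	XLogRecPtr	taken_at;		/* end of WAL at snapshot time */
} LsnSnapshot;

static LsnSnapshot
take_lsn_snapshot(void)
{
	LsnSnapshot snap = { current_end_of_wal() };

	return snap;
}

/* visibility test: did xid commit before the snapshot was taken? */
static bool
xid_visible_in_snapshot(const LsnSnapshot *snap, TransactionId xid)
{
	XLogRecPtr	commit_lsn;

	if (!xid_commit_lsn(xid, &commit_lsn))
		return false;			/* not committed (or aborted) */
	return commit_lsn < snap->taken_at;
}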

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

