Re: [HACKERS] Domains versus arrays versus typmods

2010-10-20 Thread Richard Huxton

On 20/10/10 01:47, Robert Haas wrote:

On Tue, Oct 19, 2010 at 6:14 PM, Tom Lane  wrote:

Comments?


It might be reasonable to back-patch whatever we decide on into 9.0,
because it is so new, but I would be reluctant to go back further
unless we have some evidence that it's bothering people.  It seems to
me that this can could have a lot of worms in it, and I fear that
there could be several rounds of fixes, which I would rather not
inflict on users of supposedly-stable branches.


The work-around I applied when I stumbled across this was just to apply 
an explicit cast before my function's RETURN. That neatly solves my 
particular problem (which I at first thought was a formatting issue 
somewhere in my app).
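
In case it helps anyone else who trips over this, the shape of the fix was
roughly as follows (a sketch only -- the domain and function here are made
up, not my real schema):

  CREATE DOMAIN pct_list AS numeric(5,2)[];

  CREATE FUNCTION get_pcts() RETURNS pct_list AS $$
  DECLARE
      result numeric[];
  BEGIN
      result := ARRAY[10.123, 20.456];  -- more precision than the typmod allows
      RETURN result::pct_list;          -- explicit cast applies numeric(5,2) to each element
  END;
  $$ LANGUAGE plpgsql;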


The real danger with this is the opportunity to end up with occasional 
bad data in tables, quite possibly unnoticed. If I'd come across this in 
an existing system rather than a new app I'm pretty sure it would have 
confused me for a lot longer than it did.

--
  Richard Huxton
  Archonet Ltd



Re: [HACKERS] UNION ALL has higher cost than inheritance

2010-10-20 Thread Itagaki Takahiro
On Thu, Oct 21, 2010 at 2:18 PM, Tom Lane  wrote:
> The plan for UNION initially involves a couple of SubqueryScan nodes,
> which impose an extra cost of cpu_tuple_cost per tuple.  Those later
> get optimized away, but we don't try to readjust the cost estimates
> for that.

Thanks. That also explains another question of mine: why Merge Append cannot
be used for UNION ALL plans. Inheritance is better than UNION ALL in
many more cases thanks to Merge Append.

=# EXPLAIN SELECT * FROM parent ORDER BY i LIMIT 10;
                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 Limit  (cost=1.02..1.58 rows=10 width=4)
   ->  Result  (cost=1.02..56.79 rows=1001 width=4)
         ->  Merge Append  (cost=1.02..56.79 rows=1001 width=4)
               Sort Key: public.parent.i
               ->  Sort  (cost=1.01..1.01 rows=1 width=4)
                     Sort Key: public.parent.i
                     ->  Seq Scan on parent  (cost=0.00..1.00 rows=1 width=4)
               ->  Index Scan using child_i_idx on child parent  (cost=0.00..43.25 rows=1000 width=4)
(8 rows)

=# EXPLAIN (SELECT * FROM ONLY parent ORDER BY i) UNION ALL (SELECT *
FROM child ORDER BY i) ORDER BY i LIMIT 10;
                                      QUERY PLAN
----------------------------------------------------------------------------------------
 Limit  (cost=75.91..75.93 rows=10 width=4)
   ->  Sort  (cost=75.91..78.41 rows=1001 width=4)
         Sort Key: parent.i
         ->  Append  (cost=1.01..54.28 rows=1001 width=4)
               ->  Sort  (cost=1.01..1.01 rows=1 width=4)
                     Sort Key: parent.i
                     ->  Seq Scan on parent  (cost=0.00..1.00 rows=1 width=4)
               ->  Index Scan using child_i_idx on child  (cost=0.00..43.25 rows=1000 width=4)
(8 rows)

-- 
Itagaki Takahiro



Re: [HACKERS] Serializable snapshot isolation patch

2010-10-20 Thread Jeff Davis
On Sun, 2010-10-17 at 22:53 -0700, Jeff Davis wrote:
> 2. I think there's a GiST bug (illustrating with PERIOD type):
> 
>   create table foo(p period);
>   create index foo_idx on foo using gist (p);
>   insert into foo select period(
>   '2009-01-01'::timestamptz + g * '1 microsecond'::interval,
>   '2009-01-01'::timestamptz + (g+1) * '1 microsecond'::interval)
> from generate_series(1,200) g;
> 
> Session1:
>   begin isolation level serializable;
>   select * from foo where p && '[2009-01-01, 2009-01-01]'::period;
>   insert into foo values('[2009-01-01, 2009-01-01]'::period);
> 
> Session2:
>   begin isolation level serializable;
>   select * from foo where p && '[2009-01-01, 2009-01-01]'::period;
>   insert into foo values('[2009-01-01, 2009-01-01]'::period);
>   commit;
> 
> Session1:
>   commit;
> 
> In pg_locks (didn't paste here due to formatting), it looks like the
> SIRead locks are holding locks on different pages. Can you clarify your
> design for GiST and the interaction with page-level locks? It looks like
> you're making some assumption about which pages will be visited when
> searching for conflicting values which doesn't hold true. However, that
> seems odd, because even if the value is actually inserted in one
> transaction, the other doesn't seem to find the conflict. Perhaps the
> bug is simpler than that? Or perhaps I have some kind of odd bug in
> PERIOD's gist implementation?
> 
> Also, it appears to be non-deterministic, to a degree at least, so you
> may not observe the problem in the exact way that I do.
> 

I have more information on this failure. Everything in GiST actually
looks fine. I modified the example slightly:

 T1: begin isolation level serializable;
 T2: begin isolation level serializable;
 T1: select * from foo where p && '[2009-01-01, 2009-01-01]'::period;
 T2: select * from foo where p && '[2009-01-01, 2009-01-01]'::period;
 T2: commit;
 T1: commit;

The SELECTs only look at the root and the predicate doesn't match. So
each SELECT sets an SIReadLock on block 0 and exits the search. Looks
good so far.
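
(For reference, this is roughly the query I'm using to watch the locks -- a
sketch, assuming the patch continues to expose predicate locks in pg_locks
with mode 'SIReadLock':)

  SELECT locktype, relation::regclass, page, tuple, pid, mode
  FROM pg_locks
  WHERE mode = 'SIReadLock';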

T1 then inserts, and it has to modify page 0, so it does
FlagRWConflict(). That sets writer->inConflict = reader and
reader->outConflict = writer (where writer is T1 and reader is T2); and
T1->outConflict and T2->inConflict remain NULL.

Then T2 inserts, and I didn't catch that part in as much detail in gdb,
but it apparently has no effect on that state, so we still have
T1->inConflict = T2, T1->outConflict = NULL, T2->inConflict = NULL, and
T2->outConflict = T1.

That looks like a reasonable state to me, but I'm not sure exactly what
the design calls for. I am guessing that the real problem is in
PreCommit_CheckForSerializationFailure(), where there are 6 conditions
that must be met for an error to be thrown. T2 falls out right away at
condition 1. T1 falls out on condition 4. I don't really understand
condition 4 at all -- can you explain it? And can you explain conditions
5 and 6 too?

Regards,
Jeff Davis







Re: [HACKERS] UNION ALL has higher cost than inheritance

2010-10-20 Thread Tom Lane
Itagaki Takahiro  writes:
> I found an explicit UNION ALL has higher cost than an automatic expansion
> by inheritance (83 vs. 49 in the example below). Where does the difference
> come from?

The plan for UNION initially involves a couple of SubqueryScan nodes,
which impose an extra cost of cpu_tuple_cost per tuple.  Those later
get optimized away, but we don't try to readjust the cost estimates
for that.
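
(Indeed, with the default cpu_tuple_cost of 0.01 that accounts exactly for
the difference in the example -- a quick check:)

  SHOW cpu_tuple_cost;          -- 0.01 by default
  SELECT (2400 + 1000) * 0.01;  -- 34.00, i.e. the 83.00 - 49.00 gap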

regards, tom lane



Re: [HACKERS] Extensions, this time with a patch

2010-10-20 Thread David E. Wheeler
On Oct 20, 2010, at 9:58 PM, Alvaro Herrera wrote:

> What's wrong with sticking to Makefile syntax?  Are we intending to
> build a JSON parser in GNU make perchance?

That metadata isn't *for* make, is it?

D




[HACKERS] UNION ALL has higher cost than inheritance

2010-10-20 Thread Itagaki Takahiro
I found an explicit UNION ALL has higher cost than an automatic expansion
by inheritance (83 vs. 49 in the example below). Where does the difference
come from?  Since they have almost the same plan trees, shouldn't they have the same cost?

=# CREATE TABLE parent (i integer);
=# CREATE TABLE child () INHERITS (parent);
=# INSERT INTO child SELECT generate_series(1, 1000);
=# CREATE INDEX ON child (i);
=# ANALYZE;

=# EXPLAIN SELECT * FROM parent;
                                QUERY PLAN
--------------------------------------------------------------------------
 Result  (cost=0.00..49.00 rows=3400 width=4)
   ->  Append  (cost=0.00..49.00 rows=3400 width=4)
         ->  Seq Scan on parent  (cost=0.00..34.00 rows=2400 width=4)
         ->  Seq Scan on child parent  (cost=0.00..15.00 rows=1000 width=4)
(4 rows)

=# EXPLAIN SELECT * FROM ONLY parent UNION ALL SELECT * FROM child;
                            QUERY PLAN
------------------------------------------------------------------
 Append  (cost=0.00..83.00 rows=3400 width=4)
   ->  Seq Scan on parent  (cost=0.00..34.00 rows=2400 width=4)
   ->  Seq Scan on child  (cost=0.00..15.00 rows=1000 width=4)
(3 rows)

-- 
Itagaki Takahiro



Re: [HACKERS] Extensions, this time with a patch

2010-10-20 Thread Alvaro Herrera
Excerpts from Itagaki Takahiro's message of Thu Oct 21 00:01:59 -0300 2010:
> On Thu, Oct 21, 2010 at 8:14 AM, David E. Wheeler  
> wrote:
> > Might I suggest instead a META.json file like PGXN requires?
> 
> I think JSON is also reasonable, but one problem with using the JSON format is
> that we cannot apply the extension patch until the JSON patch has been applied ;-)

What's wrong with sticking to Makefile syntax?  Are we intending to
build a JSON parser in GNU make perchance?

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] pg_hba.conf host name wildcard support

2010-10-20 Thread Tom Lane
Peter Eisentraut  writes:
> So, as previously indicated, let's add some wildcard support to the
> pg_hba.conf host name feature.  After looking around a bit, two syntaxes
> appear to be on offer:

> 1. TCP Wrappers style, leading dot indicates suffix match.
> So .example.com matches anything.example.com.  Not sure how useful that
> would be, but it could be implemented in about 3 lines of code.

> 2. Full regular expressions.  I'd suggest the pg_ident.conf style, where
> a leading slash indicates a regex.  An example could be /^dbserver\d\.
> With some code refactoring, this would also only take a few extra lines
> of code.

I'd lean to #1 myself.  Regexes would be a perpetual foot-gun because
(a) dot is a metacharacter to a regex and (b) a non-anchored pattern
is default but would be insecure in most usages.

There is a SQL-ish solution to those two objections: use the LIKE or SIMILAR
TO pattern language rather than standard regexes.  But #1 would be far more
familiar to most admin types.

regards, tom lane



Re: [HACKERS] default_statistics_target WAS: max_wal_senders must die

2010-10-20 Thread Greg Stark
On Wed, Oct 20, 2010 at 6:41 PM, Josh Berkus  wrote:
> A while back I did a fair bit of reading on ndistinct and large tables
> from the academic literature.  The consensus of many papers was that it
> took a sample of at least 3% (or 5% for block-based) of the table in
> order to have 95% confidence in ndistinct of 3X.  I can't imagine that
> MCV is easier than this.

Interestingly, I also read up on this but found different and even
more pessimistic conclusions. Basically, unless you're willing to read
about 50% or more of the table you can't make useful estimates at all,
and even then the estimates are pretty unreliable. Which makes a lot
of sense, since a handful of entries can easily change ndistinct
completely.


> histogram size != sample size.  It is in our code, but that's a bug and
> not a feature.

For the histogram there's a solid statistical reason why the two are related.

For ndistinct I agree you would need to sample a proportion of the
table and from what I read you really want that proportion to be 100%.

For the MCV I'm not entirely clear yet what the right answer is. It's
possible you're right but then I don't see a good algorithm for
calculating mcv accurately for large sample sizes using a reasonable
amount of resources.


-- 
greg



[HACKERS] pg_hba.conf host name wildcard support

2010-10-20 Thread Peter Eisentraut
So, as previously indicated, let's add some wildcard support to the
pg_hba.conf host name feature.  After looking around a bit, two syntaxes
appear to be on offer:

1. TCP Wrappers style, leading dot indicates suffix match.
So .example.com matches anything.example.com.  Not sure how useful that
would be, but it could be implemented in about 3 lines of code.

2. Full regular expressions.  I'd suggest the pg_ident.conf style, where
a leading slash indicates a regex.  An example could be /^dbserver\d\.
With some code refactoring, this would also only take a few extra lines
of code.
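
For illustration, the two styles might look like this in pg_hba.conf
(hypothetical entries; the other fields are the usual ones):

    # 1. suffix match, TCP Wrappers style
    host    all    all    .example.com      md5
    # 2. regex, pg_ident.conf style
    host    all    all    /^dbserver\d\.    md5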

Comments, other ideas?





Re: [HACKERS] lazy snapshots?

2010-10-20 Thread Tom Lane
Robert Haas  writes:
> On Wed, Oct 20, 2010 at 10:07 PM, Tom Lane  wrote:
>> I'm less than convinced by the hypothesis that most transactions would
>> avoid taking snapshots in this regime, anyway.  It would only hold up
>> if there's little locality of reference in terms of which tuples are
>> getting examined/modified by concurrent transactions, and that's a
>> theory that seems much more likely to be wrong than right.

> There will certainly be workloads where most transactions acquire a
> snapshot, but just to take one obvious example, suppose we have a data
> warehouse where every night we bulk load the day's data, and then we
> run reporting queries all day.  Except during the overnight bulk
> loads, there's no concurrent write activity at all, and thus no need
> for snapshots.

Well, yeah, but in this scenario there's also no contention involved in
taking snapshots --- there are only readers of ProcArray and (IIRC) they
only need shared locks on the array.  If you want to make any meaningful
improvement in this area, you need something that solves the ProcArray
access contention caused by a heavy mixed read/write transaction load.

regards, tom lane



Re: [HACKERS] lazy snapshots?

2010-10-20 Thread Robert Haas
On Wed, Oct 20, 2010 at 10:07 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> It's necessary to convince ourselves not only that this has some
>> performance benefit but that it's actually correct.  It's easy to see
>> that, if we never take a snapshot, all the tuple visibility decisions
>> we make will be exactly identical to the ones that we would have made
>> with a snapshot; the choice of snapshot in that case is arbitrary.
>> But if we do eventually take a snapshot, we'll likely make different
>> tuple visibility decisions than we would have made had we taken the
>> snapshot earlier. However, the decisions that we make prior to taking
>> the snapshot will be consistent with the snapshot, and we will
>> certainly see the effects of all transactions that committed before we
>> started.  We may also see the effects of some transactions that commit
>> after we started, but that is OK: it is just as if our whole
>> transaction had been started slightly later and then executed more
>> quickly thereafter.
>
> I don't think this is going to be acceptable at all.  You're assuming
> that clients have no independent means of determining what order
> transactions execute in, which isn't the case.  It would be quite
> possible, for example, for a query submitted to one backend to see the
> effects of a transaction that was submitted to another backend long
> after the first query started.  If the two clients involved interact
> at all, they're not going to be happy.  Even if they just compare
> transaction timestamps, they're not going to be happy.

I'm not sure they're entitled to rely on any other behavior.  Couldn't
the exact same thing happen in a non-MVCC database based on SS2PL?

> I'm less than convinced by the hypothesis that most transactions would
> avoid taking snapshots in this regime, anyway.  It would only hold up
> if there's little locality of reference in terms of which tuples are
> getting examined/modified by concurrent transactions, and that's a
> theory that seems much more likely to be wrong than right.

There will certainly be workloads where most transactions acquire a
snapshot, but just to take one obvious example, suppose we have a data
warehouse where every night we bulk load the day's data, and then we
run reporting queries all day.  Except during the overnight bulk
loads, there's no concurrent write activity at all, and thus no need
for snapshots.  Or imagine a database where we store monitoring data.
There's a continuous flow of monitoring data from multiple sources;
and then people run reports.  The users running reports will need
snapshots, but the processes updating the monitoring data will
presumably be touching discrete sets of tuples.  They may be
INSERT-only, and even if they do updates, the process monitoring
resource A only needs to look at the rows for resource A, not the rows
for resource B.  If the tables are large enough that index scans are
used and the threshold XID is updated sufficiently frequently, you
might get away without snapshots.  This isn't quite so clear a win as
the first one but maybe it's worth thinking about.

One thing we could do is instrument the current code to track whether
any field of the snapshot other than snapshot->xmin is ever used, and
then run some benchmarks to see how often that happens.

> I wonder whether we could do something involving WAL properties --- the
> current tuple visibility logic was designed before WAL existed,

Wow.

> so it's
> not exploiting that resource at all.  I'm imagining that the kernel of a
> snapshot is just a WAL position, ie the end of WAL as of the time you
> take the snapshot (easy to get in O(1) time).  Visibility tests then
> reduce to "did this transaction commit with a WAL record located before
> the specified position?".  You'd need some index datastructure that made
> it reasonably cheap to find out the commit locations of recently
> committed transactions, where "recent" means "back to recentGlobalXmin".
> That seems possibly do-able, though I don't have a concrete design in
> mind.

Interesting.  O(1) snapshots would be great.  I need to think about
this more before commenting on it, though.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Extensions, this time with a patch

2010-10-20 Thread Itagaki Takahiro
On Thu, Oct 21, 2010 at 8:14 AM, David E. Wheeler  wrote:
> Might I suggest instead a META.json file like PGXN requires?

I think JSON is also reasonable, but one problem with using the JSON format is
that we cannot apply the extension patch until the JSON patch has been applied ;-)

BTW, does anyone need JSON-formatted configuration files for other purposes?
There might be some discussion of this in the "Standby registration" or "Configuring
synchronous replication" threads. Module control files are so simple that
they don't really require JSON features such as nested values. But
configuration files for replication might be more complex. If needed,
it would be reasonable to introduce a JSON reader.

-- 
Itagaki Takahiro



Re: [HACKERS] max_wal_senders must die

2010-10-20 Thread Greg Smith

Josh Berkus wrote:

If we could agree on some workloads, I could run some benchmarks.  I'm
not sure what those would be though, given that COPY and ALTER TABLE
aren't generally included in most benchmarks.


You can usefully and easily benchmark this by timing a simple pgbench 
initialization at a decently large scale.  The COPY used to populate the 
giant accounts table takes advantage of the WAL bypass fast path if 
available, and you can watch performance tank the minute one of the 
options that disables it is turned on.
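
A quick-and-dirty psql version of the same idea (just a sketch -- table and
row count are arbitrary; CREATE TABLE AS hits the same skip-WAL path as COPY
into a newly created table when wal_level = minimal):

  \timing on
  CREATE TABLE bypass_test AS
      SELECT g AS id, repeat('x', 80) AS filler
      FROM generate_series(1, 5000000) g;
  -- Time this once with wal_level = minimal and once with wal_level = archive
  -- (or any other setting that disables the bypass) and compare.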


--
Greg Smith, 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD
PostgreSQL Training, Services and Support  www.2ndQuadrant.us





[HACKERS] psql autocompletion for \z and \dg

2010-10-20 Thread Josh Kupershmidt
Hi all,

It looks like psql's tab completion for the \z and \dg commands is
missing. I couldn't see a reason for this, so the attached patch fixes
that.

Also, this patch proposes to change psql's "\?" help text to say that
\dg and \du are the same, since AFAICT they do exactly the same thing.

Josh


psql_tab_completion.patch
Description: Binary data



Re: [HACKERS] default_statistics_target WAS: max_wal_senders must die

2010-10-20 Thread Tom Lane
Josh Berkus  writes:
> A while back I did a fair bit of reading on ndistinct and large tables
> from the academic literature.  The consensus of many papers was that it
> took a sample of at least 3% (or 5% for block-based) of the table in
> order to have 95% confidence in ndistinct of 3X.  I can't imagine that
> MCV is easier than this.

You've got that entirely backwards.  ndistinct is hard because there
could be a whole lot of values with very small occurrence counts,
and sampling is not going to help you distinguish between things
that occur but once and things that occur only two or three times.
But whether the table has a lot of the first or a lot of the second
means a difference of 2X or 3X on ndistinct.

MCVs, on the other hand, are MCVs precisely because they occur a lot,
and so they are highly likely to show up in a sample.  You *can* get a
decent estimate of the first few MCVs from a sample, you just have to be
cautious about not believing something is an MCV if it's only a small
proportion of your sample.
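
(A quick way to eyeball that on a live table is to compare pg_stats against
the actual top frequencies -- a sketch, with placeholder table and column
names:)

  SELECT most_common_vals, most_common_freqs
  FROM pg_stats
  WHERE tablename = 'some_table' AND attname = 'some_col';

  SELECT some_col,
         count(*)::float8 / (SELECT count(*) FROM some_table) AS actual_freq
  FROM some_table
  GROUP BY some_col
  ORDER BY actual_freq DESC
  LIMIT 10;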

regards, tom lane



Re: [HACKERS] default_statistics_target WAS: max_wal_senders must die

2010-10-20 Thread Joshua D. Drake
On Wed, 2010-10-20 at 15:15 -0700, Josh Berkus wrote:
> >> Maybe what should be done about this is to have separate sizes for the
> >> MCV list and the histogram, where the MCV list is automatically sized
> >> during ANALYZE.
> 
> It's been suggested multiple times that we should base our sample size
> on a % of the table, or at least offer that as an option.  I've pointed
> out (with math, which Simon wrote a prototype for) that doing
> block-based sampling instead of random-row sampling would allow us to
> collect, say, 2% of a very large table without more I/O than we're doing
> now.
> 
> Nathan Boley has also shown that we could get tremendously better
> estimates without additional sampling if our statistics collector
> recognized common patterns such as normal, linear and geometric
> distributions.  Right now our whole stats system assumes a completely
> random distribution.
> 
> So, I think we could easily be quite a bit smarter than just increasing
> the MCV.  Although that might be a nice start.

References would be nice.

JD


> 
> -- 
>   -- Josh Berkus
>  PostgreSQL Experts Inc.
>  http://www.pgexperts.com
> 

-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt




Re: [HACKERS] default_statistics_target WAS: max_wal_senders must die

2010-10-20 Thread Nathan Boley
> That one's used, too, but the other is used as an upper bound.
> n_distinct tends to come out too small on large tables, so that
> formula is prone to overestimation.  Actually, both formulas are prone
> to overestimation.
>

Right - thanks.

> When this happens depends on the values of a whole boat-load of GUCs...

Well, then doesn't the 'proper' number of buckets depend on a whole
boat-load of GUCs...

-Nathan



Re: [HACKERS] lazy snapshots?

2010-10-20 Thread Tom Lane
Robert Haas  writes:
> It's necessary to convince ourselves not only that this has some
> performance benefit but that it's actually correct.  It's easy to see
> that, if we never take a snapshot, all the tuple visibility decisions
> we make will be exactly identical to the ones that we would have made
> with a snapshot; the choice of snapshot in that case is arbitrary.
> But if we do eventually take a snapshot, we'll likely make different
> tuple visibility decisions than we would have made had we taken the
> snapshot earlier. However, the decisions that we make prior to taking
> the snapshot will be consistent with the snapshot, and we will
> certainly see the effects of all transactions that committed before we
> started.  We may also see the effects of some transactions that commit
> after we started, but that is OK: it is just as if our whole
> transaction had been started slightly later and then executed more
> quickly thereafter.

I don't think this is going to be acceptable at all.  You're assuming
that clients have no independent means of determining what order
transactions execute in, which isn't the case.  It would be quite
possible, for example, for a query submitted to one backend to see the
effects of a transaction that was submitted to another backend long
after the first query started.  If the two clients involved interact
at all, they're not going to be happy.  Even if they just compare
transaction timestamps, they're not going to be happy.

I'm less than convinced by the hypothesis that most transactions would
avoid taking snapshots in this regime, anyway.  It would only hold up
if there's little locality of reference in terms of which tuples are
getting examined/modified by concurrent transactions, and that's a
theory that seems much more likely to be wrong than right.

I wonder whether we could do something involving WAL properties --- the
current tuple visibility logic was designed before WAL existed, so it's
not exploiting that resource at all.  I'm imagining that the kernel of a
snapshot is just a WAL position, ie the end of WAL as of the time you
take the snapshot (easy to get in O(1) time).  Visibility tests then
reduce to "did this transaction commit with a WAL record located before
the specified position?".  You'd need some index datastructure that made
it reasonably cheap to find out the commit locations of recently
committed transactions, where "recent" means "back to recentGlobalXmin".
That seems possibly do-able, though I don't have a concrete design in
mind.

regards, tom lane



Re: [HACKERS] default_statistics_target WAS: max_wal_senders must die

2010-10-20 Thread Robert Haas
On Wed, Oct 20, 2010 at 9:53 PM, Nathan Boley  wrote:
>> Robert explained why having more MCVs might be useful because we use
>> the frequency of the least common MCV as an upper bound on the
>> frequency of any value not in the MCV list.
>
> Where is that being used?

var_eq_const

> The only non-MCV frequency estimate that I
> recall seeing is ( nrows - n_ndistinct_rows  )/ndistinct. Obviously
> changing the number of mcv's affects this by lowering
> n_ndistinct_rows, but it's always a pretty coarse estimate.

That one's used, too, but the other is used as an upper bound.
n_distinct tends to come out too small on large tables, so that
formula is prone to overestimation.  Actually, both formulas are prone
to overestimation.

>>  Binding the length of the MCV list to the size of the histogram is
>> arbitrary but so would any other value
>
> Wouldn't the best approach be to stop adding MCV's/histogram buckets
> when adding new ones doesn't decrease your prediction error
> 'substantially'?
>
> One very hacky threshold heuristic is to stop adding MCV's when a
> simple equality select (  SELECT col FROM table WHERE col == VALUE )
> would switch the plan from an index to a sequential scan ( or vice
> versa, although with the current code this would never happen ). ie,
> if the non_mcv frequency estimate is 0.1% ( producing an index scan ),

When this happens depends on the values of a whole boat-load of GUCs...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Extensions, this time with a patch

2010-10-20 Thread Itagaki Takahiro
On Thu, Oct 21, 2010 at 7:12 AM, Dimitri Fontaine
 wrote:
> This control file contains at minimum a single line for the name of the
> extension, but it's better already with a comment for users. I've been
> filling them for our extensions, pasting from the documentation:
>
>         name        | version  |
> --------------------+----------+
>  fuzzystrmatch      | 9.1devel |
>  hstore             | 9.1      |

Why does only hstore have version '9.1'? All the other modules have '9.1devel'.

> If you provide a $(VERSION) variable, then a line in the control file is
> automatically added at make install (version = '$(VERSION)'), in order
> to make life easier for extension authors.

In v7, a "version = '...'" line is added at "make install" and removed
at "make clean". Also, if we run "make install" multiple times, version
lines are added repeatedly. I don't think these are good ideas; we should
not modify source files stored in the git repo when we build them.

How about having *.control.in files and replacing magic keywords in them at "make"?
Then "make install" won't modify files at all, and "make clean" just removes
*.control. That is the way we already handle *.sql.in and MODULE_PATHNAME.

> Some extensions are missing here because they fail to compile on my
> workstation where all the libs aren't installed --- ossp, xml2, etc

I found that xml2/pgxml.control should have 'pgxml' for the name.

-- 
Itagaki Takahiro



Re: [HACKERS] max_wal_senders must die

2010-10-20 Thread Robert Haas
On Wed, Oct 20, 2010 at 6:17 PM, Josh Berkus  wrote:
>> Quite.  Josh, have you got any evidence showing that the penalty is
>> only 10%?  There are cases, such as COPY and ALTER TABLE, where
>> you'd be looking at 2X or worse penalties, because of the existing
>> optimizations that avoid writing WAL at all for operations where a
>> single final fsync can serve the purpose.  I'm not sure what the
>> penalty for "typical" workloads is, partly because I'm not sure what
>> should be considered a "typical" workload for this purpose.
>
> If we could agree on some workloads, I could run some benchmarks.  I'm
> not sure what those would be though, given that COPY and ALTER TABLE
> aren't generally included in most benchmarks.  I could see how
> everything else is effected, though.

I think this whole thing is a complete non-starter.  Are we seriously
talking about shipping a configuration that will slow down COPY by 2X
or more, just so that someone who wants replication can do it by
changing one fewer parameter?  I find it impossible to believe that's
a good decision, and IMHO we should be focusing on how to make the
parameters PGC_SIGHUP rather than PGC_POSTMASTER, which would give us
most of the same benefits without throwing away hard-won performance.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] default_statistics_target WAS: max_wal_senders must die

2010-10-20 Thread Nathan Boley
> Robert explained why having more MCVs might be useful because we use
> the frequency of the least common MCV as an upper bound on the
> frequency of any value not in the MCV list.

Where is that being used? The only non-MCV frequency estimate that I
recall seeing is ( nrows - n_ndistinct_rows  )/ndistinct. Obviously
changing the number of mcv's affects this by lowering
n_ndistinct_rows, but it's always a pretty coarse estimate.

>  Binding the length of the MCV list to the size of the histogram is
> arbitrary but so would any other value

Wouldn't the best approach be to stop adding MCV's/histogram buckets
when adding new ones doesn't decrease your prediction error
'substantially'?

One very hacky threshold heuristic is to stop adding MCV's when a
simple equality select ( SELECT col FROM table WHERE col = VALUE )
would switch the plan from an index scan to a sequential scan ( or vice
versa, although with the current code this would never happen ). I.e.,
if the non-MCV frequency estimate is 0.1% ( producing an index scan ),
but adding the MCV gives us an estimate of 5% ( probably producing a seq
scan ), then add that value as an MCV. More sophisticated variations
might also consider plan changes to very suboptimal joins; even more
sophisticated would be to stop when MAX( (curr plan - optimal plan) /
optimal plan ) was below some threshold, say 20%, over a bunch of
recently executed queries.

A similar approach would work for histogram bins, except the queries
of interest are inequality rather than equality selections.

-Nathan



Re: [HACKERS] default_statistics_target WAS: max_wal_senders must die

2010-10-20 Thread Robert Haas
On Wed, Oct 20, 2010 at 7:13 PM, Greg Stark  wrote:
> On Wed, Oct 20, 2010 at 3:54 PM, Robert Haas  wrote:
>> Yes, I think a percentage of the table is going to break down either
>> at the high end or the low end.  Hand-waving (but based on
>> experience), for a 1000 row table a statistics target of 10 is
>> probably approximately right and 100 is too much and 1 is too little.
>> But for a 1,000,000 row table 10,000 is probably too much and even
>> 1,000 is pushing it.  So using a constant percentage of table rows
>> doesn't feel right.  I had a thought today that it might make sense to
>> use an exponential curve, like min(2 * N^(1/3), 10).  I can't really
>> justify that mathematically, but that doesn't mean it won't work well
>> in practice.
>
> Well we can analyze it but as you said later, it all depends on what
> queries you're running. If we want to aim for the same confidence
> interval at all times, ie that the estimated frequency is accurate to
> within +/- x% 95% of the time then:
>
> If we're querying ranges a..b which represent a constant percentage of
> the table we need a fixed number of buckets and a sample size that
> varies very little with respect to the size of the table (effectively
> constant).
>
> If we're querying ranges a..b which are constant sized and therefore
> represent a smaller percentage of the table as it grows then we need a
> number of buckets that's proportional to the size of the table. The
> sample size is proportional to the number of buckets (ie, it's a
> constant sized sample per bucket).
>
> If we're querying for a specific value which isn't one of the most
> common values then I'm not clear how to characterize the accuracy or
> precision of our current estimates let alone how they would vary if we
> changed our sample sizes.

I think that sums it up pretty well.  There's no one right formula.  I
think this problem needs an empirical approach rather than a
statistical analysis.  We know that it's impractical for the stats
target to be linear in the table size.  We also know that constant
values are excessive for small tables and sometimes inadequate for
large ones.  Therefore, we should pick something that grows, but
sublinearly.  Discuss.  :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] default_statistics_target WAS: max_wal_senders must die

2010-10-20 Thread Josh Berkus

> I don't see why the MCVs would need a particularly large sample size
> to calculate accurately. Have you done any tests on the accuracy of
> the MCV list?

Yes, although I don't have them at my fingertips.  In sum, though, you
can't take 10,000 samples from a 1b row table and expect to get a
remotely accurate MCV list.

A while back I did a fair bit of reading on ndistinct and large tables
from the academic literature.  The consensus of many papers was that it
took a sample of at least 3% (or 5% for block-based) of the table in
order to have 95% confidence in ndistinct of 3X.  I can't imagine that
MCV is easier than this.
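
(For anyone curious how far off the current estimator gets on their own data,
a sketch -- table and column names are placeholders:)

  SELECT n_distinct FROM pg_stats
  WHERE tablename = 'big_table' AND attname = 'some_col';
  -- negative n_distinct is a fraction of the row count

  SELECT count(DISTINCT some_col) FROM big_table;  -- slow, but exact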

> And mostly
> what it tells me is that we need a robust statistical method and the
> data structures it requires for estimating the frequency of a single
> value.

Agreed.

>  Binding the length of the MCV list to the size of the histogram is
> arbitrary but so would any other value and I haven't seen anyone
> propose any rationale for any particular value.

histogram size != sample size.  It is in our code, but that's a bug and
not a feature.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



[HACKERS] lazy snapshots?

2010-10-20 Thread Robert Haas
I had the following idea for an optimization.  Feel free to tell me I'm nuts.

Would it be possible to postpone the operation of taking a snapshot
until we encounter an in-doubt tuple - that is, a tuple whose XMIN or
XMAX is committed but not all-visible?  It seems to me that there are
many transactions that probably never look at any recently-modified
data, and that the overhead (and contention) of scanning the ProcArray
could be avoided for such transactions.  At the time when we currently
take a snapshot, we could instead record an estimate of the oldest XID
still running; I'll call this value the threshold XID.  Ideally, this
would be something we could read from shared memory in O(1) time.
Subsequently, when we examine XMIN or XMAX, we may find that it's
aborted (in which case we don't need a snapshot to decide what to do)
or that the XID we're examining precedes the threshold XID (in which
case we don't need a snapshot to decide what to do) or that the XID
we're examining is our own (in which case we again don't need a
snapshot to decide what to do).  If none of those conditions hold, we
take a snapshot.  (Possibly, we could try rereading the threshold XID
from shared memory, because it might have advanced far enough to get
us out of the woods.)

It's necessary to convince ourselves not only that this has some
performance benefit but that it's actually correct.  It's easy to see
that, if we never take a snapshot, all the tuple visibility decisions
we make will be exactly identical to the ones that we would have made
with a snapshot; the choice of snapshot in that case is arbitrary.
But if we do eventually take a snapshot, we'll likely make different
tuple visibility decisions than we would have made had we taken the
snapshot earlier. However, the decisions that we make prior to taking
the snapshot will be consistent with the snapshot, and we will
certainly see the effects of all transactions that committed before we
started.  We may also see the effects of some transactions that commit
after we started, but that is OK: it is just as if our whole
transaction had been started slightly later and then executed more
quickly thereafter.  It would be bad if we saw the effect of
transaction A but not transaction B where transaction B committed
after transaction A, but the way snapshots are taken prevents that
regardless of exactly when we do it.

VACUUM can't remove any tuples with committed XMINs unless their XMAX
precedes our threshold XID, but I think that's not any worse under
this proposal than it is anyway.  If we took a full snapshot instead
of just writing down a threshold XID, we'd have the same problem.

OK, that's it.  Comments?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] default_statistics_target WAS: max_wal_senders must die

2010-10-20 Thread Greg Stark
On Wed, Oct 20, 2010 at 6:03 PM, Josh Berkus  wrote:
> I also just realized that I confused myself ... we don't really want
> more MCVs.  What we want is more *samples* to derive a small number of
> MCVs.  Right now # of samples and number of MCVs is inexorably bound,
> and they shouldn't be.  On larger tables, you're correct that we don't
> necessarily want more MCVs, we just need more samples to figure out
> those MCVs accurately.

I don't see why the MCVs would need a particularly large sample size
to calculate accurately. Have you done any tests on the accuracy of
the MCV list?

Robert explained why having more MCVs might be useful because we use
the frequency of the least common MCV as an upper bound on the
frequency of any value not in the MCV list. That seems logical but it's all
about the number of MCV entries not the accuracy of them. And mostly
what it tells me is that we need a robust statistical method and the
data structures it requires for estimating the frequency of a single
value.

 Binding the length of the MCV list to the size of the histogram is
arbitrary but so would any other value and I haven't seen anyone
propose any rationale for any particular value. The only rationale I
can see is that we probably want it to take roughly the same amount of
space as the existing stats -- and that means we probably want it to
be roughly the same size.




-- 
greg



Re: [HACKERS] default_statistics_target WAS: max_wal_senders must die

2010-10-20 Thread Josh Berkus

> Why? Afaict this has been suggested multiple times by people who don't
> justify it in any way except with handwavy -- larger samples are
> better. The sample size is picked based on what sample statistics
> tells us we need to achieve a given 95th percentile confidence
> interval for the bucket size given.

I also just realized that I confused myself ... we don't really want
more MCVs.  What we want is more *samples* to derive a small number of
MCVs.  Right now # of samples and number of MCVs is inexorably bound,
and they shouldn't be.  On larger tables, you're correct that we don't
necessarily want more MCVs, we just need more samples to figure out
those MCVs accurately.

> Can you explain when this would and wouldn't bias the sample for the
> users so they can decide whether to use it or not?

Sure.  There's some good math in various ACM papers for this.  The
basics are that block-based sampling should be accompanied by an
increased sample size, or you are lowering your confidence level.  But
since block-based sampling allows you to increase your sample size
without increasing I/O or RAM usage, you *can* take a larger sample ...
a *much* larger sample if you have small rows.

The algorithms for deriving stats from a block-based sample are a bit
more complex, because the code needs to determine the level of physical
correlation in the blocks sampled and skew the stats based on that.  So
there would be an increase in CPU time.  As a result, we'd probably give
some advice like "random sampling for small tables, block-based for
large ones".

> I think increasing the MCV is too simplistic since we don't really
> have any basis for any particular value. I think what we need are some
> statistics nerds to come along and say here's this nice tool from
> which you can make the following predictions and understand how
> increasing or decreasing the data set size affects the accuracy of the
> predictions.

Agreed.

Nathan?

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] Issues with Quorum Commit

2010-10-20 Thread Bruce Momjian
Tom Lane wrote:
> Greg Smith  writes:
> > I don't see this as needing any implementation any more complicated than 
> > the usual way such timeouts are handled.  Note how long you've been 
> > trying to reach the standby.  Default to -1 for forever.  And if you hit 
> > the timeout, mark the standby as degraded and force them to do a proper 
> > resync when they disconnect.  Once that's done, then they can re-enter 
> > sync rep mode again, via the same process a new node would have done so.
> 
> Well, actually, that's *considerably* more complicated than just a
> timeout.  How are you going to "mark the standby as degraded"?  The
> standby can't keep that information, because it's not even connected
> when the master makes the decision.  ISTM that this requires
> 
> 1. a unique identifier for each standby (not just role names that
> multiple standbys might share);
> 
> 2. state on the master associated with each possible standby -- not just
> the ones currently connected.
> 
> Both of those are perhaps possible, but the sense I have of the
> discussion is that people want to avoid them.
> 
> Actually, #2 seems rather difficult even if you want it.  Presumably
> you'd like to keep that state in reliable storage, so it survives master
> crashes.  But how you gonna commit a change to that state, if you just
> lost every standby (suppose master's ethernet cable got unplugged)?
> Looks to me like it has to be reliable non-replicated storage.  Leaving
> aside the question of how reliable it can really be if not replicated,
> it's still the case that we have noplace to put such information given
> the WAL-is-across-the-whole-cluster design.

I assumed we would have a parameter called "sync_rep_failure" that would
take a command and the command would be called when communication to the
slave was lost.  If you restart, it tries again and might call the
function again.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Extensions, this time with a patch

2010-10-20 Thread David E. Wheeler
On Oct 20, 2010, at 3:12 PM, Dimitri Fontaine wrote:

> So, the idea is that $(EXTENSION) is a list of extensions you're
> providing from the Makefile (most often, a list of one extension, but
> contrib/spi is an exception here). Each extension in the list must have
> a corresponding $EXTENSION.control file.
> 
> This control file contains at minimum a single line for the name of the
> extension, but it's better already with a comment for users. I've been
> filling them for our extensions, pasting from the documentation:

Might I suggest instead a META.json file like PGXN requires? Here's a simple 
example:

{
   "name": "pair",
   "abstract": "A key/value pair data type",
   "version": "0.1.0",
   "maintainer": "David E. Wheeler ",
   "license": "postgresql",
}

They can have a lot more information, too. Here's the one I actually shipped
with pair:

  http://github.com/theory/kv-pair/blob/master/META.json

The meta spec is here:

  http://github.com/theory/pgxn/wiki/PGXN-Meta-Spec

Anyway, the point is that it might be useful for us to sync on this format. I 
went with JSON for a few reasons:

* CPAN is switching to it (from YAML)
* It's extremely widespread
* It's useful for ad-hoc REST-style requests
* The format will likely be in 9.1.

Thoughts?

BTW, really excited that you're finally getting EXTENSION done, Dim. This is 
going to be *great* for PostgreSQL developers. I'll have to work it into my 
talk at West.

  
https://www.postgresqlconference.org/content/building-and-distributing-postgresql-extensions-without-learning-c

Best,

David




Re: [HACKERS] default_statistics_target WAS: max_wal_senders must die

2010-10-20 Thread Greg Stark
On Wed, Oct 20, 2010 at 3:54 PM, Robert Haas  wrote:
> Yes, I think a percentage of the table is going to break down either
> at the high end or the low end.  Hand-waving (but based on
> experience), for a 1000 row table a statistics target of 10 is
> probably approximately right and 100 is too much and 1 is too little.
> But for a 1,000,000 row table 10,000 is probably too much and even
> 1,000 is pushing it.  So using a constant percentage of table rows
> doesn't feel right.  I had a thought today that it might make sense to
> use an exponential curve, like min(2 * N^(1/3), 10).  I can't really
> justify that mathematically, but that doesn't mean it won't work well
> in practice.

Well we can analyze it but as you said later, it all depends on what
queries you're running. If we want to aim for the same confidence
interval at all times, ie that the estimated frequency is accurate to
within +/- x% 95% of the time then:

If we're querying ranges a..b which represent a constant percentage of
the table we need a fixed number of buckets and a sample size that
varies very little with respect to the size of the table (effectively
constant).

If we're querying ranges a..b which are constant sized and therefore
represent a smaller percentage of the table as it grows then we need a
number of buckets that's proportional to the size of the table. The
sample size is proportional to the number of buckets (ie, it's a
constant sized sample per bucket).

If we're querying for a specific value which isn't one of the most
common values then I'm not clear how to characterize the accuracy or
precision of our current estimates let alone how they would vary if we
changed our sample sizes.

If we need to estimate ndistinct then we clearly need a sample of the
table the size of which is proportional to the size of the table. And
in practice to get accurate results it has to be a fairly high
percentage -- effectively meaning we should read the whole table.

>> I think increasing the MCV is too simplistic since we don't really
>> have any basis for any particular value. I think what we need are some
>> statistics nerds to come along and say here's this nice tool from
>> which you can make the following predictions and understand how
>> increasing or decreasing the data set size affects the accuracy of the
>> predictions.
>
> I'm not sure that's realistic, because everything depends on what
> queries you're running, and you can get arbitrary answers by
> postulating arbitrary queries.  However, this does not make me excited
> about "doing nothing".

Well, our planner only needs to answer specific questions. We just
need stats capable of answering "how many occurrences of x are there"
and "how many values are in the range x..y" for the normal estimation
functions. We have the latter, but if there's a stat we're missing for
calculating the former more robustly that would be great. We also
need ndistinct but that's another story.


-- 
greg



Re: [HACKERS] WIP: extensible enums

2010-10-20 Thread Robert Haas
On Wed, Oct 20, 2010 at 6:54 PM, Merlin Moncure  wrote:
> On Tue, Oct 19, 2010 at 9:15 PM, Andrew Dunstan  wrote:
>> Efficiency has  always been one of the major reasons for using enums, so
>> it's important that we make them extensible without badly affecting
>> performance.
>
> on that note is it worthwhile backpatching recent versions to allocate
> enums with even numbered oids? That way people binary upgrading can
> get the benefit of the optimization they should qualify for...

Uh, -1 from me.  This is not a bug fix, and it will only help people
who create new enums between the time they upgrade to the relevant
minor release and the time they upgrade to 9.1.  We are not into the
business of back-patching marginal performance enhancements.  If we
want to have a 9.0R2 release, or whatever, then so be it, but let's
not be modifying behavior in stable branches unless there's a *bug*.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] default_statistics_target WAS: max_wal_senders must die

2010-10-20 Thread Robert Haas
On Wed, Oct 20, 2010 at 6:38 PM, Greg Stark  wrote:
> On Wed, Oct 20, 2010 at 3:15 PM, Josh Berkus  wrote:
 Maybe what should be done about this is to have separate sizes for the
 MCV list and the histogram, where the MCV list is automatically sized
 during ANALYZE.
>>
>> It's been suggested multiple times that we should base our sample size
>> on a % of the table, or at least offer that as an option.
>
> Why? Afaict this has been suggested multiple times by people who don't
> justify it in any way except with handwavy -- larger samples are
> better. The sample size is picked based on what sample statistics
> tells us we need to achieve a given 95th percentile confidence
> interval for the bucket size given.
>
> Robert pointed out one reason we would want smaller buckets for larger
> tables but nobody has explained why we would want smaller confidence
> intervals for the same size buckets. That amounts to querying larger
> tables for the same percentage of the table but wanting more precise
> estimates than you want for smaller tables.

Yes, I think a percentage of the table is going to break down either
at the high end or the low end.  Hand-waving (but based on
experience), for a 1000 row table a statistics target of 10 is
probably approximately right and 100 is too much and 1 is too little.
But for a 1,000,000 row table 10,000 is probably too much and even
1,000 is pushing it.  So using a constant percentage of table rows
doesn't feel right.  I had a thought today that it might make sense to
use an exponential curve, like min(2 * N^(1/3), 10).  I can't really
justify that mathematically, but that doesn't mean it won't work well
in practice.

>> So, I think we could easily be quite a bit smarter than just increasing
>> the MCV.  Although that might be a nice start.
>
> I think increasing the MCV is too simplistic since we don't really
> have any basis for any particular value. I think what we need are some
> statistics nerds to come along and say here's this nice tool from
> which you can make the following predictions and understand how
> increasing or decreasing the data set size affects the accuracy of the
> predictions.

I'm not sure that's realistic, because everything depends on what
queries you're running, and you can get arbitrary answers by
postulating arbitrary queries.  However, this does not make me excited
about "doing nothing".

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] WIP: extensible enums

2010-10-20 Thread Merlin Moncure
On Tue, Oct 19, 2010 at 9:15 PM, Andrew Dunstan  wrote:
> Efficiency has  always been one of the major reasons for using enums, so
> it's important that we make them extensible without badly affecting
> performance.

on that note is it worthwhile backpatching recent versions to allocate
enums with even numbered oids? That way people binary upgrading can
get the benefit of the optimization they should qualify for...

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] default_statistics_target WAS: max_wal_senders must die

2010-10-20 Thread Greg Stark
On Wed, Oct 20, 2010 at 3:15 PM, Josh Berkus  wrote:
>
>>> Maybe what should be done about this is to have separate sizes for the
>>> MCV list and the histogram, where the MCV list is automatically sized
>>> during ANALYZE.
>
> It's been suggested multiple times that we should base our sample size
> on a % of the table, or at least offer that as an option.

Why? Afaict this has been suggested multiple times by people who don't
justify it in any way except with handwavy -- larger samples are
better. The sample size is picked based on what sample statistics
tells us we need to achieve a given 95th percentile confidence
interval for the bucket size given.

Robert pointed out one reason we would want smaller buckets for larger
tables but nobody has explained why we would want smaller confidence
intervals for the same size buckets. That amounts to querying larger
tables for the same percentage of the table but wanting more precise
estimates than you want for smaller tables.

>  I've pointed
> out (with math, which Simon wrote a prototype for) that doing
> block-based sampling instead of random-row sampling would allow us to
> collect, say, 2% of a very large table without more I/O than we're doing
> now.

Can you explain when this would and wouldn't bias the sample for the
users so they can decide whether to use it or not?

> Nathan Boley has also shown that we could get tremendously better
> estimates without additional sampling if our statistics collector
> recognized common patterns such as normal, linear and geometric
> distributions.  Right now our whole stats system assumes a completely
> random distribution.

That's interesting, I hadn't seen that.

> So, I think we could easily be quite a bit smarter than just increasing
> the MCV.  Although that might be a nice start.

I think increasing the MCV is too simplistic since we don't really
have any basis for any particular value. I think what we need are some
statistics nerds to come along and say here's this nice tool from
which you can make the following predictions and understand how
increasing or decreasing the data set size affects the accuracy of the
predictions.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_rawdump

2010-10-20 Thread Robert Haas
On Wed, Oct 20, 2010 at 5:30 PM, Stephen R. van den Berg  wrote:
> Ideal would be: put the table-oid inside the header of each page
> (either in the official header, or in the special area).
> This way even lost blocks can be correlated to the same table.
> I'd still vote for the latest known table definition in the first
> page.  It's by no means perfect, but it will help 99% of all
> salvage attempts by an order of magnitude.

I don't think we should shrink the amount of usable space by 4 bytes
per block and break on-disk compatibility just to make it easier to
reconstruct corrupted tables.

I'm pretty dubious about the proposal to stuff an otherwise-useless
metapage in every heap, too.  If you have many small tables, you just
doubled your disk utilization - worse than that, maybe, if some of
them are empty.  If we needed a metapage anyway and had extra space to
play with, stuffing some useful forensic information in there might be
worthwhile, but I have a hard time thinking that forensics alone is a
sufficient justification for such a change.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] max_wal_senders must die

2010-10-20 Thread Josh Berkus

> Quite.  Josh, have you got any evidence showing that the penalty is
> only 10%?  There are cases, such as COPY and ALTER TABLE, where
> you'd be looking at 2X or worse penalties, because of the existing
> optimizations that avoid writing WAL at all for operations where a
> single final fsync can serve the purpose.  I'm not sure what the
> penalty for "typical" workloads is, partly because I'm not sure what
> should be considered a "typical" workload for this purpose.

If we could agree on some workloads, I could run some benchmarks.  I'm
not sure what those would be though, given that COPY and ALTER TABLE
aren't generally included in most benchmarks.  I could see how
everything else is affected, though.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Fix snapshot taking inconsistencies

2010-10-20 Thread Tom Lane
Alvaro Herrera  writes:
> It strikes me that if we really want to restructure things to divide
> client interaction from other query-processing routines, we should
> create another file, say src/backend/tcop/queries.c; this would have
> stuff like pg_plan_query, pg_plan_queries, pg_rewrite_query, and the
> other things that the patch was evicting from postgres.c (plus, I
> imagine, a bunch of other stuff that I may be missing).  In fact, if we
> go down this route, there would be no point in removing
> pg_parse_and_rewrite; we would just move it to this new module.

Yeah, possibly that would be a good idea.

To my mind, the first thing that has to be resolved before messing
around in this area is whether or not we want the logging/statistics
behavior of these functions to apply when they are used in contexts
other than interactive queries.  Personally I would vote no, mainly
because I don't think that behavior is very sensible in nested
execution.  If that is the decision, then probably these functions
should stay where they are and as they are, and we just deprecate
outside use of them.  I'm not sure whether there's enough left after
deleting the logging/statistics behavior to justify making exported
versions, as opposed to just having the callers call the next-layer-down
functionality directly.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] default_statistics_target WAS: max_wal_senders must die

2010-10-20 Thread Josh Berkus

>> Maybe what should be done about this is to have separate sizes for the
>> MCV list and the histogram, where the MCV list is automatically sized
>> during ANALYZE.

It's been suggested multiple times that we should base our sample size
on a % of the table, or at least offer that as an option.  I've pointed
out (with math, which Simon wrote a prototype for) that doing
block-based sampling instead of random-row sampling would allow us to
collect, say, 2% of a very large table without more I/O than we're doing
now.

Nathan Boley has also shown that we could get tremendously better
estimates without additional sampling if our statistics collector
recognized common patterns such as normal, linear and geometric
distributions.  Right now our whole stats system assumes a completely
random distribution.

So, I think we could easily be quite a bit smarter than just increasing
the MCV.  Although that might be a nice start.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposed Windows-specific change: Enable crash dumps (like core files)

2010-10-20 Thread Bruce Momjian
Craig Ringer wrote:
> On 4/10/2010 8:06 PM, Andrew Dunstan wrote:
> >
> >
> > On 10/04/2010 07:50 AM, Craig Ringer wrote:
> >>
> >> - If the crash dump handler is enabled by setting the GUC,
> >> all backends register the handler during startup or (if it
> >> proves practical) when the GUC is changed.
> >>
> >> - When the handler is triggered by the OS trapping an unhandled
> >> exception, it loads dbghelp.dll, writes the appropriate dump
> >> format to the hardcoded path, and terminates the process.
> >>
> >>
> >
> > What is the performance impact of doing that? Specifically, how does it
> > affect backend startup time?
> 
> Without testing I can't say for sure.
> 
> My expection based on how the handler works would be: near-zero, about 
> as expensive as registering a signal handler, plus the cost of reading 
> the GUC and doing one string compare to test the value. When disabled, 
> it's just the GUC test.
> 
> Is there a better mechanism to use for features that're going to be 
> unused the great majority of the time? Perhaps something that does 
> require a server restart, but doesn't have any cost at all when disabled?

We definitely had trouble producing crash dumps caused by the 128 return
code problem, so I can see great value in this, if it can be done
simply.  I wonder if the 128-exit would have produced a crash file.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Fix snapshot taking inconsistencies

2010-10-20 Thread Alvaro Herrera
Excerpts from Alvaro Herrera's message of mié oct 20 16:33:12 -0300 2010:

> The only quarrel I have with this code shuffling is that
> pg_rewrite_query is being called from exec_parse_message.  Since it's
> now a static function, it would have to stop being static so that it can
> be called from both places (postgres.c and rewriteHandler.c)

Actually, I just noticed that the "remainder" patch uses pg_plan_query,
which is also in postgres.c.  This function along with its sibling
pg_plan_queries is also called from other internal places, like the
PREPARE code, SPI and the plan cache.

It strikes me that if we really want to restructure things to divide
client interaction from other query-processing routines, we should
create another file, say src/backend/tcop/queries.c; this would have
stuff like pg_plan_query, pg_plan_queries, pg_rewrite_query, and the
other things that the patch was evicting from postgres.c (plus, I
imagine, a bunch of other stuff that I may be missing).  In fact, if we
go down this route, there would be no point in removing
pg_parse_and_rewrite; we would just move it to this new module.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-10-20 Thread Dimitri Fontaine
Tom Lane  writes:
> That is simply a horrid idea.  Just make it specify EXTENSION.

And VERSION too, finally.

So any extension 

>
>> and guessing
>> the CONTROL file name from the EXTENSION name only occurs when CONTROL
>> has not been provided.
>
> Here, on the other hand, I'm wondering why have two variables at all.
> Is there any sane use-case for the control file to not be named the same
> as the extension?  It seems like that would accomplish little except to
> sow confusion.
>
>   regards, tom lane

-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] psql: Don't close stdin, don't leak file descriptor with ON_ERROR_STOP

2010-10-20 Thread Marti Raudsepp
Hi list,

Here's the second patch from my coccicheck run. Originally it flagged
the fact that the opened file in psql's process_file() wasn't being
closed in the ON_ERROR_STOP path, but there seem to be two more
unintended behaviors here.

(1) In the error path, the value of pset.inputfile wasn't being
properly restored. The caller does free(fname) on line 786, so
pset.inputfile would point to unallocated memory.

(2) The more significant issue is that stdin *was closed in the
success return path. So when you run a script with two "\i -" lines,
the first "\q" would close stdin and the next one would fail with:
psql:-:0: could not read from input file: Bad file descriptor

In fact, this means that stdin was being accessed after being
fclose()d, which is undefined behavior per ANSI C, though it seems to
work just fine on Linux.

The new behavior requires the same number of "\q"s as the number of
executions of '-' because stdin is never closed.

Regards,
Marti
From 43b7595fdcc69cc9db0d066a53f53c5e71c965aa Mon Sep 17 00:00:00 2001
From: Marti Raudsepp 
Date: Wed, 20 Oct 2010 23:44:36 +0300
Subject: [PATCH] psql: Don't close stdin, don't leak file descriptor with ON_ERROR_STOP

Changes three things:
1. Don't close stdin when returning success if input was stdin
2. Don't leak file descriptor when exiting due to ON_ERROR_STOP
3. pset.inputfile wasn't being restored with ON_ERROR_STOP, yet the
memory was freed by the caller

(1) changes the behavior of "\q" on stdin. Previously multiple
inclusions of stdin would be terminated with a single quit, now a
separate quit is needed for each invocation. Previous behavior also
accessed stdin after it was fclose()d, which is undefined behavior per
ANSI C.

(2) and (3) should have no practical impact, because the process would
quit immediately afterwards anyway.
---
 src/bin/psql/command.c |   15 ---
 1 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index e6d703a..45a145c 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -1987,7 +1987,10 @@ process_file(char *filename, bool single_txn)
 		if ((res = PSQLexec("BEGIN", false)) == NULL)
 		{
 			if (pset.on_error_stop)
-return EXIT_USER;
+			{
+result = EXIT_USER;
+goto error;
+			}
 		}
 		else
 			PQclear(res);
@@ -2000,13 +2003,19 @@ process_file(char *filename, bool single_txn)
 		if ((res = PSQLexec("COMMIT", false)) == NULL)
 		{
 			if (pset.on_error_stop)
-return EXIT_USER;
+			{
+result = EXIT_USER;
+goto error;
+			}
 		}
 		else
 			PQclear(res);
 	}
 
-	fclose(fd);
+error:
+	if(fd != stdin)
+		fclose(fd);
+
 	pset.inputfile = oldfilename;
 	return result;
 }
-- 
1.7.3.1


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_rawdump

2010-10-20 Thread Stephen R. van den Berg
Tom Lane wrote:
>Aidan Van Dyk  writes:
>> If we're looking to have any sort of "out of catalog" documentation of
>> table storage format, what about just having a new relation fork that
>> just "appends" each and every change made to the table formats,
>> including ones rolled back, etc.

I presume that means that all table changes are appended to a single
central file per database?  That would be a bad idea, because in the
typical problem scenario losing this new catalogue basically creates
the same problem.  It would be preferable to keep the information
tied in with the actual table(file) it concerns.

>[ thinks for a bit... ]  Perhaps we could stick some sort of unique
>ID into tables, which could be correlated to the same unique ID
>appearing in a metadata fork.

Ideal would be: put the table-oid inside the header of each page
(either in the official header, or in the special area).
This way even lost blocks can be correlated to the same table.
I'd still vote for the latest known table definition in the first
page.  It's by no means perfect, but it will help 99% of all
salvage attempts by an order of magnitude.
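
As a purely hypothetical sketch (the struct, its field names, and the segment counter are assumptions for illustration, not actual PostgreSQL code), the per-page identification being suggested could be as small as:

#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical per-page trailer that could live in each heap page's
 * special area, tying a block back to its owning relation even when
 * file names are lost.
 */
typedef struct PageForensicTrailer
{
    uint32_t    rel_oid;        /* pg_class OID of the owning table */
    uint32_t    segment_no;     /* which 1GB segment the block belongs to */
} PageForensicTrailer;

int
main(void)
{
    /* a handful of bytes per page is the overhead being debated in this thread */
    printf("trailer size: %zu bytes\n", sizeof(PageForensicTrailer));
    return 0;
}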

>[ thinks some more... ]  Of course, at this point you have to start
>asking why the metadata fork should be thought to be any more reliable
>than the system catalogs.

Quite.  Which is why I wanted the best-effort latest version of the
table description in the first page of the tablefile instead.

>> Make this relation fork append only,
>> and dump a completely new set of metadata to it each and every ALTER
>> TABLE.

>You can bet that there'll be somebody out there who whines because their
>app does lots and lots of repeated ALTER TABLEs, and so the metadata fork
>grows forever.  I think we'd have to be a bit smarter than this.

Which means we come full circle and have to conclude that doing anything
comprehensive is too invasive for normal operations; best-effort is
all a forensics operation wants or can hope for.
-- 
Stephen.

Life is that brief interlude between nothingness and eternity.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL and HugePage

2010-10-20 Thread daveg
On Wed, Oct 20, 2010 at 12:28:25PM -0700, Greg Stark wrote:
> On Wed, Oct 20, 2010 at 12:17 PM, Greg Stark  wrote:
> > I don't think it's a big cost once all the processes
> > have been forked if you're reusing them beyond perhaps slightly more
> > efficient cache usage.
> 
> Hm, this site claims to get a 13% win just from the reduced tlb misses
> using a preload hack with Pg 8.2. That would be pretty substantial.
> 
> http://oss.linbit.com/hugetlb/

That was my motivation in trying a patch. TLB misses can be a substantial
overhead. I'm not current on the state of play, but working at Sun's
benchmark lab on a DB TPC-B benchmark for the first generation of MP
systems, something like 30% of all bus traffic was TLB misses. The
next iteration of the hardware had a much larger TLB.

I have a client with 512GB memory systems, currently with 128GB configured
as the postgresql buffer cache, which is 32M TLB entries trying to fit in
the few dozen CPU TLB slots. I suspect there may be some contention.

I'll benchmark of course.

-dg

-- 
David Gould   da...@sonic.net  510 536 1443510 282 0869
If simplicity worked, the world would be overrun with insects.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] pgcrypto: Test for NULL before dereferencing pointer

2010-10-20 Thread Tom Lane
Marti Raudsepp  writes:
> There's a file descriptor leak in psql/command.c function
> process_file() -- on errors it just returns without closing the file.
> But since it's quitting anyway, there's no practical impact. Should I
> submit a patch for this as well?

Might as well.  It's the kind of thing that could turn into a real
bug given some rearrangement of the code.

> Then there's the 'badzero' check that finds a dozen cases where
> pointers are compared to a literal 0, not a NULL. This is a only a
> coding style check, as far as I can tell, so I thought it's not worth
> it.

I'd be in favor of fixing those too.  I have no particular problem with
either "if (ptr)" or "if (ptr != NULL)", but I think that "if (ptr != 0)"
gives the reader entirely the wrong impression about the datatype of ptr.
Just because C fails to distinguish doesn't make it good style.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] max_wal_senders must die

2010-10-20 Thread Greg Stark
On Wed, Oct 20, 2010 at 1:12 PM, Robert Haas  wrote:
> On Wed, Oct 20, 2010 at 3:40 PM, Greg Stark  wrote:
>> On Wed, Oct 20, 2010 at 6:29 AM, Robert Haas  wrote:
>
>>> Actually, I think the best thing for default_statistics_target might
>>> be to scale the target based on the number of rows in the table, e.g.
>>> given N rows:
>>
>> The number of buckets needed isn't related to the population size --
>> it's related to how wide the ranges you'll be estimating selectivity
>> for are.
>
> As the table grows, the present week's data becomes a
> smaller and smaller fraction of the table data.

That's an interesting point. I wonder if we can expose this in some
way that allows users to specify the statistics target in something
more meaningful for them that doesn't change as the ranges in the
table grow. Or even gather stats on the size of the ranges being
queried.


> If you have a WHERE clause of the form WHERE x = some_constant, then
> you get a much better estimate if some_constant is an MCV.  If the
> constant is not an MCV, however, you still get better estimates,
> because the estimation code knows that no non-MCV can occur more
> frequently than any MCV, so increasing the number of MCVs pushes those
> estimates closer to reality.  It is especially bad when the frequency
> "falls off a cliff" at a certain point in the distribution e.g. if
> there are 243 values that occur much more frequently than any others,
> a stats target of 250 will do much better than 225.

It sounds like what we really need here some way to characterize the
distribution of frequencies. Instead of just computing an upper bound
we should have a kind of histogram showing how many values occur
precisely once, how many occur twice, three times, etc. Or perhaps we
only need to know the most common frequency per bucket. Or, hm...
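
A rough C sketch of that "histogram of frequencies" idea, assuming a small, already-sorted sample (the cap on tracked frequencies is an arbitrary assumption):

#include <stdio.h>

#define MAX_FREQ 16

/*
 * Count how many distinct sampled values occur exactly once, exactly
 * twice, and so on; runs longer than MAX_FREQ land in the last bucket.
 */
static void
frequency_of_frequencies(const int *sample, int n, int counts[MAX_FREQ + 1])
{
    int     i;

    for (i = 0; i <= MAX_FREQ; i++)
        counts[i] = 0;

    i = 0;
    while (i < n)
    {
        int     run = 1;

        while (i + run < n && sample[i + run] == sample[i])
            run++;
        counts[run < MAX_FREQ ? run : MAX_FREQ]++;
        i += run;
    }
}

int
main(void)
{
    int     sample[] = {1, 1, 1, 2, 2, 3, 4, 4, 4, 4};
    int     counts[MAX_FREQ + 1];
    int     f;

    frequency_of_frequencies(sample, 10, counts);
    for (f = 1; f <= MAX_FREQ; f++)
        if (counts[f] > 0)
            printf("%d value(s) occur %d time(s)\n", counts[f], f);
    return 0;
}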

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Creation of temporary tables on read-only standby servers

2010-10-20 Thread Tom Lane
Greg Stark  writes:
> On Wed, Oct 20, 2010 at 8:37 AM, Tom Lane  wrote:
>> I think it's pointless to speculate about whether we might have divvied
>> up the meta-information about tables differently if we'd foreseen
>> wanting to do this.  It is what it is, and there is *way* too much code
>> depending on it, both inside the backend and in clients.  Any
>> reimplementation of temp tables will still have to expose largely the
>> same catalog information that exists for tables now.  We can probably
>> get away with marginal changes like redefining relfilenode, but we can't
>> avoid providing catalog entries that describe the schema and statistics
>> of a temp table.

> I agree about the schema -- that's the whole point of the catalog tables.

> I felt like the statistics were pretty marginal to begin with.

I'm thinking more of pg_statistic than the stuff in pg_class --- I agree
that we could probably kluge some other approach for relpages and
reltuples, but that doesn't scale to the real statistics.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] pgcrypto: Test for NULL before dereferencing pointer

2010-10-20 Thread Marti Raudsepp
On Wed, Oct 20, 2010 at 22:34, Heikki Linnakangas
 wrote:
> Did coccicheck find anything else interesting?

There's a file descriptor leak in psql/command.c function
process_file() -- on errors it just returns without closing the file.
But since it's quitting anyway, there's no practical impact. Should I
submit a patch for this as well?

Then there are a few more cases found by null_ref (same check as the
original patch). But on closer inspection, these are false positives,
because the variable is actually modified in between dereferencing and
the NULL check.

Then there's the 'badzero' check that finds a dozen cases where
pointers are compared to a literal 0, not a NULL. This is a only a
coding style check, as far as I can tell, so I thought it's not worth
it.

Regards,
Marti

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: extensible enums

2010-10-20 Thread Robert Haas
On Wed, Oct 20, 2010 at 3:16 PM, David Fetter  wrote:
> On Tue, Oct 19, 2010 at 08:51:16PM -0400, Robert Haas wrote:
>> On Tue, Oct 19, 2010 at 5:42 PM, Andrew Dunstan  wrote:
>> > Well a bit more testing shows some benefit. I've sorted out a few kinks, so
>> > this seems to work. In particular, with the above tables, the version
>> > imported from 9.0 can have an index created in about the same time as
>> > on the fresh table (identical data, but all even numbered Oids).
>> >
>> > Of course, with lots of odd numbered Oids, if a label gets added the
>> > imported version will degrade in performance much more quickly.
>>
>> I'm quite impressed by the amount of time and thought being put into
>> optimizing this.  I didn't realize people cared so much about enum
>> performance; but it's good that they do.
>>
>> I hope to see more such efforts in other parts of the system.
>
> Which parts of the system, in particular, do you have in mind?  Other
> people from EDB have mentioned that slimming down the on-disk
> representation was one such target.  What other ones would you see as
> needing such attention?

On-disk footprint.
WAL volume.
COPY speed.
Checkpoint I/O.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Creation of temporary tables on read-only standby servers

2010-10-20 Thread Greg Stark
On Wed, Oct 20, 2010 at 8:37 AM, Tom Lane  wrote:
> I think it's pointless to speculate about whether we might have divvied
> up the meta-information about tables differently if we'd foreseen
> wanting to do this.  It is what it is, and there is *way* too much code
> depending on it, both inside the backend and in clients.  Any
> reimplementation of temp tables will still have to expose largely the
> same catalog information that exists for tables now.  We can probably
> get away with marginal changes like redefining relfilenode, but we can't
> avoid providing catalog entries that describe the schema and statistics
> of a temp table.

I agree about the schema -- that's the whole point of the catalog tables.

I felt like the statistics were pretty marginal to begin with. There
may be a large number of places but there's no complex structure of
relationships to other tables or complex data structures going on
here. Surely they can all be coded to look up the relpages from
somewhere else just as easily?

But I'm not about to start working on this area so my judgement on how
much work that would be isn't very important here.

And your point that, if we have a complete local copy of the entire
catalog schema, we can just as easily create temporary tables from
whole cloth on a read-only database is attractive.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL and HugePage

2010-10-20 Thread Robert Haas
On Wed, Oct 20, 2010 at 3:47 PM, daveg  wrote:
> On Wed, Oct 20, 2010 at 12:28:25PM -0700, Greg Stark wrote:
>> On Wed, Oct 20, 2010 at 12:17 PM, Greg Stark  wrote:
>> > I don't think it's a big cost once all the processes
>> > have been forked if you're reusing them beyond perhaps slightly more
>> > efficient cache usage.
>>
>> Hm, this site claims to get a 13% win just from the reduced tlb misses
>> using a preload hack with Pg 8.2. That would be pretty substantial.
>>
>> http://oss.linbit.com/hugetlb/
>
> That was my motivation in trying a patch. TLB misses can be a substantial
> overhead. I'm not current on the state of play, but working at Sun's
> benchmark lab on a DB TPC-B benchmark for the first generation of MP
> systems, something like 30% of all bus traffic was TLB misses. The
> next iteration of the hardware had a much larger TLB.
>
> I have a client with 512GB memory systems, currently with 128GB configured
> as the postgresql buffer cache, which is 32M TLB entries trying to fit in
> the few dozen CPU TLB slots. I suspect there may be some contention.
>
> I'll benchmark of course.

Do you mean 128GB shared buffers, or shared buffers + OS cache?  I
think that the general wisdom is that performance tails off beyond
8-10GB of shared buffers anyway, so a performance improvement on 128GB
shared buffers might not mean much unless you can also show that 128GB
shared buffers actually performs better than some smaller amount.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] max_wal_senders must die

2010-10-20 Thread Robert Haas
On Wed, Oct 20, 2010 at 3:40 PM, Greg Stark  wrote:
> On Wed, Oct 20, 2010 at 6:29 AM, Robert Haas  wrote:
>> Exactly.  It doesn't take many 3-7% slowdowns to add up to being 50%
>> or 100% slower, and that sucks.  In fact, I'm still not convinced that
>> we were wise to boost default_statistics_target as much as we did.  I
>> argued for a smaller boost at the time.
>
> Well we don't want to let ourselves be paralyzed by FUD so it was
> important to identify specific concerns and then tackle those
> concerns. Once we identified the worst-case planning cases we profiled
> them and found that the inflection point of the curve was fairly
> clearly above 100 but that there were cases where values below 1,000
> caused problems. So I'm pretty happy with the evidence-based approach.

The inflection point of the curve was certainly a good thing for us to
look at but the fact remains that we took a hit on a trivial
benchmark, and we can't afford to take too many of those.

>> Actually, I think the best thing for default_statistics_target might
>> be to scale the target based on the number of rows in the table, e.g.
>> given N rows:
>
> The number of buckets needed isn't related to the population size --
> it's related to how wide the ranges you'll be estimating selectivity
> for are. That is, with our current code, if you're selecting tuples
> within a range a..b and that range happens to be the same size as the
> bucket size then you'll get an accurate estimate with a fixed 95th
> percentile precision regardless of the size of the table (to an
> approximation).

If you have a WHERE clause of the form WHERE x > some_constant, then
the effects vary depending on how that constant is chosen.  If it's
the median value, then as you say the statistics target doesn't matter
much at all; but that's not necessarily representative of real life.
For example, suppose x is a date and the constant is Monday of the
current week.  As the table grows, the present week's data becomes a
smaller and smaller fraction of the table data.  When it gets to be a
tiny fraction of the very last histogram bucket, the estimates start
to get progressively worse.  At some point you have to give up and
partition the table for other reasons anyway, but having to do it
because the statistics are off is inexcusable.  We've seen people hit
this precise issue on -performance a few times.

> I'm not sure how our selectivity works at all for the degenerate case
> of selecting for specific values. I don't understand how histograms
> are useful for such estimates at all. I think the MCV lists are
> basically an attempt to overcome this problem and as you point out I'm
> not sure the statistics target is really the right thing to control
> them -- but since I don't think there's any real statistics behind
> them I'm not sure there's any right way to control them.

If you have a WHERE clause of the form WHERE x = some_constant, then
you get a much better estimate if some_constant is an MCV.  If the
constant is not an MCV, however, you still get better estimates,
because the estimation code knows that no non-MCV can occur more
frequently than any MCV, so increasing the number of MCVs pushes those
estimates closer to reality.  It is especially bad when the frequency
"falls off a cliff" at a certain point in the distribution e.g. if
there are 243 values that occur much more frequently than any others,
a stats target of 250 will do much better than 225.  But even if
that's not an issue, it still helps.  The bottom line here is that I
can't remember any message, ever, on -performance, or any incident
within my personal experience, where it was necessary to increase the
statistics target beyond 50-100 on a table with 10K rows.  However,
there are certainly cases where we've recommended that for big tables,
which means there are also people out there who have a performance
problem on a big table but haven't asked for help and therefore
haven't gotten that advice.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pg_upgrade cleanup

2010-10-20 Thread Bruce Momjian
I have applied the attached patch to clean up pg_upgrade cache lookup
code and remove useless NULL pointer tests.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/contrib/pg_upgrade/info.c b/contrib/pg_upgrade/info.c
index 257f5a8..67528b0 100644
*** a/contrib/pg_upgrade/info.c
--- b/contrib/pg_upgrade/info.c
*** gen_db_file_maps(DbInfo *old_db, DbInfo 
*** 74,81 
  		num_maps++;
  
  		/*
! 		 * so much for the mapping of this relation. Now we need a mapping for
! 		 * its corresponding toast relation if any.
  		 */
  		if (oldrel->toastrelid > 0)
  		{
--- 74,81 
  		num_maps++;
  
  		/*
! 		 * So much for mapping this relation;  now we need a mapping
! 		 * for its corresponding toast relation, if any.
  		 */
  		if (oldrel->toastrelid > 0)
  		{
*** gen_db_file_maps(DbInfo *old_db, DbInfo 
*** 117,122 
--- 117,123 
  	 newrel->reloid);
  
  			/* look them up in their respective arrays */
+ 			/* we lose our cache location here */
  			old_toast = relarr_lookup_rel(&old_db->rel_arr,
  		  "pg_toast", old_name, CLUSTER_OLD);
  			new_toast = relarr_lookup_rel(&new_db->rel_arr,
*** get_rel_infos(const DbInfo *dbinfo, RelI
*** 385,391 
  
  	relarr->rels = relinfos;
  	relarr->nrels = num_rels;
! 	relarr->cache_name_rel = 0;
  }
  
  
--- 386,392 
  
  	relarr->rels = relinfos;
  	relarr->nrels = num_rels;
! 	relarr->last_relname_lookup = 0;
  }
  
  
*** dbarr_lookup_db(DbInfoArr *db_arr, const
*** 399,407 
  {
  	int			dbnum;
  
- 	if (!db_arr || !db_name)
- 		return NULL;
- 
  	for (dbnum = 0; dbnum < db_arr->ndbs; dbnum++)
  	{
  		if (strcmp(db_arr->dbs[dbnum].db_name, db_name) == 0)
--- 400,405 
*** relarr_lookup_rel(RelInfoArr *rel_arr, c
*** 424,439 
  {
  	int			relnum;
  
- 	if (!rel_arr || !relname)
- 		return NULL;
- 
  	/* Test next lookup first, for speed */
! 	if (rel_arr->cache_name_rel + 1 < rel_arr->nrels &&
! 		strcmp(rel_arr->rels[rel_arr->cache_name_rel + 1].nspname, nspname) == 0 &&
! 		strcmp(rel_arr->rels[rel_arr->cache_name_rel + 1].relname, relname) == 0)
  	{
! 		rel_arr->cache_name_rel++;
! 		return &rel_arr->rels[rel_arr->cache_name_rel];
  	}
  
  	for (relnum = 0; relnum < rel_arr->nrels; relnum++)
--- 422,434 
  {
  	int			relnum;
  
  	/* Test next lookup first, for speed */
! 	if (rel_arr->last_relname_lookup + 1 < rel_arr->nrels &&
! 		strcmp(rel_arr->rels[rel_arr->last_relname_lookup + 1].nspname, nspname) == 0 &&
! 		strcmp(rel_arr->rels[rel_arr->last_relname_lookup + 1].relname, relname) == 0)
  	{
! 		rel_arr->last_relname_lookup++;
! 		return &rel_arr->rels[rel_arr->last_relname_lookup];
  	}
  
  	for (relnum = 0; relnum < rel_arr->nrels; relnum++)
*** relarr_lookup_rel(RelInfoArr *rel_arr, c
*** 441,447 
  		if (strcmp(rel_arr->rels[relnum].nspname, nspname) == 0 &&
  			strcmp(rel_arr->rels[relnum].relname, relname) == 0)
  		{
! 			rel_arr->cache_name_rel = relnum;
  			return &rel_arr->rels[relnum];
  		}
  	}
--- 436,442 
  		if (strcmp(rel_arr->rels[relnum].nspname, nspname) == 0 &&
  			strcmp(rel_arr->rels[relnum].relname, relname) == 0)
  		{
! 			rel_arr->last_relname_lookup = relnum;
  			return &rel_arr->rels[relnum];
  		}
  	}
*** relarr_lookup_reloid(RelInfoArr *rel_arr
*** 464,472 
  {
  	int			relnum;
  
- 	if (!rel_arr || !oid)
- 		return NULL;
- 
  	for (relnum = 0; relnum < rel_arr->nrels; relnum++)
  	{
  		if (rel_arr->rels[relnum].reloid == oid)
--- 459,464 
*** relarr_free(RelInfoArr *rel_arr)
*** 483,489 
  {
  	pg_free(rel_arr->rels);
  	rel_arr->nrels = 0;
! 	rel_arr->cache_name_rel = 0;
  }
  
  
--- 475,481 
  {
  	pg_free(rel_arr->rels);
  	rel_arr->nrels = 0;
! 	rel_arr->last_relname_lookup = 0;
  }
  
  
diff --git a/contrib/pg_upgrade/pg_upgrade.h b/contrib/pg_upgrade/pg_upgrade.h
index e40f58d..439c0a7 100644
*** a/contrib/pg_upgrade/pg_upgrade.h
--- b/contrib/pg_upgrade/pg_upgrade.h
*** typedef struct
*** 78,84 
  {
  	RelInfo*rels;
  	int			nrels;
! 	int			cache_name_rel;	/* cache of last lookup location */
  } RelInfoArr;
  
  /*
--- 78,84 
  {
  	RelInfo*rels;
  	int			nrels;
! 	int			last_relname_lookup;	/* cache of last lookup location */
  } RelInfoArr;
  
  /*

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] max_wal_senders must die

2010-10-20 Thread Greg Stark
On Wed, Oct 20, 2010 at 6:29 AM, Robert Haas  wrote:
> Exactly.  It doesn't take many 3-7% slowdowns to add up to being 50%
> or 100% slower, and that sucks.  In fact, I'm still not convinced that
> we were wise to boost default_statistics_target as much as we did.  I
> argued for a smaller boost at the time.

Well we don't want to let ourselves be paralyzed by FUD so it was
important to identify specific concerns and then tackle those
concerns. Once we identified the worst-case planning cases we profiled
them and found that the inflection point of the curve was fairly
clearly above 100 but that there were cases where values below 1,000
caused problems. So I'm pretty happy with the evidence-based approach.

The problem with being overly conservative is that it gives free rein
to the folks who were shouting that we should just set the default to
1,000. They weren't wrong that the 10 was overly conservative and in
the absence of evidence 1,000 was just as reasonable.


> Actually, I think the best thing for default_statistics_target might
> be to scale the target based on the number of rows in the table, e.g.
> given N rows:

The number of buckets needed isn't related to the population size --
it's related to how wide the ranges you'll be estimating selectivity
for are. That is, with our current code, if you're selecting tuples
within a range a..b and that range happens to be the same size as the
bucket size then you'll get an accurate estimate with a fixed 95th
percentile precision regardless of the size of the table (to an
approximation).

I'm not sure how our selectivity works at all for the degenerate case
of selecting for specific values. I don't understand how histograms
are useful for such estimates at all. I think the MCV lists are
basically an attempt to overcome this problem and as you point out I'm
not sure the statistics target is really the right thing to control
them -- but since I don't think there's any real statistics behind
them I'm not sure there's any right way to control them.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL and HugePage

2010-10-20 Thread Alvaro Herrera
Excerpts from Greg Stark's message of mié oct 20 16:28:25 -0300 2010:
> On Wed, Oct 20, 2010 at 12:17 PM, Greg Stark  wrote:
> > I don't think it's a big cost once all the processes
> > have been forked if you're reusing them beyond perhaps slightly more
> > efficient cache usage.
> 
> Hm, this site claims to get a 13% win just from the reduced tlb misses
> using a preload hack with Pg 8.2. That would be pretty substantial.
> 
> http://oss.linbit.com/hugetlb/

Wow, is there no other way to get the huge page size other than opening
and reading /proc/meminfo?
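
For reference, a minimal C sketch of the /proc/meminfo approach (Linux-specific, error handling kept to a bare minimum, and not a proposed interface):

#include <stdio.h>

/*
 * Return the huge page size in bytes as reported by the "Hugepagesize:"
 * line of /proc/meminfo, or 0 if it cannot be determined.
 */
static unsigned long
get_huge_page_size(void)
{
    FILE           *fp = fopen("/proc/meminfo", "r");
    char            line[128];
    unsigned long   kb = 0;

    if (fp == NULL)
        return 0;
    while (fgets(line, sizeof(line), fp) != NULL)
    {
        if (sscanf(line, "Hugepagesize: %lu kB", &kb) == 1)
            break;
    }
    fclose(fp);
    return kb * 1024UL;
}

int
main(void)
{
    printf("huge page size: %lu bytes\n", get_huge_page_size());
    return 0;
}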

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] pgcrypto: Test for NULL before dereferencing pointer

2010-10-20 Thread Heikki Linnakangas

On 20.10.2010 18:44, Marti Raudsepp wrote:

Hi pgsql-hackers,

Currently contrib/pgcrypto/pgp-pubenc.c contains code like:

uint8 algo = pk->algo;
if (pk == NULL)
...

However, if pk was NULL, then the if() condition would never be
reached because the pk->algo dereference would segfault.

This patch moves the dereference to below the condition which was the
intended behavior.


Thanks, applied. Did coccicheck find anything else interesting?

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Fix snapshot taking inconsistencies

2010-10-20 Thread Alvaro Herrera
Excerpts from Tom Lane's message of lun oct 04 10:31:26 -0400 2010:

> In the particular case at hand here, I rather wonder why SQL functions
> are depending on postgres.c at all.  It might be better to just
> duplicate a bit of code to make them independent.  pg_parse_and_rewrite
> would then be dead code and could be deleted.

This idea doesn't work, unless pushed a lot further.  Attached are two
separate patches, extracted from the last patch version posted by Marko
(git commit --interactive helped here).  The first one does what you
suggest above: remove pg_parse_and_rewrite and inline it into the
callers.  The other patch is the remainder.

Read on for the details of the first patch.  As for the second patch,
which is Marko's original intention anyway, I don't see that it needs to
be delayed because of the first one.  So while I haven't reviewed it, I
think it should be considered separately.


The problem with this patch (0001) is that the inlined versions of the
code that used to be pg_parse_and_rewrite are still depending on
functions in postgres.c.  These are pg_parse_query and
pg_analyze_and_rewrite.  pg_parse_query is just a layer on top of
raw_parser.  pg_analyze_and_rewrite is a layer on top of parse_analyze
plus pg_rewrite_query (also in postgres.c).

Now, the only difference between those functions and the ones that
underlie them is that they have the bunch of tracing macros and
log_parser_stats reporting.  So one solution would be to have SQL
functions (pg_proc.c and executor/functions.c) call the raw parser.c and
analyze.c functions directly, without invoking the tracing/logging code.  

The other idea would be to move the whole of those functions out of
postgres.c and into their own modules, i.e. move pg_parse_query into
parser.c and pg_analyze_and_rewrite and pg_rewrite_query into
rewriteHandler.c.  (This actually requires a bit more effort because we
should also move pg_analyze_and_rewrite_params out of postgres.c,
following pg_analyze_and_rewrite).

Note that pg_analyze_and_rewrite and its "params" siblings are being
called from copy.c, spi.c, optimizer/util/clauses.c, and plancache.c.
So it does make certain sense to move them out of postgres.c, if we want
to think of postgres.c as a module only concerned with client
interaction.

The only quarrel I have with this code shuffling is that
pg_rewrite_query is being called from exec_parse_message.  Since it's
now a static function, it would have to stop being static so that it can
be called from both places (postgres.c and rewriteHandler.c)

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


0001-Separate-SQL-function-processing-from-postgres.c.patch
Description: Binary data


0002-The-remainder-of-Marko-s-patch.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade patch application process, and move to /bin?

2010-10-20 Thread Bruce Momjian
Magnus Hagander wrote:
> On Wed, Oct 20, 2010 at 21:28, Robert Haas  wrote:
> > On Wed, Oct 20, 2010 at 3:01 PM, Bruce Momjian  wrote:
> >> One more question --- should I be sending pg_upgrade patches to the list
> >> for approval? ?The restructuring patch was large and didn't seem
> >> necessary to post, and the speedups were tested by the bug reporter, so
> >> I figured those were OK to apply.
> >
> > I think it would be good to do that. ?At least give people a chance to
> > comment, if they care.
> 
> +1. It's also a good way for people to get a bit more involved in the code.

OK.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade patch application process, and move to /bin?

2010-10-20 Thread Magnus Hagander
On Wed, Oct 20, 2010 at 21:28, Robert Haas  wrote:
> On Wed, Oct 20, 2010 at 3:01 PM, Bruce Momjian  wrote:
>> One more question --- should I be sending pg_upgrade patches to the list
>> for approval?  The restructuring patch was large and didn't seem
>> necessary to post, and the speedups were tested by the bug reporter, so
>> I figured those were OK to apply.
>
> I think it would be good to do that.  At least give people a chance to
> comment, if they care.

+1. It's also a good way for people to get a bit more involved in the code.



-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL and HugePage

2010-10-20 Thread Greg Stark
On Wed, Oct 20, 2010 at 12:17 PM, Greg Stark  wrote:
> I don't think it's a big cost once all the processes
> have been forked if you're reusing them beyond perhaps slightly more
> efficient cache usage.

Hm, this site claims to get a 13% win just from the reduced tlb misses
using a preload hack with Pg 8.2. That would be pretty substantial.

http://oss.linbit.com/hugetlb/


-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade patch application process, and move to /bin?

2010-10-20 Thread Robert Haas
On Wed, Oct 20, 2010 at 3:01 PM, Bruce Momjian  wrote:
> One more question --- should I be sending pg_upgrade patches to the list
> for approval?  The restructuring patch was large and didn't seem
> necessary to post, and the speedups were tested by the bug reporter, so
> I figured those were OK to apply.

I think it would be good to do that.  At least give people a chance to
comment, if they care.

> Oh, and do we want to move pg_upgrade into /bin for 9.1?  There was
> discussion about that six months ago.

I would be inclined to leave it in contrib for a few more releases.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL and HugePage

2010-10-20 Thread Greg Stark
On Wed, Oct 20, 2010 at 7:10 AM, Tom Lane  wrote:
> I believe that for the equivalent Solaris option, we just automatically
> enable it when available.  So there'd be no need for user documentation.
> However, I definitely *would* like to see some benchmarks proving that
> the change actually does something useful.  I've always harbored the
> suspicion that this is just a knob to satisfy people who need knobs to
> frob.

Well saving a few megabytes of kernel space memory isn't a bad thing.
But I think the major effect is on forking new processes. Having to
copy that page map is a major cost when you're talking about very
large memory footprints. While machine memory has gotten larger the 4k
page size hasn't. I don't think it's a big cost once all the processes
have been forked if you're reusing them beyond perhaps slightly more
efficient cache usage.
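
For anyone wanting to experiment, a minimal standalone C sketch of the mechanism under discussion, a SysV shared memory segment backed by huge pages via SHM_HUGETLB (Linux-specific; the segment size and fallback handling are assumptions, not a proposed patch):

#include <stdio.h>
#include <sys/ipc.h>
#include <sys/shm.h>

#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000       /* Linux value, in case the libc header lacks it */
#endif

int
main(void)
{
    /* size must be a multiple of the huge page size (commonly 2MB) */
    size_t      size = 64UL * 1024 * 1024;
    int         shmid = shmget(IPC_PRIVATE, size,
                               IPC_CREAT | SHM_HUGETLB | 0600);

    if (shmid < 0)
    {
        perror("shmget with SHM_HUGETLB failed; would fall back to normal pages");
        return 1;
    }
    printf("got a huge-page backed segment, id %d\n", shmid);
    shmctl(shmid, IPC_RMID, NULL);      /* clean up immediately */
    return 0;
}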


-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: extensible enums

2010-10-20 Thread David Fetter
On Tue, Oct 19, 2010 at 08:51:16PM -0400, Robert Haas wrote:
> On Tue, Oct 19, 2010 at 5:42 PM, Andrew Dunstan  wrote:
> > Well a bit more testing shows some benefit. I've sorted out a few kinks, so
> > this seems to work. In particular, with the above tables, the version
> > imported from 9.0 can have an index created in about the same time as
> > on the fresh table (identical data, but all even numbered Oids).
> >
> > Of course, with lots of odd numbered Oids, if a label gets added the
> > imported version will degrade in performance much more quickly.
> 
> I'm quite impressed by the amount of time and thought being put into
> optimizing this.  I didn't realize people cared so much about enum
> performance; but it's good that they do.
> 
> I hope to see more such efforts in other parts of the system.

Which parts of the system, in particular, do you have in mind?  Other
people from EDB have mentioned that slimming down the on-disk
representation was one such target.  What other ones would you see as
needing such attention?

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pg_upgrade patch application process, and move to /bin?

2010-10-20 Thread Bruce Momjian
Bruce Momjian wrote:
> I received a private email report yesterday from someone using
> pg_upgrade with PG 9.0 who found it took five hours for pg_upgrade to
> upgrade a database with 150k tables.  Yes, that is a lot of tables, but
> pg_upgrade should be able to do better than that.
> 
> I have modified pg_upgrade in git master to cache scandir() and reduce
> array lookups and the time is down to 38 minutes.  (He prototyped a hash
> implementation that was 30 minutes but it was too much code for my
> taste.)
> 
> I don't think this is reasonable to backpatch.   If anyone else sees
> cases for pg_upgrade improvement, please let me know.

One more question --- should I be sending pg_upgrade patches to the list
for approval?  The restructuring patch was large and didn't seem
necessary to post, and the speedups were tested by the bug reporter, so
I figured those were OK to apply.

Oh, and do we want to move pg_upgrade into /bin for 9.1?  There was
discussion about that six months ago.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pg_upgrade performance with 150k tables

2010-10-20 Thread Bruce Momjian
I received a private email report yesterday from someone using
pg_upgrade with PG 9.0 who found it took five hours for pg_upgrade to
upgrade a database with 150k tables.  Yes, that is a lot of tables, but
pg_upgrade should be able to do better than that.

I have modified pg_upgrade in git master to cache scandir() and reduce
array lookups and the time is down to 38 minutes.  (He prototyped a hash
implementation that was 30 minutes but it was too much code for my
taste.)

I don't think this is reasonable to backpatch.   If anyone else sees
cases for pg_upgrade improvement, please let me know.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_rawdump

2010-10-20 Thread Aidan Van Dyk
On Wed, Oct 20, 2010 at 1:09 PM, Tom Lane  wrote:

> [ thinks for a bit... ]  Perhaps we could stick some sort of unique
> ID into tables, which could be correlated to the same unique ID
> appearing in a metadata fork.

It would be awfully nice if the table name was in the first few bytes
of the segments ;-)

> [ thinks some more... ]  Of course, at this point you have to start
> asking why the metadata fork should be thought to be any more reliable
> than the system catalogs.

It's not.  But it's about data duplication.  If you've lost some
files, and others are found in lost+found, etc.  Or maybe you lost
only the "main" tablespace, and you have your big-honking-files from
another table-space around, etc.

The more copies of the data around, the more chance you have of being
able to get something usable out of it.

But we already have WAL archive as an external safe backup copy of *everything*.

Maybe the cost of those extra forks/duplicates is too much.  Maybe it
would be cheaper to try and parse the WAL archive, and just
specifically rebuild the system catalogs for the couple of times
people actually need this type of data?

Or maybe a query/function that ultra paranoid types can run daily,
etc, which would show the system catalog information about table
storage format that they could save safely instead of munging GB of WAL
logs when they want it...

> You can bet that there'll be somebody out there who whines because their
> app does lots and lots of repeated ALTER TABLEs, and so the metadata fork
> grows forever.  I think we'd have to be a bit smarter than this.

Well, for bad habits, we have an equally bad solution ;-)
BEGIN;
ALTER TABLE bad_design RENAME TO bad_design_too_old;
CREATE TABLE bad_design (LIKE bad_design_too_old INCLUDING DEFAULTS
INCLUDING CONSTRAINTS INCLUDING INDEXES);
INSERT INTO bad_design SELECT * FROM bad_design_too_old;
DROP TABLE bad_design_too_old;
COMMIT;

It's all about how much extra stuff we want "around" to make
forensic/reconstruction-type work easier when people can't go to the
documented, tried, tested "normal restore from backup/WAL".

None?  Or as much as possible?  And what are the tradeoffs.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)

2010-10-20 Thread Andrew Dunstan



On 10/20/2010 01:15 PM, Florian Weimer wrote:

* Terry Laurenzo:


Agreed.  BSON was born out of implementations that either lacked
arbitrary precision numbers or had a strong affinity to an
int/floating point way of thinking about numbers.  I believe that if
BSON had an arbitrary precision number type, it would be a proper
superset of JSON.

But JSON has only double-precision numbers!?


AFAICT the JSON spec says nothing at all about the precision of numbers. 
It just provides a syntax for them. We should not confuse what can be 
allowed in JSON with what can be handled by some consumers of JSON such 
as ECMAScript.


However, since we would quite reasonably require that any JSON 
implementation be able to handle arbitrary precision numbers, that 
apparently rules out BSON as a storage engine for it, since BSON cannot
handle such things.



cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)

2010-10-20 Thread Florian Weimer
* Terry Laurenzo:

> Agreed.  BSON was born out of implementations that either lacked
> arbitrary precision numbers or had a strong affinity to an
> int/floating point way of thinking about numbers.  I believe that if
> BSON had an arbitrary precision number type, it would be a proper
> superset of JSON.

But JSON has only double-precision numbers!?

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_rawdump

2010-10-20 Thread Tom Lane
Aidan Van Dyk  writes:
> On Wed, Oct 20, 2010 at 10:28 AM, Tom Lane  wrote:
>> I can see the potential usefulness of a self-documenting table storage
>> format, but this proposal isn't that; it's just an unreliable kluge.

> If we're looking to have any sort of "out of catalog" documentation of
> table storage format, what about just having a new relation fork that
> just "appends" each and every change made to the table formats,
> including ones rolled back, etc.

I thought of that too, but I'm not sure if it's going to help enough.
The trouble is that the information is only tied to the table itself
via file names.  In a fsck-recovery scenario, you may not have the
correct file names.  (One of the multiple problems with Stephen's
proposal is that the metadata would only be physically tied to the
first segment of a large table, and unless you can trust the file
names you won't be able to match it to the other segments.)

[ thinks for a bit... ]  Perhaps we could stick some sort of unique
ID into tables, which could be correlated to the same unique ID
appearing in a metadata fork.

[ thinks some more... ]  Of course, at this point you have to start
asking why the metadata fork should be thought to be any more reliable
than the system catalogs.

> Make this relation fork append only,
> and dump a completely new set of metadata to it each and every ALTER
> TABLE.

You can bet that there'll be somebody out there who whines because their
app does lots and lots of repeated ALTER TABLEs, and so the metadata fork
grows forever.  I think we'd have to be a bit smarter than this.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simplifying replication

2010-10-20 Thread Robert Treat
On Tue, Oct 19, 2010 at 11:16 AM, Greg Smith  wrote:

> Josh Berkus wrote:
>
>> It is critical that we make replication easier to set up, administrate and
>> monitor than it currently is.  In my conversations with people, this is more
>> important to our users and the adoption of PostgreSQL than synchronous
>> replication is.
>
> I find this launch into a new round of bike-shedding a bit distracting.  If
> you want this to be easier to use, which it's obvious to any observer it
> should be because what's delivered in 9.0 is way too complicated, please
> work on finding development resources to assign to that problem.  Because
> that's the bottleneck on simplifying things, not ideas about what to do.  I
> would recommend finding or assigning a developer to work on integrating base
> backup in to the streaming protocol as the biggest single thing that would
> improve the built-in replication.  All of the rest of the trivia about what
> knobs to set and such are tiny details that make for only a minor
> improvement until that's taken care of.
>
>
Yeah, I'm sure we all think it should be easier, but figuring out what that
means is certainly a moving target. The idea of being able to create a base
backup automagically sounds good, but the current procedure is comparatively
not much more difficult than what many other systems make you do, and an
automatic one, if done incorrectly, could actually be rather limiting. On the
whole the customers we are talking with are far more concerned about things
like managing failover scenarios when you have multiple slaves, and it's the
lack of capabilities around those kinds of things that hurts postgres
adoption much more than it being hard to set up.


Robert Treat
play: http://www.xzilla.net
work: http://omniti.com/is/hiring


Re: [HACKERS] pg_rawdump

2010-10-20 Thread Aidan Van Dyk
On Wed, Oct 20, 2010 at 10:28 AM, Tom Lane  wrote:
> "Stephen R. van den Berg"  writes:
>> It's just that matching table and file, and subsequently figuring out
>> some missing columns which may have been added/removed later,
>> can be rather timeconsuming and could be made a lot easier (not necessarily
>> perfect) if that information would have been present in the first page of
>> a file.
>
> So you've already moved the goalposts from what was claimed in your
> prior message.  If the data is not maintained (with 100% reliability)
> during ALTER TABLE, how are you going to do something like "figure out
> missing columns"?
>
> I can see the potential usefulness of a self-documenting table storage
> format, but this proposal isn't that; it's just an unreliable kluge.

If we're looking to have any sort of "out of catalog" documentation of
table storage format, what about just having a new relation fork that
just "appends" each and every change made to the table formats,
including ones rolled back, etc.  Make this relation fork append only,
and dump a completely new set of metadata to it each and every ALTER
TABLE.  This fork would never need to be read by PG, so a relation
fork might even be too much.  All you really need is a file you can
tie to a relation, and blindly append "data" to on create/alter
statements.

Sure, it will have more information than *needed*, but I can't see it
ever growing too big, and people doing forensics rarely complain about
having *too much* information available.



-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [PATCH] pgcrypto: Test for NULL before dereferencing pointer

2010-10-20 Thread Marti Raudsepp
Hi pgsql-hackers,

Currently contrib/pgcrypto/pgp-pubenc.c contains code like:

uint8 algo = pk->algo;
if (pk == NULL)
...

However, if pk was NULL, then the if() condition would never be
reached because the pk->algo dereference would segfault.

This patch moves the dereference to below the condition which was the
intended behavior.

Regards,
Marti
From a2500cda9e0e82883854a412ea12942e174e3dd2 Mon Sep 17 00:00:00 2001
From: Marti Raudsepp 
Date: Wed, 20 Oct 2010 18:32:17 +0300
Subject: [PATCH] pgcrypto: Test for NULL before dereferencing pointer

If pk is NULL, the backend would segfault when accessing ->algo and the
following NULL check was never reached.

This problem was found by Coccinelle (null_ref.cocci from coccicheck)
---
 contrib/pgcrypto/pgp-pubenc.c |4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/contrib/pgcrypto/pgp-pubenc.c b/contrib/pgcrypto/pgp-pubenc.c
index 4b4d1bf..943d2e4 100644
--- a/contrib/pgcrypto/pgp-pubenc.c
+++ b/contrib/pgcrypto/pgp-pubenc.c
@@ -199,7 +199,7 @@ pgp_write_pubenc_sesskey(PGP_Context *ctx, PushFilter *dst)
 	PGP_PubKey *pk = ctx->pub_key;
 	uint8		ver = 3;
 	PushFilter *pkt = NULL;
-	uint8		algo = pk->algo;
+	uint8		algo;
 
 	if (pk == NULL)
 	{
@@ -207,6 +207,8 @@ pgp_write_pubenc_sesskey(PGP_Context *ctx, PushFilter *dst)
 		return PXE_BUG;
 	}
 
+	algo = pk->algo;
+
 	/*
 	 * now write packet
 	 */
-- 
1.7.3.1
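
For readers less familiar with this class of bug, here is a minimal
standalone C sketch (hypothetical struct and function names, not the
pgcrypto code) of why the original ordering can never reach its NULL test:

#include <stdio.h>

typedef struct { int algo; } Key;	/* stand-in for the real key struct */

static int
broken(const Key *k)
{
	int		algo = k->algo;		/* dereferences k here: crashes if k == NULL */

	if (k == NULL)				/* so this check is effectively dead code */
		return -1;
	return algo;
}

static int
fixed(const Key *k)
{
	int		algo;

	if (k == NULL)				/* test first ... */
		return -1;
	algo = k->algo;				/* ... dereference only afterwards */
	return algo;
}

int
main(void)
{
	Key		k = {42};

	printf("broken(&k) = %d, fixed(&k) = %d\n", broken(&k), fixed(&k));
	printf("fixed(NULL) = %d\n", fixed(NULL));	/* broken(NULL) would segfault */
	return 0;
}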


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Creation of temporary tables on read-only standby servers

2010-10-20 Thread Tom Lane
Greg Stark  writes:
> On Tue, Oct 19, 2010 at 4:12 PM, Tom Lane  wrote:
>> I think they're more alike than you think.  If we had the infrastructure
>> to do local temp tables this way, it'd be pretty easy to use that to
>> instantiate per-backend copies of global temp tables.  (The global
>> entities would be templates, not actual tables.)

> Sure, but I think the idea was that you should be able to create
> temporary tables from whole cloth on the slave. Since they're local to
> the backend they never have to be stored on disk so logically from the
> user's point of view it seems like it should be possible.

The way I'm envisioning this working is that you instantiate temporary
child tables of all the system catalogs that are needed to describe
tables.  Being system catalogs, their schemas never change, so you don't
have a problem there.  Then you use these children to store the catalog
entries describing user temp tables.  Whether those temp tables are
instantiations of spec-style global temp tables, or our current flavor
of local temp tables, won't matter.

I think it's pointless to speculate about whether we might have divvied
up the meta-information about tables differently if we'd foreseen
wanting to do this.  It is what it is, and there is *way* too much code
depending on it, both inside the backend and in clients.  Any
reimplementation of temp tables will still have to expose largely the
same catalog information that exists for tables now.  We can probably
get away with marginal changes like redefining relfilenode, but we can't
avoid providing catalog entries that describe the schema and statistics
of a temp table.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-10-20 Thread Dimitri Fontaine
Tom Lane  writes:
> That is simply a horrid idea.  Just make it specify EXTENSION.

Black magic it is, will remove in v7.

> Is there any sane use-case for the control file to not be named the same
> as the extension?  It seems like that would accomplish little except to
> sow confusion.

The goal of the 3 variables EXTENSION, EXTVERSION, EXTCOMMENT is to
prepare the control file with 3 lines formatted variable = 'value'. If
you specify CONTROL instead, that should be the file name you're
providing directly.
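
Concretely, the control file those variables produce contains lines like
the following (extension name, version, and comment are only illustrative):

name = 'dblink'
version = '9.1'
comment = 'connect to other PostgreSQL databases'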

It then grew up into being a directive to produce the right file set for
spi, but the simplest thing would be to hand prepare the files there. If
you agree with that, that's what I'll do in v7.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-10-20 Thread Tom Lane
Dimitri Fontaine  writes:
> Tom Lane  writes:
>> I don't think that "no changes to the makefiles" is a requirement,
>> or even a wish-list item, for this.  I think it's perfectly reasonable
>> for the makefile to have to specify the module name; far better that
>> than that we get the name by some "magic" or other.

> It seemed easy to get a reasonable approach requiring very few edits in
> contribs so I favoured that. Now, it's still entirely possible to hand
> adjust. Determining the extension name automatically from DATA_built or
> DATA is only done where EXTENSION has not been provided,

That is simply a horrid idea.  Just make it specify EXTENSION.

> and guessing
> the CONTROL file name from the EXTENSION name only occurs when CONTROL
> has not been provided.

Here, on the other hand, I'm wondering why have two variables at all.
Is there any sane use-case for the control file to not be named the same
as the extension?  It seems like that would accomplish little except to
sow confusion.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-10-20 Thread Dimitri Fontaine
Tom Lane  writes:
>> and use the equivalent of SET LOCAL in the CREATE EXTENSION code?
>
> I had assumed that that was how he was doing it ...

I'm currently doing:
SetConfigOption("client_min_messages", "warning", PGC_SUSET, 
PGC_S_SESSION);

And then manually reverting to what was there before the command:
SetConfigOption("client_min_messages", old_cmsgs, PGC_SUSET, 
PGC_S_SESSION);

The thing is that CREATE EXTENSION can be part of a transaction, so even
SET LOCAL ain't going to work here, we need to reset before continuing
the transaction. I don't know that SET LOCAL is RESET after a savepoint,
so we would still need to care about that "by hand", right?

-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How to reliably detect if it's a promoting standby

2010-10-20 Thread Heikki Linnakangas

On 20.10.2010 18:06, Tatsuo Ishii wrote:
>> This returns 'false' if you're in hot standby mode running against an
>> archive. That seems wrong, I don't think the walreceiver state should
>> play any role in this.
>
> Apart from this, I wonder why walsender/walreceiver do not transfer
> archive logs as well.

What do you mean?

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-10-20 Thread Dimitri Fontaine
Tom Lane  writes:
> I don't think that "no changes to the makefiles" is a requirement,
> or even a wish-list item, for this.  I think it's perfectly reasonable
> for the makefile to have to specify the module name; far better that
> than that we get the name by some "magic" or other.

It seemed easy to get a reasonable approach requiring very few edits in
contribs so I favoured that. Now, it's still entirely possible to hand
adjust. Determining the extension name automatically from DATA_built or
DATA is only done where EXTENSION has not been provided, and guessing
the CONTROL file name from the EXTENSION name only occurs when CONTROL
has not been provided.

Of course if those changes (inlined there after) are seen as a bad idea,
then I will change all contrib Makefiles to add EXTENSION, EXTVERSION
(which always is MAJORVERSION here) and CONTROL (which almost always is
EXTENSION.control).

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

# create extension support
ifndef CONTROL
ifndef EXTENSION
ifdef DATA_built
EXTENSION = $(basename $(notdir $(firstword $(DATA_built))))
else ifdef DATA
EXTENSION = $(basename $(notdir $(firstword $(DATA))))
endif # DATA_built
endif # EXTENSION
ifndef EXTVERSION
EXTVERSION = $(MAJORVERSION)
endif
ifdef EXTENSION
CONTROL = $(EXTENSION).control
endif # EXTENSION
endif # CONTROL

control:
# create .control to keep track that we created the control file(s)
	@for file in $(CONTROL); do \
	  test -f `basename $$file .control`.sql -a ! -f $$file && touch .control || true ; \
	  if [ -f .control ]; then \
	    if [ -n "$(EXTENSION)" ]; then \
	      (echo "name = '$(EXTENSION)'"; echo "version = '$(EXTVERSION)'") > $$file ; \
	    else \
	      (echo "name = '`basename $$file .control`'"; echo "version = '$(EXTVERSION)'") > $$file ; \
	    fi ; \
	    if [ -n "$(EXTCOMMENT)" ]; then echo "comment = '$(EXTCOMMENT)'" >> $$file ; fi ; \
	  fi ; \
	done

install: all installdirs control
ifneq (,$(DATA)$(DATA_built)$(CONTROL))
	@for file in $(addprefix $(srcdir)/, $(DATA)) $(DATA_built) $(CONTROL); do \
	  echo "$(INSTALL_DATA) $$file '$(DESTDIR)$(datadir)/$(datamoduledir)'"; \
	  $(INSTALL_DATA) $$file '$(DESTDIR)$(datadir)/$(datamoduledir)'; \
	done
endif # DATA

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Serializable snapshot isolation patch

2010-10-20 Thread Kevin Grittner
Robert Haas  wrote:
> On Tue, Oct 19, 2010 at 6:28 PM, Kevin Grittner
>  wrote:
>> One thing that would work, but I really don't think I like it, is
>> that a request for a snapshot for such a transaction would not
>> only block until it could get a "clean" snapshot (no overlapping
>> serializable non-read-only transactions which overlap
>> serializable transactions which wrote data and then committed in
>> time to be visible to the snapshot being acquired), but it would
>> *also* block *other* serializable transactions, if they were
>> non-read-only, on an attempt to acquire a snapshot.
> 
> This seems pretty close to guaranteeing serializability by running
> transactions one at a time (i.e. I don't think it's likely to be
> acceptable from a performance standpoint).
 
It absolutely makes no sense except for long-running read-only
transactions, and would only be used when explicitly requested; and
like I said, I really don't think I like it even on that basis --
just putting it out there as the only alternative I've found so far
to either tolerating possible serialization anomalies in pg_dump
output (albeit only when compared to the state the database reached
after the dump's snapshot) or waiting indefinitely for a clean
snapshot to become available.
 
FWIW from a brainstorming perspective, while waiting for problem
transactions to clear so we could get a clean snapshot for the dump
I think it would work even better to block the *commit* of
serializable transactions which *had done* writes than to block
snapshot acquisition for serializable transactions which were not
read-only.  Still pretty icky, though.  I am loathe to compromise
the "no new blocking" promise of SSI.
 
[thinks]
 
Actually, maybe we can reduce the probability of needing to retry
at each iteration of the non-blocking alternative by checking the
conflict information for the problem transactions after they commit.
Any transaction which didn't *actually* generate a read-write
conflict out to a transaction visible to the dump's candidate
snapshot could not cause an anomaly.  If none of the problem
transactions actually generates a rw-conflict we can use the
candidate snapshot.  Adding that logic to the non-blocking
alternative might actually work pretty well.
 
There might be some workloads where conflicts would be repeatedly
generated, but there would be a lot where they wouldn't.  If we add
a switch to pg_dump to allow users to choose, I think this algorithm
works.  It never affects a transaction unless it has explicitly
requested SERIALIZABLE READ ONLY DEFERRABLE, and the only impact is
that startup may be deferred until a snapshot can be acquired which
ensures serializable behavior without worrying about SIRead locks.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How to reliably detect if it's a promoting standby

2010-10-20 Thread Tatsuo Ishii
> This returns 'false' if you're in hot standby mode running against an
> archive. That seems wrong, I don't think the walreceiver state should
> play any role in this.

Apart from this, I wonder why walsender/walreceiver do not transfer
archive logs as well.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] max_wal_senders must die

2010-10-20 Thread Robert Haas
On Wed, Oct 20, 2010 at 10:53 AM, Alvaro Herrera
 wrote:
> Excerpts from Robert Haas's message of mié oct 20 10:29:04 -0300 2010:
>
>> Actually, I think the best thing for default_statistics_target might
>> be to scale the target based on the number of rows in the table, e.g.
>> given N rows:
>>
>> 10 + (N / 1000), if N < 40,000
>> 46 + (N / 10000), if 50,000 < N < 3,540,000
>> 400, if N > 3,540,000
>>
>> Consider a table with 2,000 rows.  With default_statistics_target =
>> 100, we can store up to 100 MCVs; and we break the remaining ~1900
>> values up into 100 buckets with 19 values/bucket.
>
> Maybe what should be done about this is to have separate sizes for the
> MCV list and the histogram, where the MCV list is automatically sized
> during ANALYZE.

I thought about that, but I'm not sure there's any particular
advantage.  Automatically scaling the histogram seems just as useful
as automatically scaling the MCV list - both things will tend to
reduce the estimation error.   For a table with 2,000,000 rows,
automatically setting the statistics target from 100 to the value that
would be computed by the above formula, which happens to be 246, will
help the 101th-246th most common values, because they will now be
MCVs.   It will also help all the remaining values, both because
you've pulled 146 fairly common values out of the histogram buckets
and also because each bucket now contains ~8130 values rather than
~20,000.
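
To make the arithmetic concrete, here is a throwaway C sketch of the
scaling rule quoted above (the function and test program are
illustrative, not backend code; the quoted ranges leave a small gap
between 40,000 and 50,000 rows, which the sketch papers over by
switching branches at 40,000, where both expressions yield 50):

#include <stdio.h>

/* Proposed size-dependent statistics target (sketch). */
static int
scaled_target(double ntuples)
{
	if (ntuples < 40000.0)
		return 10 + (int) (ntuples / 1000.0);
	else if (ntuples < 3540000.0)
		return 46 + (int) (ntuples / 10000.0);
	else
		return 400;
}

int
main(void)
{
	double		sizes[] = {2000.0, 40000.0, 2000000.0, 10000000.0};
	int			i;

	for (i = 0; i < 4; i++)
		printf("%10.0f rows -> target %d\n", sizes[i], scaled_target(sizes[i]));
	/* 2,000,000 rows yields 246, matching the example above. */
	return 0;
}

That keeps small tables cheap to ANALYZE while still capping the target
at 400 for very large ones.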

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-10-20 Thread Tom Lane
Alvaro Herrera  writes:
> Excerpts from Dimitri Fontaine's message of mié oct 20 07:22:53 -0300 2010:
>> Using SPI to execute the extension's script already means that it can
>> not contain explicit BEGIN and COMMIT commands. Now, is it possible to
>> force a Reset of all GUCs after script's execution?

> Would it work to force a new transaction internally in CREATE EXTENSION,

No, but maybe a savepoint?

> and use the equivalent of SET LOCAL in the CREATE EXTENSION code?

I had assumed that that was how he was doing it ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] max_wal_senders must die

2010-10-20 Thread Alvaro Herrera
Excerpts from Robert Haas's message of mié oct 20 10:29:04 -0300 2010:

> Actually, I think the best thing for default_statistics_target might
> be to scale the target based on the number of rows in the table, e.g.
> given N rows:
> 
> 10 + (N / 1000), if N < 40,000
> 46 + (N / 10000), if 50,000 < N < 3,540,000
> 400, if N > 3,540,000
> 
> Consider a table with 2,000 rows.  With default_statistics_target =
> 100, we can store up to 100 MCVs; and we break the remaining ~1900
> values up into 100 buckets with 19 values/bucket.

Maybe what should be done about this is to have separate sizes for the
MCV list and the histogram, where the MCV list is automatically sized
during ANALYZE.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How to reliably detect if it's a promoting standby

2010-10-20 Thread Robert Haas
On Wed, Oct 20, 2010 at 10:35 AM, Heikki Linnakangas
 wrote:
> On 20.10.2010 17:32, Tatsuo Ishii wrote:

 pg_is_in_recovery() returns a bool, are you proposing to change that?
>>>
>>> No. I just thought about adding more condition when it returns true.
>>
>> Here is the patch. Comments are welcome!
>> ...
>>   Datum
>>   pg_is_in_recovery(PG_FUNCTION_ARGS)
>>   {
>> !       /* use volatile pointer to prevent code rearrangement */
>> !       volatile WalRcvData *walrcv = WalRcv;
>> !
>> !       PG_RETURN_BOOL(RecoveryInProgress()&&  walrcv->walRcvState ==
>> WALRCV_RUNNING);
>>   }
>>
>
> This returns 'false' if you're in hot standby mode running against an
> archive. That seems wrong, I don't think the walreceiver state should play
> any role in this.

I think what we need is a second function, not a change to the
definition of this one.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] leaky views, yet again

2010-10-20 Thread Robert Haas
On Wed, Oct 20, 2010 at 10:00 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> I get the impression that you think that there's a problem not only
>> with the approach but with any approach whatsoever to that underlying
>> problem.
>
> Let's just say that the approaches proposed so far have performance
> and/or functionality and/or code maintenance penalties that are utterly
> unacceptable from the standpoint of people who don't need RLS.  I don't
> know if there is a workable solution, but I do know I've not seen one.
>
>> With respect to selectivity estimation, do we have a live bug there
>> now?
>
> No, I don't believe so.  Given that you'd like to get the planner to
> call function XYZ, you could create an operator using XYZ and attach to
> it one of the estimation functions that will actually call the
> underlying function --- but you have to have call permission on the
> function in order to create the operator.

But suppose the operator already exists, but I don't have permission
to call the underlying function... can I get the planner to call the
function for me by EXPLAIN-ing the right query?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Domains versus arrays versus typmods

2010-10-20 Thread Robert Haas
On Wed, Oct 20, 2010 at 10:03 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Tue, Oct 19, 2010 at 9:17 PM, Tom Lane  wrote:
>>> We've already accepted the cost of doing getBaseTypeAndTypmod() in a
>>> whole lot of performance-critical parsing paths, on the off chance that
>>> the target datatype might be a domain.  It's not apparent to me that
>>> array subscripting is so important as to deserve an exemption from that.
>>> Especially when not doing so doesn't work.
>
>> Hmm... so are there no cases where zeroing out the typelem will cost
>> us an otherwise-unnecessary syscache lookup?
>
> My point is that anyplace that is relying on the surface typelem,
> without drilling down to see what the base type is, is wrong.
> So yeah, those lookups are (will be) necessary.

OK.  In that case, +1 from me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-10-20 Thread Tom Lane
Dimitri Fontaine  writes:
> Tom Lane  writes:
>> If the extensions manager is dependent on the assumption that a module's
>> name matches the name of the directory it's built in

> It is not. There's some magic for simple cases so that contrib mostly
> "works" with no editing, but of course, that's only mostly.

> The version Itakagi-San worked with had not a single change to the
> contrib sources,

I don't think that "no changes to the makefiles" is a requirement,
or even a wish-list item, for this.  I think it's perfectly reasonable
for the makefile to have to specify the module name; far better that
than that we get the name by some "magic" or other.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-10-20 Thread Alvaro Herrera
Excerpts from Dimitri Fontaine's message of mié oct 20 07:22:53 -0300 2010:
> Itagaki Takahiro  writes:

> >  CREATE EXTENSION command 
> > * Environment could be modified by the installer script.
> > =# SHOW search_path; => "$user",public
> > =# CREATE EXTENSION dblink;
> > =# SHOW search_path; => public
> > because almost all of the modules have SET search_path in the scripts:
> >   -- Adjust this setting to control where the objects get created.
> >   SET search_path = public;
> >
> > Is is an intended behavior? Personally, I want the installer to run in 
> > sandbox.
> > One idea is to rewrite module scripts to use BEGIN - SET LOCAL - COMMIT,
> > but we cannot execute CREATE EXTENSION in transaction if do so.
> 
> Using SPI to execute the extension's script already means that it can
> not contain explicit BEGIN and COMMIT commands. Now, is it possible to
> force a Reset of all GUCs after script's execution?

Would it work to force a new transaction internally in CREATE EXTENSION,
and use the equivalent of SET LOCAL in the CREATE EXTENSION code?

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_rawdump

2010-10-20 Thread Stephen R. van den Berg
Tom Lane wrote:
>"Stephen R. van den Berg"  writes:
>> It's just that matching table and file, and subsequently figuring out
>> some missing columns which may have been added/removed later,
>> can be rather timeconsuming and could be made a lot easier (not necessarily
>> perfect) if that information would have been present in the first page of
>> a file.

>So you've already moved the goalposts from what was claimed in your
>prior message.  If the data is not maintained (with 100% reliability)
>during ALTER TABLE, how are you going to do something like "figure out
>missing columns"?

Most alter table operations are well thought through and rarely undone
(at least not on production databases).  This means that most tables
can be restored.

>I can see the potential usefulness of a self-documenting table storage
>format, but this proposal isn't that; it's just an unreliable kluge.

Restoring tables/databases from table storage only is, by definition,
an unreliable kludge.  I'm not opposed to making the definition storage
more robust, but, since the records in the table already have lost their
relation to the pg_clog records, and therefore it *already* is uncertain
which records were deleted and/or have the wrong number of columns, it
seems to be a needless waste of time and energy to provide more reliable
information about the column structure.

I know for a fact that those who have lost data in such a way, and are
faced with the option to have this "unreliable kludgy information"
available now, or wait for a few years/months until a reliable solution
is present; they would (in every single case) opt for the former and
get at least some (if not all) of their data back in a shorter amount
of time.
-- 
Stephen.

Life is that brief interlude between nothingness and eternity.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] max_wal_senders must die

2010-10-20 Thread Tom Lane
Heikki Linnakangas  writes:
> Going from wal_level='minimal' to 'archive' incurs the penalty on
> WAL-logging COPY etc. That's a big penalty. However, the difference 
> between wal_level='archive' and wal_level='hot_standby' should be tiny.

I'm not sure I believe that either, because of the costs associated with
logging lock acquisitions.

We really need some actual benchmarks in this area, rather than
handwaving ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How to reliably detect if it's a promoting standby

2010-10-20 Thread Heikki Linnakangas

On 20.10.2010 17:32, Tatsuo Ishii wrote:
>>> pg_is_in_recovery() returns a bool, are you proposing to change that?
>>
>> No. I just thought about adding more condition when it returns true.
>
> Here is the patch. Comments are welcome!
> ...
>   Datum
>   pg_is_in_recovery(PG_FUNCTION_ARGS)
>   {
> !   /* use volatile pointer to prevent code rearrangement */
> !   volatile WalRcvData *walrcv = WalRcv;
> !
> !   PG_RETURN_BOOL(RecoveryInProgress() && walrcv->walRcvState ==
> WALRCV_RUNNING);
>   }

This returns 'false' if you're in hot standby mode running against an
archive. That seems wrong, I don't think the walreceiver state should
play any role in this.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] max_wal_senders must die

2010-10-20 Thread Heikki Linnakangas

On 20.10.2010 17:19, Tom Lane wrote:
> Greg Smith  writes:
>> Josh Berkus wrote:
>>> Well, now that you mention it, I also think that "hot standby" should be
>>> the default.  Yes, I know about the overhead, but I also think that the
>>> number of our users who want easy replication *far* outnumber the users
>>> who care about an extra 10% WAL overhead.
>
>> ... But much like
>> default_statistics_target, there needs to be some more formal work done
>> on quantifying just how bad each of these overheads really are first.
>
> Quite.  Josh, have you got any evidence showing that the penalty is
> only 10%?  There are cases, such as COPY and ALTER TABLE, where
> you'd be looking at 2X or worse penalties, because of the existing
> optimizations that avoid writing WAL at all for operations where a
> single final fsync can serve the purpose.  I'm not sure what the
> penalty for "typical" workloads is, partly because I'm not sure what
> should be considered a "typical" workload for this purpose.

Going from wal_level='minimal' to 'archive' incurs the penalty on
WAL-logging COPY etc. That's a big penalty. However, the difference
between wal_level='archive' and wal_level='hot_standby' should be tiny.


The big reason for separating those two in 9.0 was that it's all new 
code with new ways to fail and, yes, new bugs. It's not smart to expose 
people who are not interested in using hot standby to those issues. But 
maybe we feel more comfortable merging 'archive' and 'hot_standby' 
levels in 9.1.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL and HugePage

2010-10-20 Thread Kenneth Marshall
On Wed, Oct 20, 2010 at 10:10:00AM -0400, Tom Lane wrote:
> Robert Haas  writes:
> > On Tue, Oct 19, 2010 at 11:30 PM, daveg  wrote:
> >> On Wed, Oct 20, 2010 at 04:08:37PM +1300, Mark Kirkwood wrote:
> >>> Heh - provided you specify
> >>> SHM_HUGETLB
> >>> in the relevant call that is :-)
> 
> >> I had a patch for this against 8.3 that I could update if there is any
> >> interest. I suspect it is helpful.
> 
> > I think it would be a good feature.  Of course, we would need
> > appropriate documentation, and some benchmarks showing that it really
> > works.
> 
> I believe that for the equivalent Solaris option, we just automatically
> enable it when available.  So there'd be no need for user documentation.
> However, I definitely *would* like to see some benchmarks proving that
> the change actually does something useful.  I've always harbored the
> suspicion that this is just a knob to satisfy people who need knobs to
> frob.
> 
>   regards, tom lane
> 

Oracle apparently uses hugepages if they are available by first trying
with the SHM_HUGETLB option. If it fails, they reissue the command
without that option. This article does mention some of the benefits
of the larger pagesizes with large shared memory regions:

http://appcrawler.com/wordpress/?p=686
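
The retry-without-the-flag pattern described there is easy to sketch in
C; the segment size, permissions, and the Linux-specific SHM_HUGETLB
flag below are illustrative, and the first call only succeeds if huge
pages have been reserved (vm.nr_hugepages) and the size is a multiple
of the huge page size:

#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <sys/ipc.h>
#include <sys/shm.h>

#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000			/* Linux-specific; absent elsewhere */
#endif

int
main(void)
{
	size_t		size = 64 * 1024 * 1024;	/* multiple of a typical 2MB huge page */
	int			flags = IPC_CREAT | IPC_EXCL | 0600;
	int			shmid = shmget(IPC_PRIVATE, size, flags | SHM_HUGETLB);

	if (shmid < 0)
	{
		/* fall back to ordinary pages, as described above */
		fprintf(stderr, "huge pages unavailable (%s), retrying without SHM_HUGETLB\n",
				strerror(errno));
		shmid = shmget(IPC_PRIVATE, size, flags);
	}
	if (shmid < 0)
	{
		perror("shmget");
		return 1;
	}
	printf("got shared memory segment id %d\n", shmid);
	shmctl(shmid, IPC_RMID, NULL);		/* clean up the demo segment */
	return 0;
}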

Regards,
Ken

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-10-20 Thread Tom Lane
Robert Haas  writes:
> On Wed, Oct 20, 2010 at 6:22 AM, Dimitri Fontaine
>  wrote:
>> In v6 patch, should client_min_messages or log_min_messages be lower
>> than WARNING, they get set to WARNING for the script install context. We
>> still dump the extension's script at each WARNING, but you can set your
>> client_min_messages (and log_min_messages) to ERROR before hand.

> I would vote for overriding client_min_messages but not log_min_messages.

Why?  The problem with unreasonably bulky messages is just as
objectionable for the log.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How to reliably detect if it's a promoting standby

2010-10-20 Thread Tatsuo Ishii
>> pg_is_in_recovery() returns a bool, are you proposing to change that?
> 
> No. I just thought about adding more condition when it returns true.

Here is the patch. Comments are welcome!

*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***
*** 5604,5610  GetLatestXTime(void)
  Datum
  pg_is_in_recovery(PG_FUNCTION_ARGS)
  {
!   PG_RETURN_BOOL(RecoveryInProgress());
  }
  
  /*
--- 5604,5613 
  Datum
  pg_is_in_recovery(PG_FUNCTION_ARGS)
  {
!   /* use volatile pointer to prevent code rearrangement */
!   volatile WalRcvData *walrcv = WalRcv;
! 
!   PG_RETURN_BOOL(RecoveryInProgress() && walrcv->walRcvState == 
WALRCV_RUNNING);
  }
  
  /*

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_rawdump

2010-10-20 Thread Tom Lane
"Stephen R. van den Berg"  writes:
> It's just that matching table and file, and subsequently figuring out
> some missing columns which may have been added/removed later,
> can be rather timeconsuming and could be made a lot easier (not necessarily
> perfect) if that information would have been present in the first page of
> a file.

So you've already moved the goalposts from what was claimed in your
prior message.  If the data is not maintained (with 100% reliability)
during ALTER TABLE, how are you going to do something like "figure out
missing columns"?

I can see the potential usefulness of a self-documenting table storage
format, but this proposal isn't that; it's just an unreliable kluge.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-10-20 Thread Dimitri Fontaine
Tom Lane  writes:
> If the extensions manager is dependent on the assumption that a module's
> name matches the name of the directory it's built in

It is not. There's some magic for simple cases so that contrib mostly
"works" with no editing, but of course, that's only mostly.

The version Itakagi-San worked with had not a single change to the
contrib sources, I've only begun to change things there (in v6) with the
spi case, that now produces 5 extensions control files out of a single
Makefile, thanks to this single new line:

CONTROL = $(addsuffix .control, $(MODULES))

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] max_wal_senders must die

2010-10-20 Thread Tom Lane
Greg Smith  writes:
> Josh Berkus wrote:
>> Well, now that you mention it, I also think that "hot standby" should be
>> the default.  Yes, I know about the overhead, but I also think that the
>> number of our users who want easy replication *far* outnumber the users
>> who care about an extra 10% WAL overhead.

> ... But much like 
> default_statistics_target, there needs to be some more formal work done 
> on quantifying just how bad each of these overheads really are first.  

Quite.  Josh, have you got any evidence showing that the penalty is
only 10%?  There are cases, such as COPY and ALTER TABLE, where
you'd be looking at 2X or worse penalties, because of the existing
optimizations that avoid writing WAL at all for operations where a
single final fsync can serve the purpose.  I'm not sure what the
penalty for "typical" workloads is, partly because I'm not sure what
should be considered a "typical" workload for this purpose.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-10-20 Thread Tom Lane
Itagaki Takahiro  writes:
> On Wed, Oct 20, 2010 at 12:58 PM, Alvaro Herrera
>  wrote:
>> Lets rename the directory.

> Hmmm, but we call it 'xml2' in the doc. There is no 'pgxml' at all in it.
> http://developer.postgresql.org/pgdocs/postgres/xml2.html

> However, I don't think we can change the module name because pg_upgrade
> will fail if the module (.so) name was changed. So, it might be the
> point of compromise to keep two names until we deprecate it completely.

If the extensions manager is dependent on the assumption that a module's
name matches the name of the directory it's built in, that assumption
needs to be removed anyway.  There are too many use-cases where that
wouldn't hold, even if we try to force the standard contrib modules to
follow such a rule.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

