Re: [PATCH] Exorcise "zero-dimensional" arrays (Was: Re: [HACKERS] Should array_length() Return NULL)
On 2013-04-03 20:58, Gavin Flower wrote: On 04/04/13 05:36, David E. Wheeler wrote: On Apr 3, 2013, at 9:30 AM, Tom Lane wrote: Fortran ... Basic ... actually I'd have thought that zero was a minority position. Fashions change I guess. I say we turn the default lower bound up to 11. David In keeping with the level of irrationality in this thread, maybe we should set it to an irrational number like the square root of 2, or transcend ourselves and make it a transcendental number like pi! :-) I suppose using the square root of minus one would be considered too imaginative??? :-) Nah, that would make arrays have 2 dimensions as a minimum... :-) Cheers, Gavin -- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review]
On Thursday, April 04, 2013 2:52 AM Robert Haas wrote: > On Wed, Apr 3, 2013 at 2:54 PM, Tom Lane wrote: > > Robert Haas writes: > >> On Tue, Apr 2, 2013 at 12:19 PM, Peter Eisentraut > wrote: > >>> It's weird that SET LOCAL and SET SESSION actually *set* the value, > and > >>> the second key word determines how long the setting will last. SET > >>> PERSISTENT doesn't actually set the value. I predict that this > will be > >>> a new favorite help-it-doesn't-work FAQ. > > > >> I think this is another argument against this particular syntax. I > >> have always thought that something along the lines of ALTER SYSTEM > >> would be more appropriate. ALTER DATABASE .. SET and ALTER ROLE .. > >> SET don't change the value immediately either, and nobody gets > >> confused about that to my knowledge. But I can see where SET > >> PERSISTENT could cause that sort of confusion. > > > > Yeah, I think I argued for using the SET syntax to start with, but > > I'm coming around to the position that SET PERSISTENT is too much > > unlike the behavior of other varieties of SET. ALTER is sounding > > more attractive to me now. Not sure about "ALTER SYSTEM" in > particular > > though --- it's not clear that that has any real merit other than > > already existing as a keyword. (Not that that's negligible.) > > ALTER CONFIGURATION is another alternative using an existing keyword > > that might be worth considering. > > Yeah, I thought about something like that. Aside from saving on > keywords, the reason I like ALTER SYSTEM or similar is that I suspect > there will be other system-wide things that we may want to let people > ALTER in the future, so I think that route might avoid an unnecessary > proliferation of top-level commands. I am not, however, deadly > attached to the idea, if someone's got a good reason for preferring > something else. I think a second parameter in the SET command telling the scope should be fine. As far as I can see, Oracle also has similar syntax for its ALTER SYSTEM command (Alter System Scope [Memory|Spfile|Both]). Description in short: SPFILE indicates that the change is made in the server parameter file. The new setting takes effect when the database is next shut down and started up again. MEMORY indicates that the change is made in memory, takes effect immediately, and persists until the database is shut down. The only reason to show the above example is that a second parameter specifying the scope exists in other databases as well. However, if you are not convinced by the above reasoning, then the ALTER syntax could be as follows: ALTER SYSTEM SET configuration_parameter {TO | =} {value | 'value'}; With Regards, Amit Kapila. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Page replacement algorithm in buffer cache
On Thursday, April 04, 2013 7:19 AM Greg Smith wrote: > On 4/2/13 11:54 AM, Robert Haas wrote: > > But, having said that, I still think the best idea is what Andres > > proposed, which pretty much matches my own thoughts: the bgwriter > > needs to populate the free list, so that buffer allocations don't > have > > to wait for linear scans of the buffer array. > > I was hoping this one would make it to a full six years of being on the > TODO list before it came up again, missed it by a few weeks. The > funniest part is that Amit even submitted a patch on this theme a few > months ago without much feedback: > http://www.postgresql.org/message-id/6C0B27F7206C9E4CA54AE035729E9C382852FF97@szxeml509-mbs > That stalled where a few things have, on a) needing more regression > test workloads, and b) wondering just what the deal with large > shared_buffers setting degrading performance was. For "b)", below are links where performance decreased due to large shared buffers. http://www.postgresql.org/message-id/attachment/27489/Results.htm http://www.postgresql.org/message-id/6C0B27F7206C9E4CA54AE035729E9C38285442C5@szxeml509-mbx As per my observation, it occurs when I/O starts. The dip could be due to fluctuation, OS scheduling, or eviction of dirty pages sooner than would otherwise happen. I think further investigation would be more meaningful if the results could be reproduced by someone other than me. One way to proceed along this line could be to start with this patch and then, based on the results, do further experiments to make it more useful. With Regards, Amit Kapila. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Drastic performance loss in assert-enabled build in HEAD
2013/4/3 Tom Lane : > Kevin Grittner writes: > >> To be honest, I don't think I've personally seen a single use case >> for matviews where they could be used if you couldn't count on an >> error if attempting to use them without the contents reflecting a >> materialization of the associated query at *some* point in time. > > Well, if we remove the WITH NO DATA clause from CREATE MATERIALIZED > VIEW, that minimum requirement is satisfied no? An argument against that is that computing the contents may be very expensive. > Granting that throwing an error is actually of some use to some people, > I would not think that people would want to turn it on via a command > that throws away the existing view contents altogether, nor turn it off > with a full-throated REFRESH. There are going to need to be ways to > incrementally update matviews, and ways to disable/enable access that > are not tied to a complete rebuild, not to mention being based on > user-determined rather than hard-wired criteria for what's too stale. > So I don't think this is a useful base to build on. Am I correct when I think that you are saying here, that the “zero pages == unscannable” logic is not very future-proof? In that case I concur, and I also think that this knowledge leaks in way too many other places (the VACUUM bug mentioned by Kevin is a good example). > If you feel that scannability disable is an absolute must for version 0, > let's invent a matview reloption or some such to implement it and let > users turn it on and off as they wish. That seems a lot more likely > to still be useful two years from now. (In the context of making an unlogged matview unscannable after a crash:) Is it imaginable that such a reloption could (in a future implementation) be changed during or right after crash recovery? For example, by storing the set of “truncated by crash recovery” relations in a shared catalog table, which is then inspected when connecting to a database to continue the truncation (in the case of a matview by making it unscannable)? > And if you're absolutely convinced that unlogged matviews mustn't work as I > suggest, we can lose those from 9.3, too. +1. Having unlogged matviews without having incremental updates yet, isn’t super useful anyway. > What I'd actually rather see us spending time on right now is making > some provision for incremental updates, which I will boldly propose > could be supported by user-written triggers on the underlying tables > if we only diked out the prohibitions against INSERT/UPDATE/DELETE on > matviews, and allowed them to operate on a matview's contents just like > it was a table. Now admittedly that would foreclose allowing matviews > to be updatable in the updatable-view sense, but that's a feature I > would readily give up if it meant users could build incremental update > mechanisms this year and not two years down the road. Please make the syntax for updating the “extent” (physical representation) of a matview different from updating the view’s logical contents. Examples: (1) Require to use a special function to update the extent: SELECT pg_mv_maintain('INSERT INTO example_matview ...'); While parsing the INSERT, the parser would know that it must interpret “example_matview” as the matview’s extent; As currently the extent and the view are the same, nothing must be done except for only allowing the INSERT when it is parsed in the context of pg_mv_maintain, and otherwise saying that matviews aren’t updatable yet (“NOTICE: did you mean to update the extent? 
in that case use pg_mv_maintain”). (2) Use a different schema (cf. TOAST) for the extent, e.g., view “public.example_matview” vs. extent “pg_mv_extent.example_matview”. I imagine future implementations to possibly require multiple extents anyway, e.g., for storing the “not yet applied changesets” or other intermediate things. > Why exactly do you think that what I'm suggesting would cause a dump and > reload to not regenerate the data? Expensiveness: Matviews are used in cases where the data is expensive to compute. > We *need* to get rid of that aspect of things. If you must have > scannability state in version 0, okay, but it has to be a catalog property > not this. +1 Nicolas -- A. Because it breaks the logical sequence of discussion. Q. Why is top posting bad? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CREATE EXTENSION BLOCKS
Hi, I thought we were more specific about the extension object itself not living in a schema in our documentation, but I agree we still have room for progress here. "David E. Wheeler" writes: > +Note that only the extension objects will be placed into the named > +schema; the extension itself is a database-global object. I think you're patching the right place, but I'm not sure about the term "database-global object", which I can't find by grepping in sgml/ref. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] corrupt pages detected by enabling checksums
On 4 April 2013 02:39, Andres Freund wrote: > Ok, I think I see the bug. And I think its been introduced in the > checkpoints patch. Well spotted. (I think you mean checksums patch). > If by now the first backend has proceeded to PageSetLSN() we are writing > different data to disk than the one we computed the checksum of > before. Boom. Right, so nothing else we were doing was wrong, that's why we couldn't spot a bug. The problem is that we aren't replaying enough WAL because the checksum on the WAL record is broken. > I think the whole locking interactions in MarkBufferDirtyHint() need to > be thought over pretty carefully. When we write out a buffer with checksums enabled, we take a copy of the buffer so that the checksum is consistent, even while other backends may be writing hints to the same buffer. I missed out on doing that with XLOG_HINT records, so the WAL CRC can be incorrect because the data is scanned twice; normally that would be OK because we have an exclusive lock on the block, but with hints we only have share lock. So what we need to do is take a copy of the buffer before we do XLogInsert(). Simple patch to do this attached for discussion. (Not tested). We might also do this by modifying the WAL record to take the whole block and bypass the BkpBlock mechanism entirely. But that's more work and doesn't seem like it would be any cleaner. I figure let's solve the problem first, then discuss which approach is best. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services copy_before_XLOG_HINT.v1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Page replacement algorithm in buffer cache
On Wed, Apr 3, 2013 at 9:49 PM, Greg Smith wrote: > On 4/2/13 11:54 AM, Robert Haas wrote: >> But, having said that, I still think the best idea is what Andres >> proposed, which pretty much matches my own thoughts: the bgwriter >> needs to populate the free list, so that buffer allocations don't have >> to wait for linear scans of the buffer array. > > I was hoping this one would make it to a full six years of being on the TODO > list before it came up again, missed it by a few weeks. The funniest part > is that Amit even submitted a patch on this theme a few months ago without > much feedback: > http://www.postgresql.org/message-id/6C0B27F7206C9E4CA54AE035729E9C382852FF97@szxeml509-mbs > That stalled where a few things have, on a) needing more regression test > workloads, and b) wondering just what the deal with large shared_buffers > setting degrading performance was. Those are impressive results. I think we should seriously consider doing something like that for 9.4. TBH, although more workloads to test is always better, I don't think this problem is so difficult that we can't have some confidence in a theoretical analysis. If I read the original thread correctly (and I haven't looked at the patch itself), the proposed patch would actually invalidate buffers before putting them on the freelist. That effectively amounts to reducing shared_buffers, so workloads that are just on the edge of what can fit in shared_buffers will be harmed, and those that benefit incrementally from increased shared_buffers will be as well. What I think we should do instead is collect the buffers that we think are evictable and stuff them onto the freelist without invalidating them. When a backend allocates from the freelist, it can double-check that the buffer still has usage_count 0. The odds should be pretty good. But even if we sometimes notice that the buffer has been touched again after being put on the freelist, we haven't expended all that much extra effort, and that effort happened mostly in the background. Consider a scenario where only 10% of the buffers have usage count 0 (which is not unrealistic). We scan 5000 buffers and put 500 on the freelist. Now suppose that, due to some accident of the workload, 75% of those buffers get touched again before they're allocated off the freelist (which I believe to be a pessimistic estimate for most workloads). Now, that means that only 125 of those 500 buffers will succeed in satisfying an allocation request. That's still a huge win, because it means that each backend only has to examine an average of 4 buffers before it finds one to allocate. If it had needed to do the freelist scan itself, it would have had to touch 40 buffers before finding one to allocate. In real life, I think the gains are apt to be, if anything, larger. IME, it's common for most or all of the buffer pool to be pinned at usage count 5. So you could easily have a situation where the arena scan has to visit millions of buffers to find one to allocate. If that's happening in the background instead of the foreground, it's a huge win. Also, note that there's nothing to prevent the arena scan from happening in parallel with allocations off of the freelist - so while foreground processes are emptying the freelist, the background process can be looking for more things to add to it. ...
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
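To make the two-step scheme described above concrete, here is a rough sketch (not from any posted patch): the background writer pushes apparently-evictable buffers onto the freelist without invalidating them, and backends double-check usage_count when allocating. The names BgWriterFillFreeList, GetVictimBuffer, PushFreeList, PopFreeList and next_victim are invented for illustration, and locking/partitioning details of the real freelist.c are glossed over.

static int	next_victim = 0;

/*
 * Sketch only: the background writer scans a quota of buffer headers and
 * queues the ones that look evictable, WITHOUT invalidating them.
 */
static void
BgWriterFillFreeList(int scan_quota)
{
	while (scan_quota-- > 0)
	{
		volatile BufferDesc *buf = &BufferDescriptors[next_victim++ % NBuffers];

		LockBufHdr(buf);
		if (buf->refcount == 0 && buf->usage_count == 0)
			PushFreeList(buf);		/* remember it; do NOT invalidate it */
		UnlockBufHdr(buf);
	}
}

/*
 * Sketch only: a backend takes a candidate off the freelist and re-checks
 * that it is still evictable before reusing it.
 */
static volatile BufferDesc *
GetVictimBuffer(void)
{
	volatile BufferDesc *buf;

	while ((buf = PopFreeList()) != NULL)
	{
		LockBufHdr(buf);
		if (buf->refcount == 0 && buf->usage_count == 0)
			return buf;				/* caller evicts it, still holding the header lock */
		UnlockBufHdr(buf);			/* touched since it was queued; skip it */
	}
	return NULL;					/* freelist empty: fall back to the clock sweep */
}

In the 10%/75% scenario above, the re-check in GetVictimBuffer is what keeps a stale freelist entry cheap: a failed candidate costs one header lock in the backend, while the expensive scanning stays in the background process.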
[HACKERS] Number of spinlocks
With --disable-spinlocks, we need to know the number of spinlocks in the system at startup, so that we can reserve enough semaphores to mimic the spinlocks. It's calculated in SpinlockSemas():

/*
 * Report number of semaphores needed to support spinlocks.
 */
int
SpinlockSemas(void)
{
	/*
	 * It would be cleaner to distribute this logic into the affected modules,
	 * similar to the way shmem space estimation is handled.
	 *
	 * For now, though, we just need a few spinlocks (10 should be plenty)
	 * plus one for each LWLock and one for each buffer header. Plus one
	 * for each XLog insertion slot in xlog.c.
	 */
	return NumLWLocks() + NBuffers + 10 + NUM_XLOGINSERT_SLOTS;
}

Ten spinlocks might've been plenty in 2001 when that comment was written, but we have reached that number. I grepped the sources for SpinLockInit, and found that we use:

1 spinlock for each LWLock
1 spinlock for each buffer
1 spinlock for each wal sender process slot (max_wal_senders in total)
1 spinlock for each partitioned hash table (2 in predicate.c, 2 in lock.c, 1 in buf_table.c)
1 spinlock in XLogCtl->info_lck
1 spinlock for WAL receiver (WalRcv->mutex)
1 spinlock for hot standby xid tracking (procArray->known_assigned_xids_lck)
1 spinlock for shared memory allocator (ShmemLock)
1 spinlock for shared inval messaging (shmInvalBuffer->msgnumLock)
1 spinlock for the proc array freelist (ProcStructLock)
1 spinlock for fast-path lock mechanism (FastPathStrongRelationLocks->mutex)
1 spinlock for the checkpointer (CheckpointerShmem->ckpt_lck)

That's a fixed number of 13 spinlocks, plus 1 for each LWLock, buffer, and wal sender. I'll go and adjust SpinlockSemas() to take the walsenders into account, and bump the fixed number from 10 to 30. That should be enough headroom for the next ten years.

- Heikki

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
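For reference, the adjustment described above would amount to something like this; it is just the stated plan restated as code, not the committed change:

int
SpinlockSemas(void)
{
	/*
	 * A few dozen spinlocks for the various fixed users (about 13 today;
	 * 30 leaves headroom), plus one for each LWLock, buffer, XLog
	 * insertion slot, and walsender slot.
	 */
	return NumLWLocks() + NBuffers + NUM_XLOGINSERT_SLOTS + max_wal_senders + 30;
}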
Re: [HACKERS] corrupt pages detected by enabling checksums
On 2013-04-04 13:30:40 +0100, Simon Riggs wrote: > On 4 April 2013 02:39, Andres Freund wrote: > > > Ok, I think I see the bug. And I think its been introduced in the > > checkpoints patch. > > Well spotted. (I think you mean checksums patch). Heh, yes. I was slightly tired at that point ;) > > If by now the first backend has proceeded to PageSetLSN() we are writing > > different data to disk than the one we computed the checksum of > > before. Boom. > > Right, so nothing else we were doing was wrong, that's why we couldn't > spot a bug. The problem is that we aren't replaying enough WAL because > the checksum on the WAL record is broken. Well, there might be other bugs we can't see yet... But let's hope not. > > I think the whole locking interactions in MarkBufferDirtyHint() need to > > be thought over pretty carefully. > > When we write out a buffer with checksums enabled, we take a copy of > the buffer so that the checksum is consistent, even while other > backends may be writing hints to the same buffer. > > I missed out on doing that with XLOG_HINT records, so the WAL CRC can > be incorrect because the data is scanned twice; normally that would be > OK because we have an exclusive lock on the block, but with hints we > only have share lock. So what we need to do is take a copy of the > buffer before we do XLogInsert(). > > Simple patch to do this attached for discussion. (Not tested). > > We might also do this by modifying the WAL record to take the whole > block and bypass the BkpBlock mechanism entirely. But that's more work > and doesn't seem like it would be any cleaner. I figure let's solve the > problem first, then discuss which approach is best. Unfortunately I find that approach unacceptably ugly. I don't think it's ok to make an already *very* complicated function (XLogInsert()) in one of the most complicated files (xlog.c) even more complicated. It literally took me years to understand a large percentage of it. I even think the XLOG_HINT escape hatch should be taken out and be replaced by something outside of XLogInsert(). Don't think that approach would work with as few lines anyway, since you need to properly pass it into XLogCheckBuffer() et al which checks the buffer tags and such - which it needs to properly compute the struct BkpBlock. Also we iterate over rdata->page for the CRC computation. I think the route you quickly sketched is more realistic. That would remove all knowledge about XLOG_HINT from generic code which is a very good thing, I spent like 15 minutes yesterday wondering whether the early return in there might be the cause of the bug... Something like:

XLogRecPtr
XLogSaveBufferForHint(Buffer buffer)
{
	XLogRecPtr	recptr = InvalidXLogRecPtr;
	XLogRecPtr	lsn;
	XLogRecData rdata[2];
	BkpBlock	bkp;

	/*
	 * make sure no checkpoint can happen while we decide about logging
	 * something or not, since we don't hold any lock preventing that
	 * otherwise.
	 */
	Assert(MyPgXact->delayChkpt);

	/* update RedoRecPtr */
	GetRedoRecPtr();

	/* setup phony rdata element */
	rdata[0].buffer = buffer;
	rdata[0].buffer_std = true;	/* is that actually ok? */

	/* force full pages on, otherwise checksums won't work? */
	if (XLogCheckBuffer(rdata, true, &lsn, &bkp))
	{
		char		copied_buffer[BLCKSZ];

		/*
		 * copy buffer so we don't have to worry about concurrent hint bit
		 * or lsn updates. We assume pd_lower/upper cannot be changed
		 * without an exclusive lock, so the contents of bkp are not racy.
		 */
		memcpy(copied_buffer, BufferGetBlock(buffer), BLCKSZ);

		rdata[0].data = (char *) &bkp;
		rdata[0].len = sizeof(BkpBlock);
		rdata[0].buffer = InvalidBuffer;
		rdata[0].buffer_std = false;
		rdata[0].next = &(rdata[1]);

		rdata[1].data = copied_buffer;
		rdata[1].len = BLCKSZ;
		rdata[1].buffer = InvalidBuffer;
		rdata[1].buffer_std = false;
		rdata[1].next = NULL;

		recptr = XLogInsert(RM_XLOG_ID, XLOG_HINT, rdata);
	}

	return recptr;
}

That should work? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Why there is a PG_GETARG_UINT32 and PG_RETURN_UINT32?
Hi guys. I am wondering when I can use the PG_GETARG_UINT32 and PG_RETURN_UINT32. If postgres has no unsigned int type, what is the use of these macros?
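For context: those macros are defined in fmgr.h and simply reinterpret the 32-bit value carried in the Datum as unsigned. The typical reason to use them is a C-level function whose argument is, internally, an unsigned 32-bit quantity even though SQL itself has no unsigned integer type. A minimal, purely hypothetical V1 function using them might look like this:

#include "postgres.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(add_one_u32);

Datum
add_one_u32(PG_FUNCTION_ARGS)
{
	/* treat the 32-bit argument as unsigned; wraparound is well defined */
	uint32		arg = PG_GETARG_UINT32(0);

	PG_RETURN_UINT32(arg + 1);
}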
Re: [PATCH] Exorcise "zero-dimensional" arrays (Was: Re: [HACKERS] Should array_length() Return NULL)
On Wed, Apr 3, 2013 at 11:11 PM, Tom Lane wrote: > In any case, the whole exercise is pointless if we don't change the > visible behavior of array_dims et al. So I think the idea that this > would be without visible consequence is silly. What's up for argument > is just how much incompatibility is acceptable. The only reasonable answer for this (a provably used, non-security, non-standards violating, non-gross functionality breakage case) is *zero*. Our historically cavalier attitude towards compatibility breakage has been an immense disservice to our users and encourages very bad upgrade habits and is, IMNSHO, embarrassing. Changing the way array_dims works for a minor functionality enhancement is gratuitous and should be done, if at all, via a loudly advertised deprecation/replacement cycle with a guarding GUC (yes, I hate them too, but not nearly as much as the expense of qualifying vast code bases against random compatibility breakages every release). merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Multi-pass planner
On Wed, Apr 3, 2013 at 9:40 PM, Greg Stark wrote: > I used to advocate a similar idea. But when questioned on list I tried to > work out the details and ran into some problem coming up with a concrete > plan. > > How do you compare a plan that you think has a 99% chance of running in 1ms > but a 1% chance of taking 1s against a plan that has a 90% chance of 1ms and > a 10% chance of taking 100ms? Which one is actually riskier? They might even > both have the same 95% percentile run-time. > > And additionally there are different types of unknowns. Do you want to treat > plans where we have a statistical sample that gives us a probabilistic > answer the same as plans where we think our model just has a 10% chance of > being wrong? The model is going to either be consistently right or > consistently wrong for a given query but the sample will vary from run to > run. (Or vice versa depending on the situation). One idea that someone threw up against a wall recently during an EnterpriseDB development meeting - I think it was Kevin Grittner but it might have been Noah Misch, or some combination - was to have a GUC for estimate_worstcase_fraction. So, when computing the cost of a path, we'd compute our current expected-case estimate, and also a worst-case estimate, and then compute the final cost as:

estimated_cost = estimate_worstcase_fraction * worst_case_estimate
               + (1 - estimate_worstcase_fraction) * expected_case_estimate

I think Kevin and I both have the intuition that even a rather small value for estimate_worstcase_fraction - like 0.01 or 0.001 or even smaller - would prevent a pretty significant fraction of the problems people encounter in this area. But users could change it, in general or for particular queries, if it ended up being wrong in their environment. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
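As a trivial illustration, the blending could look like this in the costing code; a sketch only, assuming a GUC-backed global named estimate_worstcase_fraction existed:

/* Sketch only: blend the two estimates into the cost actually compared. */
static Cost
blended_cost(Cost expected_case_estimate, Cost worst_case_estimate)
{
	return estimate_worstcase_fraction * worst_case_estimate +
		   (1.0 - estimate_worstcase_fraction) * expected_case_estimate;
}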
Re: [HACKERS] CREATE EXTENSION BLOCKS
On Apr 4, 2013, at 5:16 AM, Dimitri Fontaine wrote: > "David E. Wheeler" writes: >> +Note that only the extension objects will be placed into the named >> +schema; the extension itself is a database-global object. > > I think you're patching the right place, but I'm not sure about the term > "database-global object", that I can't find by grepping in sgml/ref. Yeah, I wasn't sure, either, but figured someone here would know what to call those sorts of things. Thanks, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] corrupt pages detected by enabling checksums
On 4 April 2013 15:53, Andres Freund wrote: > Unfortunately I find that approach unacceptably ugly. Yeh. If we can confirm it's a fix, we can discuss a cleaner patch, and that is much better. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql crash fix
On Tue, Apr 2, 2013 at 08:48:53PM -0400, Bruce Momjian wrote: > I found that psql will crash if given a PSQLRC value containing a tilde: > > $ PSQLRC="~/x" psql test > *** glibc detected *** psql: free(): invalid pointer: > 0x7fffb7c933ec *** > > This is on Debian Squeeze 6.0.7. The fix is to pstrdup() the value > returned by getenv(), so it can be free()'ed later --- you can't free > getenv()-returned values: > >As typically implemented, getenv() returns a pointer to a string >within the environment list. The caller must take care not to >modify this string, since that would change the environment of >the process. > > This bug exists in 9.2 and git head. I also removed the return value > from expand_tilde() as no caller was using it. Applied. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
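The pattern described above, sketched out; the helper names around the copy are illustrative rather than the exact committed code:

/*
 * Sketch of the fix described above: take our own copy of the environment
 * value, since the pointer returned by getenv() must never be modified or
 * free()'d, and expand_tilde() may do both to the string it is given.
 */
const char *envval = getenv("PSQLRC");

if (envval != NULL)
{
	char	   *psqlrc = pstrdup(envval);	/* private copy: safe to modify and free */

	expand_tilde(&psqlrc);
	process_psqlrc_file(psqlrc);			/* illustrative helper name */
	free(psqlrc);
}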
Re: [HACKERS] Regex with > 32k different chars causes a backend crash
On 04.04.2013 03:32, Noah Misch wrote: On Wed, Apr 03, 2013 at 08:09:15PM +0300, Heikki Linnakangas wrote:

--- a/src/include/regex/regguts.h
+++ b/src/include/regex/regguts.h
@@ -148,6 +148,7 @@
 typedef short color;		/* colors of characters */
 typedef int pcolor;		/* what color promotes to */
+#define MAX_COLOR	32767	/* max value that fits in 'color' datatype */

This should use SHRT_MAX, no? (Not that any supported platform differs here.)

I considered that, but I got all confused on whether limits.h needs to be included and where, if we use that. So I just used a constant 32767 in the end. Committed that way. I opened a ticket in TCL bug tracker for this: https://sourceforge.net/tracker/?func=detail&aid=3610026&group_id=10894&atid=110894. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] puzzling JSON bug
David Wheeler has presented me with a nasty bug case. If I do this: select '{"members": { "add": [3, 4]}}'::json #> '{members,add}'; then I get a crash. If I comment out the pfree() at json.c:parse_object_field() lines 378-9 then I get back the right result but instead get a warning like this: WARNING: problem in alloc set ExprContext: bogus aset link in block 0x1efaa80, chunk 0x1efb1f0 I'm not quite sure where I should go looking for what I've done wrong here. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Join cost estimates
Tom, all, * Tom Lane (t...@sss.pgh.pa.us) wrote: > So that's at least going in the right direction. I agree that this is going in the right direction; it certainly would make the plan that I *expect* to be chosen more likely, however.. I've been fiddling with this on the very much larger overall database where this test case came from and have found that hashing the large table can actually be *faster* and appears to cause a more consistent and constant amount of disk i/o (which is good). The test case exhibits a bit of why this is the case- the per-tuple hash lookup is way closer to the per-tuple cost of building the hash table than I'd expect. per-tuple cost to build the hash table (41M tuples): 0.33us per-tuple cost to scan/do hash lookups (41M tuples): 0.29us (with a small hash table of only 41K entries) The total difference being: 1233.854 vs. 1428.424, or only 194.570ms in favor of scanning the big table instead of hashing it. These numbers are just from those posted with my original email: http://explain.depesz.com/s/FEq http://explain.depesz.com/s/FOU I've seen much worse though- I have one case where hash-the-big-table took 5s and hash-the-small-table took almost 10s (total times). I'm trying to see if I can pull that out and isolate how it's different (and see if it was just due to other load on the box). What I'm trying to get at in this overall email is: why in the world is it so expensive to do hash lookups? I would have expected the cost of the hash table to be *much* more than the cost to do a hash lookup, and that doing hash lookups against a small hash table would be fast enough to put serious pressure on the i/o subsystem. Instead, the building of the hash table actually puts more pressure and can end up being more efficient overall. We have a process that basically does this a whole bunch and the "hash-the-big-table" operation takes about 4.7 hrs, while the "hash-the-small-table" approach went well past 5 hours and was only about 70% complete. Thoughts? Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Drastic performance loss in assert-enabled build in HEAD
Early versions of the matview patch had a relisvalid flag in pg_class to determine whether the relation was scannable. The name was chosen based on a similarity to the purpose of indisvalid, although the proliferation of new bools for related issues left me wondering if a "char" would be a better idea. Based on on-list reviews I removed that in favor of basing the state on a zero-length heap file, in an attempt to work better with unlogged matviews. I can easily look back through my development branch to find the commits which made this change and revert them if this approach is preferred. I realize this would need to be Real Soon Now, but before reverting to the earlier code I want to allow a *little* more time for opinions. Responses to particular points below. Tom Lane wrote: > Granting that throwing an error is actually of some use to some people, > I would not think that people would want to turn it on via a command > that throws away the existing view contents altogether, nor turn it off > with a full-throated REFRESH. There is no doubt that in future versions we will want to be able to disallow scans based on other criteria than just whether the data is valid as of *some* point in time. I can't imagine a circumstance under which we would want to allow scans if it doesn't. So, at least for this release and as a default for future releases, I think it makes sense that a matview last CREATEd or REFRESHed WITH NO DATA should not be scannable. Additional knobs for the users to control this can and should be added in future releases. > There are going to need to be ways to incrementally update > matviews, and ways to disable/enable access that are not tied to > a complete rebuild, not to mention being based on user-determined > rather than hard-wired criteria for what's too stale. Absolutely. So far as I know, nobody has ever suggested or expected otherwise. This paper provides a useful summary of techniques for incremental updates, with references to more detailed papers: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.40.2254&rep=rep1&type=pdf I expect to be working on implementing the most obvious special cases and a "catch-all" general solution for 9.4. Efficient incremental update seems to depend on the ability to have at least one new "hidden" system column, so once we get to a good point for discussing 9.4 issues, I'll be on about that. > If you feel that scannability disable is an absolute must for version 0, > let's invent a matview reloption or some such to implement it and let > users turn it on and off as they wish. That seems a lot more likely > to still be useful two years from now. What about the idea of a relisvalid bool or some "char" column in pg_class? > And if you're absolutely convinced that unlogged matviews mustn't > work as I suggest, we can lose those from 9.3, too. I was loath to do that, but as Nicolas points out, they really aren't that interesting without incremental update. Perhaps it is better to drop those until next release and have a more sophisticated way to deal with invalidation of those -- or as you suggested, just punt them to be empty. (I would hate to do that, but it might be the lesser of evils.) 
> What I'd actually rather see us spending time on right now is making > some provision for incremental updates, which I will boldly propose > could be supported by user-written triggers on the underlying tables > if we only diked out the prohibitions against INSERT/UPDATE/DELETE on > matviews, and allowed them to operate on a matview's contents just like > it was a table. Now admittedly that would foreclose allowing matviews > to be updatable in the updatable-view sense, but that's a feature I > would readily give up if it meant users could build incremental update > mechanisms this year and not two years down the road. I think that should be the last resort, after we have a more automated declarative way of maintaining the majority of cases. Since I don't see too much there where using that technique with matviews would give you more than you could do right now with tables and triggers, hand-coding the incremental maintenance is very low on my list of priorities. In any event, I don't want to rush into any such thing this close to release; that seems to me to be clearly 9.4 material. >> Individual judges have a "dashboard" showing such things as the >> number of court cases which have gone beyond certain thresholds >> without action > If you need 100% accuracy, which is what this scenario appears to be > demanding, matviews are not for you (yet). The existing implementation > certainly is nowhere near satisfying this scenario. No, we're talking about timelines measured in weeks or months. A nightly update is acceptable, although there are occasional gripes by a judge that they would like to see the results of cleaning up such cases sooner than the next morning. An hour latency would probably make them happy. If async increm
Re: [HACKERS] pg_dump selectively ignores extension configuration tables
On Mar 25, 2013, at 12:01 PM, Vibhor Kumar wrote: > > On Mar 25, 2013, at 10:48 AM, Joe Conway wrote: > >> On 03/25/2013 08:12 AM, Vibhor Kumar wrote: >>> Since, nobody has picked this one. >>> >>> If there is no objection,then I can test this patch against 9.1 & 9.2. >> >> Here are diffs for 9.1 and 9.2. The previous email was against 9.3 dev. > > Thanks Joe! > > will test both for 9.1 and 9.2 I did some testing on this patch with 9.1 and 9.2 source code. Testing included the following: 1. Configured PostGIS with 9.1 and 9.2. 2. Verified all switches of pg_dump with the regression db. 3. Checked other extensions, to verify whether this impacts those. Everything is working as expected and I haven't found any issues during my tests with this patch. While testing this patch, some thoughts came to my mind, and I thought to share them on this thread. Would it be possible to have two switches for extensions in pg_dump: 1. Extension dump including user data in extension tables. 2. User data-only dump from extensions. Thanks & Regards, Vibhor Kumar EnterpriseDB Corporation The Enterprise PostgreSQL Company Blog: http://vibhork.blogspot.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Join cost estimates
Stephen Frost writes: > I've been fiddling with this on the very much larger overall database > where this test case came from and have found that hashing the large > table can actually be *faster* and appears to cause a more consistent > and constant amount of disk i/o (which is good). Interesting. > What I'm trying to get at in this overall email is: why in the world is > it so expensive to do hash lookups? perf or oprofile reveal anything? Also, I assume that the cases you are looking at are large enough that even the "small" table doesn't fit in a single hash batch? It could well be that the answer has to do with some bogus or at least unintuitive behavior of the batching process, and it isn't really at all a matter of individual hash lookups being slow. (You never did mention what work_mem setting you're testing, anyway.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Join cost estimates
* Tom Lane (t...@sss.pgh.pa.us) wrote: > > What I'm trying to get at in this overall email is: why in the world is > > it so expensive to do hash lookups? > > perf or oprofile reveal anything? Working on a test case actually- I've got one now: http://snowman.net/~sfrost/test_case2.sql In this example, hashing the large table is actually 2 seconds *faster* than hashing the small table (again, all on my laptop). > Also, I assume that the cases you are looking at are large enough that > even the "small" table doesn't fit in a single hash batch? No, quite the opposite, sorry for not mentioning that before. Either side fits completely into memory w/ a single batch. The explain analyze's that I posted before show that, either way, there's only one batch involved. > (You never did mention what work_mem setting you're testing, anyway.) With the test case above (where I got a 2s faster run time by hashing the big table) used a work_mem of 1GB. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] Multi-pass planner
Robert Haas writes: > for estimate_worstcase_fraction. So, when computing the cost of a > path, we'd compute our current expected-case estimate, and also a > worst-case estimate, and then compute the final cost as: There also was the idea for the executor to be able to handle alternate plans and some heuristic to determine that the actual cost of running a plan is much higher than what's been estimated, so much so as to switch to starting from scratch with the other plan instead. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Join cost estimates
* Tom Lane (t...@sss.pgh.pa.us) wrote: > perf or oprofile reveal anything? Here's what we get from oprofile (perhaps not too surprising): Hash the small table / scan the big table: samples cum. samples %cum. % linenr info image name symbol name 167374 16737447.9491 47.9491nodeHash.c:915 postgres ExecScanHashBucket 8504125241524.3624 72.3115mcount.c:60 libc-2.15.so __mcount_internal 28370280785 8.1274 80.4389_mcount.S:33 libc-2.15.so mcount 15856296641 4.5424 84.9814(no location information) [vdso] (tgid:30643 range:0x7fffe6fff000-0x7fffe6ff) [vdso] (tgid:30643 range:0x7fffe6fff000-0x7fffe6ff) 6291 302932 1.8022 86.7836xact.c:682 postgres TransactionIdIsCurrentTransactionId 4555 307487 1.3049 88.0885instrument.c:70 postgres InstrStopNode 3849 311336 1.1027 89.1912heapam.c:711postgres heapgettup_pagemode 3567 314903 1.0219 90.2130nodeHashjoin.c:63 postgres ExecHashJoin Hash the big table / scan the small table: samples cum. samples %cum. % linenr info image name symbol name 112060 11206039.2123 39.2123mcount.c:60 libc-2.15.so __mcount_internal 3654714860712.7886 52.0009nodeHash.c:709 postgres ExecHashTableInsert 3357018217711.7469 63.7477_mcount.S:33 libc-2.15.so mcount 16383198560 5.7328 69.4805(no location information) [vdso] (tgid:30643 range:0x7fffe6fff000-0x7fffe6ff) [vdso] (tgid:30643 range:0x7fffe6fff000-0x7fffe6ff) 13200211760 4.6190 74.0995(no location information) no-vmlinux /no-vmlinux 6345 218105 2.2203 76.3197xact.c:682 postgres TransactionIdIsCurrentTransactionId 5250 223355 1.8371 78.1568nodeHash.c:915 postgres ExecScanHashBucket 4797 228152 1.6786 79.8354heapam.c:711postgres heapgettup_pagemode 4661 232813 1.6310 81.4664aset.c:563 postgres AllocSetAlloc 4588 237401 1.6054 83.0718instrument.c:70 postgres InstrStopNode 3550 240951 1.2422 84.3140memcpy-ssse3-back.S:60 libc-2.15.so __memcpy_ssse3_back 3013 243964 1.0543 85.3684aset.c:1109 postgres AllocSetCheck Looking at the 'Hash the small table / scan the big table', opannotate claims that this is bar far the worst offender: 147588 42.2808 :hashTuple = hashTuple->next; While most of the time in the 'Hash the big table / scan the small table' is in: 34572 12.0975 :hashTuple->next = hashtable->buckets[bucketno]; Neither of those strike me as terribly informative though. To be honest, I've not really played w/ oprofile all that much. Now that I've got things set up to support this, I'd be happy to provide more info if anyone has suggestions on how to get something more useful. It does look like reducing bucket depth, as I outlined before through the use of a 2-level hashing system, might help speed up ExecScanHashBucket, as it would hopefully have very few (eg: 1-2) entries to consider instead of more. Along those same lines, I really wonder if we're being too generous wrt the bucket-depth goal of '10' instead of, say, '1', especially when we've got plenty of work_mem available. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Hash Join cost estimates
* Stephen Frost (sfr...@snowman.net) wrote: > 8504125241524.3624 72.3115mcount.c:60 > libc-2.15.so __mcount_internal > 28370280785 8.1274 80.4389_mcount.S:33 > libc-2.15.so mcount [...] And as a side-note, I'm rebuilding w/o profiling, asserts, etc and will run it again, though I don't really expect the top-hitters to change. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] puzzling JSON bug
Andrew Dunstan writes: > David Wheeler has presented me with a nasty bug case. > If I do this: > select '{"members": { "add": [3, 4]}}'::json #> '{members,add}'; > then I get a crash. > If I comment out the pfree() at json.c:parse_object_field() lines 378-9 > then I get back the right result but instead get a warning like this: > WARNING: problem in alloc set ExprContext: bogus aset link in block > 0x1efaa80, chunk 0x1efb1f0 > I'm not quite sure where I should go looking for what I've done wrong here. Routine array-overrun memory stomp. The chunk header data for "fname"'s alloc chunk is being overwritten here:

Watchpoint 2: *(int *) 1075253088
Old value = 1074925616
New value = -1
0x50fe14 in get_array_start (state=0x40170e88) at jsonfuncs.c:688
688		_state->array_level_index[lex_level] = -1;

It appears that lex_level is 2 but only enough room for 2 entries has been allocated in array_level_index[]. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
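Not the actual fix, but a sketch of the sort of change the diagnosis points at: make sure the per-level bookkeeping is large enough before writing at lex_level. The array_level_alloc field is invented here for illustration.

/*
 * Hypothetical sketch: enlarge the per-level index array whenever the lexer
 * descends deeper than what was allocated, instead of writing past the end
 * of array_level_index[].
 */
if (lex_level >= _state->array_level_alloc)
{
	_state->array_level_alloc *= 2;
	_state->array_level_index = (int *)
		repalloc(_state->array_level_index,
				 _state->array_level_alloc * sizeof(int));
}
_state->array_level_index[lex_level] = -1;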
Re: [HACKERS] puzzling JSON bug
On 04/04/2013 03:39 PM, Tom Lane wrote: Andrew Dunstan writes: David Wheeler has presented me with a nasty bug case. If I do this: select '{"members": { "add": [3, 4]}}'::json #> '{members,add}'; then I get a crash. If I comment out the pfree() at json.c:parse_object_field() lines 378-9 then I get back the right result but instead get a warning like this: WARNING: problem in alloc set ExprContext: bogus aset link in block 0x1efaa80, chunk 0x1efb1f0 I'm not quite sure where I should go looking for what I've done wrong here. Routine array-overrun memory stomp. The chunk header data for "fname"'s alloc chunk is being overwritten here: Watchpoint 2: *(int *) 1075253088 Old value = 1074925616 New value = -1 0x50fe14 in get_array_start (state=0x40170e88) at jsonfuncs.c:688 688 _state->array_level_index[lex_level] = -1; It appears that lex_level is 2 but only enough room for 2 entries has been allocated in array_level_index[]. OK, many thanks, will fix. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] corrupt pages detected by enabling checksums
Andres, Thank you for diagnosing this problem! On Thu, 2013-04-04 at 16:53 +0200, Andres Freund wrote: > I think the route you quickly sketched is more realistic. That would > remove all knowledge obout XLOG_HINT from generic code hich is a very > good thing, I spent like 15minutes yesterday wondering whether the early > return in there might be the cause of the bug... I like this approach. It may have some performance impact though, because there are a couple extra spinlocks taken, and an extra memcpy. The code looks good to me except that we should be consistent about the page hole -- XLogCheckBuffer is calculating it, but then we copy the entire page. I don't think anything can change the size of the page hole while we have a shared lock on the buffer, so it seems OK to skip the page hole during the copy. Another possible approach is to drop the lock on the buffer and re-acquire it in exclusive mode after we find out we'll need to do XLogInsert. It means that MarkBufferDirtyHint may end up blocking for longer, but would avoid the memcpy. I haven't really thought through the details. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
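For what it's worth, skipping the hole during the copy would amount to something like this (a sketch; whether avoiding the extra bytes is worth the complication is discussed downthread):

/*
 * Sketch only: copy just the used parts of the page, leaving the hole
 * between pd_lower and pd_upper uninitialized in the local copy.
 */
Page		page = BufferGetPage(buffer);
char		copied_buffer[BLCKSZ];
uint16		lower = ((PageHeader) page)->pd_lower;
uint16		upper = ((PageHeader) page)->pd_upper;

memcpy(copied_buffer, page, lower);			/* page header + line pointers */
memcpy(copied_buffer + upper, (char *) page + upper,	/* tuple data */
	   BLCKSZ - upper);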
Re: [HACKERS] Hash Join cost estimates
* Stephen Frost (sfr...@snowman.net) wrote: > It does look like reducing bucket depth, as I outlined before through > the use of a 2-level hashing system, might help speed up > ExecScanHashBucket, as it would hopefully have very few (eg: 1-2) > entries to consider instead of more. Along those same lines, I really > wonder if we're being too generous wrt the bucket-depth goal of '10' > instead of, say, '1', especially when we've got plenty of work_mem > available. Rerunning using a minimally configured build (only --enable-openssl and --enable-debug passed to configure) with NTUP_PER_BUCKET set to '1' results in a couple of interesting things- First, the planner actually picks the plan to hash the small table and seqscan the big one. That also, finally, turns out to be *faster* for this test case. explain analyze results here: Hash small table / seqscan big table: http://explain.depesz.com/s/nP1 Hash big table / seqscan small table: http://explain.depesz.com/s/AUv Here's the oprofile reports: Hash small table / seqscan big table: samples cum. samples %cum. % linenr info image name symbol name 3902339023 52.8574 52.8574nodeHash.c:915 postgres ExecScanHashBucket 3743 42766 5.0700 57.9273xact.c:682 postgres TransactionIdIsCurrentTransactionId 3110 45876 4.2126 62.1399nodeHashjoin.c:63 postgres ExecHashJoin 2561 48437 3.4689 65.6088heapam.c:711postgres heapgettup_pagemode 2427 50864 3.2874 68.8962heapam.c:300postgres heapgetpage 2395 53259 3.2441 72.1403heaptuple.c:1028postgres slot_deform_tuple 2395 55654 3.2441 75.3843heaptuple.c:1135postgres slot_getattr 2383 58037 3.2278 78.6122nodeHash.c:786 postgres ExecHashGetHashValue 1811 59848 2.4530 81.0652tqual.c:1044postgres HeapTupleSatisfiesMVCC 1796 61644 2.4327 83.4979execScan.c:111 postgres ExecScan 1298 62942 1.7582 85.2561hashfunc.c:517 postgres hash_uint32 1274 64216 1.7257 86.9817execProcnode.c:356 postgres ExecProcNode 1011 65227 1.3694 88.3511heapam.c:1453 postgres heap_getnext 905 66132 1.2258 89.5770execTuples.c:333postgres ExecStoreTuple 858 66990 1.1622 90.7392fmgr.c:1291 postgres FunctionCall1Coll 835 67825 1.1310 91.8702execQual.c:668 postgres ExecEvalScalarVarFast 834 68659 1.1297 92.mcxt.c:126 postgres MemoryContextReset 818 69477 1.1080 94.1078nodeSeqscan.c:48postgres SeqNext Hash big table / seqscan small table: samples cum. samples %cum. % linenr info image name symbol name 3861238612 41.2901 41.2901nodeHash.c:709 postgres ExecHashTableInsert 7435 46047 7.9507 49.2408(no location information) no-vmlinux /no-vmlinux 4900 50947 5.2399 54.4806aset.c:563 postgres AllocSetAlloc 3803 54750 4.0668 58.5474xact.c:682 postgres TransactionIdIsCurrentTransactionId 3335 58085 3.5663 62.1137heapam.c:711postgres heapgettup_pagemode 2532 60617 2.7076 64.8213nodeHash.c:786 postgres ExecHashGetHashValue 2523 63140 2.6980 67.5193memcpy-ssse3-back.S:60 libc-2.15.so __memcpy_ssse3_back 2518 65658 2.6926 70.2119heaptuple.c:1028postgres slot_deform_tuple 2378 68036 2.5429 72.7549heapam.c:300postgres heapgetpage 2374 70410 2.5387 75.2935heaptuple.c:1135postgres slot_getattr 1852 72262 1.9805 77.2740nodeHash.c:915 postgres ExecScanHashBucket 1831 74093 1.9580 79.2320tqual.c:1044postgres HeapTupleSatisfiesMVCC 1732 75825 1.8521 81.0841heapam.c:1453 postgres heap_getnext 1320 77145 1.4116 82.4957nodeHash.c:76 postgres MultiExecHash 1219 78364 1.3
Re: [HACKERS] Multi-pass planner
On Thu, Apr 4, 2013 at 2:53 PM, Dimitri Fontaine wrote: > Robert Haas writes: >> for estimate_worstcase_fraction. So, when computing the cost of a >> path, we'd compute our current expected-case estimate, and also a >> worst-case estimate, and then compute the final cost as: > > There also was the idea for the executor to be able to handle alternate > plans and some heuristic to determine that the actual cost of running a > plan is much higher than what's been estimated, so much so as to switch > to starting from scratch with the other plan instead. Yeah. The thing is, if the plan has any side effects, that's not really an option. And even if it doesn't, it may throw away a lot of work. One thing we could do is switch from an unparameterized nested loop to a hash join if the outer side turns out to be much larger than expected, but that's only going to benefit a pretty narrow set of use cases. Which is why I think a planner-based approach is probably more promising. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Join cost estimates
Stephen Frost writes: > Looking with opannotate, there's two main hotspots in > ExecScanHashBucket: > 12846 17.4001 :hashTuple = > hashtable->buckets[hjstate->hj_CurBucketNo]; > and > 22100 29.9348 :hashTuple = hashTuple->next; Those are, of course, pretty trivial statements; so the way I read this is that the fundamental slowdown comes from the hash table being large compared to the CPU's cache, so that you're incurring lots of cache misses at these particular fetches. (You might be able to confirm that if you can set oprofile to count cache misses rather than wall clock time.) > I'm certainly curious about those, but I'm also very interested in the > possibility of making NTUP_PER_BUCKET much smaller, or perhaps variable > depending on the work_mem setting. Not sure about that. That would make the hash-bucket-header array larger without changing the size of the rest of the hash table, thus probably making the CPU cache situation worse not better (which would manifest as more time at the first of these two statements relative to the second). Can you add some instrumentation code to get us information about the average/max length of the bucket chains? And maybe try to figure out how many distinct hash values per bucket, which would give us a clue whether your two-level-list idea is worth anything. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
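A rough idea of the instrumentation being asked for above, run once after the hash table is built; this is a throwaway sketch against nodeHash.c's HashJoinTable, not a proposed patch:

/*
 * Throwaway sketch: walk the bucket array and report chain-length stats.
 */
int			i;
int			max_chain = 0;
long		total_tuples = 0;

for (i = 0; i < hashtable->nbuckets; i++)
{
	HashJoinTuple tuple = hashtable->buckets[i];
	int			len = 0;

	while (tuple != NULL)
	{
		len++;
		tuple = tuple->next;
	}
	total_tuples += len;
	if (len > max_chain)
		max_chain = len;
}

elog(LOG, "hashjoin: %d buckets, %ld tuples, avg chain %.2f, max chain %d",
	 hashtable->nbuckets, total_tuples,
	 (double) total_tuples / hashtable->nbuckets, max_chain);

Counting distinct hashvalue entries per chain, as also suggested, would need a little more bookkeeping but follows the same walk over each bucket.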
Re: [PATCH] Exorcise "zero-dimensional" arrays (Was: Re: [HACKERS] Should array_length() Return NULL)
On Thu, Apr 4, 2013 at 11:10 AM, Merlin Moncure wrote: > The only reasonable answer for this (a provably used, non-security, > non-standards violating, non-gross functionality breakage case) is > *zero*. +1. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clang compiler warning on 9.3 HEAD
Will Leinweber wrote: > On ref 8507907 when compiling with clang on os x, I got this warning which > seems like a possible bug. > > I thought to report this because I imagine clang isn't frequently used > day-to-day by most. Ugh. My fault. Yes, this is a bug. I don't see any nice way to convert ObjectType to ObjectClass or the other way around; it seems to me the only simple way to fix this is to add a new function EventTriggerSupportsObjectClass(), which takes ObjectClass instead of ObjectType. Patch attached. Now, it annoys me that we now have three places that know about object types supported by event triggers: there's a large struct of command tag substrings (event_trigger_support), then there's these two functions. It might be better to add ObjectType and ObjectClass entries to the struct, so that only the struct needs to know about that. The problem is that these two functions would have to walk the struct every time, instead of being a simple switch. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services *** a/src/backend/catalog/dependency.c --- b/src/backend/catalog/dependency.c *** *** 210,216 deleteObjectsInList(ObjectAddresses *targetObjects, Relation *depRel, ObjectAddress *thisobj = targetObjects->refs + i; if ((!(flags & PERFORM_DELETION_INTERNAL)) && ! EventTriggerSupportsObjectType(getObjectClass(thisobj))) { EventTriggerSQLDropAddObject(thisobj); } --- 210,216 ObjectAddress *thisobj = targetObjects->refs + i; if ((!(flags & PERFORM_DELETION_INTERNAL)) && ! EventTriggerSupportsObjectClass(getObjectClass(thisobj))) { EventTriggerSQLDropAddObject(thisobj); } *** a/src/backend/commands/event_trigger.c --- b/src/backend/commands/event_trigger.c *** *** 912,917 EventTriggerSupportsObjectType(ObjectType obtype) --- 912,939 } /* + * Do event triggers support this object class? + */ + bool + EventTriggerSupportsObjectClass(ObjectClass objclass) + { + switch (objclass) + { + case OCLASS_DATABASE: + case OCLASS_TBLSPACE: + case OCLASS_ROLE: + /* no support for global objects */ + return false; + case OCLASS_EVENT_TRIGGER: + /* no support for event triggers on event triggers */ + return false; + default: + break; + } + return true; + } + + /* * Prepare event trigger state for a new complete query to run, if necessary; * returns whether this was done. If it was, EventTriggerEndCompleteQuery must * be called when the query is done, regardless of whether it succeeds or fails *** a/src/include/commands/event_trigger.h --- b/src/include/commands/event_trigger.h *** *** 13,18 --- 13,19 #ifndef EVENT_TRIGGER_H #define EVENT_TRIGGER_H + #include "catalog/dependency.h" #include "catalog/objectaddress.h" #include "catalog/pg_event_trigger.h" #include "nodes/parsenodes.h" *** *** 41,46 extern Oid AlterEventTriggerOwner(const char *name, Oid newOwnerId); --- 42,48 extern void AlterEventTriggerOwner_oid(Oid, Oid newOwnerId); extern bool EventTriggerSupportsObjectType(ObjectType obtype); + extern bool EventTriggerSupportsObjectClass(ObjectClass objclass); extern void EventTriggerDDLCommandStart(Node *parsetree); extern void EventTriggerDDLCommandEnd(Node *parsetree); extern void EventTriggerSQLDrop(Node *parsetree); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] corrupt pages detected by enabling checksums
On 2013-04-04 12:59:36 -0700, Jeff Davis wrote: > Andres, > > Thank you for diagnosing this problem! > > On Thu, 2013-04-04 at 16:53 +0200, Andres Freund wrote: > > I think the route you quickly sketched is more realistic. That would > > remove all knowledge about XLOG_HINT from generic code which is a very > > good thing, I spent like 15 minutes yesterday wondering whether the early > > return in there might be the cause of the bug... > > I like this approach. It may have some performance impact though, > because there are a couple extra spinlocks taken, and an extra memcpy. I don't think it's really slower. Earlier the code took WALInsertLock every time, even if we ended up not logging anything. That's far more expensive than a single spinlock. And the copy should also only be taken in the case we need to log. So I think we end up ahead of the current state. > The code looks good to me except that we should be consistent about the > page hole -- XLogCheckBuffer is calculating it, but then we copy the > entire page. I don't think anything can change the size of the page hole > while we have a shared lock on the buffer, so it seems OK to skip the > page hole during the copy. I don't think it can change either, but I doubt that there's a performance advantage by not copying the hole. I'd guess the simpler code ends up faster. > Another possible approach is to drop the lock on the buffer and > re-acquire it in exclusive mode after we find out we'll need to do > XLogInsert. It means that MarkBufferDirtyHint may end up blocking for > longer, but would avoid the memcpy. I haven't really thought through the > details. That sounds like it would be prone to deadlocks. So I'd rather not go there. Will write up a patch tomorrow. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
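A rough sketch of the approach under discussion: copy the page while holding only the share lock, then let XLogInsert() work from the stable copy so a concurrent PageSetLSN() cannot change the bytes after the CRC has been computed. This is illustrative only (9.3-era XLogRecData/XLOG_HINT assumed), not the actual patch:

/* Illustrative sketch: WAL-log a full copy of a hinted page, with the copy
 * taken under the share lock so later PageSetLSN() activity cannot
 * invalidate the record CRC. */
static XLogRecPtr
log_hint_page_copy(Buffer buffer)
{
    char        copied_buffer[BLCKSZ];
    XLogRecData rdata;

    memcpy(copied_buffer, BufferGetPage(buffer), BLCKSZ);

    rdata.data = copied_buffer;
    rdata.len = BLCKSZ;
    rdata.buffer = InvalidBuffer;   /* bypass the backup-block machinery */
    rdata.buffer_std = false;
    rdata.next = NULL;

    return XLogInsert(RM_XLOG_ID, XLOG_HINT, &rdata);
}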
Re: [PATCH] Exorcise "zero-dimensional" arrays (Was: Re: [HACKERS] Should array_length() Return NULL)
Robert Haas writes: > On Thu, Apr 4, 2013 at 11:10 AM, Merlin Moncure wrote: >> The only reasonable answer for this (a provably used, non-security, >> non-standards violating, non-gross functionality breakage case) is >> *zero*. > +1. Well, if we're going to take that hard a line on it, then we can't change anything about array data storage or the existing functions' behavior; which leaves us with either doing nothing at all, or inventing new functions that have saner behavior while leaving the old ones in place. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Join cost estimates
On Thu, Apr 04, 2013 at 04:16:12PM -0400, Stephen Frost wrote: > * Stephen Frost (sfr...@snowman.net) wrote: > > It does look like reducing bucket depth, as I outlined before through > > the use of a 2-level hashing system, might help speed up > > ExecScanHashBucket, as it would hopefully have very few (eg: 1-2) > > entries to consider instead of more. Along those same lines, I really > > wonder if we're being too generous wrt the bucket-depth goal of '10' > > instead of, say, '1', especially when we've got plenty of work_mem > > available. > > Rerunning using a minimally configured build (only --enable-openssl > and --enable-debug passed to configure) with NTUP_PER_BUCKET set to '1' > results in a couple of interesting things- > > First, the planner actually picks the plan to hash the small table and > seqscan the big one. That also, finally, turns out to be *faster* for > this test case. > > ... > > I'm certainly curious about those, but I'm also very interested in the > possibility of making NTUP_PER_BUCKET much smaller, or perhaps variable > depending on the work_mem setting. It's only used in > ExecChooseHashTableSize, so while making it variable or depending on > work_mem could slow planning down a bit, it's not a per-tuple cost item. > +1 for adjusting this based on work_mem value. Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
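A minimal sketch of what a work_mem-aware setting could look like; choose_ntup_per_bucket() is a hypothetical helper and the thresholds and overhead estimate are purely illustrative (the real NTUP_PER_BUCKET is a fixed #define consulted by ExecChooseHashTableSize()):

/* Hypothetical: pick a bucket-depth target from the estimated hash table
 * size relative to work_mem, instead of a fixed NTUP_PER_BUCKET of 10. */
static int
choose_ntup_per_bucket(double ntuples, int tupwidth, long work_mem_kb)
{
    double  hash_bytes = ntuples * (tupwidth + 48);     /* crude per-tuple overhead guess */
    double  mem_bytes  = (double) work_mem_kb * 1024.0;

    if (hash_bytes <= mem_bytes / 4)
        return 1;       /* plenty of memory: shallow buckets, cheap probes */
    if (hash_bytes <= mem_bytes)
        return 4;
    return 10;          /* tight on memory: keep the historical behaviour */
}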
Re: [HACKERS] CREATE EXTENSION BLOCKS
"David E. Wheeler" writes: > On Apr 3, 2013, at 11:41 AM, Alvaro Herrera wrote: >> Right -- an extension is not considered to live within a schema, they >> are database-global. The objects might live in a particular schema (if >> it is "relocatable"), and there's support to move those to a different >> schema, but this doesn't affect the extension itself. > Thanks. I humbly submit this patch to help prevent silly questions like this > in the future. I think this should be addressed in extend.sgml not only on the CREATE EXTENSION reference page. After thinking awhile I came up with the attached wording. Further wordsmithing anyone? regards, tom lane diff --git a/doc/src/sgml/extend.sgml b/doc/src/sgml/extend.sgml index 672d0df..bc1cd59 100644 *** a/doc/src/sgml/extend.sgml --- b/doc/src/sgml/extend.sgml *** *** 359,364 --- 359,370 extension.) Also notice that while a table can be a member of an extension, its subsidiary objects such as indexes are not directly considered members of the extension. + Another important point is that schemas can belong to extensions, but not + vice versa: an extension as such has an unqualified name and does not + exist within any schema. The extension's member objects, + however, will belong to schemas whenever appropriate for their object + types. It may or may not be appropriate for an extension to own the + schema(s) its member objects are within. diff --git a/doc/src/sgml/ref/create_extension.sgml b/doc/src/sgml/ref/create_extension.sgml index 4f3b9a5..9c9bf6f 100644 *** a/doc/src/sgml/ref/create_extension.sgml --- b/doc/src/sgml/ref/create_extension.sgml *** CREATE EXTENSION [ IF NOT EXISTS ] + + Remember that the extension itself is not considered to be within any + schema: extensions have unqualified names that must be unique + database-wide. But objects belonging to the extension can be within + schemas. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] corrupt pages detected by enabling checksums
On Thu, Apr 4, 2013 at 5:30 AM, Simon Riggs wrote: > On 4 April 2013 02:39, Andres Freund wrote: > > > If by now the first backend has proceeded to PageSetLSN() we are writing > > different data to disk than the one we computed the checksum of > > before. Boom. > > Right, so nothing else we were doing was wrong, that's why we couldn't > spot a bug. The problem is that we aren't replaying enough WAL because > the checksum on the WAL record is broken. > This brings up a pretty frightening possibility to me, unrelated to data checksums. If a bit gets twiddled in the WAL file due to a hardware issue or a "cosmic ray", and then a crash happens, automatic recovery will stop early at the failed WAL checksum with an innocuous-looking message. The system will start up but will be invisibly inconsistent, and will proceed to overwrite that portion of the WAL file which contains the old data (real data that would have been necessary to reconstruct, once the corruption is finally realized) with an end-of-recovery checkpoint record and continue to chew up real data from there. I don't know a solution here, though, other than trusting your hardware. Changing timelines upon ordinary crash recovery, not just media recovery, seems excessive but also seems to be exactly what timelines were invented for, right? Back to the main topic here, Jeff Davis mentioned earlier "You'd still think this would cause incorrect results, but...". I didn't realize the significance of that until now. It does produce incorrect query results. I was just bailing out before detecting them. Once I specify ignore_checksum_failure=1, my test harness complains bitterly about the data not being consistent with what the Perl program knows it is supposed to be. > I missed out on doing that with XLOG_HINT records, so the WAL CRC can > be incorrect because the data is scanned twice; normally that would be > OK because we have an exclusive lock on the block, but with hints we > only have share lock. So what we need to do is take a copy of the > buffer before we do XLogInsert(). > > Simple patch to do this attached for discussion. (Not tested). > We might also do this by modifying the WAL record to take the whole > block and bypass the BkpBlock mechanism entirely. But that's more work > and doesn't seem like it would be any cleaner. I figure let's solve the > problem first then discuss which approach is best. > I've tested your patch and it seems to do the job. It has survived far longer than unpatched ever did, with neither checksum warnings nor complaints of inconsistent data. (I haven't analyzed the code much, just the results, and leave the discussion of the best approach to others) Thanks, Jeff
Re: [HACKERS] Clang compiler warning on 9.3 HEAD
Alvaro Herrera writes: > Now, it annoys me that we now have three places that know about object > types supported by event triggers: there's a large struct of command tag > substrings (event_trigger_support), then there's these two functions. > It might be better to add ObjectType and ObjectClass entries to the > struct, so that only the struct needs to know about that. The problem > is that these two functions would have to walk the struct every time, > instead of being a simple switch. Yeah. For the moment I think efficiency trumps having the information in just one place, ie I agree with doing it like this. If we find we're constantly forgetting to fix the functions when we need to, it'd be time enough to revisit the decision. One thing you could do that might make it less likely we'd miss things is to have the switches explicitly cover all the possible enum members, instead of defaulting the majority of them as you do here. I know when I think about adding a new object type, I tend to choose the most similar existing type and grep for references to it. Likely you could even draft the code so that most compilers would warn about an enum value not covered in the switch. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
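For illustration, the pattern being suggested is an exhaustive switch with no default, so that gcc/clang (-Wswitch, enabled by -Wall) complains as soon as a new enum member is added but not handled; a generic sketch, not the actual event-trigger code:

typedef enum { OCLASS_FOO, OCLASS_BAR, OCLASS_BAZ } ExampleClass;

static bool
supports_class(ExampleClass objclass)
{
    switch (objclass)
    {
        case OCLASS_FOO:
        case OCLASS_BAR:
            return true;
        case OCLASS_BAZ:
            return false;
            /* no default: adding a member to ExampleClass now produces a
             * compiler warning here until this switch is updated */
    }
    return false;       /* unreachable, but keeps the compiler quiet */
}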
[HACKERS] matview scannability rehash (was Re: Drastic performance loss in assert-enabled build in HEAD)
Subject updated to account for the wider topics now appearing. On Wed, Apr 03, 2013 at 05:49:18PM -0400, Tom Lane wrote: > What I'd actually rather see us spending time on right now is making > some provision for incremental updates, which I will boldly propose > could be supported by user-written triggers on the underlying tables > if we only diked out the prohibitions against INSERT/UPDATE/DELETE on > matviews, and allowed them to operate on a matview's contents just like > it was a table. Now admittedly that would foreclose allowing matviews > to be updatable in the updatable-view sense, but that's a feature I > would readily give up if it meant users could build incremental update > mechanisms this year and not two years down the road. I can't see taking MVs in the direction of allowing arbitrary DML; that's what tables are for. Users wishing to do that should keep using current methods: CREATE VIEW mv_impl AS SELECT ...; CREATE TABLE mv AS SELECT * FROM mv_impl; -- REFRESH analogue: fancier approaches exist BEGIN; TRUNCATE mv; INSERT INTO mv SELECT * FROM mv_impl; COMMIT; If anything, I'd help these users by introducing mechanisms for obtaining a TRUNCATE;INSERT with the bells and whistles of REFRESH MATERIALIZED VIEW. Namely, bulk index rebuilds, skipping WAL, and frozen tuples. > > ... Making sure that > > the heap has at least one page if data has been generated seems > > like a not-entirely-unreasonable way to do that, although there > > remains at least one vacuum bug to fix if we keep it, in addition > > to Tom's concerns. > > No. This is an absolute disaster. It's taking something we have always > considered to be an irrelevant implementation detail and making it into > user-visible DDL state, despite the fact that it doesn't begin to satisfy > basic transactional behaviors. We *need* to get rid of that aspect of > things. If you must have scannability state in version 0, okay, but > it has to be a catalog property not this. In large part, this ended up outside the catalogs due to key limitations of the startup process. This isn't the first time we've arranged an unusual dance for this reason, and it probably won't be the last. Sure, we could take out unlogged MVs to evade the problem, but re-adding them will mean either restoring relfilenode-based bookkeeping or attacking the startup process limitation directly. There exist fundamental challenges to a direct attack, like the potential inconsistency of system catalogs themselves. We could teach pg_relation_is_scannable() that unlogged MVs are always non-scannable during recovery, then somehow update system catalogs in all databases at the end of recovery. Not a project for 9.3, and I wouldn't be surprised to see it mushroom in complexity. The currently-committed approach is a good one given the applicable constraints. A slight variation on the committed approach would be to add a "_scannable" relation fork. The fork would either be absent (not scannable if an MV) or empty (possibly-scannable). Create it in CREATE MATERIALIZED VIEW sans WITH NO DATA and in REFRESH MATERIALIZED VIEW. Remove it in TRUNCATE. When the startup process reinitializes an unlogged relation, it would also remove any _scannable fork. This has a few advantages over the current approach: VACUUM won't need a special case, and pg_upgrade will be in a better position to blow away all traces if we introduce a better approach. The disadvantage is an extra inode per healthy MV. 
(Though it does not lead to a 9.3 solution, I'll note that an always-present relation metapage would help here.) Note that I said "possibly-scannable" -- the relfilenode-based indicator (whether the committed approach or something else) doesn't need to remain the only input to the question of scannability. If 9.5 introduces the concept of age-based scannability expiration, the applicable timestamp could go in pg_class, and pg_relation_is_scannable() could check both that and the relfilenode-based indicator. Concerning the original $SUBJECT, I would look at fixing the performance problem by having pg_relation_is_scannable() use an algorithm like this: 1. Grab the pg_class entry from syscache. If it's not found, return NULL. 2. If it's not a matview, return false. 3. Lock the matview and try to open a relcache entry. Return NULL on failure. 4. Return the scannability as reported by the relcache. This boils down to the CASE statement you noted upthread, except putting the fast-exit logic in the function rather than in its caller(s). nm -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
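A sketch of those four steps as C, assuming a relcache-level RelationIsScannable() test exists; the names here are illustrative rather than a claim about the committed code, and most headers are omitted for brevity:

#include "postgres.h"
#include "fmgr.h"
/* catalog, syscache and relcache headers omitted for brevity */

Datum
pg_relation_is_scannable(PG_FUNCTION_ARGS)
{
    Oid         relid = PG_GETARG_OID(0);
    HeapTuple   tuple;
    Form_pg_class classform;
    Relation    rel;
    bool        result;

    /* 1. syscache lookup; relation gone => NULL */
    tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
    if (!HeapTupleIsValid(tuple))
        PG_RETURN_NULL();
    classform = (Form_pg_class) GETSTRUCT(tuple);

    /* 2. anything that is not a matview => false */
    if (classform->relkind != RELKIND_MATVIEW)
    {
        ReleaseSysCache(tuple);
        PG_RETURN_BOOL(false);
    }
    ReleaseSysCache(tuple);

    /* 3. lock and open the relation; failure => NULL */
    rel = try_relation_open(relid, AccessShareLock);
    if (rel == NULL)
        PG_RETURN_NULL();

    /* 4. report whatever the relcache says */
    result = RelationIsScannable(rel);      /* assumed relcache-level test */
    relation_close(rel, AccessShareLock);

    PG_RETURN_BOOL(result);
}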
Re: [HACKERS] Drastic performance loss in assert-enabled build in HEAD
On Thu, Apr 04, 2013 at 12:28:01PM +0200, Nicolas Barbier wrote: > 2013/4/3 Tom Lane : > > And if you're absolutely convinced that unlogged matviews mustn't work as I > > suggest, we can lose those from 9.3, too. > > +1. Having unlogged matviews without having incremental updates yet, > isn't super useful anyway. I would have surmised the opposite: since an unlogged MV requires a full refresh at unpredictable moments, logged MVs will be preferred where a refresh is prohibitively expensive. Why might unlogged-MV applications desire incremental updates more acutely than logged-MV applications? -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] matview scannability rehash (was Re: Drastic performance loss in assert-enabled build in HEAD)
Noah Misch writes: > On Wed, Apr 03, 2013 at 05:49:18PM -0400, Tom Lane wrote: >> No. This is an absolute disaster. It's taking something we have always >> considered to be an irrelevant implementation detail and making it into >> user-visible DDL state, despite the fact that it doesn't begin to satisfy >> basic transactional behaviors. We *need* to get rid of that aspect of >> things. If you must have scannability state in version 0, okay, but >> it has to be a catalog property not this. > In large part, this ended up outside the catalogs due to key limitations of > the startup process. Yeah, I realize that there's no other (easy) way to make unlogged matviews reset to an invalid state on crash, but that doesn't make this design choice less of a disaster. It boxes us into something that's entirely unable to support transitions between scannable and unscannable states by any means short of a complete rewrite of the matview contents; which seems fundamentally incompatible with any sort of incremental update scenario. And I remain of the opinion that it's going to box us into not being able to fix the problems because of pg_upgrade on-disk-compatibility issues. We will be far better off to drop unlogged matviews until we can come up with a better design. If that's so far off that no one can see it happening, well, that's tough. Leaving the door open for incremental maintenance is more important. > A slight variation on the committed approach would be to add a "_scannable" > relation fork. Not very transaction-safe, I think (consider crash midway through a transaction that adds or removes the fork), and in any case orders of magnitude more expensive than looking at a pg_class field. This really needs to be catalog state, not filesystem state. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] matview scannability rehash (was Re: Drastic performance loss in assert-enabled build in HEAD)
Tom Lane wrote: > Noah Misch writes: > > A slight variation on the committed approach would be to add a "_scannable" > > relation fork. > > Not very transaction-safe, I think (consider crash midway through a > transaction that adds or removes the fork), and in any case orders of > magnitude more expensive than looking at a pg_class field. This really > needs to be catalog state, not filesystem state. We could revive the pg_class_nt patch proposed a decade ago, perhaps ... -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch to add \watch to psql
Will Leinweber writes: > Here is an updated patch that addresses several of the points brought up so > far, such as the sleep, internationalization banner, and zero wait check, > and it removes the premature input check. I whacked this around some more, added basic docs, and committed it. > Unfortunately rl_clear_screen() is not included at all in libedit, causing > compilation to fail, and I was completely unable to find a way to > distinguish libedit from readline on OS X. It tries extraordinarily hard to > pretend that it's readline. Instead falling back to simple control > characters to clear the screen worked very well on both linux and OS X. I took that out; "works on the two cases I tried" does not mean "portable". It's possible we could do something involving having configure check for rl_clear_screen() etc, but that seems like more work than is justified, not to mention that the results would then be platform-dependent *by design*. Frankly I kinda prefer the behavior without a screen clear anyway; though this may just prove that I'm not accustomed to using the original "watch". Anyway, that's open to a followup patch if anybody is sufficiently set on doing it differently. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
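For reference, the "simple control characters" fallback Will describes presumably amounts to emitting the standard VT100/ANSI clear-and-home sequence; a minimal sketch (not part of the committed patch):

#include <stdio.h>

static void
clear_screen_ansi(void)
{
    /* ESC[2J clears the screen, ESC[H homes the cursor (VT100/ANSI) */
    fputs("\033[2J\033[H", stdout);
    fflush(stdout);
}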
Re: [HACKERS] patch to add \watch to psql
On Thu, Apr 4, 2013 at 5:04 PM, Tom Lane wrote: > > I whacked this around some more, added basic docs, and committed it. > > Thanks! > > Unfortunately rl_clear_screen() is not included at all in libedit, > causing > > compilation to fail, and I was completely unable to find a way to > > distinguish libedit from readline on OS X. It tries extraordinarily hard > to > > pretend that it's readline. Instead falling back to simple control > > characters to clear the screen worked very well on both linux and OS X. > > I took that out; "works on the two cases I tried" does not mean "portable". > Completely understandable. I'm very excited to have helped contribute something, however small this was, to the project. Thanks again, Will
Re: [HACKERS] corrupt pages detected by enabling checksums
On Thu, 2013-04-04 at 14:21 -0700, Jeff Janes wrote: > This brings up a pretty frightening possibility to me, unrelated to > data checksums. If a bit gets twiddled in the WAL file due to a > hardware issue or a "cosmic ray", and then a crash happens, automatic > recovery will stop early with the failed WAL checksum with > an innocuous looking message. The system will start up but will be > invisibly inconsistent, and will proceed to overwrite that portion of > the WAL file which contains the old data (real data that would have > been necessary to reconstruct, once the corruption is finally realized > ) with an end-of-recovery checkpoint record and continue to chew up > real data from there. I've been worried about that for a while, and I may have even seen something like this happen before. We could perhaps do some checks, but in general it seems hard to solve without flushing some data to two different places. For example, you could flush WAL, and then update an LSN stored somewhere else indicating how far the WAL has been written. Recovery could complain if it gets an error in the WAL before that point. But obviously, that makes WAL flushes expensive (in many cases, about twice as expensive). Maybe it's not out of the question to offer that as an option if nobody has a better idea. Performance-conscious users could place the extra LSN on an SSD or NVRAM or something; or maybe use commit_delay or async commits. It would only need to store a few bytes. Streaming replication mitigates the problem somewhat, by being a second place to write WAL. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
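A minimal sketch of the extra flush Jeff describes, written with plain POSIX calls rather than real xlog.c code; the side-file name, layout and error handling are assumptions for illustration only:

#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* After WAL up to 'flushed_to' has been successfully fsync'd, persist that
 * position in a tiny side file.  During crash recovery, a WAL CRC failure
 * at an LSN below the stored value would then be reported as corruption
 * rather than treated as the normal end of WAL. */
static int
record_flushed_lsn(const char *path, uint64_t flushed_to)
{
    int fd = open(path, O_WRONLY | O_CREAT, 0600);

    if (fd < 0)
        return -1;
    if (pwrite(fd, &flushed_to, sizeof(flushed_to), 0) != (ssize_t) sizeof(flushed_to) ||
        fsync(fd) != 0)
    {
        close(fd);
        return -1;
    }
    return close(fd);
}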
Re: [HACKERS] corrupt pages detected by enabling checksums
On Thu, 2013-04-04 at 22:39 +0200, Andres Freund wrote: > I don't think its really slower. Earlier the code took WalInsertLock > everytime, even if we ended up not logging anything. Thats far more > epensive than a single spinlock. And the copy should also only be taken > in the case we need to log. So I think we end up ahead of the current > state. Good point. > > The code looks good to me except that we should be consistent about the > > page hole -- XLogCheckBuffer is calculating it, but then we copy the > > entire page. I don't think anything can change the size of the page hole > > while we have a shared lock on the buffer, so it seems OK to skip the > > page hole during the copy. > > I don't think it can change either, but I doubt that there's a > performance advantage by not copying the hole. I'd guess the simpler > code ends up faster. I was thinking more about the WAL size, but I don't have a strong opinion. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] corrupt pages detected by enabling checksums
Jeff Davis writes: > On Thu, 2013-04-04 at 14:21 -0700, Jeff Janes wrote: >> This brings up a pretty frightening possibility to me, unrelated to >> data checksums. If a bit gets twiddled in the WAL file due to a >> hardware issue or a "cosmic ray", and then a crash happens, automatic >> recovery will stop early with the failed WAL checksum with >> an innocuous looking message. The system will start up but will be >> invisibly inconsistent, and will proceed to overwrite that portion of >> the WAL file which contains the old data (real data that would have >> been necessary to reconstruct, once the corruption is finally realized >> ) with an end-of-recovery checkpoint record and continue to chew up >> real data from there. > I've been worried about that for a while, and I may have even seen > something like this happen before. We could perhaps do some checks, but > in general it seems hard to solve without writing flushing some data to > two different places. For example, you could flush WAL, and then update > an LSN stored somewhere else indicating how far the WAL has been > written. Recovery could complain if it gets an error in the WAL before > that point. > But obviously, that makes WAL flushes expensive (in many cases, about > twice as expensive). > Maybe it's not out of the question to offer that as an option if nobody > has a better idea. Performance-conscious users could place the extra LSN > on an SSD or NVRAM or something; or maybe use commit_delay or async > commits. It would only need to store a few bytes. At least on traditional rotating media, this is only going to perform well if you dedicate two drives to the purpose. At which point you might as well just say "let's write two copies of WAL". Or maybe three copies, so that you can take a vote when they disagree. While this is not so unreasonable on its face for ultra-high-reliability requirements, I can't escape the feeling that we'd just be reinventing software RAID. There's no reason to think that we can deal with this class of problems better than the storage system can. > Streaming replication mitigates the problem somewhat, by being a second > place to write WAL. Yeah, if you're going to do this at all it makes more sense for the redundant copies to be on other machines. So the questions that that leads to are how smart is our SR code about dealing with a master that tries to re-send WAL regions it already sent, and what a slave ought to do in such a situation if the new data doesn't match. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [PATCH] Exorcise "zero-dimensional" arrays (Was: Re: [HACKERS] Should array_length() Return NULL)
On 5 April 2013 07:43, Tom Lane wrote: > Well, if we're going to take that hard a line on it, then we can't > change anything about array data storage or the existing functions' > behavior; which leaves us with either doing nothing at all, or > inventing new functions that have saner behavior while leaving the > old ones in place. And then we are in the awkward position of trying to explain the differences in behaviour between the old and new functions ... presumably with a dash of "for historical reasons" and a sprinkling of "to preserve compatibility" in every other paragraph. The other suggestion that had been tossed around elsewhere upthread was inventing a new type that serves the demand for a straightforward mutable list, which has exactly one dimension, and which may be sensibly empty. Those few who are interested in dimensions >= 2 could keep on using "arrays", with all their backwards-compatible silliness intact, and everybody else could migrate to "lists" at their leisure. I don't hate the latter idea from a user perspective, but from a developer perspective I suspect there are valid objections to be made. Cheers, BJ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] corrupt pages detected by enabling checksums
On Thu, 2013-04-04 at 21:06 -0400, Tom Lane wrote: > I can't escape the feeling that we'd just be reinventing software RAID. > There's no reason to think that we can deal with this class of problems > better than the storage system can. The goal would be to reliably detect a situation where WAL that has been flushed successfully was later corrupted; not necessarily to recover when we find it. Unless it's something like ZFS, I don't think most software RAID will reliably detect corruption. A side benefit is that it would also help catch bugs like this in the future. I'm not advocating for this particular solution, but I do concur with Jeff Janes that it's a little bit scary and that we can entertain some ideas about how to mitigate it. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [PATCH] Exorcise "zero-dimensional" arrays (Was: Re: [HACKERS] Should array_length() Return NULL)
Brendan Jurd writes: > The other suggestion that had been tossed around elsewhere upthread > was inventing a new type that serves the demand for a straightforward > mutable list, which has exactly one dimension, and which may be > sensibly empty. Those few who are interested in dimensions >= 2 could > keep on using "arrays", with all their backwards-compatible silliness > intact, and everybody else could migrate to "lists" at their leisure. > I don't hate the latter idea from a user perspective, but from a > developer perspective I suspect there are valid objections to be made. The real problem with that is that the existing arrays have glommed onto the syntax that is both most natural and SQL-spec-required. I don't think there is a lot of room to shove in a different kind of critter there. (There's been a remarkable lack of attention to the question of spec compliance in this thread, btw. Surely the standard has something to say on the matter of zero-length arrays?) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [PATCH] Exorcise "zero-dimensional" arrays (Was: Re: [HACKERS] Should array_length() Return NULL)
On 5 April 2013 13:04, Tom Lane wrote: > (There's been a remarkable lack of attention to the question > of spec compliance in this thread, btw. Surely the standard has > something to say on the matter of zero-length arrays?) From 4.10 in my draft copy of "Foundation", arrays are one of two "collection" types (the other being multisets), and: "A collection is a composite value comprising zero or more elements, each a value of some data type DT" "The number of elements in C is the cardinality of C" "An array is a collection A in which each element is associated with exactly one ordinal position in A. If n is the cardinality of A, then the ordinal position p of an element is an integer in the range 1 (one) ≤ p ≤ n." The language specifically allows for zero elements, and does not contemplate multiple dimensions. The specification for the array constructor syntax (6.36) and array element reference by subscript (6.23) also make it fairly clear that only 1-D arrays were being considered. I'd say we've already gone way off-menu by having multidims. A more compliant approach would have been to implement arrays as 1-D only, and then maybe have a separate thing ("matrices"?) for multidims. While I was in there I noticed CARDINALITY, which would be pretty easy to add and would at least provide a more productive way to get the "real" length of an array without disrupting existing functionality: "<cardinality expression> ::= CARDINALITY <left paren> <collection value expression> <right paren>" Cheers, BJ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [PATCH] Exorcise "zero-dimensional" arrays (Was: Re: [HACKERS] Should array_length() Return NULL)
Brendan Jurd writes: > On 5 April 2013 13:04, Tom Lane wrote: >> (There's been a remarkable lack of attention to the question >> of spec compliance in this thread, btw. Surely the standard has >> something to say on the matter of zero-length arrays?) > The language specifically allows for zero elements, and does not > contemplate multiple dimensions. The specification for the array > constructor syntax (6.36) and array element reference by subscript > (6.23) also make it fairly clear that only 1-D arrays were being > considered. > I'd say we've already gone way off-menu by having multidims. Yeah, we knew that. I don't have a problem with seeing multidim arrays as an extension to the standard though. The point is that the behavior for the 1-D, default-lower-bound case ought to match the standard. > While I was in there I noticed CARDINALITY, which would be pretty easy > to add and would at least provide a more productive way to get the > "real" length of an array without disrupting existing functionality: Yeah, that would at least fix the null-result-for-empty-array problem for that particular functionality. Still, this is ammunition for the position that null results for empty arrays are just broken. BTW ... if you check the archives you will find that we had cardinality() for a short while, and removed it before 8.4 release, because we couldn't agree on what it ought to return when given a multi-dimensional array. I'm afraid that issue is still unresolved. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why there is a PG_GETARG_UINT32 and PG_RETURN_UINT32?
On Thursday, April 04, 2013 8:30 PM Rodrigo Barboza wrote: > Hi guys. > I am wondering when I can use the PG_GETARG_UINT32 and PG_RETURN_UINT32. > If postgres has no unsigned int type, what is the use of these macros? They are mainly used for contrib module functionality or some built-in functions which are not exposed. For example, bt_page_items() receives a relation name (text) and a block number (int), but internally the block number is uint32, as the maximum block number can be 0xFFFFFFFE. With Regards, Amit Kapila. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
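As a sketch of how the macros get used, here is a hypothetical contrib-style function (not an existing one); at the SQL level the argument would be declared int4, while the C code reads it through the unsigned macro:

#include "postgres.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(demo_block_number);

/* Hypothetical example only: treat an int4 argument as an unsigned
 * block number inside the C function. */
Datum
demo_block_number(PG_FUNCTION_ARGS)
{
    uint32      blkno = PG_GETARG_UINT32(0);

    /* ... inspect the block here ... */

    PG_RETURN_UINT32(blkno);
}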
Re: [HACKERS] Why there is a PG_GETARG_UINT32 and PG_RETURN_UINT32?
I am creating my own uint32 type and came across these functions. But my args are always of type my_int and one of the signed int types, as Postgres doesn't have unsigned. Could I use those functions in operations between those types? I can't see a place for this; I don't know if I am missing something. On Fri, Apr 5, 2013 at 1:12 AM, Amit Kapila wrote: > On Thursday, April 04, 2013 8:30 PM Rodrigo Barboza wrote: > > > Hi guys. > > I am wondering when I can use the PG_GETARG_UINT32 and PG_RETURN_UINT32. > > If postgres has no unsigned int type, what is the use of these macros? > > They are mainly used for contrib module functionality or some built-in > functions which are not exposed. > For example, bt_page_items() receives relation name (text) and block number > (int), but internally the block number > is uint32 as max blocks can be 0xFFFFFFFE. > > With Regards, > Amit Kapila. > >
Re: [HACKERS] Multi-pass planner
On Friday, April 05, 2013 1:59 AM Robert Haas wrote: > On Thu, Apr 4, 2013 at 2:53 PM, Dimitri Fontaine > wrote: > > Robert Haas writes: > >> for estimate_worstcase_fraction. So, when computing the cost of a > >> path, we'd compute our current expected-case estimate, and also a > >> worst-case estimate, and then compute the final cost as: > > > > There also was the idea for the executor to be able to handle > alternate > > plans and some heuristic to determine that the actual cost of running > a > > plan is much higher than what's been estimated, so much so as to > switch > > to starting from scratch with the other plan instead. > > Yeah. The thing is, if the plan has any side effects, that's not > really an option. And even if it doesn't, it may throw away a lot of > work. Why throw away all the work? It could try to repair the plan, or, if plan repair is not possible, it could keep multiple plans and next time try to choose the best among the available ones. With Regards, Amit Kapila. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [PATCH] Exorcise "zero-dimensional" arrays (Was: Re: [HACKERS] Should array_length() Return NULL)
On 5 April 2013 15:05, Tom Lane wrote: > Brendan Jurd writes: >> While I was in there I noticed CARDINALITY, which would be pretty easy >> to add and would at least provide a more productive way to get the >> "real" length of an array without disrupting existing functionality: > > Yeah, that would at least fix the null-result-for-empty-array problem > for that particular functionality. Still, this is ammunition for the > position that null results for empty arrays are just broken. > > BTW ... if you check the archives you will find that we had > cardinality() for a short while, and removed it before 8.4 release, > because we couldn't agree on what it ought to return when given a > multi-dimensional array. I'm afraid that issue is still unresolved. Well for what it's worth I would expect cardinality() to return the total number of elements in the array (per ArrayGetNItems). It's consistent with the spec's identification of an array as a "collection". You can chunk the elements into dimensions however you want, but it's still a collection of elements, and the cardinality is still the number of elements. The "nesting" interpretation doesn't accord with our internal representation, nor with our requirement that multidim arrays be regular, nor with the fact that we can't put an array of texts inside an array of ints. Our array input syntaxes for multidim arrays look nest-ish but what they produce is not nested. Cheers, BJ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
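Under that reading, a cardinality() built on ArrayGetNItems would behave as in the sketch below; the function did not exist at the time of writing, so the results shown are only the expected ones:

SELECT cardinality(ARRAY[1,2,3]);           -- 3
SELECT cardinality(ARRAY[[1,2],[3,4]]);     -- 4: every element counts, dimensions don't
SELECT cardinality('{}'::int[]);            -- 0 for an empty array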
Re: [HACKERS] Page replacement algorithm in buffer cache
On Thursday, April 04, 2013 6:12 PM Robert Haas wrote: > On Wed, Apr 3, 2013 at 9:49 PM, Greg Smith > wrote: > > On 4/2/13 11:54 AM, Robert Haas wrote: > >> But, having said that, I still think the best idea is what Andres > >> proposed, which pretty much matches my own thoughts: the bgwriter > >> needs to populate the free list, so that buffer allocations don't > have > >> to wait for linear scans of the buffer array. > > > > I was hoping this one would make it to a full six years of being on > the TODO > > list before it came up again, missed it by a few weeks. The funniest > part > > is that Amit even submitted a patch on this theme a few months ago > without > > much feedback: > > http://www.postgresql.org/message-id/6C0B27F7206C9E4CA54AE035729E9C382852FF97@szxeml509-mbs > > That stalled where a few things have, on a) needing more regression > test > > workloads, and b) wondering just what the deal with large > shared_buffers > > setting degrading performance was. > > Those are impressive results. I think we should seriously consider > doing something like that for 9.4. TBH, although more workloads to > test is always better, I don't think this problem is so difficult that > we can't have some confidence in a theoretical analysis. If I read > the original thread correctly (and I haven't looked at the patch > itself), the proposed patch would actually invalidate buffers before > putting them on the freelist. That effectively amounts to reducing > shared_buffers, so workloads that are just on the edge of what can fit > in shared_buffers will be harmed, and those that benefit incrementally > from increased shared_buffers will be as well. > > What I think we should do instead is collect the buffers that we think > are evictable and stuff them onto the freelist without invalidating > them. When a backend allocates from the freelist, it can double-check > that the buffer still has usage_count 0. The odds should be pretty > good. But even if we sometimes notice that the buffer has been > touched again after being put on the freelist, we haven't expended all > that much extra effort, and that effort happened mostly in the > background. If we just put the buffer on the freelist, then the next time it gets allocated directly from the buffer hash table, who will remove it from the freelist? Or do you think that, in BufferAlloc, when a buffer is found via the hash table, it should also check whether it is on the freelist and remove it if so? With Regards, Amit Kapila. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
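A rough sketch of the double-check Robert describes above, with illustrative names (pop_from_freelist() is a hypothetical helper; the header-lock and usage_count handling mirrors what the existing clock-sweep allocation already does):

/* Try to satisfy an allocation from the freelist; a NULL return means the
 * caller should fall back to the clock sweep. */
static volatile BufferDesc *
get_buffer_from_freelist(void)
{
    for (;;)
    {
        volatile BufferDesc *buf = pop_from_freelist();     /* hypothetical helper */

        if (buf == NULL)
            return NULL;            /* freelist empty */

        LockBufHdr(buf);
        if (buf->refcount == 0 && buf->usage_count == 0)
            return buf;             /* still evictable; caller keeps the header lock */

        /* touched again since it was added to the freelist: skip it */
        UnlockBufHdr(buf);
    }
}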
Re: [HACKERS] Why there is a PG_GETARG_UINT32 and PG_RETURN_UINT32?
On Friday, April 05, 2013 10:00 AM Rodrigo Barboza wrote: > I am creating my own uint32 type and faced this function. > But my args are always of type my_int and one of the signed int types as postgres doesn't hava unsigned. > Could I use those functions in operations between those types? It should not be a problem if your signed type doesn't have any negative value. > I can't see a place for this, I don't know if I am missing something >On Fri, Apr 5, 2013 at 1:12 AM, Amit Kapila wrote: >>On Thursday, April 04, 2013 8:30 PM Rodrigo Barboza wrote: >> Hi guys. >> I am wondering when I can use the PG_GETARG_UINT32 and PG_RETURN_UINT32. >> If postgres has no unsigned int type, what is the use of these macros? > They are mainly used for contrib module functionality or some built-in > functions which are not exposed. > For example, bt_page_items() receives relation name (text) and block number > (int), but internally the block number > is unit32 as max blocks can be oxFFFE. With Regards, Amit Kapila. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers