Re: [PATCHES] [HACKERS] Infrastructure changes for recovery

2008-09-29 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
> On Mon, 2008-09-29 at 10:13 -0400, Tom Lane wrote:
>> ... If we crash and restart, we'll have to get to the end
>> of this file before we start letting backends in; which might be further
>> than we actually got before the crash, but not too much further because
>> we already know the whole WAL file is available.

> Don't want to make it per file though. Big systems can whizz through WAL
> files very quickly, so we either make it a big number e.g. 255 files per
> xlogid, or we make it settable (and recorded in pg_control).

I think you are missing the point I made above.  If you set the
okay-to-resume point N files ahead, and then the master stops generating
files so quickly, you've got a problem --- it might be a long time until
the slave starts letting backends in after a crash/restart.

Fetching a new WAL segment from the archive is expensive enough that an
additional write/fsync per cycle doesn't seem that big a problem to me.
There are almost certainly a few fsync-equivalents going on in the
filesystem to create and delete the retrieved segment files.

regards, tom lane

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] [HACKERS] Infrastructure changes for recovery

2008-09-29 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
> I think we can get away with writing the LSN value to disk, as you
> suggested, but only every so often. No need to do it after every WAL
> record, just consistently every so often, so it gives us a point at
> which we know we are safe.

Huh?  How does that make you safe?  What you need to know is the max
LSN that could possibly be on disk.

Hmm, actually we could get away with tying this to fetching WAL files
from the archive.  When switching to a new WAL file, write out the
*ending* WAL address of that file to pg_control.  Then process the WAL
records in it.  Whether or not any of the affected pages get to disk,
we know that there is no LSN on disk exceeding what we already put in
pg_control.  If we crash and restart, we'll have to get to the end
of this file before we start letting backends in; which might be further
than we actually got before the crash, but not too much further because
we already know the whole WAL file is available.

Or is that the same thing you were saying?  The detail about using
the end address seems fairly critical, and you didn't mention it...
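
A minimal sketch of the invariant described above (all names invented here,
not the actual patch): before replaying a WAL segment fetched from the
archive, persist that segment's *ending* address.  No page on disk can then
carry an LSN beyond the persisted value, so after a crash it is safe to admit
backends only once replay has passed it.

```c
#include <stdint.h>

#define WAL_SEG_SIZE ((uint64_t) 16 * 1024 * 1024)  /* default segment size */

typedef struct
{
    uint64_t    minRecoveryPoint;   /* highest LSN that might be on disk */
} ControlFileSketch;

/* called when a new segment has been retrieved, before replaying it */
static void
note_segment_fetched(ControlFileSketch *cf, uint64_t seg_start)
{
    cf->minRecoveryPoint = seg_start + WAL_SEG_SIZE;    /* ending address */
    /* a real implementation would write and fsync pg_control here */
}

/* after a crash/restart: may we let backends in yet? */
static int
safe_to_admit_backends(const ControlFileSketch *cf, uint64_t replayed_upto)
{
    return replayed_upto >= cf->minRecoveryPoint;
}
```

The write/fsync happens only once per segment fetch, which is why the cost
argument in the first message above considers it acceptable.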

        regards, tom lane



Re: [PATCHES] [HACKERS] Infrastructure changes for recovery

2008-09-29 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
> ... That kinda works, but the problem is that restartpoints are time based,
> not log based. We need them to be deterministic for us to rely upon them
> in the above way.

Right, but the performance disadvantages of making them strictly
log-distance-based are pretty daunting.  We don't really want slaves
doing that while they're in catchup mode.

    regards, tom lane



Re: [PATCHES] [HACKERS] Infrastructure changes for recovery

2008-09-28 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
>> It does nothing AFAICS for the
>> problem that when restarting archive recovery from a restartpoint,
>> it's not clear when it is safe to start letting in backends.  You need
>> to get past the highest LSN that has made it out to disk, and there is
>> no good way to know what that is.

> AFAICS when we set minRecoveryLoc we *never* unset it. It's recorded in
> the controlfile, so whenever we restart we can see that it has been set
> previously and now we are beyond it.

Right ...

> So if we crash during recovery and
> then restart *after* we reached minRecoveryLoc then we resume in safe
> mode almost immediately.

Wrong.

What minRecoveryLoc is is an upper bound for the LSNs that might be
on-disk in the filesystem backup that an archive recovery starts from.
(Defined as such, it never changes during a restartpoint crash/restart.)
Once you pass that, the on-disk state as modified by any dirty buffers
inside the recovery process represents a consistent database state.
However, the on-disk state alone is not guaranteed consistent.  As you
flush some (not all) of your shared buffers you enter other
not-certainly-consistent on-disk states.  If we crash in such a state,
we know how to use the last restartpoint plus WAL replay to recover to
another state in which disk + dirty buffers are consistent.  However,
we reach such a state only when we have read WAL to beyond the highest
LSN that has reached disk --- and in recovery mode there is no clean
way to determine what that was.

Perhaps a solution is to make XLogFLush not be a no-op in recovery mode,
but have it scribble a highest-LSN somewhere on stable storage (maybe
scribble on pg_control itself, or maybe better someplace else).  I'm
not totally sure about that.  But I am sure that doing nothing will
be unreliable.
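
A sketch (invented names, not patch code) of the idea in the last paragraph:
instead of being a no-op in recovery, XLogFlush remembers the highest LSN it
has ever been asked to flush.  Persisted to stable storage, that value bounds
the LSNs that can be on disk, which is exactly what a crashed slave needs to
know before letting backends in.

```c
#include <stdint.h>

static uint64_t highestFlushedLSN = 0;  /* would live in pg_control or elsewhere */

/* recovery-mode XLogFlush: just track the max LSN handed to us */
static void
recovery_xlog_flush(uint64_t record_lsn)
{
    if (record_lsn > highestFlushedLSN)
    {
        highestFlushedLSN = record_lsn;
        /* a real implementation would write+fsync this to stable storage */
    }
}

/* safe once replay has passed everything that might already be on disk */
static int
past_highest_flushed(uint64_t replayed_upto)
{
    return replayed_upto >= highestFlushedLSN;
}
```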

regards, tom lane



Re: [PATCHES] [HACKERS] get_relation_stats_hook()

2008-09-28 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
> New version of Postgres patch, v5. Implements suggested changes.
> Ready for review and apply.

Applied with some revisions.  The method for passing back freefunc
didn't work, so I made it pass the whole VariableStatData struct
instead; this might allow some additional flexibility by changing other
fields besides the intended statsTuple and freefunc.  Also, I was still
unhappy about adding a hook in the midst of code that clearly needs
improvement, without making it possible for the hook to override the
adjacent broken code paths; so I refactored the API a bit for that too.

The plugin function would now be something like this:

static bool
plugin_get_relation_stats(PlannerInfo *root,
                          RangeTblEntry *rte,
                          AttrNumber attnum,
                          VariableStatData *vardata)
{
    HeapTuple   statstup = NULL;

    /* For now, we only cover the simple-relation case */
    if (rte->rtekind != RTE_RELATION || rte->inh)
        return false;

    if (!get_tom_stats_tupletable(rte->relid, attnum))
        return false;

    /*
     * Get stats if present. We asked for only one row, so no need for loops.
     */
    if (SPI_processed > 0)
        statstup = SPI_copytuple(SPI_tuptable->vals[0]);

    SPI_freetuptable(SPI_tuptable);
    SPI_finish();

    if (!statstup)
        return false;           /* should this happen? */

    vardata->statsTuple = statstup;
    /* define function to use when time to free the tuple */
    vardata->freefunc = heap_freetuple;

    return true;
}

and if you want to insert stats for expression indexes then there's a
separate get_index_stats_hook for that.

regards, tom lane



Re: [PATCHES] [HACKERS] Infrastructure changes for recovery

2008-09-28 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
> On Thu, 2008-09-25 at 18:28 -0400, Tom Lane wrote:
>> After reading this for awhile, I realized that there is a rather
>> fundamental problem with it: it switches into "consistent recovery"
>> mode as soon as it's read WAL beyond ControlFile->minRecoveryPoint.
>> In a crash recovery situation that typically is before the last
>> checkpoint (if indeed it's not still zero), and what that means is
>> that this patch will activate the bgwriter and start letting in
>> backends instantaneously after a crash, long before we can have any
>> certainty that the DB state really is consistent.
>> 
>> In a normal crash recovery situation this would be easily fixed by
>> simply not letting it go to "consistent recovery" state at all, but
>> what about recovery from a restartpoint?  We don't want a slave that's
>> crashed once to never let backends in again.  But I don't see how to
>> determine that we're far enough past the restartpoint to be consistent
>> again.  In crash recovery we assume (without proof ;-)) that we're
>> consistent once we reach the end of valid-looking WAL, but that rule
>> doesn't help for a slave that's following a continuing WAL sequence.
>> 
>> Perhaps something could be done based on noting when we have to pull in
>> a WAL segment from the recovery_command, but it sounds like a pretty
>> fragile assumption.

> Seems like we just say we only signal the postmaster if
> InArchiveRecovery. Archive recovery from a restartpoint is still archive
> recovery, so this shouldn't be a problem in the way you mention. The
> presence of recovery.conf overrides all other cases.

What that implements is my comment that we don't have to let anyone in
at all during a plain crash recovery.  It does nothing AFAICS for the
problem that when restarting archive recovery from a restartpoint,
it's not clear when it is safe to start letting in backends.  You need
to get past the highest LSN that has made it out to disk, and there is
no good way to know what that is.

Unless we can get past this problem the whole thing seems a bit dead in
the water :-(

>> * I'm a bit uncomfortable with the fact that the
>> IsRecoveryProcessingMode flag is read and written with no lock.

> It's not a dynamic state, so I can fix that inside
> IsRecoveryProcessingMode() with a local state to make check faster.

Erm, this code doesn't look like it can allow IsRecoveryProcessingMode
to become locally true in the first place?  I guess you could fix it
by initializing IsRecoveryProcessingMode to true, but that seems likely
to break other places.  Maybe better is to have an additional local
state variable showing whether the flag has ever been fetched from
shared memory.
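
A sketch of the local-state idea being discussed (illustrative names only):
each backend caches the flag once it has observed it false, since recovery
mode is never re-entered.  The extra local boolean distinguishes "never
fetched" from "fetched and found true", avoiding the need to initialize the
local copy to true.

```c
#include <stdbool.h>

static bool shared_in_recovery = true;  /* stands in for the shmem flag */

static bool
IsRecoveryProcessingModeSketch(void)
{
    static bool localKnownFalse = false;

    if (localKnownFalse)
        return false;           /* cheap fast path, no shared-memory access */

    if (!shared_in_recovery)    /* one real fetch from shared memory */
    {
        localKnownFalse = true; /* false can never revert to true */
        return false;
    }
    return true;
}
```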

The other issues don't seem worth arguing about ...

regards, tom lane



Re: [PATCHES] [HACKERS] Infrastructure changes for recovery

2008-09-25 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
> Version 7

After reading this for awhile, I realized that there is a rather
fundamental problem with it: it switches into "consistent recovery"
mode as soon as it's read WAL beyond ControlFile->minRecoveryPoint.
In a crash recovery situation that typically is before the last
checkpoint (if indeed it's not still zero), and what that means is
that this patch will activate the bgwriter and start letting in
backends instantaneously after a crash, long before we can have any
certainty that the DB state really is consistent.

In a normal crash recovery situation this would be easily fixed by
simply not letting it go to "consistent recovery" state at all, but
what about recovery from a restartpoint?  We don't want a slave that's
crashed once to never let backends in again.  But I don't see how to
determine that we're far enough past the restartpoint to be consistent
again.  In crash recovery we assume (without proof ;-)) that we're
consistent once we reach the end of valid-looking WAL, but that rule
doesn't help for a slave that's following a continuing WAL sequence.

Perhaps something could be done based on noting when we have to pull in
a WAL segment from the recovery_command, but it sounds like a pretty
fragile assumption.

Anyway, that's sufficiently bad that I'm bouncing the patch for
reconsideration.  Some other issues I noted before giving up:

* I'm a bit uncomfortable with the fact that the
IsRecoveryProcessingMode flag is read and written with no lock.
There's no atomicity or concurrent-write problem, sure, but on
a multiprocessor with weak memory ordering guarantees (eg PPC)
readers could get a fairly stale value of the flag.  The false
to true transition happens before anyone except the startup process is
running, so that's no problem; the net effect is then that backends
might think that recovery mode was still active for awhile after it
wasn't.  This seems a bit scary, eg in the patch as it stands that'll
disable XLogFlush calls that should have happened.  You could fix that
by grabbing/releasing some spinlock (any old one) around the accesses,
but are any of the call sites performance-critical?  The one in
XLogInsert looks like it is, if nothing else.
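
A modern-C analogue of the concern (standalone illustration, not patch code):
on weakly ordered hardware a plain load can return a stale value; an
acquire load pairing with a release store, like the spinlock Tom suggests,
supplies the missing ordering, so a backend that sees the flag go false also
sees every write the startup process made before clearing it.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* stands in for the shared-memory flag */
static atomic_bool recoveryProcessingMode = true;

/* reader: acquire load pairs with the writer's release store */
static bool
read_recovery_mode(void)
{
    return atomic_load_explicit(&recoveryProcessingMode, memory_order_acquire);
}

/* startup process: publish the end of recovery */
static void
end_recovery_mode(void)
{
    atomic_store_explicit(&recoveryProcessingMode, false, memory_order_release);
}
```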

* I kinda think you broke XLogFlush anyway.  It's certainly clear
that the WARNING case at the bottom is unreachable with the patch,
and I think that means that you've messed up an important error
robustness behavior.  Is it still possible to get out of recovery mode
if there are any bad LSNs in the shared buffer pool?

* The use of InRecovery in CreateCheckPoint seems pretty bogus, since
that function can be called from the bgwriter in which the flag will
never be true.  Either this needs to be IsRecoveryProcessingMode(),
or it's useless because we'll never create ordinary checkpoints until
after subtrans.c is up anyway.

* The patch moves the clearing of InRecovery from after to before
StartupCLOG, RecoverPreparedTransactions, etc.  Is that really a
good idea?  It makes it hard for those modules to know if they are
coming up after a normal or recovery startup.  I think they may not
care at the moment, but I'd leave that alone without a darn good
reason to change it.

* The comment about CheckpointLock being only pro forma is now wrong,
if you are going to use locking it to implement a delay in exitRecovery.
But I don't understand why the delay there.  If it's needed, seems like
the path where a restartpoint *isn't* in progress is wrong --- don't you
actually need to start one and wait for it?  Also I note that if the 
LWLockConditionalAcquire succeeds, you neglect to release the lock,
which surely can't be right.

* The tail end of StartupXLOG() looks pretty unsafe to me.  Surely
we mustn't clear IsRecoveryProcessingMode until after we have
established the safe-recovery checkpoint.  The comments there seem to
be only vaguely related to the current state of the patch, too.

* Logging of recoveryLastXTime seems pretty bogus now.  It's supposed to
be associated with a restartpoint completion report, but now it's just
out in the ether somewhere and doesn't represent a guarantee that we're
synchronized that far.

* backup.sgml needs to be updated to say that log_restartpoints is
obsolete.

* There are a bunch of disturbing assumptions in the SLRU-related
modules about their StartUp calls being executed without any concurrent
access.  This isn't really a problem that needs to be dealt with in this
patch, I think, but that will all have to be cleaned up before we dare
allow any backends to run concurrently with recovery.  btree's
suppression of relcache invals for metapage updates will be a problem
too.

regards, tom lane



Re: [PATCHES] hash index improving v3

2008-09-23 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
> Thinks: Why not just sort all of the time and skip the debate entirely?

The sort is demonstrably a loser for smaller indexes.  Admittedly,
if the index is small then the sort can't cost all that much, but if
the (correct) threshold is some large fraction of shared_buffers then
it could still take awhile on installations with lots-o-buffers.

The other side of that coin is that it's not clear this is really worth
arguing about, much less exposing a separate parameter for.

        regards, tom lane



Re: [PATCHES] hash index improving v3

2008-09-23 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
> maintenance_work_mem is already used for 3 separate operations that bear
> little resemblance to each other. If it's appropriate for all of those
> then its appropriate for this usage also.

No, it isn't.

The fundamental point here is that this isn't a memory allocation
parameter; it's a switchover threshold between two different behaviors.

    regards, tom lane



Re: [PATCHES] hash index improving v3

2008-09-22 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
> I wasn't very happy with effective_cache_size and not happy with
> shared_buffers either. If building hash indexes is memory critical then
> we just need to say so and encourage others to set memory use correctly.
> People are already aware that maintenance_work_mem needs to be increased
> for large index builds and we will confuse people if we ignore that and
> use another parameter instead.

I think you've got this completely backwards.  The general theory about
maintenance_work_mem is "set it as large as you can stand it".  The
issue at hand here is that the crossover point for hash index sort
building seems to be a good deal less than all-the-memory-you-have.

Perhaps there is a case for giving this behavior its very own
configuration parameter; but seeing that we still don't have all that
much of a use case for hash indexes at all, I don't feel a need to do
that yet.  In any case, tying it to maintenance_work_mem is certainly
wrong.

regards, tom lane



Re: [PATCHES] hash index improving v3

2008-09-22 Thread Tom Lane
"Jonah H. Harris" <[EMAIL PROTECTED]> writes:
> On Mon, Sep 22, 2008 at 11:25 PM, Alex Hunsaker <[EMAIL PROTECTED]> wrote:
>> On Sun, Sep 14, 2008 at 8:16 PM, Tom Lane <[EMAIL PROTECTED]> wrote:
>>> I'm considering changing hashbuild to switch over at shared_buffers instead
>>> of effective_cache_size --- any thoughts about that?
>> 
>> Switching to shared_buffers gets my vote, on my test table with
>> 50,000,000 rows it takes about 8 minutes to create an index using the
>> default effective_cache_size.  With effective_cache_size set to 6GB
>> (machine has 8GB) its still going an hour later.

> Agreed.  I think using shared_buffers as a cutoff is a much better idea as 
> well.

Already done in CVS a week or so back, but thanks for following up with
some confirmation.

regards, tom lane



Re: [PATCHES] [HACKERS] Infrastructure changes for recovery

2008-09-18 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
> On Thu, 2008-09-18 at 09:06 -0400, Tom Lane wrote:
>> Do we really need a checkpoint there at all?

> "Timelines only change at shutdown checkpoints".

Hmm.  I *think* that that is just a debugging crosscheck rather than a
critical property.  But yeah, it would take some close investigation,
which maybe isn't warranted if you have a less-invasive solution.

    regards, tom lane



Re: [PATCHES] [HACKERS] Infrastructure changes for recovery

2008-09-18 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
> Having some trouble trying to get a clean state change from recovery to
> normal mode.

> Startup needs to be able to write WAL at the end of recovery so it can
> write a ShutdownCheckpoint, yet must not be allowed to write WAL before
> that. Other services are still not fully up yet. So every other process
> apart from Startup musn't write WAL until Startup has fully completed
> its actions, which is sometime later.
> ...
> We *might* solve this by making the final checkpoint that Startup writes
> into an online checkpoint.

Do we really need a checkpoint there at all?  I can't recall right now
if there was a good reason why Vadim made it do that.  I have a feeling
that it might have had to do with an assumption that the recovery
environment was completely distinct from live environment; which is a
statement that's certainly not going to be true in a query-answering
slave.

regards, tom lane



Re: [PATCHES] [HACKERS] Infrastructure changes for recovery

2008-09-16 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> wrote:
> Testing takes a while on this, I probably won't complete it until
> Friday. So enclosed patch is for eyeballs only at this stage.

What's the status on that patch?

        regards, tom lane



Re: [PATCHES] hash index improving v3

2008-09-14 Thread Tom Lane
I did some testing of my own on the hash index patch, and mostly seem to
have confirmed Alex's results.  I used a table like this:

create table tab (f1 serial primary key, f2 some-datatype);
create index ind on tab using hash (f2);

and populated it with 2 million rows of data; then timed queries
like this:

select * from tab a join tab b using(f2)
  where f2 = (select f2 from tab c
              where c.f1 = (select int4(random() * 2e6)));

using pgbench like this:

pgbench -n -c 1 -T 60 -M prepared -f query.sql hashdb

To test "wide" indexed columns I used a text column with entries of 100
random characters (identical to Alex's test).  I saw a performance gain
of about 50% with an empty kernel cache (26.9 vs 41.9 tps), dropping to
about 14% once the table and index were fully swapped in (4185 vs 4764
tps).  This was in a C-locale database.  Presumably the win would have
been significantly greater in a non-C-locale test, but I didn't try it.

To test "narrow" indexed columns I made f2 a bigint containing 200
consecutive integers.  Here I saw about a 5% improvement with either
empty cache (48.1 vs 50.5 tps) or full cache (4590 vs 4800 tps).
This is not too surprising since about the same amount of I/O is needed
either way, and bigint comparison is very cheap.  (This is a 64-bit
machine, and it's defaulting to pass-by-value for bigints, so value
comparisons are hardly more expensive than hashcode comparisons.)
But the patch still wins a bit by being able to do binary search within
index pages.

In both of the above cases there were only a negligible number of hash
collisions.  I made up another test case, still 2 million bigints, but
with values chosen so that almost every entry had a hash collision with
another entry (a small fraction had two or even 3 collisions).  This
showed about a 25% slowdown compared to CVS HEAD with empty cache
(49.9 tps down to 37.2), decreasing to 3% with full cache (4609 vs 4482
tps).

I experimented with some variant queries that did more hash index
fetches per query, and saw penalties as high as 50%.  However, this is
surely by far the worst case for the patch: no savings in index size,
and a ridiculously high collision rate.

Lastly, for comparison's sake I tried the "wide column" case with a
btree instead of hash index.  This gave me 31.5 tps with empty cache,
4749 tps with full cache.  Note that the btree is losing significantly
to the patched hash index in empty-cache conditions --- this presumably
reflects the much larger index size causing more I/O.

I'm thinking that we should go ahead and apply the patch.  AFAIR this is
the first example we've ever seen of hash beating btree for fetch
performance, and it's pretty obvious that removing the indexed value
from the index contents is the reason for the win.  So I've lost
interest in the alternative that involved storing both hashcode and
indexed value.  We now see a possible production use-case for hash,
namely indexing wide column values.

BTW, one thing I noticed was that the hash index build time for the
"wide column" case got a lot worse after applying the patch (from 56 to
237 sec).  The reason for this turned out to be that with the smaller
predicted index size, the code decided not to use the pre-sorting method
that was recently added.  Reducing effective_cache_size to less than the
index size brought the time back down, to about 54 sec.  So it would
seem that effective_cache_size is too large a cutoff value.  I'm
considering changing hashbuild to switch over at shared_buffers instead
of effective_cache_size --- any thoughts about that?
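
The decision being tuned can be sketched like this (illustrative code, not
hashbuild itself; the page counts in the comments are made up): pre-sort the
index entries only when the predicted index size exceeds the chosen
threshold, since random insertion is faster while the index still fits in
cache.  The change under discussion swaps the threshold from
effective_cache_size down to shared_buffers.

```c
#include <stdint.h>

/* decide whether building the hash index should use the pre-sorting path */
static int
use_presort(uint64_t predicted_index_pages, uint64_t threshold_pages)
{
    /* beyond the threshold, random I/O dominates and sorting wins */
    return predicted_index_pages >= threshold_pages;
}
```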

regards, tom lane



Re: [PATCHES] [PgFoundry] Unsigned Data Types [1 of 2]

2008-09-09 Thread Tom Lane
"Ryan Bradetich" <[EMAIL PROTECTED]> writes:
> I am assuming you are seeing this error in the uint_test1.sql:
> ERROR:  could not find hash function for hash operator 16524
> I can bypass the error in uint_test1.sql by disabling the hash joins.
> I am going to dig in and figure out why the hashjoin operation is broken.

Well, the cause of that one would've been marking an operator as HASHES
without providing a hash opclass to back it up.

IIRC the test case involved ">"?  That shouldn't even be marked HASHES
anyway ...

regards, tom lane



Re: [PATCHES] [HACKERS] Infrastructure changes for recovery

2008-09-08 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
> On Mon, 2008-09-08 at 13:34 -0400, Tom Lane wrote:
>> Hmm, I dunno, it seems like that might be a bad choice.  Are you sure
>> it's not cleaner to just use the regular checkpoint code?

> When I tried to write it, it just looked to my eyes like every single
> line had a caveat which looked ugly and multiplied the testing. You're
> the code dude, always happy to structure things as you suggest. If
> you're sure, that is.

No, was just wondering if the other way would be better.  If you're
sure not, that's fine.

regards, tom lane



Re: [PATCHES] [HACKERS] Infrastructure changes for recovery

2008-09-08 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
> Included patch with the following changes:

> * new postmaster mode known as consistent recovery, entered only when
> recovery passes safe/consistent point. InRedo is now set in all
> processes when started/connected in consistent recovery mode.

I looked at this and didn't like the InRedo signaling at all.  In the
first place, it looks like you're expecting the flag to be passed down
via fork(), but that won't work in EXEC_BACKEND mode.  (Yes, easily
fixed, but it's wrong as-is.)  In the second place, the method of
signaling it to the bgwriter looks to have race conditions: in
principle, at least, I think the startup process could try to clear
the shared-memory InRedo flag before the bgwriter has gotten as far as
setting it.  (This might work if the startup process can't exit redo
mode without the bgwriter having finished a checkpoint, but such a
dependency is too rube goldbergian for me.)

ISTM that it would probably be better if there were exactly one InRedo
flag in shared memory, probably in xlog.c's shared state, with the
postmaster not being responsible for setting or clearing it; rather
the startup process should do those things.

> * bgwriter and stats process starts in consistent recovery mode.
> bgwriter changes mode when startup process completes.

I'm not sure about the interaction of this.  In particular, what about
recovery restart points before we have reached the safe stop point?
I don't think we want to give up the capability of having those.

Also, it seems pretty bogus to update the in-memory ControlFile
checkpoint values before the restart point is actually done.  It looks
to me like what you have done is to try to use those fields as signaling
for the restart request in addition to their existing purposes, which
I think is confusing and probably dangerous.  I'd rather there were a
different signaling path and ControlFile maintains its current
definition.

I found the changes in bgwriter.c unreadable.  You've evidently
moved blocks of code around, but exactly what did you mean to
change?  Why is there so much duplicated code now?

> I've kept the distinction between restartpoints and checkpoints in
> code, to avoid convoluted code.

Hmm, I dunno, it seems like that might be a bad choice.  Are you sure
it's not cleaner to just use the regular checkpoint code?

regards, tom lane



Re: [PATCHES] hash index improving v3

2008-09-08 Thread Tom Lane
Zdenek Kotala <[EMAIL PROTECTED]> writes:
> I attach cleanup patch which we discussed last commit fest. It introduce new 
> macros HashGetMetaPage and HashGetBitmap and of course, it break backward on 
> disk format compatibility which we will break it anyway when Xiao's patch
> will be committed.

If we accept that patch, I'm okay with including this one too.  Still
hoping for some performance testing though.  (Note that this one isn't
going to affect performance, so no need to include it for that purpose.)

    regards, tom lane



Re: [PATCHES] [PgFoundry] Unsigned Data Types [1 of 2]

2008-09-07 Thread Tom Lane
"Jaime Casanova" <[EMAIL PROTECTED]> writes:
> On Sun, Sep 7, 2008 at 2:41 AM, Tom Lane <[EMAIL PROTECTED]> wrote:
>> That obviously isn't supposed to happen.  Where's it coming from
>> exactly?

> convert_numeric_to_scalar() in src/backend/utils/adt/selfuncs.c

> the problem seems to be that we are asking for each type of numeric
> and of course that doesn't know nothing about unsigned integers so its
> treating it as a non-numeric.

Ah.  The scalarltsel/scalargtsel stuff has always been a bit bogus for
cross-type comparisons; it assumes it will know both or neither of the
two datatypes.  So you can get away with using those functions for
uint > uint (although they won't be very bright about it); but using
them for uint > int fails outright.

If you read the comments around that stuff it leaves quite a lot to be
desired, but I don't really have better ideas at the moment.  The best
near-term solution for the uint module is probably not to rely on
scalarltsel/scalargtsel for uint comparisons, but to make its own
selectivity functions that know the uint types plus whatever standard
types you want to have comparisons with.

regards, tom lane



Re: [PATCHES] [PgFoundry] Unsigned Data Types [1 of 2]

2008-09-07 Thread Tom Lane
"Jaime Casanova" <[EMAIL PROTECTED]> writes:
> contrib_regression=# select * from t1 where f1 > 35;
> ERROR:  unsupported type: 16486

That obviously isn't supposed to happen.  Where's it coming from
exactly?

    regards, tom lane



Re: [PATCHES] [PgFoundry] Unsigned Data Types [1 of 2]

2008-09-06 Thread Tom Lane
"Jaime Casanova" <[EMAIL PROTECTED]> writes:
> then the patch is right, but it seems to me that this breaks the
> principle of least surprise: I expected -2::uint1 to be equivalent to
> (-2)::uint1. That should at least be documented, no?

See the precedence table here:
http://www.postgresql.org/docs/8.3/static/sql-syntax-lexical.html#SQL-PRECEDENCE

:: binds more tightly than -, and always has.
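Concretely (uint1 is the type from the proposed module, so this is illustrative rather than something you can run against stock PostgreSQL):

```sql
SELECT -2::uint1;    -- parsed as -(2::uint1): the cast binds first, then negation
SELECT (-2)::uint1;  -- casts the negative value itself; ought to raise "uint1 out of range"
```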

    regards, tom lane



Re: [PATCHES] [PgFoundry] Unsigned Data Types [1 of 2]

2008-09-06 Thread Tom Lane
"Jaime Casanova" <[EMAIL PROTECTED]> writes:
> On Sat, Sep 6, 2008 at 3:57 PM, Tom Lane <[EMAIL PROTECTED]> wrote:
>>> postgres=# select -256::uint1;
>>> ERROR:  uint1 out of range
>> 
>> No, that's just because this is parsed as -(256::uint1)

> actually, I thought that case was right, but -255::uint1 returning a
> negative number (i.e. -255) is what bothers me

Well, again, that's -(255::uint1).  I suppose uint1 hasn't got a
negation operator (what would it do??), so probably the sequence of
events is to form 255::uint1, then implicitly promote it to some signed
type or other (most likely int4), then negate.  Not much to be done
about this unless you want to get rid of the implicit coercion to signed
types, which would probably defeat most of the purpose.

Now, if (-255)::uint1 fails to throw error, that would be a bug IMHO.
Casting any negative value to uint ought to fail, no?

regards, tom lane



Re: [PATCHES] [HACKERS] TODO item: Implement Boyer-Moore searching (First time hacker)

2008-09-06 Thread Tom Lane
I wrote:
> I looked this over a bit and was immediately confused by one thing:
> the introductory comment says that the skip table size ought to be based
> on the length of the haystack, which makes sense to me, but the code is
> actually initializing it on the basis of len2, ie, the length of the
> needle.  Isn't that a bug?  Was the same bug present in the tests you
> made to determine the best table sizes?

BTW, to the extent that you feel like testing a different idea,
I would suggest:

* don't bother initializing the skiptable when len1 < len2

* otherwise, choose its size based on len1 - len2, not just len1 or
len2.  This is (one less than) the maximum number of search loop
consultations of the skip table that can happen, so it seems like a
plausible number, and better than either length alone.

        regards, tom lane



Re: [PATCHES] [HACKERS] TODO item: Implement Boyer-Moore searching (First time hacker)

2008-09-06 Thread Tom Lane
"David Rowley" <[EMAIL PROTECTED]> writes:
> I've made and attached the changes Heikki recommended.

I looked this over a bit and was immediately confused by one thing:
the introductory comment says that the skip table size ought to be based
on the length of the haystack, which makes sense to me, but the code is
actually initializing it on the basis of len2, ie, the length of the
needle.  Isn't that a bug?  Was the same bug present in the tests you
made to determine the best table sizes?

Stylistic issues:

I really dislike having two copies of the heuristic size-setting code.
I think it's worth introducing a second test of use_wchar in order to
arrange text_position_setup like this:

... existing code ...

choose skiptablesize

initialize skip table (this loop doesn't depend on char width)

if (!state->use_wchar)
process char needle
else
process wide-char needle

Also it might be worth having local-variable copies of skiptablesize and
len2 for this initialization loop:

for (ai = 0; ai <= state->skiptablesize; ai++)
state->skiptable[ai] = state->len2;

I'm not at all sure whether a C compiler will think that it's allowed to
avoid fetching these values again on each iteration, seeing that the
loop is storing into other integer members of *state.  Given that this
loop is exactly the overhead we're trying to control by adjusting
skiptablesize, making it as tight as possible seems worthwhile.

> Now that the skip table is a member of TextPositionState, I was not quite
> sure if I should #define a size for it. It would certainly look neater, only
> the code that defines the skip table size in text_position_start assumes 256
> elements. 

I tend to agree that a #define is pointless there, since there's no easy
way to make the relevant code do anything with it.  It would be
worthwhile to point out adjacent to the code what the max length is,
though.  I had gotten as far as rewriting your introductory comment like
this

/*
 * Prepare the skip table for Boyer-Moore-Horspool searching.  First we
 * must determine how big a skip table to use.  The declaration of
 * TextPositionState allows up to 256 elements, but with haystack lengths
 * of only a few characters we don't really want to have to initialize so
 * many elements --- it would take too long in comparison to the actual
 * search time.  So we choose a useful skip table size based on the
 * haystack length.

before noticing that what I was writing wasn't what the code actually
did.  Another point that doesn't seem to be included in your comments is
that the skip table size *must* be a power of 2 because we use bit
masking.  (In fact state->skiptablesize might better be named skiptablemask
since it's really 1 less than the table size.)

BTW, try to avoid introducing so much random vertical space.  IMHO it
does little for readability and much to make routines too long to fit in
one screenful.  Generally the PG code doesn't use double blank lines
anywhere inside a function, nor blank lines before a right brace line.
pg_indent will clean this up to some extent, but it would be better if
the code looked more like the project style in the first place.

regards, tom lane



Re: [PATCHES] [PgFoundry] Unsigned Data Types [1 of 2]

2008-09-06 Thread Tom Lane
"Jaime Casanova" <[EMAIL PROTECTED]> writes:
> seems there is something wrong in the unlikely macro (i'm using GCC
> 4.2.3 in Ubuntu 4.2.3-2ubuntu7 with amd64)

> postgres=# select -256::uint1;
> ERROR:  uint1 out of range

No, that's just because this is parsed as -(256::uint1)

    regards, tom lane



Re: [PATCHES] hash index improving v3

2008-09-06 Thread Tom Lane
"Alex Hunsaker" <[EMAIL PROTECTED]> writes:
> On Fri, Sep 5, 2008 at 2:21 PM, Alex Hunsaker <[EMAIL PROTECTED]> wrote:
>> Ok now that I made it so it actually *test* collisions, with the patch
>> it always returns all rows that matched the hashed "key".

> And here is the fix, we just forget to set the recheck flag for bitmap scans.

For the convenience of anyone intending to test, here is an updated
patch against CVS HEAD that incorporates Alex's fix.

regards, tom lane



binwQERo3PM5e.bin
Description: hash-v5.patch.gz



Re: [PATCHES] hash index improving v3

2008-09-06 Thread Tom Lane
"Alex Hunsaker" <[EMAIL PROTECTED]> writes:
> - tbm_add_tuples(tbm, &scan->xs_ctup.t_self, 1, false);
> + tbm_add_tuples(tbm, &scan->xs_ctup.t_self, 1, true);

Hah, I bet that explains Jonah's complaint that recheck didn't seem to
be happening in his tests.  Nice catch.

regards, tom lane



Re: [PATCHES] [HACKERS] TODO item: Implement Boyer-Moore searching (First time hacker)

2008-09-05 Thread Tom Lane
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> Also, it would be nice to use B-M(-H) for LIKE as well.

Right offhand, that seems impossible, at least in patterns with %.
Or were you thinking of trying to separate out the fixed substrings
of a pattern and search for them with BMH?

Anyway, it's not material for this patch, since it'd involve pretty
fundamental redesign of the LIKE code.

        regards, tom lane



Re: [PATCHES] [HACKERS] TODO item: Implement Boyer-Moore searching (First time hacker)

2008-09-05 Thread Tom Lane
"David Rowley" <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> Hmm.  B-M-H has worst case search speed O(M*N) (where M = length of
>> pattern, N = length of search string); whereas full B-M is O(N). Maybe we
>> should build the second table when M is large? 

> I'll look into this. If it pays off for longer searches I'll submit a patch.
> I won't have the time until after the 15th, so perhaps that's in November's
> commit fest?

Yes.  Let's get B-M-H in during this fest and then you can look into
whether a follow-on patch is worthwhile.

regards, tom lane



Re: [PATCHES] hash index improving v3

2008-09-04 Thread Tom Lane
"Alex Hunsaker" <[EMAIL PROTECTED]> writes:
> On Thu, Sep 4, 2008 at 7:13 PM, Tom Lane <[EMAIL PROTECTED]> wrote:
>> * check that the queries actually use the indexes (not sure that the
>> proposed switch settings ensure this, not to mention you didn't create
>> the indexes)

> Well I was assuming I could just test the speed of a hash join...

Uh, no, hash joins have nearly zip to do with hash indexes.  They rely
on the same per-datatype support functions but that's the end of the
commonality.

regards, tom lane



Re: [PATCHES] hash index improving v3

2008-09-04 Thread Tom Lane
"Alex Hunsaker" <[EMAIL PROTECTED]> writes:
> On Thu, Sep 4, 2008 at 7:45 PM, Tom Lane <[EMAIL PROTECTED]> wrote:
>> So what we need for testing is a few different key values that hash to
>> the same code.  Not sure about an easy way to find such.

> Hrm, well I have not really looked at the hash algorithm but I assume
> we could just reduce the number of buckets?

No, we need fully equal hash keys, else the code won't visit the heap.

I guess one thing we could do for testing purposes is lobotomize one of
the datatype-specific hash functions.  For instance, make int8_hash
return the input mod 2^32, ignoring the upper bytes.  Then it'd be easy
to compute different int8s that hash to the same thing.

regards, tom lane



Re: [PATCHES] [HACKERS] TODO item: Implement Boyer-Moore searching (First time hacker)

2008-09-04 Thread Tom Lane
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> After reading the wikipedia article on Boyer-Moore search algorithm, it 
> looks to me like this patch actually implements the simpler 
> Boyer-Moore-Horspool algorithm that only uses one lookup table. That's 
> probably fine, as it ought to be faster on small needles and haystacks 
> because it requires less effort to build the lookup tables, even though 
> the worst-case performance is worse. It should still be faster than what 
> we have now.

Hmm.  B-M-H has worst case search speed O(M*N) (where M = length of
pattern, N = length of search string); whereas full B-M is O(N).
Maybe we should build the second table when M is large?  Although
realistically that is probably gilding the lily, since frankly there
haven't been many real-world complaints about the speed of these
functions anyway ...

> The skip table really should be constructed only once in 
> text_position_start and stored in TextPositionState. That would make a 
> big difference to the performance of those functions that call 
> text_position_next repeatedly: replace_text, split_text and text_to_array.

+1.  The heuristic about how big a skip table to make may need some
adjustment as well, since it seems to be considering only a single
search.

regards, tom lane



Re: [PATCHES] hash index improving v3

2008-09-04 Thread Tom Lane
I wrote:
> You have the unique-versus-not dimension,

On second thought, actually not.  What we want to look at is the penalty
for false matches due to *distinct* key values that happen to have the
same hash codes.  Your test case for all-the-same is using all the same
key values, which means it'll hit the heap a lot, but none of those will
be wasted trips.

So what we need for testing is a few different key values that hash to
the same code.  Not sure about an easy way to find such.

regards, tom lane



Re: [PATCHES] hash index improving v3

2008-09-04 Thread Tom Lane
"Alex Hunsaker" <[EMAIL PROTECTED]> writes:
> Ok, let me know if this is too naive an approach, or isn't hitting the
> right cases you want tested.

You have the unique-versus-not dimension, but I'm also wondering about
narrow vs wide index keys (say about 8 bytes vs 50-100 or so).  In the
former case we're not saving any index space by storing only the hash
code, so these could be expected to have different performance
behaviors.

As for specifics of the suggested scripts:

* might be better to do select count(*) not select 1, so that client
communication is minimized

* check that the queries actually use the indexes (not sure that the
proposed switch settings ensure this, not to mention you didn't create
the indexes)

* make sure the pgbench transaction counts are large enough to ensure
significant runtime

* the specific table sizes suggested are surely not large enough

regards, tom lane



Re: [PATCHES] hash index improving v3

2008-09-04 Thread Tom Lane
Here is an updated patch incorporating Zdenek's review, my own
observation that we should make the index tupledesc tell the truth,
and some other fixes/improvements such as making backwards scans
work as expected.

The main thing lacking before this could be committed, from a code
standpoint, is a cleaner solution to the problem of adjusting the
index tupledesc (see the ugly hack in catalog/index.c).  However,
that complaint is irrelevant for functionality or performance testing,
so I'm throwing this back out there in hopes someone will do some...

I thought a little bit about how to extend this to store both hashcode
and original index key, and realized that the desire to have a truthful
index tupledesc makes that a *whole* lot harder.  The planner, and
really even the pg_index catalog representation, assume that the visible
columns of an index are one-for-one with the index keys.  We can slide
through with the attached patch because this is still true ---
effectively we're just using a "storage type" different from the indexed
column's type for hash indexes, as already works for GIST and GIN.
But having two visible columns would bollix up quite a lot of stuff.
So I think if we actually want to do that, we'd need to revert to the
concept of cheating on the tupledesc.  Aside from the various uglinesses
that I was able to remove from the original patch by not having that,
I'm still quite concerned that we'd find something else wrong with
doing that, further down the road.

So my thinking right now is that we should just test this patch as-is.
If it doesn't show really horrid performance when there are lots of
hash key collisions, we should forget the store-both-things idea and
just go with this.

regards, tom lane




binNKAVnsTxLq.bin
Description: hash-v4.patch.gz



Re: [PATCHES] hash index improving v3

2008-09-04 Thread Tom Lane
Zdenek Kotala <[EMAIL PROTECTED]> writes:
> pgsql/src/backend/access/hash/hashutil.c
> <http://reviewdemo.postgresql.org/r/26/#comment27>

>  It would be better to remove the #define from hash.h and set it up
> there directly.

Actually, I don't like this aspect of the patch one bit: it means that
the system catalogs are lying about what is stored in the index, which
seems likely to break something somewhere, either now or down the road.
I think the correct way to handle this is to make the pg_attribute entries
(and hence the index's relation descriptor) accurately match what is
stored in the index.  For testing purposes I propose this crude hack
in catalog/index.c's ConstructTupleDescriptor():

*** src/backend/catalog/index.c.origMon Aug 25 18:42:32 2008
--- src/backend/catalog/index.c Thu Sep  4 16:20:12 2008
***
*** 133,138 
--- 133,139 
Form_pg_attribute to = indexTupDesc->attrs[i];
HeapTuple   tuple;
Form_pg_type typeTup;
+   Form_pg_opclass opclassTup;
Oid keyType;
  
if (atnum != 0)
***
*** 240,246 
if (!HeapTupleIsValid(tuple))
elog(ERROR, "cache lookup failed for opclass %u",
 classObjectId[i]);
!   keyType = ((Form_pg_opclass) GETSTRUCT(tuple))->opckeytype;
ReleaseSysCache(tuple);
  
if (OidIsValid(keyType) && keyType != to->atttypid)
--- 241,252 
if (!HeapTupleIsValid(tuple))
elog(ERROR, "cache lookup failed for opclass %u",
 classObjectId[i]);
!   opclassTup = (Form_pg_opclass) GETSTRUCT(tuple);
!   /* HACK: make hash always use int4 as storage (really it's uint32) */
!   if (opclassTup->opcmethod == HASH_AM_OID)
!   keyType = INT4OID;
!   else
!   keyType = opclassTup->opckeytype;
ReleaseSysCache(tuple);
  
if (OidIsValid(keyType) && keyType != to->atttypid)


Assuming the patch gets accepted, we should devise some cleaner way
of letting index AMs adjust their indexes' reldescs; maybe declare a
new entry point column in pg_am that lets the AM modify the tupledesc
constructed by this function before it gets used to create the index.
But that is irrelevant to performance testing, so I'm not going to do
it right now.

regards, tom lane



Re: [PATCHES] hash index improving v3

2008-09-04 Thread Tom Lane
Zdenek Kotala <[EMAIL PROTECTED]> writes:
> I performed code review and see my comments.

Thanks for the comments.  I've incorporated all of these into an updated
patch that I'm preparing, except for

>  Why not define new datatype for example HashKey instead of uint32?

This seems like a good idea, but I think we should do it as a separate,
cosmetic-cleanup patch.  It'll touch a lot of parts of access/hash/ that
the current patch doesn't need to change, and thus complicate reviewing.

        regards, tom lane



Re: [PATCHES] hash index improving v3

2008-09-03 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
> Right now it seems strange that the index is larger than a btree, yet
> the performance tests show that 3 times as much I/O was used accessing
> the btree.

Well, in an ideal world a hash index probe is O(1) while a btree probe
is O(log N), so that result is exactly what hash proponents would hope
for.  Whether it's real or not is another question, but it could be.

        regards, tom lane



Re: [PATCHES] WIP Join Removal

2008-09-02 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
> On Tue, 2008-09-02 at 12:28 -0400, Tom Lane wrote:
>> I haven't thought this through entirely, but wouldn't a partial index be
>> okay if it's marked predOK?  You might be right that the case is
>> unlikely, but if it's only one extra line to support it ...

> As of now, its it is predOK then that implies there was a qual on the
> checkrel so can't remove the join.

That conclusion seems utterly wrong to me.  Per the example of

a left join b on (a.x = b.y and b.z = 1)

b.z = 1 will bubble down to be a qual on b.  It doesn't prevent the join
optimization, and it does possibly allow the use of a partial index.
In particular a unique index on b.y with a predicate involving b.z would
be relevant to the optimization decision here.

In slightly more realistic cases, b might be a view on c that imposes
a WHERE condition on c.z, so that's another avenue for a qual to exist
below the join.
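To make that concrete (table and index names invented for illustration):

```sql
-- b.z = 1 bubbles down to a qual on b, so a partial unique index such as
CREATE UNIQUE INDEX b_y_uniq ON b (y) WHERE z = 1;
-- can be predOK for
SELECT a.* FROM a LEFT JOIN b ON (a.x = b.y AND b.z = 1);
-- and thus prove b.y unique over exactly the rows that matter,
-- justifying removal of the join.
```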

regards, tom lane



Re: [PATCHES] WIP Join Removal

2008-09-02 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
> + if (removable && 
> + joinrel->cheapest_total_path < keeprel->cheapest_total_path)
> + {
> + elog(LOG, "join removed");
> + joinrel->pathlist = keeprel->pathlist;
> + joinrel->joininfo = keeprel->baserestrictinfo;
> + }
> + }

On third thought: if you think that the join paths could possibly win,
then the correct coding here is something like

foreach(path, keeprel->pathlist)
add_path(joinrel, ...)

The reason is that it's not an all-or-nothing choice: some of the paths
might offer cheaper startup cost, or present a useful sort order.
So just present them as available alternatives and let add_path sort it
out.

regards, tom lane



Re: [PATCHES] WIP Join Removal

2008-09-02 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
> As discussed on 26 June, "Join Removal/Vertical Partitioning", here's a
> patch to remove joins in certain circumstances.

Some points not made in the thread so far:

> + if (checkrel->rtekind != RTE_RELATION)
> + return;

This isn't right, or at least isn't sufficient, because rtekind is
undefined for joinrels.  I think it should be something like

/*
 * Since we can't prove anything unless keeprel has a unique
 * index, give up immediately if it's a join or not a table.
 */
if (checkrel->reloptkind == RELOPT_JOINREL ||
checkrel->rtekind != RTE_RELATION)
return;

Maybe you should also check for checkrel->indexlist == NIL here, since
there's no point in doing the attr_needed bookkeeping if no indexes.

> + /*
> +  * Check that the query targetlist does not contain any non-junk
> +  * target entries for the relation. If it does, we cannot remove join.
> +  */
> + if (checkrel->min_attr > 0)
> + minattr = checkrel->min_attr;
> + 
> + for (attrno = minattr; attrno < checkrel->max_attr; attrno++)
> + {
> + if (!bms_is_empty(checkrel->attr_needed[attrno]))
> + return;
> + }

This part seems pretty wrong.  In the first place, you mustn't
discriminate against system columns like that.  (Consider for instance
that the only attr being used from the inner rel is its OID column.)
You're failing to check max_attr too --- I believe the loop ought to run
from min_attr to max_attr inclusive.  In the second place, I don't see
how the routine ever succeeds at all if it insists on attr_needed being
empty, because whatever attrs are used in the join condition will surely
have nonempty attr_needed.  What you want is to see whether there are
any attrs that are needed *above the join*, which would be something
like
if (!bms_is_subset(checkrel->attr_needed[attrno], joinrel->relids))
fail;
The reference to non-junk in the comment is off base too.  Attrs are
used or not, there's no concept of a junk attr.

> +  * XXX Seems not worth searching partial indexes because those
> +  * are only usable with an appropriate restriction, so the
> +  * only way they could ever be usable is if there was a
> +  * restriction that exactly matched the index predicate,
> +  * which is possible but generally unlikely.

I haven't thought this through entirely, but wouldn't a partial index be
okay if it's marked predOK?  You might be right that the case is
unlikely, but if it's only one extra line to support it ...

> + if (removable && 
> + joinrel->cheapest_total_path < keeprel->cheapest_total_path)

This test on cheapest_total_path seems pretty wacko --- even granting
that you meant to test the costs and not compare pointer addresses,
isn't it backward?  Also I doubt that the joinrel's cheapest_total_path
field is set yet where you're calling this.

In any case I'm unconvinced that join removal could ever not be a win,
so that test seems unnecessary anyhow.

> + {
> + elog(LOG, "join removed");
> + joinrel->pathlist = keeprel->pathlist;
> + joinrel->joininfo = keeprel->baserestrictinfo;
> + }

Huh?  What in the world are you doing to joininfo here?  This function
should only be manipulating the path list.

regards, tom lane



Re: [PATCHES] WIP Join Removal

2008-09-02 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
>> Oh. What does the query look like after removing the join, then?

> Same answer, just slower. Removing the join makes the access to a into a
> SeqScan, whereas it was a two-table index plan when both tables present.

I don't really believe this: please show an actual case where the join
would be faster.

AFAICS, in the outer-join examples, it is not possible for a join to
enable some kind of indexscan on the outer table, because by definition
an outer join excludes none of the left-hand rows.  So a seqscan on the
outer is optimal.

I also find all the worry about generating other plans for the inner
relation to be off the mark.  You're not going to *use* any plan for the
inner rel, so who cares what plans it has?

        regards, tom lane



Re: [PATCHES] WIP Join Removal

2008-09-02 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
> On Mon, 2008-09-01 at 22:23 +0300, Heikki Linnakangas wrote:
>> Did plan invalidation make it safe to rely on the presence of a unique 
>> index for planning decisions?

> My understanding was "Yes" and this case was the specific reason I
> originally wanted to pursue plan invalidation back in 2006.

Yeah, it should work.  The theory is that any schema change that could
affect planning should result in broadcasting a relcache inval message
for the table (not just the index, note).  I'm pretty confident that
that works for index addition and removal (cf index_update_stats and
index_drop).  There might be some situations where we need to force a
relcache inval but don't currently do so --- constraint addition/removal
for instance I'm not too sure about.  But that would represent an easily
fixable bug.

regards, tom lane



Re: [PATCHES] pg_dump additional options for performance

2008-08-02 Thread Tom Lane
chris <[EMAIL PROTECTED]> writes:
> Do we need to wait until a fully-parallelizing pg_restore is
> implemented before adding this functionality to pg_dump?

They're independent problems ... and I would venture that parallel
dump is harder.

> Further, it's actually not obvious that we *necessarily* care about
> parallelizing loading data.  The thing that happens every day is
> backups.

Maybe so, but I would say that routine backups shouldn't be designed
to eat 100% of your disk bandwidth anyway --- they'd be more like
background tasks.

regards, tom lane



Re: [PATCHES] [HACKERS] Hint Bits and Write I/O

2008-08-01 Thread Tom Lane
Alvaro Herrera <[EMAIL PROTECTED]> writes:
> I think it makes sense to commit this patch now, per previous
> discussions on which we have agreed to make incremental changes.

Yeah, but at the same time there is merit in the argument that the
proposed patch hasn't actually been proven to be usable for anything.
I would be a lot happier if there were even a trivial proof-of-concept
plugin example submitted with it, just to prove that there were no
showstopper problems in the plugin design, like failure to pass
essential information or not getting the locking straight.

> I'm just wondering if the change of usage_count from 16 to 8 bits was
> discussed and agreed?

Umm ... it was not, but given that we have logic in there to limit the
usage_count to 5 or so, it's hard to argue that there's a big problem.

I confess to not having read the patch in detail --- where did the other
8 bits go to?

    regards, tom lane



Re: [HACKERS][PATCHES] odd output in restore mode

2008-07-31 Thread Tom Lane
"Heikki Linnakangas" <[EMAIL PROTECTED]> writes:
> Martin Zaun wrote:
>> With these avenues to be explored, can the pg_standby patch on the
>> CommitFest wiki be moved to the "Returned with Feedback" section?

> Yes, I think we can conclude that we don't want this patch as it is. 
> Instead, we want a documentation patch that describes the problem, 
> mentioning that GNU cp is safe, or you can use the copy+rename trick.

Right, after which we remove the presently hacked-in delay.

I've updated the commitfest page accordingly.

regards, tom lane



Re: [PATCHES] pg_dump additional options for performance

2008-07-26 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
> I want to dump tables separately for performance reasons. There are
> documented tests showing 100% gains using this method. There is no gain
> adding this to pg_restore. There is a gain to be had - parallelising
> index creation, but this patch doesn't provide parallelisation.

Right, but the parallelization is going to happen sometime, and it is
going to happen in the context of pg_restore.  So I think it's pretty
silly to argue that no one will ever want this feature to work in
pg_restore.

To extend the example I just gave to Stephen, I think a fairly probable
scenario is where you only need to tweak some "before" object
definitions, and then you could do

pg_restore --schema-before-data whole.dump >before.sql
edit before.sql
psql -f before.sql target_db
pg_restore --data-only --schema-after-data -d target_db whole.dump

which (given a parallelizing pg_restore) would do all the time-consuming
steps in a fully parallelized fashion.

regards, tom lane



Re: [PATCHES] pg_dump additional options for performance

2008-07-26 Thread Tom Lane
Stephen Frost <[EMAIL PROTECTED]> writes:
> * Tom Lane ([EMAIL PROTECTED]) wrote:
>> This argument is nonsense.  The typical usage of this capability, IMHO,
>> will be
>> 
>> pg_dump -Fc >whole.dump
>> pg_restore --schema-before-data whole.dump >before.sql
>> pg_restore --data-only whole.dump >data.sql
>> pg_restore --schema-after-data whole.dump >after.sql
>> 
>> followed by editing the schema pieces and then loading.

> I dislike, and doubt that I'd use, this approach.  At the end of the
> day, it ends up processing the same (very large amount of data) multiple
> times.

Well, that's easily avoided: just replace the third step by restoring
directly to the target database.

pg_restore --schema-before-data whole.dump >before.sql
edit before.sql
pg_restore --schema-after-data whole.dump >after.sql
edit after.sql
psql -f before.sql target_db
pg_restore --data-only -d target_db whole.dump
psql -f after.sql target_db

regards, tom lane



Re: [PATCHES] pg_dump additional options for performance

2008-07-26 Thread Tom Lane
Stephen Frost <[EMAIL PROTECTED]> writes:
>>> Another issue is that the rules for deciding which objects are "before
>>> data" and which are "after data" are wrong.  In particular ACLs are after
>>> data not before data, which is relatively easy to fix.  
>> 
>> OK

> This was partially why I was complaining about having documentation, and
> a policy for that matter, which goes into more detail about why X is before
> or after the data.  I agree that they're after today, but I don't see
> any particular reason why they should be one or the other.

If a table's ACL revokes owner insert privilege, and was placed before
the data load steps, those steps would fail.  We are relying on the
default table privileges until we are done doing everything we need to
do to the tables (and perhaps other objects, I'm not sure if there are
any other comparable problems).

regards, tom lane



Re: [PATCHES] pg_dump additional options for performance

2008-07-26 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
> On Fri, 2008-07-25 at 19:16 -0400, Tom Lane wrote:
>> The key problem is that pg_restore is broken:

> The key capability here is being able to split the dump into multiple
> pieces. The equivalent capability on restore is *not* required, because
> once the dump has been split the restore never needs to be.

This argument is nonsense.  The typical usage of this capability, IMHO,
will be

pg_dump -Fc >whole.dump
pg_restore --schema-before-data whole.dump >before.sql
pg_restore --data-only whole.dump >data.sql
pg_restore --schema-after-data whole.dump >after.sql

followed by editing the schema pieces and then loading.  One reason
is that this gives you a consistent dump, whereas three successive
pg_dump runs could never guarantee any such thing.  Another reason
is that you may well not know when you prepare the dump that you
will need split output, because the requirement to edit the dump
is likely to be realized only when you go to load it.

In any case, why did you put the switches into pg_restore.c if you
thought it wasn't useful for pg_restore to handle them?

>> Not so easy to fix
>> is that COMMENTs might be either before or after data depending on what
>> kind of object they are attached to.

> Is there anything to fix?

Well, yeah.  If you attach a comment to an after-data object and test
--schema-after-data, you'll notice the comment is lost.

>> And there's yet another issue here, which is that it's not entirely clear
>> that the type of an object uniquely determines whether it's before or
>> after data.

> Don't understand that. Objects are sorted in well-defined order,
> specified in pg_dump_sort.

After which we do a topological sort that enforces dependency ordering.
The question to worry about is whether there can ever be a dependency
from a normally-"before" object to a normally-"after" object, which
would cause the dependency sort to move the latter in front of the
former (in one way or another).  I'm not saying that any such case can
occur today, but I don't think it's an impossibility for it to arise in
future.  I don't want this relatively minor feature to be putting limits
on what kinds of dependencies the system can have.

> I'm conscious that the major work proposed will take weeks to complete

I don't think that what I am proposing is that complicated; I would
anticipate it requiring somewhere on the order of two dozen lines of
code.  I was thinking of doing a preliminary loop through the TocEntry
list to identify the ordinal numbers of the first and last data items,
and then the main loop could compare a counter to those numbers to
decide which of the three sections it was in.  Plus you'd need another
ArchiveEntry call someplace to prepare the "dummy data" item if one was
needed.
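The two-pass approach described above can be sketched in isolation (TocEntry here is a simplified, hypothetical stand-in for pg_dump's real archiver struct, not its actual layout):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for pg_dump's archive TOC entry. */
typedef struct TocEntry
{
	const char *desc;
	bool		is_data;		/* true for TABLE DATA / BLOBS entries */
} TocEntry;

typedef enum
{
	SECTION_PRE_DATA,
	SECTION_DATA,
	SECTION_POST_DATA
} Section;

/* Preliminary loop: find the ordinals of the first and last data items. */
static void
find_data_bounds(const TocEntry *toc, int n, int *first, int *last)
{
	*first = -1;
	*last = -1;
	for (int i = 0; i < n; i++)
	{
		if (toc[i].is_data)
		{
			if (*first < 0)
				*first = i;
			*last = i;
		}
	}
}

/* Main loop: compare each entry's ordinal against those bounds. */
static Section
classify(int ordinal, int first, int last)
{
	if (first < 0 || ordinal < first)
		return SECTION_PRE_DATA;
	if (ordinal > last)
		return SECTION_POST_DATA;
	return SECTION_DATA;
}
```

Note that any non-data entry sorted between the first and last data items falls into the data section under this scheme, which is exactly where the "dummy data" item comes in for archives with no data at all.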

regards, tom lane



Re: [PATCHES] pg_dump additional options for performance

2008-07-25 Thread Tom Lane
bjects were "before" data.)  Is that a small enough corner case
to live with in order to gain implementation simplicity and robustness?

BTW, another incomplete item is that pg_dumpall should probably be taught
to accept and pass down --schema-before-data and --schema-after-data
switches.

regards, tom lane



Attachment: pg_dump_beforeafter.v7.patch.gz



Re: [HACKERS][PATCHES] odd output in restore mode

2008-07-25 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
> On Fri, 2008-07-25 at 16:31 -0400, Tom Lane wrote:
>> I thought the latest conclusion was that changing the behavior of
>> pg_standby itself wouldn't address the problem anyway, and that what we
>> need is just a docs patch recommending that people use safe copying
>> methods in their scripts that copy to the archive area?

> Plus the rest of this patch, which is really very simple.

Why?  AFAICT the patch is just a kluge that adds user-visible complexity
without providing a solution that's actually sure to work.

regards, tom lane



Re: [HACKERS][PATCHES] odd output in restore mode

2008-07-25 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
> On Tue, 2008-07-22 at 17:19 -0700, Martin Zaun wrote:
>> reviewing your patch 

> Current status is this:
> * My understanding is that Dave and Andrew (and therefore Simon) think
> the approach proposed here is an acceptable one. Heikki disagrees and
> wants different approach. Perhaps I misunderstand.
> * Patch needs work to complete the proposed approach
> * I'm willing to change the patch, but not able to test it on Windows.

I thought the latest conclusion was that changing the behavior of
pg_standby itself wouldn't address the problem anyway, and that what we
need is just a docs patch recommending that people use safe copying
methods in their scripts that copy to the archive area?

regards, tom lane



Re: [PATCHES] [HACKERS] WITH RECURSIVE patches 0723

2008-07-24 Thread Tom Lane
Tatsuo Ishii <[EMAIL PROTECTED]> writes:
> Reviewers, please let me know if you find problems with the
> patches. If none, I would like to commit this weekend.

Given that everyone who has tested this has found a different way to
crash it, and that the frequency of crash reports shows no signs of
slowing down, I have to think that committing it is premature.

I tried to look through the patch just now and failed to make any
sense of it, because of the complete absence of documentation.
Two unexplained examples added to the SELECT reference page don't
do it for me.  I want to see an explanation of exactly what behaviors
are intended to be provided (and, in view of the long TODO list that
was posted awhile back, what isn't provided).  And there needs to be
more than zero internal documentation.  A README file, or perhaps
a very long file header comment, needs to be provided to explain
what's supposed to happen, when, and where when processing a recursive
query.  (For comparison look at the README.HOT file that was created
to explain the HOT patch --- something at about that level of detail
would help this patch a lot.  Or consider adding a section to
chapter 43 in the SGML docs.)

We really can't accept a patch that is so poorly documented as to
be unreviewable.  Unreviewable also means it'll be unmaintainable
going forward.

regards, tom lane



Re: [PATCHES] pg_dump additional options for performance

2008-07-24 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
> [80k patch]

Surely there is a whole lot of unintended noise in this patch?
I certainly don't believe that you meant to change keywords.c
for instance.

        regards, tom lane



Re: [PATCHES] [HACKERS] WITH RECURSIVE patches 0723

2008-07-23 Thread Tom Lane
Tatsuo Ishii <[EMAIL PROTECTED]> writes:
> Reviewers, please let me know if you find problems with the
> patches. If none, I would like to commit this weekend.

Has this patch actually been reviewed yet?  The only reports I've
seen are from testing; nothing from anyone actually reading the
code.  I know I've not looked at it yet.

        regards, tom lane



Re: [PATCHES] pg_dump additional options for performance

2008-07-21 Thread Tom Lane
Stephen Frost <[EMAIL PROTECTED]> writes:
> Are there use cases for just --omit-post-load or --omit-pre-load?

Probably not many.  The thing that's bothering me is the
action-at-a-distance property of the positive-logic switches.
How are we going to explain this?

"By default, --schema-pre-load, --data-only, --schema-post-load
are all ON.  But if you turn one of them ON (never mind that
it was already ON by default), that changes the defaults for
the other two to OFF.  Then you have to turn them ON (never
mind that the default for them is ON) if you want two out of
the three categories."

You have to bend your mind into a pretzel to wrap it around this
behavior.  Yeah, it might be convenient once you understand it,
but how long will it take for the savings in typing to repay the
time to understand it and the mistakes along the way?

        regards, tom lane



Re: [PATCHES] pg_dump additional options for performance

2008-07-21 Thread Tom Lane
Stephen Frost <[EMAIL PROTECTED]> writes:
> * Tom Lane ([EMAIL PROTECTED]) wrote:
>> As far as the documentation/definition aspect goes, I think it should
>> just say the parts are
>> * stuff needed before you can load the data
>> * the data
>> * stuff needed after loading the data

> Even that is a lie though, which I guess is what my problem is.

True; the stuff done after is done that way at least in part for
performance reasons rather than because it has to be done that way.
(I think it's not only performance issues, though --- for circular
FKs you pretty much have to load the data first.)

>> I hadn't realized that Simon was using "pre-schema" and "post-schema"
>> to name the first and third parts.  I'd agree that this is confusing
>> nomenclature: it looks like it's trying to say that the data is the
>> schema, and the schema is not!  How about "pre-data" and "post-data"?

> Argh.  The command-line options follow the 'data'/'load' line
> (--schema-pre-load and --schema-post-load), and so I think those are
> fine.  The problem was that in the documentation he switched to saying
> they were "Pre-Schema" and "Post-Schema", which could lead to confusion.

Ah, I see.  No objection to those switch names, at least assuming we
want to stick to positive-logic switches.  What did you think of the
negative-logic suggestion (--omit-xxx)?

regards, tom lane



Re: [PATCHES] [HACKERS] Hint Bits and Write I/O

2008-07-21 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
> On Mon, 2008-07-21 at 11:36 +0530, Pavan Deolasee wrote:
>> I think we should try at least one or two possible optimizations and
>> get some numbers before we jump and make substantial changes to the
>> code.

> You know you're suggesting months of tests and further discussion. :-(

I agree with Pavan that we should have something that'd at least serve
as test scaffolding to verify that the framework patch is sane.  The
test code needn't be anything we'd want to commit.  It seems like
largely the same kind of issue as with your stats-hooks patch.

regards, tom lane



Re: [PATCHES] page macros cleanup (ver 04)

2008-07-21 Thread Tom Lane
Zdenek Kotala <[EMAIL PROTECTED]> writes:
> Thanks for applying the patch. I think that hash index "upgradability" is
> currently broken, or it will be with the new hash index improvements. But
> thinking about it, it does not make sense to break compatibility with this
> patch first.

Right, my point exactly.  Those necessarily-incompatible changes might
or might not ever get applied --- if they are, we can do cosmetic
cleanups afterwards.

        regards, tom lane



Re: [PATCHES] pg_dump additional options for performance

2008-07-21 Thread Tom Lane
Andrew Dunstan <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> Maybe invert the logic?
>> --omit-pre-data
>> --omit-data
>> --omit-post-data

> Please, no. Negative logic seems likely to cause endless confusion.

I think it might actually be less confusing, because with this approach,
each switch has an identifiable default (no) and setting it doesn't
cause side-effects on settings of other switches.  The interactions of
the switches as Simon presents 'em seem less than obvious.

regards, tom lane



Re: [PATCHES] pg_dump lock timeout

2008-07-21 Thread Tom Lane
daveg <[EMAIL PROTECTED]> writes:
> On Sun, Jul 20, 2008 at 02:50:50PM -0400, Tom Lane wrote:
>> In most cases our policy has been that pg_dumpall should accept and pass
>> through any pg_dump option for which it's sensible to do so. I did not
>> make that happen but it seems it'd be a reasonable follow-on patch.

> I'll remember that next time.

Er .. actually that was a direct request for you to do it.

> Finally, you changed the option value and case from 1 to case 2. getopt_long
> only returns the value argument when there is no flag to set, so 1 was unique
> and could have been read as "the first no-short option with an argument".

Yeah.  The code *worked* as you submitted it, but what was bothering me
was that the "val = 1" table entries worked in two completely different
ways for the different argument types.  I first thought that you'd
broken the existing long argument options --- you hadn't, but I had to
go re-read the getopt_long source to convince myself of that.  So it
seemed like using a different "val" value might help clarify the
difference in behavior for future readers of the code.  In particular
the next guy who wants to add a long option with parameter value would
certainly not be able to use val = 1; but I thought the code as you
had it wouldn't give him any clue what to do.  If you've got a better
idea about how to deal with that, feel free...
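The two behaviors of getopt_long's "val" field can be seen in a minimal sketch (the option names below are illustrative, not pg_dump's actual table): when the "flag" pointer is non-NULL, getopt_long() stores "val" through it and returns 0; when "flag" is NULL, getopt_long() returns "val" and the caller reads optarg itself.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>
#include <string.h>

static int	set_by_flag = 0;		/* written by getopt_long via "flag" */
static char *timeout_arg = NULL;	/* taken from optarg when val 2 is returned */

static void
parse(int argc, char **argv)
{
	static struct option long_options[] = {
		/* flag != NULL: getopt_long() returns 0 and stores val itself */
		{"some-flag", no_argument, &set_by_flag, 1},
		/* flag == NULL: getopt_long() returns val; caller reads optarg */
		{"lock-wait-timeout", required_argument, NULL, 2},
		{NULL, 0, NULL, 0}
	};
	int			c;

	optind = 1;					/* reset scanner state for repeated calls */
	while ((c = getopt_long(argc, argv, "", long_options, NULL)) != -1)
	{
		if (c == 2)
			timeout_arg = optarg;
		/* c == 0 means a flag-style option already handled by getopt_long */
	}
}
```

This is why a future long option that takes a parameter cannot reuse val = 1 for the flag-style entries: the distinct "val" value is what tells the caller which option's optarg it is looking at.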

regards, tom lane



Re: [PATCHES] pg_dump additional options for performance

2008-07-20 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
> I also suggested having three options
> --want-pre-schema
> --want-data
> --want-post-schema
> so we could ask for any or all parts in the one dump. --data-only and
> --schema-only are negative options so don't allow this.
> (I don't like those names either, just thinking about capabilities)

Maybe invert the logic?

--omit-pre-data
--omit-data
--omit-post-data

Not wedded to these either, just tossing out an idea...

regards, tom lane



Re: [PATCHES] pg_dump additional options for performance

2008-07-20 Thread Tom Lane
Stephen Frost <[EMAIL PROTECTED]> writes:
> * daveg ([EMAIL PROTECTED]) wrote:
>> One observation, indexes should be built right after the table data
>> is loaded for each table, this way, the index build gets a hot cache
>> for the table data instead of having to re-read it later as we do now.

> That's not how pg_dump has traditionally worked, and the point of this
> patch is to add options to easily segregate the main pieces of the
> existing pg_dump output (main schema definition, data dump, key/index
> building).  You suggestion brings up an interesting point that should
> pg_dump's traditional output structure change the "--schema-post-load"
> set of objects wouldn't be as clear to newcomers since the load and the
> indexes would be interleaved in the regular output.

Yeah.  Also, that is pushing into an entirely different line of
development, which is to enable multithreaded pg_restore.  The patch
at hand is necessarily incompatible with that type of operation, and
wouldn't be used together with it.

As far as the documentation/definition aspect goes, I think it should
just say the parts are
* stuff needed before you can load the data
* the data
* stuff needed after loading the data
and not try to be any more specific than that.  There are corner cases
that will turn any simple breakdown into a lie, and I doubt that it's
worth trying to explain them all.  (Take a close look at the dependency
loop breaking logic in pg_dump if you doubt this.)

I hadn't realized that Simon was using "pre-schema" and "post-schema"
to name the first and third parts.  I'd agree that this is confusing
nomenclature: it looks like it's trying to say that the data is the
schema, and the schema is not!  How about "pre-data" and "post-data"?

regards, tom lane



Re: [PATCHES] pg_dump lock timeout

2008-07-20 Thread Tom Lane
daveg <[EMAIL PROTECTED]> writes:
> Here is an updated version of this patch against head. It builds, runs and
> functions as expected. I did not build the sgml.

Applied with mostly minor cosmetic improvements --- the only actual
error I noticed was failing to check whether the server version supports
statement_timeout.

In most cases our policy has been that pg_dumpall should accept and pass
through any pg_dump option for which it's sensible to do so. I did not
make that happen but it seems it'd be a reasonable follow-on patch.

A minor point is that the syntax "-X lock-wait-timeout=n" or
"-X lock-wait-timeout n" won't work, although perhaps people used to
-X might expect it to.  Since we deprecate -X (and don't even document
it anymore), I thought that making this work would be much more trouble
than it's worth, but perhaps that's open to argument.

regards, tom lane



Re: [HACKERS] [PATCHES] WITH RECURSIVE patches 0717

2008-07-20 Thread Tom Lane
Tatsuo Ishii <[EMAIL PROTECTED]> writes:
> Thus I think we should avoid this kind of ORDER BY. Probably we should
> avoid LIMIT/OFFSET and FOR UPDATE as well.

What of index-optimized SELECT max(...) ?

    regards, tom lane



Re: [PATCHES] Is autovacuum doing a wraparound-avoiding VACUUM?

2008-07-17 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
> On Thu, 2008-07-17 at 17:10 -0400, Alvaro Herrera wrote:
>> I don't like your wording though; it feels too verbose (and you're
>> losing the ANALYZE in case it's doing both things).  How about 
>> 
>> snprintf(activity, MAX_AUTOVAC_ACTIV_LEN,
>>          "autovacuum: VACUUM%s%s",
>>          tab->at_doanalyze ? " ANALYZE" : "",
>>          tab->at_wraparound ? " (wraparound)" : "");

> Yes, looks good.

May I suggest "(to prevent wraparound)" or something like that?
Otherwise, +1.

>> You're not proposing it for 8.3 right?

> I think I am. It's an important diagnostic for your other fix.

I agree, this is important for visibility into what's happening.
The string isn't getting translated so I don't see any big downside
to applying the patch in back branches.

regards, tom lane



Re: [PATCHES] variadic function support

2008-07-15 Thread Tom Lane
"Pavel Stehule" <[EMAIL PROTECTED]> writes:
> this version is WIP - I have to clean up comments, and will do some
> documentation. But I am sure I am not able to explain this feature in
> English well. Please, can somebody help me with the documentation? So
> now, plpgsql is more/less ruby :)

Applied with revisions.  The duplicate-argument detection logic in
FuncnameGetCandidates was pretty thoroughly broken, and there were some
internal API decisions I didn't like, as well as a few sins of omission
like not making ruleutils.c work.  I did some work on the docs but there
may be a few other places that could stand to mention variadic
functions.

I didn't do anything about the extra pg_proc column, but will start to
work on that now.

    regards, tom lane



Re: [PATCHES] variadic function support

2008-07-14 Thread Tom Lane
"Pavel Stehule" <[EMAIL PROTECTED]> writes:
> sample: select mleast(variadic array[1,2,3,4,5]);

One other point here: in the patch as presently submitted, VARIADIC
can't be an unreserved keyword, but it seems to be enough to make it a
col_name_keyword.  Some preliminary testing says that that doesn't work
anymore after adding productions to allow the above: VARIADIC will have
to become a fully reserved keyword.  That's a bit annoying, since it's
not reserved according to the SQL spec, but it doesn't seem like a word
that's really likely to be in use as a user identifier.  Does anyone
think this issue is a showstopper?

regards, tom lane



Re: [PATCHES] variadic function support

2008-07-14 Thread Tom Lane
"Pavel Stehule" <[EMAIL PROTECTED]> writes:
> 2008/7/14 Tom Lane <[EMAIL PROTECTED]>:
>> Are you intending to change this right now and resubmit, or is it
>> work for later?

> I prefer separate it to other patch.

It seems a fairly important part of the feature, especially given the
connection to the zero-argument issue.

    regards, tom lane



Re: [PATCHES] variadic function support

2008-07-14 Thread Tom Lane
Jeff Davis <[EMAIL PROTECTED]> writes:
> I don't have a strong opinion, but allowing zero-argument variadic
> function calls -- and therefore causing foo(variadic int[]) and
> foo(variadic text[]) to conflict -- makes more sense than requiring one
> argument.

I hadn't even thought about that point, but the idea that those two
would conflict bothers me quite a lot.  Not least because there's no
reasonable way to enforce it with the existing unique indexes on pg_proc.
I think you'd have to leave the variadic argument out of proargtypes
altogether, and that seems mad.

    regards, tom lane



Re: [PATCHES] variadic function support

2008-07-14 Thread Tom Lane
"Pavel Stehule" <[EMAIL PROTECTED]> writes:
> sample: select mleast(variadic array[1,2,3,4,5]);

Note this would also address Jeff's complaint about not being able to
pass zero variadic parameters:

select mleast(variadic array[]::int[]);

Looks a bit ugly but the type is specified, so it would work correctly
with polymorphic functions.

Are you intending to change this right now and resubmit, or is it
work for later?

        regards, tom lane



Re: [PATCHES] page macros cleanup (ver 04)

2008-07-13 Thread Tom Lane
"Heikki Linnakangas" <[EMAIL PROTECTED]> writes:
> ...  That macro is actually doing the 
> same thing as PageGetContents, so I switched to using that. As that 
> moves the data slightly on those bitmap pages, I guess we'll need a 
> catversion bump.

I'm amazed that Zdenek didn't scream bloody murder about that.  You're
creating a work item for in-place-upgrade that would not otherwise
exist, in exchange for a completely trivial bit of code beautification.
(The same can be said of his proposed change to hash meta pages.)

I'm planning to go over this patch today and apply it sans the parts
that would require catversion bump.  We can argue later about whether
those are really worth doing, but I'm leaning to "not" --- unless Zdenek
says that he has no intention of making in-place-upgrade handle hash
indexes any time soon.

BTW, after further thought about the PageGetContents() situation:
right now we can change it to guarantee maxalignment "for free",
since SizeOfPageHeaderData happens to be maxaligned on all platforms
(this wasn't the case as recently as 8.2).  So I'm thinking we should
do that.  There's at least one place that thinks that PageGetContents
is the same as page + SizeOfPageHeaderData, but that's easily fixed.
On balance it seems like hidden assumptions about alignment are a bigger
risk than assumptions about that offset --- anyone want to argue the
contrary?
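The distinction at issue can be sketched with simplified versions of the backend macros (constants below are illustrative assumptions, not the backend's authoritative values): the current PageGetContents happens to be maxaligned only because SizeOfPageHeaderData happens to be, while an explicitly aligned variant would guarantee it.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed platform alignment; 8 on typical 64-bit systems. */
#define MAXIMUM_ALIGNOF 8
#define TYPEALIGN(a, len) \
	(((uintptr_t) (len) + ((a) - 1)) & ~((uintptr_t) ((a) - 1)))
#define MAXALIGN(len) TYPEALIGN(MAXIMUM_ALIGNOF, (len))

/* Assumed header size; happens to be maxaligned, as in CVS HEAD. */
#define SizeOfPageHeaderData 24

/* Current coding: offset is whatever the header size happens to be. */
#define PageGetContents(page) \
	((char *) (page) + SizeOfPageHeaderData)

/* Variant that guarantees a maxaligned result regardless of header size. */
#define PageGetContentsAligned(page) \
	((char *) (page) + MAXALIGN(SizeOfPageHeaderData))
```

With a 24-byte header the two macros yield the same pointer, so adopting the aligned form today costs nothing; it only starts mattering if the header size ever stops being a multiple of MAXIMUM_ALIGNOF.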

regards, tom lane



Re: [PATCHES] page macros cleanup (ver 04)

2008-07-13 Thread Tom Lane
Zdenek Kotala <[EMAIL PROTECTED]> writes:
> Good catch. if we have to bump catalog version then I have there small patch
> which introduce following macro and remove PageHeaderData stucture from 
> HashMetaPageData:

Seems like a bad idea --- PageGetContents is not guaranteed to deliver
a maxaligned pointer, so at least in principle this could result in
alignment violations.  It would accidentally fail to fail as of CVS
HEAD, but any future rearrangement of PageHeaderData or HashMetaPageData
could create a problem.

(Possibly PageGetContents *should* guarantee a maxaligned result,
but the current coding does not.)

        regards, tom lane



Re: [PATCHES] variadic function support

2008-07-13 Thread Tom Lane
"Pavel Stehule" <[EMAIL PROTECTED]> writes:
> 2008/7/13 Jeff Davis <[EMAIL PROTECTED]>:
>> Also, it doesn't seem to allow calls to a variadic function with zero
>> arguments, e.g. "mleast()". I think this should be allowed.

> It's not possible in all cases, because an empty array still has to be
> a typed array. But for non-polymorphic variadic functions it's probably
> possible - I would prefer to solve this question later - and for now use
> overloading etc.

I don't really like the idea of a feature that would work except in the
polymorphic case --- that just seems too non-orthogonal.  Also I tend
to think that a pretty large fraction of variadic functions will be
polymorphic, making the feature's usefulness even more dubious.

I concur with the idea that variadic functions should only match to
calls that offer at least one value for the variadic array.  If you can
actually define the behavior sensibly for the zero-element case, a
separate function of the same name can cover that case.

As far as the "variadic int" versus "variadic int[]" business, I'm
starting to agree with Pavel that "variadic int[]" offers less potential
for confusion.  In particular, it seems to make it more obvious for the
function author that the argument he receives is an array.  Also, the
other one would mean that what we put into pg_proc.proargtypes doesn't
agree directly with what the user thinks the argument types are.  While
I think we could force that to work, it's not exactly satisfying the
principle of least surprise.


One issue that just occurred to me: what if a variadic function wants to
turn around and call another variadic function, passing the same array
argument on to the second one?  This is closely akin to the problem
faced by C "..." functions, and the solutions are pretty ugly (sprintf
vs vsprintf for instance).  Can we do any better?  At least in the
polymorphic case, I'm not sure we can :-(.
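The C idiom being alluded to is that every variadic function needs a va_list twin so that another function can forward its arguments, since "..." cannot be passed along directly (a minimal sketch with hypothetical function names):

```c
#include <stdarg.h>

/* va_list twin: the forwarding target, analogous to vsprintf. */
static int
vsum(int n, va_list ap)
{
	int			total = 0;

	for (int i = 0; i < n; i++)
		total += va_arg(ap, int);
	return total;
}

/* Public variadic entry point, analogous to sprintf. */
static int
sum(int n, ...)
{
	va_list		ap;
	int			total;

	va_start(ap, n);
	total = vsum(n, ap);
	va_end(ap);
	return total;
}

/* A wrapper cannot re-pass "..."; it must route through the va_list twin. */
static int
sum_plus_one(int n, ...)
{
	va_list		ap;
	int			total;

	va_start(ap, n);
	total = vsum(n, ap) + 1;
	va_end(ap);
	return total;
}
```

The "VARIADIC array" call syntax would play the role of the va_list twin at the SQL level: the inner function receives the already-collected array instead of a fresh argument list.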

regards, tom lane



Re: [PATCHES] Relation forks & FSM rewrite patches

2008-07-12 Thread Tom Lane
"Heikki Linnakangas" <[EMAIL PROTECTED]> writes:
> Here's an updated version of the "relation forks" patch, and an 
> incremental FSM rewrite patch on top of that. The relation forks patch 
> is ready for review. The FSM implementation is more work-in-progress 
> still, but I'd like to get some review on that as well, with the goal of 
> doing more performance testing and committing it after the commit fest.

I looked through the "relation forks" patch, and think it's close to
committable, but I have a few minor gripes.

> The one part that I'm not totally satisfied in the relation forks patch 
> is the smgrcreate() function. The question is: which piece of 
> code decides which forks to create for a relation, and when to create 
> them? I settled on the concept that all forks that a relation will need 
> are created at once, in one smgrcreate() call. There's no function to 
> create additional forks later on.

I think that's an extremely bad idea.  You need look no further than
Zdenek's hopes of in-place-upgrade to see a need for adding a fork
to an existing relation; but even without that I can see possibly
wanting to not create a fork right away.  I think smgrcreate() should
just create one fork as it's told (ie, same API you gave mdcreate).

> Currently, heap_create() decides which forks to create. That's fine at 
> the moment, but will become problematic in the future, as it's 
> indexam-specific which forks an index requires. That decision should 
> really be done in the indexam. One possibility would be to only create 
> the main fork in heap_create(), and let indexam create any additional 
> forks it needs in ambuild function.

Problem goes away if you redo smgrcreate API as above.

Other nits:

relpath() is now in danger of buffer overrun: you need to increase
the palloc() sizes.  Seems like a #define for the max number of digits
in a ForkNumber wouldn't be out of place (though I sure hope it never
gets past 1 ...).

Also, I strongly suggest *not* appending _0 to the name of the main fork's
file.  This'd break on-disk compatibility and people's expectations,
for no particular reason.

Don't like the name NUM_FORKS; it seems to suggest that's the actual
number of forks in existence.  MAX_NUM_FORKS would be better.

I think that setting InvalidForkNumber to -1 is unportable: there is
nothing compelling the compiler to represent enum ForkNumber as a signed
type, so the places where you assume a ForkNumber variable can hold
InvalidForkNumber all look like trouble.  One solution is to include
InvalidForkNumber in the enum:

enum ForkNumber {
InvalidForkNumber = -1,
MAIN_FORKNUM = 0
};

This would be a bit annoying if someone wrote a switch statement listing
different possible fork numbers, as the compiler would complain if
there's no case for InvalidForkNumber; but I can't see a reason for code
like that to be common.

Another possibility is to set InvalidForkNumber = 0, MAIN_FORKNUM = 1;
which is still a bit of a type cheat if InvalidForkNumber isn't listed
in the enum, but one that's very unlikely to cause trouble.  A bigger
problem is it would mess up a lot of your loops.

BTW, don't put a comma at end of enum ForkNumber, I think some compilers
will barf on that.

Last, don't forget to update the sgml docs chapter on database physical
storage.

regards, tom lane



Re: [PATCHES] VACUUM Improvements - WIP Patch

2008-07-12 Thread Tom Lane
"Pavan Deolasee" <[EMAIL PROTECTED]> writes:
> Here is a WIP patch based on the discussions here:
> http://archives.postgresql.org/pgsql-hackers/2008-05/msg00863.php

I do not like this patch in any way, shape, or form.

(1) It's enormously overcomplicated, and therefore fragile.

(2) It achieves speedup of VACUUM by pushing work onto subsequent
regular accesses of the page, which is exactly the wrong thing.
Worse, once you count the disk writes those accesses will induce, it's
not even clear that there's any genuine savings.

(3) The fact that it doesn't work until concurrent transactions have
gone away makes it of extremely dubious value in real-world scenarios,
as already noted by Simon.


It strikes me that what you are trying to do here is compensate for
a bad decision in the HOT patch, which was to have VACUUM's first
pass prune/defrag a page even when we know we are going to have to
come back to that page later.  What about trying to fix things so
that if the page contains line pointers that need to be removed,
the first pass doesn't dirty it at all, but leaves all the work
to be done at the second visit?  I think that since heap_page_prune
has been refactored into a "scan" followed by an "apply", it'd be
possible to decide before the "apply" step whether this is the case
or not.

regards, tom lane



Re: [PATCHES] GIN improvements

2008-07-11 Thread Tom Lane
Teodor Sigaev <[EMAIL PROTECTED]> writes:
> http://www.sigaev.ru/misc/fast_insert_gin-0.7.gz
> http://www.sigaev.ru/misc/multicolumn_gin-0.3.gz

I've committed the multicolumn one with minor revisions (fix some poor
English in docs and comments, add regression-test coverage).  Do you
need more review of fast_insert yet?  It looked like a number of people
commented on it already.

    regards, tom lane



Re: [PATCHES] [HACKERS] get_relation_stats_hook()

2008-07-10 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
> On Thu, 2008-06-26 at 12:50 -0400, Tom Lane wrote:
>> I think you need to move up a level, and perhaps refactor some of the
>> existing code to make it easier to inject made-up stats.

> I've looked at ways of refactoring this, but the hooks provided here
> complement the existing functionality fairly well. 

The hooks proposed here are completely useless --- it's impossible
to return anything except actual catalog data, because whatever is
returned is going to be subjected to ReleaseSysCache() later on.
I'd think you at least need to be able to turn that into a no-op,
or perhaps a pfree() call.  Also it's clear that some calls of
examine_variable would still return regular syscache entries, so
the cleanup behavior has to be determinable on a per-call basis.

I also still think that you need to reconsider the level at which
the hook gets control.  It's impossible given these hook definitions
to create "fake" data for an appendrel or sub-SELECT, which seems to
me to be an important requirement, at least till such time as the
core planner handles those cases better.

My feeling is that the hooks ought to be at more or less the same
semantic level as examine_variable and ReleaseVariableStats, and that
rather than making special-case hooks for the other couple of direct
accesses to STATRELATT, the right thing is to make sure that the
examine_variable hook will work for them too.

I do now concur that having the hook return a pg_statistic tuple
is appropriate --- after looking at the code that uses this stuff,
decoupling that representation would be more work than it's worth.

I think it might be a good idea to prepare an actual working example
of a plug-in that does something with the hook in order to verify
that you've got a useful definition.

regards, tom lane



Re: [HACKERS] [PATCHES] GIN improvements

2008-07-08 Thread Tom Lane
Teodor Sigaev <[EMAIL PROTECTED]> writes:
>> I looked this over and it looks good in general. 

> May I think that patch passed review and commit it?

I'd still like to take a look.

        regards, tom lane



Re: [PATCHES] pgbench minor fixes

2008-07-05 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
> On Sun, 2008-07-06 at 00:06 +1000, Russell Smith wrote:
>> Simon Riggs wrote:
>>> 1. -i option should run vacuum analyze only on pgbench tables, not *all*
>>> tables in database.
> 
>> How does this work with custom scripts?

> I don't think we can justify vacuuming every table in the database just
> so we get any tables being used for one set of tests.

Actually your point is that the -i option wouldn't be used anyway
with a custom script ...

regards, tom lane



Re: [PATCHES] [HACKERS] Solaris ident authentication using unix domain sockets

2008-07-05 Thread Tom Lane
Andrew Dunstan <[EMAIL PROTECTED]> writes:
> Robert Treat wrote:
>> Hmm... I've always been told that Solaris didn't support this because the 
>> Solaris developers feel that IDENT is inherently insecure.

> We don't actually use the Ident protocol for Unix sockets on any 
> platform.

Indeed.  If the Solaris folk feel that getpeerucred() is insecure,
they had better explain why their kernel is that broken.  This is
entirely unrelated to the known shortcomings of the "ident" IP
protocol.

regards, tom lane



Re: [PATCHES] psql command setting

2008-07-05 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
> Recent patch allowed \timing [on|off]
> This patch allows same option on all remaining toggles, so every option
> works in the same way.

We already discussed that and rejected it, because the remaining toggles
aren't really toggles.

        regards, tom lane



Re: [PATCHES] Sorting writes during checkpoint

2008-07-04 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
> Anyway, I note that we don't have an easy way of sorting by tablespace,

Say what?  tablespace is the first component of relfilenode.

> but I'm sure it would be possible to look up the tablespace for a file
> within a plugin.

If the information weren't readily available from relfilenode, it would
*not* be possible for a bufmgr plugin to look it up.  bufmgr is much too
low-level to be dependent on performing catalog lookups.

        regards, tom lane



Re: [HACKERS] [PATCHES] Explain XML patch v2

2008-07-04 Thread Tom Lane
Peter Eisentraut <[EMAIL PROTECTED]> writes:
> Am Freitag, 4. Juli 2008 schrieb Tom Raney:
>> Regarding the XML datum, in order to support that, will all users need  
>> to compile with libxml?  Are there any lighter weight solutions to  
>> serialize other than libxml?

> You can create XML without libxml.

Seems to me that anyone who wants this feature will probably also want
the existing libxml-based features, so they'll be building with libxml
anyway.  So I'd not be in favor of expending any extra code on a
roll-your-own solution.

    regards, tom lane



Re: [PATCHES] Exposing keywords to clients

2008-07-03 Thread Tom Lane
Nikhils <[EMAIL PROTECTED]> writes:
>>> Attached is an updated patch, giving the following output.

Applied with minor revisions.

> Here are some comments from me:

> a) Changed "localised" to "localized" to be consistent with the references
> elsewhere in the same file.

What I didn't like about the docs was the classification of the function
as a "session information function".  There's certainly nothing
session-dependent about it.  I ended up putting it under "System Catalog
Information Functions", which isn't really quite right either but it's
closer than the other one.

> b) I wonder if we need the default case in the switch statement at all,
> since we are scanning the statically populated ScanKeywords array with
> proper category values for each entry.

I left it in for safety, but made it just return nulls --- there's no
point in wasting more code space on it than necessary, and certainly
no point in asking translators to translate a useless string.

> c) There was a warning during compilation since we were assigning a const
> pointer to a char pointer
>  values[0] = ScanKeywords[funcctx->call_cntr].name;

Yeah, I couldn't see a good way around that either --- we could pstrdup
but that seems overkill, and adjusting the API of BuildTupleFromCStrings
to take const char ** doesn't look appetizing either.

> d) oid 2700 has been claimed by another function in the meanwhile. Modified
> it to 2701.
> DATA(insert OID = 2701 ( pg_get_keywords  PGNSP PGUID 12 10 400 f f t t s
> 0 2249

2701 was claimed too.  You need to use the unused_oids script to find
unique free OIDs.

regards, tom lane



Re: [PATCHES] [HACKERS] Solaris ident authentication using unix domain sockets

2008-07-03 Thread Tom Lane
Garick Hamlin <[EMAIL PROTECTED]> writes:
>   I have a patch that I have been using to support postgresql's
> notion of ident authentication when using unix domain sockets on
> Solaris.  This patch basically just adds support for using 
> getpeerucred() on Solaris so unix sockets and ident auth work just
> like it does on Linux and elsewhere.

Cool.

> + #if defined(HAVE_GETPEERUCRED) 
> + #include <ucred.h>
> + #endif

But this is not cool.  There might be systems out there that have
getpeerucred() but not <ucred.h>, and this coding would cause a compile
failure (even if they actually wouldn't be trying to use getpeerucred()
because they have some other way to do it).  You need an explicit
configure probe for the header file too, I think.

Also, what is the rationale for putting this before the
HAVE_STRUCT_CMSGCRED case instead of after?  Again, that seems like it
could cause unexpected behavioral changes on platforms that work fine
now (consider possibility that getpeerucred is there but broken).

    regards, tom lane



Re: [PATCHES] pg_dump lock timeout

2008-07-03 Thread Tom Lane
daveg <[EMAIL PROTECTED]> writes:
> On Thu, Jul 03, 2008 at 11:15:10AM +0300, Marko Kreen wrote:
>> - The statement_timeout is set back with "statement_timeout = default"
>> Maybe it would be better to do "= 0" here?  Although such decision
>> would go outside the scope of the patch, I see no sense having
>> any other statement_timeout for actual dumping.

> I'd prefer to leave whatever policy is otherwise in place alone.

The policy in place in CVS HEAD is that pg_dump explicitly sets
statement_timeout to 0.  Setting it to default would break that,
and will certainly not be accepted.

regards, tom lane



Re: [PATCHES] Patch to change psql default banner v6

2008-07-02 Thread Tom Lane
Guillaume Lelarge <[EMAIL PROTECTED]> writes:
> Attached is a new version of the patch. It fixes a few issues when one 
> adds a pattern to metacommands.

Applied with some editorialization.  There were some places you
evidently hadn't tested against all supported server versions ...

    regards, tom lane



Re: [PATCHES][UPDATED] A GUC variable to replace PGBE_ACTIVITY_SIZE

2008-07-01 Thread Tom Lane
"Heikki Linnakangas" <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> Huh?  How could we be assigning to a slot that is not in use?

> Before the patch, we loop through the shared PgBackendStatus slots 
> (there is MaxBackends of them), and issue a memcpy for each to copy it 
> to our local slot. After that, we check if it was actually in use.
> After the patch, we still loop through the shared slots, but only issue 
> the memcpy for slots that are in use.

Oh, you're talking about *fetching* the slot contents, not *assigning*
to them.  Sorry, probably should have read the patch instead of just
reacting to the comment ...

regards, tom lane



Re: [PATCHES][UPDATED] A GUC variable to replace PGBE_ACTIVITY_SIZE

2008-06-30 Thread Tom Lane
"Heikki Linnakangas" <[EMAIL PROTECTED]> writes:
> Another simple optimization occurred to me while looking at this: we 
> should skip the memcpy/strcpy altogether if the BackendActivity slot is 
> not in use. That seems like a good bet, you usually don't try to max out 
> max_connections.

Huh?  How could we be assigning to a slot that is not in use?

    regards, tom lane



Re: [PATCHES] Explain XML patch

2008-06-27 Thread Tom Lane
Bruce Momjian <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> Anybody mind if I update that to put more emphasis on the need for
>> documentation?  As in "documentation patches are *required* if
>> your patch adds or changes user-visible behavior"?

> Sure, but I do documentation updates for non-English speakers
> occasionally.

I know that docs from non-native speakers often need heavy
editorialization, but that is little excuse for leaving the
committer to do *all* the docs work.

regards, tom lane



Re: [PATCHES] Explain XML patch

2008-06-27 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
> You probably need to say a whole lot more about this patch.
> I've updated the wiki with things I've learned while submitting patches.
> http://wiki.postgresql.org/wiki/Submitting_a_Patch

Anybody mind if I update that to put more emphasis on the need for
documentation?  As in "documentation patches are *required* if
your patch adds or changes user-visible behavior"?

    regards, tom lane



Re: [PATCHES] Fix pg_ctl restart bug

2008-06-26 Thread Tom Lane
Bruce Momjian <[EMAIL PROTECTED]> writes:
> Alvaro Herrera wrote:
>> I've always assumed that I'm supposed to backpatch the bugs I fix in
>> HEAD, however far is reasonable.

> I thought we only backatched major bugs to prevent possible instability
> when fixing minor bugs.

Actually, Bruce, this *is* a minor bug; if it were major we'd have heard
about it from the field.

My take on it is that "pg_ctl restart" must be practically unused.
Given that we now know it's completely broken, the only way that
patching it could make the situation worse would be if the patch
affected some other code path that people actually do use.  As
long as you're sure it doesn't do that, I see no downside to an
attempted fix, even if it fails.

regards, tom lane



Re: [PATCHES] Fix pg_ctl restart bug

2008-06-25 Thread Tom Lane
Bruce Momjian <[EMAIL PROTECTED]> writes:
> However, as of 2004-10-15, this has not worked.  :-(  The attached patch
> is the one that caused the bug --- on non-Unix systems, SYSTEMQUOTE is
> "", meaning zero-length string.  I should have seen the bug when I
> applied the contributed patch in 2004.

So, shouldn't this fix be back-patched?

    regards, tom lane



Re: [PATCHES] variadic function support

2008-06-24 Thread Tom Lane
"Pavel Stehule" <[EMAIL PROTECTED]> writes:
>> Tom Lane wrote:
>>> Your point about the syntax is good though.  It would be better if
>>> the syntax were like
>>> create function foo (a text, variadic b int[])
>>> or maybe even better
>>> create function foo (a text, variadic b int)

> I don't see a problem with your syntax. It neatly blocks the combination
> of OUT and VARIADIC parameters - my one request is that the variadic
> parameter has to be an array.

Well, we should certainly store the parameter type as an array in
proargtypes, because that makes this feature transparent to all the
PLs.  However, it doesn't follow that the CREATE FUNCTION syntax
has to specify the array type rather than the element type.  I think
the Java precedent might be good reason to go with using the element
type in the function declaration.

regards, tom lane



Re: [PATCHES] [HACKERS] Patch for Prevent pg_dump/pg_restore from being affected by statement_timeout

2008-06-24 Thread Tom Lane
daveg <[EMAIL PROTECTED]> writes:
> Are we talking about the same patch?

Maybe not --- I thought you were talking about a backend-side behavioral
change.

> Because I don't know what you are
> refering to with "timer management code" and "scheduling the interrupt" in
> the context of pg_dump.

I'm not sure that I see a good argument for making pg_dump deliberately
fail, if that's what you're proposing.  Maybe I'm just too old-school,
but there simply is no higher priority for a database than
safeguarding your data.

regards, tom lane



Re: [PATCHES] [HACKERS] Patch for Prevent pg_dump/pg_restore from being affected by statement_timeout

2008-06-24 Thread Tom Lane
daveg <[EMAIL PROTECTED]> writes:
> lock-timeout sets statement_timeout to a small value while locks are being
> taken on all the tables. Then it resets it to default. So it could reset it
> to whatever the new default is.

"reset to default" is *surely* not the right behavior; resetting to the
setting that had been in effect might be a bit sane.  But the whole
design sounds pretty broken to start with.  The timer management code
already understands the concept of scheduling the interrupt for the
nearest of a couple of potentially active timeouts.  ISTM any patch
intended to add a feature like this ought to extend that logic rather
than hack around changing the values of global variables.

    regards, tom lane



Re: [PATCHES] variadic function support

2008-06-23 Thread Tom Lane
Andrew Dunstan <[EMAIL PROTECTED]> writes:
> But if I have
>   foo( a text, b int[])
> it looks odd if both these calls are legal:
>   foo('a',1,2,3)
>   foo('a',ARRAY[1,2,3])
> which I understand would be the case with the current patch.

Maybe I misunderstand what is supposed to happen, but I believe that
if the function is marked VARIADIC then the second case would in fact
be rejected: the signature of the function for parameter-matching
purposes is text followed by one or more ints, never text and int[].

> I'm also still curious to know how the following would be handled:
>   foo(a text[], b text[])

I think a is just text[], full stop.  Only the last parameter is
interpreted differently for variadic.

Your point about the syntax is good though.  It would be better if
the syntax were like

create function foo (a text, variadic b int[])

or maybe even better

create function foo (a text, variadic b int)

since (a) this makes it much more obvious to the reader what the
function might match, and (b) it leaves the door open for marking
multiple parameters as variadic, if we can figure out what that means.

regards, tom lane


